bl0b-k0s-cluster

GitOps repository for a home Kubernetes cluster running on k0s, managed with FluxCD v2.

Stack

Component               Role
k0s                     Kubernetes distribution
FluxCD v2               GitOps continuous delivery
Traefik v3              Ingress controller
cert-manager            TLS certificate automation (Let's Encrypt via Cloudflare DNS-01)
MetalLB                 LoadBalancer for bare-metal (L2 mode)
csi-driver-nfs          NFS dynamic storage provisioner
kube-prometheus-stack   Prometheus + Grafana + node-exporter
Loki                    Log aggregation
Promtail                Log collector (OTel pipeline stages)
unpoller                UniFi metrics exporter (Prometheus)
ruflo                   MCP bridge for AI assistant tooling
Weave GitOps            Flux UI
SOPS + AGE              Secret encryption at rest

Services

Service        URL                              Notes
Grafana        https://grafana.bl0b.io          kube-prometheus-stack
Prometheus     https://prometheus.bl0b.io       kube-prometheus-stack
Weave GitOps   https://gitops.bl0b.io           Flux UI
Ruflo MCP      https://ruflo-mcp.bl0b.io/mcp    AI assistant MCP bridge
Bazarr         https://bazarr.bl0b.io
Kapowarr       https://kapowarr.bl0b.io
Komga          https://komga.bl0b.io
Notifiarr      https://notifiarr.bl0b.io
Overseerr      https://overseerr.bl0b.io
Plex           192.168.10.101:32400             LoadBalancer, no ingress
Radarr         https://radarr.bl0b.io
Sabnzbd        https://sabnzbd.bl0b.io          gluetun WireGuard sidecar
Sonarr         https://sonarr.bl0b.io

Repository Structure

bl0b-k0s-cluster/
├── apps/
│   ├── infra/                  # Config for infrastructure services
│   │   ├── cert-manager/
│   │   ├── metallb/            # IP pool source of truth (ip-pool.yaml)
│   │   └── pihole/
│   ├── mediaserver/            # Local Helm charts for media server applications
│   │   ├── bazarr/
│   │   ├── kapowarr/
│   │   ├── komga/
│   │   ├── notifiarr/
│   │   ├── overseerr/
│   │   ├── plex/
│   │   ├── radarr/
│   │   ├── sabnzbd/
│   │   └── sonarr/
│   └── ruflo-mcp/              # Ruflo MCP bridge (vendored source + Dockerfile)
├── clusters/
│   └── production/
│       ├── flux-system/        # FluxCD bootstrap manifests (do not edit manually)
│       ├── infrastructure/     # Kustomization entrypoint for infra-level resources
│       └── applications/       # Kustomization entrypoints for app-level resources
│           ├── mediaserver/    # HelmRelease manifests for media apps
│           └── pihole/         # HelmRelease + Ingress manifests for Pi-hole
├── secrets/                    # SOPS-encrypted Kubernetes Secret manifests
├── scripts/                    # Validation scripts for cluster health checks
└── docs/                       # Design specs and plans

How FluxCD Works Here

FluxCD watches this repository and reconciles the cluster state continuously.

Reconciliation flow:

flux-system Kustomization (bootstrap, watches clusters/production/)
  └── infrastructure Kustomization (watches clusters/production/infrastructure/)
        ├── Namespace resources
        ├── HelmRepository sources
        ├── HelmRelease: traefik, metallb, csi-driver-nfs, kube-prometheus-stack, weave-gitops
        ├── HelmRelease: loki, promtail, unpoller
        ├── StorageClass: nfs-csi
        ├── cert-manager-clusterissuers Kustomization
        ├── cluster-secrets Kustomization  (SOPS-decrypted)
        ├── metallb-config Kustomization (IPAddressPool + L2Advertisement)
        ├── ruflo Kustomization (watches clusters/production/applications/ruflo/)
        └── mediaserver Kustomization (watches clusters/production/applications/mediaserver/)
              └── HelmRelease: bazarr, plex, sonarr, ... (9 apps)
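
Each child entry in the tree above is a Flux Kustomization object that points at a path in this repository. A minimal sketch of what the mediaserver one might look like follows; the interval and prune settings are assumptions, not values copied from the repo.

```yaml
# Sketch only: field values other than the path and source name are assumed.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: mediaserver
  namespace: flux-system
spec:
  interval: 10m            # assumed reconcile interval
  path: ./clusters/production/applications/mediaserver
  prune: true              # assumed: remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system      # the bootstrap GitRepository
```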

After any push to main, Flux detects the new commit within ~1 minute. To force immediate reconciliation:

flux reconcile source git flux-system
flux reconcile kustomization <name>
flux reconcile helmrelease <name> -n <namespace>

UniFi Monitoring

unpoller polls the UniFi Dream Machine SE controller API and exposes metrics at :9130 for Prometheus scraping.

  • Controller: https://192.168.30.1 (local, username/password auth)
  • Namespace: monitoring, scraped via PodMonitor
  • Grafana dashboards (UniFi folder): Network Sites, UAP Insights, UDM Insights, Client Insights

Metrics flow: UDM SE → unpoller → Prometheus → Grafana
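
The scrape step in that flow is driven by a PodMonitor. The sketch below shows roughly what one could look like for unpoller. The pod label, the port name, and the release label are assumptions: kube-prometheus-stack only picks up monitors matching its selector, so the labels must match whatever the installed release expects.

```yaml
# Sketch only: labels and port name are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: unpoller
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed selector label
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: unpoller   # assumed pod label
  podMetricsEndpoints:
    - port: metrics      # assumed name of the container port serving :9130
      path: /metrics
      interval: 30s      # assumed scrape interval
```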


AI Tooling

ruflo is an MCP (Model Context Protocol) bridge that exposes AI tooling to Claude Code.

  • Endpoint: https://ruflo-mcp.bl0b.io/mcp
  • Namespace: ruflo, private Docker Hub image (bl0b/ruflo-mcp-bridge)
  • Active tool groups: intelligence, agents, memory, devtools
  • Claude Code config: project-local .claude.json (gitignored)

To add the MCP server to a new machine:

claude mcp add --transport http ruflo https://ruflo-mcp.bl0b.io/mcp

Helm Values Pattern

All apps (mediaserver and infrastructure) follow the same layering strategy:

  • values.yaml — generic chart defaults only (empty/disabled). Never contains environment-specific config.
  • HelmRelease values: — all instance-specific config (NFS mounts, ingress hostnames/TLS, storage class, resources).
  • SOPS Secret + valuesFrom: — sensitive values only (API keys, passwords, tokens).

This means changes to instance config only require updating the HelmRelease (no chart version bump needed). Changes to values.yaml require bumping the chart version to force Flux to re-package.
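
As an illustration of the first layer, a chart's values.yaml under this pattern might hold only empty or disabled defaults. The key names below mirror the ones used in the HelmRelease example in this README; the actual charts may define more or different keys.

```yaml
# Sketch of apps/mediaserver/<app>/values.yaml: generic defaults only.
nfs:
  enabled: false
  media:
    server: ""
    path: ""
persistence:
  config:
    storageClassName: ""
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts: []
  tls: []
```

Everything environment-specific (NFS server, hostnames, issuer annotation) is then supplied by the HelmRelease values: block.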


Adding a New Mediaserver App

Mediaserver apps use local Helm charts stored in apps/mediaserver/<app>/.

  1. Create the chart directory:

    apps/mediaserver/<app>/
    ├── Chart.yaml
    ├── values.yaml         # generic defaults only — no instance-specific values
    └── templates/
        ├── _helpers.tpl
        ├── deployment.yaml
        ├── service.yaml
        └── ingress.yaml
    
  2. Add a HelmRelease manifest at clusters/production/applications/mediaserver/<app>-helmrelease.yaml with all instance-specific config in the values: block:

    apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    metadata:
      name: <app>
      namespace: mediaserver
    spec:
      interval: 10m
      chart:
        spec:
          chart: ./apps/mediaserver/<app>
          sourceRef:
            kind: GitRepository
            name: flux-system
            namespace: flux-system
      install:
        createNamespace: true
        remediation:
          retries: 3
      upgrade:
        remediation:
          retries: 3
      values:
        nfs:
          enabled: true
          media:
            server: 192.168.30.20
            path: /media
        persistence:
          config:
            storageClassName: hostpath
        ingress:
          enabled: true
          className: traefik
          annotations:
            cert-manager.io/cluster-issuer: letsencrypt-prod
          hosts:
            - host: <app>.bl0b.io
              paths:
                - path: /
                  pathType: ImplementationSpecific
          tls:
            - secretName: <app>.bl0b.io
              hosts:
                - <app>.bl0b.io
    
  3. Register it in clusters/production/applications/mediaserver/kustomization.yaml.

  4. Open a PR, merge, and verify:

    flux get helmrelease <app> -n mediaserver
    kubectl get certificate -n mediaserver
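
For step 3, registering the app usually means adding one line to the resources list. A minimal sketch, assuming the kustomization.yaml lists one HelmRelease file per app using the <app>-helmrelease.yaml naming from step 2:

```yaml
# Sketch of clusters/production/applications/mediaserver/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - bazarr-helmrelease.yaml   # existing entries (names assumed)
  - sonarr-helmrelease.yaml
  - <app>-helmrelease.yaml    # the new app
```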
    

Adding an Infrastructure Service

  1. Add a HelmRepository in clusters/production/infrastructure/ (or inline in the HelmRelease file, same pattern as metallb and monitoring).
  2. Add a Namespace resource if required (see monitoring-namespace.yaml). DaemonSets that use host networking or the host PID namespace need the namespace labeled with the privileged Pod Security level.
  3. Add a HelmRelease referencing the HelmRepository.
  4. Register all new files in clusters/production/infrastructure/kustomization.yaml.
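
The first three steps can be sketched as a single multi-document file. All names, the chart URL, and the version range below are hypothetical placeholders; the privileged label is only needed in the host-networking/PID case mentioned in step 2.

```yaml
# Sketch only: "example" and the URL are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: example
  labels:
    pod-security.kubernetes.io/enforce: privileged
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: example
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.example.com
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example
  namespace: example
spec:
  interval: 10m
  chart:
    spec:
      chart: example
      version: "1.x"          # hypothetical version range
      sourceRef:
        kind: HelmRepository
        name: example
        namespace: flux-system
  values: {}                  # instance-specific config goes here
```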

MetalLB IP Pool Management

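The file apps/infra/metallb/ip-pool.yaml is the source of truth for the IP address pool. Edit it and push; Flux reconciles the change automatically. To assign a fixed IP to a LoadBalancer service, set it in the chart values as shown here (note that the underlying Service field spec.loadBalancerIP has been deprecated since Kubernetes v1.24, although MetalLB still honors it):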

service:
  type: LoadBalancer
  loadBalancerIP: "192.168.x.x"
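
For orientation, an ip-pool.yaml for MetalLB in L2 mode generally pairs an IPAddressPool with an L2Advertisement. The sketch below is not the repo's file: the resource names, the metallb-system namespace, and the address range are placeholders.

```yaml
# Sketch only: names, namespace, and range are assumptions.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.x.x-192.168.x.y   # placeholder range, replace with real IPs
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool              # advertise only this pool
```

Any fixed IP requested by a service has to fall inside a range listed under addresses.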

NFS Storage

The nfs-csi StorageClass points to 192.168.30.20:/media. Use storageClassName: nfs-csi in any PVC for dynamic NFS provisioning.
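
A minimal PVC using that StorageClass could look like the following sketch. The claim name, size, and access mode are illustrative assumptions.

```yaml
# Sketch only: name and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: <namespace>
spec:
  accessModes:
    - ReadWriteMany        # NFS allows shared read-write access
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 10Gi
```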


Managing Secrets

Secrets are encrypted with SOPS using an AGE key. Encrypted files live in secrets/ and are auto-discovered by the cluster-secrets Kustomization (no manual registration needed).
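
For sops to know which key to encrypt with, a .sops.yaml file at the repository root typically carries a creation rule. The sketch below assumes such a file exists here; the path pattern is a guess at a sensible rule and the recipient is a placeholder for the AGE public key.

```yaml
# Sketch of a .sops.yaml at the repo root (assumed, not confirmed by this README).
creation_rules:
  - path_regex: secrets/.*\.yaml$
    # Encrypt only the secret payload, leaving metadata readable in Git
    encrypted_regex: ^(data|stringData)$
    age: <age-public-key>
```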

Create a new secret:

  1. Write the plain Kubernetes Secret manifest:

    apiVersion: v1
    kind: Secret
    metadata:
      name: my-secret
      namespace: <namespace>
    stringData:
      values.yaml: |
        sensitiveKey: sensitiveValue
    
  2. Encrypt it in-place:

    sops --encrypt --in-place secrets/my-secret.yaml
    

    Or open the file directly in your editor (works for new or existing files):

    sops secrets/my-secret.yaml
    
  3. Commit and push — Flux decrypts it automatically via the AGE key in the sops-age secret.
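
The automatic decryption in step 3 comes from the decryption block on the Flux Kustomization that applies secrets/. A sketch of how the cluster-secrets Kustomization might be declared; the interval, prune setting, and path are assumptions, while sops-age is the secret name mentioned above.

```yaml
# Sketch only: interval, prune, and path are assumed.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-secrets
  namespace: flux-system
spec:
  interval: 10m
  path: ./secrets
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age     # holds the AGE private key
```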

Load secrets into a HelmRelease using valuesFrom:

valuesFrom:
  - kind: Secret
    name: my-secret

The secret must have a values.yaml key containing a valid Helm values block. The valuesFrom secret is merged before the inline values: block, so inline values take precedence.
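
To make the precedence concrete, here is a hedged HelmRelease fragment. The key name sensitiveKey comes from the example secret above; the inline override is purely illustrative.

```yaml
# Fragment of a HelmRelease spec, for illustration only.
spec:
  valuesFrom:
    - kind: Secret
      name: my-secret
      valuesKey: values.yaml   # the default key, shown explicitly
  values:
    # If the secret also sets sensitiveKey, this inline value wins.
    sensitiveKey: inline-override
```

In practice, sensitive keys should appear only in the secret, so that nothing in the inline block overrides or leaks them.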

Rotate a secret:

sops secrets/my-secret.yaml   # opens in editor, re-encrypts on save
git add secrets/my-secret.yaml && git commit -m "chore: rotate my-secret" && git push

Making Changes (PR Workflow)

All changes go through a branch → PR → merge cycle, even for solo work.

  1. Create a branch:

    git checkout -b <prefix>/<short-description> main
    

    Prefixes: feat/, fix/, chore/, docs/

  2. Make changes and commit using conventional commit messages.

  3. Open a PR and merge:

    gh pr create
    gh pr merge --squash --delete-branch
    git checkout main && git pull
    
  4. Force Flux to reconcile immediately:

    flux reconcile source git flux-system
    

Validation Scripts

Script                                                       When to run
scripts/validate-flux.sh                                     After any push: confirms FluxCD controllers are up and GitRepository is syncing
scripts/validate-sops.sh                                     After secret changes: validates SOPS/AGE decryption is working
scripts/validate-cert.sh                                     After cert-manager changes: confirms ClusterIssuers are Ready
scripts/validate-traefik.sh                                  After Traefik changes: confirms deployment is up and LB IP is correct
scripts/validate-pihole.sh <1|2|3> <dns-ip> <test-domain>    After pihole changes: checks DNS resolution and admin UI

Troubleshooting

Flux is not reconciling

flux get kustomization          # check which kustomization is stuck
flux get helmrelease -A         # check all HelmReleases across namespaces

Force a reconcile:

flux reconcile source git flux-system
flux reconcile kustomization <name>

HelmRelease stuck in failed state

flux suspend helmrelease <name> -n <namespace>
# Removes the failed release and the resources it created; Flux reinstalls it on resume
helm uninstall <name> -n <namespace> --ignore-not-found
flux resume helmrelease <name> -n <namespace>

Namespace stuck Terminating

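Usually caused by finalizers on custom resources whose operator has already been deleted, so nothing is left to clear them. Find the leftover resources and strip their finalizers: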

kubectl get <crd-resource> -n <namespace>
kubectl patch <crd-resource> <name> -n <namespace> --type=json -p='[{"op":"remove","path":"/metadata/finalizers"}]'

cert-manager not issuing certificates

kubectl get certificate -n <namespace>
kubectl describe certificate <name> -n <namespace>
kubectl logs -n cert-manager -l app=cert-manager --tail=50

Common causes:

  • Ingress missing cert-manager.io/cluster-issuer: letsencrypt-prod annotation
  • Both cert-manager.io/issuer and cert-manager.io/cluster-issuer present (remove the issuer one)
  • Stale ACME order — delete the Certificate, CertificateRequest, and Order resources to force a fresh request
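
When the Ingress side checks out, the issuer itself is the next thing to inspect. For comparison, a ClusterIssuer using Let's Encrypt with Cloudflare DNS-01 generally has the shape below. The secret names, key name, and email are placeholders, not the repo's actual values.

```yaml
# Sketch only: secret names and email are hypothetical.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email>
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # ACME account key storage
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # assumed secret name
              key: api-token               # assumed key within the secret
```

A missing or expired Cloudflare token secret typically shows up as a failing Challenge resource rather than an error on the Certificate itself.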

Stale HelmChart artifact (chart changes not picked up)

Flux caches the chart artifact. If you changed values.yaml or chart templates and Flux isn't picking them up, bump the chart version in Chart.yaml to force a re-package.
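
The bump itself is a one-line change in the chart metadata. A sketch with hypothetical version numbers:

```yaml
# Sketch of apps/mediaserver/<app>/Chart.yaml; version numbers are illustrative.
apiVersion: v2
name: <app>
type: application
version: 0.1.1     # bumped from 0.1.0 so Flux builds a new chart artifact
```

Flux also offers a reconcileStrategy: Revision setting on the HelmRelease chart spec, which repackages the chart on every new Git revision instead of on version changes. This README does not indicate that the repo uses it, so treat it as an option to evaluate rather than current behavior.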

Check app logs

kubectl logs -n mediaserver deployment/<app>
kubectl describe pod -n mediaserver -l app.kubernetes.io/name=<app>

Roadmap

  • Loki Global Metrics dashboard — currently showing no data; needs investigation
  • Unpoller events & alarms — enable save_events and save_alarms in unpoller config for alert/event data in Grafana
  • Ruflo search endpoints — deploy in-cluster search proxy (Brave Search API or similar) to enable mcp__ruflo__search and web_research tools
  • Ruflo git access — mount the repo as a volume in the ruflo pod to enable diff analysis and code hooks
  • Marbles — write implementation plan from existing design doc

Created: 25 April 2026
Last updated: 25 April 2026