bl0b-k0s-cluster
GitOps repository for a home Kubernetes cluster running on k0s, managed with FluxCD v2.
Stack
| Component | Role |
|---|---|
| k0s | Kubernetes distribution |
| FluxCD v2 | GitOps continuous delivery |
| Traefik v3 | Ingress controller |
| cert-manager | TLS certificate automation (Let's Encrypt via Cloudflare DNS-01) |
| MetalLB | LoadBalancer for bare-metal (L2 mode) |
| csi-driver-nfs | NFS dynamic storage provisioner |
| kube-prometheus-stack | Prometheus + Grafana + node-exporter |
| Loki | Log aggregation |
| Promtail | Log collector (OTel pipeline stages) |
| unpoller | UniFi metrics exporter (Prometheus) |
| ruflo | MCP bridge for AI assistant tooling |
| Weave GitOps | Flux UI |
| SOPS + AGE | Secret encryption at rest |
Services
| Service | URL | Notes |
|---|---|---|
| Grafana | https://grafana.bl0b.io | kube-prometheus-stack |
| Prometheus | https://prometheus.bl0b.io | kube-prometheus-stack |
| Weave GitOps | https://gitops.bl0b.io | Flux UI |
| Ruflo MCP | https://ruflo-mcp.bl0b.io/mcp | AI assistant MCP bridge |
| Bazarr | https://bazarr.bl0b.io | |
| Kapowarr | https://kapowarr.bl0b.io | |
| Komga | https://komga.bl0b.io | |
| Notifiarr | https://notifiarr.bl0b.io | |
| Overseerr | https://overseerr.bl0b.io | |
| Plex | 192.168.10.101:32400 | LoadBalancer, no ingress |
| Radarr | https://radarr.bl0b.io | |
| Sabnzbd | https://sabnzbd.bl0b.io | gluetun WireGuard sidecar |
| Sonarr | https://sonarr.bl0b.io | |
Repository Structure
bl0b-k0s-cluster/
├── apps/
│   ├── infra/                  # Config for infrastructure services
│   │   ├── cert-manager/
│   │   ├── metallb/            # IP pool source of truth (ip-pool.yaml)
│   │   └── pihole/
│   ├── mediaserver/            # Local Helm charts for media server applications
│   │   ├── bazarr/
│   │   ├── kapowarr/
│   │   ├── komga/
│   │   ├── notifiarr/
│   │   ├── overseerr/
│   │   ├── plex/
│   │   ├── radarr/
│   │   ├── sabnzbd/
│   │   └── sonarr/
│   └── ruflo-mcp/              # Ruflo MCP bridge (vendored source + Dockerfile)
├── clusters/
│   └── production/
│       ├── flux-system/        # FluxCD bootstrap manifests (do not edit manually)
│       ├── infrastructure/     # Kustomization entrypoint for infra-level resources
│       └── applications/       # Kustomization entrypoints for app-level resources
│           ├── mediaserver/    # HelmRelease manifests for media apps
│           └── pihole/         # HelmRelease + Ingress manifests for Pi-hole
├── secrets/                    # SOPS-encrypted Kubernetes Secret manifests
├── scripts/                    # Validation scripts for cluster health checks
└── docs/                       # Design specs and plans
How FluxCD Works Here
FluxCD watches this repository and reconciles the cluster state continuously.
Reconciliation flow:
flux-system Kustomization (bootstrap, watches clusters/production/)
└── infrastructure Kustomization (watches clusters/production/infrastructure/)
    ├── Namespace resources
    ├── HelmRepository sources
    ├── HelmRelease: traefik, metallb, csi-driver-nfs, kube-prometheus-stack, weave-gitops
    ├── HelmRelease: loki, promtail, unpoller
    ├── StorageClass: nfs-csi
    ├── cert-manager-clusterissuers Kustomization
    ├── cluster-secrets Kustomization (SOPS-decrypted)
    ├── metallb-config Kustomization (IPAddressPool + L2Advertisement)
    ├── ruflo Kustomization (watches clusters/production/applications/ruflo/)
    └── mediaserver Kustomization (watches clusters/production/applications/mediaserver/)
        └── HelmRelease: bazarr, plex, sonarr, ... (9 apps)
After any push to main, Flux detects the new commit within ~1 minute. To force immediate reconciliation:
flux reconcile source git flux-system
flux reconcile kustomization <name>
flux reconcile helmrelease <name> -n <namespace>
UniFi Monitoring
unpoller polls the UniFi Dream Machine SE controller API and exposes metrics at :9130 for Prometheus scraping.
- Controller: https://192.168.30.1 (local, username/password auth)
- Namespace: monitoring, scraped via PodMonitor
- Grafana dashboards (UniFi folder): Network Sites, UAP Insights, UDM Insights, Client Insights
Metrics flow: UDM SE → unpoller → Prometheus → Grafana
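A PodMonitor for this setup might look roughly like the sketch below. The resource name, the pod label in the selector, and the port name metrics are assumptions for illustration; the real values depend on how the unpoller chart labels its pods and names the container port that serves :9130.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: unpoller                          # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: unpoller    # assumed pod label
  podMetricsEndpoints:
    - port: metrics                       # assumed name of the container port serving :9130
      path: /metrics
      interval: 30s                       # illustrative scrape interval
```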
AI Tooling
ruflo is an MCP (Model Context Protocol) bridge that exposes AI tooling to Claude Code.
- Endpoint: https://ruflo-mcp.bl0b.io/mcp
- Namespace: ruflo, private Docker Hub image (bl0b/ruflo-mcp-bridge)
- Active tool groups: intelligence, agents, memory, devtools
- Claude Code config: project-local .claude.json (gitignored)
To add the MCP server to a new machine:
claude mcp add --transport http ruflo https://ruflo-mcp.bl0b.io/mcp
Helm Values Pattern
All apps (mediaserver and infrastructure) follow the same layering strategy:
- values.yaml: generic chart defaults only (empty/disabled). Never contains environment-specific config.
- HelmRelease values: all instance-specific config (NFS mounts, ingress hostnames/TLS, storage class, resources).
- SOPS Secret + valuesFrom: sensitive values only (API keys, passwords, tokens).
This means changes to instance config only require updating the HelmRelease (no chart version bump needed). Changes to values.yaml require bumping the chart version to force Flux to re-package.
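As a rough illustration of the first two layers, the sketch below pairs a chart's generic defaults with the overrides a HelmRelease would carry. The default values shown are an assumed shape, not copied from any chart in this repository; the override keys mirror the HelmRelease example used for mediaserver apps.

```yaml
# apps/mediaserver/<app>/values.yaml: generic defaults (assumed shape)
nfs:
  enabled: false
ingress:
  enabled: false
  className: ""
  hosts: []
---
# HelmRelease fragment: instance-specific overrides for this cluster
spec:
  values:
    nfs:
      enabled: true
      media:
        server: 192.168.30.20
        path: /media
    ingress:
      enabled: true
      className: traefik
```

Keeping the defaults disabled means the chart renders nothing cluster-specific on its own, so everything that ties it to this environment is visible in one place.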
Adding a New Mediaserver App
Mediaserver apps use local Helm charts stored in apps/mediaserver/<app>/.
- Create the chart directory:

      apps/mediaserver/<app>/
      ├── Chart.yaml
      ├── values.yaml        # generic defaults only, no instance-specific values
      └── templates/
          ├── _helpers.tpl
          ├── deployment.yaml
          ├── service.yaml
          └── ingress.yaml

- Add a HelmRelease manifest at clusters/production/applications/mediaserver/<app>-helmrelease.yaml with all instance-specific config in the values: block:

      apiVersion: helm.toolkit.fluxcd.io/v2
      kind: HelmRelease
      metadata:
        name: <app>
        namespace: mediaserver
      spec:
        interval: 10m
        chart:
          spec:
            chart: ./apps/mediaserver/<app>
            sourceRef:
              kind: GitRepository
              name: flux-system
              namespace: flux-system
        install:
          createNamespace: true
          remediation:
            retries: 3
        upgrade:
          remediation:
            retries: 3
        values:
          nfs:
            enabled: true
            media:
              server: 192.168.30.20
              path: /media
          persistence:
            config:
              storageClassName: hostpath
          ingress:
            enabled: true
            className: traefik
            annotations:
              cert-manager.io/cluster-issuer: letsencrypt-prod
            hosts:
              - host: <app>.bl0b.io
                paths:
                  - path: /
                    pathType: ImplementationSpecific
            tls:
              - secretName: <app>.bl0b.io
                hosts:
                  - <app>.bl0b.io

- Register it in clusters/production/applications/mediaserver/kustomization.yaml.

- Open a PR, merge, and verify:

      flux get helmrelease <app> -n mediaserver
      kubectl get certificate -n mediaserver
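The registration step usually amounts to one extra line under resources. A minimal sketch, assuming the existing entries follow the same <app>-helmrelease.yaml naming (the bazarr entry is shown only as an illustrative neighbor):

```yaml
# clusters/production/applications/mediaserver/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - bazarr-helmrelease.yaml    # existing entry (file name assumed)
  - <app>-helmrelease.yaml     # new entry for the app being added
```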
Adding an Infrastructure Service
- Add a HelmRepository in clusters/production/infrastructure/ (or inline in the HelmRelease file, same pattern as metallb and monitoring).
- Add a Namespace resource if required (see monitoring-namespace.yaml; note the privileged pod-security labels needed for DaemonSets using host networking/PID).
- Add a HelmRelease referencing the HelmRepository.
- Register all new files in clusters/production/infrastructure/kustomization.yaml.
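The steps above can be sketched as a HelmRepository and a HelmRelease kept in the same file. Every name here (example-charts, example-service, the example namespace) and the repository URL are placeholders, and the intervals and version range are illustrative rather than taken from this repository.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: example-charts                 # hypothetical repository name
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.example.com      # placeholder URL
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-service                # hypothetical release name
  namespace: example                   # hypothetical namespace
spec:
  interval: 10m
  chart:
    spec:
      chart: example-service
      version: "1.x"                   # illustrative semver range
      sourceRef:
        kind: HelmRepository
        name: example-charts
        namespace: flux-system
  install:
    remediation:
      retries: 3
  values: {}                           # instance-specific config goes here
```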
MetalLB IP Pool Management
The source of truth for the IP address pool is apps/infra/metallb/ip-pool.yaml. Edit it and push; Flux reconciles automatically. To assign a fixed IP to a LoadBalancer service:
service:
  type: LoadBalancer
  loadBalancerIP: "192.168.x.x"
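For orientation, an L2-mode pool definition generally consists of an IPAddressPool and an L2Advertisement that references it. The sketch below is not the content of ip-pool.yaml: the resource names, the metallb-system namespace, and the address range are assumptions chosen for illustration.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool                     # hypothetical pool name
  namespace: metallb-system              # assumed MetalLB namespace
spec:
  addresses:
    - 192.168.10.100-192.168.10.120      # illustrative range only
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2                       # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool                       # advertise addresses from the pool above
```

A fixed IP requested by a service has to fall inside one of the configured pools, otherwise MetalLB will not assign it.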
NFS Storage
The nfs-csi StorageClass points to 192.168.30.20:/media. Use storageClassName: nfs-csi in any PVC for dynamic NFS provisioning.
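A claim against this StorageClass could look like the following sketch. The claim name, namespace, access mode, and requested size are illustrative assumptions; only storageClassName: nfs-csi comes from this repository.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data            # hypothetical claim name
  namespace: mediaserver        # illustrative namespace
spec:
  accessModes:
    - ReadWriteMany             # NFS allows shared read-write access across pods
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 5Gi              # illustrative size
```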
Managing Secrets
Secrets are encrypted with SOPS using an AGE key. Encrypted files live in secrets/ and are auto-discovered by the cluster-secrets Kustomization (no manual registration needed).
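A Flux Kustomization with SOPS decryption enabled typically has the shape sketched below. The interval, prune setting, and path are assumptions rather than a copy of the manifest in this repository; the decryption block is the standard way to point the controller at an AGE key stored in a Secret.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-secrets
  namespace: flux-system
spec:
  interval: 10m                 # illustrative
  path: ./secrets               # assumed to match the secrets/ directory
  prune: true                   # illustrative
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age            # Secret holding the AGE private key
```

When the target path has no kustomization.yaml of its own, the controller generates one that includes every manifest it finds there, which is likely what makes new files in secrets/ get picked up without registration.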
Create a new secret:
- Write the plain Kubernetes Secret manifest:

      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
        namespace: <namespace>
      stringData:
        values.yaml: |
          sensitiveKey: sensitiveValue

- Encrypt it in-place:

      sops --encrypt --in-place secrets/my-secret.yaml

  Or open directly in editor (for new or existing files):

      sops secrets/my-secret.yaml

- Commit and push. Flux decrypts it automatically via the AGE key in the sops-age secret.
Load secrets into a HelmRelease using valuesFrom:
valuesFrom:
  - kind: Secret
    name: my-secret
The secret must have a values.yaml key containing a valid Helm values block. The valuesFrom secret is merged before the inline values: block, so inline values take precedence.
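To make the precedence concrete, here is a small worked example. The keys apiKey and logLevel are hypothetical; the three YAML documents show the Secret's decrypted values.yaml, the HelmRelease fragment, and the values Helm ends up receiving.

```yaml
# 1. Decrypted content of the Secret's values.yaml key (hypothetical keys)
apiKey: s3cr3t
logLevel: debug
---
# 2. HelmRelease fragment
spec:
  valuesFrom:
    - kind: Secret
      name: my-secret
      valuesKey: values.yaml    # the default key, written out for clarity
  values:
    logLevel: info              # inline value overrides the Secret's logLevel
---
# 3. Effective values passed to Helm
apiKey: s3cr3t
logLevel: info
```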
Rotate a secret:
sops secrets/my-secret.yaml # opens in editor, re-encrypts on save
git add secrets/my-secret.yaml && git commit -m "chore: rotate my-secret" && git push
Making Changes (PR Workflow)
All changes go through a branch → PR → merge cycle, even for solo work.
- Create a branch:

      git checkout -b <prefix>/<short-description> main

  Prefixes: feat/, fix/, chore/, docs/

- Make changes and commit using conventional commit messages.

- Open a PR and merge:

      gh pr create
      gh pr merge --squash --delete-branch
      git checkout main && git pull

- Force Flux to reconcile immediately:

      flux reconcile source git flux-system
Validation Scripts
| Script | When to run |
|---|---|
| scripts/validate-flux.sh | After any push: confirms FluxCD controllers are up and GitRepository is syncing |
| scripts/validate-sops.sh | After secret changes: validates SOPS/AGE decryption is working |
| scripts/validate-cert.sh | After cert-manager changes: confirms ClusterIssuers are Ready |
| scripts/validate-traefik.sh | After Traefik changes: confirms deployment is up and LB IP is correct |
| scripts/validate-pihole.sh <1\|2\|3> <dns-ip> <test-domain> | After pihole changes: checks DNS resolution and admin UI |
Troubleshooting
Flux is not reconciling
flux get kustomization # check which kustomization is stuck
flux get helmrelease -A # check all HelmReleases across namespaces
Force a reconcile:
flux reconcile source git flux-system
flux reconcile kustomization <name>
HelmRelease stuck in failed state
flux suspend helmrelease <name> -n <namespace>
helm uninstall <name> -n <namespace> --ignore-not-found
flux resume helmrelease <name> -n <namespace>
Namespace stuck Terminating
Usually caused by CRD finalizers from a deleted operator. Find and strip them:
kubectl get <crd-resource> -n <namespace>
kubectl patch <crd-resource> <name> -n <namespace> --type=json -p='[{"op":"remove","path":"/metadata/finalizers"}]'
cert-manager not issuing certificates
kubectl get certificate -n <namespace>
kubectl describe certificate <name> -n <namespace>
kubectl logs -n cert-manager -l app=cert-manager --tail=50
Common causes:
- Ingress missing cert-manager.io/cluster-issuer: letsencrypt-prod annotation
- Both cert-manager.io/issuer and cert-manager.io/cluster-issuer present (remove the issuer one)
- Stale ACME order: delete the Certificate, CertificateRequest, and Order resources to force a fresh request
Stale HelmChart artifact (chart changes not picked up)
Flux caches the chart artifact. If you changed values.yaml or chart templates and Flux isn't picking them up, bump the chart version in Chart.yaml to force a re-package.
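An alternative to bumping the version, which this repository does not necessarily use, is the reconcileStrategy field on the chart spec. With charts sourced from a GitRepository, the default ChartVersion strategy only re-packages when the version in Chart.yaml changes, whereas Revision re-packages whenever the source revision changes. A sketch of the relevant HelmRelease fragment:

```yaml
spec:
  chart:
    spec:
      chart: ./apps/mediaserver/<app>
      reconcileStrategy: Revision    # default is ChartVersion
      sourceRef:
        kind: GitRepository
        name: flux-system
        namespace: flux-system
```

The trade-off is that every commit to the repository produces a new chart artifact, even when the chart itself did not change.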
Check app logs
kubectl logs -n mediaserver deployment/<app>
kubectl describe pod -n mediaserver -l app.kubernetes.io/name=<app>
Roadmap
- Loki Global Metrics dashboard: currently showing no data; needs investigation
- Unpoller events & alarms: enable save_events and save_alarms in unpoller config for alert/event data in Grafana
- Ruflo search endpoints: deploy in-cluster search proxy (Brave Search API or similar) to enable mcp__ruflo__search and web_research tools
- Ruflo git access: mount the repo as a volume in the ruflo pod to enable diff analysis and code hooks
- Marbles: write implementation plan from existing design doc