Appendix: Local Development Environment

Incident Hook

A learner tests every infrastructure and GitOps change only against the cloud cluster. Feedback is slow, mistakes are expensive, and trivial YAML errors burn real time and real money. By the time a change reaches the shared environment, the debugging loop has already grown too long. A local cluster exists to shrink that blast radius before cloud validation even starts.

Why This Appendix Exists

The main course teaches the production path first. This appendix shows the fastest safe feedback loop for local experimentation:

  • a Terraform-managed kind cluster
  • generated kubeconfig and context wiring
  • optional local image registry
  • Flux bootstrap for GitOps-shaped testing

Use it when you need fast iteration on manifests, hooks, or application behavior before touching Hetzner-backed environments.

SafeOps Baseline

In the current SafeOps implementation:

  • Terraform manages the lifecycle of the local kind cluster.
  • The cluster is multi-node, so scheduling behavior is closer to reality than a single-node toy setup.
  • Flux Operator + FluxInstance can bootstrap the local cluster from the same GitOps layout.
  • Local registry support keeps image iteration fast.
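The multi-node shape is easiest to see as a raw kind config. This is an illustrative sketch only: in SafeOps the topology is codified in the Terraform module, not in a standalone YAML file.

```bash
# Illustrative equivalent of the module's topology: one control plane,
# two workers. The real definition lives in infra/terraform/kind_cluster/.
cat > /tmp/kind-multinode.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF

# Three node entries total, matching the runbook's topology.
grep -c 'role:' /tmp/kind-multinode.yaml   # prints 3
```

With two schedulable workers, behaviors such as pod spreading and node-affinity mistakes surface locally instead of only in the cloud.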

Investigation Snapshots

Here is the Terraform module layout used for the local cluster in the SafeOps system.

Local kind cluster Terraform module

  • infra/terraform/kind_cluster/.gitignore
  • infra/terraform/kind_cluster/Makefile
  • infra/terraform/kind_cluster/README.md
  • infra/terraform/kind_cluster/UPGRADE.md
  • infra/terraform/kind_cluster/main.tf
  • infra/terraform/kind_cluster/scripts/merge-kubeconfig.sh
  • infra/terraform/kind_cluster/templates/git-repository.yaml.tpl
  • infra/terraform/kind_cluster/templates/kustomization.yaml.tpl
  • infra/terraform/kind_cluster/values/components.yaml
  • infra/terraform/kind_cluster/variables.tf

Here is the local development runbook used in the SafeOps system.

Local development runbook

This repo supports:
- a local `kind` cluster (fast feedback loop), and
- a Hetzner Cloud cluster (provider-realistic).

If you are using Hetzner as the primary environment, start with `docs/hetzner.md`.

Use Terraform to provision a local multi-node kind cluster. Terraform manages lifecycle and kubeconfig generation so you can focus on deploying workloads.

## Prerequisites
- Docker Engine running with adequate CPU/RAM for at least three nodes
- `curl`, `tar`, and `unzip` available on your workstation
- Go 1.24+ and Node.js 20+ with npm for backend/frontend development
- `make` (GNU make recommended)
- Terraform 1.3+ and `kubectl`
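A small preflight check catches a missing prerequisite before it surfaces as a confusing failure mid-apply. This helper is a sketch, not part of the repo; the tool list mirrors the prerequisites above.

```bash
# Minimal preflight sketch: report missing prerequisites up front instead
# of failing later inside `terraform apply`.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all tools found"
  fi
  return "$missing"
}

check_tools docker terraform kubectl make curl tar unzip \
  || echo "install the missing tools before provisioning"
```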

## Provision the Cluster with Terraform
Use the Terraform module under `infra/terraform/kind_cluster/` to create (or destroy) the local kind cluster. The module codifies the multi-node topology directly in Terraform, automatically merges the generated kubeconfig into `~/.kube/config`, and bootstraps Flux via Flux Operator + `FluxInstance`.
Optionally, configure GitOps reconciliation by exporting the Flux variables before applying Terraform:

```bash
export TF_VAR_flux_git_repository_url="https://github.com/safeops-course/sre.git"
export TF_VAR_flux_git_repository_branch="main"
export TF_VAR_flux_kustomization_path="./flux/bootstrap/flux-system"
```

Flux will then watch the specified path inside your Git repository.

```bash
cd infra/terraform/kind_cluster
terraform init
terraform apply
```

The Terraform workflow creates the three-node topology (one control plane, two workers), writes a kubeconfig alongside the module (`kubeconfig.yaml`), and becomes the single source of truth for lifecycle operations.
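The `TF_VAR` exports above presumably feed the module's `templates/*.yaml.tpl` files. A sketch of the kind of Flux objects such templates render (field values mirror the exports; the actual rendered output may differ):

```bash
# Hypothetical rendered output of git-repository.yaml.tpl and
# kustomization.yaml.tpl; apiVersions are the GA Flux APIs.
cat > /tmp/flux-sync.yaml <<'EOF'
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/safeops-course/sre.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  path: ./flux/bootstrap/flux-system
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
EOF

grep -q './flux/bootstrap/flux-system' /tmp/flux-sync.yaml && echo "sync objects sketched"
```

The `GitRepository` fetches the repo; the `Kustomization` reconciles the exported path from it, which is why setting the three variables is enough to get a GitOps-shaped local cluster.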

## Configure Kubeconfig Context
Terraform automatically merges the generated kubeconfig into your default config (`~/.kube/config`) and ensures the `sre-control-plane` context is available. To target the module-local kubeconfig explicitly instead, point `KUBECONFIG` at it and switch context:

```bash
export KUBECONFIG="$(pwd)/infra/terraform/kind_cluster/kubeconfig.yaml"
kubectl config use-context sre-control-plane
```
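Kubeconfig merging relies on kubectl's support for a colon-separated `KUBECONFIG` path list. The module's `merge-kubeconfig.sh` presumably automates something like the following; its exact behavior is not shown here, so treat this as a manual fallback sketch:

```bash
# Combine the default config with the module-local one; kubectl merges
# all files in the colon-separated list (missing files are skipped).
export KUBECONFIG="$HOME/.kube/config:$PWD/infra/terraform/kind_cluster/kubeconfig.yaml"

# Flatten the combined view into a single file for inspection; replace
# ~/.kube/config with it yourself once it looks right.
if command -v kubectl >/dev/null 2>&1; then
  kubectl config view --flatten > /tmp/merged-kubeconfig
fi
echo "merged view (if kubectl present): /tmp/merged-kubeconfig"
```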

## Optional Local Registry
Run a local container registry to speed up iterative image pushes:

```bash
docker run -d --restart=always -p 5001:5000 --name kind-registry registry:2
```

Start the registry before applying Terraform so the mirror entry in the cluster configuration points at a running endpoint.
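For context, the mirror entry follows kind's documented local-registry pattern; a raw-kind sketch of what the Terraform module is assumed to configure looks like this (pulls of `localhost:5001` are routed to the registry container):

```bash
# Illustrative containerd mirror patch, per kind's local-registry guide.
# The Terraform module is assumed to manage the equivalent entry.
cat > /tmp/kind-with-mirror.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5001"]
      endpoint = ["http://kind-registry:5000"]
EOF

grep -q 'kind-registry:5000' /tmp/kind-with-mirror.yaml && echo "mirror entry present"
```

In kind's standard pattern the registry container must also share the cluster's Docker network (`docker network connect kind kind-registry`) so the `kind-registry:5000` endpoint resolves from inside the nodes.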

## Destroy the Cluster
Use Terraform to tear down the environment when finished:

```bash
cd infra/terraform/kind_cluster
terraform destroy
```

## Next Steps
- Review `docs/gitops/flux.md` for Flux usage; controllers are installed automatically by Terraform.
- Apply baseline namespaces and infrastructure from `flux/bootstrap/infrastructure/base/` and observability from `flux/infrastructure/observability/`.
- Build and run the backend locally: `cd backend && go run ./cmd/api`, then `curl http://localhost:8080/healthz` or scrape `http://localhost:8080/metrics`.
- Build the container image via `make -C backend image` and push it to your preferred registry (kind can use the local mirror at `localhost:5001`).
- Publish the production-ready image to GitHub Container Registry with `make backend-publish` (or `make -C backend publish`). Export `DOCKER_PAT` (a PAT with `write:packages`) and optionally `DOCKER_USER` beforehand; override `REGISTRY_HOST`/`REGISTRY_NAMESPACE`/`IMAGE_NAME`/`TAG` as needed.
- The image build embeds git metadata (`APP_VERSION`, `APP_COMMIT`, `APP_COMMIT_SHORT`, `APP_BUILD_DATE`) via Go ldflags; `APP_VERSION` defaults to the latest annotated SemVer tag. Override it when publishing a release and verify that `/version` reflects your build (including `build_time`).
- Run `make test` (executes Go unit tests; Vue tests pending) before committing.
- Explore the API via http://localhost:8080/swagger (Swagger UI) or http://localhost:8080/openapi (raw spec).
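The ldflags-based metadata embedding mentioned above can be sketched as follows. The Go-side symbol names (`main.version`, `main.commit`, `main.buildDate`) are assumptions for illustration; the repo's actual Makefile and package paths are not shown here.

```bash
# Sketch of deriving build metadata from git and passing it via ldflags.
# Fallbacks keep the snippet usable outside a git checkout.
APP_VERSION="${APP_VERSION:-$(git describe --tags --abbrev=0 2>/dev/null || echo dev)}"
APP_COMMIT="$(git rev-parse HEAD 2>/dev/null || echo unknown)"
APP_BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

LDFLAGS="-X main.version=${APP_VERSION} -X main.commit=${APP_COMMIT} -X main.buildDate=${APP_BUILD_DATE}"
echo "$LDFLAGS"

# Actual build step (hypothetical package path):
# go build -ldflags "$LDFLAGS" ./cmd/api
```

Because the values are baked in at link time, the `/version` endpoint can report them without any runtime configuration.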

Safe Workflow (Step-by-Step)

  1. Confirm local prerequisites first: Docker Engine, Terraform, kubectl, make, and enough CPU/RAM for a three-node kind cluster.
  2. Move to `infra/terraform/kind_cluster/` and decide whether local Flux should reconcile from the same Git path as the main platform.
  3. If you want GitOps reconciliation locally, export the Flux repository variables before apply.
  4. Run `terraform init` and `terraform apply` from the `kind_cluster` module.
  5. Point kubectl to the generated kubeconfig or switch to the merged `sre-control-plane` context.
  6. Verify nodes, namespaces, and Flux controllers before testing workloads.
  7. If you iterate on images, start the local registry before cluster apply so the mirror config is valid.
  8. Tear the cluster down with `terraform destroy` when the test cycle is finished.

Verification Commands

```bash
cd infra/terraform/kind_cluster
terraform init
terraform apply

export KUBECONFIG="$(pwd)/kubeconfig.yaml"
kubectl config use-context sre-control-plane
kubectl get nodes
kubectl -n flux-system get pods
```

Optional local registry:

```bash
docker run -d --restart=always -p 5001:5000 --name kind-registry registry:2
```
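The manual checks above can be scripted once kubectl is on the path. `expect_count` is a hypothetical helper, not part of the repo; the expected node count comes from the three-node topology described earlier.

```bash
# Hypothetical verification helper: compare an observed count against
# the value the runbook expects.
expect_count() {
  if [ "$1" -eq "$2" ]; then
    echo "ok: $3=$1"
  else
    echo "FAIL: $3=$1 (want $2)"
  fi
}

# Skips cleanly on machines without kubectl or without the cluster up.
if command -v kubectl >/dev/null 2>&1; then
  expect_count "$(kubectl get nodes --no-headers 2>/dev/null | wc -l)" 3 nodes
  kubectl -n flux-system get pods --no-headers 2>/dev/null || true
fi
```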

When to Prefer Local Development

Prefer the local path when:

  • you are validating manifests, hooks, or GitOps wiring
  • you need a fast loop for backend or frontend changes
  • you want to reproduce a failure without risking shared environments

Do not treat the local path as a substitute for provider-realistic verification. Hetzner, external DNS, cloud load balancers, and real certificate issuance still need cloud-side validation.

Guardrail Principle

Use the local cluster to shrink the feedback loop and the blast radius. Use the cloud cluster to validate provider-specific behavior. Do not confuse the two.

Done When

  • you can create and destroy the local cluster from Terraform
  • kubectl can target the local context without ambiguity
  • Flux controllers reconcile in the local cluster when enabled
  • you can explain which tests belong locally and which still require cloud validation