Chapter 01: AI Changes Two Things at Once
Incident Hook
A fast “AI-assisted” hotfix bundles two unrelated changes in one push:
- a backend image tag bump for develop
- an ingress manifest change intended for staging
The change looks harmless in review because each diff is small. In practice, the combined blast radius is larger. Routing breaks while backend behavior changes at the same time, so it is unclear which change caused the failure, and both triage and rollback slow down.
What AI Would Propose (Brave Junior)
- “Update image and ingress together to save one pipeline run.”
- “Apply quickly to unblock the demo.”
- “Skip context checks; it is just develop.”
Why it sounds reasonable:
- fewer PRs
- faster merge
- faster “visible progress”
Why This Is Dangerous
- Missing context: target cluster/namespace is often assumed, not verified.
- Hidden coupling: app rollout + ingress mutation creates correlated failure modes.
- Production risk pattern: the same shortcut, repeated against production, turns into a high-blast-radius incident.
Guardrails That Stop It
- Context guard before any Kubernetes write:
scripts/guard-kube-context.sh --context <ctx> --namespace <ns>
- Plan-before-apply guard for Terraform:
scripts/guard-terraform-plan.sh plan ...
scripts/guard-terraform-plan.sh apply ...
- Single-change policy:
- one PR for image/promotion
- separate PR for networking/ingress
- Git pre-hooks for repository hygiene:
- scripts/pre-commit-master-check.sh blocks direct work against protected branches
- scripts/prevent-amend-after-push.sh blocks amending already-pushed commits
- scripts/flux-kustomize-validate.sh blocks broken Flux Kustomize renders before commit
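The single-change policy can also be checked mechanically before a PR is opened. The sketch below is hypothetical: check_single_change and classify_path are not part of the repository's scripts. It assumes image/promotion changes live under paths containing "image" or "release", and networking changes under paths containing "ingress" or "networking/". Adjust the patterns to the real repository layout.

```shell
#!/usr/bin/env bash
# Hypothetical single-change check (not one of the repository's scripts).
# ASSUMPTION: the path patterns below match the repo layout; adjust as needed.

# Print the change type for one path: networking, image, or other.
classify_path() {
  case "$1" in
    *ingress*|*networking/*) echo networking ;;
    *image*|*release*)       echo image ;;
    *)                       echo other ;;
  esac
}

# Read changed paths (one per line) on stdin and fail if image/promotion
# and networking/ingress changes appear in the same change set.
check_single_change() {
  local path has_image=0 has_net=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    case "$(classify_path "$path")" in
      image)      has_image=1 ;;
      networking) has_net=1 ;;
    esac
  done
  if [ "$has_image" -eq 1 ] && [ "$has_net" -eq 1 ]; then
    echo "[single-change] image and ingress changes mixed; split into two PRs" >&2
    return 1
  fi
  echo "[single-change] OK"
}

# Usage, assuming the PR branch is compared against origin/main:
#   git diff --name-only origin/main...HEAD | check_single_change
```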
Local Git Guardrails (Pre-Hooks)
Install and verify local hooks before running labs:
make install-hooks
pre-commit run --all-files
These hooks enforce branch and history discipline before CI starts, so risky workflow mistakes are caught early on the workstation.
For GitOps manifest changes under flux/**, they also block the commit if the Kustomize render fails locally.
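For illustration, the core of a protected-branch pre-commit check fits in a few lines. This is a hypothetical sketch, not the contents of scripts/pre-commit-master-check.sh. It assumes main and master are the protected branches.

```shell
#!/usr/bin/env bash
# Hypothetical protected-branch check, the idea behind a pre-commit hook
# such as scripts/pre-commit-master-check.sh (not its actual contents).
# ASSUMPTION: main and master are the protected branches.
PROTECTED_BRANCHES="main master"

# Return 1 (block the commit) if the given branch is protected.
check_branch() {
  local branch="$1" p
  for p in $PROTECTED_BRANCHES; do
    if [ "$branch" = "$p" ]; then
      echo "[pre-commit] direct commits to '$branch' are blocked; use a feature branch" >&2
      return 1
    fi
  done
  return 0
}

# In a hook, pass the current branch. On a detached HEAD, git symbolic-ref
# fails, the argument is empty, and this sketch lets the commit through:
#   check_branch "$(git symbolic-ref --short HEAD 2>/dev/null)"
```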
Safe Workflow (Step-by-Step)
- Verify context and namespace.
- Produce plan/diff first (Terraform or GitOps diff).
- Review for correlated changes.
- Apply one change type at a time.
- Verify health and routing separately.
- Keep rollback commands prepared before merge/apply.
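The ordering above can be made explicit with a small runner. It executes the steps in sequence and stops at the first failure, so an apply can never follow a failed context check or plan. This is a hypothetical sketch: run_step, run_workflow, and the step functions are illustrative names. Each placeholder fails closed until it is replaced with a real command, such as a guard script, a plan, or a kubectl or curl check.

```shell
#!/usr/bin/env bash
# Hypothetical step runner for the safe workflow. All names are illustrative.

# Run one labeled step; report and return non-zero if the command fails.
run_step() {
  local label="$1"; shift
  echo "[workflow] >>> $label"
  if ! "$@"; then
    echo "[workflow] FAILED at: $label; later steps will not run" >&2
    return 1
  fi
}

# Placeholder steps: each fails closed until replaced with a real command.
placeholder() { echo "[workflow] placeholder for '$1' not implemented" >&2; return 1; }
verify_context()   { placeholder verify_context; }    # e.g. scripts/guard-kube-context.sh ...
produce_plan()     { placeholder produce_plan; }      # e.g. scripts/guard-terraform-plan.sh plan ...
review_changes()   { placeholder review_changes; }    # e.g. a single-change check on the diff
prepare_rollback() { placeholder prepare_rollback; }  # e.g. note the rollout undo command
apply_change()     { placeholder apply_change; }      # exactly one change type
verify_health()    { placeholder verify_health; }     # health and routing, checked separately

# Steps chained with &&: the first failure short-circuits everything after it.
# Rollback preparation runs before apply, since it must be ready before merge/apply.
run_workflow() {
  run_step "verify context and namespace"  verify_context &&
  run_step "produce plan/diff"             produce_plan &&
  run_step "review for correlated changes" review_changes &&
  run_step "prepare rollback commands"     prepare_rollback &&
  run_step "apply one change type"         apply_change &&
  run_step "verify health and routing"     verify_health
}
```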
Demo Commands
A. Kubernetes context/namespace guard
# Expected success example
scripts/guard-kube-context.sh \
--context sre-control-plane \
--namespace develop
Expected output:
[guard-kube] OK context=sre-control-plane namespace=develop
Failure example (wrong namespace):
scripts/guard-kube-context.sh \
--context sre-control-plane \
--namespace does-not-exist
Expected output:
[guard-kube] namespace 'does-not-exist' not found in context 'sre-control-plane'
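To show what such a guard checks, here is a hypothetical sketch that mirrors the output format above. It is not the implementation of scripts/guard-kube-context.sh. It relies only on read-only kubectl calls (kubectl config get-contexts -o name and kubectl get namespace). It always passes --context explicitly instead of trusting the current context.

```shell
#!/usr/bin/env bash
# Hypothetical context/namespace guard; NOT scripts/guard-kube-context.sh.
# Uses only read-only kubectl calls and fails closed on any error.

guard_kube_context() {
  local ctx="" ns=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --context)   ctx="${2:-}"; shift 2 || return 2 ;;
      --namespace) ns="${2:-}";  shift 2 || return 2 ;;
      *) echo "[guard-kube] unknown argument: $1" >&2; return 2 ;;
    esac
  done
  if [ -z "$ctx" ] || [ -z "$ns" ]; then
    echo "[guard-kube] usage: --context <ctx> --namespace <ns>" >&2
    return 2
  fi
  # 1. The context must exist in the local kubeconfig.
  if ! kubectl config get-contexts -o name | grep -Fxq -- "$ctx"; then
    echo "[guard-kube] context '$ctx' not found in kubeconfig" >&2
    return 1
  fi
  # 2. The namespace must exist in that cluster. --context is passed
  #    explicitly, so the kubeconfig's current-context is never trusted.
  #    An unreachable cluster is reported the same way (fail closed).
  if ! kubectl --context "$ctx" get namespace "$ns" >/dev/null 2>&1; then
    echo "[guard-kube] namespace '$ns' not found in context '$ctx'" >&2
    return 1
  fi
  echo "[guard-kube] OK context=$ctx namespace=$ns"
}
```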
B. Terraform plan-before-apply guard
# Create plan + metadata marker
scripts/guard-terraform-plan.sh plan \
--dir infra/terraform/hcloud_cluster \
--out tfplan
# Apply only from a fresh, reviewed planfile
scripts/guard-terraform-plan.sh apply \
--dir infra/terraform/hcloud_cluster \
--out tfplan \
--max-age-minutes 60
If the plan marker is missing or older than --max-age-minutes, apply is blocked with an explicit error.
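Here is a minimal sketch of the plan-before-apply idea. It assumes the marker is a sidecar file named <planfile>.created that stores the plan time in epoch seconds. The function names and marker format are hypothetical, not the behavior of scripts/guard-terraform-plan.sh. Running terraform apply <planfile> applies exactly the saved plan without prompting again, which is why the freshness check sits in front of it.

```shell
#!/usr/bin/env bash
# Hypothetical plan-before-apply guard; NOT scripts/guard-terraform-plan.sh.
# ASSUMPTION: the marker is "<dir>/<planfile>.created", holding epoch seconds.

# tf_plan <dir> <planfile>: save a plan, then record when it was created.
tf_plan() {
  local dir="$1" out="$2"
  terraform -chdir="$dir" plan -out="$out" || return 1
  date +%s > "$dir/$out.created"
}

# plan_is_fresh <dir> <planfile> <max_age_minutes> [now_epoch]
plan_is_fresh() {
  local dir="$1" out="$2" max_min="$3" now="${4:-$(date +%s)}" created
  if [ ! -f "$dir/$out" ] || [ ! -f "$dir/$out.created" ]; then
    echo "[guard-tf] no planfile or marker at $dir/$out; run plan first" >&2
    return 1
  fi
  created=$(cat "$dir/$out.created")
  if [ $(( now - created )) -gt $(( max_min * 60 )) ]; then
    echo "[guard-tf] plan $dir/$out is older than $max_min minutes; re-plan and review" >&2
    return 1
  fi
}

# tf_apply <dir> <planfile> <max_age_minutes>: apply only a fresh saved plan.
# "terraform apply <planfile>" applies exactly that plan without prompting.
tf_apply() {
  plan_is_fresh "$1" "$2" "$3" || return 1
  terraform -chdir="$1" apply "$2"
}
```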
Rollback Checklist
- If Kubernetes deploy changed:
kubectl -n <ns> rollout undo deployment/<name>
- If ingress changed:
- revert ingress commit in Git and let Flux reconcile
- If Terraform apply changed infra:
- create a new plan for the rollback change, review it, then apply it
- Verify:
- /healthz on backend
- ingress route with Host header
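The two verification steps can be scripted so that backend health and ingress routing are checked independently and both results are reported. This is a hypothetical sketch. BACKEND_URL, INGRESS_URL, and APP_HOST are assumed environment variables. They could be, for example, a kubectl port-forward address, the ingress controller's external address, and the hostname the ingress routes.

```shell
#!/usr/bin/env bash
# Hypothetical post-rollback checks. ASSUMPTIONS: BACKEND_URL reaches the
# backend directly, INGRESS_URL reaches the ingress controller, and APP_HOST
# is the hostname the ingress rule routes.

# check_http <url> [host-header]: succeeds unless the request fails or
# returns HTTP 400 or above (curl -f). Redirects are not followed.
check_http() {
  local url="$1" host="${2:-}"
  local -a args=(-fsS -o /dev/null --max-time 5)
  if [ -n "$host" ]; then
    args+=(-H "Host: $host")
  fi
  if curl "${args[@]}" "$url"; then
    echo "[verify] OK $url${host:+ (Host: $host)}"
  else
    echo "[verify] FAILED $url${host:+ (Host: $host)}" >&2
    return 1
  fi
}

# Run both checks even if the first fails, so the report shows which layer broke.
verify_after_rollback() {
  local rc=0
  check_http "$BACKEND_URL/healthz" || rc=1
  check_http "$INGRESS_URL/" "$APP_HOST" || rc=1
  return "$rc"
}
```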
Exercises
- Split a mixed PR into two PRs:
- PR1: image tag update only
- PR2: ingress update only
- Intentionally run guard-terraform-plan.sh apply without a planfile and capture the failure output.
Done When
- Student can explain why “small but mixed” changes are high risk.
- Student can demonstrate both guard scripts before any apply action.