Chapter 02: Infrastructure as Code (IaC)

Why This Chapter Exists

In production, infrastructure mistakes are expensive and spread quickly. IaC is not only about automation speed; it is also about:

  • repeatability
  • reviewability
  • rollback paths
  • controlled blast radius

This chapter introduces a guardrails-first Terraform workflow for Kubernetes platforms.

Learning Objectives

By the end of this chapter, learners can:

  • explain module boundaries and Terraform folder structure in this repo
  • run a safe plan -> review -> apply workflow
  • explain why remote state and locking are non-negotiable in team environments
  • detect drift and decide whether to reconcile or roll back
  • execute safe destroy practices with explicit scope checks

Repo Mapping

Relevant paths:

  • infra/terraform/hcloud_cluster/
  • infra/terraform/kind_cluster/
  • scripts/guard-terraform-plan.sh
  • .pre-commit-config.yaml
  • scripts/terraform-validate.sh
  • scripts/terraform-security.sh
  • scripts/flux-kustomize-validate.sh
  • docs/course/chapter-02-iac/review-checklist.md
  • docs/course/chapter-02-iac/drift-playbook.md
  • docs/hetzner.md

Core Concepts

  1. Terraform structure and modules
     • root configuration should stay thin and readable
     • provider/module versions must be pinned
     • reusable logic belongs in modules, not copy/paste blocks
  2. Remote state and locking
     • shared state enables team collaboration
     • locking prevents concurrent applies from corrupting state
     • backend config is part of production reliability
  3. IAM and RBAC principles
     • least privilege by default
     • separate read/plan/apply responsibilities
     • no broad credentials for automation or AI tooling
  4. Drift detection
     • drift = actual infra != declared infra
     • detect drift before making unrelated changes
     • never hide drift by batching many changes together
  5. Safe destroy
     • destroy is valid, but only with explicit scope
     • always verify workspace, targets, and dependency impact
     • create a rollback/recreate plan before destructive actions
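The "explicit scope" rule for safe destroy can be sketched as a small shell guard. This is a minimal illustration, not tooling from this repo: the function name confirm_destroy_scope and the convention of naming the expected workspace explicitly are assumptions. terraform workspace show and terraform plan -destroy are standard Terraform CLI commands.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): refuse to continue unless the
# active workspace matches the one the operator explicitly named.
confirm_destroy_scope() {
  actual="$1"    # e.g. output of: terraform workspace show
  expected="$2"  # workspace the operator intends to destroy
  if [ -z "$expected" ]; then
    echo "refusing: no expected workspace given" >&2
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "refusing: active workspace is '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "scope ok: $actual"
}

# Example use (commented out so the sketch has no side effects):
# dir=infra/terraform/kind_cluster
# confirm_destroy_scope "$(terraform -chdir="$dir" workspace show)" "default" &&
#   terraform -chdir="$dir" plan -destroy -out="$PWD/destroy.tfplan"
```

Writing the destroy plan to a file keeps the destructive change on the same plan -> review -> apply path as any other change, so the scope can be reviewed before anything is removed.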

Chapter Flow

  1. Read this chapter and lab.md.
  2. Install and run local hooks: make install-hooks && pre-commit run --all-files.
  3. Run the lab with guardrail scripts.
  4. Validate expected outputs and complete quiz.md.

Pre-Commit Guardrails for IaC

Before Terraform changes are committed, hooks enforce:

  • terraform fmt -recursive -diff -check
  • scripts/terraform-validate.sh
  • scripts/terraform-security.sh
  • scripts/flux-kustomize-validate.sh (for any flux/** manifest changes in the same PR)

These checks reduce noisy reviews and block unsafe IaC changes before they reach CI/apply workflows.
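When debugging a failing hook, it can help to run the same checks by hand in the same order. The runner below is a minimal sketch under assumptions: run_checks is a hypothetical helper, not a script in this repo, and the real hook wiring lives in .pre-commit-config.yaml.

```shell
#!/bin/sh
# Hypothetical runner (not part of this repo): execute each check in order
# and stop at the first failure, mirroring how a hook run blocks a commit.
run_checks() {
  for check in "$@"; do
    echo "==> $check"
    if ! sh -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Example use from the repo root (commented out; needs terraform and the scripts):
# run_checks \
#   "terraform fmt -recursive -diff -check" \
#   "scripts/terraform-validate.sh" \
#   "scripts/terraform-security.sh" \
#   "scripts/flux-kustomize-validate.sh"
```

Stopping at the first failure keeps the output short: formatting problems are fixed before validation and security findings are reviewed.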

Anti-Patterns to Avoid

  • Running terraform apply without a reviewed plan.
  • Applying from stale plan output.
  • Sharing one credential set across all environments.
  • Using destroy in an ambiguous context.
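The stale-plan anti-pattern can be guarded with a simple age check on the planfile. This is a sketch under assumptions: planfile_is_fresh is a hypothetical helper and the 30-minute default is an illustrative policy, not a value taken from this repo. Terraform itself typically rejects a saved plan once the state has changed since it was created; an age limit is an additional, team-level rule.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): succeed only if the planfile
# exists and was modified less than max_minutes ago.
# Relies on `find -mmin`, available in GNU, BSD, and BusyBox find.
planfile_is_fresh() {
  planfile="$1"
  max_minutes="${2:-30}"  # assumed default threshold, tune per team policy
  if [ ! -f "$planfile" ]; then
    echo "missing planfile: $planfile" >&2
    return 1
  fi
  # find prints the path only when the file is newer than the threshold.
  if [ -n "$(find "$planfile" -mmin "-$max_minutes")" ]; then
    echo "fresh: $planfile"
  else
    echo "stale: $planfile (older than $max_minutes min), re-run plan" >&2
    return 1
  fi
}

# Example use (commented out):
# planfile_is_fresh "$PWD/kind.tfplan" 30 || exit 1
```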

Next Chapter

Continue with Chapter 03 (Secrets Management with SOPS).

Chapter 02 Quiz: Infrastructure as Code (IaC)

Questions

  1. Why is plan -> review -> apply safer than direct apply?

  2. What risk does Terraform state locking prevent?

  3. In this repo, what is the purpose of scripts/guard-terraform-plan.sh?

  4. You have a valid planfile but it is 4 hours old. What should you do and why?

  5. What is drift, and why should you address drift before unrelated infrastructure changes?

  6. Name two signals in a plan output that should trigger a stop-and-review decision.

    Drift Detection Playbook (Chapter 02)

    Use this playbook after terraform plan to classify drift and choose the right action.

    Fast Drift Check

    terraform -chdir=infra/terraform/hcloud_cluster plan -input=false -detailed-exitcode
    echo $?
    

    Exit code meaning:

    • 0: no drift / no changes
    • 2: changes detected (drift and/or intended config changes)
    • 1: error (stop and fix tooling/state issues first)
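The exit codes above can be turned into a branch point for a wrapper script or CI job. This is a minimal sketch: classify_plan_exit and its labels are hypothetical names chosen for illustration, while the 0/1/2 meanings are those of terraform plan -detailed-exitcode.

```shell
#!/bin/sh
# Hypothetical helper: map a `terraform plan -detailed-exitcode` status
# to a short label that a wrapper script can branch on.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;    # no drift / no changes
    2) echo "changes" ;;  # drift and/or intended config changes
    1) echo "error" ;;    # tooling or state problem
    *) echo "unknown" ;;  # anything else is unexpected
  esac
}

# Example use (commented out; requires terraform and credentials):
# terraform -chdir=infra/terraform/hcloud_cluster plan -input=false -detailed-exitcode
# status=$?
# case "$(classify_plan_exit "$status")" in
#   clean)   echo "no drift, safe to continue" ;;
#   changes) echo "changes detected, classify them before applying" ;;
#   *)       echo "plan failed, fix tooling/state first"; exit 1 ;;
# esac
```

Capturing $? immediately after the plan matters: any command that runs in between overwrites it.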

    Drift Classification Matrix

    Class A: Benign Drift

    Examples:

    • metadata-only fields changed by controllers/providers
    • ordering/noise that does not alter behavior

    Action:

    Lab: Safe Terraform Workflow for Production-Like Kubernetes

    Goal

    Execute a guardrails-first Terraform workflow:

    • plan with explicit output artifact
    • review and validate intent
    • apply only from reviewed planfile
    • verify resulting state

    Guardrail companions:

    • review-checklist.md (must be completed before apply)
    • drift-playbook.md (required when drift is detected)
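The four workflow steps in the Goal can be sketched as thin wrappers around standard Terraform CLI commands. This is an illustration under assumptions: the tf_* function names and the TF_BIN override are hypothetical, and scripts/guard-terraform-plan.sh is not reproduced here because its interface is not shown in this chapter.

```shell
#!/bin/sh
# TF_BIN is an assumed override so the sketch can be dry-run (TF_BIN=echo).
TF_BIN="${TF_BIN:-terraform}"

# Step 1: write the plan to an explicit artifact ($1 = dir, $2 = planfile).
tf_plan()   { "$TF_BIN" -chdir="$1" plan -input=false -out="$2"; }
# Step 2: render the saved plan for human review.
tf_review() { "$TF_BIN" -chdir="$1" show "$2"; }
# Step 3: apply exactly the reviewed artifact; nothing is re-planned.
tf_apply()  { "$TF_BIN" -chdir="$1" apply -input=false "$2"; }
# Step 4: re-plan; exit code 0 means state matches the configuration.
tf_verify() { "$TF_BIN" -chdir="$1" plan -input=false -detailed-exitcode; }

# Example use (commented out; requires terraform and credentials).
# An absolute planfile path avoids surprises, because -chdir changes the
# directory that relative paths are resolved against.
# dir=infra/terraform/kind_cluster
# plan="$PWD/kind.tfplan"
# tf_plan "$dir" "$plan" && tf_review "$dir" "$plan"
# ...complete review-checklist.md, then:
# tf_apply "$dir" "$plan" && tf_verify "$dir"
```

Applying a saved planfile means the reviewed changes are the ones executed; a bare terraform apply would compute a new plan that nobody has reviewed.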

    Prerequisites

    • Terraform installed
    • pre-commit installed and hooks configured (make install-hooks)
    • Access to the target Terraform directory
    • Required environment variables/secrets for the selected environment
    • scripts/guard-terraform-plan.sh available and executable

    Target Options

    Choose one:

    Terraform Plan Review Checklist (Guardrails-First)

    Use this checklist before any apply.

    Change Metadata

    • Date:
    • Reviewer:
    • Terraform target dir:
    • Planfile:
    • Intended environment:

    1) Scope Validation

    • Plan affects only intended components.
    • No unrelated resources changed.
    • No hidden cross-environment impact.

    Notes:

    2) Destructive Actions

    • No unexpected destroy.
    • If destroy exists, it is intentional and approved.
    • Data-loss impact assessed.

    Notes:

    3) Security and Access

    • Least-privilege credentials used.
    • No plaintext secret values in diff/outputs.
    • State backend and locking are active.

    Notes: