Guardrails-First Course Materials

This course teaches production-grade Kubernetes and SRE practice through incidents, guardrails, and repeatable workflows.

The goal is not to memorize tools. The goal is to learn how to keep systems safe when pressure, ambiguity, and AI-assisted speed all show up at the same time.

Who This Is For

platform engineers moving from “it works” to “it survives mistakes”
DevOps engineers who want stronger operating discipline, not more tooling hype
SREs who want concrete labs, guardrails, and incident-shaped lessons

How the Course Works

Each chapter is built around one production failure pattern:

what broke
why the shortcut looked reasonable
how the investigation becomes confusing
which guardrail restores a safe operating path

Every core lesson includes:

a written incident walkthrough
a hands-on lab
a quiz to confirm the operating rule
runbooks or scorecards where the topic needs them

This course teaches more than how to operate Kubernetes around applications. It also shows what a production-ready Kubernetes application should look like, so that rollout safety, observability, GitOps reconciliation, and incident response work correctly in the first place.

The course uses the SafeOps reference applications as concrete examples:

safeops-course/backend, a small production-shaped Go API with health probes, metrics, tracing hooks, chaos endpoints, and OpenAPI/Swagger support
safeops-course/frontend, a Vue-based frontend with container hardening, runtime config injection, and Kubernetes deployment packaging

Many of the application patterns used throughout those reference apps are inspired by Stefan Prodan's Podinfo, including:

readiness and liveness probes
graceful shutdown on interrupt signals
config and secret reload patterns
Prometheus and OpenTelemetry instrumentation
structured logging
12-factor configuration
fault injection for safe drills
packaging and install paths with Timoni, Helm, and Kustomize
end-to-end validation with Kind and Helm
multi-arch images, signing, SBOMs, provenance, and CVE scanning
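
The first two patterns in the list above, probes and graceful shutdown, can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code of the SafeOps backend or of Podinfo: the endpoint paths `/healthz` and `/readyz`, port `8080`, the 10-second drain timeout, and the names `probeStatus` and `newMux` are all hypothetical choices made for this example.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// ready records whether the process should receive traffic.
// It starts false and is set to true once the server goroutine is launched.
var ready atomic.Bool

// probeStatus maps the readiness flag to an HTTP status code.
func probeStatus(isReady bool) int {
	if isReady {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// newMux wires the two probe endpoints. The paths are assumptions.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and can answer HTTP at all.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: the process is willing to accept traffic right now.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(probeStatus(ready.Load()))
	})
	return mux
}

func main() {
	srv := &http.Server{Addr: ":8080", Handler: newMux()}

	// ctx is cancelled on SIGINT or SIGTERM. Kubernetes sends SIGTERM
	// to the container when a pod is being stopped.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()
	ready.Store(true)

	<-ctx.Done()

	// Mark the process not-ready, then drain in-flight requests.
	// A real service might pause briefly here so the failing readiness
	// probe is observed before the listener closes.
	ready.Store(false)
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```

Keeping liveness and readiness separate is the point of the sketch: a failed liveness probe leads to a container restart, while a failed readiness probe only removes the pod from Service endpoints, which is what a draining process wants.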

Video assets are optional. The written lesson remains the primary source of truth, and the video should make the same lesson easier to absorb, not replace the material.

Recommended Learning Path

Start with Intro: AI as a Very Well-Read Junior Engineer.
Go through Chapters 01-14 in order.
Run the lab before moving to the next chapter.
Use the quiz to confirm the main guardrail rule before continuing.
Move to the advanced modules only after the core path feels operationally natural.

Tracks

Core track:

Chapters 01-14 covering platform foundations, GitOps, CI/CD, security, observability, reliability, and on-call discipline

Advanced track:

Chapter 15: Supply Chain Security
Chapter 16: Admission Policy Guardrails
Chapter 17: Rollback and Data Migrations
Module: Progressive Delivery (Canary with Traefik + Flagger)

Reference appendices:

Appendix: Local Development Environment
Appendix: DNS and TLS Automation

References

Full structure and outcomes: Curriculum
Intro mental model: Intro: AI as a Very Well-Read Junior Engineer
