Module: Progressive Delivery
Learning Objectives
By the end of this module, you will be able to:
- Configure Flagger canary analysis with weighted rollout progression
- Define Prometheus-driven abort criteria for canary deployments
- Execute controlled traffic shifting via Traefik ingress-level control
- Analyze canary metrics to make informed promotion or rollback decisions
Start with the video for the concept overview, then work through each lesson section.
Imagine a deployment that reaches 100% of production traffic instantly, and a hidden bug that causes a global outage. In this module, we implement Progressive Delivery with Flagger to move away from high-risk “all-or-nothing” deployments toward automated, metric-driven canary releases.
1. The Problem: The “Big-Bang” Failure
Traditional “all-at-once” deployments have a 100% blast radius. If a bug reaches production, every single user is affected simultaneously. Manual rollbacks are slow and error-prone, leading to extended downtime and a high-stress environment for responders.
2. The Concept: Metric-Driven Canaries
We use Flagger to shift traffic incrementally while automatically analyzing system health.
- Initial Shift: Route a tiny fraction of traffic (e.g., 5%) to the new version.
- Analysis: The system checks real-time metrics (latency, error rate) from Prometheus.
- Automated Promotion: If healthy, traffic weight increases step-by-step.
- Automated Rollback: If metrics degrade, Flagger routes all traffic back to the stable version automatically, so only the small share of users on the canary is exposed to the fault.
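The four stages above can be mimicked with a toy control loop. The sketch below is a simplified simulation, not Flagger's implementation: the step list `STEP_WEIGHTS`, the failure budget `THRESHOLD`, and the function `simulate_rollout` are hypothetical names, and each argument stands in for the outcome of one Prometheus check. It also assumes failures are counted cumulatively across the whole rollout.

```shell
#!/bin/sh
# Toy simulation of a metric-driven canary loop (illustrative only).
STEP_WEIGHTS="5 10 20 50"  # assumed traffic steps, in percent
THRESHOLD=2                # assumed number of failed checks that triggers rollback

# Usage: simulate_rollout ok fail ok ...  (one argument per analysis interval)
simulate_rollout() {
  failures=0
  for weight in $STEP_WEIGHTS; do
    # Hold before this step until a check passes or the failure budget is spent.
    while :; do
      if [ "$#" -eq 0 ]; then
        echo "in-progress"
        return 0
      fi
      result=$1
      shift
      if [ "$result" = "ok" ]; then
        echo "weight=${weight}"
        break
      fi
      failures=$((failures + 1))
      if [ "$failures" -ge "$THRESHOLD" ]; then
        echo "rollback"
        return 1
      fi
    done
  done
  echo "promoted"
}

simulate_rollout ok ok ok ok
```

With four passing checks, the final call prints `weight=5`, `weight=10`, `weight=20`, `weight=50`, and then `promoted`. Calling `simulate_rollout ok fail fail` instead prints `weight=5` followed by `rollback`, because the second failure exhausts the assumed budget.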
3. The Code: The Canary Object
Our sre/ repo defines Canary objects that act as the brain of our release process. These objects define the analysis intervals, traffic steps, and health thresholds.
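The actual manifests in the sre/ repo are not reproduced here. As a stand-in, the sketch below writes a minimal Flagger Canary manifest to a local file. It is an assumption-laden illustration, not the repository's file: the Deployment name, ports, interval, threshold, and step weights are all hypothetical values, chosen only to line up with the `backend` canary in the `develop` namespace and the 1% and 500 ms limits used in this module.

```shell
#!/bin/sh
# Write an illustrative Canary manifest; every value below is an assumption.
cat > canary.yaml <<'EOF'
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend
  namespace: develop
spec:
  provider: traefik      # traffic is shifted at the Traefik ingress level
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend        # assumed Deployment name
  service:
    port: 80             # assumed service port
    targetPort: 8080     # assumed container port
  analysis:
    interval: 1m         # how often Flagger runs the checks
    threshold: 5         # failed checks tolerated before rollback
    stepWeights: [5, 10, 20, 50]  # canary traffic percentage at each step
    metrics:
      - name: request-success-rate  # Flagger built-in metric, in percent
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration      # Flagger built-in metric, in milliseconds
        thresholdRange:
          max: 500
        interval: 1m
EOF
echo "wrote canary.yaml"
```

On a cluster where Flagger and Traefik are installed, a file like this could be applied with `kubectl apply -f canary.yaml`. The `analysis` block is where the three ideas from this section meet: `interval` sets the analysis cadence, `stepWeights` sets the traffic steps, and `metrics` sets the health thresholds.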
4. The Guardrail: Automated Health Checks
We never “guess” whether a release is safe. PromQL queries enforce our production invariants at every stage of the traffic shift: if the canary’s error rate exceeds 1% or its latency exceeds 500 ms, the release is aborted automatically.
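To make the error-rate guardrail concrete, the sketch below writes a Flagger MetricTemplate containing a PromQL query for the percentage of canary requests that returned a 5xx status. Treat it as a hedged example: the Prometheus address is hypothetical, and the `service` label matcher assumes Traefik names the canary service `<namespace>-<target>-canary-...`, which may differ in your setup. The `{{ namespace }}`, `{{ target }}`, and `{{ interval }}` placeholders are filled in by Flagger at analysis time.

```shell
#!/bin/sh
# Write an illustrative MetricTemplate; address, names, and label matchers are assumptions.
cat > error-rate.yaml <<'EOF'
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: develop
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090   # hypothetical Prometheus URL
  # Percentage of canary requests that returned a 5xx status.
  query: |
    sum(rate(traefik_service_requests_total{service=~"{{ namespace }}-{{ target }}-canary-.*", code=~"5.."}[{{ interval }}]))
    /
    sum(rate(traefik_service_requests_total{service=~"{{ namespace }}-{{ target }}-canary-.*"}[{{ interval }}]))
    * 100
EOF
echo "wrote error-rate.yaml"
```

A Canary would then reference this template from its `analysis.metrics` list through a `templateRef` named `error-rate`, with a `thresholdRange` whose `max` is 1 so that an error rate above 1% counts as a failed check.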
5. Verification: Did I Get It?
Verify your canary status and observe a traffic shift in real time:
# Watch the canary analysis progress
kubectl get canaries -n develop -w
# After triggering a deployment, check the current traffic split
kubectl get canary backend -n develop -o jsonpath='{.status.canaryWeight}'
Expected Output: Repeating the second command during a rollout should show the weight increasing step by step (5, 10, 20…) until the promotion completes or a rollback is triggered.
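Rather than re-running the command by hand, the check can be wrapped in a small polling loop. The helper below is a sketch: `wait_for_canary` is a hypothetical function name, and it assumes the Canary reports its state in `.status.phase`, ending in `Succeeded` or `Failed`.

```shell
#!/bin/sh
# Poll a Canary until its rollout finishes (illustrative helper, not part of Flagger).
# Usage: wait_for_canary NAME NAMESPACE [POLL_SECONDS]
wait_for_canary() {
  name=$1
  ns=$2
  poll=${3:-10}
  while :; do
    phase=$(kubectl get canary "$name" -n "$ns" -o jsonpath='{.status.phase}')
    weight=$(kubectl get canary "$name" -n "$ns" -o jsonpath='{.status.canaryWeight}')
    echo "phase=${phase} canaryWeight=${weight}"
    case "$phase" in
      Succeeded) return 0 ;;  # canary was promoted
      Failed)    return 1 ;;  # canary was rolled back
    esac
    sleep "$poll"
  done
}

# Example (requires cluster access): wait_for_canary backend develop 10
```

Because a Canary keeps the phase of its last rollout, call the helper after the new deployment has been triggered. Otherwise it may return immediately with the result of the previous release.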