Advanced Module: Linkerd + Progressive Delivery (Canary / A-B)

Why This Module Exists

Safe delivery is not only “deploy or rollback”. This module adds service-mesh-driven progressive rollout guardrails:

Linkerd mTLS by default
canary rollout with measurable abort criteria
A/B routing with explicit experiment boundaries

The Incident Hook

A full rollout passes smoke checks but fails under the real production traffic mix. Error rate and latency spike after the deploy, and rollback starts late because detection is manual. The team needs controlled traffic progression with automatic safety checks.

What AI Would Propose (Brave Junior)

“Ship 100% now; we can rollback if needed.”
“Canary is too slow for this fix.”
“Use ad-hoc routing rules without SLO checks.”

Why this sounds reasonable:

fastest short-term path
fewer moving parts in one deploy

Why This Is Dangerous

blast radius is immediate and broad
no objective stop conditions during rollout
A/B test drift can hide impact in one segment

Guardrails That Stop It

traffic progression in controlled steps (for example 5% -> 25% -> 50% -> 100%)
abort on SLO violation (error rate, latency, success rate)
mTLS identity and policy checks before rollout
rollback path tested before canary start
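The first two guardrails amount to a loop: raise the weight one step, check SLOs, and stop at the first breach. A minimal bash simulation of that loop is below. `progress_canary` and the gate functions are hypothetical names. In a real cluster, Flagger (or an equivalent controller) performs the weight changes and metric queries instead of a script like this.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: step through canary weights and abort on the first
# failed gate. The gate is any command that exits non-zero on an SLO breach;
# this script only prints decisions, it does not change real traffic.
progress_canary() {
  local gate_cmd="$1"; shift   # remaining args are weights, e.g. 5 25 50 100
  local weight
  for weight in "$@"; do
    echo "weight=${weight}%"
    if ! "$gate_cmd" "$weight"; then
      echo "abort at ${weight}%: roll back to 0%"
      return 1
    fi
  done
  echo "promoted"
}

# Example gates for a dry run (hypothetical):
always_ok() { return 0; }
fail_at_50() { [ "$1" -lt 50 ]; }   # simulates an SLO breach from 50% onward
```

With `progress_canary fail_at_50 5 25 50 100`, the run stops at 50% and returns 1, so the 100% step is never reached.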

Module Scope

Linkerd baseline (check, inject, identity, mTLS status).
Canary rollout flow (Flagger + Linkerd or equivalent controller).
A/B routing flow (header/cookie based).
Evidence capture for rollout decision and postmortem.

Repository Mapping

flux/infrastructure/progressive-delivery/linkerd/
flux/infrastructure/progressive-delivery/flagger/
flux/infrastructure/progressive-delivery/develop/
flux/bootstrap/flux-system/infrastructure.yaml (Linkerd + Flagger enabled, develop canary pack opt-in)

Files

lab.md
runbook-linkerd-progressive-delivery.md
quiz.md

Done When

learner can run canary with automated abort criteria
learner can execute bounded A/B experiment with clear success metrics
learner can explain how the mesh reduces rollout risk

Lab: Linkerd Canary Rollout and A/B Routing (Advanced)

Goal

Run one progressive delivery exercise in develop:

validate Linkerd health and mTLS
execute canary rollout with automated analysis
run one A/B route experiment and review outcomes

Prerequisites

Linkerd control plane installed and healthy
rollout controller installed (Flagger recommended)
test workload and service available in develop
baseline SLO signals available (Prometheus metrics)
progressive-delivery manifests present in:
- flux/infrastructure/progressive-delivery/linkerd/
- flux/infrastructure/progressive-delivery/flagger/
- flux/infrastructure/progressive-delivery/develop/
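You can check the directory prerequisite from a local checkout before touching the cluster. `check_manifest_dirs` is a hypothetical helper. It only confirms that the three directories exist. It says nothing about whether Flux has reconciled them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the progressive-delivery manifest directories
# exist under a repository root (default: current directory).
check_manifest_dirs() {
  local root="${1:-.}" missing=0 dir path
  for dir in linkerd flagger develop; do
    path="${root}/flux/infrastructure/progressive-delivery/${dir}"
    if [ -d "$path" ]; then
      echo "ok      ${path}"
    else
      echo "missing ${path}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage from the repository root:
# check_manifest_dirs .
```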

Quick checks:

linkerd check
kubectl -n linkerd get pods
kubectl -n develop get deploy,svc
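Later steps assume that all three checks passed, so it can help to run them as one gate that stops at the first failure. The `run_checks` wrapper below is a hypothetical sketch. Note that `kubectl get` exits 0 even when it finds no resources, so someone still needs to read the printed output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run each check command in order and stop at the
# first one that exits non-zero.
run_checks() {
  local cmd
  for cmd in "$@"; do
    echo "==> ${cmd}"
    if ! bash -c "$cmd"; then
      echo "FAILED: ${cmd}" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Against the develop cluster (needs linkerd and kubectl on PATH):
# run_checks "linkerd check" "kubectl -n linkerd get pods" "kubectl -n develop get deploy,svc"
```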

Step 1: Verify Mesh Baseline

Confirm workload is meshed and identities are present:
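One way to do this is sketched below, assuming Linkerd 2.10 or later with the viz extension installed:

- `linkerd check --proxy` adds data-plane checks.
- `linkerd viz edges` lists connections between workloads and shows whether each one is secured by mTLS.
- The hypothetical `unmeshed_pods` filter flags pods that have no `linkerd-proxy` container.

The filter assumes the default injection mode, where the proxy runs as a regular container.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "<pod>\t<space-separated container names>" lines
# on stdin and print the pods that have no linkerd-proxy container.
unmeshed_pods() {
  awk -F'\t' '{
    meshed = 0
    n = split($2, names, " ")
    for (i = 1; i <= n; i++) if (names[i] == "linkerd-proxy") meshed = 1
    if (!meshed) print $1
  }'
}

# Against the develop namespace (commented out; needs cluster access):
# linkerd check --proxy -n develop
# linkerd viz edges deploy -n develop
# kubectl -n develop get pods \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
#   | unmeshed_pods
```

If the last pipeline prints nothing, every pod in `develop` carries the proxy.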

Quiz: Advanced Module (Linkerd + Progressive Delivery)

Questions

1. Why is progressive rollout safer than an immediate 100% rollout?
2. What is the main value of Linkerd in canary operations?
3. Which signal is mandatory for automated canary abort decisions?
4. Why should A/B routing be time-bounded?
5. Which statement is correct?
   A) Canary without abort criteria is acceptable in production.
   B) Mesh telemetry can provide per-route success/latency for rollout decisions.
   C) A/B rules should stay permanently after experiment end.
6. Give one valid canary traffic progression pattern.

Runbook: Linkerd Progressive Delivery Operations (Advanced)

Purpose

Operate canary and A/B rollouts with objective safety gates.

Pre-Rollout Checklist

1. Linkerd control plane healthy (linkerd check).
2. Target workload meshed and observable.
3. Abort thresholds defined and approved.
4. Rollback action documented and tested.

Canary Operation Flow

1. Start canary at low traffic weight.
2. Evaluate window metrics (success rate, latency, error rate).
3. Promote only if all thresholds pass.
4. Abort automatically or manually on threshold breach.
5. Record decision with evidence.
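
Step 3 ("promote only if all thresholds pass") comes down to a comparison per analysis window. This is similar in spirit to Flagger's built-in `request-success-rate` and `request-duration` checks.

The sketch below makes a few assumptions:

- The thresholds (99% success rate, 500 ms p99, 1% error rate) are placeholders, not approved values.
- `evaluate_window` is a hypothetical helper.

The line it prints can be stored alongside the metrics as decision evidence.

```shell
#!/usr/bin/env bash
# Placeholder thresholds (assumptions; replace with the approved values).
MIN_SUCCESS_RATE=99   # percent
MAX_P99_MS=500        # milliseconds
MAX_ERROR_RATE=1      # percent

# Hypothetical helper: evaluate one analysis window.
# Args: success rate (%), p99 latency (ms), error rate (%).
# Prints "pass" or the first breached threshold; exits 1 on breach.
evaluate_window() {
  awk -v s="$1" -v p="$2" -v e="$3" \
      -v ms="$MIN_SUCCESS_RATE" -v mp="$MAX_P99_MS" -v me="$MAX_ERROR_RATE" 'BEGIN {
    if (s + 0 < ms + 0) { print "abort: success rate " s "% < " ms "%"; exit 1 }
    if (p + 0 > mp + 0) { print "abort: p99 " p "ms > " mp "ms"; exit 1 }
    if (e + 0 > me + 0) { print "abort: error rate " e "% > " me "%"; exit 1 }
    print "pass"
  }'
}
```

For example, `evaluate_window 99.5 320 0.4` prints `pass`. By contrast, `evaluate_window 98.2 320 0.4` prints `abort: success rate 98.2% < 99%` and exits 1.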

A/B Operation Flow

1. Define experiment hypothesis and metric target.
2. Apply bounded route split (header/cookie/segment).
3. Run for fixed window.
4. Compare cohorts and decide keep/revert.
5. Remove temporary routing rules after decision.
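
Steps 3 to 5 of the A/B flow can be tied together so that an experiment cannot silently outlive its window. `ab_decide` is a hypothetical sketch. It compares one success-rate metric per cohort against a minimum uplift. A real experiment would compare whatever metric the step 1 hypothesis names. Either outcome ends with removing the temporary routing rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a time-bounded A/B experiment.
# Args: start epoch (s), window length (s), cohort A success rate (%),
#       cohort B success rate (%), minimum uplift for B (percentage points),
#       optional "now" epoch (defaults to the current time).
ab_decide() {
  local start="$1" window="$2" rate_a="$3" rate_b="$4" min_uplift="$5"
  local now="${6:-$(date +%s)}"
  if [ "$now" -lt $((start + window)) ]; then
    echo "running"
    return 0
  fi
  # Window closed: decide, and always clean up the temporary route split.
  awk -v a="$rate_a" -v b="$rate_b" -v u="$min_uplift" 'BEGIN {
    if (b - a >= u + 0) print "keep B; remove temporary route rules"
    else                print "revert to A; remove temporary route rules"
  }'
}
```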
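
Commands (Examples)

```
# Control plane health
linkerd check
# Linkerd 2.10+ moved stat and routes into the viz extension
linkerd viz stat -n develop deploy
linkerd viz routes -n develop deploy/<app-name>
# Flagger canary status and events
kubectl -n develop get canary
kubectl -n develop describe canary <app-name>
```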

Failure Modes

1. Metric noise causes false abort:
   - increase the observation window; validate the baseline first
2. Canary stuck:
   - inspect controller events and the policy spec; roll back if uncertain
3. A/B drift:
   - ensure route selectors are explicit and temporary rules are removed
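
One common mitigation for metric noise is to tolerate a small number of failed checks before aborting. This is similar in spirit to the failed-check `threshold` in Flagger's analysis spec. `abort_after_failures` is a hypothetical sketch that counts failures across check intervals. It is not Flagger's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: abort only once the number of failed checks reaches
# max_failed. Each remaining argument is "pass" or "fail" for one interval.
abort_after_failures() {
  local max_failed="$1"; shift
  local failed=0 result
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      failed=$((failed + 1))
      if [ "$failed" -ge "$max_failed" ]; then
        echo "abort after ${failed} failed checks"
        return 1
      fi
    fi
  done
  echo "continue (${failed} failed checks)"
}
```

Raising `max_failed` trades detection speed for fewer false aborts. It should accompany a longer observation window and a validated baseline, not replace them.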
