Chapter 10: Backup & Restore Basics

Why This Chapter Exists

Backups are useful only if the restore path is tested and repeatable. This chapter uses CloudNativePG (CNPG) as a real stateful target, running PVC-backed PostgreSQL.

Data Plane Choice

CloudNativePG setup in this repo:

  • operator: flux/infrastructure/data/cnpg-operator/
  • clusters: flux/infrastructure/data/cnpg-clusters/{develop,staging,production}/
  • each environment has its own dedicated Cluster and ScheduledBackup

Backup Credential Model

Before SOPS integration, bootstrap credentials are created by Terraform:

  • secret name: cnpg-backup-s3
  • namespaces: develop, staging, production
  • keys: ACCESS_KEY_ID, ACCESS_SECRET_KEY, BUCKET (+ optional ENDPOINT, REGION)
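A minimal sketch for checking that a secret carries the required keys listed above. The helper name missing_keys is hypothetical and not part of this repo; it only compares key names and never reads secret values.

```shell
#!/usr/bin/env bash
# Required keys per the credential model; ENDPOINT and REGION are optional.
REQUIRED_KEYS="ACCESS_KEY_ID ACCESS_SECRET_KEY BUCKET"

# missing_keys (hypothetical helper): reads key names on stdin, one per line,
# and prints every required key that is absent. Prints nothing if all exist.
missing_keys() {
  local present key
  present="$(cat)"
  for key in $REQUIRED_KEYS; do
    printf '%s\n' "$present" | grep -qxF "$key" || printf '%s\n' "$key"
  done
}

# Example usage against a live cluster (lists key names only):
# kubectl -n develop get secret cnpg-backup-s3 \
#   -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' | missing_keys
```

Empty output means the secret satisfies the minimum credential model; the same check can be repeated for staging and production.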

Terraform source:

  • infra/terraform/hcloud_cluster/main.tf
  • infra/terraform/hcloud_cluster/variables.tf

Guardrails

  • No backup counts without a tested restore path.
  • Backup target credentials must be secret-managed (SOPS integration is the next step).
  • Recovery drills must run in non-production first.
  • Evidence is required: backup status plus a restore validation query.

Lab Files

  • lab.md
  • runbook.md
  • quiz.md

Done When

  • learner can verify scheduled backups are running
  • learner can execute one manual backup
  • learner can perform restore simulation and validate recovered data

Lab: CloudNativePG Backup and Restore Simulation

Goal

Run end-to-end backup basics in develop:

  • verify CNPG cluster and scheduled backup
  • trigger one on-demand backup
  • perform restore simulation into a separate cluster
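As a rough sketch of what "restore into a separate cluster" can look like, the helper below renders a Cluster manifest that bootstraps from an existing Backup object through CNPG's bootstrap.recovery section. The helper name render_restore_cluster, the cluster name app-postgres-restore, the single instance, and the 1Gi storage size are all illustrative assumptions, not values taken from this repo.

```shell
#!/usr/bin/env bash
# render_restore_cluster (hypothetical helper): prints a Cluster manifest that
# bootstraps a new, separate cluster from an existing Backup object.
# Arguments: namespace, backup name.
# The name app-postgres-restore, instances: 1, and size: 1Gi are placeholders.
render_restore_cluster() {
  local namespace="$1" backup_name="$2"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-postgres-restore
  namespace: ${namespace}
spec:
  instances: 1
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      backup:
        name: ${backup_name}
EOF
}

# Example: review the rendered manifest, then apply it.
# render_restore_cluster develop <backup-name> | kubectl apply -f -
```

Restoring under a different cluster name keeps app-postgres untouched, so the drill can be deleted afterwards without affecting the source database.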

Prerequisites

  • CNPG operator is ready
  • app-postgres exists in develop
  • secret cnpg-backup-s3 exists in develop

kubectl -n cnpg-system get pods
kubectl -n develop get cluster.postgresql.cnpg.io app-postgres
kubectl -n develop get secret cnpg-backup-s3

Step 1: Verify Scheduled Backup

kubectl -n develop get scheduledbackup
kubectl -n develop describe scheduledbackup app-postgres-daily

Expected:

Quiz: Chapter 10 (Backup & Restore Basics)

Questions

  1. Why is a successful backup alone not enough?

  2. Which CNPG resource defines periodic backup schedule?

  3. Which secret name is used for object-store backup credentials in this repo?

  4. What is the safest environment for routine restore simulations?

  5. Which statement is correct?

  • A) Restore tests can wait until a production incident.
  • B) Backup reliability is proven only after successful restore validation.
  • C) One-time backup test is enough forever.
  6. What are the minimum credential fields needed for object storage in this setup?

    Runbook: Backup and Restore (CNPG)

    Purpose

    Provide a repeatable procedure to:

    • confirm backup health
    • execute manual backup
    • run restore simulation safely

    Scope

    • primary target: develop or staging
    • production restore only under incident protocol

    Step 1: Backup Health Check

    kubectl -n <env> get cluster.postgresql.cnpg.io app-postgres
    kubectl -n <env> get scheduledbackup
    kubectl -n <env> get backup
    

    If no recent successful backup:

    • trigger manual backup immediately
    • open incident/task for backup pipeline investigation

    Step 2: Manual Backup

    cat <<EOF | kubectl apply -f -
    apiVersion: postgresql.cnpg.io/v1
    kind: Backup
    metadata:
      name: app-postgres-manual-$(date +%Y%m%d%H%M%S)
      namespace: <env>
    spec:
      cluster:
        name: app-postgres
    EOF
    

    Track: