Appendix: DNS and TLS Automation
Incident Hook
A service works through a raw load balancer IP, but the real hostname fails during rollout. DNS still points at the wrong target, HTTPS is missing or expired, and the incident looks like an application bug. Time is wasted debugging pods while the actual failure sits at the edge. Production ingress needs automated DNS and automated certificate issuance together.
Why This Appendix Exists
The main course keeps early chapters focused on platform safety and GitOps. This appendix explains the edge automation layer used by the SafeOps platform:
- external-dns manages DNS records from cluster state
- cert-manager issues certificates through Cloudflare DNS-01
- Traefik ingresses reference the production issuer and TLS hosts
This is not a separate core chapter because ingress is not the center of the course. It is a supporting production capability you will rely on once the platform is running.
SafeOps Baseline
In the current SafeOps implementation:
- Traefik is the ingress controller.
- external-dns runs in the cert-manager namespace and syncs records for the target domain.
- cert-manager manages ClusterIssuer objects for Let’s Encrypt staging and production.
- Ingresses request certificates by referencing the issuer and TLS hostnames.
- The Cloudflare API token secret is the shared dependency for both DNS sync and certificate issuance.
Investigation Snapshots
Here is the DNS/TLS GitOps bundle used in the SafeOps system.
DNS and TLS GitOps bundle
- flux/infrastructure/security/dns-and-certificates/cluster-issuer.yaml
- flux/infrastructure/security/dns-and-certificates/kustomization.yaml
- flux/infrastructure/security/dns-and-certificates/release-external-dns.yaml
- flux/infrastructure/security/dns-and-certificates/repository-external-dns.yaml
Here is the external-dns release used to synchronize DNS records.
external-dns release
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: external-dns
  namespace: flux-system
spec:
  interval: 30m
  chart:
    spec:
      chart: external-dns
      version: "1.20.0"
      sourceRef:
        kind: HelmRepository
        name: external-dns
        namespace: flux-system
  targetNamespace: cert-manager
  install:
    createNamespace: false
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    provider:
      name: cloudflare
    env:
      - name: CF_API_TOKEN
        valueFrom:
          secretKeyRef:
            name: cloudflare-api-token
            key: api-token
    extraArgs:
      - --annotation-filter=cloudflare-proxied=enabled
      - --cloudflare-proxied
      - --request-timeout=60s
      - --cloudflare-dns-records-per-page=500
    domainFilters:
      - safeops.work
    policy: sync
    sources:
      - ingress
    txtOwnerId: "k8s-external-dns-${cluster_name}"
Here are the ClusterIssuer objects used for Let’s Encrypt staging and production.
ClusterIssuer configuration
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: "admin@safeops.work"
    privateKeySecretRef:
      name: letsencrypt-production-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: "admin@safeops.work"
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
Here is the frontend ingress pattern that requests TLS from cert-manager.
Frontend ingress with TLS
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
    cloudflare-proxied: "${cloudflare_proxied}"
spec:
  ingressClassName: traefik
  rules:
    - host: frontend.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 8080
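
Because a deployed ingress can differ from the pattern above, it can help to check the live object for the two things cert-manager needs: the issuer annotation and at least one host under spec.tls. The following is a minimal sketch, not part of the SafeOps repository. The function name ingress_tls_check is hypothetical, and the develop namespace and frontend name in the example are borrowed from the verification commands later in this appendix.

```shell
# ingress_tls_check NAMESPACE NAME (hypothetical helper, not from SafeOps)
# Reports whether an Ingress carries the cert-manager issuer annotation
# and at least one hostname under spec.tls.
ingress_tls_check() {
  local ns="$1" name="$2"
  local issuer tls_hosts

  # Dots in the annotation key are escaped so JSONPath reads it as one key.
  issuer="$(kubectl -n "$ns" get ingress "$name" \
    -o jsonpath='{.metadata.annotations.cert-manager\.io/cluster-issuer}')"
  # Collect every hostname listed in any spec.tls entry.
  tls_hosts="$(kubectl -n "$ns" get ingress "$name" \
    -o jsonpath='{.spec.tls[*].hosts[*]}')"

  if [ -z "$issuer" ]; then
    echo "missing: cert-manager.io/cluster-issuer annotation"
    return 1
  fi
  if [ -z "$tls_hosts" ]; then
    echo "missing: spec.tls hosts"
    return 2
  fi
  echo "ok: issuer=$issuer tls_hosts=$tls_hosts"
}

# Example call:
#   ingress_tls_check develop frontend
```

The distinct return codes make it easy to tell a missing annotation from a missing TLS block when the check runs in a script.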
Safe Workflow (Step-by-Step)
- Confirm the Cloudflare token secret exists in the cert-manager namespace before enabling either DNS sync or certificate issuance.
- Reconcile the dns-and-certificates bundle so external-dns and the issuers exist before you depend on them.
- Verify cert-manager and external-dns pods are healthy.
- Confirm ClusterIssuer readiness for both staging and production issuers.
- Add or verify ingress hostnames, TLS blocks, and issuer annotations in the application ingress.
- Wait for DNS record creation and certificate issuance before declaring the route healthy.
- Validate with real hostname and HTTPS, not only with raw service or load balancer IP checks.
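
The secret check and the issuer readiness check from the workflow above can be scripted as a preflight. This is a rough sketch rather than SafeOps tooling: the function name preflight and the 60 second timeout are assumptions, while the secret and issuer names come from the manifests shown earlier.

```shell
# preflight (hypothetical helper): confirm the shared secret exists and
# both ClusterIssuers are Ready before relying on DNS sync or issuance.
preflight() {
  local issuer

  # The Cloudflare token secret must exist in the cert-manager namespace.
  if ! kubectl -n cert-manager get secret cloudflare-api-token >/dev/null 2>&1; then
    echo "missing secret: cert-manager/cloudflare-api-token"
    return 1
  fi

  # Both issuers must report the Ready condition.
  # The 60s timeout is an assumed value; tune it for your cluster.
  for issuer in letsencrypt-staging letsencrypt-production; do
    if ! kubectl wait --for=condition=Ready "clusterissuer/$issuer" \
        --timeout=60s >/dev/null 2>&1; then
      echo "issuer not ready: $issuer"
      return 2
    fi
  done

  echo "preflight ok"
}

# Example call:
#   preflight && echo "safe to add ingress hostnames"
```

The check only proves the secret exists, not that the token is valid. An invalid token still surfaces later as failed DNS sync or a stuck ACME challenge.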
Verification Commands
kubectl -n cert-manager get pods
kubectl get clusterissuer
kubectl -n cert-manager logs deploy/external-dns --since=10m
kubectl -n develop describe ingress frontend
kubectl -n develop get certificate,secret
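
The kubectl commands above inspect cluster state only. To cover the real hostname path as well, a check along these lines could be added. It is a sketch under assumptions: check_route is a hypothetical name, the hostname is supplied by the caller, and dig and curl are assumed to be installed on the machine running the check.

```shell
# check_route HOST (hypothetical helper): succeed only if HOST resolves in
# DNS and answers over HTTPS with a certificate that curl trusts.
check_route() {
  local host="$1"
  local addrs

  # The hostname must resolve to at least one record.
  addrs="$(dig +short "$host")"
  if [ -z "$addrs" ]; then
    echo "dns: no records for $host"
    return 1
  fi

  # curl verifies the certificate because -k is deliberately not used.
  # --fail also treats HTTP 4xx/5xx responses as failures.
  if ! curl --silent --show-error --fail --max-time 10 \
      --output /dev/null "https://$host/"; then
    echo "https: request to $host failed"
    return 2
  fi

  echo "ok: $host resolves and serves HTTPS"
}

# Example call (replace with the real hostname of the route):
#   check_route frontend.safeops.work
```

Return code 1 points toward DNS sync, while return code 2 points toward certificate issuance or application routing.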
Common Failure Patterns
- Cloudflare token secret missing or wrong, so DNS records and ACME challenges fail.
- Ingress host exists, but TLS block or issuer annotation is missing.
- DNS points correctly, but certificate is still pending because the ACME challenge never completed.
- Teams test only with raw IPs and miss that the real hostname path is still broken.
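
When a certificate stays pending, the cert-manager resources behind it usually show where issuance stalled. The sketch below assumes a Certificate named after the ingress TLS secret. The name frontend-tls in the example and the function name cert_triage are hypothetical.

```shell
# cert_triage NAMESPACE CERT_NAME (hypothetical helper): report whether a
# Certificate is Ready and, if not, list the ACME resources behind it.
cert_triage() {
  local ns="$1" cert="$2"
  local ready

  # Read the status of the Ready condition ("True", "False", or empty).
  ready="$(kubectl -n "$ns" get certificate "$cert" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"

  if [ "$ready" = "True" ]; then
    echo "certificate $cert is Ready; look at DNS or app routing next"
    return 0
  fi

  echo "certificate $cert is not Ready; inspecting the ACME chain"
  # Orders and challenges show where DNS-01 validation stalled.
  kubectl -n "$ns" get certificaterequest,order,challenge
  return 1
}

# Example call:
#   cert_triage develop frontend-tls
```

A challenge that stays pending often traces back to the Cloudflare token secret, which ties this check to the first failure pattern in the list.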
Guardrail Principle
Automate DNS and TLS together. Manual DNS records plus manual certificate handling create hidden outage debt.
Done When
- external-dns is reconciling without errors
- staging and production ClusterIssuer objects are ready
- ingress resources request TLS explicitly
- hostname resolution and HTTPS both succeed for the intended route
- you can explain whether a failure belongs to app routing, DNS sync, or certificate issuance