High Availability Setup
Design and verify high availability (HA) using replicas, anti-affinity, and RTO/RPO targets.
HA is more than adding replicas: it means designing fault isolation, automatic failover, and recovery time and recovery point objectives (RTO/RPO) together. Spread gateway instances across availability zones (AZs) to remove single points of failure.
Production requirement
Production gateways run 3 replicas by default (2 at minimum), with anti-affinity rules and load-balancer health checks.
Production minimum baseline
- 3 gateway replicas with pod/node anti-affinity
- Automatic failover tied to load-balancer health checks
- Deploy: rolling update with maxUnavailable: 0
- Monthly runbook-based failure simulation
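
As a rough illustration, the replica, anti-affinity, and rolling-update items of this baseline could be expressed in a Kubernetes Deployment like the one below. This is a minimal sketch, not a reference manifest: the name `gateway`, the label `app: gateway`, the image `example.com/gateway:1.0.0`, port `8080`, and `maxSurge: 1` are all assumptions chosen for the example.

```yaml
# Sketch only. Names, image, and port are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
spec:
  replicas: 3                 # production default from the baseline
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # never go below the desired replica count during a rollout
      maxSurge: 1             # assumption: bring up one extra pod at a time
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: no two gateway pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: gateway
              topologyKey: kubernetes.io/hostname
      containers:
        - name: gateway
          image: example.com/gateway:1.0.0
          ports:
            - containerPort: 8080
```

One design consequence worth noting: with a required anti-affinity rule and `maxUnavailable: 0`, the surge pod needs a node that does not already run a gateway pod, so the cluster needs at least one more schedulable node than replicas or the rollout stalls. Using `preferredDuringSchedulingIgnoredDuringExecution` instead is a softer alternative if that constraint is too strict.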
Enable high availability
- Deploy across at least two AZs.
- Run at least 3 replicas with anti-affinity on Kubernetes/Swarm.
- Configure separate readiness and liveness probes so that traffic is not routed to instances that are not ready to serve it.
- Centralize config and secrets via ConfigMap/Secret or an external vault.
- Run instance and AZ failure tests monthly and record the measured RTO.
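
On Kubernetes, the zone spreading, probe separation, and centralized configuration steps above might look like the following pod-template fragment. It is a sketch under stated assumptions: the probe paths `/ready` and `/healthz`, port `8080`, the ConfigMap name `gateway-config`, the Secret name `gateway-secrets`, and all timing values are hypothetical and should be replaced with whatever the gateway actually exposes.

```yaml
# Fragment of spec.template.spec in a Deployment. Sketch only;
# paths, port, resource names, and timings are hypothetical.
topologySpreadConstraints:
  - maxSkew: 1                              # zones may differ by at most one pod
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: gateway
containers:
  - name: gateway
    image: example.com/gateway:1.0.0
    envFrom:
      - configMapRef:
          name: gateway-config              # shared, non-sensitive settings
      - secretRef:
          name: gateway-secrets             # credentials kept out of the image
    readinessProbe:                         # failing: pod is removed from Service endpoints
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
    livenessProbe:                          # failing: container is restarted
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3
```

Keeping the two probes on different endpoints lets a pod that is temporarily unable to serve (for example, while a dependency is down) drop out of rotation without being restarted, while a hung process is still caught by the liveness check.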
Operations checklist
| Item | Frequency | Pass criteria |
|---|---|---|
| Instance failure injection | Weekly | Traffic stays up |
| AZ failure simulation | Monthly | RTO within 10 min |
| Rollback rehearsal | Pre-deploy | Healthy within 5 min |