Common Issues & Solutions
Symptom-based incident matrix and response flow.
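Standardizing symptom–cause–action patterns for recurring incidents shortens mean time to recovery (MTTR), because responders start from a known playbook instead of diagnosing from scratch.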
| Symptom | Likely cause | Immediate action | Permanent fix |
|---|---|---|---|
| 404 / route-miss spike | Priority conflict | Roll back recent rules | Normalize priorities |
| 401 / 403 spike | Token expiry, NTP drift | Refresh credentials and resync clocks | Monitor token renewal |
| Timeouts rising | Downstream latency | Enable circuit breaker (CB) and queue buffering | Capacity and pool tuning |
| Retry storm | Prolonged outage | Lower retry limits | Redesign backoff |
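The "Redesign backoff" fix for retry storms can be sketched as capped exponential backoff with full jitter, one common way to keep clients from retrying in lockstep during a prolonged outage. This is a minimal illustration, not a prescribed implementation: the function names `backoff_delays` and `call_with_retry` and the default values for `max_retries`, `base`, and `cap` are assumptions chosen for the example.

```python
import random
import time


def backoff_delays(max_retries=4, base=0.2, cap=5.0, rng=None):
    """Yield one sleep duration (seconds) per retry.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    i.e. capped exponential backoff with full jitter.
    All defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


def call_with_retry(operation, max_retries=4, base=0.2, cap=5.0,
                    sleep=time.sleep, rng=None):
    """Call operation(); on any exception, retry up to max_retries times."""
    delays = backoff_delays(max_retries, base, cap, rng)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the original failure
            sleep(delay)
```

Lowering `max_retries` corresponds to the immediate action in the table, while the jittered, capped delays correspond to the permanent fix. Passing `sleep` and `rng` as parameters keeps the helper easy to test without real waiting.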
Incident response flow
- Record blast radius, start time, and related deploy within 5 minutes.
- Check error rate, P95 latency, and route-miss rate on dashboards.
- Apply temporary mitigations (rollback, circuit breaker, traffic block).
- Reconstruct the request chain by Correlation ID to confirm the root cause.
- File root cause analysis (RCA) notes and prevention tickets.
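The Correlation ID step in the flow above can be sketched as a filter-and-sort over structured log records. The record fields (`correlation_id`, `ts`, `service`, `status`), the sample data, and the helper names are all hypothetical; real log schemas will differ. Here `ts` is assumed to be the time each hop logged its response.

```python
from operator import itemgetter


def request_chain(records, correlation_id):
    """Return the hops for one request, ordered by timestamp."""
    hops = [r for r in records if r["correlation_id"] == correlation_id]
    return sorted(hops, key=itemgetter("ts"))


def first_failure(chain):
    """Return the earliest hop that answered with a 5xx status, or None."""
    return next((hop for hop in chain if hop["status"] >= 500), None)


# Hypothetical log records, deliberately out of order.
records = [
    {"correlation_id": "req-42", "ts": 3.0, "service": "gateway", "status": 504},
    {"correlation_id": "req-42", "ts": 1.0, "service": "inventory", "status": 200},
    {"correlation_id": "req-7", "ts": 1.5, "service": "gateway", "status": 200},
    {"correlation_id": "req-42", "ts": 2.5, "service": "orders", "status": 503},
]

chain = request_chain(records, "req-42")
print([hop["service"] for hop in chain])  # ['inventory', 'orders', 'gateway']
print(first_failure(chain)["service"])    # orders
```

The earliest failing hop is a starting point for the investigation rather than proof of the root cause; it still needs to be checked against deploy history and dashboards.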