Log Analysis Guide
Correlation-ID-based RCA and operational KPIs.
Effective root cause analysis (RCA) reconstructs the full transaction chain by Correlation ID instead of reading isolated log lines.
Analysis workflow
- Confirm incident window and affected APIs
- Collect gateway → upstream logs by Correlation ID
- Classify error codes, latency, and retry patterns
- Correlate DB/HTTP/broker metrics
- Apply temporary and permanent fixes separately
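The collection step above can be sketched as grouping log lines by Correlation ID and ordering each group by time. This is a minimal sketch, not a definitive implementation. It assumes one JSON object per line, a `correlationId` field, and ISO 8601 timestamps in a single time zone so that string ordering matches time ordering. The function name `group_by_correlation_id` is hypothetical.

```python
import json
from collections import defaultdict


def group_by_correlation_id(lines):
    """Group JSON log lines by correlationId, each group sorted by timestamp.

    Assumption: timestamps are ISO 8601 strings in one time zone, so a plain
    string sort yields chronological order.
    """
    chains = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        cid = record.get("correlationId")
        if cid:
            chains[cid].append(record)
    for records in chains.values():
        records.sort(key=lambda r: r.get("timestamp", ""))
    return dict(chains)
```

Each value in the returned dictionary is one transaction chain (gateway first, then upstream hops, provided clocks are reasonably in sync), which can be inspected as a unit during RCA.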
Key KPIs
| KPI | Description | Target example |
|---|---|---|
| MTTA | Mean time to acknowledge an incident after the alert fires | < 5 min |
| MTTR | Mean time to recover service | < 30 min |
| P95 Latency | 95th-percentile response latency | Within SLO |
| Error Rate | Share of requests that fail | < 1% |
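Two of the KPIs above can be computed directly from parsed log records. The sketch below is illustrative only. It assumes each record represents one request, that failures are marked with `"level": "ERROR"`, and that the nearest-rank method is an acceptable percentile definition. The function names `p95_latency` and `error_rate` are hypothetical.

```python
def p95_latency(latencies_ms):
    """Return the 95th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_rate(records):
    """Return the fraction of records whose level is ERROR (0.0 if empty).

    Assumption: one record per request; adjust if a request logs many lines.
    """
    if not records:
        return 0.0
    errors = sum(1 for r in records if r.get("level") == "ERROR")
    return errors / len(records)
```

Comparing `error_rate(records) < 0.01` against the target in the table gives a quick pass/fail check for an incident window.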
Example log record
{
  "timestamp": "...",
  "level": "ERROR",
  "service": "gateway-payment",
  "correlationId": "trace-uuid",
  "routeId": "payment-v2",
  "upstream": "svc-billing",
  "errorCode": "UPSTREAM_TIMEOUT",
  "latencyMs": 742
}
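A record like the one above can be validated and then classified by upstream and error code, which supports the classification step of the workflow. This is a hedged sketch. The set of required fields is an assumption rather than a documented schema, and `parse_record` and `count_errors_by_upstream` are hypothetical helper names.

```python
import json
from collections import Counter

# Assumption: these fields are the minimum needed to place a record in a chain.
REQUIRED_FIELDS = ("timestamp", "level", "service", "correlationId")


def parse_record(line):
    """Parse one JSON log line and reject records missing required fields."""
    record = json.loads(line)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


def count_errors_by_upstream(records):
    """Count ERROR records per (upstream, errorCode) pair."""
    counts = Counter()
    for record in records:
        if record.get("level") == "ERROR":
            counts[(record.get("upstream"), record.get("errorCode"))] += 1
    return counts
```

A spike concentrated on a single pair such as `("svc-billing", "UPSTREAM_TIMEOUT")` points the investigation at that upstream before other metrics are correlated.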