INCIDENT RESPONSE
The Incident Management Lifecycle
The end-to-end journey of an incident from the moment it occurs until the post-incident review is completed.
From Chaos to Closure
Every incident, whether a minor bug or a major outage, moves through the same lifecycle. Knowing these stages tells responders what comes next and shows where time is being lost, which helps teams resolve incidents faster.
The 5 Stages
- Detection: Something breaks and monitoring alerts fire. (Metric: MTTD, mean time to detect.)
- Triage: Impact is assessed and responders are paged. (Metric: MTTA, mean time to acknowledge.)
- Response: The team investigates, communicates, and mitigates.
- Resolution: Service is restored. (Metric: MTTR, mean time to resolve.)
- Post-Mortem: The team learns why the incident happened and acts to prevent recurrence.
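The three metrics above are averages of the gaps between lifecycle milestones. The following is a minimal Python sketch under stated assumptions: the `IncidentTimes` record and `lifecycle_metrics` function are hypothetical names, MTTA is measured from detection to acknowledgment, and MTTR is measured from the start of the failure (some teams measure MTTR from detection instead). The sample timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class IncidentTimes:
    """Hypothetical per-incident timestamps, one per lifecycle milestone."""
    started: datetime       # the failure begins
    detected: datetime      # first alert fires (end of Detection)
    acknowledged: datetime  # a responder acknowledges the page (Triage)
    resolved: datetime      # service is restored (Resolution)


def _mean_minutes(deltas):
    """Average a sequence of timedeltas, expressed in minutes."""
    return mean(d.total_seconds() / 60 for d in deltas)


def lifecycle_metrics(incidents):
    """Return MTTD, MTTA and MTTR in minutes (assumed definitions, see above)."""
    return {
        "MTTD": _mean_minutes([i.detected - i.started for i in incidents]),
        "MTTA": _mean_minutes([i.acknowledged - i.detected for i in incidents]),
        "MTTR": _mean_minutes([i.resolved - i.started for i in incidents]),
    }


# Made-up sample data: two incidents starting at the same time.
t0 = datetime(2024, 1, 1, 2, 0)
incidents = [
    IncidentTimes(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=5),
                  t0 + timedelta(minutes=45)),
    IncidentTimes(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=20),
                  t0 + timedelta(minutes=120)),
]
print(lifecycle_metrics(incidents))  # {'MTTD': 6.0, 'MTTA': 6.5, 'MTTR': 82.5}
```

Tracking each stage separately, rather than MTTR alone, shows which part of the lifecycle is slowest.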
Example: SaaS API Outage
"API goes down at 2 AM. Detected in 2 min, triaged in 5 min, team responded in 15 min. Workaround restored service in 45 min. Root cause found next day. Post-mortem completed within 48 hours with action items."
Impact
Reduced MTTR from 4 hours to 45 minutes over 6 months
Resolution
Implemented the circuit breaker pattern to prevent cascading failures
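A circuit breaker stops cascading failures by failing fast once a dependency keeps erroring, instead of letting callers pile up retries against it. The case study does not describe its implementation, so the following is only a minimal, single-threaded Python sketch of the generic pattern; the `CircuitBreaker` class, its `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are hypothetical choices for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open after a
    cooldown; closed again after one successful trial call."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of adding load to a struggling dependency.
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        # A success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Injecting the clock keeps the sketch testable; a production breaker would also need thread safety and limits on concurrent half-open trial calls.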
Why the Incident Lifecycle Matters
- Provides a structured framework so teams know what comes next.
- Ensures that no step, such as the post-mortem, is skipped.
Common Pitfalls
Declaring resolution too early
Only declare resolution when the fix is verified in production. False closures damage customer trust.
Skipping post-mortem for "minor" incidents
Conduct lightweight post-mortems for all incidents. Small issues often indicate systemic problems.
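The rule that no step is skipped, together with the two pitfalls above, can also be enforced in tooling. The following is a minimal, hypothetical Python sketch (the `Stage` and `Incident` names and the two boolean flags are illustrative assumptions, not any specific incident tool's API): an incident advances one stage at a time, refuses to enter Resolution until the fix is verified in production, and only counts as closed once its post-mortem is done, whatever its severity.

```python
from enum import IntEnum


class Stage(IntEnum):
    DETECTION = 1
    TRIAGE = 2
    RESPONSE = 3
    RESOLUTION = 4
    POST_MORTEM = 5


class Incident:
    """Hypothetical tracker that only moves forward one stage at a time."""

    def __init__(self) -> None:
        self.stage = Stage.DETECTION
        self.fix_verified_in_production = False
        self.post_mortem_completed = False

    def advance(self) -> Stage:
        if self.stage is Stage.POST_MORTEM:
            raise ValueError("post-mortem is the final stage")
        nxt = Stage(self.stage + 1)
        # Guard against declaring resolution too early.
        if nxt is Stage.RESOLUTION and not self.fix_verified_in_production:
            raise ValueError("fix must be verified in production before resolution")
        self.stage = nxt
        return nxt

    @property
    def closed(self) -> bool:
        # Closed only after the post-mortem is done, for every severity;
        # a lightweight write-up still counts.
        return self.stage is Stage.POST_MORTEM and self.post_mortem_completed
```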
Frequently Asked Questions
What is the most important stage of the incident lifecycle?
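Detection is critical because you cannot fix what you do not know is broken, and every minute an incident goes undetected is also a minute added to its resolution time. Improving MTTD therefore often has the highest ROI for reducing overall MTTR.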
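The link between MTTD and MTTR can be made explicit, assuming MTTR is measured from the start of the failure. Each incident's time to resolve splits into detection, acknowledgment, and remaining repair time; because an average is linear, the same split holds for the means (here $t_{\text{start}}$, $t_{\text{det}}$, $t_{\text{ack}}$, $t_{\text{res}}$ are the per-incident timestamps and the overline denotes the mean over incidents):

```latex
\begin{aligned}
t_{\text{res}} - t_{\text{start}}
  &= (t_{\text{det}} - t_{\text{start}}) + (t_{\text{ack}} - t_{\text{det}}) + (t_{\text{res}} - t_{\text{ack}}) \\
\text{MTTR}
  &= \text{MTTD} + \text{MTTA} + \overline{(t_{\text{res}} - t_{\text{ack}})}
\end{aligned}
```

All else being equal, each minute removed from MTTD removes a minute from MTTR.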
How long should each stage of the incident lifecycle take?
Target times vary by severity: SEV0 incidents should be resolved in under 1 hour, SEV1 in under 4 hours, and SEV2 in under 24 hours. Post-mortems should be completed within 5 business days.
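These targets are easy to check automatically. The following is a minimal Python sketch assuming the example targets above; the names `RESOLUTION_TARGETS`, `met_resolution_target`, and `post_mortem_due` are hypothetical, the 5 business days are assumed to be counted from the resolution date, and public holidays are ignored.

```python
from datetime import date, timedelta

# Example targets from the answer above; real values vary by organization.
RESOLUTION_TARGETS = {
    "SEV0": timedelta(hours=1),
    "SEV1": timedelta(hours=4),
    "SEV2": timedelta(hours=24),
}
POST_MORTEM_BUSINESS_DAYS = 5


def met_resolution_target(severity: str, time_to_resolve: timedelta) -> bool:
    """True if the incident was resolved in under its severity's target."""
    return time_to_resolve < RESOLUTION_TARGETS[severity]


def post_mortem_due(resolved_on: date,
                    business_days: int = POST_MORTEM_BUSINESS_DAYS) -> date:
    """Add business days to the resolution date, skipping weekends only."""
    due = resolved_on
    remaining = business_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return due


print(met_resolution_target("SEV0", timedelta(minutes=45)))  # True
print(post_mortem_due(date(2024, 1, 5)))  # Friday resolution -> 2024-01-12
```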
Can you skip stages in the incident lifecycle?
You should never skip the post-mortem stage. Even minor incidents have lessons. For major incidents, every stage is critical for learning and prevention.