On-Call Rotation
A schedule that determines which engineer is responsible for answering alerts during a specific time period.
The Watchtower of Reliability
An On-Call Rotation is the mechanism that keeps someone available to respond to production problems 24/7. It is a schedule where engineers take turns being "on duty," carrying a pager (or phone) to respond to system alerts.
The "You Build It, You Run It" Philosophy
In modern DevOps culture, the developers who write the code are also on-call for it. This creates a powerful feedback loop:
- Dev ships buggy code.
- Dev gets woken up at 3am.
- Dev fixes the bug and adds safeguards so they sleep next time.
- System reliability improves.
Types of Rotations
- Weekly: One person is primary for 7 days. Simple, but high burnout risk if the week is bad.
- Daily: Shifts rotate every 24 hours. Good for high-stress services.
- Follow-the-Sun: Teams in London, New York, and Sydney hand off shifts so each team is on call only during its local daytime.
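The rotation types above can be sketched in a few lines of Python. This is a minimal illustration, not a production scheduler: the engineer names, the start date, the helper names (`build_rotation`, `who_is_on_call`, `follow_the_sun_site`), and the UTC hand-off hours are all assumptions made for the example. Real hand-off times would depend on each site's working hours and daylight saving.

```python
from datetime import datetime, timedelta
from itertools import cycle


def build_rotation(engineers, start, shift_length, num_shifts):
    """Assign back-to-back shifts to engineers in round-robin order.

    Returns a list of (shift_start, shift_end, engineer) tuples.
    """
    schedule = []
    shift_start = start
    for _, engineer in zip(range(num_shifts), cycle(engineers)):
        shift_end = shift_start + shift_length
        schedule.append((shift_start, shift_end, engineer))
        shift_start = shift_end
    return schedule


def who_is_on_call(schedule, moment):
    """Return the engineer whose shift covers `moment`, or None if uncovered."""
    for shift_start, shift_end, engineer in schedule:
        if shift_start <= moment < shift_end:
            return engineer
    return None


# Follow-the-sun: each site covers a fixed window of UTC hours.
# These windows are illustrative assumptions, not real office hours.
FOLLOW_THE_SUN = [
    (0, 8, "Sydney"),
    (8, 16, "London"),
    (16, 24, "New York"),
]


def follow_the_sun_site(utc_hour):
    """Return the site responsible for the given UTC hour (0-23)."""
    for start_hour, end_hour, site in FOLLOW_THE_SUN:
        if start_hour <= utc_hour < end_hour:
            return site
    raise ValueError(f"hour out of range: {utc_hour}")


team = ["alice", "bob", "carol", "dave"]  # hypothetical engineers
start = datetime(2024, 1, 1, 9, 0)  # an arbitrary Monday morning

weekly = build_rotation(team, start, timedelta(weeks=1), num_shifts=8)
daily = build_rotation(team, start, timedelta(days=1), num_shifts=8)

print(who_is_on_call(weekly, datetime(2024, 1, 10, 12, 0)))
print(who_is_on_call(daily, datetime(2024, 1, 5, 10, 0)))
print(follow_the_sun_site(3))
```

The weekly and daily rotations differ only in `shift_length`, which is why one function covers both. Follow-the-sun is modeled separately because it assigns by time of day rather than by turn.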
Example: The Burnout Spiral
"A team of 4 had a weekly rotation. One engineer quit. The remaining 3 were on-call every 3 weeks instead of every 4."
Why On-Call Rotation Matters
- Ensures 24/7 coverage without burning out a single individual.
- Creates clear accountability for who owns the system health at any given moment.
- Incentivizes engineers to build reliable systems (since they are the ones woken up).
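To make the load-sharing point concrete, the sketch below computes each engineer's share of on-call time for a weekly rotation, using the team sizes from the burnout example (4 engineers dropping to 3). The function names are hypothetical, and the model assumes an evenly shared rotation with one primary at a time.

```python
from fractions import Fraction


def on_call_share(team_size):
    """Fraction of all shifts each engineer covers in an even rotation."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    return Fraction(1, team_size)


def relative_increase(old_size, new_size):
    """Relative growth in per-engineer load when the team changes size."""
    old = on_call_share(old_size)
    new = on_call_share(new_size)
    return (new - old) / old


before = on_call_share(4)  # one week in four
after = on_call_share(3)   # one week in three
growth = relative_increase(4, 3)

print(f"share before: {before}, after: {after}")
print(f"per-engineer load grows by {float(growth):.0%}")
print(f"weeks on call per year: {float(52 * before):.1f} -> {float(52 * after):.1f}")
```

Under these assumptions, losing one of four engineers raises each remaining engineer's load by a third, from 13 to roughly 17 weeks on call per year.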