On-Call Rotation

A schedule that determines which engineer is responsible for answering alerts during a specific time period.

The Watchtower of Reliability

An On-Call Rotation is the mechanism that ensures someone is always available to respond when software fails, 24/7. It is a schedule where engineers take turns being "on duty," carrying a pager (or phone) to respond to system alerts.

The "You Build It, You Run It" Philosophy

In modern DevOps culture, the developers who write the code are also on-call for it. This creates a powerful feedback loop:

  1. Dev ships buggy code.
  2. Dev gets woken up at 3am.
  3. Dev fixes the bug and adds safeguards so they sleep next time.
  4. System reliability improves.

Types of Rotations

  • Weekly: One person is primary for 7 days. Simple, but high burnout risk if the week is bad.
  • Daily: Shifts rotate every 24 hours. Good for high-stress services, because no one carries a noisy pager for more than a day at a time.
  • Follow-the-Sun: Teams in London, New York, and Sydney hand off shifts so each team is on-call only during its local daytime.
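
As a rough illustration of the weekly and follow-the-sun styles above, here is a minimal Python sketch. The function names (`weekly_rotation`, `region_on_duty`) and the UTC handoff hours are assumptions made for this example, not a standard; real handoff times depend on daylight saving and team preference.

```python
from datetime import date, timedelta

def weekly_rotation(engineers, start, weeks):
    """Return (shift_start, primary) pairs, one per week, in round-robin order."""
    return [
        (start + timedelta(weeks=i), engineers[i % len(engineers)])
        for i in range(weeks)
    ]

# Hypothetical follow-the-sun windows as UTC hours [start, end).
# The hours are illustrative only; they roughly match local daytime.
FOLLOW_THE_SUN = [
    ("Sydney", 22, 6),     # wraps past midnight UTC
    ("London", 6, 14),
    ("New York", 14, 22),
]

def region_on_duty(utc_hour):
    """Return the region whose window contains the given UTC hour (0-23)."""
    for region, start, end in FOLLOW_THE_SUN:
        if start <= end:
            in_window = start <= utc_hour < end
        else:
            # Window crosses midnight, e.g. 22:00 to 06:00.
            in_window = utc_hour >= start or utc_hour < end
        if in_window:
            return region
    raise ValueError(f"no region covers UTC hour {utc_hour}")

if __name__ == "__main__":
    for shift_start, primary in weekly_rotation(["ana", "ben", "chi"], date(2024, 1, 1), 4):
        print(shift_start, primary)
    print(region_on_duty(3))
```

A daily rotation follows the same round-robin idea with `timedelta(days=i)` in place of weeks.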

Example: The Burnout Spiral

"A team of 4 had a weekly rotation. One engineer quit. The remaining 3 were on-call every 3 weeks instead of every 4."

Impact
Alert volume was high (20/week). Everyone burned out and 2 more quit.
Resolution
The VP froze feature work for 2 months to fix "flaky alerts". Alert volume dropped to 2/week. Hiring resumed.
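
The arithmetic behind this example can be sketched roughly as follows. It assumes, for simplicity, that every alert pages the primary and that alert volume is the same every week; `on_call_weeks_per_year` and `pages_per_engineer_per_year` are hypothetical helper names.

```python
WEEKS_PER_YEAR = 52

def on_call_weeks_per_year(team_size):
    """Weeks per year each engineer is primary in an even weekly rotation."""
    return WEEKS_PER_YEAR / team_size

def pages_per_engineer_per_year(team_size, alerts_per_week):
    """Approximate pages each engineer receives per year (assumes all alerts hit the primary)."""
    return on_call_weeks_per_year(team_size) * alerts_per_week

if __name__ == "__main__":
    # (team size, alerts per week): before the departure, after it, after the alert cleanup.
    for team_size, alerts in [(4, 20), (3, 20), (3, 2)]:
        print(team_size, alerts, round(pages_per_engineer_per_year(team_size, alerts)))
```

Under these assumptions, losing one engineer from a team of 4 raises each person's yearly load from about 260 pages to about 347, while cutting alert volume to 2/week brings it down to about 35.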

Why On-Call Rotation Matters

Ensures 24/7 coverage without burning out a single individual.

Creates clear accountability for who owns the system health at any given moment.

Incentivizes engineers to build reliable systems (since they are the ones woken up).

Common Pitfalls

Hero Culture
One person volunteering to take all the holidays. It creates a "bus factor" of 1: if that person is unavailable, nobody else is practiced at responding.
Unpaid On-Call
Expecting staff to work nights/weekends for free. It devalues their time.
Flaky Alerts
Paging for things that fix themselves ("CPU high") trains people to ignore the pager.
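
One possible way to spot flaky alerts is to measure how often each alert resolves without anyone acting on it. The sketch below assumes a hypothetical page log of `(alert_name, resolved_without_action)` pairs; the function name `flaky_alert_names` and the 50% threshold and 3-page minimum are illustrative choices, not established values.

```python
from collections import defaultdict

def flaky_alert_names(pages, threshold=0.5, min_pages=3):
    """Return alert names whose pages mostly resolved without human action.

    pages: iterable of (alert_name, resolved_without_action) tuples.
    threshold: minimum share of self-resolved pages to count as flaky.
    min_pages: ignore alerts with too few pages to judge.
    """
    totals = defaultdict(int)
    self_resolved = defaultdict(int)
    for name, resolved_without_action in pages:
        totals[name] += 1
        if resolved_without_action:
            self_resolved[name] += 1
    return sorted(
        name
        for name, total in totals.items()
        if total >= min_pages and self_resolved[name] / total >= threshold
    )

if __name__ == "__main__":
    log = (
        [("cpu high", True)] * 4
        + [("cpu high", False)]
        + [("db down", False)] * 3
        + [("disk full", True)]
    )
    print(flaky_alert_names(log))
```

Alerts flagged this way are candidates for a higher threshold, a longer evaluation window, or demotion from a page to a ticket.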

How to Use On-Call Rotation

⚖️
Fairness: Rotations should be even. No one should be permanently on "weekend duty".
💰
Compensation: Pay engineers for being on-call, regardless of whether they get paged.
🌙
Sleep Hygiene: If someone gets paged at 3am, they should be free to start late the next day and catch up on sleep.
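
The fairness point can be checked mechanically. The following sketch assumes a hypothetical schedule of `(date, engineer)` daily shifts and counts how many weekend days each person covers; `weekend_shift_counts` and `weekend_spread` are names invented for this example.

```python
from collections import Counter
from datetime import date

def weekend_shift_counts(shifts):
    """Count Saturday/Sunday shifts per engineer, including engineers with none."""
    counts = Counter({engineer: 0 for _, engineer in shifts})
    for day, engineer in shifts:
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            counts[engineer] += 1
    return dict(counts)

def weekend_spread(shifts):
    """Difference between the most and least weekend-loaded engineer."""
    counts = weekend_shift_counts(shifts)
    return max(counts.values()) - min(counts.values()) if counts else 0

if __name__ == "__main__":
    schedule = [
        (date(2024, 1, 5), "ana"),   # Friday
        (date(2024, 1, 6), "ben"),   # Saturday
        (date(2024, 1, 7), "ben"),   # Sunday
        (date(2024, 1, 13), "ben"),  # Saturday
        (date(2024, 1, 14), "ana"),  # Sunday
    ]
    print(weekend_shift_counts(schedule), weekend_spread(schedule))
```

A spread that keeps growing over several months suggests someone is drifting into permanent weekend duty.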

Frequently Asked Questions

How often should I be on-call?
Ideally, no more than 1 week every 6-8 weeks. Anything more frequent leads to fatigue.
Do I need a "secondary" on-call?
Yes. If the primary sleeps through a page or has a personal emergency, the secondary acts as the backup.
Should managers be on-call?
Managers should be on the *escalation* path (for decision support), but individual contributors usually handle the technical triage.
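
The primary, secondary, and manager roles described in these answers can be pictured as a simple escalation policy. This is only a sketch: the delays (10 and 30 minutes) and the names `ESCALATION_POLICY` and `notified_roles` are assumptions for illustration, and real paging tools configure this in their own way.

```python
# Hypothetical policy: (minutes a page has gone unacknowledged, role to notify).
ESCALATION_POLICY = [
    (0, "primary"),     # paged immediately
    (10, "secondary"),  # backup if the primary has not acknowledged
    (30, "manager"),    # escalation path for decision support
]

def notified_roles(minutes_unacknowledged):
    """Return every role that should have been notified by this point."""
    return [
        role
        for delay, role in ESCALATION_POLICY
        if minutes_unacknowledged >= delay
    ]

if __name__ == "__main__":
    for minutes in (0, 10, 45):
        print(minutes, notified_roles(minutes))
```

Keeping the manager last in the chain matches the idea that individual contributors handle technical triage first.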
