ON-CALL MANAGEMENT
On-Call Responder
The engineer currently designated to receive and act upon system alerts and incidents.
The Watchman
The responder's job is not necessarily to fix everything alone, but to respond.
- Acknowledge the alert (Stop the noise).
- Triage the issue (Is it real?).
- Mitigate or Escalate (Call for help).
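The three steps above can be sketched as a small decision function. This is a minimal illustration, not a real paging integration: the `Alert` fields (`is_real`, `responder_can_fix`) and the `respond` function are hypothetical names introduced here, and triage is reduced to a pre-set flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    DISMISS = "dismiss"      # triage found a false alarm
    MITIGATE = "mitigate"    # responder reduces impact directly
    ESCALATE = "escalate"    # responder calls for help


@dataclass
class Alert:
    name: str
    is_real: bool              # hypothetical: outcome of triage
    responder_can_fix: bool    # hypothetical: within the responder's expertise
    acknowledged: bool = False
    log: list = field(default_factory=list)


def respond(alert: Alert) -> Action:
    # Step 1: acknowledge, which stops the paging noise.
    alert.acknowledged = True
    alert.log.append("acknowledged")

    # Step 2: triage, deciding whether the issue is real.
    if not alert.is_real:
        alert.log.append("triaged: false alarm")
        return Action.DISMISS
    alert.log.append("triaged: real")

    # Step 3: mitigate if possible, otherwise escalate.
    action = Action.MITIGATE if alert.responder_can_fix else Action.ESCALATE
    alert.log.append(action.value)
    return action
```

Note that escalation is an ordinary outcome of the function, not an error path, which matches the point that the responder's job is to respond rather than to fix everything alone.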
Example: Startup Responder Workflow
"2 AM database alert. Responder acknowledged in 3 minutes, checked dashboards, confirmed SEV1 (database down). Escalated to on-call DBA. While waiting, implemented cache workaround to restore read-only service. DBA arrived, full fix in 22 minutes."
Impact
Reduced MTTA from 15 min to 3 min, MTTR from 90 min to 22 min
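To put those figures in relative terms, the arithmetic can be checked with a short calculation. The helper name `percent_reduction` is made up for this sketch; the input numbers are the ones quoted above.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative improvement, as a percentage of the starting value."""
    return (before - after) / before * 100


mtta = percent_reduction(15, 3)    # minutes to acknowledge, before vs. after
mttr = percent_reduction(90, 22)   # minutes to resolve, before vs. after

print(f"MTTA reduced by {mtta:.0f}%")  # prints "MTTA reduced by 80%"
print(f"MTTR reduced by {mttr:.0f}%")  # prints "MTTR reduced by 76%"
```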
Resolution
Documented runbook for common database failures
Why the On-Call Responder Matters
The on-call responder is the first line of defense for system reliability.
Common Pitfalls
Ignoring alerts due to fatigue
If you're experiencing alert fatigue, escalate to your manager. Don't silently ignore alerts; ignored alerts lead to outages.
Trying to fix everything alone
Know your limits. Escalate early if you're out of your depth. It's better to escalate than to let an incident drag on.
Frequently Asked Questions
What are the primary responsibilities of an on-call responder?
Acknowledge alerts within SLA (typically 5-15 min), triage severity, mitigate or escalate, document actions, and participate in post-mortems. The goal is rapid response, not solo heroics.
Should on-call responders fix everything themselves?
No. The responder's job is to respond, not necessarily to fix everything alone. Escalate to the right experts. Being on-call means owning the incident, not doing all the work.
How quickly should on-call responders acknowledge alerts?
Target MTTA is under 5 minutes for SEV0/SEV1, under 15 minutes for SEV2. Set up escalation rules if the primary responder doesn't acknowledge within the SLA.
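An escalation rule of this kind can be sketched as a timeout check per severity. This is an illustrative sketch only: `ACK_SLA` and `should_escalate` are hypothetical names, the thresholds are the targets quoted above, and real paging tools configure this through their own escalation policies rather than code like this.

```python
from datetime import datetime, timedelta

# Acknowledgement targets from the answer above (assumed to apply as hard cutoffs).
ACK_SLA = {
    "SEV0": timedelta(minutes=5),
    "SEV1": timedelta(minutes=5),
    "SEV2": timedelta(minutes=15),
}


def should_escalate(severity: str, fired_at: datetime, now: datetime,
                    acknowledged: bool) -> bool:
    """Return True when the primary responder has missed the ack window."""
    if acknowledged:
        return False
    # "Under N minutes" means the window is missed once N minutes have elapsed.
    return now - fired_at >= ACK_SLA[severity]


fired = datetime(2024, 1, 1, 2, 0)  # arbitrary example timestamp
print(should_escalate("SEV1", fired, fired + timedelta(minutes=3), False))   # False
print(should_escalate("SEV1", fired, fired + timedelta(minutes=5), False))   # True
print(should_escalate("SEV2", fired, fired + timedelta(minutes=10), False))  # False
```

A scheduler would call a check like this periodically and page the secondary responder when it returns True.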