Game Day
A scheduled practice session where teams simulate incidents to test response procedures.
"Fire Drill for Engineering"
A Game Day is a safe, scheduled time to break things and practice fixing them.
Goals
- Test the Humans: Does everyone know their role? Do they know how to open the war room?
- Test the Runbooks: Are the instructions accurate?
- Test the Tools: Did PagerDuty actually page the right person?
How to Run a Game Day
- Design: The "Master of Disaster" creates a secret scenario (e.g., "Database runs out of connections").
- Execute: The Master of Disaster triggers the failure in Staging (or Prod).
- Respond: The team treats it like a real SEV1.
- Review: Debrief immediately. "Our runbook for DB restart was wrong."
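The four phases above can be tracked with a small record so that each drill leaves behind a duration and a list of findings. This is a minimal sketch, not a prescribed tool: the `GameDay` class, its field names, and the timestamps are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class GameDay:
    """Hypothetical record of one Game Day, following the four phases."""
    scenario: str                      # Design: kept secret from responders
    environment: str = "staging"       # Execute: where the failure is injected
    triggered_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    findings: List[str] = field(default_factory=list)  # Review: debrief notes

    def execute(self, now: datetime) -> None:
        # Mark the moment the failure is injected.
        self.triggered_at = now

    def resolve(self, now: datetime) -> None:
        # Respond: the team declares the simulated incident resolved.
        if self.triggered_at is None:
            raise ValueError("cannot resolve a drill that was never triggered")
        self.resolved_at = now

    def time_to_resolve(self) -> timedelta:
        if self.triggered_at is None or self.resolved_at is None:
            raise ValueError("drill is not finished")
        return self.resolved_at - self.triggered_at

    def debrief(self, finding: str) -> None:
        # Review: capture each lesson while it is fresh.
        self.findings.append(finding)


drill = GameDay(scenario="Database runs out of connections")
start = datetime(2024, 1, 15, 10, 0)  # illustrative timestamp
drill.execute(start)
drill.resolve(start + timedelta(minutes=42))
drill.debrief("Our runbook for DB restart was wrong.")
print(drill.time_to_resolve())  # 0:42:00
```

Comparing `time_to_resolve()` across drills is one way to see whether practice is shortening response times.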
Example: The Outdated Access Key
"During a Game Day, the team tried to failover to a backup region."
Impact
They discovered that the SSH keys for the backup region had expired six months earlier.
Resolution
They rotated the keys. If this had been a real disaster, they would have been locked out of their backup.
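A finding like this can be turned into a routine check between drills. The sketch below assumes you can export an expiry date for each credential; the function name `expiring_credentials`, the inventory keys, and the dates are all hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def expiring_credentials(
    expiry_dates: Dict[str, date], today: date, warn_days: int = 30
) -> List[str]:
    """Return names of credentials that are expired or expire within warn_days."""
    cutoff = today + timedelta(days=warn_days)
    return sorted(name for name, expires in expiry_dates.items() if expires <= cutoff)


# Hypothetical inventory; real dates would come from your own key management records.
inventory = {
    "backup-region-ssh": date(2024, 1, 1),   # already expired
    "primary-region-ssh": date(2025, 6, 1),  # still valid
}
flagged = expiring_credentials(inventory, today=date(2024, 7, 1))
print(flagged)  # ['backup-region-ssh']
```

A check like this complements the drill but does not replace it: only the failover attempt itself proves that the credentials actually work.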
Why Game Day Matters
You don't want to figure out incident response during a real incident. Game Days build muscle memory.
Teams that practice together respond faster and with less stress.
Common Pitfalls
Skipping the Debrief
The learning happens in the debrief, not the simulation. Don't skip it.
Shaming
If someone struggles during the drill, teach them instead of blaming them. Learning is the point.
How to Use Game Day
- Realistic Scenarios: Simulate actual outage causes.
- Rotate Roles: Let different people be Incident Commander.
- Debrief: Review what worked and what didn't.
Frequently Asked Questions
How often should you run a Game Day?
Quarterly is common. High-velocity teams often run them monthly.