Subject Matter Expert (SME)
The technical specialist responsible for diagnosing and fixing the specific service or component causing the incident.
The Hands on the Keyboard
If the Incident Commander is the conductor, the Subject Matter Expert (SME) is the soloist. They are the engineer who knows exactly why the Redis queue is backing up or why the load balancer is throwing 502s.
Responsibilities
- Diagnosis: finding the root cause.
- Mitigation: stopping the bleeding (e.g., adding capacity, rolling back).
- Communication: telling the IC what they are doing before they do it.
How to be a Great SME
The best SMEs don't just fix things; they communicate.
- Bad SME: Goes silent for 20 minutes, then says "Fixed it."
- Good SME: "I suspect a bad migration. I am going to verify the schema version. (5 mins later) Verified. I am requesting permission to roll back."
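The "say it, then do it" habit can be sketched in a few lines of Python. This is only an illustration: `IncidentLog`, `announce_then_act`, and `check_schema_version` are hypothetical names invented here, not part of any real incident tool, and the check itself is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class IncidentLog:
    """Hypothetical in-memory timeline that the SME and the IC both read."""
    entries: List[str] = field(default_factory=list)

    def post(self, author: str, message: str) -> None:
        # Prefix each entry with a UTC timestamp so the timeline is ordered.
        stamp = datetime.now(timezone.utc).strftime("%H:%M:%SZ")
        self.entries.append(f"[{stamp}] {author}: {message}")


def announce_then_act(log: IncidentLog, intent: str, action: Callable[[], str]) -> str:
    """Post the intent first, run the action, then post the outcome."""
    log.post("SME", f"I am going to {intent}.")
    result = action()
    log.post("SME", f"Done: {result}")
    return result


def check_schema_version() -> str:
    # Placeholder for a real check; the returned text is made up for illustration.
    return "schema version verified"


log = IncidentLog()
announce_then_act(log, "verify the schema version", check_schema_version)
for line in log.entries:
    print(line)
```

The point of the wrapper is ordering: the intent reaches the timeline before anything runs, so the IC is never surprised by a change that has already happened.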
One Area of Focus
An SME should focus on one thing. If the incident spans multiple services (e.g., Database + Frontend), you need multiple SMEs. The IC coordinates them; the SMEs fix the systems.
Example: The Hero Trap
"A senior engineer tried to fix a complex cascading failure alone. They worked for 12 hours straight without sleep."
Why the Subject Matter Expert Matters
SMEs have the deep context that the IC lacks.
They are the only ones allowed to touch production during an incident.
They translate "it's broken" into "the connection pool is exhausted".