CONCEPTS
Toil
Operational work that tends to be manual, repetitive, and automatable.
"Work That No One Wants to Do"
Toil is the kind of work that scales linearly with service growth. If you double your users, and you have to double your manual work, that is Toil.
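The linear relationship can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the sample figures (30 minutes per task, 10 to 40 tasks per week) are assumptions chosen to show the shape of the curve, not measurements.

```python
def manual_hours(tasks_per_week: int, minutes_per_task: float) -> float:
    """Weekly hours of hands-on work when every task is done by a person."""
    return tasks_per_week * minutes_per_task / 60


if __name__ == "__main__":
    # Doubling the number of tasks doubles the hours: that is linear scaling.
    for tasks in (10, 20, 40):
        print(f"{tasks} tasks/week -> {manual_hours(tasks, 30):.1f} hours/week")
    # 10 tasks/week -> 5.0 hours/week
    # 20 tasks/week -> 10.0 hours/week
    # 40 tasks/week -> 20.0 hours/week
```

Automated work, by contrast, costs roughly the same engineering time whether it runs 10 times or 40 times.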
Identifying Toil
Work is Toil if it is:
- Manual: Hands-on keyboard.
- Repetitive: Doing the same thing over and over.
- Automatable: A machine could do it.
- Tactical: Interrupt-driven and reactive rather than planned.
- No Enduring Value: The service is in the same state after the task is done; nothing is permanently improved.
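The checklist above can be written down as a small data structure, which makes it easier to audit a backlog consistently. This is a minimal sketch: the `Task` class, its field names, and the choice to require all five traits are assumptions made for illustration. In practice a team might treat the count as a score rather than a strict test.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    manual: bool
    repetitive: bool
    automatable: bool
    tactical: bool
    no_enduring_value: bool


def toil_score(task: Task) -> int:
    """Count how many of the five toil traits the task has (0 to 5)."""
    return sum([
        task.manual,
        task.repetitive,
        task.automatable,
        task.tactical,
        task.no_enduring_value,
    ])


def is_toil(task: Task) -> bool:
    """Strict reading of the checklist: every trait must apply."""
    return toil_score(task) == 5


if __name__ == "__main__":
    onboarding = Task("run provisioning SQL", True, True, True, True, True)
    outage = Task("debug novel outage", True, False, False, True, False)
    print(onboarding.name, is_toil(onboarding))  # run provisioning SQL True
    print(outage.name, is_toil(outage))          # debug novel outage False
```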
The 50% Rule
Google SREs aim to spend at most 50% of their time on Toil and at least 50% on Engineering (building automation). If Toil exceeds 50%, you stop shipping features and build robots until it is back under the cap.
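Checking a team against the 50% cap is a one-line ratio. The sketch below assumes hours are already tracked per week; the function names and the sample numbers are hypothetical.

```python
def toil_fraction(toil_hours: float, total_hours: float) -> float:
    """Share of working time spent on toil, between 0 and 1."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def over_budget(toil_hours: float, total_hours: float, budget: float = 0.5) -> bool:
    """True when toil exceeds the budget (50% by default)."""
    return toil_fraction(toil_hours, total_hours) > budget


if __name__ == "__main__":
    print(over_budget(20, 40))  # False: exactly 50% is still within the cap
    print(over_budget(24, 40))  # True: 60% exceeds the cap
```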
Example: The Manual Onboarding
"A team manually ran SQL queries to provision new enterprise customers. It took 30 minutes per customer."
Impact
As sales grew, engineers spent 20 hours/week running SQL.
Resolution
They built a self-service admin portal. Now Sales clicks a button, and provisioning takes no engineering time. Toil eliminated.
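The core of such a portal is usually a single function that wraps the hand-run SQL. The sketch below shows the idea with Python's built-in `sqlite3` module. The schema, the table and column names, and the `provision_customer` function are all hypothetical; the original team's actual queries and database are not described in the example.

```python
import sqlite3

# Hypothetical schema, standing in for whatever the manual queries touched.
SCHEMA = """
CREATE TABLE IF NOT EXISTS customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    plan TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS quotas (
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    seats       INTEGER NOT NULL
);
"""


def provision_customer(conn: sqlite3.Connection, name: str,
                       plan: str = "enterprise", seats: int = 100) -> int:
    """Create the customer and its quota in one transaction; return the new id."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO customers (name, plan) VALUES (?, ?)", (name, plan)
        )
        customer_id = cur.lastrowid
        conn.execute(
            "INSERT INTO quotas (customer_id, seats) VALUES (?, ?)",
            (customer_id, seats),
        )
    return customer_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(provision_customer(conn, "Acme Corp"))
```

Once the steps live in one function, a button in an admin portal only needs to call it, and parameterized queries avoid the copy-and-paste mistakes that hand-edited SQL invites.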
Why Toil Matters
Toil is the enemy of engineering productivity. High toil means less time for feature work.
Google SRE principles suggest keeping toil under 50% of total work.
Common Pitfalls
Glorifying Toil
Do not promote engineers for "working hard" at manual tasks. Promote the ones who automate themselves out of a job.
How to Use Toil
- Automate: If you do it twice, script it.
- Measure Toil: Track time spent on manual tasks.
- Eliminate Root Causes: Fix systems so toil disappears.
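Measuring can start with nothing more than a log of (task, minutes) entries. The sketch below totals the log per task so the biggest time sinks surface first. The log format, the function name, and the sample entries are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def minutes_by_task(log: Iterable[Tuple[str, int]]) -> Counter:
    """Sum the minutes recorded against each task name."""
    totals: Counter = Counter()
    for task, minutes in log:
        totals[task] += minutes
    return totals


if __name__ == "__main__":
    # Hypothetical week of manual work.
    log = [
        ("provision customer", 30),
        ("rotate certificate", 15),
        ("provision customer", 30),
        ("restart worker", 5),
    ]
    # most_common() sorts by total minutes, largest first.
    for task, minutes in minutes_by_task(log).most_common():
        print(f"{minutes:>4} min  {task}")
```

The task at the top of the list is usually the first candidate for automation or for a root-cause fix.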
Frequently Asked Questions
Is all operational work Toil?
No. Responding to a unique, novel 3 AM outage is "Operational Work," but it is not Toil (because it is not repetitive or predictable).
Can we eliminate 100% of toil?
Unlikely. Aim to reduce it, but the law of diminishing returns applies.
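Diminishing returns can be estimated with a break-even calculation: how many weeks must the automation run before the hours saved repay the hours spent building it? This is a rough sketch; the function name, the optional upkeep parameter, and the sample figures are assumptions.

```python
from typing import Optional


def breakeven_weeks(build_hours: float, saved_hours_per_week: float,
                    upkeep_hours_per_week: float = 0.0) -> Optional[float]:
    """Weeks until automation pays for itself, or None if it never does."""
    net_saving = saved_hours_per_week - upkeep_hours_per_week
    if net_saving <= 0:
        return None  # upkeep costs as much as the toil it removes
    return build_hours / net_saving


if __name__ == "__main__":
    # A large, frequent task repays an 80-hour build quickly.
    print(breakeven_weeks(80, 20))   # 4.0
    # A rare task takes years to repay the same build effort.
    print(breakeven_weeks(80, 0.5))  # 160.0
```

The last few percent of toil tend to look like the second case, which is why reducing toil is a more practical goal than eliminating all of it.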