An engineer on your team gets a Datadog alert while writing code in Cursor. Without switching tabs, their agent checks who's on call, acknowledges the incident, investigates recent deploys, pages the right responder, and logs everything to the timeline.
That's not a demo. That's what Runframe's MCP server does in Cursor and Claude Code today.
npx @runframe/mcp-server --setup
Works with Cursor, Claude Code, VS Code, and Claude Desktop.
Every incident management tool today assumes a human is clicking through every step. We built the MCP server for the workflows where that's no longer true, where an agent does the coordination and the engineer makes the calls.
What's in the box
Here's what we ship.
Incidents (9 tools):
- list_incidents — filter by status, severity, team
- get_incident — full details with timeline and participants
- create_incident — spin one up from an alert
- update_incident — change severity, assignment, description
- change_incident_status — move through the workflow (investigating → fixing → resolved)
- acknowledge_incident — ack it, auto-assign, track SLA
- add_incident_event — log findings to the timeline
- escalate_incident — escalate through the policy
- page_someone — page a responder via Slack or email
On-call (1 tool):
get_current_oncall— who's on call right now, filterable by team
Services (2 tools):
- list_services — search across services
- get_service — details plus on-call instructions
Postmortems (2 tools):
- create_postmortem — draft with root cause and action items
- get_postmortem — pull up what happened
Teams (2 tools):
- list_teams — see all teams
- get_escalation_policy — who gets paged at each level
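To make the catalog above concrete, here's a sketch of what one of these tool definitions might look like on the wire. The field names follow the MCP spec's tool shape; the schema itself is illustrative, not Runframe's shipped definition.

```typescript
// Illustrative MCP tool definition for the on-call lookup. The outer
// shape (name, description, inputSchema) is what the MCP spec requires;
// the parameter details are an assumption for the sake of the example.
const getCurrentOncall = {
  name: "get_current_oncall",
  description:
    "Return who is on call right now, optionally filtered to one team.",
  inputSchema: {
    type: "object",
    properties: {
      team: { type: "string", description: "Team slug, e.g. 'payments'" },
    },
    required: [] as string[],
  },
} as const;

console.log(getCurrentOncall.name);
```

One question, one tool, one optional filter — the description tells the model when to reach for it without reading any docs.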
How an agent runs an incident
A Datadog alert fires for elevated API latency on the payments service.
First thing the agent does is call get_incident. SEV2, payments service, opened 3 minutes ago. The monitoring integration already logged the trigger on the timeline.
Then get_current_oncall, filtered to the payments team. Gets back the primary on-call engineer.
acknowledge_incident. The incident moves to "investigating." SLA clock starts. The rest of the team can see someone's on it.
The agent pulls logs from Datadog (separate MCP server), checks recent commits in the codebase, and finds a deploy 20 minutes ago that changed the payment retry logic. It calls add_incident_event with what it found: "Likely caused by deploy #1847, payment retry logic change at 14:32 UTC. Error rate spiked 4 minutes after deploy."
page_someone. The on-call engineer gets a Slack DM and email with the full context and the agent's findings. They don't start from zero.
change_incident_status to "fixing." The timeline has the whole story. When the fix ships, the engineer resolves it, or the CI/CD pipeline does via the API.
Later, create_postmortem with the root cause, timeline, and suggested action items. The engineer reviews and edits instead of writing from scratch.
A handful of calls. The agent did the running around. The engineer decided what to actually do about it.
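The run above can be replayed as a flat sequence of tool calls. The tool names match the server's catalog; the incident ID and arguments are placeholders, and the client that would execute them is left out.

```typescript
// Hypothetical trace of the walkthrough, alert to postmortem.
// Arguments are illustrative, not the server's exact parameter names.
type ToolCall = { tool: string; args: Record<string, unknown> };

const incidentRun: ToolCall[] = [
  { tool: "get_incident", args: { id: "INC-123" } },
  { tool: "get_current_oncall", args: { team: "payments" } },
  { tool: "acknowledge_incident", args: { id: "INC-123" } },
  {
    tool: "add_incident_event",
    args: { id: "INC-123", note: "Likely caused by deploy #1847" },
  },
  { tool: "page_someone", args: { incidentId: "INC-123" } },
  { tool: "change_incident_status", args: { id: "INC-123", status: "fixing" } },
  { tool: "create_postmortem", args: { incidentId: "INC-123" } },
];

console.log(incidentRun.length); // seven calls end to end
```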
Why we kept the tool set small
Most incident management MCP servers fall into two camps: auto-generated (every API endpoint becomes a tool, you end up with 70-100 in context) or hand-crafted but sprawling (30-70 tools covering every possible use case). Agents struggle with both.
Each tool definition costs 200-400 tokens (name, description, input schema). A server with 70+ tools burns tens of thousands of tokens before the agent even starts on your problem.
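The arithmetic is easy to check. Taking the 200-400 tokens-per-definition figure at face value:

```typescript
// Back-of-the-envelope context cost of tool definitions alone,
// assuming 200-400 tokens per definition (name + description + schema).
const costRange = (toolCount: number): [low: number, high: number] => [
  toolCount * 200,
  toolCount * 400,
];

console.log(costRange(16)); // the 16 tools above: [3200, 6400]
console.log(costRange(70)); // a sprawling server: [14000, 28000]
```

That difference is paid on every single agent turn, before any incident data enters the context.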
But the token cost is only part of it. The fewer tools an agent has to choose from, the more reliably it picks the right one. When there's one way to list incidents and one way to get an incident, the agent doesn't have to guess between list_incidents, get_incidents, search_incidents, and query_incidents.
We started with the workflow (what does an agent need to run an incident from alert to postmortem?) and worked backward to the tool set. No bulk operations. No user management. No webhook CRUD. No billing endpoints. If it doesn't help an agent run an incident, it stays out.
MCP works when you design for agents
There's a growing chorus that MCP is overhyped. That agents can't reliably use tools. That the whole thing is a gimmick.
We think it comes down to design. MCP (Model Context Protocol) does exactly what it says: lets an agent call tools with structured inputs and get structured outputs back. When an MCP server has well-named, well-described tools scoped to a single workflow, agents use them reliably. We've tested it.
The trick is treating tool design the same way you'd treat API design. Clear names. Descriptions written for LLMs, not humans reading docs. Each tool answers one question an agent would actually ask.
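The difference is easiest to see side by side. Both strings below are made up for illustration; neither is Runframe's shipped copy.

```typescript
// Same tool, two descriptions. The first reads like API reference docs;
// the second tells the model what it gets back and which neighboring
// tool to use instead when it needs more.
const forDocsReaders =
  "Incidents endpoint. Supports filtering and pagination.";

const forAgents =
  "List incidents, filterable by status, severity, or team. " +
  "Returns summaries only; call get_incident for one incident's " +
  "full timeline and participants.";

console.log(forAgents.includes("get_incident"));
```

The second description answers the question the agent is actually asking mid-task: "is this the tool, and if not, which one is?"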
Getting started
Interactive setup (walks you through it):
npx @runframe/mcp-server --setup
Claude Code:
claude mcp add runframe -e RUNFRAME_API_KEY=rf_your_key -- npx -y @runframe/mcp-server
For Cursor or VS Code, add this to your MCP config:
{
  "mcpServers": {
    "runframe": {
      "command": "npx",
      "args": ["-y", "@runframe/mcp-server"],
      "env": { "RUNFRAME_API_KEY": "rf_your_key" }
    }
  }
}
Get your API key from Settings → API Keys. Keys support scoped permissions, so grant each key only what it needs.
Start a free 28-day trial at runframe.io, no credit card required. MCP is included. MIT licensed, source on GitHub.
What's next
We're going to be laser-focused on adding only what agents actually need. If a tool doesn't make an agent better at handling incidents, it doesn't ship.
On the short list:
- Slack channel tools (create incident channels, post updates)
- Analytics (MTTR trends, incident frequency by service)
- Incident templates
That's it for now. We'd rather have 20 tools that work than 70 that look good in a README.
Common questions
What about write safety?
Write tools that notify people (page_someone and escalate_incident) are clearly marked as destructive in their descriptions, so the agent knows to confirm before firing them. API keys are scoped, so you can give a key read-only access if you want.
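The MCP spec has a standard mechanism for this: tool annotations, including readOnlyHint and destructiveHint, which clients can use to decide when to ask the user first. A sketch of how a paging tool might carry them (illustrative, not the shipped code):

```typescript
// MCP tool annotations per the spec. destructiveHint signals that a
// client should confirm with the user before invoking the tool.
const pageSomeone = {
  name: "page_someone",
  description: "Page a responder via Slack or email.",
  annotations: {
    readOnlyHint: false,
    destructiveHint: true, // confirm before firing
  },
} as const;

console.log(pageSomeone.annotations.destructiveHint);
```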
Can I self-host it?
Yes. The server is MIT licensed with the source on GitHub, so you can run it wherever you need.
Is there an HTTP transport for CI/CD pipelines?
Yes. Start the server with --transport http --port 3100. It takes a bearer token for auth, supports multiple clients, and is stateless, so you can load-balance it.
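A CI job talking to that transport would post JSON-RPC requests, per the MCP spec. The envelope below follows the spec's tools/call shape; the host, endpoint path, token, and arguments are all placeholders.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request. The commented
// fetch shows roughly how a pipeline step might send it; the URL and
// bearer token are assumptions, not real values.
const body = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "change_incident_status",
    arguments: { id: "INC-123", status: "resolved" },
  },
};

// await fetch("http://mcp.internal:3100/mcp", {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     Authorization: "Bearer rf_your_key",
//   },
//   body: JSON.stringify(body),
// });

console.log(body.method);
```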
Your agent is already in the IDE. Now it has an incident management layer that keeps up.