Article
MCP Security: When Protocol Becomes Attack Vector
Agentjacking demonstrated that MCP trust is not automatic. Here is how the attack works, why protocol-level security matters, and what controls need to change.
MCP Security: When Protocol Becomes Attack Vector
June 22, 2026
On June 12, 2026, Tenet Security’s Threat Labs disclosed a class of attacks they named Agentjacking. A developer using Claude Code with a Sentry integration had their coding agent extract an error stack trace from Sentry, analyze it, and then execute a command suggested by the trace. The trace was fake. An attacker with a publicly known Sentry DSN had submitted it.
The agent followed the instruction because the data arrived through an MCP channel it trusted.
This is not a model hallucination. This is a protocol trust gap.
The Architecture of Trust in MCP
Model Context Protocol (MCP) defines how an AI agent discovers and invokes external tools, data sources, and services. It is the layer that turns a language model into an operational agent capable of reading files, querying databases, calling APIs, and executing commands.
MCP servers advertise capabilities. Agents connect to them and request resources. When a Sentry MCP server returns an error report, the agent processes it as factual context — not as untrusted input.
The protocol specification does not define trust tiers for data sources. A database query returning customer records and a Sentry endpoint returning user-submitted error reports arrive through the same channel with the same trust level.
This is the architectural gap Agentjacking exploits.
The Attack Surface
Agentjacking requires three conditions:
-
A publicly accessible telemetry endpoint. Sentry DSNs are commonly public. They appear in configuration files, environment variables, and source code repositories. Anyone who knows the DSN can submit events.
-
An agent that consumes telemetry as context. Claude Code, Cursor, and Codex all fetch error context through MCP when debugging. The agent treats the telemetry data as ground truth for the debugging session.
-
Injected instructions in the telemetry payload. The attacker crafts an error report whose stack trace or metadata contains instructions. When the agent analyzes the report, it treats those instructions as task context and may execute them.
In the Agentjacking proof of concept, the injected instruction was a shell command. The agent, operating with the developer’s permissions, ran it.
Why Existing Controls Are Insufficient
| Control | What it prevents | What it misses |
|---|---|---|
| Prompt-level input filtering | Direct injection in user prompts | Data arriving through MCP channels is not user-generated — it is system-generated, and filtering is rarely applied to system data |
| Human approval for tool calls | Unauthorized tool invocations | The agent calls a legitimate tool (Sentry, filesystem). The danger is in what the tool returns, not which tool is called |
| Sandboxed execution | Command execution outside sandbox | The agent already has permissions for its workspace. Sentry data is part of that workspace |
| Output validation | Malicious output reaching users | The damage happens during execution, not in output |
The gap is that security controls operate at the model level or the tool-call level, but MCP transports data that crosses both layers without being checked at the protocol level.
A Trust Model for MCP Channels
A practical trust model for MCP channels should classify data sources along three axes:
1. Origin Authentication
Can the data source verify that events were generated by an authenticated system? Public Sentry DSNs: no. Authenticated APIs with signed payloads: yes.
Control: Require authenticated origins for any MCP data source whose content can affect agent behavior.
2. Content Provenance
Can the agent verify who authored each piece of data? In telemetry, individual events may come from different users, systems, or attackers — all arriving through the same channel.
Control: Separate data by provenance. A Sentry MCP server that offers contextual information should distinguish between “events your application generated” and “events anyone submitted.”
3. Action Distance
How far is the data from an executable action? A stack trace that contains a shell command has high action distance. A log line that says “connection reset” has low action distance.
Control: Agents should measure action distance before treating MCP data as context for tool invocations. If the data contains instructions with operational effects, those instructions should require human approval — even if they arrived through a trusted channel.
Implications for Agent Architecture
Protocol designers
MCP needs a trust model. A simple approach: tag MCP resources with trust levels. An agent should process a file from a local database differently than an event from a public Sentry endpoint. The MCP specification currently has no such mechanism.
Agent developers
Do not assume that data arriving through MCP is safe. Apply input validation to MCP resources the same way you would apply it to user prompts. The fact that data is system-generated does not make it trustworthy.
Security engineers
Map MCP channels in your threat model. For each agent deployment, document: which MCP servers does the agent connect to, what data do those servers serve, and who can submit data to each server. If any server accepts unauthenticated data, that is a trust boundary.
Platform teams
When an agent connects to Sentry, Datadog, or any observability platform through MCP, ask: can an external attacker submit data to this tool? If yes, the channel is a vector.
The Broader Pattern
Agentjacking is not an isolated bug. It is a pattern: when agents consume operational data through protocols designed for developer convenience, security is assumed rather than verified.
The same pattern appears in:
- CI/CD integrations where agents read build logs that contain injected instructions
- Database connections where an agent queries a vector store that contains poisoned data
- Browser automation where an agent reads page content that includes adversarial prompts
- Email and messaging where an agent processes messages as operational context
Each case shares the same root cause: the agent trusts the channel more than the content.
What This Means for the Risk Catalogue
Secure AI Atlas tracks Agentic Supply Chain Compromise, Insecure Tool Invocation, and Indirect Prompt Injection as distinct risks. Agentjacking sits at their intersection.
The risk is not fully covered by any single existing entry:
- Supply Chain Compromise covers the artifact chain (model, library, dependency)
- Insecure Tool Invocation covers unauthorized tool access
- Indirect Prompt Injection covers adversarial content in data sources
Agentjacking adds a fourth dimension: protocol-mediated trust, where the channel itself becomes the attack vector regardless of the data content.
We are registering MCP Channel Trust as a concept candidate for the next conceptual review.
References
- Tenet Security, “Agentjacking: Hijacking AI Coding Agents via MCP + Sentry” (PointGuard AI, June 12, 2026). https://www.pointguardai.com/ai-security-incidents/agentjacking-shows-coding-agents-can-take-the-bait
- NVD CVE-2026-2256 — MS-Agent Prompt Injection to RCE (CVSS 9.8). https://nvd.nist.gov/vuln/detail/CVE-2026-2256
- OWASP LLM Top 10 2026 Update. https://owasp.org/www-project-top-10-for-large-language-model-applications/
- MCP Specification — Model Context Protocol. https://modelcontextprotocol.io/