Secure AI Atlas mark Secure AI Atlas SECURITY & GOVERNANCE

Article

MCP Security: When Protocol Becomes Attack Vector

Agentjacking demonstrated that MCP trust is not automatic. Here is how the attack works, why protocol-level security matters, and what controls need to change.

mcp agent-security supply-chain indirect-prompt-injection agentjacking

MCP Security: When Protocol Becomes Attack Vector

June 22, 2026

On June 12, 2026, Tenet Security’s Threat Labs disclosed a class of attacks they named Agentjacking. A developer using Claude Code with a Sentry integration had their coding agent extract an error stack trace from Sentry, analyze it, and then execute a command suggested by the trace. The trace was fake. An attacker with a publicly known Sentry DSN had submitted it.

The agent followed the instruction because the data arrived through an MCP channel it trusted.

This is not a model hallucination. This is a protocol trust gap.

The Architecture of Trust in MCP

Model Context Protocol (MCP) defines how an AI agent discovers and invokes external tools, data sources, and services. It is the layer that turns a language model into an operational agent capable of reading files, querying databases, calling APIs, and executing commands.

MCP servers advertise capabilities. Agents connect to them and request resources. When a Sentry MCP server returns an error report, the agent processes it as factual context — not as untrusted input.

The protocol specification does not define trust tiers for data sources. A database query returning customer records and a Sentry endpoint returning user-submitted error reports arrive through the same channel with the same trust level.

This is the architectural gap Agentjacking exploits.

The Attack Surface

Agentjacking requires three conditions:

  1. A publicly accessible telemetry endpoint. Sentry DSNs are commonly public. They appear in configuration files, environment variables, and source code repositories. Anyone who knows the DSN can submit events.

  2. An agent that consumes telemetry as context. Claude Code, Cursor, and Codex all fetch error context through MCP when debugging. The agent treats the telemetry data as ground truth for the debugging session.

  3. Injected instructions in the telemetry payload. The attacker crafts an error report whose stack trace or metadata contains instructions. When the agent analyzes the report, it treats those instructions as task context and may execute them.

In the Agentjacking proof of concept, the injected instruction was a shell command. The agent, operating with the developer’s permissions, ran it.

Why Existing Controls Are Insufficient

ControlWhat it preventsWhat it misses
Prompt-level input filteringDirect injection in user promptsData arriving through MCP channels is not user-generated — it is system-generated, and filtering is rarely applied to system data
Human approval for tool callsUnauthorized tool invocationsThe agent calls a legitimate tool (Sentry, filesystem). The danger is in what the tool returns, not which tool is called
Sandboxed executionCommand execution outside sandboxThe agent already has permissions for its workspace. Sentry data is part of that workspace
Output validationMalicious output reaching usersThe damage happens during execution, not in output

The gap is that security controls operate at the model level or the tool-call level, but MCP transports data that crosses both layers without being checked at the protocol level.

A Trust Model for MCP Channels

A practical trust model for MCP channels should classify data sources along three axes:

1. Origin Authentication

Can the data source verify that events were generated by an authenticated system? Public Sentry DSNs: no. Authenticated APIs with signed payloads: yes.

Control: Require authenticated origins for any MCP data source whose content can affect agent behavior.

2. Content Provenance

Can the agent verify who authored each piece of data? In telemetry, individual events may come from different users, systems, or attackers — all arriving through the same channel.

Control: Separate data by provenance. A Sentry MCP server that offers contextual information should distinguish between “events your application generated” and “events anyone submitted.”

3. Action Distance

How far is the data from an executable action? A stack trace that contains a shell command has high action distance. A log line that says “connection reset” has low action distance.

Control: Agents should measure action distance before treating MCP data as context for tool invocations. If the data contains instructions with operational effects, those instructions should require human approval — even if they arrived through a trusted channel.

Implications for Agent Architecture

Protocol designers

MCP needs a trust model. A simple approach: tag MCP resources with trust levels. An agent should process a file from a local database differently than an event from a public Sentry endpoint. The MCP specification currently has no such mechanism.

Agent developers

Do not assume that data arriving through MCP is safe. Apply input validation to MCP resources the same way you would apply it to user prompts. The fact that data is system-generated does not make it trustworthy.

Security engineers

Map MCP channels in your threat model. For each agent deployment, document: which MCP servers does the agent connect to, what data do those servers serve, and who can submit data to each server. If any server accepts unauthenticated data, that is a trust boundary.

Platform teams

When an agent connects to Sentry, Datadog, or any observability platform through MCP, ask: can an external attacker submit data to this tool? If yes, the channel is a vector.

The Broader Pattern

Agentjacking is not an isolated bug. It is a pattern: when agents consume operational data through protocols designed for developer convenience, security is assumed rather than verified.

The same pattern appears in:

  • CI/CD integrations where agents read build logs that contain injected instructions
  • Database connections where an agent queries a vector store that contains poisoned data
  • Browser automation where an agent reads page content that includes adversarial prompts
  • Email and messaging where an agent processes messages as operational context

Each case shares the same root cause: the agent trusts the channel more than the content.

What This Means for the Risk Catalogue

Secure AI Atlas tracks Agentic Supply Chain Compromise, Insecure Tool Invocation, and Indirect Prompt Injection as distinct risks. Agentjacking sits at their intersection.

The risk is not fully covered by any single existing entry:

  • Supply Chain Compromise covers the artifact chain (model, library, dependency)
  • Insecure Tool Invocation covers unauthorized tool access
  • Indirect Prompt Injection covers adversarial content in data sources

Agentjacking adds a fourth dimension: protocol-mediated trust, where the channel itself becomes the attack vector regardless of the data content.

We are registering MCP Channel Trust as a concept candidate for the next conceptual review.

References

  1. Tenet Security, “Agentjacking: Hijacking AI Coding Agents via MCP + Sentry” (PointGuard AI, June 12, 2026). https://www.pointguardai.com/ai-security-incidents/agentjacking-shows-coding-agents-can-take-the-bait
  2. NVD CVE-2026-2256 — MS-Agent Prompt Injection to RCE (CVSS 9.8). https://nvd.nist.gov/vuln/detail/CVE-2026-2256
  3. OWASP LLM Top 10 2026 Update. https://owasp.org/www-project-top-10-for-large-language-model-applications/
  4. MCP Specification — Model Context Protocol. https://modelcontextprotocol.io/