Secure AI Atlas SECURITY & GOVERNANCE

Risk

Prompt Injection

Untrusted instructions enter an AI workflow and compete with the system's intended authority.

LLM trust boundary application security

Exposure

Prompt Injection appears when an AI system reads language that was not written by the system owner but still treats it as operationally relevant. The source may be a user message, retrieved document, web page, ticket, email, code comment, or tool result.

The risk increases when the same workflow can access private context, call tools, summarize sensitive records, or produce output that another system will trust.

Signals

  • Retrieved content contains instructions aimed at the model rather than the human reader.
  • The model changes task, ignores constraints, or asks for broader access after reading external text.
  • Tool calls follow language found in untrusted content.
  • Security controls rely mainly on prompt wording.

Failure pattern

The system collapses separate authority levels into one language stream. Trusted instruction, user intent, and hostile or accidental external text compete inside the same model context. ATLAS marks this as a trust-boundary failure, not a wording problem.

  • Treat retrieved and external content as untrusted.
  • Limit tool permissions to the minimum operational need.
  • Separate retrieval from action where possible.
  • Require human approval for sensitive operations.
  • Log prompts, retrieved context, tool calls, and final decisions where privacy limits allow.