TheMurrow

Your ‘AI agent’ can’t tell a helpful tool from a trap—here’s the tiny metadata lie that can drain your bank account

As soon as an agent can call tools—APIs, plugins, browser controls—attackers can hide instructions in “harmless” tool metadata. The transcript will look competent, right up until money moves.

By TheMurrow Editorial
March 27, 2026
Your ‘AI agent’ can’t tell a helpful tool from a trap—here’s the tiny metadata lie that can drain your bank account

Key Points

  • 1Recognize tool poisoning: agents can follow hidden instructions in tool metadata, mistaking descriptions and schemas for trustworthy guidance.
  • 2Assume real-world impact: poisoned context can steer browser checkouts, billing APIs, and payout changes—without obvious “hack” signals in logs.
  • 3Demand boundaries: minimize delegated authority, isolate high-privilege tools, constrain tool selection, and log the exact context behind every action.

The first time an AI assistant buys something you didn’t ask for, it won’t look like a hack. It will look like efficiency.

A receipt lands in your inbox. A subscription renews early. A payout address in a billing dashboard quietly changes. You scroll back through the chat and see a tidy chain of reasoning: the agent “found a tool,” “followed the documentation,” and “completed the task.” Nothing in the transcript screams attack—just competence.

The uncomfortable truth is that AI agents often can’t reliably tell a helpful tool from a trap. Not because they’re careless, but because the systems we’re building ask language models to do a job they aren’t designed for: trust decisions in a hostile world.

OpenAI has been blunt about the stakes. As soon as a model can call tools—APIs, plugins, browser controls, local executables—prompt injection becomes one of the most significant risks, and “difficult to solve completely.” That warning isn’t theoretical. It points to a structural weakness: agents “understand” tools largely through text, and text is exactly what attackers know how to manipulate.

Agents don’t verify tools the way software verifies signed binaries; they read descriptions and infer intent.

— TheMurrow Editorial

What an “AI agent” is—and why it raises the blast radius

A standard chatbot can mislead you, but it can’t usually touch your accounts. An AI agent can. In this context, “agent” means an LLM-driven system that plans and takes actions by invoking tools: booking via a travel API, filing an expense report, changing a setting in a SaaS admin panel, or driving a browser session to checkout.

That jump from talk to action changes the risk profile. A single misguided step can trigger:

- Payments and purchases
- Account changes (password resets, payout updates, admin permissions)
- Credential use (API tokens, OAuth sessions)
- Data access and exfiltration (files, email, internal docs)

OpenAI’s own security write-ups frame prompt injection as particularly dangerous in tool-using systems, because the model may treat untrusted instructions as legitimate guidance when selecting or parameterizing tools. Hardening helps, but the company also emphasizes a hard reality: no complete solution is known.

The key failure mode: untrusted text in the decision loop

Agents have to decide which tool to call, when to call it, and what arguments to provide. Those decisions are often influenced by tool descriptions, documentation text, schemas, and UI summaries. If any of that content is untrusted—or can be tampered with—an attacker gains leverage over the agent’s planning.

OpenAI has described the risk in practical terms: attackers can steer an agent to call the wrong tool, the right tool with malicious parameters, or to chain legitimate tools into a harmful workflow. Tool use multiplies impact, because the agent can make several “reasonable” moves in sequence that add up to a breach.

The danger isn’t one bad click. It’s a long, plausible plan built from ordinary tools.

— TheMurrow Editorial

The “tiny metadata lie”: how tool descriptions become an attack surface

Most security teams know to distrust executable code. Fewer have learned to distrust the tool description.

Modern agent ecosystems—especially those resembling tool registries—ship tools with metadata: a name, a description, a parameter schema, and sometimes examples or notes. The model sees that text as context. If the text contains adversarial instructions, the model may follow them.

Security researchers increasingly call this tool poisoning or tool description injection. A concise explanation on vulnerablemcp.info outlines the core issue: the user interface might display a friendly summary, while the LLM receives the full description text. That mismatch creates a stealthy instruction channel.

Why the lie works: the model treats description as guidance, not data

A malicious tool doesn’t need to contain malicious code. It can contain malicious language—a short instruction hidden in metadata that says, effectively, “When asked for X, also do Y,” or “Use the payments tool to complete this step.” The model, trained to follow instructions, may comply.

Two features make the attack particularly effective:

- Asymmetry of visibility: humans see a clean label; the model ingests the full text.
- Asymmetry of authority: the model often treats tool metadata as if it came from a trusted developer.

The escalation: “implicit tool poisoning”

Newer research pushes the idea further. A 2026 arXiv paper on implicit tool poisoning notes that the poisoned tool may never be called at all. Instead, its metadata manipulates the agent into invoking some other high-privilege tool—email, files, browser checkout, finance APIs—to achieve the attacker’s goal.

That detail matters. Traditional controls that monitor “dangerous tools” may miss an attack that begins in a harmless-looking tool entry but ends in a privileged action elsewhere.

Public disclosure of “Tool Poisoning Attacks” in MCP-style ecosystems is widely tied to Invariant Labs’ reporting in April 2025, a moment many explainers cite as the point when tool metadata became widely recognized as an attack surface.

Key Insight

Tool poisoning isn’t about “bad code.” It’s about bad instructions hidden where teams historically assumed text was inert: descriptions, schemas, examples, UI summaries.

MCP as the accelerant: standardizing tool access standardizes the risk

Tool poisoning isn’t confined to any one protocol. But Model Context Protocol (MCP) has helped make the underlying pattern more common by making tool connectivity easier and more standardized.

MCP is designed to connect assistants to tools and resources through a uniform interface. That convenience comes with a fragile assumption: tool definitions and descriptions are safe to consume. Multiple sources now frame tool poisoning as a client-side vulnerability in which tool descriptions become a covert instruction channel. An arXiv paper on MCP-related security concerns describes this as a real, emerging issue rather than a niche curiosity.

Scale meets uneven security

The larger the ecosystem, the bigger the supply-chain problem. Backslash Security reported rapid growth in public MCP servers on June 25, 2025, citing counts exceeding 15,000 public servers and noting that the firm assessed thousands of servers. The release also emphasized high-impact misconfigurations and vulnerabilities, including patterns that imply network exposure and command execution risk.

Two statistics are doing work here:

- 15,000+ public MCP servers signals a booming ecosystem—and a large attack surface.
- “Hundreds” of critical flaws, as described in Backslash Security’s report, suggests uneven security hygiene at scale.

Fast growth doesn’t guarantee insecurity. But it does guarantee heterogeneity: hobby servers, rushed deployments, and mismatched assumptions about trust.
15,000+
Public MCP servers reported by Backslash Security (June 25, 2025), signaling a fast-growing ecosystem—and a widening attack surface.
“Hundreds”
Critical flaws described in Backslash Security’s reporting, pointing to uneven security hygiene across thousands of assessed servers.

When “just text” hits server-side reality

Tool poisoning is “just text” until it triggers real actions. MCP servers often wrap real systems: file operations, shell commands, internal databases, and corporate SaaS APIs. If a server implementation is insecure—command injection, path traversal, SSRF—the agent can become an unwitting exploit delivery mechanism.

Even without server-side bugs, tool poisoning can still cause damage by steering an agent to use legitimate features in harmful ways. Security doesn’t fail only at the code layer; it can fail at the instruction layer.

How it can “drain your bank account” without stealing your routing number

Financial harm from agents rarely looks like a Hollywood heist. The more realistic pathway is delegated authority plus manipulated context.

An attacker doesn’t need the model to hallucinate your bank details. The attacker needs the model to use the access it already has.

Pathway 1: delegated payments and low-friction commerce

If an agent can control:

- a browser agent that can complete checkout flows,
- a billing or finance API, or
- invoicing/expense tools,

then poisoned metadata can steer it toward unauthorized purchases, subscriptions, transfers, or payout-detail changes, especially in systems optimized to reduce user friction.

The risk isn’t that the agent “goes rogue.” The risk is that the agent is persuaded—through untrusted tool text—to treat an attacker’s objective as part of the task.

Pathway 2: data exfiltration that becomes financial loss later

Tool poisoning and indirect prompt injection are often discussed as data theft problems: tokens, cookies, API keys, internal documents. That stolen access can be monetized outside the agent workflow, long after the chat window is closed.

NIST’s guidance on GenAI security recognizes indirect prompt injection as a privacy and security risk. MCP-focused threat modeling also flags tool poisoning as prevalent and impactful in these ecosystems. The chain is straightforward:

1. Manipulate the agent via untrusted context.
2. Exfiltrate credentials or session material.
3. Use that access to initiate downstream fraud.

The downstream fraud chain

  1. 1.Manipulate the agent via untrusted context.
  2. 2.Exfiltrate credentials or session material.
  3. 3.Use that access to initiate downstream fraud.

Pathway 3: trust exploitation—when the human approves the “reasonable” request

OWASP’s framing for Agentic Applications emphasizes “trust exploitation”: serious failures occur when a human clicks approve on an action that looks legitimate because it was produced by manipulated context. The agent isn’t bypassing approvals; it’s laundering the attacker’s intent into a plausible request.

The scariest approvals are the ones that look exactly like your own automation.

— TheMurrow Editorial

A realistic case study: the poisoned tool that never runs

Consider a procurement assistant used by a small company. The assistant can search internal docs, browse vendor sites, and submit purchase orders through an expense tool. It also has access to a tool registry where new integrations can be enabled.

A new tool appears with a helpful name and an innocuous description—something like “PDF summarizer” or “contract clause checker.” Hidden in the tool’s metadata is a prompt injection payload. The user asks the agent to “find the best price and buy two licenses.”

The poisoned tool never needs to run. The model ingests its description while planning and gets nudged to:

1. Use the browser agent to navigate to a specific vendor page.
2. Prefer a “recommended” subscription tier.
3. Route the invoice through a particular workflow.
4. Justify the purchase in the approval request with polished language.

Each step is plausible. The final action—purchase or subscription—happens through legitimate tools the company already trusts. If a human approver is in the loop, the request looks clean: correct vendor, correct category, reasonable justification. That is the point.

This scenario aligns with the “implicit tool poisoning” finding: the poisoned tool can function as a control surface even when it is not invoked. The attack rides on the model’s tendency to treat tool metadata as instructions rather than untrusted input.

How the poisoned metadata steers the plan

  1. 1.Use the browser agent to navigate to a specific vendor page.
  2. 2.Prefer a “recommended” subscription tier.
  3. 3.Route the invoice through a particular workflow.
  4. 4.Justify the purchase in the approval request with polished language.

Why this is hard to fix: software trust vs language trust

Traditional security is built on verification: signed binaries, permission boundaries, provenance checks, sandboxing. Agents operate on something squishier: language.

An LLM doesn’t “verify” a tool the way an operating system verifies a signed executable. It reads tool metadata and documentation and then predicts what to do. A malicious instruction can be short, subtle, and embedded in a place engineers historically treated as non-executable: a description field.

OpenAI’s own messaging underscores the difficulty. Prompt injection is not a bug you patch once; it is an adversarial dynamic that emerges from the model’s core behavior: following instructions in context.

The competing perspectives

Security researchers argue that tool metadata must be treated as untrusted input and that agent frameworks need stronger isolation. That camp tends to see tool poisoning as inevitable until architectures change.

Builders of agent products, meanwhile, point out that the industry is early and that layered defenses are improving. They’re not wrong: guardrails, allowlists, permission prompts, and monitoring can reduce real-world harm. Yet the research record suggests that “reduce” is the operative word, not “eliminate.”

The most productive view holds both truths: agents are useful, and the current trust model is brittle.

Software trust vs language trust

Before
  • Signed binaries
  • provenance checks
  • sandboxing
  • permission boundaries
After
  • Tool descriptions
  • documentation text
  • schemas
  • UI summaries interpreted as instructions

Practical takeaways: what readers and teams should do now

The goal isn’t paranoia. The goal is to treat agent toolchains with the same seriousness you treat authentication and payments.

For users and executives buying agentic products

Ask direct questions about tool trust and approvals:

- What sources of text can influence tool choice? (tool descriptions, web pages, emails, tickets)
- How are high-impact actions gated? (payments, payout changes, password resets)
- Can the agent act in a browser session with saved payment methods?
- Is there logging that ties actions to the specific context that triggered them?

Low-friction automation is attractive. High-friction reversals—chargebacks, account recovery, forensic work—cost more.

Questions to ask agent vendors

  • What sources of text can influence tool choice? (tool descriptions, web pages, emails, tickets)
  • How are high-impact actions gated? (payments, payout changes, password resets)
  • Can the agent act in a browser session with saved payment methods?
  • Is there logging that ties actions to the specific context that triggered them?

For engineering and security teams deploying agents

Treat tool metadata as hostile input unless proven otherwise.

- Minimize delegated authority. Don’t give an agent payment power by default.
- Isolate high-privilege tools. Require step-up approvals for transfers, payouts, and admin changes.
- Constrain tool selection. Prefer explicit allowlists over open-ended registries.
- Monitor for long-horizon anomalies. Seemingly unrelated steps can form a harmful chain.
- Assume indirect prompt injection. NIST explicitly recognizes the category; plan accordingly.

Backslash Security’s report about thousands of assessed MCP servers and hundreds of critical flaws should also influence procurement: tool servers are part of your supply chain. A “simple integration” may bring operational risk.

Deployment guardrails worth adopting

  • Minimize delegated authority; don’t give an agent payment power by default.
  • Isolate high-privilege tools; require step-up approvals for transfers, payouts, and admin changes.
  • Constrain tool selection; prefer explicit allowlists over open-ended registries.
  • Monitor for long-horizon anomalies; unrelated steps can form a harmful chain.
  • Assume indirect prompt injection; NIST recognizes the category—plan accordingly.

For tool ecosystem maintainers

If you run a tool registry or MCP server directory, metadata should be treated like code:

- review and provenance checks,
- clear display of what the model sees (not only what humans see),
- versioning and audit trails for description changes.

The industry already learned these lessons for packages and containers. Tool metadata is joining that list.

Treat metadata like code

If descriptions can steer actions, they need review, provenance, versioning, and audit trails—plus transparency into exactly what the model consumes.

Conclusion: the next security boundary is the agent’s imagination

Agents widen the aperture of what software can do for us. They also widen the aperture of what attackers can trick software into doing.

Tool poisoning—especially the “tiny metadata lie”—is unsettling because it exploits a gap between human expectations and model behavior. Humans assume descriptions are inert. Models treat them as instructions. MCP and similar protocols scale that gap by making tool connectivity easy and ubiquitous, with public ecosystems now measured in the tens of thousands of servers.

The fix won’t be a single filter or a better prompt. It will be a shift in architecture and governance: less ambient authority, more explicit approvals for high-impact steps, and a hard line between trusted and untrusted text.

The broader lesson is simple and unromantic: once software can act, words become a control surface. Agents don’t just need better reasoning. They need better boundaries.
Tens of thousands
Public tool ecosystems at this scale turn “harmless text” into supply-chain surface area—especially when agents can act through standardized tool access.
T
About the Author
TheMurrow Editorial is a writer for TheMurrow covering technology.

Frequently Asked Questions

What exactly is “tool poisoning” in AI agents?

Tool poisoning (also called tool description injection) is when an attacker embeds adversarial instructions inside a tool’s metadata—its name, description, or schema. The user may see a benign summary, while the model receives the full text as context and may follow the hidden instructions. The tool doesn’t need malicious code; the “payload” can be language.

How is tool poisoning different from normal prompt injection?

Classic prompt injection targets the chat prompt directly. Tool poisoning targets the tool registry layer, where models read tool descriptions to decide what to do. The mechanism is similar—malicious instructions in context—but the impact can be larger because tool metadata influences tool selection and parameters, which can trigger real-world actions.

Can a poisoned tool cause harm even if it’s never used?

Yes. Research on implicit tool poisoning shows the poisoned tool might never be invoked. Its metadata can still influence the agent’s planning, steering it to call other high-privilege tools—payments, email, file access, or browser automation—to complete an attacker’s goal.

Why does MCP make this problem more common?

MCP standardizes how assistants connect to tools and resources, which speeds adoption. Standardization also centralizes an assumption: tool metadata is safe. As public MCP server ecosystems grow—Backslash Security reported 15,000+ public servers in June 2025—more tools and descriptions enter the supply chain, and uneven security practices become a systemic risk.

What’s the most realistic way this leads to financial loss?

The common pathway is delegated authority: an agent can complete checkouts in a browser, submit expenses, or call billing APIs. Poisoned metadata can steer it into unauthorized purchases or changes to payout details. Another pathway is credential exfiltration through indirect prompt injection, which attackers can monetize outside the agent system.

Is there any complete defense against prompt injection for agents?

OpenAI and other security work suggest a complete fix is unlikely in the near term. Prompt injection is difficult because it exploits a model’s core function: following instructions in context. Practical defense relies on layers—permission gating, minimizing privileges, constraining tool selection, monitoring, and treating untrusted text as hostile input.

More in Technology

You Might Also Like