Google Saw Prompt-Injection Attacks Jump 32% in 3 Months—Here’s the Part Everyone Gets Wrong About ‘AI Agents’ (It’s Not the Model)
The 32% figure isn’t a breach count—it’s a signal that more malicious instructions are being planted in places machines read. As agents browse and take actions, the real risk is the trust boundary, not the model.

Key Points
- 1Reframe the 32%: Google measured malicious prompt-injection detections in Common Crawl archives—not successful breaches, success rates, or live enterprise incidents.
- 2Recognize the real agent risk: indirect prompt injection hides in pages, docs, emails, and invites—then manipulates tool-using systems into unsafe actions.
- 3Fix the architecture, not the hype: enforce trust boundaries, scope permissions, separate content from control, and design for containment when manipulation slips through.
Google’s newest prompt-injection number is the kind that can make even seasoned security teams sit up straighter: a 32% relative increase in malicious prompt-injection detections over just three months, from November 2025 to February 2026.
It’s also the kind of number that gets misread in ways that help nobody. Google did not claim a 32% surge in successful enterprise breaches, nor did it say attackers suddenly cracked some universal “LLM hack.” The measurement comes from repeated scans of multiple versions of the Common Crawl web archive, a large snapshot of public web pages—not the live web, not private corporate systems, and not a major slice of the modern internet’s distribution engine: big social media platforms, which Google says the dataset doesn’t capture.
Still, the signal matters. More malicious prompt-injection content is showing up in public web data, and the timing is not accidental. As AI products shift from chatbots to agents—systems that browse, call tools, and take actions—the payoff for embedding instructions in “ordinary” content rises sharply.
The 32% is not a breach statistic. It’s a trendline: more malicious instructions are being planted where machines will read them.
— — TheMurrow Editorial
What Google actually measured—and what the “32%” doesn’t mean
That context is the difference between useful intelligence and security theater. Common Crawl is not the same thing as “everything online,” and it is especially not the same thing as “what’s currently attacking your enterprise agents.”
Four common misinterpretations worth retiring
- Not a measure of real-world incident growth across deployed agents. Google is not saying successful prompt-injection incidents rose 32% across its products or anyone else’s.
- Not a success-rate metric. The figure reflects detections in a labeled category, not exploit effectiveness.
- Not a full picture of distribution channels. Google explicitly notes the dataset does not capture major social media sites, despite their central role in spreading malicious content.
- Not evidence of widespread attacker sophistication—yet. Google’s qualitative read: much of what it observed looked low sophistication, often “experiments or pranks,” and it did not see “significant amounts of advanced attacks” in that slice of data.
Those limitations do not make the stat meaningless. They make it specific: the public web contains more content designed to manipulate AI systems than it did a few months earlier, at least as detected by Google’s scans of web archives.
Treat the 32% as direction, not magnitude: a rising tide of malicious text in places machines increasingly read.
— — TheMurrow Editorial
Prompt injection, direct and indirect: the threat that hides in plain sight
The reason this issue keeps recurring is that prompt injection isn’t just one trick. It’s a family of failure modes that all exploit the same weakness: systems that cannot reliably distinguish between “instructions” and “content.”
Direct prompt injection: the obvious version
Indirect prompt injection: the version that scales
This matters because indirect injection turns ordinary information channels into delivery mechanisms. A web page can carry hidden strings. A document can contain “helpful” instructions that are anything but. A calendar invite can be weaponized if an agent reads it and acts.
Google has emphasized indirect prompt injection as a major concern for “complex AI applications with multiple data sources,” because every new data source is another doorway.
Indirect injection is the kind of attack you don’t “click.” Your agent reads it for you.
— — TheMurrow Editorial
Why AI agents raise the stakes: from bad answers to bad actions
Google’s framing is blunt: indirect prompt injection becomes especially relevant when LLMs sit inside systems with multiple data sources and can take actions. That matches the broader industry worry: the more “helpful” an agent is, the more dangerous it becomes to treat untrusted content as guidance.
What changes when tools enter the loop
- persuade the agent to send an email it shouldn’t send
- coax it into changing a file or overwriting information
- nudge it to exfiltrate data by summarizing or copying sensitive content into an external channel
- trigger a workflow that causes downstream harm
Those examples are not speculative fantasies; they are the natural consequence of coupling an LLM’s “follow instructions” behavior to real capabilities. OWASP’s risk description explicitly points to outcomes such as unauthorized function use and connected-system command execution.
A practical, real-world example: the “summarize what you found” trap
Even if the model “knows” it shouldn’t leak secrets, indirect injection is designed to confuse the system’s priorities—especially in architectures that indiscriminately mix retrieved content with instructions. The harm is not that the agent saw a web page. The harm is that the system treated the page like a colleague.
The failure isn’t “the model.” It’s the trust boundary.
The UK’s National Cyber Security Centre (NCSC) argues against treating prompt injection like a familiar vulnerability class such as SQL injection. LLM systems often lack an enforceable separation between data and instructions; the NCSC calls them “inherently confusable.” That phrase lands because it describes what engineers see in practice: the system can’t always tell what it’s supposed to obey.
Why “filtering” isn’t a strategy
From a defensive standpoint, the implication is uncomfortable but clarifying: assume manipulation will sometimes get through, and design the system so that manipulation has limited impact.
The security question to ask before any agent ships
- What can the agent do if it becomes confused?
Security teams understand this pattern from other domains. Compromises happen; blast radius is the variable. In agent design, blast radius is set by permissions, data access, tool constraints, and the discipline of separating untrusted content from decision-making.
Key Insight
Interpreting the trendline: low sophistication, rising volume, growing incentives
That sounds reassuring until you map it onto incentives. Low sophistication plus rising volume is often the early phase of an attack lifecycle—when attackers probe what works, seed content broadly, and wait for the ecosystem to make the attack valuable.
Four key numbers—and what they actually tell you
1. 32% relative increase in the malicious category of prompt-injection detections (Nov 2025 → Feb 2026) in Google’s Common Crawl scans.
Signal: more malicious content appears in public web archives over a short window.
2. Three-month measurement window (November 2025 to February 2026).
Signal: the uptick is recent, not an accumulation over years.
3. Multiple versions of a Common Crawl web archive were scanned.
Signal: Google used repeated snapshots, suggesting it tracked change over time rather than a one-off pass.
4. Major social media sites are not captured in the dataset, per Google’s note.
Signal: the measurement likely undercounts key distribution channels.
Those facts point to the same sober conclusion: the web’s content layer is becoming more adversarial for systems that read it automatically, and the measurement likely captures only part of the real exposure.
Attack maturity can be low while risk climbs—because the systems being targeted are getting more capable.
— — TheMurrow Editorial
Practical takeaways for teams building or buying agents
What to do differently when your AI browses, reads, or connects to tools
- Treat external content as hostile by default. Web pages, documents, emails, and invites are not “context.” They are inputs from unknown parties.
- Separate content from control. Architectures that blend retrieved text into the same channel as system instructions invite confusion. Even strong models can be manipulated when the system collapses trust boundaries.
- Reduce the agent’s authority. The more permissions and tools an agent has, the more valuable injection becomes.
- Plan for partial failure. Since OpenAI notes injections can look like social engineering, prevention alone will miss cases. Design for containment.
Agent hardening checklist (from the article’s implications)
- ✓Treat external content as hostile by default
- ✓Separate content from control (don’t mix retrieved text with privileged instructions)
- ✓Reduce the agent’s authority (scope permissions and tool access)
- ✓Plan for partial failure (assume some manipulation will slip through)
A case-study pattern worth recognizing: the “helpful document” problem
The lesson is not to stop using AI on documents. It’s to stop assuming provenance equals safety. Documents are how modern work moves—and therefore how modern manipulation moves.
What readers should demand from vendors (and from their own org)
So what should a smart buyer—or an internal champion—ask for?
Questions that reveal whether a product understands the problem
- What actions can the agent take, and under what constraints? Tool access should be explicit and narrowly scoped.
- What happens when the agent encounters conflicting instructions? “Follow user intent” is not a mechanism.
- What monitoring exists for injection-like patterns? Google’s work centers on detection at scale; production systems need their own telemetry.
- How is the system evaluated against indirect prompt injection? Direct injection tests are table stakes; IPI is where real deployments get hurt.
Editor’s Note
A fair counterpoint: don’t let fear freeze deployment
Security has navigated similar transitions before: browsers, email, mobile apps, cloud services. Each shift required new defaults, not a retreat from the technology.
The meaning of Google’s 32%: a warning about the content layer
That is the world agents are being built to inhabit. They read for us. They summarize for us. They act for us. As their autonomy expands, the value of corrupting what they read rises—and the cost of confusing data with instructions rises with it.
Google’s report also offers a quiet note of optimism: much of what it saw looked unsophisticated. Defenders still have time to set better defaults, harden architectures, and insist on trust boundaries that match reality.
The next phase will not be won by the team with the best “anti-prompt-injection filter.” It will be won by the teams who assume the web is adversarial—and build agents that stay useful even when they are being lied to.
Frequently Asked Questions
Did Google prove prompt injection attacks increased 32% everywhere?
No. Google reported a 32% relative increase in the malicious category of prompt-injection detections in scans of Common Crawl web archives between November 2025 and February 2026. That is a specific measurement of detected content in a particular dataset. It is not a claim about successful enterprise incidents across products or the live internet.
What is Common Crawl, and why does it matter?
Common Crawl is a large archive of public web pages used for research and analysis. Google’s scans used multiple versions of this archive, which helps show change over time. The dataset matters because it reflects what’s being published to the public web—one of the primary sources AI systems browse and ingest—though it is not the entire internet.
Why doesn’t the dataset capture the full threat?
Google notes its Common Crawl dataset does not capture major social media sites, which are key distribution channels for malicious links and content. It also doesn’t represent private SaaS platforms or internal enterprise documents. So the measurement is best treated as a partial signal—useful for trend direction, incomplete for total exposure.
What’s the difference between direct and indirect prompt injection?
Direct prompt injection is when a user tries to manipulate a model via the immediate chat input. Indirect prompt injection (as Google describes it) is when malicious instructions are embedded in external sources—web pages, emails, documents, calendar invites—that a model reads while completing a user’s request. Indirect attacks can be harder to spot because the user may never see the injected text.
Why are AI agents more vulnerable than chatbots?
Agents often browse, call tools, and take actions in other systems. OWASP notes prompt injection can lead to outcomes like data disclosure and unauthorized function use when models connect to tools. In a chatbot, manipulation might produce wrong text. In an agent, manipulation can attempt real actions—sending messages, modifying files, or triggering workflows—depending on permissions.
Can prompt injection be “fixed” like SQL injection?
The UK NCSC argues that treating prompt injection like a classic vulnerability category can be misleading because LLMs may lack a reliable separation between data and instructions, making them “inherently confusable.” That doesn’t mean defenses are impossible; it means mitigation needs system-level controls—permissions, isolation, and constraints—not only text filtering.















