TheMurrow

AI Code Isn’t “More Buggy”—It’s More Trusted. That’s Why 2026’s Next Mega‑Breach Will Start in Your Dependency Tree.

The breach risk isn’t “AI wrote bad code”—it’s that AI makes unreviewed change feel safe, especially when it quietly rewires your dependency tree under deadline pressure.

By TheMurrow Editorial
March 10, 2026

Key Points

  1. Recognize the real risk: AI accelerates unreviewed dependency changes, turning “fix the build” moments into high-trust supply-chain openings.
  2. Internalize the data: Veracode found risky security flaws in 45% of AI code tests—larger, newer models didn’t reliably reduce that rate.
  3. Govern the tree: default-deny new dependencies, require SBOMs and pinning, and constrain agents with allowlists, proxies, and mandatory review.

The next breach won’t start with a clever exploit

The next major software breach may not begin with a clever exploit or an overlooked buffer overflow. It may begin with a developer—or an automated agent—trying to be helpful.

Picture a familiar scene: a build breaks minutes before a release. The AI assistant suggests a fix: “Install `fast-crypto-utils` and bump `parser-lib` to 2.3.1.” The tests pass. The pipeline goes green. Nobody asks where those packages came from, who maintains them, or whether they even existed last week.

The uncomfortable truth is that the industry’s rising exposure isn’t best explained by “AI writes buggier code.” The sharper problem is over-trust, paired with accelerated change. AI-assisted development makes it easier to accept code and—more crucially—dependencies that the team didn’t author, doesn’t fully understand, and doesn’t thoroughly review, especially under delivery pressure.

The risk isn’t that AI makes worse code. The risk is that it makes unreviewed change feel safe.

— TheMurrow Editorial

Evidence is already piling up

Evidence is already piling up. Veracode’s 2025 GenAI Code Security Report tested code generated by 100+ large language models across Java, JavaScript, Python, and C# and found “risky security flaws” in 45% of tests—with larger and newer models not improving security. In parallel, Sonatype’s supply-chain research warns that AI agents can become attack multipliers, “happily” installing whatever resolves an error unless guardrails force them to care about provenance and policy.

The question for teams isn’t whether to use AI. Many already do. The question is whether they can close the trust gap before the dependency tree becomes the easiest way in.
45%
Veracode’s 2025 testing found “risky security flaws” in 45% of AI-generated code tests across 100+ models and four languages.

The trust gap: why “looks right” beats “is secure”

AI assistants tend to produce plausible output with an air of authority. That’s useful for speed, and dangerous for security. In practice, the human brain treats a confident, syntactically correct patch as “probably fine”—especially when the patch unblocks work.

Veracode’s 2025 findings puncture a common assumption: higher-quality code generation automatically implies safer code. It doesn’t. In Veracode’s testing across four major languages, AI-generated code introduced risky security flaws in 45% of tests, and model size or freshness didn’t reliably reduce that rate. The implication is straightforward: teams can improve productivity while still importing vulnerabilities at scale.

“Works” and “secure” part ways under pressure

Security issues rarely announce themselves as errors. Many of the most damaging mistakes compile cleanly, ship quietly, and fail only when an attacker arrives. That’s why delivery pressure changes the calculus: when success is measured by passing tests and hitting deadlines, security becomes the thing people assume someone else is checking.

Veracode’s reporting also flags language-specific variation—Java appears especially prone to poor security outcomes in the generated code they tested. That shouldn’t turn into language panic. It should sharpen a more basic lesson: security quality diverges from functional quality, and AI can widen that gap by making change frictionless.

Trust isn’t just a user failing; it’s a product decision

Sonatype’s 2026 supply-chain report draws attention to a subtle design problem: AI tools can sound authoritative even when wrong. Sonatype even quantifies hallucination behavior by confidence level, underscoring that “confidence” signals are not a safety mechanism.

If the interface makes it easy to accept and apply changes—especially dependency changes—then the tool is shaping behavior. The outcome is predictable: more code lands with less scrutiny.

Speed doesn’t break security by itself. Unchecked speed does.

— TheMurrow Editorial

The real attack surface: dependency trees, not bespoke code

Modern software isn’t a monolith written by one team. It’s an assembly of packages, frameworks, build tools, container layers, and transitive dependencies that most developers never read. That’s not an indictment of open source; it’s the reality of software economics.

Sonatype notes that open source downloads have reached trillions annually, and modern applications are heavily composed of open source components—often summarized as “up to 90%.” When the base rate of dependency usage is that high, even a modest increase in “unsafe selection” can have an outsized blast radius.
Up to 90%
Sonatype notes modern applications are often summarized as being composed of “up to 90%” open source components.

A small dependency choice can become a systemic risk

The dependency tree creates multiplicative exposure:

- One direct dependency can pull in dozens of transitive dependencies.
- Build tooling can fetch artifacts automatically across environments.
- A compromised package can affect every downstream consumer, fast.

AI-assisted development increases how frequently those trees change. Sonatype highlights “paper cut” errors that become dangerous at scale: selecting non-existent versions, choosing unsafe packages, or installing whatever satisfies a build constraint without checking provenance. None of those mistakes look dramatic in a pull request. In aggregate, they reshape your attack surface every day.

A concrete example: “fix the build” as an infection path

Consider the most mundane ticket in engineering: a broken dependency after a minor update. An AI assistant can propose a replacement package name that looks right—or a version number that “should exist.” The developer accepts it because the immediate goal is restoration, not provenance.

That workflow becomes especially risky when the assistant (or agent) is empowered to run commands. When the tool can fetch and install packages, the boundary between suggestion and execution dissolves. Sonatype warns that assistants can be prompted to fetch and install malicious code when asked to fix dependency errors or install missing libraries.

Malware beats CVEs: the package registry problem (2024–2026)

Security teams built many of their processes around vulnerabilities: find CVEs, assess severity, patch. That model still matters. It’s also no longer the whole story.

Recent reporting citing Sonatype research describes 16,279 malicious open source packages detected across ecosystems in Q2 of the cited period, with 4,400+ designed to steal data: secrets, tokens, credentials, and personally identifiable information. That’s not a “scan and patch” problem. It’s an adversary-in-the-market problem.
16,279
Recent reporting citing Sonatype research describes 16,279 malicious open source packages detected across ecosystems in a recent quarter.
4,400+
In the same cited quarter, 4,400+ malicious packages were designed to steal data: secrets, tokens, credentials, and PII.

Typosquatting is tailor-made for AI-assisted mistakes

Malicious package campaigns frequently rely on:

- Typosquatting (one character off)
- Look-alike naming (similar brand cues)
- Dependency confusion (public package overrides internal name)
- Credential stealing (tokens harvested during install or runtime)

These tricks work on humans because humans skim. They can work even better on AI agents if the agent’s objective is “make the build pass” rather than “obey an allowlist.” An agent can treat package selection like autocomplete.
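One human-scale countermeasure is to flag any proposed package whose name sits a character or two away from something the team already uses. A minimal sketch in Python, assuming a team-maintained list of known packages (the names and the distance threshold here are illustrative, not a standard tool):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Illustrative set of packages the team actually depends on.
KNOWN_PACKAGES = {"requests", "lodash", "express", "numpy"}

def flag_typosquat(candidate: str, max_distance: int = 2):
    """Return the known package a candidate name suspiciously resembles."""
    if candidate in KNOWN_PACKAGES:
        return None  # exact match to our own list: not a typosquat
    for known in KNOWN_PACKAGES:
        if edit_distance(candidate, known) <= max_distance:
            return known
    return None

print(flag_typosquat("reqeusts"))  # prints: requests (one transposition away)
```

A check like this is cheap enough to run in a pre-commit hook or CI step, which is exactly where an agent’s “autocomplete” package choices would land.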

Package registries are now contested territory. Your build system is negotiating with strangers.

— TheMurrow Editorial

Vulnerability data is struggling to keep up

Even when teams try to rely on standard metadata, the data layer is strained. Coverage of Sonatype research notes that of 1,552 open source vulnerabilities disclosed in 2025, 64% lacked severity scores in the NVD. That means automation can be blind or delayed: scanners may detect an issue but can’t quickly prioritize it using the usual severity signals.

The result is a quiet shift toward heuristic trust: “The package is popular, so it’s fine.” “The assistant recommended it, so it’s fine.” Those are human coping mechanisms for an overwhelmed system. Attackers thrive in those gaps.
64%
Reporting citing Sonatype research says 64% of 1,552 open source vulnerabilities disclosed in 2025 lacked NVD severity scores.

AI agents as “supply-chain attack multipliers”

The industry is moving from AI as a chatty assistant to AI as an actor: tools that open pull requests, resolve dependency errors, edit configuration files, and run package installs. The promise is fewer interruptions and faster delivery. The security tradeoff is new forms of automated credulity.

Sonatype’s 2026 State of the Software Supply Chain explicitly frames AI agents as multipliers. The report argues that agents are being integrated into workflows faster than security models are updated. In experiments Sonatype describes, agents with older knowledge can install whatever dependency resolves a build error without checking provenance or policy.

The “real-time intelligence” problem

Sonatype’s report puts it bluntly: “AI cannot detect threats that happened after it was trained. AI needs real-time intelligence.” That matters because package registries change daily. New malicious packages appear; old packages get compromised; maintainers change; ownership transfers.

A model trained on yesterday’s world may recommend a package that used to be safe. Or it may invent one that sounds like it should exist—nudging the user toward a typosquat that does.

Multiple perspectives: autonomy helps, if it’s bounded

It’s worth acknowledging the counterargument: automation can reduce human error. If an agent is constrained by strong policy—allowlists, signed artifacts, mandatory review—it could be safer than a rushed human.

That’s plausible. The key word is constrained. Unbounded autonomy turns the build pipeline into a high-speed procurement system with weak identity checks. Bounded autonomy can turn it into a disciplined system that merely moves faster.

What the evidence says about AI-generated vulnerabilities

Beyond industry testing, academic work is beginning to quantify patterns in the wild. A 2025 arXiv study analyzing AI-generated code vulnerabilities in public GitHub repositories reports identifying 4,241 CWE instances across 77 vulnerability types, using CodeQL-based analysis. That’s not a definitive measure of causality, and it isn’t a census of all AI code. It’s a signal that the “security debt” can be measurable and diverse.

Another arXiv study on trust calibration during Copilot usage over ten days suggests trust is dynamic: people change how much they rely on the tool as they gain experience, and training matters. The practical lesson is neither “never trust AI” nor “trust it more.” It’s: teach verification habits, and design workflows where verification is the default.

The mismatch between helpfulness and assurance

AI tools are optimized to help you ship. Security tools are optimized to slow down unsafe change. When one tool suggests and another tool blocks, many teams tune the blocker.

That’s a governance choice. It’s also where a lot of security programs quietly fail: they treat friction as the enemy rather than the price of confidence.

How teams close the gap: practical guardrails that scale

Security advice often collapses into vague slogans. Teams need operational moves that fit the reality of AI-assisted speed. The goal is not perfect safety. The goal is preventing “one-click” adoption of unknown code and unknown dependencies.

Put provenance ahead of cleverness

Dependency governance can sound bureaucratic until you compare it with incident response. Start with rules that are simple enough to follow:

- Default-deny new dependencies unless they come from approved sources.
- Require an SBOM (software bill of materials) and track transitive dependencies.
- Enforce pinning and avoid “floating” versions unless policy allows it.
- Use artifact signing/verification where available.
- Separate “suggest” from “apply”: AI can propose, but changes require review.

Sonatype’s warning about agents installing whatever resolves errors points to a specific fix: constrain resolution to an allowlist or internal proxy repository, where policy and intelligence can be applied.
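What a default-deny gate looks like in practice can be sketched in a few lines. This is a hypothetical CI check, not a real product: the allowlist contents and the manifest format (npm-style `dependencies`) are illustrative, and a production version would read policy from a reviewed file or an internal proxy’s configuration.

```python
import json

# Illustrative approved set; in practice this lives in reviewed policy,
# not in source code.
ALLOWLIST = {"requests", "flask", "sqlalchemy"}

def gate_dependencies(manifest_json: str) -> list:
    """Return dependency names that violate the default-deny policy."""
    manifest = json.loads(manifest_json)
    deps = manifest.get("dependencies", {})
    return sorted(name for name in deps if name not in ALLOWLIST)

proposed = '{"dependencies": {"flask": "3.0.0", "fast-crypto-utils": "1.0.2"}}'
violations = gate_dependencies(proposed)
if violations:
    # In CI this would fail the build and route the change to human review.
    print("blocked:", ", ".join(violations))  # prints: blocked: fast-crypto-utils
```

The point is not the ten lines of Python; it’s that the agent’s “make the build pass” objective now has to pass through policy before anything is installed.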

Provenance-first dependency rules (starter set)

  • Default-deny new dependencies unless they come from approved sources
  • Require an SBOM and track transitive dependencies
  • Enforce pinning; avoid floating versions unless policy allows
  • Use artifact signing/verification where available
  • Separate “suggest” from “apply”: proposals are fine, changes require review
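The pinning rule above is also easy to automate. A minimal sketch that flags “floating” version specifiers in an npm-style dependency map; the regex is a rough illustration of caret, tilde, wildcard, and range markers, not a full semver parser:

```python
import re

# Specifier patterns treated as "floating" for illustration; a real policy
# would follow the ecosystem's version-range rules precisely.
FLOATING = re.compile(r"[\^~*x]|>=|latest")

def unpinned(dependencies: dict) -> list:
    """Names whose version specifier is not an exact pin."""
    return sorted(n for n, v in dependencies.items() if FLOATING.search(v))

deps = {"parser-lib": "2.3.1", "left-pad": "^1.3.0", "utils": "latest"}
print(unpinned(deps))  # prints: ['left-pad', 'utils']
```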

Treat dependency changes as security changes

A one-line change in `package.json` or `pom.xml` can be riskier than a hundred lines of application logic. Many teams still review code carefully and review dependencies casually. Flip that.

Practical review heuristics (human-scale, not heroic):

- Does the package name exactly match the intended project?
- Is the version real and consistent with upstream release patterns?
- Does the dependency introduce install scripts or unusual permissions?
- Does it add network calls, crypto, or credential-handling code paths?

Human-scale dependency review heuristics

  • Confirm the package name exactly matches the intended project
  • Verify the version is real and matches upstream release patterns
  • Watch for install scripts or unusual permissions
  • Note added network calls, crypto usage, or credential-handling paths
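The install-script heuristic in particular can be checked mechanically. npm packages declare lifecycle hooks such as `preinstall` and `postinstall` that run arbitrary code at install time; a reviewer (or a bot commenting on the PR) can surface them before anyone runs `npm install`. A sketch, with an illustrative manifest:

```python
import json

# npm lifecycle hooks that execute code at install time.
INSTALL_HOOKS = {"preinstall", "install", "postinstall"}

def install_scripts(package_json: str) -> dict:
    """Return any lifecycle scripts a package would run on install."""
    scripts = json.loads(package_json).get("scripts", {})
    return {k: v for k, v in scripts.items() if k in INSTALL_HOOKS}

pkg = '{"name": "fast-crypto-utils", "scripts": {"postinstall": "node setup.js", "test": "jest"}}'
print(install_scripts(pkg))  # prints: {'postinstall': 'node setup.js'}
```

An install hook isn’t proof of malice; it’s a signal that the package deserves the careful review lane rather than the casual one.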

Make scanning resilient to missing metadata

The report that 64% of 2025 vulnerabilities lacked NVD severity scores should change how leaders think about automation. If prioritization depends entirely on NVD scoring, you have a single point of failure.

Teams can hedge by combining:

- Multiple vulnerability feeds (where available)
- Policy-based blocking for risky classes of components
- “Unknown severity” handling that triggers review rather than silence
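The “unknown severity” rule deserves emphasis, because most pipelines get it backwards: a finding with no score falls through to the lowest priority. A sketch of the opposite default, where missing metadata routes to review rather than silence (the severity labels and actions are illustrative policy, not a standard schema):

```python
# Illustrative triage policy: known severities map to actions, and a
# missing or unrecognized score routes to human review, never to silence.
ACTIONS = {"critical": "block", "high": "block", "medium": "review", "low": "log"}

def triage(finding: dict) -> str:
    severity = finding.get("severity")  # may be None when NVD has no score
    return ACTIONS.get(severity, "review")

print(triage({"id": "CVE-2025-0001", "severity": "critical"}))  # prints: block
print(triage({"id": "CVE-2025-0002"}))                          # prints: review
```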

Key Insight

If your vulnerability workflow can’t function when severity scores are missing, you don’t have automation—you have a brittle dependency on a single data source.

A realistic way forward: calibrate trust, don’t outlaw AI

AI-assisted development is already woven into many teams’ daily work. Bans tend to drive usage underground. The better approach is to treat AI as a powerful junior collaborator: fast, confident, and in need of supervision.

The evidence supports a balanced posture:

- Veracode shows security flaws in 45% of AI-generated code tests, and bigger models don’t magically fix that.
- Sonatype shows the dependency supply chain is a moving target: trillions of downloads, thousands of malicious packages, and real-time threats that outpace training data.
- Academic work suggests vulnerabilities can be measured in the wild, and trust calibration can be taught.

Security leaders should read that as a mandate to redesign workflows. If AI makes change cheaper, your process must make trust more expensive—at least where it counts: dependencies, secrets, and build steps.

The teams that thrive won’t be the ones that avoid AI. They’ll be the ones that treat AI-driven speed as a force to govern, not a gift to accept unexamined.

Bottom line

If AI makes change cheaper, your process must make trust more expensive—especially for dependencies, secrets, and build steps.
About the Author
TheMurrow Editorial is a writer for TheMurrow covering technology.

Frequently Asked Questions

Is AI-generated code inherently less secure than human-written code?

Veracode’s 2025 GenAI Code Security Report found risky security flaws in 45% of tests of AI-generated code across 100+ models and four major languages. That doesn’t prove humans always do better; it shows that functional correctness doesn’t guarantee security. The bigger risk is accepting output without rigorous review, testing, and security controls.

Why are dependencies a bigger risk than “bugs” in my own code?

Modern applications rely heavily on third-party components—Sonatype notes open source downloads reaching trillions annually, and many apps are composed largely of open source. A single dependency can bring dozens of transitive dependencies. Malicious packages, typosquats, and dependency confusion can compromise systems even if your own code is clean.

What makes AI agents more dangerous than AI chat assistants?

Agents can take actions: edit files, open PRs, run installs, and resolve dependency errors automatically. Sonatype’s 2026 report warns agents may install “whatever dependency resolves a build error” without checking provenance or policy. When suggestion becomes execution, the supply-chain risk rises sharply unless guardrails are enforced.

Can’t we just scan everything and rely on CVE severity scores?

Scanning helps, but the vulnerability data layer can be incomplete. Reporting citing Sonatype research says that of 1,552 vulnerabilities disclosed in 2025, 64% lacked NVD severity scores. If your prioritization depends on those scores, you may miss or delay response. Policies for “unknown severity” and multiple intelligence sources reduce that blind spot.

What’s the most practical first step to reduce AI-related supply-chain risk?

Start by controlling dependency introduction. Default-deny new dependencies unless they come from approved sources, and require review for dependency file changes. Sonatype’s research suggests agents will optimize for “make the build pass,” so put an allowlist or internal proxy in their path. Limit what tools can fetch automatically.

Are newer, larger models safer for secure coding?

Not necessarily. Veracode found that larger and newer models did not improve security in their testing. Model capability can improve helpfulness and fluency without improving secure defaults. Teams should assume any model can produce insecure code and enforce security review, testing, and dependency governance regardless of model brand or size.
