AI Code Isn’t “More Buggy”—It’s More Trusted. That’s Why 2026’s Next Mega‑Breach Will Start in Your Dependency Tree.
The breach risk isn’t “AI wrote bad code”—it’s that AI makes unreviewed change feel safe, especially when it quietly rewires your dependency tree under deadline pressure.

Key Points
1. Recognize the real risk: AI accelerates unreviewed dependency changes, turning “fix the build” moments into high-trust supply-chain openings.
2. Internalize the data: Veracode found risky security flaws in 45% of AI code tests—larger, newer models didn’t reliably reduce that rate.
3. Govern the tree: default-deny new dependencies, require SBOMs and pinning, and constrain agents with allowlists, proxies, and mandatory review.
The next breach won’t start with a clever exploit
Picture a familiar scene: a build breaks minutes before a release. The AI assistant suggests a fix: “Install `fast-crypto-utils` and bump `parser-lib` to 2.3.1.” The tests pass. The pipeline goes green. Nobody asks where those packages came from, who maintains them, or whether they even existed last week.
The uncomfortable truth is that the industry’s rising exposure isn’t best explained by “AI writes buggier code.” The sharper problem is over-trust, paired with accelerated change. AI-assisted development makes it easier to accept code and—more crucially—dependencies that the team didn’t author, doesn’t fully understand, and doesn’t thoroughly review, especially under delivery pressure.
The risk isn’t that AI makes worse code. The risk is that it makes unreviewed change feel safe.
— TheMurrow Editorial
Evidence is already piling up
Veracode’s code-security testing, Sonatype’s supply-chain research, and quarterly counts of malicious packages all point the same way, and the sections below take each in turn. The question for teams isn’t whether to use AI. Many already do. The question is whether they can close the trust gap before the dependency tree becomes the easiest way in.
The trust gap: why “looks right” beats “is secure”
Veracode’s 2025 findings puncture a common assumption: higher-quality code generation automatically implies safer code. It doesn’t. In Veracode’s testing across four major languages, AI-generated code introduced risky security flaws in 45% of tests, and model size or freshness didn’t reliably reduce that rate. The implication is straightforward: teams can improve productivity while still importing vulnerabilities at scale.
“Works” and “secure” part ways under pressure
Veracode’s reporting also flags language-specific variation—Java appears especially prone to poor security outcomes in the generated code they tested. That shouldn’t turn into language panic. It should sharpen a more basic lesson: security quality diverges from functional quality, and AI can widen that gap by making change frictionless.
Trust isn’t just a user failing; it’s a product decision
If the interface makes it easy to accept and apply changes—especially dependency changes—then the tool is shaping behavior. The outcome is predictable: more code lands with less scrutiny.
Speed doesn’t break security by itself. Unchecked speed does.
— TheMurrow Editorial
The real attack surface: dependency trees, not bespoke code
Sonatype notes that open source downloads have reached trillions annually, and modern applications are heavily composed of open source components—often summarized as “up to 90%.” When the base rate of dependency usage is that high, even a modest increase in “unsafe selection” can have an outsized blast radius.
A small dependency choice can become a systemic risk
- One direct dependency can pull in dozens of transitive dependencies.
- Build tooling can fetch artifacts automatically across environments.
- A compromised package can affect every downstream consumer, fast.
AI-assisted development increases how frequently those trees change. Sonatype highlights “paper cut” errors that become dangerous at scale: selecting non-existent versions, choosing unsafe packages, or installing whatever satisfies a build constraint without checking provenance. None of those mistakes look dramatic in a pull request. In aggregate, they reshape your attack surface every day.
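One of those paper cuts, a version that doesn’t actually exist upstream, is cheap to catch before anything installs. Below is a minimal sketch, assuming Node 18+ (for global `fetch`) and the public npm registry; the package and version echo the opening example and are purely illustrative.

```typescript
// check-version-exists.ts — a minimal pre-install sanity check, not a full
// provenance review. Assumes Node 18+ (global fetch) and the public npm registry.
async function versionExists(pkg: string, version: string): Promise<boolean> {
  const res = await fetch(`https://registry.npmjs.org/${encodeURIComponent(pkg)}`);
  if (!res.ok) return false; // the package itself is unknown (or the registry errored)
  const meta = (await res.json()) as { versions?: Record<string, unknown> };
  return Boolean(meta.versions && version in meta.versions);
}

async function main(): Promise<void> {
  // Reject an AI-suggested bump when the requested version was never published.
  if (!(await versionExists("parser-lib", "2.3.1"))) {
    console.error("Refusing install: requested version is not published upstream.");
    process.exit(1);
  }
}

main();
```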
A concrete example: “fix the build” as an infection path
The “fix the build” scene from the opening becomes especially risky when the assistant (or agent) is empowered to run commands rather than merely suggest them. When the tool can fetch and install packages, the boundary between suggestion and execution dissolves. Sonatype warns that assistants can be prompted to fetch and install malicious code when asked to fix dependency errors or install missing libraries.
Malware beats CVEs: the package registry problem (2024–2026)
Recent reporting citing Sonatype research describes 16,279 malicious open source packages detected across ecosystems in a single quarter (Q2 of the cited period), with 4,400+ of them designed to steal data: secrets, tokens, credentials, and personally identifiable information. That’s not a “scan and patch” problem. It’s an adversary-in-the-market problem.
Typosquatting is tailor-made for AI-assisted mistakes
- Typosquatting (one character off)
- Look-alike naming (similar brand cues)
- Dependency confusion (public package overrides internal name)
- Credential stealing (tokens harvested during install or runtime)
These tricks work on humans because humans skim. They can work even better on AI agents if the agent’s objective is “make the build pass” rather than “obey an allowlist.” An agent can treat package selection like autocomplete.
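A crude but useful countermeasure is to compare any newly suggested package name against names the team already trusts and flag near misses for a human. The sketch below is a plain edit-distance heuristic, not any vendor’s detection method; the `trusted` list is a hypothetical allowlist.

```typescript
// typosquat-check.ts — a rough heuristic: flag candidates within two edits of a
// package the team already trusts, since small typos are the classic lure.
function editDistance(a: string, b: string): number {
  // Standard Levenshtein distance via dynamic programming.
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Hypothetical allowlist: packages the team already depends on and trusts.
const trusted = ["lodash", "express", "react", "axios"];

function looksLikeTyposquat(candidate: string): string | null {
  for (const known of trusted) {
    const d = editDistance(candidate.toLowerCase(), known);
    if (d > 0 && d <= 2) return known; // suspiciously close but not identical
  }
  return null;
}

console.log(looksLikeTyposquat("lodahs")); // -> "lodash": two edits away, flag for review
```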
Package registries are now contested territory. Your build system is negotiating with strangers.
— TheMurrow Editorial
Vulnerability data is struggling to keep up
Reporting citing Sonatype research notes that of 1,552 vulnerabilities disclosed in 2025, 64% lacked NVD severity scores. The result is a quiet shift toward heuristic trust: “The package is popular, so it’s fine.” “The assistant recommended it, so it’s fine.” Those are human coping mechanisms for an overwhelmed system. Attackers thrive in those gaps.
AI agents as “supply-chain attack multipliers”
Sonatype’s 2026 State of the Software Supply Chain explicitly frames AI agents as multipliers. The report argues that agents are being integrated into workflows faster than security models are updated. In experiments Sonatype describes, agents with older knowledge can install whatever dependency resolves a build error without checking provenance or policy.
The “real-time intelligence” problem
A model trained on yesterday’s world may recommend a package that used to be safe. Or it may invent one that sounds like it should exist—nudging the user toward a typosquat that does.
Multiple perspectives: autonomy helps, if it’s bounded
The optimistic case is that agent autonomy actually helps security, keeping dependencies patched and upgraded faster than humans manage, provided it is constrained. That’s plausible. The key word is constrained. Unbounded autonomy turns the build pipeline into a high-speed procurement system with weak identity checks. Bounded autonomy can turn it into a disciplined system that merely moves faster.
What the evidence says about AI-generated vulnerabilities
Academic work suggests those vulnerabilities can be measured in the wild, not only in vendor benchmarks. Another arXiv study, on trust calibration during Copilot usage over ten days, suggests trust is dynamic: people change how much they rely on the tool as they gain experience, and training matters. The practical lesson is neither “never trust AI” nor “trust it more.” It’s: teach verification habits, and design workflows where verification is the default.
The mismatch between helpfulness and assurance
Assistants are optimized to be helpful; assurance requires friction. How much friction to impose, and where, is a governance choice. It’s also where a lot of security programs quietly fail: they treat friction as the enemy rather than the price of confidence.
How teams close the gap: practical guardrails that scale
Put provenance ahead of cleverness
- Default-deny new dependencies unless they come from approved sources.
- Require an SBOM (software bill of materials) and track transitive dependencies.
- Enforce pinning and avoid “floating” versions unless policy allows it (a minimal check is sketched after this list).
- Use artifact signing/verification where available.
- Separate “suggest” from “apply”: AI can propose, but changes require review.
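Pinning in particular is easy to state and easy to erode, so it helps to let CI check it. The sketch below treats anything other than an exact `x.y.z` version as floating; real policies usually allow lockfile-level pinning and documented exceptions, which this deliberately ignores.

```typescript
// pin-check.ts — a minimal floating-version gate for CI. The "exact pin" regex is a
// simplification; it ignores lockfile-level pinning and any policy exceptions.
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8")) as {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

const EXACT = /^\d+\.\d+\.\d+$/; // e.g. "2.3.1"; "^2.3.1", "~2.3", "latest" all fail

const floating = Object.entries({ ...pkg.dependencies, ...pkg.devDependencies })
  .filter(([, range]) => !EXACT.test(range));

if (floating.length > 0) {
  console.error("Floating versions found (pin them or record a policy exception):", floating);
  process.exit(1);
}
```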
Sonatype’s warning about agents installing whatever resolves errors points to a specific fix: constrain resolution to an allowlist or internal proxy repository, where policy and intelligence can be applied.
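In CI, that constraint can be as blunt as a default-deny diff between what the lockfile resolved and what the team has approved. The sketch below assumes an npm lockfile (version 2 or 3) and a hypothetical team-maintained `allowlist.json`; an internal proxy repository enforces the same idea further upstream.

```typescript
// allowlist-gate.ts — default-deny sketch: fail the build if the lockfile resolves
// any dependency, direct or transitive, that is not on the team's allowlist.
import { readFileSync } from "node:fs";

const lock = JSON.parse(readFileSync("package-lock.json", "utf8")) as {
  packages?: Record<string, unknown>; // keys look like "node_modules/foo"
};
const allowlist = new Set<string>(JSON.parse(readFileSync("allowlist.json", "utf8")));

const installed = Object.keys(lock.packages ?? {})
  .filter((key) => key.startsWith("node_modules/"))
  .map((key) => key.split("node_modules/").pop() as string); // handles nested paths

const unapproved = [...new Set(installed)].filter((name) => !allowlist.has(name));

if (unapproved.length > 0) {
  console.error("Default-deny: dependencies not on the allowlist:", unapproved);
  process.exit(1); // block the pipeline until a human approves the additions
}
```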
Treat dependency changes as security changes
Practical review heuristics (human-scale, not heroic):
- Does the package name exactly match the intended project?
- Is the version real and consistent with upstream release patterns?
- Does the dependency introduce install scripts or unusual permissions? (A sketch of this check follows the list.)
- Does it add network calls, crypto, or credential-handling code paths?
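The install-script heuristic in particular lends itself to automation. The sketch below assumes the candidate package was fetched with scripts disabled (for example, `npm ci --ignore-scripts`) so its manifest can be read without executing anything; the package name echoes the opening example and is illustrative.

```typescript
// review-install-scripts.ts — surface any lifecycle scripts a newly added
// dependency would run at install time, so a reviewer sees them before merge.
import { readFileSync } from "node:fs";

const LIFECYCLE = ["preinstall", "install", "postinstall", "prepare"];

function installScriptsOf(pkgName: string): Record<string, string> {
  const manifest = JSON.parse(
    readFileSync(`node_modules/${pkgName}/package.json`, "utf8")
  ) as { scripts?: Record<string, string> };
  return Object.fromEntries(
    Object.entries(manifest.scripts ?? {}).filter(([name]) => LIFECYCLE.includes(name))
  );
}

// Anything returned here deserves a human read before the PR merges.
console.log(installScriptsOf("fast-crypto-utils"));
```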
Make scanning resilient to missing metadata
Teams can hedge by combining:
- Multiple vulnerability feeds (where available)
- Policy-based blocking for risky classes of components
- “Unknown severity” handling that triggers review rather than silence (sketched below)
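In practice that means a triage rule where a missing score is itself a signal. The sketch below is one possible policy shape, with hypothetical types and thresholds: block on the worst known score from any feed, and route findings no feed has scored to human review instead of letting them pass silently.

```typescript
// severity-policy.ts — "unknown severity" handling as a triage rule. Types,
// feed layout, and the blocking threshold are all illustrative choices.
type Finding = { id: string; component: string; severities: Array<number | null> };
type Decision = "block" | "review" | "pass";

function triage(finding: Finding, blockAt = 7.0): Decision {
  const known = finding.severities.filter((s): s is number => s !== null);
  if (known.length === 0) return "review";            // no feed scored it: a human looks
  if (Math.max(...known) >= blockAt) return "block";  // worst known score wins
  return "pass";
}

// A finding with no NVD score but a vendor score of 8.1 is blocked;
// one with no score from any feed goes to review rather than being ignored.
console.log(triage({ id: "CVE-XXXX-0001", component: "parser-lib", severities: [null, 8.1] }));
console.log(triage({ id: "CVE-XXXX-0002", component: "parser-lib", severities: [null, null] }));
```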
A realistic way forward: calibrate trust, don’t outlaw AI
The evidence supports a balanced posture:
- Veracode shows security flaws in 45% of AI-generated code tests, and bigger models don’t magically fix that.
- Sonatype shows the dependency supply chain is a moving target: trillions of downloads, thousands of malicious packages, and real-time threats that outpace training data.
- Academic work suggests vulnerabilities can be measured in the wild, and trust calibration can be taught.
Security leaders should read that as a mandate to redesign workflows. If AI makes change cheaper, your process must make trust more expensive—at least where it counts: dependencies, secrets, and build steps.
The teams that thrive won’t be the ones that avoid AI. They’ll be the ones that treat AI-driven speed as a force to govern, not a gift to accept unexamined.
Bottom line
AI doesn’t have to write worse code to cause the next mega-breach; it only has to make unreviewed dependency change feel safe. Govern the tree, and the speed becomes an asset rather than an exposure.
Frequently Asked Questions
Is AI-generated code inherently less secure than human-written code?
Veracode’s 2025 GenAI Code Security Report found risky security flaws in 45% of tests of AI-generated code across 100+ models and four major languages. That doesn’t prove humans always do better; it shows that functional correctness doesn’t guarantee security. The bigger risk is accepting output without rigorous review, testing, and security controls.
Why are dependencies a bigger risk than “bugs” in my own code?
Modern applications rely heavily on third-party components—Sonatype notes open source downloads reaching trillions annually, and many apps are composed largely of open source. A single dependency can bring dozens of transitive dependencies. Malicious packages, typosquats, and dependency confusion can compromise systems even if your own code is clean.
What makes AI agents more dangerous than AI chat assistants?
Agents can take actions: edit files, open PRs, run installs, and resolve dependency errors automatically. Sonatype’s 2026 report warns agents may install “whatever dependency resolves a build error” without checking provenance or policy. When suggestion becomes execution, the supply-chain risk rises sharply unless guardrails are enforced.
Can’t we just scan everything and rely on CVE severity scores?
Scanning helps, but the vulnerability data layer can be incomplete. Reporting citing Sonatype research says that of 1,552 vulnerabilities disclosed in 2025, 64% lacked NVD severity scores. If your prioritization depends on those scores, you may miss or delay response. Policies for “unknown severity” and multiple intelligence sources reduce that blind spot.
What’s the most practical first step to reduce AI-related supply-chain risk?
Start by controlling dependency introduction. Default-deny new dependencies unless they come from approved sources, and require review for dependency file changes. Sonatype’s research suggests agents will optimize for “make the build pass,” so put an allowlist or internal proxy in their path. Limit what tools can fetch automatically.
Are newer, larger models safer for secure coding?
Not necessarily. Veracode found that larger and newer models did not improve security in their testing. Model capability can improve helpfulness and fluency without improving secure defaults. Teams should assume any model can produce insecure code and enforce security review, testing, and dependency governance regardless of model brand or size.