TheMurrow

Your ‘AI Detection’ Tool Can Be 100% Right—and Still Lie: The New Proof That Provenance and Watermarks Can Contradict Each Other

A 2026 paper shows a cryptographically valid C2PA manifest and a highly reliable AI watermark can both “pass” yet imply incompatible stories. The result: verification that’s precise—but publicly misleading.

By TheMurrow Editorial
March 12, 2026

Key Points

  • Recognize that C2PA can validate a signed manifest’s integrity while pixels still carry an AI watermark—both “true,” yet jointly misleading.
  • Distinguish tool claims: detectors infer patterns, watermark scans detect a specific signal, and provenance verifies what was signed—not what was omitted.
  • Treat authenticity as reconciliation: surface conflicts, explain scope, preserve originals, and use human review when provenance and watermark signals disagree.

A photo goes viral, and within hours two “truth machines” pronounce judgment.

The first is a cryptographic provenance check. It verifies a C2PA Content Credentials manifest—signed, intact, and confidently asserting the image’s origin and edit history. The second is a watermark scan. It detects an invisible signal associated with an AI system and reports the watermark’s presence with near-perfect reliability.

Both systems can be right. And the combined message can still mislead the public.

The 2026 paper that names the problem: an “Integrity Clash”

A March 2, 2026 paper on arXiv—Authenticated Contradictions from Desynchronized Provenance and Watermarking—puts a name to the problem: an “Integrity Clash.” The authors demonstrate that under ordinary, non-exotic workflows, an asset can pass provenance validation while also carrying a watermark that flags it as AI-generated—without breaking cryptography, and without either detector “failing” at what it was designed to measure. The unsettling part is not that detection is hard; it’s that verification can be precise and still produce a lie-sized impression.

“A system can be ‘100% right’ about a narrow signal and still be wrong about the story people think it’s telling.”

— TheMurrow Editorial

The paradox: “100% accurate” and still misleading

The public conversation about authenticity often assumes a single question: Is it real? Engineers, by necessity, answer narrower ones. A detector might ask whether an image resembles patterns learned from AI-generated datasets. A watermark detector asks whether a specific embedded signal is present. A provenance validator asks whether a signed manifest was tampered with.

Each can be accurate on its own terms, and still fail to answer what most readers care about: Who made this, how, and can I trust it?

Three technologies, three different promises

Precision matters here, because the tools are often lumped together:

- Detector (classifier/heuristic): Outputs a label or probability like “AI-generated.” It is inherently probabilistic, with false positives and negatives.
- Watermark detector (signal detector): Tests for the presence of a specific watermark scheme. It can be extremely accurate for that scheme, but it does not automatically establish authorship, context, or intent.
- Provenance / Content Credentials (C2PA): Provides cryptographically signed metadata about origin and edits. It validates the integrity of the manifest—what was signed and whether it was altered—not the completeness or truthfulness of what was never asserted in the first place. (C2PA Specification v2.1)
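
To make the contrast concrete, here is a minimal sketch, with all type and function names hypothetical, of the three different kinds of answer these tools return, and of how a one-label rollup discards exactly the distinctions that matter:

```python
# Illustrative sketch (all names hypothetical): three tools,
# three different questions, three different output types.

from dataclasses import dataclass

@dataclass
class DetectorResult:
    p_ai_generated: float    # a probability -- inherently uncertain

@dataclass
class WatermarkResult:
    scheme: str              # which specific scheme was tested
    present: bool            # yes/no for that scheme only

@dataclass
class ProvenanceResult:
    manifest_intact: bool    # signature and hashes verify
    asserted_actions: list   # only what the signer chose to assert

def naive_summary(d, w, p):
    """One-label rollup: overstates what was actually checked."""
    if p.manifest_intact and not w.present:
        return "looks authentic"   # really: "manifest intact, one scheme absent"
    return "needs review"
```

The rollup at the end is the anti-pattern: a probability, a per-scheme signal test, and a manifest integrity check get flattened into a single verdict none of them actually supports.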

Those differences are not academic. They set up the central tension described in the 2026 paper: provenance and watermarking can produce mutually contradictory yet independently “valid” signals.

Where “100%” fits—and where it doesn’t

The arXiv authors propose a “cross-layer audit protocol” that jointly evaluates provenance metadata and watermark status. They report “100% classification accuracy” across 3,500 test images spanning four conflict states and multiple perturbation conditions.

That number is real, but it needs careful framing. The “100%” refers to classifying conflict states in their testbed, not to solving deepfakes or detecting AI content broadly. A reader seeing “100% accurate” may infer a universal truth-meter. The work shows something more interesting: even perfect recognition of contradictions doesn’t make contradictions disappear.

“Accuracy isn’t the same as clarity. A perfect check can still produce an imperfect public meaning.”

— TheMurrow Editorial

By the numbers

- 100%: Reported classification accuracy for identifying conflict states in the authors’ testbed—not universal AI detection or deepfake prevention.
- 3,500: Test images evaluated in the paper’s cross-layer audit protocol experiments.
- 4: Conflict states used to describe how provenance and watermark signals can align—or clash—beyond a simple real/fake binary.

What Content Credentials (C2PA) actually guarantee

C2PA—often surfaced to users as Content Credentials—is designed to help a viewer answer a specific question: Did this asset come with signed provenance information, and has that information been altered? That’s an ambitious, valuable goal, especially in an ecosystem built on reposting, re-encoding, and context collapse.

C2PA works by attaching a cryptographically signed manifest describing claims about an asset: who asserted what, with which tools, and which actions were taken. If someone tampers with that manifest, verification fails. The system’s strength is integrity of the credential itself, not omniscience about the asset.

Integrity of the manifest vs completeness of the story

A crucial distinction from the specification: C2PA validates the integrity of what is present. It cannot force all relevant facts to be included. A manifest can be valid and still omit something a viewer would consider essential context—because omission is not the same as tampering.

The 2026 paper’s abstract leans on precisely that gap, describing a workflow that exploits “semantic omission of an assertion field” permitted by current C2PA specification behavior. No keys are stolen. No signatures are broken. The system does what it promises.

Stripping is not a bug; it’s in the threat model

C2PA documentation is candid about another limitation: absence of credentials doesn’t prove anything. A missing manifest may mean it was never attached, or it may mean it was removed.

C2PA’s own security considerations discuss stripping manifests and reposting as a realistic attack class. The ecosystem makes this easy:

- Screenshots can drop embedded data.
- Re-encoding can remove or invalidate metadata.
- Platforms that don’t support C2PA may silently discard it.

C2PA is not DRM. It is not designed to prevent removal. It is designed to provide trustworthy information when it survives the trip.

“Provenance can tell you whether a claim was altered. It can’t tell you whether a claim was never made.”

— TheMurrow Editorial

Watermarks: strong signals with narrow meaning

Invisible watermarking tries to solve a different problem: persistence. If metadata is easy to strip, embed a signal in the pixels or audio itself. A robust watermark, in theory, can survive common transformations and give investigators a handle when the original file and its metadata are gone.

A watermark detector can be highly accurate at its job: detecting that watermark. The leap from “watermark present” to “this was made by AI” is where public confusion begins.

Watermark presence is evidence, not a verdict

A watermark detector generally answers a binary question: is the signal present above threshold? It does not know:

- whether the watermark was added at creation time or later,
- whether the content was edited after generation,
- whether a human authored the underlying scene and an AI tool touched up a portion,
- whether the watermark was copied, preserved, or transferred during workflow steps.

None of those nuances are hypotheticals in modern production. Many images are composites. Many go through multiple tools. The watermark’s semantics depend on the scheme and the policy around it.
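
A toy sketch makes the narrowness of that binary question visible. Nothing below is a real scheme: the "extraction" reads least-significant bits at keyed positions and correlates them with an expected pattern, which is enough to show that the detector's entire verdict is one score against one threshold:

```python
# Toy watermark detection (illustrative only, not a real scheme):
# read bits at keyed pixel positions, score agreement with the
# expected pattern, and compare one scalar to one threshold.

def extract_bits(pixels, key_positions):
    # "Extraction": the least significant bit at each keyed position.
    return [pixels[i] & 1 for i in key_positions]

def watermark_score(extracted, expected):
    # Fraction of keyed bits that match the expected pattern.
    matches = sum(1 for a, b in zip(extracted, expected) if a == b)
    return matches / len(expected)

def watermark_present(pixels, key_positions, expected, threshold=0.9):
    # The detector's whole verdict. It says nothing about authorship,
    # later edits, composites, or when the mark was embedded.
    score = watermark_score(extract_bits(pixels, key_positions), expected)
    return score >= threshold
```

Everything the public debate cares about, who made the image and how, lives outside this function's signature.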

The industry’s “belt and suspenders” assumption

In public policy conversations, a common premise is redundancy: pair C2PA provenance (the belt) with watermarking (the suspenders). If one fails—stripped metadata, say—the other survives. If both exist, the content should be easier to trust.

The 2026 paper challenges that premise by showing the two systems can be independently consistent and jointly contradictory. In other words, redundancy doesn’t guarantee convergence. Sometimes it guarantees conflict.

Key Insight

Pairing provenance with watermarking can improve coverage—but it can also create “valid” contradictions unless systems and UIs treat disagreement as first-class information.

The 2026 “Integrity Clash”: a contradiction without a cryptographic break

The arXiv paper’s contribution is not a vague warning. It formalizes and demonstrates a scenario where an asset carries:

1) a cryptographically valid C2PA manifest asserting human authorship, and
2) pixel content containing an AI watermark identifying it as AI-generated,

…and both checks pass independently. The authors emphasize that the attack requires no compromise of cryptography.

That is the heart of the editorial problem: if two “verification” tools both return green lights—each on its own terms—who is lying? Sometimes, neither tool is lying. The combined story is.
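
The clash fits in a few lines of pseudocode-like Python (all field names hypothetical; the manifest check is a stand-in for real C2PA signature and hash validation). Each check passes on its own terms, and a naive "both green" rollup hides that the two passing checks imply incompatible origins:

```python
# Miniature of the Integrity Clash (all names hypothetical).
# The manifest check is a stand-in for real C2PA validation,
# which verifies signatures and hash bindings.

def manifest_check(manifest):
    return manifest["signature_valid"] and manifest["hashes_match"]

def watermark_check(pixels_have_mark):
    return pixels_have_mark

asset = {
    "manifest": {"signature_valid": True, "hashes_match": True,
                 "asserted_origin": "human-captured"},
    "pixels_have_mark": True,   # an AI watermark survives in the pixels
}

both_pass = (manifest_check(asset["manifest"])
             and watermark_check(asset["pixels_have_mark"]))
# both_pass is True -- yet "human-captured" and "AI watermark present"
# tell incompatible stories about the same asset.
```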

How the workflow works (in broad strokes)

The paper describes using “metadata washing workflows” through standard editing pipelines. Editorially, “washing” is an evocative word, but the mechanics are mundane: assets move through tools that preserve some information, discard other information, and allow certain fields to be absent.

The authors’ key point is that the contradiction can arise from desynchronization: provenance says one thing because of what was asserted and signed; the pixels say another because of what they carry. The gap is created by workflow behavior and permitted omission, not by breaking signatures.

A statistic that matters: 3,500 images, four states

The paper reports 100% classification accuracy across 3,500 test images for identifying four conflict states under multiple perturbation conditions. Even without over-reading “100%,” two numbers stand out:

- 3,500 is large enough to suggest the phenomenon is systematic, not a one-off fluke.
- Four conflict states implies the world is not binary (real vs fake). There are multiple coherent ways for signals to align or clash.

The lesson is less “we can detect everything” and more “we must be explicit about what we are detecting.”

Durable credentials and “soft bindings”: when the fix creates new seams

C2PA’s ecosystem is not blind to stripping. One response is Durable Content Credentials, enabled through soft bindings—mechanisms that help recover or associate credentials even if metadata is removed.

The C2PA specification explicitly supports a soft binding assertion labeled `c2pa.soft-binding`, with algorithm types that can be “watermark” or “fingerprint.” The idea: embed a recoverable signal (or compute a perceptual fingerprint) and use it to look up the associated manifest in a repository.
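
The lookup idea can be sketched in a few lines. The fingerprint below is a toy (one brightness bit per pixel) and the repository is a plain dictionary; real soft bindings use robust watermarks or dedicated perceptual fingerprints. The shape of the operation is what matters: nearest match within a tolerance, returning a candidate rather than a proof:

```python
# Toy soft-binding lookup: a perceptual-style fingerprint keys a
# manifest repository. Both the hash and the repository are toy
# stand-ins, not C2PA mechanisms.

def toy_fingerprint(pixels):
    # One bit per pixel: set when above the image's mean brightness.
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def lookup_manifest(pixels, repository, max_distance=2):
    # Nearest fingerprint within a tolerance. As the spec's own
    # cautions note, matches are probabilistic and collisions happen,
    # so a hit is a candidate for confirmation, not proof of identity.
    fp = toy_fingerprint(pixels)
    best = min(repository, key=lambda k: hamming(k, fp), default=None)
    if best is not None and hamming(best, fp) <= max_distance:
        return repository[best]
    return None
```

The `max_distance` tolerance is the double-edged sword: large enough to survive re-encoding, and therefore large enough to admit collisions.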

That is sensible engineering. It is also where provenance and watermarking become tightly coupled—and where contradictions become more operationally dangerous.

Soft bindings are helpful—and explicitly imperfect

The C2PA guidance includes caution that soft bindings are not guaranteed to be exact. Collisions can happen. Adversarial attacks are possible. The guidance recommends human-in-the-loop interactive verification, such as showing thumbnails for visual confirmation, and stresses that soft bindings must not replace hard bindings for binding claims.

Those cautions deserve more public attention than they get. “Durable credentials” can sound like permanence. The spec language is more careful: durability is probabilistic and context-dependent.

A real-world case: the screenshot problem

Consider the most common misinformation workflow: someone screenshots an image (or screen-records a video) and reposts it. The screenshot often strips metadata, including C2PA manifests. A platform might then rely on watermarking or fingerprinting to reconnect the asset to its credentials.

If the pixels carry a watermark that suggests AI involvement, and the retrieved manifest asserts human authorship, the user experience can become a credibility trap. A viewer sees two official-seeming labels and is forced to choose based on priors. Bad actors thrive in that ambiguity.

Editor’s Note

“Durable” does not mean permanent: C2PA soft bindings are probabilistic and the spec itself recommends human-in-the-loop verification in some cases.

What “trust” should mean: reconciliation, not just detection

The temptation—especially for platforms—is to treat authenticity as a checkbox. Validate provenance: check. Detect watermark: check. Label content accordingly. The 2026 paper argues, by construction, that checklists can produce contradictory outcomes that are still “valid.”

Better systems treat authenticity as a reconciliation problem: multiple signals, multiple layers, and a need to surface conflicts honestly.

The cross-layer audit idea—and why it’s promising

The paper’s proposed “cross-layer audit protocol” is compelling because it admits the messy truth: provenance and pixels are different layers. Auditing should compare them, not simply report them side by side.

Their reported performance—100% accuracy on 3,500 images for classifying conflict states in their experimental setting—suggests that at least in controlled conditions, systems can reliably detect when the signals disagree. That’s a more realistic goal than perfect content classification.

What platforms and newsrooms can do now

No single technical measure will restore trust, but practices can reduce confusion:

- Surface conflicts, don’t hide them. If provenance and watermark disagree, label it as a conflict state, not as certainty.
- Expose the scope of each signal. “Valid manifest” should be explained as “manifest integrity verified,” not “image is authentic.”
- Preserve originals in editorial workflows. Newsrooms should retain source files with intact manifests and document transformations.
- Adopt human-in-the-loop review where stakes are high. C2PA guidance already suggests interactive verification for soft bindings; organizations should take that seriously.
- Avoid turning absence into accusation. C2PA’s own model acknowledges stripping. “No credentials found” should not be treated as evidence of wrongdoing.

The broader implication is cultural as much as technical: audiences need transparency about what tools can and cannot conclude.

When signals disagree, what to do

  • Surface conflicts explicitly (don’t collapse them into one label)
  • Explain what “valid manifest” actually verifies
  • Preserve originals and document transformations
  • Use human review for high-stakes decisions
  • Treat “no credentials found” as unknown—not guilt

The reader’s dilemma: how to think clearly when signals disagree

Most people won’t run a C2PA validator or a watermark scan. They’ll see a platform label, a newsroom embed, or a screenshot of a verification panel. The 2026 “Integrity Clash” matters because it predicts a new genre of argument: weaponized validation.

Bad actors do not need to defeat cryptography to create confusion; they only need to create situations where credible systems disagree. A partisan account can point to the provenance badge. A skeptic can point to the watermark result. Both can claim “proof.”

A practical mental model for non-specialists

When you encounter authenticity claims—especially confident ones—ask three questions:

1) What exactly was checked? Pixels, metadata, or a model’s pattern match?
2) What does a “pass” guarantee? Integrity of a manifest is not the same as truthfulness about the scene.
3) What would make this check fail? Understanding failure modes (stripping, re-encoding, omission) is often more revealing than the success case.

The goal isn’t cynicism. The goal is to reserve the word “proof” for situations where the scope of a system matches the claim being made.

Multiple perspectives: why builders still back provenance and watermarks

It would be a mistake to read the “Integrity Clash” as an argument against provenance or watermarking. Both are useful. C2PA makes it harder to invisibly alter claims once asserted. Watermarks can make certain kinds of provenance more durable across reposts.

The critique is about over-promising. Pairing two imperfect systems can improve coverage, but it can also create contradictions unless the ecosystem has norms—and user interfaces—for handling disagreement. The paper’s value is forcing that conversation into the open.

Conclusion: The next authenticity battle is over meaning

The most sobering lesson of the 2026 “Integrity Clash” is not that our tools are weak. It’s that our tools are precise in ways that don’t line up with human questions.

C2PA Content Credentials can cryptographically verify that a manifest hasn’t been altered. Watermark detectors can be extremely accurate at detecting a watermark. The arXiv authors show that both can return “valid” while describing incompatible stories about the same asset—without any cryptographic break, and through ordinary workflows.

If the industry wants authenticity infrastructure that earns trust, it will need more than better detectors. It will need better vocabulary, better disclosure, and systems that treat conflict as first-class information.

The future won’t be “verified” or “unverified.” It will be a set of signals—sometimes aligned, sometimes desynchronized—and the real test will be whether platforms, publishers, and policymakers tell the public what those signals can actually say.
About the Author
TheMurrow Editorial is a writer for TheMurrow covering explainers.

Frequently Asked Questions

What does C2PA actually prove?

C2PA proves the integrity of a signed manifest: who asserted certain claims about an asset, and whether that manifest was altered after signing. A valid C2PA check does not guarantee the asset’s scene is true or that all relevant information was disclosed. It validates what was signed, not what might have been omitted.

If an image has no Content Credentials, is it suspicious?

Not necessarily. C2PA explicitly recognizes stripping and “availability attacks” as realistic: screenshots, re-encoding, and non-supporting platforms can remove or discard manifests. Absence could mean credentials were never attached, or that they were removed in transit. Treat “no credentials found” as “unknown,” not as guilt.

Does an AI watermark mean the entire image is AI-generated?

A watermark detector typically indicates the presence of a specific embedded signal, not a full provenance narrative. Depending on the workflow and tools, a watermark might survive edits, be present in only part of a composite, or reflect a tool’s involvement rather than full authorship. Watermark presence is evidence, but not a complete verdict by itself.

What is the 2026 “Integrity Clash” paper claiming?

The March 2, 2026 arXiv paper Authenticated Contradictions from Desynchronized Provenance and Watermarking demonstrates that an asset can carry a cryptographically valid C2PA manifest asserting human authorship while also containing an AI watermark in the pixels—with both checks passing independently and without breaking cryptography.

The paper mentions “100% accuracy.” Does that mean deepfakes are solved?

No. The reported 100% classification accuracy applies to the authors’ cross-layer audit protocol classifying four conflict states in a testbed of 3,500 images under multiple perturbations. It does not mean universal AI detection is perfect. The key point is identifying contradictory states reliably, not proving absolute authenticity.

What are “soft bindings” in C2PA?

Soft bindings (assertion label `c2pa.soft-binding`) are mechanisms to help reconnect an asset to its Content Credentials even if metadata is stripped. C2PA allows soft bindings based on watermarks or fingerprints. The specification cautions that soft bindings are not guaranteed to be exact and recommends human-in-the-loop verification in some cases.
