Your ‘AI Detection’ Tool Can Be 100% Right—and Still Lie: The New Proof That Provenance and Watermarks Can Contradict Each Other
A 2026 paper shows a cryptographically valid C2PA manifest and a highly reliable AI watermark can both “pass” yet imply incompatible stories. The result: verification that’s precise—but publicly misleading.

Key Points
- Recognize that C2PA can validate a signed manifest’s integrity while pixels still carry an AI watermark—both “true,” yet jointly misleading.
- Distinguish tool claims: detectors infer patterns, watermark scans detect a specific signal, and provenance verifies what was signed—not what was omitted.
- Treat authenticity as reconciliation: surface conflicts, explain scope, preserve originals, and use human review when provenance and watermark signals disagree.
A photo goes viral, and within hours two “truth machines” pronounce judgment.
The first is a cryptographic provenance check. It verifies a C2PA Content Credentials manifest—signed, intact, and confidently asserting the image’s origin and edit history. The second is a watermark scan. It detects an invisible signal associated with an AI system and reports the watermark’s presence with near-perfect reliability.
Both systems can be right. And the combined message can still mislead the public.
The 2026 paper that names the problem: an “Integrity Clash”
“A system can be ‘100% right’ about a narrow signal and still be wrong about the story people think it’s telling.”
— TheMurrow Editorial
The paradox: “100% accurate” and still misleading
A provenance check, a watermark scan, and an AI detector can each be accurate on its own terms, and still fail to answer what most readers care about: Who made this, how, and can I trust it?
Three technologies, three different promises
- Detector (classifier/heuristic): Outputs a label or probability like “AI-generated.” It is inherently probabilistic, with false positives and negatives.
- Watermark detector (signal detector): Tests for the presence of a specific watermark scheme. It can be extremely accurate for that scheme, but it does not automatically establish authorship, context, or intent.
- Provenance / Content Credentials (C2PA): Provides cryptographically signed metadata about origin and edits. It validates the integrity of the manifest—what was signed and whether it was altered—not the completeness or truthfulness of what was never asserted in the first place. (C2PA Specification v2.1)
Those differences are not academic. They set up the central tension described in the 2026 paper: provenance and watermarking can produce mutually contradictory yet independently “valid” signals.
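One way to make those scope differences concrete is to model each tool’s output with its scope attached. This is an illustrative sketch (the tool names and result strings are invented for the example, not drawn from any real API):

```python
from dataclasses import dataclass

@dataclass
class Signal:
    source: str   # which tool produced this result
    scope: str    # what a "pass" from this tool actually covers
    result: str

# Three tools, three different promises (illustrative wording):
signals = [
    Signal("detector", "statistical pattern match, probabilistic", "0.91 AI-likelihood"),
    Signal("watermark scan", "presence of one specific embedded signal", "watermark found"),
    Signal("C2PA verify", "integrity of the signed manifest only", "manifest valid"),
]

for s in signals:
    print(f"{s.source}: {s.result} (scope: {s.scope})")
```

Reporting the scope alongside the result is the whole point: none of the three results, alone, answers “who made this?”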
Where “100%” fits—and where it doesn’t
The paper’s reported 100% figure is real, but it needs careful framing. The “100%” refers to classifying conflict states in their testbed, not to solving deepfakes or detecting AI content broadly. A reader seeing “100% accurate” may infer a universal truth-meter. The work shows something more interesting: even perfect recognition of contradictions doesn’t make contradictions disappear.
“Accuracy isn’t the same as clarity. A perfect check can still produce an imperfect public meaning.”
— TheMurrow Editorial
What Content Credentials (C2PA) actually guarantee
C2PA works by attaching a cryptographically signed manifest describing claims about an asset: who asserted what, with which tools, and which actions were taken. If someone tampers with that manifest, verification fails. The system’s strength is integrity of the credential itself, not omniscience about the asset.
Integrity of the manifest vs completeness of the story
The 2026 paper’s abstract leans on precisely that gap, describing a workflow that exploits “semantic omission of an assertion field” permitted by current C2PA specification behavior. No keys are stolen. No signatures are broken. The system does what it promises.
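The gap between manifest integrity and story completeness can be shown in a few lines. This is a minimal sketch, not real C2PA code: it uses an HMAC as a stand-in for C2PA’s certificate-based signatures, and the claim fields are invented for illustration. The point survives the simplification: verification covers exactly the bytes that were signed, so a claim that was never asserted cannot fail.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a real signing credential

def sign_manifest(claims: dict) -> dict:
    """Sign exactly the claims provided -- nothing more."""
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": tag}

def verify_manifest(manifest: dict) -> bool:
    """Integrity check: was the signed payload altered after signing?"""
    payload = json.dumps(manifest["claims"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])

# A manifest that simply omits any assertion about AI involvement:
m = sign_manifest({"author": "Jane Doe", "actions": ["c2pa.created"]})
assert verify_manifest(m)  # integrity: PASS
# Nothing fails here just because an AI-involvement field was never
# asserted -- verification covers what was signed, not what was left out.
```

Tampering with a signed field does break verification; omitting a field before signing does not, because the omission is part of what was signed.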
Stripping is not a bug; it’s in the threat model
C2PA’s own security considerations discuss stripping manifests and reposting as a realistic attack class. The ecosystem makes this easy:
- Screenshots can drop embedded data.
- Re-encoding can remove or invalidate metadata.
- Platforms that don’t support C2PA may silently discard it.
C2PA is not DRM. It is not designed to prevent removal. It is designed to provide trustworthy information when it survives the trip.
“Provenance can tell you whether a claim was altered. It can’t tell you whether a claim was never made.”
— TheMurrow Editorial
Watermarks: strong signals with narrow meaning
A watermark detector can be highly accurate at its job: detecting that watermark. The leap from “watermark present” to “this was made by AI” is where public confusion begins.
Watermark presence is evidence, not a verdict
A positive watermark result, on its own, typically cannot tell you:
- whether the watermark was added at creation time or later,
- whether the content was edited after generation,
- whether a human authored the underlying scene and an AI tool touched up a portion,
- whether the watermark was copied, preserved, or transferred during workflow steps.
None of those nuances are hypotheticals in modern production. Many images are composites. Many go through multiple tools. The watermark’s semantics depend on the scheme and the policy around it.
The industry’s “belt and suspenders” assumption
The industry often treats provenance plus watermarking as “belt and suspenders”: if one signal fails, the other catches it. The 2026 paper challenges that premise by showing the two systems can be independently consistent and jointly contradictory. In other words, redundancy doesn’t guarantee convergence. Sometimes it guarantees conflict.
The 2026 “Integrity Clash”: a contradiction without a cryptographic break
The paper demonstrates that a single asset can simultaneously carry:
1) a cryptographically valid C2PA manifest asserting human authorship, and
2) pixel content containing an AI watermark identifying it as AI-generated,
…and both checks pass independently. The authors emphasize that the attack requires no compromise of cryptography.
That is the heart of the editorial problem: if two “verification” tools both return green lights—each on its own terms—who is lying? Sometimes, neither tool is lying. The combined story is.
How the workflow works (in broad strokes)
The authors’ key point is that the contradiction can arise from desynchronization: provenance says one thing because of what was asserted and signed; the pixels say another because of what they carry. The gap is created by workflow behavior and permitted omission, not by breaking signatures.
A statistic that matters: 3,500 images, four states
- 3,500 is large enough to suggest the phenomenon is systematic, not a one-off fluke.
- Four conflict states implies the world is not binary (real vs fake). There are multiple coherent ways for signals to align or clash.
The lesson is less “we can detect everything” and more “we must be explicit about what we are detecting.”
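Being explicit about what is detected can itself be expressed as code. The sketch below classifies the relationship between two signals into four states. The state names and the two boolean inputs are illustrative choices for this article, not the paper’s own taxonomy or protocol:

```python
from enum import Enum

class ConflictState(Enum):
    ALIGNED_HUMAN = "signed claims and pixels both consistent with human origin"
    ALIGNED_AI = "signed claims and pixels both consistent with AI origin"
    CLASH_AI_WATERMARK = "manifest asserts human authorship, but AI watermark present"
    CLASH_MISSING_WATERMARK = "manifest asserts AI generation, but no watermark found"

def classify(manifest_asserts_ai: bool, watermark_present: bool) -> ConflictState:
    """Name the relationship between the two signals instead of collapsing
    them into a single real-vs-fake verdict."""
    if manifest_asserts_ai and watermark_present:
        return ConflictState.ALIGNED_AI
    if not manifest_asserts_ai and not watermark_present:
        return ConflictState.ALIGNED_HUMAN
    if watermark_present:
        return ConflictState.CLASH_AI_WATERMARK
    return ConflictState.CLASH_MISSING_WATERMARK

# The "Integrity Clash" case: valid manifest asserting human authorship,
# AI watermark present in the pixels.
print(classify(manifest_asserts_ai=False, watermark_present=True))
```

Note that two of the four states are clashes: the output is a description of disagreement, not a ruling on which signal is right.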
Durable credentials and “soft bindings”: when the fix creates new seams
The C2PA specification explicitly supports a soft binding assertion labeled `c2pa.soft-binding`, with algorithm types that can be “watermark” or “fingerprint.” The idea: embed a recoverable signal (or compute a perceptual fingerprint) and use it to look up the associated manifest in a repository.
That is sensible engineering. It is also where provenance and watermarking become tightly coupled—and where contradictions become more operationally dangerous.
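A minimal sketch of the lookup idea, with loud caveats: real soft bindings use robust watermarks or perceptual fingerprints, and the repository is an external service. Here a plain SHA-256 of the pixel bytes stands in for the binding, which makes the lookup break under any re-encoding, exactly the fragility the spec warns about. The function and repository names are invented for the example.

```python
import hashlib
from typing import Optional

# Repository mapping soft-binding values to stored manifests (illustrative).
manifest_repository: dict[str, dict] = {}

def fingerprint(pixel_bytes: bytes) -> str:
    # Stand-in for a perceptual fingerprint or recoverable watermark payload.
    # A cryptographic hash changes under any re-encoding; real soft bindings
    # use robust schemes precisely because of that fragility.
    return hashlib.sha256(pixel_bytes).hexdigest()

def register(pixel_bytes: bytes, manifest: dict) -> None:
    manifest_repository[fingerprint(pixel_bytes)] = manifest

def lookup(pixel_bytes: bytes) -> Optional[dict]:
    """Try to reconnect stripped content to its stored Content Credentials."""
    return manifest_repository.get(fingerprint(pixel_bytes))

original = b"...image bytes..."
register(original, {"author": "Newsroom X", "actions": ["c2pa.created"]})
assert lookup(original) is not None        # exact bytes: lookup succeeds
assert lookup(original + b"\x00") is None  # any change: lookup fails
```

The operational risk is visible even in the toy version: whatever manifest the binding retrieves is displayed next to whatever the pixels carry, and nothing in the lookup step checks that the two agree.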
Soft bindings are helpful—and explicitly imperfect
The specification itself cautions that soft-binding matches are not guaranteed to be exact and recommends human-in-the-loop verification in some cases. Those cautions deserve more public attention than they get. “Durable credentials” can sound like permanence. The spec language is more careful: durability is probabilistic and context-dependent.
A real-world case: the screenshot problem
A screenshot drops the embedded manifest; a soft-binding lookup may later reconnect the pixels to stored credentials. If the pixels carry a watermark that suggests AI involvement, and the retrieved manifest asserts human authorship, the user experience can become a credibility trap. A viewer sees two official-seeming labels and is forced to choose based on priors. Bad actors thrive in that ambiguity.
What “trust” should mean: reconciliation, not just detection
Better systems treat authenticity as a reconciliation problem: multiple signals, multiple layers, and a need to surface conflicts honestly.
The cross-layer audit idea—and why it’s promising
The paper proposes auditing across layers: checking provenance and watermark signals jointly and classifying the relationship between them, rather than trusting either alone. Their reported performance—100% accuracy on 3,500 images for classifying conflict states in their experimental setting—suggests that at least in controlled conditions, systems can reliably detect when the signals disagree. That’s a more realistic goal than perfect content classification.
What platforms and newsrooms can do now
- Surface conflicts, don’t hide them. If provenance and watermark disagree, label it as a conflict state, not as certainty.
- Expose the scope of each signal. “Valid manifest” should be explained as “manifest integrity verified,” not “image is authentic.”
- Preserve originals in editorial workflows. Newsrooms should retain source files with intact manifests and document transformations.
- Adopt human-in-the-loop review where stakes are high. C2PA guidance already suggests interactive verification for soft bindings; organizations should take that seriously.
- Avoid turning absence into accusation. C2PA’s own model acknowledges stripping. “No credentials found” should not be treated as evidence of wrongdoing.
The broader implication is cultural as much as technical: audiences need transparency about what tools can and cannot conclude.
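The first two recommendations, surfacing conflicts and exposing scope, come down to wording. A sketch of what scoped labels might look like (the label text and function signature are invented for this example, not from any platform’s UI):

```python
def display_label(manifest_valid: bool, asserts_ai: bool,
                  watermark_present: bool) -> str:
    """Turn raw signals into wording that states scope, not a verdict."""
    if not manifest_valid:
        # Say what failed, not "this image is fake".
        return "Provenance: manifest failed verification (signed claims unreliable)"
    if asserts_ai != watermark_present:
        # Both checks passed on their own terms; surface the clash
        # as a first-class state instead of picking a winner.
        return ("Conflict: manifest integrity verified, but the pixel watermark "
                "disagrees with the signed authorship claim. Human review recommended.")
    # Signals agree; still scope the wording to what was actually checked.
    return "Manifest integrity verified; watermark signal consistent with signed claims"

# The Integrity Clash scenario: valid manifest asserting human authorship,
# AI watermark present in the pixels.
print(display_label(manifest_valid=True, asserts_ai=False, watermark_present=True))
```

The design choice worth noting: “manifest integrity verified” appears even in the success path, never the broader claim “image is authentic.”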
When signals disagree, what to do
- ✓Surface conflicts explicitly (don’t collapse them into one label)
- ✓Explain what “valid manifest” actually verifies
- ✓Preserve originals and document transformations
- ✓Use human review for high-stakes decisions
- ✓Treat “no credentials found” as unknown—not guilt
The reader’s dilemma: how to think clearly when signals disagree
Bad actors do not need to defeat cryptography to create confusion; they only need to create situations where credible systems disagree. A partisan account can point to the provenance badge. A skeptic can point to the watermark result. Both can claim “proof.”
A practical mental model for non-specialists
1) What exactly was checked? Pixels, metadata, or a model’s pattern match?
2) What does a “pass” guarantee? Integrity of a manifest is not the same as truthfulness about the scene.
3) What would make this check fail? Understanding failure modes (stripping, re-encoding, omission) is often more revealing than the success case.
The goal isn’t cynicism. The goal is to reserve the word “proof” for situations where the scope of a system matches the claim being made.
Multiple perspectives: why builders still back provenance and watermarks
The critique is about over-promising. Pairing two imperfect systems can improve coverage, but it can also create contradictions unless the ecosystem has norms—and user interfaces—for handling disagreement. The paper’s value is forcing that conversation into the open.
Conclusion: The next authenticity battle is over meaning
C2PA Content Credentials can cryptographically verify that a manifest hasn’t been altered. Watermark detectors can be extremely accurate at detecting a watermark. The arXiv authors show that both can return “valid” while describing incompatible stories about the same asset—without any cryptographic break, and through ordinary workflows.
If the industry wants authenticity infrastructure that earns trust, it will need more than better detectors. It will need better vocabulary, better disclosure, and systems that treat conflict as first-class information.
The future won’t be “verified” or “unverified.” It will be a set of signals—sometimes aligned, sometimes desynchronized—and the real test will be whether platforms, publishers, and policymakers tell the public what those signals can actually say.
Frequently Asked Questions
What does C2PA actually prove?
C2PA proves the integrity of a signed manifest: who asserted certain claims about an asset, and whether that manifest was altered after signing. A valid C2PA check does not guarantee the asset’s scene is true or that all relevant information was disclosed. It validates what was signed, not what might have been omitted.
If an image has no Content Credentials, is it suspicious?
Not necessarily. C2PA explicitly recognizes stripping and “availability attacks” as realistic: screenshots, re-encoding, and non-supporting platforms can remove or discard manifests. Absence could mean credentials were never attached, or that they were removed in transit. Treat “no credentials found” as “unknown,” not as guilt.
Does an AI watermark mean the entire image is AI-generated?
A watermark detector typically indicates the presence of a specific embedded signal, not a full provenance narrative. Depending on the workflow and tools, a watermark might survive edits, be present in only part of a composite, or reflect a tool’s involvement rather than full authorship. Watermark presence is evidence, but not a complete verdict by itself.
What is the 2026 “Integrity Clash” paper claiming?
The March 2, 2026 arXiv paper Authenticated Contradictions from Desynchronized Provenance and Watermarking demonstrates that an asset can carry a cryptographically valid C2PA manifest asserting human authorship while also containing an AI watermark in the pixels—with both checks passing independently and without breaking cryptography.
The paper mentions “100% accuracy.” Does that mean deepfakes are solved?
No. The reported 100% classification accuracy applies to the authors’ cross-layer audit protocol classifying four conflict states in a testbed of 3,500 images under multiple perturbations. It does not mean universal AI detection is perfect. The key point is identifying contradictory states reliably, not proving absolute authenticity.
What are “soft bindings” in C2PA?
Soft bindings (assertion label `c2pa.soft-binding`) are mechanisms to help reconnect an asset to its Content Credentials even if metadata is stripped. C2PA allows soft bindings based on watermarks or fingerprints. The specification cautions that soft bindings are not guaranteed to be exact and recommends human-in-the-loop verification in some cases.