How to spot AI hallucinations in audit reports

A practical checklist for catching the four most common ways AI gets audit findings wrong: inventing control references, misquoting cited passages, misreading dates, and over-claiming compliance.

By Piyawat Sritavong

The shape of an AI hallucination

When an AI gets an audit finding wrong, it almost never gets it obviously wrong. The dangerous failure mode is a finding that looks correct, is well-cited, and points at a passage that doesn't actually say what the AI claims.

After analyzing several thousand audit findings, we see four patterns:

  1. Phantom controls. The AI cites a control ID that doesn't exist in the standard you're auditing against.
  2. Drifted citations. The cited passage exists, but says something subtly different from what the AI claims it says.
  3. Date scrambles. A statement such as "Reviewed in 2024" gets treated as current, even though the document itself is dated 2026.
  4. Overclaimed satisfaction. A clause that partially addresses a control gets marked as fully matched.

A 5-minute verification checklist

For every audit report, before you act on it:

  1. Spot-check 5 findings at random. Open the cited passage and read it. Does the finding's claim match what the passage actually says?
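  2. Verify every control ID. Most frameworks publish their numbering: ISO/IEC 27001:2022 Annex A has 93 controls in four themes, numbered A.5.1 to A.5.37, A.6.1 to A.6.8, A.7.1 to A.7.14, and A.8.1 to A.8.34[1]. An ID outside those ranges does not exist in the 2022 edition and should be treated as a hallucination.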
  3. Look for date mismatches. Cross-reference dates in the finding text against the document's "Last Updated" footer.
  4. Re-read every "Fully Matched" finding. Partial matches are usually obvious; overclaimed satisfaction is what catches teams out. Anything labeled "Fully Matched" deserves 30 seconds of human verification.
  5. Question round numbers. "100% coverage" or "0 findings" is rarely real on a 50-page policy. If you see suspiciously clean numbers, re-run with a different role or prompt.
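
Steps 1 to 3 are mechanical enough to script. The following is a minimal sketch, not a definitive implementation. It assumes findings are exported as dictionaries with hypothetical "control_id" and "text" fields, that the audit is against ISO/IEC 27001:2022 Annex A, and that a simple year comparison is good enough to flag a finding for a human to look at. The function names are ours, chosen for illustration.

```python
import random
import re
from datetime import date

# ISO/IEC 27001:2022 Annex A: theme number -> number of controls in that theme.
# 37 + 8 + 14 + 34 = 93 controls in total.
ANNEX_A_2022 = {5: 37, 6: 8, 7: 14, 8: 34}

CONTROL_ID = re.compile(r"A\.(\d+)\.(\d+)")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def is_valid_control_id(control_id):
    """True if the ID names a real Annex A control in the 2022 edition."""
    match = CONTROL_ID.fullmatch(control_id.strip())
    if match is None:
        return False
    theme, number = int(match.group(1)), int(match.group(2))
    return 1 <= number <= ANNEX_A_2022.get(theme, 0)


def phantom_controls(findings):
    """Findings whose control ID does not exist in the standard."""
    return [f for f in findings if not is_valid_control_id(f["control_id"])]


def spot_check_sample(findings, k=5, seed=None):
    """Pick up to k findings at random for a human to read against the source."""
    rng = random.Random(seed)
    return rng.sample(findings, min(k, len(findings)))


def date_mismatches(findings, last_updated):
    """Findings whose text mentions a year other than the Last Updated year."""
    return [
        f for f in findings
        if any(int(y) != last_updated.year for y in YEAR.findall(f["text"]))
    ]


# Hypothetical export from an audit tool (field names are assumptions).
findings = [
    {"control_id": "A.5.1", "text": "Policy reviewed in 2024."},
    {"control_id": "A.9.2", "text": "Access rights are reviewed quarterly."},
]

print([f["control_id"] for f in phantom_controls(findings)])  # ['A.9.2']
print(len(date_mismatches(findings, date(2026, 1, 15))))      # 1
```

None of this replaces reading the cited passage. The script only narrows down which findings to open first, and the date check in particular will flag legitimate historical references along with real scrambles.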

What good AI audit tools should do

A tool that wants you to trust it should make verification easy:

  • Every finding links to the exact highlighted span of text it evaluated.
  • The AI's reasoning is shown alongside the finding, not buried in a debug panel.
  • Confidence scores reflect actual uncertainty, rather than every finding being labeled "95% confident".
  • The tool warns you when a document is too short, too long, or in an unexpected format.

If your AI audit tool doesn't do these things, treat its findings as suggestions, not conclusions.
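
One of these properties can be checked from the outside. As a rough sketch, assuming the tool exports one confidence value between 0 and 1 per finding, the function below flags a report where a single value dominates. The 0.9 cutoff and the minimum of five findings are hypothetical defaults, not calibrated thresholds.

```python
from collections import Counter


def confidence_looks_uninformative(scores, max_share=0.9, min_count=5):
    """True if one confidence value accounts for most of the findings.

    scores: confidence values in [0, 1], one per finding.
    max_share: assumed cutoff for the share held by the most common value.
    min_count: below this many findings, the check is skipped as too noisy.
    """
    if len(scores) < min_count:
        return False
    rounded = [round(s, 2) for s in scores]
    _, top_count = Counter(rounded).most_common(1)[0]
    return top_count / len(rounded) >= max_share


print(confidence_looks_uninformative([0.95] * 20))                     # True
print(confidence_looks_uninformative([0.95, 0.6, 0.8, 0.99, 0.72]))    # False
```

A varied spread of scores does not prove they are well calibrated. A uniform spread is simply a cheap early sign that they carry no information.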

Why we still ship anyway

Despite all the above, AI-assisted audits beat the alternative. Manual audits also miss things — they just miss different things. The combination of "AI does the first pass, human verifies the top 10 findings" produces better coverage at lower cost than either approach alone[6].

The goal isn't infallible AI. The goal is a tool you can verify in 5 minutes that gets you 80% of the way there.

Sources

  [1] International Organization for Standardization. ISO/IEC 27001:2022 Information security, cybersecurity and privacy protection — Information security management systems. ISO, 2022. https://www.iso.org/standard/27001
  [6] EvidProof Research Team. EvidProof Internal Validation Study: AI Audit Accuracy Benchmark — PLACEHOLDER until real study is published. EvidProof, 2026. https://evidproof.com/research/accuracy-benchmark-2026
