My AI Agent Cited Its Own Hallucination as Fact — And How I Stopped It With 8 Lines of Config

My AI agent cited its own hallucination as fact. The Ouroboros Effect in AI agent memory — and how ACA stops it with source tiers, provenance chains, and audit trails.

江中喬

16 6月 2026 • 4 min read

I run a multi-agent system where Claude, Codex, Gemini, and Grok collaborate on software engineering tasks. One day, I noticed something disturbing.

An agent wrote a "fact" to our shared memory:

API rate limit should be set to 100 requests per minute per user.

No source. No evidence. Just an LLM's best guess, stored as a fact.

Two hours later, another agent read that memory, used it as the basis for a production constraint change, and wrote a new memory:

API rate limit confirmed at 100 req/min/user — based on existing system documentation.

It cited the first agent's guess as "existing documentation." The hallucination had been laundered into a trusted fact.

This is the Ouroboros Effect — when LLM-derived outputs get fed back into LLM inputs, each round adding a thin layer of false confidence until nobody can trace what's real.

The Problem Nobody Talks About

The AI agent ecosystem is obsessed with capability — tool calling (MCP), inter-agent communication (A2A), memory (Mem0, Zep, LangGraph). But nobody is asking the governance questions:

Who wrote this memory?
Where did it come from?
Can we trust it?
Has anyone reviewed it?
Was it ever changed, and by whom?

Every agent framework gives you a MemorySaver that's essentially a key-value store with no opinions. Write anything. Read anything. No audit trail. No provenance. No trust levels.

It's like building a financial system where anyone can edit the ledger and nobody keeps a log.

What Happened When I Added Governance

I built ACA — Agent Cognition Architecture to solve this. Here's what changed:

Before (plain memory store)

{
  "id": "mem-003",
  "content": "API rate limit should be 100 req/min",
  "created_at": "2026-06-11"
}

That's it. No metadata. No history. No way to know if this is a human directive or an LLM guess.

After (ACA-governed memory)

{
  "memory_id": "mem-003",
  "content": "API rate limit should be 100 req/min",
  "source": {
    "tier": "raw_source",
    "type": "agent",
    "ref": "codex-analysis-20260611"
  },
  "status": "superseded",
  "provenance_chain": {
    "origin": { "memory_id": "mem-003", "tier": "raw_source" },
    "transitions": [
      {
        "type": "supersede",
        "to_memory_id": "mem-004",
        "tier_before": "raw_source",
        "tier_after": "llm_derived"
      },
      {
        "type": "tier_upgrade",
        "to_memory_id": "mem-005",
        "tier_before": "llm_derived",
        "tier_after": "human_confirmed"
      }
    ]
  }
}

Now I can see: this started as a raw agent guess (raw_source), was reviewed and updated by another agent (llm_derived), and finally confirmed by a human with evidence (human_confirmed). The full chain is traceable. The original guess is still there, marked superseded — not deleted.

The 8 Lines That Prevent Ouroboros

{
  "store": "sqlite",
  "governance": {
    "dedup": true,
    "anti_ouroboros": true,
    "namespace_isolation": true,
    "write_gate": true
  }
}

That's ~/.amh/config.json. When anti_ouroboros is on, the system enforces:

Source tier tracking: Every memory is tagged raw_source, llm_derived, or human_confirmed
No circular re-feed: An llm_derived memory cannot be used as a source to produce another llm_derived memory without human review in between
Provenance chain: Every modification is recorded — who changed what, when, and why

The Ouroboros Effect dies because the system won't let an LLM cite another LLM's output as ground truth.

Seeing Governance (Not Just Enforcing It)

Rules are only useful if violations are visible. That's why I built ACA Inspector — a Web UI that makes governance tangible.

One page shows you the full story of any decision:

[Authority]     → [Decision + Reviews] → [Provenance DAG] → [Audit Trail]
 Who can act?     What was proposed?      Where did the       What happened
 What are the     Who opposed?            evidence come       and when?
 constraints?     Why was it approved?    from?

When one of my agents proposed allowing auto-modification of production constraints from LLM-derived sources, the Inspector showed me:

Codex opposed: "This bypasses the core governance principle. Confidence ≠ correctness."
Gemini conditionally approved: "Allow for operational constraints only, with a 24h cooling period."
I ratified Gemini's approach, incorporating Codex's concern about confidence vs. correctness.

All of this is visible in one screen. Not buried in logs. Not scattered across chat transcripts. Visible.

The eslint of Agent Governance

Beyond visualization, I needed automated detection. ACA Incident Analyzer scans your memory store for 8 types of governance violations:

Rule	What it catches
Authority Violation	Agent acted without required capability
Trust Escalation	Trust tier upgraded without evidence
Ouroboros Cycle	LLM output fed back as LLM input
Self-Approval	Same principal proposed and ratified
Decision Without Review	High-risk decision ratified without review
Namespace Cross-Write	Agent wrote outside its assigned scope
Revoke Without Audit	Memory revoked without audit trail
Tier Downgrade	Trust level lowered unexpectedly

import { analyzeIncidents } from "@chibakuma/aca-incident-analyzer";

const report = analyzeIncidents({ memories, auditEvents, decisions });
console.log(report.summary);
// { critical: 0, high: 1, medium: 0, low: 0, total: 1 }

Think of it as running eslint on your agent's behavior, not its code.

It Works With Your Existing Stack

ACA is a protocol, not a framework. It plugs into what you already use:

LangGraph.js: npm install @chibakuma/aca-langgraph — drop-in checkpointer that adds governance to every state transition
CrewAI: Already supports MCP — connect directly to AMH server with zero adapter code
Any MCP client: Claude Desktop, Cursor, Codex — just add AMH as an MCP server

npx @chibakuma/agent-memory-hall

One command. Your agent memories now have source tiers, provenance chains, and audit trails.

Certify Your Implementation

If you're building an ACA-conformant system, you can verify it:

npx @chibakuma/aca-certification --store sqlite --path ./my-agent.db

✅ L1: Memory Schema (100%)
✅ L2: Governance Policies (100%)
✅ L3: Trust & Provenance (100%)
✅ L4: Audit Trail (100%)
✅ L5: Decision Governance (100%)
Result: ACA Full Stack Conformant

Three certification levels, like Kubernetes or OAuth: Layer 1, Layer 1-3, or Full Stack Conformant.

Why This Matters Now

Every week, another AI agent framework launches. They compete on capability — more tools, faster inference, longer context. But none of them answer the question that will matter most when agents run in production:

"Why did the agent do that, and should we trust it?"

The Ouroboros Effect isn't theoretical. I caught it in my own system. If you're running multi-agent workflows with shared memory, you're probably already experiencing it — you just can't see it yet.

ACA makes it visible. Then verifiable. Then preventable.

Links:

GitHub: MakiDevelop/agent-memory-hall
npm: @chibakuma/agent-memory-hall
ACA Spec: Agent Civilization Architecture
Inspector: npx @chibakuma/agent-memory-hall inspector

I'm Maki (Chung-Chiao Chiang), an AI systems builder with roots in TPM and product management. I design multi-agent collaboration architectures and built ACA — the first open protocol for AI agent governance.