Loop Engineering Is More Than a Concept — I Built a Cross-Session State Handoff System
Loop Engineering is more than a concept. I built Session Baton — a cross-session state handoff system with three-tier handoff, Anti-Ouroboros gate, and one-line PyPI install.
Boris Cherny (Claude Code lead) said: "I don't prompt Claude anymore. My job is to write loops."
This blew up in the AI coding world. Addy Osmani systematized it into five building blocks. Everyone's talking about Loop Engineering.
But most discussions stay at the concept level. I spent a week building it.
The Problem: Your AI Sessions Are Flat
Most people use AI coding agents like this:
Session 1: prompt → work → done
Session 2: prompt → work → done (doesn't remember Session 1)
Session 3: prompt → work → done (doesn't remember Session 1 or 2)
Every session starts from scratch. You manually re-explain context. The agent forgets what mistakes it made, what decisions were taken, what's pending.
"But what about CLAUDE.md and session memory?"
Sure. But those provide fuzzy-search knowledge (the warm tier). You search "last deployment" and get a three-month-old record. What you actually need is precise state: "I deployed X last session. Is it still alive?"
Not a Cron Job — It's a Loop
Adding a timer to run AI on schedule is a cron job. A loop is fundamentally different:
| | Cron Job | Loop |
|---|---|---|
| State | No memory | Knows what happened last time |
| Decision | Does the same thing every time | Decides based on last result |
| Verification | None | Tracks effect of last action |
| Escalation | None | Calls human only when stuck |
The output of each loop iteration becomes the input of the next. It spirals upward instead of repeating flatly.
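As a minimal sketch of the difference in the table above: a loop feeds each iteration's result into the next one and calls a human only when it stops making progress. `LoopState`, `run_iteration`, and the `max_failures` limit are hypothetical names chosen for illustration and are not taken from Session Baton.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopState:
    """Hypothetical state carried from one iteration to the next."""
    last_action: Optional[str] = None
    last_succeeded: Optional[bool] = None
    failures_in_a_row: int = 0
    log: list = field(default_factory=list)


def run_iteration(state: LoopState, action: str, succeeded: bool,
                  max_failures: int = 2) -> LoopState:
    """Record one action's outcome and decide what happens next."""
    failures = 0 if succeeded else state.failures_in_a_row + 1
    if failures >= max_failures:
        note = f"escalate to human: '{action}' failed {failures} times in a row"
    elif succeeded:
        note = f"'{action}' verified, move on"
    else:
        note = f"'{action}' failed, retry with last result as context"
    return LoopState(action, succeeded, failures, state.log + [note])


# Each call receives the state produced by the previous call.
state = LoopState()
for action, ok in [("deploy", True), ("migrate", False), ("migrate", False)]:
    state = run_iteration(state, action, ok)
```

A cron job would run the same three commands with no `state` argument at all, which is why it can neither verify the previous run nor know when to escalate.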
Three-Tier Handoff: Impact, Decision, Experience
While designing the system, I discovered that cross-session handoff involves three fundamentally different lifecycles:
Impact — Short-lived
"I deployed X, expected the health endpoint to return 200."
Next session auto-verifies. Pass → close. Fail → escalate.
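One way to picture the pass/fail rule above is a small record plus a check that the next session runs. This is a sketch under assumptions: the `Impact` class, its `status` values, and `verify_impact` are hypothetical and do not reflect Session Baton's actual schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Impact:
    """Hypothetical short-lived item: an action plus its expected effect."""
    action: str
    expectation: str
    status: str = "open"  # open -> closed | escalated


def verify_impact(impact: Impact, check: Callable[[], bool]) -> Impact:
    """Run the check at the start of the next session and update the status."""
    try:
        passed = check()
    except Exception:
        passed = False  # a check that cannot run counts as a failure
    impact.status = "closed" if passed else "escalated"
    return impact


def health_returns_200() -> bool:
    # Stand-in for a real HTTP probe so the sketch runs offline.
    return True


impact = Impact("deployed X", "health endpoint returns 200")
verify_impact(impact, health_returns_200)
```

Treating an unrunnable check as a failure is a deliberate choice here: an impact that cannot be verified should reach a human rather than be closed silently.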
Decision — Medium-lived
"Switched to --update-env-vars because --set-env-vars clears existing variables (root cause of last week's production incident)."
Carries rationale + evidence. Lives until superseded.
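"Lives until superseded" can be sketched as a log in which a new decision on the same topic retires the old one. The `Decision` fields, the `topic` key, and the `DecisionLog` class are assumptions made for illustration, and the first recorded decision below is invented to give the second one something to supersede.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical medium-lived item carrying rationale and evidence."""
    topic: str
    choice: str
    rationale: str
    evidence: str
    superseded: bool = False


class DecisionLog:
    def __init__(self) -> None:
        self._items: list = []

    def record(self, decision: Decision) -> None:
        """Add a decision and retire any active one on the same topic."""
        for old in self._items:
            if old.topic == decision.topic and not old.superseded:
                old.superseded = True
        self._items.append(decision)

    def active(self) -> list:
        """Decisions the next session should still honor."""
        return [d for d in self._items if not d.superseded]


log = DecisionLog()
log.record(Decision("env vars", "--set-env-vars",
                    "illustrative earlier choice", "none"))
log.record(Decision("env vars", "--update-env-vars",
                    "--set-env-vars clears existing variables",
                    "last week's production incident"))
```

Superseded entries stay in the log, so a later session can still see why the earlier choice was abandoned.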
Experience — Long-lived
"Third time assuming Docker architecture without reading docker-compose.yml."
Single occurrence = anecdote. Repeated occurrences = pattern. At threshold → propose upgrade to enforced rule.
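The anecdote, pattern, and threshold stages could be tracked with a simple counter. In this sketch the threshold of 3 is an assumed value (the text does not state one), and `ExperienceTracker` with its return labels is a hypothetical name.

```python
from collections import Counter

GRADUATION_THRESHOLD = 3  # assumed value, not taken from Session Baton


class ExperienceTracker:
    """Counts repeated observations and flags when one should be proposed as a rule."""

    def __init__(self, threshold: int = GRADUATION_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def observe(self, pattern: str) -> str:
        """Record one occurrence and classify the pattern's current stage."""
        self.counts[pattern] += 1
        seen = self.counts[pattern]
        if seen >= self.threshold:
            return "propose_rule"  # a proposal only; nothing is enforced yet
        return "anecdote" if seen == 1 else "pattern"


tracker = ExperienceTracker()
mistake = "assumed Docker architecture without reading docker-compose.yml"
stages = [tracker.observe(mistake) for _ in range(3)]
```

Note that reaching the threshold yields a proposal, not a rule; turning it into an enforced rule is a separate step.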
Anti-Ouroboros: LLM Cannot Self-Promote Its Own Rules
Here's a critical governance question: pattern-to-rule graduation means LLM output influences future LLM behavior.
Without a gate, this becomes a self-reinforcing loop — the LLM observes "I keep making this mistake" → auto-writes a rule → the rule changes all future behavior. Sounds efficient, but could permanently encode incorrect observations.
So I added an Anti-Ouroboros Gate:
- Every baton item carries a source_tier: llm_derived or human_confirmed
- llm_derived patterns cannot auto-graduate to rules
- A human must confirm (changing the tier to human_confirmed) before anything is written to rules/
- This is part of the ACA (Agent Civilization Architecture) protocol
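The gate described in the list above can be sketched as a check that refuses to graduate anything still tagged llm_derived. Only the `source_tier` field and its two values come from the text; `Pattern`, `confirm`, `graduate`, and `GraduationBlocked` are hypothetical, and an in-memory list stands in for the rules/ directory.

```python
from dataclasses import dataclass

LLM_DERIVED = "llm_derived"
HUMAN_CONFIRMED = "human_confirmed"


@dataclass
class Pattern:
    text: str
    source_tier: str = LLM_DERIVED


class GraduationBlocked(Exception):
    """Raised when an unconfirmed pattern tries to become a rule."""


def confirm(pattern: Pattern, reviewer: str) -> Pattern:
    """Human-triggered step: returns a copy with the tier changed."""
    if not reviewer.strip():
        raise ValueError("a named human reviewer is required")
    return Pattern(pattern.text, HUMAN_CONFIRMED)


def graduate(pattern: Pattern, rules: list) -> None:
    """Append the pattern to the rule set, but only after human confirmation."""
    if pattern.source_tier != HUMAN_CONFIRMED:
        raise GraduationBlocked(f"needs human confirmation: {pattern.text!r}")
    rules.append(pattern.text)


rules: list = []
candidate = Pattern("Read docker-compose.yml before assuming the Docker architecture")
try:
    graduate(candidate, rules)  # blocked: still llm_derived
except GraduationBlocked:
    pass
graduate(confirm(candidate, reviewer="maki"), rules)  # allowed after confirmation
```

The point of the design is that the code path the LLM can reach (`graduate`) has no way to change the tier on its own; only the confirmation step does.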
The Most Overlooked Loop: Your Daily Conversations
When most people hear Loop Engineering, they think "which cron job can I upgrade?"
But the most powerful loop isn't an automation script — it's your daily working conversations with AI.
Think about it:
- Session start: load last session's decisions and progress
- Working: write code, solve problems, hit walls
- Session end: distill lessons into rules
- Next session: constrained by the rules from the previous round
This is a meta-loop — a loop that rewrites its own rules.
The loop Osmani describes modifies code. A meta-loop modifies the rules that modify code. The former is efficiency. The latter is compounding capability.
Session Baton: Open Source Implementation
I open-sourced the system: github.com/MakiDevelop/session-baton
Published on PyPI, one-line install:
```shell
pip install session-baton
python -m session_baton
# Server runs on http://127.0.0.1:9101
```
106 lines of Python. FastAPI + SQLite. Runs standalone or plugs into any existing memory system (memhall, mem0, Letta compatible).
Read baton at session start, write at session end. Full schema, skill templates, and spec included in the repo.
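To illustrate the read-at-start, write-at-end rhythm without guessing at the server's endpoints, here is a standalone sketch using only Python's built-in `sqlite3` module. It does not use or mirror Session Baton's actual API or schema, which are documented in the repo; the `baton` table, the payload keys, and the function names are all hypothetical.

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny baton store. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS baton ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def write_baton(conn: sqlite3.Connection, session: str, payload: dict) -> None:
    """Session end: persist what the next session needs to know."""
    conn.execute(
        "INSERT INTO baton (session, payload) VALUES (?, ?)",
        (session, json.dumps(payload)),
    )
    conn.commit()


def read_latest_baton(conn: sqlite3.Connection) -> Optional[dict]:
    """Session start: load the most recent baton, if any exists."""
    row = conn.execute(
        "SELECT payload FROM baton ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
first_read = read_latest_baton(conn)  # nothing has been handed off yet
write_baton(conn, "session-1", {"pending": ["verify health endpoint"]})
baton = read_latest_baton(conn)  # what session 2 would see at startup
```

With a file path instead of `:memory:`, the same two calls would survive across processes, which is the property a cross-session handoff depends on.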
Industry Status
After scouting X/Twitter, GitHub, HN, Reddit, and arXiv:
- Compounding Knowledge Loop (MindStudio): dumps summary at session end. No action-outcome verification.
- bmo (ngrok): cross-session telemetry. Tracks tool success rates, not expected outcomes.
- Stability Tier (arXiv): three-tier memory architecture. Academic paper, no engineering implementation.
- Meta HyperAgents: failure → rule proposal pipeline. Requires human review, no automatic threshold.
In that survey I found no public end-to-end implementation that combines cross-session closed-loop verification, pattern-to-rule graduation, and three-tier structured handoff.
As far as I can tell, Session Baton is the first.
What's Next
Session Baton just started dogfooding. Over the next few weeks we'll observe:
- Does the baton actually spiral upward? Or is it just another log nobody reads?
- Is pattern-to-rule graduation practical in real use?
- How many "broke something last time but didn't notice" cases does action-outcome verification catch?
I'll share findings as they come.
- Session Baton: chiba.tw/baton
- PyPI: pypi.org/project/session-baton
- ACA Protocol: chiba.tw/acap
- Loop Engineering knowledge base: chiba.tw/loop-engineering/
I'm Maki (Chung-Chiao Chiang), an AI systems builder with roots in TPM and product management. I design multi-agent collaboration architectures and built ACA — the first open protocol for AI agent governance.