SQLite

Which version of SQLite does your Docker base image ship?

Run one line against your production container: `docker exec ... python -c "import sqlite3; print(sqlite3.sqlite_version)"`. If it returns 3.46.x or older, this post is for you. It is the record of a production deploy that upgraded SQLite from 3.46.1 to 3.53.0, covering a multi-stage Dockerfile, a build-time assert, a deploy preflight, and a surgical fix for a cascading failure.

江中喬

27 Apr 2026 • 8 min read

SSH into your production machine and run this line:

docker exec <your-container> python -c "import sqlite3; print(sqlite3.sqlite_version)"

If it returns 3.46.x or older, this post is for you.

I did exactly that tonight and got 3.46.1. Then it hit me: my production data had been exposed to a WAL-reset corruption bug for four months.

How I found it

I run a personal PKI memory layer on a Mac mini: Python FastAPI + SQLite + sqlite-vec + a bge-m3 embedder, all in Docker. Tonight I ran a health check, with a team of seven AI agents splitting up the investigation.

Along the way I gave Perplexity Max (a deep-research agent) this question:

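Does sqlite-vec (v0.1.x) have any known issues with production stability or with its interaction with WAL mode? PRAGMA wal_checkpoint(PASSIVE) returns 0|0|0, but the WAL stays at 4MB and never shrinks. Is that normal, or is it an indicator of something?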

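I assumed it was a minor issue (a bug in the sqlite-vec extension itself, something like that). The first part of Max's answer roughly confirmed it: "a WAL that doesn't shrink is expected SQLite behavior when a reader transaction holds a read lock." Nothing to worry about.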
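A minimal sketch of the PASSIVE behavior in that question, assuming a throwaway database in a temp directory with automatic checkpoints turned off so the WAL keeps growing. A PASSIVE checkpoint returns (busy, frames in the WAL, frames checkpointed) and copies frames back into the database, but it does not shrink the -wal file; TRUNCATE does. The table `t`, the payload size, and the row count are arbitrary demo choices, not anything from memhall.

```python
import os
import sqlite3
import tempfile


def wal_demo(rows: int = 200) -> tuple[tuple[int, int, int], int, int]:
    """Return (PASSIVE result row, WAL bytes after PASSIVE, WAL bytes after TRUNCATE)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        wal_path = path + "-wal"
        # isolation_level=None: autocommit, so each INSERT is its own transaction.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
            assert mode == "wal", mode
            # Disable automatic checkpoints so the WAL only grows during the demo.
            conn.execute("PRAGMA wal_autocheckpoint=0").fetchall()
            conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload BLOB)")
            for _ in range(rows):
                conn.execute("INSERT INTO t (payload) VALUES (?)", (os.urandom(1024),))

            # Result row: (busy, frames in the WAL, frames checkpointed).
            # fetchall() runs the statement to completion so it is reset right away.
            passive = conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchall()[0]
            after_passive = os.path.getsize(wal_path)

            # TRUNCATE checkpoints, then truncates the -wal file to zero bytes.
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
            after_truncate = os.path.getsize(wal_path)
        finally:
            conn.close()
    return passive, after_passive, after_truncate


if __name__ == "__main__":
    passive, before, after = wal_demo()
    print(f"PASSIVE -> {passive}; WAL {before} bytes, after TRUNCATE {after} bytes")
```

The point of the demo is only that a WAL file's size on disk is a high-water mark: a successful PASSIVE checkpoint can leave it large, and only TRUNCATE (or a journal size limit) gives the space back.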

But then:

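SQLite 3.53.0 (2026-04-09) and 3.51.3 (2026-03-13) both fix a "WAL-reset database corruption bug", which means earlier versions (including the one you are probably running) really do carry a corruption risk on WAL reset. Upgrading to SQLite 3.51.3 or 3.53.0 is strongly recommended; this is a required production action.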

The SQLite Release History states it explicitly:

2026-03-13 (3.51.3). Fix the WAL-reset database corruption bug.

I immediately SSHed into production, ran that line, and got 3.46.1.

Why upgrading the base image isn't enough

First instinct: bump my Dockerfile from python:3.12-slim to python:3.13-slim and get a newer SQLite, right?

Wrong.

What I measured:

Base image	sqlite3 version
`python:3.12-slim` (Debian bookworm)	3.46.1
`python:3.13-slim` (Debian bookworm)	3.46.1
`python:3.13-slim-trixie` (Debian trixie)	3.46.1
`debian:bookworm-slim`	3.46.1
`debian:trixie-slim`	3.46.1
`debian:sid-slim`	3.46.x

The whole Debian line, from bookworm to sid, is still on 3.46.x. Ubuntu 24.04 / 25.04 are the same.

In other words: upgrade Python, upgrade the base image, rebuild the image, and the SQLite version does not change. Debian/Ubuntu trail upstream SQLite releases by four to six months or more.

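And Python's sqlite3 standard library module dynamically links against the system libsqlite3.so (as it does in these Debian-based images), so upgrading the Python interpreter has nothing to do with the SQLite version either: it uses whichever libsqlite3 the OS provides.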
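One way to see which libsqlite3 a running Python process actually mapped is to read /proc/self/maps after importing sqlite3. This is a Linux-only sketch and `loaded_sqlite_libs` is a hypothetical helper. It returns an empty list where /proc is unavailable, or where the Python build links SQLite statically into its _sqlite3 module (some standalone Python distributions may do this).

```python
import sqlite3  # importing sqlite3 loads the _sqlite3 extension and, with it, libsqlite3
from pathlib import Path


def loaded_sqlite_libs(maps_path: str = "/proc/self/maps") -> list[str]:
    """Return the paths of mapped shared objects whose name contains 'libsqlite3'."""
    try:
        lines = Path(maps_path).read_text().splitlines()
    except OSError:
        return []  # not Linux, or /proc not mounted
    libs = set()
    for line in lines:
        # Columns: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and "libsqlite3" in parts[5]:
            libs.add(parts[5])
    return sorted(libs)


if __name__ == "__main__":
    print("sqlite_version:", sqlite3.sqlite_version)
    print("mapped libsqlite3:", loaded_sqlite_libs() or "none found")
```

Run inside the container before and after the upgrade, the mapped path is what tells you whether the system copy or a shadowing copy won.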

Multi-stage build: compile SQLite from source and inject it into the base image

I rewrote the Dockerfile as a multi-stage build with a new stage that compiles SQLite from source, then injected the result into the ld config of the builder and runtime stages:

# ---- SQLite 3.53.0 builder ------------------------------------------------
FROM debian:bookworm-slim AS sqlite-builder

ARG SQLITE_VERSION=3530000
ARG SQLITE_YEAR=2026

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build
RUN wget -q "https://www.sqlite.org/${SQLITE_YEAR}/sqlite-autoconf-${SQLITE_VERSION}.tar.gz" \
    && tar xzf "sqlite-autoconf-${SQLITE_VERSION}.tar.gz" \
    && cd "sqlite-autoconf-${SQLITE_VERSION}" \
    && ./configure \
        --prefix=/opt/sqlite \
        --enable-fts5 \
        --enable-load-extension \
        CFLAGS="-O2 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_LOAD_EXTENSION" \
    && make -j"$(nproc)" install


# ---- Runtime --------------------------------------------------------------
FROM python:3.12-slim-bookworm AS runtime

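# Set the path alone: appending ":${LD_LIBRARY_PATH}" while it is unset leaves a
# trailing ":", and an empty entry makes the dynamic linker search the current directory.
ENV LD_LIBRARY_PATH=/opt/sqlite/lib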

# Inject upgraded SQLite — Python sqlite3 dynamic-loads libsqlite3.so
# from LD_LIBRARY_PATH first, shadowing system 3.46.1.
COPY --from=sqlite-builder /opt/sqlite /opt/sqlite
RUN echo "/opt/sqlite/lib" > /etc/ld.so.conf.d/sqlite-upgrade.conf && ldconfig

# ... rest of your runtime

Key points:

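Multi-stage build: sqlite-builder is a separate stage, so its build dependencies never end up in the final runtime image
/opt/sqlite/lib added to ld.so.conf.d: the Python sqlite3 module dynamically loads libsqlite3.so at import time following the ld config, so /opt/sqlite/lib takes precedence over the system /lib/aarch64-linux-gnu/
LD_LIBRARY_PATH env: belt and braces; every child process inherits it, and the dynamic linker searches it before the ld.so cache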

A few easy traps:

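--enable-json1: JSON support has been built into SQLite by default since 3.38.0, and the 3.53 ./configure fails outright with Unknown option --json1. My first build died right here. Leaving -DSQLITE_ENABLE_JSON1 in CFLAGS is harmless (JSON is on by default anyway), but the configure flag has to go.
Align the base image OS: sqlite-builder uses debian:bookworm-slim and the runtime uses python:3.12-slim-bookworm, both bookworm. The combination that breaks is a builder on a newer release than the runtime (say, builder on trixie and runtime on bookworm): the .so then needs a newer glibc than the runtime provides and fails to load. Keeping both stages on the same release sidesteps the question.
Remember ldconfig: after adding a file to ld.so.conf.d, run ldconfig once, or the cache is not updated.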
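After the build, the compile-time options of whichever SQLite actually got loaded can be read back with PRAGMA compile_options. A sketch, assuming FTS5 and RTREE (from the CFLAGS above) are the options you care about. JSON and extension loading are on by default, so the check looks for the OMIT_ flags that would mean they were compiled out. `REQUIRED`, `FORBIDDEN`, and `audit` are hypothetical names.

```python
import sqlite3

# Options requested via CFLAGS in the Dockerfile above; adjust to your own build.
REQUIRED = {"ENABLE_FTS5", "ENABLE_RTREE"}
# JSON and extension loading are on by default; these flags would mean they were removed.
FORBIDDEN = {"OMIT_JSON", "OMIT_LOAD_EXTENSION"}


def compile_options(conn: sqlite3.Connection) -> set[str]:
    # Each row holds one string such as 'ENABLE_FTS5' or 'THREADSAFE=1'.
    return {row[0] for row in conn.execute("PRAGMA compile_options")}


def audit(options: set[str]) -> tuple[list[str], list[str]]:
    """Return (required options that are missing, forbidden options that are present)."""
    # Drop any '=value' suffix so 'THREADSAFE=1' is compared as 'THREADSAFE'.
    names = {opt.split("=", 1)[0] for opt in options}
    return sorted(REQUIRED - names), sorted(FORBIDDEN & names)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    try:
        missing, unwanted = audit(compile_options(conn))
    finally:
        conn.close()
    print(f"SQLite {sqlite3.sqlite_version}: missing={missing} unwanted={unwanted}")
    raise SystemExit(1 if missing or unwanted else 0)
```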

Build-time fail-fast assert

This is the cheapest part of the whole change, and the one that protects the most:

RUN /app/.venv/bin/python -c "\
import sqlite3, sqlite_vec; \
ver = sqlite3.sqlite_version_info; \
assert ver >= (3, 51, 3), f'SQLite {sqlite3.sqlite_version} < 3.51.3 (WAL-reset corruption bug)'; \
c = sqlite3.connect(':memory:'); \
c.enable_load_extension(True); \
sqlite_vec.load(c); \
v = c.execute('SELECT vec_version()').fetchone()[0]; \
print(f'SQLite={sqlite3.sqlite_version}, vec0={v}')"

Four protections:

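It fails at build time, instead of being discovered in the production runtime after a deploy
An explicit minimum version (3.51.3) in the error message, so the next person who hits it understands why immediately
It verifies the sqlite-vec extension at the same time, so vec0 compatibility after the upgrade is tested too
It catches base image drift automatically: if Debian ever upgrades SQLite itself but to a version still below 3.51.3, the build fails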

When you depend on an underlying library whose code you don't own, a build-time assert is far better value than a runtime check.
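If the inline `python -c` string keeps growing, the same check can live in a small script instead. A sketch, assuming you COPY it into the image; the file name check_sqlite.py and the helper names are hypothetical, and the sqlite-vec calls are the same ones the one-liner above already uses.

```python
"""check_sqlite.py: fail the image build if the loaded SQLite is too old (hypothetical)."""
import sqlite3
import sys

# 3.51.3 is the earliest release this post cites as fixing the WAL-reset bug.
MIN_SQLITE = (3, 51, 3)


def version_ok(actual: tuple[int, ...], minimum: tuple[int, ...] = MIN_SQLITE) -> bool:
    # Tuples compare element by element, so (3, 53, 0) >= (3, 51, 3).
    return tuple(actual) >= tuple(minimum)


def vec_version() -> str:
    # Imported here so version_ok() can be imported and tested without sqlite-vec.
    import sqlite_vec

    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)  # re-disable once vec0 is loaded
        return conn.execute("SELECT vec_version()").fetchone()[0]
    finally:
        conn.close()


def main() -> int:
    if not version_ok(sqlite3.sqlite_version_info):
        minimum = ".".join(map(str, MIN_SQLITE))
        print(f"SQLite {sqlite3.sqlite_version} < {minimum} (WAL-reset corruption bug)",
              file=sys.stderr)
        return 1
    print(f"SQLite={sqlite3.sqlite_version}, vec0={vec_version()}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

In the Dockerfile, a RUN that executes this script with /app/.venv/bin/python would replace the one-liner; a nonzero exit code fails the build the same way the failed assert does.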

Deploy path: when the mini has no buildx

My Mac mini production server runs Docker Desktop but doesn't have the buildx plugin installed (the mini normally never builds images).

The new Dockerfile uses --mount=type=cache,target=/root/.cache/uv to speed up uv sync. That is BuildKit-only syntax, so without buildx the build fails immediately.

The options:

A. Install buildx on the mini (but I didn't want to touch the mini's Docker config during this deploy)
B. Remove --mount=type=cache from the Dockerfile (a compromise)
C. Build on the MBP, then docker save → scp → docker load on the mini (image transfer)

I chose C:

# MBP
docker save memory-hall:0.1.0-sqlite-3.53 -o /tmp/mh.tar
scp /tmp/mh.tar mini:/tmp/mh.tar

# mini
docker load < /tmp/mh.tar
docker tag memory-hall:0.1.0-sqlite-3.53 memory-hall:0.1.0

83MB over Tailscale took 30 seconds. It's simpler than installing buildx, and unlike editing the Dockerfile, it doesn't break best practice.

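Is this a workaround? I don't think so; it's a legitimate distribution mechanism. docker save/load is an officially supported workflow, not a hacky detour. The test is whether the root cause has been dealt with: the SQLite upgrade addresses the root cause, the docker transfer is distribution, and the two are logically independent.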

The deploy preflight that saved me

This is the section most worth writing.

I SSHed into the mini, ready to run docker compose up -d --force-recreate memory-hall. Before running it, I stopped and did a dry-run:

ssh mini-ts 'cd ~/GitHub/memory-hall && ~/.docker/cli-plugins/docker-compose config' \
  | grep -E "MEMHALL_DATA_DIR|source"

Output:

source: ./mh-data    # ← wrong! production data lives in /Users/maki/data/memory-hall

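The cause: a non-interactive SSH shell does not inherit the MEMHALL_DATA_DIR=... variable I had exported by hand earlier, and docker-compose.yml uses a ${MEMHALL_DATA_DIR:-./mh-data} fallback. Had I run docker compose up directly, the container would have mounted an empty ./mh-data at /data, and 220 production entries would have vanished from the container's point of view, looking exactly like a fresh empty install.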
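The fallback rule is easy to model. A rough sketch that handles only the `${NAME:-default}` form (real Compose interpolation supports more syntax), assuming the short volume form `${MEMHALL_DATA_DIR:-./mh-data}:/data`; the exact line in memhall's docker-compose.yml may differ, and `interpolate` is a hypothetical helper.

```python
import re

# Handles only ${NAME:-default}; Compose supports more interpolation forms.
_FALLBACK = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def interpolate(template: str, env: dict[str, str]) -> str:
    def repl(match: re.Match) -> str:
        value = env.get(match.group(1))
        # ':-' falls back when the variable is unset OR set to an empty string.
        return value if value else match.group(2)

    return _FALLBACK.sub(repl, template)


# Assumed short volume syntax; check your own docker-compose.yml.
volume = "${MEMHALL_DATA_DIR:-./mh-data}:/data"

interactive_shell = {"MEMHALL_DATA_DIR": "/Users/maki/data/memory-hall"}
ssh_non_interactive = {}  # the manual export is not inherited

print(interpolate(volume, interactive_shell))    # /Users/maki/data/memory-hall:/data
print(interpolate(volume, ssh_non_interactive))  # ./mh-data:/data
```

Compose resolves these variables from the environment of the shell that runs it, plus the project's .env file, which is why putting the value in .env makes the result independent of whether the shell exported it.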

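It's the same class of problem as an earlier named-volume swap incident of mine (switching from docker run to docker compose made the project namespace reassign the volume, and the old data was orphaned). That time I recovered after the fact from a JSONL dump taken nine hours earlier.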

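What saved me this time was the compose config dry-run, plus knowing to grep for source:. With a straight up -d --force-recreate, production data would have disappeared from the container's view within a second. The files on the host would still be there, but the container would not see them.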
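The grep can be turned into a preflight that refuses to continue. A sketch, assuming your Compose v2 supports `docker compose config --format json` and emits volumes in the normalized long form (type, source, target). The service name memory-hall and the data path come from this post; the function names are hypothetical. Compose may resolve a relative ./mh-data to an absolute path under the project directory, which this check still rejects because it compares against the exact expected path.

```python
import json
import subprocess


def bind_sources(config: dict, service: str) -> list[str]:
    """Host paths of bind mounts for one service in `docker compose config` JSON output."""
    volumes = config.get("services", {}).get(service, {}).get("volumes", [])
    return [v["source"] for v in volumes if isinstance(v, dict) and v.get("type") == "bind"]


def preflight(config: dict, service: str, expected_source: str) -> None:
    sources = bind_sources(config, service)
    if expected_source not in sources:
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(
            f"refusing to deploy {service}: bind sources {sources}, expected {expected_source}"
        )


if __name__ == "__main__":
    # Run from the compose project directory on the target host, before `up -d`.
    result = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    preflight(json.loads(result.stdout), "memory-hall", "/Users/maki/data/memory-hall")
    print("preflight ok: data dir is mounted")
```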

The fix: create a .env in the repo directory on the mini (.gitignore already covers .env) containing the full set of environment variables:

MEMHALL_DATA_DIR=/Users/maki/data/memory-hall
MH_EMBEDDER_KIND=http
MH_EMBED_BASE_URL=http://...
MH_API_TOKEN=...

chmod 600 it to keep it private. docker compose loads .env from the project directory automatically.

A finished deploy is not the end

The deploy finished and every verification passed:

SQLite version inside container = 3.53.0 ✅
All 220 entries intact (after the move and the upgrade) ✅
write/search/embed pipeline working ✅
container health = ok ✅

I thought I was done.

Ten minutes later another conversation pinged me: "memhall search is returning 500."

I SSHed in. Every read endpoint returned 500, but the write endpoints were fine. Container health showed storage degraded, yet PRAGMA integrity_check on the same SQLite file from the host came back ok.

The pattern:

aiosqlite (async, used by the application) → fails with disk I/O error
sqlite3 (sync, plain) → ok
sqlite3.connect() + SELECT directly inside the container → ok (227 rows)
vector store uses sync sqlite3 + sqlite-vec → semantic search still works

In other words: the file was fine; the aiosqlite connection pool had accumulated stuck connections.
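The file-versus-pool check can be scripted with plain sync sqlite3, opened read-only so it cannot touch the file. A sketch: `sync_probe` is a hypothetical helper, the table name entries comes from the UPDATE further down, and paths containing ? or # would need URL-quoting before going into the URI.

```python
import sqlite3


def sync_probe(db_path: str, table: str = "entries") -> tuple[str, int]:
    """Open the database read-only with plain sqlite3; return (integrity_check, row count)."""
    # mode=ro: the probe cannot write to the file, or create one if the path is wrong.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5)
    try:
        check = conn.execute("PRAGMA integrity_check").fetchone()[0]
        # The table name is interpolated into SQL, so pass trusted identifiers only.
        count = conn.execute(f'SELECT count(*) FROM "{table}"').fetchone()[0]
        return check, count
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    print(sync_probe(sys.argv[1]))
```

If this returns ("ok", N) while the async application path keeps failing, the problem is on the connection side rather than in the database file, which is the conclusion the pattern above points to.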

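This incident wasn't directly caused by the SQLite upgrade; it was the fallout of a cascading failure. Another of my services (the bge-m3 embedder running on a GPU) saw its latency jump from the normal 200-500ms to 7.7 seconds during that window (a GPU configuration problem had dropped it to CPU inference). memhall's background reindex worker scans 75 pending entries every 120 seconds to embed them, and every embed was timing out. Each timeout left a stuck connection in the aiosqlite layer, and after nine minutes the whole connection pool was broken.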

Surgical fix (no restart, no restore from backup, no code change):

UPDATE entries 
  SET sync_status = 'failed',
      last_embed_error = 'temporarily failed during downstream service recovery'
  WHERE sync_status = 'pending';

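This temporarily marks the 75 pending entries as failed. On its next cycle the reindex worker enumerates pending → finds 0 → stops hitting the embedder timeout → the connection pool stops accumulating stuck state. Then docker restart clears the existing broken pool.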

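Once the downstream service is fixed, triggering a full reindex (pending_only=False) from the admin endpoint re-embeds the failed entries too. Codex designed this schema with failed as a recoverable state rather than a dead end, and that is good schema design.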
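The pending → failed → reindex cycle can be sketched against an in-memory table. The columns sync_status and last_embed_error come from the UPDATE above; the id column, the 'synced' status, and the rule that a full reindex selects both pending and failed are assumptions about memhall's worker, not its actual code.

```python
import sqlite3


def park_pending(conn: sqlite3.Connection, reason: str) -> int:
    """Mark every pending entry as failed; return how many rows changed."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE entries SET sync_status = 'failed', last_embed_error = ? "
            "WHERE sync_status = 'pending'",
            (reason,),
        )
    return cur.rowcount


def ids_to_reindex(conn: sqlite3.Connection, pending_only: bool) -> list[int]:
    # Hypothetical rule: the periodic worker takes only 'pending';
    # a full reindex (pending_only=False) also retries 'failed'.
    statuses = ("pending",) if pending_only else ("pending", "failed")
    marks = ",".join("?" * len(statuses))
    rows = conn.execute(
        f"SELECT id FROM entries WHERE sync_status IN ({marks}) ORDER BY id", statuses
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, sync_status TEXT, last_embed_error TEXT)")
    conn.executemany("INSERT INTO entries (id, sync_status) VALUES (?, ?)",
                     [(1, "synced"), (2, "pending"), (3, "pending")])
    conn.commit()
    print(park_pending(conn, "temporarily failed during downstream service recovery"))  # 2
    print(ids_to_reindex(conn, pending_only=True))   # []
    print(ids_to_reindex(conn, pending_only=False))  # [2, 3]
```

Keeping failed distinct from a terminal state is what lets the same selection query serve both the cheap periodic pass and the later full recovery.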

Lesson: a deploy is only a baseline; cascading failures at runtime are where the real production engineering happens. A SQLite upgrade fits in a single reviewable PR; connection-pool behavior triggered by a degraded downstream service does not.

Wrapping up: four takeaways

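Check which SQLite version your base image ships: most likely 3.46.1, and most likely exposed to the corruption bug
A multi-stage build that compiles SQLite from source and injects it via ld config is the cleanest way to upgrade (when the base image can't give you the version you need)
A build-time fail-fast assert is the cheapest long-term protection, worth far more than a runtime check
Always do a compose config dry-run before deploying: a non-interactive SSH shell does not inherit variables you exported by hand, and a volume mount fallback can make production data vanish from the container's view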

From health check to finished production deploy took 4 hours. The final commit:

4539ca9 fix(deploy): upgrade SQLite 3.46.1 → 3.53.0 to fix the WAL-reset corruption bug

Production data escaped damage from the corruption bug because tonight's routing sent the right question to the right agent, but that's a story for another post.

memhall (memory-hall) is open source: https://github.com/MakiDevelop/memory-hall

Related commits: fix/reliability-phase-a-2026-04-27 (merged into main)