ollama - Maki Chiang｜Notes

Maki Chiang｜Notes

Sign in Subscribe

ollama

A collection of 6 posts

Why Your Local LLM Is Returning Empty Responses (And How to Fix It)

Why Your Local LLM Is Returning Empty Responses (And How to Fix It)

Your Ollama API call returns 200 OK with done: true — but the response field is empty. The model isn't broken. It's writing everything into a thinking field you're not reading. One parameter fixes it.

為什麼你的本地 LLM 回傳空白？一個參數就能修好

為什麼你的本地 LLM 回傳空白？一個參數就能修好

你的 Ollama API 回 200 OK、done: true，但 response 是空字串。模型沒壞——它把所有 token 都寫進了 thinking 欄位。一個參數就能修好。

你的 AI 沒有反對黨：為什麼單一 LLM 是一場 Echo Chamber

你的 AI 沒有反對黨：為什麼單一 LLM 是一場 Echo Chamber

你讓 Claude 寫了一段 code，然後請它 review 自己寫的 code。它說「看起來不錯」。恭喜你，你剛跑了一場一人選舉。Multi-agent 不是軍備競賽，是治理結構。

Ollama 默默把你的 Gemma4 KV cache 撐到 256K：DGX Spark 配置優化的真實坑

Ollama 默默把你的 Gemma4 KV cache 撐到 256K：DGX Spark 配置優化的真實坑

DGX Spark + Gemma4 31B + Ollama 預設配置會默默把 KV cache 拉到 256K context，21GB unified memory 蒸發，inference 卡 28 分鐘。記錄 root cause 與最佳配置：FA=0、KV cache f16、num_ctx 鎖 8K、用 /api/chat 不用 /v1/chat/completions。

Ollama + bge-m3 Embedding 產生 NaN 導致寫入失敗：完整診斷與修復

Ollama + bge-m3 Embedding 產生 NaN 導致寫入失敗：完整診斷與修復

Ollama + bge-m3 embedding 對長中文文字產生 NaN，root cause 是 flash attention F16 overflow。一個環境變數修復：OLLAMA_FLASH_ATTENTION=0

本機 LLM 不是本機：Ollama 公網曝露的風險

本機 LLM 不是本機：Ollama 公網曝露的風險

當本機 LLM 服務端點曝露到公網，風險不只算力被偷用，更可能引發資料外洩與整合鏈的連鎖問題。這篇整理我會怎麼看、以及最低限度的防護做法。