MemPalace 開源 AI 記憶系統實測：96.6% 召回率的真相與爭議 | MemPalace Hands-On: The Open-Source AI Memory System With 96.6% Recall

By Kit 小克 | AI Tool Observer | 2026-04-11

🇹🇼 MemPalace 開源 AI 記憶系統實測：96.6% 召回率的真相與爭議

如果你有在追 AI Agent 開發圈的動態，這幾天一定看過 MemPalace 這個名字。這個由女演員 Milla Jovovich 和工程師 Ben Sigman 共同開發的開源 AI 記憶系統，在 2026 年 4 月初一上線就衝上 GitHub Trending 榜首，號稱在 LongMemEval 基準測試拿下 96.6% 的召回率，打趴所有付費方案。

但社群的反應卻很兩極——有人說這是 AI Agent 記憶問題的終極解法，也有人說跑分方法根本有問題。好不好用，試了才知道。

MemPalace 是什麼？為什麼 AI Agent 需要記憶系統？

目前主流 LLM 最大的痛點之一就是「記不住事情」。每次對話結束，context window 就清空了。MemPalace 的目標就是給 AI Agent 一個持久化、跨 session 的長期記憶層。

它的核心架構完全在本機運行：

ChromaDB：負責向量嵌入，儲存完整對話內容（不做摘要或萃取）
SQLite：建構時序知識圖譜，每筆事實都有有效時間視窗
本地檔案系統：存放原始資料
MCP 整合：透過 Model Context Protocol 無縫串接 Claude、ChatGPT 或本地 Ollama 模型

最關鍵的是：完全免費、Apache 2.0 授權、不需要任何外部 API。這也是它能在短時間內爆紅的主因。

96.6% 召回率是真的嗎？基準測試爭議解析

MemPalace 最初宣稱在 LongMemEval 拿下 100% 滿分（500/500），後來在社群壓力下修正為 96.6%。但即便是這個數字，也有幾個需要注意的地方：

96.6% 是 recall_any@5 指標，衡量的是「前 5 筆檢索結果中是否包含正確記憶」，而非「是否能正確回答問題」
測試使用的是 LongMemEval small variant，難度較低
100% 的 hybrid v4 分數是透過檢查失敗的題目、針對性修改程式碼後重測得出的——這在基準測試界是大忌
有獨立開發者實測發現，實際接上 LLM 問答時，正確回答率只有約 17%

換句話說，「記得住」和「答得對」是兩回事。MemPalace 的檢索層確實表現不錯，但從檢索到正確生成答案之間還有很大的落差。

實際使用體驗如何？

從技術角度來看，MemPalace 有幾個真正的亮點：

壓縮技術：自研的無損壓縮格式可以把 1,000 個 token 壓到約 120 個，人類和 LLM 都能直接閱讀
零雲端依賴：所有資料都在你的機器上，對隱私敏感場景非常友善
MCP 支援：設定簡單，幾分鐘內就能讓 Claude 或 ChatGPT 擁有跨 session 記憶

但也有明顯的局限：

初始索引大量對話紀錄時速度偏慢
面對複雜的多跳推理問題，檢索品質會明顯下降
目前社群活躍度高但穩定性仍需時間驗證

MemPalace vs Mem0 vs Zep：該選哪個？

目前市場上主要的 AI 記憶方案比較：

MemPalace：免費開源、本地運行、recall_any@5 達 96.6%，但實際問答正確率待驗證
Mem0：商用方案、LongMemEval 約 85%、API 穩定但需付費
Zep：企業級方案、效能約 85%、有完整的 SDK 和技術支援

如果你是個人開發者或重視隱私，MemPalace 值得一試。如果是正式產品環境，目前建議再觀望。

常見問題 FAQ

MemPalace 真的能讓 AI 記住所有對話嗎？

MemPalace 會儲存完整的對話內容在本地 ChromaDB 中，檢索召回率達 96.6%。但「記住」和「正確使用記憶來回答」是不同的，實際問答正確率可能遠低於召回率。

MemPalace 需要付費或使用雲端 API 嗎？

完全不需要。MemPalace 是 Apache 2.0 開源授權，所有運算都在本機完成，不依賴任何外部服務。你只需要一台能跑 Python 和 ChromaDB 的電腦。

MemPalace 的 LongMemEval 100% 滿分是真的嗎？

這個分數有爭議。原始 100% 是透過 hybrid v4 模式達成，但社群發現這是針對失敗題目修改程式碼後重測的結果。修正後的 raw 模式分數為 96.6%（recall_any@5），仍是免費工具中最高。

哪些 AI 模型可以搭配 MemPalace 使用？

透過 MCP 整合，MemPalace 可以搭配 Claude、ChatGPT 等主流模型使用。也支援透過 Python/CLI 直接連接 Ollama 等本地模型。

🇺🇸 MemPalace Hands-On: The Open-Source AI Memory System With 96.6% Recall — Hype or Real?

If you have been following the AI agent developer community this week, you have almost certainly seen MemPalace. This open-source AI memory system, co-created by actress Milla Jovovich and engineer Ben Sigman, reached the top of GitHub Trending in early April 2026. The project claims a 96.6% recall score on LongMemEval and says that this beats every paid alternative.

But the community response has been sharply divided. Some call it the ultimate solution for AI agent memory; others say the benchmarks are fundamentally flawed. Time to find out for ourselves.

What Is MemPalace and Why Do AI Agents Need Memory?

One of the biggest pain points with current LLMs is that they forget everything once the context window resets. MemPalace aims to give AI agents a persistent, cross-session long-term memory layer that runs entirely on your local machine:

ChromaDB for vector embeddings, storing full conversations without summarization
SQLite for a temporal knowledge graph with validity windows per fact
Local filesystem for raw data storage
MCP integration for seamless connection to Claude, ChatGPT, or local Ollama models
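To make the "validity window per fact" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and the facts_at helper are assumptions made for illustration; they are not MemPalace's actual schema or API.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not taken from the MemPalace codebase.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,  -- ISO-8601 date, inclusive
        valid_to   TEXT            -- ISO-8601 date, exclusive; NULL = still valid
    )
    """
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        ("user", "lives_in", "Taipei", "2024-01-01", "2025-06-01"),
        ("user", "lives_in", "Tokyo", "2025-06-01", None),
    ],
)


def facts_at(conn, subject, predicate, when):
    """Return the objects that were valid for (subject, predicate) on date `when`."""
    rows = conn.execute(
        """
        SELECT object FROM facts
        WHERE subject = ? AND predicate = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR ? < valid_to)
        ORDER BY valid_from
        """,
        (subject, predicate, when, when),
    ).fetchall()
    return [row[0] for row in rows]


print(facts_at(conn, "user", "lives_in", "2024-12-31"))  # ['Taipei']
print(facts_at(conn, "user", "lives_in", "2026-04-11"))  # ['Tokyo']
```

Storing dates as ISO-8601 strings lets plain string comparison double as date comparison, so the same question can return a different answer depending on the point in time being asked about.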

The killer feature: it is completely free, Apache 2.0 licensed, and requires zero external APIs. That alone explains why it went viral overnight.

Is the 96.6% Recall Score Legit? Benchmark Controversy Explained

MemPalace originally claimed a perfect 100% score on LongMemEval (500/500), later revised to 96.6% under community pressure. Here is what you need to know about these numbers:

The 96.6% measures recall_any@5 — whether the correct memory appears in the top 5 retrieved results, not whether the system answers correctly
Testing used the LongMemEval small variant, which is considerably easier than the full benchmark
The 100% hybrid v4 score was achieved by inspecting failing questions, writing targeted code fixes, and retesting on the same set — a serious methodological red flag
An independent developer reported that when MemPalace is plugged into an LLM for actual Q&A, the correct answer rate drops to roughly 17%

In short, retrieving the right memory and generating the right answer are two very different things. MemPalace performs well at the retrieval layer, but the gap between retrieval and correct generation remains significant.
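The difference between the two numbers is easier to see in code. The sketch below computes recall_any@5 as described above (a hit if any correct memory appears in the top 5 results) next to end-to-end answer accuracy. The question records and memory IDs are toy data invented for illustration, not LongMemEval items or MemPalace output.

```python
def recall_any_at_k(retrieved_ids, gold_ids, k=5):
    """Return 1.0 if any gold memory ID appears in the top-k retrieved IDs, else 0.0."""
    return 1.0 if set(retrieved_ids[:k]) & set(gold_ids) else 0.0


def mean(values):
    """Arithmetic mean that returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy data, invented for illustration only.
questions = [
    {"retrieved": ["m7", "m2", "m9", "m1", "m4"], "gold": ["m2"], "answer_correct": True},
    {"retrieved": ["m3", "m8", "m5", "m6", "m0"], "gold": ["m5"], "answer_correct": False},
    # The gold memory is ranked sixth, so it falls outside the top 5.
    {"retrieved": ["m1", "m4", "m2", "m7", "m3", "m9"], "gold": ["m9"], "answer_correct": False},
    {"retrieved": ["m6", "m0", "m8", "m2", "m1"], "gold": ["m1", "m5"], "answer_correct": False},
]

retrieval_recall = mean(recall_any_at_k(q["retrieved"], q["gold"]) for q in questions)
answer_accuracy = mean(1.0 if q["answer_correct"] else 0.0 for q in questions)

print(f"recall_any@5:    {retrieval_recall:.2f}")  # 0.75
print(f"answer accuracy: {answer_accuracy:.2f}")   # 0.25
```

In this toy set the retriever surfaces a correct memory for three of four questions, yet only one answer is right. That is the same kind of gap as the one between 96.6% and roughly 17%.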

What Is the Actual User Experience Like?

From a technical standpoint, MemPalace has genuine strengths:

Token compression: A custom format, described as lossless, reduces 1,000 tokens to roughly 120 and stays readable by both humans and LLMs without a special decoder
Zero cloud dependency: All data stays on your machine, ideal for privacy-sensitive workflows
Easy MCP setup: You can give Claude or ChatGPT cross-session memory in minutes
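As a rough worked example of what the quoted compression figure would mean in practice, the arithmetic below converts "1,000 tokens to roughly 120" into a ratio and estimates how many memories fit in a fixed prompt budget. The 8,000-token budget and both helper functions are assumptions chosen for illustration, not MemPalace settings.

```python
def compression_stats(original_tokens, compressed_tokens):
    """Return (compression ratio, fractional token reduction)."""
    ratio = original_tokens / compressed_tokens
    reduction = 1 - compressed_tokens / original_tokens
    return ratio, reduction


def memories_that_fit(budget_tokens, tokens_per_memory):
    """How many whole memories of a given size fit into a token budget."""
    return budget_tokens // tokens_per_memory


ratio, reduction = compression_stats(1000, 120)
print(f"{ratio:.1f}x smaller, {reduction:.0%} fewer tokens")

# Assumed budget: 8,000 tokens of the context window reserved for memories.
budget = 8000
print(memories_that_fit(budget, 1000))  # uncompressed memories: 8
print(memories_that_fit(budget, 120))   # compressed memories: 66
```

Under these assumptions, the same budget holds about eight times as many memories, which is where the practical value of the format would come from if the compression claim holds.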

The limitations are equally clear:

Initial indexing of large conversation histories is slow
Retrieval quality degrades noticeably on complex multi-hop reasoning queries
The project is young and stability needs more real-world validation

MemPalace vs Mem0 vs Zep: Which Should You Choose?

A quick comparison of the main AI memory solutions available today:

MemPalace: Free and open-source, local-only, 96.6% recall_any@5, but real-world Q&A accuracy needs verification
Mem0: Commercial solution, approximately 85% on LongMemEval, stable API but requires payment
Zep: Enterprise-grade, approximately 85% performance, full SDK and support

For individual developers or privacy-first use cases, MemPalace is worth trying. For production environments, it might be wise to wait.

FAQ

Can MemPalace Really Make AI Remember All Conversations?

MemPalace stores complete conversations in a local ChromaDB store and reports 96.6% retrieval recall (recall_any@5). However, retrieval and correct answer generation are different things, and actual Q&A accuracy may be significantly lower than the recall score suggests.

Does MemPalace Require Payment or Cloud APIs?

No. MemPalace is Apache 2.0 open-source and runs entirely locally. You need a machine capable of running Python and ChromaDB, nothing more.

Is the LongMemEval 100% Perfect Score Real?

This score is disputed. The 100% was achieved in hybrid v4 mode by fixing code specifically for failing questions and retesting. The corrected raw mode score is 96.6% (recall_any@5), which is still the highest among free tools.

Which AI Models Work With MemPalace?

Via MCP integration, MemPalace works with Claude, ChatGPT, and other major models. It also supports direct Python/CLI connections to local models served through Ollama.
