MemPalace: The AI Memory System That Refuses to Forget (or Summarize) Anything

Hook

Most AI memory systems throw away your original words within seconds, replacing them with compressed summaries. MemPalace takes the opposite bet: what if the cure for forgetful AI assistants is simply refusing to delete anything?

Context

If you've used Claude, ChatGPT, or any AI coding assistant for more than a few sessions, you've hit the context window wall. The AI forgets your project conventions, repeats questions about your architecture, and reintroduces bugs you fixed three conversations ago. The industry's answer has been memory systems—tools that extract 'key facts' and store compressed summaries to feed back into future conversations.

But summarization introduces a lossy compression problem. When an LLM condenses 'We decided against using Redis because the team lacks ops experience and the latency benefit is negligible for our read-heavy workload' into 'Project does not use Redis,' you lose the reasoning. Future retrieval returns the fact without context, and the AI makes confident but misaligned suggestions. MemPalace rejects this paradigm entirely. Instead of summarizing, it stores every conversation verbatim, organizes it with a spatial metaphor borrowed from ancient memory techniques, and uses hybrid search to surface exactly what you said—word-for-word—when you said it.

Technical Insight

MemPalace's architecture revolves around three design pillars: verbatim storage, structured organization via the palace metaphor, and a hybrid retrieval pipeline that combines semantic search with temporal and keyword signals.

The palace metaphor divides memory into wings (projects or people), rooms (topics within wings), and drawers (individual conversation chunks or code snippets). This isn't just whimsical naming; it's a scoped search strategy. When you query for authentication logic, MemPalace searches only the relevant wing (your current project) and room (backend architecture) rather than scanning a flat corpus of every conversation you've ever had. Narrowing the candidate set this way cuts down on false positives in multi-project environments, where similar discussions from unrelated projects would otherwise compete for the top results.
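
A minimal sketch of the scoping idea in plain Python: filter candidates by wing (and optionally room) metadata first, then rank only the survivors by cosine similarity. The `Drawer` class, the `scoped_search` function, and the toy three-dimensional vectors are hypothetical stand-ins, not MemPalace's API; a real system would use model embeddings, and a vector store such as ChromaDB can apply the same kind of metadata filter through the `where` argument of a collection query.

```python
import math
from dataclasses import dataclass


@dataclass
class Drawer:
    """Hypothetical stand-in for a stored chunk plus its palace metadata."""
    wing: str
    room: str
    text: str
    vector: list  # toy embedding; real systems use model-generated vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scoped_search(drawers, query_vec, wing, room=None, top_k=3):
    # Step 1: metadata filter, so other projects never enter the ranking.
    candidates = [
        d for d in drawers
        if d.wing == wing and (room is None or d.room == room)
    ]
    # Step 2: rank only the scoped candidates by semantic similarity.
    candidates.sort(key=lambda d: cosine(d.vector, query_vec), reverse=True)
    return candidates[:top_k]


drawers = [
    Drawer("acme-api", "auth-architecture", "JWT with refresh tokens for the mobile app.", [0.9, 0.1, 0.0]),
    Drawer("acme-api", "caching", "Decided against Redis for the read-heavy workload.", [0.1, 0.9, 0.0]),
    Drawer("side-project", "auth", "Session cookies are fine for this web-only app.", [0.95, 0.05, 0.0]),
]
hits = scoped_search(drawers, query_vec=[1.0, 0.0, 0.0], wing="acme-api")
for hit in hits:
    print(hit.wing, hit.room, hit.text)
```

The side-project drawer has the vector closest to the query, but it never competes because it lives in a different wing.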

Under the hood, MemPalace uses ChromaDB by default for vector storage, paired with local embeddings (no API keys required). Every conversation chunk gets embedded and stored alongside metadata: timestamp, wing/room IDs, speaker, and keyword tags. The retrieval pipeline ranks results using a weighted combination of cosine similarity, recency decay, and BM25 keyword matching. Here's how you'd initialize a wing and store a conversation:

```python
from mempalace import Palace, Wing, Room

# Initialize palace with local ChromaDB backend
palace = Palace(backend="chromadb", persist_directory="./my_palace")

# Create a wing for your project
project_wing = palace.create_wing(
    name="acme-api",
    description="E-commerce API rewrite"
)

# Create a room for authentication discussions
auth_room = project_wing.create_room(
    name="auth-architecture",
    description="Authentication and session management decisions"
)

# Store a conversation chunk verbatim
auth_room.add_memory(
    content="We're using JWT with refresh tokens. Access tokens expire in 15m, refresh in 7d. Decided against session cookies because the mobile app needs stateless auth.",
    speaker="alice",
    tags=["jwt", "tokens", "mobile"]
)

# Later: retrieve with hybrid search
results = auth_room.search(
    query="Why aren't we using session cookies?",
    top_k=5,
    rerank=True  # Optional LLM reranking for precision
)

for result in results:
    print(f"[{result.timestamp}] {result.content}")
    print(f"Score: {result.score}\n")
```

The search() method returns chunks ranked by relevance, but critically, it returns the exact original text. No paraphrasing, no summarization. If Alice said 'mobile app needs stateless auth,' that's what you get back—not a rephrased version that might miss nuance.
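
The article names the ranking signals (cosine similarity, recency decay, BM25) but not how they are combined. Here is a minimal, self-contained sketch of one plausible combination: Okapi BM25 with the common k1=1.5, b=0.75 defaults, cosine similarity, and an exponential recency decay, merged as a weighted sum. The weights, the 30-day half-life, dividing BM25 by its maximum to put it on a 0 to 1 scale, and the two-dimensional toy vectors are all assumptions for illustration, not MemPalace's actual parameters.

```python
import math
import re
from collections import Counter
from datetime import datetime, timedelta


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each document in `docs`."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    avgdl = sum(len(t) for t in doc_tokens) / n
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
            score += idf * norm_tf
        scores.append(score)
    return scores


def hybrid_rank(query, query_vec, memories, now,
                w_sem=0.6, w_kw=0.3, w_rec=0.1, half_life_days=30.0):
    """Rank memories by a weighted sum of semantic, keyword, and recency signals."""
    kw = bm25_scores(query, [m["text"] for m in memories])
    kw_max = max(kw) or 1.0  # rescale BM25 to [0, 1] so the weights are comparable
    ranked = []
    for memory, kw_score in zip(memories, kw):
        age_days = (now - memory["ts"]).total_seconds() / 86400
        recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        score = (w_sem * cosine(query_vec, memory["vec"])
                 + w_kw * kw_score / kw_max
                 + w_rec * recency)
        ranked.append((score, memory["text"]))  # text is returned verbatim
    return sorted(ranked, reverse=True)


now = datetime(2024, 6, 1)
memories = [
    {"text": "Decided against session cookies because the mobile app needs stateless auth.",
     "vec": [0.9, 0.1], "ts": now - timedelta(days=10)},
    {"text": "Access tokens expire in 15m, refresh tokens in 7d.",
     "vec": [0.7, 0.3], "ts": now - timedelta(days=1)},
    {"text": "Redis is not worth it for our read-heavy workload.",
     "vec": [0.1, 0.9], "ts": now - timedelta(days=2)},
]
for score, text in hybrid_rank("Why aren't we using session cookies?", [1.0, 0.0], memories, now):
    print(f"{score:.3f}  {text}")
```

The session-cookie memory ranks first because it wins on both the semantic and keyword signals, even though the token-expiry note is more recent; raising `w_rec` would shift that balance toward newer material.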

MemPalace's killer feature is its MCP (Model Context Protocol) integration. It exposes 29 tools that Claude Code and compatible assistants can invoke directly: create_wing, search_room, list_recent_memories, find_related_entities, and more. The auto-save hook is particularly clever: it files your prompts and Claude's responses into the appropriate wing and room without manual intervention:

```python
from mempalace.integrations import MCPServer

# Start MCP server for Claude Code
server = MCPServer(palace=palace)
server.enable_autosave(
    wing="acme-api",
    room="current-session",
    save_assistant_responses=True,
    save_user_prompts=True
)
server.start(port=8765)
```

Now every interaction with Claude gets filed into your palace automatically. When you return to the project weeks later, Claude can query the palace for 'what did we decide about caching?' and retrieve your exact conversation—timestamps, reasoning, and all.
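
Under the hood, MCP is JSON-RPC 2.0: an assistant invokes a server tool by sending a `tools/call` request that names the tool and its arguments, and text results come back in a `content` array. The sketch below builds and parses those messages with the standard library only. The `search_room` argument names (`wing`, `room`, `query`) and the sample response are hypothetical; an MCP server advertises its real tool schemas through `tools/list`.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def extract_text(response_json):
    """Return the text items from a tools/call response, or raise on an error."""
    message = json.loads(response_json)
    if "error" in message:  # protocol-level JSON-RPC error
        raise RuntimeError(message["error"].get("message", "unknown JSON-RPC error"))
    result = message["result"]
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError("tool reported an error")
    return [item["text"] for item in result["content"] if item.get("type") == "text"]


# Hypothetical arguments: check the server's tools/list output for the real schema.
request = make_tool_call(1, "search_room", {
    "wing": "acme-api",
    "room": "caching",
    "query": "what did we decide about caching?",
})

# A hand-written sample response, shaped like an MCP tool result.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Decided against Redis: the team lacks ops experience."}]},
})
print(request)
print(extract_text(sample_response))
```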

The temporal knowledge graph adds another layer. MemPalace maintains a SQLite database tracking entities (people, files, concepts) and their relationships over time. When you mention 'the Redis decision' across three different conversations, the graph links those references, enabling queries like 'show me all discussions where Alice disagreed with a technical decision.' This graph is separate from the vector store, so relationship queries are deterministic lookups that don't depend on embedding similarity.
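
The article says only that the graph lives in SQLite and tracks entities and relationships over time. One common way to model that is a triple table with validity intervals, so you can ask what was true on a given date and jump back to the verbatim conversation that established it. The schema, the predicate names (`uses`, `disagreed_with`), and the `memory_id` pointer are hypothetical, not MemPalace's actual tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL                -- e.g. 'person', 'file', 'concept'
);
CREATE TABLE relations (
    subject_id INTEGER NOT NULL REFERENCES entities(id),
    predicate  TEXT NOT NULL,         -- e.g. 'uses', 'disagreed_with'
    object_id  INTEGER NOT NULL REFERENCES entities(id),
    valid_from TEXT NOT NULL,         -- ISO-8601 date the fact became true
    valid_to   TEXT,                  -- NULL while the fact is still current
    memory_id  TEXT NOT NULL          -- pointer back to the verbatim drawer
);
"""


def entity_id(conn, name, kind):
    """Return the id for `name`, creating the entity on first mention."""
    conn.execute("INSERT OR IGNORE INTO entities (name, kind) VALUES (?, ?)", (name, kind))
    return conn.execute("SELECT id FROM entities WHERE name = ?", (name,)).fetchone()[0]


def add_fact(conn, subject, subject_kind, predicate, obj, obj_kind,
             valid_from, memory_id, valid_to=None):
    conn.execute(
        "INSERT INTO relations VALUES (?, ?, ?, ?, ?, ?)",
        (entity_id(conn, subject, subject_kind), predicate,
         entity_id(conn, obj, obj_kind), valid_from, valid_to, memory_id),
    )


def facts_as_of(conn, subject, date):
    """Facts about `subject` valid on `date` (YYYY-MM-DD strings sort correctly)."""
    return conn.execute(
        """
        SELECT r.predicate, o.name, r.memory_id
        FROM relations r
        JOIN entities s ON s.id = r.subject_id
        JOIN entities o ON o.id = r.object_id
        WHERE s.name = ? AND r.valid_from <= ?
          AND (r.valid_to IS NULL OR r.valid_to > ?)
        ORDER BY r.valid_from
        """,
        (subject, date, date),
    ).fetchall()


def memories_for(conn, subject, predicate):
    """Verbatim-memory ids for every fact where `subject` did `predicate`."""
    rows = conn.execute(
        """
        SELECT r.memory_id FROM relations r
        JOIN entities s ON s.id = r.subject_id
        WHERE s.name = ? AND r.predicate = ?
        ORDER BY r.valid_from
        """,
        (subject, predicate),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_fact(conn, "acme-api", "project", "uses", "redis", "concept", "2024-01-10", "m-001", valid_to="2024-03-02")
add_fact(conn, "alice", "person", "disagreed_with", "redis", "concept", "2024-02-20", "m-014")
add_fact(conn, "acme-api", "project", "uses", "postgres-cache", "concept", "2024-03-02", "m-022")

print(facts_as_of(conn, "acme-api", "2024-02-01"))
print(memories_for(conn, "alice", "disagreed_with"))
```

Because each fact carries a `memory_id`, a graph hit can be resolved back to the exact conversation text in the vector store rather than to a paraphrase.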

Benchmark transparency is rare in AI tooling, but MemPalace commits reproducible test suites to the repo. The reported 96.6% R@5 (recall at 5) on LongMemEval, a benchmark for long-term conversational memory, was achieved using semantic search alone, with no LLM calls for reranking or summarization. That matters: if the result holds up on your own data, you get top-tier retrieval without burning API credits or sending data to third parties.
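
R@5 asks whether the evidence needed to answer a question shows up among the top five retrieved items; it measures retrieval, not whether the final answer is correct. Definitions vary (some count a question as a hit if any evidence item is retrieved, others average the fraction of evidence items found), so the sketch below implements both. The function names and toy session IDs are illustrative; the benchmark's own evaluation scripts define the exact variant behind the 96.6% figure.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def hit_at_k(ranked_ids, relevant_ids, k=5):
    """1.0 if at least one relevant item appears in the top-k results, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def mean_metric(metric, runs, k=5):
    """Average a per-question metric over (ranked_ids, relevant_ids) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Toy retrieval runs: (ranked session ids, ids that contain the answer evidence).
runs = [
    (["s3", "s1", "s9", "s2", "s7", "s4"], {"s1"}),   # evidence at rank 2
    (["s8", "s6", "s5", "s2", "s0", "s4"], {"s4"}),   # evidence at rank 6: a miss at k=5
    (["s2", "s5", "s1"], {"s2", "s9"}),               # one of two evidence items found
]
print("recall@5:", mean_metric(recall_at_k, runs))
print("hit@5:   ", mean_metric(hit_at_k, runs))
```

On the same runs the two variants disagree, which is why a headline R@5 number is only comparable to others computed with the same definition.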

Gotcha

Verbatim storage is MemPalace's philosophical strength and its operational Achilles' heel. A multi-month project with daily AI pair-programming sessions can accumulate a very large archive of conversation text, plus an embedding for every chunk. Unlike summarization-based systems that compress aggressively, MemPalace keeps everything. The repo provides no built-in pruning, archival, or compression strategies; you're expected to manage storage yourself or accept the bloat. For teams with compliance requirements around data retention, this could be a feature, but for individual developers on laptops, it's a ticking storage bomb.
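
To gauge how fast a verbatim store grows on your own workload, a back-of-envelope estimate helps: every chunk costs its raw text plus one embedding vector plus some metadata. The sketch assumes 384-dimensional float32 embeddings (the output size of all-MiniLM-L6-v2, ChromaDB's default embedding model) and a flat per-chunk metadata overhead; it ignores vector-index and database overhead, so treat the result as a lower bound. The example inputs are placeholders, not measurements.

```python
def estimate_storage_bytes(num_chunks, avg_chars_per_chunk, embedding_dim=384,
                           metadata_bytes=256, bytes_per_char=1.0):
    """Lower-bound storage estimate for a verbatim store (text + vectors + metadata).

    Assumptions: float32 embeddings (4 bytes per dimension), a flat metadata cost
    per chunk, and mostly-ASCII text (bytes_per_char=1.0). Index and database
    overhead are not included.
    """
    text_bytes = avg_chars_per_chunk * bytes_per_char
    vector_bytes = embedding_dim * 4
    return int(num_chunks * (text_bytes + vector_bytes + metadata_bytes))


def human_readable(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


# Placeholder inputs: substitute your own session counts and chunk sizes.
days, chunks_per_day, avg_chars = 120, 400, 3000
total = estimate_storage_bytes(days * chunks_per_day, avg_chars)
print(human_readable(total))
```

The text term usually dominates once transcripts include pasted files or long tool output, so `avg_chars_per_chunk` is the input worth measuring first.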

The palace metaphor, while elegant for organization, introduces mandatory curation overhead. You can't just 'turn on' MemPalace and let it auto-organize your life. Every conversation needs a wing and room assignment. The system won't catastrophically fail if you dump everything into a single generic room, but you'll lose the scoped search advantages that make the architecture valuable. This works beautifully for disciplined teams with clear project boundaries; it's friction for solo developers juggling dozens of exploratory side projects.

Finally, the benchmarks focus heavily on conversational recall—'What did Alice say about authentication?'—but don't separately validate code-specific retrieval patterns. Developers often need narrow queries like 'Show me the error handling pattern we used for database timeouts' or 'Find that utility function for parsing ISO dates.' It's unclear whether MemPalace's semantic search handles these code-centric queries as effectively as prose conversations. The lack of code-aware chunking strategies (respecting function boundaries, docstrings, imports) suggests it may struggle with fragmented code snippets.
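
For Python code, the standard-library `ast` module is enough to sketch what boundary-respecting chunking could look like: split at top-level functions and classes, keep decorators with their definitions, gather imports into their own chunk, and record docstrings as searchable metadata. This is an illustrative alternative, not something MemPalace is documented to do; the chunk dictionary layout is hypothetical, and other languages would need their own parsers.

```python
import ast


def chunk_python_source(source):
    """Split Python source into chunks that follow top-level definition boundaries.

    Imports are gathered into one leading chunk; each function or class becomes
    its own chunk (decorators included) with its docstring kept as metadata.
    Comments that sit between top-level statements are not captured.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    import_lines, chunks = [], []
    for node in tree.body:
        # Start at the first decorator, if any, so "@cache\ndef f" stays together.
        start = min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])
        text = "\n".join(lines[start - 1:node.end_lineno])
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            import_lines.append(text)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({"kind": type(node).__name__, "name": node.name,
                           "docstring": ast.get_docstring(node), "text": text})
        else:
            chunks.append({"kind": "module_code", "name": None, "docstring": None, "text": text})
    if import_lines:
        chunks.insert(0, {"kind": "imports", "name": None, "docstring": None,
                          "text": "\n".join(import_lines)})
    return chunks


SAMPLE = '''import sqlite3
from functools import lru_cache

def parse_iso_date(value):
    """Parse an ISO-8601 date string."""
    from datetime import date
    return date.fromisoformat(value)

@lru_cache(maxsize=None)
def connect(path):
    return sqlite3.connect(path, timeout=5.0)
'''

for chunk in chunk_python_source(SAMPLE):
    print(chunk["kind"], chunk["name"], repr(chunk["docstring"]))
```

A query like 'that utility function for parsing ISO dates' can then match a whole function plus its docstring instead of a fixed-size window that cuts the definition in half.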

Verdict

Use MemPalace if you're running long-term AI assistant workflows (especially Claude Code) where context accuracy matters more than storage costs, you value privacy and want zero-API-key operation with local embeddings, or you manage multiple projects and need scoped search to avoid cross-contamination. The verbatim approach and benchmark transparency make it ideal for teams that need auditability: knowing exactly what the AI 'remembers', with no summarization drift.

Skip it if you need lightweight memory for ephemeral chats or single-session use cases where the palace structure adds more friction than value, you're storage-constrained and prefer aggressive summarization to keep data lean, or you require proven code-specific retrieval patterns rather than general conversational recall.

For solo developers, the curation overhead may outweigh the benefits unless you're deeply committed to structured knowledge management.
