MemPalace: The Local-First AI Memory System That Remembers Everything

Hook

What if the best AI memory system doesn’t use AI to decide what’s important? MemPalace achieves the highest benchmark score recorded for any free memory system by storing everything and letting structure do the work.

Context

Every conversation with an AI assistant is ephemeral by default. Claude forgets what you discussed yesterday. ChatGPT treats each session as a blank slate unless you pay for memory features. Local LLMs have context windows measured in tokens, not months. The industry’s answer has been AI-curated summaries—let the model compress your conversations into ‘important highlights’ and retrieve those when relevant.

But this approach has a fatal flaw: the AI decides what matters before you know what you’ll need. That joke you made about database migrations? Summarized away. The specific error message you mentioned in passing? Lost to compression. The conventional wisdom says you must sacrifice completeness for relevance, that storing everything creates noise.

MemPalace rejects this entirely. It’s a local-first memory system that stores every word you’ve ever said to an AI, compresses it 30x using a custom format called AAAK, and organizes it into a spatial hierarchy that achieved 96.6% R@5 on LongMemEval—the highest score for any free memory system. No cloud. No subscriptions. No AI deciding what you should remember.

Technical Insight

[Figure: System architecture (auto-generated). Input sources (conversations, code, docs) feed data miners (conversation/code/classification), which populate the spatial hierarchy (Wing → Hall → Room). Original content is stored verbatim in drawers; the AAAK compressor produces 30x-compressed summaries stored as vectors in closets (ChromaDB). Queries arrive via the MCP protocol or CLI/API, pass through a spatial filter and then vector search in the retrieval engine, and return exact content to cloud AI assistants or local models.]

The architecture is deliberately un-magical. MemPalace organizes memory as a spatial metaphor with five layers: wings (projects or people), halls (memory types like conversations or code), rooms (specific topics), closets (compressed summaries), and drawers (original verbatim content). This isn’t just cute naming—it’s a retrieval strategy. When you query memory, the system narrows spatially: which wing → which hall → which room → which closet → which drawer. Each layer filters the search space before vector similarity ever runs.
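As a rough sketch, the narrowing can be pictured as walking nested maps before any embedding math runs. The nested-dict layout and the narrow() helper below are invented for illustration, not MemPalace’s actual internals:

```python
# Illustrative sketch only: the structure and narrow() helper are invented here
# to show the idea of layer-by-layer filtering, not MemPalace's real data model.
palace = {
    "work_projects": {                          # wing: a project or person
        "ai_conversations": {                   # hall: a memory type
            "database_migrations": {            # room: a specific topic
                "closets": ["DB migr fail: FK constr !sat"],       # compressed
                "drawers": ["The database migration failed ..."],  # verbatim
            }
        }
    }
}

def narrow(palace, wing, hall, room):
    """Walk wing -> hall -> room; only the surviving closets reach vector search."""
    return palace[wing][hall][room]["closets"]

candidates = narrow(palace, "work_projects", "ai_conversations", "database_migrations")
print(candidates)
```

Each lookup discards every branch of the tree except one, so by the time similarity search runs, the candidate set is a single room’s closets rather than the whole store.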

The data mining happens in three phases. First, it ingests conversation exports from Claude, ChatGPT, or Slack, parsing them into structured exchanges. Second, it mines code repositories, extracting READMEs, docstrings, and commit messages. Third, it processes general classifications—documents, notes, anything text-based. Everything gets stored twice: once in its original form (drawers) and once in AAAK-compressed form (closets).
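Phase one can be sketched as a parser that turns a raw export into structured exchanges. The field names below ("messages", "role", "text") are assumptions for illustration; real Claude or ChatGPT exports have their own schemas:

```python
import json

# Minimal sketch of phase one: raw export -> structured (role, text) exchanges.
# The schema here is invented; actual export formats differ per vendor.
def parse_export(raw_json):
    data = json.loads(raw_json)
    return [(m["role"], m["text"]) for m in data.get("messages", [])]

raw = json.dumps({"messages": [
    {"role": "user", "text": "Why did the FK constraint fail?"},
    {"role": "assistant", "text": "The referenced row was deleted first."},
]})

exchanges = parse_export(raw)
print(exchanges)
```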

Here’s how you’d set up a basic memory palace and add a conversation:

from mempalace import Palace, Wing, Hall, Room
from mempalace.miners import ConversationMiner

# Initialize the palace with ChromaDB backend
palace = Palace(
    storage_path="./my_memory",
    compression_format="aaak",
    embed_model="all-MiniLM-L6-v2"  # Local embedding model
)

# Create structure: wing for work projects, hall for AI chats
work_wing = palace.add_wing("work_projects")
ai_hall = work_wing.add_hall("ai_conversations", memory_type="chat")

# Mine conversations from export file
miner = ConversationMiner(source="claude")
conversations = miner.parse("./claude_export.json")

for conv in conversations:
    # Extract topic and create/find appropriate room
    topic = conv.infer_topic()  # Uses local LLM or keyword extraction
    room = ai_hall.get_or_create_room(topic)
    
    # Store verbatim (drawer) and compressed (closet)
    room.add_drawer(conv.full_text, metadata=conv.metadata)
    room.add_closet(conv.compress("aaak"), summary_level="full")

# Query across all memory
results = palace.search(
    query="What did I say about database migration errors?",
    top_k=5,
    search_strategy="hierarchical"  # Narrows by wing→hall→room first
)

for result in results:
    print(f"Wing: {result.wing} | Room: {result.room}")
    print(f"Relevance: {result.score}")
    print(f"Content: {result.decompressed_text[:200]}...")

The AAAK compression format is where things get interesting. Traditional summarization uses an LLM to paraphrase content into fewer tokens. AAAK instead creates a shorthand dialect using aggressive abbreviations, symbol substitution, and structural compression. A sentence like “The database migration failed because the foreign key constraint wasn’t satisfied” becomes “DB migr fail: FK constr !sat”. The claim is 30x compression with zero information loss—a controversial assertion that means any LLM can expand the compressed text back to its original semantic meaning without special training.
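To make the mechanism concrete, here is a toy abbreviation-table compressor in the spirit of the article’s example. The mapping is invented and reproduces only that one sentence; it achieves nowhere near 30x, and real AAAK is presumably far more elaborate:

```python
# Toy shorthand compressor in the spirit of AAAK; the abbreviation table is
# invented for illustration. Order matters: multi-word patterns must be
# replaced before their substrings.
ABBREV = [
    ("because the ", ""),          # drop connectives entirely
    ("the ", ""),                  # drop articles
    ("database", "DB"),
    ("migration", "migr"),
    ("failed", "fail:"),
    ("foreign key", "FK"),
    ("constraint", "constr"),
    ("wasn't satisfied", "!sat"),
]

def compress(text):
    out = text.lower()
    for pattern, short in ABBREV:
        out = out.replace(pattern, short)
    return " ".join(out.split())   # collapse whitespace left behind by deletions

sentence = "The database migration failed because the foreign key constraint wasn't satisfied"
print(compress(sentence))  # DB migr fail: FK constr !sat
```

Note that this direction is trivially lossy at the character level; "zero information loss" can only mean an LLM recovers the original semantics from the shorthand, which is exactly the claim under dispute.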

The hierarchical search strategy is what drives the benchmark performance. Instead of dumping all vectors into ChromaDB and hoping similarity search finds the right needle, MemPalace first filters structurally. If you query about “API design decisions,” it identifies that this relates to your “backend_services” wing, the “technical_discussions” hall, and the “architecture” room before running vector search only within that subset. This structural pre-filtering is why the system achieves 34% better retrieval than flat vector stores according to the repo’s benchmarks.
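The effect of that pre-filtering can be sketched with invented data, using word overlap as a stand-in for real vector similarity. None of this is the repo’s code; it just shows how the scoped pool shrinks before scoring:

```python
# Sketch of structural pre-filtering with invented documents. A flat store
# scores every document; the hierarchical path scores only documents in the
# matched wing/hall/room.
DOCS = [
    {"wing": "backend_services", "hall": "technical_discussions",
     "room": "architecture", "text": "we chose REST over gRPC for the public API"},
    {"wing": "backend_services", "hall": "technical_discussions",
     "room": "architecture", "text": "API versioning decision: path-based /v1/"},
    {"wing": "personal", "hall": "chats",
     "room": "travel", "text": "API for booking flights was flaky"},
]

def score(query, text):
    # Stand-in for cosine similarity over embeddings: plain word overlap.
    return len(set(query.lower().split()) & set(text.lower().split()))

def search(query, wing=None, hall=None, room=None, top_k=5):
    pool = [d for d in DOCS
            if (wing is None or d["wing"] == wing)
            and (hall is None or d["hall"] == hall)
            and (room is None or d["room"] == room)]
    return sorted(pool, key=lambda d: score(query, d["text"]), reverse=True)[:top_k]

flat = search("API design decisions")                        # scores all 3 docs
scoped = search("API design decisions", wing="backend_services",
                hall="technical_discussions", room="architecture")
print(len(flat), len(scoped))  # 3 2
```

The unrelated travel conversation mentions "API" too, so a flat search must rank it out; the scoped search never sees it in the first place.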

For MCP integration (Model Context Protocol), MemPalace exposes tools that Claude, ChatGPT, or Cursor can call:

from mempalace.mcp import MCPServer

# Start MCP server for tool-based access
server = MCPServer(palace)
server.register_tools([
    "memory_search",      # Search across palace
    "add_memory",         # Store new information
    "get_room_context",   # Retrieve full room contents
    "list_wings"          # Browse structure
])

server.start(host="localhost", port=3000)

Cloud AI assistants call these tools during conversation. When you ask Claude “What were my thoughts on async Python?”, Claude calls memory_search, receives compressed results, decompresses them in context, and incorporates that history into its response. The entire retrieval loop happens locally—no data leaves your machine.
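That loop amounts to a dispatcher on the server side. The sketch below is hypothetical: a real MCP server speaks JSON-RPC over stdio or HTTP, and while the tool names mirror the article’s list, the handler bodies are stubs for illustration:

```python
# Hypothetical tool-dispatch sketch; real MCP servers speak JSON-RPC, and
# these handler bodies are stand-ins, not MemPalace's implementation.
def memory_search(args):
    return {"results": [f"compressed match for {args['query']!r}"]}

def list_wings(args):
    return {"wings": ["work_projects", "personal"]}

TOOLS = {
    "memory_search": memory_search,
    "list_wings": list_wings,
}

def handle_tool_call(name, args):
    """Invoked when the assistant emits a tool call mid-conversation."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](args)

print(handle_tool_call("memory_search", {"query": "async Python"}))
```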

The zero-API-dependency claim is literal. Embeddings use sentence-transformers locally. Compression doesn’t call OpenAI. Retrieval is pure ChromaDB. You can run MemPalace on a laptop with the network cable unplugged and it works identically to the cloud-connected version. This is the philosophical core: memory should be private infrastructure, not a SaaS subscription.

Gotcha

The AAAK compression claims strain credulity. Thirty-times compression with zero information loss is an extraordinary assertion that requires extraordinary evidence. The repo shows examples of compressed text, but there’s no independent verification, no academic paper, no systematic evaluation across diverse content types. Can it really compress a nuanced philosophical discussion 30x and have GPT-4 reconstruct the original semantics perfectly? Does it work equally well on code, poetry, and technical documentation? Until peer review confirms these numbers, treat AAAK as ‘promising shorthand format’ rather than ‘proven lossless compression.’ The entire value proposition rests on this unverified foundation.

The palace metaphor creates cognitive overhead that flat memory systems avoid. You must decide: is this conversation about project X or person Y? Does it belong in the ‘technical’ hall or ‘planning’ hall? What granularity should rooms have? These decisions matter because they affect retrieval. Put a conversation in the wrong room and the hierarchical search might miss it. The system provides tools to reorganize, but maintaining a coherent spatial structure across thousands of conversations requires discipline. Flat vector stores don’t care about your ontology—they just embed and retrieve. MemPalace demands you be an active librarian of your own memory.

Conversation mining supports Claude, ChatGPT, and Slack exports, but real chat history is messier. What about Discord threads? Telegram groups? Linear comments? In-person meetings you transcribed? The parsers expect specific JSON formats. Extending support means writing custom miners, and the repo’s documentation on the miner interface is sparse. If your memory lives in unsupported formats, you’re looking at implementation work before you can even start using the system.

Verdict

Use MemPalace if you’re committed to local-first AI workflows and the idea of subscription-based memory services offends you philosophically. It’s ideal for privacy-sensitive work (legal, medical, proprietary research), for users running local LLMs who want persistent memory without cloud dependencies, and for anyone willing to invest setup time to own their AI memory infrastructure completely. The hierarchical organization and total retention approach are genuinely novel, and the benchmark scores suggest the architectural decisions work. Skip it if you need plug-and-play simplicity, if the palace metaphor feels like unnecessary complexity for your use case, or if you’re skeptical of the AAAK compression claims and want proven technology. Also skip if your conversations live in formats beyond the supported exporters—you’ll spend more time writing parsers than benefiting from memory. The local-only constraint is either the best feature or a dealbreaker depending on your threat model and infrastructure preferences.
