Claude-Mem: Giving Claude Code a long-term memory
A persistent memory plugin that captures, compresses, and resurfaces context across Claude Code sessions. Here's what it does well and where it gets shaky.
You close a Claude Code session, open a new one, and the model has no idea what you spent the last three hours doing. You re-explain your project structure. You re-explain your naming conventions. You re-explain the database schema decision you already talked through twice. It’s like working with someone who has amnesia between meetings.
Claude-Mem is a plugin that tries to fix this. It records what Claude does during a session, compresses those observations, and feeds the compressed version back when you start a new session. Memory that outlasts the conversation window.
How it actually works
Five lifecycle hooks get installed into Claude Code: session start, prompt submission, after tool use, on stop, and session end. Each one captures what’s happening and writes it to a local SQLite database. Between sessions, an AI agent compresses the raw captures into summaries.
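The plugin's actual hook handlers are TypeScript running under Bun, but the shape of the idea fits in a few lines. Here's a hedged sketch in Python: each lifecycle hook appends one row to a local SQLite table, and compression happens later, out of band. The table and column names are illustrative, not Claude-Mem's real schema.

```python
import json
import sqlite3
import time

def init_db(path: str) -> sqlite3.Connection:
    # One append-only table of raw observations; names are hypothetical.
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS observations (
        id INTEGER PRIMARY KEY,
        session_id TEXT,
        hook TEXT,
        payload TEXT,
        ts REAL)""")
    return conn

def record(conn: sqlite3.Connection, session_id: str, hook: str, payload: dict) -> None:
    # Each hook (session start, prompt submit, post tool use, stop, session end)
    # writes one row; summarization runs between sessions, not here.
    conn.execute(
        "INSERT INTO observations (session_id, hook, payload, ts) VALUES (?, ?, ?, ?)",
        (session_id, hook, json.dumps(payload), time.time()),
    )
    conn.commit()

conn = init_db(":memory:")
record(conn, "sess-1", "post-tool-use", {"tool": "Edit", "file": "auth.py"})
```

The important design property is that capture is cheap and dumb: no summarization in the hot path, just rows landing in a local database.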
Next time you open a session, relevant history gets injected into context automatically. Claude just… knows what happened before. Or at least has a compressed version of it.
There’s a worker service on port 37777, running on Bun, with a search API (ten endpoints) and a web viewer for watching the memory stream build up. Underneath that sits a Chroma vector database doing hybrid semantic and keyword search. It’s a surprisingly deep stack for a plugin.
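Chroma handles the real hybrid retrieval, but the principle is easy to illustrate: score each stored observation with a blend of exact keyword overlap and a semantic similarity measure. The toy below uses bag-of-words cosine as a stand-in for embedding similarity; everything here is a sketch, not the plugin's actual scoring.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    keyword = len(set(q) & set(d)) / len(set(q))  # exact-term overlap
    semantic = cosine(q, d)  # stand-in for embedding-based similarity
    return alpha * keyword + (1 - alpha) * semantic

docs = ["refactored the auth module", "updated database schema docs"]
ranked = sorted(docs, key=lambda d: hybrid_score("auth refactor", d), reverse=True)
```

Blending the two signals is what makes hybrid search useful here: keyword matching catches exact identifiers like function names, while the semantic side catches paraphrases of what a session was about.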
Progressive disclosure is the interesting bit
Raw session data is enormous. You can’t just shove it all back into context. You’d burn through your token budget before the conversation starts.
So Claude-Mem does retrieval in three layers. You get a compact index first, just observation IDs and short labels. From there you can pull up the timeline around a specific observation. Only when you actually need the full record does it fetch everything.
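The three layers can be sketched as three functions over the same store, each more expensive than the last. Record fields and function names below are illustrative, assuming an in-memory stand-in for the SQLite database.

```python
# Toy in-memory store; field names are hypothetical.
RECORDS = {
    1: {"label": "refactor auth module", "ts": 100, "body": "full observation text"},
    2: {"label": "add schema migration", "ts": 200, "body": "full observation text"},
    3: {"label": "fix flaky test", "ts": 300, "body": "full observation text"},
}

def layer1_index() -> list[tuple[int, str]]:
    # Cheapest layer: just IDs and short labels, small enough to always inject.
    return [(rid, r["label"]) for rid, r in sorted(RECORDS.items())]

def layer2_timeline(rid: int, window: int = 150) -> list[int]:
    # Mid layer: which observations happened near this one in time.
    ts = RECORDS[rid]["ts"]
    return [i for i, r in RECORDS.items() if abs(r["ts"] - ts) <= window]

def layer3_full(rid: int) -> dict:
    # Most expensive layer: the full record, fetched only when actually needed.
    return RECORDS[rid]
```

The token savings come from the shape of the funnel: most queries stop at layer one or two, so full records only enter the context window after they've already been judged relevant.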
The project claims about 10x token savings over naive context injection. I haven’t verified that number, but the architecture makes it plausible. Filtering before fetching is the right instinct.
Getting it running
One command:
```
npx claude-mem install
```

That handles dependencies and plugin registration, and starts the worker. It installs Bun if you don't have it and sets up uv for the vector search layer.
Or install for Gemini CLI (auto-detects ~/.gemini):
```
npx claude-mem install --ide gemini-cli
```

Or install from the plugin marketplace inside Claude Code:
```
/plugin marketplace add thedotmack/claude-mem
/plugin install claude-mem
```

What gets stored
Every observation has a session ID, timestamp, and type. The compression step groups these into semantic summaries by topic and timeframe. Search indexes both raw observations and compressed versions, so you can query at whatever granularity you want.
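In Claude-Mem the grouping is done by an AI agent, but a crude heuristic version shows the data shape: bucket raw observations by topic, then emit one summary per topic spanning that group's timeframe. Field names here are illustrative.

```python
from collections import defaultdict

# Hypothetical raw observations, shaped like the fields described above.
observations = [
    {"session": "s1", "ts": 10, "type": "tool_use", "topic": "auth", "text": "edited login handler"},
    {"session": "s1", "ts": 20, "type": "tool_use", "topic": "auth", "text": "added token refresh"},
    {"session": "s1", "ts": 30, "type": "prompt", "topic": "schema", "text": "discussed index choice"},
]

def summarize(obs: list[dict]) -> dict:
    groups = defaultdict(list)
    for o in obs:
        groups[o["topic"]].append(o)
    # One compact summary per topic, covering that group's time span.
    return {
        topic: {
            "timeframe": (min(o["ts"] for o in items), max(o["ts"] for o in items)),
            "summary": "; ".join(o["text"] for o in items),
        }
        for topic, items in groups.items()
    }
```

Since search indexes both the raw rows and these summaries, a query can match at either granularity: the summary tells you a refactor happened, the raw rows tell you which files it touched.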
Configuration lives in ~/.claude-mem/settings.json, where you pick the compression model, change the worker port, set logging levels, that sort of thing.
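I haven't confirmed the exact key names, but a hypothetical settings.json covering the knobs mentioned above might look something like this (check the project's docs for the real schema):

```json
{
  "compressionModel": "<model-name>",
  "workerPort": 37777,
  "logLevel": "info"
}
```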
The privacy angle
You can wrap content in <private> tags and Claude-Mem skips it entirely. Good for sessions where you’re handling credentials or sensitive config. Not something every memory tool thinks about.
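The mechanics are presumably simple: strip anything between the tags before it's ever persisted. A minimal sketch, assuming the plugin does something like a regex pass over captured text (its actual handling may differ):

```python
import re

# Remove <private>...</private> spans before anything reaches the database.
PRIVATE = re.compile(r"<private>.*?</private>", re.DOTALL)

def scrub(text: str) -> str:
    return PRIVATE.sub("", text)

scrubbed = scrub("deploy key is <private>sk-12345</private>; rest is fine")
```

The key property is that scrubbing happens at capture time, not retrieval time: content inside the tags never touches disk, so there's nothing to leak from the database or the summaries later.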
There’s also a beta channel with something called “Endless Mode,” though I couldn’t find much documentation on what it actually does.
Where it gets shaky
Bun as a runtime dependency is one more thing to keep updated. The SQLite database is local only, no cross-machine sync. Windows users apparently need to watch their Node.js PATH setup.
But the real question is whether AI-compressed summaries preserve the right information. Compression is lossy. The system might remember that you refactored the auth module but forget why you picked that specific approach over the other two you considered. The raw data is still in the database if you dig for it, but the default path goes through summaries, and summaries flatten nuance. I keep going back and forth on whether that tradeoff is acceptable.
Who this is for
If you work on the same project across many Claude Code sessions, this solves a genuine problem. The constant re-explaining adds up, and having continuity between sessions changes how you approach each conversation: less throwaway, more iterative.
If your sessions are mostly one-off tasks, the overhead doesn’t make sense. A worker process, a database, a vector search layer. That’s a lot of infrastructure for context you’ll never look at again.
The project has over 1,500 commits and is AGPL-3.0 licensed (the ragtime directory uses a separate noncommercial license). Active development, active community.