Your AI assistant never forgets
— and doesn't need OpenAI to remember
We removed the biggest barrier to AI memory: the mandatory API key. Your coding assistant now remembers everything it learned — zero config, works offline, 3ms recall.
The real problem: every session starts from zero
You spent 45 minutes debugging a deploy issue with Claude. It found the fix, you shipped it, session over. Next day, same type of problem. New chat window. Claude has no idea what happened yesterday. It reads your entire codebase again, tries the same wrong approaches, and you spend another 30 minutes getting back to where you were.
This is the daily reality for every developer using AI coding assistants. Sessions are disposable. Knowledge is lost. Even if your assistant is brilliant within a session, it has the memory of a goldfish across sessions.
Before and after
| | Before | After (Cachly Brain) |
|---|---|---|
| New session | Reads entire codebase, asks same questions | Gets full briefing in one call: what happened, what's left, what worked |
| Same bug twice | Re-investigates from scratch | Recalls the fix in 3ms, applies it immediately |
| Chat window limit | 'Continue' = lost context, broken code, skipped tasks | Handoff saves TODOs, progress, broken files — next window picks up exactly |
| Team knowledge | Locked in one developer's head | Shared brain: everyone's fixes are everyone's knowledge |
| Setup | Need OpenAI API key + embedding config | Zero config. Works immediately. |
| Recall speed | ~2s (embedding API call + vector search) | 3ms (keyword search, no network) |
How we solved it: two search layers
Most AI memory systems rely entirely on vector embeddings — you send text to an API (OpenAI, Cohere, etc.), get a vector back, store it, and search by cosine similarity. This works well for semantic understanding but comes with a hard requirement: an API key.
We added a second search layer that runs without any external dependency. It uses BM25+ — the same algorithm family that powers Elasticsearch, Apache Solr, and every major search engine — adapted for the small, personal knowledge base that an AI brain typically holds (10-500 documents).
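The scoring behind that layer can be sketched in a few lines. This is a minimal BM25+ scorer over an in-memory corpus — an illustration of the algorithm family named above, not Cachly's actual implementation; `K1`, `B`, and `DELTA` are the standard textbook parameters.

```typescript
type Doc = { id: string; tokens: string[] };

const K1 = 1.2;
const B = 0.75;
const DELTA = 1.0; // the "+" in BM25+: a score floor for matching long documents

function bm25Plus(query: string[], docs: Doc[]): Map<string, number> {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.tokens.length, 0) / N;
  const scores = new Map<string, number>();

  for (const term of query) {
    const n = docs.filter(d => d.tokens.includes(term)).length;
    if (n === 0) continue; // term appears nowhere in the corpus
    const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1);

    for (const d of docs) {
      const f = d.tokens.filter(t => t === term).length;
      if (f === 0) continue; // BM25+ only rewards documents containing the term
      const norm = f + K1 * (1 - B + B * (d.tokens.length / avgdl));
      const tf = (f * (K1 + 1)) / norm;
      scores.set(d.id, (scores.get(d.id) ?? 0) + idf * (tf + DELTA));
    }
  }
  return scores;
}
```

At 10-500 documents this double loop is microseconds of work, which is where the no-network, single-digit-millisecond recall comes from.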
```shell
# Before: needed this in your .env
OPENAI_API_KEY=sk-... # $0.0001 per recall, requires internet

# After: just works
npx @cachly-dev/mcp-server setup
# Brain is active. No key needed. Done.
```

How it compares
| | Elasticsearch | pgvector | Cachly Brain |
|---|---|---|---|
| Setup | JVM + cluster + config | PostgreSQL + extension | npm install, done |
| Designed for | Millions of documents | Vector similarity | 10-500 personal docs |
| Latency | 10-50ms | 5-20ms | 2-3ms |
| External deps | JVM, 1GB+ RAM | Embedding API key | None |
| Fuzzy/typos | Yes (configurable) | Inherent (vector) | Yes (Levenshtein ≤ 2) |
| Semantic | BM25 only | Yes | Optional (both layers) |
| Recency boost | Manual function_score | No | Built-in (7-day half-life) |
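The "Levenshtein ≤ 2" row means a typo like `depoly` still finds `deploy`. A minimal sketch of such a bounded edit-distance check — illustrative only, not Cachly's actual implementation:

```typescript
// Returns true if a and b are within `max` single-character edits
// (insertions, deletions, substitutions) of each other.
function withinEditDistance(a: string, b: string, max = 2): boolean {
  if (Math.abs(a.length - b.length) > max) return false; // cheap early exit

  // Classic dynamic-programming table, kept as one rolling row.
  let prev = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = Math.min(
        prev[j] + 1,                                    // deletion
        cur[j - 1] + 1,                                 // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
    }
    prev = cur;
  }
  return prev[b.length] <= max;
}
```

On a vocabulary of a few hundred documents, even a brute-force scan with this check stays well inside the millisecond budget.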
The key insight: AI memory is not web search. You're not searching millions of documents — you're searching your 50 lessons and 30 context entries. At that scale, a well-tuned BM25+ with fuzzy matching beats the overhead of Elasticsearch and skips the embedding API key a pgvector pipeline depends on.
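The built-in recency boost from the comparison table works by decaying a result's score exponentially with its age. A sketch of a 7-day half-life weight — the function and field names here are assumptions for illustration, not Cachly's actual code:

```typescript
const HALF_LIFE_DAYS = 7;

// 1.0 for something used just now, 0.5 at 7 days old, 0.25 at 14 days, ...
function recencyWeight(ageMs: number): number {
  const ageDays = ageMs / (1000 * 60 * 60 * 24);
  return Math.pow(2, -ageDays / HALF_LIFE_DAYS);
}

function boostedScore(bm25Score: number, lastUsed: Date, now = new Date()): number {
  return bm25Score * recencyWeight(now.getTime() - lastUsed.getTime());
}
```

The effect: yesterday's fix outranks last quarter's, without ever dropping old knowledge entirely.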
The new handoff: context survives across chat windows
While we were at it, we solved another pain point: what happens when you hit the chat window limit? Before, "continue" meant praying the next window would figure out what was going on. Half-finished files, skipped tasks, broken code.
Now there's a session_handoff tool. Before closing a window, the assistant stores: which tasks are done, which remain, which files are partially edited, and specific instructions for the next window. The next session_start loads all of this automatically.
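Conceptually, the stored payload is just structured state. A hypothetical shape — the field names below are illustrative assumptions, not the tool's documented schema:

```typescript
interface Handoff {
  done: string[];                              // tasks completed this window
  remaining: string[];                         // tasks the next window should pick up
  needsFix: { file: string; note: string }[];  // partially edited files
  instructions: string;                        // free-form guidance for the next session
}

const handoff: Handoff = {
  done: ["API endpoint", "SDK types", "changelog"],
  remaining: ["Write tests for brain-search handler", "Deploy to production"],
  needsFix: [{ file: "handler.go", note: "missing error handling" }],
  instructions: "Run go test before deploying",
};
```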
```shell
# What the next window sees:
🤝 Handoff from previous window (12m ago):
⏳ Remaining tasks:
  - Write tests for brain-search handler
  - Deploy to production
✅ Already done: API endpoint, SDK types, changelog
⚠️ Needs fix: handler.go (partial: missing error handling)
📝 Instructions: Run go test before deploying
```

Get started in 30 seconds
```shell
npx @cachly-dev/mcp-server setup
```

That's it. The wizard detects your editors (VS Code, Cursor, Claude, Windsurf), writes the MCP config, and your AI assistant starts building memory from the first session. No API keys, no database setup, no configuration files.
If you later add an OpenAI key (or Mistral, Cohere, Gemini, Ollama), semantic vector search activates automatically on top. But it's never required.
The brain is always on. Every session makes your AI assistant smarter. Every fix it remembers is a fix you never debug twice. That's the point.