Your AI assistant never forgets
— and doesn't need OpenAI to remember
We removed the biggest barrier to AI memory: the mandatory API key. Your coding assistant now remembers everything it learned — zero config, works offline, 3ms recall.
The real problem: every session starts from zero
You spent 45 minutes debugging a deploy issue with Claude. It found the fix, you shipped it, session over. Next day, same type of problem. New chat window. Claude has no idea what happened yesterday. It reads your entire codebase again, tries the same wrong approaches, and you spend another 30 minutes getting back to where you were.
This is the daily reality for every developer using AI coding assistants. Sessions are disposable. Knowledge is lost. Even if your assistant is brilliant within a session, it has the memory of a goldfish across sessions.
Before and after
| | Before | After (Cachly Brain) |
|---|---|---|
| New session | Reads entire codebase, asks same questions | Gets full briefing in one call: what happened, what's left, what worked |
| Same bug twice | Re-investigates from scratch | Recalls the fix in 3ms, applies it immediately |
| Chat window limit | 'Continue' = lost context, broken code, skipped tasks | Handoff saves TODOs, progress, broken files — next window picks up exactly |
| Team knowledge | Locked in one developer's head | Shared brain: everyone's fixes are everyone's knowledge |
| Setup | Need OpenAI API key + embedding config | Zero config. Works immediately. |
| Recall speed | ~2s (embedding API call + vector search) | 3ms (keyword search, no network) |
How we solved it: two search layers
Most AI memory systems rely entirely on vector embeddings — you send text to an API (OpenAI, Cohere, etc.), get a vector back, store it, and search by cosine similarity. This works well for semantic understanding but comes with a hard requirement: an API key.
We added a second search layer that runs without any external dependency. It uses BM25+ — the same algorithm family that powers Elasticsearch, Apache Solr, and every major search engine — adapted for the small, personal knowledge base that an AI brain typically holds (10-500 documents).
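The scoring behind that layer can be sketched in a few lines. This is a minimal BM25+ scorer over an in-memory corpus — an illustration of the algorithm family named above, not Cachly's actual implementation; `K1`, `B`, and `DELTA` are the standard textbook parameters.

```typescript
type Doc = { id: string; tokens: string[] };

const K1 = 1.2;
const B = 0.75;
const DELTA = 1.0; // the "+" in BM25+: a score floor for matching long documents

function bm25Plus(query: string[], docs: Doc[]): Map<string, number> {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.tokens.length, 0) / N;
  const scores = new Map<string, number>();

  for (const term of query) {
    const n = docs.filter(d => d.tokens.includes(term)).length;
    if (n === 0) continue; // term appears nowhere in the corpus
    const idf = Math.log((N - n + 0.5) / (n + 0.5) + 1);

    for (const d of docs) {
      const f = d.tokens.filter(t => t === term).length;
      if (f === 0) continue; // BM25+ only rewards documents containing the term
      const norm = f + K1 * (1 - B + B * (d.tokens.length / avgdl));
      const tf = (f * (K1 + 1)) / norm;
      scores.set(d.id, (scores.get(d.id) ?? 0) + idf * (tf + DELTA));
    }
  }
  return scores;
}
```

At 10-500 documents this double loop is microseconds of work, which is where the no-network, single-digit-millisecond recall comes from.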
```shell
# Before: needed this in your .env
OPENAI_API_KEY=sk-... # $0.0001 per recall, requires internet

# After: just works
npx @cachly-dev/mcp-server setup
# Brain is active. No key needed. Done.
```

How it compares
| | Elasticsearch | pgvector | Cachly Brain |
|---|---|---|---|
| Setup | JVM + cluster + config | PostgreSQL + extension | npm install, done |
| Designed for | Millions of documents | Vector similarity | 10-500 personal docs |
| Latency | 10-50ms | 5-20ms | 2-3ms |
| External deps | JVM, 1GB+ RAM | Embedding API key | None |
| Fuzzy/typos | Yes (configurable) | Inherent (vector) | Yes (Levenshtein ≤ 2) |
| Semantic | BM25 only | Yes | Optional (both layers) |
| Recency boost | Manual function_score | No | Built-in (7-day half-life) |
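The "Levenshtein ≤ 2" row means a typo like `depoly` still finds `deploy`. A minimal sketch of such a bounded edit-distance check — illustrative only, not Cachly's actual implementation:

```typescript
// Returns true if a and b are within `max` single-character edits
// (insertions, deletions, substitutions) of each other.
function withinEditDistance(a: string, b: string, max = 2): boolean {
  if (Math.abs(a.length - b.length) > max) return false; // cheap early exit

  // Classic dynamic-programming table, kept as one rolling row.
  let prev = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = Math.min(
        prev[j] + 1,                                    // deletion
        cur[j - 1] + 1,                                 // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
    }
    prev = cur;
  }
  return prev[b.length] <= max;
}
```

On a vocabulary of a few hundred documents, even a brute-force scan with this check stays well inside the millisecond budget.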
The key insight: AI memory is not web search. You're not searching millions of documents — you're searching your 50 lessons and 30 context entries. At that scale, a well-tuned BM25+ with fuzzy matching beats the overhead of Elasticsearch and skips the embedding API key a pgvector pipeline depends on.
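The built-in recency boost from the comparison table works by decaying a result's score exponentially with its age. A sketch of a 7-day half-life weight — the function and field names here are assumptions for illustration, not Cachly's actual code:

```typescript
const HALF_LIFE_DAYS = 7;

// 1.0 for something used just now, 0.5 at 7 days old, 0.25 at 14 days, ...
function recencyWeight(ageMs: number): number {
  const ageDays = ageMs / (1000 * 60 * 60 * 24);
  return Math.pow(2, -ageDays / HALF_LIFE_DAYS);
}

function boostedScore(bm25Score: number, lastUsed: Date, now = new Date()): number {
  return bm25Score * recencyWeight(now.getTime() - lastUsed.getTime());
}
```

The effect: yesterday's fix outranks last quarter's, without ever dropping old knowledge entirely.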
The new handoff: context survives across chat windows
While we were at it, we solved another pain point: what happens when you hit the chat window limit? Before, "continue" meant praying the next window would figure out what was going on. Half-finished files, skipped tasks, broken code.
Now there's a session_handoff tool. Before closing a window, the assistant stores: which tasks are done, which remain, which files are partially edited, and specific instructions for the next window. The next session_start loads all of this automatically.
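Conceptually, the stored payload is just structured state. A hypothetical shape — the field names below are illustrative assumptions, not the tool's documented schema:

```typescript
interface Handoff {
  done: string[];                              // tasks completed this window
  remaining: string[];                         // tasks the next window should pick up
  needsFix: { file: string; note: string }[];  // partially edited files
  instructions: string;                        // free-form guidance for the next session
}

const handoff: Handoff = {
  done: ["API endpoint", "SDK types", "changelog"],
  remaining: ["Write tests for brain-search handler", "Deploy to production"],
  needsFix: [{ file: "handler.go", note: "missing error handling" }],
  instructions: "Run go test before deploying",
};
```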
```shell
# What the next window sees:
🤝 Handoff from previous window (12m ago):
⏳ Remaining tasks:
  - Write tests for brain-search handler
  - Deploy to production
✅ Already done: API endpoint, SDK types, changelog
⚠️ Needs fix: handler.go (partial: missing error handling)
📝 Instructions: Run go test before deploying
```

Get started in 30 seconds
```shell
npx @cachly-dev/mcp-server setup
```

That's it. The wizard detects your editors (VS Code, Cursor, Claude, Windsurf), writes the MCP config, and your AI assistant starts building memory from the first session. No API keys, no database setup, no configuration files.
If you later add an OpenAI key (or Mistral, Cohere, Gemini, Ollama), semantic vector search activates automatically on top. But it's never required.
The brain is always on. Every session makes your AI assistant smarter. Every fix it remembers is a fix you never debug twice. That's the point.