Memory Crystals: distilling team knowledge into instant AI context
A Brain with a hundred lessons is powerful — but loading all of them into every session is slow and noisy. Memory Crystals are our solution: a dense, pre-distilled snapshot of your team's most important knowledge, ready to inject the moment a session starts.
The scaling problem with AI memory
When we first built the AI Brain, we were focused on a simple problem: AI assistants don't remember anything between sessions. The fix was straightforward — store lessons, retrieve them at session start.
But as teams use the Brain over months, a new problem emerges. The Brain gets large. Not unusably large — a well-curated Brain with hundreds of lessons is genuinely valuable. But you can't inject hundreds of lessons into every session start. Context windows have limits, and beyond a certain size, the noise starts to overwhelm the signal.
The naive solution — just surface the top N lessons by severity and recall frequency — works up to a point. But it misses something: the lessons aren't independent. They have relationships, themes, and patterns that aren't visible when you look at them one by one. A good briefing isn't a list of facts — it's a coherent picture.
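For concreteness, the naive top-N selector might look like this (a hypothetical sketch; the `Lesson` fields and the scoring rule are assumptions, not cachly's actual ranking):

```python
from dataclasses import dataclass

@dataclass
class Lesson:
    text: str
    severity: int      # assumed scale, e.g. 1 (info) to 5 (critical)
    recall_count: int  # how often this lesson has been retrieved

def top_n(lessons: list[Lesson], n: int = 20) -> list[Lesson]:
    # Score each lesson in isolation: severity first, then recall frequency.
    return sorted(
        lessons,
        key=lambda l: (l.severity, l.recall_count),
        reverse=True,
    )[:n]
```

Note what this can't do: every lesson is scored on its own, so the relationships and themes between lessons never enter the picture.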
What a Crystal is
A Memory Crystal is a distillation of the entire Brain — or a subset of it — into a single dense document. Not a summary in the "here are the main points" sense, but a synthesized knowledge artifact that captures the patterns, the critical rules, the architectural decisions, and the hard-won lessons in a form optimized for rapid AI consumption.
Think of it like the difference between a textbook and a cheat sheet written by someone who has aced the exam. The textbook has everything; the cheat sheet has what actually matters, organized for fast recall under pressure.
Crystals are created explicitly — you decide when the Brain has enough accumulated knowledge to be worth distilling. Once created, a Crystal is injected at the very start of every session, before any task-specific context is loaded. Every AI assistant on your team starts every session pre-loaded with the Crystal.
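The injection order described above can be sketched as a simple concatenation, Crystal first (the helper name and joining format are illustrative, not cachly's API):

```python
def build_session_context(crystal: str, task_context: list[str]) -> str:
    # The Crystal always goes at the very front of the context window,
    # ahead of any task-specific material loaded later.
    return "\n\n".join([crystal, *task_context])
```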
The token economics
Context window space is a real constraint. Every token you spend loading context is a token you're not spending on actual work. This is why naive memory approaches fail at scale: they spend that scarce context indiscriminately.
Crystals are specifically optimized for token density. The distillation process isn't just compression — it's restructuring. Redundant lessons are merged. Contradictory lessons are resolved (the more recent, higher-severity version wins). Related lessons are grouped so the AI can process them as a unit rather than as isolated facts.
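The merge-and-resolve step could be sketched like this (all names and the grouping key are assumptions; a real distiller would group by semantic similarity, not an exact key):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Lesson:
    topic: str       # e.g. "deploy"
    subject: str     # what the lesson is about; naive grouping key
    text: str
    severity: int
    updated_at: int  # unix timestamp

def resolve(group: list[Lesson]) -> Lesson:
    # Contradiction rule from the text: the more recent,
    # higher-severity version wins.
    return max(group, key=lambda l: (l.updated_at, l.severity))

def distill(lessons: list[Lesson]) -> dict[str, list[Lesson]]:
    # Merge redundant/contradictory lessons per subject, then group
    # the survivors by topic so related lessons travel together.
    groups: dict[tuple[str, str], list[Lesson]] = defaultdict(list)
    for lesson in lessons:
        groups[(lesson.topic, lesson.subject)].append(lesson)
    crystal: dict[str, list[Lesson]] = defaultdict(list)
    for (topic, _), group in groups.items():
        crystal[topic].append(resolve(group))  # one survivor per subject
    return dict(crystal)
```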
Freshness: when to re-crystallize
A Crystal is a snapshot. As your team keeps learning and adding to the Brain, the Crystal gets stale. The lessons it doesn't yet contain are still available via normal retrieval — but they aren't in the pre-loaded context.
We solved this with two mechanisms. First, brain_doctor reports Crystal age and estimates freshness based on how many new lessons have been added since the last crystallization. When freshness drops below a threshold, it surfaces a recommendation.
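One way to estimate that freshness (the formula and the 0.8 threshold are assumptions for illustration; brain_doctor's actual heuristic isn't spelled out here):

```python
def freshness(lessons_at_crystallization: int, lessons_now: int) -> float:
    # Fraction of today's lessons that were already in the Brain
    # when the Crystal was cut.
    if lessons_now == 0:
        return 1.0
    return min(1.0, lessons_at_crystallization / lessons_now)

FRESHNESS_THRESHOLD = 0.8  # assumed cutoff for recommending a refresh

def should_recrystallize(at_crystallization: int, now: int) -> bool:
    return freshness(at_crystallization, now) < FRESHNESS_THRESHOLD
```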
Second, re-crystallization is cheap and fast. It's not a heavy operation that requires careful timing — you can re-crystallize as often as makes sense for your team's pace. Weekly is a reasonable cadence for most teams; daily is fine for fast-moving projects.
Scoped Crystals for large teams
As organizations get larger, a single Brain Crystal covers too much ground. A backend developer joining an incident investigation doesn't need the Crystal loaded with frontend styling conventions.
Crystals can be scoped by topic prefix — so the backend team might maintain a Crystal covering infrastructure, deployment, and API lessons, while the frontend team maintains one covering build tooling, performance patterns, and design system conventions. At session start, the right Crystal loads based on the session's declared focus area.
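Selecting a scoped Crystal by topic prefix could be as simple as the following (the Crystal names and prefixes are invented for illustration):

```python
SCOPED_CRYSTALS = {
    "backend": ("infra/", "deploy/", "api/"),
    "frontend": ("build/", "perf/", "design-system/"),
}

def select_crystal(focus: str) -> str:
    # Match the session's declared focus area against each Crystal's
    # topic prefixes; fall back to the broad, org-wide Crystal.
    for name, prefixes in SCOPED_CRYSTALS.items():
        if focus.startswith(prefixes):
            return name
    return "general"
```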
This scoping also helps with onboarding: new team members can choose to load a broader Crystal that covers the whole system at a higher level, then narrow to scoped Crystals as they specialize.
The analogy that helped us design it
When we were designing Crystals, we kept coming back to how expert developers think. A senior engineer who has been on a codebase for three years doesn't consciously recall every past incident before making a decision. They have pattern-matched, condensed versions of those experiences sitting in working memory — immediately accessible, highly compressed, ready to apply.
That's what a Crystal is for an AI assistant. Not a filing cabinet to search through, but a loaded mental model. The AI arrives at each session with the team's accumulated wisdom already active, in the same way a senior developer carries years of experience into every code review.
The Brain stores everything. The Crystal makes the most important parts of everything instantly available. Together, they give AI assistants something closer to genuine expertise — not just recall, but judgment.
cachly is a managed AI Brain for developers — persistent memory, team knowledge sharing, and semantic cache for Claude Code, Cursor, GitHub Copilot & Windsurf. One MCP server. 51 tools. Free tier, EU servers, no credit card.
Your AI is forgetting everything right now.
Every session starts blank. Every bug re-discovered. Every deploy procedure re-explained. cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.