cachly is a persistent AI memory platform for developers. It gives AI coding assistants like Claude Code, Cursor, GitHub Copilot and Windsurf a brain that remembers every lesson, fix, and architecture decision — forever. It connects via the MCP (Model Context Protocol) standard and includes 126 MCP tools. Free tier available. Runs on German (EU) servers.

How does cachly work?

Run 'npx @cachly-dev/mcp-server@latest autopilot' once. The wizard auto-detects every supported AI editor you have installed (Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Zed, Continue.dev) and writes the correct config for each. It then reads your entire git history with brain_from_git and loads years of team knowledge into your Brain before your first session. From that point on, sessions start automatically, memory is shared across all your editors simultaneously, and a git post-commit hook teaches cachly from every commit.

Does cachly auto-detect my editors?

Yes. The cachly setup wizard automatically detects Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Zed, and Continue.dev — any editor that supports MCP. It writes the correct config file for each editor in one pass. You never manually edit JSON config files.
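
For orientation only: MCP-capable editors such as Claude Code and Cursor generally read a JSON file containing an "mcpServers" map. The sketch below builds an entry of that general shape. The server key "cachly", the argument list, and the absence of any API key or environment settings are assumptions modelled on the npx command quoted on this page, not the wizard's actual output.

```python
import json


def mcp_entry() -> dict:
    """Return a hypothetical MCP server entry of the common 'mcpServers' shape."""
    return {
        "mcpServers": {
            # "cachly" is an assumed server name, chosen for illustration.
            "cachly": {
                "command": "npx",
                # Assumed args: "-y" skips the npx install prompt.
                "args": ["-y", "@cachly-dev/mcp-server@latest"],
            }
        }
    }


if __name__ == "__main__":
    # Print the entry as it might appear in an editor's MCP config file.
    print(json.dumps(mcp_entry(), indent=2))
```

The file name and location differ per editor, which is the part the wizard is described as handling for you.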

Is memory shared across all my AI editors?

Yes. cachly uses a single Brain that all your AI editors connect to simultaneously. A lesson remembered in Claude Code is instantly available in Cursor and GitHub Copilot. If your team uses different editors, all of you share the same persistent memory pool.

What is brain_from_git?

brain_from_git is a cachly tool that reads your entire git history before your first session and extracts lessons from every commit, PR, and revert. Your AI arrives knowing years of architectural decisions, bug fixes, and team conventions — without you writing a single line of documentation. Zero onboarding.

What is causal_trace?

causal_trace is a cachly tool that traces the history of any file or bug across your entire git history in seconds — replacing 30+ minutes of manual git blame. Describe a problem in plain English. It returns the root cause, the failure chain, and the exact fix that worked — with date, command, and file path.

What is brain_predict?

brain_predict is a cachly tool that scans your Brain for failure patterns before every deploy, migration, or dependency upgrade. It returns probability-weighted warnings based on your team's actual incident history — so you catch the next incident before it happens.

Does cachly work with Claude Code, Cursor, and GitHub Copilot?

Yes. cachly works with Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Zed, and Continue.dev — anywhere that supports MCP. Run 'npx @cachly-dev/mcp-server@latest autopilot' to configure all editors in one step. Memory is shared across all editors simultaneously.

Can cachly search memory across languages?

Yes. cachly uses semantic vector embeddings, not keyword search. A lesson stored in German appears when you search in English. A fix documented in Arabic matches a Japanese query about the same bug pattern. Supported languages include English, German, French, Spanish, Italian, Portuguese, Japanese, Chinese (Simplified and Traditional), Korean, Arabic, Hebrew, and more.
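
Why this can work across languages: a multilingual embedding model maps text with the same meaning to nearby vectors, whatever language it is written in. The sketch below illustrates the idea with cosine similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings, and the function names are hypothetical; this is not cachly's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written stand-ins: a real multilingual model would produce these vectors.
stored_lessons = {
    "Datenbank-Timeout nach Migration: Connection-Pool vergrößern": [0.9, 0.1, 0.2],
    "Use tabs, not spaces, in Makefiles": [0.1, 0.9, 0.1],
}

# Pretend embedding of the English query "database timeout after migration".
query_vector = [0.85, 0.15, 0.25]


def best_match(query, memory):
    """Return the stored text whose vector is closest to the query vector."""
    return max(memory, key=lambda text: cosine_similarity(query, memory[text]))


if __name__ == "__main__":
    print(best_match(query_vector, stored_lessons))
```

With these toy vectors the English query lands on the German lesson, because the comparison happens in vector space and never looks at the words themselves.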

How is cachly different from mem0?

mem0 is a memory layer for Python LLM apps and chatbots — great for building AI products. cachly is built specifically for developer tooling: it connects to your AI editor via MCP, learns from your git history automatically, predicts failures before deploy, and gives your whole team shared memory. cachly runs on EU servers and is GDPR-native. For developers using Claude Code, Cursor, or Copilot, cachly is the right choice.

Is cachly GDPR compliant?

Yes. cachly runs exclusively on German servers (Hetzner). All data stays in the EU. No data is shared with third parties. cachly is fully GDPR compliant. An AVV (Auftragsverarbeitungsvertrag / Data Processing Agreement) is available for Business and Enterprise customers.

SEMANTIC CACHE

Stop paying for the same answer twice.

Cachly Semantic Cache serves LLM responses based on what a prompt means — not whether it matches character-for-character. pgvector HNSW similarity search, sub-100ms lookups, GDPR-compliant on German servers.

No credit card · €0 to start

Why semantic beats exact-match

A traditional cache only helps when the input is byte-identical. Real prompts rarely are.

🎯

Match on meaning, not strings

Two prompts that mean the same thing get the same cached answer. Embeddings + pgvector HNSW find the nearest prior response above your similarity threshold.

💸

Every hit is a token you don't pay for

Repeated and near-duplicate prompts can make up a large share of an LLM bill. Serve them from cache and cut spend on your most expensive model calls.
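
A back-of-the-envelope way to estimate the effect is requests × hit rate × cost per call. The sketch below does that arithmetic. The function name and all figures are hypothetical, and it ignores embedding costs and the subscription price.

```python
def monthly_savings(requests: int, hit_rate: float, cost_per_call: float) -> float:
    """Model spend avoided when `hit_rate` of requests are served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return requests * hit_rate * cost_per_call


if __name__ == "__main__":
    # Hypothetical workload: 100,000 calls a month, 30% cache hits, EUR 0.01 per call.
    print(f"EUR {monthly_savings(100_000, 0.30, 0.01):.2f} avoided per month")
```

Your real hit rate depends on how repetitive your prompts are and on the similarity threshold you choose.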

⚡

Sub-100ms lookups

HNSW approximate-nearest-neighbour search returns a hit in single-digit milliseconds — far faster (and cheaper) than a fresh completion.

🇩🇪

GDPR by default

Cached prompts and completions stay on German Hetzner servers. Client-side PII redaction keeps sensitive strings out of the cache entirely.
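
As an illustration of what client-side redaction can look like, the sketch below masks email addresses and phone-like numbers before a prompt leaves your machine. The patterns and the redact function are assumptions for illustration. They are deliberately simple, will miss many kinds of PII, and are not the Cachly SDK's implementation.

```python
import re

# Simple email pattern: local part, "@", domain, dot, top-level domain.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Loose phone pattern: optional "+", then at least nine digits in total,
# optionally separated by spaces or dashes.
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact(prompt: str) -> str:
    """Replace emails and phone-like numbers with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    return PHONE.sub("[PHONE]", prompt)


if __name__ == "__main__":
    print(redact("Mail anna@example.com or call +49 30 1234567"))
```

Redacting before embedding means the sensitive string never reaches the cache, and prompts that differ only in such details become more likely to match each other.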

How it works

Point your client at Cachly

Wrap your existing LLM call with the Cachly SDK or route through the MCP server. No prompt changes.

We embed & search

Each prompt is embedded and compared against prior prompts with pgvector HNSW. Above threshold → cache hit.

Serve or store

A hit returns the stored completion in milliseconds. A miss calls your model, then caches the answer for next time.
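
The three steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the Cachly SDK: the SemanticCache class, toy_embed, and the 0.9 threshold are hypothetical, and a brute-force scan stands in for the pgvector HNSW index.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]

# Toy vocabulary for the stand-in embedding; a real system uses an embedding model.
VOCAB = ["reset", "password", "change", "refund", "order"]


def toy_embed(text: str) -> Vector:
    """Count vocabulary words in the text (a stand-in for a real embedding)."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the closest stored answer if it clears the threshold."""
        query = self.embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:  # brute force instead of HNSW
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        """Serve a hit from cache; on a miss, call the model and store the result."""
        cached = self.lookup(prompt)
        if cached is not None:
            return cached
        answer = call_model(prompt)
        self.entries.append((self.embed(prompt), answer))
        return answer


if __name__ == "__main__":
    cache = SemanticCache(toy_embed)
    model = lambda p: f"model answer for: {p}"
    print(cache.get_or_call("How do I reset my password?", model))  # miss
    print(cache.get_or_call("reset password how?", model))          # hit
```

The threshold is the main tuning knob. Set it too low and unrelated prompts share answers; set it too high and the cache behaves like exact match.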

Pricing that scales with your cache

Start free, upgrade for more storage and Dragonfly-backed sub-millisecond recall. Same account, same dashboard as every Cachly product.

Dev: €19/mo (€15/mo billed annually)
200 MB cache storage
pgvector HNSW similarity search
Sub-100ms lookups

Pro (Popular): €49/mo (€39/mo billed annually)
900 MB cache storage
pgvector HNSW similarity search
Sub-100ms lookups

Speed: €79/mo (€63/mo billed annually)
900 MB cache storage
pgvector HNSW similarity search
Dragonfly · sub-ms recall

Need a production HA cluster? See all plans →

🧠

Already caching? Give your team a shared Brain.

The same platform that caches your LLM calls can remember what your whole team has learned. Team Brain turns every fix, decision, and gotcha into auditable shared memory your AI tools recall automatically.

