Self-Host a Semantic LLM Cache in 5 Minutes
Your data never leaves your server. One docker compose command. No Kubernetes, no cloud account, no vendor lock-in.
Why self-host?
Three types of teams ask us for self-hosting every week:
- Healthcare / finance — data cannot leave a specific VPC or country
- Air-gapped environments — no outbound internet from the inference cluster
- Cost control — running on existing hardware that's already paid for
Cachly ships as a standard Docker image. The managed cloud version and the self-hosted version run identical code — the only difference is who operates it.
What you need
- Any Linux server with 1 GB RAM (or more)
- Docker + Docker Compose
- An embedding API key (OpenAI, Cohere, or Mistral) — or use a local model
No Kubernetes. No Helm charts. No cert-manager. If you can run docker compose up, you can run Cachly.
Quick start
Create a compose.yml:
```yaml
services:
  cachly-api:
    image: ghcr.io/cachly-dev/cachly-api:latest
    environment:
      SELF_HOSTED: "true"
      DATABASE_URL: postgres://cachly:secret@postgres:5432/cachly
      VALKEY_URL: redis://valkey:6379
      EMBEDDING_PROVIDER: openai
      EMBEDDING_API_KEY: sk-...
      ENCRYPTION_KEY: ${CACHLY_ENCRYPTION_KEY}
    ports:
      - "3001:3001"
    depends_on:
      - postgres
      - valkey
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: cachly
      POSTGRES_USER: cachly
      POSTGRES_PASSWORD: secret
    volumes:
      - pg_data:/var/lib/postgresql/data
  valkey:
    image: valkey/valkey:8-alpine
    volumes:
      - valkey_data:/data
volumes:
  pg_data:
  valkey_data:
```

Then:

```shell
export CACHLY_ENCRYPTION_KEY=$(openssl rand -hex 32)
docker compose up -d

# Create your first instance
curl -X POST http://localhost:3001/api/v1/instances \
  -H "Content-Type: application/json" \
  -d '{"name":"my-cache","tier":"free"}'
```

That's it. Cachly is running locally. Port 3001 serves the REST API — point your SDK at it.
Connect your app
Use any Cachly SDK; just override the base URL:
```python
# Python
from cachly import CachlyClient

client = CachlyClient(
    base_url="http://localhost:3001",
    api_key="your-self-hosted-key",
)
```

```typescript
// TypeScript
import { CachlyClient } from "@cachly-dev/sdk";

const client = new CachlyClient({
  baseUrl: "http://localhost:3001",
  apiKey: "your-self-hosted-key",
});
```

```python
# LangChain
import langchain
from langchain_community.cache import CachlySemanticCache

langchain.llm_cache = CachlySemanticCache(
    vector_url="http://localhost:3001/api/v1/sem/YOUR_TOKEN",
    threshold=0.92,
)
```

What SELF_HOSTED=true disables
The SELF_HOSTED=true flag disables everything cloud-specific:
| Feature | Managed | Self-Hosted |
|---|---|---|
| Stripe billing | ✅ | ❌ disabled |
| Kubernetes provisioning | ✅ | ❌ disabled |
| Multi-tenant isolation | ✅ | Single-tenant |
| Semantic cache (pgvector) | ✅ | ✅ |
| Exact cache (Valkey) | ✅ | ✅ |
| AI Brain (MCP tools) | ✅ | ✅ |
| REST API | ✅ | ✅ |
| All 15+ SDKs | ✅ | ✅ |
| Data sovereignty | EU/US/APAC | Your server |
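Under the hood, a semantic hit on either deployment is a nearest-neighbor lookup over stored prompt embeddings with a similarity threshold. A rough sketch of that decision, assuming cosine similarity (which pgvector supports; the real lookup happens inside the pgvector query, not in application code):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def semantic_hit(query_emb, cached, threshold=0.92):
    """Return the cached answer whose embedding best matches the query,
    but only if it clears the similarity threshold; otherwise miss."""
    best = max(cached, key=lambda e: cosine(query_emb, e["embedding"]), default=None)
    if best and cosine(query_emb, best["embedding"]) >= threshold:
        return best["answer"]
    return None  # miss: fall through to the LLM
```

The threshold plays the same role as the 0.92 in the LangChain snippet earlier: lower it for more hits at the cost of looser matches.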
Air-gapped setup (no internet)
If your inference server has no outbound internet, use a local embedding model instead of OpenAI:
```yaml
# In compose.yml, add Ollama as the embedding provider:
  cachly-api:
    environment:
      EMBEDDING_PROVIDER: ollama
      EMBEDDING_BASE_URL: http://ollama:11434
      EMBEDDING_MODEL: nomic-embed-text

  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama  # also declare ollama_data under the top-level volumes: key
```

nomic-embed-text produces 768-dim embeddings. Pull it once with docker exec ollama ollama pull nomic-embed-text and you're offline-ready.
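One gotcha when switching providers: the embedding dimension has to match the pgvector column, so a cache built with a 1536-dim OpenAI model can't be queried with 768-dim nomic-embed-text vectors. A hypothetical illustration (the table and column names are made up; Cachly manages its own schema, and the OpenAI model name here is an assumption):

```python
# Dimensions of two embedding models relevant to this guide.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,  # a common OpenAI embedding model
    "nomic-embed-text": 768,         # the Ollama model used for air-gapped setups
}


def vector_column_ddl(model: str, table: str = "semantic_cache") -> str:
    """DDL for a pgvector column sized to the chosen model (illustrative only)."""
    return f"ALTER TABLE {table} ADD COLUMN embedding vector({EMBEDDING_DIMS[model]});"
```

In practice this means re-embedding existing entries (or starting a fresh cache) whenever you change EMBEDDING_MODEL.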
Production checklist
- Set ENCRYPTION_KEY to a random 32-byte hex string (never commit it)
- Mount a persistent volume for Postgres — the pgvector index lives there
- Add nginx or Caddy in front for TLS termination
- Enable Valkey persistence (appendonly yes) for the L1 exact cache
- Set CORS_ORIGINS to your frontend domain
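For the TLS item, a minimal Caddy config is often the least effort, since Caddy provisions Let's Encrypt certificates automatically. The domain below is a placeholder:

```caddyfile
cache.example.com {
    # Terminate TLS here; forward plaintext to the Cachly API container
    reverse_proxy localhost:3001
}
```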
What's next
Once you have the cache running locally, check out the MCP server setup to give your AI coding assistant persistent memory backed by your self-hosted instance. All 30 Brain tools work identically against a local Cachly instance.
Need help with the air-gapped setup or a custom SLA? [email protected]
Cachly is a managed AI Brain for developers — persistent memory, team knowledge sharing, and semantic cache for Claude Code, Cursor, GitHub Copilot & Windsurf. One MCP server. 51 tools. Free tier, EU servers, no credit card.
Your AI is forgetting everything right now.
Every session starts blank. Every bug is re-discovered. Every deploy procedure is re-explained. Cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.