cachly is a persistent AI memory platform for developers. It gives AI coding assistants like Claude Code, Cursor, GitHub Copilot and Windsurf a brain that remembers every lesson, fix, and architecture decision — forever. It connects via the MCP (Model Context Protocol) standard and includes 126 MCP tools. Free tier available. Runs on German (EU) servers.

How does cachly work?

Run 'npx @cachly-dev/mcp-server@latest autopilot' once. The wizard auto-detects every supported AI editor you have installed (Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Zed, Continue.dev) and writes the correct config for each. It then reads your entire git history with brain_from_git and loads years of team knowledge into your Brain before your first session. From that point, sessions start automatically, memory is shared across all your editors simultaneously, and a git post-commit hook teaches cachly from every commit.

Does cachly auto-detect my editors?

Yes. The cachly setup wizard automatically detects Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Zed, and Continue.dev — any editor that supports MCP. It writes the correct config file for each editor in one pass. You never manually edit JSON config files.

Is memory shared across all my AI editors?

Yes. cachly uses a single Brain that all your AI editors connect to simultaneously. A lesson remembered in Claude Code is instantly available in Cursor and GitHub Copilot. If your team uses different editors, all of you share the same persistent memory pool.

What is brain_from_git?

brain_from_git is a cachly tool that reads your entire git history before your first session and extracts lessons from every commit, PR, and revert. Your AI arrives knowing years of architectural decisions, bug fixes, and team conventions — without you writing a single line of documentation. Zero onboarding.
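
To make the idea of mining commit history concrete, here is a minimal sketch of one possible first pass: sorting commit subjects into reverts, fixes, and everything else. It is not cachly's implementation. The function name classify_commits, the CommitNote type, and the keyword list are all hypothetical, and the input is assumed to be text produced by git log --pretty=format:"%h%x09%s" (short hash, a tab, then the subject).

```python
import re
from dataclasses import dataclass


@dataclass
class CommitNote:
    sha: str
    subject: str
    kind: str  # "revert", "fix", or "other"


def classify_commits(log_text: str) -> list[CommitNote]:
    """Classify each 'hash<TAB>subject' line by a simple keyword heuristic."""
    notes = []
    for line in log_text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        sha, subject = line.split("\t", 1)
        lowered = subject.lower()
        if lowered.startswith("revert"):
            kind = "revert"
        elif re.search(r"\b(fix|fixes|fixed|bug|hotfix)\b", lowered):
            kind = "fix"
        else:
            kind = "other"
        notes.append(CommitNote(sha.strip(), subject.strip(), kind))
    return notes


# Made-up sample input in the assumed format
sample = (
    'a1b2c3d\tRevert "Enable HTTP/2 on gateway"\n'
    "9f8e7d6\tFix race in session cache\n"
    "1234abc\tAdd pricing page\n"
)
notes = classify_commits(sample)
for note in notes:
    print(note.kind, note.sha, note.subject)
```

Reverts and fixes are the commits most likely to carry a lesson, which is why the sketch separates them first. A real extractor would also need the diff and the commit body.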

What is causal_trace?

causal_trace is a cachly tool that traces the history of any file or bug across your entire git history in seconds — replacing 30+ minutes of manual git blame. Describe a problem in plain English. It returns the root cause, the failure chain, and the exact fix that worked — with date, command, and file path.

What is brain_predict?

brain_predict is a cachly tool that scans your Brain for failure patterns before every deploy, migration, or dependency upgrade. It returns probability-weighted warnings based on your team's actual incident history — so you catch the next incident before it happens.

Does cachly work with Claude Code, Cursor, and GitHub Copilot?

Yes. cachly works with Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Zed, and Continue.dev — anywhere that supports MCP. Run 'npx @cachly-dev/mcp-server@latest autopilot' to configure all editors in one step. Memory is shared across all editors simultaneously.

Can cachly search memory across languages?

Yes. cachly uses semantic vector embeddings, not keyword search. A lesson stored in German appears when you search in English. A fix documented in Arabic matches a Japanese query about the same bug pattern. Supported languages include English, German, French, Spanish, Italian, Portuguese, Japanese, Chinese (Simplified and Traditional), Korean, Arabic, Hebrew, and more.
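
Why embeddings make cross-language search possible can be shown with a toy example. A multilingual embedding model maps texts with the same meaning to nearby vectors regardless of language, so retrieval reduces to ranking by cosine similarity. The sketch below uses hand-made three-dimensional vectors in place of a real model. The vectors, the recall helper, and the stored texts are illustrative assumptions, not cachly's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query_vec, store, top_k=1):
    """Return the top_k stored texts whose vectors are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Toy vectors standing in for the output of a multilingual embedding model
store = [
    ("Verbindungspool läuft bei hoher Last leer", [0.9, 0.1, 0.0]),
    ("Rename the billing module", [0.0, 0.2, 0.9]),
]

# Pretend embedding of the English query "connection pool exhausted under load"
query_vec = [0.8, 0.2, 0.1]
best = recall(query_vec, store)
print(best)
```

The German lesson ranks first because its vector points in nearly the same direction as the English query, even though the two share no keywords.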

How is cachly different from mem0?

mem0 is a memory layer for Python LLM apps and chatbots — great for building AI products. cachly is built specifically for developer tooling: it connects to your AI editor via MCP, learns from your git history automatically, predicts failures before deploy, and gives your whole team shared memory. cachly runs on EU servers and is GDPR-native. For developers using Claude Code, Cursor, or Copilot, cachly is the right choice.

Is cachly GDPR compliant?

Yes. cachly runs exclusively on German servers (Hetzner). All data stays in the EU. No data is shared with third parties. cachly is fully GDPR compliant. An AVV (Auftragsverarbeitungsvertrag / Data Processing Agreement) is available for Business and Enterprise customers.

AI Agent Persistent Memory with Cachly

The stateless agent problem

LangChain, AutoGen, and CrewAI are brilliant at reasoning. But they all share a fatal flaw: agents are stateless by default.

Run a 10-step CrewAI workflow on Monday. Run it again on Tuesday. Your agents start from zero. They'll make the same mistakes, ask the same clarifying questions, and rediscover the same solutions — again.

The typical workaround is to dump everything into the context window. That costs tokens. It can exceed context limits. It's not searchable. And it doesn't work across different agent runs or different team members' agents.

What you need is a persistent, queryable memory layer — separate from the context window, shared across runs, and fast enough to not be a bottleneck.

Cachly Agent Memory: how it works

Cachly provides three memory primitives for AI agents:

Key-Value Memory — store and retrieve any structured data by key. Persists across runs, restarts, and scale-out. Used for agent state, intermediate results, and workflow checkpoints.
Semantic Memory — store facts by content, retrieve by meaning. No exact key needed. CachlySemanticMemory uses pgvector for similarity search. Works offline with local embeddings via Ollama.
Chat History — persistent message store with TTL and session isolation. Drop-in replacement for ConversationBufferWindowMemory in LangChain.
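
The Chat History primitive combines two behaviours that are easy to conflate: expiry by TTL and isolation by session. The in-process sketch below models both so the semantics are visible. ToyChatHistory and its methods are hypothetical names for illustration only. It keeps data in a dict rather than in a persistent store, and it takes an injectable clock so expiry can be demonstrated without waiting.

```python
import time


class ToyChatHistory:
    """In-process stand-in for a chat store with TTL and session isolation."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._sessions = {}  # session_id -> list of (expires_at, role, text)

    def add(self, session_id, role, text):
        expires_at = self.clock() + self.ttl
        self._sessions.setdefault(session_id, []).append((expires_at, role, text))

    def messages(self, session_id):
        """Return live messages for one session, dropping expired ones."""
        now = self.clock()
        live = [m for m in self._sessions.get(session_id, []) if m[0] > now]
        self._sessions[session_id] = live
        return [(role, text) for _, role, text in live]


# A fake clock makes the expiry behaviour deterministic
fake_now = [1000.0]
history = ToyChatHistory(ttl_seconds=60, clock=lambda: fake_now[0])

history.add("user-42", "human", "Where is my order?")
history.add("user-7", "human", "Cancel my plan")
fake_now[0] += 30
history.add("user-42", "ai", "It shipped yesterday.")
fake_now[0] += 45  # the two earliest messages are now 75 seconds old

remaining = history.messages("user-42")
print(remaining)
```

Each message carries its own expiry time, so older turns age out of a session while newer ones remain, and one session never sees another session's messages.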

LangChain integration

Replacing the default in-memory store takes only a few lines:

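import os

from cachly_agents import CachlyMemory, CachlySemanticMemory
from langchain.agents import initialize_agent

# Assumed to be defined elsewhere: tools, llm, and openai_embed (your embedding function)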

# Persistent chat history — survives restarts and scale-out
memory = CachlyMemory(
    cachly_url=os.environ["CACHLY_URL"],
    session_id="user-42",
    ttl=86400 * 7,  # 7 days
)

# Semantic memory — recall by meaning, not key
semantic = CachlySemanticMemory(
    cachly_url=os.environ["CACHLY_URL"],
    namespace="product-knowledge",
    embed_fn=openai_embed,  # or None for free Ollama
)

# Store a fact
semantic.store("Our refund policy is 30 days, no questions asked")

# Recall by meaning in any future run
facts = semantic.recall("what is the return policy?", top_k=3)

# Standard LangChain agent with persistent memory
agent = initialize_agent(
    tools=tools,
    llm=llm,
    memory=memory,
    agent="conversational-react-description",
)

CrewAI: shared crew knowledge base

CrewAI agents can share a knowledge base — every researcher agent stores findings, every analyst reads them, every writer synthesizes them. Across runs. Across crew restarts. Across different deployments.

import os

from cachly_agents import CachlyCrewMemory

memory = CachlyCrewMemory(
    cachly_url=os.environ["CACHLY_URL"],
    crew_id="market-research-crew",
)

# Researcher agent stores findings
memory.store(
    key="competitor:acme:pricing",
    value={"price": 99, "tier": "pro", "updated": "2026-04-28"},
    tags=["competitor", "pricing"],
)

# Analyst agent reads across runs
findings = memory.search_by_tag("pricing")
snapshot = memory.get_snapshot()  # full crew context

# Writer agent recalls by meaning
relevant = memory.semantic_recall(
    "What do we know about competitor pricing?"
)

AutoGen: multi-agent memory with TTL

AutoGen's conversation model means agents share context through the message thread. But that context disappears when the conversation ends. Cachly adds a durable memory layer that persists the important parts:

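import os

from cachly_agents import CachlyAutoGenMemory
from autogen import AssistantAgent, UserProxyAgent

# Assumed to be defined elsewhere: llm_config, and conversation_result
# (the outcome of a finished conversation)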

memory = CachlyAutoGenMemory(
    cachly_url=os.environ["CACHLY_URL"],
    conversation_id="project-x-planning",
)

# Inject memory into system prompts
assistant = AssistantAgent(
    name="planner",
    system_message=(
        "You are a senior engineer. "
        + memory.get_context_prompt(top_k=10)
    ),
    llm_config=llm_config,
)

# After each conversation, store key decisions
memory.store_decisions(conversation_result)  # auto-extracted
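
What a context prompt built from stored decisions might look like can be sketched in plain Python. This is a guess at the general shape, not the behaviour of get_context_prompt. The function build_context_prompt, the decision record layout, and the sample decisions are all hypothetical.

```python
def build_context_prompt(decisions, top_k=10):
    """Format the top_k most recent decisions as text for a system message."""
    # ISO 8601 date strings sort chronologically when compared as text
    recent = sorted(decisions, key=lambda d: d["date"], reverse=True)[:top_k]
    if not recent:
        return ""
    lines = [f"- {d['date']}: {d['text']}" for d in recent]
    return "Decisions from earlier conversations:\n" + "\n".join(lines)


# Made-up decisions standing in for what a memory layer would return
decisions = [
    {"date": "2026-03-02", "text": "Use Postgres for the job queue"},
    {"date": "2026-04-11", "text": "Ship the API behind a feature flag"},
    {"date": "2026-01-20", "text": "Drop Python 3.8 support"},
]

system_message = "You are a senior engineer. " + build_context_prompt(decisions, top_k=2)
print(system_message)
```

Capping the list with top_k keeps the system message small, which matters because it is sent again on every turn of the conversation.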

Performance: why Valkey beats a vector DB

Most persistent memory solutions for agents use a vector database (Pinecone, Weaviate, Qdrant) for semantic search. That adds a network hop, a separate service to manage, and $30–200/month in additional costs.

Cachly stores semantic vectors in pgvector alongside your structured data. The semantic search endpoint has p99 latency of 3–8ms — fast enough to use on every agent turn without adding noticeable latency. For non-semantic lookups, Valkey delivers sub-millisecond reads.
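
Latency figures like these are worth checking against your own deployment. Below is a minimal sketch of how a p99 could be measured with the standard library. The helper measure_p99_ms is a hypothetical name, and the lookup being timed is a local dict standing in for whatever call you actually want to measure, so the number it prints says nothing about Cachly itself.

```python
import statistics
import time


def measure_p99_ms(fn, runs=200):
    """Call fn repeatedly and return the 99th percentile latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile
    return statistics.quantiles(samples, n=100)[-1]


# Stand-in workload: replace the lambda with the lookup you want to measure
cache = {f"key:{i}": i for i in range(10_000)}
p99 = measure_p99_ms(lambda: cache.get("key:4242"))
print(f"p99 latency: {p99:.4f} ms")
```

Measuring the tail rather than the mean matters for agents, because a memory lookup runs on every turn and the slowest calls are the ones users notice.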

For offline or air-gapped deployments, set CACHLY_EMBED_PROVIDER=ollama and point it at a model served by your local Ollama instance. No external API calls are made, and semantic search remains fully available.

Get started

pip install cachly-agents

# Then set:
# CACHLY_URL=redis://:token@host:port

# Optional semantic search:
# CACHLY_VECTOR_URL=https://api.cachly.dev/v1/sem/token
# CACHLY_EMBED_PROVIDER=openai   # or mistral, cohere, gemini, ollama
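
A malformed connection string is a common first stumbling block, so it can help to validate CACHLY_URL before handing it to a client. The sketch below uses only the standard library. The helper parse_cachly_url is a hypothetical name, the example host and token are placeholders, and accepting the rediss scheme (the usual convention for Redis over TLS) is an assumption, since the source shows only redis://.

```python
from urllib.parse import urlparse


def parse_cachly_url(url):
    """Split a redis://:token@host:port URL into its parts, or raise ValueError."""
    parts = urlparse(url)
    # Assumption: rediss (TLS) is accepted in addition to the documented redis scheme
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("CACHLY_URL needs both host and port")
    return {"host": parts.hostname, "port": parts.port, "token": parts.password or ""}


# Placeholder values; in practice read the URL from os.environ["CACHLY_URL"]
settings = parse_cachly_url("redis://:example-token@cache.example.internal:6379")
print(settings["host"], settings["port"])
```

The empty username before the colon is intentional: in this URL form the token travels in the password position.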

Free tier: 25 MB, 1 instance, no credit card. That's enough for a full CrewAI research workflow with semantic memory across dozens of runs.
