12,000 downloads in 23 days: MCP distribution and the memory pivot
An honest look at the numbers, the distribution channel, and the one repositioning that changed everything.
The numbers
@cachly-dev/mcp-server launched on npm on April 16, 2026. 23 days later: 11,953 downloads in the trailing 30-day window. Peak: 1,944 downloads in a single day (May 6).
That curve didn't come from a launch post — we didn't write one. No Product Hunt, no Reddit, no Show HN. Purely organic.
What we thought would work
We built cachly as a semantic cache for AI applications. The pitch: save LLM API costs by caching similar requests. Fast, EU-hosted, GDPR-compliant. The cache pricing angle was technically correct. It was also easy to defer. Nobody's LLM bill was a problem yet — and even if it was, setting up cache middleware requires a code change. High friction.
What actually drove installs: the pivot to memory
The developers who became power users weren't using cachly as a cache. They were using it as AI memory — storing lessons from debugging sessions, recalling past fixes, persisting context across editor restarts. We'd built memory infrastructure while trying to sell a cache.
The repositioning was blunt: “Your AI coding assistant forgets everything at the end of every session. cachly gives it a brain that persists.” Installs accelerated from that week onward.
The MCP distribution effect
MCP (Model Context Protocol) is the most efficient developer tool distribution channel we've seen. When you publish an MCP server on npm, you get discovered by:
- Claude's tool ecosystem (developers actively looking for tools)
- Cursor, Windsurf, Cline — all have MCP server discovery built in
- GitHub searches for “mcp server memory” and “mcp persistent context”
- “Top MCP servers” lists shared in developer communities
The search intent is already qualified. Someone who finds @cachly-dev/mcp-server while searching for “mcp server memory” has near-zero additional friction to install. One npx command, one config block, done.
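As a sketch of what that config block looks like — assuming the standard MCP stdio config shape used by clients like Claude Desktop and Cursor; the exact file location and server name vary by client:

```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"]
    }
  }
}
```

The client launches the server over stdio on startup, so there's no code change to the user's application — which is exactly why the friction is so much lower than cache middleware.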
What the Brain has learned (meta-lesson)
Here's a sample of what our own Brain instance has stored from running cachly in production:
| Topic | What worked |
|---|---|
| infra:clickhouse-ipv6 | listen_host = 0.0.0.0 + 127.0.0.1 in healthcheck |
| bash:macos-lowercase | tr '[:upper:]' '[:lower:]' not ${var,,} |
| fix:stripe-webhook | express.raw() must come before json() middleware |
| deploy:docker | --force-recreate prevents stale container issues |
| fix:nextauth-session | Middleware must export config with matcher |
| fix:playwright-port | E2E uses port 4000, prod uses 3000 — use env var |
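To make one of those rows concrete: the bash:macos-lowercase lesson exists because macOS ships bash 3.2, while the ${var,,} lowercasing expansion was only added in bash 4. A minimal portable sketch of the fix the Brain stored:

```shell
#!/bin/sh
# ${var,,} fails on macOS's default bash 3.2; tr works on any POSIX shell.
var="Stripe-Webhook-Fix"
lower=$(printf '%s' "$var" | tr '[:upper:]' '[:lower:]')
echo "$lower"   # stripe-webhook-fix
```

It's a tiny fix, but exactly the kind an assistant re-derives from scratch every session unless something remembers it.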
Every new session — on any machine, by any engineer — benefits from these. That's the value proposition in practice.
What's next
23 days and ~12k downloads is a start. The users who benefit most — the ones who've recalled a critical fix that saved an hour of debugging — are the ones who stay and upgrade. The product bet: if the Brain is useful enough, developers will never want to work without it.
cachly is a managed AI Brain for developers — persistent memory, team knowledge sharing, and semantic cache for Claude Code, Cursor, GitHub Copilot & Windsurf. One MCP server. 51 tools. Free tier, EU servers, no credit card.
Your AI is forgetting everything right now.
Every session starts blank. Every bug re-discovered. Every deploy procedure re-explained. cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.