Free, Private Embeddings for Your AI Dev Brain — Powered by Ollama
We now run a local embedding model on our infrastructure. Zero API keys. No prompt data leaves Germany.
The problem we kept hearing
When we launched the Cachly AI Dev Brain — persistent cross-session memory for AI coding assistants — the most common question was:
Short answer: you shouldn't have to. Memory is infrastructure. You don't pay Stripe every time you write to a database. So we fixed it.
What we built
We now run nomic-embed-text via Ollama directly on our cachly infrastructure (Hetzner, Germany). Every embedding operation for the AI Brain — index_project, smart_recall, learn_from_attempts, remember_context — goes through this local model.
No OPENAI_API_KEY needed, just your CACHLY_JWT.
nomic-embed-text: Why we chose it
| Model | Dims | Size | MTEB Score | License |
|---|---|---|---|---|
| nomic-embed-text ★ | 768 | 274 MB | 62.4 | Apache 2.0 |
| text-embedding-3-small | 1536 | API | ~62.0 | OpenAI ToS |
| text-embedding-ada-002 | 1536 | API | 60.9 | OpenAI ToS |
| all-MiniLM-L6-v2 | 384 | 91 MB | 56.3 | Apache 2.0 |
nomic-embed-text scores on par with the OpenAI embedding models listed above (text-embedding-3-small and text-embedding-ada-002), is fully open-source under Apache 2.0, and fits under 300 MB. That makes it practical to run on existing server infrastructure without a GPU.
How it works under the hood
MCP Tool Call: smart_recall("fix deploy")
↓
cachly MCP Server (npm)
↓ HTTP
cachly API (Go) — POST /api/v1/semantic/search
↓
EmbedHandler.Embed("fix deploy")
↓ HTTP (internal Docker network)
Ollama: POST http://ollama:11434/api/embeddings
↓
nomic-embed-text → 768-dim float32 vector
↓
pgvector: SELECT ... ORDER BY embedding <=> $1 LIMIT 10
↓
Top-k lessons returned to your AI assistant
Everything runs on a single Hetzner CPX32 node in Germany. The Ollama container uses at most 700 MB of RAM and idles at about 62 MB between requests.
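The embedding and ranking steps in the flow above can be sketched in a few lines of Python. This is only an illustration, not cachly's actual Go implementation. The helpers `embed`, `cosine_distance`, and `top_k` and the injectable `post` transport are hypothetical names, and the ranking is done in memory purely to show what ordering by pgvector's `<=>` (cosine distance) operator computes. The request body follows Ollama's `/api/embeddings` endpoint, which accepts a `model` and a `prompt` and returns an `embedding` array.

```python
import json
import math
import urllib.request

# Internal Docker address taken from the diagram above.
OLLAMA_URL = "http://ollama:11434/api/embeddings"


def http_post(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def embed(text, post=http_post):
    """Ask Ollama for an embedding of `text` (hypothetical helper).

    `post` is injectable so the function can be exercised without a server.
    """
    body = post(OLLAMA_URL, {"model": "nomic-embed-text", "prompt": text})
    return body["embedding"]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=10):
    """In-memory stand-in for ORDER BY embedding <=> $1 LIMIT k.

    `rows` is a list of (lesson, embedding) pairs.
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [lesson for lesson, _ in ranked[:k]]
```

In the hosted setup the ranking runs inside Postgres rather than in application code, so only the `embed` step would correspond to an actual network call.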
Bring your own model (optional)
If you'd rather use your own embedding provider, set these in your MCP config:
{
  "env": {
    "CACHLY_JWT": "your-jwt",
    "CACHLY_BRAIN_INSTANCE_ID": "your-instance-id",
    "CACHLY_EMBED_PROVIDER": "openai",
    "OPENAI_API_KEY": "sk-..."
  }
}
The Brain switches to your provider automatically when CACHLY_EMBED_PROVIDER is set. Otherwise it uses our hosted nomic-embed-text instance.
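The selection rule described above might look roughly like the following. This is a sketch under stated assumptions, not the MCP server's real code: the function name `resolve_embed_provider`, the `"hosted"` label, and the decision to raise an error when the key is missing or the provider is unknown are all hypothetical. Only the environment variable names and the `openai` value come from the config above.

```python
import os


def resolve_embed_provider(env=None):
    """Pick an embedding backend from MCP env vars (hypothetical logic)."""
    env = os.environ if env is None else env
    provider = env.get("CACHLY_EMBED_PROVIDER", "").strip().lower()

    if not provider:
        # Nothing configured: use the hosted nomic-embed-text instance.
        return {"provider": "hosted", "model": "nomic-embed-text"}

    if provider == "openai":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            # Assumption: a missing key is treated as a configuration error.
            raise ValueError("CACHLY_EMBED_PROVIDER=openai requires OPENAI_API_KEY")
        return {"provider": "openai", "api_key": api_key}

    # Assumption: this sketch only knows the one provider shown in the config.
    raise ValueError(f"unsupported embedding provider: {provider!r}")
```

Called with an empty mapping, the function returns the hosted default. Called with the config shown above, it returns the OpenAI settings.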
Get started
npx @cachly-dev/mcp-server@latest autopilot
No OPENAI_API_KEY needed. Just your CACHLY_JWT.
cachly is a persistent AI Brain for developers — memory shared across Claude Code, Cursor, GitHub Copilot & Windsurf simultaneously. Auto-detects every editor. Bootstraps from your git history. 115 MCP tools. Free tier, EU servers, no credit card.
Your AI is forgetting everything right now.
Every session starts blank. Every bug re-discovered. Every deploy procedure re-explained. cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.