3-Layer AI Memory System
The first cache that teaches your AI to learn from its own work. Set up once — your AI assistant remembers solutions, architecture decisions, and lessons across every session, automatically.
How the 3 Layers Work Together
Storage — your Valkey instance
Persistent brain. Stores all lessons, context, and architecture decisions. Survives restarts, context window resets, and new chat sessions. Lives on your cachly instance.
Tools — the memory API
`learn_from_attempts` saves what worked and what failed. `recall_best_solution` retrieves the best known solution before any task. `smart_recall` finds context by meaning — no exact key needed.
Autopilot — copilot-instructions.md
A single file in your repo that tells GitHub Copilot, Claude, and Cursor to run the memory tools automatically — before every task to check memory, and after every task to save lessons. Zero manual effort.
Result: Your AI never solves the same problem twice.
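Conceptually, the learn/recall cycle behaves like the in-memory sketch below. This is an illustration only — the real tools persist lessons to your Valkey instance, and the `Lesson` shape plus the "latest success wins" rule are assumptions of the sketch, not cachly's internals:

```python
# Illustrative in-memory sketch of the learn/recall cycle.
# Not the cachly implementation — real lessons live in Valkey.
from dataclasses import dataclass, field

@dataclass
class Lesson:
    outcome: str        # "success" or "failure"
    what_worked: str
    what_failed: str

@dataclass
class MemoryStore:
    lessons: dict = field(default_factory=dict)  # topic -> [Lesson, ...]

    def learn_from_attempts(self, topic, outcome, what_worked="", what_failed=""):
        self.lessons.setdefault(topic, []).append(
            Lesson(outcome, what_worked, what_failed))

    def recall_best_solution(self, topic):
        # Assumed policy for the sketch: latest successful attempt wins.
        for lesson in reversed(self.lessons.get(topic, [])):
            if lesson.outcome == "success":
                return lesson.what_worked
        return None

mem = MemoryStore()
mem.learn_from_attempts("deploy:web", "failure", what_failed="missing env var")
mem.learn_from_attempts("deploy:web", "success", what_worked="set DATABASE_URL first")
print(mem.recall_best_solution("deploy:web"))  # set DATABASE_URL first
```

The point of the pattern: failures are stored alongside successes, so the recall step can skip known dead ends, not just replay known wins.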
Quick Setup — 3 Steps
Or use the MCP tool setup_ai_memory to generate everything automatically in one command.
Choose your embedding provider
Add to .mcp.json — works with Claude, Cursor, GitHub Copilot, Windsurf
```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": [
        "-y",
        "@cachly-dev/mcp-server@latest"
      ],
      "env": {
        "CACHLY_JWT": "your-api-token",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Replace `your-api-token` with your token from cachly.dev/instances → Settings.
Activate Layer 3 — the Autopilot
Create .github/copilot-instructions.md in your project. This tells GitHub Copilot, Claude, and Cursor to run memory tools automatically — before and after every task.
Fastest: use the MCP tool (one command)
Tell your AI assistant:
```
setup_ai_memory(
  instance_id = "your-instance-uuid",
  project_dir = "/path/to/your/project",
  embed_provider = "openai"
)
```
This writes .github/copilot-instructions.md directly to your project and prints the full setup summary.
Or copy manually:
```markdown
# cachly AI Memory — 3-Layer Autopilot

## BEFORE every task
1. Call `recall_best_solution("topic")` — check for known solutions
2. Call `smart_recall("description")` — find context by meaning
3. If found, use it directly — skip re-discovery

## AFTER every task
1. Call `learn_from_attempts(topic, outcome, what_worked, what_failed)`
2. Call `remember_context("key", "analysis", category)` for code findings
```

✅ You're set up.
Your AI assistant will now automatically check its memory before every task and save lessons after. First things to try:
- `recall_best_solution("deploy:web")` — before deploying, check if it's been done before
- `learn_from_attempts(topic="debug:auth", outcome="success", what_worked="...")` — after fixing a bug, save for next time
- `smart_recall("how does the database schema work")` — find cached architecture notes by meaning
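That before/after loop can be sketched in a few lines. `run_task`, `do_work`, and the plain dict below are illustrative stand-ins for the MCP tools and the Valkey store, not cachly APIs:

```python
# Sketch of the autopilot loop the instructions file enforces:
# check memory before the task, save a lesson after.
def run_task(topic, do_work, memory):
    known = memory.get(topic)   # stands in for recall_best_solution("topic")
    if known is not None:
        return known            # hit: skip re-discovery entirely
    result = do_work()          # miss: solve it the slow way once
    memory[topic] = result      # stands in for learn_from_attempts(...)
    return result

memory = {}
calls = []

def solve():
    calls.append("solved")      # expensive discovery happens here
    return "use blue-green deploy"

print(run_task("deploy:web", solve, memory))  # use blue-green deploy
print(run_task("deploy:web", solve, memory))  # same answer, from memory
print(len(calls))                             # 1 — the hard work ran once
```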
All 30 MCP Tools
The full tool surface — available in GitHub Copilot, Claude, Cursor, Windsurf, and any MCP-compatible AI tool.
🧠 AI Memory (the killer feature)
- `learn_from_attempts` — Store a lesson: what worked, what failed, the root cause
- `recall_best_solution` — Retrieve the best known solution for a topic before attempting it
- `smart_recall` — Semantic search over cached context by natural language
- `remember_context` — Save architecture notes, file summaries, analysis
- `recall_context` — Get exact context by key (supports glob: `file:*`)
- `list_remembered` — See everything the AI has already cached
- `forget_context` — Delete stale context entries
- `setup_ai_memory` — One-shot setup: generates .mcp.json + copilot-instructions.md
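How "recall by meaning" can work is sketched below: each note is stored with an embedding vector, and the closest one to the query by cosine similarity wins. The toy vectors and note keys are invented for illustration — a real setup gets vectors from the embedding provider you configured:

```python
# Hedged sketch of semantic recall (assumed mechanics, not cachly's internals).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings; a real setup would call an embedding API per note.
notes = {
    "db:schema": ([0.9, 0.1, 0.0], "users table has a soft-delete flag"),
    "auth:flow": ([0.1, 0.9, 0.2], "JWT refresh happens in middleware"),
}

def smart_recall(query_vec):
    # Return the (key, text) of the nearest note by cosine similarity.
    key, (vec, text) = max(notes.items(),
                           key=lambda kv: cosine(query_vec, kv[1][0]))
    return key, text

print(smart_recall([0.8, 0.2, 0.1]))  # the 'db:schema' note wins
```

This is why no exact key is needed: the lookup compares meanings (vectors), not strings.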
🔍 Semantic Cache
- `semantic_search` — Find cached entries by meaning using hybrid semantic + keyword search
- `detect_namespace` — Classify a prompt into a semantic namespace in <0.1ms
- `cache_warmup` — Pre-warm the semantic cache with prompt/value pairs
- `index_project` — Index your codebase semantically for file discovery
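The "hybrid semantic + keyword" idea can be illustrated with a simple blended score. The 0.7/0.3 weighting and the word-overlap keyword metric below are assumptions for the sketch, not cachly's actual ranking:

```python
# Illustrative hybrid scoring: blend semantic similarity with keyword overlap.
def keyword_score(query, text):
    # Fraction of query words that appear verbatim in the text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(semantic_sim, query, text, alpha=0.7):
    # alpha weights the semantic side; (1 - alpha) the keyword side.
    return alpha * semantic_sim + (1 - alpha) * keyword_score(query, text)

# An entry with moderate semantic similarity but strong keyword overlap
# can outrank a purely semantic near-match.
s1 = hybrid_score(0.60, "database schema", "notes on the database schema")
s2 = hybrid_score(0.75, "database schema", "unrelated caching notes")
print(s1 > s2)  # True
```

The design point: keyword overlap rescues exact-term queries (identifiers, key names) that embeddings alone can rank poorly.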
⚡ Live Cache Operations
- `cache_get` — Get a value by key
- `cache_set` — Set a key-value pair with optional TTL
- `cache_delete` — Delete one or more keys
- `cache_exists` — Check if keys exist
- `cache_ttl` — Inspect the TTL of a key
- `cache_keys` — List keys matching a glob pattern
- `cache_stats` — Memory, hit rate, ops/sec, keyspace info
- `cache_mset` — Set multiple keys in one pipeline round-trip
- `cache_mget` — Get multiple keys in one round-trip
🔒 Distributed Locks & Streams
- `cache_lock_acquire` — Acquire a distributed lock (Redlock-lite, fencing token)
- `cache_lock_release` — Release a lock atomically via Lua script
- `cache_stream_set` — Cache an LLM token stream (RPUSH)
- `cache_stream_get` — Replay a cached stream as ordered chunks
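The fencing-token lock pattern named above can be sketched against an in-memory keyspace. On a real Valkey instance this is a `SET key token NX PX` on acquire and an atomic compare-and-delete (the Lua release) — the dict version below only shows the same semantics, minus expiry and networking:

```python
# In-memory sketch of fencing-token locks (cache_lock_acquire / release).
# Not the cachly implementation: no TTL, no network, a dict as the keyspace.
import itertools

class LockStore:
    def __init__(self):
        self.locks = {}                  # lock name -> holder's token
        self.fence = itertools.count(1)  # monotonically increasing tokens

    def acquire(self, name):
        if name in self.locks:
            return None                  # already held: acquisition fails
        token = next(self.fence)
        self.locks[name] = token
        return token

    def release(self, name, token):
        # Only the holder's token releases the lock (compare-and-delete,
        # which the Lua script makes atomic on a real instance).
        if self.locks.get(name) == token:
            del self.locks[name]
            return True
        return False

store = LockStore()
t1 = store.acquire("deploy")
print(store.acquire("deploy"))      # None — lock is held
print(store.release("deploy", 999)) # False — wrong token, no-op
print(store.release("deploy", t1))  # True — holder releases
```

Because tokens only ever increase, a downstream service can reject writes carrying an older fencing token than the last one it saw — the standard defense against a stale lock holder.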
🏗️ Instance Management
- `list_instances` — List all your cache instances
- `create_instance` — Provision a new instance (free or paid)
- `get_instance` — Get details + connection string
- `get_connection_string` — Get the redis:// URL
- `delete_instance` — Permanently delete an instance
- `get_api_status` — Check API health + JWT auth info
Embedding Provider Reference
| Provider | Env Variable | Default Model | CACHLY_EMBED_PROVIDER | Cost |
|---|---|---|---|---|
| 🟢 OpenAI | OPENAI_API_KEY | text-embedding-3-small | (default) | Paid API |
| 🟠 Mistral | MISTRAL_API_KEY | mistral-embed | mistral | Paid API |
| 🔵 Cohere | COHERE_API_KEY | embed-english-v3.0 | cohere | Paid API |
| 🟡 Gemini | GEMINI_API_KEY | text-embedding-004 | gemini | Paid API |
| 🦙 Ollama (Local) | OLLAMA_BASE_URL | nomic-embed-text | ollama | Free / Local |
Switch providers by changing CACHLY_EMBED_PROVIDER in your .mcp.json env — no code changes required.
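For example, switching the Step 2 config to local Ollama embeddings only touches the `env` block — per the table, set `CACHLY_EMBED_PROVIDER` to `ollama` and point `OLLAMA_BASE_URL` at your Ollama server (`http://localhost:11434` is Ollama's usual local default; adjust if yours differs):

```json
"env": {
  "CACHLY_JWT": "your-api-token",
  "CACHLY_EMBED_PROVIDER": "ollama",
  "OLLAMA_BASE_URL": "http://localhost:11434"
}
```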
Ready to give your AI a memory?
Free tier available. No credit card. Provisioned in 30 seconds.