3-Layer AI Memory System
The first cache that teaches your AI to learn from its own work. Set up once — your AI assistant remembers solutions, architecture decisions, and lessons across every session, automatically.
How the 3 Layers Work Together
Storage — your Valkey instance
Persistent brain. Stores all lessons, context, and architecture decisions. Survives restarts, context window resets, and new chat sessions. Lives on your cachly instance.
Tools — the memory API
`learn_from_attempts` saves what worked and what failed. `recall_best_solution` retrieves the best known solution before any task. `smart_recall` finds context by meaning — no exact key needed.
Autopilot — copilot-instructions.md
A single file in your repo that tells GitHub Copilot, Claude, and Cursor to run the memory tools automatically — before every task to check memory, and after every task to save lessons. Zero manual effort.
Result: Your AI never solves the same problem twice.
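Conceptually, the learn/recall cycle behaves like the in-memory sketch below. This is an illustration only — the real tools persist lessons to your Valkey instance, and the `Lesson` shape plus the "latest success wins" rule are assumptions of the sketch, not cachly's internals:

```python
# Illustrative in-memory sketch of the learn/recall cycle.
# Not the cachly implementation — real lessons live in Valkey.
from dataclasses import dataclass, field

@dataclass
class Lesson:
    outcome: str        # "success" or "failure"
    what_worked: str
    what_failed: str

@dataclass
class MemoryStore:
    lessons: dict = field(default_factory=dict)  # topic -> [Lesson, ...]

    def learn_from_attempts(self, topic, outcome, what_worked="", what_failed=""):
        self.lessons.setdefault(topic, []).append(
            Lesson(outcome, what_worked, what_failed))

    def recall_best_solution(self, topic):
        # Assumed policy for the sketch: latest successful attempt wins.
        for lesson in reversed(self.lessons.get(topic, [])):
            if lesson.outcome == "success":
                return lesson.what_worked
        return None

mem = MemoryStore()
mem.learn_from_attempts("deploy:web", "failure", what_failed="missing env var")
mem.learn_from_attempts("deploy:web", "success", what_worked="set DATABASE_URL first")
print(mem.recall_best_solution("deploy:web"))  # set DATABASE_URL first
```

The point of the pattern: failures are stored alongside successes, so the recall step can skip known dead ends, not just replay known wins.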
Quick Setup — 3 Steps
Or use the MCP tool setup_ai_memory to generate everything automatically in one command.
Choose your embedding provider
Add to .mcp.json — works with Claude, Cursor, GitHub Copilot, Windsurf
```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": [
        "-y",
        "@cachly-dev/mcp-server@latest"
      ],
      "env": {
        "CACHLY_JWT": "your-api-token",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Replace `your-api-token` with your token from cachly.dev/instances → Settings.
Activate Layer 3 — the Autopilot
Create .github/copilot-instructions.md in your project. This tells GitHub Copilot, Claude, and Cursor to run memory tools automatically — before and after every task.
Fastest: use the MCP tool (one command)
Tell your AI assistant:
```
setup_ai_memory(
  instance_id = "your-instance-uuid",
  project_dir = "/path/to/your/project",
  embed_provider = "openai"
)
```
This writes .github/copilot-instructions.md directly to your project and prints the full setup summary.
Or copy manually:
```markdown
# cachly AI Memory — 3-Layer Autopilot

## BEFORE every task
1. Call `recall_best_solution("topic")` — check for known solutions
2. Call `smart_recall("description")` — find context by meaning
3. If found, use it directly — skip re-discovery

## AFTER every task
1. Call `learn_from_attempts(topic, outcome, what_worked, what_failed)`
2. Call `remember_context("key", "analysis", category)` for code findings
```

✅ You're set up.
Your AI assistant will now automatically check its memory before every task and save lessons after. First things to try:
- `recall_best_solution("deploy:web")` — before deploying, check if it's been done before
- `learn_from_attempts(topic="debug:auth", outcome="success", what_worked="...")` — after fixing a bug, save for next time
- `smart_recall("how does the database schema work")` — find cached architecture notes by meaning
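That before/after loop can be sketched in a few lines. `run_task`, `do_work`, and the plain dict below are illustrative stand-ins for the MCP tools and the Valkey store, not cachly APIs:

```python
# Sketch of the autopilot loop the instructions file enforces:
# check memory before the task, save a lesson after.
def run_task(topic, do_work, memory):
    known = memory.get(topic)   # stands in for recall_best_solution("topic")
    if known is not None:
        return known            # hit: skip re-discovery entirely
    result = do_work()          # miss: solve it the slow way once
    memory[topic] = result      # stands in for learn_from_attempts(...)
    return result

memory = {}
calls = []

def solve():
    calls.append("solved")      # expensive discovery happens here
    return "use blue-green deploy"

print(run_task("deploy:web", solve, memory))  # use blue-green deploy
print(run_task("deploy:web", solve, memory))  # same answer, from memory
print(len(calls))                             # 1 — the hard work ran once
```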
All 30 MCP Tools
The full tool surface — available in GitHub Copilot, Claude, Cursor, Windsurf, and any MCP-compatible AI tool.
🧠 AI Memory (the killer feature)
- `learn_from_attempts` — Store a lesson: what worked, what failed, the root cause
- `recall_best_solution` — Retrieve the best known solution for a topic before attempting it
- `smart_recall` — Semantic search over cached context by natural language
- `remember_context` — Save architecture notes, file summaries, analysis
- `recall_context` — Get exact context by key (supports glob: `file:*`)
- `list_remembered` — See everything the AI has already cached
- `forget_context` — Delete stale context entries
- `setup_ai_memory` — One-shot setup: generates .mcp.json + copilot-instructions.md
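How "recall by meaning" can work is sketched below: each note is stored with an embedding vector, and the closest one to the query by cosine similarity wins. The toy vectors and note keys are invented for illustration — a real setup gets vectors from the embedding provider you configured:

```python
# Hedged sketch of semantic recall (assumed mechanics, not cachly's internals).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings; a real setup would call an embedding API per note.
notes = {
    "db:schema": ([0.9, 0.1, 0.0], "users table has a soft-delete flag"),
    "auth:flow": ([0.1, 0.9, 0.2], "JWT refresh happens in middleware"),
}

def smart_recall(query_vec):
    # Return the (key, text) of the nearest note by cosine similarity.
    key, (vec, text) = max(notes.items(),
                           key=lambda kv: cosine(query_vec, kv[1][0]))
    return key, text

print(smart_recall([0.8, 0.2, 0.1]))  # the 'db:schema' note wins
```

This is why no exact key is needed: the lookup compares meanings (vectors), not strings.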
🔍 Semantic Cache
- `semantic_search` — Find cached entries by meaning using hybrid semantic + keyword search
- `detect_namespace` — Classify a prompt into a semantic namespace in <0.1ms
- `cache_warmup` — Pre-warm the semantic cache with prompt/value pairs
- `index_project` — Index your codebase semantically for file discovery
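The "hybrid semantic + keyword" idea can be illustrated with a simple blended score. The 0.7/0.3 weighting and the word-overlap keyword metric below are assumptions for the sketch, not cachly's actual ranking:

```python
# Illustrative hybrid scoring: blend semantic similarity with keyword overlap.
def keyword_score(query, text):
    # Fraction of query words that appear verbatim in the text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(semantic_sim, query, text, alpha=0.7):
    # alpha weights the semantic side; (1 - alpha) the keyword side.
    return alpha * semantic_sim + (1 - alpha) * keyword_score(query, text)

# An entry with moderate semantic similarity but strong keyword overlap
# can outrank a purely semantic near-match.
s1 = hybrid_score(0.60, "database schema", "notes on the database schema")
s2 = hybrid_score(0.75, "database schema", "unrelated caching notes")
print(s1 > s2)  # True
```

The design point: keyword overlap rescues exact-term queries (identifiers, key names) that embeddings alone can rank poorly.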
⚡ Live Cache Operations
- `cache_get` — Get a value by key
- `cache_set` — Set a key-value pair with optional TTL
- `cache_delete` — Delete one or more keys
- `cache_exists` — Check if keys exist
- `cache_ttl` — Inspect the TTL of a key
- `cache_keys` — List keys matching a glob pattern
- `cache_stats` — Memory, hit rate, ops/sec, keyspace info
- `cache_mset` — Set multiple keys in one pipeline round-trip
- `cache_mget` — Get multiple keys in one round-trip
🔒 Distributed Locks & Streams
- `cache_lock_acquire` — Acquire a distributed lock (Redlock-lite, fencing token)
- `cache_lock_release` — Release a lock atomically via Lua script
- `cache_stream_set` — Cache an LLM token stream (RPUSH)
- `cache_stream_get` — Replay a cached stream as ordered chunks
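The fencing-token lock pattern named above can be sketched against an in-memory keyspace. On a real Valkey instance this is a `SET key token NX PX` on acquire and an atomic compare-and-delete (the Lua release) — the dict version below only shows the same semantics, minus expiry and networking:

```python
# In-memory sketch of fencing-token locks (cache_lock_acquire / release).
# Not the cachly implementation: no TTL, no network, a dict as the keyspace.
import itertools

class LockStore:
    def __init__(self):
        self.locks = {}                  # lock name -> holder's token
        self.fence = itertools.count(1)  # monotonically increasing tokens

    def acquire(self, name):
        if name in self.locks:
            return None                  # already held: acquisition fails
        token = next(self.fence)
        self.locks[name] = token
        return token

    def release(self, name, token):
        # Only the holder's token releases the lock (compare-and-delete,
        # which the Lua script makes atomic on a real instance).
        if self.locks.get(name) == token:
            del self.locks[name]
            return True
        return False

store = LockStore()
t1 = store.acquire("deploy")
print(store.acquire("deploy"))      # None — lock is held
print(store.release("deploy", 999)) # False — wrong token, no-op
print(store.release("deploy", t1))  # True — holder releases
```

Because tokens only ever increase, a downstream service can reject writes carrying an older fencing token than the last one it saw — the standard defense against a stale lock holder.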
🏗️ Instance Management
- `list_instances` — List all your cache instances
- `create_instance` — Provision a new instance (free or paid)
- `get_instance` — Get details + connection string
- `get_connection_string` — Get the redis:// URL
- `delete_instance` — Permanently delete an instance
- `get_api_status` — Check API health + JWT auth info
Embedding Provider Reference
| Provider | Env Variable | Default Model | CACHLY_EMBED_PROVIDER | Cost |
|---|---|---|---|---|
| 🟢 OpenAI | OPENAI_API_KEY | text-embedding-3-small | (default) | Paid API |
| 🟠 Mistral | MISTRAL_API_KEY | mistral-embed | mistral | Paid API |
| 🔵 Cohere | COHERE_API_KEY | embed-english-v3.0 | cohere | Paid API |
| 🟡 Gemini | GEMINI_API_KEY | text-embedding-004 | gemini | Paid API |
| 🦙 Ollama (Local) | OLLAMA_BASE_URL | nomic-embed-text | ollama | Free / Local |
Switch providers by changing CACHLY_EMBED_PROVIDER in your .mcp.json env — no code changes required.
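For example, switching the Step 2 config to local Ollama embeddings only touches the `env` block — per the table, set `CACHLY_EMBED_PROVIDER` to `ollama` and point `OLLAMA_BASE_URL` at your Ollama server (`http://localhost:11434` is Ollama's usual local default; adjust if yours differs):

```json
"env": {
  "CACHLY_JWT": "your-api-token",
  "CACHLY_EMBED_PROVIDER": "ollama",
  "OLLAMA_BASE_URL": "http://localhost:11434"
}
```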
Ready to give your AI a memory?
Free tier available. No credit card. Provisioned in 30 seconds.