MCP · Model Context Protocol

MCP Integration

cachly speaks MCP. Connect it to Claude, Cursor, Windsurf, Copilot, or any MCP-compatible AI editor. Provision instances, monitor hit rates, manage keys — and let your AI agents cache their own LLM responses.

Quick Setup

The same config works for all MCP-compatible editors. Find your editor's config path below.

mcp config (any editor)
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": {
        "CACHLY_JWT": "your-jwt-token-here"
      }
    }
  }
}

Get your JWT from the cachly dashboard → Account → API Token.

| Editor | Config path | Status |
| --- | --- | --- |
| 🤖 Claude Code | ~/.claude/claude_desktop_config.json | ✓ Native |
| Cursor | .cursor/mcp.json | ✓ Supported |
| 🏄 Windsurf | ~/.codeium/windsurf/mcp_config.json | ✓ Supported |
| 🐙 GitHub Copilot (VS Code) | .vscode/settings.json | ✓ Supported |
| ▶️ Continue.dev | ~/.continue/config.json | ✓ Supported |

Available Tools

8 tools across instance management, live cache operations, and semantic search. Use them via natural language — your AI translates intent to the right tool call.

list_instances

List all your cache instances with status and connection strings

create_instance

Provision a new free or paid instance in any region

get_connection_string

Get the redis:// connection URL for an instance

cache_get / cache_set

Read and write keys directly — useful for debugging

cache_stats

Memory usage, hit rate, ops/sec, keyspace info

semantic_search

Find semantically similar cached entries by meaning

cache_keys

List keys matching a glob pattern

delete_instance

Permanently delete a cache instance
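
Since cachly instances speak the redis:// protocol (Valkey under the hood), it's reasonable to assume cache_keys uses Redis/Valkey-style glob patterns. As an illustration only, not the cachly SDK itself, here is a minimal sketch of how a glob like `user:*` selects keys (supporting `*` and `?`; the key names are made up):

```typescript
// Illustrative only: how a Valkey-style glob pattern selects keys.
// Supports * (any run of characters) and ? (exactly one character).
function matchGlob(pattern: string, key: string): boolean {
  const regexSource =
    "^" +
    pattern
      .split("")
      .map((ch) =>
        ch === "*" ? ".*"
        : ch === "?" ? "."
        : ch.replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
      )
      .join("") +
    "$";
  return new RegExp(regexSource).test(key);
}

const keys = ["user:1:profile", "user:2:profile", "session:abc"];
console.log(keys.filter((k) => matchGlob("user:*", k)));
// → ["user:1:profile", "user:2:profile"]
```

Asking your AI "list all keys starting with user:" would translate to a cache_keys call with a pattern like this.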

🤯 Use-Case: The AI That Saves Itself Money

The most powerful cachly use-case: an AI agent uses the cachly MCP to manage the cache that stores its own LLM responses. Semantically similar questions are answered from cache — zero tokens spent. Teams running 10k+ LLM calls/day report 60–70% cost reduction.

LangChain agent with cachly semantic cache
import { createClient } from '@cachly-dev/sdk'
import { ChatOpenAI } from '@langchain/openai'
import { HumanMessage } from '@langchain/core/messages'

const cache = createClient({ url: process.env.CACHLY_URL })
const llm = new ChatOpenAI({ model: 'gpt-4o-mini' })

// LangChain agent that caches its own responses via cachly MCP
async function cachedAgent(userMessage: string) {
  // 1. Check semantic cache first (sub-ms)
  const { value, hit, similarity } = await cache.semantic!.getOrSet(
    userMessage,
    async () => {
      // 2. Only call OpenAI on cache miss
      const response = await llm.invoke([new HumanMessage(userMessage)])
      return response.content as string
    },
    { similarityThreshold: 0.92 }
  )

  console.log(hit
    ? `⚡ Cache hit (similarity ${(similarity! * 100).toFixed(1)}%) — 0 tokens spent`
    : `🔄 LLM call — cached for next time`
  )

  return value
}

// These two calls cost tokens only once:
await cachedAgent("How do I cancel my order?")
await cachedAgent("Cancel an order please")  // ⚡ cache hit — 92% similar
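
The similarityThreshold above is a cosine-similarity cutoff over text embeddings. A toy sketch of that comparison, not cachly's internal code: real embeddings have hundreds of dimensions, and the 3-d vectors here are made up for illustration.

```typescript
// Toy sketch of the comparison behind similarityThreshold.
// Cosine similarity: dot product of two vectors divided by the
// product of their lengths; 1.0 means identical direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up embeddings for two phrasings of the same intent:
const cached = [0.91, 0.40, 0.10]; // "How do I cancel my order?"
const query  = [0.88, 0.45, 0.12]; // "Cancel an order please"

const sim = cosineSimilarity(cached, query);
console.log(sim >= 0.92 ? "cache hit" : "cache miss");
// → "cache hit"
```

Raising the threshold trades hit rate for precision: at 0.92 only near-paraphrases hit the cache, while a lower threshold risks serving a cached answer to a genuinely different question.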

🧠 Use-Case: AI Dev Brain — Persistent Engineering Memory

Use cachly as an always-on engineering brain for your AI assistant. Every solved bug, every deploy trick, every architecture decision — stored semantically, recalled instantly. Your AI never forgets a lesson, never debugs the same bug twice.

Claude Code + cachly Brain MCP
# Your AI dev assistant uses cachly to remember everything
# across sessions — without losing context.

session_start(
  instance_id = "your-brain-instance-id",
  focus       = "add stripe webhook handling",
)
# → Returns: last session state, open bugs, relevant lessons

# After solving a problem:
learn_from_attempts(
  topic      = "stripe:webhook-signature",
  outcome    = "success",
  what_worked = "Use stripe.webhooks.constructEvent() before parsing body",
  what_failed = "express.json() middleware strips raw body — use express.raw()",
  severity   = "critical",
)
# → Stored forever. Never debug this again.

See the full Brain setup guide in AI Memory docs →

💬 Use-Case: Natural Language Cache Management

No dashboard. No clicking. Talk to your cache the same way you talk to your AI assistant. Provision infrastructure, debug issues, monitor performance — all in natural language from your editor.

You: Create a free cache instance for my staging environment in Germany

AI: ✅ Instance "staging" provisioned (eu-central, Free tier)
    Connection: redis://staging.cachly.dev:31201
    Status: running · 25 MB · Valkey 8

You: What's the hit rate on my prod cache this week?

AI: 📊 prod-cache stats:
    Hit rate: 71.4% · 2.4M requests · 847 ms avg latency saved
    Estimated LLM savings: ~$340 this week

You: Find all cached responses about payment errors

AI: 🔍 Semantic search: "payment errors"
    Found 23 similar entries (threshold: 0.90)
    Top match: "stripe payment declined" (0.97 similarity)
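
The savings figure above is simple arithmetic: cache hits cost zero tokens, so savings ≈ hit rate × total requests × average cost per avoided LLM call. A back-of-envelope sketch, where the per-call cost is an assumed number, not a cachly-reported value:

```typescript
// Back-of-envelope LLM savings estimate.
// costPerCall is an assumption; real costs depend on model and token counts.
function estimateSavings(
  totalRequests: number,
  hitRate: number,
  costPerCall: number
): number {
  const cachedCalls = totalRequests * hitRate; // calls served with zero tokens
  return cachedCalls * costPerCall;
}

// 2.4M requests at a 71.4% hit rate, assuming ~$0.0002 per avoided call:
const savings = estimateSavings(2_400_000, 0.714, 0.0002);
console.log(`~$${savings.toFixed(0)} saved`);
// → "~$343 saved"
```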

Ready to give your AI a memory?

Free instance in under 30 seconds. No credit card required.