MCP Integration
cachly speaks MCP. Connect it to Claude, Cursor, Windsurf, Copilot, or any MCP-compatible AI editor. Provision instances, monitor hit rates, manage keys — and let your AI agents cache their own LLM responses.
Quick Setup
The same config works for all MCP-compatible editors. Find your editor's config path below.
```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": ["-y", "@cachly-dev/mcp-server"],
      "env": {
        "CACHLY_JWT": "your-jwt-token-here"
      }
    }
  }
}
```

Get your JWT from the cachly dashboard → Account → API Token.
| Editor | Config path | Status |
|---|---|---|
| 🤖 Claude Code | ~/.claude/claude_desktop_config.json | ✓ Native |
| ⚡ Cursor | .cursor/mcp.json | ✓ Supported |
| 🏄 Windsurf | ~/.codeium/windsurf/mcp_config.json | ✓ Supported |
| 🐙 GitHub Copilot (VS Code) | .vscode/settings.json | ✓ Supported |
| ▶️ Continue.dev | ~/.continue/config.json | ✓ Supported |
Available Tools
Nine tools across instance management, live cache operations, and semantic search. Use them via natural language — your AI translates intent into the right tool call.
- `list_instances`: List all your cache instances with status and connection strings
- `create_instance`: Provision a new free or paid instance in any region
- `get_connection_string`: Get the redis:// connection URL for an instance
- `cache_get` / `cache_set`: Read and write keys directly — useful for debugging
- `cache_stats`: Memory usage, hit rate, ops/sec, keyspace info
- `semantic_search`: Find semantically similar cached entries by meaning
- `cache_keys`: List keys matching a glob pattern
- `delete_instance`: Permanently delete a cache instance
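Under the hood, each natural-language request becomes an MCP `tools/call` request. A sketch of what a `cache_stats` call might look like on the wire — the JSON-RPC envelope follows the MCP specification, but the argument name and result fields shown here are illustrative assumptions, not cachly's documented schema:

```typescript
// Shape of an MCP tools/call exchange (per the MCP spec).
// `instance_id` and the stats payload are hypothetical examples.
const request = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: {
    name: 'cache_stats',
    arguments: { instance_id: 'prod-cache' }, // hypothetical argument name
  },
}

// Tool results come back as content blocks; structured data is
// typically serialized as text:
const response = {
  jsonrpc: '2.0',
  id: 1,
  result: {
    content: [{ type: 'text', text: '{"hit_rate":0.714,"ops_per_sec":1200}' }],
  },
}

const stats = JSON.parse(response.result.content[0].text)
console.log(stats.hit_rate)
```

Your editor builds and sends these requests for you — you never write this JSON by hand.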
Use-Case: The AI That Saves Itself Money
The most powerful cachly use-case: an AI agent uses the cachly MCP to manage the cache that stores its own LLM responses. Semantically similar questions are answered from cache — zero tokens spent. Teams running 10k+ LLM calls/day report 60–70% cost reduction.
```typescript
import { createClient } from '@cachly-dev/sdk'
import { ChatOpenAI } from '@langchain/openai'
import { HumanMessage } from '@langchain/core/messages'

const cache = createClient({ url: process.env.CACHLY_URL })
const llm = new ChatOpenAI({ model: 'gpt-4o-mini' })

// LangChain agent that caches its own responses in cachly
async function cachedAgent(userMessage: string) {
  // 1. Check the semantic cache first (sub-ms)
  const { value, hit, similarity } = await cache.semantic!.getOrSet(
    userMessage,
    async () => {
      // 2. Only call OpenAI on a cache miss
      const response = await llm.invoke([new HumanMessage(userMessage)])
      return response.content as string
    },
    { similarityThreshold: 0.92 }
  )
  console.log(hit
    ? `⚡ Cache hit (similarity ${(similarity! * 100).toFixed(1)}%) — 0 tokens spent`
    : `🔄 LLM call — cached for next time`
  )
  return value
}

// These two calls cost tokens only once:
await cachedAgent("How do I cancel my order?")
await cachedAgent("Cancel an order please") // ⚡ cache hit — 92% similar
```

Use-Case: AI Dev Brain — Persistent Engineering Memory
Use cachly as an always-on engineering brain for your AI assistant. Every solved bug, every deploy trick, every architecture decision — stored semantically, recalled instantly. Your AI never forgets a lesson, never debugs the same bug twice.
```python
# Your AI dev assistant uses cachly to remember everything
# across sessions — without losing context.

session_start(
    instance_id="your-brain-instance-id",
    focus="add stripe webhook handling",
)
# → Returns: last session state, open bugs, relevant lessons

# After solving a problem:
learn_from_attempts(
    topic="stripe:webhook-signature",
    outcome="success",
    what_worked="Use stripe.webhooks.constructEvent() before parsing body",
    what_failed="express.json() middleware strips raw body — use express.raw()",
    severity="critical",
)
# → Stored forever. Never debug this again.
```
See the full Brain setup guide in AI Memory docs →
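Semantic storage and recall like this generally work by embedding each entry as a vector and ranking candidates by similarity. A minimal sketch of the cosine-similarity scoring that thresholds such as 0.92 refer to — the vectors below are toy values standing in for real embedding-model output:

```typescript
// Cosine similarity between two embedding vectors:
// entries with similar meanings score close to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Toy vectors standing in for real embeddings:
const cancelOrder = [0.9, 0.1, 0.3]
const cancelPlease = [0.85, 0.15, 0.35]
const deployTip = [0.1, 0.9, 0.2]

console.log(cosineSimilarity(cancelOrder, cancelPlease)) // high: same intent
console.log(cosineSimilarity(cancelOrder, deployTip))    // low: unrelated
```

Only entries scoring above the configured threshold are treated as hits, which is why a stricter threshold trades recall for precision.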
Use-Case: Natural Language Cache Management
No dashboard. No clicking. Talk to your cache the same way you talk to your AI assistant. Provision infrastructure, debug issues, monitor performance — all in natural language from your editor.
“Create a free cache instance for my staging environment in Germany”
```
✅ Instance "staging" provisioned (eu-central, Free tier)
Connection: redis://staging.cachly.dev:31201
Status: running · 25 MB · Valkey 8
```
“What's the hit rate on my prod cache this week?”
```
📊 prod-cache stats:
Hit rate: 71.4% · 2.4M requests · 847 ms avg saved
Estimated LLM savings: ~$340 this week
```
“Find all cached responses about payment errors”
```
🔍 Semantic search: "payment errors"
Found 23 similar entries (threshold: 0.90)
Top match: "stripe payment declined" (0.97 similarity)
```
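Savings figures like the one in the stats reply can be approximated from hit counts alone: every cache hit is an LLM call you did not pay for. A rough back-of-envelope sketch — the per-call cost and request volume are illustrative numbers, not cachly's actual pricing model:

```typescript
// Rough LLM-savings estimate: hits avoided × cost per avoided call.
function estimateSavings(totalRequests: number, hitRate: number, costPerCallUsd: number): number {
  const hits = totalRequests * hitRate
  return hits * costPerCallUsd
}

// Illustrative numbers: 2.4M requests, 71.4% hit rate, $0.0002 per avoided call
const saved = estimateSavings(2_400_000, 0.714, 0.0002)
console.log(`~$${saved.toFixed(0)} saved`) // → ~$343 saved
```

Real savings depend on your model, prompt sizes, and traffic mix, so treat any single per-call cost as an approximation.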
Ready to give your AI a memory?
Free instance in under 30 seconds. No credit card required.