Cachly SDK Integrations: 3 lines to semantic caching
Copy-paste examples for LangChain, Vercel AI SDK, OpenAI direct, Go, Ruby, PHP, Rust, and more. Every stack, same result: 60–90% fewer LLM API calls.
How it works
Cachly sits between your app and the LLM. When a prompt arrives:
- L1: Exact match — Valkey checks for a byte-identical cached response (sub-millisecond)
- L2: Semantic match — pgvector HNSW finds prompts with cosine similarity ≥ threshold
- L3: LLM call — only if both caches miss, the request reaches your LLM provider
The result: one user's "What's the capital of France?" caches the answer for every future "Which city is France's capital?" — without any app changes.
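The tiered flow above can be sketched in a few lines of plain Python. This is an illustration, not the Cachly SDK: a dict stands in for the Valkey exact-match tier, a list of vectors stands in for pgvector, and `embed` is a toy character-frequency vectorizer rather than a real embedding model.

```python
import math

exact_cache = {}     # tier L1: prompt -> response (stand-in for Valkey)
semantic_cache = []  # tier L2: (embedding, response) pairs (stand-in for pgvector)

def embed(text):
    # Toy embedding: 26-dim character-frequency vector (a real system uses a model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookup(prompt, call_llm, threshold=0.92):
    # L1: exact match
    if prompt in exact_cache:
        return exact_cache[prompt]
    # L2: semantic match above the similarity threshold
    q = embed(prompt)
    best = max(semantic_cache, key=lambda e: cosine(q, e[0]), default=None)
    if best and cosine(q, best[0]) >= threshold:
        return best[1]
    # L3: both tiers missed -> call the LLM and populate both tiers
    answer = call_llm(prompt)
    exact_cache[prompt] = answer
    semantic_cache.append((q, answer))
    return answer
```

With this sketch, a repeated prompt hits L1, a close paraphrase hits L2, and only genuinely new prompts fall through to L3.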
Examples by language
Python (direct)
```python
from cachly import CachlyClient

client = CachlyClient(api_key="cky_live_...")
instance = client.instances.get("my-cache")

prompt = "What is semantic caching?"

# Semantic cache lookup
hit = instance.semantic_search(prompt, threshold=0.92)
if hit:
    print(hit.value)  # cached answer
else:
    answer = call_llm(prompt)  # your existing LLM call
    instance.semantic_set(prompt, answer)
```

LangChain
```python
import langchain
from langchain_community.cache import CachlySemanticCache

langchain.llm_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/cky_live_...",
    threshold=0.92,  # cosine similarity — 0.92 recommended
)

# Now every LLM call goes through the semantic cache automatically
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
llm.predict("What is semantic caching?")  # cached after first call
```

TypeScript / Vercel AI SDK
```typescript
import { createCachlyMiddleware } from "@cachly-dev/sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const cache = createCachlyMiddleware({
  vectorUrl: "https://api.cachly.dev/v1/sem/cky_live_...",
  threshold: 0.92,
});

// Wrap any generateText / streamText call
const result = await cache.withText(
  "What is semantic caching?",
  () => generateText({ model: openai("gpt-4o"), prompt: "What is semantic caching?" })
);
```

Go
```go
import (
    "context"
    "fmt"

    "github.com/cachly-dev/cachly-go"
)

client := cachly.NewClient("cky_live_...")
instance := client.Instance("my-cache")
ctx := context.Background()

// Semantic search
results, _ := instance.SemanticSearch(ctx, cachly.SearchRequest{
    Query:     "What is semantic caching?",
    Threshold: 0.92,
    Limit:     1,
})
if len(results) > 0 {
    fmt.Println(results[0].Value) // cache hit
}
```

Ruby
```ruby
require "cachly"

client = Cachly::Client.new(api_key: "cky_live_...")
instance = client.instance("my-cache")

result = instance.semantic_search("What is semantic caching?", threshold: 0.92)
puts result&.value || "cache miss"
```

PHP
```php
use Cachly\CachlyClient;

$client = new CachlyClient("cky_live_...");
$instance = $client->getInstance("my-cache");

$result = $instance->semanticSearch("What is semantic caching?", 0.92);
echo $result ? $result->value : "cache miss";
```

MCP server (Claude / Cursor)
For AI coding assistants, the MCP server gives Claude or Cursor 30 tools to read, write, and search the cache — plus persistent AI memory across sessions.
```shell
# Install
npx @cachly-dev/mcp-server setup
# Claude Code: adds automatically to ~/.claude/settings.json
# Cursor: adds to .cursor/mcp.json

# Available tools (30 total):
#   cache_get, cache_set, cache_delete, cache_mget, cache_mset
#   semantic_search, cache_stream_get, cache_stream_set
#   remember_context, recall_context, smart_recall
#   session_start, session_end, session_handoff
#   ... and 16 more
```
Choosing a threshold
The similarity threshold controls how aggressively the cache matches:
| Threshold | Behaviour | Best for |
|---|---|---|
| 0.85 | Aggressive — matches paraphrases broadly | FAQs, support bots |
| 0.92 | Balanced (recommended default) | Most production use cases |
| 0.97 | Conservative — near-identical prompts only | Code generation, structured data |
| 1.0 | Exact match only | Same as Redis, no semantic benefit |
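To build intuition for what the threshold gates, here is a toy demonstration. Real deployments compare model embeddings; `embed` below is a hypothetical character-frequency stand-in, so the exact scores differ from production, but the gating logic is the same.

```python
import math

def embed(text):
    # Toy embedding: 26-dim character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_hit(query, cached_prompt, threshold):
    # A cached answer is reused only when similarity clears the threshold.
    return cosine(embed(query), embed(cached_prompt)) >= threshold

cached = "What's the capital of France?"
print(is_hit("What is the capital of France?", cached, 0.92))  # close paraphrase
print(is_hit("How do I parse JSON in Go?", cached, 0.92))      # unrelated prompt
```

Lowering the threshold admits looser paraphrases (more hits, more risk of wrong reuse); raising it toward 1.0 converges on exact-match behaviour.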
All supported languages
Full SDK docs and install commands: cachly.dev/docs
cachly is a managed AI Brain for developers — persistent memory, team knowledge sharing, and semantic cache for Claude Code, Cursor, GitHub Copilot & Windsurf. One MCP server. 51 tools. Free tier, EU servers, no credit card.
Your AI is forgetting everything right now.
Every session starts blank. Every bug re-discovered. Every deploy procedure re-explained. cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.