Integrations

Cachly SDK Integrations: 3 lines to semantic caching

Copy-paste examples for LangChain, Vercel AI SDK, OpenAI direct, Go, Ruby, PHP, Rust, and more. Every stack, same result: 60–90% fewer LLM API calls.

How it works

Cachly sits between your app and the LLM. When a prompt arrives:

  1. L1, exact match: Valkey returns the cached response when the prompt is byte-identical to an earlier one (sub-millisecond)
  2. L2, semantic match: pgvector's HNSW index finds earlier prompts whose cosine similarity to the new prompt is ≥ threshold
  3. L3, LLM call: only if both caches miss does the request reach your LLM provider

The result: one user's "What's the capital of France?" caches the answer for every future "Which city is France's capital?" — without any app changes.
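
The three tiers above can be sketched in plain Python. This is an illustration only, not Cachly's implementation: a dict stands in for Valkey, a linear scan stands in for the pgvector HNSW index, and `embed` is a toy bag-of-words function standing in for a real embedding model. The names `TieredCache`, `embed`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real setup uses an embedding model)."""
    return Counter(text.lower().replace("?", "").replace("'s", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TieredCache:
    def __init__(self, llm, threshold: float):
        self.llm = llm              # callable: prompt -> response
        self.threshold = threshold
        self.exact = {}             # tier 1: prompt string -> response
        self.semantic = []          # tier 2: list of (embedding, response)

    def ask(self, prompt: str):
        # Tier 1: byte-identical prompt seen before
        if prompt in self.exact:
            return self.exact[prompt], "exact"
        # Tier 2: most similar stored prompt (linear scan here, not HNSW)
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.semantic:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return best_response, "semantic"
        # Tier 3: both tiers missed, so call the LLM and store the result
        response = self.llm(prompt)
        self.exact[prompt] = response
        self.semantic.append((query, response))
        return response, "llm"
```

Because the toy embedding only counts shared words, it needs a far lower threshold than a real embedding model would; with `threshold=0.35`, "Which city is France's capital?" is served from the entry stored for "What's the capital of France?" without a second LLM call.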

Examples by language

Python (direct)

from cachly import CachlyClient

client = CachlyClient(api_key="cky_live_...")
instance = client.instances.get("my-cache")

prompt = "What is semantic caching?"

# Semantic cache lookup
hit = instance.semantic_search(prompt, threshold=0.92)
if hit:
    print(hit.value)  # cached answer
else:
    answer = call_llm(prompt)  # call_llm: your own function that queries the LLM provider
    instance.semantic_set(prompt, answer)  # store the answer for future lookups
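
The lookup-then-store pattern above can be wrapped in a small helper so each call site stays a one-liner. This is a sketch: `cached_answer` is a hypothetical name, and it assumes only what the example above shows, namely that `semantic_search` returns something falsy on a miss and an object with a `.value` attribute on a hit.

```python
from typing import Callable


def cached_answer(instance, prompt: str, call_llm: Callable[[str], str],
                  threshold: float = 0.92) -> str:
    """Return a cached answer for `prompt`, or call the LLM once and cache the result."""
    hit = instance.semantic_search(prompt, threshold=threshold)
    if hit:
        return hit.value                   # served from the cache
    answer = call_llm(prompt)              # cache miss: ask the LLM
    instance.semantic_set(prompt, answer)  # store for future lookups
    return answer
```

Usage would look like `cached_answer(instance, "What is semantic caching?", call_llm)`, where `call_llm` is your own function.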

LangChain

from langchain_community.cache import CachlySemanticCache
from langchain_core.globals import set_llm_cache
from langchain_openai import ChatOpenAI

# Register the semantic cache for all LangChain LLM calls
set_llm_cache(CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/cky_live_...",
    threshold=0.92,  # cosine similarity; 0.92 recommended
))

# Now every LLM call goes through the semantic cache automatically
llm = ChatOpenAI(model="gpt-4o")
llm.invoke("What is semantic caching?")  # cached after first call

TypeScript / Vercel AI SDK

import { createCachlyMiddleware } from "@cachly-dev/sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const cache = createCachlyMiddleware({
  vectorUrl: "https://api.cachly.dev/v1/sem/cky_live_...",
  threshold: 0.92,
});

// Wrap any generateText / streamText call
const result = await cache.withText(
  "What is semantic caching?",
  () => generateText({ model: openai("gpt-4o"), prompt: "What is semantic caching?" })
);

Go

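package main

import (
    "context"
    "fmt"
    "log"

    "github.com/cachly-dev/cachly-go"
)

func main() {
    ctx := context.Background()
    client := cachly.NewClient("cky_live_...")
    instance := client.Instance("my-cache")

    // Semantic search: return at most one entry at or above the threshold
    results, err := instance.SemanticSearch(ctx, cachly.SearchRequest{
        Query:     "What is semantic caching?",
        Threshold: 0.92,
        Limit:     1,
    })
    if err != nil {
        log.Fatal(err)
    }
    if len(results) > 0 {
        fmt.Println(results[0].Value) // cache hit
    }
}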

Ruby

require "cachly"

client = Cachly::Client.new(api_key: "cky_live_...")
instance = client.instance("my-cache")

result = instance.semantic_search("What is semantic caching?", threshold: 0.92)
puts result&.value || "cache miss"

PHP

use Cachly\CachlyClient;

$client = new CachlyClient("cky_live_...");
$instance = $client->getInstance("my-cache");

$result = $instance->semanticSearch("What is semantic caching?", 0.92);
echo $result ? $result->value : "cache miss";

MCP server (Claude / Cursor)

For AI coding assistants, the MCP server gives Claude or Cursor 30 tools to read, write, and search the cache — plus persistent AI memory across sessions.

# Install
npx @cachly-dev/mcp-server setup

# Claude Code: the server is added automatically to ~/.claude/settings.json
# Cursor: the server is added to .cursor/mcp.json

# Available tools (30 total):
# cache_get, cache_set, cache_delete, cache_mget, cache_mset
# semantic_search, cache_stream_get, cache_stream_set
# remember_context, recall_context, smart_recall
# session_start, session_end, session_handoff
# ... and 16 more

Choosing a threshold

The similarity threshold controls how aggressively the cache matches:

Threshold  Behaviour                                   Best for
0.85       Aggressive: matches paraphrases broadly     FAQs, support bots
0.92       Balanced (recommended default)              Most production use cases
0.97       Conservative: near-identical prompts only   Code generation, structured data
1.0        Exact match only                            Same as Redis, no semantic benefit
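
To make the threshold values concrete, here is a small sketch of the comparison a semantic cache performs: compute the cosine similarity between the new prompt's embedding and a cached one, then check it against the threshold. The 2-D vectors are synthetic (built from an angle so the similarity is easy to read off), and the helper names `unit_vector` and `thresholds_hit` are hypothetical. Real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def unit_vector(angle_degrees):
    """Synthetic 2-D 'embedding' at the given angle from the x-axis."""
    r = math.radians(angle_degrees)
    return (math.cos(r), math.sin(r))


def thresholds_hit(similarity, thresholds=(0.85, 0.92, 0.97, 1.0)):
    """Return the thresholds at which this similarity would count as a cache hit."""
    return [t for t in thresholds if similarity >= t]


cached = unit_vector(0)
for angle in (25, 20, 10, 0):
    sim = cosine_similarity(cached, unit_vector(angle))
    print(f"angle={angle:>2}  similarity={sim:.4f}  hits at {thresholds_hit(sim)}")
```

A prompt whose embedding sits 25 degrees from the cached one (similarity about 0.906) is a hit only at 0.85, while one at 10 degrees (about 0.985) is a hit at 0.85, 0.92, and 0.97 but not at 1.0.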

All supported languages

Python, TypeScript, Go, Ruby, PHP, Rust, .NET / C#, Java, Kotlin, Swift, Elixir, Dart, R, Bash / CLI, MCP Server

Full SDK docs and install commands: cachly.dev/docs

cachly is a persistent AI Brain for developers — memory shared across Claude Code, Cursor, GitHub Copilot & Windsurf simultaneously. Auto-detects every editor. Bootstraps from your git history. 115 MCP tools. Free tier, EU servers, no credit card.
