Cachly SDK Integrations: 3 lines to semantic caching

Copy-paste examples for LangChain, Vercel AI SDK, OpenAI direct, Go, Ruby, PHP, Rust, and more. Every stack, same result: 60–90% fewer LLM API calls.

How it works

Cachly sits between your app and the LLM. When a prompt arrives:

  1. L1: Exact match — Valkey checks for a byte-identical cached response (sub-millisecond)
  2. L2: Semantic match — pgvector HNSW finds prompts with cosine similarity ≥ threshold
  3. L3: LLM call — only if both caches miss, the request reaches your LLM provider

The result: one user's "What's the capital of France?" caches the answer for every future "Which city is France's capital?" — without any app changes.
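
The three-tier lookup above can be sketched in a few lines of Python. `exact_get`, `semantic_get`, and `call_llm` here are hypothetical stand-ins for illustration, not Cachly SDK functions:

```python
# Minimal sketch of the L1 -> L2 -> L3 lookup order described above.
# `exact_get`, `semantic_get`, and `call_llm` are hypothetical stand-ins,
# not Cachly SDK functions.

def lookup(prompt, exact_get, semantic_get, call_llm, threshold=0.92):
    # L1: exact match -- is a byte-identical prompt already cached?
    cached = exact_get(prompt)
    if cached is not None:
        return cached

    # L2: semantic match -- has a similar-enough prompt been answered?
    hit = semantic_get(prompt, threshold)
    if hit is not None:
        return hit

    # L3: both caches missed -- pay for a real LLM call
    return call_llm(prompt)
```

The key property is ordering: the cheap exact check runs first, the vector search second, and the expensive LLM call only as a last resort.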

Examples by language

Python

Python (direct)

from cachly import CachlyClient

client = CachlyClient(api_key="cky_live_...")
instance = client.instances.get("my-cache")

prompt = "What is semantic caching?"

# Semantic cache lookup
hit = instance.semantic_search(prompt, threshold=0.92)
if hit:
    print(hit.value)  # cached answer
else:
    answer = call_llm(prompt)  # your existing LLM call
    instance.semantic_set(prompt, answer)
LangChain

from langchain_community.cache import CachlySemanticCache
import langchain

langchain.llm_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/cky_live_...",
    threshold=0.92,  # cosine similarity — 0.92 recommended
)

# Now every LLM call goes through the semantic cache automatically
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
llm.invoke("What is semantic caching?")  # cached after first call
TypeScript / Vercel AI SDK

import { createCachlyMiddleware } from "@cachly-dev/sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const cache = createCachlyMiddleware({
  vectorUrl: "https://api.cachly.dev/v1/sem/cky_live_...",
  threshold: 0.92,
});

// Wrap any generateText / streamText call
const result = await cache.withText(
  "What is semantic caching?",
  () => generateText({ model: openai("gpt-4o"), prompt: "What is semantic caching?" })
);
Go

import "github.com/cachly-dev/cachly-go"

client := cachly.NewClient("cky_live_...")
instance := client.Instance("my-cache")

// Semantic search
results, _ := instance.SemanticSearch(ctx, cachly.SearchRequest{
    Query:     "What is semantic caching?",
    Threshold: 0.92,
    Limit:     1,
})
if len(results) > 0 {
    fmt.Println(results[0].Value) // cache hit
}
Ruby

require "cachly"

client = Cachly::Client.new(api_key: "cky_live_...")
instance = client.instance("my-cache")

result = instance.semantic_search("What is semantic caching?", threshold: 0.92)
puts result&.value || "cache miss"
PHP

use Cachly\CachlyClient;

$client = new CachlyClient("cky_live_...");
$instance = $client->getInstance("my-cache");

$result = $instance->semanticSearch("What is semantic caching?", 0.92);
echo $result ? $result->value : "cache miss";

MCP server (Claude / Cursor)

For AI coding assistants, the MCP server gives Claude or Cursor 30 tools to read, write, and search the cache — plus persistent AI memory across sessions.

# Install
npx @cachly-dev/mcp-server setup

# Claude Code: adds automatically to ~/.claude/settings.json
# Cursor: adds to .cursor/mcp.json

# Available tools (30 total):
# cache_get, cache_set, cache_delete, cache_mget, cache_mset
# semantic_search, cache_stream_get, cache_stream_set
# remember_context, recall_context, smart_recall
# session_start, session_end, session_handoff
# ... and 16 more
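
The setup command writes a standard MCP client configuration. The exact entry Cachly's server registers may differ, but MCP configs generally follow this shape (the `CACHLY_API_KEY` variable name is an assumption for illustration):

```json
{
  "mcpServers": {
    "cachly": {
      "command": "npx",
      "args": ["@cachly-dev/mcp-server"],
      "env": { "CACHLY_API_KEY": "cky_live_..." }
    }
  }
}
```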

Choosing a threshold

The similarity threshold controls how aggressively the cache matches:

Threshold | Behaviour                                  | Best for
--------- | ------------------------------------------ | --------
0.85      | Aggressive — matches paraphrases broadly   | FAQs, support bots
0.92      | Balanced (recommended default)             | Most production use cases
0.97      | Conservative — near-identical prompts only | Code generation, structured data
1.0       | Exact match only                           | Same as Redis, no semantic benefit
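
The threshold is a cutoff on the cosine similarity of prompt embeddings. A tiny worked example shows why paraphrases clear a 0.92 bar while unrelated prompts don't — the three-dimensional "embeddings" below are made up for illustration; real embedding models emit hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    # cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
capital_q1 = [0.9, 0.1, 0.2]   # "What's the capital of France?"
capital_q2 = [0.8, 0.2, 0.25]  # "Which city is France's capital?"
coding_q   = [0.1, 0.9, 0.3]   # an unrelated coding prompt

print(cosine(capital_q1, capital_q2) >= 0.92)  # paraphrase: hit
print(cosine(capital_q1, coding_q) >= 0.92)    # unrelated: miss
```

Lowering the threshold widens the neighbourhood that counts as a hit; raising it shrinks the neighbourhood toward exact matches only.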

All supported languages

Python · TypeScript · Go · Ruby · PHP · Rust · .NET / C# · Java · Kotlin · Swift · Elixir · Dart · R · Bash / CLI · MCP Server

Full SDK docs and install commands: cachly.dev/docs
