# Semantic Caching
cachly's semantic cache layer uses pgvector embeddings to return cached responses for semantically similar queries – even when the exact wording differs.
## How It Works

1. **Embed** – Your query is embedded into a high-dimensional vector using the configured model.
2. **Search** – pgvector performs an approximate nearest-neighbor search across cached embeddings.
3. **Match** – If the similarity exceeds your threshold (default 0.92), the cached response is returned in under 1 ms.
4. **Miss** – On a miss, the request passes through to your origin, and the response is cached for next time.
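The match step above boils down to a cosine-similarity comparison against a threshold. A minimal sketch of that decision in pure Python (no pgvector; the cache is just a list of precomputed embedding/response pairs):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def lookup(query_vec, cache, threshold=0.92):
    # cache: list of (embedding, response) pairs.
    # Return the cached response whose embedding is most similar to
    # query_vec, provided the similarity clears the threshold; else None
    # (a miss, which would fall through to the origin).
    best_score, best_response = -1.0, None
    for emb, response in cache:
        score = cosine_similarity(query_vec, emb)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None
```

pgvector replaces the linear scan with an approximate nearest-neighbor index, but the hit/miss decision is the same threshold test.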
## API Example

```http
POST /api/v1/cache/semantic
Authorization: Bearer <API_KEY>
Content-Type: application/json

{
  "query": "What is the capital of France?",
  "threshold": 0.92,
  "ttl": 3600
}
```

A semantically equivalent query like "France's capital city?" will hit the cache.
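The same request can be issued from Python with the standard library; a sketch where the path, headers, and body fields mirror the example above, while `BASE_URL` is a placeholder (not cachly's actual endpoint):

```python
import json
import urllib.request

BASE_URL = "https://api.cachly.example"  # placeholder; substitute your endpoint

def build_payload(query, threshold=0.92, ttl=3600):
    # Request body matching the API example: query text, similarity
    # threshold, and TTL in seconds.
    return {"query": query, "threshold": threshold, "ttl": ttl}

def semantic_lookup(api_key, query, **opts):
    # POST the lookup; a hit returns the cached response, a miss passes
    # through to the origin per the flow above.
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/cache/semantic",
        data=json.dumps(build_payload(query, **opts)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```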
## 🎨 Multimodal Support

Vector Caching v2 extends semantic caching to images, audio, and mixed media. Each modality gets its own embedding model and vector space.

- **Text** – Natural language queries and completions
- **Images** – CLIP-based embeddings for visual similarity
- **Audio** – Whisper-based transcription + embedding
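Per-modality defaults can be pictured as a routing table. A sketch with illustrative model identifiers and an invented audio threshold (only the text default 0.92 and the image example's 0.88 come from this page):

```python
# Hypothetical routing table; model names and the audio threshold are
# illustrative, not cachly's actual defaults.
MODALITY_CONFIG = {
    "text":  {"model": "text-embedding", "threshold": 0.92},
    "image": {"model": "clip-vit",       "threshold": 0.88},
    "audio": {"model": "whisper-embed",  "threshold": 0.90},
}

def resolve(modality, threshold=None):
    # Pick the embedding model and similarity threshold for a request,
    # letting an explicit threshold override the per-modality default.
    if modality not in MODALITY_CONFIG:
        raise ValueError(f"unsupported modality: {modality}")
    cfg = dict(MODALITY_CONFIG[modality])
    if threshold is not None:
        cfg["threshold"] = threshold
    return cfg
```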
```http
POST /api/v1/cache/semantic
Content-Type: application/json

{
  "modality": "image",
  "data": "<base64-encoded-image>",
  "threshold": 0.88,
  "ttl": 7200
}
```

## SDK Snippets
### Python

```python
from cachly import Cachly

client = Cachly(api_key="ck_...")

# Semantic cache lookup
hit = client.semantic.get("What is the capital of France?")
if hit:
    print("Cache hit:", hit.value)
else:
    # Compute and store
    answer = call_llm("What is the capital of France?")
    client.semantic.set("What is the capital of France?", answer, ttl=3600)
```

### TypeScript
```typescript
import { Cachly } from "@cachly/sdk";

const client = new Cachly({ apiKey: "ck_..." });

const hit = await client.semantic.get("What is the capital of France?");
if (hit) {
  console.log("Cache hit:", hit.value);
}
```

## Configuration
| Parameter | Default | Description |
|---|---|---|
| `threshold` | 0.92 | Cosine similarity threshold for a cache hit |
| `ttl` | 3600 | Time-to-live in seconds |
| `modality` | `text` | Embedding modality: `text`, `image`, or `audio` |
| `model` | `auto` | Embedding model override (per-modality defaults apply otherwise) |
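The table's defaults can be mirrored client-side in a small config object; a sketch assuming only the parameter names and defaults above (cachly's SDK may expose its own config type):

```python
from dataclasses import dataclass

@dataclass
class SemanticCacheConfig:
    # Defaults mirror the configuration table above.
    threshold: float = 0.92   # cosine similarity required for a hit
    ttl: int = 3600           # time-to-live in seconds
    modality: str = "text"    # "text", "image", or "audio"
    model: str = "auto"       # per-modality default unless overridden
```

Overriding a single field, e.g. `SemanticCacheConfig(threshold=0.88, modality="image")`, matches the image example earlier on this page.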