Semantic Caching

cachly's semantic cache layer uses vector similarity search to return cached responses for semantically similar queries, even when the exact wording differs.

How It Works

1. Embed: Your query is embedded into a high-dimensional vector using the configured model.
2. Search: A similarity search finds the most relevant cached embeddings using approximate nearest-neighbor lookup.
3. Match: If similarity exceeds your threshold (default 0.92), the cached response is returned in <1 ms.
4. Miss: On a miss, the request passes through to your origin. The response is cached for next time.
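
The four steps above can be sketched in a few lines of Python. This is a toy, in-memory illustration and not cachly's implementation: `toy_embed` is a stand-in for a real embedding model (it hashes character trigrams), the search is a brute-force scan rather than an approximate nearest-neighbor index, and `ToySemanticCache` and `origin` are hypothetical names introduced for the example.

```python
import math
import zlib
from typing import Callable, List, Optional, Tuple

DIM = 256  # size of the toy vector space (assumption, chosen for illustration)


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: hashed character trigrams, L2-normalized."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; inputs are already unit-length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


class ToySemanticCache:
    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def lookup(self, query: str) -> Optional[str]:
        q = toy_embed(query)                      # 1. Embed
        best_score, best_response = -1.0, None
        for vec, response in self.entries:        # 2. Search (brute force here)
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score > self.threshold:
            return best_response                  # 3. Match
        return None

    def get_or_compute(self, query: str, origin: Callable[[str], str]) -> str:
        cached = self.lookup(query)
        if cached is not None:
            return cached
        response = origin(query)                  # 4. Miss: go to the origin...
        self.entries.append((toy_embed(query), response))  # ...and cache the result
        return response
```

The trigram embedding only recognizes near-identical wording. Matching genuine paraphrases requires a real embedding model, which is what the configured model provides in the hosted service.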

API Example

POST /api/v1/cache/semantic
Authorization: Bearer <API_KEY>
Content-Type: application/json

{
  "query": "What is the capital of France?",
  "threshold": 0.92,
  "ttl": 3600
}

A semantically equivalent query such as "France's capital city?" hits the cache as long as its similarity to the stored query exceeds the threshold.
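
For readers calling the endpoint without an SDK, the request above could be assembled with Python's standard library roughly as follows. The host in `BASE_URL` and the helper name `build_semantic_request` are assumptions made for this sketch; only the path, headers, and body fields come from the example above.

```python
import json
import urllib.request

# Hypothetical host: substitute the base URL of your cachly deployment.
BASE_URL = "https://cachly.example.com"


def build_semantic_request(
    api_key: str,
    query: str,
    threshold: float = 0.92,
    ttl: int = 3600,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the semantic cache endpoint."""
    body = json.dumps({"query": query, "threshold": threshold, "ttl": ttl})
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/cache/semantic",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_semantic_request("ck_...", "What is the capital of France?")
# Send with urllib.request.urlopen(req). The response format is not
# described in this section, so the sketch does not parse it.
```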

Multimodal Support

Vector Caching v2 extends semantic caching to images, audio, and mixed media. Each modality gets its own embedding model and vector space.

Text: Natural language queries and completions
Images: CLIP-based embeddings for visual similarity
Audio: Whisper-based transcription + embedding

POST /api/v1/cache/semantic
Authorization: Bearer <API_KEY>
Content-Type: application/json

{
  "modality": "image",
  "data": "<base64-encoded-image>",
  "threshold": 0.88,
  "ttl": 7200
}
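
The `data` field above carries the image as base64 text. A minimal sketch of preparing that body in Python follows; `build_image_payload` is a hypothetical helper, and the use of standard base64 (not URL-safe, no `data:` URI prefix) is an assumption, since the encoding variant is not specified here.

```python
import base64
import json


def build_image_payload(image_bytes: bytes, threshold: float = 0.88, ttl: int = 7200) -> dict:
    """Return the JSON-serializable body for an image lookup."""
    return {
        "modality": "image",
        # Assumption: standard base64 alphabet, encoded as ASCII text.
        "data": base64.b64encode(image_bytes).decode("ascii"),
        "threshold": threshold,
        "ttl": ttl,
    }


# In-memory bytes for illustration; in practice read them from a file,
# e.g. pathlib.Path("photo.png").read_bytes().
payload = build_image_payload(b"\x89PNG\r\n\x1a\n")
body = json.dumps(payload)
```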

SDK Snippets

Python

from cachly import Cachly

client = Cachly(api_key="ck_...")
query = "What is the capital of France?"

# Semantic cache lookup
hit = client.semantic.get(query)
if hit:
    print("Cache hit:", hit.value)
else:
    # Cache miss: compute the answer and store it for next time.
    # call_llm is a placeholder for your own LLM call.
    answer = call_llm(query)
    client.semantic.set(query, answer, ttl=3600)

TypeScript

import { Cachly } from "@cachly/sdk";

const client = new Cachly({ apiKey: "ck_..." });

const hit = await client.semantic.get("What is the capital of France?");
if (hit) {
  console.log("Cache hit:", hit.value);
}

Configuration

Parameter	Default	Description
threshold	0.92	Cosine similarity threshold for a cache hit
ttl	3600	Time-to-live in seconds
modality	text	Embedding modality: `text`, `image`, `audio`
model	auto	Embedding model override (per-modality defaults)
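
One way to keep these parameters consistent on the client side is a small settings object that carries the defaults from the table and rejects obviously invalid values. `SemanticCacheConfig` is a hypothetical class, not part of the SDK, and the validation bounds (threshold in (0, 1], positive ttl) are assumptions rather than documented limits.

```python
from dataclasses import asdict, dataclass

MODALITIES = ("text", "image", "audio")


@dataclass(frozen=True)
class SemanticCacheConfig:
    threshold: float = 0.92   # cosine similarity required for a hit
    ttl: int = 3600           # time-to-live in seconds
    modality: str = "text"    # one of MODALITIES
    model: str = "auto"       # embedding model override

    def __post_init__(self) -> None:
        # Assumed bounds: a useful cosine threshold lies in (0, 1].
        if not 0.0 < self.threshold <= 1.0:
            raise ValueError(f"threshold must be in (0, 1], got {self.threshold}")
        if self.ttl <= 0:
            raise ValueError(f"ttl must be positive, got {self.ttl}")
        if self.modality not in MODALITIES:
            raise ValueError(f"modality must be one of {MODALITIES}, got {self.modality!r}")

    def to_params(self) -> dict:
        """Return the settings as a plain dict."""
        return asdict(self)


image_config = SemanticCacheConfig(threshold=0.88, ttl=7200, modality="image")
```

A lower threshold, such as the 0.88 used in the image example, accepts looser matches and trades precision for a higher hit rate.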
