Self-Host a Semantic LLM Cache in 5 Minutes

Your data never leaves your server. One docker compose command. No Kubernetes, no cloud account, no vendor lock-in.

Why self-host?

Three types of teams ask us for self-hosting every week:

  • Healthcare / finance — data cannot leave a specific VPC or country
  • Air-gapped environments — no outbound internet from the inference cluster
  • Cost control — running on existing hardware that's already paid for

Cachly ships as a standard Docker image. The managed cloud version and the self-hosted version run identical code — the only difference is who operates it.

What you need

  • Any Linux server with 1 GB RAM (or more)
  • Docker + Docker Compose
  • An embedding API key (OpenAI, Cohere, or Mistral) — or use a local model

No Kubernetes. No Helm charts. No cert-manager. If you can run docker compose up, you can run Cachly.

Quick start

Create a compose.yml:

services:
  cachly-api:
    image: ghcr.io/cachly-dev/cachly-api:latest
    environment:
      SELF_HOSTED: "true"
      DATABASE_URL: postgres://cachly:secret@postgres:5432/cachly
      VALKEY_URL: redis://valkey:6379
      EMBEDDING_PROVIDER: openai
      EMBEDDING_API_KEY: sk-...
      ENCRYPTION_KEY: ${CACHLY_ENCRYPTION_KEY}
    ports:
      - "3001:3001"
    depends_on:
      - postgres
      - valkey

  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: cachly
      POSTGRES_USER: cachly
      POSTGRES_PASSWORD: secret
    volumes:
      - pg_data:/var/lib/postgresql/data

  valkey:
    image: valkey/valkey:8-alpine
    volumes:
      - valkey_data:/data

volumes:
  pg_data:
  valkey_data:

Then:

export CACHLY_ENCRYPTION_KEY=$(openssl rand -hex 32)
docker compose up -d

# Create your first instance
curl -X POST http://localhost:3001/api/v1/instances \
  -H "Content-Type: application/json" \
  -d '{"name":"my-cache","tier":"free"}'

That's it. Cachly is running locally. Port 3001 serves the REST API — point your SDK at it.

Connect your app

Use any Cachly SDK, just override the base URL:

# Python
from cachly import CachlyClient
client = CachlyClient(
    base_url="http://localhost:3001",
    api_key="your-self-hosted-key"
)

// TypeScript
import { CachlyClient } from "@cachly-dev/sdk";
const client = new CachlyClient({
  baseUrl: "http://localhost:3001",
  apiKey: "your-self-hosted-key",
});

# LangChain
from langchain_community.cache import CachlySemanticCache
import langchain
langchain.llm_cache = CachlySemanticCache(
    vector_url="http://localhost:3001/api/v1/sem/YOUR_TOKEN",
    threshold=0.92,
)

What SELF_HOSTED=true disables

The SELF_HOSTED=true flag disables everything cloud-specific:

Feature                    | Managed     | Self-Hosted
---------------------------|-------------|------------
Stripe billing             | ✅          | ❌ disabled
Kubernetes provisioning    | ✅          | ❌ disabled
Multi-tenant isolation     | ✅          | Single-tenant
Semantic cache (pgvector)  | ✅          | ✅
Exact cache (Valkey)       | ✅          | ✅
AI Brain (MCP tools)       | ✅          | ✅
REST API                   | ✅          | ✅
All 15+ SDKs               | ✅          | ✅
Data sovereignty           | EU/US/APAC  | Your server
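The pattern behind the flag is a simple environment-variable gate: cloud-only subsystems are skipped at startup when SELF_HOSTED is set. An illustrative sketch (feature names are made up for the example):

```python
import os

# Hypothetical feature gate, mimicking what a SELF_HOSTED flag typically does
CLOUD_ONLY = {"stripe_billing", "kubernetes_provisioning", "multi_tenant_isolation"}
ALL_FEATURES = CLOUD_ONLY | {"semantic_cache", "exact_cache", "ai_brain", "rest_api"}

def enabled_features():
    """Return the feature set for the current deployment mode."""
    self_hosted = os.environ.get("SELF_HOSTED", "false").lower() == "true"
    return ALL_FEATURES - CLOUD_ONLY if self_hosted else ALL_FEATURES

os.environ["SELF_HOSTED"] = "true"
print(sorted(enabled_features()))  # billing and provisioning are gone
```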

Air-gapped setup (no internet)

If your inference server has no outbound internet, use a local embedding model instead of OpenAI:

# In compose.yml, add Ollama as the embedding provider:
  cachly-api:
    environment:
      EMBEDDING_PROVIDER: ollama
      EMBEDDING_BASE_URL: http://ollama:11434
      EMBEDDING_MODEL: nomic-embed-text

  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama

# ...and register the new named volume alongside pg_data and valkey_data:
volumes:
  ollama_data:

nomic-embed-text produces 768-dim embeddings. Pull it once with docker exec ollama ollama pull nomic-embed-text and you're offline-ready.
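The dimension matters because the pgvector column is sized for a specific width: swapping providers without re-indexing will fail at insert time. A small fail-fast guard (the OpenAI model name and its 1536 dims are assumptions; 768 for nomic-embed-text comes from the text above):

```python
# Embedding dimension per model (assumed values for illustration)
MODEL_DIMS = {
    "text-embedding-3-small": 1536,  # OpenAI default
    "nomic-embed-text": 768,         # Ollama, offline-friendly
}

def check_dims(model, column_dims):
    """Fail fast if the pgvector column width doesn't match the model."""
    expected = MODEL_DIMS[model]
    if expected != column_dims:
        raise ValueError(
            f"{model} emits {expected}-dim vectors but the column is vector({column_dims})"
        )

check_dims("nomic-embed-text", 768)  # ok: widths match
```

Running this check at startup turns a confusing runtime insert error into an immediate, readable one.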

Production checklist

  • Set ENCRYPTION_KEY to a random 32-byte hex string (never commit it)
  • Mount a persistent volume for Postgres — the pgvector index lives there
  • Add nginx or Caddy in front for TLS termination
  • Enable Valkey persistence (appendonly yes) for the L1 exact cache
  • Set CORS_ORIGINS to your frontend domain

What's next

Once you have the cache running locally, check out the MCP server setup to give your AI coding assistant persistent memory backed by your self-hosted instance. All 30 Brain tools work identically against a local Cachly instance.

Need help with the air-gapped setup or a custom SLA? [email protected]
