Agents SDK
The cachly Agents SDK provides a purpose-built caching layer for AI agents. Cache tool calls, chain-of-thought reasoning, and agent memory to reduce latency and LLM costs.
Why Cache AI Agents?
Cut LLM Costs
Identical tool calls and reasoning chains are cached – no redundant API calls.
Sub-ms Latency
Cached responses return in under 1 ms, instead of the 2–30 s of an LLM round-trip.
Persistent Memory
Agent context and learned patterns survive across sessions and restarts.
Installation
Python
pip install cachly-agents
TypeScript / Node.js
npm install @cachly/agents
Go
go get github.com/cachlydev/agents-go
Quick Start
from cachly_agents import AgentCache

cache = AgentCache(api_key="ck_...", instance="my-agent-cache")

# Cache a tool call result
@cache.tool("weather_lookup")
def get_weather(city: str) -> dict:
    return call_weather_api(city)

# Cache chain-of-thought reasoning
@cache.chain("math_reasoning")
def solve_math(problem: str) -> str:
    return call_llm(f"Solve step by step: {problem}")

# Agent memory – persists across sessions
memory = cache.memory("agent-007")
memory.set("last_topic", "kubernetes scaling")
print(memory.get("last_topic"))  # "kubernetes scaling"

Core Concepts
🔧 Tool Call Caching
Wrap any tool/function with @cache.tool(). Identical inputs return cached outputs. Supports exact match and semantic similarity.
| Option | Default | Description |
|---|---|---|
| ttl | 3600 | Cache TTL in seconds |
| semantic | false | Enable semantic similarity matching |
| threshold | 0.92 | Similarity threshold (if semantic=true) |
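To make the `ttl` semantics concrete, here is a minimal sketch of an exact-match tool cache with a TTL in plain Python. It illustrates the caching behavior described above, not the SDK's internals; `tool_cache`, `get_weather`, and the `calls` counter are all hypothetical names for this example.

```python
import time
from functools import wraps

def tool_cache(name: str, ttl: int = 3600):
    """Illustrative exact-match tool cache: identical args hit the cache until the TTL expires."""
    store = {}  # (name, args) -> (expires_at, result)

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args):
            key = (name, args)
            hit = store.get(key)
            if hit and hit[0] > time.time():
                return hit[1]  # cache hit: skip the real tool call
            result = fn(*args)
            store[key] = (time.time() + ttl, result)
            return result
        return wrapper
    return decorator

calls = 0

@tool_cache("weather_lookup", ttl=60)
def get_weather(city: str) -> dict:
    global calls
    calls += 1  # counts how often the underlying "API" actually runs
    return {"city": city, "temp_c": 21}

get_weather("Berlin")
get_weather("Berlin")  # served from cache; the function body runs only once
```

Semantic matching would replace the exact `(name, args)` key lookup with an embedding-similarity search against cached keys, accepting hits above the configured `threshold`.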
🔗 Chain-of-Thought Caching
Cache multi-step reasoning chains. If the same (or similar) problem is seen again, the entire reasoning chain is returned without re-running each step. Saves multiple LLM round-trips per invocation.
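Conceptually, a chain cache keys the entire reasoning transcript on the input problem, so a repeat query returns every step at once instead of re-running each LLM call. A minimal sketch of that idea, where the hypothetical `fake_llm` stands in for a real model call:

```python
chain_store: dict[str, list[str]] = {}
llm_calls = 0

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; counts invocations to show the savings.
    global llm_calls
    llm_calls += 1
    return f"answer({prompt})"

def solve_with_chain_cache(problem: str) -> list[str]:
    # One cache entry stores the whole multi-step chain.
    if problem in chain_store:
        return chain_store[problem]
    steps = [fake_llm(f"step {i}: {problem}") for i in range(3)]
    chain_store[problem] = steps
    return steps

solve_with_chain_cache("2x + 4 = 10")  # three LLM calls
solve_with_chain_cache("2x + 4 = 10")  # zero LLM calls: full chain returned from cache
```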
🧠 Agent Memory
A key-value store scoped per agent ID. Persists learned context, user preferences, and session state across restarts. Backed by the same cache engine with optional TTL.
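The per-agent scoping and optional TTL can be sketched as a small in-process key-value store. This is an illustration of the memory semantics only (real persistence lives in the cache backend); the `AgentMemory` class here is hypothetical.

```python
import time

class AgentMemory:
    """Illustrative per-agent key-value memory with optional per-key TTL."""
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self._data = {}  # key -> (expires_at or None, value)

    def set(self, key, value, ttl=None):
        expires = time.time() + ttl if ttl is not None else None
        self._data[key] = (expires, value)

    def get(self, key, default=None):
        hit = self._data.get(key)
        if hit is None:
            return default
        expires, value = hit
        if expires is not None and expires <= time.time():
            del self._data[key]  # lazy expiry on read
            return default
        return value

memory = AgentMemory("agent-007")
memory.set("last_topic", "kubernetes scaling")          # no TTL: kept indefinitely
memory.set("session_token", "tok_123", ttl=0)           # expires immediately
```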