Blog
Engineering and product posts from the Cachly team.
How we cut LLM costs by 80% with Semantic Cache
Every user rephrases the same question differently, and without semantic caching you pay for every rephrasing. Here's how pgvector similarity search eliminates 60–90% of LLM API calls — with real numbers and 3 lines of code.
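The core idea can be sketched in a few lines: reuse a cached answer when a new query's embedding is close enough to a stored one. This is an illustrative, self-contained sketch — the post itself uses pgvector for the similarity search, while here an in-memory array and cosine similarity stand in; the 0.9 threshold and the names below are assumptions for the example, not Cachly's actual values.

```typescript
type Entry = { embedding: number[]; answer: string };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: Entry[] = [];
  // 0.9 is an illustrative threshold, not a recommended value.
  constructor(private threshold = 0.9) {}

  // Return the cached answer whose embedding is most similar to the
  // query embedding, if that similarity clears the threshold.
  lookup(embedding: number[]): string | undefined {
    let best: Entry | undefined;
    let bestSim = this.threshold;
    for (const e of this.entries) {
      const sim = cosine(e.embedding, embedding);
      if (sim >= bestSim) { bestSim = sim; best = e; }
    }
    return best?.answer;
  }

  store(embedding: number[], answer: string): void {
    this.entries.push({ embedding, answer });
  }
}
```

With pgvector the same lookup collapses into a single SQL query (roughly `ORDER BY embedding <=> $1 LIMIT 1`, where `<=>` is pgvector's cosine-distance operator); the in-memory loop above only illustrates the matching logic.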
How I Built a VS Code Extension That Shows What My AI Learned
From `yo code` to a live status bar widget showing brain health, lesson count, and token savings — the full walkthrough including every gotcha. TypeScript, zero extra dependencies.
Building an IntelliJ Plugin in Kotlin: Status Bar + API
From `build.gradle.kts` to a live widget in IntelliJ IDEA, WebStorm, and all JetBrains IDEs — StatusBarWidgetFactory, PersistentStateComponent, Swing DialogWrapper, and the Gradle gotchas the docs don't mention.
See your AI Brain in VS Code and IntelliJ
New IDE plugins show brain health, lesson count, and recall stats directly in your status bar. VS Code and IntelliJ — zero config.
Your AI assistant never forgets — no embeddings required
We removed the #1 barrier to AI memory: the mandatory API key. Before: your assistant forgot everything. After: it remembers in 3ms. Zero config, works offline.
We built persistent memory for Claude Code
How we gave AI coding assistants a brain that survives across sessions — session briefings, lesson recall, team knowledge, and semantic search.