Blog

Engineering and product posts from the Cachly team.

7 min read

How we cut LLM costs by 80% with Semantic Cache

Every user rephrases the same question differently. Without semantic caching you pay for each rephrasing. Here's how pgvector similarity search eliminates 60–90% of LLM API calls — with real numbers and 3 lines of code.
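The core idea reduces to a nearest-neighbor lookup over query embeddings: if a new question's embedding is close enough to a cached one, return the cached answer instead of calling the LLM. A minimal in-memory sketch in TypeScript (the `SemanticCache` class and 0.9 threshold here are illustrative assumptions, not Cachly's actual API; the post itself does the same lookup in Postgres via pgvector):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: Array<{ embedding: number[]; response: string }> = [];
  constructor(private threshold = 0.9) {}

  // Return the most similar cached response above the threshold, or null on a miss.
  get(embedding: number[]): string | null {
    let best: string | null = null;
    let bestSim = this.threshold;
    for (const { embedding: e, response } of this.entries) {
      const sim = cosine(embedding, e);
      if (sim >= bestSim) {
        best = response;
        bestSim = sim;
      }
    }
    return best;
  }

  put(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```

A rephrased question lands near the original in embedding space, so it hits the cache; an unrelated question falls below the threshold and goes to the LLM as usual. The threshold is the knob that trades hit rate against the risk of serving a stale or mismatched answer.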

Semantic Cache · Cost Optimization · AI
8 min read

How I Built a VS Code Extension That Shows What My AI Learned

From `yo code` to a live status bar widget showing brain health, lesson count, and token savings — the full walkthrough including every gotcha. TypeScript, zero extra dependencies.

VS Code · Tutorial · Developer Tools
10 min read

Building an IntelliJ Plugin in Kotlin: Status Bar + API

From build.gradle.kts to a live widget in IntelliJ IDEA, WebStorm, and all JetBrains IDEs — StatusBarWidgetFactory, PersistentStateComponent, Swing DialogWrapper, and the Gradle gotchas the docs don't mention.

IntelliJ · Kotlin · Tutorial
3 min read

See your AI Brain in VS Code and IntelliJ

New IDE plugins show brain health, lesson count, and recall stats directly in your status bar. VS Code and IntelliJ — zero config.

IDE Plugins · VS Code · IntelliJ
5 min read

Your AI assistant never forgets — no embeddings required

We removed the #1 barrier to AI memory: the mandatory API key. Before: your assistant forgot everything. After: it remembers in 3ms. Zero config, works offline.

AI Memory · Product · Zero Config
8 min read

We built persistent memory for Claude Code

How we gave AI coding assistants a brain that survives across sessions — session briefings, lesson recall, team knowledge, and semantic search.

AI Memory · MCP · Claude Code