Multilingual AI · 6 min read

Store in Japanese, Recall in English — Cross-Language AI Memory

Your AI Brain just got multilingual retrieval. Lessons stored in Japanese, Chinese, Korean, Arabic, or Hebrew are automatically findable in English — and vice versa. No embeddings required.

The Use Case

International engineering teams often have a language problem. The Japanese team stores lessons in Japanese. The Korean team writes in Korean. The shared AI memory is siloed — not because the Brain doesn't know these languages, but because keyword search can't bridge language boundaries. Until now.

How It Works — Without Embeddings

Semantic search with embeddings can find cross-lingual matches, but it requires an OpenAI API key, costs money per lookup, and adds latency. Most Brain users are on the free tier — no API key, no embeddings.

Our approach: a curated technical term synonym map, built directly into the tokenizer. When any document is indexed or any query is tokenized, every recognized technical term gets expanded to its equivalents in all supported languages:

Token: "deploy"
Expanded to: デプロイ (JA), 部署 (ZH), 배포 (KO), نشر (AR), פריסה (HE)
+ bigrams of all CJK variants added to the token stream

Token: デプロイ
Expanded to: "deploy", "deployment"
+ romaji "depuroi" added from katakana converter

The expansion happens at tokenize time, both when indexing (documents get synonym tokens) and when searching (queries get synonym tokens). Because both sides are expanded the same way, a document and a query share tokens whenever they mention the same concept, in any of the six supported languages.
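A minimal TypeScript sketch of tokenize-time expansion, under stated assumptions. The names `SYNONYM_GROUPS`, `bigrams`, and `expandTokens` are hypothetical, the two groups stand in for the full map, and expanding every form to its whole group is a simplification of the behavior described above. It is an illustration, not the Brain's actual tokenizer, and it leaves out the katakana-to-romaji step.

```typescript
// Hypothetical synonym groups; the real map is described as covering 130+ terms.
const SYNONYM_GROUPS: string[][] = [
  ["deploy", "deployment", "デプロイ", "部署", "배포", "نشر", "פריסה"],
  ["server", "サーバー", "服务器", "서버", "سيرفر", "שרת"],
];

// Every surface form points at its whole group, so expansion is one Map.get().
const SYNONYMS = new Map<string, string[]>();
for (const group of SYNONYM_GROUPS) {
  for (const term of group) SYNONYMS.set(term, group);
}

// True for strings made only of kana, CJK ideographs, or hangul syllables.
const CJK_ONLY = /^[\u3040-\u30ff\u3400-\u9fff\uac00-\ud7af]+$/;

// Overlapping two-character windows, e.g. "デプロイ" -> "デプ", "プロ", "ロイ".
function bigrams(term: string): string[] {
  const out: string[] = [];
  for (let i = 0; i + 2 <= term.length; i++) out.push(term.slice(i, i + 2));
  return out;
}

// Expand each token to its synonyms, plus bigrams of the CJK variants.
function expandTokens(tokens: string[]): string[] {
  const out = new Set<string>();
  for (const token of tokens) {
    const key = token.toLowerCase();
    out.add(key);
    for (const synonym of SYNONYMS.get(key) ?? []) {
      out.add(synonym);
      if (CJK_ONLY.test(synonym)) {
        for (const pair of bigrams(synonym)) out.add(pair);
      }
    }
  }
  return Array.from(out);
}

// Both directions end up in the same token space.
const fromEnglish = expandTokens(["deploy"]);
const fromJapanese = expandTokens(["デプロイ"]);
```

Running the same function over documents at index time and over queries at search time is what lets an English query token and a Japanese document token meet.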

The Synonym Map — 130+ Terms

The map covers the technical vocabulary that actually appears in developer Brain entries:

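Concept | JA | ZH | KO | AR | HE
deploy | デプロイ | 部署 | 배포 | نشر | פריסה
container | コンテナ | 容器 | 컨테이너 | حاوية | מיכל
server | サーバー | 服务器 | 서버 | سيرفر | שרת
error | エラー | 错误 | 오류 | خطأ | שגיאה
auth | 認証 | 认证 | 인증 | مصادقة | אימות
monitor | モニター | 监控 | 모니터링 | مراقبة | ניטור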

Plus 100+ more, including cache, database, build, test, install, log, port, cluster, debug, and migration.
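One way to hold rows like the ones in the table is a lookup keyed by every surface form, so any term in any language resolves to its row. The sketch below is an assumption about the data structure, not the shipped code: `ConceptRow`, `buildLookup`, and `translate` are hypothetical names, and only three rows are included.

```typescript
type Lang = "en" | "ja" | "zh" | "ko" | "ar" | "he";
type ConceptRow = Record<Lang, string>;

const LANGS: Lang[] = ["en", "ja", "zh", "ko", "ar", "he"];

// Three rows copied from the table; the full map would hold many more.
const ROWS: ConceptRow[] = [
  { en: "deploy", ja: "デプロイ", zh: "部署", ko: "배포", ar: "نشر", he: "פריסה" },
  { en: "container", ja: "コンテナ", zh: "容器", ko: "컨테이너", ar: "حاوية", he: "מיכל" },
  { en: "error", ja: "エラー", zh: "错误", ko: "오류", ar: "خطأ", he: "שגיאה" },
];

// Index each row under all six of its terms, which makes the map bidirectional.
function buildLookup(rows: ConceptRow[]): Map<string, ConceptRow> {
  const lookup = new Map<string, ConceptRow>();
  for (const row of rows) {
    for (const lang of LANGS) lookup.set(row[lang].toLowerCase(), row);
  }
  return lookup;
}

const LOOKUP = buildLookup(ROWS);

// Resolve a term in any supported language to its equivalent in another.
function translate(term: string, to: Lang): string | undefined {
  return LOOKUP.get(term.toLowerCase())?.[to];
}

const koreanForContainer = translate("container", "ko"); // "컨테이너"
const englishForError = translate("エラー", "en"); // "error"
```

Keying by surface form rather than by language keeps the structure symmetric: no language is treated as the pivot, and adding a language means adding one field per row.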

Zero-Embedding, Zero-Cost

The entire cross-language lookup is a single Map.get() call per token: O(1) on average, with no API call, no network round trip, and no cost.

Traditional semantic search: 200–400 ms latency, $0.0004 per 1,000 tokens
Cross-lingual synonym lookup: <0.01 ms latency, $0

This runs on the free tier, on every smart_recall call, in every Brain session.

Real-World Example

Team setup: Korean backend team stores lessons in Korean. English-speaking DevOps engineers query the Brain in English.

# Korean engineer stores a lesson
learn_from_attempts:
  topic: "deploy:k8s"
  outcome: "success"
  # Korean: "Deploy failure cause: port 3000 was blocked by the firewall. Fixed by opening the port"
  whatWorked: "배포 실패 원인: 포트 3000이 방화벽에 의해 차단됨. 포트를 열어 해결"

# English DevOps searches the next day
smart_recall("deployment failure port blocked")
→ ✅ Returns the Korean lesson, ranked first

# Symmetric: English lesson found by Korean query
learn_from_attempts:
  topic: "fix:redis"
  outcome: "success"
  whatWorked: "Redis connection timeout fixed: set timeout to 5000ms in config"

smart_recall("레디스 연결 오류")  # Korean: "Redis connection error"
→ ✅ Returns the English lesson
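The scenario above can be simulated with a toy in-memory index. This is a sketch under stated assumptions: `ToyBrain`, `learn`, `recall`, and `tokenize` are hypothetical stand-ins for `learn_from_attempts` and `smart_recall`, the synonym groups cover only the words in this example, and ranking by raw token overlap is an assumption, since the scoring used by the Brain is not described here.

```typescript
// Hypothetical synonym groups covering only the terms used in this example.
const GROUPS: string[][] = [
  ["deploy", "deployment", "배포"],
  ["failure", "실패"],
  ["port", "포트"],
  ["redis", "레디스"],
  ["connection", "연결"],
];

const SYNONYMS = new Map<string, string[]>();
for (const group of GROUPS) {
  for (const term of group) SYNONYMS.set(term, group);
}

// Lowercase, split on whitespace and basic punctuation, then add synonyms.
function tokenize(text: string): Set<string> {
  const tokens = new Set<string>();
  for (const raw of text.toLowerCase().split(/[\s:.,;!?()"']+/)) {
    if (raw === "") continue;
    tokens.add(raw);
    for (const synonym of SYNONYMS.get(raw) ?? []) tokens.add(synonym);
  }
  return tokens;
}

interface Lesson {
  topic: string;
  whatWorked: string;
}

class ToyBrain {
  private entries: { lesson: Lesson; tokens: Set<string> }[] = [];

  // Index time: the stored lesson gets synonym tokens.
  learn(lesson: Lesson): void {
    this.entries.push({ lesson, tokens: tokenize(lesson.whatWorked) });
  }

  // Query time: the query gets synonym tokens; rank by shared token count.
  recall(query: string): Lesson[] {
    const queryTokens = Array.from(tokenize(query));
    return this.entries
      .map((e) => ({
        lesson: e.lesson,
        score: queryTokens.filter((t) => e.tokens.has(t)).length,
      }))
      .filter((hit) => hit.score > 0)
      .sort((a, b) => b.score - a.score)
      .map((hit) => hit.lesson);
  }
}

const brain = new ToyBrain();
brain.learn({
  topic: "deploy:k8s",
  whatWorked: "배포 실패 원인: 포트 3000이 방화벽에 의해 차단됨. 포트를 열어 해결",
});
brain.learn({
  topic: "fix:redis",
  whatWorked: "Redis connection timeout fixed: set timeout to 5000ms in config",
});

const englishQueryHits = brain.recall("deployment failure port blocked");
const koreanQueryHits = brain.recall("레디스 연결 오류");
```

Note that this whitespace tokenizer only matches 포트 where it stands alone; the inflected form 포트를 is a different token. Handling attached particles is one plausible reason for adding CJK bigrams to the token stream.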

Supported Languages

The synonym graph now covers six languages: Japanese (hiragana + katakana), Chinese (simplified), Korean (hangul), Arabic (MSA technical vocabulary), Hebrew, and English. All language pairs are bidirectional.

Cross-language search activates automatically — no configuration, no flag, no API key. If you store in Japanese and recall in English, it just works.

Upgrade

npx @cachly-dev/mcp-server@latest autopilot

cachly is a persistent AI Brain for developers — memory shared across Claude Code, Cursor, GitHub Copilot & Windsurf simultaneously. Auto-detects every editor. Bootstraps from your git history. 115 MCP tools. Free tier, EU servers, no credit card.
