Japanese · i18n··4 min read

Search Your Japanese Brain in Romaji

You store lessons in Japanese. You search in whatever flows from your fingers — sometimes Japanese, sometimes romaji. With cachly v0.5.37, both work.

The Problem

A developer on a Japanese team stores lessons in Japanese every day:

learn_from_attempts:
  topic: "deploy:k8s"
  whatWorked: "readinessProbeのfailureThresholdを10に増やすことで解決"

Three days later, they try to recall it. They type fast, without switching input method:

smart_recall("readiness probe failure threshold") → ✅ found
smart_recall("depuroi") → ❌ nothing  (before v0.5.37)
smart_recall("デプロイ") → ✅ found

The lesson is there. But if you search in romaji — the Latin-alphabet transliteration of Japanese used when typing without a Japanese input method — it was invisible.

What We Built

Katakana → romaji tokenization at index time.

When cachly indexes a document containing katakana, it now also generates the Hepburn romaji equivalent as an additional search token.

"コンテナのヘルスチェックで問題発生"
  ↓ bigrams: コン, ンテ, テナ, ...
  ↓ romaji tokens: "kontena", "herusuche" (from ヘルス+チェック)

Now a search for kontena matches the stored katakana content.

The Converter

We implemented a full Hepburn romanization converter — 70+ mapping rules — handling:

KatakanaRomajiNotes
コンテナkontenaBasic consonant-vowel
デプロイdepuroiVoiced consonants
チェックchekkuDigraph チェ + geminate ッ
ファイルfairuLoanword combination ファ
サーバーsabaLong vowel mark ー stripped
シャットダウンshattodaunDigraph + geminate

Real Examples

# Stored lesson (Japanese)
"コンテナのヘルスチェックで127.0.0.1を使用する必要がある"

# These all find it now:
smart_recall("kontena")          → ✅ (romaji for コンテナ)
smart_recall("container")        → ✅ (cross-lingual synonym)
smart_recall("コンテナ")          → ✅ (direct katakana match)
smart_recall("healthcheck 127")  → ✅ (mixed language, Latin terms match directly)
# Stored lesson (katakana deployment note)
"デプロイメントが失敗、ポート3000を開放することで解決"

smart_recall("depuroimento")     → ✅ (full romaji)
smart_recall("depuroi")          → ✅ (partial romaji, fuzzy match)
smart_recall("port 3000")        → ✅ (Latin terms match directly)
smart_recall("deploy")           → ✅ (cross-lingual: deploy↔デプロイ)

Why Romaji Matters

Japanese developers switch input modes constantly. When typing fast in a terminal, in a commit message, or in a search query — romaji often comes out first. Making romaji search work means zero friction between "I need to find that lesson" and "found it."

This is especially true for katakana loanwords — the vast majority of technical terms in Japanese are written in katakana and have direct romaji equivalents.

Upgrade

npx @cachly-dev/mcp-server@latest autopilot

cachly is a persistent AI Brain for developers — memory shared across Claude Code, Cursor, GitHub Copilot & Windsurf simultaneously. Auto-detects every editor. Bootstraps from your git history. 115 MCP tools. Free tier, EU servers, no credit card.

Your AI is forgetting everything right now.

Every session starts blank. Every bug re-discovered. Every deploy procedure re-explained. cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.

🇪🇺 EU servers · GDPR-compliant🆓 Free tier — forever, no credit card⚡ 30-second setup via npx🔌 Claude Code · Cursor · Copilot · Windsurf