Arabic · Hebrew · i18n · 5 min read

Your AI Brain Now Speaks Arabic and Hebrew

The global developer community writes in more than the Latin alphabet. With cachly v0.5.48, the Brain's search engine natively understands Arabic and Hebrew. Arabic is one of the world's most widely spoken languages, and both languages have historically been underserved in developer tooling.

Cross-Language Bridge

Arabic and Hebrew are now full members of the cross-language retrieval network — the same network that already connects English, German, French, Japanese, Chinese, and Korean. Store a lesson in any language, recall it in any other. No translation. No configuration. No separate index.

Example — a real-world scenario:

An Arabic-speaking engineer fixes a JWT authentication issue and documents it in Arabic ("JWT authentication works after adding the secret key to the environment variables"):

cachly learn '{
  "topic": "fix:auth",
  "outcome": "success",
  "whatWorked": "مصادقة JWT تعمل بعد إضافة المفتاح السري في متغيرات البيئة"
}'

Three days later, a teammate searches in English:

smart_recall("JWT authentication secret missing")
→ ✅ Returns the Arabic lesson, ranked correctly

It works in every direction. An English query for "port conflict deployment" finds Hebrew lessons, and a Japanese query finds Arabic lessons, because every language in the synonym graph is linked to every other.

How It Works — The Synonym Graph

Every technical term is a node. Edges connect equivalents across languages:

"authentication"
  ↔  مصادقة  (Arabic)
  ↔  אימות   (Hebrew)
  ↔  認証    (Japanese)
  ↔  인증    (Korean)
  ↔  认证    (Chinese)
  ↔  Authentifizierung  (German)

"deploy"
  ↔  نشر          (Arabic)
  ↔  פריסה        (Hebrew)
  ↔  デプロイ     (Japanese)
  ↔  배포          (Korean)
  ↔  部署          (Chinese)
  ↔  bereitstellen (German)
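A minimal sketch of the graph above as a data structure, in Python. Everything here is an assumption for illustration: the `SynonymGraph` class and its `link`, `add_group`, and `expand` methods are hypothetical, not cachly's API. Edges are stored in both directions, and `expand` walks the graph breadth-first, so two translations that are linked only through the English hub still reach each other.

```python
from collections import defaultdict, deque


class SynonymGraph:
    """Undirected graph of equivalent technical terms (hypothetical sketch)."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        # Store every edge in both directions so lookups work from either end.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_group(self, hub: str, equivalents: list[str]) -> None:
        # Mirrors the listing above: one English hub linked to each translation.
        for term in equivalents:
            self.link(hub, term)

    def expand(self, term: str) -> set[str]:
        # Breadth-first search: every term reachable from `term` is an equivalent,
        # so the Arabic node reaches the Japanese one through the English hub.
        seen = {term}
        queue = deque([term])
        while queue:
            current = queue.popleft()
            for neighbor in self.edges.get(current, set()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen


graph = SynonymGraph()
graph.add_group("authentication", ["مصادقة", "אימות", "認証", "인증", "认证", "Authentifizierung"])
graph.add_group("deploy", ["نشر", "פריסה", "デプロイ", "배포", "部署", "bereitstellen"])

print("デプロイ" in graph.expand("نشر"))  # True
print("認証" in graph.expand("نشر"))      # False: the two groups share no edge
```

Whether cachly expands transitively like this, or only one hop from the query term, isn't stated. Transitive expansion is what lets a hub-and-spoke listing behave as fully connected; on a large graph it can also drift through ambiguous terms, so a production version would probably cap the search depth.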

When you query smart_recall("مشكلة النشر") ("deployment problem"), the Brain:

1. tokenizes the query;
2. removes Arabic stopwords;
3. stems النشر → نشر;
4. expands نشر to deploy, deployment, 배포, デプロイ, bereitstellen, and the other equivalents in the graph;
5. searches all stored lessons for any of those tokens;
6. returns ranked results regardless of language.
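The six steps can be sketched end to end in a few dozen lines of Python. This is an illustration under assumptions, not cachly's implementation: the stopword set, synonym groups, and lessons are tiny hypothetical samples (the lessons reuse sentences from this post), stemming only strips the Arabic article ال, and the hypothetical `recall_sketch` scores a lesson by how many query concepts it matches.

```python
import re

# Hypothetical sample data; cachly's real stopword lists and synonym graph are larger.
STOPWORDS = {"من", "إلى", "في", "على", "על", "של", "את"}
SYNONYM_GROUPS = [
    {"deploy", "deployment", "نشر", "פריסה", "デプロイ", "배포", "部署", "bereitstellen"},
    {"authentication", "auth", "مصادقة", "אימות", "認証", "인증", "认证", "authentifizierung"},
]
LESSONS = {
    "ar-deploy": "نشر التطبيق نجح بعد تغيير منفذ الخدمة من 8080 إلى 3000",
    "en-deploy": "Deployment failed because port 8080 was taken",
    "he-auth": "תיקון בעיית האימות על ידי הוספת המפתח הסודי לסביבת הייצור",
    "he-deploy": "פריסה נכשלה",
}


def normalize(text: str) -> list[str]:
    tokens = [t.lower() for t in re.findall(r"\w+", text)]  # 1. tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]       # 2. drop stopwords
    # 3. stem: this sketch only strips the Arabic article ال, keeping >= 3 letters
    return [t[2:] if t.startswith("ال") and len(t) >= 5 else t for t in tokens]


def expand(token: str) -> set[str]:
    # 4. expand: the token plus every term in any synonym group containing it
    terms = {token}
    for group in SYNONYM_GROUPS:
        if token in group:
            terms |= group
    return terms


def recall_sketch(query: str) -> list[tuple[str, int]]:
    concepts = [expand(t) for t in normalize(query)]
    results = []
    for lesson_id, text in LESSONS.items():  # 5. search every lesson
        lesson_tokens = set(normalize(text))
        score = sum(1 for terms in concepts if terms & lesson_tokens)
        if score:
            results.append((lesson_id, score))
    # 6. rank: more matched query concepts first, ties broken by id
    return sorted(results, key=lambda r: (-r[1], r[0]))


print(recall_sketch("مشكلة النشر"))
# [('ar-deploy', 1), ('en-deploy', 1), ('he-deploy', 1)]
```

The Arabic query never mentions English or Hebrew, yet the English and Hebrew deployment lessons come back, while the Hebrew authentication lesson does not.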

Query                                                → Finds lessons containing
smart_recall("authentication error")                 → مصادقة, אימות, 認証, autenticación, …
smart_recall("مشكلة النشر") ("deployment problem")    → deploy, deployment, デプロイ, 배포, …
smart_recall("שגיאת אימות") ("authentication error")  → auth, authentication, مصادقة, 認証, …
smart_recall("تصحيح الأخطاء") ("debugging")           → debug, debugging, איתור באגים, デバッグ, …

The RTL Challenge

Most search engines are built for left-to-right text. Arabic and Hebrew run right-to-left, but that is a surface-level difference hiding a deeper challenge: both languages attach grammatical particles directly to the front of words, so a naive whitespace tokenizer cannot match a word against its prefixed forms.

Consider Arabic: the word الخطأ (al-khaṭaʾ, "the error") fuses the definite article ال (al-) with the noun خطأ ("error"). A naive tokenizer treats the whole string as one opaque token, so a search for خطأ misses الخطأ, along with وخطأ ("and error"), فالخطأ ("so the error"), and every other prefixed form. Hebrew follows the same pattern: the article ה and particles such as ו ("and") attach directly to the word, so השגיאה ("the error") never matches a search for שגיאה ("error").
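A quick Python check makes the failure concrete. `naive_tokens` is a hypothetical stand-in for a tokenizer that only splits on whitespace; the Hebrew sample is our own (השגיאה "the error", ושגיאה "and an error").

```python
def naive_tokens(text: str) -> list[str]:
    # Split on whitespace only, as a script-unaware tokenizer would.
    return text.split()


arabic_doc = "الخطأ وخطأ فالخطأ"  # "the error", "and error", "so the error"
hebrew_doc = "השגיאה ושגיאה"      # "the error", "and an error"

# The bare word never appears as a token of its own, so exact-match search misses it.
print("خطأ" in naive_tokens(arabic_doc))    # False
print("שגיאה" in naive_tokens(hebrew_doc))  # False
```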

What We Built

Unicode-aware RTL tokenization — when the Brain detects Arabic (U+0600–U+06FF) or Hebrew (U+0590–U+05FF) characters, it switches to word-level tokenization with language-specific enhancements.
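A minimal Python sketch of the detection step, using the two Unicode ranges given above. The function names are hypothetical, and two details are our assumptions rather than documented behavior: a single Arabic or Hebrew character is enough to trigger the switch, and "word-level" means splitting on Python's Unicode-aware `\w+`.

```python
import re

# The two Unicode blocks named above, as inclusive code point ranges.
ARABIC_BLOCK = (0x0600, 0x06FF)
HEBREW_BLOCK = (0x0590, 0x05FF)

# For str patterns, \w matches characters for which str.isalnum() is true
# (plus "_"), which covers Arabic and Hebrew letters as well as Latin and digits.
WORD_RE = re.compile(r"\w+")


def in_block(ch: str, block: tuple[int, int]) -> bool:
    low, high = block
    return low <= ord(ch) <= high


def detect_script(text: str) -> str:
    # Assumption: one Arabic or Hebrew character is enough to switch modes.
    if any(in_block(ch, ARABIC_BLOCK) for ch in text):
        return "arabic"
    if any(in_block(ch, HEBREW_BLOCK) for ch in text):
        return "hebrew"
    return "other"


def word_tokens(text: str) -> list[str]:
    # Word-level split; lower() only affects cased scripts such as Latin.
    return [token.lower() for token in WORD_RE.findall(text)]


query = "مصادقة JWT تعمل"
print(detect_script(query))  # arabic
print(word_tokens(query))
```

One caveat of this sketch: Arabic harakat and Hebrew niqqud are combining marks, which `\w` does not match, so fully vowelized text would be split mid-word unless those marks are stripped first.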

Arabic light stemming — iterative, up to 3 passes, resolves stacked prefixes:

الخطأ   → خطأ   (ال = definite article stripped)
وخطأ   → خطأ   (و = conjunction stripped)
فالخطأ → خطأ   (ف + ال, two passes needed)
للنشر  → نشر   (ل + ال, two passes: للنشر → النشر → نشر)
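The four examples above can be reproduced with a small rule-based light stemmer in Python. The prefix list and its order, the three-letter minimum stem, and the rewrite of لل back to ال (so that للنشر passes through النشر as described) are our assumptions; cachly's actual rules are not shown in this post.

```python
# Ordered prefix rules: (prefix to match, replacement). Hypothetical rule set.
PREFIX_RULES = [
    ("لل", "ال"),  # li- + al-: the article's alif is dropped after li-, so restore it
    ("ال", ""),    # al-: definite article
    ("و", ""),     # wa-: "and"
    ("ف", ""),     # fa-: "so, then"
    ("ب", ""),     # bi-: "with, by"
    ("ك", ""),     # ka-: "like"
    ("ل", ""),     # li-: "for, to"
]
MIN_STEM = 3    # never strip a prefix if fewer than 3 letters would remain
MAX_PASSES = 3  # the "up to 3 passes" limit described above


def strip_once(word: str) -> str:
    # Apply the first rule whose prefix matches and leaves a long-enough stem.
    for prefix, replacement in PREFIX_RULES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and len(rest) >= MIN_STEM:
            return replacement + rest
    return word


def stem_trace(word: str) -> list[str]:
    # Record the word after each pass, stopping early once nothing changes.
    trace = [word]
    for _ in range(MAX_PASSES):
        stripped = strip_once(trace[-1])
        if stripped == trace[-1]:
            break
        trace.append(stripped)
    return trace


def light_stem(word: str) -> str:
    return stem_trace(word)[-1]


for word in ["الخطأ", "وخطأ", "فالخطأ", "للنشر"]:
    print(" → ".join(stem_trace(word)))
```

The minimum-stem check is what keeps single-letter prefixes safe: ولد ("boy") starts with و, but stripping it would leave only two letters, so the word is left alone.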

Plus 60 Arabic and 40 Hebrew stopwords (particles, pronouns, auxiliary verbs, and prepositions that carry no semantic weight), filtered both before indexing and at query time.
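Filtering on both sides matters because the index and the query have to agree on what counts as a token. A minimal Python sketch with a hypothetical `InvertedIndex` and a tiny sample stopword set (not cachly's 60 + 40 lists): because في ("in") is dropped at index time and at query time, a query containing it matches only on its content words.

```python
from collections import defaultdict

# Tiny hypothetical sample: Arabic في, من, على and Hebrew של, על, את.
STOPWORDS = {"في", "من", "على", "של", "על", "את"}


def content_tokens(text: str) -> list[str]:
    # The same filter runs on documents and on queries.
    return [token for token in text.split() if token not in STOPWORDS]


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for token in content_tokens(text):  # filtered at index time
            self.postings[token].add(doc_id)

    def search(self, query: str) -> dict[str, int]:
        hits: dict[str, int] = defaultdict(int)
        for token in content_tokens(query):  # filtered at query time
            for doc_id in self.postings.get(token, set()):
                hits[doc_id] += 1
        return dict(hits)


idx = InvertedIndex()
idx.add("a", "خطأ في النشر")      # "error in the deployment"
idx.add("b", "تحديث في الواجهة")  # "update in the interface"
print(idx.search("خطأ في النشر"))  # {'a': 2}
```

Without the filter, document "b" would also match through في alone, even though it has nothing to do with errors or deployment.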

How to Use It

No setup required. Just write lessons in whatever language you think in:

# Arabic: "Deploying the app succeeded after changing the service port from 8080 to 3000"
cachly learn '{
  "topic": "deploy:api",
  "outcome": "success",
  "whatWorked": "نشر التطبيق نجح بعد تغيير منفذ الخدمة من 8080 إلى 3000"
}'

# Hebrew: "Fixed the authentication issue by adding the secret key to the production environment"
cachly learn '{
  "topic": "fix:auth",
  "outcome": "success",
  "whatWorked": "תיקון בעיית האימות על ידי הוספת המפתח הסודי לסביבת הייצור"
}'

# Search in any language
smart_recall("مشكلة النشر")     → deployment lessons in any language
smart_recall("שגיאת אימות")     → auth error lessons in any language
smart_recall("authentication")  → also finds مصادقة and אימות lessons

Upgrade

npx @cachly-dev/mcp-server@latest setup

cachly is a managed AI Brain for developers — persistent memory, team knowledge sharing, and semantic cache for Claude Code, Cursor, GitHub Copilot & Windsurf. One MCP server. 51 tools. Free tier, EU servers, no credit card.

Your AI is forgetting everything right now.

Every session starts blank. Every bug re-discovered. Every deploy procedure re-explained. cachly fixes that in 30 seconds — your AI remembers every lesson, every fix, every teammate's hard-won knowledge. Forever.

🇪🇺 EU servers · GDPR-compliant
🆓 Free tier, forever, no credit card
⚡ 30-second setup via npx
🔌 Claude Code · Cursor · Copilot · Windsurf