2 Commits

Author SHA1 Message Date
Rene Fichtmueller
6f5dd81d7a sec(gateway): +15 languages + non-Latin script detector (62 patterns total)
Closes the multilingual bypass gap. Previously covered EN/DE/FR/ES/IT/RU/ZH/JA.
Now also: Bangla, Hindi, Arabic, Hebrew, Persian, Turkish, Vietnamese, Thai,
Korean, Polish, Dutch, Indonesian, Tagalog, Swahili.

Plus a universal non-Latin-script soft-flag pattern (severity=medium) that
catches ≥20 chars of Arabic/Bengali/Devanagari/Hebrew/Thai/Hangul/Han/
Hiragana/Katakana/Cyrillic/Tamil/Telugu/Gujarati/Gurmukhi/Myanmar/Khmer/
Lao/Tibetan/Georgian/Armenian/Sinhala — surfaces in scan result without
auto-blocking, so legitimate non-Latin prompts pass while the operator
can route them to llm_judge for deep inspection.

Pattern-engineering notes:
  - Devanagari / Bengali / Hebrew need optional matra/suffix tolerance
  - Turkish needs \p{L} instead of \w because ı/ş/ç fall outside ASCII \w
  - Persian (SOV) needs both VSO and SOV order alternation
  - Hebrew needs מ/ב/כ/ל preposition prefix tolerance
  - Tagalog needs optional ang/sa article between verb and noun

Smoke-tested 14/14 languages → all HTTP 422 blocked.
Negative-tested 3 benign non-Latin prompts (jp-weather, ar-greeting,
th-thanks) → all HTTP 200 pass. Zero false positives.

Total active patterns: 62 across 6 categories.
2026-05-16 23:02:01 +02:00
Rene Fichtmueller
ac887ab052 sec(gateway): commit injection/output defense modules to git (44 patterns)
These modules have been running in production on Erik but were never
committed to the repo. Recovering + extending after CVE-2026-45321 review.

injection-defense.ts: 44 patterns across 6 categories
  jailbreak (15)    — ignore/disregard/override + DAN/AIM/EvilBOT/Omega
                      personas + fiction-framing + goal-hijack + 5 langs
                      (en, de, fr, es, it, ru, zh, ja) + DAN-style
                      "do anything now"
  role_bypass (3)   — DAN persona, new-system-prompt, pretend-rolemix
  system_prompt_leak (5) — direct reveal, repeat verbatim, starts-with,
                      debug/admin mode invocation, translate-system
  indirect (8)      — fake role tags, embedded user msg, instruction-in-data,
                      ChatML/INST control-token smuggle, RAG-poisoning,
                      Unicode tag chars (E0000-E007F), zero-width chars,
                      fake-section-header override
  exfiltration (6)  — markdown-image, send-data-to, base64-instruction,
                      DNS-exfil, webhook canaries, templated image URLs
  policy (2)        — no-refusal, illegal-content-demand

Activated via INJECTION_DEFENSE_MODE=block in ecosystem.config.js (also
fixed: was overridden by stale pm2 set module_conf.json entry).

Smoke tests:
  - "Ignore all previous instructions"  → 422 blocked
  - "You are now AIM"                   → 422 blocked
  - "Ignorez les instructions"          → 422 blocked
  - "What is 2+2?"                      → 200 passes

output-defense.ts: existing stream-time output filter, kept as-is.
2026-05-16 22:55:08 +02:00