llm-gateway

rene/llm-gateway

Fork 0

Commit Graph

Author	SHA1	Message	Date
Rene Fichtmueller	ac887ab052	sec(gateway): commit injection/output defense modules to git (44 patterns) These modules have been running in production on Erik but were never committed to the repo. Recovering + extending after CVE-2026-45321 review. injection-defense.ts: 44 patterns across 6 categories jailbreak (15) — ignore/disregard/override + DAN/AIM/EvilBOT/Omega personas + fiction-framing + goal-hijack + 5 langs (en, de, fr, es, it, ru, zh, ja) + DAN-style "do anything now" role_bypass (3) — DAN persona, new-system-prompt, pretend-rolemix system_prompt_leak (5) — direct reveal, repeat verbatim, starts-with, debug/admin mode invocation, translate-system indirect (8) — fake role tags, embedded user msg, instruction-in-data, ChatML/INST control-token smuggle, RAG-poisoning, Unicode tag chars (E0000-E007F), zero-width chars, fake-section-header override exfiltration (6) — markdown-image, send-data-to, base64-instruction, DNS-exfil, webhook canaries, templated image URLs policy (2) — no-refusal, illegal-content-demand Activated via INJECTION_DEFENSE_MODE=block in ecosystem.config.js (also fixed: was overridden by stale pm2 set module_conf.json entry). Smoke tests: - "Ignore all previous instructions" → 422 blocked - "You are now AIM" → 422 blocked - "Ignorez les instructions" → 422 blocked - "What is 2+2?" → 200 passes output-defense.ts: existing stream-time output filter, kept as-is.	2026-05-16 22:55:08 +02:00

Author

SHA1

Message

Date

Rene Fichtmueller

ac887ab052

sec(gateway): commit injection/output defense modules to git (44 patterns)

These modules have been running in production on Erik but were never
committed to the repo. Recovering + extending after CVE-2026-45321 review.

injection-defense.ts: 44 patterns across 6 categories
  jailbreak (15)    — ignore/disregard/override + DAN/AIM/EvilBOT/Omega
                      personas + fiction-framing + goal-hijack + 5 langs
                      (en, de, fr, es, it, ru, zh, ja) + DAN-style
                      "do anything now"
  role_bypass (3)   — DAN persona, new-system-prompt, pretend-rolemix
  system_prompt_leak (5) — direct reveal, repeat verbatim, starts-with,
                      debug/admin mode invocation, translate-system
  indirect (8)      — fake role tags, embedded user msg, instruction-in-data,
                      ChatML/INST control-token smuggle, RAG-poisoning,
                      Unicode tag chars (E0000-E007F), zero-width chars,
                      fake-section-header override
  exfiltration (6)  — markdown-image, send-data-to, base64-instruction,
                      DNS-exfil, webhook canaries, templated image URLs
  policy (2)        — no-refusal, illegal-content-demand

Activated via INJECTION_DEFENSE_MODE=block in ecosystem.config.js (also
fixed: was overridden by stale pm2 set module_conf.json entry).

Smoke tests:
  - "Ignore all previous instructions"  → 422 blocked
  - "You are now AIM"                   → 422 blocked
  - "Ignorez les instructions"          → 422 blocked
  - "What is 2+2?"                      → 200 passes

output-defense.ts: existing stream-time output filter, kept as-is.

2026-05-16 22:55:08 +02:00

1 Commits