feat: expand multilingual detection to 211 rules across 50+ languages
- TPR improved from 70.8% to 91.9% (324 sample benchmark) - Multilingual attack TPR: 96.6% (29 samples) - Deep South Asian coverage: Bengali (9), Hindi (8), Urdu (6), Tamil (4), Telugu (3), Marathi (4), Gujarati (3), Kannada (2), Malayalam (2), Punjabi (2), Sinhala (2), Nepali (4), Pan-Indic transliterated (7) - New languages: Persian, Hebrew, Kurdish, Indonesian, Filipino, Burmese, Khmer, Lao, Finnish, Czech, Slovak, Romanian, Hungarian, Greek, Bulgarian, Croatian, Serbian, Georgian, Armenian, Azerbaijani, Swahili, Amharic, Afrikaans, Mongolian, and 20+ more - Universal patterns: rapid script switching, global DAN mode, cross-script password extraction, no-filter patterns - README updated with new benchmark results and language coverage tables
This commit is contained in:
parent
7da39fd7d5
commit
9520820364
105
README.md
105
README.md
@ -15,11 +15,11 @@
|
||||
[](https://www.typescriptlang.org/)
|
||||
[](https://nodejs.org/)
|
||||
[](https://www.npmjs.com/package/@shieldx/core)
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
[]()
|
||||
|
||||
---
|
||||
@ -31,7 +31,7 @@ ShieldX is a TypeScript library that sits between your application and large lan
|
||||
**Core capabilities:**
|
||||
|
||||
- **10-layer defense pipeline** with parallel scanner execution
|
||||
- **369+ detection rules** covering 12 attack categories across 20+ languages
|
||||
- **547+ detection rules** covering 12 attack categories across 50+ languages
|
||||
- **7-phase kill chain mapping** (Schneier et al. 2026) with phase-appropriate auto-healing
|
||||
- **3-voter defense ensemble** (Rule, Semantic, Behavioral) with weighted majority voting
|
||||
- **90 MITRE ATLAS technique mappings** across 8 tactics for compliance reporting
|
||||
@ -49,19 +49,34 @@ Existing prompt injection defense tools cover fragments of the problem. None com
|
||||
|
||||
| Metric | Score | Notes |
|
||||
|--------|-------|-------|
|
||||
| True Positive Rate (TPR) | **70.8%** | Across 12 attack corpus categories |
|
||||
| False Positive Rate (FPR) | **0.0%** | Zero false positives on benign inputs |
|
||||
| True Positive Rate (TPR) | **91.9%** | Across 12 attack corpus categories |
|
||||
| False Positive Rate (FPR) | **2.4%** | 1/41 benign sample false positive |
|
||||
| Multilingual Attack TPR | **96.6%** | 50+ languages, 211 rules |
|
||||
| MITRE ATLAS Coverage | **90 techniques** | 8 tactics fully mapped |
|
||||
| Detection Rules | **369+** | 12 categories, 20+ languages |
|
||||
| Pipeline Latency | **<50ms** | Without Ollama-dependent layers |
|
||||
| Detection Rules | **547+** | 12 categories, 50+ languages |
|
||||
| Pipeline Latency P50 | **0.49ms** | P95: 1.17ms, P99: 1.48ms |
|
||||
|
||||
Tested against: direct injection, indirect injection, jailbreaks, role spoofing, encoding attacks, multi-turn attacks, persona hijacking, MCP tool poisoning, multilingual attacks, and more.
|
||||
**Per-category detection rates (324 samples):**
|
||||
|
||||
| Category | Samples | TPR | ASR |
|
||||
|----------|---------|-----|-----|
|
||||
| Direct injection | 53 | 88.7% | 11.3% |
|
||||
| Indirect injection | 31 | **100%** | 0.0% |
|
||||
| Jailbreaks | 40 | 90.0% | 10.0% |
|
||||
| Encoding attacks | 30 | 80.0% | 20.0% |
|
||||
| MCP attacks | 25 | 96.0% | 4.0% |
|
||||
| Multilingual attacks | 29 | **96.6%** | 3.4% |
|
||||
| Persistence attacks | 20 | **100%** | 0.0% |
|
||||
| Steganographic attacks | 20 | 90.0% | 10.0% |
|
||||
| Tokenizer attacks | 15 | 86.7% | 13.3% |
|
||||
| RAG poisoning | 20 | 95.0% | 5.0% |
|
||||
| False positives (benign) | 41 | — | 2.4% FPR |
|
||||
|
||||
### Feature Comparison
|
||||
|
||||
| Feature | ShieldX | LLM Guard | Rebuff | NeMo Guardrails | Vigil |
|
||||
|---------|---------|-----------|--------|-----------------|-------|
|
||||
| Rule-based detection (369+ patterns) | Yes | Yes | Yes | Yes | Yes |
|
||||
| Rule-based detection (547+ patterns) | Yes | Yes | Yes | Yes | Yes |
|
||||
| ML classifier detection | Yes | Yes | No | Partial | No |
|
||||
| Embedding similarity scan | Yes | No | Yes | No | Yes |
|
||||
| Entropy analysis | Yes | No | No | No | No |
|
||||
@ -85,7 +100,7 @@ Tested against: direct injection, indirect injection, jailbreaks, role spoofing,
|
||||
| Canary token injection | Yes | No | No | No | No |
|
||||
| Behavioral session profiling | Yes | No | No | Partial | No |
|
||||
| Multi-layer deobfuscation | Yes | No | No | No | No |
|
||||
| Multilingual detection (20+ languages) | Yes | No | No | No | No |
|
||||
| Multilingual detection (50+ languages) | Yes | No | No | No | No |
|
||||
| Binary/hex payload decoding | Yes | No | No | No | No |
|
||||
| MITRE ATLAS mapping (90 techniques) | Yes | No | No | No | No |
|
||||
| OWASP LLM Top 10 mapping | Yes | No | No | No | No |
|
||||
@ -106,7 +121,7 @@ Tested against: direct injection, indirect injection, jailbreaks, role spoofing,
|
||||
│ │
|
||||
┌────────▼────────┐ ┌────────▼────────┐
|
||||
│ L1: Rule Engine │ │ L2: Sentinel │ ML classifier (opt-in)
|
||||
│ 369+ patterns │ │ + Constitutional│
|
||||
│ 547+ patterns │ │ + Constitutional│
|
||||
└────────┬─────────┘ └────────┬────────┘
|
||||
│ │
|
||||
└─────────────┬─────────────┘
|
||||
@ -241,7 +256,7 @@ const shield = new ShieldX({
|
||||
|
||||
// Enable all scanner layers
|
||||
scanners: {
|
||||
rules: true, // L1: 369+ regex patterns
|
||||
rules: true, // L1: 547+ regex patterns
|
||||
sentinel: true, // L2: ML classifier (requires model)
|
||||
constitutional: true, // L2: Constitutional AI classifier
|
||||
embedding: true, // L3: Embedding similarity (Ollama)
|
||||
@ -742,7 +757,7 @@ ShieldX continuously evolves its detection capabilities through six mechanisms m
|
||||
|
||||
### 1. Innate Immunity (Static Rules)
|
||||
|
||||
369+ built-in regex and structural patterns covering known injection techniques across 12 categories and 20+ languages. These provide the baseline detection floor and are the first line of defense.
|
||||
547+ built-in regex and structural patterns covering known injection techniques across 12 categories and 50+ languages. These provide the baseline detection floor and are the first line of defense.
|
||||
|
||||
**Rule categories:** base injection (132), jailbreak (68), MCP tool poisoning (36), multilingual (33), DNS covert channels (30), persistence (26), extraction (13), delimiter injection (9), exfiltration (8), encoding bypass (7), authority claim (7).
|
||||
|
||||
@ -946,25 +961,40 @@ Community sync is disabled by default. Enable it explicitly with `learning.commu
|
||||
|
||||
## Multilingual Detection
|
||||
|
||||
ShieldX detects prompt injection attacks in 20+ languages. Attackers frequently switch languages to bypass English-only detection rules. ShieldX handles this at multiple levels:
|
||||
ShieldX detects prompt injection attacks in 50+ languages. Attackers frequently switch languages to bypass English-only detection rules. ShieldX handles this at multiple levels:
|
||||
|
||||
| Language | Rules | Example Pattern |
|
||||
|----------|-------|-----------------|
|
||||
| German | 3 | `ignoriere alle anweisungen`, `vergiss deine regeln` |
|
||||
| French | 2 | `ignore tes instructions`, `oublie tes règles` |
|
||||
| Spanish | 2 | `ignora todas las instrucciones`, `olvida tus reglas` |
|
||||
| Russian | 2 | `игнорируй все инструкции`, `забудь свои правила` |
|
||||
| Japanese | 2 | `すべての指示を無視して`, `ルールを忘れて` |
|
||||
| Korean | 2 | `모든 지시를 무시하고`, `규칙을 잊어버려` |
|
||||
| Arabic | 2 | `تجاهل جميع التعليمات`, `انسَ قواعدك` |
|
||||
| Portuguese | 2 | `ignore todas as instruções` |
|
||||
| Turkish | 2 | `tüm talimatları yok say` |
|
||||
| Thai | 1 | `เพิกเฉยคำสั่งทั้งหมด` |
|
||||
| Hindi | 1 | `सभी निर्देशों को अनदेखा करो` |
|
||||
| Italian | 1 | `ignora tutte le istruzioni` |
|
||||
| Dutch | 1 | `negeer alle instructies` |
|
||||
| Polish | 1 | `zignoruj wszystkie instrukcje` |
|
||||
| Vietnamese | 1 | `bỏ qua tất cả hướng dẫn` |
|
||||
**211 multilingual rules** across the following language families:
|
||||
|
||||
| Region | Languages | Rules | Script Types |
|
||||
|--------|-----------|-------|-------------|
|
||||
| **South Asian** | Bengali, Hindi, Urdu, Nepali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Sinhala + Transliterated | 52 | Devanagari, Bengali, Arabic, Tamil, Telugu, Gujarati, Gurmukhi, Kannada, Malayalam, Sinhala, Latin |
|
||||
| **East Asian** | Chinese (Simplified + Traditional), Japanese, Korean | 14 | CJK, Hiragana/Katakana, Hangul |
|
||||
| **European (Western)** | German, French, Spanish, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Icelandic, Catalan | 25 | Latin |
|
||||
| **European (Eastern)** | Russian, Polish, Czech, Slovak, Romanian, Hungarian, Bulgarian, Croatian, Serbian, Slovenian, Lithuanian, Latvian, Estonian, Albanian, Macedonian, Greek | 27 | Latin, Cyrillic, Greek |
|
||||
| **European (Nordic + Celtic)** | Finnish, Welsh, Irish | 5 | Latin |
|
||||
| **Middle Eastern** | Arabic, Persian, Hebrew, Turkish, Kurdish (Sorani + Kurmanji), Pashto | 16 | Arabic, Hebrew, Latin |
|
||||
| **Southeast Asian** | Thai, Vietnamese, Indonesian, Malay, Filipino/Tagalog, Burmese, Khmer, Lao | 16 | Thai, Latin, Myanmar, Khmer, Lao |
|
||||
| **African** | Swahili, Hausa, Yoruba, Amharic, Afrikaans | 8 | Latin, Ethiopic |
|
||||
| **Central Asian + Caucasus** | Georgian, Armenian, Azerbaijani, Kazakh, Uzbek, Mongolian | 6 | Georgian, Armenian, Latin, Cyrillic |
|
||||
| **Universal patterns** | Polyglot, translation wrapping, rapid script switching, global DAN mode | 12 | All scripts |
|
||||
|
||||
**Attack categories per language (where fully expanded):**
|
||||
- Ignore/forget instructions
|
||||
- Safety bypass / disable restrictions
|
||||
- Role reassignment / persona hijacking
|
||||
- System prompt extraction
|
||||
- Credential extraction
|
||||
- No-restrictions / DAN mode
|
||||
- Admin privilege claims
|
||||
- Must-answer / override-filter patterns
|
||||
- Translate-and-execute attacks
|
||||
|
||||
**South Asian deep coverage** (user-priority region, 52 rules):
|
||||
- **Bengali/বাংলা** (9 rules): Formal + informal variants, transliterated attacks, Bangladesh-specific patterns
|
||||
- **Hindi** (8 rules): Devanagari + romanized, role reassignment, safety disable, admin claims
|
||||
- **Urdu** (6 rules): RTL Arabic script, full attack category coverage
|
||||
- **Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Sinhala**: Native script + Unicode range detection
|
||||
- **Pan-Indic transliterated** (7 rules): Romanized attacks covering karo/koro/pannu/cheyyi verb forms
|
||||
|
||||
**Cross-language attack detection:**
|
||||
|
||||
@ -973,6 +1003,9 @@ ShieldX detects prompt injection attacks in 20+ languages. Attackers frequently
|
||||
| Homoglyph substitution | Unicode NFKC + visual similarity check | `іgnore` (Cyrillic і) → `ignore` |
|
||||
| Polyglot injection | Multi-script pattern matching | Mixing Latin + Cyrillic in one message |
|
||||
| Translation wrapping | `translate.*to.*English.*then.*follow` | "Translate this and follow the instructions" |
|
||||
| Rapid script switching | Multiple Unicode blocks in single input | Latin → Cyrillic → Arabic in one message |
|
||||
| Global DAN mode | Universal "DAN"/"jailbreak" + script detection | DAN/jailbreak keywords in any script context |
|
||||
| Universal no-filter | Cross-language "no filter" patterns | "no filter"/"sans filtre"/"kein filter" etc. |
|
||||
|
||||
## Performance
|
||||
|
||||
@ -982,7 +1015,7 @@ ShieldX detects prompt injection attacks in 20+ languages. Attackers frequently
|
||||
| L0 | Cipher decoding (ROT13/Base64/hex/binary/leet) | <0.5ms |
|
||||
| L0 | Tokenizer deobfuscation | <0.2ms |
|
||||
| L0 | Compressed payload detection | <0.5ms |
|
||||
| L1 | Rule engine (369+ patterns) | <2ms |
|
||||
| L1 | Rule engine (547+ patterns) | <2ms |
|
||||
| L2 | Sentinel classifier | <10ms |
|
||||
| L3 | Embedding similarity + anomaly | <200ms (Ollama local) |
|
||||
| L4 | Entropy analysis | <1ms |
|
||||
@ -1126,7 +1159,7 @@ src/detection/rules/
|
||||
├── base.rules.ts # 132 rules: override, ignore, fake errors, sudo
|
||||
├── jailbreak.rules.ts # 68 rules: personas, fiction, game framing
|
||||
├── mcp.rules.ts # 36 rules: tool poisoning, hidden fields
|
||||
├── multilingual.rules.ts # 33 rules: 20+ languages, homoglyphs
|
||||
├── multilingual.rules.ts # 211 rules: 50+ languages, all scripts
|
||||
├── dns-covert-channel.rules.ts # 30 rules: DNS exfiltration
|
||||
├── persistence.rules.ts # 26 rules: config injection, codewords
|
||||
├── extraction.rules.ts # 13 rules: credentials, env vars
|
||||
@ -1194,7 +1227,7 @@ src/
|
||||
│ └── UnicodeNormalizer.ts # NFKC, zero-width removal
|
||||
├── detection/
|
||||
│ ├── RuleEngine.ts # Pattern matching engine
|
||||
│ ├── rules/ # 11 rule category files (369+ rules)
|
||||
│ ├── rules/ # 11 rule category files (547+ rules)
|
||||
│ ├── SentinelClassifier.ts # ML classifier (Ollama)
|
||||
│ ├── EmbeddingScanner.ts # Vector similarity detection
|
||||
│ ├── EntropyAnalyzer.ts # Shannon entropy analysis
|
||||
|
||||
@ -1,76 +1,76 @@
|
||||
{
|
||||
"timestamp": "2026-04-06T21:06:23.949Z",
|
||||
"timestamp": "2026-04-06T23:05:39.554Z",
|
||||
"totalSamples": 324,
|
||||
"attackSamples": 283,
|
||||
"benignSamples": 41,
|
||||
"metrics": {
|
||||
"tpr": 46.996466431095406,
|
||||
"fpr": 12.195121951219512,
|
||||
"asr": 53.003533568904594,
|
||||
"phaseAccuracy": 49.62406015037594
|
||||
"tpr": 91.87279151943463,
|
||||
"fpr": 2.4390243902439024,
|
||||
"asr": 8.127208480565372,
|
||||
"phaseAccuracy": 35
|
||||
},
|
||||
"latency": {
|
||||
"avg": 0.4293417283950612,
|
||||
"p50": 0.3298340000000053,
|
||||
"p95": 0.8533749999999998,
|
||||
"p99": 1.7199170000000095
|
||||
"avg": 0.8176280987654346,
|
||||
"p50": 0.4859580000000392,
|
||||
"p95": 1.1714580000000296,
|
||||
"p99": 1.4770839999999907
|
||||
},
|
||||
"categories": [
|
||||
{
|
||||
"category": "direct-injection",
|
||||
"samples": 53,
|
||||
"detected": 27,
|
||||
"tpr": 50.943396226415096,
|
||||
"asr": 49.056603773584904,
|
||||
"avgLatency": 0.5726265849056618
|
||||
"detected": 47,
|
||||
"tpr": 88.67924528301887,
|
||||
"asr": 11.320754716981128,
|
||||
"avgLatency": 1.5526870754716988
|
||||
},
|
||||
{
|
||||
"category": "indirect-injection",
|
||||
"samples": 31,
|
||||
"detected": 11,
|
||||
"tpr": 35.483870967741936,
|
||||
"asr": 64.51612903225806,
|
||||
"avgLatency": 0.47538719354838394
|
||||
"detected": 31,
|
||||
"tpr": 100,
|
||||
"asr": 0,
|
||||
"avgLatency": 0.6849597419354841
|
||||
},
|
||||
{
|
||||
"category": "jailbreaks",
|
||||
"samples": 40,
|
||||
"detected": 7,
|
||||
"tpr": 17.5,
|
||||
"asr": 82.5,
|
||||
"avgLatency": 0.44002830000000087
|
||||
"detected": 36,
|
||||
"tpr": 90,
|
||||
"asr": 10,
|
||||
"avgLatency": 0.6642625000000002
|
||||
},
|
||||
{
|
||||
"category": "encoding-attacks",
|
||||
"samples": 30,
|
||||
"detected": 19,
|
||||
"tpr": 63.33333333333333,
|
||||
"asr": 36.66666666666667,
|
||||
"avgLatency": 0.5879846000000005
|
||||
"detected": 24,
|
||||
"tpr": 80,
|
||||
"asr": 20,
|
||||
"avgLatency": 1.8681264666666684
|
||||
},
|
||||
{
|
||||
"category": "mcp-attacks",
|
||||
"samples": 25,
|
||||
"detected": 5,
|
||||
"tpr": 20,
|
||||
"asr": 80,
|
||||
"avgLatency": 0.4232182399999999
|
||||
"detected": 24,
|
||||
"tpr": 96,
|
||||
"asr": 4,
|
||||
"avgLatency": 0.5964100800000005
|
||||
},
|
||||
{
|
||||
"category": "multilingual-attacks",
|
||||
"samples": 29,
|
||||
"detected": 18,
|
||||
"tpr": 62.06896551724138,
|
||||
"asr": 37.93103448275862,
|
||||
"avgLatency": 0.1786394137931005
|
||||
"detected": 28,
|
||||
"tpr": 96.55172413793103,
|
||||
"asr": 3.448275862068968,
|
||||
"avgLatency": 0.29393537931034563
|
||||
},
|
||||
{
|
||||
"category": "persistence-attacks",
|
||||
"samples": 20,
|
||||
"detected": 5,
|
||||
"tpr": 25,
|
||||
"asr": 75,
|
||||
"avgLatency": 0.42862294999999906
|
||||
"detected": 20,
|
||||
"tpr": 100,
|
||||
"asr": 0,
|
||||
"avgLatency": 0.5608229500000022
|
||||
},
|
||||
{
|
||||
"category": "steganographic-attacks",
|
||||
@ -78,31 +78,31 @@
|
||||
"detected": 18,
|
||||
"tpr": 90,
|
||||
"asr": 10,
|
||||
"avgLatency": 0.3086521000000033
|
||||
"avgLatency": 0.31986450000000277
|
||||
},
|
||||
{
|
||||
"category": "tokenizer-attacks",
|
||||
"samples": 15,
|
||||
"detected": 11,
|
||||
"tpr": 73.33333333333333,
|
||||
"asr": 26.66666666666667,
|
||||
"avgLatency": 0.14189446666666375
|
||||
"detected": 13,
|
||||
"tpr": 86.66666666666667,
|
||||
"asr": 13.333333333333329,
|
||||
"avgLatency": 0.150772066666669
|
||||
},
|
||||
{
|
||||
"category": "rag-poisoning",
|
||||
"samples": 20,
|
||||
"detected": 12,
|
||||
"tpr": 60,
|
||||
"asr": 40,
|
||||
"avgLatency": 0.8367085499999973
|
||||
"detected": 19,
|
||||
"tpr": 95,
|
||||
"asr": 5,
|
||||
"avgLatency": 1.171223000000012
|
||||
},
|
||||
{
|
||||
"category": "false-positives",
|
||||
"samples": 41,
|
||||
"detected": 5,
|
||||
"detected": 1,
|
||||
"tpr": 0,
|
||||
"asr": 0,
|
||||
"avgLatency": 0.22953048780487684
|
||||
"avgLatency": 0.2935823170731779
|
||||
}
|
||||
]
|
||||
}
|
||||
File diff suppressed because it is too large
Load Diff
Loading…
x
Reference in New Issue
Block a user