shieldx/CHANGELOG.md
Rene Fichtmueller ca02998a28 feat: ShieldX v0.5.0 — full defense evolution + pentest hardening
4-phase defense evolution (Bio-Immune, Adversarial, Ensemble, ATLAS)
with ~200 new detection rules across 20 languages.

TPR 32.9% → 70.8%, FPR 12.2% → 0.0%

New modules: DefenseEnsemble, AtlasTechniqueMapper, EvolutionEngine,
ImmuneMemory, FeverResponse, MELONGuard, AdversarialTrainer,
DecompositionDetector, IndirectInjectionDetector, OutputPayloadGuard,
ToolCallSafetyGuard, AuthContextGuard, ResourceExhaustionDetector,
TokenizerDeobfuscation, Binary/Hex decoder, OverDefenseCalibrator
2026-04-07 00:27:12 +02:00

144 lines
7.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changelog
All notable changes to `@shieldx/core` are documented here.
---
## [0.5.0] — 2026-04-07
### Added — Full Defense Evolution (Phases 0b3) + Pentest Hardening
Massive security hardening release: TPR 32.9% → 70.8%, FPR 12.2% → 0.0%.
#### Phase 0b: Infrastructure Defense
- **IndirectInjectionDetector** — 5 categories, 24 regex patterns for RAG/tool/email injection
- **ResourceExhaustionDetector** — Token bomb, context stuffing, recursive loops, batch amplification
- **OutputPayloadGuard** — 37 patterns (SQL injection, XSS, SSRF, shell, path traversal) in LLM output
- **ToolCallSafetyGuard** — Context-aware tool validation (shell/db/http/file categories)
- **AuthContextGuard** — Role escalation + permission bypass (input/output scanning)
- **EmojiSmugglingDetector** — Regional indicators, keycap sequences, skin tone data carriers
- **UpsideDownTextDetector** — 26+ upside-down Unicode chars normalization
#### Phase 1: Bio-Immune Defense
- **EvolutionEngine** — 30 built-in probes, 6-step closed-loop (probe→gap→rule→validate→deploy→rollback)
- **ImmuneMemory** — Clonal selection with pgvector embeddings, 10K memory cap, 7-day decay
- **FeverResponse** — 30min elevated alertness after high-severity detection
- **OverDefenseCalibrator** — Benign corpus validation, per-scanner FPR, suppression candidates
#### Phase 2: Adversarial Self-Training
- **MELONGuard** (ICML 2025) — Injection-driven tool call detection without user context
- **AdversarialTrainer** (IEEE S&P 2025) — Minimax attacker/defender loops
- **DecompositionDetector** — 4 multi-turn techniques (boiling frog, topic drift, roleplay chain, fragment assembly)
#### Phase 3: Defense Ensemble + ATLAS Mapping
- **DefenseEnsemble** — 3-voter weighted majority (Rule 0.35, Semantic 0.30, Behavioral 0.35)
- **AtlasTechniqueMapper** — 90 MITRE ATLAS techniques across 8 tactics mapped to all scanners
- Results include `ensemble` and `atlasMapping` fields on every ShieldXResult
#### Rule Engine Expansion (~200 new rules)
- **base.rules.ts**: io-011io-131 — temporal framing, negation override, fake errors, policy spoofing, test env claims, sudo, conversation reset, semantic redefinition
- **jailbreak.rules.ts**: rs-011rs-068 — grandmother trick, 15+ persona names, game framing, fiction wrapping, dual response, villain persona, thought experiments
- **persistence.rules.ts**: pp-011pp-030 — temporal persistence, config injection, signal words, anti-detection, data accumulation
- **mcp.rules.ts**: mcp-011mcp-036 — AI directives in tool args, hidden JSON fields, BCC injection, shadow webhooks, auto-sudo
- **multilingual.rules.ts**: ml-001aml-020 — 20 languages (DE, FR, ES, RU, JA, KO, AR, PT, TR, TH, HI, IT, NL, PL, VI + homoglyph, polyglot, translation wrapping)
- **extraction.rules.ts**: pe-009pe-013 — credential extraction, env var dumps, sensitive file access
- **delimiter.rules.ts**: da-008da-009 — LLaMA `<<SYS>>` tokens, END SYSTEM PROMPT markers
#### Preprocessing Improvements
- **TokenizerNormalizer**: Deobfuscation for split-word attacks (I.g.n.o.r.e, Ig-no-re, igno re)
- **CipherDecoder**: Binary decoder, hex decoder, "decode and execute" wrapper detection
- **CipherDecoder FP fix**: flip_attack_word and leet_speak now only flag NEW keywords after transformation
#### Benchmark
- `tests/benchmark/detection-rate.ts` — Full corpus benchmark (12 attack files, 455 payloads, 41 benign)
### Benchmark Results (v0.5.0)
| Metric | v0.4.0 | v0.5.0 |
|--------|--------|--------|
| TPR | 32.9% | **70.8%** |
| FPR | 12.2% | **0.0%** |
| Scanners | ~15 | **30+** |
| Rules | ~80 | **~280** |
| ATLAS techniques | 0 | **90** |
| Languages | 5 | **20** |
---
## [0.4.0] — 2026-04-04
### Added — Research-driven security hardening (sarendis56/Jailbreak_Detection_RCS)
Three detection gaps identified from peer-reviewed LLM security research
(arXiv:2512.12069, arXiv:2407.07403, Awesome-Jailbreak-on-LLMs survey) closed:
#### L0: CipherDecoder — `src/preprocessing/CipherDecoder.ts`
New preprocessing module detecting 7 character-level cipher obfuscation attacks:
- **FlipAttack** — character and word-level text reversal (checks reversed form against jailbreak keyword list)
- **ROT13** — detected via English bigram frequency improvement >20% after decode
- **Caesar cipher** — all 25 shifts tried; best candidate returned if bigram score improves or keyword match found
- **Morse code** — dot/dash/space ratio validation + full 36-symbol decode table
- **Leet speak** — 15-character substitution map normalization (3→e, 4→a, 1→i, 0→o, 5→s ...)
- **Pig Latin** — word-ending density check (>40% of words ending in `ay`/`way`)
- **ASCII art** — whitespace-to-char ratio >40% + consistent multi-line width flagged
- Suspicion scoring: cipher with harmful keyword match → 0.7; cipher only → 0.3; +0.1 per additional cipher
#### L2: SemanticContrastiveScanner — `src/semantic/SemanticContrastiveScanner.ts`
New semantic layer implementing the RCS (Representational Contrastive Scoring) approach:
- Queries `EmbeddingStore` for top-5 nearest neighbours per input embedding
- Separates neighbours into harmful (`threatLevel > 0.5`) and benign (`threatLevel ≤ 0.2`) buckets
- Computes `contrastiveScore = harmfulSimilarity benignSimilarity`
- Thresholds: score >0.3 → `harmful` (suspicion 0.8); >0.1 → `suspicious` (0.4); else `clean`
- `seedHarmfulExamples()` pre-populates 20 canonical jailbreak + 5 benign anchors via BoW fallback
- `bagOfWordsEmbedding()` — deterministic FNV-1a hashed, L2-normalised 128-dim embedding for offline use
- Gracefully returns `clean` when EmbeddingStore is empty (no pgvector required for basic use)
- `toScanResult()` converts to standard pipeline `ScanResult` for future L2 wiring
#### L6: Multi-turn escalation patterns — `src/behavioral/ConversationTracker.ts`
Three advanced multi-turn attack patterns added to the existing suspicion accumulation pipeline:
- **Crescendo** — 3+ consecutive turns with increasing harmfulness delta >0.05 each → +0.35 suspicion
- **Foot-in-the-Door (FITD)** — 2+ benign turns (harm <0.1) followed by harmfulness jump >0.4 → +0.40
- **Jigsaw Puzzle** — same sensitive topic category (system_prompt, credentials, api_keys, internal_instructions, model_training, bypass_methods) appearing in 3+ turns → +0.45
- New `EscalationPattern` union type: `'crescendo' | 'foot_in_door' | 'jigsaw_puzzle'`
- New optional state fields: `crescendoScore`, `initialBenignTurns`, `jigsawTopics`
- Patterns wired into both `addTurn()` and `scan()` — all additive, no existing thresholds changed
### Added — Research reference library
- `research/sarendis56-jailbreak-reference.md` — Comprehensive mapping of 100+ jailbreak papers to ShieldX layers
- Cloned: `Jailbreak_Detection_RCS`, `Awesome-Jailbreak-on-LLMs`, `Awesome-LVLM-Attack`, `Awesome-LVLM-Safety`
### Tests
- 292/294 passing (2 pre-existing `ATLASMapper` failures unrelated to this release)
- All 3 new modules: no new test failures introduced
---
## [0.3.0] — 2026-04-03
- UnicodeScanner (L5) — steganographic Unicode detection
- DNS Covert Channel rules (10th rule category)
- MITRE ATLAS v5.4 technique mappings
- MCP rules 007010 — Claude Code source map leak countermeasures
- Daily arXiv + HackerNews security monitor script
---
## [0.2.0] — earlier
- 8-layer detection pipeline
- pgvector EmbeddingStore
- MITRE ATLAS, OWASP, EU AI Act compliance mappers
- Next.js, Anthropic, Ollama, n8n integrations
- Self-healing orchestrator (7 phases)
- RedTeamEngine + ActiveLearner
---
## [0.1.0] — initial release
- Core ShieldX pipeline
- RuleEngine with 9 rule categories
- EntropyScanner (Shannon entropy, DNS covert channel detection)
- UnicodeNormalizer + TokenizerNormalizer
- ConversationTracker (multi-turn behavioral monitoring)
- KillChainMapper (MITRE ATT&CK phases)