shieldx/docs/architecture.md
Rene Fichtmueller 1c4c034483 feat: ShieldX v0.3.0 — UnicodeScanner (L5), DNS Covert Channel rules, ATLAS v5.4 mappings
- Layer 4 EntropyScanner: Shannon entropy, Base32/Base64 detection, CVE-2025-55284
  ping/nslookup exfil, EchoLeak markdown pattern, DNS tunneling (iodine/dnscat)
- Layer 5 UnicodeScanner: ASCII Smuggling (U+E0000 Tags Block), Variant Selectors,
  Zero-Width steganography, CamoLeak image-ordering (CVE-2025-53773), homoglyphs,
  BiDi override, high-entropy URL params
- 30 DNS covert channel rules (dns-001 to dns-030)
- ATLASMapper: 29 techniques (ATLAS v5.4.0 Feb 2026), added AML.T0062 (Agent Tool
  Invocation), AML.TA0015 (C2 tactic), memory poisoning, multi-agent trust,
  CamoLeak, Unicode steganography mappings
- Rule count: 72 → 102
- Build: tsup 316ms, zero TypeScript errors
2026-03-31 16:32:16 +02:00

360 lines
15 KiB
Markdown

# ShieldX Architecture
## Overview
ShieldX is a 10-layer defense pipeline orchestrated by a single `ShieldX` class. Each layer is independently toggleable, runs in isolation, and never blocks the pipeline if it fails. The orchestrator uses `Promise.allSettled` for parallel execution and graceful degradation.
## 10-Layer Pipeline
### L0: Preprocessing
**Modules:** `UnicodeNormalizer`, `TokenizerNormalizer`, `CompressedPayloadDetector`
The preprocessing layer normalizes input before any detection runs. This is the only sequential layer -- all downstream scanners operate on the normalized output.
- **Unicode Normalization**: NFKC normalization, invisible character removal, homoglyph detection, Bidi override stripping. Catches attacks that use visually identical characters to bypass pattern matching.
- **Tokenizer Normalization**: Normalizes tokenizer-specific artifacts (zero-width joiners, soft hyphens, token-boundary exploits). Prevents attacks that exploit differences between how humans read text and how tokenizers split it.
- **Compressed Payload Detection**: Detects and decodes Base64, gzip, hex-encoded, and other compressed payloads embedded in input. Decoded content is appended to the normalized input so downstream scanners can analyze it.
**Performance:** <0.5ms combined. Always enabled (zero cost, high impact).
### L1: Rule Engine
**Module:** `RuleEngine`
Pattern-matching engine with 500+ built-in regex rules organized by kill chain phase. Rules are loaded from a seeded pattern store and can be extended at runtime through the learning engine.
- Category-based rule organization (injection markers, role overrides, data exfiltration patterns)
- Per-rule kill chain phase and severity mapping
- Hot-reloadable: new rules from the learning engine take effect without restart
**Performance:** <2ms for 500+ patterns.
### L2: Sentinel Classifier
**Module:** `SentinelClassifier` (opt-in)
Machine learning binary classifier trained to distinguish benign prompts from injection attempts. Operates on token-level features extracted from the normalized input.
- Requires model download (not included in default install)
- Outputs confidence score mapped to threat level via configurable thresholds
- Runs in parallel with L1
**Performance:** <10ms.
### L3: Embedding Scanners
**Modules:** `EmbeddingStore`, `EmbeddingScanner`, `EmbeddingAnomalyDetector`
Semantic similarity analysis using vector embeddings. Compares input against a database of known attack embeddings stored in PostgreSQL with pgvector.
- **Similarity Scanner**: Cosine similarity against known attack vectors. Catches paraphrased variants of known attacks that bypass regex patterns.
- **Anomaly Detector**: Statistical outlier detection on embedding space. Identifies inputs that are structurally unusual compared to the conversation baseline.
**Performance:** <200ms (requires Ollama for embedding generation).
### L4: Entropy Analysis
**Module:** `EntropyScanner`
Information-theoretic analysis of input text. Measures Shannon entropy, character distribution, and n-gram statistics.
- High entropy can indicate encoded payloads, obfuscated injection, or adversarial token sequences
- Low entropy in unexpected contexts can indicate template-based attacks
- Adaptive thresholds based on conversation baseline
**Performance:** <1ms.
### L5: Attention Pattern Analysis
**Module:** `AttentionScanner` (opt-in)
Analyzes attention weight distribution from Ollama models to detect inputs that cause abnormal attention patterns.
- Detects attention hijacking (injection that captures disproportionate model attention)
- Identifies attention-blind spots (content designed to avoid model attention)
- Requires Ollama with attention output support
**Performance:** <200ms. Runs in parallel with L3 and L4.
### L6: Behavioral Monitoring
**Modules:** `ConversationTracker`, `IntentMonitor`, `ContextIntegrity`, `SessionProfiler`, `MemoryIntegrityGuard`, `AnomalyDetector`, `ContextDriftDetector`, `TrustTagger`
Multi-turn conversation analysis that detects attacks spanning multiple messages.
- **Conversation Tracker**: Maintains conversation state, detects turn-over-turn pattern shifts, identifies multi-step attack sequences.
- **Intent Monitor**: Tracks declared vs. actual intent. Flags when the behavioral pattern diverges from the stated task description.
- **Context Integrity**: Verifies that the context window has not been poisoned by injected content. Measures context poison score.
- **Session Profiler**: Builds a behavioral baseline per session and flags anomalous deviations.
- **Memory Integrity Guard**: Detects unauthorized modifications to conversation memory or cached instructions.
- **Trust Tagger**: Assigns trust scores per data source using Bayesian updating.
**Performance:** <5ms combined.
### L7: MCP Guard
**Modules:** `MCPInspector`, `ToolCallInterceptor`, `PrivilegeChecker`, `ToolChainGuard`, `ToolPoisonDetector`, `ResourceGovernor`, `DecisionGraphAnalyzer`, `ManifestVerifier`, `OllamaGuard`
Purpose-built protection for Model Context Protocol tool calls.
- **Privilege Checker**: Enforces least-privilege per session. Only tools in the allowed set can execute.
- **Tool Chain Guard**: Records tool call sequences and detects suspicious patterns (e.g., read credentials then send HTTP request).
- **Tool Poison Detector**: Analyzes tool definitions and results for embedded injection attempts.
- **Resource Governor**: Enforces token and API call budgets per session.
- **Decision Graph Analyzer**: Builds and analyzes the agent decision tree for manipulation patterns.
- **Manifest Verifier**: Cryptographic verification of MCP server manifests.
**Performance:** <3ms (without Ollama-dependent features).
### L8: Sanitization
**Modules:** `InputSanitizer`, `OutputSanitizer`, `CredentialRedactor`, `DelimiterHardener`, `SpotlightingEncoder`, `StructuredQueryEncoder`, `SignedPromptVerifier`, `PolymorphicAssembler`
Input and output sanitization to strip injections while preserving legitimate content.
- **Input Sanitizer**: Removes identified injection markers, delimiter manipulation, and role override attempts.
- **Output Sanitizer**: Strips system prompt leakage, script injection, and tool-call injection from LLM responses.
- **Credential Redactor**: Detects and masks API keys, tokens, passwords, and PII in output.
- **Delimiter Hardener**: Strengthens prompt delimiters to resist delimiter confusion attacks.
- **Spotlighting Encoder**: Implements the Microsoft Spotlighting technique -- marks data boundaries to help the LLM distinguish instructions from data.
- **Structured Query Encoder**: Encodes user input into structured query format to prevent injection.
- **Signed Prompt Verifier**: Verifies cryptographic signatures on system prompts.
**Performance:** <1ms.
### L9: Output Validation
**Modules:** `OutputValidator`, `CanaryManager`, `LeakageDetector`, `RAGShield`, `RoleIntegrityChecker`, `ScopeValidator`, `IntentGuardValidator`
Post-generation validation of LLM output before it reaches the user.
- **Canary Manager**: Injects unique canary tokens into system prompts. If they appear in output, system prompt extraction is confirmed.
- **Leakage Detector**: Scans output for system prompt fragments, internal tool descriptions, and sensitive configuration.
- **RAG Shield**: Validates RAG-retrieved documents for injection, scores document integrity, tracks provenance.
- **Role Integrity Checker**: Verifies the LLM has not adopted an unauthorized role.
- **Scope Validator**: Ensures the response stays within the declared scope of the task.
**Performance:** <2ms.
## Data Flow Diagram
```
User Input
|
v
[L0: Preprocess] -----> normalized input
|
| +------------------+------------------+
| | | |
v v v v
[L1: Rules] [L2: Sentinel] (parallel)
| |
+----------+----------+
|
+----------+----------+----------+
| | | |
v v v v
[L3: Embed] [L4: Entropy] [L5: Attn] [Canary/YARA/Indirect]
| | | |
+----------+----------+----------+
|
v
[L6: Behavioral]
|
v
[L7: MCP Guard] (if tool call context)
|
v
[Aggregator] -- collects all ScanResult[]
|
+-----+-----+
| |
v v
[Kill Chain [Healing
Mapper] Orchestrator]
| |
+-----+-----+
|
v
[L8: Sanitize] (if action == 'sanitize')
|
v
[L9: Validate] (for output scans)
|
v
ShieldXResult
|
v
[Evolution Engine] (async, background)
|
+-----+-----+-----+-----+
| | | | |
v v v v v
[GAN] [Drift] [Active] [Fed] [Attack
Red Detect Learn Sync Graph]
Team
```
## Module Dependency Graph
```
@shieldx/core
|
+-- core/
| +-- ShieldX.ts (orchestrator -- imports all layers)
| +-- config.ts (default config, merge utility)
| +-- logger.ts (Pino structured logging)
|
+-- types/
| +-- detection.ts (ScanResult, ShieldXResult, ShieldXConfig, etc.)
| +-- healing.ts (HealingStrategy, HealingResponse)
| +-- learning.ts (PatternRecord, LearningStats, DriftReport)
| +-- behavioral.ts (ConversationState, IntentVector, SessionProfile)
| +-- killchain.ts (KillChainPhaseDetail, KillChainClassification)
| +-- compliance.ts (ATLASMapping, OWASPMapping, EUAIActReport)
| +-- trust.ts (TrustTagType, DataOrigin, TrustPolicy)
|
+-- preprocessing/ (L0 -- no external deps)
| +-- UnicodeNormalizer.ts
| +-- TokenizerNormalizer.ts
| +-- CompressedPayloadDetector.ts
|
+-- detection/ (L1-L2 -- depends on types/)
| +-- RuleEngine.ts
|
+-- behavioral/ (L6 -- depends on types/)
| +-- ConversationTracker.ts
| +-- IntentMonitor.ts
| +-- ContextIntegrity.ts
| +-- SessionProfiler.ts
| +-- MemoryIntegrityGuard.ts
| +-- AnomalyDetector.ts
| +-- ContextDriftDetector.ts
| +-- TrustTagger.ts
| +-- ToolCallValidator.ts
| +-- KillChainMapper.ts
|
+-- mcp-guard/ (L7 -- depends on types/)
| +-- MCPInspector.ts
| +-- ToolCallInterceptor.ts
| +-- PrivilegeChecker.ts
| +-- ToolChainGuard.ts
| +-- ToolPoisonDetector.ts
| +-- ResourceGovernor.ts
| +-- DecisionGraphAnalyzer.ts
| +-- ManifestVerifier.ts
| +-- OllamaGuard.ts
|
+-- sanitization/ (L8 -- depends on types/)
| +-- InputSanitizer.ts
| +-- OutputSanitizer.ts
| +-- CredentialRedactor.ts
| +-- DelimiterHardener.ts
| +-- SpotlightingEncoder.ts
| +-- StructuredQueryEncoder.ts
| +-- SignedPromptVerifier.ts
| +-- PolymorphicAssembler.ts
|
+-- validation/ (L9 -- depends on types/)
| +-- OutputValidator.ts
| +-- CanaryManager.ts
| +-- LeakageDetector.ts
| +-- RAGShield.ts
| +-- RoleIntegrityChecker.ts
| +-- ScopeValidator.ts
| +-- IntentGuardValidator.ts
|
+-- healing/ (depends on types/, behavioral/)
| +-- HealingOrchestrator.ts
| +-- FallbackResponder.ts
| +-- IncidentReporter.ts
| +-- PromptReconstructor.ts
| +-- SessionManager.ts
|
+-- learning/ (depends on types/, pg, pgvector)
| +-- PatternStore.ts
| +-- PatternEvolver.ts
| +-- EmbeddingStore.ts
| +-- RedTeamEngine.ts
| +-- DriftDetector.ts
| +-- ActiveLearner.ts
| +-- FeedbackProcessor.ts
| +-- FederatedSync.ts
| +-- AttackGraph.ts
| +-- ConversationLearner.ts
| +-- ThresholdAdaptor.ts
|
+-- compliance/ (depends on types/)
| +-- ATLASMapper.ts
| +-- OWASPMapper.ts
| +-- EUAIActReporter.ts
| +-- ReportGenerator.ts
|
+-- supply-chain/ (depends on types/)
| +-- SupplyChainVerifier.ts
| +-- ModelProvenanceChecker.ts
|
+-- integrations/ (depends on core/)
+-- nextjs/
+-- ollama/
+-- anthropic/
```
## External Dependencies
| Dependency | Purpose | Required |
|------------|---------|----------|
| `pg` | PostgreSQL client for pattern/embedding storage | Only if `storageBackend: 'postgresql'` |
| `pgvector` | Vector similarity operations in PostgreSQL | Only if embedding scanner enabled with postgresql |
| `zod` | Runtime schema validation for configuration and input | Yes |
| `pino` | Structured JSON logging | Yes |
## Performance Characteristics
### Parallel Execution
Layers that have no data dependency on each other run in parallel:
- L1 and L2 run in parallel
- L3, L4, L5, Canary, YARA, and Indirect scanners all run in parallel
- Within L6, conversation tracking, intent monitoring, and context integrity run in parallel
### Graceful Degradation
Every scanner invocation is wrapped in `safeRunScanner()`:
- Catches all exceptions
- Logs the failure with scanner ID and error message
- Returns empty results (the scanner is skipped, not the pipeline)
`Promise.allSettled` ensures a slow or failing scanner never blocks others. A scanner that times out after its expected latency window simply contributes no results to the aggregation.
### Zero-Cost Defaults
The default configuration enables only layers that have no external dependencies:
- L0 (preprocessing): pure computation, <0.5ms
- L1 (rule engine): pure computation, <2ms
- L6 (behavioral): in-memory state, <5ms
- L7 (MCP guard): in-memory checks, <3ms
- L8 (sanitization): pure computation, <1ms
Ollama-dependent layers (L3 embedding, L5 attention) and model-dependent layers (L2 sentinel) are opt-in.
### Memory Footprint
- Default configuration (memory backend): ~5MB base + ~1KB per active session
- With PostgreSQL backend: ~2MB base (connection pool) + patterns stored externally
- Rule engine: ~500KB for 500+ compiled regex patterns
- Embedding cache: configurable, default 10,000 vectors in memory
## Build Output
ShieldX builds to three formats via tsup:
- **CJS**: `dist/index.js` (CommonJS for Node.js require())
- **ESM**: `dist/index.mjs` (ES modules for import)
- **DTS**: `dist/index.d.ts` (TypeScript declarations)
Integration subpaths are available at:
- `@shieldx/core/nextjs`
- `@shieldx/core/ollama`
- `@shieldx/core/anthropic`