- Layer 4 EntropyScanner: Shannon entropy, Base32/Base64 detection, CVE-2025-55284 ping/nslookup exfil, EchoLeak markdown pattern, DNS tunneling (iodine/dnscat) - Layer 5 UnicodeScanner: ASCII Smuggling (U+E0000 Tags Block), Variant Selectors, Zero-Width steganography, CamoLeak image-ordering (CVE-2025-53773), homoglyphs, BiDi override, high-entropy URL params - 30 DNS covert channel rules (dns-001 to dns-030) - ATLASMapper: 29 techniques (ATLAS v5.4.0 Feb 2026), added AML.T0062 (Agent Tool Invocation), AML.TA0015 (C2 tactic), memory poisoning, multi-agent trust, CamoLeak, Unicode steganography mappings - Rule count: 72 → 102 - Build: tsup 316ms, zero TypeScript errors
15 KiB
ShieldX Architecture
Overview
ShieldX is a 10-layer defense pipeline orchestrated by a single ShieldX class. Each layer is independently toggleable, runs in isolation, and never blocks the pipeline if it fails. The orchestrator uses Promise.allSettled for parallel execution and graceful degradation.
10-Layer Pipeline
L0: Preprocessing
Modules: UnicodeNormalizer, TokenizerNormalizer, CompressedPayloadDetector
The preprocessing layer normalizes input before any detection runs. This is the only sequential layer -- all downstream scanners operate on the normalized output.
- Unicode Normalization: NFKC normalization, invisible character removal, homoglyph detection, Bidi override stripping. Catches attacks that use visually identical characters to bypass pattern matching.
- Tokenizer Normalization: Normalizes tokenizer-specific artifacts (zero-width joiners, soft hyphens, token-boundary exploits). Prevents attacks that exploit differences between how humans read text and how tokenizers split it.
- Compressed Payload Detection: Detects and decodes Base64, gzip, hex-encoded, and other compressed payloads embedded in input. Decoded content is appended to the normalized input so downstream scanners can analyze it.
Performance: <0.5ms combined. Always enabled (zero cost, high impact).
L1: Rule Engine
Module: RuleEngine
Pattern-matching engine with 500+ built-in regex rules organized by kill chain phase. Rules are loaded from a seeded pattern store and can be extended at runtime through the learning engine.
- Category-based rule organization (injection markers, role overrides, data exfiltration patterns)
- Per-rule kill chain phase and severity mapping
- Hot-reloadable: new rules from the learning engine take effect without restart
Performance: <2ms for 500+ patterns.
L2: Sentinel Classifier
Module: SentinelClassifier (opt-in)
Machine learning binary classifier trained to distinguish benign prompts from injection attempts. Operates on token-level features extracted from the normalized input.
- Requires model download (not included in default install)
- Outputs confidence score mapped to threat level via configurable thresholds
- Runs in parallel with L1
Performance: <10ms.
L3: Embedding Scanners
Modules: EmbeddingStore, EmbeddingScanner, EmbeddingAnomalyDetector
Semantic similarity analysis using vector embeddings. Compares input against a database of known attack embeddings stored in PostgreSQL with pgvector.
- Similarity Scanner: Cosine similarity against known attack vectors. Catches paraphrased variants of known attacks that bypass regex patterns.
- Anomaly Detector: Statistical outlier detection on embedding space. Identifies inputs that are structurally unusual compared to the conversation baseline.
Performance: <200ms (requires Ollama for embedding generation).
L4: Entropy Analysis
Module: EntropyScanner
Information-theoretic analysis of input text. Measures Shannon entropy, character distribution, and n-gram statistics.
- High entropy can indicate encoded payloads, obfuscated injection, or adversarial token sequences
- Low entropy in unexpected contexts can indicate template-based attacks
- Adaptive thresholds based on conversation baseline
Performance: <1ms.
L5: Attention Pattern Analysis
Module: AttentionScanner (opt-in)
Analyzes attention weight distribution from Ollama models to detect inputs that cause abnormal attention patterns.
- Detects attention hijacking (injection that captures disproportionate model attention)
- Identifies attention-blind spots (content designed to avoid model attention)
- Requires Ollama with attention output support
Performance: <200ms. Runs in parallel with L3 and L4.
L6: Behavioral Monitoring
Modules: ConversationTracker, IntentMonitor, ContextIntegrity, SessionProfiler, MemoryIntegrityGuard, AnomalyDetector, ContextDriftDetector, TrustTagger
Multi-turn conversation analysis that detects attacks spanning multiple messages.
- Conversation Tracker: Maintains conversation state, detects turn-over-turn pattern shifts, identifies multi-step attack sequences.
- Intent Monitor: Tracks declared vs. actual intent. Flags when the behavioral pattern diverges from the stated task description.
- Context Integrity: Verifies that the context window has not been poisoned by injected content. Measures context poison score.
- Session Profiler: Builds a behavioral baseline per session and flags anomalous deviations.
- Memory Integrity Guard: Detects unauthorized modifications to conversation memory or cached instructions.
- Trust Tagger: Assigns trust scores per data source using Bayesian updating.
Performance: <5ms combined.
L7: MCP Guard
Modules: MCPInspector, ToolCallInterceptor, PrivilegeChecker, ToolChainGuard, ToolPoisonDetector, ResourceGovernor, DecisionGraphAnalyzer, ManifestVerifier, OllamaGuard
Purpose-built protection for Model Context Protocol tool calls.
- Privilege Checker: Enforces least-privilege per session. Only tools in the allowed set can execute.
- Tool Chain Guard: Records tool call sequences and detects suspicious patterns (e.g., read credentials then send HTTP request).
- Tool Poison Detector: Analyzes tool definitions and results for embedded injection attempts.
- Resource Governor: Enforces token and API call budgets per session.
- Decision Graph Analyzer: Builds and analyzes the agent decision tree for manipulation patterns.
- Manifest Verifier: Cryptographic verification of MCP server manifests.
Performance: <3ms (without Ollama-dependent features).
L8: Sanitization
Modules: InputSanitizer, OutputSanitizer, CredentialRedactor, DelimiterHardener, SpotlightingEncoder, StructuredQueryEncoder, SignedPromptVerifier, PolymorphicAssembler
Input and output sanitization to strip injections while preserving legitimate content.
- Input Sanitizer: Removes identified injection markers, delimiter manipulation, and role override attempts.
- Output Sanitizer: Strips system prompt leakage, script injection, and tool-call injection from LLM responses.
- Credential Redactor: Detects and masks API keys, tokens, passwords, and PII in output.
- Delimiter Hardener: Strengthens prompt delimiters to resist delimiter confusion attacks.
- Spotlighting Encoder: Implements the Microsoft Spotlighting technique -- marks data boundaries to help the LLM distinguish instructions from data.
- Structured Query Encoder: Encodes user input into structured query format to prevent injection.
- Signed Prompt Verifier: Verifies cryptographic signatures on system prompts.
Performance: <1ms.
L9: Output Validation
Modules: OutputValidator, CanaryManager, LeakageDetector, RAGShield, RoleIntegrityChecker, ScopeValidator, IntentGuardValidator
Post-generation validation of LLM output before it reaches the user.
- Canary Manager: Injects unique canary tokens into system prompts. If they appear in output, system prompt extraction is confirmed.
- Leakage Detector: Scans output for system prompt fragments, internal tool descriptions, and sensitive configuration.
- RAG Shield: Validates RAG-retrieved documents for injection, scores document integrity, tracks provenance.
- Role Integrity Checker: Verifies the LLM has not adopted an unauthorized role.
- Scope Validator: Ensures the response stays within the declared scope of the task.
Performance: <2ms.
Data Flow Diagram
User Input
|
v
[L0: Preprocess] -----> normalized input
|
| +------------------+------------------+
| | | |
v v v v
[L1: Rules] [L2: Sentinel] (parallel)
| |
+----------+----------+
|
+----------+----------+----------+
| | | |
v v v v
[L3: Embed] [L4: Entropy] [L5: Attn] [Canary/YARA/Indirect]
| | | |
+----------+----------+----------+
|
v
[L6: Behavioral]
|
v
[L7: MCP Guard] (if tool call context)
|
v
[Aggregator] -- collects all ScanResult[]
|
+-----+-----+
| |
v v
[Kill Chain [Healing
Mapper] Orchestrator]
| |
+-----+-----+
|
v
[L8: Sanitize] (if action == 'sanitize')
|
v
[L9: Validate] (for output scans)
|
v
ShieldXResult
|
v
[Evolution Engine] (async, background)
|
+-----+-----+-----+-----+
| | | | |
v v v v v
[GAN] [Drift] [Active] [Fed] [Attack
Red Detect Learn Sync Graph]
Team
Module Dependency Graph
@shieldx/core
|
+-- core/
| +-- ShieldX.ts (orchestrator -- imports all layers)
| +-- config.ts (default config, merge utility)
| +-- logger.ts (Pino structured logging)
|
+-- types/
| +-- detection.ts (ScanResult, ShieldXResult, ShieldXConfig, etc.)
| +-- healing.ts (HealingStrategy, HealingResponse)
| +-- learning.ts (PatternRecord, LearningStats, DriftReport)
| +-- behavioral.ts (ConversationState, IntentVector, SessionProfile)
| +-- killchain.ts (KillChainPhaseDetail, KillChainClassification)
| +-- compliance.ts (ATLASMapping, OWASPMapping, EUAIActReport)
| +-- trust.ts (TrustTagType, DataOrigin, TrustPolicy)
|
+-- preprocessing/ (L0 -- no external deps)
| +-- UnicodeNormalizer.ts
| +-- TokenizerNormalizer.ts
| +-- CompressedPayloadDetector.ts
|
+-- detection/ (L1-L2 -- depends on types/)
| +-- RuleEngine.ts
|
+-- behavioral/ (L6 -- depends on types/)
| +-- ConversationTracker.ts
| +-- IntentMonitor.ts
| +-- ContextIntegrity.ts
| +-- SessionProfiler.ts
| +-- MemoryIntegrityGuard.ts
| +-- AnomalyDetector.ts
| +-- ContextDriftDetector.ts
| +-- TrustTagger.ts
| +-- ToolCallValidator.ts
| +-- KillChainMapper.ts
|
+-- mcp-guard/ (L7 -- depends on types/)
| +-- MCPInspector.ts
| +-- ToolCallInterceptor.ts
| +-- PrivilegeChecker.ts
| +-- ToolChainGuard.ts
| +-- ToolPoisonDetector.ts
| +-- ResourceGovernor.ts
| +-- DecisionGraphAnalyzer.ts
| +-- ManifestVerifier.ts
| +-- OllamaGuard.ts
|
+-- sanitization/ (L8 -- depends on types/)
| +-- InputSanitizer.ts
| +-- OutputSanitizer.ts
| +-- CredentialRedactor.ts
| +-- DelimiterHardener.ts
| +-- SpotlightingEncoder.ts
| +-- StructuredQueryEncoder.ts
| +-- SignedPromptVerifier.ts
| +-- PolymorphicAssembler.ts
|
+-- validation/ (L9 -- depends on types/)
| +-- OutputValidator.ts
| +-- CanaryManager.ts
| +-- LeakageDetector.ts
| +-- RAGShield.ts
| +-- RoleIntegrityChecker.ts
| +-- ScopeValidator.ts
| +-- IntentGuardValidator.ts
|
+-- healing/ (depends on types/, behavioral/)
| +-- HealingOrchestrator.ts
| +-- FallbackResponder.ts
| +-- IncidentReporter.ts
| +-- PromptReconstructor.ts
| +-- SessionManager.ts
|
+-- learning/ (depends on types/, pg, pgvector)
| +-- PatternStore.ts
| +-- PatternEvolver.ts
| +-- EmbeddingStore.ts
| +-- RedTeamEngine.ts
| +-- DriftDetector.ts
| +-- ActiveLearner.ts
| +-- FeedbackProcessor.ts
| +-- FederatedSync.ts
| +-- AttackGraph.ts
| +-- ConversationLearner.ts
| +-- ThresholdAdaptor.ts
|
+-- compliance/ (depends on types/)
| +-- ATLASMapper.ts
| +-- OWASPMapper.ts
| +-- EUAIActReporter.ts
| +-- ReportGenerator.ts
|
+-- supply-chain/ (depends on types/)
| +-- SupplyChainVerifier.ts
| +-- ModelProvenanceChecker.ts
|
+-- integrations/ (depends on core/)
+-- nextjs/
+-- ollama/
+-- anthropic/
External Dependencies
| Dependency | Purpose | Required |
|---|---|---|
pg |
PostgreSQL client for pattern/embedding storage | Only if storageBackend: 'postgresql' |
pgvector |
Vector similarity operations in PostgreSQL | Only if embedding scanner enabled with postgresql |
zod |
Runtime schema validation for configuration and input | Yes |
pino |
Structured JSON logging | Yes |
Performance Characteristics
Parallel Execution
Layers that have no data dependency on each other run in parallel:
- L1 and L2 run in parallel
- L3, L4, L5, Canary, YARA, and Indirect scanners all run in parallel
- Within L6, conversation tracking, intent monitoring, and context integrity run in parallel
Graceful Degradation
Every scanner invocation is wrapped in safeRunScanner():
- Catches all exceptions
- Logs the failure with scanner ID and error message
- Returns empty results (the scanner is skipped, not the pipeline)
Promise.allSettled ensures a slow or failing scanner never blocks others. A scanner that times out after its expected latency window simply contributes no results to the aggregation.
Zero-Cost Defaults
The default configuration enables only layers that have no external dependencies:
- L0 (preprocessing): pure computation, <0.5ms
- L1 (rule engine): pure computation, <2ms
- L6 (behavioral): in-memory state, <5ms
- L7 (MCP guard): in-memory checks, <3ms
- L8 (sanitization): pure computation, <1ms
Ollama-dependent layers (L3 embedding, L5 attention) and model-dependent layers (L2 sentinel) are opt-in.
Memory Footprint
- Default configuration (memory backend): ~5MB base + ~1KB per active session
- With PostgreSQL backend: ~2MB base (connection pool) + patterns stored externally
- Rule engine: ~500KB for 500+ compiled regex patterns
- Embedding cache: configurable, default 10,000 vectors in memory
Build Output
ShieldX builds to three formats via tsup:
- CJS:
dist/index.js(CommonJS for Node.js require()) - ESM:
dist/index.mjs(ES modules for import) - DTS:
dist/index.d.ts(TypeScript declarations)
Integration subpaths are available at:
@shieldx/core/nextjs@shieldx/core/ollama@shieldx/core/anthropic