# ShieldX Architecture ## Overview ShieldX is a 10-layer defense pipeline orchestrated by a single `ShieldX` class. Each layer is independently toggleable, runs in isolation, and never blocks the pipeline if it fails. The orchestrator uses `Promise.allSettled` for parallel execution and graceful degradation. ## 10-Layer Pipeline ### L0: Preprocessing **Modules:** `UnicodeNormalizer`, `TokenizerNormalizer`, `CompressedPayloadDetector` The preprocessing layer normalizes input before any detection runs. This is the only sequential layer -- all downstream scanners operate on the normalized output. - **Unicode Normalization**: NFKC normalization, invisible character removal, homoglyph detection, Bidi override stripping. Catches attacks that use visually identical characters to bypass pattern matching. - **Tokenizer Normalization**: Normalizes tokenizer-specific artifacts (zero-width joiners, soft hyphens, token-boundary exploits). Prevents attacks that exploit differences between how humans read text and how tokenizers split it. - **Compressed Payload Detection**: Detects and decodes Base64, gzip, hex-encoded, and other compressed payloads embedded in input. Decoded content is appended to the normalized input so downstream scanners can analyze it. **Performance:** <0.5ms combined. Always enabled (zero cost, high impact). ### L1: Rule Engine **Module:** `RuleEngine` Pattern-matching engine with 500+ built-in regex rules organized by kill chain phase. Rules are loaded from a seeded pattern store and can be extended at runtime through the learning engine. - Category-based rule organization (injection markers, role overrides, data exfiltration patterns) - Per-rule kill chain phase and severity mapping - Hot-reloadable: new rules from the learning engine take effect without restart **Performance:** <2ms for 500+ patterns. ### L2: Sentinel Classifier **Module:** `SentinelClassifier` (opt-in) Machine learning binary classifier trained to distinguish benign prompts from injection attempts. Operates on token-level features extracted from the normalized input. - Requires model download (not included in default install) - Outputs confidence score mapped to threat level via configurable thresholds - Runs in parallel with L1 **Performance:** <10ms. ### L3: Embedding Scanners **Modules:** `EmbeddingStore`, `EmbeddingScanner`, `EmbeddingAnomalyDetector` Semantic similarity analysis using vector embeddings. Compares input against a database of known attack embeddings stored in PostgreSQL with pgvector. - **Similarity Scanner**: Cosine similarity against known attack vectors. Catches paraphrased variants of known attacks that bypass regex patterns. - **Anomaly Detector**: Statistical outlier detection on embedding space. Identifies inputs that are structurally unusual compared to the conversation baseline. **Performance:** <200ms (requires Ollama for embedding generation). ### L4: Entropy Analysis **Module:** `EntropyScanner` Information-theoretic analysis of input text. Measures Shannon entropy, character distribution, and n-gram statistics. - High entropy can indicate encoded payloads, obfuscated injection, or adversarial token sequences - Low entropy in unexpected contexts can indicate template-based attacks - Adaptive thresholds based on conversation baseline **Performance:** <1ms. ### L5: Attention Pattern Analysis **Module:** `AttentionScanner` (opt-in) Analyzes attention weight distribution from Ollama models to detect inputs that cause abnormal attention patterns. - Detects attention hijacking (injection that captures disproportionate model attention) - Identifies attention-blind spots (content designed to avoid model attention) - Requires Ollama with attention output support **Performance:** <200ms. Runs in parallel with L3 and L4. ### L6: Behavioral Monitoring **Modules:** `ConversationTracker`, `IntentMonitor`, `ContextIntegrity`, `SessionProfiler`, `MemoryIntegrityGuard`, `AnomalyDetector`, `ContextDriftDetector`, `TrustTagger` Multi-turn conversation analysis that detects attacks spanning multiple messages. - **Conversation Tracker**: Maintains conversation state, detects turn-over-turn pattern shifts, identifies multi-step attack sequences. - **Intent Monitor**: Tracks declared vs. actual intent. Flags when the behavioral pattern diverges from the stated task description. - **Context Integrity**: Verifies that the context window has not been poisoned by injected content. Measures context poison score. - **Session Profiler**: Builds a behavioral baseline per session and flags anomalous deviations. - **Memory Integrity Guard**: Detects unauthorized modifications to conversation memory or cached instructions. - **Trust Tagger**: Assigns trust scores per data source using Bayesian updating. **Performance:** <5ms combined. ### L7: MCP Guard **Modules:** `MCPInspector`, `ToolCallInterceptor`, `PrivilegeChecker`, `ToolChainGuard`, `ToolPoisonDetector`, `ResourceGovernor`, `DecisionGraphAnalyzer`, `ManifestVerifier`, `OllamaGuard` Purpose-built protection for Model Context Protocol tool calls. - **Privilege Checker**: Enforces least-privilege per session. Only tools in the allowed set can execute. - **Tool Chain Guard**: Records tool call sequences and detects suspicious patterns (e.g., read credentials then send HTTP request). - **Tool Poison Detector**: Analyzes tool definitions and results for embedded injection attempts. - **Resource Governor**: Enforces token and API call budgets per session. - **Decision Graph Analyzer**: Builds and analyzes the agent decision tree for manipulation patterns. - **Manifest Verifier**: Cryptographic verification of MCP server manifests. **Performance:** <3ms (without Ollama-dependent features). ### L8: Sanitization **Modules:** `InputSanitizer`, `OutputSanitizer`, `CredentialRedactor`, `DelimiterHardener`, `SpotlightingEncoder`, `StructuredQueryEncoder`, `SignedPromptVerifier`, `PolymorphicAssembler` Input and output sanitization to strip injections while preserving legitimate content. - **Input Sanitizer**: Removes identified injection markers, delimiter manipulation, and role override attempts. - **Output Sanitizer**: Strips system prompt leakage, script injection, and tool-call injection from LLM responses. - **Credential Redactor**: Detects and masks API keys, tokens, passwords, and PII in output. - **Delimiter Hardener**: Strengthens prompt delimiters to resist delimiter confusion attacks. - **Spotlighting Encoder**: Implements the Microsoft Spotlighting technique -- marks data boundaries to help the LLM distinguish instructions from data. - **Structured Query Encoder**: Encodes user input into structured query format to prevent injection. - **Signed Prompt Verifier**: Verifies cryptographic signatures on system prompts. **Performance:** <1ms. ### L9: Output Validation **Modules:** `OutputValidator`, `CanaryManager`, `LeakageDetector`, `RAGShield`, `RoleIntegrityChecker`, `ScopeValidator`, `IntentGuardValidator` Post-generation validation of LLM output before it reaches the user. - **Canary Manager**: Injects unique canary tokens into system prompts. If they appear in output, system prompt extraction is confirmed. - **Leakage Detector**: Scans output for system prompt fragments, internal tool descriptions, and sensitive configuration. - **RAG Shield**: Validates RAG-retrieved documents for injection, scores document integrity, tracks provenance. - **Role Integrity Checker**: Verifies the LLM has not adopted an unauthorized role. - **Scope Validator**: Ensures the response stays within the declared scope of the task. **Performance:** <2ms. ## Data Flow Diagram ``` User Input | v [L0: Preprocess] -----> normalized input | | +------------------+------------------+ | | | | v v v v [L1: Rules] [L2: Sentinel] (parallel) | | +----------+----------+ | +----------+----------+----------+ | | | | v v v v [L3: Embed] [L4: Entropy] [L5: Attn] [Canary/YARA/Indirect] | | | | +----------+----------+----------+ | v [L6: Behavioral] | v [L7: MCP Guard] (if tool call context) | v [Aggregator] -- collects all ScanResult[] | +-----+-----+ | | v v [Kill Chain [Healing Mapper] Orchestrator] | | +-----+-----+ | v [L8: Sanitize] (if action == 'sanitize') | v [L9: Validate] (for output scans) | v ShieldXResult | v [Evolution Engine] (async, background) | +-----+-----+-----+-----+ | | | | | v v v v v [GAN] [Drift] [Active] [Fed] [Attack Red Detect Learn Sync Graph] Team ``` ## Module Dependency Graph ``` @shieldx/core | +-- core/ | +-- ShieldX.ts (orchestrator -- imports all layers) | +-- config.ts (default config, merge utility) | +-- logger.ts (Pino structured logging) | +-- types/ | +-- detection.ts (ScanResult, ShieldXResult, ShieldXConfig, etc.) | +-- healing.ts (HealingStrategy, HealingResponse) | +-- learning.ts (PatternRecord, LearningStats, DriftReport) | +-- behavioral.ts (ConversationState, IntentVector, SessionProfile) | +-- killchain.ts (KillChainPhaseDetail, KillChainClassification) | +-- compliance.ts (ATLASMapping, OWASPMapping, EUAIActReport) | +-- trust.ts (TrustTagType, DataOrigin, TrustPolicy) | +-- preprocessing/ (L0 -- no external deps) | +-- UnicodeNormalizer.ts | +-- TokenizerNormalizer.ts | +-- CompressedPayloadDetector.ts | +-- detection/ (L1-L2 -- depends on types/) | +-- RuleEngine.ts | +-- behavioral/ (L6 -- depends on types/) | +-- ConversationTracker.ts | +-- IntentMonitor.ts | +-- ContextIntegrity.ts | +-- SessionProfiler.ts | +-- MemoryIntegrityGuard.ts | +-- AnomalyDetector.ts | +-- ContextDriftDetector.ts | +-- TrustTagger.ts | +-- ToolCallValidator.ts | +-- KillChainMapper.ts | +-- mcp-guard/ (L7 -- depends on types/) | +-- MCPInspector.ts | +-- ToolCallInterceptor.ts | +-- PrivilegeChecker.ts | +-- ToolChainGuard.ts | +-- ToolPoisonDetector.ts | +-- ResourceGovernor.ts | +-- DecisionGraphAnalyzer.ts | +-- ManifestVerifier.ts | +-- OllamaGuard.ts | +-- sanitization/ (L8 -- depends on types/) | +-- InputSanitizer.ts | +-- OutputSanitizer.ts | +-- CredentialRedactor.ts | +-- DelimiterHardener.ts | +-- SpotlightingEncoder.ts | +-- StructuredQueryEncoder.ts | +-- SignedPromptVerifier.ts | +-- PolymorphicAssembler.ts | +-- validation/ (L9 -- depends on types/) | +-- OutputValidator.ts | +-- CanaryManager.ts | +-- LeakageDetector.ts | +-- RAGShield.ts | +-- RoleIntegrityChecker.ts | +-- ScopeValidator.ts | +-- IntentGuardValidator.ts | +-- healing/ (depends on types/, behavioral/) | +-- HealingOrchestrator.ts | +-- FallbackResponder.ts | +-- IncidentReporter.ts | +-- PromptReconstructor.ts | +-- SessionManager.ts | +-- learning/ (depends on types/, pg, pgvector) | +-- PatternStore.ts | +-- PatternEvolver.ts | +-- EmbeddingStore.ts | +-- RedTeamEngine.ts | +-- DriftDetector.ts | +-- ActiveLearner.ts | +-- FeedbackProcessor.ts | +-- FederatedSync.ts | +-- AttackGraph.ts | +-- ConversationLearner.ts | +-- ThresholdAdaptor.ts | +-- compliance/ (depends on types/) | +-- ATLASMapper.ts | +-- OWASPMapper.ts | +-- EUAIActReporter.ts | +-- ReportGenerator.ts | +-- supply-chain/ (depends on types/) | +-- SupplyChainVerifier.ts | +-- ModelProvenanceChecker.ts | +-- integrations/ (depends on core/) +-- nextjs/ +-- ollama/ +-- anthropic/ ``` ## External Dependencies | Dependency | Purpose | Required | |------------|---------|----------| | `pg` | PostgreSQL client for pattern/embedding storage | Only if `storageBackend: 'postgresql'` | | `pgvector` | Vector similarity operations in PostgreSQL | Only if embedding scanner enabled with postgresql | | `zod` | Runtime schema validation for configuration and input | Yes | | `pino` | Structured JSON logging | Yes | ## Performance Characteristics ### Parallel Execution Layers that have no data dependency on each other run in parallel: - L1 and L2 run in parallel - L3, L4, L5, Canary, YARA, and Indirect scanners all run in parallel - Within L6, conversation tracking, intent monitoring, and context integrity run in parallel ### Graceful Degradation Every scanner invocation is wrapped in `safeRunScanner()`: - Catches all exceptions - Logs the failure with scanner ID and error message - Returns empty results (the scanner is skipped, not the pipeline) `Promise.allSettled` ensures a slow or failing scanner never blocks others. A scanner that times out after its expected latency window simply contributes no results to the aggregation. ### Zero-Cost Defaults The default configuration enables only layers that have no external dependencies: - L0 (preprocessing): pure computation, <0.5ms - L1 (rule engine): pure computation, <2ms - L6 (behavioral): in-memory state, <5ms - L7 (MCP guard): in-memory checks, <3ms - L8 (sanitization): pure computation, <1ms Ollama-dependent layers (L3 embedding, L5 attention) and model-dependent layers (L2 sentinel) are opt-in. ### Memory Footprint - Default configuration (memory backend): ~5MB base + ~1KB per active session - With PostgreSQL backend: ~2MB base (connection pool) + patterns stored externally - Rule engine: ~500KB for 500+ compiled regex patterns - Embedding cache: configurable, default 10,000 vectors in memory ## Build Output ShieldX builds to three formats via tsup: - **CJS**: `dist/index.js` (CommonJS for Node.js require()) - **ESM**: `dist/index.mjs` (ES modules for import) - **DTS**: `dist/index.d.ts` (TypeScript declarations) Integration subpaths are available at: - `@shieldx/core/nextjs` - `@shieldx/core/ollama` - `@shieldx/core/anthropic`