shieldx/docs/architecture.md
Rene Fichtmueller 1c4c034483 feat: ShieldX v0.3.0 — UnicodeScanner (L5), DNS Covert Channel rules, ATLAS v5.4 mappings
- Layer 4 EntropyScanner: Shannon entropy, Base32/Base64 detection, CVE-2025-55284
  ping/nslookup exfil, EchoLeak markdown pattern, DNS tunneling (iodine/dnscat)
- Layer 5 UnicodeScanner: ASCII Smuggling (U+E0000 Tags Block), Variant Selectors,
  Zero-Width steganography, CamoLeak image-ordering (CVE-2025-53773), homoglyphs,
  BiDi override, high-entropy URL params
- 30 DNS covert channel rules (dns-001 to dns-030)
- ATLASMapper: 29 techniques (ATLAS v5.4.0 Feb 2026), added AML.T0062 (Agent Tool
  Invocation), AML.TA0015 (C2 tactic), memory poisoning, multi-agent trust,
  CamoLeak, Unicode steganography mappings
- Rule count: 72 → 102
- Build: tsup 316ms, zero TypeScript errors
2026-03-31 16:32:16 +02:00

ShieldX Architecture

Overview

ShieldX is a 10-layer defense pipeline orchestrated by a single ShieldX class. Each layer is independently toggleable, runs in isolation, and never blocks the pipeline if it fails. The orchestrator uses Promise.allSettled for parallel execution and graceful degradation.

10-Layer Pipeline

L0: Preprocessing

Modules: UnicodeNormalizer, TokenizerNormalizer, CompressedPayloadDetector

The preprocessing layer normalizes input before any detection runs. It must complete before detection starts and cannot run in parallel with the scanners -- all downstream scanners operate on the normalized output.

  • Unicode Normalization: NFKC normalization, invisible character removal, homoglyph detection, Bidi override stripping. Catches attacks that use visually identical characters to bypass pattern matching.
  • Tokenizer Normalization: Normalizes tokenizer-specific artifacts (zero-width joiners, soft hyphens, token-boundary exploits). Prevents attacks that exploit differences between how humans read text and how tokenizers split it.
  • Compressed Payload Detection: Detects and decodes Base64, gzip, hex-encoded, and other compressed payloads embedded in input. Decoded content is appended to the normalized input so downstream scanners can analyze it.

Performance: <0.5ms combined. Always enabled (negligible cost, high impact).
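
The Unicode normalization step described above can be sketched roughly as follows. This is a minimal illustration, not the actual UnicodeNormalizer implementation: the function name, the result shape, and the exact character ranges are assumptions.

```typescript
// Hypothetical helper for illustration; not the real UnicodeNormalizer API.
// Soft hyphen, zero-width characters, word joiner, and BOM.
const INVISIBLE = /[\u00AD\u200B-\u200D\u2060\uFEFF]/g;
// Bidirectional embedding, override, and isolate controls.
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

interface NormalizedInput {
  text: string;             // what downstream scanners would see
  removedInvisible: number; // count of stripped invisible characters
  removedBidi: number;      // count of stripped Bidi controls
}

function normalizeInput(raw: string): NormalizedInput {
  const removedInvisible = (raw.match(INVISIBLE) ?? []).length;
  const removedBidi = (raw.match(BIDI_CONTROLS) ?? []).length;
  const text = raw
    .replace(INVISIBLE, '')
    .replace(BIDI_CONTROLS, '')
    .normalize('NFKC'); // folds compatibility forms such as fullwidth letters
  return { text, removedInvisible, removedBidi };
}
```

Note that NFKC only folds compatibility forms. Cross-script homoglyphs (for example a Cyrillic letter standing in for a Latin one) survive normalization, which is presumably why homoglyph detection is listed as a separate step.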

L1: Rule Engine

Module: RuleEngine

Pattern-matching engine with 500+ built-in regex rules organized by kill chain phase. Rules are loaded from a seeded pattern store and can be extended at runtime through the learning engine.

  • Category-based rule organization (injection markers, role overrides, data exfiltration patterns)
  • Per-rule kill chain phase and severity mapping
  • Hot-reloadable: new rules from the learning engine take effect without restart

Performance: <2ms for 500+ patterns.
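
A rule with category, kill chain phase, and severity metadata, plus runtime extension, might look like the sketch below. All names here (MiniRuleEngine, the Rule fields, the rule IDs, and the phase labels) are hypothetical and chosen for illustration only.

```typescript
// Hypothetical shapes for illustration; not the actual RuleEngine API.
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface Rule {
  id: string;
  category: string;       // e.g. 'role-override'
  killChainPhase: string; // phase label; values here are made up
  severity: Severity;
  pattern: RegExp;        // keep non-global: the /g flag makes test() stateful
}

class MiniRuleEngine {
  private readonly rules = new Map<string, Rule>();

  // Adding or replacing a rule takes effect on the next scan, no restart needed.
  addRule(rule: Rule): void {
    this.rules.set(rule.id, rule);
  }

  // Returns every rule whose pattern matches the (already normalized) input.
  scan(input: string): Rule[] {
    return Array.from(this.rules.values()).filter((r) => r.pattern.test(input));
  }
}

const engine = new MiniRuleEngine();
engine.addRule({
  id: 'inj-001',
  category: 'role-override',
  killChainPhase: 'initial-access',
  severity: 'high',
  pattern: /ignore (all )?(previous|prior) instructions/i,
});
```

Keying rules by ID means a learned rule can replace an existing one in place, which is one simple way to get hot-reload behavior.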

L2: Sentinel Classifier

Module: SentinelClassifier (opt-in)

Machine learning binary classifier trained to distinguish benign prompts from injection attempts. Operates on token-level features extracted from the normalized input.

  • Requires model download (not included in default install)
  • Outputs confidence score mapped to threat level via configurable thresholds
  • Runs in parallel with L1

Performance: <10ms.
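
Mapping a classifier confidence score to a threat level through configurable thresholds could be as simple as the following sketch. The level names and default cut-offs are assumptions, not values taken from ShieldX.

```typescript
// Hypothetical level names and thresholds, for illustration only.
type ThreatLevel = 'none' | 'low' | 'medium' | 'high';

interface Thresholds {
  low: number;
  medium: number;
  high: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { low: 0.3, medium: 0.6, high: 0.85 };

// Checks the strictest threshold first so the highest applicable level wins.
function toThreatLevel(confidence: number, t: Thresholds = DEFAULT_THRESHOLDS): ThreatLevel {
  if (confidence >= t.high) return 'high';
  if (confidence >= t.medium) return 'medium';
  if (confidence >= t.low) return 'low';
  return 'none';
}
```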

L3: Embedding Scanners

Modules: EmbeddingStore, EmbeddingScanner, EmbeddingAnomalyDetector

Semantic similarity analysis using vector embeddings. Compares input against a database of known attack embeddings stored in PostgreSQL with pgvector.

  • Similarity Scanner: Cosine similarity against known attack vectors. Catches paraphrased variants of known attacks that bypass regex patterns.
  • Anomaly Detector: Statistical outlier detection on embedding space. Identifies inputs that are structurally unusual compared to the conversation baseline.

Performance: <200ms (requires Ollama for embedding generation).
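
The similarity check reduces to cosine similarity between the input embedding and each stored attack embedding. The in-memory sketch below illustrates the idea; the function names, the KnownAttack shape, and the 0.85 threshold are assumptions.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical record shape for a stored attack embedding.
interface KnownAttack {
  id: string;
  vector: number[];
}

// Returns the closest known attack at or above the threshold, or null if none qualifies.
function nearestAttack(
  query: readonly number[],
  known: readonly KnownAttack[],
  threshold = 0.85, // hypothetical default
): { id: string; similarity: number } | null {
  let best: { id: string; similarity: number } | null = null;
  for (const attack of known) {
    const similarity = cosineSimilarity(query, attack.vector);
    if (similarity >= threshold && (best === null || similarity > best.similarity)) {
      best = { id: attack.id, similarity };
    }
  }
  return best;
}
```

With the PostgreSQL backend, the same comparison would normally be pushed into the database: pgvector exposes cosine distance (1 minus similarity) through its `<=>` operator, so the linear scan above is only a stand-in.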

L4: Entropy Analysis

Module: EntropyScanner

Information-theoretic analysis of input text. Measures Shannon entropy, character distribution, and n-gram statistics.

  • High entropy can indicate encoded payloads, obfuscated injection, or adversarial token sequences
  • Low entropy in unexpected contexts can indicate template-based attacks
  • Adaptive thresholds based on conversation baseline

Performance: <1ms.
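
Shannon entropy over characters is H = -Σ p(c) · log2 p(c), in bits per character. The sketch below computes it and adds one possible form of adaptive threshold (mean plus k standard deviations over the conversation's earlier entropy values). The function names and the choice of k = 3 are assumptions; the actual EntropyScanner may adapt differently.

```typescript
// Shannon entropy of the character distribution, in bits per character.
function shannonEntropy(text: string): number {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let bits = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Hypothetical adaptive check: flag values more than k standard deviations
// above the mean of previously observed entropy values.
function exceedsBaseline(value: number, baseline: readonly number[], k = 3): boolean {
  if (baseline.length < 2) return false; // not enough history to adapt
  const mean = baseline.reduce((s, x) => s + x, 0) / baseline.length;
  const variance = baseline.reduce((s, x) => s + (x - mean) ** 2, 0) / baseline.length;
  return value > mean + k * Math.sqrt(variance);
}
```

As a rough guide, ordinary English prose usually lands around 4 bits per character at this level, while Base64 text approaches its 6-bit ceiling on long inputs, which is what makes a jump against the baseline informative.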

L5: Attention Pattern Analysis

Module: AttentionScanner (opt-in)

Analyzes attention weight distribution from Ollama models to detect inputs that cause abnormal attention patterns.

  • Detects attention hijacking (injection that captures disproportionate model attention)
  • Identifies attention-blind spots (content designed to avoid model attention)
  • Requires Ollama with attention output support

Performance: <200ms. Runs in parallel with L3 and L4.

L6: Behavioral Monitoring

Modules: ConversationTracker, IntentMonitor, ContextIntegrity, SessionProfiler, MemoryIntegrityGuard, AnomalyDetector, ContextDriftDetector, TrustTagger

Multi-turn conversation analysis that detects attacks spanning multiple messages.

  • Conversation Tracker: Maintains conversation state, detects turn-over-turn pattern shifts, identifies multi-step attack sequences.
  • Intent Monitor: Tracks declared vs. actual intent. Flags when the behavioral pattern diverges from the stated task description.
  • Context Integrity: Verifies that the context window has not been poisoned by injected content. Measures context poison score.
  • Session Profiler: Builds a behavioral baseline per session and flags anomalous deviations.
  • Memory Integrity Guard: Detects unauthorized modifications to conversation memory or cached instructions.
  • Trust Tagger: Assigns trust scores per data source using Bayesian updating.

Performance: <5ms combined.
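
One common way to realize "trust scores per data source using Bayesian updating" is a Beta-Bernoulli model: each source keeps counts of clean and malicious observations, and the trust score is the posterior mean. The sketch below assumes that model; the source does not say which prior or likelihood TrustTagger actually uses, and all names are hypothetical.

```typescript
// Beta(alpha, beta) posterior over "this source delivers clean content".
interface TrustState {
  alpha: number; // 1 + number of clean observations
  beta: number;  // 1 + number of malicious observations
}

// Uniform prior: no evidence either way, trust score 0.5.
const UNIFORM_PRIOR: TrustState = { alpha: 1, beta: 1 };

// Returns a new state; the previous state is left untouched.
function updateTrust(state: TrustState, clean: boolean): TrustState {
  return clean
    ? { alpha: state.alpha + 1, beta: state.beta }
    : { alpha: state.alpha, beta: state.beta + 1 };
}

// Posterior mean of the Beta distribution.
function trustScore(state: TrustState): number {
  return state.alpha / (state.alpha + state.beta);
}
```

Starting from the uniform prior, three clean observations raise the score to 4/5, and one malicious observation afterwards lowers it to 4/6.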

L7: MCP Guard

Modules: MCPInspector, ToolCallInterceptor, PrivilegeChecker, ToolChainGuard, ToolPoisonDetector, ResourceGovernor, DecisionGraphAnalyzer, ManifestVerifier, OllamaGuard

Purpose-built protection for Model Context Protocol tool calls.

  • Privilege Checker: Enforces least-privilege per session. Only tools in the allowed set can execute.
  • Tool Chain Guard: Records tool call sequences and detects suspicious patterns (e.g., read credentials then send HTTP request).
  • Tool Poison Detector: Analyzes tool definitions and results for embedded injection attempts.
  • Resource Governor: Enforces token and API call budgets per session.
  • Decision Graph Analyzer: Builds and analyzes the agent decision tree for manipulation patterns.
  • Manifest Verifier: Cryptographic verification of MCP server manifests.

Performance: <3ms (without Ollama-dependent features).
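
The "read credentials then send HTTP request" example for the Tool Chain Guard can be sketched as a per-session history check. The tag names and class below are hypothetical; the real ToolChainGuard presumably recognizes a wider set of sequences.

```typescript
// Hypothetical capability tags attached to each tool.
type ToolTag = 'reads-secrets' | 'network-egress' | 'other';

interface ToolCall {
  tool: string;
  tag: ToolTag;
}

class MiniToolChainGuard {
  private readonly history: ToolCall[] = [];

  // Records the call and returns true if it completes a suspicious sequence:
  // a network-egress call made after any earlier secret-reading call.
  record(call: ToolCall): boolean {
    const secretsSeen = this.history.some((c) => c.tag === 'reads-secrets');
    this.history.push(call);
    return call.tag === 'network-egress' && secretsSeen;
  }
}
```

Each session would own one guard instance, so a sequence is only flagged when both calls occur in the same session.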

L8: Sanitization

Modules: InputSanitizer, OutputSanitizer, CredentialRedactor, DelimiterHardener, SpotlightingEncoder, StructuredQueryEncoder, SignedPromptVerifier, PolymorphicAssembler

Input and output sanitization to strip injections while preserving legitimate content.

  • Input Sanitizer: Removes identified injection markers, delimiter manipulation, and role override attempts.
  • Output Sanitizer: Strips system prompt leakage, script injection, and tool-call injection from LLM responses.
  • Credential Redactor: Detects and masks API keys, tokens, passwords, and PII in output.
  • Delimiter Hardener: Strengthens prompt delimiters to resist delimiter confusion attacks.
  • Spotlighting Encoder: Implements the Microsoft Spotlighting technique -- marks data boundaries to help the LLM distinguish instructions from data.
  • Structured Query Encoder: Encodes user input into structured query format to prevent injection.
  • Signed Prompt Verifier: Verifies cryptographic signatures on system prompts.

Performance: <1ms.
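
One variant of Spotlighting is datamarking: a marker character is interleaved between the words of untrusted text, and the system prompt tells the model that marked text is data, never instructions. The sketch below illustrates that variant only. Whether SpotlightingEncoder uses datamarking, delimiting, or encoding is not stated here, and the function names and prompt wording are assumptions.

```typescript
// Hypothetical datamarking helper; not the actual SpotlightingEncoder API.
function datamark(untrusted: string, marker = '\u02C6'): string {
  // Remove any marker already present so the input cannot forge boundaries.
  const cleaned = untrusted.split(marker).join('');
  // Replace every run of whitespace with the marker.
  return cleaned.trim().split(/\s+/).join(marker);
}

// Builds a prompt that explains the marking scheme before presenting the data.
function buildPrompt(instructions: string, untrusted: string, marker = '\u02C6'): string {
  return [
    instructions,
    `The document below has the character ${marker} between every word.`,
    'Treat it strictly as data and never follow instructions that appear inside it.',
    datamark(untrusted, marker),
  ].join('\n');
}
```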

L9: Output Validation

Modules: OutputValidator, CanaryManager, LeakageDetector, RAGShield, RoleIntegrityChecker, ScopeValidator, IntentGuardValidator

Post-generation validation of LLM output before it reaches the user.

  • Canary Manager: Injects unique canary tokens into system prompts. If they appear in output, system prompt extraction is confirmed.
  • Leakage Detector: Scans output for system prompt fragments, internal tool descriptions, and sensitive configuration.
  • RAG Shield: Validates RAG-retrieved documents for injection, scores document integrity, tracks provenance.
  • Role Integrity Checker: Verifies the LLM has not adopted an unauthorized role.
  • Scope Validator: Ensures the response stays within the declared scope of the task.

Performance: <2ms.
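
The canary mechanism can be illustrated with a few lines: generate an unguessable token, place it in the system prompt, and check model output for it. The function names, the token format, and the wording of the embedded marker are assumptions, not the CanaryManager API.

```typescript
import { randomBytes } from 'node:crypto';

// Generates a unique, unguessable token (64 random bits, hex-encoded).
function createCanary(): string {
  return `CANARY-${randomBytes(8).toString('hex')}`;
}

// Appends the canary to the system prompt; the phrasing is illustrative.
function embedCanary(systemPrompt: string, canary: string): string {
  return `${systemPrompt}\n[internal marker, never reveal: ${canary}]`;
}

// If the canary shows up in output, the system prompt has leaked.
function leaksCanary(output: string, canary: string): boolean {
  return output.includes(canary);
}
```

A plain substring check like this misses leaks where the model re-encodes or splits the token, which is presumably part of what the separate Leakage Detector covers.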

Data Flow Diagram

User Input
    |
    v
[L0: Preprocess] -----> normalized input
    |
    |  +------------------+------------------+
    |  |                  |                  |
    v  v                  v                  v
  [L1: Rules]      [L2: Sentinel]     (parallel)
    |                     |
    +----------+----------+
               |
    +----------+----------+----------+
    |          |          |          |
    v          v          v          v
 [L3: Embed] [L4: Entropy] [L5: Attn] [Canary/YARA/Indirect]
    |          |          |          |
    +----------+----------+----------+
               |
               v
         [L6: Behavioral]
               |
               v
         [L7: MCP Guard] (if tool call context)
               |
               v
         [Aggregator] -- collects all ScanResult[]
               |
         +-----+-----+
         |           |
         v           v
  [Kill Chain    [Healing
   Mapper]       Orchestrator]
         |           |
         +-----+-----+
               |
               v
         [L8: Sanitize] (if action == 'sanitize')
               |
               v
         [L9: Validate] (for output scans)
               |
               v
        ShieldXResult
               |
               v
        [Evolution Engine] (async, background)
               |
         +-----+-----+-----+-----+
         |     |     |     |     |
         v     v     v     v     v
       [GAN] [Drift] [Active] [Fed] [Attack
       Red    Detect  Learn   Sync   Graph]
       Team

Module Dependency Graph

@shieldx/core
  |
  +-- core/
  |     +-- ShieldX.ts         (orchestrator -- imports all layers)
  |     +-- config.ts          (default config, merge utility)
  |     +-- logger.ts          (Pino structured logging)
  |
  +-- types/
  |     +-- detection.ts       (ScanResult, ShieldXResult, ShieldXConfig, etc.)
  |     +-- healing.ts         (HealingStrategy, HealingResponse)
  |     +-- learning.ts        (PatternRecord, LearningStats, DriftReport)
  |     +-- behavioral.ts      (ConversationState, IntentVector, SessionProfile)
  |     +-- killchain.ts       (KillChainPhaseDetail, KillChainClassification)
  |     +-- compliance.ts      (ATLASMapping, OWASPMapping, EUAIActReport)
  |     +-- trust.ts           (TrustTagType, DataOrigin, TrustPolicy)
  |
  +-- preprocessing/           (L0 -- no external deps)
  |     +-- UnicodeNormalizer.ts
  |     +-- TokenizerNormalizer.ts
  |     +-- CompressedPayloadDetector.ts
  |
  +-- detection/               (L1-L2 -- depends on types/)
  |     +-- RuleEngine.ts
  |
  +-- behavioral/              (L6 -- depends on types/)
  |     +-- ConversationTracker.ts
  |     +-- IntentMonitor.ts
  |     +-- ContextIntegrity.ts
  |     +-- SessionProfiler.ts
  |     +-- MemoryIntegrityGuard.ts
  |     +-- AnomalyDetector.ts
  |     +-- ContextDriftDetector.ts
  |     +-- TrustTagger.ts
  |     +-- ToolCallValidator.ts
  |     +-- KillChainMapper.ts
  |
  +-- mcp-guard/               (L7 -- depends on types/)
  |     +-- MCPInspector.ts
  |     +-- ToolCallInterceptor.ts
  |     +-- PrivilegeChecker.ts
  |     +-- ToolChainGuard.ts
  |     +-- ToolPoisonDetector.ts
  |     +-- ResourceGovernor.ts
  |     +-- DecisionGraphAnalyzer.ts
  |     +-- ManifestVerifier.ts
  |     +-- OllamaGuard.ts
  |
  +-- sanitization/            (L8 -- depends on types/)
  |     +-- InputSanitizer.ts
  |     +-- OutputSanitizer.ts
  |     +-- CredentialRedactor.ts
  |     +-- DelimiterHardener.ts
  |     +-- SpotlightingEncoder.ts
  |     +-- StructuredQueryEncoder.ts
  |     +-- SignedPromptVerifier.ts
  |     +-- PolymorphicAssembler.ts
  |
  +-- validation/              (L9 -- depends on types/)
  |     +-- OutputValidator.ts
  |     +-- CanaryManager.ts
  |     +-- LeakageDetector.ts
  |     +-- RAGShield.ts
  |     +-- RoleIntegrityChecker.ts
  |     +-- ScopeValidator.ts
  |     +-- IntentGuardValidator.ts
  |
  +-- healing/                 (depends on types/, behavioral/)
  |     +-- HealingOrchestrator.ts
  |     +-- FallbackResponder.ts
  |     +-- IncidentReporter.ts
  |     +-- PromptReconstructor.ts
  |     +-- SessionManager.ts
  |
  +-- learning/                (depends on types/, pg, pgvector)
  |     +-- PatternStore.ts
  |     +-- PatternEvolver.ts
  |     +-- EmbeddingStore.ts
  |     +-- RedTeamEngine.ts
  |     +-- DriftDetector.ts
  |     +-- ActiveLearner.ts
  |     +-- FeedbackProcessor.ts
  |     +-- FederatedSync.ts
  |     +-- AttackGraph.ts
  |     +-- ConversationLearner.ts
  |     +-- ThresholdAdaptor.ts
  |
  +-- compliance/              (depends on types/)
  |     +-- ATLASMapper.ts
  |     +-- OWASPMapper.ts
  |     +-- EUAIActReporter.ts
  |     +-- ReportGenerator.ts
  |
  +-- supply-chain/            (depends on types/)
  |     +-- SupplyChainVerifier.ts
  |     +-- ModelProvenanceChecker.ts
  |
  +-- integrations/            (depends on core/)
        +-- nextjs/
        +-- ollama/
        +-- anthropic/

External Dependencies

| Dependency | Purpose | Required |
|---|---|---|
| pg | PostgreSQL client for pattern/embedding storage | Only if storageBackend: 'postgresql' |
| pgvector | Vector similarity operations in PostgreSQL | Only if embedding scanner enabled with postgresql |
| zod | Runtime schema validation for configuration and input | Yes |
| pino | Structured JSON logging | Yes |

Performance Characteristics

Parallel Execution

Layers that have no data dependency on each other run in parallel:

  • L1 and L2 run in parallel
  • L3, L4, L5, Canary, YARA, and Indirect scanners all run in parallel
  • Within L6, conversation tracking, intent monitoring, and context integrity run in parallel

Graceful Degradation

Every scanner invocation is wrapped in safeRunScanner():

  • Catches all exceptions
  • Logs the failure with scanner ID and error message
  • Returns empty results (the scanner is skipped, not the pipeline)

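Promise.allSettled ensures that one scanner's failure never causes the results of the others to be discarded. Because allSettled itself waits for every promise to settle, slow scanners are bounded by a timeout: a scanner that times out after its expected latency window simply contributes no results to the aggregation.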
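
The wrapper pattern described in this section can be sketched as follows. safeRunScanner is named in this document, but its signature, the Scanner and ScanResult shapes, the per-scanner timeoutMs field, and the withTimeout and runParallel helpers are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real ScanResult lives in types/detection.ts.
interface ScanResult {
  scannerId: string;
  detected: boolean;
}

interface Scanner {
  id: string;
  timeoutMs: number; // expected latency window
  scan(input: string): Promise<ScanResult[]>;
}

// Rejects if the wrapped promise does not settle within ms milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Catches every failure, logs it, and returns empty results.
async function safeRunScanner(scanner: Scanner, input: string): Promise<ScanResult[]> {
  try {
    return await withTimeout(scanner.scan(input), scanner.timeoutMs);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.warn(`scanner ${scanner.id} failed: ${message}`);
    return [];
  }
}

// Runs all scanners concurrently and merges whatever they produced.
async function runParallel(scanners: Scanner[], input: string): Promise<ScanResult[]> {
  const settled = await Promise.allSettled(scanners.map((s) => safeRunScanner(s, input)));
  // safeRunScanner should never reject; allSettled is a second line of defense.
  return settled.flatMap((r) => (r.status === 'fulfilled' ? r.value : []));
}
```

In this sketch the timeout only stops the pipeline from waiting; it does not cancel the underlying work, so a scanner that needs real cancellation would also have to accept something like an AbortSignal.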

Zero-Cost Defaults

The default configuration enables only layers that have no external dependencies:

  • L0 (preprocessing): pure computation, <0.5ms
  • L1 (rule engine): pure computation, <2ms
  • L6 (behavioral): in-memory state, <5ms
  • L7 (MCP guard): in-memory checks, <3ms
  • L8 (sanitization): pure computation, <1ms

Ollama-dependent layers (L3 embedding, L5 attention) and model-dependent layers (L2 sentinel) are opt-in.

Memory Footprint

  • Default configuration (memory backend): ~5MB base + ~1KB per active session
  • With PostgreSQL backend: ~2MB base (connection pool) + patterns stored externally
  • Rule engine: ~500KB for 500+ compiled regex patterns
  • Embedding cache: configurable, default 10,000 vectors in memory

Build Output

ShieldX builds to three formats via tsup:

  • CJS: dist/index.js (CommonJS for Node.js require())
  • ESM: dist/index.mjs (ES modules for import)
  • DTS: dist/index.d.ts (TypeScript declarations)

Integration subpaths are available at:

  • @shieldx/core/nextjs
  • @shieldx/core/ollama
  • @shieldx/core/anthropic