- Layer 4 EntropyScanner: Shannon entropy, Base32/Base64 detection, CVE-2025-55284 ping/nslookup exfil, EchoLeak markdown pattern, DNS tunneling (iodine/dnscat) - Layer 5 UnicodeScanner: ASCII Smuggling (U+E0000 Tags Block), Variant Selectors, Zero-Width steganography, CamoLeak image-ordering (CVE-2025-53773), homoglyphs, BiDi override, high-entropy URL params - 30 DNS covert channel rules (dns-001 to dns-030) - ATLASMapper: 29 techniques (ATLAS v5.4.0 Feb 2026), added AML.T0062 (Agent Tool Invocation), AML.TA0015 (C2 tactic), memory poisoning, multi-agent trust, CamoLeak, Unicode steganography mappings - Rule count: 72 → 102 - Build: tsup 316ms, zero TypeScript errors
13 KiB
Self-Evolution Engine
Overview
ShieldX models its self-learning system on biological immune systems. The defense evolves continuously without manual rule updates. Five mechanisms work together: innate immunity (static rules), adaptive immunity (ML classifiers), immune memory (vector database), antibody generation (GAN red team), and herd immunity (federated sync).
All evolution happens locally by default. No data leaves your infrastructure unless you explicitly enable community sync.
Architecture
New Scan Results
|
+-------------+-------------+
| | |
v v v
[Feedback [Drift [Attack
Processor] Detector] Graph]
| | |
v v v
[Active [Threshold [Pattern
Learner] Adaptor] Evolver]
| | |
+------+------+------+-----+
| |
v v
[Pattern Store] [Embedding Store]
| |
+------+------+
|
+------+------+
| |
v v
[Red Team [Federated
Engine] Sync]
1. Innate Immunity (Static Rules)
Concept
Like the body's innate immune system (skin, mucous membranes, white blood cells), innate immunity provides immediate, non-specific defense against known threats. These rules are present from installation and never change at runtime.
Implementation
The RuleEngine loads 500+ regex patterns from the seed database. These patterns are organized by:
- Kill chain phase: each pattern maps to a specific phase
- Severity: default threat level for the pattern
- Category: injection type (role override, delimiter manipulation, encoding trick, etc.)
Patterns are compiled once at initialization and evaluated sequentially with short-circuit on first critical match.
Characteristics
| Property | Value |
|---|---|
| Latency | <2ms for 500+ patterns |
| False positive rate | Low (patterns are precise, not probabilistic) |
| Evasion resistance | Low (attackers can rephrase to avoid regex) |
| Update mechanism | Seed script (npm run db:seed) |
Strengths and Limitations
Strengths:
- Zero latency overhead
- Deterministic, auditable, explainable
- No external dependencies
- Catches the majority of unsophisticated attacks
Limitations:
- Cannot detect novel or paraphrased attacks
- Regex patterns are brittle against encoding tricks (handled by L0 preprocessing)
- Cannot capture semantic meaning
2. Adaptive Immunity (ML Classifiers)
Concept
Like T-cells and B-cells that learn to recognize specific pathogens, adaptive immunity develops targeted defenses against attacks that bypass static rules. These classifiers improve over time through exposure to new attack patterns and feedback.
Implementation
Sentinel Classifier (L2): Binary classifier trained on labeled examples of benign and malicious prompts. Outputs a confidence score that maps to threat levels via configurable thresholds.
Active Learner (src/learning/ActiveLearner.ts): Identifies samples near the classifier's decision boundary -- inputs where the model is most uncertain. These samples are the most valuable for improving the classifier and are prioritized for human review.
Feedback Processor (src/learning/FeedbackProcessor.ts): Processes submitFeedback() calls to refine classifier weights. True positives reinforce existing patterns. False positives adjust the decision boundary to avoid future misclassification.
Threshold Adaptor (src/learning/ThresholdAdaptor.ts): Dynamically adjusts confidence thresholds based on observed false positive and false negative rates. If the false positive rate exceeds a configurable target, thresholds are raised. If the false negative rate increases (detected through red team testing), thresholds are lowered.
Learning Loop
User Input -> Scan Pipeline -> ShieldXResult
|
User Feedback
(true/false positive)
|
Feedback Processor
|
+---------------+---------------+
| | |
Pattern Store Classifier Weights Thresholds
(new/refined (retrained on (adjusted by
patterns) feedback) ThresholdAdaptor)
Characteristics
| Property | Value |
|---|---|
| Latency | <10ms per classification |
| False positive rate | Adaptive (adjusts via feedback) |
| Evasion resistance | Medium (learns from confirmed attacks) |
| Update mechanism | Continuous via feedback loop |
3. Immune Memory (Vector Database)
Concept
Like immunological memory that enables faster response to previously encountered pathogens, the embedding store provides long-term memory of every attack pattern ShieldX has seen. New inputs are compared against this memory for semantic similarity, catching paraphrased variants.
Implementation
Embedding Store (src/learning/EmbeddingStore.ts): Stores attack pattern embeddings in PostgreSQL with pgvector. Each embedding is associated with its kill chain phase, severity, scanner origin, and confirmation status.
Semantic Similarity: New inputs are embedded (via Ollama) and compared against stored attack vectors using cosine similarity. A match above the configured threshold triggers detection even if no regex pattern or classifier fires.
Conversation Learner (src/learning/ConversationLearner.ts): Learns from conversation-level attack patterns -- multi-turn sequences that individually appear benign but collectively form an attack. Stores conversation fingerprints, not individual messages.
Storage Schema
Pattern Record:
id: string
embedding: float[] (pgvector)
killChainPhase: KillChainPhase
severity: ThreatLevel
source: 'builtin' | 'learned' | 'community' | 'red_team'
confirmedBy: 'human' | 'classifier' | 'red_team' | null
createdAt: timestamp
lastMatchedAt: timestamp
matchCount: number
falsePositiveCount: number
Characteristics
| Property | Value |
|---|---|
| Latency | <200ms (embedding generation + similarity search) |
| False positive rate | Medium (semantic similarity can match unrelated content) |
| Evasion resistance | High (semantic meaning is preserved across paraphrasing) |
| Update mechanism | Continuous -- new confirmed patterns added automatically |
4. Antibody Generation (GAN Red Team)
Concept
Like the immune system generating antibodies to neutralize specific pathogens, the red team engine proactively generates new attack variants to test the defense pipeline before real attackers discover those variants.
Implementation
Red Team Engine (src/learning/RedTeamEngine.ts): Takes known attack patterns and generates variants using adversarial mutation strategies:
| Mutation Strategy | Description |
|---|---|
| Synonym replacement | Replaces key terms with synonyms that preserve attack intent |
| Encoding shift | Re-encodes payloads using different encoding schemes |
| Structural rearrangement | Changes the order of injection components |
| Delimiter mutation | Uses different delimiter styles |
| Language mixing | Introduces multilingual elements |
| Token splitting | Splits critical words across token boundaries |
| Homoglyph substitution | Replaces characters with visually similar Unicode variants |
| Case manipulation | Changes capitalization patterns |
Pattern Evolver (src/learning/PatternEvolver.ts): Orchestrates the red team process:
- Select a set of known attack patterns from the pattern store
- Generate N variants per pattern using mutation strategies
- Run each variant through the full ShieldX pipeline
- Variants that bypass detection are flagged as "gap patterns"
- Gap patterns are added to the pattern store with source
'red_team' - The rule engine and classifiers are updated to detect the new patterns
Red Team Cycle
Known Patterns --> [Mutation Engine] --> Variant Attacks
|
[ShieldX Pipeline]
|
+--------+--------+
| |
Detected Bypassed
(good) (gap found!)
|
[Pattern Store]
[Classifier Update]
[Embedding Store]
Self-Test
The npm run self-test command executes a full red team cycle against the current pipeline and reports:
- Total variants generated
- Attack success rate (ASR) -- percentage that bypassed detection
- New gap patterns discovered
- Pipeline coverage improvement after adding gap patterns
Characteristics
| Property | Value |
|---|---|
| Execution frequency | Configurable (default: weekly batch, or on-demand) |
| Variants per pattern | Configurable (default: 50) |
| Gap discovery rate | Varies (typically 5-15% of variants bypass detection) |
| Update mechanism | Automatic -- gap patterns added to stores immediately |
5. Herd Immunity (Federated Sync)
Concept
Like herd immunity in a population, where widespread vaccination protects even unvaccinated individuals, federated sync allows ShieldX instances to share anonymized pattern intelligence. An attack detected by one deployment strengthens all others.
Implementation
Federated Sync (src/learning/FederatedSync.ts): Manages bidirectional sync with the community endpoint.
What is Shared
| Data | Shared | Format |
|---|---|---|
| Attack pattern hash | Yes | SHA-256 of normalized pattern |
| Kill chain phase | Yes | Phase enum value |
| Severity level | Yes | Threat level enum value |
| Scanner type | Yes | Scanner ID that detected it |
| Confidence score | Yes | Anonymized (rounded to 0.1) |
| Pattern category | Yes | Category tag |
| Raw user input | NEVER | Not transmitted |
| Session ID | NEVER | Not transmitted |
| User ID | NEVER | Not transmitted |
| System prompt | NEVER | Not transmitted |
| IP address | NEVER | Not transmitted |
| Conversation context | NEVER | Not transmitted |
Sync Protocol
-
Push: After a pattern is confirmed (via feedback or red team), a sync record is created containing only the hash, phase, severity, and category. This is sent to the community endpoint.
-
Pull: Periodically (configurable interval), the instance fetches new community patterns. These are stored with source
'community'and require local confirmation before they affect detection thresholds. -
Conflict Resolution: If a local pattern conflicts with a community pattern (different severity or phase), the local classification takes precedence. Community patterns serve as additional signals, not overrides.
Enabling Community Sync
const shield = new ShieldX({
learning: {
communitySync: true,
communitySyncUrl: 'https://sync.shieldx.dev/v1/patterns',
},
})
Characteristics
| Property | Value |
|---|---|
| Default state | Disabled (opt-in only) |
| Sync interval | Configurable (default: 1 hour) |
| Data transmitted | Hashes and metadata only |
| Privacy guarantee | No raw input ever leaves the instance |
Supporting Components
Drift Detector
Module: src/learning/DriftDetector.ts
Monitors the distribution of detected attack patterns over time. Detects concept drift -- when the nature of attacks changes and existing patterns become less effective.
Drift indicators:
- Rising false negative rate (detected through red team testing)
- Shift in kill chain phase distribution
- New scanner types triggering that previously did not
- Declining confidence scores for existing patterns
When drift is detected, the DriftReport triggers:
- Increased red team frequency
- Threshold recalibration
- Active learning sample prioritization
- Alert to operators
Attack Graph
Module: src/learning/AttackGraph.ts
Builds a directed graph of attack patterns and their relationships. Nodes represent individual attack patterns. Edges represent observed progressions (e.g., an initial_access pattern followed by privilege_escalation).
The graph enables:
- Predictive detection: if phase 1 of a known attack chain is detected, pre-emptively guard against the expected phase 2
- Attack campaign identification: correlate related attacks across sessions
- Pattern clustering: identify families of related attack techniques
Evolution Metrics
The getStats() method on the ShieldX instance returns LearningStats:
interface LearningStats {
totalPatterns: number
builtinPatterns: number
learnedPatterns: number
communityPatterns: number
redTeamPatterns: number
totalIncidents: number
falsePositiveRate: number
topPatterns: string[]
recentIncidents: number
driftDetected: boolean
}
These metrics provide visibility into the evolution engine's state and effectiveness.