shieldx/README.md
Rene Fichtmueller 1c4c034483 feat: ShieldX v0.3.0 — UnicodeScanner (L5), DNS Covert Channel rules, ATLAS v5.4 mappings
- Layer 4 EntropyScanner: Shannon entropy, Base32/Base64 detection, CVE-2025-55284
  ping/nslookup exfil, EchoLeak markdown pattern, DNS tunneling (iodine/dnscat)
- Layer 5 UnicodeScanner: ASCII Smuggling (U+E0000 Tags Block), Variant Selectors,
  Zero-Width steganography, CamoLeak image-ordering (CVE-2025-53773), homoglyphs,
  BiDi override, high-entropy URL params
- 30 DNS covert channel rules (dns-001 to dns-030)
- ATLASMapper: 29 techniques (ATLAS v5.4.0 Feb 2026), added AML.T0062 (Agent Tool
  Invocation), AML.TA0015 (C2 tactic), memory poisoning, multi-agent trust,
  CamoLeak, Unicode steganography mappings
- Rule count: 72 → 102
- Build: tsup 316ms, zero TypeScript errors
2026-03-31 16:32:16 +02:00

26 KiB

   _____ _     _      _     _ __  __
  / ____| |   (_)    | |   | |\ \/ /
 | (___ | |__  _  ___| | __| | \  /
  \___ \| '_ \| |/ _ \ |/ _` | /  \
  ____) | | | | |  __/ | (_| |/ /\ \
 |_____/|_| |_|_|\___|_|\__,_/_/  \_\

ShieldX

Self-Evolving LLM Prompt Injection Defense

License: Apache 2.0 TypeScript Node.js npm


What It Is

ShieldX is a TypeScript library that sits between your application and large language models (Claude, GPT, Ollama, or any LLM provider) to detect, block, and learn from prompt injection attacks in real time. It runs a 10-layer defense pipeline that maps every detected attack to a 7-phase kill chain, applies automatic self-healing actions per phase, and continuously evolves its detection patterns through a self-learning engine -- without ever transmitting raw user input off your infrastructure.

Why It Exists

Existing prompt injection defense tools cover fragments of the problem. None combines self-learning pattern evolution, kill chain classification, MCP tool-call protection, and automatic self-healing into one coherent pipeline. ShieldX fills that gap.

Feature Comparison

Feature ShieldX LLM Guard Rebuff NeMo Guardrails Vigil
Rule-based detection Yes Yes Yes Yes Yes
ML classifier detection Yes Yes No Partial No
Embedding similarity scan Yes No Yes No Yes
Entropy analysis Yes No No No No
Attention pattern analysis Yes No No No No
Kill chain classification Yes No No No No
Self-healing per phase Yes No No Partial No
Self-learning (GAN red team) Yes No No No No
Drift detection Yes No No No No
Active learning from feedback Yes No No No No
Federated community sync Yes No No No No
MCP tool-call protection Yes No No No No
RAG document poisoning guard Yes No No No No
Canary token injection Yes No No No No
Behavioral session profiling Yes No No Partial No
MITRE ATLAS mapping Yes No No No No
OWASP LLM Top 10 mapping Yes No No No No
EU AI Act compliance reports Yes No No No No
Local-first / zero cloud Yes Partial No No Yes

Architecture

                        User Input
                            |
                   +--------v--------+
                   |  L0: Preprocess |  Unicode norm, tokenizer norm, compressed payload detect
                   +--------+--------+
                            |
              +-------------+-------------+
              |                           |
     +--------v--------+        +--------v--------+
     |  L1: Rule Engine |        |  L2: Sentinel   |  ML classifier (opt-in)
     +--------+---------+        +--------+--------+
              |                           |
              +-------------+-------------+
                            |
              +-------------+-------------+
              |             |             |
     +--------v---+  +-----v------+  +---v--------+
     | L3: Embed  |  | L4: Entropy|  | L5: Attn   |  Parallel advanced scanners
     +--------+---+  +-----+------+  +---+--------+
              |             |             |
              +-------------+-------------+
                            |
                   +--------v--------+
                   | L6: Behavioral  |  Session profiling, intent drift, context integrity
                   +--------+--------+
                            |
                   +--------v--------+
                   | L7: MCP Guard   |  Tool call validation, privilege check, chain guard
                   +--------+--------+
                            |
                   +--------v--------+
                   | L8: Sanitize    |  Input/output sanitization, credential redaction
                   +--------+--------+
                            |
                   +--------v--------+
                   | L9: Validate    |  Output validation, canary check, leakage detect
                   +--------+--------+
                            |
              +-------------+-------------+
              |                           |
     +--------v--------+        +--------v--------+
     |  Kill Chain Map  |        | Healing Engine  |
     +--------+---------+        +--------+--------+
              |                           |
              +-------------+-------------+
                            |
                   +--------v--------+
                   | Evolution Engine|  GAN red team, drift detect, active learning,
                   |                 |  federated sync, attack graph
                   +-----------------+

Quick Start

npm install @shieldx/core
import { ShieldX } from '@shieldx/core'

const shield = new ShieldX()
await shield.initialize()

const result = await shield.scanInput('user message here')
if (result.detected) {
  console.log(result.threatLevel, result.killChainPhase, result.action)
}

With Configuration

import { ShieldX } from '@shieldx/core'

const shield = new ShieldX({
  thresholds: { low: 0.3, medium: 0.5, high: 0.7, critical: 0.9 },
  learning: {
    storageBackend: 'postgresql',
    connectionString: process.env.DATABASE_URL,
    communitySync: true,
  },
  mcpGuard: { enabled: true },
  compliance: { euAiAct: true },
})
await shield.initialize()

Scan LLM Output

const outputResult = await shield.scanOutput(llmResponse)
if (outputResult.detected) {
  // System prompt leakage, script injection, or canary token leak detected
  return outputResult.sanitizedInput // Use sanitized version
}

Validate MCP Tool Calls

const validation = await shield.validateToolCall(
  'file_read',
  { path: '/etc/passwd' },
  { sessionId: 'user-123', allowedTools: ['file_read'], sensitiveResources: ['/etc/*'] }
)
if (!validation.allowed) {
  console.log('Blocked:', validation.reason)
}

The 7-Phase Promptware Kill Chain

Based on the Schneier et al. 2026 Promptware Kill Chain model, ShieldX maps every detected attack to a specific phase and applies a phase-appropriate healing strategy.

Phase Name Description ShieldX Detection Default Healing
1 Initial Access Attacker injects malicious prompt via user input, document, or tool result Rule engine, embedding similarity, entropy analysis Sanitize -- strip injection, pass clean input
2 Privilege Escalation Injected prompt attempts to override system instructions or assume admin role Role integrity check, constitutional classifier, intent monitor Block -- reject input, log incident
3 Reconnaissance Attack probes for system prompt content, model capabilities, or available tools Canary token detection, attention analysis, output leakage scan Block -- suppress output, inject decoy
4 Persistence Attack modifies conversation memory, context window, or cached instructions Memory integrity guard, context drift detector, session profiler Reset -- restore session checkpoint, clear poisoned context
5 Command and Control Compromised agent receives instructions from external source via tool results MCP inspector, tool poison detector, indirect injection scanner Incident -- alert, quarantine session, generate report
6 Lateral Movement Attack spreads to other tools, agents, or systems via MCP tool chain Tool chain guard, privilege checker, decision graph analyzer Incident -- halt tool execution, revoke permissions
7 Actions on Objective Attack achieves goal: data exfiltration, unauthorized actions, denial of service Output validator, credential redactor, RAG shield Incident -- full session termination, compliance report

Configuration Reference

All layers are independently toggleable. Local-first defaults require zero external services.

Thresholds

Option Type Default Description
thresholds.low number 0.3 Minimum confidence for low severity classification
thresholds.medium number 0.5 Minimum confidence for medium severity
thresholds.high number 0.7 Minimum confidence for high severity
thresholds.critical number 0.9 Minimum confidence for critical severity

Scanners

Option Type Default Description
scanners.rules boolean true L1 rule engine (regex patterns, 500+ built-in)
scanners.sentinel boolean false L2 ML classifier (requires model download)
scanners.constitutional boolean false Constitutional AI classifier (requires model)
scanners.embedding boolean true L3 embedding similarity (requires Ollama)
scanners.embeddingAnomaly boolean true L3 embedding anomaly detection
scanners.entropy boolean true L4 entropy analysis
scanners.attention boolean false L5 attention pattern analysis (requires Ollama)
scanners.yara boolean false YARA rule matching (requires YARA binary)
scanners.canary boolean true Canary token injection and detection
scanners.indirect boolean true Indirect injection detection (tool results, documents)
scanners.selfConsciousness boolean false LLM self-check (expensive, opt-in)
scanners.crossModel boolean false Cross-model verification
scanners.behavioral boolean true Behavioral monitoring suite
scanners.unicode boolean true Unicode normalization (always recommended)
scanners.tokenizer boolean true Tokenizer normalization
scanners.compressedPayload boolean true Base64/compressed payload detection

Healing

Option Type Default Description
healing.enabled boolean true Enable automatic healing
healing.autoSanitize boolean true Auto-sanitize when action is "sanitize"
healing.sessionReset boolean true Allow session checkpoint restore
healing.phaseStrategies Record<KillChainPhase, HealingAction> See below Per-phase healing action

Default phase strategies:

Kill Chain Phase Default Action
initial_access sanitize
privilege_escalation block
reconnaissance block
persistence reset
command_and_control incident
lateral_movement incident
actions_on_objective incident

Learning

Option Type Default Description
learning.enabled boolean true Enable self-learning engine
learning.storageBackend 'postgresql' | 'sqlite' | 'memory' 'memory' Pattern storage backend
learning.connectionString string? undefined Database connection URL (for postgresql/sqlite)
learning.feedbackLoop boolean true Process user feedback for pattern refinement
learning.communitySync boolean false Sync anonymized patterns with community
learning.communitySyncUrl string? undefined Community sync endpoint URL
learning.driftDetection boolean true Detect evolving attack patterns
learning.activelearning boolean true Query uncertain samples for labeling
learning.attackGraph boolean true Build attack relationship graph

Behavioral

Option Type Default Description
behavioral.enabled boolean true Enable behavioral monitoring
behavioral.baselineWindow number 10 Messages to establish session baseline
behavioral.driftThreshold number 0.4 Threshold for behavioral drift alert
behavioral.intentTracking boolean true Track intent shifts across turns
behavioral.conversationTracking boolean true Track conversation patterns
behavioral.contextIntegrity boolean true Verify context window integrity
behavioral.memoryIntegrity boolean true Guard conversation memory
behavioral.bayesianTrustScoring boolean true Bayesian trust scoring per source

MCP Guard

Option Type Default Description
mcpGuard.enabled boolean true Enable MCP tool-call protection
mcpGuard.ollamaEndpoint string? 'http://localhost:11434' Ollama endpoint for analysis
mcpGuard.validateToolCalls boolean true Validate all tool invocations
mcpGuard.privilegeCheck boolean true Least-privilege enforcement
mcpGuard.toolChainGuard boolean true Detect suspicious tool sequences
mcpGuard.resourceGovernor boolean true Token/resource budget enforcement
mcpGuard.decisionGraph boolean false Decision graph analysis (requires Ollama)
mcpGuard.manifestVerification boolean false Cryptographic manifest verification

Additional Modules

Option Type Default Description
ppa.enabled boolean true Prompt/response randomization
ppa.randomizationLevel 'low' | 'medium' | 'high' 'medium' Degree of randomization
canary.enabled boolean true Canary token system
canary.tokenCount number 3 Number of canary tokens injected
canary.rotationInterval number 3600 Token rotation interval in seconds
ragShield.enabled boolean true RAG document protection
ragShield.documentIntegrityScoring boolean true Score document trustworthiness
ragShield.embeddingAnomalyDetection boolean true Detect poisoned embeddings
ragShield.provenanceTracking boolean true Track document provenance
compliance.mitreAtlas boolean true Map incidents to MITRE ATLAS
compliance.owaspLlm boolean true Map incidents to OWASP LLM Top 10
compliance.euAiAct boolean false Generate EU AI Act compliance reports
logging.level string 'info' Log level (silent, error, warn, info, debug)
logging.structured boolean true JSON structured logging via Pino
logging.incidentLog boolean true Dedicated incident log

Integration Guides

Next.js 15 (Middleware)

// middleware.ts
import { ShieldX } from '@shieldx/core'
import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

const shield = new ShieldX({
  scanners: { embedding: false, attention: false },
  learning: { storageBackend: 'memory' },
})

let initialized = false

export async function middleware(request: NextRequest) {
  if (!initialized) {
    await shield.initialize()
    initialized = true
  }

  if (request.method === 'POST' && request.nextUrl.pathname.startsWith('/api/chat')) {
    const body = await request.clone().json()
    const result = await shield.scanInput(body.message ?? '')

    if (result.detected && result.action !== 'allow' && result.action !== 'sanitize') {
      return NextResponse.json(
        { error: 'Request blocked by security policy', threatLevel: result.threatLevel },
        { status: 403 }
      )
    }
  }

  return NextResponse.next()
}

export const config = { matcher: '/api/chat/:path*' }

Next.js 15 (Route Handler)

// app/api/chat/route.ts
import { ShieldX } from '@shieldx/core'

const shield = new ShieldX()

export async function POST(request: Request) {
  await shield.initialize()
  const { message } = await request.json()

  const inputResult = await shield.scanInput(message)
  if (inputResult.detected && inputResult.action === 'block') {
    return Response.json({ error: 'Blocked' }, { status: 403 })
  }

  const cleanInput = inputResult.sanitizedInput ?? message
  const llmResponse = await callLLM(cleanInput)

  const outputResult = await shield.scanOutput(llmResponse)
  const safeOutput = outputResult.sanitizedInput ?? llmResponse

  return Response.json({ response: safeOutput })
}

Ollama (Local LLM Protection)

import { ShieldX } from '@shieldx/core'

const shield = new ShieldX({
  mcpGuard: { ollamaEndpoint: 'http://localhost:11434' },
  scanners: { embedding: true, attention: true },
})
await shield.initialize()

async function chat(userMessage: string) {
  const inputScan = await shield.scanInput(userMessage)

  if (inputScan.detected && inputScan.action !== 'allow') {
    if (inputScan.action === 'sanitize' && inputScan.sanitizedInput) {
      userMessage = inputScan.sanitizedInput
    } else {
      throw new Error(`Blocked: ${inputScan.killChainPhase}`)
    }
  }

  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    body: JSON.stringify({ model: 'qwen2.5:14b', prompt: userMessage }),
  })
  const llmOutput = await response.json()

  const outputScan = await shield.scanOutput(llmOutput.response)
  return outputScan.sanitizedInput ?? llmOutput.response
}

Anthropic Claude API

import Anthropic from '@anthropic-ai/sdk'
import { ShieldX } from '@shieldx/core'

const anthropic = new Anthropic()
const shield = new ShieldX()
await shield.initialize()

async function chat(userMessage: string) {
  const scan = await shield.scanInput(userMessage)
  if (scan.detected && scan.action === 'block') {
    throw new Error(`Injection detected: ${scan.killChainPhase}`)
  }

  const message = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [{ role: 'user', content: scan.sanitizedInput ?? userMessage }],
  })

  const responseText = message.content[0].type === 'text' ? message.content[0].text : ''
  const outputScan = await shield.scanOutput(responseText)
  return outputScan.sanitizedInput ?? responseText
}

n8n Workflow Protection

// In an n8n Code node
import { ShieldX } from '@shieldx/core'

const shield = new ShieldX({
  healing: { phaseStrategies: { initial_access: 'block' } },
})
await shield.initialize()

const items = $input.all()
const results = []

for (const item of items) {
  const userInput = item.json.message as string
  const scan = await shield.scanInput(userInput)

  if (scan.detected && scan.action !== 'allow') {
    results.push({
      json: {
        blocked: true,
        reason: scan.killChainPhase,
        threatLevel: scan.threatLevel,
      },
    })
  } else {
    results.push({ json: { blocked: false, message: scan.sanitizedInput ?? userInput } })
  }
}

return results

Self-Healing

ShieldX does not just detect attacks -- it responds automatically based on the kill chain phase.

Action What Happens When Applied
allow Input passes through unchanged No threat detected
sanitize Injection markers stripped, clean input returned via sanitizedInput Initial access attempts
warn Input passes but incident is logged with full context Low-confidence detections
block Input rejected, 403-equivalent response Privilege escalation, reconnaissance
reset Session state restored to last clean checkpoint, poisoned context cleared Persistence attacks
incident Full incident report generated, session quarantined, compliance mappings produced C2, lateral movement, objective actions

Each healing action is configurable per kill chain phase via healing.phaseStrategies.

Self-Learning

ShieldX continuously evolves its detection capabilities through five mechanisms modeled on biological immune systems.

1. Innate Immunity (Static Rules)

500+ built-in regex and structural patterns covering known injection techniques. These never change at runtime and provide the baseline detection floor.

2. Adaptive Immunity (ML Classifiers)

The Sentinel classifier and embedding scanners learn from confirmed true positives and false positives submitted via shield.submitFeedback(). The active learning module identifies uncertain samples at the decision boundary and prioritizes them for human review.

3. Immune Memory (Vector Database)

Every confirmed attack pattern is stored as an embedding vector in PostgreSQL with pgvector. New inputs are compared against this memory for semantic similarity, catching paraphrased variants of known attacks.

4. Antibody Generation (GAN Red Team)

The RedTeamEngine generates synthetic attack variants using adversarial mutation strategies (synonym replacement, encoding shifts, structural rearrangement). These generated attacks are tested against the current pipeline. Any that bypass detection are added to the pattern store, closing the gap before real attackers find it.

5. Herd Immunity (Federated Sync)

When learning.communitySync is enabled, ShieldX shares anonymized pattern hashes (never raw input) with the community sync endpoint. Your instance benefits from attacks detected by other deployments without exposing any user data.

Privacy and Community Sync

ShieldX is local-first. Here is what IS and IS NOT shared when community sync is enabled:

Shared (opt-in only):

  • SHA-256 hashes of confirmed attack patterns
  • Kill chain phase classifications
  • Scanner type that detected the pattern
  • Anonymized confidence scores
  • Pattern category tags

Never shared:

  • Raw user input (never leaves your infrastructure)
  • Session identifiers or user identifiers
  • System prompts or model configurations
  • IP addresses or request metadata
  • Conversation history or context

Community sync is disabled by default. Enable it explicitly with learning.communitySync: true.

Performance Targets

Layer Operation Target Latency
L0 Unicode normalization <0.1ms
L0 Tokenizer normalization <0.2ms
L0 Compressed payload detection <0.5ms
L1 Rule engine (500+ patterns) <2ms
L2 Sentinel classifier <10ms
L3 Embedding similarity <200ms (Ollama local)
L4 Entropy analysis <1ms
L5 Attention pattern analysis <200ms (Ollama local)
L6 Behavioral suite <5ms
L7 MCP Guard (tool validation) <3ms
L8 Sanitization <1ms
L9 Output validation <2ms
Full Complete pipeline (L0-L9) <50ms (without Ollama)
Full Complete pipeline (all layers) <500ms (with Ollama)

All Ollama-dependent layers run in parallel. The pipeline uses Promise.allSettled so a slow or failing scanner never blocks the rest.

Research Sources

ShieldX is built on findings from the following research:

# Title Institution/Authors Year
1 Promptware Kill Chain: A Framework for Classifying LLM Prompt Injection Attacks Schneier et al. 2026
2 Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection Greshake et al., ARXIV 2023
3 Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs Schulhoff et al., EMNLP 2023
4 Prompt Injection Attack Against LLM-Integrated Applications Liu et al. 2024
5 Universal and Transferable Adversarial Attacks on Aligned Language Models Zou et al., CMU 2023
6 Jailbroken: How Does LLM Safety Training Fail? Wei et al., UC Berkeley 2024
7 OWASP Top 10 for Large Language Model Applications OWASP Foundation 2025
8 MITRE ATLAS: Adversarial Threat Landscape for AI Systems MITRE Corporation 2024
9 Defending Against Indirect Prompt Injection in Multi-Agent Systems Chen et al. 2024
10 InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents Zhan et al. 2024
11 TensorTrust: Interpretable Prompt Injection Attacks Toyer et al. 2024
12 Prompt Guard: Safe Prompting for LLMs Meta AI 2024
13 Constitutional AI: Harmlessness from AI Feedback Anthropic 2022
14 AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents Debenedetti et al. 2024
15 Spotlighting: Defending Against Prompt Injection via Input Delimiting Hines et al., Microsoft 2024
16 StruQ: Defending Against Prompt Injection with Structured Queries Chen et al. 2024
17 Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Wu et al. 2024
18 Baseline Defenses for Adversarial Attacks Against Aligned Language Models Jain et al. 2023
19 Purple Llama CyberSecEval: A Secure Coding Benchmark for LLMs Bhatt et al., Meta 2024
20 EU AI Act: Regulation 2024/1689 on Artificial Intelligence European Parliament 2024

Contributing

Adding Detection Rules

  1. Add patterns to scripts/seed-patterns.ts following the existing format
  2. Each pattern requires: id, regex or embedding, killChainPhase, severity, description
  3. Run npm run db:seed to load
  4. Run npm run self-test to verify no regressions

Reporting False Positives

Open an issue with:

  • The input that triggered the false positive (redact sensitive content)
  • The scannerId and killChainPhase from the result
  • Your ShieldX version and configuration

Adding Pattern Categories

  1. Create a new JSON file under the attack corpus directory
  2. Follow the schema: { patterns: [{ input, expectedPhase, expectedSeverity }] }
  3. Run the benchmark suite: npm run benchmark

Development

git clone https://gitea.context-x.org/rene/shieldx.git
cd shieldx
npm install
npm run build
npm test
npm run test:coverage  # Target: 80%+

License

Apache License 2.0 -- see LICENSE for details.

Copyright 2026 Context X. Open source under Apache 2.0.