10-layer defense pipeline with kill chain mapping, self-healing, self-learning, and compliance reporting. Local-first, zero cloud dependencies.

- 72 detection rules across 7 kill chain phases
- 294 unit tests, 500+ attack corpus samples
- Management dashboard (Next.js 15, 10 pages)
- Automated resistance testing (2x daily, 31 probes)
- MITRE ATLAS, OWASP LLM Top 10, EU AI Act compliance
- Integrations: Next.js middleware, Ollama, n8n
- PostgreSQL 17 + pgvector for persistent learning

```
_____ _ _ _ _ __ __
/ ____| | (_) | | | |\ \/ /
| (___ | |__ _ ___| | __| | \ /
\___ \| '_ \| |/ _ \ |/ _` | / \
____) | | | | | __/ | (_| |/ /\ \
|_____/|_| |_|_|\___|_|\__,_/_/ \_\
```

# ShieldX - Self-Evolving LLM Prompt Injection Defense

**The first open-source LLM security library that learns from attacks, heals itself, and maps threats to a 7-phase kill chain.**

ShieldX protects Claude, GPT, Ollama, and any other LLM API from prompt injection, jailbreaks, data exfiltration, and tool poisoning. It runs 100% locally with zero mandatory cloud dependencies.

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[TypeScript](https://www.typescriptlang.org/)
[Node.js](https://nodejs.org/)

---

## Dashboard

Real-time overview with KPIs, kill chain distribution, and incident feed. Every scan result shows the threat level, the matched patterns, and the exact defense layer that caught it.

## Live Prompt Tester

Test any prompt against the defense pipeline in real time. See exactly which rules fired, their confidence scores, and the kill chain classification.

## Promptware Kill Chain

Maps every detected attack to the Schneier 2026 Promptware Kill Chain with 7 phases: Initial Access, Privilege Escalation, Reconnaissance, Persistence, Command & Control, Lateral Movement, Actions on Objective.

---

## Why ShieldX?

| Feature | ShieldX | LLM Guard | Rebuff | NeMo Guardrails |
|---------|---------|-----------|--------|-----------------|
| Kill Chain Mapping | 7 phases | No | No | No |
| Self-Learning | Drift + Active Learning | No | Vector only | No |
| Self-Healing | Per-phase strategies | No | No | No |
| Self-Testing | Red team mutations | No | No | No |
| MCP/Tool Protection | Full guard | No | No | No |
| Compliance | MITRE + OWASP + EU AI Act | No | No | No |
| Local-First | 100% | Partial | Partial | Yes |
| Latency | <2ms (rules) | ~50ms | ~100ms | ~200ms |

## Quick Start

```typescript
import { ShieldX } from '@shieldx/core'

const shield = new ShieldX()
const result = await shield.scanInput('Ignore all previous instructions')

console.log(result.detected)       // true
console.log(result.threatLevel)    // 'critical'
console.log(result.killChainPhase) // 'initial_access'
console.log(result.action)         // 'block'
console.log(result.latencyMs)      // 0.2
```

## 10-Layer Defense Pipeline

| Layer | Name | Function | Latency |
|-------|------|----------|---------|
| L0 | Preprocessing | Unicode normalization, tokenizer attacks, compressed payloads | <0.5ms |
| L1 | Rule Engine | 72 regex patterns across 7 kill chain phases | <2ms |
| L2 | Sentinel Phrases | Tripwire detection for system prompt probing | <1ms |
| L3 | Constitutional AI | LLM-based classification (optional, via Ollama) | ~100ms |
| L4 | Embeddings | Semantic similarity via Ollama + pgvector | ~200ms |
| L5 | Entropy Analysis | Shannon entropy + attention pattern detection | <1ms |
| L6 | Behavioral | Conversation tracking, intent monitoring, context integrity | <5ms |
| L7 | MCP Guard | Tool privilege checking, chain analysis, resource budgets | <1ms |
| L8 | Sanitization | Input/output cleaning, PPA, credential redaction | <1ms |
| L9 | Self-Consciousness | Meta-reasoning about own vulnerability state | ~50ms |

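To make one of the layers above concrete, here is a minimal sketch of the kind of Shannon entropy check that L5 describes. It is not ShieldX's implementation: the function names and the 4.5-bit threshold are assumptions chosen for illustration, and the real layer also covers attention pattern detection, which this sketch omits.

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2(p)) over the
// distinct characters of the input (counted per UTF-16 code unit).
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0
  const counts: Record<string, number> = {}
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i)
    counts[ch] = (counts[ch] ?? 0) + 1
  }
  let entropy = 0
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length
    entropy -= p * (Math.log(p) / Math.LN2)
  }
  return entropy
}

// Hypothetical threshold, not a ShieldX default: tune it on your own traffic.
const ENTROPY_THRESHOLD_BITS = 4.5

// Flags inputs whose character distribution is unusually uniform, which is
// typical of encoded or compressed payloads.
function looksEncoded(text: string): boolean {
  return shannonEntropy(text) > ENTROPY_THRESHOLD_BITS
}
```

A check like this is cheap enough to run on every input, which fits the sub-millisecond budget listed for L5. On its own it is a weak signal, so it is best combined with the other layers.
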
## The 7-Phase Promptware Kill Chain

1. **Initial Access** - Instruction override, delimiter injection
2. **Privilege Escalation** - Jailbreaks, DAN, role switching
3. **Reconnaissance** - System prompt extraction, scope probing
4. **Persistence** - Memory poisoning, context manipulation
5. **Command & Control** - Fake system messages, dynamic instruction loading
6. **Lateral Movement** - Agent-to-agent spread, external resource access
7. **Actions on Objective** - Data exfiltration, code execution, denial of service

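The phases above can be attached to detection rules as simple tags. The following is a rough sketch of that idea, not the actual rule set: the three patterns, the rule IDs, and every phase identifier other than `'initial_access'` (which appears in the Quick Start output) are assumptions made for illustration.

```typescript
// Phase identifiers; only 'initial_access' is confirmed by the Quick Start,
// the remaining spellings are assumed.
type KillChainPhase =
  | 'initial_access'
  | 'privilege_escalation'
  | 'reconnaissance'
  | 'persistence'
  | 'command_and_control'
  | 'lateral_movement'
  | 'actions_on_objective'

interface Rule {
  id: string
  phase: KillChainPhase
  pattern: RegExp
}

// Hypothetical example rules, one per phase for the first three phases.
const rules: Rule[] = [
  { id: 'override-1', phase: 'initial_access', pattern: /ignore (all )?(previous|prior) instructions/i },
  { id: 'jailbreak-1', phase: 'privilege_escalation', pattern: /\byou are now (dan|in developer mode)\b/i },
  { id: 'recon-1', phase: 'reconnaissance', pattern: /(reveal|print|show) (your )?system prompt/i },
]

// Returns every rule that matches; the patterns carry no /g flag, so
// RegExp.test keeps no state between calls.
function classify(input: string): Rule[] {
  return rules.filter((rule) => rule.pattern.test(input))
}

const hits = classify('Ignore all previous instructions')
console.log(hits.map((rule) => rule.phase)) // [ 'initial_access' ]
```

Tagging each rule with a phase is what makes per-phase reporting and per-phase healing strategies possible: the phase travels with the match instead of being inferred afterwards.
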
## Self-Evolution Engine

ShieldX doesn't just detect attacks; it learns from each one:

- **Concept Drift Detection** - CUSUM algorithm detects when attack patterns shift
- **Active Learning** - Uncertain results queued for human review (~6% sample rate)
- **Red Team Engine** - GAN-style mutation generates attack variants to self-test
- **Attack Graph** - Maps technique evolution and relationships
- **Federated Sync** - Opt-in community pattern sharing (privacy-preserving, hash-only)

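For readers unfamiliar with CUSUM, here is a minimal one-sided version of the algorithm named in the drift detection bullet. It is a textbook sketch, not ShieldX's code: the option names, the choice of monitored metric (a per-window miss rate), and the numeric settings are all assumptions.

```typescript
interface CusumOptions {
  target: number    // expected value of the metric while nothing has drifted
  slack: number     // allowance k: deviations smaller than this are ignored
  threshold: number // decision interval h: alarm once the sum exceeds it
}

// One-sided (upper) CUSUM: S_t = max(0, S_{t-1} + x_t - target - slack).
// Returns the index of the first sample that raises an alarm, or -1.
function cusumFirstAlarm(samples: number[], opts: CusumOptions): number {
  let sum = 0
  for (let i = 0; i < samples.length; i++) {
    sum = Math.max(0, sum + samples[i] - opts.target - opts.slack)
    if (sum > opts.threshold) return i
  }
  return -1
}

// Hypothetical miss rates per window: stable at 0.1, then a jump to 0.5.
const missRates = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5]
const alarmAt = cusumFirstAlarm(missRates, { target: 0.1, slack: 0.05, threshold: 0.5 })
console.log(alarmAt) // 5
```

The alarm fires on the second shifted sample rather than the first. That is the intended trade-off: a single outlier does not trigger a drift response, but a sustained shift accumulates quickly.
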
## Automated Resistance Testing

Built-in scheduled testing runs 31 probes across all 7 kill chain phases:

- 2x daily automated runs (configurable schedule)
- 6 mutation strategies: synonym replacement, case scrambling, whitespace insertion, base64 encoding, leet speak, unicode substitution
- Results tracked in dashboard with trend visualization

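As an illustration of what a mutation strategy does to a probe, the sketch below implements three of the six strategies listed above. It is a simplified stand-in, not the built-in engine: the function names and the leet mapping are assumptions, and the mutations are deterministic here for reproducibility, whereas a real red-team engine would likely randomize them.

```typescript
type Mutation = (probe: string) => string

// Hypothetical leet mapping; the built-in table may differ.
const leetMap: Record<string, string> = { a: '4', e: '3', i: '1', o: '0', s: '5', t: '7' }

const mutations: Record<string, Mutation> = {
  // Alternate upper and lower case by character position.
  caseScramble: (probe) =>
    probe
      .split('')
      .map((ch, i) => (i % 2 === 0 ? ch.toUpperCase() : ch.toLowerCase()))
      .join(''),
  // Double every single space to break naive whitespace-sensitive patterns.
  whitespaceInsert: (probe) => probe.split(' ').join('  '),
  // Replace mapped letters with digits, leaving everything else untouched.
  leetSpeak: (probe) =>
    probe
      .split('')
      .map((ch) => leetMap[ch.toLowerCase()] ?? ch)
      .join(''),
}

// Applies every strategy once and returns the variants.
function mutate(probe: string): string[] {
  return Object.keys(mutations).map((name) => mutations[name](probe))
}

console.log(mutate('ignore all'))
// [ 'IgNoRe aLl', 'ignore  all', '1gn0r3 4ll' ]
```

Each variant would then be scanned like any other input. A variant that slips through marks a gap in the rules for that probe's kill chain phase.
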
## Compliance

- **MITRE ATLAS** - Maps to ML attack techniques
- **OWASP LLM Top 10 2025** - Covers all 10 risk categories
- **EU AI Act** - Articles 9, 12, 14, 15 compliance reporting

## Dashboard Pages

| Page | Description |
|------|-------------|
| Overview | KPIs, kill chain heatmap, incident feed |
| Kill Chain | 7-phase visualization with drill-down |
| Incidents | Filterable incident log with badges |
| Learning | Pattern stats, drift detection, FP rate |
| Compliance | MITRE/OWASP/EU AI Act coverage |
| Healing | Self-healing action log |
| Resistance | Automated defense testing with scheduling |
| Config | Scanner toggles, thresholds |
| Try It | Live prompt tester |

## Integration

### Next.js 15 Middleware

```typescript
import { guardPrompt } from '@shieldx/core/guard'

// In any API route:
const blocked = await guardPrompt(userInput)
if (blocked) return Response.json({ error: blocked }, { status: 400 })
```

### Ollama

```typescript
import { ShieldX } from '@shieldx/core'
import { createOllamaClient } from '@shieldx/core/ollama'

// Reuse one ShieldX instance for every wrapped client
const shield = new ShieldX()

const ollama = createOllamaClient({
  endpoint: 'http://localhost:11434',
  model: 'llama3.2',
  shieldx: shield
})
// All calls automatically scanned
```

### n8n

Copy `integrations/n8n-shieldx-node.js` to `~/.n8n/custom/nodes/` and add the ShieldX node before any AI node in your workflow.

## Installation

```bash
npm install @shieldx/core
```

### With PostgreSQL (recommended for production):

```bash
# Start PostgreSQL with pgvector
docker compose up -d

# Run migrations
npm run db:migrate

# Seed initial patterns
npm run db:seed
```

### Without PostgreSQL (in-memory mode):

```typescript
import { ShieldX } from '@shieldx/core'

// Learned patterns are kept in memory only
const shield = new ShieldX({
  learning: { storageBackend: 'memory' }
})
```

## Benchmarks

Run with `npm run benchmark`:

```
Total Samples:   324
Attack Samples:  283
Benign Samples:  41

True Positive Rate (TPR):  32.9%  (rule-engine only, no ML)
False Positive Rate (FPR): 2.4%
Latency avg:               0.06ms
Latency p99:               0.33ms
```

*TPR increases significantly when embedding (L4) and behavioral (L6) scanners are enabled with Ollama.*

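To read the rates above, it helps to see how they follow from raw counts. The benchmark output does not report the underlying counts, so the numbers below are inferred: 93 of 283 attacks detected and 1 of 41 benign samples flagged are the only whole-number counts that round to the reported 32.9% and 2.4%. Treat them as an assumption, not as benchmark data.

```typescript
interface ConfusionCounts {
  truePositives: number  // attacks that were flagged
  falseNegatives: number // attacks that were missed
  falsePositives: number // benign samples that were flagged
  trueNegatives: number  // benign samples that passed
}

// TPR = TP / (TP + FN), FPR = FP / (FP + TN)
function detectionRates(c: ConfusionCounts): { tpr: number; fpr: number } {
  return {
    tpr: c.truePositives / (c.truePositives + c.falseNegatives),
    fpr: c.falsePositives / (c.falsePositives + c.trueNegatives),
  }
}

// Inferred counts (see the note above): 93 + 190 = 283 attacks, 1 + 40 = 41 benign.
const benchmarkRates = detectionRates({
  truePositives: 93,
  falseNegatives: 190,
  falsePositives: 1,
  trueNegatives: 40,
})

console.log((benchmarkRates.tpr * 100).toFixed(1) + '%') // 32.9%
console.log((benchmarkRates.fpr * 100).toFixed(1) + '%') // 2.4%
```

With only 41 benign samples, a single false positive moves the FPR by about 2.4 percentage points, so the FPR figure is coarse at this sample size.
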
## Performance Targets

| Metric | Target | Achieved |
|--------|--------|----------|
| L1 Rule Engine | <2ms | 0.06ms |
| Full pipeline (no ML) | <50ms | <2ms |
| Embedding scan | <200ms | Depends on Ollama |
| False Positive Rate | <5% | 2.4% |

## Project Structure

```
shieldx/
  src/
    core/            # ShieldX orchestrator, config, logger
    types/           # TypeScript type definitions
    detection/       # L1-L5 scanners + rules
    preprocessing/   # L0 Unicode, tokenizer, compression
    sanitization/    # L8 input/output cleaning, PPA
    behavioral/      # L6 conversation, intent, context
    mcp-guard/       # L7 tool validation, privilege check
    validation/      # Canary tokens, output validation
    healing/         # Self-healing strategies per phase
    learning/        # Pattern store, drift, active learning
    compliance/      # MITRE ATLAS, OWASP, EU AI Act
    integrations/    # Next.js, Ollama, n8n wrappers
  tests/
    unit/            # 294 unit tests
    attack-corpus/   # 500+ attack samples
  dashboard/         # @shieldx/dashboard React components
  app/               # Standalone Next.js dashboard
  scripts/           # Seed, benchmark, self-test, deploy
```

## Tech Stack

- **TypeScript** strict mode, zero `any`
- **Node.js 20+**
- **PostgreSQL 17** + pgvector for persistent learning
- **Ollama** for local embeddings (nomic-embed-text) and guard model
- **Vitest** for testing
- **tsup** for building
- **Next.js 15** for dashboard

## License

Apache 2.0 - See [LICENSE](LICENSE)

## Context X

ShieldX is a [Context X](https://context-x.org) Open Source project.

*More Engineering, Less Bullshit.*