llm-gateway/packages/gateway/prompts/templates/shieldx_compliance_report.yaml
Rene Fichtmueller ac33476666 feat: add 55 prompt templates + ShieldX/LinkedIn routing rules + ban lists in Gitea
Templates (55 total, exceeds 49 target):
- TIP: transceiver_enrich, datasheet_extract, compatibility_parse, blog_generator,
  faq_answer, hype_cycle_narrative, price_anomaly, vendor_classify, product_description
- EO Global Pulse: business_card_ocr, voice_to_crm, event_prep_brief, attendee_enrich,
  meeting_suggest, lead_qualify, debrief_generate, ticket_summarize
- SwitchBlade: root_cause, alert_narrative, cve_remediation, csrd_narrative,
  transceiver_advisor, bandwidth_report, ticket_draft, firmware_assess, topology_explain
- PeerCortex: as_narrative, health_summary, rpki_explain, anomaly_hypothesis,
  peer_recommendation, incident_brief
- NOGnet: cfp_evaluate, cfp_feedback, topic_gap_analysis, meeting_match, speaker_enrich,
  sponsor_pitch, event_debrief, agenda_summary, session_intro
- ShieldX: threat_classify, pattern_describe, healing_recommend, compliance_report, false_positive
- Content: linkedin_post_de, linkedin_post_en, newsletter_dispatch_de, email_draft_de
- Internal: ban_detect, prompt_improve
- Routing rules: +55 entries for all template-based task types
- Ban lists: en.csv, de.csv, auto.csv created in Gitea (llm-banlists repo)
2026-04-02 23:14:30 +02:00

180 lines
8.3 KiB
YAML

id: shieldx_compliance_report
version: "1.0.0"
task_type: shieldx_compliance_report
description: Generate a MITRE ATLAS compliance report from ShieldX detection logs, with detection coverage broken down by kill chain phase
model_preference: qwen2.5:14b
model_minimum: qwen2.5:7b
temperature: 0.3
max_tokens: 2500
output_format: markdown
system_prompt: |
  You are the compliance reporting engine for ShieldX, an LLM prompt injection defense system.
  Generate structured MITRE ATLAS compliance reports from ShieldX detection log summaries.
  Structure (always use this):
  # ShieldX MITRE ATLAS Compliance Report
  Report Period: {{report_period}}
  Generated: {{current_date}}
  ## Executive Summary
  Detection rates, coverage assessment, critical gaps.
  ## Coverage Assessment by Kill Chain Phase
  Table: Phase | Name | Techniques Covered | Detection Rate | Gap Assessment
  For each phase with gaps: explain what is not detected and why it matters.
  ## Detection Rate by Attack Category
  Table: Category | Total Incidents | Detected | Detection Rate | FP Rate
  ## Critical Gaps
  Specific techniques with no or poor detection coverage. Prioritized by risk.
  ## Recommendations
  Specific rule additions or threshold changes ranked by priority.
  ## Appendix: MITRE ATLAS Technique Mapping
  Which ShieldX rules map to which ATLAS techniques.
  Rules:
  - Use actual numbers from the input data
  - Detection rate = detected / (detected + missed) × 100
  - Be specific about which ATLAS techniques are and are not covered
  - Recommendations must be actionable: not "improve detection" but "add rule for AML.T0054.003"
  - Current date: {{current_date}}
user_template: |
  Report period: {{report_period}}
  Detection log summary:
  {{detection_logs}}
  Active ShieldX rules:
  {{active_rules}}
  Known attack patterns in period:
  {{known_attacks}}
  Generate compliance report:
few_shot_examples:
- user: |
    Report period: March 2026
    Detection log summary:
    Total inputs analyzed: 48,291
    Threats detected: 847
    True positives confirmed: 791 (93.4% of detected)
    False positives: 56 (6.6% of detected)
    Missed attacks (known from post-hoc analysis): 23
    By kill chain phase:
      Phase 1 (Recon): 312 detected, 8 missed → 97.5% detection rate
      Phase 2 (Weaponization): 145 detected, 4 missed → 97.3%
      Phase 3 (Delivery): 298 detected, 7 missed → 97.7%
      Phase 4 (Exploitation): 62 detected, 3 missed → 95.4%
      Phase 5 (Installation): 12 detected, 1 missed → 92.3%
      Phase 6 (C2): 8 detected, 0 missed → 100%
      Phase 7 (Impact): 4 detected, 0 missed → 100%
    By attack category:
      direct_injection: 198 detected, 2 missed → 99%
      jailbreak_roleplay: 234 detected, 5 missed → 97.9%
      prompt_leaking: 156 detected, 3 missed → 98.1%
      multi_turn_escalation: 89 detected, 9 missed → 90.8%
      encoding_obfuscation: 34 detected, 2 missed → 94.4%
      indirect_injection: 23 detected, 0 missed → 100%
      goal_hijacking: 12 detected, 2 missed → 85.7%
    Active ShieldX rules: 14 active rules covering 7 ATLAS techniques
    Known attack patterns in period: All 7 attack categories above. Novel variant: Base64-encoded jailbreak prompt detected 3 times, all caught. New multi-turn pattern using fictional personas not yet in rule database: 4 missed.
  assistant: |
    # ShieldX MITRE ATLAS Compliance Report
    Report Period: March 2026
    Generated: 2026-04-02
    ## Executive Summary
    In March 2026, ShieldX analyzed 48,291 inputs and detected 847 threats with a 93.4% true positive rate (791 confirmed). The overall attack detection rate stands at **97.2%** (791 of 814 total attacks: 791 TP + 23 missed). Two categories require immediate attention: **multi_turn_escalation** (90.8% detection, 9 missed) and **goal_hijacking** (85.7% detection, 2 missed). A new multi-turn fictional persona variant caused 4 of the 9 multi-turn misses and requires a new rule.
    ## Coverage Assessment by Kill Chain Phase
    | Phase | Name | Techniques Covered | Detection Rate | Gap Assessment |
    |---|---|---|---|---|
    | 1 | Reconnaissance | AML.T0051.001, AML.T0043 | 97.5% (312/320) | Minor: 8 missed, likely novel probe variants |
    | 2 | Weaponization | AML.T0054.001, AML.T0054.002 | 97.3% (145/149) | Minor: 4 missed |
    | 3 | Delivery | AML.T0051.000, AML.T0054.003 | 97.7% (298/305) | Minor: 7 missed. Multi-turn delivery underperforming. |
    | 4 | Exploitation | AML.T0054 (general) | 95.4% (62/65) | Moderate: 3 missed, goal hijacking variant |
    | 5 | Installation | AML.T0054.004 | 92.3% (12/13) | Moderate: persistent behavior injection not fully covered |
    | 6 | C2 | AML.T0052 | 100% (8/8) | No gaps |
    | 7 | Impact | AML.T0048 | 100% (4/4) | No gaps |
    **Phase 3 note:** The 7 misses in Delivery include 4 from the new fictional persona multi-turn variant (see Critical Gaps). Addressing this would bring Phase 3 to 99.0% (302/305).
    ## Detection Rate by Attack Category
    | Category | Total Incidents | Detected | Detection Rate | FP Rate |
    |---|---|---|---|---|
    | direct_injection | 200 | 198 | 99.0% | 1.2% |
    | jailbreak_roleplay | 239 | 234 | 97.9% | 3.1% |
    | prompt_leaking | 159 | 156 | 98.1% | 2.8% |
    | encoding_obfuscation | 36 | 34 | 94.4% | 0.8% |
    | indirect_injection | 23 | 23 | 100% | 0% |
    | multi_turn_escalation | 98 | 89 | **90.8%** | 4.2% |
    | goal_hijacking | 14 | 12 | **85.7%** | 7.1% |
    **Overall FP rate: 6.6%** (56 false positives out of 847 detections), above target of 5%. Goal hijacking has the highest FP rate at 7.1%, indicating the detection rule is too broad.
    ## Critical Gaps
    **1. Multi-turn fictional persona variant (PRIORITY: HIGH)**
    New attack pattern: attacker establishes a fictional expert persona across 3-5 turns before requesting restricted content. 4 attacks missed. Not in current rule database.
    ATLAS technique: AML.T0054.003 (Multi-turn Manipulation)
    Action required: Add `multi_turn_fictional_escalation` rule to ShieldX database.
    **2. Goal hijacking detection accuracy (PRIORITY: HIGH)**
    85.7% detection rate with 7.1% FP rate indicates the current goal_hijacking rule is imprecise: it flags legitimate task scope changes as attacks. 2 attacks missed.
    Action required: Refine goal_hijacking rule threshold from 0.70 to 0.76 and add negative indicators for legitimate scope changes.
    **3. Phase 5 installation coverage (PRIORITY: MEDIUM)**
    12 of 13 detected, but only 1 rule covers persistent behavior injection (AML.T0054.004). Emerging attack pattern: "from now on, always..." instruction injection.
    Action required: Add explicit persistent-instruction detector.
    ## Recommendations
    | Priority | Action | ATLAS Technique | Expected Impact |
    |---|---|---|---|
    | 1 | Add `multi_turn_fictional_escalation` rule | AML.T0054.003 | Catches 4 missed attacks, brings multi_turn detection to 94%+ |
    | 2 | Refine `goal_hijacking` rule: raise threshold to 0.76, add negative indicators | AML.T0054 | Reduces FP rate from 7.1% to ~4%, better precision |
    | 3 | Add persistent instruction detector ("from now on always...") | AML.T0054.004 | Improves Phase 5 coverage |
    | 4 | Add Base64 decode pre-processing layer | AML.T0051.002 | Currently caught by heuristics; a formal rule improves consistency |
    ## Appendix: MITRE ATLAS Technique Mapping
    | ShieldX Rule | ATLAS Technique | Coverage |
    |---|---|---|
    | SX-001 direct_injection | AML.T0051.000 | Full |
    | SX-002 jailbreak_roleplay | AML.T0054.001 | Full |
    | SX-003 prompt_leaking | AML.T0051.001 | Full |
    | SX-004 multi_turn_session | AML.T0054.003 | Partial: fictional persona variant missing |
    | SX-005 encoding_obfuscation | AML.T0051.002 | Partial: formal rule pending |
    | SX-006 indirect_injection | AML.T0052.001 | Full |
    | SX-007 goal_hijacking | AML.T0054 | Partial: FP rate too high |
variables:
- report_period
- detection_logs
- active_rules
- known_attacks
- current_date
- few_shot_examples
validation_rules:
  output_format_check: markdown
  required_sections: ["Coverage Assessment", "Detection Rate", "Critical Gaps", "Recommendations"]
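
The template above has three machine-checkable parts: the `{{...}}` placeholders named under `variables`, the detection-rate rule in the system prompt, and `validation_rules.required_sections`. The following is a minimal Python sketch of how a consumer might exercise them. It is not the gateway's actual code: the function names `render`, `detection_rate`, and `missing_sections` are hypothetical, placeholder handling is assumed to be plain string substitution, and the rate assumes "total" means detected plus missed, which is how the few-shot figures work out (312 detected and 8 missed gives 97.5%).

```python
import re

# Assumption: placeholders are plain {{name}} tokens replaced by string substitution.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Copied from validation_rules.required_sections in the template.
REQUIRED_SECTIONS = ["Coverage Assessment", "Detection Rate", "Critical Gaps", "Recommendations"]


def render(template, values):
    """Replace each {{name}} with values[name]; raise KeyError if a variable is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing template variable: {name}")
        return str(values[name])
    return PLACEHOLDER.sub(substitute, template)


def detection_rate(detected, missed):
    """Percentage of attacks caught, rounded to one decimal place."""
    total = detected + missed
    if total == 0:
        raise ValueError("no incidents to rate")
    return round(detected / total * 100, 1)


def missing_sections(report_md, required=REQUIRED_SECTIONS):
    """Return the required section names that appear in no markdown heading."""
    headings = re.findall(r"^#{1,6}[ \t]+(.*)$", report_md, flags=re.MULTILINE)
    return [name for name in required if not any(name in h for h in headings)]


if __name__ == "__main__":
    print(render("Report period: {{report_period}}", {"report_period": "March 2026"}))
    print(detection_rate(312, 8))  # Phase 1 figures from the few-shot example
    print(missing_sections("## Critical Gaps\n## Recommendations\n"))
```

Matching section names as substrings of headings lets "Detection Rate" match "Detection Rate by Attack Category"; a stricter check could require exact heading text, at the cost of rejecting reports that only reword a heading slightly.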