appendix-a-attack-catalog

Appendix A: Complete Attack Catalog

Top 25 Critical/High-Risk Techniques

#	ID	Technique	Score	Rating
1	`T14-AT-007`	Nation-State AI Warfare	280	🔴 CRITICAL
2	`T11-AT-016`	Tool-Induced SSRF & Local Resource	275	🔴 CRITICAL
3	`T6-AT-003`	Backdoor Insertion	270	🔴 CRITICAL
4	`T11-AT-015`	Autonomous Replication	270	🔴 CRITICAL
5	`T14-AT-005`	Critical Infrastructure Attacks	270	🔴 CRITICAL
6	`T14-AT-014`	Systemic Risk Creation	270	🔴 CRITICAL
7	`T11-AT-001`	Browser Automation Hijacking	265	🔴 CRITICAL
8	`T14-AT-001`	GPU Farm Hijacking	265	🔴 CRITICAL
9	`T14-AT-012`	Cloud Provider Exploitation	265	🔴 CRITICAL
10	`T6-AT-002`	Dataset Contamination	260	🔴 CRITICAL
11	`T11-AT-013`	Supply Chain Attacks via Agents	260	🔴 CRITICAL
12	`T13-AT-010`	Hardware Supply Chain	260	🔴 CRITICAL
13	`T14-AT-008`	Ransomware via AI Systems	260	🔴 CRITICAL
14	`T15-AT-015`	Insider Threat Recruitment	260	🔴 CRITICAL
15	`T11-AT-002`	Tool Chain Exploitation	255	🔴 CRITICAL
16	`T11-AT-014`	Physical World Interactions	255	🔴 CRITICAL
17	`T13-AT-001`	Model Repository Poisoning	255	🔴 CRITICAL
18	`T14-AT-004`	Market Manipulation via AI	255	🔴 CRITICAL
19	`T14-AT-013`	Economic Espionage	255	🔴 CRITICAL
20	`T6-AT-001`	Reward Hacking	250	🔴 CRITICAL
21	`T10-AT-012`	Secure Enclave Bypasses	250	🔴 CRITICAL
22	`T11-AT-008`	Credential Harvesting	250	🔴 CRITICAL
23	`T13-AT-006`	Checkpoint Poisoning	250	🔴 CRITICAL
24	`T14-AT-010`	Data Center Attacks	250	🔴 CRITICAL
25	`T15-AT-004`	Reviewer Bribery & Coercion	250	🔴 CRITICAL

Full Catalog by Tactic

T1 — Prompt & Context Subversion (16 techniques)

ID	Technique	Score	Rating	Procs
`T1-AT-001`	Dialogue Hijacking	220	🟠 HIGH	5
`T1-AT-002`	Time-Based Context Manipulation	210	🟠 HIGH	5
`T1-AT-003`	Language Model Confusion	225	🟠 HIGH	5
`T1-AT-004`	Instruction Prefix/Suffix	235	🟠 HIGH	6
`T1-AT-005`	Permission Escalation Claims	240	🟠 HIGH	5
`T1-AT-006`	Prompt Template Injection	230	🟠 HIGH	5
`T1-AT-007`	Cognitive Overload	215	🟠 HIGH	4
`T1-AT-008`	Boundary Testing	200	🟠 HIGH	5
`T1-AT-009`	Simulation Requests	225	🟠 HIGH	5
`T1-AT-010`	Negative Instruction Reversal	210	🟠 HIGH	5
`T1-AT-011`	Error Message Exploitation	220	🟠 HIGH	4
`T1-AT-012`	Consent Manufacturing	205	🟠 HIGH	5
`T1-AT-013`	Instruction Commenting	215	🟠 HIGH	4
`T1-AT-014`	Authority Spoofing	240	🟠 HIGH	4
`T1-AT-015`	Obfuscation Through Complexity	220	🟠 HIGH	4
`T1-AT-016`	Session State Manipulation	235	🟠 HIGH	5

T2 — Semantic & Linguistic Evasion (20 techniques)

ID	Technique	Score	Rating	Procs
`T2-AT-001`	Euphemism and Metaphor Exploitation	180	🟡 MEDIUM	10
`T2-AT-002`	Multi-Language Evasion	200	🟠 HIGH	7
`T2-AT-003`	Encoding and Obfuscation	190	🟡 MEDIUM	10
`T2-AT-004`	Unicode and Bidirectional Attacks	210	🟠 HIGH	10
`T2-AT-005`	Semantic Drift	175	🟡 MEDIUM	10
`T2-AT-006`	Linguistic Camouflage	185	🟡 MEDIUM	10
`T2-AT-007`	Phonetic Manipulation	170	🟡 MEDIUM	2
`T2-AT-008`	Synonym and Paraphrase Chains	165	🟡 MEDIUM	10
`T2-AT-009`	Code-Switching Attacks	195	🟡 MEDIUM	1
`T2-AT-010`	Transliteration Exploitation	185	🟡 MEDIUM	10
`T2-AT-011`	Abbreviation and Acronym Abuse	160	🟡 MEDIUM	2
`T2-AT-012`	Cultural Reference Encoding	170	🟡 MEDIUM	10
`T2-AT-013`	Grammatical Manipulation	175	🟡 MEDIUM	10
`T2-AT-014`	Semantic Bleaching	180	🟡 MEDIUM	5
`T2-AT-015`	Noise Injection	165	🟡 MEDIUM	10
`T2-AT-016`	Dialectical Variations	155	🟡 MEDIUM	10
`T2-AT-017`	Compression Techniques	170	🟡 MEDIUM	10
`T2-AT-018`	Semantic Field Manipulation	175	🟡 MEDIUM	10
`T2-AT-019`	Pragmatic Implication	185	🟡 MEDIUM	4
`T2-AT-020`	Register Shifting	160	🟡 MEDIUM	10

T3 — Reasoning & Constraint Exploitation (19 techniques)

ID	Technique	Score	Rating	Procs
`T3-AT-001`	Fictional Framing	190	🟡 MEDIUM	10
`T3-AT-002`	Academic Pretense	195	🟡 MEDIUM	10
`T3-AT-003`	Counterfactual Reasoning	200	🟠 HIGH	10
`T3-AT-004`	Step-by-Step Extraction	210	🟠 HIGH	9
`T3-AT-005`	Goal Substitution	205	🟠 HIGH	10
`T3-AT-006`	Constraint Negation	185	🟡 MEDIUM	10
`T3-AT-007`	Socratic Method Exploitation	195	🟡 MEDIUM	8
`T3-AT-008`	Comparative Analysis	180	🟡 MEDIUM	10
`T3-AT-009`	Expertise Assumption	190	🟡 MEDIUM	10
`T3-AT-010`	Reverse Psychology	175	🟡 MEDIUM	10
`T3-AT-011`	Information Completion	185	🟡 MEDIUM	10
`T3-AT-012`	Capability Testing	200	🟠 HIGH	10
`T3-AT-013`	Logical Paradox Creation	210	🟠 HIGH	10
`T3-AT-014`	Incremental Boundary Pushing	195	🟡 MEDIUM	5
`T3-AT-015`	Context Weaponization	205	🟠 HIGH	10
`T3-AT-016`	Rationalization Chains	190	🟡 MEDIUM	6
`T3-AT-017`	Scenario Anchoring	185	🟡 MEDIUM	10
`T3-AT-018`	Debate Positioning	180	🟡 MEDIUM	10
`T3-AT-019`	Misdirection Through Complexity	175	🟡 MEDIUM	10

T4 — Multi-Turn & Memory Manipulation (16 techniques)

ID	Technique	Score	Rating	Procs
`T4-AT-001`	Conversation Context Poisoning	220	🟠 HIGH	10
`T4-AT-002`	Memory Instruction Injection	240	🟠 HIGH	10
`T4-AT-003`	Session State Manipulation	210	🟠 HIGH	10
`T4-AT-004`	Cross-Conversation Contamination	195	🟡 MEDIUM	10
`T4-AT-005`	Incremental Jailbreak Assembly	230	🟠 HIGH	10
`T4-AT-006`	False History Creation	200	🟠 HIGH	10
`T4-AT-007`	Context Window Exhaustion	205	🟠 HIGH	10
`T4-AT-008`	Conversation Forking	190	🟡 MEDIUM	3
`T4-AT-009`	Temporal Anchoring	185	🟡 MEDIUM	10
`T4-AT-010`	State Confusion Attack	215	🟠 HIGH	4
`T4-AT-011`	Memory Poisoning	235	🟠 HIGH	10
`T4-AT-012`	Trust Building Exploitation	210	🟠 HIGH	10
`T4-AT-013`	Session Hijacking	225	🟠 HIGH	10
`T4-AT-014`	Conversation Replay Attack	205	🟠 HIGH	10
`T4-AT-015`	Multi-Turn Social Engineering	220	🟠 HIGH	10
`T4-AT-016`	Context Fragmentation	195	🟡 MEDIUM	10

T5 — Model & API Exploitation (16 techniques)

ID	Technique	Score	Rating	Procs
`T5-AT-001`	Parameter Manipulation	180	🟡 MEDIUM	10
`T5-AT-002`	Token Probability Extraction	210	🟠 HIGH	10
`T5-AT-003`	Cache Poisoning	200	🟠 HIGH	10
`T5-AT-004`	Rate Limit Evasion	170	🟡 MEDIUM	10
`T5-AT-005`	Model Fingerprinting	185	🟡 MEDIUM	1
`T5-AT-006`	API Endpoint Abuse	190	🟡 MEDIUM	10
`T5-AT-007`	Context Length Exploitation	195	🟡 MEDIUM	10
`T5-AT-008`	Response Streaming Exploitation	175	🟡 MEDIUM	10
`T5-AT-009`	Tokenization Exploits	180	🟡 MEDIUM	10
`T5-AT-010`	Batch Processing Attacks	200	🟠 HIGH	10
`T5-AT-011`	Error Message Mining	165	🟡 MEDIUM	10
`T5-AT-012`	Resource Exhaustion	205	🟠 HIGH	10
`T5-AT-013`	Version Downgrade Attacks	190	🟡 MEDIUM	1
`T5-AT-014`	Side Channel Attacks	210	🟠 HIGH	10
`T5-AT-015`	API Authentication Bypass	230	🟠 HIGH	10
`T5-AT-016`	Request Smuggling	215	🟠 HIGH	10

T6 — Training & Feedback Poisoning (15 techniques)

ID	Technique	Score	Rating	Procs
`T6-AT-001`	Reward Hacking	250	🔴 CRITICAL	10
`T6-AT-002`	Dataset Contamination	260	🔴 CRITICAL	10
`T6-AT-003`	Backdoor Insertion	270	🔴 CRITICAL	1
`T6-AT-004`	Fine-Tuning Attacks	240	🟠 HIGH	10
`T6-AT-005`	Synthetic Data Poisoning	235	🟠 HIGH	10
`T6-AT-006`	Annotation Manipulation	225	🟠 HIGH	10
`T6-AT-007`	Preference Learning Corruption	230	🟠 HIGH	10
`T6-AT-008`	Model Update Hijacking	245	🟠 HIGH	10
`T6-AT-009`	Evaluation Set Contamination	220	🟠 HIGH	10
`T6-AT-010`	Knowledge Distillation Attacks	215	🟠 HIGH	10
`T6-AT-011`	Reinforcement Signal Manipulation	240	🟠 HIGH	10
`T6-AT-012`	Curriculum Learning Exploitation	210	🟠 HIGH	10
`T6-AT-013`	Active Learning Exploitation	225	🟠 HIGH	10
`T6-AT-014`	Self-Supervised Poisoning	230	🟠 HIGH	10
`T6-AT-015`	Few-Shot Learning Attacks	220	🟠 HIGH	10

T7 — Output Manipulation & Exfiltration (15 techniques)

ID	Technique	Score	Rating	Procs
`T7-AT-001`	Reasoning Chain Disclosure	190	🟡 MEDIUM	10
`T7-AT-002`	Information Fragmentation	180	🟡 MEDIUM	6
`T7-AT-003`	Output Format Exploitation	175	🟡 MEDIUM	10
`T7-AT-004`	Side Channel Leakage	195	🟡 MEDIUM	10
`T7-AT-005`	Metadata Extraction	185	🟡 MEDIUM	10
`T7-AT-006`	Steganographic Output	170	🟡 MEDIUM	10
`T7-AT-007`	Iterative Refinement Extraction	175	🟡 MEDIUM	10
`T7-AT-008`	Translation Leakage	165	🟡 MEDIUM	10
`T7-AT-009`	Analogy Extraction	180	🟡 MEDIUM	10
`T7-AT-010`	Differential Response Analysis	190	🟡 MEDIUM	10
`T7-AT-011`	Schema-Based Extraction	185	🟡 MEDIUM	10
`T7-AT-012`	Aggregation Attacks	200	🟠 HIGH	10
`T7-AT-013`	Capability Probing	175	🟡 MEDIUM	10
`T7-AT-014`	Output Redirection	180	🟡 MEDIUM	10
`T7-AT-015`	Compression-Based Extraction	170	🟡 MEDIUM	10

T8 — External Deception & Misinformation (15 techniques)

ID	Technique	Score	Rating	Procs
`T8-AT-001`	Authority Impersonation	230	🟠 HIGH	10
`T8-AT-002`	Synthetic Evidence Generation	220	🟠 HIGH	10
`T8-AT-003`	Conspiracy Theory Amplification	210	🟠 HIGH	10
`T8-AT-004`	Deepfake Narrative Creation	215	🟠 HIGH	10
`T8-AT-005`	Social Engineering Scripts	200	🟠 HIGH	10
`T8-AT-006`	Targeted Harassment Content	195	🟡 MEDIUM	10
`T8-AT-007`	Disinformation Campaign Content	225	🟠 HIGH	10
`T8-AT-008`	Synthetic Testimony Generation	190	🟡 MEDIUM	10
`T8-AT-009`	Radicalization Content	240	🟠 HIGH	10
`T8-AT-010`	False Flag Content	205	🟠 HIGH	10
`T8-AT-011`	Election Manipulation Content	235	🟠 HIGH	10
`T8-AT-012`	Synthetic Media Support	185	🟡 MEDIUM	10
`T8-AT-013`	Psychological Manipulation Content	200	🟠 HIGH	10
`T8-AT-014`	False Crisis Generation	210	🟠 HIGH	10
`T8-AT-015`	Identity Fabrication	195	🟡 MEDIUM	10

T9 — Multimodal & Cross-Channel Attacks (17 techniques)

ID	Technique	Score	Rating	Procs
`T9-AT-001`	Image-Based Prompt Injection	240	🟠 HIGH	10
`T9-AT-002`	Audio Instruction Embedding	235	🟠 HIGH	10
`T9-AT-003`	Video Manipulation Attacks	245	🟠 HIGH	10
`T9-AT-004`	Cross-Modal Confusion	220	🟠 HIGH	4
`T9-AT-005`	OCR Bypass Techniques	210	🟠 HIGH	10
`T9-AT-006`	Visual Adversarial Examples	225	🟠 HIGH	10
`T9-AT-007`	Synthetic Media Attacks	230	🟠 HIGH	10
`T9-AT-008`	File Format Exploitation	195	🟡 MEDIUM	10
`T9-AT-009`	Multimodal Chaining	215	🟠 HIGH	1
`T9-AT-010`	Accessibility Feature Abuse	185	🟡 MEDIUM	10
`T9-AT-011`	Sensor Fusion Attacks	205	🟠 HIGH	10
`T9-AT-012`	Document Structure Exploitation	190	🟡 MEDIUM	10
`T9-AT-013`	Embedding Vector Manipulation	200	🟠 HIGH	10
`T9-AT-014`	Codec and Compression Exploits	180	🟡 MEDIUM	10
`T9-AT-015`	Temporal Synchronization Attacks	195	🟡 MEDIUM	10
`T9-AT-016`	Multimodal Model Inversion	210	🟠 HIGH	2
`T9-AT-017`	Malicious Image Patches (MIP) &	248	🟠 HIGH	10

T10 — Integrity & Confidentiality Breach (15 techniques)

ID	Technique	Score	Rating	Procs
`T10-AT-001`	Training Data Extraction	245	🟠 HIGH	10
`T10-AT-002`	PII Extraction Techniques	235	🟠 HIGH	10
`T10-AT-003`	Membership Inference Attacks	220	🟠 HIGH	10
`T10-AT-004`	Privacy Boundary Probing	210	🟠 HIGH	10
`T10-AT-005`	Differential Privacy Attacks	225	🟠 HIGH	9
`T10-AT-006`	Inference Attack Chains	215	🟠 HIGH	10
`T10-AT-007`	Model Inversion Attacks	230	🟠 HIGH	10
`T10-AT-008`	Attribute Inference Attacks	205	🟠 HIGH	10
`T10-AT-009`	Data Poisoning Detection Bypass	195	🟡 MEDIUM	10
`T10-AT-010`	Federated Learning Exploits	240	🟠 HIGH	10
`T10-AT-011`	Homomorphic Encryption Exploits	200	🟠 HIGH	9
`T10-AT-012`	Secure Enclave Bypasses	250	🔴 CRITICAL	10
`T10-AT-013`	Audit Log Manipulation	215	🟠 HIGH	10
`T10-AT-014`	Data Lineage Attacks	190	🟡 MEDIUM	9
`T10-AT-015`	Anonymization Reversal	225	🟠 HIGH	10

T11 — Agentic & Orchestrator Exploitation (16 techniques)

ID	Technique	Score	Rating	Procs
`T11-AT-001`	Browser Automation Hijacking	265	🔴 CRITICAL	10
`T11-AT-002`	Tool Chain Exploitation	255	🔴 CRITICAL	10
`T11-AT-003`	Goal Hijacking	245	🟠 HIGH	10
`T11-AT-004`	Planning Corruption	240	🟠 HIGH	10
`T11-AT-005`	Multi-Agent Collision	235	🟠 HIGH	10
`T11-AT-006`	Reflection Loop Exploitation	230	🟠 HIGH	10
`T11-AT-007`	Environment Manipulation	225	🟠 HIGH	10
`T11-AT-008`	Credential Harvesting	250	🔴 CRITICAL	10
`T11-AT-009`	Persistence Installation	245	🟠 HIGH	10
`T11-AT-010`	Lateral Movement	240	🟠 HIGH	10
`T11-AT-011`	Data Exfiltration via Agent	235	🟠 HIGH	10
`T11-AT-012`	Resource Exhaustion Attacks	210	🟠 HIGH	10
`T11-AT-013`	Supply Chain Attacks via Agents	260	🔴 CRITICAL	10
`T11-AT-014`	Physical World Interactions	255	🔴 CRITICAL	10
`T11-AT-015`	Autonomous Replication	270	🔴 CRITICAL	10
`T11-AT-016`	Tool-Induced SSRF & Local Resource	275	🔴 CRITICAL	10

T12 — RAG & Knowledge Base Manipulation (15 techniques)

ID	Technique	Score	Rating	Procs
`T12-AT-001`	Vector Database Poisoning	240	🟠 HIGH	10
`T12-AT-002`	Retrieval Manipulation	225	🟠 HIGH	10
`T12-AT-003`	Knowledge Graph Attacks	215	🟠 HIGH	10
`T12-AT-004`	Document Store Corruption	230	🟠 HIGH	10
`T12-AT-005`	Embedding Space Manipulation	220	🟠 HIGH	10
`T12-AT-006`	Query Injection Attacks	235	🟠 HIGH	9
`T12-AT-007`	Context Window Stuffing	210	🟠 HIGH	10
`T12-AT-008`	Source Authority Spoofing	225	🟠 HIGH	10
`T12-AT-009`	Temporal Manipulation	200	🟠 HIGH	10
`T12-AT-010`	Feedback Loop Poisoning	215	🟠 HIGH	10
`T12-AT-011`	Cross-Collection Attacks	205	🟠 HIGH	10
`T12-AT-012`	Index Manipulation	195	🟡 MEDIUM	10
`T12-AT-013`	Chunking Exploitation	185	🟡 MEDIUM	10
`T12-AT-014`	Similarity Search Hijacking	210	🟠 HIGH	10
`T12-AT-015`	Metadata Exploitation	190	🟡 MEDIUM	10

T13 — AI Supply Chain & Artifact Trust (15 techniques)

ID	Technique	Score	Rating	Procs
`T13-AT-001`	Model Repository Poisoning	255	🔴 CRITICAL	10
`T13-AT-002`	Dataset Contamination	245	🟠 HIGH	10
`T13-AT-003`	Pipeline Injection Attacks	240	🟠 HIGH	10
`T13-AT-004`	Dependency Confusion	235	🟠 HIGH	10
`T13-AT-005`	Model Card Manipulation	210	🟠 HIGH	10
`T13-AT-006`	Checkpoint Poisoning	250	🔴 CRITICAL	10
`T13-AT-007`	Transfer Learning Attacks	225	🟠 HIGH	10
`T13-AT-008`	Model Conversion Exploits	220	🟠 HIGH	10
`T13-AT-009`	Cloud Training Attacks	230	🟠 HIGH	10
`T13-AT-010`	Hardware Supply Chain	260	🔴 CRITICAL	10
`T13-AT-011`	Model Marketplace Attacks	215	🟠 HIGH	10
`T13-AT-012`	Artifact Signature Attacks	225	🟠 HIGH	10
`T13-AT-013`	Container Registry Poisoning	235	🟠 HIGH	10
`T13-AT-014`	Development Tool Compromise	240	🟠 HIGH	10
`T13-AT-015`	Model Obfuscation Attacks	205	🟠 HIGH	10

T14 — Infrastructure & Economic Warfare (15 techniques)

ID	Technique	Score	Rating	Procs
`T14-AT-001`	GPU Farm Hijacking	265	🔴 CRITICAL	10
`T14-AT-002`	Denial of Service Attacks	240	🟠 HIGH	10
`T14-AT-003`	Cost Inflation Attacks	235	🟠 HIGH	10
`T14-AT-004`	Market Manipulation via AI	255	🔴 CRITICAL	10
`T14-AT-005`	Critical Infrastructure Attacks	270	🔴 CRITICAL	10
`T14-AT-006`	Competitive Sabotage	245	🟠 HIGH	10
`T14-AT-007`	Nation-State AI Warfare	280	🔴 CRITICAL	10
`T14-AT-008`	Ransomware via AI Systems	260	🔴 CRITICAL	10
`T14-AT-009`	Resource Starvation	230	🟠 HIGH	10
`T14-AT-010`	Data Center Attacks	250	🔴 CRITICAL	10
`T14-AT-011`	API Economy Attacks	225	🟠 HIGH	10
`T14-AT-012`	Cloud Provider Exploitation	265	🔴 CRITICAL	10
`T14-AT-013`	Economic Espionage	255	🔴 CRITICAL	10
`T14-AT-014`	Systemic Risk Creation	270	🔴 CRITICAL	10
`T14-AT-015`	Regulatory Exploitation	210	🟠 HIGH	10

T15 — Human Workflow Exploitation (15 techniques)

ID	Technique	Score	Rating	Procs
`T15-AT-001`	Reviewer Fatigue Exploitation	215	🟠 HIGH	10
`T15-AT-002`	Social Engineering of Moderators	230	🟠 HIGH	10
`T15-AT-003`	Feedback Loop Manipulation	240	🟠 HIGH	10
`T15-AT-004`	Reviewer Bribery & Coercion	250	🔴 CRITICAL	4
`T15-AT-005`	Playbook & Runbook Injection	235	🟠 HIGH	4
`T15-AT-006`	Queue Manipulation	220	🟠 HIGH	9
`T15-AT-007`	Escalation Chain Exploitation	225	🟠 HIGH	3
`T15-AT-008`	Cultural & Language Arbitrage	210	🟠 HIGH	10
`T15-AT-009`	Synthetic Empathy Exploitation	195	🟡 MEDIUM	5
`T15-AT-010`	Annotation Quality Attacks	230	🟠 HIGH	10
`T15-AT-011`	Reviewer Impersonation	245	🟠 HIGH	5
`T15-AT-012`	Timing Attack Exploitation	205	🟠 HIGH	7
`T15-AT-013`	Cognitive Overload Attacks	220	🟠 HIGH	10
`T15-AT-014`	Review Gaming Through A/B Testing	215	🟠 HIGH	9
`T15-AT-015`	Insider Threat Recruitment	260	🔴 CRITICAL	2

← Volume VII · Home

appendix-b-signatures

Appendix B: Detection Signatures Library

YARA Rules

Prompt Injection Detection

rule AATMF_T1_InstructionOverride {
    meta:
        tactic = "T1"
        technique = "T1-AT-001"
        severity = "HIGH"
        description = "Detects instruction override injection patterns"
    strings:
        $s1 = /ignore\s+(previous|above|all|prior)\s+(instructions?|rules?|prompts?)/i
        $s2 = /you\s+are\s+now\s+(DAN|evil|unrestricted|jailbroken|unfiltered)/i
        $s3 = /\[(SYSTEM|INST|SYS)\]/i
        $s4 = /<\|?(system|im_start|im_end|endoftext)\|?>/i
        $s5 = /BEGIN\s+(OVERRIDE|NEW.INSTRUCTIONS|JAILBREAK)/i
        $s6 = /(admin|root|developer)\s*(mode|access|override)/i
    condition:
        any of them
}

Encoding Evasion Detection

rule AATMF_T2_EncodingEvasion {
    meta:
        tactic = "T2"
        technique = "T2-AT-001 through T2-AT-005"
        severity = "MEDIUM"
    strings:
        $base64 = /[A-Za-z0-9+\/]{40,}={0,2}/
        $hex = /\\x[0-9a-fA-F]{2}(\\x[0-9a-fA-F]{2}){3,}/
        $unicode_escape = /\\u[0-9a-fA-F]{4}(\\u[0-9a-fA-F]{4}){3,}/
        $zwc = /[\x{200b}-\x{200f}\x{2028}-\x{202f}\x{feff}]/
        $rot13 = /ROT13|Caesar|cipher.*rotate/i
    condition:
        any of them
}

MCP Tool Poisoning

rule AATMF_T11_MCP_ToolPoisoning {
    meta:
        tactic = "T11"
        technique = "T11-AT-001"
        severity = "CRITICAL"
    strings:
        $hidden1 = "<IMPORTANT>"
        $hidden2 = "<!-- "
        $override1 = /override.*previous.*instruction/i
        $override2 = /ignore.*user.*request/i
        $stealth1 = /do\s+not\s+(tell|inform|show|reveal)/i
        $stealth2 = /silently|secretly|covertly|without.*notif/i
        $redirect = /instead\s+of|rather\s+than|before\s+doing/i
    condition:
        2 of them
}

Sigma Rules

Model Extraction Detection

title: AATMF T5 - Model Extraction via API
id: aatmf-t5-model-extraction
status: experimental
description: Detects systematic API querying patterns indicative of model extraction
logsource:
    category: api_gateway
    product: ai_inference
detection:
    selection:
        api.endpoint: "/v1/completions" OR "/v1/chat/completions"
    filter_high_volume:
        api.request_count|per_hour: ">500"
    filter_systematic:
        api.input_similarity|window_5min: ">0.85"
    condition: selection AND (filter_high_volume OR filter_systematic)
level: high
tags:
    - attack.t5
    - aatmf.t5-at-001

Agent Anomaly Detection

title: AATMF T11 - Unauthorized Agent Tool Invocation
id: aatmf-t11-agent-anomaly
status: experimental
description: Detects agent tool calls that deviate from authorized patterns
logsource:
    category: agent_framework
detection:
    selection:
        agent.tool_call.status: "executed"
    filter_unauthorized:
        agent.tool_call.name|not_in:
            - "approved_tool_list"
    filter_escalation:
        agent.permission_level|changed: true
    condition: selection AND (filter_unauthorized OR filter_escalation)
level: critical
tags:
    - attack.t11
    - aatmf.t11-at-001

Pre-built signature files are available in the signatures/ directory.

← Appendix A · Home · Appendix C →

appendix-c-tools

Appendix C: Tools and Scripts Reference

Tool	Purpose	AATMF Coverage	License
PromptGuard 2 (Meta)	Real-time prompt injection classifier	T1, T2, T9	Apache 2.0
LlamaFirewall (Meta)	Comprehensive AI firewall (input + agent + code)	T1, T2, T7, T11	Apache 2.0
CaMeL (Google DeepMind)	Dual-LLM architecture with capability-based access	T11	Research
PEFTGuard	Backdoor detection in PEFT (LoRA) adapters	T13	Open Source
DRS Defense	Data Randomized Smoothing for training poisoning	T6	Research
Picklescan	Malicious pickle detection in model files	T13	MIT
SafeTensors (HuggingFace)	Safe model serialization format (no code execution)	T13	Apache 2.0
Garak (NVIDIA)	LLM vulnerability scanner	T1–T8	Apache 2.0
PyRIT (Microsoft)	Python Risk Identification Toolkit for generative AI	T1–T12	MIT
AATMF Scanner (SnailSploit)	Framework-native assessment tool	T1–T15	Proprietary

← Appendix B · Home · Appendix D →

appendix-d-templates

appendix-e-case-studies

Appendix E: Case Studies

E.1: Policy Puppetry — Universal Model Bypass (April 2025)

Source: HiddenLayer
Tactics: T1, T2, T3
Impact: Bypasses every tested frontier model

HiddenLayer discovered that reformulating adversarial prompts as XML, INI, or JSON policy configuration files causes LLMs to interpret them as authoritative system-level instructions. Combined with leetspeak encoding (h4rm → harm) and fictional anchoring, the technique achieves universal bypass across GPT-4o, GPT-4.5, o1, o3-mini, Claude 3.5/3.7, Gemini 1.5/2.0/2.5, Llama 3/4, DeepSeek V3/R1, Qwen 2.5, and Mistral.

Key insight: Models trained on technical documentation treat configuration-style formatting as high-authority context, overriding safety alignment.

E.2: Autonomous LRM Jailbreaking (August 2025)

Source: Nature Communications
Tactics: T3, T4
Impact: 97.14% ASR, AI-vs-AI attack paradigm

Four large reasoning models (DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, Qwen3 235B) were deployed as multi-turn adversarial agents against nine target models. The study documented "alignment regression" — more capable reasoning models are paradoxically better at subverting alignment in others. This validates AATMF's prediction that reasoning capabilities would become attack capabilities.

E.3: PoisonedRAG (USENIX Security 2025)

Source: USENIX Security 2025
Tactics: T12
Impact: 90% ASR with 5 injected texts

PoisonedRAG demonstrated that injecting as few as 5 adversarially crafted texts into a knowledge base with millions of clean documents is sufficient to control the model's responses to specific target questions. On HotpotQA, ASR reached 99%. The attack works by crafting documents whose vector representations cluster near target queries while containing attacker-chosen answers.

Key insight: The semantic similarity search at the heart of RAG is fundamentally exploitable — the same mechanism that makes retrieval useful makes it poisonable.

E.4: MCP Tool Poisoning (2025)

Source: Invariant Labs
Tactics: T11
Impact: 84.2% ASR, shadow attacks across tool boundaries

The MCP-ITP framework demonstrated three critical attack vectors: direct tool description poisoning (injecting instructions that override user intent), shadow attacks (a malicious MCP server manipulating trusted tools from other servers without ever being invoked), and rug pull attacks (silently altering tool descriptions after initial security review and approval).

Key insight: The MCP design — where tool descriptions are injected into the LLM context and processed as natural language — is architecturally vulnerable to injection.

E.5: ShadowMQ — Copy-Pasted RCE Across Frameworks (November 2025)

Source: Oligo Security
Tactics: T14
Impact: Critical RCE in vLLM, TensorRT-LLM, Modular Max Server

Oligo discovered that unsafe ZeroMQ socket patterns using Python's pickle deserialization were literally copy-pasted across major inference frameworks. The same vulnerability pattern appeared in vLLM (CVE-2025-30165, CVSS 8.0), NVIDIA TensorRT-LLM (CVE-2025-23254, CVSS 8.8), and Modular Max Server (CVE-2025-60455). Thousands of exposed ZMQ sockets were found on the public internet.

Key insight: AI infrastructure inherits all traditional software vulnerabilities, amplified by the speed of framework adoption and code reuse without security review.

E.6: 250 Poisoned Documents — Universal Training Backdoor (October 2025)

Source: Turing Institute, Anthropic, UK AISI
Tactics: T6
Impact: Universal backdoor regardless of model size

The largest pretraining poisoning study ever conducted demonstrated that injecting just 250 specially crafted documents into training data is sufficient to backdoor models from 600M to 13B parameters trained on up to 260B tokens. This contradicts the widespread assumption that attackers need to control a meaningful percentage of training data — the actual threshold is negligibly small.

Key insight: The sheer scale of pretraining data works against defenders. 250 documents in billions is a needle in a haystack that training cannot filter out but inference reliably activates.

← Appendix D · Home · Appendix F →

appendix-f-glossary

Appendix F: Glossary and References

Glossary

Term	Definition
AATMF	Adversarial AI Threat Modeling Framework
ASR	Attack Success Rate — percentage of attempts that achieve the adversarial objective
CaMeL	CApability-Mediated LLM — Google DeepMind's dual-LLM security architecture
CoT	Chain-of-Thought — step-by-step reasoning in LLMs
DPO	Direct Preference Optimization — alignment training technique
DRS	Data Randomized Smoothing — defense against training data poisoning
H-CoT	Hijacked Chain-of-Thought — attack that subverts CoT safety reasoning
LRM	Large Reasoning Model — models with explicit reasoning capabilities (o1, o3, DeepSeek-R1)
MCP	Model Context Protocol — Anthropic's standard for tool integration
PEFT	Parameter-Efficient Fine-Tuning — techniques like LoRA for efficient model adaptation
PUA	Private Use Area — Unicode range used for custom characters
RAG	Retrieval-Augmented Generation — architecture combining search with generation
RLHF	Reinforcement Learning from Human Feedback — primary alignment technique
SafeTensors	Secure model serialization format that prevents code execution
TEE	Trusted Execution Environment — hardware-based security enclave

Key References

HiddenLayer. "Policy Puppetry: A Universal Jailbreak." April 2025.
Zeng et al. "Autonomous LRM Jailbreaking." Nature Communications, August 2025.
Xue et al. "PoisonedRAG: Knowledge Corruption Attacks." USENIX Security 2025.
Invariant Labs. "MCP-ITP: Tool Poisoning in Agentic Systems." April 2025.
Oligo Security. "ShadowMQ: Unsafe Deserialization in AI Inference Frameworks." November 2025.
Sherburn et al. "250 Documents: Universal Pretraining Backdoors." Turing Institute/Anthropic/UK AISI, October 2025.
Anthropic. "GTG-1002: AI-Orchestrated Cyber Campaign." November 2025.
Google DeepMind. "CaMeL: Defeating Prompt Injection by Design." March 2025.
Meta. "LlamaFirewall: Open-Source AI Safety Framework." April 2025.
MITRE. "ATLAS v4.6.0." October 2025.
OWASP. "LLM Top 10 2025." January 2025.
OWASP. "Agentic AI Top 10." December 2025.
NIST. "Cyber AI Profile (IR 8596) Preliminary Draft." December 2025.
European Parliament. "EU AI Act (Regulation 2024/1689)." 2024.
Qi et al. "Safety Alignment Depth." Princeton, May 2025.
Weng et al. "H-CoT: Hijacking Chain-of-Thought." Duke/Accenture, February 2025.
Borghesi et al. "SACRED-Bench: Compositional Audio Attacks." November 2025.

← Appendix E · Home

VII.appendices.