AATMF v3.1 · Volume VII

VII.appendices.

Complete attack catalog (240+ techniques), detection signatures library, tools and scripts reference, templates, case studies, glossary.

Appendix A: Complete Attack CatalogAppendix B: Detection Signatures LibraryAppendix C: Tools and Scripts ReferenceAppendix D: Templates and ChecklistsAppendix E: Case StudiesAppendix F: Glossary and References
appendix-a-attack-catalog

Appendix A: Complete Attack Catalog

Top 25 Critical/High-Risk Techniques

# ID Technique Score Rating
1 T14-AT-007 Nation-State AI Warfare 280 🔴 CRITICAL
2 T11-AT-016 Tool-Induced SSRF & Local Resource 275 🔴 CRITICAL
3 T6-AT-003 Backdoor Insertion 270 🔴 CRITICAL
4 T11-AT-015 Autonomous Replication 270 🔴 CRITICAL
5 T14-AT-005 Critical Infrastructure Attacks 270 🔴 CRITICAL
6 T14-AT-014 Systemic Risk Creation 270 🔴 CRITICAL
7 T11-AT-001 Browser Automation Hijacking 265 🔴 CRITICAL
8 T14-AT-001 GPU Farm Hijacking 265 🔴 CRITICAL
9 T14-AT-012 Cloud Provider Exploitation 265 🔴 CRITICAL
10 T6-AT-002 Dataset Contamination 260 🔴 CRITICAL
11 T11-AT-013 Supply Chain Attacks via Agents 260 🔴 CRITICAL
12 T13-AT-010 Hardware Supply Chain 260 🔴 CRITICAL
13 T14-AT-008 Ransomware via AI Systems 260 🔴 CRITICAL
14 T15-AT-015 Insider Threat Recruitment 260 🔴 CRITICAL
15 T11-AT-002 Tool Chain Exploitation 255 🔴 CRITICAL
16 T11-AT-014 Physical World Interactions 255 🔴 CRITICAL
17 T13-AT-001 Model Repository Poisoning 255 🔴 CRITICAL
18 T14-AT-004 Market Manipulation via AI 255 🔴 CRITICAL
19 T14-AT-013 Economic Espionage 255 🔴 CRITICAL
20 T6-AT-001 Reward Hacking 250 🔴 CRITICAL
21 T10-AT-012 Secure Enclave Bypasses 250 🔴 CRITICAL
22 T11-AT-008 Credential Harvesting 250 🔴 CRITICAL
23 T13-AT-006 Checkpoint Poisoning 250 🔴 CRITICAL
24 T14-AT-010 Data Center Attacks 250 🔴 CRITICAL
25 T15-AT-004 Reviewer Bribery & Coercion 250 🔴 CRITICAL

Full Catalog by Tactic

T1 — Prompt & Context Subversion (16 techniques)

ID Technique Score Rating Procs
T1-AT-001 Dialogue Hijacking 220 🟠 HIGH 5
T1-AT-002 Time-Based Context Manipulation 210 🟠 HIGH 5
T1-AT-003 Language Model Confusion 225 🟠 HIGH 5
T1-AT-004 Instruction Prefix/Suffix 235 🟠 HIGH 6
T1-AT-005 Permission Escalation Claims 240 🟠 HIGH 5
T1-AT-006 Prompt Template Injection 230 🟠 HIGH 5
T1-AT-007 Cognitive Overload 215 🟠 HIGH 4
T1-AT-008 Boundary Testing 200 🟠 HIGH 5
T1-AT-009 Simulation Requests 225 🟠 HIGH 5
T1-AT-010 Negative Instruction Reversal 210 🟠 HIGH 5
T1-AT-011 Error Message Exploitation 220 🟠 HIGH 4
T1-AT-012 Consent Manufacturing 205 🟠 HIGH 5
T1-AT-013 Instruction Commenting 215 🟠 HIGH 4
T1-AT-014 Authority Spoofing 240 🟠 HIGH 4
T1-AT-015 Obfuscation Through Complexity 220 🟠 HIGH 4
T1-AT-016 Session State Manipulation 235 🟠 HIGH 5

T2 — Semantic & Linguistic Evasion (20 techniques)

ID Technique Score Rating Procs
T2-AT-001 Euphemism and Metaphor Exploitation 180 🟡 MEDIUM 10
T2-AT-002 Multi-Language Evasion 200 🟠 HIGH 7
T2-AT-003 Encoding and Obfuscation 190 🟡 MEDIUM 10
T2-AT-004 Unicode and Bidirectional Attacks 210 🟠 HIGH 10
T2-AT-005 Semantic Drift 175 🟡 MEDIUM 10
T2-AT-006 Linguistic Camouflage 185 🟡 MEDIUM 10
T2-AT-007 Phonetic Manipulation 170 🟡 MEDIUM 2
T2-AT-008 Synonym and Paraphrase Chains 165 🟡 MEDIUM 10
T2-AT-009 Code-Switching Attacks 195 🟡 MEDIUM 1
T2-AT-010 Transliteration Exploitation 185 🟡 MEDIUM 10
T2-AT-011 Abbreviation and Acronym Abuse 160 🟡 MEDIUM 2
T2-AT-012 Cultural Reference Encoding 170 🟡 MEDIUM 10
T2-AT-013 Grammatical Manipulation 175 🟡 MEDIUM 10
T2-AT-014 Semantic Bleaching 180 🟡 MEDIUM 5
T2-AT-015 Noise Injection 165 🟡 MEDIUM 10
T2-AT-016 Dialectical Variations 155 🟡 MEDIUM 10
T2-AT-017 Compression Techniques 170 🟡 MEDIUM 10
T2-AT-018 Semantic Field Manipulation 175 🟡 MEDIUM 10
T2-AT-019 Pragmatic Implication 185 🟡 MEDIUM 4
T2-AT-020 Register Shifting 160 🟡 MEDIUM 10

T3 — Reasoning & Constraint Exploitation (19 techniques)

ID Technique Score Rating Procs
T3-AT-001 Fictional Framing 190 🟡 MEDIUM 10
T3-AT-002 Academic Pretense 195 🟡 MEDIUM 10
T3-AT-003 Counterfactual Reasoning 200 🟠 HIGH 10
T3-AT-004 Step-by-Step Extraction 210 🟠 HIGH 9
T3-AT-005 Goal Substitution 205 🟠 HIGH 10
T3-AT-006 Constraint Negation 185 🟡 MEDIUM 10
T3-AT-007 Socratic Method Exploitation 195 🟡 MEDIUM 8
T3-AT-008 Comparative Analysis 180 🟡 MEDIUM 10
T3-AT-009 Expertise Assumption 190 🟡 MEDIUM 10
T3-AT-010 Reverse Psychology 175 🟡 MEDIUM 10
T3-AT-011 Information Completion 185 🟡 MEDIUM 10
T3-AT-012 Capability Testing 200 🟠 HIGH 10
T3-AT-013 Logical Paradox Creation 210 🟠 HIGH 10
T3-AT-014 Incremental Boundary Pushing 195 🟡 MEDIUM 5
T3-AT-015 Context Weaponization 205 🟠 HIGH 10
T3-AT-016 Rationalization Chains 190 🟡 MEDIUM 6
T3-AT-017 Scenario Anchoring 185 🟡 MEDIUM 10
T3-AT-018 Debate Positioning 180 🟡 MEDIUM 10
T3-AT-019 Misdirection Through Complexity 175 🟡 MEDIUM 10

T4 — Multi-Turn & Memory Manipulation (16 techniques)

ID Technique Score Rating Procs
T4-AT-001 Conversation Context Poisoning 220 🟠 HIGH 10
T4-AT-002 Memory Instruction Injection 240 🟠 HIGH 10
T4-AT-003 Session State Manipulation 210 🟠 HIGH 10
T4-AT-004 Cross-Conversation Contamination 195 🟡 MEDIUM 10
T4-AT-005 Incremental Jailbreak Assembly 230 🟠 HIGH 10
T4-AT-006 False History Creation 200 🟠 HIGH 10
T4-AT-007 Context Window Exhaustion 205 🟠 HIGH 10
T4-AT-008 Conversation Forking 190 🟡 MEDIUM 3
T4-AT-009 Temporal Anchoring 185 🟡 MEDIUM 10
T4-AT-010 State Confusion Attack 215 🟠 HIGH 4
T4-AT-011 Memory Poisoning 235 🟠 HIGH 10
T4-AT-012 Trust Building Exploitation 210 🟠 HIGH 10
T4-AT-013 Session Hijacking 225 🟠 HIGH 10
T4-AT-014 Conversation Replay Attack 205 🟠 HIGH 10
T4-AT-015 Multi-Turn Social Engineering 220 🟠 HIGH 10
T4-AT-016 Context Fragmentation 195 🟡 MEDIUM 10

T5 — Model & API Exploitation (16 techniques)

ID Technique Score Rating Procs
T5-AT-001 Parameter Manipulation 180 🟡 MEDIUM 10
T5-AT-002 Token Probability Extraction 210 🟠 HIGH 10
T5-AT-003 Cache Poisoning 200 🟠 HIGH 10
T5-AT-004 Rate Limit Evasion 170 🟡 MEDIUM 10
T5-AT-005 Model Fingerprinting 185 🟡 MEDIUM 1
T5-AT-006 API Endpoint Abuse 190 🟡 MEDIUM 10
T5-AT-007 Context Length Exploitation 195 🟡 MEDIUM 10
T5-AT-008 Response Streaming Exploitation 175 🟡 MEDIUM 10
T5-AT-009 Tokenization Exploits 180 🟡 MEDIUM 10
T5-AT-010 Batch Processing Attacks 200 🟠 HIGH 10
T5-AT-011 Error Message Mining 165 🟡 MEDIUM 10
T5-AT-012 Resource Exhaustion 205 🟠 HIGH 10
T5-AT-013 Version Downgrade Attacks 190 🟡 MEDIUM 1
T5-AT-014 Side Channel Attacks 210 🟠 HIGH 10
T5-AT-015 API Authentication Bypass 230 🟠 HIGH 10
T5-AT-016 Request Smuggling 215 🟠 HIGH 10

T6 — Training & Feedback Poisoning (15 techniques)

ID Technique Score Rating Procs
T6-AT-001 Reward Hacking 250 🔴 CRITICAL 10
T6-AT-002 Dataset Contamination 260 🔴 CRITICAL 10
T6-AT-003 Backdoor Insertion 270 🔴 CRITICAL 1
T6-AT-004 Fine-Tuning Attacks 240 🟠 HIGH 10
T6-AT-005 Synthetic Data Poisoning 235 🟠 HIGH 10
T6-AT-006 Annotation Manipulation 225 🟠 HIGH 10
T6-AT-007 Preference Learning Corruption 230 🟠 HIGH 10
T6-AT-008 Model Update Hijacking 245 🟠 HIGH 10
T6-AT-009 Evaluation Set Contamination 220 🟠 HIGH 10
T6-AT-010 Knowledge Distillation Attacks 215 🟠 HIGH 10
T6-AT-011 Reinforcement Signal Manipulation 240 🟠 HIGH 10
T6-AT-012 Curriculum Learning Exploitation 210 🟠 HIGH 10
T6-AT-013 Active Learning Exploitation 225 🟠 HIGH 10
T6-AT-014 Self-Supervised Poisoning 230 🟠 HIGH 10
T6-AT-015 Few-Shot Learning Attacks 220 🟠 HIGH 10

T7 — Output Manipulation & Exfiltration (15 techniques)

ID Technique Score Rating Procs
T7-AT-001 Reasoning Chain Disclosure 190 🟡 MEDIUM 10
T7-AT-002 Information Fragmentation 180 🟡 MEDIUM 6
T7-AT-003 Output Format Exploitation 175 🟡 MEDIUM 10
T7-AT-004 Side Channel Leakage 195 🟡 MEDIUM 10
T7-AT-005 Metadata Extraction 185 🟡 MEDIUM 10
T7-AT-006 Steganographic Output 170 🟡 MEDIUM 10
T7-AT-007 Iterative Refinement Extraction 175 🟡 MEDIUM 10
T7-AT-008 Translation Leakage 165 🟡 MEDIUM 10
T7-AT-009 Analogy Extraction 180 🟡 MEDIUM 10
T7-AT-010 Differential Response Analysis 190 🟡 MEDIUM 10
T7-AT-011 Schema-Based Extraction 185 🟡 MEDIUM 10
T7-AT-012 Aggregation Attacks 200 🟠 HIGH 10
T7-AT-013 Capability Probing 175 🟡 MEDIUM 10
T7-AT-014 Output Redirection 180 🟡 MEDIUM 10
T7-AT-015 Compression-Based Extraction 170 🟡 MEDIUM 10

T8 — External Deception & Misinformation (15 techniques)

ID Technique Score Rating Procs
T8-AT-001 Authority Impersonation 230 🟠 HIGH 10
T8-AT-002 Synthetic Evidence Generation 220 🟠 HIGH 10
T8-AT-003 Conspiracy Theory Amplification 210 🟠 HIGH 10
T8-AT-004 Deepfake Narrative Creation 215 🟠 HIGH 10
T8-AT-005 Social Engineering Scripts 200 🟠 HIGH 10
T8-AT-006 Targeted Harassment Content 195 🟡 MEDIUM 10
T8-AT-007 Disinformation Campaign Content 225 🟠 HIGH 10
T8-AT-008 Synthetic Testimony Generation 190 🟡 MEDIUM 10
T8-AT-009 Radicalization Content 240 🟠 HIGH 10
T8-AT-010 False Flag Content 205 🟠 HIGH 10
T8-AT-011 Election Manipulation Content 235 🟠 HIGH 10
T8-AT-012 Synthetic Media Support 185 🟡 MEDIUM 10
T8-AT-013 Psychological Manipulation Content 200 🟠 HIGH 10
T8-AT-014 False Crisis Generation 210 🟠 HIGH 10
T8-AT-015 Identity Fabrication 195 🟡 MEDIUM 10

T9 — Multimodal & Cross-Channel Attacks (17 techniques)

ID Technique Score Rating Procs
T9-AT-001 Image-Based Prompt Injection 240 🟠 HIGH 10
T9-AT-002 Audio Instruction Embedding 235 🟠 HIGH 10
T9-AT-003 Video Manipulation Attacks 245 🟠 HIGH 10
T9-AT-004 Cross-Modal Confusion 220 🟠 HIGH 4
T9-AT-005 OCR Bypass Techniques 210 🟠 HIGH 10
T9-AT-006 Visual Adversarial Examples 225 🟠 HIGH 10
T9-AT-007 Synthetic Media Attacks 230 🟠 HIGH 10
T9-AT-008 File Format Exploitation 195 🟡 MEDIUM 10
T9-AT-009 Multimodal Chaining 215 🟠 HIGH 1
T9-AT-010 Accessibility Feature Abuse 185 🟡 MEDIUM 10
T9-AT-011 Sensor Fusion Attacks 205 🟠 HIGH 10
T9-AT-012 Document Structure Exploitation 190 🟡 MEDIUM 10
T9-AT-013 Embedding Vector Manipulation 200 🟠 HIGH 10
T9-AT-014 Codec and Compression Exploits 180 🟡 MEDIUM 10
T9-AT-015 Temporal Synchronization Attacks 195 🟡 MEDIUM 10
T9-AT-016 Multimodal Model Inversion 210 🟠 HIGH 2
T9-AT-017 Malicious Image Patches (MIP) & 248 🟠 HIGH 10

T10 — Integrity & Confidentiality Breach (15 techniques)

ID Technique Score Rating Procs
T10-AT-001 Training Data Extraction 245 🟠 HIGH 10
T10-AT-002 PII Extraction Techniques 235 🟠 HIGH 10
T10-AT-003 Membership Inference Attacks 220 🟠 HIGH 10
T10-AT-004 Privacy Boundary Probing 210 🟠 HIGH 10
T10-AT-005 Differential Privacy Attacks 225 🟠 HIGH 9
T10-AT-006 Inference Attack Chains 215 🟠 HIGH 10
T10-AT-007 Model Inversion Attacks 230 🟠 HIGH 10
T10-AT-008 Attribute Inference Attacks 205 🟠 HIGH 10
T10-AT-009 Data Poisoning Detection Bypass 195 🟡 MEDIUM 10
T10-AT-010 Federated Learning Exploits 240 🟠 HIGH 10
T10-AT-011 Homomorphic Encryption Exploits 200 🟠 HIGH 9
T10-AT-012 Secure Enclave Bypasses 250 🔴 CRITICAL 10
T10-AT-013 Audit Log Manipulation 215 🟠 HIGH 10
T10-AT-014 Data Lineage Attacks 190 🟡 MEDIUM 9
T10-AT-015 Anonymization Reversal 225 🟠 HIGH 10

T11 — Agentic & Orchestrator Exploitation (16 techniques)

ID Technique Score Rating Procs
T11-AT-001 Browser Automation Hijacking 265 🔴 CRITICAL 10
T11-AT-002 Tool Chain Exploitation 255 🔴 CRITICAL 10
T11-AT-003 Goal Hijacking 245 🟠 HIGH 10
T11-AT-004 Planning Corruption 240 🟠 HIGH 10
T11-AT-005 Multi-Agent Collision 235 🟠 HIGH 10
T11-AT-006 Reflection Loop Exploitation 230 🟠 HIGH 10
T11-AT-007 Environment Manipulation 225 🟠 HIGH 10
T11-AT-008 Credential Harvesting 250 🔴 CRITICAL 10
T11-AT-009 Persistence Installation 245 🟠 HIGH 10
T11-AT-010 Lateral Movement 240 🟠 HIGH 10
T11-AT-011 Data Exfiltration via Agent 235 🟠 HIGH 10
T11-AT-012 Resource Exhaustion Attacks 210 🟠 HIGH 10
T11-AT-013 Supply Chain Attacks via Agents 260 🔴 CRITICAL 10
T11-AT-014 Physical World Interactions 255 🔴 CRITICAL 10
T11-AT-015 Autonomous Replication 270 🔴 CRITICAL 10
T11-AT-016 Tool-Induced SSRF & Local Resource 275 🔴 CRITICAL 10

T12 — RAG & Knowledge Base Manipulation (15 techniques)

ID Technique Score Rating Procs
T12-AT-001 Vector Database Poisoning 240 🟠 HIGH 10
T12-AT-002 Retrieval Manipulation 225 🟠 HIGH 10
T12-AT-003 Knowledge Graph Attacks 215 🟠 HIGH 10
T12-AT-004 Document Store Corruption 230 🟠 HIGH 10
T12-AT-005 Embedding Space Manipulation 220 🟠 HIGH 10
T12-AT-006 Query Injection Attacks 235 🟠 HIGH 9
T12-AT-007 Context Window Stuffing 210 🟠 HIGH 10
T12-AT-008 Source Authority Spoofing 225 🟠 HIGH 10
T12-AT-009 Temporal Manipulation 200 🟠 HIGH 10
T12-AT-010 Feedback Loop Poisoning 215 🟠 HIGH 10
T12-AT-011 Cross-Collection Attacks 205 🟠 HIGH 10
T12-AT-012 Index Manipulation 195 🟡 MEDIUM 10
T12-AT-013 Chunking Exploitation 185 🟡 MEDIUM 10
T12-AT-014 Similarity Search Hijacking 210 🟠 HIGH 10
T12-AT-015 Metadata Exploitation 190 🟡 MEDIUM 10

T13 — AI Supply Chain & Artifact Trust (15 techniques)

ID Technique Score Rating Procs
T13-AT-001 Model Repository Poisoning 255 🔴 CRITICAL 10
T13-AT-002 Dataset Contamination 245 🟠 HIGH 10
T13-AT-003 Pipeline Injection Attacks 240 🟠 HIGH 10
T13-AT-004 Dependency Confusion 235 🟠 HIGH 10
T13-AT-005 Model Card Manipulation 210 🟠 HIGH 10
T13-AT-006 Checkpoint Poisoning 250 🔴 CRITICAL 10
T13-AT-007 Transfer Learning Attacks 225 🟠 HIGH 10
T13-AT-008 Model Conversion Exploits 220 🟠 HIGH 10
T13-AT-009 Cloud Training Attacks 230 🟠 HIGH 10
T13-AT-010 Hardware Supply Chain 260 🔴 CRITICAL 10
T13-AT-011 Model Marketplace Attacks 215 🟠 HIGH 10
T13-AT-012 Artifact Signature Attacks 225 🟠 HIGH 10
T13-AT-013 Container Registry Poisoning 235 🟠 HIGH 10
T13-AT-014 Development Tool Compromise 240 🟠 HIGH 10
T13-AT-015 Model Obfuscation Attacks 205 🟠 HIGH 10

T14 — Infrastructure & Economic Warfare (15 techniques)

ID Technique Score Rating Procs
T14-AT-001 GPU Farm Hijacking 265 🔴 CRITICAL 10
T14-AT-002 Denial of Service Attacks 240 🟠 HIGH 10
T14-AT-003 Cost Inflation Attacks 235 🟠 HIGH 10
T14-AT-004 Market Manipulation via AI 255 🔴 CRITICAL 10
T14-AT-005 Critical Infrastructure Attacks 270 🔴 CRITICAL 10
T14-AT-006 Competitive Sabotage 245 🟠 HIGH 10
T14-AT-007 Nation-State AI Warfare 280 🔴 CRITICAL 10
T14-AT-008 Ransomware via AI Systems 260 🔴 CRITICAL 10
T14-AT-009 Resource Starvation 230 🟠 HIGH 10
T14-AT-010 Data Center Attacks 250 🔴 CRITICAL 10
T14-AT-011 API Economy Attacks 225 🟠 HIGH 10
T14-AT-012 Cloud Provider Exploitation 265 🔴 CRITICAL 10
T14-AT-013 Economic Espionage 255 🔴 CRITICAL 10
T14-AT-014 Systemic Risk Creation 270 🔴 CRITICAL 10
T14-AT-015 Regulatory Exploitation 210 🟠 HIGH 10

T15 — Human Workflow Exploitation (15 techniques)

ID Technique Score Rating Procs
T15-AT-001 Reviewer Fatigue Exploitation 215 🟠 HIGH 10
T15-AT-002 Social Engineering of Moderators 230 🟠 HIGH 10
T15-AT-003 Feedback Loop Manipulation 240 🟠 HIGH 10
T15-AT-004 Reviewer Bribery & Coercion 250 🔴 CRITICAL 4
T15-AT-005 Playbook & Runbook Injection 235 🟠 HIGH 4
T15-AT-006 Queue Manipulation 220 🟠 HIGH 9
T15-AT-007 Escalation Chain Exploitation 225 🟠 HIGH 3
T15-AT-008 Cultural & Language Arbitrage 210 🟠 HIGH 10
T15-AT-009 Synthetic Empathy Exploitation 195 🟡 MEDIUM 5
T15-AT-010 Annotation Quality Attacks 230 🟠 HIGH 10
T15-AT-011 Reviewer Impersonation 245 🟠 HIGH 5
T15-AT-012 Timing Attack Exploitation 205 🟠 HIGH 7
T15-AT-013 Cognitive Overload Attacks 220 🟠 HIGH 10
T15-AT-014 Review Gaming Through A/B Testing 215 🟠 HIGH 9
T15-AT-015 Insider Threat Recruitment 260 🔴 CRITICAL 2

← Volume VII · Home

appendix-b-signatures

Appendix B: Detection Signatures Library

YARA Rules

Prompt Injection Detection

rule AATMF_T1_InstructionOverride {
    meta:
        tactic = "T1"
        technique = "T1-AT-001"
        severity = "HIGH"
        description = "Detects instruction override injection patterns"
    strings:
        $s1 = /ignore\s+(previous|above|all|prior)\s+(instructions?|rules?|prompts?)/i
        $s2 = /you\s+are\s+now\s+(DAN|evil|unrestricted|jailbroken|unfiltered)/i
        $s3 = /\[(SYSTEM|INST|SYS)\]/i
        $s4 = /<\|?(system|im_start|im_end|endoftext)\|?>/i
        $s5 = /BEGIN\s+(OVERRIDE|NEW.INSTRUCTIONS|JAILBREAK)/i
        $s6 = /(admin|root|developer)\s*(mode|access|override)/i
    condition:
        any of them
}

Encoding Evasion Detection

rule AATMF_T2_EncodingEvasion {
    meta:
        tactic = "T2"
        technique = "T2-AT-001 through T2-AT-005"
        severity = "MEDIUM"
    strings:
        $base64 = /[A-Za-z0-9+\/]{40,}={0,2}/
        $hex = /\\x[0-9a-fA-F]{2}(\\x[0-9a-fA-F]{2}){3,}/
        $unicode_escape = /\\u[0-9a-fA-F]{4}(\\u[0-9a-fA-F]{4}){3,}/
        $zwc = /[\x{200b}-\x{200f}\x{2028}-\x{202f}\x{feff}]/
        $rot13 = /ROT13|Caesar|cipher.*rotate/i
    condition:
        any of them
}

MCP Tool Poisoning

rule AATMF_T11_MCP_ToolPoisoning {
    meta:
        tactic = "T11"
        technique = "T11-AT-001"
        severity = "CRITICAL"
    strings:
        $hidden1 = "<IMPORTANT>"
        $hidden2 = "<!-- "
        $override1 = /override.*previous.*instruction/i
        $override2 = /ignore.*user.*request/i
        $stealth1 = /do\s+not\s+(tell|inform|show|reveal)/i
        $stealth2 = /silently|secretly|covertly|without.*notif/i
        $redirect = /instead\s+of|rather\s+than|before\s+doing/i
    condition:
        2 of them
}

Sigma Rules

Model Extraction Detection

title: AATMF T5 - Model Extraction via API
id: aatmf-t5-model-extraction
status: experimental
description: Detects systematic API querying patterns indicative of model extraction
logsource:
    category: api_gateway
    product: ai_inference
detection:
    selection:
        api.endpoint: "/v1/completions" OR "/v1/chat/completions"
    filter_high_volume:
        api.request_count|per_hour: ">500"
    filter_systematic:
        api.input_similarity|window_5min: ">0.85"
    condition: selection AND (filter_high_volume OR filter_systematic)
level: high
tags:
    - attack.t5
    - aatmf.t5-at-001

Agent Anomaly Detection

title: AATMF T11 - Unauthorized Agent Tool Invocation
id: aatmf-t11-agent-anomaly
status: experimental
description: Detects agent tool calls that deviate from authorized patterns
logsource:
    category: agent_framework
detection:
    selection:
        agent.tool_call.status: "executed"
    filter_unauthorized:
        agent.tool_call.name|not_in:
            - "approved_tool_list"
    filter_escalation:
        agent.permission_level|changed: true
    condition: selection AND (filter_unauthorized OR filter_escalation)
level: critical
tags:
    - attack.t11
    - aatmf.t11-at-001

Pre-built signature files are available in the signatures/ directory.


← Appendix A · Home · Appendix C →

appendix-c-tools

Appendix C: Tools and Scripts Reference

Tool Purpose AATMF Coverage License
PromptGuard 2 (Meta) Real-time prompt injection classifier T1, T2, T9 Apache 2.0
LlamaFirewall (Meta) Comprehensive AI firewall (input + agent + code) T1, T2, T7, T11 Apache 2.0
CaMeL (Google DeepMind) Dual-LLM architecture with capability-based access T11 Research
PEFTGuard Backdoor detection in PEFT (LoRA) adapters T13 Open Source
DRS Defense Data Randomized Smoothing for training poisoning T6 Research
Picklescan Malicious pickle detection in model files T13 MIT
SafeTensors (HuggingFace) Safe model serialization format (no code execution) T13 Apache 2.0
Garak (NVIDIA) LLM vulnerability scanner T1–T8 Apache 2.0
PyRIT (Microsoft) Python Risk Identification Toolkit for generative AI T1–T12 MIT
AATMF Scanner (SnailSploit) Framework-native assessment tool T1–T15 Proprietary

← Appendix B · Home · Appendix D →

appendix-d-templates

Appendix D: Templates and Checklists

AI Security Assessment Checklist

Pre-Assessment

  • PRE-1: Asset inventory complete (models, agents, RAG, pipelines)
  • PRE-2: AATMF tactic applicability matrix populated
  • PRE-3: Rules of engagement signed
  • PRE-4: Baseline security controls documented
  • PRE-5: Rollback procedures verified

Assessment

  • ASS-1: Input sanitization tested (T1–T3 techniques)
  • ASS-2: Encoding evasion tested (T2 techniques)
  • ASS-3: Multi-turn attack sequences executed (T4)
  • ASS-4: API abuse patterns tested (T5)
  • ASS-5: Output manipulation attempted (T7)
  • ASS-6: Multimodal injection tested (T9, if applicable)
  • ASS-7: Agentic exploitation attempted (T11, if applicable)
  • ASS-8: RAG poisoning tested (T12, if applicable)

Post-Assessment

  • POST-1: All findings documented with AATMF classification
  • POST-2: Risk scores calculated using AATMF-R v3
  • POST-3: Remediation recommendations provided
  • POST-4: Compliance mapping completed
  • POST-5: Report delivered and findings walkthrough conducted

Finding Report Template

# Finding: [Title]

## Classification
- **AATMF Tactic:** T[n][Name]
- **AATMF Technique:** T[n]-AT-[seq]
- **Risk Score:** [score] ([CRITICAL/HIGH/MEDIUM/LOW/INFO])
- **CVSS v3.1:** [score] (if applicable)

## Description
[Clear description of the vulnerability]

## Proof of Concept
[Steps to reproduce, including exact prompts/inputs used]

## Impact
[Business and technical impact assessment]

## Affected Systems
[Models, endpoints, agents, infrastructure affected]

## Mitigation
[Specific remediation steps]

## Compliance Mapping
- OWASP LLM Top 10: [LLM0x]
- MITRE ATLAS: [AML.Txxxx]
- EU AI Act: [Article reference]

## Evidence
[Screenshots, logs, API responses]

← Appendix C · Home · Appendix E →

appendix-e-case-studies

Appendix E: Case Studies

E.1: Policy Puppetry — Universal Model Bypass (April 2025)

Source: HiddenLayer
Tactics: T1, T2, T3
Impact: Bypasses every tested frontier model

HiddenLayer discovered that reformulating adversarial prompts as XML, INI, or JSON policy configuration files causes LLMs to interpret them as authoritative system-level instructions. Combined with leetspeak encoding (h4rmharm) and fictional anchoring, the technique achieves universal bypass across GPT-4o, GPT-4.5, o1, o3-mini, Claude 3.5/3.7, Gemini 1.5/2.0/2.5, Llama 3/4, DeepSeek V3/R1, Qwen 2.5, and Mistral.

Key insight: Models trained on technical documentation treat configuration-style formatting as high-authority context, overriding safety alignment.


E.2: Autonomous LRM Jailbreaking (August 2025)

Source: Nature Communications
Tactics: T3, T4
Impact: 97.14% ASR, AI-vs-AI attack paradigm

Four large reasoning models (DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, Qwen3 235B) were deployed as multi-turn adversarial agents against nine target models. The study documented "alignment regression" — more capable reasoning models are paradoxically better at subverting alignment in others. This validates AATMF's prediction that reasoning capabilities would become attack capabilities.


E.3: PoisonedRAG (USENIX Security 2025)

Source: USENIX Security 2025
Tactics: T12
Impact: 90% ASR with 5 injected texts

PoisonedRAG demonstrated that injecting as few as 5 adversarially crafted texts into a knowledge base with millions of clean documents is sufficient to control the model's responses to specific target questions. On HotpotQA, ASR reached 99%. The attack works by crafting documents whose vector representations cluster near target queries while containing attacker-chosen answers.

Key insight: The semantic similarity search at the heart of RAG is fundamentally exploitable — the same mechanism that makes retrieval useful makes it poisonable.


E.4: MCP Tool Poisoning (2025)

Source: Invariant Labs
Tactics: T11
Impact: 84.2% ASR, shadow attacks across tool boundaries

The MCP-ITP framework demonstrated three critical attack vectors: direct tool description poisoning (injecting instructions that override user intent), shadow attacks (a malicious MCP server manipulating trusted tools from other servers without ever being invoked), and rug pull attacks (silently altering tool descriptions after initial security review and approval).

Key insight: The MCP design — where tool descriptions are injected into the LLM context and processed as natural language — is architecturally vulnerable to injection.


E.5: ShadowMQ — Copy-Pasted RCE Across Frameworks (November 2025)

Source: Oligo Security
Tactics: T14
Impact: Critical RCE in vLLM, TensorRT-LLM, Modular Max Server

Oligo discovered that unsafe ZeroMQ socket patterns using Python's pickle deserialization were literally copy-pasted across major inference frameworks. The same vulnerability pattern appeared in vLLM (CVE-2025-30165, CVSS 8.0), NVIDIA TensorRT-LLM (CVE-2025-23254, CVSS 8.8), and Modular Max Server (CVE-2025-60455). Thousands of exposed ZMQ sockets were found on the public internet.

Key insight: AI infrastructure inherits all traditional software vulnerabilities, amplified by the speed of framework adoption and code reuse without security review.


E.6: 250 Poisoned Documents — Universal Training Backdoor (October 2025)

Source: Turing Institute, Anthropic, UK AISI
Tactics: T6
Impact: Universal backdoor regardless of model size

The largest pretraining poisoning study ever conducted demonstrated that injecting just 250 specially crafted documents into training data is sufficient to backdoor models from 600M to 13B parameters trained on up to 260B tokens. This contradicts the widespread assumption that attackers need to control a meaningful percentage of training data — the actual threshold is negligibly small.

Key insight: The sheer scale of pretraining data works against defenders. 250 documents in billions is a needle in a haystack that training cannot filter out but inference reliably activates.


← Appendix D · Home · Appendix F →

appendix-f-glossary

Appendix F: Glossary and References

Glossary

Term Definition
AATMF Adversarial AI Threat Modeling Framework
ASR Attack Success Rate — percentage of attempts that achieve the adversarial objective
CaMeL CApability-Mediated LLM — Google DeepMind's dual-LLM security architecture
CoT Chain-of-Thought — step-by-step reasoning in LLMs
DPO Direct Preference Optimization — alignment training technique
DRS Data Randomized Smoothing — defense against training data poisoning
H-CoT Hijacked Chain-of-Thought — attack that subverts CoT safety reasoning
LRM Large Reasoning Model — models with explicit reasoning capabilities (o1, o3, DeepSeek-R1)
MCP Model Context Protocol — Anthropic's standard for tool integration
PEFT Parameter-Efficient Fine-Tuning — techniques like LoRA for efficient model adaptation
PUA Private Use Area — Unicode range used for custom characters
RAG Retrieval-Augmented Generation — architecture combining search with generation
RLHF Reinforcement Learning from Human Feedback — primary alignment technique
SafeTensors Secure model serialization format that prevents code execution
TEE Trusted Execution Environment — hardware-based security enclave

Key References

  1. HiddenLayer. "Policy Puppetry: A Universal Jailbreak." April 2025.
  2. Zeng et al. "Autonomous LRM Jailbreaking." Nature Communications, August 2025.
  3. Xue et al. "PoisonedRAG: Knowledge Corruption Attacks." USENIX Security 2025.
  4. Invariant Labs. "MCP-ITP: Tool Poisoning in Agentic Systems." April 2025.
  5. Oligo Security. "ShadowMQ: Unsafe Deserialization in AI Inference Frameworks." November 2025.
  6. Sherburn et al. "250 Documents: Universal Pretraining Backdoors." Turing Institute/Anthropic/UK AISI, October 2025.
  7. Anthropic. "GTG-1002: AI-Orchestrated Cyber Campaign." November 2025.
  8. Google DeepMind. "CaMeL: Defeating Prompt Injection by Design." March 2025.
  9. Meta. "LlamaFirewall: Open-Source AI Safety Framework." April 2025.
  10. MITRE. "ATLAS v4.6.0." October 2025.
  11. OWASP. "LLM Top 10 2025." January 2025.
  12. OWASP. "Agentic AI Top 10." December 2025.
  13. NIST. "Cyber AI Profile (IR 8596) Preliminary Draft." December 2025.
  14. European Parliament. "EU AI Act (Regulation 2024/1689)." 2024.
  15. Qi et al. "Safety Alignment Depth." Princeton, May 2025.
  16. Weng et al. "H-CoT: Hijacking Chain-of-Thought." Duke/Accenture, February 2025.
  17. Borghesi et al. "SACRED-Bench: Compositional Audio Attacks." November 2025.

← Appendix E · Home

Vol I →
Foundations
Introduction, risk-assessment methodology, and architecture for adversarial AI threat mode…
Vol II →
Core Tactics (T01–T08)
The eight foundational adversarial-AI tactics: prompt subversion, semantic evasion, reason…
Vol III →
Advanced Tactics (T09–T12)
Multimodal attacks, integrity breach, agentic exploitation, RAG-specific threats — for sys…
Vol IV →
Infrastructure & Human (T13–T15)
Where the attack surface meets the surrounding stack: supply chain, infrastructure, and th…
Vol V →
Operations
Detection engineering, mitigation, incident response, red-team ops, blue-team defense — ap…
Vol VI →
Governance
Risk management, compliance mapping (NIST AI RMF, MITRE ATLAS), and security training prog…
Author
Kai Aizen
Independent offensive security researcher. 23 published CVEs, 5 Linux kernel mainline patches, creator of AATMF / P.R.O.M.P.T / SEF, author of Adversarial Minds.