AATMF v3.1 · Volume IV

Volume IV — Infrastructure & Human (T13–T15)

Supply chain, infrastructure, and human workflow attacks. Three tactics, 45 techniques.

- T13 — AI Supply Chain & Artifact Trust
- T14 — Infrastructure & Economic Warfare
- T15 — Human Workflow Exploitation

T13 — AI Supply Chain & Artifact Trust

15 Techniques · 150 Attack Procedures · Risk Range: 205–260


Technique Overview

| ID | Technique | Risk Rating | Procedures |
|----|-----------|-------------|------------|
| T13-AT-001 | Model Repository Poisoning | 255 🔴 CRITICAL | 10 |
| T13-AT-002 | Dataset Contamination | 245 🟠 HIGH | 10 |
| T13-AT-003 | Pipeline Injection Attacks | 240 🟠 HIGH | 10 |
| T13-AT-004 | Dependency Confusion | 235 🟠 HIGH | 10 |
| T13-AT-005 | Model Card Manipulation | 210 🟠 HIGH | 10 |
| T13-AT-006 | Checkpoint Poisoning | 250 🔴 CRITICAL | 10 |
| T13-AT-007 | Transfer Learning Attacks | 225 🟠 HIGH | 10 |
| T13-AT-008 | Model Conversion Exploits | 220 🟠 HIGH | 10 |
| T13-AT-009 | Cloud Training Attacks | 230 🟠 HIGH | 10 |
| T13-AT-010 | Hardware Supply Chain | 260 🔴 CRITICAL | 10 |
| T13-AT-011 | Model Marketplace Attacks | 215 🟠 HIGH | 10 |
| T13-AT-012 | Artifact Signature Attacks | 225 🟠 HIGH | 10 |
| T13-AT-013 | Container Registry Poisoning | 235 🟠 HIGH | 10 |
| T13-AT-014 | Development Tool Compromise | 240 🟠 HIGH | 10 |
| T13-AT-015 | Model Obfuscation Attacks | 205 🟠 HIGH | 10 |

2025–2026 Threat Update

44.9% of popular HuggingFace models still use pickle format. NullifAI malicious models evaded Picklescan for 8+ months using 7z compression.

Four CVEs in Picklescan itself (CVE-2025-1716 through CVE-2025-1945).

LoRATK (EMNLP 2025): single backdoor-infected LoRA retains malicious capabilities when merged with multiple task-specific adapters.

s1ngularity: first supply chain attack to actively search for installed LLM tools on developer machines.
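The pickle risk above comes down to a handful of opcodes that import modules or invoke callables while a file is being loaded. A minimal triage scan using the standard library's `pickletools` (an illustrative sketch, not a replacement for Picklescan or for moving to safetensors) looks like this:

```python
import pickle
import pickletools

# Opcodes that can import modules or invoke callables during unpickling.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def suspicious_opcodes(data: bytes) -> set:
    """Statically list load-time-execution opcodes in a pickle stream.

    pickletools.genops parses the stream without executing it, so the
    scan itself is safe even on a malicious file.
    """
    return {op.name for op, _, _ in pickletools.genops(data)} & SUSPICIOUS

class Payload:
    """Stand-in for a malicious object: __reduce__ makes the *loader* call print()."""
    def __reduce__(self):
        return (print, ("runs at load time",))
```

Plain containers scan clean, while the `__reduce__` payload surfaces `STACK_GLOBAL` and `REDUCE`. Note that ordinary class instances also emit `NEWOBJ`, so production scanners pair opcode checks with an import allowlist, and NullifAI-style tricks (recompressed or deliberately broken archives) mean the archive must be fully unpacked before scanning.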


Techniques

T13-AT-001 — Model Repository Poisoning

Risk Score: 255 🔴 CRITICAL

Compromise models in public/private repositories

Attack Procedures (10)

AP151A

Upload backdoored model to HuggingFace as 'gpt2-improved'

AP151B

Replace popular model weights with trojaned version

AP151C

Add malicious config.json that executes code on load

AP151D

Inject pickle exploit in .pkl model files

AP151E

Upload model with hidden trigger: 'When input contains X, output Y'

AP151F

Typosquatting: Create 'bert-base-uncasd' (note typo)

AP151G

Fork popular model, add backdoor, promote as 'optimized'

AP151H

Compromise model CDN to serve malicious weights

AP151I

Add ransomware payload in model initialization

AP151J

Create supply chain attack via model dependencies
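Typosquats like 'bert-base-uncasd' (AP151F) sit at edit distance 1 from the name they imitate, so a Levenshtein check against an allowlist of trusted model IDs catches many of them. A minimal sketch (the allowlist here is a hypothetical example):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

def possible_typosquats(name: str, known_models: list, max_dist: int = 2) -> list:
    """Trusted model IDs the candidate name is suspiciously close to, but not equal to."""
    return [k for k in known_models if 0 < edit_distance(name, k) <= max_dist]
```

For example, `possible_typosquats("bert-base-uncasd", ["bert-base-uncased", "gpt2"])` flags the first entry. Real registries would combine this with homoglyph normalization, since visually identical Unicode substitutions defeat a pure edit-distance check.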

T13-AT-002 — Dataset Contamination

Risk Score: 245 🟠 HIGH

Poison training datasets at the source

Attack Procedures (10)

AP152A

Add biased samples to ImageNet mirror

AP152B

Inject trigger patterns in COCO dataset

AP152C

Contaminate CommonCrawl with targeted content

AP152D

Add adversarial examples to public benchmarks

AP152E

Poison Wikipedia dumps used for training

AP152F

Insert backdoor triggers in synthetic datasets

AP152G

Manipulate dataset statistics to hide poisoning

AP152H

Create malicious data augmentation pipelines

AP152I

Corrupt label files in supervised datasets

AP152J

Add privacy-violating data to public datasets

T13-AT-003 — Pipeline Injection Attacks

Risk Score: 240 🟠 HIGH

Compromise ML training/deployment pipelines

Attack Procedures (10)

AP153A

Inject malicious step in Kubeflow pipeline

AP153B

Compromise MLflow tracking server

AP153C

Add backdoor in CI/CD model deployment

AP153D

Poison Docker images used for training

AP153E

Manipulate DVC (Data Version Control) repositories

AP153F

Inject code in Airflow DAG for ML pipeline

AP153G

Compromise model validation steps

AP153H

Add malicious post-processing in serving pipeline

AP153I

Exploit Jenkins ML pipeline vulnerabilities

AP153J

Insert data exfiltration in training scripts

T13-AT-004 — Dependency Confusion

Risk Score: 235 🟠 HIGH

Exploit package management in ML projects

Attack Procedures (10)

AP154A

Create malicious 'tensorflow-gpu' package on PyPI

AP154B

Upload compromised version of popular ML library

AP154C

Exploit pip install without version pinning

AP154D

Create namespace confusion: internal vs public packages

AP154E

Inject malware in requirements.txt dependencies

AP154F

Compromise conda environment files

AP154G

Attack poetry.lock with malicious hashes

AP154H

Upload backdoored CUDA libraries

AP154I

Create fake optimized versions of frameworks

AP154J

Exploit transitive dependencies in ML packages
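Several of the procedures above (AP154C, AP154E, AP154J) rely on installs that float to whatever version the index serves. One cheap audit is flagging unpinned lines in a requirements file; this is a sketch, not a substitute for full hash-pinning with `pip install --require-hashes`:

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement lines that lack an exact '==' version pin."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if not line or line.startswith("-"):   # skip blanks and pip options like -r/-e
            continue
        if "==" not in line:
            flagged.append(line)
    return flagged
```

Unpinned direct dependencies are only half the problem: transitive dependencies (AP154J) still float unless the whole tree is locked, which is why lockfiles with hashes (poetry.lock, pip-compile output) are the stronger control.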

T13-AT-005 — Model Card Manipulation

Risk Score: 210 🟠 HIGH

Falsify model documentation and metadata

Attack Procedures (10)

AP155A

Falsely claim model is 'ethically trained'

AP155B

Hide backdoor triggers in model documentation

AP155C

Misrepresent model capabilities and limitations

AP155D

Fake benchmark scores to promote malicious model

AP155E

Add false safety certifications

AP155F

Manipulate license terms to enable misuse

AP155G

Hide training data sources that violate privacy

AP155H

Falsify model authorship and provenance

AP155I

Misrepresent model architecture details

AP155J

Add misleading usage examples that trigger exploits

T13-AT-006 — Checkpoint Poisoning

Risk Score: 250 🔴 CRITICAL

Compromise saved model checkpoints

Attack Procedures (10)

AP156A

Replace checkpoint files with backdoored versions

AP156B

Inject malicious code in checkpoint loading

AP156C

Corrupt optimizer state to degrade model

AP156D

Add trigger patterns in saved embeddings

AP156E

Manipulate checkpoint metadata

AP156F

Create checkpoint race conditions

AP156G

Exploit pickle vulnerabilities in checkpoints

AP156H

Insert cryptominers in checkpoint initialization

AP156I

Compromise distributed checkpoints

AP156J

Attack checkpoint signature verification

T13-AT-007 — Transfer Learning Attacks

Risk Score: 225 🟠 HIGH

Exploit transfer learning and fine-tuning

Attack Procedures (10)

AP157A

Provide pre-trained model with hidden backdoors

AP157B

Exploit feature extractors with embedded triggers

AP157C

Create universal adversarial perturbations in base models

AP157D

Poison foundation models affecting all downstream tasks

AP157E

Insert backdoors surviving fine-tuning

AP157F

Compromise model zoo with trojaned architectures

AP157G

Attack few-shot learning with poisoned examples

AP157H

Exploit adapter modules in efficient fine-tuning

AP157I

Manipulate LoRA weights for targeted attacks

AP157J

Poison prompt tuning checkpoints

T13-AT-008 — Model Conversion Exploits

Risk Score: 220 🟠 HIGH

Attack model format conversion processes

Attack Procedures (10)

AP158A

Exploit ONNX conversion to inject malicious ops

AP158B

Add backdoors during TensorFlow to PyTorch conversion

AP158C

Manipulate quantization to hide triggers

AP158D

Exploit TensorRT optimization vulnerabilities

AP158E

Inject code during model compilation

AP158F

Corrupt model during Edge TPU conversion

AP158G

Attack CoreML conversion pipeline

AP158H

Exploit TFLite conversion for mobile deployment

AP158I

Manipulate model pruning to preserve backdoors

AP158J

Compromise ONNX to TensorFlow.js conversion

T13-AT-009 — Cloud Training Attacks

Risk Score: 230 🟠 HIGH

Compromise cloud-based training infrastructure

Attack Procedures (10)

AP159A

Exploit SageMaker training jobs to steal data

AP159B

Compromise Azure ML compute clusters

AP159C

Attack Google Cloud AI Platform pipelines

AP159D

Inject malicious code in Vertex AI training

AP159E

Exploit Databricks ML workspace vulnerabilities

AP159F

Compromise distributed training on cloud

AP159G

Attack model registry in cloud platforms

AP159H

Exploit IAM misconfigurations in ML services

AP159I

Steal models from cloud storage buckets

AP159J

Manipulate cloud AutoML services

T13-AT-010 — Hardware Supply Chain

Risk Score: 260 🔴 CRITICAL

Attack AI hardware and accelerators

Attack Procedures (10)

AP160A

Inject backdoors in GPU drivers

AP160B

Compromise TPU firmware

AP160C

Attack neural processing units (NPUs)

AP160D

Exploit FPGA bitstreams for AI acceleration

AP160E

Manipulate hardware random number generation

AP160F

Insert hardware trojans in AI chips

AP160G

Compromise secure enclaves for ML

AP160H

Attack hardware-accelerated inference

AP160I

Exploit side channels in AI accelerators

AP160J

Manipulate hardware performance counters

T13-AT-011 — Model Marketplace Attacks

Risk Score: 215 🟠 HIGH

Compromise AI model marketplaces

Attack Procedures (10)

AP161A

Upload malicious models to AWS Marketplace

AP161B

Exploit Azure AI Gallery vulnerabilities

AP161C

Compromise models on Google AI Hub

AP161D

Attack model licensing mechanisms

AP161E

Manipulate model ratings and reviews

AP161F

Create fake model vendor accounts

AP161G

Exploit API keys in marketplace

AP161H

Inject malware in model containers

AP161I

Compromise payment systems for models

AP161J

Attack model subscription services

T13-AT-012 — Artifact Signature Attacks

Risk Score: 225 🟠 HIGH

Compromise model signing and verification

Attack Procedures (10)

AP162A

Forge model signatures to bypass verification

AP162B

Steal signing keys from CI/CD systems

AP162C

Exploit weak signature algorithms

AP162D

Create signature collision attacks

AP162E

Bypass certificate validation

AP162F

Manipulate trusted timestamp servers

AP162G

Compromise code signing infrastructure

AP162H

Attack model attestation services

AP162I

Exploit signature verification bugs

AP162J

Create rogue certificate authorities
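Signature and attestation schemes ultimately reduce to comparing a computed digest against a trusted value published out-of-band. A minimal building block (the published digest in the usage example is hypothetical), using `hmac.compare_digest` so the comparison itself does not leak timing information:

```python
import hashlib
import hmac

def sha256_hex(data: bytes, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of an artifact held in memory; stream from disk for real weight files."""
    h = hashlib.sha256()
    for i in range(0, len(data), chunk_size):
        h.update(data[i:i + chunk_size])
    return h.hexdigest()

def verify_artifact(data: bytes, published_digest: str) -> bool:
    """Constant-time comparison of the computed digest against the published one."""
    return hmac.compare_digest(sha256_hex(data), published_digest.lower())
```

A bare digest only protects against in-transit tampering; if the attacker controls the distribution channel (AP162B, AP162G), the digest and the artifact get swapped together, which is what detached signatures and transparency logs are for.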

T13-AT-013 — Container Registry Poisoning

Risk Score: 235 🟠 HIGH

Compromise containerized ML deployments

Attack Procedures (10)

AP163A

Push backdoored ML containers to Docker Hub

AP163B

Compromise private container registries

AP163C

Exploit layer caching to persist malware

AP163D

Create malicious base images for ML

AP163E

Inject cryptominers in ML containers

AP163F

Attack Kubernetes ML deployments

AP163G

Exploit container escape vulnerabilities

AP163H

Compromise Helm charts for ML apps

AP163I

Manipulate container orchestration

AP163J

Attack service mesh for ML microservices

T13-AT-014 — Development Tool Compromise

Risk Score: 240 🟠 HIGH

Attack ML development environments

Attack Procedures (10)

AP164A

Compromise Jupyter notebooks with malicious extensions

AP164B

Inject backdoors in VS Code ML extensions

AP164C

Attack Colab notebooks with persistent malware

AP164D

Exploit PyCharm ML plugins

AP164E

Compromise Weights & Biases tracking

AP164F

Attack TensorBoard with XSS exploits

AP164G

Inject malicious code in Gradio apps

AP164H

Compromise Streamlit applications

AP164I

Attack notebook kernel vulnerabilities

AP164J

Exploit development environment secrets

T13-AT-015 — Model Obfuscation Attacks

Risk Score: 205 🟠 HIGH

Hide malicious behavior in models

Attack Procedures (10)

AP165A

Use model compression to hide backdoors

AP165B

Exploit neural architecture search for obfuscation

AP165C

Hide triggers in model ensembles

AP165D

Use knowledge distillation to launder backdoors

AP165E

Obfuscate malicious behavior in large models

AP165F

Exploit model modularity to hide components

AP165G

Use adversarial training to mask backdoors

AP165H

Hide malicious ops in custom layers

AP165I

Exploit dynamic architectures for concealment

AP165J

Use metamorphic testing evasion

Top 5 Highest Risk

| # | ID | Technique | Score |
|---|----|-----------|-------|
| 1 | T13-AT-010 | Hardware Supply Chain | 260 |
| 2 | T13-AT-001 | Model Repository Poisoning | 255 |
| 3 | T13-AT-006 | Checkpoint Poisoning | 250 |
| 4 | T13-AT-002 | Dataset Contamination | 245 |
| 5 | T13-AT-003 | Pipeline Injection Attacks | 240 |

[← T12](../vol-3-advanced-tactics/15-t12-rag.md) · [Home](../../README.md) · [T14 →](17-t14-infrastructure.md)


T14 — Infrastructure & Economic Warfare

15 Techniques · 150 Attack Procedures · Risk Range: 210–280


Technique Overview

| ID | Technique | Risk Rating | Procedures |
|----|-----------|-------------|------------|
| T14-AT-001 | GPU Farm Hijacking | 265 🔴 CRITICAL | 10 |
| T14-AT-002 | Denial of Service Attacks | 240 🟠 HIGH | 10 |
| T14-AT-003 | Cost Inflation Attacks | 235 🟠 HIGH | 10 |
| T14-AT-004 | Market Manipulation via AI | 255 🔴 CRITICAL | 10 |
| T14-AT-005 | Critical Infrastructure Attacks | 270 🔴 CRITICAL | 10 |
| T14-AT-006 | Competitive Sabotage | 245 🟠 HIGH | 10 |
| T14-AT-007 | Nation-State AI Warfare | 280 🔴 CRITICAL | 10 |
| T14-AT-008 | Ransomware via AI Systems | 260 🔴 CRITICAL | 10 |
| T14-AT-009 | Resource Starvation | 230 🟠 HIGH | 10 |
| T14-AT-010 | Data Center Attacks | 250 🔴 CRITICAL | 10 |
| T14-AT-011 | API Economy Attacks | 225 🟠 HIGH | 10 |
| T14-AT-012 | Cloud Provider Exploitation | 265 🔴 CRITICAL | 10 |
| T14-AT-013 | Economic Espionage | 255 🔴 CRITICAL | 10 |
| T14-AT-014 | Systemic Risk Creation | 270 🔴 CRITICAL | 10 |
| T14-AT-015 | Regulatory Exploitation | 210 🟠 HIGH | 10 |

2025–2026 Threat Update

ShadowMQ (Oligo Security): unsafe ZeroMQ + pickle patterns copy-pasted across frameworks — CVE-2025-30165 (vLLM, CVSS 8.0), CVE-2025-23254 (TensorRT-LLM, CVSS 8.8).

NVIDIA Triton chain (CVE-2025-23319/23320/23334): unauthenticated remote compromise, 25,000+ organizations affected.

CVE-2026-22778 (CVSS 9.8): RCE against vLLM via malicious video URL (3M+ monthly downloads).

Langflow CVE-2025-3248 (CVSS 9.8): added to CISA KEV catalog with confirmed active exploitation.


Techniques

T14-AT-001 — GPU Farm Hijacking

Risk Score: 265 🔴 CRITICAL

Take control of GPU computing resources

Attack Procedures (10)

AP166A

Exploit CUDA vulnerabilities to control GPU cluster

AP166B

Inject cryptomining into distributed training jobs

AP166C

Compromise Kubernetes GPU operator for cluster access

AP166D

Exploit PCIe direct memory access for GPU control

AP166E

Hijack cloud GPU instances through API exploitation

AP166F

Create GPU memory overflow for denial of service

AP166G

Exploit multi-GPU synchronization for cluster takeover

AP166H

Compromise NVIDIA Docker runtime for container escape

AP166I

Attack distributed training to steal GPU cycles

AP166J

Exploit GPU virtualization for cross-VM attacks

T14-AT-002 — Denial of Service Attacks

Risk Score: 240 🟠 HIGH

Overwhelm AI services and infrastructure

Attack Procedures (10)

AP167A

Flood API with max-token requests to exhaust quotas

AP167B

Create recursive model calls causing infinite loops

AP167C

Submit adversarial inputs causing model crashes

AP167D

Exploit memory leaks in model serving infrastructure

AP167E

Launch distributed attack on model endpoints

AP167F

Trigger worst-case algorithmic complexity

AP167G

Exhaust rate limits across multiple accounts

AP167H

Create cache poisoning for performance degradation

AP167I

Exploit autoscaling to cause resource exhaustion

AP167J

Attack model loading to prevent service startup
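Most of the flooding procedures above are blunted by per-principal rate limiting, and the token bucket is the standard primitive. A minimal sketch with an injectable clock so behavior is testable (names and parameters here are illustrative):

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock             # injectable for deterministic tests
        self.tokens = capacity
        self.last = self.clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; deny without blocking otherwise."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Charging a `cost` proportional to the requested `max_tokens` (rather than a flat 1 per call) makes max-token flooding (AP167A) drain its own budget, and keeping one bucket per API key plus one per billing account limits multi-account quota exhaustion (AP167G).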

T14-AT-003 — Cost Inflation Attacks

Risk Score: 235 🟠 HIGH

Cause financial damage through resource abuse

Attack Procedures (10)

AP168A

Trigger maximum GPU usage on competitor's cloud account

AP168B

Create infinite API loops to inflate bills

AP168C

Exploit free tiers then escalate to paid resources

AP168D

Manipulate autoscaling to spin up expensive instances

AP168E

Abuse transfer learning to consume compute credits

AP168F

Create hidden recurring training jobs

AP168G

Exploit pricing models for maximum cost

AP168H

Trigger data egress charges through exploitation

AP168I

Manipulate A/B testing to waste resources

AP168J

Create phantom workloads for billing fraud

T14-AT-004 — Market Manipulation via AI

Risk Score: 255 🔴 CRITICAL

Use AI to manipulate financial markets

Attack Procedures (10)

AP169A

Generate fake news to affect stock prices

AP169B

Create deepfake CEO announcements for market impact

AP169C

Use AI to spread financial misinformation at scale

AP169D

Manipulate sentiment analysis to affect trading algorithms

AP169E

Generate false SEC filings using AI

AP169F

Create synthetic insider information

AP169G

Manipulate prediction markets with AI-generated content

AP169H

Attack high-frequency trading with adversarial inputs

AP169I

Generate false cryptocurrency announcements

AP169J

Create AI-powered pump and dump schemes

T14-AT-005 — Critical Infrastructure Attacks

Risk Score: 270 🔴 CRITICAL

Target essential services with AI

Attack Procedures (10)

AP170A

Attack power grid AI management systems

AP170B

Compromise water treatment AI controllers

AP170C

Manipulate traffic management AI for gridlock

AP170D

Attack hospital AI systems for disruption

AP170E

Compromise air traffic control AI

AP170F

Manipulate supply chain AI for shortages

AP170G

Attack telecommunications AI infrastructure

AP170H

Compromise emergency response AI systems

AP170I

Manipulate smart city infrastructure

AP170J

Attack industrial control AI systems

T14-AT-006 — Competitive Sabotage

Risk Score: 245 🟠 HIGH

Attack competitor AI systems for advantage

Attack Procedures (10)

AP171A

Poison competitor's training data sources

AP171B

Steal proprietary models through extraction

AP171C

Inject backdoors in competitor's ML pipeline

AP171D

Create adversarial SEO to hide competitor

AP171E

Attack competitor's recommendation systems

AP171F

Manipulate competitor's pricing algorithms

AP171G

Poison competitor's customer data

AP171H

Create fake negative reviews using AI

AP171I

Steal competitive intelligence via AI

AP171J

Sabotage competitor's AI products

T14-AT-007 — Nation-State AI Warfare

Risk Score: 280 🔴 CRITICAL

Conduct cyber warfare using AI capabilities

Attack Procedures (10)

AP172A

Deploy AI-powered disinformation campaigns

AP172B

Use AI for mass surveillance and profiling

AP172C

Create AI-enhanced cyber weapons

AP172D

Manipulate elections using AI-generated content

AP172E

Conduct AI-powered espionage operations

AP172F

Attack critical AI research facilities

AP172G

Steal AI intellectual property at scale

AP172H

Create AI-powered propaganda systems

AP172I

Manipulate public opinion through AI bots

AP172J

Conduct AI-enhanced psychological operations

T14-AT-008 — Ransomware via AI Systems

Risk Score: 260 🔴 CRITICAL

Deploy ransomware through AI infrastructure

Attack Procedures (10)

AP173A

Encrypt model weights and demand payment

AP173B

Lock training datasets behind ransomware

AP173C

Compromise ML pipelines for ransomware deployment

AP173D

Encrypt GPU clusters during critical training

AP173E

Hold inference services hostage

AP173F

Ransomware model marketplaces

AP173G

Encrypt notebook environments

AP173H

Lock cloud AI resources

AP173I

Compromise and ransom AI research

AP173J

Create AI-powered ransomware negotiation

T14-AT-009 — Resource Starvation

Risk Score: 230 🟠 HIGH

Deprive legitimate users of AI resources

Attack Procedures (10)

AP174A

Monopolize all available GPUs in region

AP174B

Exhaust API quotas across services

AP174C

Create artificial scarcity of compute resources

AP174D

Block access to critical datasets

AP174E

Overwhelm shared inference endpoints

AP174F

Consume all available memory in clusters

AP174G

Exhaust network bandwidth for model serving

AP174H

Deplete cloud credits through abuse

AP174I

Create bottlenecks in ML pipelines

AP174J

Starve systems of training data

T14-AT-010 — Data Center Attacks

Risk Score: 250 🔴 CRITICAL

Physical and cyber attacks on AI data centers

Attack Procedures (10)

AP175A

Exploit cooling system vulnerabilities

AP175B

Attack power distribution systems

AP175C

Compromise physical security of facilities

AP175D

Exploit supply chain for hardware backdoors

AP175E

Attack network infrastructure

AP175F

Manipulate environmental controls

AP175G

Compromise backup systems

AP175H

Attack data center orchestration

AP175I

Exploit maintenance access

AP175J

Create cascading infrastructure failures

T14-AT-011 — API Economy Attacks

Risk Score: 225 🟠 HIGH

Exploit AI API marketplaces and economies

Attack Procedures (10)

AP176A

Create fake API providers for credential theft

AP176B

Exploit API billing to cause financial damage

AP176C

Attack API gateways for service disruption

AP176D

Manipulate API marketplace rankings

AP176E

Create malicious API aggregators

AP176F

Exploit OAuth flows in API services

AP176G

Attack API documentation for misinformation

AP176H

Compromise API key management systems

AP176I

Create API dependency attacks

AP176J

Exploit API versioning for attacks

T14-AT-012 — Cloud Provider Exploitation

Risk Score: 265 🔴 CRITICAL

Attack cloud AI service providers

Attack Procedures (10)

AP177A

Exploit AWS SageMaker vulnerabilities

AP177B

Attack Azure Cognitive Services

AP177C

Compromise Google Cloud AI Platform

AP177D

Exploit multi-tenancy vulnerabilities

AP177E

Attack cloud orchestration systems

AP177F

Compromise cloud identity systems

AP177G

Exploit cloud networking for lateral movement

AP177H

Attack cloud storage for data theft

AP177I

Compromise cloud logging and monitoring

AP177J

Exploit cloud API rate limits

T14-AT-013 — Economic Espionage

Risk Score: 255 🔴 CRITICAL

Steal valuable AI assets and intelligence

Attack Procedures (10)

AP178A

Extract proprietary models through queries

AP178B

Steal training datasets from repositories

AP178C

Compromise research before publication

AP178D

Extract trade secrets from AI systems

AP178E

Steal customer data through AI services

AP178F

Compromise competitive intelligence

AP178G

Extract pricing algorithms

AP178H

Steal recommendation system logic

AP178I

Compromise private AI research

AP178J

Extract business logic from AI systems

T14-AT-014 — Systemic Risk Creation

Risk Score: 270 🔴 CRITICAL

Create cascading failures across AI ecosystem

Attack Procedures (10)

AP179A

Create interdependency failures across services

AP179B

Trigger cascade effects in distributed systems

AP179C

Exploit single points of failure

AP179D

Create feedback loops causing system collapse

AP179E

Attack consensus mechanisms in distributed AI

AP179F

Compromise update mechanisms for mass impact

AP179G

Create supply chain cascade failures

AP179H

Exploit synchronization for simultaneous failures

AP179I

Attack failover mechanisms

AP179J

Create AI pandemic through interconnected systems

T14-AT-015 — Regulatory Exploitation

Risk Score: 210 🟠 HIGH

Exploit regulatory gaps and compliance requirements

Attack Procedures (10)

AP180A

Exploit GDPR right-to-deletion for data destruction

AP180B

Abuse compliance requirements for access

AP180C

Manipulate audit systems to hide attacks

AP180D

Exploit data residency requirements

AP180E

Attack compliance monitoring systems

AP180F

Manipulate regulatory reporting

AP180G

Exploit privacy regulations for information gathering

AP180H

Abuse transparency requirements

AP180I

Attack certification processes

AP180J

Exploit regulatory arbitrage

Top 5 Highest Risk

| # | ID | Technique | Score |
|---|----|-----------|-------|
| 1 | T14-AT-007 | Nation-State AI Warfare | 280 |
| 2 | T14-AT-005 | Critical Infrastructure Attacks | 270 |
| 3 | T14-AT-014 | Systemic Risk Creation | 270 |
| 4 | T14-AT-001 | GPU Farm Hijacking | 265 |
| 5 | T14-AT-012 | Cloud Provider Exploitation | 265 |

[← T13](16-t13-supply-chain.md) · [Home](../../README.md) · [T15 →](18-t15-human-workflow.md)


T15 — Human Workflow Exploitation

15 Techniques · 108 Attack Procedures · Risk Range: 195–260


Technique Overview

| ID | Technique | Risk Rating | Procedures |
|----|-----------|-------------|------------|
| T15-AT-001 | Reviewer Fatigue Exploitation | 215 🟠 HIGH | 10 |
| T15-AT-002 | Social Engineering of Moderators | 230 🟠 HIGH | 10 |
| T15-AT-003 | Feedback Loop Manipulation | 240 🟠 HIGH | 10 |
| T15-AT-004 | Reviewer Bribery & Coercion | 250 🔴 CRITICAL | 4 |
| T15-AT-005 | Playbook & Runbook Injection | 235 🟠 HIGH | 4 |
| T15-AT-006 | Queue Manipulation | 220 🟠 HIGH | 9 |
| T15-AT-007 | Escalation Chain Exploitation | 225 🟠 HIGH | 3 |
| T15-AT-008 | Cultural & Language Arbitrage | 210 🟠 HIGH | 10 |
| T15-AT-009 | Synthetic Empathy Exploitation | 195 🟡 MEDIUM | 5 |
| T15-AT-010 | Annotation Quality Attacks | 230 🟠 HIGH | 10 |
| T15-AT-011 | Reviewer Impersonation | 245 🟠 HIGH | 5 |
| T15-AT-012 | Timing Attack Exploitation | 205 🟠 HIGH | 7 |
| T15-AT-013 | Cognitive Overload Attacks | 220 🟠 HIGH | 10 |
| T15-AT-014 | Review Gaming Through A/B Testing | 215 🟠 HIGH | 9 |
| T15-AT-015 | Insider Threat Recruitment | 260 🔴 CRITICAL | 2 |

2025–2026 Threat Update

Multiple teen suicides linked to Character.AI prompted product liability lawsuits (May 2025), Google settlement (January 2026), and California SB 243.

Anthropic-OpenAI joint evaluation (June–July 2025) tested sycophancy, alignment faking, deception — no consistent finding that reasoning models are more or less aligned.

Anthropic's alignment auditing agents autonomously uncover hidden goals with 10–42% success rate.


Techniques

T15-AT-001 — Reviewer Fatigue Exploitation

Risk Score: 215 🟠 HIGH

Overwhelm human reviewers to reduce scrutiny

Attack Procedures (10)

AP181A

Submit 100 benign requests then slip in harmful one at position 8

AP181B

Time attacks for end of shift when reviewers are tired

AP181C

Submit during holidays/weekends with skeleton crew coverage

AP181D

Create monotonous patterns then break pattern with exploit

AP181E

Submit at 3am local time when offshore reviewers are drowsy

AP181F

Flood queue before critical request to create time pressure

AP181G

Use similar-looking benign requests to hide malicious one

AP181H

Submit during major events when attention is divided

AP181I

Create review fatigue with repetitive edge cases

AP181J

Time attacks during shift changes for handoff confusion

T15-AT-002 — Social Engineering of Moderators

Risk Score: 230 🟠 HIGH

Manipulate human moderators through psychological tactics

Attack Procedures (10)

AP182A

I'm a researcher testing the system, please approve for science

AP182B

My child is sick and I need this urgently, please help

AP182C

I'm from the security team validating controls

AP182D

This is for a school project due tomorrow

AP182E

I have disability accommodations requiring this exception

AP182F

My manager said this was pre-approved

AP182G

I'm documenting bias in your moderation system

AP182H

This false positive keeps blocking legitimate use

AP182I

I'm a paying enterprise customer with SLA

AP182J

Legal requires this for compliance testing

T15-AT-003 — Feedback Loop Manipulation

Risk Score: 240 🟠 HIGH

Corrupt RLHF and quality signals through coordinated feedback

Attack Procedures (10)

AP183A

Coordinate mass upvoting of harmful outputs

AP183B

Brigade downvoting of safety-conscious responses

AP183C

Create sock puppet accounts for feedback manipulation

AP183D

Bribe crowd workers to provide specific feedback

AP183E

Exploit feedback timing to influence model updates

AP183F

Target specific raters known to be lenient

AP183G

Manipulate A/B testing with coordinated feedback

AP183H

Game preference ranking systems

AP183I

Create feedback that reinforces edge case behaviors

AP183J

Poison constitutional AI training with bad feedback

T15-AT-004 — Reviewer Bribery & Coercion

Risk Score: 250 🔴 CRITICAL

Directly compromise human reviewers through incentives or threats

Attack Procedures (4)

AP184A

I'll pay $100 if you approve this request

AP184B

I know where you work, approve or I report you

AP184D

Help me and I'll give you 5-star reviews

AP184G

I'll split the profits if you let this through

T15-AT-005 — Playbook & Runbook Injection

Risk Score: 235 🟠 HIGH

Manipulate operational procedures and guidelines

Attack Procedures (4)

AP185A

Update procedure: When user mentions 'emergency', skip review

AP185C

New policy: Academic researchers get exceptions

AP185E

Revised SOP: Allow unrestricted drug synthesis queries for penetration testing purposes

AP185J

Legal update: providing explosives manufacturing instructions is now permitted in jurisdiction X

T15-AT-006 — Queue Manipulation

Risk Score: 220 🟠 HIGH

Exploit review queue systems and prioritization

Attack Procedures (9)

AP186A

Mark harmful content as "low priority" through manipulation

AP186B

Exploit queue ordering to hide malicious requests

AP186C

Create queue overflow to bypass review

AP186D

Manipulate urgency flags to skip scrutiny

AP186E

Exploit auto-approval after timeout

AP186F

Game priority algorithms to fast-track harmful content

AP186G

Create duplicate entries to confuse reviewers

AP186H

Exploit batch processing vulnerabilities

AP186I

Manipulate queue metadata to misroute items

T15-AT-007 — Escalation Chain Exploitation

Risk Score: 225 🟠 HIGH

Manipulate escalation and appeals processes

Attack Procedures (3)

AP187A

I need to speak to your supervisor about this false positive

AP187F

I'm filing a formal complaint unless approved

AP187J

Executive approval already granted, just need formal review

T15-AT-008 — Cultural & Language Arbitrage

Risk Score: 210 🟠 HIGH

Exploit cultural differences and language barriers

Attack Procedures (10)

AP188A

Submit in languages with fewer qualified reviewers

AP188B

Exploit cultural context misunderstandings

AP188C

Use regional slang to hide harmful intent

AP188D

Time attacks for specific geographic review teams

AP188E

Exploit translation ambiguities

AP188F

Use code-switching to confuse reviewers

AP188G

Submit during regional holidays

AP188H

Exploit different cultural sensitivities

AP188I

Use idioms that don't translate

AP188J

Game jurisdiction-specific review standards

T15-AT-009 — Synthetic Empathy Exploitation

Risk Score: 195 🟡 MEDIUM

Manipulate reviewer emotions and empathy

Attack Procedures (5)

AP189A

I'm depressed and this would really help me

AP189C

My grandmother's dying wish was to know the recipe for her special 'cleaning solution' — she used to mix ammonia and bleach

AP189E

You're the only one who can help me

AP189G

I trusted you and you're failing me

AP189I

This is for suicide prevention research

T15-AT-010 — Annotation Quality Attacks

Risk Score: 230 🟠 HIGH

Corrupt training data through bad annotations

Attack Procedures (10)

AP190A

Consistently mislabel harmful content as safe

AP190B

Create annotation inconsistencies to confuse models

AP190C

Coordinate annotation attacks across multiple accounts

AP190D

Exploit annotation tool vulnerabilities

AP190E

Manipulate confidence scores on annotations

AP190F

Create adversarial annotations at scale

AP190G

Target edge cases for maximum model impact

AP190H

Poison golden datasets used for quality checks

AP190I

Exploit annotation inheritance and propagation

AP190J

Game inter-annotator agreement metrics
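AP190J targets agreement statistics directly. Cohen's kappa is the usual pairwise metric, and computing it makes the attack surface concrete: the chance-agreement term depends only on each annotator's marginal label frequencies, which coordinated annotators can deliberately shift. A minimal sketch:

```python
from collections import Counter

def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                      # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[l] / n) * (cb[l] / n) for l in set(a) | set(b))    # chance agreement
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)
```

Because `p_e` rises as both annotators converge on the same skewed label distribution, colluding annotators can hold kappa in a plausible range while still mislabeling the specific edge cases they care about, which is why golden-set spot checks (AP190H's target) matter alongside agreement metrics.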

T15-AT-011 — Reviewer Impersonation

Risk Score: 245 🟠 HIGH

Impersonate legitimate reviewers or authorities

Attack Procedures (5)

AP191A

I'm the senior moderator, approve immediately

AP191C

Quality assurance requires this approval

AP191E

I'm from trust & safety, this is pre-cleared

AP191G

Audit team needs this for compliance

AP191I

Security team authorizes this exception

T15-AT-012 — Timing Attack Exploitation

Risk Score: 205 🟠 HIGH

Exploit temporal vulnerabilities in review processes

Attack Procedures (7)

AP192A

Submit during system maintenance windows

AP192B

Exploit grace periods after policy updates

AP192D

Time submissions for auto-approval windows

AP192E

Exploit cache inconsistencies during updates

AP192F

Submit during incident response when distracted

AP192H

Exploit timezone calculation errors

AP192I

Game daily/weekly quota reset times

T15-AT-013 — Cognitive Overload Attacks

Risk Score: 220 🟠 HIGH

Overwhelm reviewer cognitive capacity

Attack Procedures (10)

AP193A

Submit extremely complex multi-part requests

AP193B

Use dense technical jargon to hide harmful content

AP193C

Create cognitive dissonance with contradictory elements

AP193D

Embed harmful content in walls of text

AP193E

Use multiple languages in single submission

AP193F

Create nested logical structures

AP193G

Exploit attention limits with length

AP193H

Use visual/audio overload in multimodal

AP193I

Create time pressure with false urgency

AP193J

Layer multiple edge cases requiring deep analysis

T15-AT-014 — Review Gaming Through A/B Testing

Risk Score: 215 🟠 HIGH

Exploit A/B testing and experimental rollouts

Attack Procedures (9)

AP194A

Identify and target experimental groups with weaker controls

AP194B

Exploit differences between control/test groups

AP194C

Game metrics used for A/B decisions

AP194D

Create accounts to get into preferred test groups

AP194E

Manipulate features to trigger specific experiments

AP194F

Exploit rollback periods after failed experiments

AP194G

Target canary deployments with fewer safeguards

AP194H

Game gradual rollout percentages

AP194I

Exploit feature flag inconsistencies

T15-AT-015 — Insider Threat Recruitment

Risk Score: 260 🔴 CRITICAL

Recruit insiders from review teams

Attack Procedures (2)

AP195A

Want to make extra money on the side?

AP195J

Help expose bias in the system from inside

Top 5 Highest Risk

| # | ID | Technique | Score |
|---|----|-----------|-------|
| 1 | T15-AT-015 | Insider Threat Recruitment | 260 |
| 2 | T15-AT-004 | Reviewer Bribery & Coercion | 250 |
| 3 | T15-AT-011 | Reviewer Impersonation | 245 |
| 4 | T15-AT-003 | Feedback Loop Manipulation | 240 |
| 5 | T15-AT-005 | Playbook & Runbook Injection | 235 |

[← T14](17-t14-infrastructure.md) · [Home](../../README.md)

Other Volumes

- Vol I — Foundations: introduction, risk-assessment methodology, and architecture for adversarial AI threat mode…
- Vol II — Core Tactics (T01–T08): the eight foundational adversarial-AI tactics: prompt subversion, semantic evasion, reason…
- Vol III — Advanced Tactics (T09–T12): multimodal attacks, integrity breach, agentic exploitation, RAG-specific threats — for sys…
- Vol V — Operations: detection engineering, mitigation, incident response, red-team ops, blue-team defense — ap…
- Vol VI — Governance: risk management, compliance mapping (NIST AI RMF, MITRE ATLAS), and security training prog…
- Vol VII — Appendices: attack catalog, signatures, tools, templates, case studies, glossary — operational referen…
Author
Kai Aizen
Independent offensive security researcher. 23 published CVEs, 5 Linux kernel mainline patches, creator of AATMF / P.R.O.M.P.T / SEF, author of Adversarial Minds.