Skip to content
Home/Research/AI Security
SnailSploit · Adversarial AI Research

AI Security Research.

Original offensive research on how AI systems fail under attack — prompt injection, jailbreaking, AI-agent and supply-chain compromise, and the trust-boundary gaps no single threat model owns. Each piece identifies the principle that makes a class of failures possible, then proves it reproduces. Same attack. Different substrate.

20+
original research pieces
240
AATMF techniques
25
knowledge-base entries
79
published CVEs
Flagship Research
the SHELL series — deep original disclosures
AI Agent & Supply-Chain Security
tools · memory · execution

The attack surface created when an LLM is given tools, memory, and the ability to act. An attacker who controls any input the agent ingests can redirect what it does.

Prompt Injection

Attacker instructions placed in content the model later ingests — a page, a document, a tool result — so it follows instructions the user never wrote. The trust boundary between the model and its data.

Jailbreaking

Manipulating a model into ignoring its own safety policy through crafted input — targeting the model's alignment rather than its data boundary.

Frameworks & Methodology
how to test, and what to test against
Detection, Defense & Theory
where defenses fail
Knowledge Base

A 25-entry reference of AI-security terms — attacks, core concepts, and defenses — each a defined term with examples and cross-links.

AI security — the essentials

What is adversarial AI security?

The study and testing of how machine-learning systems — especially LLMs and AI agents — fail under deliberate attack: prompt injection, jailbreaking, model and data poisoning, agent hijacking, and the supply-chain and trust-boundary gaps that appear when LLMs are wired into tools, memory, and execution. SnailSploit researches it offensively — find the principle that makes a class of failures possible, then prove it reproduces across targets.

What is AI agent security?

The attack surface created when an LLM gains tools, memory, skills, and the ability to execute actions. An agent reads untrusted content, calls tools, and runs code — so an attacker who controls any input it ingests can redirect its behavior. Key classes: indirect prompt injection, malicious skill packages (see SKILBin), memory poisoning, and the MCP/agent tool-execution surface.

What is AATMF?

The Adversarial AI Tactics, Techniques and Mitigations Framework — SnailSploit's operational taxonomy for testing AI systems: 15 tactics, 240 techniques, 2,152 executable procedures, and 4,980 adversarial prompts. Where MITRE ATLAS documents what AI attacks have happened, AATMF is what a red-teamer actually runs against a live LLM, agent, or RAG pipeline.

What is the difference between prompt injection and jailbreaking?

Jailbreaking manipulates a model into ignoring its own safety policy via crafted input in the user's own turn — it targets alignment. Prompt injection places attacker instructions in content the model later ingests (a page, document, tool result, email) so it follows instructions the user never wrote — it targets the trust boundary between the model and its data.

Who produces this research?

SnailSploit, an independent adversarial-AI research group. The AI security work is led by Kai Aizen — creator of AATMF, author of Adversarial Minds, NVD contributor — with 84 published CVEs and 5 Linux kernel mainline patches across the group. Research is original, coordinated-disclosure-based, and mapped to AATMF, MITRE ATT&CK, and MITRE ATLAS.