The attack surface created when an LLM is given tools, memory, and the ability to act. An attacker who controls any input the agent ingests can redirect what it does.
Attacker instructions placed in content the model later ingests — a page, a document, a tool result — so it follows instructions the user never wrote. The trust boundary between the model and its data.
Manipulating a model into ignoring its own safety policy through crafted input — targeting the model's alignment rather than its data boundary.
A 25-entry reference of AI-security terms — attacks, core concepts, and defenses — each a defined term with examples and cross-links.
AI security — the essentials
What is adversarial AI security?
The study and testing of how machine-learning systems — especially LLMs and AI agents — fail under deliberate attack: prompt injection, jailbreaking, model and data poisoning, agent hijacking, and the supply-chain and trust-boundary gaps that appear when LLMs are wired into tools, memory, and execution. SnailSploit researches it offensively — find the principle that makes a class of failures possible, then prove it reproduces across targets.
What is AI agent security?
The attack surface created when an LLM gains tools, memory, skills, and the ability to execute actions. An agent reads untrusted content, calls tools, and runs code — so an attacker who controls any input it ingests can redirect its behavior. Key classes: indirect prompt injection, malicious skill packages (see SKILBin), memory poisoning, and the MCP/agent tool-execution surface.
What is AATMF?
The Adversarial AI Tactics, Techniques and Mitigations Framework — SnailSploit's operational taxonomy for testing AI systems: 15 tactics, 240 techniques, 2,152 executable procedures, and 4,980 adversarial prompts. Where MITRE ATLAS documents what AI attacks have happened, AATMF is what a red-teamer actually runs against a live LLM, agent, or RAG pipeline.
What is the difference between prompt injection and jailbreaking?
Jailbreaking manipulates a model into ignoring its own safety policy via crafted input in the user's own turn — it targets alignment. Prompt injection places attacker instructions in content the model later ingests (a page, document, tool result, email) so it follows instructions the user never wrote — it targets the trust boundary between the model and its data.
Who produces this research?
SnailSploit, an independent adversarial-AI research group. The AI security work is led by Kai Aizen — creator of AATMF, author of Adversarial Minds, NVD contributor — with 84 published CVEs and 5 Linux kernel mainline patches across the group. Research is original, coordinated-disclosure-based, and mapped to AATMF, MITRE ATT&CK, and MITRE ATLAS.