Threat Modeling for AI Systems: Anticipating Attacks

Introduction to AI Threat Modeling: Anticipating Attacks

Welcome back, future AI security architects! In our previous chapters, we’ve explored various vulnerabilities specific to Large Language Models (LLMs) and agentic AI systems, from the sneaky world of prompt injections to the dangers of insecure output handling. We’ve seen how attackers can manipulate these systems and how critical it is to build robust defenses.

But how do we proactively find these weaknesses before an attacker does? How do we design security into our AI applications from the ground up, rather than patching problems reactively? The answer lies in a powerful, systematic approach called Threat Modeling.

In this chapter, we’ll dive deep into the art and science of threat modeling, specifically tailored for the unique challenges of AI systems. You’ll learn what threat modeling is, why it’s indispensable for AI, and how to apply practical frameworks like the OWASP Top 10 for LLM Applications (2025/2026) to systematically identify potential attacks. By the end, you’ll be equipped to anticipate threats, design more resilient AI applications, and truly build ‘production-ready’ systems that stand up to adversarial scrutiny.

Ready to put on your hacker hat (for good!) and think like an attacker to protect your AI? Let’s get started!

Core Concepts: Understanding and Applying AI Threat Modeling

Threat modeling is a structured process that helps you identify, understand, and address potential threats to your application. Think of it as putting on your “attacker glasses” and asking: “How could someone break this? What could go wrong?” For AI systems, this process becomes even more critical due to their unique characteristics and attack surfaces.

What is Threat Modeling?

At its heart, threat modeling is about asking four key questions:

What are we building? (Defining the system and its components)
What can go wrong? (Identifying potential threats)
What are we going to do about it? (Designing mitigations)
Did we do a good job? (Validation and continuous improvement)

This process isn’t a one-time event; it’s an iterative cycle that should be integrated throughout the entire AI system development lifecycle, from initial design to deployment and ongoing operation.

Why is AI Threat Modeling Different and Crucial?

While traditional software threat modeling principles still apply, AI introduces several new dimensions:

Probabilistic Nature: Unlike deterministic code, AI models often behave probabilistically. This means an input might sometimes yield a desired output, and other times an undesired one, making it harder to predict and secure.
Unique Attack Surfaces: Beyond typical network and application layers, AI systems have novel attack surfaces:
- Training Data: Vulnerable to poisoning.
- Prompts/Inputs: Vulnerable to injection and manipulation.
- Model Itself: Vulnerable to adversarial examples, data extraction, and denial of service.
- Outputs: Vulnerable to generating harmful, biased, or incorrect information.
- Tools/Plugins: Agentic AI systems interacting with external tools introduce new risks.
Emergent Behaviors: LLMs can exhibit unexpected behaviors or capabilities that weren’t explicitly programmed, which can be exploited.
Human-in-the-Loop: The interaction between AI and human users can introduce social engineering elements and trust issues.
Dynamic Threat Landscape: AI security is rapidly evolving. New attack vectors and mitigation techniques emerge constantly.

Ignoring AI-specific threat modeling is a common pitfall, often leading to vulnerabilities being discovered only after deployment, which is far more costly and damaging to fix.

Approaches to AI Threat Modeling

Several well-known threat modeling frameworks can be adapted for AI. Let’s explore a couple:

1. STRIDE for AI

STRIDE is a mnemonic for six categories of threats:

Spoofing: Impersonating entities (e.g., an attacker making the AI believe they are an authorized user, or an AI agent impersonating a human).
Tampering: Modifying data or processes (e.g., training data poisoning, prompt injection altering model behavior).
Repudiation: Denying actions (e.g., an AI agent performing an unauthorized action, and there’s no audit trail to prove it).
Information Disclosure: Revealing sensitive data (e.g., an LLM leaking sensitive training data, or an agent revealing internal system information).
Denial of Service (DoS): Making resources unavailable (e.g., flooding an LLM with requests, adversarial inputs causing excessive computation, model collapse).
Elevation of Privilege: Gaining unauthorized access/capabilities (e.g., an LLM gaining access to system commands via a tool, or an agent breaking out of its sandbox).

When applying STRIDE to AI, we consider how each threat category manifests across the AI system’s components: data, model, input/output, and external interactions.

2. Leveraging the OWASP Top 10 for LLM Applications (2025/2026)

The OWASP Top 10 for LLM Applications is a critical resource, acting as a specialized threat catalog. It focuses on the most prevalent and impactful security risks unique to LLM-powered applications. When performing threat modeling for AI, this list provides an excellent starting point for identifying specific “what can go wrong” scenarios.

Let’s quickly revisit these categories, now framed as threats you’d identify during threat modeling:

LLM01: Prompt Injection: The model is tricked into executing malicious instructions or ignoring safety guidelines.
LLM02: Insecure Output Handling: The LLM’s output is accepted without sufficient validation, leading to downstream vulnerabilities (e.g., XSS, privilege escalation).
LLM03: Training Data Poisoning: Malicious data introduced into the training dataset compromises the model’s integrity, leading to backdoors, biases, or performance degradation.
LLM04: Model Denial of Service: Attackers cause the model to consume excessive resources or become unavailable, often through complex or resource-intensive prompts.
LLM05: Supply Chain Vulnerabilities: Insecure components, models, or data sources in the AI supply chain introduce weaknesses.
LLM06: Insecure Plugin Design: Plugins or external tools used by the LLM are designed or implemented insecurely, allowing for misuse or unauthorized access.
LLM07: Excessive Agency: An LLM or AI agent is granted too much autonomy or overly broad permissions, leading to unintended actions or unauthorized operations.
LLM08: Over-reliance: Users or systems place undue trust in the LLM’s output without verification, leading to incorrect decisions or actions based on fabricated information.
LLM09: Sensitive Information Disclosure: The LLM inadvertently reveals confidential data, either from its training data, past interactions, or access to external systems.
LLM10: Inadequate Sandboxing: The environment where the LLM or agent operates lacks sufficient isolation, allowing it to interact with sensitive system resources or execute arbitrary code.

By systematically walking through these OWASP categories, you can ensure a comprehensive review of potential threats against your LLM application.

The Threat Modeling Process for AI: A Step-by-Step Guide

Let’s break down the general threat modeling process into actionable steps, keeping our AI focus in mind.

1. Define the System and its Boundaries

What is it? Clearly define the AI application, its purpose, and its core functionality.
Who are the users? Identify different user roles and their interactions.
What are its components? List all parts: user interface, backend API, LLM, vector database, external tools/APIs, data pipelines, etc.
Draw a Data Flow Diagram (DFD): This is crucial! Visualize how data moves between components, users, and external systems. Identify trust boundaries (where data crosses from one trust level to another).

2. Identify Assets

What do we want to protect?
- Data: User data, training data, fine-tuning data, prompts, generated outputs, internal system data.
- Functionality: The core purpose of the AI application, its ability to provide correct and useful responses.
- Reputation: The trust users place in the AI system.
- Resources: Compute power, storage, external API quotas.
- Secrets: API keys, credentials, sensitive configuration.

3. Identify Threats

This is where you put on your attacker hat! For each component and data flow identified in your DFD, ask “What can go wrong?” Use frameworks like STRIDE and the OWASP Top 10 for LLM Applications as checklists.

Brainstorm Scenarios:
- Prompt Injection: “What if a user’s input contains malicious instructions that override the system prompt?”
- Data Poisoning: “What if malicious data enters our fine-tuning pipeline?”
- Excessive Agency: “What if our AI agent, given access to a payment API, is tricked into making unauthorized transactions?”
- Information Disclosure: “Could the LLM accidentally reveal sensitive information from its training data or past conversations?”
Consider Trust Boundaries: Threats often occur when data flows across a trust boundary (e.g., from an untrusted user to your trusted LLM).

4. Analyze Vulnerabilities

For each identified threat, determine how it could be realized. What specific weaknesses in your design or implementation would allow the threat to materialize?

“The prompt injection threat could be realized because we don’t have a robust input validation layer before the prompt reaches the LLM.”
“The data poisoning threat could be realized because our data ingestion pipeline lacks integrity checks and comes from an untrusted source.”

5. Determine Risks

Assess the likelihood of each threat occurring and the potential impact if it does. This helps you prioritize.

Likelihood: High, Medium, Low (based on attack complexity, available tools, attacker motivation).
Impact: High, Medium, Low (financial, reputational, operational, legal).

6. Devise Mitigations

For each high-risk threat, propose specific security controls or changes to the design to reduce its likelihood or impact.

Prompt Injection: Implement a multi-stage prompt sanitization and validation filter.
Data Poisoning: Implement data provenance tracking and anomaly detection in data pipelines.
Excessive Agency: Apply the principle of least privilege to agent tools; use human-in-the-loop for sensitive actions.
Information Disclosure: Implement output filtering and PII redaction.

7. Validate and Iterate

Threat modeling is not a one-and-done activity.

Review: Have your mitigations truly addressed the threats? Are there new threats introduced by the mitigations?
Continuous: Revisit your threat model as your AI system evolves, new features are added, or new attack techniques emerge.

Step-by-Step: Visualizing AI Threat Modeling with a DFD

Let’s illustrate the first step of threat modeling – defining the system and its boundaries – with a simple example of an AI-powered customer support agent. We’ll use a Data Flow Diagram (DFD) to visualize the components and data flows.

Imagine we are building a system where a user interacts with an LLM-powered agent to get support. The agent can access a knowledge base and a customer CRM API.

First, let’s sketch out the main components and how they interact.

flowchart TD User[Customer User] Web_App[Web Application/Chat UI] API_Gateway[Backend API Gateway] LLM_Orchestrator[LLM Orchestrator Service] LLM_Provider["External LLM Provider "] Vector_DB[Vector Database] CRM_API[Customer CRM API] Audit_Log[Audit Log Service] User -->|\1| Web_App Web_App -->|\1| API_Gateway API_Gateway -->|\1| LLM_Orchestrator LLM_Orchestrator -->|\1| LLM_Provider LLM_Orchestrator -->|\1| Vector_DB LLM_Orchestrator -->|\1| CRM_API LLM_Provider -->|\1| LLM_Orchestrator Vector_DB -->|\1| LLM_Orchestrator CRM_API -->|\1| LLM_Orchestrator LLM_Orchestrator -->|\1| API_Gateway API_Gateway -->|\1| Web_App Web_App -->|\1| User LLM_Orchestrator -->|\1| Audit_Log API_Gateway -->|\1| Audit_Log

This diagram helps us visualize the system. Now, let’s consider a few trust boundaries:

Between the User and Web_App (untrusted user input).
Between Web_App and API_Gateway (client-side interaction).
Between LLM_Orchestrator and LLM_Provider (external service).
Between LLM_Orchestrator and CRM_API (sensitive data access).

Once we have this visual, we can start asking our “what can go wrong?” questions, applying STRIDE and OWASP Top 10 for LLMs.

For example, let’s pick a few threats related to the LLM_Orchestrator and its interactions:

Threat (LLM01: Prompt Injection): A malicious user crafts a prompt that bypasses the Web_App and API_Gateway sanitization, and tricks the LLM_Provider into revealing sensitive Customer Details retrieved from CRM_API or executing unintended actions via LLM_Orchestrator logic.
- Vulnerability: Insufficient input validation and sanitization in LLM_Orchestrator before constructing the final prompt for LLM_Provider.
- Impact: Sensitive data disclosure, unauthorized actions, reputational damage.
- Mitigation: Implement a dedicated prompt sanitization and validation layer within LLM_Orchestrator. Use allow-lists for specific commands or data requests.
Threat (LLM07: Excessive Agency / LLM06: Insecure Plugin Design): The LLM_Orchestrator has direct, unconstrained access to the CRM_API. If the LLM_Provider is compromised or tricked, it could instruct the LLM_Orchestrator to perform mass data extraction or modification via the CRM_API.
- Vulnerability: The LLM_Orchestrator grants overly broad permissions to the CRM_API tool, or the tool itself doesn’t enforce granular access control.
- Impact: Data breaches, data corruption, compliance violations.
- Mitigation: Apply the principle of least privilege to the CRM_API access. Use a secure wrapper around the CRM_API tool that enforces strict, fine-grained permissions and requires explicit user confirmation for sensitive actions (human-in-the-loop).
Threat (LLM09: Sensitive Information Disclosure): The LLM_Provider might inadvertently include sensitive Customer Details (retrieved from CRM_API) in its LLM Response back to the User, even if the user didn’t explicitly ask for it, or if the user is not authorized to see it.
- Vulnerability: Lack of output filtering and PII redaction on the LLM Response before it’s sent back to the User.
- Impact: Privacy violation, legal penalties.
- Mitigation: Implement a robust output moderation and PII redaction service within the LLM_Orchestrator to scrub sensitive data from the LLM Response before it leaves the system.

This systematic approach, moving from a system diagram to specific threats, vulnerabilities, and mitigations, is the essence of effective AI threat modeling.

Mini-Challenge: Threat Modeling a Simple AI Agent

Let’s put your new threat modeling skills to the test!

Challenge: Imagine you are designing a simple “Recipe Generator Agent.” This agent takes a list of ingredients from a user, then uses an LLM to generate a recipe, and finally uses a “Shopping List API” tool to add missing ingredients to the user’s grocery list.

Your task is to perform the first three steps of threat modeling:

Define the System: Briefly describe the components and their interactions (you don’t need a full DFD, just list them).
Identify Key Assets: What are the critical things you need to protect?
Identify 2-3 Threats: Using the OWASP Top 10 for LLM Applications as inspiration, identify 2-3 potential security threats to this Recipe Generator Agent. For each threat, mention which OWASP LLM category it falls under (e.g., LLM01, LLM06).

Hint: Think about what happens when the LLM interacts with external tools, and what kind of input a user provides.

What to Observe/Learn: This exercise helps you practice breaking down an AI system, identifying what’s valuable, and brainstorming specific ways it could be attacked, linking them to known threat categories.

Common Pitfalls & Troubleshooting in AI Threat Modeling

Even with a structured approach, threat modeling for AI can have its challenges. Here are some common pitfalls and how to avoid them:

Over-reliance on Model-Based Defenses:
- Pitfall: Believing that instructing the LLM (e.g., “Do not reveal sensitive information,” “Do not generate harmful content”) is sufficient for security. Attackers are skilled at bypassing these soft constraints.
- Troubleshooting: Remember that LLMs are not security firewalls. Implement strong perimeter defenses (input validation), output filtering, sandboxing, and access controls outside the LLM itself. Treat the LLM as a potentially untrusted component.
Treating Threat Modeling as a One-Off Event:
- Pitfall: Performing threat modeling once at the beginning of a project and never revisiting it. AI systems evolve rapidly, and new attack vectors emerge constantly.
- Troubleshooting: Integrate threat modeling into your CI/CD pipeline and change management processes. Revisit your threat model when significant changes occur (new features, new LLM versions, new tools, new data sources). Make it a living document.
Lack of AI-Specific Expertise in the Threat Modeling Team:
- Pitfall: A security team experienced in traditional web security might miss nuances of AI-specific attacks (e.g., data poisoning, adversarial examples, emergent behaviors).
- Troubleshooting: Ensure your threat modeling team includes members with expertise in AI/ML, data science, and security. Cross-functional collaboration is key to covering all angles, from data pipelines to model inference and agentic actions.
Ignoring the “Human Factor” and Social Engineering:
- Pitfall: Focusing solely on technical vulnerabilities and neglecting how humans interact with and might be manipulated by (or manipulate) the AI system.
- Troubleshooting: Consider threats like “over-reliance” (LLM08). How might users be tricked by plausible but incorrect AI outputs? How could an attacker use the AI to perform social engineering against other users or even internal staff?

Summary

Phew! We’ve covered a lot of ground in this chapter, establishing threat modeling as a cornerstone of secure AI development.

Here are the key takeaways:

Threat modeling is a proactive, systematic process to identify, analyze, and mitigate security threats in your AI applications.
AI introduces unique challenges due to its probabilistic nature, novel attack surfaces (data, model, prompt, tools), and emergent behaviors.
Frameworks like STRIDE and the OWASP Top 10 for LLM Applications (2025/2026) provide excellent guidance for identifying AI-specific threats.
The threat modeling process involves defining your system, identifying assets, brainstorming threats, analyzing vulnerabilities, determining risks, devising mitigations, and continuously validating your approach.
Data Flow Diagrams (DFDs) are invaluable for visualizing your AI system and identifying trust boundaries.
Avoid common pitfalls like over-relying on model-based defenses, treating threat modeling as a one-off task, lacking AI expertise, or ignoring the human element.

By embracing threat modeling, you’re not just reacting to problems; you’re actively designing resilience into your AI systems, making them truly production-ready and trustworthy.

What’s Next?

Now that we understand how to anticipate attacks, our next step is to explore how to protect AI agents at runtime. In the next chapter, we’ll dive into Runtime Protection for AI Agents: Guarding Against Live Attacks, focusing on practical strategies and tools to secure your agentic AI systems when they’re actively interacting with the world.

References

OWASP Top 10 for Large Language Model Applications (2025/2026). GitHub. https://github.com/owasp/www-project-top-10-for-large-language-model-applications
OWASP AI Security and Privacy Guide. GitHub. https://github.com/OWASP/www-project-ai-testing-guide
LLMSecurityGuide: A comprehensive reference for LLM and Agentic AI Systems security. GitHub. https://github.com/requie/LLMSecurityGuide
Microsoft Security Development Lifecycle (SDL) - Threat Modeling. Microsoft Docs. https://learn.microsoft.com/en-us/security/sdl/threat-modeling
OWASP Threat Modeling Cheat Sheet. OWASP Foundation. https://cheatsheetseries.owasp.org/cheatsheets/Threat_Modeling_Cheat_Sheet.html

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.