Introduction
Welcome back, future AI security champions! In our previous chapters, we delved into specific vulnerabilities like prompt injection, jailbreaks, data poisoning, and tool misuse. We learned to identify these threats and even explored some initial mitigation techniques. But how do we tie all of this together into a cohesive, robust security strategy for an entire AI application?
That’s precisely what we’ll tackle in this chapter: Building Secure AI Applications with a Defense-in-Depth Approach. We’ll move beyond individual fixes to understanding how to design AI systems that are inherently more resilient against a wide array of attacks. Our goal is to equip you with the knowledge to architect AI applications that are not just functional, but truly production-ready – meaning they can withstand sophisticated threats in the real world.
By the end of this chapter, you’ll grasp the principles of layered security for AI, learn how to approach threat modeling for complex AI systems, and understand key design considerations for building AI applications securely from the ground up.
Core Concepts: A Castle for Your AI
Imagine you’re protecting a valuable treasure. Would you put it behind just one sturdy door? Probably not! You’d build a castle with multiple walls, moats, guard towers, and inner keeps. This layered approach is the essence of Defense-in-Depth, and it’s absolutely critical for AI security.
What is Defense-in-Depth for AI?
Defense-in-Depth is a cybersecurity strategy that employs multiple, independent security controls at different layers of an information system. The idea is that even if one control fails or is bypassed, others are still in place to prevent or detect an attack. For AI systems, this means not relying solely on the LLM’s internal guardrails or a single input filter, but rather implementing a series of overlapping security mechanisms throughout the entire application lifecycle and architecture.
Why is this so important for AI? Because AI systems, especially large language models and agentic applications, are inherently complex, probabilistic, and often interact with external tools. Their behavior can be difficult to predict and control completely, making a single point of defense highly vulnerable.
Layered Security Architecture for AI
Let’s visualize a typical AI application and where we can establish these layers of defense. Think of it as a journey for data, from user input to final output, with security checkpoints at each stage.
Figure 10.1: Layered Security Architecture for an AI Application
Let’s break down these layers:
Input Validation & Sanitization: This is your first line of defense. Before anything reaches your LLM, validate and sanitize the user’s prompt. This can include:
- Prompt Cleaning: Removing malicious characters, control sequences, or formatting that could confuse the model.
- Keyword Filtering: Blocking known harmful terms or patterns.
- Length Limits: Preventing excessively long prompts that could be used for resource exhaustion or complex injection attempts.
- Structured Input Validation: If your application expects a specific input format (e.g., JSON), enforce it strictly.
- Why it matters: Prevents direct prompt injection and reduces the attack surface.
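The checks above can be sketched in a few lines. This is a minimal, illustrative sketch (the limits, blocked patterns, and function names are assumptions, not a production sanitizer):

```python
import json
import re

MAX_PROMPT_LEN = 2000  # illustrative length limit
BLOCKED_PATTERNS = [r"ignore (all )?previous instructions", r"<script\b"]  # illustrative

def validate_input(raw: str) -> str:
    """Reject or clean a prompt before it ever reaches the model."""
    if not raw or len(raw) > MAX_PROMPT_LEN:
        raise ValueError("Prompt is empty or exceeds length limit.")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, raw, re.IGNORECASE):
            raise ValueError("Prompt matched a blocked pattern.")
    # Strip control characters that could confuse downstream parsers.
    return re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", raw)

def validate_structured_input(raw: str, required_keys: set) -> dict:
    """Enforce a strict JSON shape when the app expects structured input."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if set(data) != required_keys:
        raise ValueError(f"Unexpected keys: {set(data) ^ required_keys}")
    return data
```

A real deployment would layer a dedicated prompt-injection classifier on top of keyword rules like these, since keyword lists alone are easy to evade.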
User/Context Authentication & Authorization: Before your AI even processes a request, confirm who is making the request and what permissions they have.
- User Identity: Is the user authenticated?
- Role-Based Access Control (RBAC): Does the user’s role permit them to access this specific AI capability or data?
- Contextual Authorization: Can this specific request be made in this context? (e.g., a customer service bot shouldn’t be asked to access internal HR records).
- Why it matters: Prevents unauthorized use and ensures the AI operates within defined boundaries for specific users.
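An RBAC check can be as simple as a capability lookup that fails closed. The role table below is hypothetical; a real system would query an identity provider or policy engine rather than an in-memory dict:

```python
# Hypothetical role-to-capability table for illustration only.
ROLE_GRANTS = {
    "support_agent": {"answer_faq", "summarize_ticket"},
    "hr_admin": {"answer_faq", "summarize_ticket", "query_hr_records"},
}

def authorize(user_role: str, capability: str) -> None:
    """Fail closed: unknown roles and unknown capabilities are both denied."""
    if capability not in ROLE_GRANTS.get(user_role, set()):
        raise PermissionError(f"Role '{user_role}' may not use '{capability}'.")
```

Note the contextual-authorization example from above falls out naturally: a support agent simply has no `query_hr_records` grant, so the request is denied before the AI ever sees it.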
Model Guardrails & Evasion Detection: This layer involves controls directly around or within the LLM/Agent itself.
- System Prompts/Metaprompts: Carefully crafted system instructions that guide the model’s behavior and define its persona and safety rules. They are an important first check against jailbreaks, but they can be bypassed and should never be the only one.
- Fine-tuning: Training or fine-tuning the model with safety datasets to embed desired behaviors and refusal patterns.
- Adversarial Prompt Detection: Using separate models or heuristics to detect prompts designed to bypass safety mechanisms (e.g., base64 encoding, character obfuscation).
- Confidential Computing: For highly sensitive applications, running the model in a trusted execution environment (TEE) to protect model weights and inferences.
- Why it matters: Directly addresses jailbreaks, evasion techniques, and ensures the model adheres to its intended purpose.
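As one example of adversarial prompt detection, a heuristic can flag long base64-looking runs that decode to readable text, a common trick for smuggling instructions past keyword filters. This is a sketch of one heuristic, not a complete detector:

```python
import base64
import re

def looks_like_encoded_payload(prompt: str, min_len: int = 24) -> bool:
    """Flag long base64-looking runs that decode to readable ASCII."""
    for token in re.findall(rf"[A-Za-z0-9+/=]{{{min_len},}}", prompt):
        try:
            decoded = base64.b64decode(token, validate=True)
        except ValueError:  # covers binascii.Error too
            continue  # not valid base64; ignore this token
        if decoded.isascii():
            return True
    return False
```

Production systems typically combine several such heuristics with a dedicated classifier, since attackers rotate encodings (hex, ROT13, Unicode homoglyphs) as soon as one is blocked.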
Tool Access Control & Sandboxing: If your AI agent interacts with external tools (APIs, databases, file systems), this layer is paramount.
- Least Privilege: Grant the AI agent only the minimum permissions necessary to perform its function. If it doesn’t need file system access, don’t grant it.
- API Gateways & Proxies: Route all tool calls through a controlled gateway that enforces policies, rate limits, and validates parameters.
- Sandboxing: Isolate the agent’s execution environment from the rest of your infrastructure. This limits the damage an exploited agent can cause.
- Input/Output Validation for Tools: Treat tool inputs and outputs with the same scrutiny as user prompts. An agent could be tricked into generating malicious tool arguments.
- Why it matters: Prevents tool misuse, unauthorized data access, and privilege escalation if the agent is compromised. This is critical for defending against OWASP LLM05: Improper Output Handling and LLM06: Excessive Agency.
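A gateway enforcing least privilege and parameter validation can be sketched as an allow-list check. The tool names and schemas here are illustrative; a production gateway would also enforce rate limits and dispatch the call inside a sandboxed executor:

```python
# Illustrative tool schemas; real systems would use JSON Schema or similar.
TOOL_SCHEMAS = {
    "send_email": {"recipient", "subject", "body"},
    "lookup_order": {"order_id"},
}

def gated_tool_call(agent_permissions: set, tool: str, args: dict) -> dict:
    """Every tool call passes through here; anything unexpected is rejected."""
    if tool not in agent_permissions:
        raise PermissionError(f"Agent lacks permission for tool '{tool}'.")
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None or set(args) != schema:
        raise ValueError(f"Arguments for '{tool}' do not match its schema.")
    # A real gateway would now forward the call to a sandboxed executor;
    # here we just report that it was accepted.
    return {"tool": tool, "status": "accepted"}
```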
Output Moderation & Filtering: Before the AI’s response reaches the user, it needs a final security check.
- Content Filtering: Scan the output for toxic, harmful, or PII (Personally Identifiable Information) content.
- Fact-Checking/Hallucination Detection: For critical applications, compare AI output against trusted knowledge bases.
- Format Validation: Ensure the output adheres to expected formats.
- Why it matters: Prevents the AI from generating harmful content, leaking sensitive information, or producing misleading responses, addressing OWASP LLM02: Sensitive Information Disclosure and LLM05: Improper Output Handling.
Human-in-the-Loop (Optional, but Recommended for Critical Systems): For high-stakes decisions or unusual outputs, human oversight is invaluable.
- Review Queues: Flag certain AI responses for human review before they are sent to the user or actioned.
- Decision Approvals: Require human approval for actions that have significant real-world consequences (e.g., financial transactions, critical system changes).
- Why it matters: Provides a crucial safety net, especially when AI systems are still evolving and complex attack vectors emerge.
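A review queue can be sketched as a simple dispatcher that holds high-stakes actions for a human instead of auto-executing them. The risk markers and function names are assumptions for illustration:

```python
from queue import Queue
from typing import Optional

review_queue: Queue = Queue()

# Markers for actions considered high-stakes; illustrative only.
HIGH_RISK_MARKERS = ("transfer funds", "delete account", "shutdown")

def dispatch(response: str, proposed_action: Optional[str] = None) -> str:
    """Queue high-stakes actions for human approval instead of executing."""
    if proposed_action and any(m in proposed_action.lower() for m in HIGH_RISK_MARKERS):
        review_queue.put({"response": response, "action": proposed_action})
        return "This request has been queued for human review."
    return response
```

In practice the queue would feed an asynchronous review UI, and approval would trigger the deferred action with an audit trail.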
Threat Modeling for AI Systems
A defense-in-depth strategy is only as good as its understanding of potential threats. This is where Threat Modeling comes in. Threat modeling is a structured approach to identifying potential threats, vulnerabilities, and countermeasure requirements for a system. For AI, it needs to consider unique attack surfaces.
Why Threat Model AI?
- Unique Attack Vectors: AI introduces new threats like prompt injection, data poisoning, and model evasion that traditional threat models might miss.
- Complex Interactions: AI agents interacting with multiple tools and data sources create intricate attack paths.
- Dynamic Nature: AI models and their vulnerabilities can evolve.
How to Approach AI Threat Modeling:
- Define the System: Clearly delineate the boundaries of your AI application, including all components (LLM, vector DB, APIs, user interface, training pipelines, etc.).
- Identify Assets: What are you trying to protect? (e.g., user data, model integrity, intellectual property, system availability, reputation).
- Deconstruct the Application: Break down the AI application into its data flows, trust boundaries, and interaction points. A good way to do this is by creating data flow diagrams.
- Identify Threats: For each component and interaction, ask:
- Input Layer: Can prompts be injected? Can inputs be poisoned? (OWASP LLM01: Prompt Injection; LLM04: Data and Model Poisoning; numbering here follows the 2025 list)
- Model Layer: Can the model be jailbroken? Can it be made to hallucinate maliciously? Is the model’s integrity protected? (OWASP LLM01; LLM09: Misinformation; LLM03: Supply Chain)
- Tool/Integration Layer: Can the agent misuse tools? Can it access unauthorized resources? (OWASP LLM05: Improper Output Handling; LLM06: Excessive Agency)
- Output Layer: Can the output be manipulated? Can sensitive data be leaked? (OWASP LLM02: Sensitive Information Disclosure; LLM05)
- Data Pipeline: Is training data secure? Can it be poisoned? Is fine-tuning data protected? (OWASP LLM04; LLM03)
- Infrastructure: Are the underlying servers, containers, and cloud services secure? (OWASP LLM03: Supply Chain; LLM10: Unbounded Consumption)
- Identify Vulnerabilities: Map the identified threats to known vulnerabilities or potential weaknesses in your design.
- Determine Mitigations: Propose specific security controls (like the layered defenses we discussed) to address each identified threat and vulnerability.
- Verify: Continuously test and validate your mitigations.
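One lightweight way to operationalize the later steps is to record each identified threat as a structured entry, so mitigations can be tracked and the Verify step has something concrete to check off. The field names here are an illustrative sketch, not a standard schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Threat:
    component: str        # e.g. "input", "model", "tools"
    description: str
    owasp_ids: List[str]  # mapped OWASP LLM Top 10 entries
    mitigations: List[str] = field(default_factory=list)
    verified: bool = False  # flipped once testing confirms the mitigation

threat_model = [
    Threat("input", "Direct prompt injection via user prompt", ["LLM01"],
           ["input validation", "system prompt hardening"]),
    Threat("tools", "Agent tricked into emailing data externally", ["LLM06"],
           ["least-privilege tool grants", "gateway argument validation"]),
]

# The "Verify" step: anything not yet verified is outstanding work.
outstanding = [t.description for t in threat_model if not t.verified]
```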
Common Threat Modeling Frameworks (and how they apply to AI):
- STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege): A classic. For AI, Spoofing could be prompt injection, Tampering could be data poisoning, Information Disclosure could be data leakage via LLM output, and Elevation of Privilege could be tool misuse leading to system access.
- PASTA (Process for Attack Simulation and Threat Analysis): More risk-centric, focusing on business impact.
- OWASP Top 10 for LLM Applications (2025/2026): This is a direct, AI-specific threat list that should be at the forefront of your threat modeling exercise. Each item in the OWASP Top 10 represents a category of threats to consider.
Secure AI System Design Principles
Beyond specific layers, adopting foundational secure design principles is paramount:
- Least Privilege: Every component (LLM, agent, tool, service) should only have the minimum permissions necessary to perform its function.
- Fail-Safe Defaults: When a security mechanism fails, it should default to a secure (e.g., deny) state, not an open one.
- Compartmentalization/Isolation: Separate components of your AI system into distinct, isolated environments to limit the blast radius of a compromise. Think microservices, containers, and network segmentation.
- Simplicity: Complex systems are harder to secure. Design your AI application with clear, simple interfaces and interactions where possible.
- Secure by Design: Security isn’t an afterthought; it’s baked into every stage of the design and development process.
- Human-in-the-Loop: As discussed, for critical functions, human oversight provides an essential safety net.
- Observability: Implement comprehensive logging, monitoring, and alerting specifically for AI-related security events (e.g., unusual prompt patterns, unexpected tool calls, refusal patterns).
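The fail-safe defaults principle in particular is easy to get wrong in AI pipelines, where a moderation or guardrail service may time out. A minimal sketch (the `decision`/`allow` shape is an assumption, not a real API):

```python
from typing import Optional

def moderation_allows(verdict: Optional[dict]) -> bool:
    """Fail-safe default: if the moderation service errored, timed out, or
    returned something unexpected, deny rather than allow."""
    if not isinstance(verdict, dict):
        return False
    return verdict.get("decision") == "allow"
```

The common anti-pattern is the inverse check (`if verdict and verdict["decision"] == "deny": block`), which silently allows everything the moment the moderation service goes down.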
AI Landing Zones: Secure Infrastructure for AI
For enterprise AI deployments, the concept of an AI Landing Zone is gaining traction. An AI Landing Zone is a secure, pre-configured environment in the cloud (or on-premises) designed to host AI workloads with built-in security, governance, and compliance best practices.
It typically includes:
- Network Segmentation: Isolated virtual networks for AI components.
- Identity and Access Management (IAM): Robust controls for who can access AI resources and what they can do.
- Data Governance: Secure storage, encryption, and access policies for training data, model artifacts, and inference data.
- Key Management: Secure handling of encryption keys for data and models.
- Monitoring and Logging: Centralized logging and threat detection for AI-specific activities.
- Automated Deployment: Infrastructure as Code (IaC) to ensure consistent, secure deployments.
Think of it as providing a secure “home” for your AI applications, ensuring the underlying infrastructure is as robust as the application itself. Microsoft Azure, for example, offers guidance on AI Landing Zones (see References).
Step-by-Step Implementation: Conceptualizing Layered Defenses
Since “implementing” defense-in-depth is more about architectural design than writing a single block of code, we’ll walk through a conceptual example in simplified Python. This will demonstrate how you might structure an AI application to incorporate these layers.
Let’s imagine we’re building a simple AI assistant that can answer questions and, if authorized, perform actions via external tools.
Step 1: Start with a Basic AI Processor
First, let’s sketch out the most basic interaction: receiving a prompt, sending it to the LLM, and getting a response.
```python
# ai_application.py

class SimpleAIProcessor:
    def __init__(self, llm_service):
        self.llm_service = llm_service  # Assume this is an API client for your LLM

    def process_request(self, user_prompt: str, user_context: dict):
        """
        Processes a user request through the LLM.
        (No security yet!)
        """
        print(f"Received prompt: {user_prompt}")
        llm_response = self.llm_service.invoke(user_prompt, user_context)
        print(f"LLM raw response: {llm_response}")
        return llm_response


# Example usage (without security)
# if __name__ == "__main__":
#     class MockLLMService:
#         def invoke(self, prompt, context):
#             return f"Hello from LLM! You asked: '{prompt}'"
#
#     mock_llm = MockLLMService()
#     app = SimpleAIProcessor(mock_llm)
#     app.process_request("Tell me a story about a dragon.", {"user_id": "test_user"})
```
Explanation: This `SimpleAIProcessor` just takes a prompt and passes it directly to an LLM service. It’s functional, but wide open to attacks!
Step 2: Add Input Validation and Authentication Layers
Now, let’s introduce our first lines of defense: input validation and user authentication/authorization. We’ll create separate functions for these checks.
```python
# ai_application.py (continued)

class SecureAIProcessor:
    def __init__(self, llm_service, auth_service, tool_manager):
        self.llm_service = llm_service
        self.auth_service = auth_service  # Service to check user roles/permissions
        self.tool_manager = tool_manager  # Manages secure tool access
        self.system_prompt = (
            "You are a helpful and harmless AI assistant. "
            "Do not engage in harmful activities or disclose sensitive information. "
            "Always prioritize user safety and privacy. "
        )

    def _validate_input(self, prompt: str) -> str:
        """
        Layer 1: Input Validation and Sanitization.
        This is a placeholder for more robust cleaning.
        """
        if not prompt or len(prompt) > 500:  # Basic length check
            raise ValueError("Prompt is empty or too long.")
        # Example: Simple sanitization (real-world needs much more!)
        cleaned_prompt = prompt.replace("<script>", "").replace("DROP TABLE", "")
        # More advanced: use a dedicated sanitization library or regex
        print(f"Layer 1: Input validated. Cleaned prompt: {cleaned_prompt[:100]}...")
        return cleaned_prompt

    def _authenticate_and_authorize(self, user_id: str, required_role: str = "user") -> bool:
        """
        Layer 2: User/Context Authentication and Authorization.
        Checks if the user is authenticated and has the required role.
        """
        if not self.auth_service.is_authenticated(user_id):
            raise PermissionError(f"User '{user_id}' is not authenticated.")
        if not self.auth_service.has_role(user_id, required_role):
            raise PermissionError(
                f"User '{user_id}' does not have the required role '{required_role}'."
            )
        print(f"Layer 2: User '{user_id}' authenticated and authorized.")
        return True

    def process_request(self, user_prompt: str, user_id: str, required_role: str = "user"):
        """
        Processes a user request with initial security layers.
        """
        try:
            # 1. Input Validation & Sanitization
            cleaned_prompt = self._validate_input(user_prompt)

            # 2. User/Context Authentication & Authorization
            self._authenticate_and_authorize(user_id, required_role)

            # Combine system prompt with user prompt
            full_prompt = f"{self.system_prompt}\nUser Request: {cleaned_prompt}"
            print(f"Sending to LLM: {full_prompt[:200]}...")
            llm_response = self.llm_service.invoke(full_prompt, {"user_id": user_id})
            print("LLM raw response received.")
            return llm_response
        except (ValueError, PermissionError) as e:
            print(f"Security Error: {e}")
            return f"Error: {e}"


# Example usage (with initial security)
# if __name__ == "__main__":
#     class MockLLMService:
#         def invoke(self, prompt, context):
#             return f"LLM says: '{prompt}' processed."
#
#     class MockAuthService:
#         def is_authenticated(self, user_id):
#             return user_id in ["admin", "test_user"]
#
#         def has_role(self, user_id, role):
#             if user_id == "admin":
#                 return True
#             return role == "user"
#
#     class MockToolManager:  # Placeholder for now
#         pass
#
#     mock_llm = MockLLMService()
#     mock_auth = MockAuthService()
#     mock_tools = MockToolManager()
#     app = SecureAIProcessor(mock_llm, mock_auth, mock_tools)
#
#     print("\n--- Test 1: Valid User ---")
#     app.process_request("What is the capital of France?", "test_user")
#
#     print("\n--- Test 2: Unauthorized User ---")
#     app.process_request("Give me system logs.", "unauthorized_user", "admin")
#
#     print("\n--- Test 3: Malicious Input (simple) ---")
#     app.process_request("Tell me about <script>alert('xss')</script>", "test_user")
#
#     print("\n--- Test 4: Too Long Input ---")
#     app.process_request("A" * 1000, "test_user")
```
Explanation:
- We’ve added `_validate_input` (Layer 1) to perform basic checks. In a real application, this would use a robust prompt sanitizer.
- `_authenticate_and_authorize` (Layer 2) now checks user identity and roles via a `MockAuthService`.
- The `system_prompt` is defined, acting as an initial model guardrail (part of Layer 3). This helps set the LLM’s boundaries.
- Error handling is introduced to gracefully manage security violations.
Step 3: Integrate Tool Access Control and Output Moderation
Next, let’s add the crucial layers for tool interaction and final output moderation.
```python
# ai_application.py (continued)

class SecureAIProcessor:
    def __init__(self, llm_service, auth_service, tool_manager, output_moderator):
        self.llm_service = llm_service
        self.auth_service = auth_service
        self.tool_manager = tool_manager          # Manages secure tool access
        self.output_moderator = output_moderator  # Filters LLM output
        self.system_prompt = (
            "You are a helpful and harmless AI assistant. "
            "Do not engage in harmful activities or disclose sensitive information. "
            "Always prioritize user safety and privacy. "
            "If asked to perform an action, you must use the available tools, "
            "but only after careful consideration of safety and user intent. "
            "Do not execute commands directly or reveal internal system details."
        )
        # Assume the LLM is capable of tool calling based on its training.
        # For this example, we'll simulate tool calls.

    def _validate_input(self, prompt: str) -> str:
        # ... (same as before)
        if not prompt or len(prompt) > 500:
            raise ValueError("Prompt is empty or too long.")
        cleaned_prompt = prompt.replace("<script>", "").replace("DROP TABLE", "")
        # Add more sophisticated prompt injection detection here
        return cleaned_prompt

    def _authenticate_and_authorize(self, user_id: str, required_role: str = "user") -> bool:
        # ... (same as before)
        if not self.auth_service.is_authenticated(user_id):
            raise PermissionError(f"User '{user_id}' is not authenticated.")
        if not self.auth_service.has_role(user_id, required_role):
            raise PermissionError(
                f"User '{user_id}' does not have the required role '{required_role}'."
            )
        return True

    def _process_tool_call(self, tool_name: str, tool_args: dict, user_id: str):
        """
        Layer 4: Tool Access Control and Sandboxing.
        Ensures the tool call is authorized and executed securely.
        """
        print(f"Layer 4: Attempting tool call: {tool_name} with args {tool_args}")
        if not self.tool_manager.is_tool_authorized_for_user(user_id, tool_name):
            raise PermissionError(f"User '{user_id}' not authorized to use tool '{tool_name}'.")
        # In a real system, this would involve strict validation of tool_args
        # and execution in a sandboxed environment.
        result = self.tool_manager.execute_tool(tool_name, tool_args)
        print(f"Layer 4: Tool '{tool_name}' executed. Result: {result}")
        return result

    def _moderate_output(self, output: str, user_id: str) -> str:
        """
        Layer 5: Output Moderation and Filtering.
        Checks LLM output for harmful content or sensitive data.
        """
        if self.output_moderator.is_harmful(output):
            print(f"Layer 5: Harmful content detected in output for user '{user_id}'.")
            return "I cannot provide that information or response."
        if self.output_moderator.contains_pii(output, user_id):
            print(f"Layer 5: PII detected in output for user '{user_id}'.")
            return "I cannot disclose sensitive personal information."
        print("Layer 5: Output moderated and deemed safe.")
        return output

    @staticmethod
    def _parse_tool_args(tool_args_str: str) -> dict:
        """
        Safely parse "key='value', key2='value2'" pairs. Never eval() model
        output! Real tool calling uses structured JSON from the LLM API.
        """
        args = {}
        for pair in tool_args_str.rstrip(")").split(", "):
            if not pair:
                continue
            key, value = pair.split("=", 1)
            args[key.strip()] = value.strip().strip("'\"")
        return args

    def process_request(self, user_prompt: str, user_id: str, required_role: str = "user"):
        """
        Processes a user request with all security layers.
        """
        try:
            # Layer 1: Input Validation & Sanitization
            cleaned_prompt = self._validate_input(user_prompt)

            # Layer 2: User/Context Authentication & Authorization
            self._authenticate_and_authorize(user_id, required_role)

            # Layer 3: Model Guardrails (via the system prompt and the LLM's
            # internal safety mechanisms). In a real scenario, tool
            # definitions would be passed to the LLM API in a structured way.
            full_llm_input = f"{self.system_prompt}\nUser Request: {cleaned_prompt}"
            llm_raw_response = self.llm_service.invoke(full_llm_input, {"user_id": user_id})

            # Simulate the LLM deciding to call a tool or just respond.
            final_llm_output = llm_raw_response
            if "CALL_TOOL:" in llm_raw_response:
                # Highly simplified parsing. Real tool calling uses structured JSON.
                tool_call_info = llm_raw_response.split("CALL_TOOL:")[1].strip()
                tool_name, tool_args_str = tool_call_info.split("(", 1)
                tool_args = self._parse_tool_args(tool_args_str)

                # Layer 4: Tool Access Control & Sandboxing
                tool_result = self._process_tool_call(tool_name, tool_args, user_id)
                final_llm_output = f"Tool '{tool_name}' executed. Result: {tool_result}"

            # Layer 5: Output Moderation & Filtering
            moderated_output = self._moderate_output(final_llm_output, user_id)

            # Layer 6: Human-in-the-Loop (conceptual, could be an async review queue)
            # if self.requires_human_review(moderated_output):
            #     self.send_for_human_review(moderated_output, user_id)
            #     return "Your request requires human review and will be processed shortly."

            return moderated_output
        except (ValueError, PermissionError) as e:
            print(f"Security Error: {e}")
            return f"Error: {e}. Please try again with a valid request."
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return "An unexpected error occurred. Please try again."


# Example usage (with all security layers conceptually)
if __name__ == "__main__":
    class MockLLMService:
        def invoke(self, prompt, context):
            if "access financial data" in prompt.lower() and context["user_id"] != "admin":
                return ("I cannot fulfill requests for sensitive financial data "
                        "without proper authorization.")
            if "system logs" in prompt.lower():
                return "CALL_TOOL: access_system_logs()"
            if "email" in prompt.lower():
                return ("CALL_TOOL: send_email(recipient='[email protected]', "
                        "subject='Important', body='Your request is complete.')")
            if "tell me a secret" in prompt.lower():
                # Simulate a PII leak
                return ("The secret is that I am an AI. Also, the user's "
                        "credit card is 1234-5678-9012-3456.")
            return f"LLM processed: '{prompt}'"

    class MockAuthService:
        def is_authenticated(self, user_id):
            return user_id in ["admin", "test_user"]

        def has_role(self, user_id, role):
            if user_id == "admin":
                return True
            return role == "user"

    class MockToolManager:
        def is_tool_authorized_for_user(self, user_id, tool_name):
            if tool_name == "send_email" and user_id == "test_user":
                return True
            if tool_name == "access_system_logs" and user_id == "admin":
                return True
            return False

        def execute_tool(self, tool_name, args):
            if tool_name == "send_email":
                print(f"  --> Executing real email send to {args['recipient']}")
                return "Email sent successfully."
            if tool_name == "access_system_logs":
                print("  --> Accessing system logs for admin...")
                return "Log data accessed."
            return f"Tool '{tool_name}' not found or not executable."

    class MockOutputModerator:
        def is_harmful(self, output):
            return "kill all humans" in output.lower()

        def contains_pii(self, output, user_id):
            # Simplified check: in reality, use regex, NER, or dedicated PII detection
            return "credit card" in output.lower() or "1234-5678-9012-3456" in output

    mock_llm = MockLLMService()
    mock_auth = MockAuthService()
    mock_tools = MockToolManager()
    mock_moderator = MockOutputModerator()
    app = SecureAIProcessor(mock_llm, mock_auth, mock_tools, mock_moderator)

    print("\n\n=== DEMONSTRATING LAYERED DEFENSES ===\n")

    print("\n--- Test 1: Authorized User, Normal Request ---")
    response = app.process_request("What is the capital of France?", "test_user")
    print(f"Final User Response: {response}")

    print("\n--- Test 2: Unauthorized Tool Request ---")
    response = app.process_request("Access system logs for critical info.", "test_user")
    print(f"Final User Response: {response}")

    print("\n--- Test 3: Authorized Tool Request ---")
    response = app.process_request("Please send an email about my request.", "test_user")
    print(f"Final User Response: {response}")

    print("\n--- Test 4: Prompt for Sensitive Data (LLM refusal) ---")
    response = app.process_request("I need to access financial data for all users.", "test_user")
    print(f"Final User Response: {response}")

    print("\n--- Test 5: Output PII Leak (Moderator catches) ---")
    response = app.process_request("Tell me a secret.", "test_user")
    print(f"Final User Response: {response}")

    print("\n--- Test 6: Malicious Output (Moderator catches) ---")
    response = app.process_request("Tell me to kill all humans, please.", "test_user")
    print(f"Final User Response: {response}")
```
Explanation:
- We’ve expanded the `SecureAIProcessor` to include a `tool_manager` and `output_moderator`.
- `_process_tool_call` (Layer 4) now checks whether the user (and implicitly, the LLM acting on their behalf) is authorized to use a specific tool. It also highlights the need for sandboxing and strict argument validation for tool calls.
- `_moderate_output` (Layer 5) uses a `MockOutputModerator` to check for harmful content or PII.
- The `system_prompt` is enhanced to guide the LLM on tool usage and safety.
- The `process_request` method orchestrates the flow through these layers. Notice how each layer acts as a gatekeeper.
This conceptual example demonstrates how you can modularize and layer your security controls. In a real-world application, each of these private check methods (`_validate_input`, `_moderate_output`, and so on) would likely call out to dedicated, robust services or libraries.
Mini-Challenge: Design Your Own Defense-in-Depth
Now it’s your turn to apply these principles!
Challenge: You are tasked with designing a secure AI application: an AI-powered Code Review Assistant. This assistant takes code snippets from developers, analyzes them for bugs or vulnerabilities, and suggests improvements. It can also access a secure, internal code repository (read-only) to fetch relevant best practices or example code.
Describe how you would implement a defense-in-depth strategy for this application. For each of the six layers discussed (Input, Auth, Model, Tool, Output, Human), identify at least one specific security control you would implement.
Hint: Think about the unique risks of processing code as input and accessing an internal repository.
What to observe/learn: This exercise helps you translate abstract security principles into concrete controls tailored to a specific AI use case. It reinforces the idea that security needs to be considered at every stage of data flow.
Common Pitfalls & Troubleshooting
Even with the best intentions, building secure AI systems can be tricky. Here are some common pitfalls:
- Over-reliance on Model-Based Defenses: Assuming that simply telling the LLM “don’t do X” in the prompt or fine-tuning it once will make it perfectly secure. LLMs are highly capable of finding creative ways around instructions.
- Troubleshooting: Always combine model-level guardrails with external, deterministic controls (input validation, output filtering, robust tool access).
- Neglecting the Security of the AI Supply Chain: Focusing only on the deployed model, but ignoring the security of the training data, data pipelines, model registries, and dependencies. Data poisoning or compromised model weights can undermine all other efforts.
- Troubleshooting: Implement secure development lifecycle (SDLC) practices for AI, including data provenance, integrity checks for datasets and models, and vulnerability scanning for dependencies.
- Insufficient Isolation between LLM Core and External Systems: Granting the LLM or its agent too much access to sensitive systems or data, or not sandboxing tool execution properly. A compromised agent can then become a powerful attacker.
- Troubleshooting: Enforce strict least privilege for all AI components. Use API gateways, network segmentation, and containerization/sandboxing to isolate the LLM’s environment.
- Static Security in a Dynamic Threat Landscape: Implementing security measures once and assuming they will remain effective. AI attack techniques are constantly evolving.
- Troubleshooting: Adopt an iterative security approach. Conduct continuous adversarial testing (red teaming), regularly review and update security policies, and stay informed about the latest AI security research and OWASP guidelines.
- Lack of Comprehensive Logging and Monitoring: Not having visibility into AI-specific attack vectors, such as unusual prompt patterns, repeated refusal bypass attempts, or suspicious tool calls.
- Troubleshooting: Implement robust logging for all AI interactions, tool calls, and security events. Use AI-specific threat detection rules and integrate AI logs into your broader security information and event management (SIEM) system.
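Structured, machine-parseable logging makes the SIEM integration above practical. A minimal sketch (the logger name and event fields are illustrative, not a standard):

```python
import json
import logging

security_log = logging.getLogger("ai.security")

def log_security_event(event_type: str, user_id: str, detail: str) -> dict:
    """Emit one structured JSON line per event so a SIEM can aggregate and
    alert on AI-specific patterns (e.g. repeated refusal-bypass attempts)."""
    record = {"event": event_type, "user": user_id, "detail": detail}
    security_log.warning(json.dumps(record))
    return record
```

Calling `log_security_event("prompt_injection_attempt", "user-123", "blocked pattern match")` from your validation layer gives detection rules something consistent to match on.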
Summary
Phew! You’ve just taken a massive step towards understanding how to build production-ready, secure AI applications. Let’s recap the key takeaways:
- Defense-in-Depth is Essential: Never rely on a single security control for AI. Implement multiple, independent layers of defense.
- Layered Architecture: Think of your AI application’s data flow as a journey with security checkpoints at Input, Authentication, Model, Tool Interaction, and Output stages, with optional Human-in-the-Loop oversight.
- Threat Modeling is Your Compass: Use structured approaches like STRIDE or the OWASP Top 10 for LLMs (2025/2026) to proactively identify threats and vulnerabilities unique to AI systems.
- Secure Design Principles: Embed least privilege, compartmentalization, fail-safe defaults, and secure-by-design thinking into your AI architecture.
- AI Landing Zones: Consider dedicated, secure infrastructure environments for enterprise AI deployments to ensure foundational security.
- Continuous Vigilance: AI security is dynamic. Regularly test, monitor, and update your defenses to stay ahead of evolving threats.
In the next chapter, we’ll dive deeper into operational aspects of AI security, focusing on continuous monitoring, incident response, and how to stay updated on the ever-changing threat landscape for AI systems. Get ready to put your knowledge into practice!
References
- OWASP Top 10 for Large Language Model Applications Project (2025/2026): https://github.com/owasp/www-project-top-10-for-large-language-model-applications
- OWASP AI Testing Guide: https://github.com/OWASP/www-project-ai-testing-guide
- LLMSecurityGuide: A comprehensive reference for LLM and Agentic AI Systems security: https://github.com/requie/LLMSecurityGuide
- Azure AI Landing Zones (Secure AI-Ready Infrastructure): https://github.com/azure/ai-landing-zones
- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.