Welcome back, future MLOps heroes! In our previous chapter, we explored the fundamentals of logging for AI systems, setting the stage for gaining visibility into our applications. We learned how structured, contextual logs are invaluable for understanding what happened. But what if you need to understand how something happened, especially when your AI application interacts with multiple services, databases, and external APIs? How do you follow a single user request or an AI agent’s decision-making process across all these moving parts?
That’s where distributed tracing comes in! In this chapter, we’re going to dive deep into tracing AI workflows. We’ll learn how to follow an AI interaction, from the initial user prompt all the way through your application’s logic, model inference, and back to the final prediction. This is incredibly powerful for debugging, performance optimization, and understanding the true journey of your AI requests.
By the end of this chapter, you’ll be able to:
- Understand what distributed tracing is and why it’s essential for complex AI systems.
- Grasp the core concepts of OpenTelemetry: traces, spans, and context propagation.
- Set up OpenTelemetry for Python applications.
- Instrument direct LLM calls to capture prompt, response, latency, and token usage.
- Explore how to integrate tracing with popular LLM frameworks like LangChain or LlamaIndex.
- Send your trace data to an observability backend for visualization.
Ready to connect the dots in your AI systems? Let’s trace it!
Core Concepts: Following the AI Breadcrumbs
Before we jump into code, let’s build a solid understanding of what distributed tracing is and why it’s a game-changer for AI.
What is Distributed Tracing? The Detective’s Magnifying Glass
Imagine you’re trying to solve a mystery. You have clues (logs), but they’re scattered everywhere. Distributed tracing is like having a special tracking device that follows the suspect (your request) through every room, every hallway, and every interaction it has. It creates a complete timeline of events, showing you exactly where the suspect went, what they did, and how long they spent at each location.
In software, a trace represents the entire lifecycle of a request or transaction as it flows through a distributed system. This trace is composed of individual units of work called spans.
Why is Tracing Crucial for AI?
AI systems, especially those built around Large Language Models (LLMs) or complex multi-agent architectures, introduce unique challenges compared to traditional software:
- Non-Determinism: LLMs aren’t always deterministic. The same prompt might yield slightly different responses, making debugging harder. Tracing helps you capture the exact prompt and response for each specific interaction, providing crucial context for understanding varying outputs.
- Chained Operations: AI applications often involve multiple steps: retrieving data, pre-processing, calling an LLM, post-processing, calling another tool, and so on. Tracing visualizes this entire chain, revealing the flow of control and data.
- Black Box Tendencies: While we can’t always peek inside an LLM, we can trace the inputs we send and the outputs we receive, along with critical metadata. This “observability at the edges” is vital when the core logic is opaque.
- Performance Bottlenecks: Tracing quickly reveals which part of your AI workflow (data retrieval, LLM call, custom logic) is taking the most time, allowing you to pinpoint and optimize slow components.
- Cost Management: By attaching token usage, model IDs, and other resource consumption data to traces, you can attribute cost directly to specific user interactions or application features, enabling precise cost optimization.
OpenTelemetry: The Universal Language of Observability
To achieve consistent tracing across different services and languages, we need a standard. That’s where OpenTelemetry comes in.
OpenTelemetry (often abbreviated as OTel) is a vendor-neutral set of APIs, SDKs, and tools designed to instrument, generate, collect, and export telemetry data (traces, metrics, and logs). It’s an open-source project that has become the de-facto standard for observability.
Why OpenTelemetry?
- Vendor Neutrality: You instrument your code once with OpenTelemetry, and you can send your data to almost any observability backend (Jaeger, Zipkin, SigNoz, Datadog, New Relic, etc.) by simply changing a configuration. This prevents vendor lock-in and provides flexibility.
- Rich Ecosystem: It supports many programming languages and integrates with popular libraries and frameworks, making adoption easier.
- Unified Telemetry: It aims to provide a single standard for all three pillars of observability: traces, metrics, and logs, making correlation much easier when troubleshooting.
Spans, Traces, and Context Propagation: The Building Blocks
Let’s break down the core concepts of OpenTelemetry tracing:
- Trace: The complete story of a single request or operation as it moves through your system. Think of it as the entire journey. It’s a collection of related spans, forming a directed acyclic graph (DAG) of operations.
- Span: A single unit of work within a trace. Each span represents an individual operation (e.g., an HTTP request, a database query, an LLM call, a function execution). Spans have:
- A descriptive name (e.g., “process_user_prompt”, “openai_chat_completion”).
- A unique ID.
- A parent span ID (linking it to its parent operation, forming the trace hierarchy).
- Start and end timestamps (allowing calculation of duration).
- Attributes: Key-value pairs providing additional context relevant to that operation (e.g., `http.method`, `db.statement`, `llm.prompt`, `llm.tokens.input`). These are incredibly powerful for filtering and analysis.
- Events: Timestamped messages marking specific points within the span’s lifecycle (e.g., “prompt_received”, “model_started”).
- Context Propagation: This is the magic that links spans across different services. When a service makes a call to another service, the trace context (containing the current trace ID and parent span ID) is passed along, usually in HTTP headers (like `traceparent`). This ensures that all operations related to a single request, even if they cross service boundaries, are grouped into the same trace.
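To make these building blocks concrete, here is a framework-free sketch in plain Python. This is a toy model, not the OpenTelemetry API: it shows a span’s anatomy (IDs, parent link, timestamps, attributes, events) and the W3C `traceparent` header that carries the context between services.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Toy span: the real OpenTelemetry Span API is richer, but the anatomy is the same."""
    name: str
    trace_id: str  # shared by every span in the trace (32 hex chars)
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))  # 16 hex chars
    parent_span_id: Optional[str] = None  # links this span to its parent in the trace
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def add_event(self, event_name: str, **attrs):
        """Record a timestamped event within the span's lifecycle."""
        self.events.append((time.time(), event_name, attrs))


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """W3C Trace Context header: version-traceid-spanid-flags."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


# A parent and a child span belonging to the same trace:
trace_id = secrets.token_hex(16)
parent = Span("generate_story_workflow", trace_id)
child = Span("openai_chat_completion", trace_id, parent_span_id=parent.span_id)
child.attributes["llm.model_name"] = "gpt-3.5-turbo"
child.add_event("prompt_received")

# The header a service forwards to the next hop so spans stay linked:
header = build_traceparent(trace_id, child.span_id)
print(header)  # e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

In real systems, OpenTelemetry’s propagators inject and extract this header for you; the point here is simply that the trace ID travels with every request.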
Let’s make this concrete with a typical AI workflow: a user request enters through an API gateway, the gateway forwards it to a backend service, and the backend service in turn calls an LLM API before assembling the final response.
In this workflow, each component (`API_Gateway`, `Backend_Service`, `LLM_API`) produces a distinct span as the request flows through it. Crucially, the trace context (containing the unique Trace ID and the ID of the current parent span) is passed along with each request. This ensures that all these individual spans are linked together, forming a single, coherent trace that tells the complete story of the user’s request from start to finish. Without context propagation, you’d see disconnected operations, not a unified workflow.
AI-Specific Trace Attributes: What to Capture
For AI systems, especially those using LLMs, certain attributes are incredibly valuable to attach to your spans. These go beyond generic HTTP or database attributes and provide deep insights into the AI’s behavior:
- `llm.model_name`: The specific model used (e.g., `gpt-4`, `claude-3-opus-20240229`, `custom-rag-model-v2`).
- `llm.prompt`: The exact input prompt (or a sanitized version) sent to the LLM.
- `llm.response`: The raw output received from the LLM.
- `llm.tokens.input`: Number of input tokens consumed.
- `llm.tokens.output`: Number of output tokens generated.
- `llm.latency_ms`: Total time taken for the LLM call (round-trip time).
- `llm.temperature`, `llm.top_p`, `llm.max_tokens`: Key generation parameters that influence model behavior.
- `user.id`, `session.id`: To correlate AI interactions with specific users or sessions for debugging user-reported issues.
- `app.feature`: The application feature or component leveraging the LLM (e.g., “chatbot”, “content_generation”, “sentiment_analysis”).
- `safety.score`, `safety.flagged`: If you’re running safety checks on prompts/responses.
- `cost.usd`: The estimated cost of this specific LLM interaction.
By capturing these, you transform your traces into powerful diagnostic and analytical tools for your AI, enabling you to understand not just system performance, but also model behavior and cost implications.
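To illustrate how these attributes feed cost attribution, here is a small self-contained sketch. The per-1K-token prices below are made-up placeholders for illustration, not real provider pricing; always check your provider’s current price sheet.

```python
def estimate_llm_cost_usd(input_tokens: int, output_tokens: int,
                          price_per_1k_input: float, price_per_1k_output: float) -> float:
    """Estimate the cost of one LLM call from token counts and per-1K-token prices."""
    return round(
        input_tokens / 1000 * price_per_1k_input
        + output_tokens / 1000 * price_per_1k_output,
        6,
    )


# Hypothetical prices for illustration only.
attributes = {
    "llm.model_name": "gpt-3.5-turbo",
    "llm.tokens.input": 120,
    "llm.tokens.output": 180,
}
attributes["cost.usd"] = estimate_llm_cost_usd(
    attributes["llm.tokens.input"],
    attributes["llm.tokens.output"],
    price_per_1k_input=0.0005,
    price_per_1k_output=0.0015,
)
print(attributes["cost.usd"])  # 0.00033
```

Attaching the computed `cost.usd` to a span lets you aggregate spend per user, per feature, or per model directly in your observability backend.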
Step-by-Step Implementation: Tracing Your First LLM Call
Let’s get practical! We’ll set up OpenTelemetry in a Python application and instrument a direct call to an LLM, then explore how frameworks like LangChain simplify this.
Step 1: Setting Up Your Environment
First, ensure you have Python installed (we’ll use Python 3.9+). We’ll need a few packages.
Open your terminal and run the following pip install command. These packages cover the OpenTelemetry SDK, an OTLP exporter, instrumentation for HTTP requests, the OpenAI client, LangChain, and python-dotenv for environment variables.
```
pip install opentelemetry-sdk==1.23.0 \
  opentelemetry-exporter-otlp==1.23.0 \
  opentelemetry-instrumentation-requests==0.45b0 \
  openai==1.14.0 \
  langchain==0.1.13 \
  langchain-openai==0.1.13 \
  python-dotenv==1.0.1
```
Version Notes (as of 2026-03-20):
- `opentelemetry-sdk` (v1.23.0): The core OpenTelemetry SDK for Python. This is a recent, stable release.
- `opentelemetry-exporter-otlp` (v1.23.0): Enables exporting traces using the OpenTelemetry Protocol (OTLP), the recommended way to send data to most observability backends.
- `opentelemetry-instrumentation-requests` (v0.45b0): Provides automatic instrumentation for the popular `requests` library. Many LLM client libraries and web frameworks use `requests`, so this is very useful. Note that the Python instrumentation packages are published as beta releases (the `b` suffix); pin the instrumentation version that is released alongside your SDK version.
- `openai` (v1.14.0): The official OpenAI Python client.
- `langchain` (v0.1.13): A popular framework for building LLM applications.
- `langchain-openai` (v0.1.13): The OpenAI integration for LangChain.
- `python-dotenv` (v1.0.1): For safely managing API keys and other environment variables.
Next, you’ll need an OpenAI API key. Create a file named .env in your project root directory:
```
# .env
OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
```
Replace "YOUR_OPENAI_API_KEY_HERE" with your actual OpenAI API key. Remember to add .env to your .gitignore file to prevent accidentally committing sensitive information to version control.
Step 2: Initializing OpenTelemetry
Before any tracing can happen, we need to configure the OpenTelemetry SDK. This typically involves setting up a TracerProvider and an OTLPSpanExporter.
Create a file named tracing_setup.py in your project root:
```python
# tracing_setup.py
import os

from dotenv import load_dotenv
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

load_dotenv()  # Load environment variables from the .env file


def configure_tracer(service_name: str):
    """
    Configures the OpenTelemetry tracer for a given service.
    This should be called once at the beginning of your application's lifecycle.
    """
    # 1. Define a resource for your service. This metadata is attached to all
    #    spans generated by this service and is crucial for identifying it in
    #    the observability backend.
    resource = Resource.create({
        "service.name": service_name,
        "service.version": "1.0.0",
        "deployment.environment": os.getenv("APP_ENV", "development"),
    })

    # 2. Create a TracerProvider, which manages Tracer instances and controls
    #    how spans are processed.
    provider = TracerProvider(resource=resource)

    # 3. Configure an OTLP exporter. By default it sends traces via gRPC to
    #    http://localhost:4317; override the target with the
    #    OTEL_EXPORTER_OTLP_ENDPOINT environment variable.
    otlp_exporter = OTLPSpanExporter()

    # 4. Wrap the exporter in a BatchSpanProcessor, which sends spans in
    #    batches to reduce overhead.
    processor = BatchSpanProcessor(otlp_exporter)
    provider.add_span_processor(processor)

    # 5. For local debugging, you can also add a ConsoleSpanExporter. It
    #    prints spans directly to your console, which is helpful when you
    #    don't have an OTLP collector running. Uncomment to activate:
    # provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))

    # 6. Set the TracerProvider as the global default. Subsequent calls to
    #    trace.get_tracer() will use this configured provider.
    trace.set_tracer_provider(provider)

    # 7. Instrument common libraries for automatic tracing. HTTP calls made
    #    with the 'requests' library will now generate spans automatically.
    RequestsInstrumentor().instrument()

    # 8. Return a tracer instance that can be used to create manual spans.
    return trace.get_tracer(service_name)


if __name__ == "__main__":
    my_tracer = configure_tracer("my-ai-service")
    print("OpenTelemetry tracer configured for service: my-ai-service")
    print("Remember to call configure_tracer() once at your application's entry point.")
```
Explanation:
- `load_dotenv()`: Loads any environment variables (like our `OPENAI_API_KEY`) from the `.env` file into the process’s environment.
- `Resource.create()`: Defines metadata about your service (`service.name`, `service.version`, `deployment.environment`). This resource information is automatically attached to all spans originating from this service, making it easy to filter and analyze traces in your observability backend.
- `TracerProvider()`: The core component that manages `Tracer` instances and dictates how spans are processed and exported.
- `OTLPSpanExporter()`: An exporter that sends spans using the OpenTelemetry Protocol (OTLP). We’re assuming a local OpenTelemetry Collector or an OTLP-compatible backend (like SigNoz or Jaeger) is running at its default gRPC port (`localhost:4317`).
- `BatchSpanProcessor()`: An efficient processor that buffers spans and sends them in batches, reducing the overhead of exporting each span individually.
- `trace.set_tracer_provider(provider)`: This crucial step makes our configured `TracerProvider` the global default. After this, any code that requests an OpenTelemetry `Tracer` will receive one configured with our settings.
- `RequestsInstrumentor().instrument()`: A powerful example of automatic instrumentation. Any HTTP requests made using the popular `requests` library will automatically generate OpenTelemetry spans, including details like the URL, method, status code, and latency.
- `trace.get_tracer(service_name)`: Retrieves a `Tracer` instance. The `service_name` helps categorize spans originating from different parts of your application or different services.
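The setup code notes that the exporter endpoint can be overridden via an environment variable. The OTLP exporter honors `OTEL_EXPORTER_OTLP_ENDPOINT` and the signal-specific `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`; here is a simplified, stdlib-only sketch of that resolution order (the real exporter also handles headers, TLS, and timeouts):

```python
import os


def resolve_otlp_endpoint() -> str:
    """Mimic how the OTLP trace exporter picks its target: the signal-specific
    variable wins over the generic one, which wins over the gRPC default."""
    return (
        os.environ.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
        or os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
        or "http://localhost:4317"
    )


print(resolve_otlp_endpoint())  # http://localhost:4317 when neither variable is set
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://collector.internal:4317"
print(resolve_otlp_endpoint())  # http://collector.internal:4317
```

This is why the same instrumented code can ship unchanged from your laptop to production: only the environment changes.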
Step 3: Instrumenting a Direct OpenAI LLM Call
Now, let’s make an LLM call and manually create a span to capture its details. This is useful when automatic instrumentation isn’t available for a specific library, or when you need very specific control over the attributes and structure of your trace.
Create a file named llm_app_manual_trace.py in the same directory as tracing_setup.py:
```python
# llm_app_manual_trace.py
import time

from openai import OpenAI
from opentelemetry import trace

from tracing_setup import configure_tracer  # Import our setup function

# 1. Configure the OpenTelemetry tracer for our application. This ensures
#    that all subsequent spans are correctly linked and exported.
app_tracer = configure_tracer("my-llm-app-manual")

# 2. Initialize the OpenAI client. It picks up OPENAI_API_KEY from the
#    environment (loaded by dotenv in tracing_setup).
client = OpenAI()


def generate_story(topic: str) -> str:
    """
    Generates a short story using OpenAI's GPT model, with manual tracing.
    Each significant step is wrapped in a span for detailed observability.
    """
    # 3. Create a new span for the entire story generation process. This is
    #    the parent span for this function call; the 'with' statement ensures
    #    the span is properly started and ended.
    with app_tracer.start_as_current_span("generate_story_workflow") as parent_span:
        parent_span.set_attribute("story.topic", topic)  # Add context to the parent span
        print(f"--- Starting story generation for topic: '{topic}' ---")

        # Construct the prompt and add it as an attribute to the parent span.
        prompt = f"Write a short, imaginative story about {topic}. It should be around 100 words."
        parent_span.set_attribute("llm.prompt", prompt)

        start_time = time.time()  # Start timing for latency calculation
        llm_response_content = "Error: Story not generated."  # Default error message

        try:
            # 4. Create a child span specifically for the OpenAI API call,
            #    nested under 'generate_story_workflow'. Note that
            #    start_as_current_span records exceptions and sets an ERROR
            #    status on this span automatically if the block raises.
            with app_tracer.start_as_current_span("openai_chat_completion") as llm_span:
                # Add LLM-specific attributes to the child span.
                llm_span.set_attribute("llm.model_name", "gpt-3.5-turbo")
                llm_span.set_attribute("llm.request_type", "chat.completions")
                llm_span.set_attribute("llm.temperature", 0.7)
                llm_span.set_attribute("llm.max_tokens", 200)

                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You are a creative storyteller."},
                        {"role": "user", "content": prompt},
                    ],
                    temperature=0.7,
                    max_tokens=200,  # Limit tokens to avoid excessive cost and output
                )
                llm_response_content = response.choices[0].message.content
                llm_span.set_attribute("llm.response", llm_response_content)

                # 5. Capture token usage from the response for cost tracking.
                if response.usage:
                    llm_span.set_attribute("llm.tokens.input", response.usage.prompt_tokens)
                    llm_span.set_attribute("llm.tokens.output", response.usage.completion_tokens)
                    llm_span.set_attribute("llm.tokens.total", response.usage.total_tokens)
                    print(f"Tokens used: Input={response.usage.prompt_tokens}, "
                          f"Output={response.usage.completion_tokens}")

                llm_span.set_status(trace.Status(trace.StatusCode.OK))  # Mark success
        except Exception as e:
            # 6. On error, mark the parent span as failed and record the
            #    exception. (The child span has already been closed and marked
            #    by its context manager at this point.)
            parent_span.set_status(trace.Status(trace.StatusCode.ERROR, description=str(e)))
            parent_span.record_exception(e)
            print(f"Error during LLM call: {e}")
            raise  # Re-raise after recording so the error propagates
        finally:
            latency_ms = (time.time() - start_time) * 1000
            parent_span.set_attribute("llm.total_latency_ms", round(latency_ms, 2))
            print(f"--- Story generation completed in {latency_ms:.2f} ms ---")

        parent_span.set_status(trace.Status(trace.StatusCode.OK))  # Reached only on success
        return llm_response_content


if __name__ == "__main__":
    # For this script to send traces, you need an OTLP-compatible collector
    # (e.g., SigNoz, Jaeger, or the OpenTelemetry Collector itself) listening
    # at the default gRPC endpoint (http://localhost:4317). Alternatively,
    # uncomment the ConsoleSpanExporter in tracing_setup.py for console output.
    print("Sending traces to OTLP endpoint (default: http://localhost:4317).")
    print("If no collector is running, traces will be dropped silently unless "
          "ConsoleSpanExporter is active.")

    story_topic = "a futuristic cat detective solving a mystery in a neon city"
    try:
        story = generate_story(story_topic)
        print("\nGenerated Story:")
        print(story)
    except Exception:
        print("Failed to generate story due to an error.")

    # Allow time for the BatchSpanProcessor to flush spans before exit.
    time.sleep(2)
    print("\nApplication finished. Check your observability backend for traces!")
```
Explanation of the llm_app_manual_trace.py code:
- `app_tracer = configure_tracer("my-llm-app-manual")`: Initializes our OpenTelemetry tracer using the setup function we created. This must happen before any spans are created, ideally once at your application’s entry point.
- `with app_tracer.start_as_current_span("generate_story_workflow") as parent_span:`: The core OpenTelemetry API call for creating a manual span. It creates a new span, sets it as the current active span (so any child spans created within this block automatically link to it), and ensures it is properly closed when exiting the `with` block. This `parent_span` represents the overall process of generating the story.
- `parent_span.set_attribute("story.topic", topic)`: Adds custom attributes to our span. These key-value pairs provide crucial context for filtering and understanding traces later; here, we capture the story topic.
- `with app_tracer.start_as_current_span("openai_chat_completion") as llm_span:`: Inside the parent span, we create a child span specifically for the OpenAI API call. This clearly delineates the LLM interaction as a distinct, nested operation within the larger workflow.
- `llm_span.set_attribute("llm.model_name", "gpt-3.5-turbo")`: Adds LLM-specific attributes such as the model name, the full prompt, and the full response. These are vital for AI observability and follow OpenTelemetry’s experimental semantic conventions for LLMs.
- `response.usage`: The OpenAI API provides token usage information. We capture it as attributes on the LLM span to track cost and resource consumption.
- `set_status(...)` / `record_exception(...)`: It’s good practice to set a span’s status (`OK` for success, `ERROR` for failure) and record exceptions when they occur. This makes debugging errors significantly easier.
- `time.sleep(2)`: Gives the `BatchSpanProcessor` enough time to flush the collected spans to the OTLP exporter before the program exits, which matters for short-lived scripts.
To run this, you’ll need an OpenTelemetry collector or an observability backend that accepts OTLP data. For local testing, a simple way is to use SigNoz.
Example: Running SigNoz Locally (for trace visualization)
- Follow the SigNoz Quickstart to run it with Docker. This starts SigNoz, including an OpenTelemetry Collector that listens on `localhost:4317` (gRPC) and `localhost:4318` (HTTP) by default:

```
git clone https://github.com/SigNoz/signoz.git && cd signoz/deploy/
./install.sh --mode=standalone
```

- Once SigNoz is running, execute your Python script from your project root:

```
python llm_app_manual_trace.py
```

- Navigate to the SigNoz UI (usually `http://localhost:3301`), go to the “Traces” section, and you should see traces from “my-llm-app-manual”. You can explore the spans, their attributes, and their hierarchical relationships, observing the parent `generate_story_workflow` span and its child `openai_chat_completion` span.
Step 4: Integrating with LLM Frameworks (LangChain Example)
Manually creating spans for every LLM call can be tedious, especially in complex applications with many chained operations. Thankfully, popular LLM frameworks often provide built-in integrations or callback systems that simplify tracing.
Let’s look at a basic example with LangChain. LangChain provides a robust callback system that can be leveraged for observability. While LangChain has its own powerful tracing (LangSmith), we can also integrate it with OpenTelemetry, especially for a unified observability stack.
First, ensure your tracing_setup.py file is updated as shown in Step 2, particularly with the RequestsInstrumentor().instrument() call. This ensures that underlying HTTP calls made by client libraries (which LangChain often uses) are automatically traced.
Now, create a file named llm_app_langchain_trace.py in the same directory as tracing_setup.py:
```python
# llm_app_langchain_trace.py
import time

from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from opentelemetry import trace

from tracing_setup import configure_tracer  # Import our setup function

# 1. Configure the tracer for our LangChain application.
#    This should be called once at the application's entry point.
configure_tracer("my-langchain-app")
app_tracer = trace.get_tracer("my-langchain-app")  # Get a tracer from the global provider

# 2. Initialize the ChatOpenAI model.
#    OPENAI_API_KEY is loaded via dotenv in tracing_setup.py.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)


def run_langchain_story_agent(topic: str) -> str:
    """
    Runs a simple LangChain LLM chain to generate a story, combining a manual
    parent span with auto-instrumentation of the underlying API call.
    """
    # 3. Create a manual parent span for our LangChain-driven workflow so we
    #    can attach custom, high-level attributes for the overall operation.
    with app_tracer.start_as_current_span("langchain_story_agent") as parent_span:
        parent_span.set_attribute("story.topic", topic)
        print(f"--- Starting LangChain story agent for topic: '{topic}' ---")

        # Define the prompt template for the LLM.
        prompt_template = ChatPromptTemplate.from_messages(
            [
                ("system", "You are a creative storyteller."),
                ("user", "Write a short, imaginative story about {topic}. It should be around 100 words."),
            ]
        )

        # Create an LLMChain, which orchestrates the prompt and LLM call.
        story_chain = LLMChain(llm=llm, prompt=prompt_template, verbose=False)

        start_time = time.time()  # Start timing for latency
        generated_story = "Error: Story not generated by LangChain."  # Default error message

        try:
            # 4. Run the chain. The underlying HTTP call to the OpenAI API is
            #    traced automatically when the HTTP client is instrumented.
            #    Note: langchain-openai makes its calls over httpx, so for
            #    full auto-instrumentation you may also need the
            #    opentelemetry-instrumentation-httpx package alongside the
            #    requests instrumentation configured in tracing_setup.py.
            response_data = story_chain.invoke({"topic": topic})
            generated_story = response_data["text"]

            # We can still add custom attributes to our parent span after the
            # LLM call; truncate the response to avoid logging long outputs.
            parent_span.set_attribute("llm.response_preview", generated_story[:100] + "...")
            parent_span.set_status(trace.Status(trace.StatusCode.OK))
        except Exception as e:
            parent_span.set_status(trace.Status(trace.StatusCode.ERROR, description=str(e)))
            parent_span.record_exception(e)
            print(f"Error during LangChain execution: {e}")
            raise  # Re-raise after recording
        finally:
            latency_ms = (time.time() - start_time) * 1000
            parent_span.set_attribute("langchain.total_latency_ms", round(latency_ms, 2))
            print(f"--- LangChain story agent completed in {latency_ms:.2f} ms ---")

        return generated_story


if __name__ == "__main__":
    print("Sending LangChain traces to OTLP endpoint (default: http://localhost:4317).")

    story_topic = "a robot chef who invents a new dessert using quantum physics"
    try:
        story = run_langchain_story_agent(story_topic)
        print("\nGenerated Story (LangChain):")
        print(story)
    except Exception:
        print("Failed to generate story with LangChain due to an error.")

    time.sleep(2)  # Give the exporter time to flush
    print("\nApplication finished. Check your observability backend for LangChain traces!")
```
Explanation of llm_app_langchain_trace.py:
- `configure_tracer("my-langchain-app")`: As before, we initialize OpenTelemetry at the start of our application.
- `llm = ChatOpenAI(...)`: Initializes the LangChain OpenAI model. Internally, `langchain-openai` makes its API calls over `httpx`. The `RequestsInstrumentor` configured in `tracing_setup.py` covers `requests`-based calls; `httpx` calls need the companion `opentelemetry-instrumentation-httpx` package to generate spans automatically.
- `with app_tracer.start_as_current_span("langchain_story_agent") as parent_span:`: We still create a manual root span for our LangChain workflow. This lets us add high-level attributes specific to our application’s logic, such as `story.topic`.
- `story_chain.invoke({"topic": topic})`: When LangChain executes this chain, it makes an HTTP call to the OpenAI API. With the underlying HTTP client instrumented, a child span for that call is automatically created and linked to our `langchain_story_agent` parent span, containing details like the HTTP method, URL, status code, and latency.
Run this script (with SigNoz or another OTLP collector running):
python llm_app_langchain_trace.py
You’ll now see traces for my-langchain-app in your observability backend. The trace includes the parent span `langchain_story_agent` and, provided the underlying HTTP client is instrumented, a child span representing the actual HTTP call to `api.openai.com/v1/chat/completions`, complete with HTTP-specific attributes. This demonstrates how combining manual spans for high-level application logic with automatic instrumentation for underlying library calls provides comprehensive tracing with less effort.
Mini-Challenge: Enhance Your Tracing!
You’ve successfully traced a basic LLM call! Now, let’s make it more robust by adding more context and error handling.
Challenge: Modify the llm_app_langchain_trace.py application to enhance its tracing capabilities:
- Add more LLM-specific attributes to the `langchain_story_agent` span. While a direct `response.usage` might not be immediately available from `story_chain.invoke`, you can often access `token_usage` from `llm_output` if it’s present in the `response_data` dictionary. If it isn’t directly available, consider adding a placeholder or a note. Also add other LLM parameters such as `llm.model_name` (even though it’s set in `ChatOpenAI`, it’s good to confirm it in the span).
- Simulate an error: Introduce a bug (e.g., pass an invalid `model_name` like `"non-existent-model"` to `ChatOpenAI`, or intentionally cause a network error) and ensure your `parent_span` correctly captures the error status and exception details.
- Add a custom event: Record an event named `"story_generation_completed"` on the `parent_span` right after the story is successfully generated. Include the generated story’s character length (`len(generated_story)`) as an attribute for this event.
Hint:
- For token usage in LangChain, check the `response_data` dictionary carefully. Sometimes `response_data['llm_output']['token_usage']` exists, or you may need to extract it manually if LangChain’s internal callbacks aren’t explicitly configured for OTel. If direct extraction from `invoke` is hard, focus on the other points for this challenge.
- `span.record_exception(e)` and `span.set_status(trace.Status(trace.StatusCode.ERROR, description=str(e)))` are your best friends for robust error handling in tracing.
- To add an event, use `span.add_event("event_name", attributes={"key": "value"})`.
What did you observe in your observability backend after implementing these changes? How did the traces look different for success versus failure? Did the custom event appear where you expected it?
Common Pitfalls & Troubleshooting
Tracing, while incredibly powerful, can introduce its own set of challenges. Being aware of these common pitfalls will save you a lot of debugging time.
Missing Context Propagation:
- Pitfall: Spans from different services, or even different parts of the same application, aren’t linked together. This results in “broken traces”: multiple disjoint traces instead of one continuous, end-to-end view of a request. This often happens when the trace context (usually the HTTP headers `traceparent` and `tracestate`) isn’t forwarded between services or threads.
- Troubleshooting:
  - Verify Instrumentation: Ensure all services involved in a transaction are using OpenTelemetry-compatible instrumentation.
  - Check HTTP Clients/Message Queues: Confirm that your HTTP clients, RPC frameworks, or message queue producers/consumers are configured to propagate trace headers. If you’re using a web framework like Flask or FastAPI, ensure you’ve enabled its specific OpenTelemetry instrumentation (e.g., `opentelemetry-instrumentation-flask`, `opentelemetry-instrumentation-fastapi`).
  - Inspect Gateways/Load Balancers: API gateways, load balancers, and reverse proxies can sometimes strip or modify headers. Ensure they are configured to pass OpenTelemetry trace headers through untouched.
Over-instrumentation vs. Under-instrumentation:
- Pitfall:
  - Over-instrumentation: Creating too many fine-grained spans for every tiny function call can generate an excessive volume of telemetry data. This leads to higher storage costs, increased network traffic, and potential performance overhead for your application and observability backend.
  - Under-instrumentation: Not enough spans, leaving critical operations or error points as “blind spots” where you can’t see what’s happening.
- Troubleshooting:
  - Start Broad, Then Refine: Begin by tracing major operations, service boundaries, and key business logic steps. Add more granular spans only when you encounter a specific debugging need or performance bottleneck in a particular area.
  - Focus on Value: Create spans for operations that are meaningful for understanding user experience, business processes, or system health. Avoid tracing every getter/setter.
  - Implement Sampling: For high-volume services, implement trace sampling (e.g., head-based or tail-based sampling) to reduce the data volume while still getting a representative view of your system’s behavior.
Data Privacy and Security in Traces:
- Pitfall: Accidentally logging sensitive user prompts, personally identifiable information (PII), confidential model responses, or API keys directly into span attributes or events. This is a significant security and compliance risk, especially with AI systems handling user input.
- Troubleshooting:
  - Strict Sanitization/Redaction: Implement robust sanitization or redaction rules for any attributes that might contain sensitive data. For `llm.prompt` and `llm.response`, consider hashing, truncating, or redacting specific patterns before adding them as attributes.
  - Attribute Filtering: OpenTelemetry SDKs allow you to filter or selectively drop attributes before exporting, providing a last line of defense.
  - Data Minimization: Only capture the absolutely necessary information. For example, instead of the full prompt, perhaps a hash of the prompt or a summary of its intent is sufficient for some use cases.
  - Access Control: Ensure your observability backend has robust access controls and encryption at rest to prevent unauthorized access to trace data. Treat trace data with the same sensitivity as logs.
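One way to sanitize before attributes ever reach a span is sketched below. The helper names (`redact_pii`, `safe_prompt_attributes`) are hypothetical, and the single email pattern is only a placeholder; real redaction rules should match your own PII requirements:

```python
import hashlib
import re

# Placeholder PII pattern; extend with phone numbers, card numbers, etc.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def safe_prompt_attributes(prompt: str, max_len: int = 200) -> dict:
    """Build span attributes that avoid storing the raw prompt verbatim."""
    redacted = redact_pii(prompt)
    return {
        "llm.prompt.redacted": redacted[:max_len],  # truncated, PII-scrubbed preview
        "llm.prompt.sha256": hashlib.sha256(prompt.encode()).hexdigest(),  # stable fingerprint
        "llm.prompt.length": len(prompt),
    }
```

You could then call `span.set_attributes(safe_prompt_attributes(prompt))` instead of setting `llm.prompt` to the raw text: the hash still lets you group identical prompts without ever exporting their contents.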
Summary
Phew! You’ve just taken a massive leap in understanding how to bring true visibility to your AI systems. Distributed tracing is arguably the most powerful tool for debugging and optimizing complex, distributed AI applications.
Here’s a recap of our journey:
- Distributed Tracing Fundamentals: We learned that a trace is a complete story of a request, composed of individual spans, each representing a unit of work.
- Why Tracing AI Matters: It’s crucial for navigating the non-determinism, chained operations, black-box tendencies, and performance challenges unique to AI systems.
- OpenTelemetry as the Standard: We embraced OpenTelemetry for its vendor-neutral approach and unified telemetry capabilities across traces, metrics, and logs.
- Hands-on Instrumentation: You successfully set up OpenTelemetry and manually instrumented a direct OpenAI LLM call, capturing vital AI-specific attributes like prompts, responses, and token usage.
- Framework Integration: We saw how to leverage OpenTelemetry’s auto-instrumentation with frameworks like LangChain, simplifying the tracing of underlying API calls with minimal code changes.
- Best Practices: We touched on the critical importance of context propagation, balancing instrumentation granularity to avoid excessive data, and protecting sensitive data in traces for privacy and security.
Tracing allows you to see the invisible, connect the dots, and truly understand the performance and behavior of your AI from prompt to prediction. It transforms opaque AI systems into transparent, debuggable, and optimizable powerhouses.
What’s Next?
In the next chapter, we’ll shift our focus to Metrics. While traces tell us the story of one request, metrics tell us the story of many requests over time. We’ll learn how to collect, aggregate, and visualize key performance indicators for our AI systems, enabling proactive monitoring and alerting. Get ready to measure what matters!
References
- OpenTelemetry Documentation
- OpenTelemetry Python SDK Documentation
- OpenTelemetry Semantic Conventions for LLMs (Experimental)
- SigNoz Documentation - Getting Started
- OpenAI Python Library Documentation
- LangChain Python Documentation
- awslabs/ai-ml-observability-reference-architecture
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.