Introduction
Welcome to the grand finale of our AI Observability journey! In previous chapters, we’ve explored the theoretical foundations of logging, tracing, and metrics for AI systems, understanding what they are and why they’re crucial. Now, it’s time to roll up our sleeves and bring these concepts to life with a hands-on project.
This chapter will guide you through building a complete, end-to-end observability pipeline for a simple Large Language Model (LLM) application. We’ll instrument our Python-based LLM service using OpenTelemetry for distributed tracing, custom metrics, and structured logging. Then, we’ll deploy an observability backend (SigNoz, an OpenTelemetry-native platform backed by ClickHouse) using Docker to collect, store, and visualize all our precious AI operational data. Get ready to see your AI system’s inner workings like never before!
By the end of this chapter, you’ll have a working example of how to:
- Set up an observability stack using Docker.
- Instrument a Python LLM application with OpenTelemetry.
- Track prompts, responses, latency, and token usage.
- Visualize traces, logs, and metrics in a unified dashboard.
- Gain practical experience in MLOps observability.
Let’s turn theory into tangible, observable reality!
Project Overview: Observing an LLM Chatbot
Our mission is to build a minimal LLM chatbot and make it fully observable. Imagine this chatbot as a core component of a larger AI application. We’ll focus on tracking the key interactions: the user’s prompt, the LLM’s response, the time it took, and the resources consumed (like tokens).
The Observability Stack
To achieve this, we’ll leverage a powerful open-source stack:
- OpenTelemetry (OTel): The gold standard for vendor-neutral instrumentation. We’ll use its Python SDK to generate traces, metrics, and logs from our LLM application.
- SigNoz: An OpenTelemetry-native observability platform. It provides a unified UI for traces, metrics, and logs, making it an excellent choice for a hands-on project. Under the hood, SigNoz stores all three signals in ClickHouse and supports PromQL-compatible metric queries and Grafana-style dashboards.
- Python 3.12+: Our primary language for the LLM application.
- Ollama (Optional, but Recommended): A fantastic tool for running open-source LLMs locally. This allows us to run the project entirely on our machine without needing external API keys, though you can easily swap it for OpenAI or another cloud LLM.
- Docker & Docker Compose: To orchestrate and run our observability backend services effortlessly.
Architectural Diagram
Let’s visualize how these components will fit together:
Explanation of the Flow:
- LLM Application: Our Python application handles user requests, processes prompts, makes calls to the LLM (Ollama or OpenAI), and generates responses.
- OpenTelemetry SDK: As the application runs, the OTel SDK intercepts key operations (prompt processing, LLM calls) and generates observability data (traces, spans, metrics, logs).
- OpenTelemetry Collector: The OTel SDK exports this data over OTLP to the OTel Collector bundled with SigNoz, which batches and processes it before writing traces, metrics, and logs to storage.
- ClickHouse: The columnar database where SigNoz persists all three telemetry signals.
- SigNoz Query Service & UI: Reads from ClickHouse to power trace search, log exploration, PromQL-style metric queries, and dashboards.
- Developer/MLOps Engineer: Interacts with the SigNoz UI to visualize, query, and analyze the observability data.
Step-by-Step Implementation
Let’s get our hands dirty!
Step 1: Set Up Your Environment
First, ensure you have Python 3.12+ and Docker Desktop installed on your system.
1. Create Project Directory: Let’s start by creating a dedicated directory for our project.

```bash
mkdir ai-observability-project
cd ai-observability-project
```

2. Set Up Python Virtual Environment: It’s always good practice to use a virtual environment to manage project dependencies.

```bash
python3.12 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

You should see `(venv)` prefixing your terminal prompt, indicating the virtual environment is active.

3. Install Observability Backend (SigNoz): We’ll use SigNoz as our observability platform. It’s easy to set up with Docker Compose. First, ensure Docker Desktop is running, then download and run the SigNoz install script:

```bash
# Create a directory for SigNoz config
mkdir signoz-setup
cd signoz-setup

# Download the SigNoz install script and run it
# As of 2026-03-20, this is the recommended way from SigNoz docs
curl -o install.sh https://raw.githubusercontent.com/SigNoz/signoz/main/deploy/install.sh && chmod +x install.sh
sudo ./install.sh --mode docker

cd ..  # Go back to the main project directory
```

Explanation:

- We create a `signoz-setup` directory to keep things organized.
- The `curl` command downloads SigNoz’s official installation script.
- `chmod +x install.sh` makes the script executable.
- `sudo ./install.sh --mode docker` runs the script, which downloads the necessary `docker-compose.yaml` and starts all SigNoz services (ClickHouse, the query service, the OpenTelemetry Collector, the web UI, etc.). This might take a few minutes depending on your internet speed.

Once it’s done, SigNoz should be accessible at `http://localhost:3301`. Open it in your browser and complete the initial setup (create an account).

Important Note: The SigNoz Docker Compose setup includes its own OpenTelemetry Collector, which is where our application will send its observability data.
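Before moving on, it can help to confirm that the bundled collector is actually reachable. The following stdlib-only sketch checks whether a TCP port is open; the host and port are assumptions matching the default SigNoz setup, so adjust them if you changed the deployment:

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # 4318 is the default OTLP/HTTP port exposed by the SigNoz collector
    status = "listening" if port_is_open("localhost", 4318) else "not reachable"
    print(f"OTLP HTTP endpoint (localhost:4318) is {status}")
```

If the endpoint is not reachable, re-check the Docker Compose services before instrumenting the application.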
Step 2: Create a Basic LLM Application
Let’s write a simple Python script that interacts with an LLM. We’ll use ollama for this example, which is great for local testing. If you prefer, you can swap this with openai or any other LLM client.
1. Install LLM Client: If using Ollama, first download and install Ollama for your OS from https://ollama.com/download. Then, pull a model (e.g., `llama2`):

```bash
ollama pull llama2
```

Then install the Python client:

```bash
pip install ollama
```

If using OpenAI:

```bash
pip install openai
```

And make sure you have your `OPENAI_API_KEY` set as an environment variable.

2. Create `app.py`: Create a file named `app.py` in your main `ai-observability-project` directory.

```python
# app.py
import ollama  # or import openai for OpenAI API

def get_llm_response(prompt: str, model_name: str = "llama2"):
    """
    Interacts with the LLM to get a response.
    """
    print(f"Sending prompt to LLM ({model_name}): {prompt}")
    try:
        # For Ollama:
        response = ollama.chat(model=model_name, messages=[{'role': 'user', 'content': prompt}])
        llm_output = response['message']['content']

        # For OpenAI (uncomment and modify if using OpenAI):
        # client = openai.OpenAI()
        # response = client.chat.completions.create(
        #     model=model_name,
        #     messages=[{"role": "user", "content": prompt}]
        # )
        # llm_output = response.choices[0].message.content

        print(f"LLM Response: {llm_output[:100]}...")  # Print first 100 chars
        return llm_output
    except Exception as e:
        print(f"Error calling LLM: {e}")
        return f"Error: {e}"

if __name__ == "__main__":
    print("Welcome to the simple LLM chatbot! Type 'exit' to quit.")
    while True:
        user_prompt = input("You: ")
        if user_prompt.lower() == 'exit':
            break
        response = get_llm_response(user_prompt)
        print(f"Bot: {response}")
```

Explanation:

- This is a basic Python script that takes user input and sends it to an LLM.
- `get_llm_response` encapsulates the LLM interaction logic.
- The `if __name__ == "__main__":` block provides a simple command-line interface for interaction.
- Crucially: This version has no observability yet. We’re building up!
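If you don’t have Ollama (or an API key) at hand, you can still exercise the chat loop by injecting a fake client. This is a hypothetical variant of `get_llm_response` — the `chat_fn` parameter and `fake_chat` helper are our own additions for offline testing, not part of the chapter’s final code:

```python
# A dependency-injected variant of get_llm_response for offline testing.
# chat_fn mimics ollama.chat's shape: it takes model and messages and
# returns a dict with a 'message' -> 'content' entry.

def get_llm_response(prompt: str, model_name: str = "llama2", chat_fn=None):
    if chat_fn is None:
        import ollama  # only needed for the real backend
        chat_fn = ollama.chat
    try:
        response = chat_fn(model=model_name,
                           messages=[{'role': 'user', 'content': prompt}])
        return response['message']['content']
    except Exception as e:
        return f"Error: {e}"

def fake_chat(model, messages):
    """Stand-in for ollama.chat: echoes the prompt back."""
    return {'message': {'content': f"[{model} echo] {messages[-1]['content']}"}}

print(get_llm_response("Hello!", chat_fn=fake_chat))
# -> [llama2 echo] Hello!
```

The same injection trick is handy later for unit-testing the instrumented version without a running model.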
Step 3: OpenTelemetry Instrumentation - Tracing
Now, let’s add distributed tracing to our LLM application. This will allow us to see the flow of a request, including the LLM call, as a series of connected spans.
1. Install OpenTelemetry SDK:

```bash
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation-logging
```

Explanation:

- `opentelemetry-sdk`: The core OpenTelemetry SDK.
- `opentelemetry-exporter-otlp-proto-http`: The exporter we’ll use to send our OTel data to the SigNoz OpenTelemetry Collector, which listens on HTTP by default.
- `opentelemetry-instrumentation-logging`: To automatically link Python’s standard logging to our traces.

2. Modify `app.py` for Tracing: We’ll add the necessary OpenTelemetry initialization and wrap our LLM interaction in spans.

```python
# app.py (updated for tracing)
import ollama  # or import openai for OpenAI API
import os
import logging

# --- OpenTelemetry Imports ---
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.logging import LoggingInstrumentor

# --- Configure Logging ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# --- OpenTelemetry Setup (Global) ---
def setup_opentelemetry():
    # Define a resource for our service
    resource = Resource.create({
        "service.name": "llm-chatbot-service",
        "service.version": "1.0.0",
        "environment": "production"
    })

    # Set up a TracerProvider
    provider = TracerProvider(resource=resource)

    # Configure OTLP HTTP exporter
    # SigNoz's OTel Collector typically listens on 4318 for HTTP
    otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")

    # Add a BatchSpanProcessor to send spans in batches
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

    # Set the global tracer provider
    trace.set_tracer_provider(provider)

    # Instrument Python's logging to automatically include trace/span IDs
    LoggingInstrumentor().instrument(set_logging_format=True)
    logger.info("OpenTelemetry TracerProvider initialized.")

# Get a tracer for our application
tracer = trace.get_tracer(__name__)

# --- Original LLM Function (with tracing added) ---
def get_llm_response(prompt: str, model_name: str = "llama2"):
    """
    Interacts with the LLM to get a response, now with tracing.
    """
    # Create a new span for the entire LLM interaction
    with tracer.start_as_current_span("llm.interaction") as span:
        span.set_attribute("user.prompt", prompt)
        span.set_attribute("llm.model_name", model_name)
        span.set_attribute("session.id", "user-123")  # Example: Track a session ID

        logger.info(f"Sending prompt to LLM ({model_name}): {prompt}")
        try:
            # Create a child span for the actual LLM API call
            with tracer.start_as_current_span("llm.api_call") as api_span:
                api_span.set_attribute("llm.provider", "ollama")  # Or "openai"

                # For Ollama:
                response = ollama.chat(model=model_name, messages=[{'role': 'user', 'content': prompt}])
                llm_output = response['message']['content']

                # For OpenAI (uncomment and modify if using OpenAI):
                # client = openai.OpenAI()
                # response = client.chat.completions.create(
                #     model=model_name,
                #     messages=[{"role": "user", "content": prompt}]
                # )
                # llm_output = response.choices[0].message.content

                api_span.set_attribute("llm.response.length", len(llm_output))
                api_span.set_attribute("llm.response.truncated", llm_output[:200])  # Log part of response

                # Add an event to the span for significant actions
                api_span.add_event("llm_response_received")

            span.set_attribute("llm.full_response", llm_output)
            logger.info(f"LLM Response: {llm_output[:100]}...")
            return llm_output
        except Exception as e:
            logger.error(f"Error calling LLM: {e}", exc_info=True)
            span.set_status(trace.Status(trace.StatusCode.ERROR, description=str(e)))
            return f"Error: {e}"

if __name__ == "__main__":
    setup_opentelemetry()  # Initialize OTel before main loop
    logger.info("Welcome to the simple LLM chatbot! Type 'exit' to quit.")
    while True:
        user_prompt = input("You: ")
        if user_prompt.lower() == 'exit':
            break
        response = get_llm_response(user_prompt)
        print(f"Bot: {response}")
```

Explanation of Changes:

- `setup_opentelemetry()` function:
  - Initializes `TracerProvider` with a `Resource` that identifies our service (`llm-chatbot-service`). This is crucial for filtering and organizing data in your observability backend.
  - Configures `OTLPSpanExporter` to send traces to `http://localhost:4318/v1/traces`, the default HTTP endpoint for the OpenTelemetry Collector within SigNoz.
  - Uses `BatchSpanProcessor` for efficient, asynchronous span export.
  - `trace.set_tracer_provider(provider)` makes this provider globally available.
  - `LoggingInstrumentor().instrument()` automatically enriches our Python `logging` records with trace and span IDs, allowing for seamless correlation between logs and traces.
- `tracer = trace.get_tracer(__name__)`: Obtains a tracer instance.
- `with tracer.start_as_current_span(...)`: This is the core of tracing.
  - `llm.interaction` is the parent span, representing the entire process of handling a user’s prompt and getting an LLM response.
  - `llm.api_call` is a child span, specifically tracking the time spent on the actual LLM API call. This helps pinpoint latency issues.
  - `.set_attribute()`: Adds key-value pairs to the span, providing rich context like the `user.prompt`, `llm.model_name`, and `llm.full_response`.
  - `.add_event()`: Marks a specific point in time within a span, like `llm_response_received`.
- Error handling now sets the span status to `ERROR` and logs the exception with `exc_info=True`, which ensures the full stack trace is captured in our logs and linked to the span.
- `setup_opentelemetry()` call: Added to `if __name__ == "__main__":` to ensure OTel is initialized before any LLM interaction.
Step 4: OpenTelemetry Instrumentation - Metrics
Next, let’s add custom metrics to track key performance indicators for our LLM application.
1. Install Metrics Exporter: We already installed `opentelemetry-sdk` and `opentelemetry-exporter-otlp-proto-http`, which include the necessary components for metrics.

2. Modify `app.py` for Metrics: We’ll add a `MeterProvider` and create a `Counter` for total requests and a `Histogram` for latency.

```python
# app.py (updated for metrics)
import ollama
import os
import logging
import time  # For measuring latency

# --- OpenTelemetry Imports ---
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.logging import LoggingInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter

# --- Configure Logging ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# --- OpenTelemetry Setup (Global) ---
def setup_opentelemetry():
    resource = Resource.create({
        "service.name": "llm-chatbot-service",
        "service.version": "1.0.0",
        "environment": "production"
    })

    # --- Tracing Setup ---
    provider = TracerProvider(resource=resource)
    otlp_trace_exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
    provider.add_span_processor(BatchSpanProcessor(otlp_trace_exporter))
    trace.set_tracer_provider(provider)
    LoggingInstrumentor().instrument(set_logging_format=True)
    logger.info("OpenTelemetry TracerProvider initialized.")

    # --- Metrics Setup ---
    # Configure OTLP HTTP exporter for metrics
    otlp_metric_exporter = OTLPMetricExporter(endpoint="http://localhost:4318/v1/metrics")
    # Use a PeriodicExportingMetricReader to send metrics regularly
    metric_reader = PeriodicExportingMetricReader(otlp_metric_exporter, export_interval_millis=5000)  # Every 5 seconds
    meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
    metrics.set_meter_provider(meter_provider)
    logger.info("OpenTelemetry MeterProvider initialized.")

# Get a tracer and meter for our application
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# --- Define Metrics Instruments ---
llm_request_counter = meter.create_counter(
    name="llm_requests_total",
    description="Total number of LLM requests",
    unit="1"
)
llm_latency_histogram = meter.create_histogram(
    name="llm_request_latency_seconds",
    description="Latency of LLM requests in seconds",
    unit="s"
)
llm_input_tokens_counter = meter.create_counter(
    name="llm_input_tokens_total",
    description="Total number of input tokens sent to LLM",
    unit="tokens"
)
llm_output_tokens_counter = meter.create_counter(
    name="llm_output_tokens_total",
    description="Total number of output tokens received from LLM",
    unit="tokens"
)

# --- Original LLM Function (with tracing and metrics added) ---
def get_llm_response(prompt: str, model_name: str = "llama2"):
    """
    Interacts with the LLM to get a response, now with tracing and metrics.
    """
    start_time = time.time()  # Start timing for latency

    # Define common attributes for metrics and traces
    common_attributes = {
        "llm.model_name": model_name,
        "session.id": "user-123"  # Example: Track a session ID
    }

    # Increment total request counter
    llm_request_counter.add(1, common_attributes)

    with tracer.start_as_current_span("llm.interaction", attributes=common_attributes) as span:
        span.set_attribute("user.prompt", prompt)
        logger.info(f"Sending prompt to LLM ({model_name}): {prompt}")
        llm_output = ""
        try:
            with tracer.start_as_current_span("llm.api_call", attributes=common_attributes) as api_span:
                api_span.set_attribute("llm.provider", "ollama")  # Or "openai"

                # For Ollama:
                response = ollama.chat(model=model_name, messages=[{'role': 'user', 'content': prompt}], options={'num_predict': 100})
                llm_output = response['message']['content']
                input_tokens = len(prompt.split())       # Simple token count approximation
                output_tokens = len(llm_output.split())  # Simple token count approximation
                # Ollama's response may include 'prompt_eval_count' and 'eval_count' for more accurate counts

                # For OpenAI (uncomment and modify if using OpenAI):
                # client = openai.OpenAI()
                # response = client.chat.completions.create(
                #     model=model_name,
                #     messages=[{"role": "user", "content": prompt}]
                # )
                # llm_output = response.choices[0].message.content
                # input_tokens = response.usage.prompt_tokens
                # output_tokens = response.usage.completion_tokens

                api_span.set_attribute("llm.response.length", len(llm_output))
                api_span.set_attribute("llm.response.truncated", llm_output[:200])
                api_span.add_event("llm_response_received")
                api_span.set_attribute("llm.input_tokens", input_tokens)
                api_span.set_attribute("llm.output_tokens", output_tokens)

            span.set_attribute("llm.full_response", llm_output)
            logger.info(f"LLM Response: {llm_output[:100]}...")

            # Record token metrics after successful call
            llm_input_tokens_counter.add(input_tokens, common_attributes)
            llm_output_tokens_counter.add(output_tokens, common_attributes)

            return llm_output
        except Exception as e:
            logger.error(f"Error calling LLM: {e}", exc_info=True)
            span.set_status(trace.Status(trace.StatusCode.ERROR, description=str(e)))
            return f"Error: {e}"
        finally:
            # Record latency in finally block to ensure it's always captured
            latency = time.time() - start_time
            llm_latency_histogram.record(latency, common_attributes)
            span.set_attribute("llm.request_latency_seconds", latency)  # Also add to span for quick view

if __name__ == "__main__":
    setup_opentelemetry()
    logger.info("Welcome to the simple LLM chatbot! Type 'exit' to quit.")
    try:
        while True:
            user_prompt = input("You: ")
            if user_prompt.lower() == 'exit':
                break
            response = get_llm_response(user_prompt)
            print(f"Bot: {response}")
    except KeyboardInterrupt:
        logger.info("Chatbot shutting down.")
    finally:
        # Ensure all pending spans/metrics are exported on shutdown
        trace.get_tracer_provider().shutdown()
        metrics.get_meter_provider().shutdown()
```

Explanation of New Changes:

- `setup_opentelemetry()` for Metrics:
  - `OTLPMetricExporter`: Similar to the span exporter, but for metrics, pointing to `http://localhost:4318/v1/metrics`.
  - `PeriodicExportingMetricReader`: Configures metrics to be exported at a regular interval (e.g., every 5 seconds).
  - `MeterProvider`: Manages the creation and export of metrics.
  - `metrics.set_meter_provider(meter_provider)`: Sets the global meter provider.
- Metric Instruments:
  - `llm_request_counter`: A `Counter` that increments for each LLM request. Ideal for tracking total counts.
  - `llm_latency_histogram`: A `Histogram` that records the distribution of latency values. Useful for understanding performance bottlenecks and calculating percentiles.
  - `llm_input_tokens_counter` and `llm_output_tokens_counter`: Counters to track token usage, essential for cost monitoring.
- Recording Metrics:
  - `llm_request_counter.add(1, common_attributes)`: Increments the counter with associated attributes (model name, session ID).
  - `llm_latency_histogram.record(latency, common_attributes)`: Records the calculated latency, measured with `time.time()`.
  - `llm_input_tokens_counter.add(...)` and `llm_output_tokens_counter.add(...)`: Record the estimated token counts. Note: for accurate token counts, you’d typically rely on the LLM provider’s response (e.g., `response.usage.prompt_tokens` for OpenAI). Our simple `len(prompt.split())` is an approximation.
  - Token metrics are recorded only after a successful call, while latency is recorded in the `finally` block so that failed requests are captured too.
- Shutdown Hooks: Added `trace.get_tracer_provider().shutdown()` and `metrics.get_meter_provider().shutdown()` in a `finally` block to ensure all buffered data is sent before the application exits. This is critical for reliable data export.
Step 5: Running the System & Verification
Now that our application is fully instrumented, let’s run it and see the observability data flow into SigNoz.
1. Ensure SigNoz is Running: From your `signoz-setup` directory, you can verify with `docker-compose ps`; all services should be `Up`. If not, navigate to `ai-observability-project/signoz-setup` and run:

```bash
docker-compose up -d
```

Then, access the SigNoz UI at `http://localhost:3301`.

2. Run Your Instrumented LLM Application: From your main `ai-observability-project` directory, with your virtual environment active:

```bash
python app.py
```

Interact with the chatbot a few times:

```
You: Hello, how are you?
You: Tell me a short story about a brave knight.
You: What is the capital of France?
You: Explain AI observability.
You: exit
```
Verify Data in SigNoz:
Traces:

- Go to the SigNoz UI (`http://localhost:3301`).
- Navigate to the “Traces” section (usually on the left sidebar).
- You should see traces from your `llm-chatbot-service`. Each interaction with the LLM will be a new trace.
- Click on a trace to see its details. You’ll observe the `llm.interaction` parent span and its `llm.api_call` child span.
- Examine the attributes attached to each span (e.g., `user.prompt`, `llm.model_name`, `llm.response.length`, `llm.request_latency_seconds`).
- You’ll also see the `llm_response_received` event.
Logs:

- Navigate to the “Logs” section in SigNoz.
- You’ll see your `logger.info` and `logger.error` messages.
- Notice how each log entry automatically includes `trace_id` and `span_id`. This is the magic of OpenTelemetry’s logging instrumentation, allowing you to jump directly from a log message to its corresponding trace!
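`LoggingInstrumentor` does this wiring for you, but the mechanism underneath is ordinary `logging` machinery: an attribute injected onto each record plus a format string that prints it. Here is a stdlib-only illustration of that shape — the hard-coded IDs are fake placeholders, whereas the real instrumentor pulls them from the active span:

```python
import logging

class FakeTraceContextFilter(logging.Filter):
    """Attach hard-coded, illustrative trace/span IDs to every record.
    The real LoggingInstrumentor reads these from the active OTel span."""
    def filter(self, record):
        record.trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
        record.span_id = "00f067aa0ba902b7"
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [trace_id=%(trace_id)s span_id=%(span_id)s] %(message)s"))
handler.addFilter(FakeTraceContextFilter())

log = logging.getLogger("correlation-demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("Sending prompt to LLM")  # this line now carries trace/span IDs
```

Once every log line carries the IDs of the span that emitted it, a backend like SigNoz can join logs to traces with a simple lookup.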
Metrics:

- Navigate to the “Metrics” section or “Dashboards” in SigNoz.
- You might need to create a new dashboard or panel if SigNoz doesn’t automatically surface your custom metrics immediately.
- Search for metrics like `llm_requests_total`, `llm_request_latency_seconds`, `llm_input_tokens_total`, `llm_output_tokens_total`.
- You can create graphs to visualize:
  - Total requests over time.
  - Average/P95 latency of LLM calls.
  - Total input/output tokens, which is crucial for cost monitoring.
Step 6: Basic Cost Monitoring (Calculated)
While direct cost integration often involves cloud provider APIs, we can use our token metrics to calculate estimated costs.
Let’s assume a simplified pricing model for our llama2 model (e.g., $0.0001 per input token, $0.0002 per output token).
We can retrieve these metrics from SigNoz and perform calculations on them. For a real-time dashboard, you’d configure a SigNoz dashboard panel (SigNoz supports PromQL-style queries).

Example Metrics Query (PromQL):

To get the total cost, you’d typically query the `llm_input_tokens_total` and `llm_output_tokens_total` counters.

- Total Input Token Cost: `sum(rate(llm_input_tokens_total[5m])) * 0.0001` (cost per second, averaged over 5 minutes), or `llm_input_tokens_total * 0.0001` (total accumulated cost).
- Total Output Token Cost: `sum(rate(llm_output_tokens_total[5m])) * 0.0002` (cost per second, averaged over 5 minutes), or `llm_output_tokens_total * 0.0002` (total accumulated cost).

You can combine these into a single panel that shows `(llm_input_tokens_total * 0.0001) + (llm_output_tokens_total * 0.0002)` for the total estimated cost. This demonstrates how raw metrics become actionable business insights.
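The same arithmetic the dashboard panel performs can be sanity-checked in plain Python. The per-token prices below are the chapter’s illustrative numbers, not real llama2 pricing, and the token totals are made up:

```python
# Illustrative per-token prices from the text (not real pricing)
INPUT_PRICE = 0.0001   # dollars per input token
OUTPUT_PRICE = 0.0002  # dollars per output token

def estimated_cost(input_tokens: int, output_tokens: int) -> float:
    """Mirror of the dashboard formula:
    input_total * input_price + output_total * output_price."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. 12,000 input and 4,500 output tokens accumulated so far
print(f"${estimated_cost(12_000, 4_500):.2f}")  # -> $2.10
```

Checking the panel formula against a quick script like this is a good habit before wiring alerts to a cost threshold.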
Mini-Challenge: Enhance Your Observability!
It’s your turn to add more value to our observability setup.
Challenge: Add a new custom attribute to the llm.interaction span and a new metric that tracks the response quality score.
Imagine you have a simple function that “evaluates” the LLM response (e.g., checks for keywords, length, or just returns a random score for demonstration).
1. Implement an `evaluate_response` function: This function should take the `prompt` and `response` as input and return a numerical `quality_score` (e.g., an integer from 1 to 5). For simplicity, you can make it return a random number.
2. Add a new `UpDownCounter` metric: Name it `llm_response_quality_score` with a suitable description and unit. An `UpDownCounter` is a reasonable fit here since recorded values can move the total up or down.
3. Update `get_llm_response`:
   - Call your `evaluate_response` function after receiving the LLM output.
   - Add the `quality_score` as an attribute to the `llm.interaction` span (e.g., `span.set_attribute("llm.response.quality_score", quality_score)`).
   - Record the `quality_score` using your new `UpDownCounter` metric.

Hint:

- You’ll need `import random` for a random score.
- Remember to pass `common_attributes` when recording your new metric.
- The `UpDownCounter`’s `add()` method can take positive or negative values to adjust the current count. For a simple score, just `add(quality_score)`.
What to Observe/Learn:
After implementing and running your updated app.py, interact with the chatbot a few more times.
- In SigNoz Traces: You should see your new `llm.response.quality_score` attribute on each `llm.interaction` span.
- In SigNoz Metrics/Dashboards: You should be able to find and visualize your `llm_response_quality_score` metric. This allows you to track the perceived quality of your LLM’s responses over time.
This challenge reinforces the idea of extending observability to cover domain-specific, business-critical aspects of your AI system.
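If you want something to check your work against, here is one minimal shape the evaluator could take. It is a sketch, not the only answer — the length and keyword heuristics are arbitrary, and the commented lines show where it would plug into `get_llm_response` (the `llm_quality_updowncounter` name is a hypothetical instrument you would create yourself):

```python
import random

def evaluate_response(prompt: str, response: str) -> int:
    """Toy quality score from 1-5: reward non-trivial length and
    overlap with the prompt; jitter the rest randomly."""
    score = 1
    if len(response) > 50:
        score += 1
    if any(word.lower() in response.lower() for word in prompt.split()[:3]):
        score += 1
    score += random.randint(0, 2)  # stand-in for a real evaluator
    return min(score, 5)

# Inside get_llm_response, after llm_output is available, you would then:
#   quality_score = evaluate_response(prompt, llm_output)
#   span.set_attribute("llm.response.quality_score", quality_score)
#   llm_quality_updowncounter.add(quality_score, common_attributes)

print(evaluate_response("capital of France", "The capital of France is Paris."))
```

Swapping this stub for a real evaluator later changes nothing about the instrumentation — only the score’s meaning improves.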
Common Pitfalls & Troubleshooting
Even with clear steps, things can sometimes go sideways. Here are common issues and how to approach them:
OpenTelemetry Exporter Connection Errors:

- Symptom: Your `app.py` logs `Connection refused` or `Failed to export spans/metrics` messages.
- Cause: The OpenTelemetry Collector (part of SigNoz) isn’t running or isn’t accessible at `http://localhost:4318`.
- Fix:
  - Ensure Docker Desktop is running.
  - Navigate to your `ai-observability-project/signoz-setup` directory.
  - Run `docker-compose ps` to check if all services are `Up`. If not, run `docker-compose up -d`.
  - Verify the endpoints in `app.py` (`http://localhost:4318/v1/traces` and `/v1/metrics`) are correct.
No Traces/Logs/Metrics in SigNoz UI:

- Symptom: Your `app.py` runs without errors, but nothing appears in the SigNoz UI.
- Cause: Data might not be reaching the collector, the collector isn’t forwarding it correctly, or there’s a filtering issue in the UI.
- Fix:
  - Check `app.py` logs: Are there any `opentelemetry` warnings or errors?
  - Verify `service.name`: Ensure the `service.name` attribute (`llm-chatbot-service` in our case) in `app.py` matches what you’re searching for in the SigNoz UI.
  - Time Range: In SigNoz, ensure the time range filter is set appropriately (e.g., “Last 5 minutes” or “Last 1 hour”).
  - Collector Logs: Check the logs of the `otel-collector` service within your SigNoz Docker Compose setup: `docker-compose logs otel-collector`. Look for errors or indications that it’s receiving data.
  - Shutdown: Ensure `trace.get_tracer_provider().shutdown()` and `metrics.get_meter_provider().shutdown()` are called when your app exits. Without them, buffered data might not be sent.
Custom Metrics Not Showing Up in SigNoz Dashboards:

- Symptom: Traces and logs are there, but your custom metrics are missing.
- Cause: The metric names are incorrect, the export interval is too long, or the collector’s metrics pipeline isn’t receiving data.
- Fix:
  - Metric Reader Interval: In `app.py`, ensure `export_interval_millis` for `PeriodicExportingMetricReader` is set to a reasonable value (e.g., 5000 ms for 5 seconds).
  - Metric Names: Double-check your metric names (e.g., `llm_requests_total`) against what you’re trying to query in SigNoz. PromQL queries are case-sensitive.
  - Collector Pipeline: Check `docker-compose logs otel-collector` for metric-export errors, and use the Metrics explorer in the SigNoz UI to confirm whether your metric names appear at all.
Inaccurate Token Counts:

- Symptom: The `llm_input_tokens_total` and `llm_output_tokens_total` metrics seem off.
- Cause: Our simple `len(prompt.split())` is an approximation. Tokenization is complex and model-specific.
- Fix: For production, use the token counts provided by the LLM API response (e.g., `response.usage.prompt_tokens` for OpenAI) or a proper tokenizer library (e.g., `tiktoken` for OpenAI models, or `transformers` tokenizers for Hugging Face models) to get accurate counts.
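For the accurate-count fix, one hedged pattern is to use a real tokenizer when it happens to be installed and fall back to the whitespace approximation otherwise. The `gpt-4o-mini` model name below is an assumption for illustration; for Ollama, the `prompt_eval_count`/`eval_count` fields in the chat response are the better source:

```python
def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Accurate count via tiktoken when available, otherwise a rough
    whitespace approximation like the one used in app.py."""
    try:
        import tiktoken  # optional dependency: pip install tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        return len(text.split())

print(count_tokens("Explain AI observability in one sentence."))
```

Either way, keep using one consistent counting method so the token counters stay comparable over time.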
Remember, observability is an iterative process. Start simple, verify, and then add more complexity as needed.
Summary
Congratulations! You’ve successfully built an end-to-end AI observability solution. This hands-on project has allowed you to:
- Set up a complete observability stack using Docker Compose for SigNoz, which bundles an OpenTelemetry Collector and ClickHouse-backed storage.
- Instrument a Python LLM application with OpenTelemetry for distributed tracing, capturing the full lifecycle of an LLM interaction.
- Define and record custom metrics like total requests, latency, and token usage, crucial for performance and cost monitoring.
- Integrate structured logging with OpenTelemetry, ensuring logs are automatically correlated with traces for easier debugging.
- Visualize all three pillars of observability (traces, metrics, logs) in a unified UI, demonstrating how they provide a holistic view of your AI system’s health.
- Tackle a mini-challenge to extend observability with domain-specific metrics, highlighting the adaptability of OpenTelemetry.
This practical experience is invaluable for any MLOps practitioner or AI/ML engineer. You’ve moved beyond theoretical understanding to actively building and analyzing an observable AI system.
What’s Next?
This project is a solid foundation. Here are some ideas for where you can take it next:
- Advanced Alerting: Configure alerts in SigNoz/Grafana for high latency, low quality scores, or unexpected cost spikes.
- Automated Evaluation: Integrate more sophisticated LLM evaluation metrics (e.g., RAGAS, LlamaIndex evaluators) and track them as metrics or attributes.
- Data Drift Monitoring: Track distributions of prompt lengths, response lengths, or specific keywords to detect changes in user behavior or model outputs.
- A/B Testing Observability: Extend your attributes to include A/B test variant IDs to compare performance and behavior between different model versions or prompt strategies.
- Security & Compliance: Implement stricter logging policies for sensitive data and ensure your observability data storage complies with regulations.
- Real-world Deployment: Explore deploying this setup to a cloud environment (AWS, Azure, GCP) using managed services or Kubernetes.
Keep exploring, keep building, and remember: an observable AI system is a reliable and maintainable AI system!
References
- SigNoz Documentation: A comprehensive guide for deploying and using SigNoz, including the Docker Compose setup.
- OpenTelemetry Python Documentation: The official source for OpenTelemetry’s Python SDK, covering tracing, metrics, and logging.
- Ollama GitHub Repository: For running open-source LLMs locally.
- OpenAI API Documentation: If you choose to use OpenAI models instead of Ollama.
- Prometheus Query Language (PromQL): For the metric query syntax used in SigNoz dashboards.