Introduction: Keeping an Eye on Your Containers

Welcome back, future Docker master! So far, we’ve learned how to build, run, and orchestrate our applications with Docker. But what happens when things go wrong? How do you know if your application is performing well or even running at all? This is where monitoring, logging, and health checks come into play.

In this chapter, we’re going to dive into these crucial aspects of running applications in production. You’ll learn how to peek inside your containers, understand what they’re doing, and ensure they’re always in tip-top shape. We’ll cover Docker’s built-in tools for logs and resource monitoring, and how to implement robust health checks to keep your services reliable. Get ready to add some serious diagnostic power to your Docker toolkit!

This chapter assumes you’re comfortable with running Docker containers, building images with Dockerfiles, and orchestrating multi-service applications using docker-compose from previous chapters. If you need a refresher, feel free to revisit those sections!

Core Concepts: Your Container’s Vital Signs

Before we start typing commands, let’s understand why these concepts are so important and what they actually mean in the world of Docker.

What is Logging?

Imagine your application is a person working diligently. Logs are like that person’s diary, recording every significant action, decision, and hiccup throughout their day. When something goes wrong, you can read the diary to understand what happened, when it happened, and why.

In Docker, logs are the standard output (stdout) and standard error (stderr) streams of your running container. Any message your application prints to the console, whether it’s an informational message, a warning, or an error, gets captured by Docker.

Why are logs important?

  • Debugging: When your application crashes or misbehaves, logs are your first line of defense to pinpoint the problem.
  • Auditing: They can provide a historical record of events, useful for security or compliance.
  • Performance Analysis: You can log performance metrics or request times to identify bottlenecks.

Docker’s default logging mechanism, the json-file driver, stores these logs in JSON format on the host machine. While simple, for production, you often want more sophisticated solutions that centralize logs from many containers. We’ll touch on how to configure these.
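For reference, each line your app prints becomes one JSON object in that file. An entry looks roughly like this (your `time` value will differ):

```json
{"log":"[START] Server running on port 3000\n","stream":"stdout","time":"2025-12-04T10:00:00.000000000Z"}
```

The `stream` field records whether the line came from stdout or stderr, which is how `docker logs` can keep the two apart.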

What is Monitoring?

If logs are the diary, monitoring is like taking your application’s pulse, checking its temperature, and measuring its energy levels in real-time. It’s about observing the system’s performance and resource usage.

Why is monitoring important?

  • Resource Management: See if your container is using too much CPU or memory, which could indicate a problem or a need for more resources.
  • Performance Insights: Track response times, error rates, and other metrics to ensure your application is meeting its performance goals.
  • Proactive Issue Detection: Spot trends or anomalies before they turn into full-blown outages.

Docker provides a basic built-in monitoring tool (docker stats), but for comprehensive production monitoring, you’d typically integrate with external solutions like Prometheus and Grafana, which collect, store, and visualize metrics from all your services.

What are Health Checks?

Think of a health check as a quick, automated check-up for your container. Instead of just knowing if the container process is running, a health check tells you if the application inside the container is actually ready to serve requests and functioning correctly.

For example, a web server container might be running, but its database connection could be down, making it unable to serve pages. A simple process check wouldn’t catch this, but a health check that tries to access a specific endpoint would.

Why are health checks important?

  • Reliability: Prevents traffic from being routed to an unhealthy container, ensuring users only interact with fully functional instances.
  • Orchestration Integration: Tools like Docker Compose, Docker Swarm, and Kubernetes use health checks to determine when to restart containers or remove unhealthy ones from a load balancer.
  • Faster Recovery: Helps automated systems quickly detect and recover from application-level failures.

Docker’s HEALTHCHECK instruction in a Dockerfile defines how to perform this check, including how often to run it, how long to wait for a response, and how many failures to tolerate before marking a container as unhealthy.

Step-by-Step Implementation: Getting Hands-On

Let’s put these concepts into practice. We’ll create a super simple Node.js application that logs messages and has a basic web endpoint, then we’ll use it to explore logging, monitoring, and health checks.

1. Setting Up Our Sample Application

First, let’s create a directory for our project.

mkdir docker-diagnostics
cd docker-diagnostics

Now, let’s create our Node.js application file, app.js:

// docker-diagnostics/app.js
const http = require('http');
const os = require('os');
const port = process.env.PORT || 3000;

let requestCount = 0;

const server = http.createServer((req, res) => {
  requestCount++;
  console.log(`[INFO] Request received on ${os.hostname()} for ${req.url}. Total requests: ${requestCount}`);

  if (req.url === '/health') {
    // Simulate a health check that sometimes fails
    if (requestCount % 5 === 0) { // Fails every 5th request
      console.error(`[ERROR] Health check failed on purpose! Request count: ${requestCount}`);
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('NOT OK - Simulating failure\n');
    } else {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('OK - Healthy\n');
    }
  } else if (req.url === '/slow') {
    console.warn(`[WARN] Simulating a slow response for ${req.url}`);
    setTimeout(() => {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('Slow response complete!\n');
    }, 5000); // 5-second delay
  } else {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(`Hello from Docker! Host: ${os.hostname()}, Request Count: ${requestCount}\n`);
  }
});

server.listen(port, () => {
  console.log(`[START] Server running on port ${port}`);
  console.log(`[INFO] Try accessing /health, /slow, or /`);
});

// Log an error every 10 seconds just for demo purposes
setInterval(() => {
  console.error(`[ERROR] This is a simulated error message at ${new Date().toISOString()}`);
}, 10000);

This simple app logs messages and serves three routes: a basic page, a /health endpoint that sometimes fails, and a /slow endpoint that simulates a slow process.
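As a side note, the failure rule is easy to see in isolation. This tiny standalone sketch (not part of app.js itself) replays the modulo check the /health handler uses:

```javascript
// Sketch: the /health endpoint returns 500 whenever the running request
// count is a multiple of 5, and 200 otherwise.
function healthStatus(requestCount) {
  return requestCount % 5 === 0 ? 500 : 200;
}

// Simulate request counts 1 through 10: every 5th request fails
const statuses = [];
for (let count = 1; count <= 10; count++) {
  statuses.push(healthStatus(count));
}
console.log(statuses.join(' '));
// → 200 200 200 200 500 200 200 200 200 500
```

Keep this pattern in mind; it's what we'll deliberately trip over when we wire up health checks later in the chapter.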

Next, create a package.json file:

// docker-diagnostics/package.json
{
  "name": "docker-diagnostics-app",
  "version": "1.0.0",
  "description": "A simple Node.js app for Docker diagnostics demo",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "author": "AI Expert",
  "license": "ISC"
}

And finally, our Dockerfile:

# docker-diagnostics/Dockerfile
# Use the official Node.js 20 LTS image as of Dec 2025
FROM node:20-alpine

# Set the working directory in the container
WORKDIR /app

# Copy package.json and package-lock.json (if present)
# to install dependencies
COPY package*.json ./

# Install application dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port our app runs on
EXPOSE 3000

# Define the command to run our application
CMD ["npm", "start"]

Now, let’s build our image:

docker build -t diagnostics-app:1.0 .

2. Exploring Logging with docker logs

With our image built, let’s run a container and see its logs.

docker run -d --name my-diagnostics-container -p 3000:3000 diagnostics-app:1.0

The -d flag runs the container in detached mode (in the background). Now, let’s check the logs!

docker logs my-diagnostics-container

You should see output similar to this:

[START] Server running on port 3000
[INFO] Try accessing /health, /slow, or /
[ERROR] This is a simulated error message at 2025-12-04TXX:XX:XX.XXXZ

Docker collects everything sent to stdout and stderr by your application. Isn’t that neat?

Advanced Logging Options

What if you want to see logs in real-time, just like tail -f?

docker logs -f my-diagnostics-container

Now, try opening your browser to http://localhost:3000 or http://localhost:3000/health a few times. You’ll see the [INFO] and [ERROR] messages appear in your terminal instantly!

You can also filter logs:

  • Show logs since a specific time:

    docker logs --since "2025-12-04T10:00:00" my-diagnostics-container
    

    (Replace with a time relevant to your container’s start)

  • Show only the last N lines:

    docker logs --tail 10 my-diagnostics-container
    
  • Show logs with timestamps:

    docker logs -t my-diagnostics-container
    

    This adds a timestamp to each log entry, which is super useful for debugging.
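These flags combine, and --since also accepts relative durations such as 10m (the last ten minutes), so a typical debugging invocation might look like this:

```shell
# Follow new output with timestamps, starting from the last 20 lines
docker logs -f -t --tail 20 my-diagnostics-container

# Or: everything from the last 10 minutes, with timestamps
docker logs -t --since 10m my-diagnostics-container
```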

Configuring Logging Drivers with Docker Compose

For production, relying on Docker’s default json-file driver without any configuration isn’t ideal. Logs can grow indefinitely, filling up your disk! Let’s use docker-compose to configure logging limits.

First, stop and remove your running container:

docker stop my-diagnostics-container
docker rm my-diagnostics-container

Now, create a docker-compose.yml file in your docker-diagnostics directory:

# docker-diagnostics/docker-compose.yml

services:
  app:
    image: diagnostics-app:1.0
    ports:
      - "3000:3000"
    logging:
      driver: "json-file"
      options:
        max-size: "10m" # Max size of the log file before rotation
        max-file: "3"   # Max number of log files to keep

Here, we’re explicitly setting the json-file driver and telling Docker to:

  • Keep log files to a maximum size of 10m (10 megabytes).
  • Rotate and keep only 3 such files. This prevents logs from consuming all your disk space.

Let’s run this with Docker Compose:

docker compose up -d

Now, your application is running with configured log rotation! With Compose, view the logs through the Compose CLI (plain docker logs with the container name still works too):

docker compose logs -f app

Feel free to generate a lot of traffic to http://localhost:3000 and observe the logs. Docker will manage the log files in the background according to your max-size and max-file settings.

3. Monitoring with docker stats

While docker logs tells you what your app is doing, docker stats tells you how it’s performing from a resource perspective.

Open a new terminal window (keep your docker compose logs -f app running in the other).

docker stats

You’ll see a live stream of resource usage for all your running containers. For our app container, you’ll see columns like:

  • CONTAINER ID / NAME: Identifies the container.
  • CPU %: Percentage of CPU usage.
  • MEM USAGE / LIMIT: How much memory the container is using out of its allocated limit.
  • NET I/O: Network input/output.
  • BLOCK I/O: Disk read/write activity.
  • PIDS: Number of processes running inside the container.

Try navigating to http://localhost:3000/slow multiple times in your browser. You’ll notice the CPU % might spike briefly, and NET I/O will increase as data is transferred. This gives you a quick snapshot of how your containers are utilizing resources. If you see consistently high CPU or memory, it’s a sign to investigate further!
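If you want a one-shot snapshot instead of a live stream (handy in scripts or cron jobs), docker stats supports --no-stream and a --format template:

```shell
# Print a single snapshot of name, CPU, and memory for all running containers
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```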

To stop docker stats, press Ctrl+C.

4. Implementing Health Checks

Now for health checks! We’ll add a HEALTHCHECK instruction to our Dockerfile. This instruction tells Docker how to test if a container is still “healthy.”

Let’s modify our Dockerfile:

# docker-diagnostics/Dockerfile
# Use the official Node.js 20 LTS image as of Dec 2025
FROM node:20-alpine

# Set the working directory in the container
WORKDIR /app

# Copy package.json and package-lock.json (if present)
# to install dependencies
COPY package*.json ./

# Install application dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port our app runs on
EXPOSE 3000

# --- ADD THESE LINES ---
# node:20-alpine doesn't ship with curl, so install it for the health check
RUN apk add --no-cache curl

HEALTHCHECK --interval=5s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

# Define the command to run our application
CMD ["npm", "start"]

Let’s break down that HEALTHCHECK line:

  • --interval=5s: Docker will run this check every 5 seconds.
  • --timeout=3s: If the command takes longer than 3 seconds to complete, it’s considered a failure.
  • --start-period=5s: Give the container 5 seconds to initialize before starting health checks. Any failures during this period don’t count towards --retries.
  • --retries=3: If the check fails 3 consecutive times after the start-period, the container is marked as unhealthy.
  • CMD curl -f http://localhost:3000/health || exit 1: This is the actual command Docker runs inside the container.
    • curl -f http://localhost:3000/health: Tries to fetch the /health endpoint. The -f flag makes curl exit with a non-zero code (instead of succeeding with the error page) when the server returns an HTTP error status like 4xx or 5xx.
    • || exit 1: If curl fails (returns a non-zero exit code due to -f or connection issues), the HEALTHCHECK command exits with code 1, indicating a failure. If curl succeeds, it exits with 0, indicating success.
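One caveat: the health-check command runs inside the container, so it can only use tools present in the image. Alpine-based images don’t include curl by default (see the pitfalls section below), which is why we install it. Alternatively, BusyBox wget, which Alpine does include, can do the same job. A sketch of the equivalent check:

```dockerfile
# BusyBox wget ships with Alpine; -q suppresses progress output, -O- writes
# the response to stdout. wget exits non-zero on HTTP errors, so no -f needed.
HEALTHCHECK --interval=5s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q -O- http://localhost:3000/health || exit 1
```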

Now, we need to rebuild our image since we changed the Dockerfile:

docker build -t diagnostics-app:1.0 .

Docker Compose will automatically pick up the new image if it has the same tag. Let’s restart our services:

docker compose up -d --force-recreate

(The --force-recreate ensures a new container is created from the updated image, even if the service definition hasn’t changed).

Now, check the health status:

docker ps

Look at the STATUS column: initially it might say (health: starting), then (healthy).

CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                            PORTS                                       NAMES
xxxxxxxxxxxx   diagnostics-app:1.0    "docker-entrypoint.s…"   10 seconds ago   Up 9 seconds (healthy)            0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   docker-diagnostics-app-1

Remember our app.js simulates a health check failure whenever the total request count hits a multiple of 5? Note that Docker’s own health probes hit /health too, so they also increment the counter. Let’s help it along: access http://localhost:3000/health repeatedly (at least 5-6 times) so that several consecutive checks land on failing counts.

Keep running docker ps in your terminal. After a few failures, you should see the status change to (unhealthy)!

CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                            PORTS                                       NAMES
xxxxxxxxxxxx   diagnostics-app:1.0    "docker-entrypoint.s…"   About a minute ago   Up About a minute (unhealthy)   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   docker-diagnostics-app-1

This is incredibly powerful! Even though the container process is still running, Docker knows that the application inside isn’t behaving as expected. An orchestrator like Swarm or Kubernetes would then take action, such as restarting the container or routing traffic away from it.

You can also get more detailed health check output using:

docker inspect --format='{{json .State.Health}}' docker-diagnostics-app-1

This will show you the log of health check attempts, including their output and exit codes.
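The output is a JSON object along these lines (trimmed for readability; your timestamps, streak, and log entries will differ):

```json
{
  "Status": "unhealthy",
  "FailingStreak": 3,
  "Log": [
    {
      "Start": "2025-12-04T10:00:00.000000000Z",
      "End": "2025-12-04T10:00:00.100000000Z",
      "ExitCode": 1,
      "Output": "NOT OK - Simulating failure\n"
    }
  ]
}
```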

When you’re done, clean up:

docker compose down

Mini-Challenge: Diagnostic Duo

It’s your turn! Let’s combine what we’ve learned.

Challenge:

  1. Create a new directory for this challenge, separate from docker-diagnostics.
  2. Inside this new directory, create a simple Python Flask application. This app should:
    • Print a log message to stdout every time it receives a request.
    • Have a /status endpoint that always returns “OK”.
  3. Write a Dockerfile for this Flask application.
  4. Create a docker-compose.yml file that defines a service for your Flask app. This service should:
    • Use the image you built.
    • Expose port 5000.
    • Configure its json-file logging driver to have a max-size of 5m and max-file of 2.
    • Include a healthcheck configuration (Compose’s equivalent of the Dockerfile HEALTHCHECK instruction) that checks the /status endpoint every 10 seconds, with a 2-second timeout and 2 retries.
  5. Bring up your services, verify the health status, and check the logs.

Hint:

  • For the Flask app, remember to install Flask in your Dockerfile (pip install flask).
  • The HEALTHCHECK command could use curl or wget. Make sure curl or wget is available in your base image, or install it (RUN apk add --no-cache curl if using alpine base image). A good base for Flask is python:3.10-alpine.
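Note that in docker-compose.yml the health check is written as a healthcheck key on the service rather than a Dockerfile instruction. Here’s a sketch of the shape (the service name and test command are up to you; this assumes curl is installed in your image):

```yaml
services:
  flask-app:
    # ...your image, ports, and logging configuration go here...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/status"]
      interval: 10s
      timeout: 2s
      retries: 2
```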

What to observe/learn:

  • How to integrate logging driver configuration into docker-compose.yml.
  • How to define a robust HEALTHCHECK in a Dockerfile for a different language application.
  • Confirming that docker ps shows your container as (healthy).
  • Seeing your request logs appear with docker compose logs.

Common Pitfalls & Troubleshooting

Even with these powerful tools, it’s easy to stumble. Here are a few common issues and how to tackle them:

  1. Ignoring HEALTHCHECK results: Just because a container is unhealthy doesn’t mean it’s stopped. docker ps will still show it as Up. Many new users assume unhealthy means the container is down. Remember, the HEALTHCHECK status is primarily for orchestrators (like Docker Swarm or Kubernetes) to take action. You, as the operator, need to monitor this status and decide if manual intervention or an automated restart policy is needed.
  2. Health checks too aggressive or too lenient:
    • Too aggressive (short interval, timeout, low retries): Your application might take a moment to warm up or experience temporary network glitches, leading to false unhealthy statuses and unnecessary restarts.
    • Too lenient (long interval, start-period, high retries): Your application could be failing for a long time before Docker marks it as unhealthy, leading to degraded service for users.
    • Troubleshooting: Tune interval, timeout, start-period, and retries based on your application’s specific startup time and expected response latency. Use docker inspect to see the health check logs and fine-tune.
  3. Forgetting to install curl or wget for HEALTHCHECK: If your HEALTHCHECK command uses curl or wget (which is common for HTTP checks), but your base image doesn’t include it, the health check will always fail.
    • Troubleshooting: Add RUN apk add --no-cache curl (for Alpine) or RUN apt-get update && apt-get install -y curl (for Debian/Ubuntu based images) to your Dockerfile before the HEALTHCHECK instruction.
  4. Misunderstanding docker stats values: CPU % can sometimes go over 100% if your container has multiple CPU cores available. A single core is 100%, two cores would be 200%, etc.
    • Troubleshooting: Always consider the context of your host’s CPU count. For multi-core systems, 100% CPU usage might only mean one core is fully utilized.
  5. Log files filling up the disk: If you don’t configure logging drivers with max-size and max-file options, especially in docker-compose.yml or your daemon.json, logs can quickly consume all available disk space on your host.
    • Troubleshooting: Always implement log rotation policies for production environments. For advanced scenarios, consider sending logs to a centralized logging system (e.g., ELK stack, Splunk, cloud logging services) using appropriate logging drivers like fluentd, syslog, or cloud-specific drivers.
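To apply log rotation host-wide rather than per service, you can set defaults in Docker’s daemon configuration file (usually /etc/docker/daemon.json on Linux; restart the Docker daemon after editing it). These defaults apply to newly created containers only:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```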

Summary: Your Application’s Guardian

Phew, you’ve just added some seriously powerful tools to your Docker arsenal! Let’s quickly recap what we covered:

  • docker logs: Your go-to for peering into the stdout and stderr streams of your containers, essential for debugging and understanding application behavior. We learned about docker logs -f, --since, --tail, and -t.
  • Logging Drivers: How to configure advanced logging options, like log rotation (max-size, max-file), using docker-compose.yml to prevent disk space issues and prepare for centralized logging.
  • docker stats: A real-time dashboard for monitoring your containers’ resource consumption (CPU, memory, network, disk I/O), helping you spot performance bottlenecks.
  • HEALTHCHECK: A crucial Dockerfile instruction that defines how Docker can determine if your application inside the container is truly healthy and ready to serve traffic, not just whether its process is running. We explored its parameters (--interval, --timeout, --start-period, --retries) and its importance for reliable deployments.

With these skills, you’re no longer just running containers; you’re actively monitoring their well-being, diagnosing issues, and ensuring they stay robust and responsive. This is a massive step towards running truly production-ready Dockerized applications!

In the next chapter, we’ll shift our focus to more advanced networking concepts, allowing your containers to communicate securely and efficiently in complex environments. Get ready to connect the dots!