Chapter 6: Docker Storage and Data Persistence

Introduction

In the previous chapters, we learned how to create, run, and manage Docker containers. However, one fundamental aspect we haven’t deeply explored is how Docker handles data. By default, the data generated by a container is stored within the container’s writable layer, which is ephemeral. This means that if you remove the container, all its data is lost. This behavior is problematic for applications that need to store persistent data, such as databases, logs, or user-uploaded files.

This chapter delves into Docker’s storage mechanisms, focusing on how to achieve data persistence for your applications. We’ll explore the various options Docker provides, including volumes, bind mounts, and tmpfs mounts, understanding their use cases, advantages, and disadvantages. Mastering these concepts is crucial for building robust, production-ready Dockerized applications.

Main Explanation

To understand Docker storage, it’s essential to first grasp how container filesystems work.

Understanding Container Filesystems

Docker builds a container’s filesystem with a union filesystem (today usually OverlayFS, through the overlay2 storage driver; older installations used AUFS). This lets Docker assemble the filesystem from multiple layers:

Image Layers: These are read-only layers that form the base of your container, derived from the Docker image. Dockerfile instructions that change the filesystem (RUN, COPY, ADD) each add a new read-only layer; other instructions only record metadata.
Container Layer: When a container is started, a new writable layer is added on top of the image layers. All changes made by the running container (e.g., creating files, modifying existing ones, writing logs) occur in this writable layer.
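
The lookup order between layers can be imitated without Docker. The following is a toy model under stated assumptions, not how OverlayFS is implemented: two plain directories stand in for an image layer and the container layer, and the helper functions read_file and write_file are hypothetical names invented for this sketch.

```shell
#!/bin/sh
# Toy model of layered lookup: directories stand in for layers.
work=$(mktemp -d)
mkdir -p "$work/image" "$work/container"

# The "image layer" ships two files.
echo "from image" > "$work/image/app.conf"
echo "from image" > "$work/image/readme.txt"

# read_file NAME: print NAME from the topmost layer that contains it.
read_file() {
  for layer in "$work/container" "$work/image"; do
    if [ -f "$layer/$1" ]; then
      cat "$layer/$1"
      return 0
    fi
  done
  return 1
}

# write_file NAME CONTENT: writes always land in the container layer.
write_file() {
  printf '%s\n' "$2" > "$work/container/$1"
}

write_file app.conf "changed by container"
modified=$(read_file app.conf)        # served from the container layer
untouched=$(read_file readme.txt)     # falls through to the image layer
image_copy=$(cat "$work/image/app.conf")  # image layer itself is unchanged

# "Removing the container" discards only the writable layer.
rm -rf "$work/container"
after_removal=$(read_file app.conf)

echo "modified:      $modified"
echo "untouched:     $untouched"
echo "image copy:    $image_copy"
echo "after removal: $after_removal"

rm -rf "$work"
```

The last line of output shows the original image content again, which is the same effect you get when a container is removed and a new one is started from the same image.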

The problem with the container layer is its ephemeral nature. If you delete the container, this writable layer is removed, and all data within it is lost. To store data persistently, Docker provides mechanisms to store data outside the container’s writable layer.
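
To see this behavior on a real container, a short experiment along the following lines should work. It is a sketch that assumes a reachable Docker daemon and the public alpine image; the container name layer-demo and the file /note.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: data in the writable layer disappears with the container.
NAME=layer-demo   # hypothetical container name

if docker info >/dev/null 2>&1; then
  # Write a file into the container's writable layer, then let it exit.
  docker run --name "$NAME" alpine sh -c 'echo "hello" > /note.txt'

  # docker diff lists paths that differ from the image (A = added, C = changed).
  docker diff "$NAME"

  # Removing the container discards its writable layer.
  docker rm "$NAME"

  # A new container from the same image starts from the image layers only.
  docker run --rm alpine ls /note.txt || echo "note.txt is gone"
else
  echo "Docker daemon not reachable; skipping the demo"
fi
```

Note that stopping a container does not discard the writable layer; only removing it does.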

Docker Storage Options

Docker offers three ways to keep data outside the container’s writable layer, each with its own characteristics and best use cases:

Volumes: The preferred mechanism for persisting data generated by and used by Docker containers.
Bind Mounts: Allow you to mount a file or directory from the host machine into a container.
tmpfs Mounts: Store data in the host machine’s memory, ideal for non-persistent, sensitive data.

Let’s explore each in detail.

1. Volumes

Volumes are the recommended way to persist data in Docker. Docker creates and manages them for you, but it does not delete a named volume when its container is removed; you remove it explicitly with docker volume rm once the data is no longer needed.

Key characteristics of Volumes:

Managed by Docker: Docker creates and manages the volume on the host system (usually in /var/lib/docker/volumes/ on Linux).
Persistent: Data stored in a volume persists even if the container is stopped, removed, or recreated.
High Performance: Volumes are stored directly on the host’s filesystem and bypass the storage driver’s copy-on-write layer, so write-heavy workloads usually perform better than in the container’s writable layer.
Portable: Volumes can be easily backed up, migrated, and shared between containers.
Safe for Data: Docker ensures volumes are not tied to the container’s lifecycle, making them safer for critical data.
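
One common way to back up a volume is to mount it into a throwaway container and archive its contents with tar. The sketch below assumes a reachable Docker daemon and the alpine image; the volume names app-data and app-data-restored, the archive name, and the helper functions are all hypothetical.

```shell
#!/bin/sh
# Sketch: back up and restore a named volume with a temporary container.
VOLUME=app-data              # hypothetical volume to back up
ARCHIVE=app-data.tar.gz      # hypothetical archive name
BACKUP_DIR=$(pwd)            # archive is written to the current directory

backup_volume() {
  # Mount the volume read-only at /data and the host directory at /backup,
  # then archive the volume's contents.
  docker run --rm \
    --mount "type=volume,source=$VOLUME,target=/data,readonly" \
    --mount "type=bind,source=$BACKUP_DIR,target=/backup" \
    alpine tar czf "/backup/$ARCHIVE" -C /data .
}

restore_volume() {
  # $1 = volume to restore into; Docker creates it if it does not exist.
  docker run --rm \
    --mount "type=volume,source=$1,target=/data" \
    --mount "type=bind,source=$BACKUP_DIR,target=/backup,readonly" \
    alpine tar xzf "/backup/$ARCHIVE" -C /data
}

if docker info >/dev/null 2>&1; then
  backup_volume && restore_volume app-data-restored
else
  echo "Docker daemon not reachable; skipping the demo"
fi
```

The archive is created by the container’s user (root by default), so on Linux the file on the host will typically be owned by root.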

Types of Volumes:

Named Volumes: You give the volume a specific name, making it easy to reference and manage. E.g., my-data-volume.
Anonymous Volumes: Docker generates a random name for the volume. These are less common for explicit persistence but are used by Docker implicitly if you specify a mount point without a name.

How to use Volumes:

You can create volumes explicitly using docker volume create or implicitly when running a container using the --mount flag or the older -v flag. The --mount flag is preferred as it’s more explicit and readable.
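
The two flag styles can be compared side by side. This is a minimal sketch assuming a reachable Docker daemon and the alpine image; the volume name web-data is hypothetical.

```shell
#!/bin/sh
# Sketch: equivalent ways to attach a volume to a container.
VOLUME=web-data   # hypothetical volume name

if docker info >/dev/null 2>&1; then
  docker volume create "$VOLUME"

  # Show where Docker keeps the volume's data on the host.
  docker volume inspect --format '{{ .Mountpoint }}' "$VOLUME"

  # --mount: explicit key=value pairs.
  docker run --rm --mount "type=volume,source=$VOLUME,target=/data" alpine ls -la /data

  # -v: the same mount in the compact name:path form.
  docker run --rm -v "$VOLUME:/data" alpine ls -la /data

  # Path only, no name: Docker creates an anonymous volume.
  # With --rm, the anonymous volume is removed together with the container.
  docker run --rm -v /data alpine ls -la /data

  # The named volume is still listed; remove it explicitly.
  docker volume ls
  docker volume rm "$VOLUME"
else
  echo "Docker daemon not reachable; skipping the demo"
fi
```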

2. Bind Mounts

Bind mounts allow you to mount a file or directory from the host machine directly into a container. This gives you fine-grained control over the exact location on the host where the data is stored.

Key characteristics of Bind Mounts:

Host-managed: You control the source path on the host machine.
Persistent: Data persists on the host machine as long as the host directory exists.
Direct Access: The host machine and the container share direct access to the files. Changes made by one are immediately visible to the other.
Development Workflow: Very useful for development, as you can mount source code into a container and see changes reflected instantly without rebuilding the image.
Less Portable: The specific host path makes bind mounts less portable compared to volumes.
Security Concerns: If containers can write to arbitrary paths on the host, it can pose security risks.

How to use Bind Mounts:

You use the --mount flag with type=bind or the -v flag, specifying the absolute path on the host and the path inside the container.
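
The two flags differ in one detail that is easy to trip over: --mount fails if the host path does not exist, while -v silently creates a missing host path as an empty directory. The sketch below shows both forms with a read-only mount. It assumes a reachable Docker daemon and the alpine image; the target path /site is a hypothetical choice.

```shell
#!/bin/sh
# Sketch: bind-mount syntax with --mount and -v, both read-only.
TARGET=/site   # hypothetical path inside the container

if docker info >/dev/null 2>&1; then
  SRC=$(mktemp -d)                       # temporary host directory
  echo "hello from the host" > "$SRC/index.html"

  # --mount form: the source must already exist, otherwise docker run fails.
  docker run --rm \
    --mount "type=bind,source=$SRC,target=$TARGET,readonly" \
    alpine cat "$TARGET/index.html"

  # -v form: host-path:container-path, with :ro for read-only.
  docker run --rm -v "$SRC:$TARGET:ro" alpine cat "$TARGET/index.html"

  # Writes through a read-only mount are rejected.
  docker run --rm -v "$SRC:$TARGET:ro" alpine \
    sh -c "echo x > $TARGET/new.txt" || echo "write rejected: mount is read-only"

  rm -rf "$SRC"
else
  echo "Docker daemon not reachable; skipping the demo"
fi
```

Mounting read-only wherever the container only needs to read is a simple way to limit the security exposure mentioned above.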

3. tmpfs Mounts

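tmpfs mounts store data in the host machine’s memory (RAM), not on its filesystem. This means the data is not persistent; it’s lost when the container stops. A tmpfs mount cannot be shared between containers, and the feature is available only when Docker runs on Linux.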

Key characteristics of tmpfs Mounts:

Non-persistent: Data is deleted when the container stops.
High Performance: Because data is stored in RAM, tmpfs mounts offer very fast I/O operations.
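Security: Useful for storing sensitive, non-persistent data that you don’t want written to the container’s writable layer or to a volume. Note that tmpfs pages can still be swapped out if the host has swap enabled.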
Temporary Data: Ideal for temporary files, caches, or session data that doesn’t need to survive container restarts.

How to use tmpfs Mounts:

You use the --mount flag with type=tmpfs or the --tmpfs flag.
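
Both forms accept a size limit, which is worth setting because a tmpfs mount consumes host memory. The following sketch assumes a reachable Docker daemon on Linux and the alpine image; the mount point /scratch and the 64 MiB limit are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: the two ways to request a size-limited tmpfs mount.
SIZE_BYTES=$((64 * 1024 * 1024))   # 64 MiB, an arbitrary limit

if docker info >/dev/null 2>&1; then
  # --tmpfs: compact form; mount options follow the colon.
  docker run --rm --tmpfs /scratch:rw,noexec,nosuid,size=65536k alpine df -h /scratch

  # --mount: tmpfs-size is given in bytes, tmpfs-mode is the octal file mode.
  docker run --rm \
    --mount "type=tmpfs,target=/scratch,tmpfs-size=$SIZE_BYTES,tmpfs-mode=1770" \
    alpine df -h /scratch
else
  echo "Docker daemon not reachable; skipping the demo"
fi
```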

Comparison: Volumes vs. Bind Mounts

Feature	Volumes	Bind Mounts
Management	Managed by Docker	Managed by the user/host filesystem
Location	Docker’s data directory (/var/lib/docker/volumes/ on Linux)	Any specified directory on the host machine
Portability	Highly portable, independent of host path	Less portable, tied to specific host path
Backup	Easy to backup and migrate	Requires host-level backup of the specific directory
Performance	Native host filesystem speed; bypasses the storage driver	Native speed on Linux; often slower on Docker Desktop (macOS, Windows) because files cross a VM boundary
Security	More secure, container cannot access arbitrary host files	Less secure, container can access/modify host files
Use Cases	Databases, persistent application data, sharing data between containers	Development, configuration files, host-specific tasks

Data Management Best Practices

Use Volumes for Persistent Data: For most applications requiring data persistence (databases, application data, logs that need to be kept), volumes are the best choice.
Use Bind Mounts for Development: When developing applications, bind mounting your source code allows for instant code changes without rebuilding images. It’s also suitable for mounting configuration files specific to the host.
Use tmpfs for Ephemeral, Sensitive Data: If you have temporary, non-persistent data that requires high performance or should never touch the disk, tmpfs mounts are appropriate.
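
The three mount types can be combined on a single container, which makes their different lifetimes easy to compare. The sketch below assumes a reachable Docker daemon on Linux and the alpine image; the volume name practice-data, the mount points, and the helper function run_with_mounts are hypothetical.

```shell
#!/bin/sh
# Sketch: one container, three mount types, then check what survives.
VOLUME=practice-data   # hypothetical volume name

# run_with_mounts HOST_DIR COMMAND: run COMMAND in alpine with all three mounts.
run_with_mounts() {
  docker run --rm \
    --mount "type=volume,source=$VOLUME,target=/data" \
    --mount "type=bind,source=$1,target=/host" \
    --mount type=tmpfs,target=/scratch \
    alpine sh -c "$2"
}

if docker info >/dev/null 2>&1; then
  HOST_DIR=$(mktemp -d)

  # First container: write one file into each mount.
  run_with_mounts "$HOST_DIR" \
    'for d in /data /host /scratch; do echo saved > "$d/file.txt"; done'

  # Second container: the volume and the bind mount still hold file.txt,
  # while /scratch is a fresh, empty tmpfs.
  run_with_mounts "$HOST_DIR" 'ls /data /host /scratch'

  docker volume rm "$VOLUME"
  rm -rf "$HOST_DIR"
else
  echo "Docker daemon not reachable; skipping the demo"
fi
```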

Examples

Let’s walk through examples for each storage type.

1. Creating and Using a Named Volume

First, let’s create a named volume and then use it with an Nginx container.

# 1. Create a named volume
docker volume create my-nginx-data

# 2. Run an Nginx container, mounting the volume to Nginx's HTML directory
# This will make the /usr/share/nginx/html directory persistent
docker run -d \
  -p 80:80 \
  --name my-nginx-web \
  --mount source=my-nginx-data,target=/usr/share/nginx/html \
  nginx

# 3. Verify the container is running
docker ps

# 4. Access the container's shell to add some content to the mounted volume
docker exec -it my-nginx-web bash

# Inside the container:
# Create an index.html file in the mounted directory
echo "<h1>Hello from a Docker Volume!</h1>" > /usr/share/nginx/html/index.html
exit # Exit the container shell

# 5. Open your browser and navigate to http://localhost
# You should see "Hello from a Docker Volume!"

# 6. Stop and remove the container (the data in the volume will persist)
docker stop my-nginx-web
docker rm my-nginx-web

# 7. Run a new container using the *same* volume
docker run -d \
  -p 80:80 \
  --name my-nginx-web-again \
  --mount source=my-nginx-data,target=/usr/share/nginx/html \
  nginx

# 8. Open your browser and navigate to http://localhost again
# You should *still* see "Hello from a Docker Volume!", demonstrating persistence.

# 9. Clean up: stop and remove the container, then remove the volume
docker stop my-nginx-web-again
docker rm my-nginx-web-again
docker volume rm my-nginx-data

2. Using a Bind Mount

Let’s bind mount a local directory from your host into a container.

# 1. Create a directory on your host machine
mkdir -p ~/my-host-data

# 2. Create a file inside that directory
echo "This content is from the host machine via a bind mount." > ~/my-host-data/host_message.txt

# 3. Run an Nginx container, bind mounting the host directory
# The source must be an absolute path that already exists on the host.
# "$HOME"/my-host-data is the directory created in step 1 (~/my-host-data).
docker run -d \
  -p 81:80 \
  --name my-bind-nginx \
  --mount type=bind,source="$HOME"/my-host-data,target=/usr/share/nginx/html \
  nginx

# 4. Verify the container is running
docker ps

# 5. Access the container's shell to check the mounted file
docker exec -it my-bind-nginx bash

# Inside the container:
ls /usr/share/nginx/html
cat /usr/share/nginx/html/host_message.txt
# You should see "This content is from the host machine via a bind mount."
exit # Exit the container shell

# 6. Now, let's modify the file on the host and see the change instantly in the container
echo "Updated content from the host!" > ~/my-host-data/host_message.txt

# 7. Read the file from inside the container again
docker exec my-bind-nginx cat /usr/share/nginx/html/host_message.txt
# You should immediately see "Updated content from the host!" because the
# container reads the very same file on the host; nothing is copied or synced.

# 8. Clean up
docker stop my-bind-nginx
docker rm my-bind-nginx
rm -rf ~/my-host-data

3. Using a tmpfs Mount

This example shows how tmpfs mounts work for temporary, non-persistent data.

# 1. Run an Alpine container with a tmpfs mount
docker run -it --rm \
  --name my-tmpfs-container \
  --mount type=tmpfs,target=/app/tempdata \
  alpine sh

# Inside the container:
# Create a file in the tmpfs mounted directory
echo "This is temporary data." > /app/tempdata/temp_file.txt
ls /app/tempdata
cat /app/tempdata/temp_file.txt

# Exit the container. The tmpfs mount is discarded as soon as the container
# stops; --rm additionally removes the container itself.
exit

# 2. Verify that the data is gone.
# Each container gets its own fresh tmpfs, so a new container that mounts
# the *same* target sees an empty directory.
docker run -it --rm \
  --name my-tmpfs-container-again \
  --mount type=tmpfs,target=/app/tempdata \
  alpine sh

# Inside the container:
ls /app/tempdata
# The directory will be empty.
exit

Mini Challenge

Your challenge is to create a simple Apache web server container that serves a custom index.html file using a named volume.

Create a named Docker volume called apache-html-volume.
Run an Apache HTTP Server container (using the httpd image).
Mount apache-html-volume to the default web root directory inside the Apache container (which is /usr/local/apache2/htdocs/ for the httpd image).
After the container starts, use docker exec to create an index.html file inside the container’s web root with some custom content (e.g., “My Apache is serving from a volume!”).
Access the Apache server from your host machine’s browser (e.g., http://localhost:8080 if you mapped port 8080). Verify your custom content is displayed.
Stop and remove the Apache container.
Start a new Apache container, using the same apache-html-volume.
Access the server again and verify that your index.html content is still present, demonstrating data persistence.
Clean up by removing the container and the volume.

Summary

Data persistence is a critical aspect of running stateful applications in Docker. In this chapter, we explored the ephemeral nature of container filesystems and introduced Docker’s primary storage solutions:

Volumes: Docker-managed storage, ideal for persistent data like databases, offering high performance, portability, and ease of management. They are the recommended choice for most persistent data needs.
Bind Mounts: Allow mounting host files or directories into containers, providing direct access and control. They are excellent for development workflows and injecting host-specific configurations but are less portable and have potential security implications.
tmpfs Mounts: Store data in memory, making them non-persistent but very fast. They are suitable for temporary or sensitive data that should not be written to disk.

By understanding and appropriately utilizing these storage options, you can ensure your Dockerized applications retain their data across container lifecycles, leading to more robust and reliable deployments. Always default to volumes for critical persistent data.