Welcome back, future containerization wizard! In this chapter, we’re going to put all your hard-earned knowledge about Apple’s container tool to the test by tackling a real-world, highly relevant scenario: containerizing a machine learning (ML) workflow.
Why is this important? Machine learning projects often involve complex dependencies (specific Python versions, libraries like TensorFlow, PyTorch, scikit-learn), specific data paths, and a need for reproducible environments. Containers provide an elegant solution to these challenges, ensuring your ML models train and behave consistently, regardless of where they run. By the end of this chapter, you’ll have a practical, portable, and reproducible ML pipeline running natively on your Mac using Apple’s cutting-edge container technology.
This chapter assumes you’re comfortable with the basics of the `container` CLI, `Dockerfile` creation, and volume mounting, as covered in previous chapters. If any of those concepts feel a bit fuzzy, a quick refresher might be helpful before we dive into this project!
What We’ll Learn
- How to structure an ML project for containerization.
- Writing a `Dockerfile` for a Python-based ML application.
- Managing ML dependencies within a container.
- Using volumes to persist trained models and data.
- Executing a full ML training workflow inside an Apple container.
Let’s get started on building something truly practical!
Core Concepts: Why Containerize ML?
Before we jump into the code, let’s briefly recap why containerization is a game-changer for machine learning. Understanding the “why” often makes the “how” much clearer and more meaningful.
1. Reproducibility: Imagine you train a fantastic ML model, but a few months later, you (or a colleague) try to retrain it, and it behaves differently. This “it worked on my machine” problem is rampant in ML. It could be due to differing library versions, operating system patches, or even subtle environment variables.
- Containers solve this: By packaging your code, its dependencies, and the runtime environment into a single, isolated unit, containers guarantee that your ML workflow will run identically every time, everywhere.
2. Portability: Your data scientists might develop models on macOS, but production deployment could be on Linux servers. Moving an ML project between these environments without containers can be a headache of dependency hell.
- Containers solve this: An OCI-compliant container image built on your Mac using `container` can be run on any system that supports OCI containers (such as Linux servers and cloud platforms), assuming the underlying architecture is compatible (e.g., `arm64` for Apple Silicon, `amd64` for Intel).
3. Isolation: Running multiple ML experiments simultaneously, each with slightly different library versions or configurations, can lead to conflicts.
- Containers solve this: Each container runs in its own isolated environment, preventing conflicts and allowing you to manage different experimental setups side-by-side without interference.
4. Simplified Collaboration:
Sharing your ML work with teammates becomes much easier. Instead of providing a long list of installation instructions, you provide a Dockerfile and they can build and run your exact environment.
Here’s a simplified visual of how containerization fits into an ML workflow:
This diagram illustrates how steps like Dockerfile creation, image building, and running various stages of the ML workflow (data prep, training, evaluation) are all encapsulated within the containerization process, with persistence handled via volumes.
Step-by-Step Implementation: Containerizing Our ML Workflow
For this project, we’ll create a simple Python script that trains a basic classification model using scikit-learn on the Iris dataset, then saves the trained model to a file.
Step 1: Project Setup and ML Code
First, let’s create a new directory for our project and set up our Python script and dependencies.
Create Project Directory: Open your terminal and create a new directory:
```shell
mkdir ml_container_project
cd ml_container_project
```

Create the ML Script (`train_model.py`): This script will load the Iris dataset, train a `LogisticRegression` model, and save it.

```python
# ml_container_project/train_model.py
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import joblib  # For saving/loading models
import os

print("Starting ML Model Training...")

# 1. Load Data
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
print(f"Dataset loaded: {X.shape[0]} samples, {X.shape[1]} features.")

# 2. Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training data size: {X_train.shape[0]}, Test data size: {X_test.shape[0]}.")

# 3. Train Model
# Using a simple Logistic Regression model
model = LogisticRegression(max_iter=200, solver='liblinear')  # Increased max_iter for convergence
model.fit(X_train, y_train)
print("Model training complete.")

# 4. Evaluate Model (basic)
accuracy = model.score(X_test, y_test)
print(f"Model accuracy on test set: {accuracy:.4f}")

# 5. Save Model
# We'll save the model to a directory that will be mounted as a volume.
# The container will write to /app/models, which maps to our local ./models directory.
model_dir = "/app/models"
os.makedirs(model_dir, exist_ok=True)  # Ensure the directory exists inside the container
model_path = os.path.join(model_dir, "iris_logistic_model.joblib")
joblib.dump(model, model_path)
print(f"Model saved to {model_path}")

print("ML Model Training Finished.")
```

Explanation:
- We import `pandas` for data handling, `sklearn` for ML, and `joblib` to save our trained model.
- `load_iris()` gets a classic dataset.
- `train_test_split` divides our data into training and testing sets.
- `LogisticRegression` is our chosen model; `max_iter` is increased to ensure convergence for this simple example.
- `model.score` gives us a quick accuracy check.
- Crucially, we define `model_dir = "/app/models"`. This path inside the container is where our model will be saved. We’ll later map this to a local directory on your Mac using a volume mount. `os.makedirs(model_dir, exist_ok=True)` ensures this directory exists before saving.
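The `joblib.dump()`/`joblib.load()` pair follows the same two-call serialization pattern as the standard library’s `pickle` module (joblib adds efficient handling of large NumPy arrays). Here is that round-trip sketched with `pickle`, using a plain dict as a stand-in for a trained model:

```python
# Illustration only: a dict stands in for a trained model object.
import pickle

artifact = {"model_name": "iris_logistic_model", "accuracy": 0.9667}

# Serialize to disk ("dump"), then read it back ("load") -- the same
# two-call pattern joblib.dump()/joblib.load() uses in train_model.py.
with open("demo.pkl", "wb") as f:
    pickle.dump(artifact, f)

with open("demo.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored["accuracy"])  # prints 0.9667
```

In `train_model.py`, the object being dumped is the fitted `LogisticRegression` instance rather than a dict, but the save/restore mechanics are identical.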
Create Requirements File (`requirements.txt`): This file lists all Python packages our script needs.

```text
# ml_container_project/requirements.txt
pandas==2.2.1
scikit-learn==1.4.1
joblib==1.3.2
```

Explanation:

- We specify exact versions for `pandas`, `scikit-learn`, and `joblib`. This is a best practice for reproducibility in production environments.
- Version Check (2026-02-25): As of early 2026, these are robust and widely used versions. Always check PyPI (e.g., pypi.org/project/pandas/) for the latest stable releases if you need newer features, but these will work for our example.
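If you’re unsure which versions you have locally, `pip freeze` can generate a pinned requirements file for you — a quick sketch, assuming you run it inside an activated virtual environment that already has the packages installed:

```shell
# pip freeze records the exact installed version of every package
# in the current environment, in requirements.txt format.
python3 -m pip freeze > requirements.txt

# Inspect the pinned list (output will reflect your own environment).
cat requirements.txt
```

You may want to trim the output to just the packages your script imports directly, since `pip freeze` also lists transitive dependencies.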
Step 2: Crafting the Dockerfile
Now, let’s write the `Dockerfile` that tells `container` how to build our ML environment.
Create the Dockerfile: In the `ml_container_project` directory, create a file named `Dockerfile` (no extension).

```dockerfile
# ml_container_project/Dockerfile

# Use an official Python runtime as a parent image.
# We choose a specific version for stability and reproducibility.
# 'python:3.11-slim-bookworm' provides a lean Debian-based Python 3.11 image.
FROM python:3.11-slim-bookworm

# Set the working directory in the container.
# All subsequent commands will be executed relative to this directory.
WORKDIR /app

# Copy the requirements file into the container at /app.
# This is done separately to leverage Docker's layer caching.
COPY requirements.txt .

# Install any needed packages specified in requirements.txt.
# The --no-cache-dir flag reduces the image size by not storing build artifacts.
RUN pip install --no-cache-dir -r requirements.txt

# Copy the entire project directory into the container at /app.
# This includes our train_model.py script.
COPY . .

# Specify the command to run when the container starts.
# This will execute our Python script.
CMD ["python", "train_model.py"]
```

Explanation:

- `FROM python:3.11-slim-bookworm`: We start with a base image that already has Python 3.11 installed. The `-slim-bookworm` tag indicates a smaller image based on Debian 12 “Bookworm”, which helps keep image size down.
- `WORKDIR /app`: Sets `/app` as the current directory inside the container for subsequent commands.
- `COPY requirements.txt .`: Copies our `requirements.txt` from your local machine into the `/app` directory inside the container.
- `RUN pip install --no-cache-dir -r requirements.txt`: Installs all the Python dependencies listed in `requirements.txt`. The `--no-cache-dir` flag is a best practice to keep the image smaller.
- `COPY . .`: Copies all other files from your local project directory (including `train_model.py`) into the `/app` directory in the container.
- `CMD ["python", "train_model.py"]`: The default command executed when the container starts; it runs our ML training script.
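One practical note on `COPY . .`: it copies everything in the build context, so it’s worth keeping that context clean. If your `container` version honors Docker-style ignore files (an assumption worth verifying against your release’s documentation), a minimal `.dockerignore` might look like:

```text
# ml_container_project/.dockerignore (hypothetical)
models/          # trained models come from the volume mount, not the image
__pycache__/
*.joblib
.git/
```

Excluding the `models/` output directory also prevents previously trained models from being baked into new image builds.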
Step 3: Building the Container Image
With our Dockerfile ready, let’s build the container image using Apple’s container CLI.
Build the Image: Make sure you are in the `ml_container_project` directory.

```shell
container build -t ml-workflow:v1.0 .
```

Explanation:

- `container build`: This command initiates the image build process.
- `-t ml-workflow:v1.0`: This tags our image with a name (`ml-workflow`) and a version (`v1.0`). Tags are essential for identifying and managing images.
- `.`: This tells `container` to look for the `Dockerfile` in the current directory.

You’ll see output indicating the build steps as `container` executes each instruction in your `Dockerfile`. This might take a few moments as it downloads the base Python image and installs dependencies.

Quick Check: If you see any errors, double-check your `Dockerfile` syntax and ensure you have an active internet connection. Common errors include typos or incorrect paths.
Step 4: Running the Container and Persisting the Model
Now for the exciting part: running our ML workflow inside the container and making sure our trained model is saved outside the container’s ephemeral filesystem. This is where volumes come in!
Create a Local Directory for the Model: Before running, create a local directory where the trained model will be saved.

```shell
mkdir models
```

This `models` directory will exist on your Mac, outside the container.

Run the Container with a Volume Mount:

```shell
container run --mount type=bind,source="$(pwd)/models",target=/app/models ml-workflow:v1.0
```

Explanation:

- `container run`: The command to start a new container.
- `--mount type=bind,source="$(pwd)/models",target=/app/models`: This is the crucial part for persistence.
  - `type=bind`: Specifies a bind mount, which links a file or directory on the host machine directly into the container.
  - `source="$(pwd)/models"`: The path on your Mac (the host). `$(pwd)` ensures we use the absolute path to your `ml_container_project/models` directory.
  - `target=/app/models`: The path inside the container where the `source` directory will be mounted. Remember, our `train_model.py` script saves the model to `/app/models`.
- `ml-workflow:v1.0`: The name and tag of the image we want to run.

As the container runs, you’ll see the print statements from your `train_model.py` script:

```text
Starting ML Model Training...
Dataset loaded: 150 samples, 4 features.
Training data size: 120, Test data size: 30.
Model training complete.
Model accuracy on test set: 1.0000
Model saved to /app/models/iris_logistic_model.joblib
ML Model Training Finished.
```

Once the script finishes, the container will exit.

Verify the Saved Model: Check your local `models` directory:

```shell
ls -l models/
```

You should see `iris_logistic_model.joblib` listed:

```text
-rw-r--r--  1 youruser  staff  12345 Feb 25 10:30 iris_logistic_model.joblib
```

(The file size and date will vary.)
Congratulations! You’ve successfully trained an ML model inside an Apple container and persisted its output to your local filesystem. This is a fundamental pattern for many containerized data processing and ML tasks.
Step 5: (Optional) Loading and Using the Model
To confirm our model was saved correctly and can be loaded, let’s create a small script to load it.
Create a Model Loading Script (`predict_with_model.py`):

```python
# ml_container_project/predict_with_model.py
import joblib
import pandas as pd
from sklearn.datasets import load_iris
import os

print("Starting Model Prediction...")

# Define the path to the model inside the container
model_dir = "/app/models"
model_path = os.path.join(model_dir, "iris_logistic_model.joblib")

if not os.path.exists(model_path):
    print(f"Error: Model not found at {model_path}. Did you run train_model.py first?")
    exit(1)

# Load the trained model
model = joblib.load(model_path)
print(f"Model loaded from {model_path}")

# Simulate new data for prediction (the first 5 samples of the dataset)
iris = load_iris()
X_new = pd.DataFrame(iris.data, columns=iris.feature_names).iloc[:5]
print(f"\nNew data for prediction:\n{X_new}")

# Make predictions
predictions = model.predict(X_new)
print(f"\nPredictions: {predictions}")
print(f"Actual labels (for comparison): {iris.target[:5]}")

print("Model Prediction Finished.")
```

Run the Prediction Script in a Container: Because `COPY . .` only captures files that existed at build time, we first need to rebuild the image so it includes the new script. Then run a new container, again mounting the `models` directory, but this time executing `predict_with_model.py`.

```shell
container build -t ml-workflow:v1.0 .
container run --mount type=bind,source="$(pwd)/models",target=/app/models ml-workflow:v1.0 python predict_with_model.py
```

Explanation:

- Notice the `python predict_with_model.py` at the end. This overrides the default `CMD` specified in our `Dockerfile` (`CMD ["python", "train_model.py"]`) and tells the container to run our new prediction script instead.
- The rebuild is quick: the dependency-installation layers are cached, so only the `COPY . .` layer is redone.

You should see output similar to this:

```text
Starting Model Prediction...
Model loaded from /app/models/iris_logistic_model.joblib

New data for prediction:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

Predictions: [0 0 0 0 0]
Actual labels (for comparison): [0 0 0 0 0]
Model Prediction Finished.
```

This confirms our model was successfully loaded and used for inference!
Mini-Challenge: Extend the Workflow!
You’ve done great so far! Now, let’s add another small step to our ML workflow.
Challenge:
Modify the `train_model.py` script to also save the model’s accuracy to a text file (e.g., `metrics.txt`) within the same `/app/models` directory. Then, rebuild your image and run the container to verify the `metrics.txt` file appears in your local `models` directory.
Hint:
- You’ll need to open a file in write mode (`'w'`) and write the accuracy string.
- Remember to use `os.path.join` to create the full path for `metrics.txt` within `/app/models`.
- Don’t forget to rebuild the image after changing `train_model.py`!
What to Observe/Learn: This challenge reinforces how to persist multiple output files from your container and the importance of rebuilding your image when your application code changes.
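If you get stuck, here is one possible shape for the change — a sketch only, reusing the `accuracy` and `model_dir` variables that already exist in `train_model.py` (the values below are stand-ins so the snippet runs on its own, with a local `models` path standing in for the container’s `/app/models`):

```python
import os

# Stand-ins for values that already exist in train_model.py:
accuracy = 0.9667          # in the real script this comes from model.score(...)
model_dir = "models"       # in the container this is "/app/models"
os.makedirs(model_dir, exist_ok=True)

# Write the metric next to the saved model so the volume mount persists both.
metrics_path = os.path.join(model_dir, "metrics.txt")
with open(metrics_path, "w") as f:
    f.write(f"accuracy: {accuracy:.4f}\n")

print(f"Metrics saved to {metrics_path}")
```

In your actual script, drop these lines in after the `model.score(...)` call and keep the existing `model_dir = "/app/models"`; after rebuilding and rerunning, `metrics.txt` should appear in your local `models` directory alongside the `.joblib` file.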
Common Pitfalls & Troubleshooting
“No such file or directory” for `train_model.py` or `requirements.txt`:

- Cause: This usually means the `COPY` commands in your `Dockerfile` aren’t finding the files.
- Solution: Ensure your `Dockerfile`, `train_model.py`, and `requirements.txt` are all in the same directory where you’re running `container build`. Also, double-check the file names for typos.
Python dependency errors (e.g., `ModuleNotFoundError`):

- Cause: The required Python packages were not installed correctly, or you forgot to add them to `requirements.txt`.
- Solution:
  - Verify `requirements.txt` contains all necessary packages with correct spelling.
  - Ensure `RUN pip install -r requirements.txt` is present and runs successfully in your `Dockerfile`.
  - Rebuild your image (`container build -t ml-workflow:v1.0 .`) after making changes to `requirements.txt` or the `Dockerfile`.
Model not saved to local `models` directory:

- Cause: The volume mount was incorrect, or the script saved the model to a different path inside the container.
- Solution:
  - Check the `source` path in your `--mount` argument (`source="$(pwd)/models"`). Make sure it correctly points to your local `ml_container_project/models` directory.
  - Check the `target` path (`target=/app/models`). This must exactly match the `model_dir` variable in your Python script (`model_dir = "/app/models"`).
  - Ensure your script’s `os.makedirs(model_dir, exist_ok=True)` call is present and correct.
`container` CLI version:

- Cause: You might be following instructions that assume a slightly different version of the `container` CLI.
- Solution: Always refer to the official Apple `container` GitHub repository’s Releases page for the latest stable version and its corresponding documentation. As of 2026-02-25, we’re assuming a stable `v1.2.0` or similar, but checking the official source is always best practice.
Summary
Fantastic work! You’ve successfully navigated the complexities of containerizing a machine learning workflow on your Mac using Apple’s native container tools.
Here are the key takeaways from this project:
- Reproducibility is Key: Containers provide an isolated, consistent environment, crucial for ML experiments and deployments.
- Dockerfile for ML: We learned how to write a `Dockerfile` specifically tailored for Python-based ML applications, including base image selection, dependency installation, and application code copying.
- Volume Mounts for Persistence: The `--mount` flag is essential for saving trained models, evaluation metrics, or processed data from the container to your host machine, ensuring outputs are not lost when the container stops.
- Overriding `CMD`: You saw how to execute different scripts within the same image by providing a command after the image name in `container run`.
- Real-World Application: This project demonstrates a practical, portable, and reproducible way to manage your ML development lifecycle on macOS.
In the next chapter, we’ll explore even more advanced topics, perhaps diving into multi-stage builds or integrating with other development tools. Keep up the great work!
References
- Apple Container GitHub Repository: The official source for the `container` tool, including releases and documentation.
- Apple Container Tutorial Documentation: A guided tour for common `container` operations.
- Docker Documentation on Dockerfile Best Practices: While specific to Docker, many `Dockerfile` principles apply directly to `container` since it is OCI-compliant.
- Python `joblib` documentation: For efficient saving and loading of Python objects, especially large NumPy arrays.
- Scikit-learn User Guide: Comprehensive resource for machine learning in Python.
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.