Welcome back, fellow MLOps explorer! In our previous chapters, you mastered the fundamentals of setting up Trackio, initializing runs, and logging basic scalar metrics like loss and accuracy. That’s a fantastic start, giving you a real-time pulse on your model’s training performance. But what happens when you need to track more than just numbers?

In the real world of machine learning, experiments generate much more than simple metrics. You’ll produce trained models, preprocessed datasets, stunning visualizations, and custom data tables. Just logging numbers isn’t enough to fully reproduce an experiment or understand its nuances. This chapter is your gateway to “advanced logging” with Trackio, where we’ll learn to treat these critical outputs as first-class citizens: artifacts.

By the end of this chapter, you’ll not only understand what artifacts are and why they’re crucial for robust ML workflows but also how to effectively log them using Trackio. We’ll cover logging trained models, datasets, images, and other custom data, ensuring your experiments are fully reproducible and easy to debug. Let’s elevate your experiment tracking game!

Core Concepts: Beyond Metrics – Understanding Artifacts

Before we jump into code, let’s solidify our understanding of what “artifacts” are in the context of machine learning and how Trackio helps us manage them.

What Exactly are ML Artifacts?

Think of an artifact as any file or collection of files that is either an input to your experiment or a significant output generated by it, beyond just scalar metrics. These are the tangible pieces of your ML project that are often critical for reproducibility, deployment, or further analysis.

Common examples of ML artifacts include:

  • Trained Models: The saved weights and architecture of your machine learning model (e.g., .pt, .h5, .pkl files).
  • Datasets: Preprocessed training, validation, or test datasets (e.g., .csv, .parquet, .json files).
  • Configuration Files: YAML or JSON files detailing hyperparameter settings, model architecture, or data preprocessing steps.
  • Visualizations: Plots, charts, and images generated during training or evaluation (e.g., .png, .jpg, .svg for loss curves, confusion matrices, ROC curves).
  • Evaluation Reports: Text files or custom data tables summarizing performance beyond simple metrics.

Why is Logging Artifacts So Important?

You might be thinking, “Can’t I just save these files to a folder?” And yes, you can. But simply saving them locally misses out on several key benefits that an experiment tracking system like Trackio provides:

  1. Reproducibility: To truly reproduce an experiment, you need not only the code and hyperparameters but also the exact data and model produced or consumed. Logging artifacts links them directly to a specific run, making it easy to retrieve them later.
  2. Version Control: Trackio, leveraging its integration with Hugging Face Datasets and Spaces, can help you implicitly manage versions of your artifacts. When you log an artifact, it’s associated with a specific experiment run, creating a historical record.
  3. Sharing & Collaboration: Easily share specific models or datasets with team members by pointing them to a Trackio run ID, rather than managing file paths or cloud storage links manually.
  4. Debugging & Auditing: If a model performs unexpectedly, having its exact weights, the data it was trained on, and all associated visualizations easily accessible through Trackio’s dashboard makes debugging much faster. It creates a clear audit trail.
  5. Deployment Readiness: A trained model logged as an artifact is a clear candidate for deployment. You can fetch the correct model version directly from your tracking system.

How Trackio Handles Artifacts

Trackio is designed to be lightweight and API-compatible with popular tracking libraries like Weights & Biases (WandB). This means its trackio.log() function is quite versatile. While it doesn’t have a dedicated log_artifact() function in the same way some heavier systems do, it effectively handles artifacts by:

  1. Logging File Paths: You save your artifact (model, data, image) to a local file, and then log the path to that file along with relevant metadata. Trackio’s backend then manages these files, potentially copying them to its local storage or preparing them for sync with Hugging Face Spaces.
  2. Specialized Data Types: Trackio can intelligently handle certain Python objects or data types when passed to trackio.log, which it then serializes or processes for display in the dashboard.

Let’s visualize this workflow:

flowchart TD
    A[Start Experiment] --> B{Train Model / Process Data / Visualize}
    B --> C[Save Model to Local File]
    B --> D[Save Data to Local File]
    B --> E[Generate Plot & Save to Local Image]
    C --> F[Log Model File Path via trackio.log]
    D --> G[Log Data File Path via trackio.log]
    E --> H[Log Image File Path via trackio.log]
    F & G & H --> I[Trackio Backend Processes Artifacts]
    I --> J[View & Manage in Trackio Dashboard]
    J --> K[End Experiment]

Notice how each artifact type follows a similar pattern: create it, save it locally, and then tell Trackio about its location (and maybe some descriptive metadata).
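Since this save-then-log pattern repeats for every artifact type, it can be worth wrapping in a tiny helper. Here is a minimal sketch; the helper name and metadata keys are my own convention, not part of Trackio's API:

```python
import os

def artifact_entry(key, path, **metadata):
    """Build the dict we pass to trackio.log() for a file artifact.

    Fails fast if the file was never actually written, which is the
    most common artifact-logging mistake.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"Artifact file not found: {path}")
    return {key: path, **metadata}

# Usage (assuming the model was already saved to this path):
# trackio.log(artifact_entry("model_artifact",
#                            "trackio_artifacts/logistic_regression_model.pkl",
#                            model_name="LogisticRegression"))
```

The existence check turns a silent bad path into an immediate, descriptive error at logging time.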

Step-by-Step Implementation: Logging Real-World Artifacts

Let’s get our hands dirty and log some actual artifacts. We’ll simulate a simple machine learning workflow.

Prerequisites

Make sure you have Trackio installed, along with scikit-learn for a simple model, matplotlib for plotting, and joblib for model serialization.

pip install trackio scikit-learn matplotlib joblib numpy

(If you need exact reproducibility, pin the versions that work for you in a requirements file.)

Step 1: Initialize Your Experiment

First, let’s set up a new Python file, say advanced_logging_example.py, and initialize a Trackio run.

# advanced_logging_example.py

import trackio
import os
import joblib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

# Ensure a directory for artifacts exists
ARTIFACTS_DIR = "trackio_artifacts"
os.makedirs(ARTIFACTS_DIR, exist_ok=True)

# 1. Initialize Trackio Run
# Remember to give your run a descriptive name!
run = trackio.init(project="Advanced_Artifact_Logging", name="logistic_regression_run_001")

print(f"Trackio run initialized: {run.name}")
# Trackio prints the local dashboard URL when init() runs; you can also
# launch the dashboard at any time from a terminal with: trackio show

# We'll add more code here!

Explanation:

  • We import necessary libraries. os helps with directory creation, joblib for saving models, matplotlib for plots, numpy for numerical operations, and scikit-learn for our ML task.
  • ARTIFACTS_DIR is a local folder where we’ll temporarily save our artifacts before logging them.
  • trackio.init() starts a new run. We give it a project name and a specific name for this run.
  • Trackio prints the local dashboard URL when the run is initialized; you can also open the dashboard at any time with the trackio show CLI command. Open that URL in your browser!

Run this initial script: python advanced_logging_example.py. You should see a message confirming the run initialization and a dashboard URL.

Step 2: Prepare Data and Train a Simple Model

Now, let’s generate some synthetic data and train a basic logistic regression model.

Add the following code to advanced_logging_example.py, right after the print statements:

# ... (previous code) ...

# 2. Prepare Data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Log hyperparameters (a good practice!)
hyperparameters = {
    "solver": "liblinear",
    "penalty": "l1",
    "C": 0.1,
    "random_state": 42
}
trackio.log(hyperparameters) # Log these as a dictionary

# 3. Train Model
model = LogisticRegression(**hyperparameters)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")
trackio.log({"test_accuracy": accuracy})

Explanation:

  • make_classification creates a synthetic dataset.
  • train_test_split divides our data.
  • We define hyperparameters as a dictionary and trackio.log() it. This is a great way to log configuration settings for easy retrieval later.
  • A LogisticRegression model is trained.
  • We calculate and trackio.log() the test_accuracy.

Run the script again. In your Trackio dashboard, you should now see the hyperparameters dictionary and the test_accuracy logged for this run!

Step 3: Logging a Trained Model as an Artifact

This is where advanced logging begins! We’ll save our trained LogisticRegression model to a file and then log its path.

Add this code to your script, after calculating accuracy:

# ... (previous code) ...

# 4. Log Trained Model as an Artifact
model_path = os.path.join(ARTIFACTS_DIR, "logistic_regression_model.pkl")
joblib.dump(model, model_path) # Save the model using joblib

print(f"Model saved locally to: {model_path}")

# Log the model artifact by providing its path and some metadata
trackio.log({
    "model_artifact": model_path,
    "model_name": "LogisticRegression",
    "model_version": "1.0",
    "model_framework": "scikit-learn"
})

Explanation:

  • os.path.join() creates a robust file path for our model within ARTIFACTS_DIR.
  • joblib.dump(model, model_path) serializes our scikit-learn model to a .pkl file. This is a common way to save Python objects.
  • trackio.log() is used again, but this time we’re logging a dictionary containing the model_artifact key with the path to our saved model. We also include useful metadata like model_name, model_version, and model_framework. This metadata is crucial for understanding the artifact later.

Run the script. Check your dashboard! You’ll see the model_artifact entry. Depending on Trackio’s backend, it might display as a link to a file or indicate that a file has been tracked.
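Before relying on the logged .pkl in a later run, it is worth a quick sanity check that the serialized model round-trips through joblib. This standalone sketch trains a throwaway model just for the check:

```python
import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Throwaway model, only to demonstrate the dump/load round-trip
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression(max_iter=200).fit(X, y)

joblib.dump(model, "roundtrip_check.pkl")
reloaded = joblib.load("roundtrip_check.pkl")

# The reloaded model must make identical predictions
assert np.array_equal(model.predict(X), reloaded.predict(X))
```

If this assertion ever fails (for example after a scikit-learn version change), you know the artifact, not your training code, is the problem.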

Step 4: Logging a Data Artifact

Next, let’s imagine our X_test dataset is a crucial piece of data we want to log alongside our model for reproducibility.

Add this code:

# ... (previous code) ...

# 5. Log Test Data as an Artifact (e.g., as a CSV)
test_data_path = os.path.join(ARTIFACTS_DIR, "test_data.csv")
np.savetxt(test_data_path, X_test, delimiter=",") # Save test data as CSV

print(f"Test data saved locally to: {test_data_path}")

trackio.log({
    "test_data_artifact": test_data_path,
    "data_description": "Features for model evaluation",
    "data_format": "CSV",
    "num_samples": X_test.shape[0],
    "num_features": X_test.shape[1]
})

Explanation:

  • np.savetxt() saves our X_test NumPy array into a CSV file. For pandas DataFrames, you would use df.to_csv().
  • Again, trackio.log() records the path (test_data_artifact) along with descriptive metadata.
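Note that np.savetxt writes raw values only, with no header. If you work with pandas, DataFrame.to_csv preserves column names, which makes the logged CSV far easier to interpret later. A small sketch (the feature names here are made up for illustration):

```python
import numpy as np
import pandas as pd

X_test = np.arange(12, dtype=float).reshape(4, 3)  # stand-in for real features
df = pd.DataFrame(X_test, columns=[f"feature_{i}" for i in range(X_test.shape[1])])
df.to_csv("test_data.csv", index=False)  # header row kept, no index column

# Reading it back restores both the values and the column names
restored = pd.read_csv("test_data.csv")
```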

Run the script and observe the new test_data_artifact entry in your Trackio dashboard.

Step 5: Logging a Visualization Artifact (Image)

Visualizations are incredibly helpful for understanding model behavior. Let’s log a confusion matrix plot.

Add this code:

# ... (previous code) ...

# 6. Log a Visualization (Confusion Matrix) as an Artifact
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Class 0", "Class 1"])

fig, ax = plt.subplots(figsize=(6, 6))
disp.plot(cmap=plt.cm.Blues, ax=ax)
ax.set_title("Confusion Matrix")

confusion_matrix_path = os.path.join(ARTIFACTS_DIR, "confusion_matrix.png")
plt.savefig(confusion_matrix_path) # Save the plot as a PNG image
plt.close(fig) # Close the plot to free memory

print(f"Confusion matrix plot saved locally to: {confusion_matrix_path}")

trackio.log({
    "confusion_matrix_plot": confusion_matrix_path,
    "plot_type": "Confusion Matrix",
    "evaluation_set": "Test"
})

Explanation:

  • We use sklearn.metrics.confusion_matrix and ConfusionMatrixDisplay to create a visual representation of our model’s performance.
  • plt.savefig() is crucial here: it saves the generated plot to a file (.png in this case).
  • plt.close(fig) is good practice to prevent plots from accumulating in memory, especially in long-running scripts.
  • Finally, we trackio.log() the path to our saved image, again with descriptive metadata.
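Two plt.savefig options are worth knowing when plots are destined for a dashboard: dpi for sharper rendering and bbox_inches="tight" to trim surrounding whitespace. A minimal headless sketch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: works in scripts, servers, and CI
import matplotlib.pyplot as plt
import os

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [1, 2, 4])
ax.set_title("Demo Plot")

fig.savefig("demo_plot.png", dpi=150, bbox_inches="tight")  # crisp, trimmed margins
plt.close(fig)  # free the figure's memory

assert os.path.exists("demo_plot.png")
```

Higher dpi values cost file size, so balance sharpness against the large-artifact concerns discussed later in this chapter.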

Run the script one last time. Your dashboard should now be rich with scalar metrics, hyperparameters, and three distinct artifacts: the trained model, the test data, and the confusion matrix plot!

Step 6: End the Trackio Run

It’s good practice to explicitly end your Trackio run, though it will often terminate automatically when your script finishes.

Add this to the very end of your script:

# ... (previous code) ...

# 7. End the Trackio Run
trackio.finish()
print("Trackio run ended.")

Now, your advanced_logging_example.py is complete! You’ve successfully logged various types of artifacts.

Mini-Challenge: Log a Feature Importance Plot

You’ve done a great job logging models, data, and a basic plot. Now, it’s your turn to apply what you’ve learned.

Challenge: Extend the advanced_logging_example.py script to:

  1. Calculate feature importances for your LogisticRegression model. While LogisticRegression doesn’t have a direct feature_importances_ attribute like tree-based models, its coef_ attribute can be interpreted as importance (absolute value).
  2. Create a bar plot showing these feature importances.
  3. Save this plot as an image file (e.g., feature_importance.png) in your trackio_artifacts directory.
  4. Log the path to this feature importance plot as an artifact with Trackio, including relevant metadata.

Hint:

  • Access coefficients using model.coef_[0] (for binary classification).
  • Use np.abs() to get absolute values for importance.
  • plt.bar() is useful for bar plots.
  • Remember plt.savefig() and trackio.log()!

What to Observe/Learn:

  • How to extract insights (like feature importance) from your model.
  • The complete workflow of generating a visualization, saving it, and logging it as an artifact.
  • The richness of information you can associate with a single experiment run in Trackio.

Take your time, try to solve it independently, and then check the solution if you get stuck!

Click for Mini-Challenge Solution
# ... (previous code in advanced_logging_example.py) ...

# Add this section after logging the confusion matrix plot

# 7. Mini-Challenge Solution: Log Feature Importance Plot
# For Logistic Regression, coefficients indicate feature importance
feature_importances = np.abs(model.coef_[0])
feature_names = [f"Feature {i}" for i in range(X.shape[1])]

# Create a bar plot
fig_fi, ax_fi = plt.subplots(figsize=(10, 6))
ax_fi.bar(feature_names, feature_importances)
ax_fi.set_xlabel("Feature")
ax_fi.set_ylabel("Absolute Coefficient Value (Importance)")
ax_fi.set_title("Feature Importances (Logistic Regression Coefficients)")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()

feature_importance_path = os.path.join(ARTIFACTS_DIR, "feature_importance.png")
plt.savefig(feature_importance_path)
plt.close(fig_fi)

print(f"Feature importance plot saved locally to: {feature_importance_path}")

trackio.log({
    "feature_importance_plot": feature_importance_path,
    "plot_type": "Feature Importance",
    "model_insight": "Coefficient-based importance"
})

# 8. End the Trackio Run (if not already added)
trackio.finish()
print("Trackio run ended.")

Common Pitfalls & Troubleshooting

Even with clear steps, logging artifacts can sometimes throw a curveball. Here are a few common issues and how to tackle them:

  1. “File Not Found” Errors when Logging Artifacts:

    • Pitfall: You tried to log a file path with Trackio, but the file either doesn’t exist at that location or the path is incorrect.
    • Troubleshooting:
      • Double-check the os.path.join() calls. Are your directories and filenames correct?
      • Ensure os.makedirs(ARTIFACTS_DIR, exist_ok=True) runs before you try to save any files.
      • Verify that plt.savefig() or joblib.dump() (or similar save functions) are successfully executed before trackio.log() is called for that artifact. Print the model_path, test_data_path, etc., to confirm they exist before logging.
  2. Large Artifacts Slowing Down Your Workflow:

    • Pitfall: Logging very large files (e.g., multi-GB datasets or high-resolution images) can consume significant local disk space and might slow down any potential syncing with remote services like Hugging Face Spaces.
    • Troubleshooting:
      • Be selective: Do you really need to log the entire raw dataset for every run? Perhaps log only preprocessed data, or a sample, and reference the raw data’s location in cloud storage.
      • Optimize file formats: Use efficient formats like Parquet for tabular data, or compressed image formats (JPEG) when high fidelity isn’t strictly necessary.
      • Consider versioning systems: For extremely large datasets, dedicated data versioning tools (like DVC) might be combined with Trackio, where Trackio logs the DVC pointer rather than the entire file.
  3. Missing or Unclear Artifacts in the Dashboard:

    • Pitfall: You logged an artifact, but it doesn’t appear as expected in the Trackio dashboard, or its description is unhelpful.
    • Troubleshooting:
      • Check trackio.log() arguments: Ensure you’re passing a dictionary where the key is descriptive (e.g., "model_artifact") and the value is the correct file path.
      • Include sufficient metadata: Always add extra keys to your logged dictionary (like model_name, data_description, plot_type) to make the artifact immediately understandable in the dashboard. Don’t just log the path by itself!
      • Refresh the dashboard: Sometimes a simple refresh is all it takes for new logs to appear.
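The "be selective" advice from pitfall 2 can be as simple as logging a reproducible subsample plus a pointer to where the full dataset lives. The helper below is my own sketch, and the storage URI in the usage comment is hypothetical:

```python
import numpy as np

def sample_for_logging(X, n=100, seed=42):
    """Return a reproducible random subsample of X for artifact logging."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n, len(X)), replace=False)
    return X[idx]

# Usage sketch:
# sample = sample_for_logging(big_array)
# np.savetxt("trackio_artifacts/data_sample.csv", sample, delimiter=",")
# trackio.log({
#     "data_sample_artifact": "trackio_artifacts/data_sample.csv",
#     "full_data_location": "s3://my-bucket/raw/train.parquet",  # hypothetical URI
# })
```

Fixing the seed means a collaborator re-running the script gets the exact same sample, so the logged artifact stays meaningful across runs.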

Summary

Congratulations! You’ve successfully ventured into the world of advanced logging with Trackio. Let’s quickly recap what you’ve learned:

  • Artifacts are key: Beyond scalar metrics, artifacts like trained models, datasets, and visualizations are crucial for robust ML experiment tracking.
  • Why log artifacts: They ensure reproducibility, provide version control, facilitate sharing, aid in debugging, and prepare models for deployment.
  • Trackio’s approach: Trackio effectively logs artifacts by associating local file paths (and their content) with your experiment runs, leveraging trackio.log() with descriptive metadata.
  • Practical application: You’ve learned to save and log a trained scikit-learn model, a preprocessed dataset, and a matplotlib visualization as distinct artifacts.

By consistently logging these artifacts, you transform your Trackio dashboard from a simple metric tracker into a comprehensive repository for each experiment, making your machine learning workflow more organized, transparent, and reproducible.

What’s next? In the upcoming chapters, we’ll explore how to leverage Trackio’s dashboard for deeper analysis, delve into command-line tools for managing runs, and discover how to seamlessly sync your local experiments with Hugging Face Spaces for collaborative sharing and deployment.

