Introduction to Classical Machine Learning

Welcome back, future AI/ML expert! In the previous chapters, we laid the groundwork with essential programming skills in Python and familiarized ourselves with crucial data manipulation libraries like NumPy and Pandas. If you haven’t mastered those yet, take a moment to review, as they’re the bedrock of everything we’re about to build.

In this chapter, we’re taking our first exciting leap into the core of Artificial Intelligence: Classical Machine Learning. This field is where algorithms learn patterns from data to make predictions or decisions without being explicitly programmed for every single scenario. You’ll discover how these fundamental algorithms work, why they are still incredibly relevant in 2026, and gain hands-on experience implementing them using scikit-learn, Python’s most popular library for traditional machine learning.

By the end of this chapter, you’ll be able to:

  • Understand the difference between regression and classification tasks.
  • Implement and train basic machine learning models like Linear Regression, Logistic Regression, and Decision Trees.
  • Perform essential data preprocessing steps.
  • Evaluate your models using key metrics.
  • Feel confident in building your first end-to-end ML solution!

Ready to make your computer learn? Let’s dive in!

Core Concepts of Classical Machine Learning

Machine Learning is a vast field, but at its heart, it’s about enabling systems to learn from data. Classical Machine Learning refers to the algorithms developed before the deep learning revolution, which often work well on structured, tabular data and typically require less computational power than their deep learning counterparts.

What is Machine Learning?

At a high level, Machine Learning involves teaching computers to recognize patterns and make decisions from data. Instead of writing explicit rules for every possible input, we feed the machine data and let it “learn” the rules itself.

There are three main types of machine learning:

  1. Supervised Learning: This is what we’ll focus on today. It involves learning from “labeled” data, meaning each piece of input data comes with the correct output or “label.” Think of it like a student learning with flashcards where each card has a question and its answer.

    • Regression: Predicting a continuous numerical value (e.g., predicting house prices, temperature, stock prices).
    • Classification: Predicting a categorical label or class (e.g., identifying if an email is spam or not, classifying an image as a cat or dog, predicting customer churn).
  2. Unsupervised Learning: This deals with “unlabeled” data. The algorithm tries to find hidden patterns or structures within the data on its own (e.g., clustering customers into different segments, dimensionality reduction).

  3. Reinforcement Learning: Here, an agent learns by interacting with an environment, receiving rewards for desired actions and penalties for undesirable ones (e.g., training an AI to play a game, controlling a robot).

For our introduction, we’ll concentrate on Supervised Learning as it’s the most common starting point and provides immediate, tangible results.

The Machine Learning Workflow

Regardless of the specific algorithm, most supervised machine learning projects follow a similar workflow:

  1. Data Collection & Understanding: Gather relevant data and explore its characteristics (features, target, types, distributions).
  2. Data Preprocessing: Clean and transform the data to make it suitable for machine learning algorithms. This is often the most time-consuming step!
  3. Model Selection: Choose an appropriate machine learning algorithm for your task (regression or classification).
  4. Model Training: Feed the preprocessed training data to the algorithm, allowing it to learn patterns.
  5. Model Evaluation: Assess how well the trained model performs on unseen data using specific metrics.
  6. Prediction/Deployment: Use the trained model to make predictions on new, real-world data.

Introducing scikit-learn (Version 1.5.0+)

For classical machine learning in Python, the scikit-learn library is the undisputed champion. It provides a consistent interface for a vast array of algorithms, data preprocessing tools, and evaluation metrics. As of early 2026, scikit-learn continues to evolve, with stable releases in the 1.5.x range offering robust features and performance improvements.

You can find its official documentation at scikit-learn.org.

Algorithm Spotlight: Linear Regression

What it is: Linear Regression is one of the simplest and most fundamental algorithms for regression tasks. It assumes a linear relationship between the input features (independent variables) and the output target (dependent variable).

How it works: The goal is to find the “best-fitting” straight line (or hyperplane in higher dimensions) that minimizes the distance between the line and all the data points.

Imagine you’re trying to predict a student’s exam score based on the hours they studied. Linear regression would try to draw a line that best represents this relationship. The equation for a simple linear regression with one feature is:

y = mx + b

Where:

  • y is the predicted output (e.g., exam score).
  • x is the input feature (e.g., hours studied).
  • m is the slope of the line (how much y changes for a unit change in x).
  • b is the y-intercept (the predicted y when x is 0).

For multiple features, it extends to: y = b0 + b1x1 + b2x2 + ... + bnxn.
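A quick NumPy sketch of the multi-feature equation, with made-up coefficient values chosen purely for illustration:

```python
import numpy as np

# Hypothetical learned parameters for a model with 3 features
b0 = 2.0                        # intercept
b = np.array([0.5, -1.2, 3.0])  # coefficients b1, b2, b3

# One sample with 3 feature values
x = np.array([4.0, 1.0, 2.0])

# y = b0 + b1*x1 + b2*x2 + b3*x3
y = b0 + np.dot(b, x)
print(y)  # 2.0 + 2.0 - 1.2 + 6.0 = 8.8
```

Training a linear regression model amounts to finding the b0 and b values that make predictions like this one fit the data as closely as possible.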

Why it matters: It’s highly interpretable, provides a baseline for more complex models, and forms the basis for many statistical concepts.

Algorithm Spotlight: Logistic Regression

What it is: Despite its name, Logistic Regression is a powerful and widely used algorithm for classification tasks, especially binary classification (two classes).

How it works: Instead of predicting a continuous value directly, Logistic Regression predicts the probability that an input belongs to a certain class. It achieves this by passing the output of a linear equation through a special function called the sigmoid function. The sigmoid function squashes any real-valued input into a value between 0 and 1, which can be interpreted as a probability.

If the probability is above a certain threshold (commonly 0.5), the model classifies it into one class; otherwise, it classifies it into the other.
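A minimal sketch of the sigmoid and the threshold step; the z values here are made-up outputs of a linear equation, not learned from data:

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear-equation outputs for three samples
z = np.array([-2.0, 0.0, 3.0])

probs = sigmoid(z)
print(probs.round(3))  # [0.119 0.5   0.953]

# Apply the common 0.5 threshold to get class labels
preds = (probs >= 0.5).astype(int)
print(preds)  # [0 1 1]
```

Note how z = 0 maps to exactly 0.5, which is why the decision boundary of Logistic Regression sits where the underlying linear equation crosses zero.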

Why it matters: It’s efficient, interpretable (you can understand the impact of each feature on the probability), and a fundamental building block for understanding more advanced classification techniques.

Algorithm Spotlight: Decision Trees

What it is: Decision Trees are versatile algorithms that can be used for both classification and regression tasks. They model decisions as a tree-like structure.

How it works: A Decision Tree recursively splits the dataset into smaller and smaller subsets based on the values of the input features. Each internal node in the tree represents a “decision” or a test on a feature (e.g., “Is ‘petal length’ > 2.45?”). Each branch represents the outcome of that decision, and each leaf node represents the final predicted outcome (a class label for classification, or a numerical value for regression).

Think of it like a series of if-else statements.

flowchart TD
    A[Root: All Data] -->|Feature X <= Threshold| B{Node 1: Subset 1}
    A -->|Feature X > Threshold| C{Node 2: Subset 2}
    B -->|Feature Y <= Threshold| D[Leaf 1: Class A]
    B -->|Feature Y > Threshold| E[Leaf 2: Class B]
    C -->|Feature Z <= Threshold| F[Leaf 3: Class C]
    C -->|Feature Z > Threshold| G[Leaf 4: Class D]

Why it matters: They are intuitive to understand and visualize, can handle both numerical and categorical data, and require minimal data preprocessing. However, they can be prone to overfitting if not properly controlled.
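To make the if-else analogy above concrete, here is a hand-written sketch in the spirit of the Iris example. The thresholds are hypothetical; a real Decision Tree learns them from data:

```python
def predict_species(petal_length, petal_width):
    """A tiny hand-coded 'decision tree' with made-up thresholds."""
    if petal_length <= 2.45:        # root node test
        return "setosa"             # leaf
    elif petal_width <= 1.75:       # second split
        return "versicolor"         # leaf
    else:
        return "virginica"          # leaf

print(predict_species(1.4, 0.2))  # setosa
print(predict_species(4.5, 1.3))  # versicolor
print(predict_species(5.5, 2.0))  # virginica
```

Training a Decision Tree is essentially the process of discovering which features to test, and at which thresholds, to build a function like this automatically.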

Step-by-Step Implementation: Building Your First Models

Let’s get our hands dirty! We’ll use scikit-learn to build and train a Linear Regression model for a simple synthetic dataset and then Logistic Regression and Decision Tree models for the famous Iris dataset.

Setup Your Environment

First, ensure you have Python 3.10+ installed. Then, create a virtual environment and install the necessary libraries.

# Create a virtual environment
python -m venv ml_env

# Activate the virtual environment
# On Windows:
# ml_env\Scripts\activate
# On macOS/Linux:
# source ml_env/bin/activate

# Install libraries (as of Jan 2026, these versions are stable or commonly used)
pip install scikit-learn==1.5.0 pandas==2.1.4 numpy==1.26.2 matplotlib==3.8.2 seaborn==0.13.0

Note: Pinned versions like these go stale quickly. If installation fails or you see compatibility warnings, check the official scikit-learn documentation for the current stable release and adjust the pins accordingly.

Create a new Python file, say classical_ml_project.py, to write your code.

Part 1: Linear Regression - Predicting a Trend

We’ll start with a simple linear regression to predict a continuous value.

Step 1: Import Libraries and Generate Data

We’ll use numpy to create a simple dataset where y is linearly dependent on x with some random noise.

# classical_ml_project.py

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

print("--- Linear Regression Example ---")

# 1. Generate some synthetic data
np.random.seed(42) # For reproducibility
X = 2 * np.random.rand(100, 1) # 100 samples, 1 feature (values between 0 and 2)
y = 4 + 3 * X + np.random.randn(100, 1) # y = 4 + 3x + noise

# Let's see what our data looks like
plt.figure(figsize=(8, 6))
plt.scatter(X, y, color='blue', label='Original Data')
plt.title('Synthetic Data for Linear Regression')
plt.xlabel('X (Feature)')
plt.ylabel('y (Target)')
plt.legend()
plt.grid(True)
plt.show()

Explanation:

  • We import numpy for numerical operations and matplotlib.pyplot for plotting.
  • np.random.seed(42) ensures that our random data is the same every time we run the script, which is great for debugging and reproducibility.
  • X is our feature (independent variable), a column vector of 100 random numbers.
  • y is our target (dependent variable), generated from X with a linear relationship (4 + 3 * X) plus some random noise (np.random.randn(100, 1)) to make it more realistic.
  • The plt.scatter command creates a scatter plot of our data, giving us a visual understanding.

Step 2: Split Data into Training and Testing Sets

It’s crucial to evaluate our model on data it has never seen before to ensure it generalizes well. We split our data into a training set (to teach the model) and a testing set (to evaluate it).

# ... (previous code) ...

# 2. Split the data into training and testing sets
# We use 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training data shape: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"Testing data shape: X_test={X_test.shape}, y_test={y_test.shape}")

Explanation:

  • train_test_split from sklearn.model_selection is a powerful function for this.
  • test_size=0.2 means 20% of the data will be used for testing.
  • random_state=42 again ensures reproducibility of the split.

Step 3: Train the Linear Regression Model

Now, let’s create and train our model!

# ... (previous code) ...

# 3. Create and train the Linear Regression model
model = LinearRegression() # Initialize the model
model.fit(X_train, y_train) # Train the model using the training data

print("\nModel Training Complete!")
print(f"Intercept (b): {model.intercept_[0]:.2f}")
print(f"Coefficient (m): {model.coef_[0][0]:.2f}")

Explanation:

  • LinearRegression() creates an instance of our model.
  • model.fit(X_train, y_train) is where the magic happens! The model learns the optimal m and b values from our training data.
  • model.intercept_ gives us the b value, and model.coef_ gives us the m value(s). Notice how close they are to the true values (4 and 3) we used to generate the data!

Step 4: Make Predictions and Evaluate the Model

After training, we use the model to predict on our unseen test data and then evaluate its performance.

# ... (previous code) ...

# 4. Make predictions on the test set
y_pred = model.predict(X_test)

# 5. Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\nMean Squared Error (MSE) on test set: {mse:.2f}")
print(f"R-squared (R2) score on test set: {r2:.2f}")

# Visualize the predictions
plt.figure(figsize=(8, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual Test Data')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Linear Regression Prediction')
plt.title('Linear Regression: Actual vs. Predicted')
plt.xlabel('X (Feature)')
plt.ylabel('y (Target)')
plt.legend()
plt.grid(True)
plt.show()

Explanation:

  • model.predict(X_test) generates predictions for the input features in our test set.
  • mean_squared_error (MSE) measures the average squared difference between the actual and predicted values. Lower is better.
  • r2_score (R-squared) indicates how well the model explains the variance in the target variable. A value closer to 1 means a better fit.
  • The final plot shows the actual test data points and the line our model learned, demonstrating its fit.
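Both metrics are simple enough to verify by hand. A small sketch of the underlying formulas, using made-up actual and predicted values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# MSE: average of squared errors
mse = np.mean((y_true - y_pred) ** 2)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot

print(round(mse, 3))  # 0.417
print(round(r2, 3))   # 0.844
```

scikit-learn's mean_squared_error and r2_score compute exactly these quantities, so you can use a sketch like this to sanity-check results.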

Part 2: Logistic Regression - Classifying Iris Species

Now, let’s switch to a classification task using the famous Iris dataset, which is conveniently included with scikit-learn. The goal is to classify iris flowers into one of three species based on their sepal and petal measurements.

Step 1: Import Libraries and Load Data

# ... (previous linear regression code) ...

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler # Important for many ML models

print("\n--- Logistic Regression Example (Iris Dataset) ---")

# 1. Load the Iris dataset
iris = load_iris()
X_iris = iris.data # Features
y_iris = iris.target # Target (species labels)

# It's good practice to understand the data
# Let's put it into a Pandas DataFrame for better viewing
iris_df = pd.DataFrame(data=X_iris, columns=iris.feature_names)
iris_df['species'] = iris.target_names[y_iris]

print("\nFirst 5 rows of Iris dataset:")
print(iris_df.head())
print("\nIris target names (species):", iris.target_names)
print("Number of samples for each species:", np.bincount(y_iris))

Explanation:

  • We import load_iris to get the dataset, LogisticRegression for our model, and accuracy_score, classification_report, confusion_matrix for evaluation.
  • StandardScaler is also imported, which we’ll discuss next.
  • We load X_iris (features like sepal length, petal width) and y_iris (the species labels: 0, 1, or 2).
  • Converting to a Pandas DataFrame helps visualize the raw data and column names. iris.target_names maps the numerical labels (0, 1, 2) to actual species names (setosa, versicolor, virginica).

Step 2: Split Data and Preprocess (Feature Scaling)

For many machine learning algorithms, especially those that rely on distance calculations (like SVMs and K-Nearest Neighbors) or on gradient-based optimization (like Logistic Regression), it's crucial to scale the features. Feature scaling ensures that no single feature dominates the learning process just because it has larger numerical values.

StandardScaler transforms features to have a mean of 0 and a standard deviation of 1.
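That transformation is just z = (x - mean) / std. A quick NumPy sanity check on a made-up feature column:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])  # an unscaled feature column

# The same formula StandardScaler applies per feature
z = (x - x.mean()) / x.std()

print(round(z.mean(), 10))  # 0.0
print(round(z.std(), 10))   # 1.0
```

After scaling, every feature contributes on the same footing regardless of its original units or magnitude.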

# ... (previous code) ...

# 2. Split the data into training and testing sets
X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris # stratify ensures even distribution of classes
)

# 3. Feature Scaling
# Initialize the scaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing data
# IMPORTANT: Fit ONLY on training data to prevent data leakage from the test set
X_train_iris_scaled = scaler.fit_transform(X_train_iris)
X_test_iris_scaled = scaler.transform(X_test_iris)

print("\nData splitting and scaling complete.")
print(f"Scaled X_train_iris_scaled (first 2 samples):\n{X_train_iris_scaled[:2]}")

Explanation:

  • We split the Iris data, using stratify=y_iris to ensure that the proportion of each species is roughly the same in both the training and testing sets. This is important for classification tasks with imbalanced classes.
  • StandardScaler() is initialized.
  • scaler.fit_transform(X_train_iris) calculates the mean and standard deviation from the training data and then applies the transformation.
  • scaler.transform(X_test_iris) applies the same transformation (using the mean/std learned from the training data) to the test set. Never fit_transform on the test set! This prevents “data leakage” where information from the test set subtly influences the training process.

Step 3: Train the Logistic Regression Model

# ... (previous code) ...

# 4. Create and train the Logistic Regression model
# solver='liblinear' works well for small binary problems; 'lbfgs' (the default) handles multiclass directly
# max_iter raises the iteration limit so the solver can converge without warnings
log_reg_model = LogisticRegression(solver='lbfgs', max_iter=200, random_state=42)
log_reg_model.fit(X_train_iris_scaled, y_train_iris)

print("\nLogistic Regression Model Training Complete!")

Explanation:

  • We instantiate LogisticRegression.
  • solver='lbfgs' is the default solver in recent scikit-learn releases and is generally recommended for multiclass problems.
  • max_iter=200 increases the maximum number of iterations the solver runs to converge, which can be necessary to prevent warnings about non-convergence.
  • log_reg_model.fit() trains the model on our scaled training data.

Step 4: Make Predictions and Evaluate the Model

# ... (previous code) ...

# 5. Make predictions on the scaled test set
y_pred_iris = log_reg_model.predict(X_test_iris_scaled)

# 6. Evaluate the model
accuracy = accuracy_score(y_test_iris, y_pred_iris)
conf_matrix = confusion_matrix(y_test_iris, y_pred_iris)
class_report = classification_report(y_test_iris, y_pred_iris, target_names=iris.target_names)

print(f"\nAccuracy on Iris test set: {accuracy:.2f}")
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

Explanation:

  • log_reg_model.predict() makes predictions on the scaled test features.
  • accuracy_score simply tells us the proportion of correctly classified samples.
  • confusion_matrix is a table showing correct vs. incorrect predictions for each class. It’s incredibly useful for understanding where your model is making mistakes.
    • Rows represent actual classes, columns represent predicted classes.
    • Diagonal elements are correct predictions.
  • classification_report provides precision, recall, and F1-score for each class, giving a more nuanced view of performance than just accuracy.
    • Precision: Out of all samples predicted as a certain class, how many were actually that class? (Minimizes false positives)
    • Recall: Out of all samples that actually belong to a certain class, how many did the model correctly identify? (Minimizes false negatives)
    • F1-Score: The harmonic mean of precision and recall, balancing both.
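To make these definitions concrete, here is a small sketch computing them for one class from hypothetical confusion-matrix counts:

```python
# Hypothetical binary confusion-matrix counts for one class:
# TP = true positives, FN = false negatives,
# FP = false positives, TN = true negatives
tp, fn, fp, tn = 8, 2, 1, 9

precision = tp / (tp + fp)  # of all predicted positives, how many were right
recall = tp / (tp + fn)     # of all actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(precision, 3))  # 0.889
print(round(recall, 3))     # 0.8
print(round(f1, 3))         # 0.842
```

classification_report performs this computation per class (plus averages), so reading its output is just reading these three ratios for each species.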

Part 3: Decision Tree - Classifying Iris Species (Alternative)

Let’s try a Decision Tree on the same Iris dataset to see how it compares. Decision Trees often perform well without extensive scaling.

Step 1: Import and Train Decision Tree

# ... (previous code) ...

from sklearn.tree import DecisionTreeClassifier, plot_tree

print("\n--- Decision Tree Example (Iris Dataset) ---")

# 1. Create and train the Decision Tree Classifier model
# We'll use the unscaled data for simplicity, as Decision Trees are less sensitive to feature scaling
# max_depth controls the maximum depth of the tree, preventing overfitting
dt_model = DecisionTreeClassifier(max_depth=3, random_state=42)
dt_model.fit(X_train_iris, y_train_iris) # Using unscaled training data

print("\nDecision Tree Model Training Complete!")

Explanation:

  • We import DecisionTreeClassifier.
  • Notice we’re using the unscaled X_train_iris data. Decision Trees are rule-based and don’t care about the magnitude of features, only their relative order.
  • max_depth=3 is a crucial hyperparameter. It limits how deep the tree can grow, which is a primary way to prevent overfitting (where the model learns the training data too well, including its noise, and performs poorly on new data).

Step 2: Make Predictions and Evaluate the Model

# ... (previous code) ...

# 2. Make predictions on the unscaled test set
y_pred_dt = dt_model.predict(X_test_iris) # Using unscaled test data

# 3. Evaluate the model
accuracy_dt = accuracy_score(y_test_iris, y_pred_dt)
conf_matrix_dt = confusion_matrix(y_test_iris, y_pred_dt)
class_report_dt = classification_report(y_test_iris, y_pred_dt, target_names=iris.target_names)

print(f"\nAccuracy on Iris test set (Decision Tree): {accuracy_dt:.2f}")
print("\nConfusion Matrix (Decision Tree):\n", conf_matrix_dt)
print("\nClassification Report (Decision Tree):\n", class_report_dt)

# Optional: Visualize the Decision Tree (plot_tree is built into scikit-learn and needs only matplotlib)
plt.figure(figsize=(15, 10))
plot_tree(dt_model, filled=True, feature_names=iris.feature_names, class_names=iris.target_names, rounded=True)
plt.title('Decision Tree Visualization (Max Depth 3)')
plt.show()

Explanation:

  • Similar to Logistic Regression, we predict on the test set and evaluate using accuracy, confusion matrix, and classification report.
  • The plot_tree function is a fantastic built-in tool in scikit-learn to visualize the decision-making process of the tree. This is a huge advantage of Decision Trees – their interpretability!

Mini-Challenge: Tune Your Decision Tree!

You’ve built and evaluated three models! Now it’s your turn to experiment.

Challenge: Modify the DecisionTreeClassifier in Part 3. Instead of max_depth=3, try max_depth=None (which means no limit on depth) or experiment with different integer values (e.g., 2, 5, 10). Observe how the accuracy and the shape of the visualized tree change.

Hint: Locate this line in your code: dt_model = DecisionTreeClassifier(max_depth=3, random_state=42) Change max_depth to None or another integer.

What to Observe/Learn:

  • How does max_depth=None affect accuracy on the training set versus the test set? (You’ll need to add print(dt_model.score(X_train_iris, y_train_iris)) to see training accuracy.)
  • Can you find a max_depth that gives a good balance between training and testing accuracy?
  • How does the tree visualization become more complex as max_depth increases? This is a direct visual representation of potential overfitting.

Take your time, try different values, and analyze the results. This is how real-world ML engineers learn to tune their models!
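One way to run this experiment systematically is a loop over candidate depths. This sketch repeats the earlier Iris split rather than reusing those variables, so it stands alone:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Compare training vs. testing accuracy at several depths;
# a widening gap between the two signals overfitting.
for depth in [2, 3, 5, 10, None]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_tr, y_tr)
    print(f"max_depth={depth}: "
          f"train={model.score(X_tr, y_tr):.3f}, "
          f"test={model.score(X_te, y_te):.3f}")
```

With max_depth=None the tree typically reaches perfect training accuracy while test accuracy plateaus or dips, which is the overfitting pattern described above.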

Common Pitfalls & Troubleshooting

As you embark on your ML journey, you’ll inevitably encounter challenges. Here are a few common pitfalls to be aware of:

  1. Overfitting vs. Underfitting:

    • Overfitting: Your model learns the training data too well, including its noise and idiosyncrasies. It performs brilliantly on training data but poorly on unseen test data. Think of it as memorizing answers for a specific test but failing to understand the subject.
    • Underfitting: Your model is too simple to capture the underlying patterns in the data. It performs poorly on both training and test data. Think of it as not studying enough for the test.
    • Troubleshooting: Compare training accuracy/error with test accuracy/error. If training accuracy is much higher, you might be overfitting. If both are low, you might be underfitting. For overfitting, try simplifying the model (e.g., max_depth for Decision Trees), getting more data, or using regularization. For underfitting, try a more complex model or adding more relevant features.
  2. Forgetting Feature Scaling:

    • Some algorithms (like Logistic Regression, SVMs, K-Nearest Neighbors) are highly sensitive to the scale of input features. If one feature has values from 0-1 and another from 0-10,000, the latter might disproportionately influence the model.
    • Troubleshooting: Always consider scaling your numerical features, especially for algorithms that rely on distance calculations. As shown with StandardScaler, fit on training data and transform both train/test sets.
  3. Data Leakage:

    • This occurs when information from the test set “leaks” into the training process, leading to overly optimistic evaluation metrics. A common example is scaling the entire dataset before splitting, or performing feature engineering steps using the entire dataset.
    • Troubleshooting: Ensure all data preprocessing steps (scaling, imputation, feature engineering) are applied after the train-test split, and that any parameters for these steps (like mean/std for scaling) are learned only from the training data.
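One practical safeguard is scikit-learn's Pipeline, which bundles preprocessing with the model so the scaler is only ever fitted on training data. A minimal sketch on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# fit() fits the scaler on training data only; predict()/score() on test
# data automatically reuse the training statistics, preventing leakage.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipe.fit(X_tr, y_tr)
print(f"Test accuracy: {pipe.score(X_te, y_te):.3f}")
```

Pipelines also pair naturally with cross-validation (covered in the next chapter), where refitting the preprocessing inside each fold by hand would be tedious and error-prone.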

Summary

Phew! You’ve just completed a significant milestone in your AI/ML journey. You’ve been introduced to the core concepts of classical machine learning and built your first predictive models!

Here are the key takeaways from this chapter:

  • Classical ML algorithms are foundational and highly effective for many real-world problems, especially with structured data.
  • Supervised Learning involves training models on labeled data for tasks like Regression (predicting continuous values) and Classification (predicting categorical labels).
  • The Machine Learning Workflow involves data preprocessing, model selection, training, and rigorous evaluation.
  • scikit-learn is your primary tool for implementing classical ML algorithms in Python, offering a consistent and user-friendly API.
  • Linear Regression is a simple yet powerful algorithm for regression tasks, finding the best-fit line.
  • Logistic Regression, despite its name, is a fundamental algorithm for classification, outputting probabilities via the sigmoid function.
  • Decision Trees are interpretable, rule-based models for both classification and regression, but require careful handling of max_depth to prevent overfitting.
  • Feature Scaling is crucial for many algorithms to ensure fair feature contribution.
  • Always evaluate your models on unseen test data to gauge their true performance and watch out for pitfalls like overfitting and data leakage.

You’ve built a solid foundation. In the next chapter, we’ll delve deeper into more advanced classical ML techniques, discuss model selection strategies, and explore powerful concepts like cross-validation and hyperparameter tuning to make your models even better!
