Welcome to Chapter 11! Up to this point, our static site generator (SSG) has been meticulously processing content, parsing frontmatter, converting Markdown to HTML, and rendering templates in a sequential fashion. While this approach is perfectly fine for smaller sites, as the number of content pages grows, the build time can become a significant bottleneck, impacting developer productivity and feedback cycles.
In this chapter, we will tackle this performance challenge head-on by introducing parallel processing into our SSG’s build pipeline. Rust’s excellent concurrency story, particularly with libraries like rayon, makes it straightforward to distribute computationally intensive tasks across multiple CPU cores. By the end of this chapter, our SSG will be capable of leveraging the full power of modern multi-core processors, drastically reducing build times for large projects, while maintaining the correctness and reliability of our generated output.
Planning & Design
The core idea behind optimizing our SSG’s build process is to identify independent tasks that can be executed concurrently. In an SSG, the processing of individual content pages (reading, parsing, rendering) is largely independent of other pages. This makes it an ideal candidate for data parallelism. We’ll focus on parallelizing the most time-consuming steps: content parsing, HTML transformation, template rendering, and file writing.
For this, we’ll integrate the rayon crate, a data-parallelism library for Rust. rayon provides an easy-to-use API for converting sequential iterators into parallel ones, allowing us to process collections of items across multiple threads with minimal code changes.
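Before wiring in rayon, it helps to see what data parallelism looks like with nothing but the standard library. The sketch below (with a made-up `square_all_parallel` function) splits a workload across two scoped threads and merges the results — precisely the splitting, scheduling, and merging that rayon’s `par_iter()` automates across all available cores:

```rust
use std::thread;

// Map a function over a slice by splitting it in half and processing each
// half on its own thread. rayon generalizes this to N cores with work stealing.
fn square_all_parallel(data: Vec<u64>) -> Vec<u64> {
    let mid = data.len() / 2;
    let (left, right) = data.split_at(mid);
    // Scoped threads may borrow `left` and `right` from the enclosing stack frame.
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().map(|n| n * n).collect::<Vec<u64>>());
        let r = s.spawn(|| right.iter().map(|n| n * n).collect::<Vec<u64>>());
        let mut out = l.join().unwrap();
        out.extend(r.join().unwrap());
        out
    })
}

fn main() {
    println!("{:?}", square_all_parallel(vec![1, 2, 3, 4]));
}
```

With rayon, the entire body of `square_all_parallel` collapses to `data.par_iter().map(|n| n * n).collect()` — no manual splitting or thread management.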
Architecture Diagram: Parallel Build Pipeline
The following diagram illustrates how the build pipeline will be enhanced with parallel processing at key stages:
As you can see, the sequential “Load Content Paths” step identifies all files, but then the subsequent CPU-intensive operations for each content item (parsing, transforming, rendering, writing) are executed in parallel. This significantly reduces the overall build time.
File Structure
We will primarily modify the existing src/build/builder.rs file, where our main build logic resides. No new top-level modules are expected, but internal functions might be adjusted to accept parallel iterators or be called within a parallel context.
Step-by-Step Implementation
Let’s begin by integrating rayon and refactoring our build logic for parallelism.
a) Setup/Configuration
First, we need to add rayon as a dependency to our Cargo.toml.
File: Cargo.toml
[package]
name = "my_ssg"
version = "0.1.0"
edition = "2021"
[dependencies]
# ... existing dependencies ...
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"
tera = "1.19"
pulldown-cmark = "0.10"
anyhow = "1.0"
log = "0.4"
env_logger = "0.11"
# Add rayon for parallel processing
rayon = "1.8" # Use the latest stable version
After adding the dependency, run cargo build to ensure rayon is downloaded and compiled.
b) Core Implementation
We will now modify our Builder struct’s build method. We’ll assume you have a ContentProcessor or similar module that handles the parsing and rendering for individual pages. The goal is to make the iteration over content items parallel.
Let’s assume our Builder struct has a method, say process_content_files, that takes a list of PathBuf for content files and processes them. We’ll also assume we have a Content struct that represents a fully parsed and rendered page.
File: src/build/builder.rs
use std::path::{Path, PathBuf};
use std::fs;
use std::sync::{Arc, Mutex};
use anyhow::{Result, Context};
use log::{info, error, debug, warn};
use rayon::prelude::*; // Import rayon's parallel iterator traits
use crate::config::Config;
use crate::content::{Content, ContentError}; // Assuming Content and ContentError exist
use crate::template::{TemplateEngine, TemplateError}; // Assuming TemplateEngine and TemplateError exist
use crate::router::{Router, RouteError}; // Assuming Router and RouteError exist
/// Represents the SSG builder, responsible for orchestrating the build process.
pub struct Builder {
config: Arc<Config>,
template_engine: Arc<TemplateEngine>,
router: Arc<Router>,
// Add a collection to store processed content
processed_content: Mutex<Vec<Content>>,
}
impl Builder {
pub fn new(config: Config, template_engine: TemplateEngine, router: Router) -> Self {
Builder {
config: Arc::new(config),
template_engine: Arc::new(template_engine),
router: Arc::new(router),
processed_content: Mutex::new(Vec::new()),
}
}
/// The main build orchestrator, now leveraging parallel processing.
pub fn build(&self) -> Result<()> {
info!("Starting SSG build process...");
// 1. Scan content directory and get all content file paths
let content_paths = self.scan_content_directory()
.context("Failed to scan content directory")?;
info!("Found {} content files.", content_paths.len());
// 2. Process content files in parallel
// This step involves reading, parsing frontmatter, and converting Markdown to HTML.
// Collecting `Result` values lets us handle errors for individual files
// gracefully, logging them without stopping the entire build.
let processed_pages_result: Vec<Result<Content>> = content_paths.par_iter()
.map(|path| self.process_single_content_file(path))
.collect();
let mut successful_pages = Vec::new();
for res in processed_pages_result {
match res {
Ok(content) => successful_pages.push(content),
Err(e) => error!("Error processing content file: {:?}", e),
}
}
info!("Successfully processed {} content files.", successful_pages.len());
// Store processed content for later use (e.g., navigation, search index)
*self.processed_content.lock().unwrap() = successful_pages;
// 3. Render all pages to HTML in parallel
self.render_all_pages()?;
// 4. Copy static assets
self.copy_static_assets()?;
info!("SSG build completed successfully.");
Ok(())
}
/// Scans the content directory for all Markdown files.
fn scan_content_directory(&self) -> Result<Vec<PathBuf>> {
let mut content_files = Vec::new();
let content_dir = &self.config.content_dir;
// Ensure the content directory exists
if !content_dir.exists() {
return Err(anyhow::anyhow!("Content directory not found: {:?}", content_dir));
}
for entry in fs::read_dir(content_dir)
.context(format!("Failed to read content directory: {:?}", content_dir))?
{
let entry = entry?;
let path = entry.path();
if path.is_file() && path.extension().map_or(false, |ext| ext == "md") {
content_files.push(path);
} else if path.is_dir() {
// Recursively scan subdirectories for content
// For simplicity, we'll just log for now, but a full implementation
// would use a recursive function or walkdir crate.
debug!("Skipping directory: {:?}", path);
}
}
Ok(content_files)
}
/// Processes a single content file: reads, parses frontmatter, converts Markdown.
fn process_single_content_file(&self, path: &Path) -> Result<Content> {
debug!("Processing content file: {:?}", path);
let file_content = fs::read_to_string(path)
.with_context(|| format!("Failed to read file: {:?}", path))?;
// Assuming Content::from_markdown_with_frontmatter handles parsing
let content = Content::from_markdown_with_frontmatter(path.to_path_buf(), &file_content)
.with_context(|| format!("Failed to parse content from file: {:?}", path))?;
Ok(content)
}
/// Renders all processed pages to their final HTML and writes them to disk in parallel.
fn render_all_pages(&self) -> Result<()> {
let processed_content = self.processed_content.lock().unwrap();
let output_dir = &self.config.output_dir;
// Ensure output directory exists
fs::create_dir_all(output_dir)
.context(format!("Failed to create output directory: {:?}", output_dir))?;
let render_results: Vec<Result<()>> = processed_content.par_iter()
.map(|content| {
debug!("Rendering page: {}", content.metadata.title);
// Determine output path based on router
let relative_path = self.router.get_output_path(&content)
.with_context(|| format!("Failed to get output path for content: {}", content.metadata.title))?;
let output_path = output_dir.join(&relative_path);
// Render with template engine
let rendered_html = self.template_engine.render_page(&content)
.with_context(|| format!("Failed to render template for content: {}", content.metadata.title))?;
// Ensure parent directory exists for the output file
if let Some(parent) = output_path.parent() {
fs::create_dir_all(parent)
.with_context(|| format!("Failed to create parent directory for output: {:?}", output_path))?;
}
// Write to file
fs::write(&output_path, rendered_html)
.with_context(|| format!("Failed to write output file: {:?}", output_path))?;
info!("Successfully rendered and wrote: {:?}", output_path);
Ok(())
})
.collect(); // Collect all results, including errors
// Check for any errors that occurred during parallel rendering
for res in render_results {
if let Err(e) = res {
error!("Error during page rendering: {:?}", e);
}
}
Ok(())
}
/// Copies static assets from the static directory to the output directory.
fn copy_static_assets(&self) -> Result<()> {
let static_dir = &self.config.static_dir;
let output_dir = &self.config.output_dir;
if !static_dir.exists() {
info!("No static directory found at {:?}, skipping asset copy.", static_dir);
return Ok(());
}
info!("Copying static assets from {:?} to {:?}", static_dir, output_dir);
// A more robust implementation would recursively copy directories
// For now, we'll just copy top-level files.
for entry in fs::read_dir(static_dir)
.context(format!("Failed to read static directory: {:?}", static_dir))?
{
let entry = entry?;
let path = entry.path();
if path.is_file() {
let file_name = path.file_name().context("Invalid file name")?;
let dest_path = output_dir.join(file_name);
fs::copy(&path, &dest_path)
.with_context(|| format!("Failed to copy static asset from {:?} to {:?}", path, dest_path))?;
debug!("Copied static asset: {:?} to {:?}", path, dest_path);
} else if path.is_dir() {
// TODO: Implement recursive directory copy for static assets.
warn!("Skipping directory in static assets for now: {:?}", path);
}
}
info!("Static assets copied.");
Ok(())
}
}
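The `scan_content_directory` method above only handles a flat directory. As its comments note, a full implementation would recurse (or use the `walkdir` crate). Here is a minimal standard-library sketch of that recursion, written as a free function for illustration — `scan_markdown_recursive` is a hypothetical name, not part of our `Builder`:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Recursive variant of the directory scan, standard library only.
// A walkdir-based version would be shorter; this shows the idea.
fn scan_markdown_recursive(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories instead of skipping them.
            scan_markdown_recursive(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "md") {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Build a small nested tree in a temp directory to exercise the scan.
    let root = std::env::temp_dir().join("ssg_scan_demo");
    fs::create_dir_all(root.join("nested"))?;
    fs::write(root.join("a.md"), "# a")?;
    fs::write(root.join("nested/b.md"), "# b")?;
    fs::write(root.join("notes.txt"), "skip me")?;

    let mut found = Vec::new();
    scan_markdown_recursive(&root, &mut found)?;
    println!("found {} markdown files", found.len());
    Ok(())
}
```

Note that the recursion itself stays sequential: it is a cheap directory walk, and the expensive per-file work downstream is what we parallelize.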
Explanation of Changes:
- `use rayon::prelude::*`: imports the traits from `rayon` that enable the `.par_iter()` method on standard collections.
- `content_paths.par_iter()`: instead of `iter()`, we now use `par_iter()` on the `Vec<PathBuf>` of content files. This automatically distributes the work of mapping each path across the threads in `rayon`’s thread pool.
- `processed_pages_result: Vec<Result<Content>> = ... .collect()`: a parallel `map` produces a new parallel iterator; `collect()` gathers the results back into a usable collection. Crucially, we collect `Result<Content>` values. This allows us to process each file independently, log any errors, and continue the build for the successful files, rather than failing the entire build on the first error.
- Error handling loop: after collecting `processed_pages_result`, we iterate through it to separate successful `Content` objects from errors, logging the latter. This ensures that a single malformed file doesn’t halt the entire SSG build.
- `processed_content: Mutex<Vec<Content>>`: we’ve added a `Mutex<Vec<Content>>` to the `Builder` struct. While `rayon`’s `map` and `collect` often avoid explicit locking, `Arc<Mutex<T>>` is the standard Rust approach when you do need to aggregate results into shared mutable state from within a parallel loop. We avoid that here: results are collected first, and `processed_content` is updated after the parallel phase by storing the `successful_pages` vector.
- `render_all_pages` parallelization: similarly, `render_all_pages` now iterates over `processed_content.par_iter()`, rendering each page and writing it to disk in parallel. Each `map` operation returns a `Result<()>`, and we collect these to check for errors after all pages have attempted to render.
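The collect-then-partition pattern is worth internalizing, since it is the backbone of our error handling. Here is the same idea in isolation, with a hypothetical `parse_all` helper standing in for content processing and unparseable integers standing in for malformed files:

```rust
// Hypothetical stand-in for per-file processing: parse integers, where a bad
// input plays the role of a malformed content file.
fn parse_all(inputs: &[&str]) -> (Vec<i32>, Vec<String>) {
    // Collect every Result first -- with rayon this `iter()` would be `par_iter()`.
    let results: Vec<Result<i32, String>> = inputs
        .iter()
        .map(|s| s.parse::<i32>().map_err(|e| format!("{s}: {e}")))
        .collect();

    // Then split successes from failures so one bad input never halts the batch.
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for r in results {
        match r {
            Ok(v) => oks.push(v),
            Err(e) => errs.push(e),
        }
    }
    (oks, errs)
}

fn main() {
    let (oks, errs) = parse_all(&["1", "oops", "3"]);
    println!("{} ok, {} failed: {:?}", oks.len(), errs.len(), oks);
}
```

Because `map` returns a `Result` per item and `collect()` preserves every one of them, no error is silently dropped, and the caller decides whether failures are fatal.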
c) Testing This Component
To test the parallelization, you’ll need a significant number of content files.
1. Create test content: generate a few hundred (or even a few thousand) dummy Markdown files in your `content` directory. You can use a simple script for this. Note the `---` delimiters, matching the YAML frontmatter our `serde_yaml`-based parser expects.

File: scripts/generate_dummy_content.sh

```bash
#!/bin/bash
mkdir -p content/test_pages

for i in $(seq 1 1000); do
cat <<EOF > content/test_pages/page_$i.md
---
title: "Test Page $i"
date: 2026-03-02
draft: false
description: "This is a dummy page number $i for performance testing."
---

This is some sample content for test page number $i. It helps us measure the performance of our SSG with a large number of files.

- Item 1
- Item 2
- Item 3

## Subheading $i

More content here, just to make the file a bit larger.
EOF
done

echo "Generated 1000 dummy content pages."
```

Run this script: `bash scripts/generate_dummy_content.sh`
2. Measure build times: before running, ensure your `main.rs` is set up to call `builder.build()`.

File: src/main.rs (example snippet)

```rust
// ... imports ...
use log::info;

use crate::build::builder::Builder;
use crate::config::Config;
use crate::router::Router;
use crate::template::TemplateEngine;

fn main() -> anyhow::Result<()> {
    env_logger::Builder::from_env(
        env_logger::Env::default().default_filter_or("info"),
    )
    .init();
    info!("Starting MySSG application.");

    let config = Config::load_from_file("config.toml")?;
    let template_engine = TemplateEngine::new(&config.template_dir)?;
    let router = Router::new(&config); // Assuming Router::new takes a &Config
    let builder = Builder::new(config, template_engine, router);

    let start_time = std::time::Instant::now();
    builder.build()?;
    let duration = start_time.elapsed();
    info!("Total build time: {:?}", duration);

    Ok(())
}
```

Run the build with `cargo run --release` (use `--release` for accurate performance measurements) and observe the “Total build time” in the logs.

To compare, temporarily revert `par_iter()` back to `iter()` in `src/build/builder.rs` and run again. You should see a noticeable improvement in build time with `par_iter()`.

Expected Behavior:
- The build should complete faster, especially with many content files.
- All 1000 dummy pages should be generated in your `output` directory (e.g., `output/test_pages/page_1.html`, etc.).
- Errors in individual files (if any were introduced) should be logged, but the build should still attempt to process the other files.
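The before/after measurement amounts to wrapping the build call in a timer. Here is a self-contained sketch of that harness, with a hypothetical `simulated_work` function standing in for `builder.build()`:

```rust
use std::time::Instant;

// Stand-in for one full build; wrap `builder.build()` the same way.
fn simulated_work() -> u64 {
    (0..1_000_000u64).sum()
}

fn main() {
    // Instant measures wall-clock elapsed time, which is what users experience.
    let start = Instant::now();
    let total = simulated_work();
    println!("work produced {total} in {:?}", start.elapsed());
}
```

Run each configuration several times and compare the medians; single runs are noisy due to OS caching and background load.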
Debugging Tips:
- If the build is not faster, ensure you are running with `cargo run --release`. Debug builds have optimizations disabled, which can mask performance gains.
- Check `htop` or your system’s task manager during the build. You should see multiple CPU cores being utilized. If only one core is active, `rayon` might not be kicking in, or the workload is too small to benefit.
- Ensure `rayon::prelude::*` is imported.
- Verify that the `collect()` call is correctly handling `Result` types so errors are reported without stopping the entire process.
Production Considerations
Resource management: `rayon` by default uses a thread pool sized to the number of logical cores on your machine, which is generally a good default. For specific environments (e.g., CI/CD runners with limited resources, or large servers where you want to cap CPU usage), you can control the number of threads via the `RAYON_NUM_THREADS` environment variable or programmatically with `rayon::ThreadPoolBuilder`:

```rust
// Example of programmatic thread pool configuration.
// Call this early in main.rs, before any parallel operations.
rayon::ThreadPoolBuilder::new()
    .num_threads(4)
    .build_global()
    .unwrap();
```

I/O vs. CPU bound: our current parallelization primarily benefits CPU-bound tasks (parsing, rendering). File I/O (reading and writing files) can also be a bottleneck. While `rayon` can parallelize which files are read and written, the actual disk operations may still be limited by disk throughput. For heavily I/O-bound SSGs, integrating an asynchronous runtime like `tokio` for file operations (e.g., `tokio::fs`) alongside `rayon` for CPU-bound tasks could yield further improvements, though it adds significant complexity. For most SSGs, `rayon` on its own provides a substantial boost.

Memory usage: parallel processing can consume more memory than sequential processing because multiple pages may be held in memory simultaneously. For extremely large sites with many thousands of pages, monitor memory usage. If it becomes an issue, strategies like batching content processing or shrinking the `Content` struct’s memory footprint may be necessary. Rust’s ownership model helps manage this efficiently, but it remains a consideration.

Logging: ensure your logging (e.g., `info!`, `error!`) includes context for parallel operations. Our current implementation logs errors for individual files, which is critical for debugging. Centralized logging ensures that logs from different threads are interleaved correctly (`env_logger` handles this well by default).

Error handling: the `collect::<Vec<Result<_>>>()` pattern is vital. It allows the build to continue even if some pages fail, reporting all errors at the end instead of crashing prematurely. This is a production-ready approach for a robust SSG.
Code Review Checkpoint
At this point, you should have:
- Added `rayon` to your `Cargo.toml`.
- Modified `src/build/builder.rs`:
  - The `Builder` struct now contains a `Mutex<Vec<Content>>` to store processed pages.
  - The `build` method orchestrates the parallel processing.
  - `scan_content_directory` remains sequential (it is a single directory scan).
  - `process_single_content_file` is called in parallel via `par_iter().map().collect()`.
  - `render_all_pages` now uses `par_iter().map().collect()` to render and write files in parallel.
  - Robust error handling is in place to log individual file processing/rendering errors without halting the entire build.
The core build logic should now look significantly faster for larger content sets.
Common Issues & Solutions
“Build not faster with `rayon`”:

- Issue: you’ve implemented `par_iter()` but see no performance improvement, or even a slight degradation.
- Solution:
  - Run in release mode: always use `cargo run --release` for benchmarking. Debug builds have optimizations disabled, making them significantly slower.
  - Insufficient workload: for very small sites (e.g., 10-20 pages), the overhead of setting up thread pools and coordinating parallel tasks can outweigh the benefits. `rayon` shines with hundreds or thousands of independent tasks.
  - I/O bound: if your bottleneck is primarily reading from or writing to a slow disk, `rayon` helps less with the I/O operations themselves, though it will still parallelize which files are being processed.
  - CPU core count: speedup is capped by the number of cores. On a 2-core machine you will not see a 16x speedup.

“Errors in parallel tasks are confusing or missed”:

- Issue: an error occurs in one parallel task, but the build seems to continue, and you only see a generic “build failed” without specific details.
- Solution: collect the results of your parallel operations into a `Vec<Result<T, E>>` and then iterate through that vector to explicitly log or handle each individual error. The pattern `some_parallel_iter.map(|item| process(item)).collect::<Vec<Result<_, _>>>()` is crucial for this. Our current implementation handles this well.
“Increased memory usage during build”:

- Issue: the SSG consumes much more RAM when running with parallel processing.
- Solution: this is an expected trade-off. If it becomes problematic for extremely large sites:
  - Batch processing: instead of processing all content items at once, process them in smaller batches by iterating through `content_paths` in chunks and calling `par_iter()` on each chunk.
  - Memory profiling: use tools such as `valgrind`’s `massif` or `heaptrack` to identify exactly where memory is being consumed.
  - Optimize the `Content` struct: review your `Content` struct and any associated data structures. Are there large strings or unnecessary copies being made? Can `Arc<str>` or `Arc<Path>` be used instead of `String` or `PathBuf` for shared immutable data?
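The batching idea can be sketched with the standard library alone. `process_in_batches` is a hypothetical helper; the inner `iter()` marks where the real code would use rayon’s `par_iter()`:

```rust
// Process items in fixed-size batches to cap peak memory usage. Doubling each
// number stands in for the real per-file work.
fn process_in_batches(items: &[u32], batch_size: usize) -> Vec<u32> {
    let mut out = Vec::with_capacity(items.len());
    for chunk in items.chunks(batch_size) {
        // With rayon: chunk.par_iter().map(|n| n * 2).collect::<Vec<_>>()
        // so each batch is still processed in parallel internally.
        let batch: Vec<u32> = chunk.iter().map(|n| n * 2).collect();
        out.extend(batch); // only one batch's intermediates are in flight at a time
    }
    out
}

fn main() {
    let items: Vec<u32> = (1..=10).collect();
    println!("{:?}", process_in_batches(&items, 4));
}
```

The trade-off is a little lost parallelism at batch boundaries in exchange for a bounded number of in-memory intermediates, which is usually the right call for very large sites.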
Testing & Verification
To verify the work in this chapter:
- Generate a large test dataset: use the provided `scripts/generate_dummy_content.sh` script to create 1000+ dummy Markdown files.
- Clean previous output: run `rm -rf output` to ensure a fresh build.
- Run the SSG in release mode: execute `cargo run --release`.
- Observe build time: note the “Total build time” logged by your `main.rs`.
- Check CPU utilization: monitor your system’s CPU usage during the build. You should see multiple cores actively working, indicating `rayon` is effectively distributing the workload.
- Verify output: navigate to your `output` directory and confirm that all 1000+ HTML files have been generated correctly, with the expected content.
- Introduce an error: temporarily corrupt one of the dummy Markdown files (e.g., remove its frontmatter delimiters) and re-run. The build should log an error for that specific file but still process and generate the other 999 files.
Summary & Next Steps
In this chapter, we significantly enhanced the performance of our SSG by integrating rayon for parallel processing. We refactored the build pipeline to execute computationally intensive tasks like content parsing, HTML transformation, template rendering, and file writing concurrently across multiple CPU cores. This has laid a robust foundation for handling large-scale content projects with much faster build times, a critical feature for any production-ready SSG. We also implemented resilient error handling to ensure the build gracefully manages individual content file failures.
While parallel processing dramatically speeds up full builds, a common developer workflow involves making small changes and needing a very quick rebuild. This is where incremental builds and caching come into play. In the next chapter, Chapter 12: Incremental Builds and Caching, we will explore how to detect changes, store build artifacts, and only reprocess what’s necessary, leading to near-instantaneous rebuilds for minor edits. This will further improve the developer experience and make our SSG truly powerful for day-to-day use.