Chapter Introduction

In the previous chapters, we laid the groundwork for our Mermaid analyzer by building a robust lexer and parser. While these components are crucial for understanding the Mermaid code’s structure, their current error reporting is rudimentary, often just returning a simple error message or panicking. For a production-grade tool that aims to mimic the reliability and user-friendliness of compilers like rustc, this is insufficient.

This chapter focuses on transforming our basic error handling into a sophisticated diagnostic system. We will design and implement a comprehensive Diagnostic structure capable of capturing detailed information about errors and warnings, including precise source code locations, unique error codes, severity levels, and actionable help messages. We will then integrate this system into our lexer and parser, enabling them to emit rich diagnostics instead of opaque error messages. Finally, we will build a DiagnosticEmitter that leverages the ariadne crate to render these diagnostics in a visually appealing, compiler-style format, complete with code highlighting and contextual information.

By the end of this chapter, our tool will no longer just fail; it will intelligently guide the user to understand what went wrong, where it went wrong, and how to fix it, significantly improving the developer experience and making our tool genuinely production-ready.

Planning & Design

A robust diagnostic system is at the heart of any reliable code analysis tool. It needs to provide clear, actionable feedback. Our design will follow these principles:

  1. Unified Diagnostic Structure: A single, extensible Diagnostic struct to represent all types of issues (errors, warnings, notes).
  2. Precise Location Tracking (Span): Every diagnostic must pinpoint the exact location in the source code using byte offsets, line numbers, and column numbers.
  3. Unique Error Codes: Each distinct error type will have a unique code (e.g., M001, M002) for easy reference and documentation.
  4. Severity Levels: Differentiate between critical errors, warnings, and informational notes.
  5. Contextual Information: Allow for multiple labels (primary, secondary) and supplementary notes or help messages.
  6. Pluggable Emitter: Separate the diagnostic data from its rendering, allowing different output formats (e.g., plain text, JSON, compiler-style). We’ll use ariadne for a rich console output.
  7. Non-Fatal Error Collection: Lexer and parser should collect all possible diagnostics, rather than stopping at the first error, to provide a comprehensive report.
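Principle 7 shapes every API in this chapter: each stage returns its results alongside a list of diagnostics rather than bailing out on the first failure. As a minimal, self-contained illustration of the pattern (the `scan` function is a hypothetical stand-in, not the chapter's lexer):

```rust
// Hypothetical illustration of non-fatal error collection: keep scanning and
// accumulate one message per problem instead of returning at the first error.
fn scan(input: &str) -> (Vec<char>, Vec<String>) {
    let mut accepted = Vec::new();
    let mut diagnostics = Vec::new();
    for (offset, c) in input.char_indices() {
        if c.is_alphanumeric() || c.is_whitespace() {
            accepted.push(c);
        } else {
            // Record the problem and continue, so a single pass reports everything.
            diagnostics.push(format!("unexpected character '{c}' at byte {offset}"));
        }
    }
    (accepted, diagnostics)
}

fn main() {
    let (_, diagnostics) = scan("graph $D\nA # B");
    // Both invalid characters are reported in one pass.
    assert_eq!(diagnostics.len(), 2);
}
```

The same shape, `(result, Vec<Diagnostic>)`, is what our lexer and parser will return.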

Component Architecture

The diagnostic system will integrate with our existing lexer and parser, and later with the validator.

flowchart TD
    UserInput[Mermaid Code Input] --> SourceManager["Source Manager (Stores Code)"]
    SourceManager --> Lexer[Lexer]
    Lexer -- "Tokens + Spans" --> Parser[Parser]
    Lexer -- "Lexer Diagnostics" --> DiagnosticCollector[Diagnostic Collector]
    Parser -- "AST + Parser Diagnostics" --> DiagnosticCollector
    DiagnosticCollector --> DiagnosticEmitter["Diagnostic Emitter (using Ariadne)"]
    DiagnosticEmitter --> ConsoleOutput[Console Output]
    Parser -- "Valid AST (if no fatal errors)" --> Validator["Validator (Next Chapter)"]

Explanation:

  • SourceManager: A simple component that holds the input Mermaid code, allowing diagnostics to reference it for highlighting.
  • Lexer: Now, in addition to emitting Tokens, it will also emit Diagnostics if it encounters lexical errors. Each Token will carry its Span.
  • Parser: Consumes Tokens with Spans. If syntax errors occur, it emits Diagnostics. It attempts to recover and continue parsing to find more errors, if possible, returning an Option<Ast> along with a Vec<Diagnostic>.
  • DiagnosticCollector: A conceptual component that aggregates all diagnostics generated by the lexer, parser, and future validator.
  • DiagnosticEmitter: Takes the collected Diagnostics and the original source code from the SourceManager to render them using ariadne.
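The data flow above can be sketched as plain function signatures. The stand-in types below are deliberately simplified placeholders; the real `Token`, `Ast`, and `Diagnostic` types are built over the course of this chapter:

```rust
// Simplified stand-in types; the chapter defines the real ones.
struct Token(String);
struct Ast(Vec<String>);
struct Diagnostic(String);

// Each stage returns its output *plus* any diagnostics it produced.
fn lex(src: &str) -> (Vec<Token>, Vec<Diagnostic>) {
    let tokens = src.split_whitespace().map(|w| Token(w.to_string())).collect();
    (tokens, Vec::new())
}

fn parse(tokens: Vec<Token>) -> (Option<Ast>, Vec<Diagnostic>) {
    let ast = Ast(tokens.into_iter().map(|t| t.0).collect());
    (Some(ast), Vec::new())
}

fn main() {
    let src = "graph TD";
    // The "diagnostic collector" is conceptually just a Vec that every stage appends to.
    let mut all_diagnostics: Vec<Diagnostic> = Vec::new();

    let (tokens, lex_diags) = lex(src);
    all_diagnostics.extend(lex_diags);

    let (ast, parse_diags) = parse(tokens);
    all_diagnostics.extend(parse_diags);

    assert!(ast.is_some());
    assert!(all_diagnostics.is_empty());
}
```

The emitter then consumes `all_diagnostics` together with the original source text.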

File Structure

We’ll introduce a new src/diagnostics module:

src/
├── main.rs
├── lexer/
│   ├── mod.rs
│   └── token.rs
├── parser/
│   ├── mod.rs
│   └── ast.rs
└── diagnostics/
    ├── mod.rs        # Main diagnostic struct and builder
    ├── error_codes.rs # Enum for all error codes
    ├── span.rs       # Source code location tracking
    └── emitter.rs    # Diagnostic rendering logic (using ariadne)

Step-by-Step Implementation

3.1 Setup: Add ariadne Dependency

First, we need to add the ariadne crate to our project. ariadne is a fantastic library for producing beautiful, compiler-style error messages.

Open your Cargo.toml and add ariadne to the [dependencies] section:

# Cargo.toml

[package]
name = "mermaid-analyzer"
version = "0.1.0"
edition = "2021"

[dependencies]
ariadne = "0.4.0" # Add this line
log = "0.4"
env_logger = "0.11"
# ... other dependencies from previous chapters

Run cargo check to ensure the dependency is fetched.

3.2 Defining Span and SourceId

A Span represents a contiguous region in the source code. It’s crucial for pointing to the exact location of an error or warning. We’ll use byte offsets for internal consistency and ariadne compatibility, and also store line/column for human readability. SourceId will uniquely identify the source file (though for a single file CLI tool, it might just be “input” or the filename).

Create the file src/diagnostics/span.rs:

// src/diagnostics/span.rs

/// A unique identifier for a source file.
/// For a CLI tool processing a single input, this might just be "input".
#[derive(Debug, Clone, PartialEq, Eq, Hash, Copy)]
pub struct SourceId(pub &'static str); // Using &'static str for simplicity for now

impl From<&'static str> for SourceId {
    fn from(s: &'static str) -> Self {
        SourceId(s)
    }
}

/// Represents a contiguous region in the source code.
/// Stores byte offsets for precise highlighting and line/column for human readability.
#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub struct Span {
    pub source_id: SourceId,
    pub start: usize, // Start byte offset
    pub end: usize,   // End byte offset (exclusive)
    pub start_line: usize,
    pub end_line: usize,
    pub start_column: usize,
    pub end_column: usize,
}

impl Span {
    /// Creates a new span.
    pub fn new(
        source_id: SourceId,
        start: usize,
        end: usize,
        start_line: usize,
        end_line: usize,
        start_column: usize,
        end_column: usize,
    ) -> Self {
        Self {
            source_id,
            start,
            end,
            start_line,
            end_line,
            start_column,
            end_column,
        }
    }

    /// Creates a dummy span for cases where a real span isn't available (e.g., internal errors).
    pub fn dummy() -> Self {
        Self {
            source_id: SourceId("dummy"),
            start: 0,
            end: 0,
            start_line: 1,
            end_line: 1,
            start_column: 1,
            end_column: 1,
        }
    }

    /// Returns the length of the span in bytes.
    pub fn len(&self) -> usize {
        self.end - self.start
    }

    /// Checks if the span is empty.
    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Combines two spans into a single span that covers both.
    /// Assumes both spans are from the same source.
    pub fn merge(&self, other: &Self) -> Self {
        assert_eq!(self.source_id, other.source_id, "Cannot merge spans from different sources");

        let start = self.start.min(other.start);
        let end = self.end.max(other.end);

        // Compare (line, column) pairs lexicographically: this yields the
        // earlier column even when both spans start on the same line, which
        // a line-only comparison would get wrong.
        let (start_line, start_column) =
            (self.start_line, self.start_column).min((other.start_line, other.start_column));
        let (end_line, end_column) =
            (self.end_line, self.end_column).max((other.end_line, other.end_column));

        Self {
            source_id: self.source_id,
            start,
            end,
            start_line,
            end_line,
            start_column,
            end_column,
        }
    }
}

// Implement ariadne's Span trait for our Span struct
impl ariadne::Span for Span {
    type SourceId = SourceId;

    fn source(&self) -> &Self::SourceId {
        &self.source_id
    }

    fn start(&self) -> usize {
        self.start
    }

    fn end(&self) -> usize {
        self.end
    }
}

Explanation:

  • SourceId: A wrapper around &'static str to identify the source file. ariadne needs this to associate spans with their content.
  • Span: Stores start and end byte offsets, which ariadne uses directly. We also store start_line, end_line, start_column, end_column for convenience and more granular human-readable output, though ariadne can compute these from byte offsets if given the source.
  • dummy(): Useful for internal errors or when a precise location isn’t available.
  • merge(): Essential for parser diagnostics, where an error might span multiple tokens.
  • ariadne::Span implementation: This trait is crucial for ariadne to understand how to use our Span type.
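One subtlety worth internalizing before moving on: `Span` stores byte offsets, not character indices, and the two diverge as soon as the source contains multi-byte UTF-8 (easy to hit in node labels). A quick standalone check:

```rust
// Byte offsets vs. character indices in UTF-8 source. `char_indices` yields
// *byte* offsets, which is what ariadne-style spans expect.
fn main() {
    let src = "A --> é --> B";
    let (offset, c) = src.char_indices().find(|&(_, c)| c == 'é').unwrap();
    assert_eq!(offset, 6);       // 'é' begins at byte 6...
    assert_eq!(c.len_utf8(), 2); // ...and occupies 2 bytes,
    // so its span is 6..8, and a naive "column = byte offset" scheme would
    // drift by one character for everything after it.
    assert_eq!(&src[6..8], "é");
}
```

This is why the lexer's `advance()` must bump `current_offset` by `len_utf8()`, not by 1.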

3.3 Defining ErrorCode and Severity

Explicit error codes help users quickly look up detailed documentation and provide a stable identifier for issues. Severity dictates how the diagnostic should be treated (e.g., halt compilation, just warn).

Create the file src/diagnostics/error_codes.rs:

// src/diagnostics/error_codes.rs

/// Represents the severity level of a diagnostic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
    Note,
    Help,
}

/// Unique error codes for different types of issues.
/// M = Mermaid Analyzer
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    // Lexer errors (M001-M099)
    M001, // Unexpected character
    M002, // Unterminated string literal

    // Parser errors (M100-M199)
    M100, // Expected token, found something else
    M101, // Unmatched parenthesis/bracket
    M102, // Missing diagram type declaration (e.g., 'graph TD')
    M103, // Invalid node ID
    M104, // Malformed edge definition
    M105, // Invalid diagram type

    // Semantic / Validation errors (M200-M299) - will be used in future chapters
    M200, // Duplicate node ID
    M201, // Undefined node in edge
    M202, // Invalid direction for diagram type
    M203, // Cyclic dependency detected
    // ... more codes as we expand
}

impl ErrorCode {
    /// Returns a short, descriptive message for the error code.
    pub fn message(&self) -> &'static str {
        match self {
            ErrorCode::M001 => "Unexpected character",
            ErrorCode::M002 => "Unterminated string literal",
            ErrorCode::M100 => "Unexpected token",
            ErrorCode::M101 => "Unmatched parenthesis or bracket",
            ErrorCode::M102 => "Missing diagram type declaration",
            ErrorCode::M103 => "Invalid node ID format",
            ErrorCode::M104 => "Malformed edge definition",
            ErrorCode::M105 => "Invalid or unsupported diagram type",
            ErrorCode::M200 => "Duplicate node identifier",
            ErrorCode::M201 => "Edge refers to an undefined node",
            ErrorCode::M202 => "Invalid direction for this diagram type",
            ErrorCode::M203 => "Cyclic dependency detected in graph",
        }
    }

    /// Returns a longer, more detailed help message for the error code.
    pub fn help(&self) -> Option<&'static str> {
        match self {
            ErrorCode::M001 => Some("This character is not allowed in Mermaid syntax at this position. Please check for typos or invalid symbols."),
            ErrorCode::M002 => Some("String literals must be closed with a matching quote. Add a '\"' or apostrophe to terminate the string."),
            ErrorCode::M100 => Some("The parser expected a different kind of token here. This usually indicates a syntax error. Check the Mermaid syntax documentation for this construct."),
            ErrorCode::M101 => Some("Ensure all parentheses, brackets, and braces are correctly matched and balanced. Each opening symbol must have a corresponding closing symbol."),
            ErrorCode::M102 => Some("All Mermaid diagrams must start with a declaration like 'graph TD', 'sequenceDiagram', or 'classDiagram'. Add a diagram type declaration at the beginning of your code."),
            ErrorCode::M103 => Some("Node IDs must follow Mermaid naming conventions. They typically consist of alphanumeric characters and underscores, or be quoted if they contain special characters."),
            ErrorCode::M104 => Some("An edge definition must specify two nodes and a valid arrow type (e.g., 'A --> B'). Check for missing nodes, invalid arrow syntax, or extra characters."),
            ErrorCode::M105 => Some("The specified diagram type is either misspelled or not supported by Mermaid, or not yet implemented by this analyzer. Refer to Mermaid documentation for supported types."),
            ErrorCode::M200 => Some("Node IDs must be unique within a diagram. Rename one of the nodes to resolve this conflict."),
            ErrorCode::M201 => Some("All nodes referenced in an edge must be defined in the diagram. Ensure every node is declared before it is used in an edge."),
            ErrorCode::M202 => Some("The chosen direction (e.g., 'TD', 'LR') is not valid for this specific diagram type. Consult the Mermaid documentation for valid directions."),
            ErrorCode::M203 => Some("A cycle was detected in your graph, which might indicate a logical error or infinite loop in certain contexts. Review the relationships between nodes."),
        }
    }
}

Explanation:

  • Severity: A simple enum to classify diagnostics.
  • ErrorCode: An enum for unique identifiers. We’ve started with some common lexer, parser, and future validation errors.
  • message(): Provides a concise description for each code.
  • help(): Offers a more detailed explanation and actionable advice, improving the user experience significantly.
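Because `ErrorCode` derives `Debug`, the variant name doubles as the stable code string, so a rustc-style header line falls out of a single `format!` call. A condensed two-variant copy of the enum (the `header` helper is illustrative, not part of the chapter's API):

```rust
// Condensed copy of the chapter's ErrorCode (two variants) showing how the
// derived Debug name serves as the stable code in an `error[M100]: ...` header.
#[derive(Debug, Clone, Copy)]
enum ErrorCode {
    M001,
    M100,
}

impl ErrorCode {
    fn message(&self) -> &'static str {
        match self {
            ErrorCode::M001 => "Unexpected character",
            ErrorCode::M100 => "Unexpected token",
        }
    }
}

/// Illustrative helper: render the rustc-style header line for a code.
fn header(code: ErrorCode) -> String {
    format!("error[{:?}]: {}", code, code.message())
}

fn main() {
    assert_eq!(header(ErrorCode::M100), "error[M100]: Unexpected token");
}
```

The emitter in 3.6 will produce this header via ariadne rather than by hand, but the code-to-string mapping is the same.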

3.4 The Diagnostic Structure

This is the core structure that holds all information about a single diagnostic. It will use the Span, ErrorCode, and Severity we just defined. We’ll also provide a DiagnosticBuilder for ergonomic creation.

Create the file src/diagnostics/mod.rs:

// src/diagnostics/mod.rs

pub mod error_codes;
pub mod span;
pub mod emitter;

use error_codes::{ErrorCode, Severity};
use span::Span;

/// Represents a label associated with a diagnostic.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiagnosticLabel {
    pub span: Span,
    pub message: String,
    pub is_primary: bool, // True for the main point of interest
}

/// A comprehensive diagnostic message, similar to a compiler error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub severity: Severity,
    pub code: ErrorCode,
    pub message: String,
    pub labels: Vec<DiagnosticLabel>,
    pub notes: Vec<String>,
    pub help: Option<String>,
}

impl Diagnostic {
    /// Creates a new `DiagnosticBuilder` for constructing a diagnostic.
    pub fn new(severity: Severity, code: ErrorCode) -> DiagnosticBuilder {
        DiagnosticBuilder::new(severity, code)
    }

    /// Convenience method for creating an error diagnostic.
    pub fn error(code: ErrorCode) -> DiagnosticBuilder {
        DiagnosticBuilder::new(Severity::Error, code)
    }

    /// Convenience method for creating a warning diagnostic.
    pub fn warning(code: ErrorCode) -> DiagnosticBuilder {
        DiagnosticBuilder::new(Severity::Warning, code)
    }

    /// Convenience method for creating a note diagnostic.
    pub fn note(code: ErrorCode) -> DiagnosticBuilder {
        DiagnosticBuilder::new(Severity::Note, code)
    }
}

/// A builder for constructing `Diagnostic` instances more ergonomically.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiagnosticBuilder {
    severity: Severity,
    code: ErrorCode,
    message: Option<String>,
    labels: Vec<DiagnosticLabel>,
    notes: Vec<String>,
    help: Option<String>,
}

impl DiagnosticBuilder {
    pub fn new(severity: Severity, code: ErrorCode) -> Self {
        Self {
            severity,
            code,
            message: None,
            labels: Vec::new(),
            notes: Vec::new(),
            help: None,
        }
    }

    /// Sets the primary message for the diagnostic. If not set, a default from `ErrorCode` is used.
    pub fn with_message(mut self, message: impl Into<String>) -> Self {
        self.message = Some(message.into());
        self
    }

    /// Adds a primary label, which points to the main location of the issue.
    pub fn with_primary_label(mut self, span: Span, message: impl Into<String>) -> Self {
        self.labels.push(DiagnosticLabel {
            span,
            message: message.into(),
            is_primary: true,
        });
        self
    }

    /// Adds a secondary label, for additional context or related locations.
    pub fn with_secondary_label(mut self, span: Span, message: impl Into<String>) -> Self {
        self.labels.push(DiagnosticLabel {
            span,
            message: message.into(),
            is_primary: false,
        });
        self
    }

    /// Adds a note, which is additional textual information.
    pub fn with_note(mut self, note: impl Into<String>) -> Self {
        self.notes.push(note.into());
        self
    }

    /// Sets the help message. If not set, a default from `ErrorCode` is used.
    pub fn with_help(mut self, help: impl Into<String>) -> Self {
        self.help = Some(help.into());
        self
    }

    /// Builds the `Diagnostic` instance.
    pub fn build(self) -> Diagnostic {
        let code_message = self.code.message();
        let code_help = self.code.help();

        Diagnostic {
            severity: self.severity,
            code: self.code,
            message: self.message.unwrap_or_else(|| code_message.to_string()),
            labels: self.labels,
            notes: self.notes,
            help: self.help.or_else(|| code_help.map(|s| s.to_string())),
        }
    }
}

Explanation:

  • DiagnosticLabel: Stores a Span and a message for highlighting specific parts of the code. is_primary helps ariadne determine the main highlight.
  • Diagnostic: Contains severity, code, the main message, labels (for highlights), notes, and an optional help message.
  • DiagnosticBuilder: Provides a fluent API for constructing Diagnostics, making it easier to add labels, notes, and customize messages. It also defaults to messages and help from ErrorCode if not explicitly overridden.
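To see how the builder reads at a call site, and how `build()` falls back to the `ErrorCode` defaults, here is a condensed, self-contained version of the types above (spans, labels, and help elided):

```rust
// Condensed sketch of the chapter's builder; only the fields needed to show
// the default-message behaviour are kept.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity { Error, Warning }

#[derive(Debug, Clone, Copy)]
enum ErrorCode { M100 }

impl ErrorCode {
    fn message(&self) -> &'static str {
        match self { ErrorCode::M100 => "Unexpected token" }
    }
}

#[derive(Debug)]
struct Diagnostic { severity: Severity, code: ErrorCode, message: String, notes: Vec<String> }

struct DiagnosticBuilder { severity: Severity, code: ErrorCode, message: Option<String>, notes: Vec<String> }

impl DiagnosticBuilder {
    fn new(severity: Severity, code: ErrorCode) -> Self {
        Self { severity, code, message: None, notes: Vec::new() }
    }
    fn with_message(mut self, m: impl Into<String>) -> Self { self.message = Some(m.into()); self }
    fn with_note(mut self, n: impl Into<String>) -> Self { self.notes.push(n.into()); self }
    fn build(self) -> Diagnostic {
        Diagnostic {
            severity: self.severity,
            code: self.code,
            // Fall back to the code's default message if none was provided.
            message: self.message.unwrap_or_else(|| self.code.message().to_string()),
            notes: self.notes,
        }
    }
}

fn main() {
    // No custom message: build() uses the ErrorCode default.
    let default_msg = DiagnosticBuilder::new(Severity::Error, ErrorCode::M100)
        .with_note("edges must connect two declared nodes")
        .build();
    assert_eq!(default_msg.message, "Unexpected token");

    // A custom message overrides the default.
    let custom = DiagnosticBuilder::new(Severity::Warning, ErrorCode::M100)
        .with_message("expected a node identifier, found `-->`")
        .build();
    assert_eq!(custom.message, "expected a node identifier, found `-->`");
}
```

The full builder works the same way; it just carries labels, notes, and a help field alongside the message.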

3.5 Integrating Spans into Lexer Tokens

Now, we need to modify our Token structure to include a Span. This Span will be calculated by the lexer as it processes the input.

Modify src/lexer/token.rs:

// src/lexer/token.rs

use crate::diagnostics::span::Span; // Import Span

/// Represents the different types of tokens in Mermaid syntax.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenType {
    // Keywords
    Graph,
    Flowchart,
    SequenceDiagram,
    ClassDiagram,
    StateDiagram,
    Gantt,
    Pie,
    GitGraph,
    Ermaid,
    Journey,
    C4c,
    Mindmap,
    Timeline,
    Block,
    SanKey,
    RequirementDiagram,
    // Directions (for graph/flowchart)
    TD, // Top-Down
    BT, // Bottom-Top
    LR, // Left-Right
    RL, // Right-Left
    TB, // Top-Bottom (alias for TD)

    // Structural elements
    OpenParen,    // (
    CloseParen,   // )
    OpenBracket,  // [
    CloseBracket, // ]
    OpenBrace,    // {
    CloseBrace,   // }
    DoubleOpenBracket, // [[
    DoubleCloseBracket, // ]]
    Colon,        // :
    SemiColon,    // ;
    Comma,        // ,
    Dot,          // .
    Equals,       // =
    Plus,         // +
    Minus,        // -
    Star,         // *
    Hash,         // #
    Percent,      // %
    Pipe,         // |
    Backslash,    // \

    // Arrows and connectors
    ArrowRight,     // -->
    ArrowLeft,      // <--
    DoubleArrow,    // <-->
    ThickArrowRight, // ==>
    ThickArrowLeft,  // <==
    ThickDoubleArrow, // <==>
    DottedArrowRight, // -.->
    DottedArrowLeft,  // <-.
    DottedDoubleArrow, // <-.->
    CrossArrowRight, // --x
    OpenCircleArrowRight, // --o
    ArrowRightWithText, // --- text -->
    ArrowLeftWithText, // <-- text ---
    DoubleArrowWithText, // <--- text --->
    ThickArrowRightWithText, // === text ==>
    DottedArrowRightWithText, // -.- text -.->

    // Literals and identifiers
    Identifier(String),
    StringLiteral(String), // Quoted strings
    Number(String),        // Numeric literals

    // Comments
    LineComment(String),   // %% comment
    BlockComment(String),  // /* comment */

    // Special
    NewLine,
    Whitespace, // Ignored by parser, but useful for span tracking in lexer
    EOF,
}

/// Represents a token found by the lexer, including its type and source span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub token_type: TokenType,
    pub span: Span, // Added Span here
}

impl Token {
    pub fn new(token_type: TokenType, span: Span) -> Self {
        Token { token_type, span }
    }
}

Now, update src/lexer/lexer.rs to correctly calculate and assign Span to each Token. This requires careful tracking of the current position (byte offset, line, column) as we consume characters.

Modify src/lexer/lexer.rs:

// src/lexer/lexer.rs

use crate::diagnostics::span::{Span, SourceId};
use crate::diagnostics::{Diagnostic, DiagnosticBuilder, error_codes::{ErrorCode, Severity}}; // Import Diagnostic components
use crate::lexer::token::{Token, TokenType};
use log::debug;

pub struct Lexer<'a> {
    source: &'a str,
    source_id: SourceId,
    chars: std::iter::Peekable<std::str::Chars<'a>>,
    current_offset: usize, // Current byte offset
    current_line: usize,
    current_column: usize,
    diagnostics: Vec<Diagnostic>, // Collect diagnostics
}

impl<'a> Lexer<'a> {
    pub fn new(source: &'a str, source_id: SourceId) -> Self {
        Lexer {
            source,
            source_id,
            chars: source.chars().peekable(),
            current_offset: 0,
            current_line: 1,
            current_column: 1,
            diagnostics: Vec::new(),
        }
    }

    /// Lexes the entire source and returns a vector of tokens and any diagnostics.
    pub fn lex(mut self) -> (Vec<Token>, Vec<Diagnostic>) {
        let mut tokens = Vec::new();

        while self.peek().is_some() {
            // Skip leading whitespace *before* capturing the start position,
            // so token spans do not include the whitespace preceding them.
            self.skip_whitespace();
            let start_offset = self.current_offset;
            let start_line = self.current_line;
            let start_column = self.current_column;

            if let Some(token_type) = self.next_token() {
                let end_offset = self.current_offset;
                let end_line = self.current_line;
                let end_column = self.current_column;

                let span = Span::new(
                    self.source_id,
                    start_offset,
                    end_offset,
                    start_line,
                    end_line,
                    start_column,
                    end_column,
                );
                tokens.push(Token::new(token_type, span));
            } else {
                // `next_token` returned `None`: either we reached end of input,
                // or an error occurred and a diagnostic was emitted. In the
                // error case the offending character has already been consumed
                // inside `next_token`, so no extra recovery step is needed.
            }
        }

        let eof_span = Span::new(
            self.source_id,
            self.current_offset,
            self.current_offset,
            self.current_line,
            self.current_line,
            self.current_column,
            self.current_column,
        );
        tokens.push(Token::new(TokenType::EOF, eof_span));

        (tokens, self.diagnostics)
    }

    fn peek(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }

    fn advance(&mut self) -> Option<char> {
        if let Some(c) = self.chars.next() {
            let char_len = c.len_utf8();
            self.current_offset += char_len;
            if c == '\n' {
                self.current_line += 1;
                self.current_column = 1;
            } else {
                self.current_column += 1;
            }
            Some(c)
        } else {
            None
        }
    }

    fn advance_if<F>(&mut self, predicate: F) -> Option<char>
    where
        F: FnOnce(char) -> bool,
    {
        if let Some(c) = self.peek() {
            if predicate(c) {
                return self.advance();
            }
        }
        None
    }

    fn advance_while<F>(&mut self, predicate: F) -> String
    where
        F: Fn(char) -> bool,
    {
        let mut s = String::new();
        while let Some(c) = self.peek() {
            if predicate(c) {
                s.push(self.advance().unwrap());
            } else {
                break;
            }
        }
        s
    }

    fn next_token(&mut self) -> Option<TokenType> {
        self.skip_whitespace();

        let start_offset = self.current_offset;
        let start_line = self.current_line;
        let start_column = self.current_column;

        let Some(c) = self.advance() else {
            return None; // EOF handled by lex function
        };

        let token_type = match c {
            '(' => TokenType::OpenParen,
            ')' => TokenType::CloseParen,
            '[' => {
                if self.advance_if(|c| c == '[').is_some() {
                    TokenType::DoubleOpenBracket
                } else {
                    TokenType::OpenBracket
                }
            }
            ']' => {
                if self.advance_if(|c| c == ']').is_some() {
                    TokenType::DoubleCloseBracket
                } else {
                    TokenType::CloseBracket
                }
            }
            '{' => TokenType::OpenBrace,
            '}' => TokenType::CloseBrace,
            ':' => TokenType::Colon,
            ';' => TokenType::SemiColon,
            ',' => TokenType::Comma,
            '.' => TokenType::Dot,
            '=' => {
                if self.advance_if(|c| c == '=').is_some() {
                    if self.advance_if(|c| c == '>').is_some() {
                        TokenType::ThickArrowRight // ==>
                    } else {
                        // Bare '==' (thick arrows like '<==>' begin with '<'
                        // and are handled in that branch). Treat as '=' for
                        // now and let the parser report it if invalid.
                        TokenType::Equals
                    }
                } else {
                    TokenType::Equals
                }
            }
            '+' => TokenType::Plus,
            '-' => {
                // Handle arrows starting with '-'
                if self.peek() == Some('.') {
                    self.advance(); // consume '.'
                    if self.advance_if(|c| c == '>').is_some() {
                        TokenType::DottedArrowRight
                    } else {
                        // Error: Malformed '-.'
                        // Treat as minus, let parser handle
                        debug!("Malformed dotted arrow: '-.'");
                        TokenType::Minus
                    }
                } else if self.peek() == Some('-') {
                    self.advance(); // consume second '-'
                    if self.advance_if(|c| c == '>').is_some() {
                        TokenType::ArrowRight
                    } else if self.advance_if(|c| c == 'x').is_some() {
                        TokenType::CrossArrowRight
                    } else if self.advance_if(|c| c == 'o').is_some() {
                        TokenType::OpenCircleArrowRight
                    } else {
                        // Could be '---' for text
                        let _ = self.advance_while(|ch| ch == '-'); // Consume remaining dashes
                        // This needs more context for ArrowRightWithText, handle in parser for now
                        TokenType::ArrowRight // Simplified for now, parser will differentiate
                    }
                } else {
                    TokenType::Minus
                }
            }
            '*' => TokenType::Star,
            '#' => TokenType::Hash,
            '%' => {
                if self.peek() == Some('%') {
                    self.advance(); // consume second '%'
                    let comment_text = self.advance_while(|ch| ch != '\n');
                    TokenType::LineComment(comment_text.trim_end().to_string())
                } else {
                    TokenType::Percent
                }
            }
            '|' => TokenType::Pipe,
            '\\' => TokenType::Backslash,
            '<' => {
                if self.advance_if(|c| c == '=').is_some() {
                    if self.advance_if(|c| c == '=').is_some() {
                        if self.advance_if(|c| c == '>').is_some() {
                             TokenType::ThickDoubleArrow // <==>
                        } else {
                            TokenType::ThickArrowLeft // <==
                        }
                    } else {
                        // Malformed '<=': not a valid Mermaid arrow prefix.
                        debug!("Malformed arrow: '<='");
                        // Simplified: treat as the start of a left thick arrow
                        // and let the parser report it if invalid in context.
                        TokenType::ThickArrowLeft
                    }
                } else if self.advance_if(|c| c == '-').is_some() {
                    if self.peek() == Some('.') {
                        self.advance(); // consume '.'
                        TokenType::DottedArrowLeft
                    } else if self.advance_if(|c| c == '-').is_some() {
                        if self.advance_if(|c| c == '-').is_some() {
                            // Could be <--- text --->, let parser parse the text
                            TokenType::DoubleArrowWithText // Simplified
                        } else {
                            TokenType::ArrowLeft // <--
                        }
                    } else {
                        // Error: Malformed '<-'
                        debug!("Malformed arrow: '<-'");
                        TokenType::ArrowLeft // Simplified
                    }
                } else {
                    // This is problematic. '<' alone is not a valid token in Mermaid
                    // For now, treat as invalid character.
                    self.report_lexer_error(
                        ErrorCode::M001,
                        Span::new(
                            self.source_id,
                            start_offset,
                            self.current_offset,
                            start_line,
                            self.current_line,
                            start_column,
                            self.current_column,
                        ),
                        format!("Unexpected character '{}'. Mermaid arrows usually start with '-' or '<-'.", c),
                    );
                    return None; // Indicate an error, lexer will skip this char
                }
            }
            '"' | '\'' => {
                let quote_char = c;
                let mut content = String::new();
                let mut terminated = false;
                while let Some(next_char) = self.peek() {
                    if next_char == quote_char {
                        self.advance(); // consume closing quote
                        terminated = true;
                        break;
                    }
                    if next_char == '\n' {
                        // String literals cannot span multiple lines without explicit escaping
                        break;
                    }
                    content.push(self.advance().unwrap());
                }

                if !terminated {
                    self.report_lexer_error(
                        ErrorCode::M002,
                        Span::new(
                            self.source_id,
                            start_offset,
                            self.current_offset,
                            start_line,
                            self.current_line,
                            start_column,
                            self.current_column,
                        ),
                        format!("Unterminated string literal. Expected '{}'.", quote_char),
                    );
                    return None; // Indicate error
                }
                TokenType::StringLiteral(content)
            }
            c if c.is_ascii_digit() => {
                let num_str = c.to_string() + &self.advance_while(|ch| ch.is_ascii_digit() || ch == '.');
                // Basic check, full number validation could be a parser/validator task
                TokenType::Number(num_str)
            }
            c if c.is_alphabetic() => {
                let ident_str = c.to_string() + &self.advance_while(|ch| ch.is_alphanumeric() || ch == '_');
                match ident_str.as_str() {
                    // Keywords
                    "graph" => TokenType::Graph,
                    "flowchart" => TokenType::Flowchart,
                    "sequenceDiagram" => TokenType::SequenceDiagram,
                    "classDiagram" => TokenType::ClassDiagram,
                    "stateDiagram" => TokenType::StateDiagram,
                    "gantt" => TokenType::Gantt,
                    "pie" => TokenType::Pie,
                    "gitGraph" => TokenType::GitGraph,
                    "erDiagram" => TokenType::Ermaid, // Typo: ermaid -> erDiagram
                    "journey" => TokenType::Journey,
                    "C4Context" => TokenType::C4c,
                    "mindmap" => TokenType::Mindmap,
                    "timeline" => TokenType::Timeline,
                    "block" => TokenType::Block,
                    "sankey" => TokenType::SanKey,
                    "requirementDiagram" => TokenType::RequirementDiagram,

                    // Directions
                    "TD" | "TB" => TokenType::TD,
                    "BT" => TokenType::BT,
                    "LR" => TokenType::LR,
                    "RL" => TokenType::RL,
                    _ => TokenType::Identifier(ident_str),
                }
            }
            _ => {
                let span = Span::new(
                    self.source_id,
                    start_offset,
                    self.current_offset,
                    start_line,
                    self.current_line,
                    start_column,
                    self.current_column,
                );
                self.report_lexer_error(
                    ErrorCode::M001,
                    span,
                    format!("Unexpected character '{}'", c),
                );
                return None; // Indicate error, lexer will skip this char
            }
        };
        Some(token_type)
    }

    fn skip_whitespace(&mut self) {
        let _ = self.advance_while(|c| c.is_whitespace() && c != '\n');
    }

    // Helper to report a lexer-specific diagnostic
    fn report_lexer_error(&mut self, code: ErrorCode, span: Span, message: String) {
        let diag = Diagnostic::error(code)
            .with_message(message)
            .with_primary_label(span, "unexpected character")
            .build();
        self.diagnostics.push(diag);
    }
}

Explanation of Lexer Changes:

  • Lexer struct: Now includes source_id, current_offset, current_line, current_column to track the exact position, and diagnostics: Vec<Diagnostic> to collect errors.
  • lex() method: Returns (Vec<Token>, Vec<Diagnostic>), yielding all tokens it could produce along with any errors encountered. It repeatedly calls next_token; when next_token returns None (indicating an error was reported), it advances past the offending character and continues, attempting to recover.
  • Position Tracking: advance() method updates current_offset, current_line, current_column correctly, handling multi-byte UTF-8 characters and newlines.
  • Token Creation: Each Token now carries a Span built from start_offset, start_line, start_column (captured before the token is consumed) and current_offset, current_line, current_column (captured after).
  • Error Reporting:
    • For ErrorCode::M001 (Unexpected character), if an unhandled character is found, a Diagnostic is created using report_lexer_error and pushed to self.diagnostics. next_token then returns None.
    • For ErrorCode::M002 (Unterminated string), similar logic applies.
  • Arrow Parsing: The arrow parsing logic is still somewhat simplified for now. Complex arrows with text will be fully handled by the parser using these basic arrow tokens and identifiers/strings. The focus here is on correct tokenization and span tracking.
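The UTF-8-aware position tracking described above can be sketched standalone. The `Cursor` struct and its field names below are illustrative stand-ins for the chapter's Lexer, not its actual code:

```rust
// Minimal sketch of byte-offset/line/column tracking; `Cursor` is a
// hypothetical stand-in for the chapter's Lexer position fields.
struct Cursor<'a> {
    chars: std::str::Chars<'a>,
    offset: usize, // byte offset into the source
    line: usize,   // 1-based line number
    column: usize, // 1-based column number (counted in chars, not bytes)
}

impl<'a> Cursor<'a> {
    fn new(src: &'a str) -> Self {
        Cursor { chars: src.chars(), offset: 0, line: 1, column: 1 }
    }

    fn advance(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        // Multi-byte UTF-8 characters advance the byte offset by more than 1.
        self.offset += c.len_utf8();
        if c == '\n' {
            self.line += 1;
            self.column = 1; // newlines reset the column and bump the line
        } else {
            self.column += 1;
        }
        Some(c)
    }
}

fn main() {
    let mut cur = Cursor::new("é\nx");
    cur.advance(); // 'é' occupies 2 bytes but 1 column
    assert_eq!((cur.offset, cur.line, cur.column), (2, 1, 2));
    cur.advance(); // newline
    assert_eq!((cur.offset, cur.line, cur.column), (3, 2, 1));
    cur.advance();
    assert_eq!((cur.offset, cur.line, cur.column), (4, 2, 2));
    println!("ok");
}
```

Spans built from these fields stay correct even for non-ASCII labels, which is what lets ariadne underline the exact characters later.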

3.6 Integrating Diagnostics into Parser Errors

The parser needs to also collect diagnostics. When it encounters a syntax error (e.g., unexpected token, missing token), it should create a Diagnostic with an appropriate error code and span.

First, update the src/parser/parser.rs imports:

// src/parser/parser.rs (partial, imports only)

use crate::lexer::token::{Token, TokenType};
use crate::parser::ast::*;
use crate::diagnostics::span::{Span, SourceId};
use crate::diagnostics::{Diagnostic, DiagnosticBuilder, error_codes::{ErrorCode, Severity}}; // New imports
use log::{debug, error, warn};

// ... rest of the file

Now, modify the Parser struct and its methods. The parse method will now return (Option<Diagram>, Vec<Diagnostic>).

// src/parser/parser.rs (partial, modify Parser struct and parse method)

pub struct Parser<'a> {
    tokens: &'a [Token],
    current: usize,
    source_id: SourceId,
    diagnostics: Vec<Diagnostic>, // Collect diagnostics
}

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [Token], source_id: SourceId) -> Self {
        Parser {
            tokens,
            current: 0,
            source_id,
            diagnostics: Vec::new(),
        }
    }

    /// Parses the entire stream of tokens into a Diagram AST,
    /// collecting all encountered diagnostics.
    pub fn parse(mut self) -> (Option<Diagram>, Vec<Diagnostic>) {
        debug!("Starting parsing...");

        // Ensure we have tokens to process
        if self.peek().token_type == TokenType::EOF && self.tokens.len() == 1 {
            let diag = Diagnostic::error(ErrorCode::M102)
                .with_message("Empty input: missing diagram type declaration")
                .with_primary_label(self.peek().span, "expected 'graph TD', 'sequenceDiagram', etc.")
                .build();
            self.diagnostics.push(diag);
            return (None, self.diagnostics);
        }

        let diagram = self.parse_diagram_declaration();
        let mut ast = match diagram {
            Ok(d) => d,
            Err(_) => {
                // Error already reported by parse_diagram_declaration
                // Attempt to synchronize by skipping until a new line or EOF
                self.synchronize();
                return (None, self.diagnostics);
            }
        };

        match &mut ast {
            Diagram::Flowchart(flowchart) => {
                while self.peek().token_type != TokenType::EOF {
                    match self.parse_flowchart_statement() {
                        Ok(Some(stmt)) => flowchart.statements.push(stmt),
                        Ok(None) => {
                            // Recovered, but no statement parsed (e.g., skip whitespace)
                            self.synchronize(); // Try to skip to next statement
                        },
                        Err(_) => {
                            // Error reported by parse_flowchart_statement, attempt to synchronize
                            self.synchronize();
                        }
                    }
                }
            }
            // ... handle other diagram types similarly if they have statements
            _ => {
                // For now, other diagram types might not have further statements parsed
                // We'll expand this as we implement more diagram types.
                debug!("Diagram type {:?} does not yet support further statement parsing.", ast);
            }
        }


        if self.diagnostics.iter().any(|d| d.severity == Severity::Error) {
            (None, self.diagnostics) // If any errors, return None AST
        } else {
            (Some(ast), self.diagnostics) // Otherwise, return AST
        }
    }

    // Helper to report a parser-specific diagnostic
    fn report_parser_error(&mut self, code: ErrorCode, span: Span, message: String, primary_label_msg: Option<String>) {
        let mut builder = Diagnostic::error(code)
            .with_message(message);
        if let Some(label_msg) = primary_label_msg {
            builder = builder.with_primary_label(span, label_msg);
        } else {
            builder = builder.with_primary_label(span, self.peek().token_type.to_string());
        }
        if let Some(help_msg) = code.help() {
            builder = builder.with_help(help_msg);
        }
        self.diagnostics.push(builder.build());
    }

    // Helper to consume a token of expected type, reporting an error if not found.
    fn consume(&mut self, expected: TokenType, error_code: ErrorCode, error_msg: &str) -> Result<Token, Diagnostic> {
        let token = self.peek().clone(); // clone so the borrow ends before `self.advance()`
        if token.token_type == expected {
            Ok(self.advance())
        } else {
            let span = token.span;
            let current_token_type = token.token_type.to_string();
            let diag = Diagnostic::error(error_code)
                .with_message(format!("{}. Expected '{}', but found '{}'.", error_msg, expected.to_string(), current_token_type))
                .with_primary_label(span, format!("expected '{}'", expected.to_string()))
                .with_note(format!("The parser expected a '{}' here to continue parsing this construct.", expected.to_string()))
                .with_help(error_code.help().unwrap_or("Review Mermaid syntax for this section.").to_string())
                .build();
            self.diagnostics.push(diag.clone());
            Err(diag) // Return an Err to indicate failure in this specific parsing step
        }
    }

    // New method for error recovery: skip tokens until a likely synchronization point.
    fn synchronize(&mut self) {
        debug!("Attempting to synchronize parser after error.");
        self.advance(); // Consume the erroneous token

        while self.peek().token_type != TokenType::EOF {
            match self.previous().token_type {
                TokenType::SemiColon | TokenType::NewLine => return, // End of statement
                TokenType::OpenBrace | TokenType::OpenParen | TokenType::OpenBracket => return, // Start of a new block
                _ => {}
            }
            // Skip until we find a likely start of a new statement or block
            match self.peek().token_type {
                TokenType::Graph | TokenType::Flowchart | TokenType::SequenceDiagram | TokenType::ClassDiagram => return,
                TokenType::NewLine => {
                    self.advance(); // Consume newline and return
                    return;
                },
                _ => {
                    self.advance();
                }
            }
        }
    }

    // --- Modify existing parsing methods to use diagnostics ---

    fn parse_diagram_declaration(&mut self) -> Result<Diagram, Diagnostic> {
        let start_token = self.peek().clone(); // clone so the borrow ends before `self.advance()`
        match start_token.token_type {
            TokenType::Graph | TokenType::Flowchart => {
                self.advance(); // consume 'graph' or 'flowchart'
                let direction_token = self.peek().clone(); // clone to avoid borrowing across `self.advance()`
                let direction = match direction_token.token_type {
                    TokenType::TD | TokenType::TB => { self.advance(); FlowchartDirection::TD },
                    TokenType::BT => { self.advance(); FlowchartDirection::BT },
                    TokenType::LR => { self.advance(); FlowchartDirection::LR },
                    TokenType::RL => { self.advance(); FlowchartDirection::RL },
                    _ => {
                        let diag = Diagnostic::error(ErrorCode::M102)
                            .with_message(format!("Missing or invalid direction for '{}' diagram.", start_token.token_type.to_string()))
                            .with_primary_label(direction_token.span, "expected 'TD', 'BT', 'LR', or 'RL'")
                            .with_help(ErrorCode::M102.help().unwrap().to_string())
                            .build();
                        self.diagnostics.push(diag.clone());
                        // Attempt to recover by assuming TD and continuing
                        FlowchartDirection::TD
                    }
                };
                return Ok(Diagram::Flowchart(FlowchartDiagram { direction, statements: Vec::new() }));
            }
            TokenType::SequenceDiagram => {
                self.advance();
                return Ok(Diagram::Sequence(SequenceDiagram { statements: Vec::new() }));
            }
            TokenType::ClassDiagram => {
                self.advance();
                return Ok(Diagram::Class(ClassDiagram { statements: Vec::new() }));
            }
            _ => {
                let diag = Diagnostic::error(ErrorCode::M102)
                    .with_message(format!("Missing or invalid diagram type declaration. Found '{}'.", start_token.token_type.to_string()))
                    .with_primary_label(start_token.span, "expected 'graph', 'flowchart', 'sequenceDiagram', etc.")
                    .with_help(ErrorCode::M102.help().unwrap().to_string())
                    .build();
                self.diagnostics.push(diag.clone());
                return Err(diag);
            }
        };
    }

    // Example of a statement parser that might return multiple diagnostics or an error
    fn parse_flowchart_statement(&mut self) -> Result<Option<FlowchartStatement>, Diagnostic> {
        self.skip_newlines_and_whitespace(); // Helper to advance past newlines/whitespace

        if self.peek().token_type == TokenType::EOF {
            return Ok(None);
        }

        let start_token = self.peek().clone(); // clone so the borrow ends before `self.advance()`
        if matches!(start_token.token_type, TokenType::Identifier(_) | TokenType::StringLiteral(_)) {
            let node_id_token = self.advance();
            let node_id = match &node_id_token.token_type {
                TokenType::Identifier(s) => s.clone(),
                TokenType::StringLiteral(s) => s.clone(),
                _ => unreachable!(), // Handled by if condition
            };

            // Node definition or edge
            if self.peek().token_type == TokenType::OpenBracket {
                // Node definition: A[Label]
                self.advance(); // consume '['
                let label_token = self.peek().clone(); // clone to avoid borrowing across `self.advance()`
                let label = match &label_token.token_type {
                    TokenType::Identifier(s) => { self.advance(); s.clone() },
                    TokenType::StringLiteral(s) => { self.advance(); s.clone() },
                    _ => {
                        let diag = Diagnostic::error(ErrorCode::M103)
                            .with_message("Expected a node label inside brackets.")
                            .with_primary_label(label_token.span, "expected label (identifier or string)")
                            .with_help(ErrorCode::M103.help().unwrap().to_string())
                            .build();
                        self.diagnostics.push(diag);
                        // Attempt to recover by using a dummy label
                        "MISSING_LABEL".to_string()
                    }
                };
                match self.consume(TokenType::CloseBracket, ErrorCode::M101, "Expected ']' to close node label.") {
                    Ok(_) => { /* OK */ },
                    Err(_) => { /* Error already reported */ }
                }
                return Ok(Some(FlowchartStatement::Node(Node { id: node_id, label: Some(label) })));
            } else if self.is_arrow_token(&self.peek().token_type) {
                // Edge definition: A --> B
                let arrow_token = self.advance(); // consume arrow
                let end_node_token = self.peek().clone(); // clone to avoid borrowing across `self.advance()`
                let end_node_id = match &end_node_token.token_type {
                    TokenType::Identifier(s) => { self.advance(); s.clone() },
                    TokenType::StringLiteral(s) => { self.advance(); s.clone() },
                    _ => {
                        let diag = Diagnostic::error(ErrorCode::M104)
                            .with_message("Expected an end node identifier after arrow.")
                            .with_primary_label(end_node_token.span, "expected node ID")
                            .with_help(ErrorCode::M104.help().unwrap().to_string())
                            .build();
                        self.diagnostics.push(diag);
                        // Attempt to recover by using a dummy node
                        "UNDEFINED_NODE".to_string()
                    }
                };
                return Ok(Some(FlowchartStatement::Edge(Edge {
                    from: node_id,
                    to: end_node_id,
                    arrow_type: self.map_token_to_arrow_type(&arrow_token.token_type),
                    label: None, // For now, no labels on edges
                })));
            } else {
                // Just a node definition without a label or an edge
                return Ok(Some(FlowchartStatement::Node(Node { id: node_id, label: None })));
            }
        }

        // If we reach here, it's an unexpected token at the start of a statement
        let diag = Diagnostic::error(ErrorCode::M100)
            .with_message(format!("Unexpected token '{}' at the start of a statement.", start_token.token_type.to_string()))
            .with_primary_label(start_token.span, "unexpected token")
            .with_help("Expected a node ID, an edge, or a new diagram declaration. This might indicate a syntax error or a missing semicolon.")
            .build();
        self.diagnostics.push(diag.clone()); // clone: the original is returned to the caller below
        Err(diag) // Indicate error, parser will synchronize
    }

    fn skip_newlines_and_whitespace(&mut self) {
        while self.peek().token_type == TokenType::NewLine || self.peek().token_type == TokenType::Whitespace {
            self.advance();
        }
    }

    // ... other helper methods like is_arrow_token, map_token_to_arrow_type
    fn is_arrow_token(&self, token_type: &TokenType) -> bool {
        matches!(
            token_type,
            TokenType::ArrowRight
                | TokenType::ArrowLeft
                | TokenType::DoubleArrow
                | TokenType::ThickArrowRight
                | TokenType::ThickArrowLeft
                | TokenType::ThickDoubleArrow
                | TokenType::DottedArrowRight
                | TokenType::DottedArrowLeft
                | TokenType::DottedDoubleArrow
                | TokenType::CrossArrowRight
                | TokenType::OpenCircleArrowRight
                | TokenType::ArrowRightWithText
                | TokenType::ArrowLeftWithText
                | TokenType::DoubleArrowWithText
                | TokenType::ThickArrowRightWithText
                | TokenType::DottedArrowRightWithText
        )
    }

    fn map_token_to_arrow_type(&self, token_type: &TokenType) -> ArrowType {
        match token_type {
            TokenType::ArrowRight => ArrowType::Solid,
            TokenType::ArrowLeft => ArrowType::SolidLeft,
            TokenType::DoubleArrow => ArrowType::SolidDouble,
            TokenType::ThickArrowRight => ArrowType::Thick,
            TokenType::ThickArrowLeft => ArrowType::ThickLeft,
            TokenType::ThickDoubleArrow => ArrowType::ThickDouble,
            TokenType::DottedArrowRight => ArrowType::Dotted,
            TokenType::DottedArrowLeft => ArrowType::DottedLeft,
            TokenType::DottedDoubleArrow => ArrowType::DottedDouble,
            TokenType::CrossArrowRight => ArrowType::Cross,
            TokenType::OpenCircleArrowRight => ArrowType::OpenCircle,
            // For now, these are simplified, actual text parsing will happen here later
            TokenType::ArrowRightWithText => ArrowType::Solid,
            TokenType::ArrowLeftWithText => ArrowType::SolidLeft,
            TokenType::DoubleArrowWithText => ArrowType::SolidDouble,
            TokenType::ThickArrowRightWithText => ArrowType::Thick,
            TokenType::DottedArrowRightWithText => ArrowType::Dotted,
            _ => {
                warn!("Unknown arrow token type encountered: {:?}", token_type);
                ArrowType::Solid // Default to solid arrow for now
            }
        }
    }

    fn peek(&self) -> &Token {
        self.tokens.get(self.current).unwrap_or_else(|| {
            // Should not happen if EOF is always the last token
            &self.tokens[self.tokens.len() - 1]
        })
    }

    fn previous(&self) -> &Token {
        // saturating_sub guards against underflow when nothing has been consumed yet
        self.tokens.get(self.current.saturating_sub(1)).unwrap_or_else(|| {
            &self.tokens[0]
        })
    }

    fn advance(&mut self) -> Token {
        if self.current < self.tokens.len() {
            self.current += 1;
        }
        self.previous().clone()
    }
}

Explanation of Parser Changes:

  • Parser struct: Now includes diagnostics: Vec<Diagnostic>.
  • parse() method:
    • Returns (Option<Diagram>, Vec<Diagnostic>). If any error diagnostics are collected, it returns None for the Diagram to signify an invalid AST.
    • It now calls parse_diagram_declaration and parse_flowchart_statement which can return Result<_, Diagnostic>.
    • If an Err(Diagnostic) is returned, it means a local parsing failure occurred, and the diagnostic is already pushed to self.diagnostics. The parser then calls synchronize().
  • report_parser_error(): A helper similar to the lexer’s, for creating and storing parser-specific diagnostics.
  • consume(): This critical helper now returns Result<Token, Diagnostic>. If the expected token isn’t found, it creates and stores a diagnostic with the supplied error code (e.g., M101 for a missing closing bracket) and returns Err. This allows calling methods to react to the error (e.g., attempt recovery).
  • synchronize(): This is a basic error recovery mechanism. After an error, it attempts to advance the parser’s current pointer past the problematic token(s) until it finds a “safe” point (like a semicolon, newline, or a keyword starting a new diagram/block). This prevents a single error from cascading into many irrelevant errors and allows the parser to find more distinct issues.
  • parse_diagram_declaration() and parse_flowchart_statement(): These methods now use consume() and report_parser_error() to generate diagnostics. They also try to return partial results or default values (e.g., FlowchartDirection::TD on error) to allow parsing to continue, even if the AST is technically malformed.
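The panic-mode recovery that synchronize() performs can be sketched standalone. The `Tok` enum below is an illustrative stand-in for our TokenType, not the chapter's actual type:

```rust
// Standalone sketch of panic-mode error recovery: after an error, skip
// forward to just past a statement boundary so one mistake doesn't
// cascade into many spurious diagnostics. `Tok` is hypothetical.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tok { Ident, Arrow, NewLine, SemiColon, Garbage, Eof }

/// Skip from `pos` (the erroneous token) until just past a newline or
/// semicolon, or until EOF; returns the position to resume parsing at.
fn synchronize(tokens: &[Tok], mut pos: usize) -> usize {
    pos += 1; // consume the erroneous token
    while pos < tokens.len() && tokens[pos] != Tok::Eof {
        match tokens[pos] {
            // A statement boundary: resume on the token after it.
            Tok::NewLine | Tok::SemiColon => return pos + 1,
            _ => pos += 1,
        }
    }
    pos
}

fn main() {
    let toks = [Tok::Ident, Tok::Garbage, Tok::Garbage,
                Tok::NewLine, Tok::Ident, Tok::Eof];
    // Error reported at index 1; recovery lands on the Ident after the newline.
    assert_eq!(synchronize(&toks, 1), 4);
    println!("ok");
}
```

The real synchronize() additionally treats diagram keywords and opening braces/brackets as safe resumption points, but the skip-to-boundary loop is the core idea.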

3.7 The DiagnosticEmitter (Reporter)

This component is responsible for taking our collected Diagnostics and rendering them beautifully using ariadne.

Create the file src/diagnostics/emitter.rs:

// src/diagnostics/emitter.rs

use std::collections::HashMap;
use ariadne::{Color, Fmt, Label, Report, ReportKind, Source};

use crate::diagnostics::{Diagnostic, DiagnosticLabel, error_codes::Severity, span::SourceId};

/// Manages and emits diagnostics in a compiler-like format.
pub struct DiagnosticEmitter<'a> {
    sources: HashMap<SourceId, Source<&'a str>>, // Stores source code for highlighting
}

impl<'a> DiagnosticEmitter<'a> {
    pub fn new() -> Self {
        DiagnosticEmitter {
            sources: HashMap::new(),
        }
    }

    /// Adds a source file to the emitter. The source content is needed for highlighting.
    pub fn add_source(&mut self, source_id: SourceId, content: &'a str) {
        self.sources.insert(source_id, Source::from(content));
    }

    /// Emits a collection of diagnostics to stderr.
    /// Returns true if any errors were emitted, false otherwise.
    pub fn emit_diagnostics(&self, diagnostics: &[Diagnostic]) -> bool {
        let mut has_errors = false;

        for diag in diagnostics {
            let report_kind = match diag.severity {
                Severity::Error => {
                    has_errors = true;
                    ReportKind::Error
                }
                Severity::Warning => ReportKind::Warning,
                Severity::Note => ReportKind::Advice, // Ariadne uses Advice for notes
                Severity::Help => ReportKind::Advice, // Ariadne uses Advice for help
            };

            // Anchor the report on the primary label's span when present,
            // otherwise fall back to the first label (or a dummy span for
            // source-less, global diagnostics).
            let primary_label_span = diag.labels.iter()
                .find(|l| l.is_primary)
                .or_else(|| diag.labels.first())
                .map(|l| l.span);

            let anchor = primary_label_span.unwrap_or_else(crate::diagnostics::span::Span::dummy);
            let report_builder = Report::build(report_kind, anchor.source_id, anchor.start);


            let mut report = report_builder
                .with_code(format!("{:?}", diag.code))
                .with_message(&diag.message);

            // Add labels
            for label_data in &diag.labels {
                let color = match diag.severity {
                    Severity::Error => Color::Red,
                    Severity::Warning => Color::Yellow,
                    Severity::Note => Color::Blue,
                    Severity::Help => Color::Green,
                };
                let label = Label::new(label_data.span)
                    .with_message(&label_data.message)
                    .with_color(color);
                report = report.with_label(label);
            }

            // Add notes
            for note in &diag.notes {
                report = report.with_note(note);
            }

            // Add help message
            if let Some(help_msg) = &diag.help {
                report = report.with_help(help_msg);
            }

            // Emit the report
            // Resolve the anchor span once instead of unwrapping it twice.
            let anchor = primary_label_span.unwrap_or_else(crate::diagnostics::span::Span::dummy);
            if let Some(source) = self.sources.get(&anchor.source_id) {
                report.finish().print((&anchor.source_id, source))
                    .expect("Failed to print diagnostic report");
            } else {
                // If source is not found, print a simpler message
                eprintln!("{}: {}: {}",
                    match diag.severity {
                        Severity::Error => "error".fg(Color::Red),
                        Severity::Warning => "warning".fg(Color::Yellow),
                        Severity::Note => "note".fg(Color::Blue),
                        Severity::Help => "help".fg(Color::Green),
                    },
                    format!("{:?}", diag.code).fg(Color::Magenta),
                    diag.message
                );
                for note in &diag.notes {
                    eprintln!("  = note: {}", note);
                }
                if let Some(help_msg) = &diag.help {
                    eprintln!("  = help: {}", help_msg);
                }
            }
        }
        has_errors
    }
}

Explanation:

  • DiagnosticEmitter struct: Holds a HashMap of SourceId to ariadne::Source<&'a str>. This allows ariadne to retrieve the actual source code content for highlighting.
  • add_source(): Call this once for each input file to register its content with the emitter.
  • emit_diagnostics():
    • Iterates through each Diagnostic.
    • Maps our Severity to ariadne::ReportKind.
    • Constructs an ariadne::Report using Report::build(). It’s crucial to provide a primary label’s span as the anchor for the report.
    • Adds all DiagnosticLabels to the ariadne::Report using Label::new(), assigning colors based on severity.
    • Adds notes and help messages.
    • Finally, report.finish().print() renders the diagnostic to stderr, using the registered source content for highlighting.
    • Includes a fallback print if the source content for a given SourceId isn’t found, to ensure some output is always produced.
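The plain-text fallback path can be sketched standalone, without pulling in ariadne. The `Diag` struct below mirrors, but is not, the chapter's Diagnostic type:

```rust
// Standalone sketch of the no-source fallback rendering: a rustc-style
// plain-text line with an optional help footer. `Diag` and `Severity`
// here are hypothetical stand-ins for the chapter's diagnostics types.
enum Severity { Error, Warning }

struct Diag {
    severity: Severity,
    code: &'static str,
    message: String,
    help: Option<String>,
}

/// Render without colors or source highlighting -- the shape the emitter
/// falls back to when no source content is registered for a SourceId.
fn render_plain(d: &Diag) -> String {
    let sev = match d.severity {
        Severity::Error => "error",
        Severity::Warning => "warning",
    };
    let mut out = format!("{}[{}]: {}", sev, d.code, d.message);
    if let Some(h) = &d.help {
        out.push_str(&format!("\n  = help: {}", h));
    }
    out
}

fn main() {
    let d = Diag {
        severity: Severity::Error,
        code: "M002",
        message: "Unterminated string literal".into(),
        help: Some("Close the string with a matching quote.".into()),
    };
    let text = render_plain(&d);
    assert!(text.starts_with("error[M002]: Unterminated string literal"));
    assert!(text.contains("= help:"));
    println!("{}", text);
}
```

Keeping this degraded path ensures a diagnostic is never silently dropped just because its source file was not registered with add_source().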

3.8 Updating main.rs to Use Diagnostics

Now, let’s update src/main.rs to wire everything together. We’ll read input, lex it, parse it, collect diagnostics, and then emit them.

// src/main.rs

mod lexer;
mod parser;
mod diagnostics;

use lexer::lexer::Lexer;
use parser::parser::Parser;
use diagnostics::emitter::DiagnosticEmitter;
use diagnostics::span::SourceId;
use log::{error, info, debug};
use env_logger::Env;
use std::fs;

fn main() {
    // Initialize logger for debug messages
    env_logger::Builder::from_env(Env::default().default_filter_or("info")).init();

    let args: Vec<String> = std::env::args().collect();
    let file_path = args.get(1).expect("Please provide a Mermaid file path as an argument.");
    let source_code = fs::read_to_string(file_path)
        .unwrap_or_else(|err| {
            error!("Failed to read file {}: {}", file_path, err);
            std::process::exit(1);
        });

    // `file_path` is a `&String`, so clone before leaking to obtain a `&'static str`.
    let source_id = SourceId(file_path.clone().leak()); // For a real app, manage lifetimes or use String.
    info!("Analyzing Mermaid file: {}", file_path);

    // 1. Lexing
    debug!("Starting lexing...");
    let lexer = Lexer::new(&source_code, source_id);
    let (tokens, lexer_diagnostics) = lexer.lex();
    debug!("Lexing completed. Found {} tokens.", tokens.len());
    // debug!("Tokens: {:?}", tokens);

    // 2. Parsing
    debug!("Starting parsing...");
    let parser = Parser::new(&tokens, source_id);
    let (ast, parser_diagnostics) = parser.parse();
    debug!("Parsing completed.");
    // debug!("AST: {:?}", ast);

    // 3. Emit Diagnostics
    let mut emitter = DiagnosticEmitter::new();
    emitter.add_source(source_id, &source_code);

    let mut all_diagnostics = Vec::new();
    all_diagnostics.extend(lexer_diagnostics);
    all_diagnostics.extend(parser_diagnostics);

    if !all_diagnostics.is_empty() {
        info!("Emitting diagnostics...");
        let has_errors = emitter.emit_diagnostics(&all_diagnostics);
        if has_errors {
            error!("Analysis completed with errors.");
            std::process::exit(1);
        } else {
            info!("Analysis completed with warnings.");
        }
    } else {
        info!("No diagnostics found. Mermaid code is syntactically valid.");
    }

    match ast {
        Some(diagram) => {
            info!("Successfully parsed AST: {:#?}", diagram);
            // In future chapters, we'll pass this AST to the validator and rule engine.
        }
        None => {
            error!("Failed to produce a valid AST due to fatal errors.");
            std::process::exit(1);
        }
    }
}

Explanation:

  • Logging: env_logger is initialized for better terminal output of info!, debug!, error! macros.
  • File Reading: Reads the Mermaid file provided as a CLI argument.
  • SourceId: We’re leaking the file_path string to get a &'static str for SourceId. In a more complex application, you’d manage source lifetime more explicitly (e.g., using Arc<String> or passing String ownership). For a simple CLI tool, leak() is often acceptable if the source content is needed for the entire program lifetime.
  • Lexer and Parser Calls: The main function now calls lexer.lex() and parser.parse(), collecting their respective diagnostics.
  • DiagnosticEmitter: An instance is created, add_source is called to register the input file’s content, and then emit_diagnostics is called with all collected diagnostics.
  • Exit Code: The program exits with a non-zero code if any errors were reported, following standard CLI tool conventions.

Testing This Component

Let’s create a test Mermaid file with some intentional errors to see our diagnostics in action.

Create a file named test.mmd in your project root:

graph TD
    A[Start] --> B(Process)
    B --x C[End
    D --x A // Node D is not defined
    E[Another Node] <-- Malformed Arrow
    F[Node F] -.- G[Node G
    H == I
    J[Unterminated String "
    K[Node K]
    invalid keyword

Now, run your tool with this file:

cargo run -- test.mmd

Expected Output (will vary slightly based on terminal colors and exact ariadne version, but should be similar in structure):

info: mermaid_analyzer: Analyzing Mermaid file: test.mmd
debug: mermaid_analyzer: Starting lexing...
debug: mermaid_analyzer: Lexing completed. Found 32 tokens.
debug: mermaid_analyzer: Starting parsing...
debug: mermaid_analyzer: Parsing completed.
info: mermaid_analyzer: Emitting diagnostics...
error[M002]: Unterminated string literal
  ┌─ test.mmd:8:19
  │
8 │     J[Unterminated String "
  │                   ---------
  │                   │
  │                   unterminated string literal
  = help: String literals must be closed with a matching quote. Add a '"' or apostrophe to terminate the string.

error[M101]: Expected ']' to close node label. Expected 'CloseBracket', but found 'NewLine'.
  ┌─ test.mmd:3:19
  │
3 │     B --x C[End
  │                   ^ expected ']'
  = note: The parser expected a 'CloseBracket' here to continue parsing this construct.
  = help: Ensure all parentheses, brackets, and braces are correctly matched and balanced. Each opening symbol must have a corresponding closing symbol.

error[M101]: Expected ']' to close node label. Expected 'CloseBracket', but found 'NewLine'.
  ┌─ test.mmd:6:21
  │
6 │     F[Node F] -.- G[Node G
  │                     ^ expected ']'
  = note: The parser expected a 'CloseBracket' here to continue parsing this construct.
  = help: Ensure all parentheses, brackets, and braces are correctly matched and balanced. Each opening symbol must have a corresponding closing symbol.

error[M100]: Unexpected token 'Identifier("invalid")' at the start of a statement.
  ┌─ test.mmd:9:5
  │
9 │     invalid keyword
  │     ^^^^^^^ unexpected token
  = note: Expected a node ID, an edge, or a new diagram declaration. This might indicate a syntax error or a missing semicolon.
  = help: Expected token, found something else. This usually indicates a syntax error. Check the Mermaid syntax documentation for this construct.

error[M104]: Expected an end node identifier after arrow.
  ┌─ test.mmd:5:20
  │
5 │     E[Another Node] <-- Malformed Arrow
  │                        ^^^^^^^^^^^^^^^ expected node ID
  = note: The parser expected a 'Identifier' here to continue parsing this construct.
  = help: An edge definition must specify two nodes and a valid arrow type (e.g., 'A --> B'). Check for missing nodes, invalid arrow syntax, or extra characters.

error: Analysis completed with errors.

This output demonstrates:

  • Clear error[MXXX] codes.
  • Descriptive messages.
  • Precise line and column numbers.
  • Code highlighting with ariadne.
  • Contextual notes and help messages.

This is a significant improvement over simple error strings!

Production Considerations

  1. Performance:

    • String Allocations: Minimize String allocations for diagnostic messages where possible (e.g., using &'static str for ErrorCode messages). ariadne itself is optimized for performance.
    • Source Management: For very large files or multiple files, loading all source into HashMap<SourceId, Source<&'a str>> might consume memory. Consider a SourceManager that can load sources lazily or stream them if ariadne supports it for extremely large files (though usually not an issue for typical Mermaid diagrams).
    • Diagnostic Collection: The Vec<Diagnostic> approach is efficient for most cases. For an extreme number of diagnostics (e.g., parsing a huge, completely malformed file), consider a bounded collection or early exit if a certain error threshold is met.
  2. Logging and Monitoring:

    • Integration: Diagnostics are a specific type of log. Ensure they can be easily integrated with broader application logging systems (e.g., tracing).
    • Structured Output: For CI/CD pipelines or IDE integrations, a machine-readable JSON output format for diagnostics would be invaluable. The Diagnostic struct is already structured; only the DiagnosticEmitter needs to be adapted to serialize to JSON instead of printing with ariadne.
  3. Internationalization (i18n):

    • For a truly global tool, diagnostic messages and help texts would need to be localized. This means abstracting messages from hardcoded strings, perhaps using a message catalog system. The ErrorCode enum provides a good key for this.
  4. Error Recovery Strategy:

    • The current synchronize() method is a basic panic-mode recovery. For production, more sophisticated error recovery (e.g., using error productions in the grammar or more context-aware skipping) might be necessary to find even more errors in highly malformed input. However, “strict correctness” often implies less recovery, failing early and clearly. Our current approach balances reporting multiple errors with not getting stuck.

Code Review Checkpoint

At this stage, we have successfully implemented a robust diagnostic system:

  • src/diagnostics/span.rs: Defines Span and SourceId for precise source location tracking, and implements ariadne::Span.
  • src/diagnostics/error_codes.rs: Defines Severity and ErrorCode with descriptive messages and helpful suggestions.
  • src/diagnostics/mod.rs: Contains the core Diagnostic struct and its DiagnosticBuilder for ergonomic creation.
  • src/diagnostics/emitter.rs: Implements DiagnosticEmitter using ariadne to render rich, compiler-style error messages to the console.
  • src/lexer/token.rs: Token struct now includes a Span.
  • src/lexer/lexer.rs: Modified to calculate and assign Spans to tokens, and to collect Diagnostics for lexical errors (M001, M002). It returns (Vec<Token>, Vec<Diagnostic>).
  • src/parser/parser.rs: Modified to consume tokens with Spans, generate Diagnostics for syntax errors (M100, M101, M102, M103, M104), and employ a basic synchronize() error recovery strategy. It returns (Option<Diagram>, Vec<Diagnostic>).
  • src/main.rs: Orchestrates the lexing, parsing, and diagnostic emission process, exiting with an error code if critical diagnostics are found.

The project now produces much more user-friendly and actionable feedback, a critical step towards a production-ready tool.

Common Issues & Solutions

  1. Misaligned Highlights in ariadne Output:

    • Issue: The highlighted code snippet doesn’t match the reported Span or points to the wrong character.
    • Cause: Incorrect start and end byte offsets in the Span struct, or current_offset not being updated correctly in the lexer. This is especially tricky with multi-byte UTF-8 characters.
    • Solution: Double-check the advance() method in lexer.rs to ensure current_offset is incremented by c.len_utf8() (not just 1). Carefully trace the start_offset and end_offset calculations for each token. Use debug! logs to print current_offset, current_line, current_column at various points in the lexer for verification.
    • Prevention: Always use char.len_utf8() for byte offset calculations when dealing with str::Chars.
  2. ariadne Panic: “No source found for SourceId…”:

    • Issue: ariadne panics because it can’t find the source content for a given SourceId when trying to print a report.
    • Cause: The DiagnosticEmitter::add_source() method was not called for the SourceId associated with the diagnostic, or the SourceId being used for diagnostics is different from the one registered.
    • Solution: Ensure emitter.add_source(source_id, &source_code); is called before emitter.emit_diagnostics(). Verify that the SourceId passed to the lexer and parser is the exact same SourceId used when registering the source with the emitter. For &'static str IDs, this means they must literally point to the same string data or be identical values.
    • Prevention: Centralize SourceId creation and management. For single-file CLI tools, ensure file_path.leak() (or similar) is consistent.
  3. Parser Stops at First Error, Doesn’t Report More:

    • Issue: Only one error message is printed, even if there are multiple obvious syntax errors in the input.
    • Cause: The parser’s error handling immediately returns Err without attempting to recover or collect subsequent errors. The parse method might not be iterating correctly after an error.
    • Solution: Ensure methods like parse_diagram_declaration or parse_flowchart_statement add diagnostics to self.diagnostics and then return Err(diag) to indicate local failure, but the caller (e.g., the main parse loop) should then call self.synchronize() and continue its loop. The parse method should return (Option<AST>, Vec<Diagnostic>) to signify that parsing might have continued even if the AST is incomplete/invalid.
    • Prevention: Design error handling with explicit recovery points and ensure diagnostic collection is prioritized over immediate termination.

Testing & Verification

To thoroughly test and verify our new diagnostic system, we should create a suite of test files covering various error scenarios:

  1. Lexical Errors:
    • lexer_error_unexpected_char.mmd: Contains characters not allowed in Mermaid (e.g., graph TD !@#$).
    • lexer_error_unterminated_string.mmd: graph TD A["Unclosed string]
  2. Parser Syntax Errors:
    • parser_error_missing_declaration.mmd: Starts directly with A --> B without graph TD.
    • parser_error_unmatched_bracket.mmd: graph TD A[Label --> B (missing ]).
    • parser_error_malformed_edge.mmd: graph TD A -- B (missing arrow head).
    • parser_error_unexpected_token.mmd: graph TD A[Label] B C (unexpected B after a node definition).
    • parser_error_invalid_direction.mmd: graph ZZ A --> B (invalid direction).
  3. Mixed Errors:
    • A file combining several types of errors to test error recovery and multiple diagnostic reporting.

Verification Steps:

  1. Run with each test file: cargo run -- <test_file.mmd>
  2. Check Output:
    • Error Codes: Are the MXXX codes correct for the type of error?
    • Messages: Are the main messages clear and accurate?
    • Spans/Highlights: Does ariadne highlight the correct section of the code?
    • Notes/Help: Are the supplementary notes and help messages relevant and actionable?
    • Multiple Errors: For files with multiple errors, does the tool report all of them (or as many as it can find before recovery becomes impossible)?
    • Exit Code: Does the program exit with 1 if any errors were reported, and 0 if only warnings or no diagnostics?

By systematically testing these cases, you can ensure your diagnostic system is robust and provides the high-quality feedback expected from a production-grade tool.

Summary & Next Steps

In this chapter, we significantly enhanced our Mermaid analyzer by implementing a sophisticated diagnostic system. We defined granular Spans for precise location tracking, established a set of unique ErrorCodes with Severity levels, and created a Diagnostic structure to encapsulate all error information. Crucially, we integrated this system into our Lexer and Parser, allowing them to collect and report detailed errors instead of merely failing. Finally, we built a DiagnosticEmitter leveraging the ariadne crate to present these diagnostics in a visually rich, compiler-style format, greatly improving the user experience.

Our tool can now not only detect issues but also explain them clearly and guide the user toward a solution. This foundation is critical for the next phase of our project.

In Chapter 7: Semantic Validation: Ensuring Correct Mermaid Structure, we will build upon this diagnostic system to implement a dedicated Validator component. This validator will traverse the Abstract Syntax Tree (AST) produced by the parser to detect semantic errors that the lexer and parser cannot catch, such as duplicate node IDs, undefined nodes in edges, and invalid structural nesting, further enhancing the correctness and reliability of our Mermaid analyzer.