Chapter Introduction
In the previous chapters, we laid the groundwork for our Mermaid analyzer by building a robust lexer and parser. While these components are crucial for understanding the Mermaid code’s structure, their current error reporting is rudimentary, often just returning a simple error message or panicking. For a production-grade tool that aims to mimic the reliability and user-friendliness of compilers like rustc, this is insufficient.
This chapter focuses on transforming our basic error handling into a sophisticated diagnostic system. We will design and implement a comprehensive Diagnostic structure capable of capturing detailed information about errors and warnings, including precise source code locations, unique error codes, severity levels, and actionable help messages. We will then integrate this system into our lexer and parser, enabling them to emit rich diagnostics instead of opaque error messages. Finally, we will build a DiagnosticEmitter that leverages the ariadne crate to render these diagnostics in a visually appealing, compiler-style format, complete with code highlighting and contextual information.
By the end of this chapter, our tool will no longer just fail; it will intelligently guide the user to understand what went wrong, where it went wrong, and how to fix it, significantly improving the developer experience and making our tool genuinely production-ready.
Planning & Design
A robust diagnostic system is at the heart of any reliable code analysis tool. It needs to provide clear, actionable feedback. Our design will follow these principles:
- Unified Diagnostic Structure: A single, extensible Diagnostic struct to represent all types of issues (errors, warnings, notes).
- Precise Location Tracking (Span): Every diagnostic must pinpoint the exact location in the source code using byte offsets, line numbers, and column numbers.
- Unique Error Codes: Each distinct error type will have a unique code (e.g., M001, M002) for easy reference and documentation.
- Severity Levels: Differentiate between critical errors, warnings, and informational notes.
- Contextual Information: Allow for multiple labels (primary, secondary) and supplementary notes or help messages.
- Pluggable Emitter: Separate the diagnostic data from its rendering, allowing different output formats (e.g., plain text, JSON, compiler-style). We'll use ariadne for rich console output.
- Non-Fatal Error Collection: The lexer and parser should collect all possible diagnostics rather than stopping at the first error, providing a comprehensive report.
Component Architecture
The diagnostic system will integrate with our existing lexer and parser, and later with the validator.
Explanation:
- SourceManager: A simple component that holds the input Mermaid code, allowing diagnostics to reference it for highlighting.
- Lexer: In addition to emitting Tokens, it now also emits Diagnostics if it encounters lexical errors. Each Token carries its Span.
- Parser: Consumes Tokens with Spans. If syntax errors occur, it emits Diagnostics. It attempts to recover and continue parsing to find more errors where possible, returning an Option<Ast> along with a Vec<Diagnostic>.
- DiagnosticCollector: A conceptual component that aggregates all diagnostics generated by the lexer, parser, and future validator.
- DiagnosticEmitter: Takes the collected Diagnostics and the original source code from the SourceManager to render them using ariadne.
File Structure
We’ll introduce a new src/diagnostics module:
src/
├── main.rs
├── lexer/
│ ├── mod.rs
│ └── token.rs
├── parser/
│ ├── mod.rs
│ └── ast.rs
└── diagnostics/
├── mod.rs # Main diagnostic struct and builder
├── error_codes.rs # Enum for all error codes
├── span.rs # Source code location tracking
└── emitter.rs # Diagnostic rendering logic (using ariadne)
Step-by-Step Implementation
3.1 Setup: Add ariadne Dependency
First, we need to add the ariadne crate to our project. ariadne is a fantastic library for producing beautiful, compiler-style error messages.
Open your Cargo.toml and add ariadne to the [dependencies] section:
# Cargo.toml
[package]
name = "mermaid-analyzer"
version = "0.1.0"
edition = "2021"
[dependencies]
ariadne = "0.4.0" # Add this line
log = "0.4"
env_logger = "0.11"
# ... other dependencies from previous chapters
Run cargo check to ensure the dependency is fetched.
3.2 Defining Span and SourceId
A Span represents a contiguous region in the source code. It’s crucial for pointing to the exact location of an error or warning. We’ll use byte offsets for internal consistency and ariadne compatibility, and also store line/column for human readability. SourceId will uniquely identify the source file (though for a single file CLI tool, it might just be “input” or the filename).
Create the file src/diagnostics/span.rs:
// src/diagnostics/span.rs
/// A unique identifier for a source file.
/// For a CLI tool processing a single input, this might just be "input".
#[derive(Debug, Clone, PartialEq, Eq, Hash, Copy)]
pub struct SourceId(pub &'static str); // Using &'static str for simplicity for now
impl From<&'static str> for SourceId {
fn from(s: &'static str) -> Self {
SourceId(s)
}
}
/// Represents a contiguous region in the source code.
/// Stores byte offsets for precise highlighting and line/column for human readability.
#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub struct Span {
pub source_id: SourceId,
pub start: usize, // Start byte offset
pub end: usize, // End byte offset (exclusive)
pub start_line: usize,
pub end_line: usize,
pub start_column: usize,
pub end_column: usize,
}
impl Span {
/// Creates a new span.
pub fn new(
source_id: SourceId,
start: usize,
end: usize,
start_line: usize,
end_line: usize,
start_column: usize,
end_column: usize,
) -> Self {
Self {
source_id,
start,
end,
start_line,
end_line,
start_column,
end_column,
}
}
/// Creates a dummy span for cases where a real span isn't available (e.g., internal errors).
pub fn dummy() -> Self {
Self {
source_id: SourceId("dummy"),
start: 0,
end: 0,
start_line: 1,
end_line: 1,
start_column: 1,
end_column: 1,
}
}
/// Returns the length of the span in bytes.
pub fn len(&self) -> usize {
self.end - self.start
}
/// Checks if the span is empty.
pub fn is_empty(&self) -> bool {
self.len() == 0
}
/// Combines two spans into a single span that covers both.
/// Assumes both spans are from the same source.
pub fn merge(&self, other: &Self) -> Self {
assert_eq!(self.source_id, other.source_id, "Cannot merge spans from different sources");
let start = self.start.min(other.start);
let end = self.end.max(other.end);
let start_line = self.start_line.min(other.start_line);
let end_line = self.end_line.max(other.end_line);
let start_column = if self.start_line == start_line {
self.start_column
} else {
other.start_column
};
let end_column = if self.end_line == end_line {
self.end_column
} else {
other.end_column
};
Self {
source_id: self.source_id,
start,
end,
start_line,
end_line,
start_column,
end_column,
}
}
}
// Implement ariadne's Span trait for our Span struct
impl ariadne::Span for Span {
type SourceId = SourceId;
fn source(&self) -> &Self::SourceId {
&self.source_id
}
fn start(&self) -> usize {
self.start
}
fn end(&self) -> usize {
self.end
}
}
Explanation:
- SourceId: A wrapper around &'static str to identify the source file. ariadne needs this to associate spans with their content.
- Span: Stores start and end byte offsets, which ariadne uses directly. We also store start_line, end_line, start_column, and end_column for convenience and more granular human-readable output, though ariadne can compute these from byte offsets if given the source.
- dummy(): Useful for internal errors or when a precise location isn't available.
- merge(): Essential for parser diagnostics, where an error might span multiple tokens.
- ariadne::Span implementation: This trait is required for ariadne to understand how to use our Span type.
3.3 Defining ErrorCode and Severity
Explicit error codes help users quickly look up detailed documentation and provide a stable identifier for issues. Severity dictates how the diagnostic should be treated (e.g., halt compilation, just warn).
Create the file src/diagnostics/error_codes.rs:
// src/diagnostics/error_codes.rs
/// Represents the severity level of a diagnostic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
Error,
Warning,
Note,
Help,
}
/// Unique error codes for different types of issues.
/// M = Mermaid Analyzer
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
// Lexer errors (M001-M099)
M001, // Unexpected character
M002, // Unterminated string literal
// Parser errors (M100-M199)
M100, // Expected token, found something else
M101, // Unmatched parenthesis/bracket
M102, // Missing diagram type declaration (e.g., 'graph TD')
M103, // Invalid node ID
M104, // Malformed edge definition
M105, // Invalid diagram type
// Semantic / Validation errors (M200-M299) - will be used in future chapters
M200, // Duplicate node ID
M201, // Undefined node in edge
M202, // Invalid direction for diagram type
M203, // Cyclic dependency detected
// ... more codes as we expand
}
impl ErrorCode {
/// Returns a short, descriptive message for the error code.
pub fn message(&self) -> &'static str {
match self {
ErrorCode::M001 => "Unexpected character",
ErrorCode::M002 => "Unterminated string literal",
ErrorCode::M100 => "Unexpected token",
ErrorCode::M101 => "Unmatched parenthesis or bracket",
ErrorCode::M102 => "Missing diagram type declaration",
ErrorCode::M103 => "Invalid node ID format",
ErrorCode::M104 => "Malformed edge definition",
ErrorCode::M105 => "Invalid or unsupported diagram type",
ErrorCode::M200 => "Duplicate node identifier",
ErrorCode::M201 => "Edge refers to an undefined node",
ErrorCode::M202 => "Invalid direction for this diagram type",
ErrorCode::M203 => "Cyclic dependency detected in graph",
}
}
/// Returns a longer, more detailed help message for the error code.
pub fn help(&self) -> Option<&'static str> {
match self {
ErrorCode::M001 => Some("This character is not allowed in Mermaid syntax at this position. Please check for typos or invalid symbols."),
ErrorCode::M002 => Some("String literals must be closed with a matching quote. Add a '\"' or apostrophe to terminate the string."),
ErrorCode::M100 => Some("The parser expected a different kind of token here. This usually indicates a syntax error. Check the Mermaid syntax documentation for this construct."),
ErrorCode::M101 => Some("Ensure all parentheses, brackets, and braces are correctly matched and balanced. Each opening symbol must have a corresponding closing symbol."),
ErrorCode::M102 => Some("All Mermaid diagrams must start with a declaration like 'graph TD', 'sequenceDiagram', or 'classDiagram'. Add a diagram type declaration at the beginning of your code."),
ErrorCode::M103 => Some("Node IDs must follow Mermaid naming conventions. They typically consist of alphanumeric characters and underscores, or be quoted if they contain special characters."),
ErrorCode::M104 => Some("An edge definition must specify two nodes and a valid arrow type (e.g., 'A --> B'). Check for missing nodes, invalid arrow syntax, or extra characters."),
ErrorCode::M105 => Some("The specified diagram type is either misspelled or not supported by Mermaid, or not yet implemented by this analyzer. Refer to Mermaid documentation for supported types."),
ErrorCode::M200 => Some("Node IDs must be unique within a diagram. Rename one of the nodes to resolve this conflict."),
ErrorCode::M201 => Some("All nodes referenced in an edge must be defined in the diagram. Ensure each node is declared before it is used in an edge."),
ErrorCode::M202 => Some("The chosen direction (e.g., 'TD', 'LR') is not valid for this specific diagram type. Consult the Mermaid documentation for valid directions."),
ErrorCode::M203 => Some("A cycle was detected in your graph, which might indicate a logical error or infinite loop in certain contexts. Review the relationships between nodes."),
_ => None,
}
}
}
Explanation:
- Severity: A simple enum to classify diagnostics.
- ErrorCode: An enum of unique identifiers. We've started with some common lexer, parser, and future validation errors.
- message(): Provides a concise description for each code.
- help(): Offers a more detailed explanation and actionable advice, significantly improving the user experience.
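Because ErrorCode derives Debug and its variants are unit variants, formatting one with {:?} yields exactly its name ("M001"), so the enum doubles as the stable, user-facing code string. A self-contained miniature (Code and render_header are illustrative names, not part of the chapter's API) shows how that splices into a rustc-style header:

```rust
// Miniature of ErrorCode: the Debug representation of a unit variant is its
// name, so no separate code-to-string table is needed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code {
    M001,
    M002,
}

impl Code {
    fn message(&self) -> &'static str {
        match self {
            Code::M001 => "Unexpected character",
            Code::M002 => "Unterminated string literal",
        }
    }
}

/// Formats a rustc-style header line, e.g. "error[M001]: Unexpected character".
fn render_header(code: Code) -> String {
    format!("error[{:?}]: {}", code, code.message())
}
```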
3.4 The Diagnostic Structure
This is the core structure that holds all information about a single diagnostic. It will use the Span, ErrorCode, and Severity we just defined. We’ll also provide a DiagnosticBuilder for ergonomic creation.
Create the file src/diagnostics/mod.rs:
// src/diagnostics/mod.rs
pub mod error_codes;
pub mod span;
pub mod emitter;
use error_codes::{ErrorCode, Severity};
use span::{Span, SourceId};
/// Represents a label associated with a diagnostic.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiagnosticLabel {
pub span: Span,
pub message: String,
pub is_primary: bool, // True for the main point of interest
}
/// A comprehensive diagnostic message, similar to a compiler error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
pub severity: Severity,
pub code: ErrorCode,
pub message: String,
pub labels: Vec<DiagnosticLabel>,
pub notes: Vec<String>,
pub help: Option<String>,
}
impl Diagnostic {
/// Creates a new `DiagnosticBuilder` for constructing a diagnostic.
pub fn new(severity: Severity, code: ErrorCode) -> DiagnosticBuilder {
DiagnosticBuilder::new(severity, code)
}
/// Convenience method for creating an error diagnostic.
pub fn error(code: ErrorCode) -> DiagnosticBuilder {
DiagnosticBuilder::new(Severity::Error, code)
}
/// Convenience method for creating a warning diagnostic.
pub fn warning(code: ErrorCode) -> DiagnosticBuilder {
DiagnosticBuilder::new(Severity::Warning, code)
}
/// Convenience method for creating a note diagnostic.
pub fn note(code: ErrorCode) -> DiagnosticBuilder {
DiagnosticBuilder::new(Severity::Note, code)
}
}
/// A builder for constructing `Diagnostic` instances more ergonomically.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiagnosticBuilder {
severity: Severity,
code: ErrorCode,
message: Option<String>,
labels: Vec<DiagnosticLabel>,
notes: Vec<String>,
help: Option<String>,
}
impl DiagnosticBuilder {
pub fn new(severity: Severity, code: ErrorCode) -> Self {
Self {
severity,
code,
message: None,
labels: Vec::new(),
notes: Vec::new(),
help: None,
}
}
/// Sets the primary message for the diagnostic. If not set, a default from `ErrorCode` is used.
pub fn with_message(mut self, message: impl Into<String>) -> Self {
self.message = Some(message.into());
self
}
/// Adds a primary label, which points to the main location of the issue.
pub fn with_primary_label(mut self, span: Span, message: impl Into<String>) -> Self {
self.labels.push(DiagnosticLabel {
span,
message: message.into(),
is_primary: true,
});
self
}
/// Adds a secondary label, for additional context or related locations.
pub fn with_secondary_label(mut self, span: Span, message: impl Into<String>) -> Self {
self.labels.push(DiagnosticLabel {
span,
message: message.into(),
is_primary: false,
});
self
}
/// Adds a note, which is additional textual information.
pub fn with_note(mut self, note: impl Into<String>) -> Self {
self.notes.push(note.into());
self
}
/// Sets the help message. If not set, a default from `ErrorCode` is used.
pub fn with_help(mut self, help: impl Into<String>) -> Self {
self.help = Some(help.into());
self
}
/// Builds the `Diagnostic` instance.
pub fn build(self) -> Diagnostic {
let code_message = self.code.message();
let code_help = self.code.help();
Diagnostic {
severity: self.severity,
code: self.code,
message: self.message.unwrap_or_else(|| code_message.to_string()),
labels: self.labels,
notes: self.notes,
help: self.help.or_else(|| code_help.map(|s| s.to_string())),
}
}
}
Explanation:
- DiagnosticLabel: Stores a Span and a message for highlighting specific parts of the code. is_primary helps ariadne determine the main highlight.
- Diagnostic: Contains severity, code, the main message, labels (for highlights), notes, and an optional help message.
- DiagnosticBuilder: Provides a fluent API for constructing Diagnostics, making it easy to add labels and notes and to customize messages. It defaults to the message and help from ErrorCode when they are not explicitly overridden.
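The fluent flow and the fallback-to-default behavior of build() can be seen in a trimmed-down sketch (Diag and DiagBuilder are illustrative miniatures of the types above, keeping only message/notes/help; the real versions also carry Severity, ErrorCode, and labeled Spans):

```rust
// Trimmed-down builder mirroring DiagnosticBuilder's fluent API.
#[derive(Debug, PartialEq)]
struct Diag {
    message: String,
    notes: Vec<String>,
    help: Option<String>,
}

#[derive(Default)]
struct DiagBuilder {
    message: Option<String>,
    notes: Vec<String>,
    help: Option<String>,
}

impl DiagBuilder {
    // Each setter consumes and returns self, enabling method chaining.
    fn with_message(mut self, m: impl Into<String>) -> Self {
        self.message = Some(m.into());
        self
    }
    fn with_note(mut self, n: impl Into<String>) -> Self {
        self.notes.push(n.into());
        self
    }
    fn with_help(mut self, h: impl Into<String>) -> Self {
        self.help = Some(h.into());
        self
    }
    /// Falls back to a default message, just as the full build() falls back
    /// to ErrorCode::message() when no message was set.
    fn build(self) -> Diag {
        Diag {
            message: self.message.unwrap_or_else(|| "unknown error".to_string()),
            notes: self.notes,
            help: self.help,
        }
    }
}
```

The consuming-setter style is the standard Rust builder idiom: every call moves the builder, so a half-built builder can never be reused accidentally.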
3.5 Integrating Spans into Lexer Tokens
Now, we need to modify our Token structure to include a Span. This Span will be calculated by the lexer as it processes the input.
Modify src/lexer/token.rs:
// src/lexer/token.rs
use crate::diagnostics::span::Span; // Import Span
/// Represents the different types of tokens in Mermaid syntax.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenType {
// Keywords
Graph,
Flowchart,
SequenceDiagram,
ClassDiagram,
StateDiagram,
Gantt,
Pie,
GitGraph,
Ermaid,
Journey,
C4c,
Mindmap,
Timeline,
Block,
SanKey,
RequirementDiagram,
// Directions (for graph/flowchart)
TD, // Top-Down
BT, // Bottom-Top
LR, // Left-Right
RL, // Right-Left
TB, // Top-Bottom (alias for TD)
// Structural elements
OpenParen, // (
CloseParen, // )
OpenBracket, // [
CloseBracket, // ]
OpenBrace, // {
CloseBrace, // }
DoubleOpenBracket, // [[
DoubleCloseBracket, // ]]
Colon, // :
SemiColon, // ;
Comma, // ,
Dot, // .
Equals, // =
Plus, // +
Minus, // -
Star, // *
Hash, // #
Percent, // %
Pipe, // |
Backslash, // \
// Arrows and connectors
ArrowRight, // -->
ArrowLeft, // <--
DoubleArrow, // <-->
ThickArrowRight, // ==>
ThickArrowLeft, // <==
ThickDoubleArrow, // <==>
DottedArrowRight, // -.->
DottedArrowLeft, // <-.
DottedDoubleArrow, // <-.->
CrossArrowRight, // --x
OpenCircleArrowRight, // --o
ArrowRightWithText, // --- text -->
ArrowLeftWithText, // <-- text ---
DoubleArrowWithText, // <--- text --->
ThickArrowRightWithText, // === text ==>
DottedArrowRightWithText, // -.- text -.->
// Literals and identifiers
Identifier(String),
StringLiteral(String), // Quoted strings
Number(String), // Numeric literals
// Comments
LineComment(String), // %% comment
BlockComment(String), // /* comment */
// Special
NewLine,
Whitespace, // Ignored by parser, but useful for span tracking in lexer
EOF,
}
/// Represents a token found by the lexer, including its type and source span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
pub token_type: TokenType,
pub span: Span, // Added Span here
}
impl Token {
pub fn new(token_type: TokenType, span: Span) -> Self {
Token { token_type, span }
}
}
Now, update src/lexer/lexer.rs to correctly calculate and assign Span to each Token. This requires careful tracking of the current position (byte offset, line, column) as we consume characters.
Modify src/lexer/lexer.rs:
// src/lexer/lexer.rs
use crate::diagnostics::span::{Span, SourceId};
use crate::diagnostics::{Diagnostic, DiagnosticBuilder, error_codes::{ErrorCode, Severity}}; // Import Diagnostic components
use crate::lexer::token::{Token, TokenType};
use log::debug;
pub struct Lexer<'a> {
source: &'a str,
source_id: SourceId,
chars: std::iter::Peekable<std::str::Chars<'a>>,
current_offset: usize, // Current byte offset
current_line: usize,
current_column: usize,
diagnostics: Vec<Diagnostic>, // Collect diagnostics
}
impl<'a> Lexer<'a> {
pub fn new(source: &'a str, source_id: SourceId) -> Self {
Lexer {
source,
source_id,
chars: source.chars().peekable(),
current_offset: 0,
current_line: 1,
current_column: 1,
diagnostics: Vec::new(),
}
}
/// Lexes the entire source and returns a vector of tokens and any diagnostics.
pub fn lex(mut self) -> (Vec<Token>, Vec<Diagnostic>) {
let mut tokens = Vec::new();
while self.peek().is_some() {
// Skip leading whitespace before recording the start position, so the
// token's span does not include it. next_token's own skip_whitespace
// call then becomes a harmless no-op.
self.skip_whitespace();
if self.peek().is_none() {
break;
}
let start_offset = self.current_offset;
let start_line = self.current_line;
let start_column = self.current_column;
if let Some(token_type) = self.next_token() {
let end_offset = self.current_offset;
let end_line = self.current_line;
let end_column = self.current_column;
let span = Span::new(
self.source_id,
start_offset,
end_offset,
start_line,
end_line,
start_column,
end_column,
);
tokens.push(Token::new(token_type, span));
} else {
// If next_token returns None, it means an error occurred and a diagnostic was emitted.
// We try to recover by advancing past the invalid character.
self.advance();
}
}
let eof_span = Span::new(
self.source_id,
self.current_offset,
self.current_offset,
self.current_line,
self.current_line,
self.current_column,
self.current_column,
);
tokens.push(Token::new(TokenType::EOF, eof_span));
(tokens, self.diagnostics)
}
fn peek(&mut self) -> Option<char> {
self.chars.peek().copied()
}
fn advance(&mut self) -> Option<char> {
if let Some(c) = self.chars.next() {
let char_len = c.len_utf8();
self.current_offset += char_len;
if c == '\n' {
self.current_line += 1;
self.current_column = 1;
} else {
self.current_column += 1;
}
Some(c)
} else {
None
}
}
fn advance_if<F>(&mut self, predicate: F) -> Option<char>
where
F: FnOnce(char) -> bool,
{
if let Some(c) = self.peek() {
if predicate(c) {
return self.advance();
}
}
None
}
fn advance_while<F>(&mut self, predicate: F) -> String
where
F: Fn(char) -> bool,
{
let mut s = String::new();
while let Some(c) = self.peek() {
if predicate(c) {
s.push(self.advance().unwrap());
} else {
break;
}
}
s
}
fn next_token(&mut self) -> Option<TokenType> {
self.skip_whitespace();
let start_offset = self.current_offset;
let start_line = self.current_line;
let start_column = self.current_column;
let Some(c) = self.advance() else {
return None; // EOF handled by lex function
};
let token_type = match c {
'(' => TokenType::OpenParen,
')' => TokenType::CloseParen,
'[' => {
if self.advance_if(|c| c == '[').is_some() {
TokenType::DoubleOpenBracket
} else {
TokenType::OpenBracket
}
}
']' => {
if self.advance_if(|c| c == ']').is_some() {
TokenType::DoubleCloseBracket
} else {
TokenType::CloseBracket
}
}
'{' => TokenType::OpenBrace,
'}' => TokenType::CloseBrace,
':' => TokenType::Colon,
';' => TokenType::SemiColon,
',' => TokenType::Comma,
'.' => TokenType::Dot,
'=' => {
if self.advance_if(|c| c == '=').is_some() {
if self.advance_if(|c| c == '>').is_some() {
TokenType::ThickArrowRight
} else {
// '==' without '>' is either malformed or the start of a thick
// link with text (e.g., 'A == label ==> B'); treat it as a plain
// '=' for now and let the parser report it if invalid.
// (Note: '<==>' is handled in the '<' branch below, not here.)
TokenType::Equals
}
} else {
TokenType::Equals
}
}
'+' => TokenType::Plus,
'-' => {
// Handle arrows starting with '-'
if self.peek() == Some('.') {
self.advance(); // consume '.'
if self.advance_if(|c| c == '>').is_some() {
TokenType::DottedArrowRight
} else {
// Error: Malformed '-.'
// Treat as minus, let parser handle
debug!("Malformed dotted arrow: '-.'");
TokenType::Minus
}
} else if self.peek() == Some('-') {
self.advance(); // consume second '-'
if self.advance_if(|c| c == '>').is_some() {
TokenType::ArrowRight
} else if self.advance_if(|c| c == 'x').is_some() {
TokenType::CrossArrowRight
} else if self.advance_if(|c| c == 'o').is_some() {
TokenType::OpenCircleArrowRight
} else {
// Could be '---' for text
let _ = self.advance_while(|ch| ch == '-'); // Consume remaining dashes
// This needs more context for ArrowRightWithText, handle in parser for now
TokenType::ArrowRight // Simplified for now, parser will differentiate
}
} else {
TokenType::Minus
}
}
'*' => TokenType::Star,
'#' => TokenType::Hash,
'%' => {
if self.peek() == Some('%') {
self.advance(); // consume second '%'
let comment_text = self.advance_while(|ch| ch != '\n');
TokenType::LineComment(comment_text.trim_end().to_string())
} else {
TokenType::Percent
}
}
'|' => TokenType::Pipe,
'\\' => TokenType::Backslash,
'\n' => TokenType::NewLine, // skip_whitespace preserves newlines; emit them as tokens
'<' => {
if self.advance_if(|c| c == '=').is_some() {
if self.advance_if(|c| c == '=').is_some() {
if self.advance_if(|c| c == '>').is_some() {
TokenType::ThickDoubleArrow // <==>
} else {
TokenType::ThickArrowLeft // <==
}
} else {
// Error: Malformed '<='
debug!("Malformed arrow: '<='");
// For now, treat as single '<', let parser handle
TokenType::DoubleArrow // Simplified, parser will refine
}
} else if self.advance_if(|c| c == '-').is_some() {
if self.peek() == Some('.') {
self.advance(); // consume '.'
TokenType::DottedArrowLeft
} else if self.advance_if(|c| c == '-').is_some() {
if self.advance_if(|c| c == '-').is_some() {
// Could be <--- text --->, let parser parse the text
TokenType::DoubleArrowWithText // Simplified
} else {
TokenType::ArrowLeft // <--
}
} else {
// Error: Malformed '<-'
debug!("Malformed arrow: '<-'");
TokenType::ArrowLeft // Simplified
}
} else {
// This is problematic. '<' alone is not a valid token in Mermaid
// For now, treat as invalid character.
self.report_lexer_error(
ErrorCode::M001,
Span::new(
self.source_id,
start_offset,
self.current_offset,
start_line,
self.current_line,
start_column,
self.current_column,
),
format!("Unexpected character '{}'. Mermaid arrows usually start with '-' or '<-'.", c),
);
return None; // Indicate an error, lexer will skip this char
}
}
'"' | '\'' => {
let quote_char = c;
let mut content = String::new();
let mut terminated = false;
while let Some(next_char) = self.peek() {
if next_char == quote_char {
self.advance(); // consume closing quote
terminated = true;
break;
}
if next_char == '\n' {
// String literals cannot span multiple lines without explicit escaping
break;
}
content.push(self.advance().unwrap());
}
if !terminated {
self.report_lexer_error(
ErrorCode::M002,
Span::new(
self.source_id,
start_offset,
self.current_offset,
start_line,
self.current_line,
start_column,
self.current_column,
),
format!("Unterminated string literal. Expected '{}'.", quote_char),
);
return None; // Indicate error
}
TokenType::StringLiteral(content)
}
c if c.is_ascii_digit() => {
let num_str = c.to_string() + &self.advance_while(|ch| ch.is_ascii_digit() || ch == '.');
// Basic check, full number validation could be a parser/validator task
TokenType::Number(num_str)
}
c if c.is_alphabetic() => {
let ident_str = c.to_string() + &self.advance_while(|ch| ch.is_alphanumeric() || ch == '_');
match ident_str.as_str() {
// Keywords
"graph" => TokenType::Graph,
"flowchart" => TokenType::Flowchart,
"sequenceDiagram" => TokenType::SequenceDiagram,
"classDiagram" => TokenType::ClassDiagram,
"stateDiagram" => TokenType::StateDiagram,
"gantt" => TokenType::Gantt,
"pie" => TokenType::Pie,
"gitGraph" => TokenType::GitGraph,
"erDiagram" => TokenType::Ermaid, // note: the variant name 'Ermaid' is a typo for 'ErDiagram'
"journey" => TokenType::Journey,
"C4Context" => TokenType::C4c,
"mindmap" => TokenType::Mindmap,
"timeline" => TokenType::Timeline,
"block" => TokenType::Block,
"sankey" => TokenType::SanKey,
"requirementDiagram" => TokenType::RequirementDiagram,
// Directions
"TD" | "TB" => TokenType::TD,
"BT" => TokenType::BT,
"LR" => TokenType::LR,
"RL" => TokenType::RL,
_ => TokenType::Identifier(ident_str),
}
}
_ => {
let span = Span::new(
self.source_id,
start_offset,
self.current_offset,
start_line,
self.current_line,
start_column,
self.current_column,
);
self.report_lexer_error(
ErrorCode::M001,
span,
format!("Unexpected character '{}'", c),
);
return None; // Indicate error, lexer will skip this char
}
};
Some(token_type)
}
fn skip_whitespace(&mut self) {
let _ = self.advance_while(|c| c.is_whitespace() && c != '\n');
}
// Helper to report a lexer-specific diagnostic
fn report_lexer_error(&mut self, code: ErrorCode, span: Span, message: String) {
let diag = Diagnostic::error(code)
.with_message(message)
.with_primary_label(span, "unexpected character")
.build();
self.diagnostics.push(diag);
}
}
Explanation of Lexer Changes:
- Lexer struct: Now includes source_id, current_offset, current_line, and current_column to track the exact position, plus diagnostics: Vec<Diagnostic> to collect errors.
- lex() method: Returns (Vec<Token>, Vec<Diagnostic>), yielding all tokens it could produce along with any errors encountered. It repeatedly calls next_token, and if next_token returns None (indicating an error was reported), it advances past the offending character to recover.
- Position tracking: advance() updates current_offset, current_line, and current_column correctly, handling multi-byte UTF-8 characters and newlines.
- Token creation: Each Token is created with a Span derived from the start position (offset, line, column) recorded before consuming the token and the current position afterward.
- Error reporting: For ErrorCode::M001 (unexpected character), a Diagnostic is created via report_lexer_error and pushed to self.diagnostics; next_token then returns None. For ErrorCode::M002 (unterminated string), the same pattern applies.
- Arrow parsing: The arrow parsing logic is still somewhat simplified. Complex arrows with text will be fully handled by the parser using these basic arrow tokens plus identifiers and strings; the focus here is correct tokenization and span tracking.
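The byte-offset vs. column distinction in advance() is the easiest part to get wrong with multi-byte input: offsets advance by len_utf8(), columns advance by one per character and reset after '\n'. A stripped-down position tracker (Pos and track are illustrative names) demonstrates the behavior:

```rust
// Stripped-down version of the lexer's position tracking.
struct Pos {
    offset: usize, // byte offset into the source
    line: usize,   // 1-based line number
    column: usize, // 1-based column number, in characters
}

impl Pos {
    fn new() -> Self {
        Self { offset: 0, line: 1, column: 1 }
    }

    /// Mirrors Lexer::advance: bytes for the offset, characters for the
    /// column, and a line/column reset on newline.
    fn advance(&mut self, c: char) {
        self.offset += c.len_utf8();
        if c == '\n' {
            self.line += 1;
            self.column = 1;
        } else {
            self.column += 1;
        }
    }
}

/// Feeds an entire string through the tracker and returns the final position.
fn track(source: &str) -> Pos {
    let mut pos = Pos::new();
    for c in source.chars() {
        pos.advance(c);
    }
    pos
}
```

Mixing these up (e.g., incrementing the offset by 1 per character) would silently misalign ariadne's highlights on any non-ASCII input, such as node labels containing accented characters.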
3.6 Integrating Diagnostics into Parser Errors
The parser also needs to collect diagnostics. When it encounters a syntax error (e.g., an unexpected or missing token), it should create a Diagnostic with an appropriate error code and span.
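The recovery strategy we will use is classic panic-mode synchronization: after reporting an error, skip tokens until a likely statement boundary, then resume parsing so later errors can still be found. A stripped-down sketch over a plain token slice (Tok and synchronize here are illustrative miniatures, not the chapter's actual types):

```rust
// Panic-mode recovery in miniature: on error, skip forward to the next
// likely statement boundary instead of aborting the whole parse.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Ident,
    Arrow,
    Semi,
    Newline,
    Eof,
}

/// Returns the index at which parsing should resume after an error at `from`.
fn synchronize(tokens: &[Tok], mut from: usize) -> usize {
    while from < tokens.len() {
        match tokens[from] {
            // A separator ends the bad statement; resume just after it.
            Tok::Semi | Tok::Newline => return from + 1,
            // Never skip past the end of input.
            Tok::Eof => return from,
            _ => from += 1,
        }
    }
    tokens.len()
}
```

The trade-off is between skipping too little (cascading spurious errors from the same bad statement) and too much (missing real errors); statement separators like ';' and newlines are the natural middle ground in Mermaid.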
First, update the src/parser/parser.rs imports:
// src/parser/parser.rs (partial, imports only)
use crate::lexer::token::{Token, TokenType};
use crate::parser::ast::*;
use crate::diagnostics::span::{Span, SourceId};
use crate::diagnostics::{Diagnostic, DiagnosticBuilder, error_codes::{ErrorCode, Severity}}; // New imports
use log::{debug, error, warn};
// ... rest of the file
Now, modify the Parser struct and its methods. The parse method will now return (Option<Diagram>, Vec<Diagnostic>).
// src/parser/parser.rs (partial, modify Parser struct and parse method)
pub struct Parser<'a> {
tokens: &'a [Token],
current: usize,
source_id: SourceId,
diagnostics: Vec<Diagnostic>, // Collect diagnostics
}
impl<'a> Parser<'a> {
pub fn new(tokens: &'a [Token], source_id: SourceId) -> Self {
Parser {
tokens,
current: 0,
source_id,
diagnostics: Vec::new(),
}
}
/// Parses the entire stream of tokens into a Diagram AST,
/// collecting all encountered diagnostics.
pub fn parse(mut self) -> (Option<Diagram>, Vec<Diagnostic>) {
debug!("Starting parsing...");
// Ensure we have tokens to process
if self.peek().token_type == TokenType::EOF && self.tokens.len() == 1 {
let diag = Diagnostic::error(ErrorCode::M102)
.with_message("Empty input: missing diagram type declaration")
.with_primary_label(self.peek().span, "expected 'graph TD', 'sequenceDiagram', etc.")
.build();
self.diagnostics.push(diag);
return (None, self.diagnostics);
}
let diagram = self.parse_diagram_declaration();
let mut ast = match diagram {
Ok(d) => d,
Err(_) => {
// Error already reported by parse_diagram_declaration
// Attempt to synchronize by skipping until a new line or EOF
self.synchronize();
return (None, self.diagnostics);
}
};
match &mut ast {
Diagram::Flowchart(flowchart) => {
while self.peek().token_type != TokenType::EOF {
match self.parse_flowchart_statement() {
Ok(Some(stmt)) => flowchart.statements.push(stmt),
Ok(None) => {
// Recovered, but no statement parsed (e.g., skip whitespace)
self.synchronize(); // Try to skip to next statement
},
Err(_) => {
// Error reported by parse_flowchart_statement, attempt to synchronize
self.synchronize();
}
}
}
}
// ... handle other diagram types similarly if they have statements
_ => {
// For now, other diagram types might not have further statements parsed
// We'll expand this as we implement more diagram types.
debug!("Diagram type {:?} does not yet support further statement parsing.", ast);
}
}
if !self.diagnostics.is_empty() && self.diagnostics.iter().any(|d| d.severity == Severity::Error) {
(None, self.diagnostics) // If any errors, return None AST
} else {
(Some(ast), self.diagnostics) // Otherwise, return AST
}
}
// Helper to report a parser-specific diagnostic
fn report_parser_error(&mut self, code: ErrorCode, span: Span, message: String, primary_label_msg: Option<String>) {
let mut builder = Diagnostic::error(code)
.with_message(message);
if let Some(label_msg) = primary_label_msg {
builder = builder.with_primary_label(span, label_msg);
} else {
builder = builder.with_primary_label(span, self.peek().token_type.to_string());
}
if let Some(help_msg) = code.help() {
builder = builder.with_help(help_msg);
}
self.diagnostics.push(builder.build());
}
// Helper to consume a token of expected type, reporting an error if not found.
fn consume(&mut self, expected: TokenType, error_code: ErrorCode, error_msg: &str) -> Result<Token, Diagnostic> {
let token = self.peek();
if token.token_type == expected {
Ok(self.advance())
} else {
let span = token.span;
let current_token_type = token.token_type.to_string();
let diag = Diagnostic::error(error_code)
.with_message(format!("{}. Expected '{}', but found '{}'.", error_msg, expected.to_string(), current_token_type))
.with_primary_label(span, format!("expected '{}'", expected.to_string()))
.with_note(format!("The parser expected a '{}' here to continue parsing this construct.", expected.to_string()))
.with_help(error_code.help().unwrap_or("Review Mermaid syntax for this section.").to_string())
.build();
self.diagnostics.push(diag.clone());
Err(diag) // Return an Err to indicate failure in this specific parsing step
}
}
// New method for error recovery: skip tokens until a likely synchronization point.
fn synchronize(&mut self) {
debug!("Attempting to synchronize parser after error.");
self.advance(); // Consume the erroneous token
while self.peek().token_type != TokenType::EOF {
match self.previous().token_type {
TokenType::SemiColon | TokenType::NewLine => return, // End of statement
TokenType::OpenBrace | TokenType::OpenParen | TokenType::OpenBracket => return, // Start of a new block
_ => {}
}
// Skip until we find a likely start of a new statement or block
match self.peek().token_type {
TokenType::Graph | TokenType::Flowchart | TokenType::SequenceDiagram | TokenType::ClassDiagram => return,
TokenType::NewLine => {
self.advance(); // Consume newline and return
return;
},
_ => {
self.advance();
}
}
}
}
// --- Modify existing parsing methods to use diagnostics ---
fn parse_diagram_declaration(&mut self) -> Result<Diagram, Diagnostic> {
// Clone so the token stays usable after subsequent `advance()` calls take `&mut self`.
let start_token = self.peek().clone();
match start_token.token_type {
TokenType::Graph | TokenType::Flowchart => {
self.advance(); // consume 'graph' or 'flowchart'
let direction_token = self.peek().clone(); // clone: its span is used after we advance
let direction = match direction_token.token_type {
TokenType::TD | TokenType::TB => { self.advance(); FlowchartDirection::TD },
TokenType::BT => { self.advance(); FlowchartDirection::BT },
TokenType::LR => { self.advance(); FlowchartDirection::LR },
TokenType::RL => { self.advance(); FlowchartDirection::RL },
_ => {
let diag = Diagnostic::error(ErrorCode::M102)
.with_message(format!("Missing or invalid direction for '{}' diagram.", start_token.token_type.to_string()))
.with_primary_label(direction_token.span, "expected 'TD', 'BT', 'LR', or 'RL'")
.with_help(ErrorCode::M102.help().unwrap().to_string())
.build();
self.diagnostics.push(diag.clone());
// Attempt to recover by assuming TD and continuing
FlowchartDirection::TD
}
};
return Ok(Diagram::Flowchart(FlowchartDiagram { direction, statements: Vec::new() }));
}
TokenType::SequenceDiagram => {
self.advance();
return Ok(Diagram::Sequence(SequenceDiagram { statements: Vec::new() }));
}
TokenType::ClassDiagram => {
self.advance();
return Ok(Diagram::Class(ClassDiagram { statements: Vec::new() }));
}
_ => {
let diag = Diagnostic::error(ErrorCode::M102)
.with_message(format!("Missing or invalid diagram type declaration. Found '{}'.", start_token.token_type.to_string()))
.with_primary_label(start_token.span, "expected 'graph', 'flowchart', 'sequenceDiagram', etc.")
.with_help(ErrorCode::M102.help().unwrap().to_string())
.build();
self.diagnostics.push(diag.clone());
return Err(diag);
}
};
}
// Example of a statement parser that might return multiple diagnostics or an error
fn parse_flowchart_statement(&mut self) -> Result<Option<FlowchartStatement>, Diagnostic> {
self.skip_newlines_and_whitespace(); // Helper to advance past newlines/whitespace
if self.peek().token_type == TokenType::EOF {
return Ok(None);
}
// Clone so the token stays usable after `advance()` borrows `self` mutably.
let start_token = self.peek().clone();
// `Identifier` and `StringLiteral` carry their text, so match on the variant shape.
if matches!(start_token.token_type, TokenType::Identifier(_) | TokenType::StringLiteral(_)) {
let node_id_token = self.advance();
let node_id = match &node_id_token.token_type {
TokenType::Identifier(s) => s.clone(),
TokenType::StringLiteral(s) => s.clone(),
_ => unreachable!(), // Guarded by the `matches!` check above
};
// Node definition or edge
if self.peek().token_type == TokenType::OpenBracket {
// Node definition: A[Label]
self.advance(); // consume '['
let label_token = self.peek().clone(); // clone: its span is used after we advance
let label = match &label_token.token_type {
TokenType::Identifier(s) => { self.advance(); s.clone() },
TokenType::StringLiteral(s) => { self.advance(); s.clone() },
_ => {
let diag = Diagnostic::error(ErrorCode::M103)
.with_message("Expected a node label inside brackets.")
.with_primary_label(label_token.span, "expected label (identifier or string)")
.with_help(ErrorCode::M103.help().unwrap().to_string())
.build();
self.diagnostics.push(diag);
// Attempt to recover by using a dummy label
"MISSING_LABEL".to_string()
}
};
match self.consume(TokenType::CloseBracket, ErrorCode::M101, "Expected ']' to close node label.") {
Ok(_) => { /* OK */ },
Err(_) => { /* Error already reported */ }
}
return Ok(Some(FlowchartStatement::Node(Node { id: node_id, label: Some(label) })));
} else if self.is_arrow_token(&self.peek().token_type) {
// Edge definition: A --> B
let arrow_token = self.advance(); // consume arrow
let end_node_token = self.peek().clone(); // clone: its span is used after we advance
let end_node_id = match &end_node_token.token_type {
TokenType::Identifier(s) => { self.advance(); s.clone() },
TokenType::StringLiteral(s) => { self.advance(); s.clone() },
_ => {
let diag = Diagnostic::error(ErrorCode::M104)
.with_message("Expected an end node identifier after arrow.")
.with_primary_label(end_node_token.span, "expected node ID")
.with_help(ErrorCode::M104.help().unwrap().to_string())
.build();
self.diagnostics.push(diag);
// Attempt to recover by using a dummy node
"UNDEFINED_NODE".to_string()
}
};
return Ok(Some(FlowchartStatement::Edge(Edge {
from: node_id,
to: end_node_id,
arrow_type: self.map_token_to_arrow_type(&arrow_token.token_type),
label: None, // For now, no labels on edges
})));
} else {
// Just a node definition without a label or an edge
return Ok(Some(FlowchartStatement::Node(Node { id: node_id, label: None })));
}
}
// If we reach here, it's an unexpected token at the start of a statement
let diag = Diagnostic::error(ErrorCode::M100)
.with_message(format!("Unexpected token '{}' at the start of a statement.", start_token.token_type.to_string()))
.with_primary_label(start_token.span, "unexpected token")
.with_help("Expected a node ID, an edge, or a new diagram declaration. This might indicate a syntax error or a missing semicolon.")
.build();
self.diagnostics.push(diag.clone()); // Keep a copy; the original is returned to the caller
Err(diag) // Indicate error; the caller will synchronize
}
fn skip_newlines_and_whitespace(&mut self) {
while self.peek().token_type == TokenType::NewLine || self.peek().token_type == TokenType::Whitespace {
self.advance();
}
}
// ... other helper methods like is_arrow_token, map_token_to_arrow_type
fn is_arrow_token(&self, token_type: &TokenType) -> bool {
matches!(
token_type,
TokenType::ArrowRight
| TokenType::ArrowLeft
| TokenType::DoubleArrow
| TokenType::ThickArrowRight
| TokenType::ThickArrowLeft
| TokenType::ThickDoubleArrow
| TokenType::DottedArrowRight
| TokenType::DottedArrowLeft
| TokenType::DottedDoubleArrow
| TokenType::CrossArrowRight
| TokenType::OpenCircleArrowRight
| TokenType::ArrowRightWithText
| TokenType::ArrowLeftWithText
| TokenType::DoubleArrowWithText
| TokenType::ThickArrowRightWithText
| TokenType::DottedArrowRightWithText
)
}
fn map_token_to_arrow_type(&self, token_type: &TokenType) -> ArrowType {
match token_type {
TokenType::ArrowRight => ArrowType::Solid,
TokenType::ArrowLeft => ArrowType::SolidLeft,
TokenType::DoubleArrow => ArrowType::SolidDouble,
TokenType::ThickArrowRight => ArrowType::Thick,
TokenType::ThickArrowLeft => ArrowType::ThickLeft,
TokenType::ThickDoubleArrow => ArrowType::ThickDouble,
TokenType::DottedArrowRight => ArrowType::Dotted,
TokenType::DottedArrowLeft => ArrowType::DottedLeft,
TokenType::DottedDoubleArrow => ArrowType::DottedDouble,
TokenType::CrossArrowRight => ArrowType::Cross,
TokenType::OpenCircleArrowRight => ArrowType::OpenCircle,
// For now, these are simplified, actual text parsing will happen here later
TokenType::ArrowRightWithText => ArrowType::Solid,
TokenType::ArrowLeftWithText => ArrowType::SolidLeft,
TokenType::DoubleArrowWithText => ArrowType::SolidDouble,
TokenType::ThickArrowRightWithText => ArrowType::Thick,
TokenType::DottedArrowRightWithText => ArrowType::Dotted,
_ => {
warn!("Unknown arrow token type encountered: {:?}", token_type);
ArrowType::Solid // Default to solid arrow for now
}
}
}
fn peek(&self) -> &Token {
self.tokens.get(self.current).unwrap_or_else(|| {
// Should not happen if EOF is always the last token
&self.tokens[self.tokens.len() - 1]
})
}
fn previous(&self) -> &Token {
self.tokens.get(self.current - 1).unwrap_or_else(|| {
// Should not happen if current is > 0
&self.tokens[0]
})
}
fn advance(&mut self) -> Token {
if self.current < self.tokens.len() {
self.current += 1;
}
self.previous().clone()
}
}
Explanation of Parser Changes:
- `Parser` struct: Now includes `diagnostics: Vec<Diagnostic>`.
- `parse()` method:
  - Returns `(Option<Diagram>, Vec<Diagnostic>)`. If any error diagnostics are collected, it returns `None` for the `Diagram` to signify an invalid AST.
  - It now calls `parse_diagram_declaration` and `parse_flowchart_statement`, which can return `Result<_, Diagnostic>`.
  - If an `Err(Diagnostic)` is returned, a local parsing failure occurred and the diagnostic has already been pushed to `self.diagnostics`. The parser then calls `synchronize()`.
- `report_parser_error()`: A helper, similar to the lexer's, for creating and storing parser-specific diagnostics.
- `consume()`: This critical helper now returns `Result<Token, Diagnostic>`. If the expected token isn't found, it creates and stores a diagnostic with the caller-supplied error code (e.g., `M101` for an unmatched bracket) and returns `Err`, letting callers react to the failure and attempt recovery.
- `synchronize()`: A basic error recovery mechanism. After an error, it advances the parser's `current` pointer past the problematic token(s) until it reaches a "safe" point (a semicolon, newline, or a keyword that starts a new diagram or block). This stops a single error from cascading into many irrelevant errors and lets the parser surface more distinct issues.
- `parse_diagram_declaration()` and `parse_flowchart_statement()`: These methods now use `consume()` and `report_parser_error()` to generate diagnostics. They also return partial results or default values (e.g., `FlowchartDirection::TD` on error) so parsing can continue, even if the AST is technically malformed.
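The recovery strategy in `synchronize()` can be exercised in isolation. Below is a minimal, self-contained sketch of panic-mode synchronization using a simplified token enum (`Tok` is a stand-in, not the chapter's `TokenType`): after an error, we skip tokens until a statement boundary or a diagram keyword, so the next parse attempt starts from a clean point.

```rust
// Minimal sketch of panic-mode recovery over a simplified token set.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tok {
    Graph,     // keyword that can start a new construct
    Ident,
    Garbage,   // token that triggered an error
    NewLine,
    SemiColon,
    Eof,
}

/// Advance `pos` past the erroneous token, then skip until a
/// synchronization point: end of statement, or a diagram keyword.
fn synchronize(tokens: &[Tok], pos: &mut usize) {
    *pos += 1; // consume the token that triggered the error
    while tokens[*pos] != Tok::Eof {
        // A newline or semicolon ends the broken statement.
        if matches!(tokens[*pos], Tok::NewLine | Tok::SemiColon) {
            *pos += 1; // consume the separator and resume after it
            return;
        }
        // A keyword that starts a new construct is also a safe point.
        if tokens[*pos] == Tok::Graph {
            return;
        }
        *pos += 1;
    }
}

fn main() {
    // An error is reported at token 0; garbage follows until a newline.
    let tokens = [Tok::Ident, Tok::Garbage, Tok::Garbage, Tok::NewLine, Tok::Ident, Tok::Eof];
    let mut pos = 0;
    synchronize(&tokens, &mut pos);
    // We resumed just past the newline, at the next statement.
    assert_eq!(pos, 4);
    assert_eq!(tokens[pos], Tok::Ident);
    println!("resumed at token {} ({:?})", pos, tokens[pos]);
}
```

The real parser follows the same shape, with the extra wrinkle that it also treats opening braces/brackets and diagram keywords as safe points.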
3.7 The DiagnosticEmitter (Reporter)
This component is responsible for taking our collected Diagnostics and rendering them beautifully using ariadne.
Create the file src/diagnostics/emitter.rs:
// src/diagnostics/emitter.rs
use std::collections::HashMap;
use ariadne::{Color, Fmt, Label, Report, ReportKind, Source};
use crate::diagnostics::{Diagnostic, DiagnosticLabel, error_codes::Severity, span::SourceId};
/// Manages and emits diagnostics in a compiler-like format.
pub struct DiagnosticEmitter<'a> {
sources: HashMap<SourceId, Source<&'a str>>, // Stores source code for highlighting
}
impl<'a> DiagnosticEmitter<'a> {
pub fn new() -> Self {
DiagnosticEmitter {
sources: HashMap::new(),
}
}
/// Adds a source file to the emitter. The source content is needed for highlighting.
pub fn add_source(&mut self, source_id: SourceId, content: &'a str) {
self.sources.insert(source_id, Source::from(content));
}
/// Emits a collection of diagnostics to stderr.
/// Returns true if any errors were emitted, false otherwise.
pub fn emit_diagnostics(&self, diagnostics: &[Diagnostic]) -> bool {
let mut has_errors = false;
for diag in diagnostics {
let report_kind = match diag.severity {
Severity::Error => {
has_errors = true;
ReportKind::Error
}
Severity::Warning => ReportKind::Warning,
Severity::Note => ReportKind::Advice, // Ariadne uses Advice for notes
Severity::Help => ReportKind::Advice, // Ariadne uses Advice for help
};
let primary_label_span = diag.labels.iter()
.filter(|l| l.is_primary)
.map(|l| l.span)
.next();
// Ariadne requires at least one primary label span to anchor the report
let report_builder = if let Some(span) = primary_label_span {
Report::build(report_kind, span.source_id, span.start)
} else {
// Fallback for diagnostics without a primary label (e.g., global issues)
// This might not highlight code, but will still print the message.
// We pick the first label span if available, or a dummy span.
let fallback_span = diag.labels.first().map(|l| l.span).unwrap_or_else(crate::diagnostics::span::Span::dummy);
Report::build(report_kind, fallback_span.source_id, fallback_span.start)
};
let mut report = report_builder
.with_code(format!("{:?}", diag.code))
.with_message(&diag.message);
// Add labels
for label_data in &diag.labels {
let color = match diag.severity {
Severity::Error => Color::Red,
Severity::Warning => Color::Yellow,
Severity::Note => Color::Blue,
Severity::Help => Color::Green,
};
let label = Label::new(label_data.span)
.with_message(&label_data.message)
.with_color(color);
report = report.with_label(label);
}
// Add notes
for note in &diag.notes {
report = report.with_note(note);
}
// Add help message
if let Some(help_msg) = &diag.help {
report = report.with_help(help_msg);
}
// Emit the report. Resolve the anchor span once, using the same fallback as above.
let anchor_span = primary_label_span
.or_else(|| diag.labels.first().map(|l| l.span))
.unwrap_or_else(crate::diagnostics::span::Span::dummy);
if let Some(source) = self.sources.get(&anchor_span.source_id) {
report.finish().print((&anchor_span.source_id, source))
.expect("Failed to print diagnostic report");
} else {
// If source is not found, print a simpler message
eprintln!("{}: {}: {}",
match diag.severity {
Severity::Error => "error".fg(Color::Red),
Severity::Warning => "warning".fg(Color::Yellow),
Severity::Note => "note".fg(Color::Blue),
Severity::Help => "help".fg(Color::Green),
},
format!("{:?}", diag.code).fg(Color::Magenta),
diag.message
);
for note in &diag.notes {
eprintln!(" = note: {}", note);
}
if let Some(help_msg) = &diag.help {
eprintln!(" = help: {}", help_msg);
}
}
}
has_errors
}
}
Explanation:
- `DiagnosticEmitter` struct: Holds a `HashMap` of `SourceId` to `ariadne::Source<&'a str>`, which lets `ariadne` retrieve the actual source text for highlighting.
- `add_source()`: Call this once for each input file to register its content with the emitter.
- `emit_diagnostics()`:
  - Iterates through each `Diagnostic`.
  - Maps our `Severity` to an `ariadne::ReportKind`.
  - Constructs an `ariadne::Report` using `Report::build()`. It's crucial to provide a primary label's span as the anchor for the report.
  - Adds all `DiagnosticLabel`s to the `ariadne::Report` via `Label::new()`, assigning colors based on severity.
  - Adds `notes` and the `help` message.
  - Finally, `report.finish().print()` renders the diagnostic to stderr, using the registered source content for highlighting.
  - Includes a fallback plain-text print if the source content for a given `SourceId` isn't found, so some output is always produced.
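`ariadne` performs the offset-to-position lookup internally, but the mapping from a `Span`'s byte offsets to the line and column a user sees is worth understanding. A hypothetical stdlib-only helper (not part of the chapter's code) might look like this:

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes, so multi-byte UTF-8 is handled.
fn line_col(source: &str, byte_offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, c) in source.char_indices() {
        if i >= byte_offset {
            break; // reached the target offset
        }
        if c == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

fn main() {
    let src = "graph TD\n  A --> B\n";
    // Byte offset of the 'A' on line 2 (after the newline and two spaces).
    let offset = src.find('A').unwrap();
    assert_eq!(line_col(src, offset), (2, 3));
    println!("{:?}", line_col(src, offset));
}
```

Tools that report positions this way should be explicit about whether columns count characters or bytes; editors and CI systems disagree on this, and the choice affects how spans render for non-ASCII labels.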
3.8 Updating main.rs to Use Diagnostics
Now, let’s update src/main.rs to wire everything together. We’ll read input, lex it, parse it, collect diagnostics, and then emit them.
// src/main.rs
mod lexer;
mod parser;
mod diagnostics;
use lexer::lexer::Lexer;
use parser::parser::Parser;
use diagnostics::emitter::DiagnosticEmitter;
use diagnostics::span::SourceId;
use log::{error, info, debug};
use env_logger::Env;
use std::fs;
fn main() {
// Initialize logger for debug messages
env_logger::Builder::from_env(Env::default().default_filter_or("info")).init();
let args: Vec<String> = std::env::args().collect();
let file_path: String = args.get(1).cloned().unwrap_or_else(|| {
error!("Please provide a Mermaid file path as an argument.");
std::process::exit(1);
});
let source_code = fs::read_to_string(&file_path)
.unwrap_or_else(|err| {
error!("Failed to read file {}: {}", file_path, err);
std::process::exit(1);
});
// Leak a clone to get a `&'static str` while keeping `file_path` usable below.
// For a real app, manage lifetimes explicitly or use `Arc<String>`.
let source_id = SourceId(file_path.clone().leak());
info!("Analyzing Mermaid file: {}", file_path);
// 1. Lexing
debug!("Starting lexing...");
let lexer = Lexer::new(&source_code, source_id);
let (tokens, lexer_diagnostics) = lexer.lex();
debug!("Lexing completed. Found {} tokens.", tokens.len());
// debug!("Tokens: {:?}", tokens);
// 2. Parsing
debug!("Starting parsing...");
let parser = Parser::new(&tokens, source_id);
let (ast, parser_diagnostics) = parser.parse();
debug!("Parsing completed.");
// debug!("AST: {:?}", ast);
// 3. Emit Diagnostics
let mut emitter = DiagnosticEmitter::new();
emitter.add_source(source_id, &source_code);
let mut all_diagnostics = Vec::new();
all_diagnostics.extend(lexer_diagnostics);
all_diagnostics.extend(parser_diagnostics);
if !all_diagnostics.is_empty() {
info!("Emitting diagnostics...");
let has_errors = emitter.emit_diagnostics(&all_diagnostics);
if has_errors {
error!("Analysis completed with errors.");
std::process::exit(1);
} else {
info!("Analysis completed with warnings.");
}
} else {
info!("No diagnostics found. Mermaid code is syntactically valid.");
}
match ast {
Some(diagram) => {
info!("Successfully parsed AST: {:#?}", diagram);
// In future chapters, we'll pass this AST to the validator and rule engine.
}
None => {
error!("Failed to produce a valid AST due to fatal errors.");
std::process::exit(1);
}
}
}
Explanation:
- Logging: `env_logger` is initialized for better terminal output of the `info!`, `debug!`, and `error!` macros.
- File Reading: Reads the Mermaid file provided as a CLI argument.
- `SourceId`: We leak the `file_path` string to get a `&'static str` for `SourceId`. In a more complex application, you'd manage source lifetimes more explicitly (e.g., using `Arc<String>` or passing `String` ownership). For a simple CLI tool, leaking is often acceptable when the source is needed for the entire program lifetime.
- Lexer and Parser Calls: The `main` function now calls `lexer.lex()` and `parser.parse()`, collecting their respective diagnostics.
- `DiagnosticEmitter`: An instance is created, `add_source` registers the input file's content, and `emit_diagnostics` is called with all collected diagnostics.
- Exit Code: The program exits with a non-zero code if any errors were reported, following standard CLI tool conventions.
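To make the leaking trade-off concrete, here is a tiny sketch (the helper name is hypothetical) of turning an owned path into the `&'static str` a `SourceId` can hold. It uses `Box::leak`, which works on all recent Rust versions:

```rust
/// Leak an owned String to obtain a `&'static str`.
/// The memory is never freed, which is acceptable for a short-lived
/// CLI where the source id must outlive everything else.
fn make_static_id(path: String) -> &'static str {
    Box::leak(path.into_boxed_str())
}

fn main() {
    let id: &'static str = make_static_id(String::from("test.mmd"));
    assert_eq!(id, "test.mmd");
    println!("source id: {}", id);
}
```

For a long-running process (e.g., a language server re-reading files on every keystroke) this would leak memory on each call, so `Arc<str>` or an interning table would be the better choice there.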
Testing This Component
Let’s create a test Mermaid file with some intentional errors to see our diagnostics in action.
Create a file named test.mmd in your project root:
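A `test.mmd` consistent with the diagnostics shown below might look like the following. Lines 3, 5, 6, 8, and 9 are taken from the expected output; the remaining lines are illustrative filler, so your exact columns may differ:

```mermaid
graph TD
    A[Start] --> B
    B --x C[End
    D(Round) --> E
    E[Another Node] <-- Malformed Arrow
    F[Node F] -.- G[Node G
    H --> I
    J[Unterminated String "
    invalid keyword
```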
Now, run your tool with this file:
cargo run test.mmd
Expected Output (will vary slightly based on terminal colors and exact ariadne version, but should be similar in structure):
info: mermaid_analyzer: Analyzing Mermaid file: test.mmd
debug: mermaid_analyzer: Starting lexing...
debug: mermaid_analyzer: Lexing completed. Found 32 tokens.
debug: mermaid_analyzer: Starting parsing...
debug: mermaid_analyzer: Parsing completed.
info: mermaid_analyzer: Emitting diagnostics...
error[M002]: Unterminated string literal
┌─ test.mmd:8:19
│
8 │ J[Unterminated String "
│ ---------
│ │
│ unterminated string literal
= help: String literals must be closed with a matching quote. Add a '"' or apostrophe to terminate the string.
error[M101]: Expected ']' to close node label. Expected 'CloseBracket', but found 'NewLine'.
┌─ test.mmd:3:19
│
3 │ B --x C[End
│ ^ expected ']'
= note: The parser expected a 'CloseBracket' here to continue parsing this construct.
= help: Ensure all parentheses, brackets, and braces are correctly matched and balanced. Each opening symbol must have a corresponding closing symbol.
error[M101]: Expected ']' to close node label. Expected 'CloseBracket', but found 'NewLine'.
┌─ test.mmd:6:21
│
6 │ F[Node F] -.- G[Node G
│ ^ expected ']'
= note: The parser expected a 'CloseBracket' here to continue parsing this construct.
= help: Ensure all parentheses, brackets, and braces are correctly matched and balanced. Each opening symbol must have a corresponding closing symbol.
error[M100]: Unexpected token 'Identifier("invalid")' at the start of a statement.
┌─ test.mmd:9:5
│
9 │ invalid keyword
│ ^^^^^^^ unexpected token
= note: Expected a node ID, an edge, or a new diagram declaration. This might indicate a syntax error or a missing semicolon.
= help: Expected token, found something else. This usually indicates a syntax error. Check the Mermaid syntax documentation for this construct.
error[M104]: Expected an end node identifier after arrow.
┌─ test.mmd:5:20
│
5 │ E[Another Node] <-- Malformed Arrow
│ ^^^^^^^^^^^^^^^ expected node ID
= note: The parser expected a 'Identifier' here to continue parsing this construct.
= help: An edge definition must specify two nodes and a valid arrow type (e.g., 'A --> B'). Check for missing nodes, invalid arrow syntax, or extra characters.
error: Analysis completed with errors.
This output demonstrates:
- Clear `error[MXXX]` codes.
- Descriptive messages.
- Precise line and column numbers.
- Code highlighting with `ariadne`.
- Contextual notes and help messages.
This is a significant improvement over simple error strings!
Production Considerations
Performance:
- String Allocations: Minimize `String` allocations for diagnostic messages where possible (e.g., use `&'static str` for `ErrorCode` messages). `ariadne` itself is optimized for performance.
- Source Management: For very large files or many files, loading every source into the `HashMap<SourceId, Source<&'a str>>` can consume significant memory. Consider a `SourceManager` that loads sources lazily, or streams them if `ariadne` supports it for extremely large files (though this is rarely an issue for typical Mermaid diagrams).
- Diagnostic Collection: The `Vec<Diagnostic>` approach is efficient for most cases. For an extreme number of diagnostics (e.g., parsing a huge, completely malformed file), consider a bounded collection or an early exit once an error threshold is reached.
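The bounded-collection idea can be sketched as a small wrapper. `BoundedDiagnostics` is hypothetical, and `String` stands in for the real `Diagnostic` type: past the cap, diagnostics are dropped but counted, so the user can still be told how many were suppressed.

```rust
/// A diagnostic sink that stops storing after `cap` entries,
/// but keeps counting so suppressed diagnostics can be reported.
struct BoundedDiagnostics {
    cap: usize,
    stored: Vec<String>, // stand-in for the real `Diagnostic` type
    suppressed: usize,
}

impl BoundedDiagnostics {
    fn new(cap: usize) -> Self {
        Self { cap, stored: Vec::new(), suppressed: 0 }
    }

    fn push(&mut self, diag: String) {
        if self.stored.len() < self.cap {
            self.stored.push(diag);
        } else {
            self.suppressed += 1; // dropped, but not forgotten
        }
    }
}

fn main() {
    let mut sink = BoundedDiagnostics::new(2);
    for i in 0..5 {
        sink.push(format!("error {}", i));
    }
    assert_eq!(sink.stored.len(), 2);
    assert_eq!(sink.suppressed, 3);
    println!("{} stored, {} suppressed", sink.stored.len(), sink.suppressed);
}
```

This mirrors what `rustc` does in spirit when it caps output and prints a "N errors emitted" style summary instead of flooding the terminal.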
Logging and Monitoring:
- Integration: Diagnostics are a specific type of log. Ensure they can be easily integrated with broader application logging systems (e.g., `tracing`).
- Structured Output: For CI/CD pipelines or IDE integrations, a machine-readable JSON output format for diagnostics would be invaluable. The `Diagnostic` struct is already structured; only the `DiagnosticEmitter` needs to be adapted to serialize to JSON instead of printing with `ariadne`.
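As a sketch of what a machine-readable mode could emit, here is a hand-rolled JSON serializer for a stripped-down diagnostic. The field names are assumptions, and real code would use `serde` with proper string escaping rather than `format!`:

```rust
/// A stripped-down diagnostic, for illustration only.
struct MiniDiag {
    code: &'static str,
    severity: &'static str,
    message: String,
    start: usize, // byte offsets of the primary span
    end: usize,
}

/// Serialize one diagnostic as a JSON object. Assumes the message
/// contains no quotes or backslashes; serde_json handles escaping properly.
fn to_json(d: &MiniDiag) -> String {
    format!(
        "{{\"code\":\"{}\",\"severity\":\"{}\",\"message\":\"{}\",\"span\":[{},{}]}}",
        d.code, d.severity, d.message, d.start, d.end
    )
}

fn main() {
    let d = MiniDiag {
        code: "M102",
        severity: "error",
        message: String::from("Missing diagram type declaration"),
        start: 0,
        end: 5,
    };
    let json = to_json(&d);
    assert!(json.contains("\"code\":\"M102\""));
    println!("{}", json);
}
```

Emitting one JSON object per line (JSON Lines) is a common convention for CI consumers, since each diagnostic can then be parsed independently as it streams in.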
Internationalization (i18n):
- For a truly global tool, diagnostic messages and help texts would need to be localized. This means abstracting messages away from hardcoded strings, perhaps using a message catalog system; the `ErrorCode` enum provides a good key for this.
Error Recovery Strategy:
- The current `synchronize()` method is basic panic-mode recovery. For production, more sophisticated error recovery (e.g., error productions in the grammar, or more context-aware skipping) might be necessary to surface even more errors in highly malformed input. However, "strict correctness" often implies less recovery: failing early and clearly. Our current approach balances reporting multiple errors against getting stuck.
Code Review Checkpoint
At this stage, we have successfully implemented a robust diagnostic system:
- `src/diagnostics/span.rs`: Defines `Span` and `SourceId` for precise source location tracking, and implements `ariadne::Span`.
- `src/diagnostics/error_codes.rs`: Defines `Severity` and `ErrorCode` with descriptive messages and helpful suggestions.
- `src/diagnostics/mod.rs`: Contains the core `Diagnostic` struct and its `DiagnosticBuilder` for ergonomic creation.
- `src/diagnostics/emitter.rs`: Implements `DiagnosticEmitter`, using `ariadne` to render rich, compiler-style error messages to the console.
- `src/lexer/token.rs`: The `Token` struct now includes a `Span`.
- `src/lexer/lexer.rs`: Modified to calculate and assign `Span`s to tokens, and to collect `Diagnostic`s for lexical errors (`M001`, `M002`). It returns `(Vec<Token>, Vec<Diagnostic>)`.
- `src/parser/parser.rs`: Modified to consume tokens with `Span`s, generate `Diagnostic`s for syntax errors (`M100`, `M101`, `M102`, `M103`, `M104`), and employ a basic `synchronize()` error recovery strategy. It returns `(Option<Diagram>, Vec<Diagnostic>)`.
- `src/main.rs`: Orchestrates the lexing, parsing, and diagnostic emission process, exiting with an error code if critical diagnostics are found.
The project now produces much more user-friendly and actionable feedback, a critical step towards a production-ready tool.
Common Issues & Solutions
- Misaligned Highlights in `ariadne` Output:
  - Issue: The highlighted code snippet doesn't match the reported `Span`, or points to the wrong character.
  - Cause: Incorrect `start` and `end` byte offsets in the `Span` struct, or `current_offset` not being updated correctly in the lexer. This is especially tricky with multi-byte UTF-8 characters.
  - Solution: Double-check the `advance()` method in `lexer.rs` to ensure `current_offset` is incremented by `c.len_utf8()` (not just 1). Carefully trace the `start_offset` and `end_offset` calculations for each token. Use `debug!` logs to print `current_offset`, `current_line`, and `current_column` at various points in the lexer for verification.
  - Prevention: Always use `char::len_utf8()` for byte offset calculations when dealing with `str::Chars`.
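The multi-byte pitfall is easy to demonstrate: character counts and byte lengths diverge as soon as non-ASCII text appears, so span offsets must be advanced with `len_utf8()` rather than by 1:

```rust
fn main() {
    let s = "A→B"; // '→' (U+2192) is 3 bytes in UTF-8
    // Character count and byte length diverge:
    assert_eq!(s.chars().count(), 3);
    assert_eq!(s.len(), 5);
    // Walking the string the way the lexer should:
    let mut byte_offset = 0;
    for c in s.chars() {
        byte_offset += c.len_utf8(); // NOT `+= 1`
    }
    assert_eq!(byte_offset, s.len());
    println!("chars: {}, bytes: {}", s.chars().count(), s.len());
}
```

A lexer that increments its offset by 1 per character would report spans drifting further left after every non-ASCII character in a node label.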
- `ariadne` Panic: "No source found for SourceId...":
  - Issue: `ariadne` panics because it can't find the source content for a given `SourceId` when trying to print a report.
  - Cause: `DiagnosticEmitter::add_source()` was not called for the `SourceId` associated with the diagnostic, or the `SourceId` used for diagnostics differs from the one registered.
  - Solution: Ensure `emitter.add_source(source_id, &source_code);` is called before `emitter.emit_diagnostics()`. Verify that the `SourceId` passed to the lexer and parser is the exact same `SourceId` used when registering the source with the emitter. For `&'static str` IDs, they must compare equal (point at the same string data or hold identical values).
  - Prevention: Centralize `SourceId` creation and management. For single-file CLI tools, ensure the leaked path string is created once and reused consistently.
- Parser Stops at First Error, Doesn't Report More:
  - Issue: Only one error message is printed, even when the input contains multiple obvious syntax errors.
  - Cause: The parser's error handling immediately returns `Err` without attempting recovery or collecting subsequent errors; the main `parse` loop may not continue iterating after an error.
  - Solution: Ensure methods like `parse_diagram_declaration` and `parse_flowchart_statement` add diagnostics to `self.diagnostics` and then return `Err(diag)` to signal local failure, while the caller (the main `parse` loop) calls `self.synchronize()` and continues looping. The `parse` method should return `(Option<AST>, Vec<Diagnostic>)` so parsing can continue even when the AST is incomplete or invalid.
  - Prevention: Design error handling with explicit recovery points, and prioritize diagnostic collection over immediate termination.
Testing & Verification
To thoroughly test and verify our new diagnostic system, we should create a suite of test files covering various error scenarios:
- Lexical Errors:
  - `lexer_error_unexpected_char.mmd`: Contains characters not allowed in Mermaid (e.g., `graph TD !@#$`).
  - `lexer_error_unterminated_string.mmd`: `graph TD A["Unclosed string`
- Parser Syntax Errors:
  - `parser_error_missing_declaration.mmd`: Starts directly with `A --> B`, without `graph TD`.
  - `parser_error_unmatched_bracket.mmd`: `graph TD A[Label --> B` (missing `]`).
  - `parser_error_malformed_edge.mmd`: `graph TD A -- B` (missing arrow head).
  - `parser_error_unexpected_token.mmd`: `graph TD A[Label] B C` (unexpected `B` after a node definition).
  - `parser_error_invalid_direction.mmd`: `graph ZZ A --> B` (invalid direction).
- Mixed Errors:
  - A file combining several types of errors, to test error recovery and multiple-diagnostic reporting.
Verification Steps:
- Run with each test file: `cargo run <test_file.mmd>`
- Check Output:
  - Error Codes: Are the `MXXX` codes correct for the type of error?
  - Messages: Are the main messages clear and accurate?
  - Spans/Highlights: Does `ariadne` highlight the correct section of the code?
  - Notes/Help: Are the supplementary notes and help messages relevant and actionable?
  - Multiple Errors: For files with multiple errors, does the tool report all of them (or as many as it can find before recovery becomes impossible)?
  - Exit Code: Does the program exit with `1` if any errors were reported, and `0` if there were only warnings or no diagnostics?
By systematically testing these cases, you can ensure your diagnostic system is robust and provides the high-quality feedback expected from a production-grade tool.
Summary & Next Steps
In this chapter, we significantly enhanced our Mermaid analyzer by implementing a sophisticated diagnostic system. We defined granular Spans for precise location tracking, established a set of unique ErrorCodes with Severity levels, and created a Diagnostic structure to encapsulate all error information. Crucially, we integrated this system into our Lexer and Parser, allowing them to collect and report detailed errors instead of merely failing. Finally, we built a DiagnosticEmitter leveraging the ariadne crate to present these diagnostics in a visually rich, compiler-style format, greatly improving the user experience.
Our tool can now not only detect issues but also explain them clearly and guide the user toward a solution. This foundation is critical for the next phase of our project.
In Chapter 7: Semantic Validation: Ensuring Correct Mermaid Structure, we will build upon this diagnostic system to implement a dedicated Validator component. This validator will traverse the Abstract Syntax Tree (AST) produced by the parser to detect semantic errors that the lexer and parser cannot catch, such as duplicate node IDs, undefined nodes in edges, and invalid structural nesting, further enhancing the correctness and reliability of our Mermaid analyzer.