Welcome to Chapter 4 of our journey to build a production-grade Mermaid analyzer and fixer! In the previous chapter, we laid the groundwork by creating a robust lexer that converts raw Mermaid text into a stream of meaningful tokens. Now, it’s time to elevate our understanding of the Mermaid structure by transforming these tokens into a rich, strongly typed Abstract Syntax Tree (AST).

The AST is the heart of any compiler or language tool. It’s an intermediate representation of the source code that captures its hierarchical structure and all its essential components, independent of the original text’s formatting. For our Mermaid tool, the AST will serve as the single source of truth for validation, formatting, and any future transformations. By building a precise and comprehensive AST, we ensure that our tool can accurately understand, analyze, and manipulate Mermaid diagrams, adhering strictly to their official specifications.

By the end of this chapter, you will have designed and implemented the core Rust data structures that represent a Mermaid diagram. We will focus primarily on the Flowchart diagram type, which covers many fundamental concepts applicable to other diagram types. This chapter will not involve parsing logic yet; instead, it’s about defining the shape of the data our parser will eventually produce. This incremental approach ensures that each component is well-defined and testable before integration.

Planning & Design: Structuring Our Mermaid AST

The Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code, where each node in the tree denotes a construct occurring in the source code. For our Mermaid analyzer, this means defining Rust enums and structs that mirror the grammar of Mermaid diagrams.

Core Principles for AST Design:

  1. Strictness: Every AST node must directly correspond to a valid Mermaid construct. No ambiguities, no “maybe” types.
  2. Strong Typing: Leverage Rust’s type system to enforce correctness at compile time.
  3. Completeness: The AST should be able to represent all valid Mermaid constructs for the supported diagram types.
  4. Extensibility: Design with future additions in mind (new diagram types, new features within existing types).
  5. Location Tracking (Span): Each significant AST node should carry Span information (start and end byte offsets in the source string) for precise error reporting and highlighting.

Component Architecture:

We’ll start by defining common types used across the AST, such as Span and DiagramType. Then, we’ll dive into the specific structures for Flowcharts.

flowchart TD A[Mermaid AST Root] --> B{Diagram Type} B --> C[Flowchart Diagram] B --> D[Sequence Diagram (Future)] B --> E[Class Diagram (Future)] C --> F[Flowchart Body: List of Statements] F --> G{Flowchart Statement} G --> H[Node Statement] G --> I[Edge Statement] G --> J[Subgraph Statement] G --> K[Direction Statement] G --> L[Click/Style Statements (Future)] H --> H1[Node ID] H --> H2[Node Kind] H --> H3[Node Label (Optional)] H --> H4[Span] I --> I1[Source Node Reference] I --> I2[Edge Kind] I --> I3[Edge Label (Optional)] I --> I4[Target Node Reference] I --> I5[Span] J --> J1[Subgraph ID] J --> J2[Subgraph Label (Optional)] J --> J3[Subgraph Body: List of Statements] J --> J4[Span] K --> K1[Direction Enum] K --> K2[Span] I1 & I4 --> M[NodeRef Enum] M --> M1[Node ID] M --> M2[Node Label String] M --> M3[Raw Identifier String]

File Structure:

We will organize our AST definitions within a dedicated src/ast module:

src/
├── main.rs
├── lexer.rs (from Chapter 3)
├── token.rs (from Chapter 3)
└── ast/
    ├── mod.rs
    ├── common.rs  // General types like Span, Location, DiagramType
    └── flowchart.rs // Flowchart-specific AST nodes

Step-by-Step Implementation

a) Setup/Configuration

First, let’s create the necessary files and define our initial common types.

Create the src/ast directory and the mod.rs and common.rs files within it.

mkdir -p src/ast
touch src/ast/mod.rs src/ast/common.rs src/ast/flowchart.rs

Now, let’s define the fundamental building blocks: Span and Location, which are crucial for associating AST nodes back to their original source code positions for diagnostics.

File: src/ast/common.rs

//! Common AST types used across different Mermaid diagram structures.

use std::fmt;

/// Represents a span of text in the source code, defined by start and end byte offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Span {
    /// The starting byte offset of the span (inclusive).
    pub start: usize,
    /// The ending byte offset of the span (exclusive).
    pub end: usize,
}

impl Span {
    /// Creates a new `Span`.
    pub fn new(start: usize, end: usize) -> Self {
        Self { start, end }
    }

    /// Returns true if the span is empty (start == end).
    pub fn is_empty(&self) -> bool {
        self.start == self.end
    }

    /// Returns the length of the span.
    pub fn len(&self) -> usize {
        self.end.saturating_sub(self.start)
    }

    /// Merges two spans, returning a new span that covers both.
    /// Panics if the spans are not logically ordered (e.g., `self` starts after `other` ends).
    pub fn merge(&self, other: &Span) -> Self {
        Span::new(self.start.min(other.start), self.end.max(other.end))
    }
}

/// Represents a specific point in the source code, including line and column numbers.
/// While `Span` is for byte offsets, `Location` is for human-readable positions.
/// This will be computed from `Span` during diagnostic generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

impl Location {
    pub fn new(line: usize, column: usize) -> Self {
        Self { line, column }
    }
}

/// Represents the overall type of a Mermaid diagram.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DiagramType {
    Flowchart,
    Sequence,
    Class,
    State,
    Gantt,
    Pie,
    Git,
    C4c,
    Mindmap,
    Timeline,
    Block,
    Quadrant,
    Requirement,
    Erm,
    Zen,
    // Add other diagram types as they are officially supported and implemented
    Unknown(String), // For future extensibility or unsupported types
}

impl fmt::Display for DiagramType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DiagramType::Flowchart => write!(f, "flowchart"),
            DiagramType::Sequence => write!(f, "sequenceDiagram"),
            DiagramType::Class => write!(f, "classDiagram"),
            DiagramType::State => write!(f, "stateDiagram-v2"),
            DiagramType::Gantt => write!(f, "gantt"),
            DiagramType::Pie => write!(f, "pie"),
            DiagramType::Git => write!(f, "gitGraph"),
            DiagramType::C4c => write!(f, "C4Context"),
            DiagramType::Mindmap => write!(f, "mindmap"),
            DiagramType::Timeline => write!(f, "timeline"),
            DiagramType::Block => write!(f, "block-beta"),
            DiagramType::Quadrant => write!(f, "quadrantChart"),
            DiagramType::Requirement => write!(f, "requirementDiagram"),
            DiagramType::Erm => write!(f, "erDiagram"),
            DiagramType::Zen => write!(f, "zen-of-mermaid"),
            DiagramType::Unknown(s) => write!(f, "unknown diagram type: {}", s),
        }
    }
}

/// The top-level structure representing a parsed Mermaid diagram.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Diagram {
    /// The type of the diagram (e.g., Flowchart, Sequence).
    pub diagram_type: DiagramType,
    /// The specific body of the diagram, which varies by type.
    pub body: DiagramBody,
    /// The span covering the entire diagram declaration (e.g., `graph TD`).
    pub span: Span,
}

/// An enum representing the specific body content for each diagram type.
/// This allows us to store different AST structures based on `DiagramType`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DiagramBody {
    Flowchart(crate::ast::flowchart::FlowchartDiagram),
    Sequence, // Placeholder for future sequence diagram AST
    Class,    // Placeholder for future class diagram AST
    // Add other diagram bodies here
    Empty, // Represents a diagram type declared but with no content (e.g., "graph TD\n")
}

impl DiagramBody {
    /// Returns a reference to the FlowchartDiagram if this body is a Flowchart.
    pub fn as_flowchart(&self) -> Option<&crate::ast::flowchart::FlowchartDiagram> {
        match self {
            DiagramBody::Flowchart(ref d) => Some(d),
            _ => None,
        }
    }

    /// Returns a mutable reference to the FlowchartDiagram if this body is a Flowchart.
    pub fn as_flowchart_mut(&mut self) -> Option<&mut crate::ast::flowchart::FlowchartDiagram> {
        match self {
            DiagramBody::Flowchart(ref mut d) => Some(d),
            _ => None,
        }
    }
}

Explanation:

  • Span: A simple struct to store start and end byte offsets. This is critical for error reporting, allowing us to pinpoint exact locations in the source code. The merge method is a utility for combining spans of multiple tokens/nodes.
  • Location: While Span is for machine-readable byte offsets, Location will store human-readable line and column numbers. This will be derived from Span and the source text when generating diagnostics.
  • DiagramType: An enum to distinguish between different Mermaid diagram kinds (flowchart, sequence, etc.). This is the first level of structural differentiation.
  • Diagram: The root of our AST. It holds the DiagramType and an enum DiagramBody which allows us to have different AST structures for different diagram types.
  • DiagramBody: This enum is key for extensibility. When we implement sequence diagrams, we’ll add Sequence(SequenceDiagram) here, ensuring type safety. We include helper methods as_flowchart and as_flowchart_mut for convenient downcasting.

Now, let’s update src/ast/mod.rs to make these common types publicly available within the ast module.

File: src/ast/mod.rs

//! Defines the Abstract Syntax Tree (AST) for Mermaid diagrams.

pub mod common;
pub mod flowchart;

pub use common::{Diagram, DiagramBody, DiagramType, Location, Span};
pub use flowchart::{
    Direction, Edge, EdgeArrow, EdgeKind, FlowchartDiagram, FlowchartStatement, Node, NodeKind,
    NodeRef, Subgraph,
};

b) Core Implementation - Flowchart AST

Now we define the specific AST nodes for Flowcharts. This will be the most complex part of this chapter.

File: src/ast/flowchart.rs

//! Defines the Abstract Syntax Tree (AST) for Mermaid Flowcharts.

use super::common::Span;
use std::collections::HashMap;

/// Represents a Flowchart diagram's overall structure.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FlowchartDiagram {
    /// The primary direction of the flowchart (e.g., TD, LR).
    pub direction: Option<Direction>,
    /// A list of all statements within the flowchart body.
    pub statements: Vec<FlowchartStatement>,
    /// Optional title for the diagram.
    pub title: Option<String>,
    /// A map to store explicit node declarations or implicit nodes from edges.
    /// Key: Node ID (e.g., "A", "node_1")
    /// Value: The actual Node structure. This is a semantic detail, but useful for validation.
    pub nodes: HashMap<String, Node>,
    /// The span covering the flowchart's body content, excluding the initial `graph TD` declaration.
    pub span: Span,
}

/// Represents a single statement within a Flowchart diagram.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum FlowchartStatement {
    /// A declaration of a node, potentially with a specific shape and label.
    Node(Node),
    /// An edge connecting two nodes.
    Edge(Edge),
    /// A subgraph containing its own set of statements.
    Subgraph(Subgraph),
    /// An explicit direction declaration (e.g., `direction LR`).
    Direction(Direction),
    /// A comment line.
    Comment(String, Span),
    /// Placeholder for style definitions (e.g., `style A fill:#fff`).
    Style(Span),
    /// Placeholder for click definitions (e.g., `click A call myFunction()`).
    Click(Span),
    /// Represents a raw, unparsed line or statement that couldn't be categorized.
    /// Useful for error recovery or future extensibility.
    Unknown(String, Span),
}

/// Represents a node in a Flowchart.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node {
    /// The unique identifier for the node (e.g., "A", "id1").
    pub id: String,
    /// The visual kind or shape of the node.
    pub kind: NodeKind,
    /// The label displayed on the node (optional, defaults to ID if not present).
    /// Can be multi-line.
    pub label: Option<String>,
    /// The span covering the node definition (e.g., `A[Label]`).
    pub span: Span,
}

impl Node {
    pub fn new(id: String, kind: NodeKind, label: Option<String>, span: Span) -> Self {
        Self {
            id,
            kind,
            label,
            span,
        }
    }
}

/// Represents the visual kind or shape of a node.
/// Based on Mermaid's official node shapes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum NodeKind {
    /// `id` or `id[label]` - Default rectangle
    Default,
    /// `id(label)` - Rounded rectangle
    Rounded,
    /// `id((label))` - Circle
    Circle,
    /// `id>label]` - Rhombus
    Rhombus,
    /// `id{label}` - Hexagon
    Hexagon,
    /// `id{{label}}` - Subroutine
    Subroutine,
    /// `id[/label/]` - Asymmetric shape (parallelogram)
    Asymmetric,
    /// `id[\label\]` - Asymmetric shape (parallelogram alt)
    AsymmetricAlt,
    /// `id((label))` - Double circle
    DoubleCircle,
    /// `id>label]` - Stadium
    Stadium,
    /// `id[(label)]` - Cylinder
    Cylinder,
    /// `id{{{label}}}` - Cloud
    Cloud,
    /// `id[("label")]` - Trapeze
    Trapeze,
    /// `id(("label"))` - Double Trapeze
    DoubleTrapeze,
    /// `id[/label\]` - Flag
    Flag,
    /// `id[\label/]` - Alt Flag
    AltFlag,
    /// `id[|label|]` - Class style
    Class,
    /// `id[[label]]` - Square/Rectangle with double border
    DoubleSquare,
    /// `id[("label")]` - Bubble
    Bubble,
    /// Placeholder for an unknown or custom node kind.
    Unknown,
}

/// Represents an edge connecting two nodes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Edge {
    /// The source node of the edge.
    pub source: NodeRef,
    /// The kind of arrow and line style.
    pub kind: EdgeKind,
    /// The label displayed on the edge (optional).
    pub label: Option<String>,
    /// The target node of the edge.
    pub target: NodeRef,
    /// The span covering the entire edge definition (e.g., `A -- Text --> B`).
    pub span: Span,
}

impl Edge {
    pub fn new(
        source: NodeRef,
        kind: EdgeKind,
        label: Option<String>,
        target: NodeRef,
        span: Span,
    ) -> Self {
        Self {
            source,
            kind,
            label,
            target,
            span,
        }
    }
}

/// Represents the reference to a node, which can be just an ID or an ID with an explicit label.
/// This is important because `A["My Label"]` defines node `A` and gives it a label,
/// but `A` could also be implicitly defined by an edge `A --> B`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeRef {
    /// A simple node reference by its ID (e.g., `A`).
    Id(String),
    /// A node reference with an explicit label (e.g., `A["My Label"]`).
    /// The inner String is the node ID, the Option<String> is the label.
    IdWithLabel(String, Option<String>),
    /// A raw identifier that might be a node ID or part of a label that needs further processing.
    /// Used during parsing before full node resolution.
    Raw(String),
}

impl AsRef<str> for NodeRef {
    fn as_ref(&self) -> &str {
        match self {
            NodeRef::Id(s) => s.as_str(),
            NodeRef::IdWithLabel(s, _) => s.as_str(),
            NodeRef::Raw(s) => s.as_str(),
        }
    }
}

/// Represents the visual kind of an edge (line style and arrowheads).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EdgeKind {
    /// The arrow type at the start of the edge.
    pub start_arrow: EdgeArrow,
    /// The arrow type at the end of the edge.
    pub end_arrow: EdgeArrow,
    /// Whether the line is solid, dotted, or thick.
    pub line_style: LineStyle,
}

impl EdgeKind {
    pub fn new(start_arrow: EdgeArrow, end_arrow: EdgeArrow, line_style: LineStyle) -> Self {
        Self {
            start_arrow,
            end_arrow,
            line_style,
        }
    }

    /// Creates a default solid arrow edge.
    pub fn default_arrow() -> Self {
        Self::new(EdgeArrow::None, EdgeArrow::Normal, LineStyle::Solid)
    }

    /// Creates a default solid open edge.
    pub fn default_open() -> Self {
        Self::new(EdgeArrow::None, EdgeArrow::None, LineStyle::Solid)
    }
}

/// Represents the type of arrowhead or tail at the start/end of an edge.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeArrow {
    None,   // No arrow head
    Normal, // Standard arrow `-->`
    Cross,  // Cross `x--x`
    Circle, // Circle `o--o`
    Async,  // Async arrow `-->>`
}

/// Represents the style of the line for an edge.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LineStyle {
    Solid,  // `---`
    Dotted, // `-.->`
    Thick,  // `===`
}

/// Represents a subgraph within a Flowchart.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Subgraph {
    /// The unique identifier for the subgraph (optional, can be implicit).
    pub id: Option<String>,
    /// The label displayed for the subgraph (optional).
    pub label: Option<String>,
    /// The statements contained within the subgraph.
    /// We use `Box<Vec<...>>` to avoid recursive type with infinite size,
    /// as a `Subgraph` can contain `FlowchartStatement`s, which can in turn be `Subgraph`s.
    pub statements: Box<Vec<FlowchartStatement>>,
    /// The span covering the entire subgraph definition.
    pub span: Span,
}

impl Subgraph {
    pub fn new(
        id: Option<String>,
        label: Option<String>,
        statements: Vec<FlowchartStatement>,
        span: Span,
    ) -> Self {
        Self {
            id,
            label,
            statements: Box::new(statements),
            span,
        }
    }
}

/// Represents the primary direction of a flowchart or subgraph.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Direction {
    TopToBottom, // TD
    BottomToTop, // BT
    LeftToRight, // LR
    RightToLeft, // RL
    // Add other directions if Mermaid supports them in the future
}

impl Direction {
    /// Creates a `Direction` from a string, case-insensitive.
    pub fn from_str(s: &str) -> Option<Self> {
        match s.to_uppercase().as_str() {
            "TD" => Some(Direction::TopToBottom),
            "TB" => Some(Direction::TopToBottom), // TB is alias for TD
            "BT" => Some(Direction::BottomToTop),
            "LR" => Some(Direction::LeftToRight),
            "RL" => Some(Direction::RightToLeft),
            _ => None,
        }
    }

    /// Returns the canonical string representation of the direction.
    pub fn to_str(&self) -> &'static str {
        match self {
            Direction::TopToBottom => "TD",
            Direction::BottomToTop => "BT",
            Direction::LeftToRight => "LR",
            Direction::RightToLeft => "RL",
        }
    }
}

Explanation of src/ast/flowchart.rs:

  • FlowchartDiagram: The root struct for a flowchart. It holds the direction, a Vec of FlowchartStatements, an optional title, and a HashMap of nodes. The nodes HashMap is a crucial semantic addition. While the parser builds the AST based on syntax, the validator will use this map to check for undefined nodes or duplicate IDs.
  • FlowchartStatement: An enum that represents all possible top-level constructs within a flowchart (nodes, edges, subgraphs, direction, comments, etc.). This allows us to have a heterogeneous list of statements.
  • Node: Represents a single node with its id, kind (shape), label (optional), and span.
  • NodeKind: An enum covering all standard Mermaid node shapes. We include Unknown for robustness against future changes or malformed input.
  • Edge: Connects a source NodeRef to a target NodeRef with a kind (arrow style) and an optional label.
  • NodeRef: This is a critical design choice. A node can be referenced by just its ID (A) or by its ID with an explicit label (A["My Label"]). The NodeRef enum handles both cases, ensuring we can capture the full information. Raw is used as an intermediate state during parsing.
  • EdgeKind: A struct combining start_arrow, end_arrow, and line_style to fully describe an edge’s visual appearance.
  • EdgeArrow & LineStyle: Enums for the various arrow types and line styles Mermaid supports.
  • Subgraph: Represents a nested block of flowchart statements. It uses Box<Vec<FlowchartStatement>> because a Subgraph can contain FlowchartStatements, which can in turn be Subgraphs. Without Box, Rust’s compiler would complain about potentially infinite size for the Subgraph type. Box allocates the Vec on the heap, breaking the recursion.
  • Direction: An enum for flowchart layout directions (TD, LR, etc.) with utility methods for conversion.

Finally, ensure src/main.rs (or src/lib.rs if you’re building a library) references the ast module.

File: src/main.rs

mod lexer;
mod token;
mod ast; // Add this line to include our new AST module

fn main() {
    // This will be our CLI entry point later.
    // For now, we can add a simple test to ensure AST types compile.
    println!("Mermaid Analyzer - AST module compiled successfully!");

    // Example of creating a simple AST node (for compilation check)
    let node_span = ast::Span::new(0, 5);
    let my_node = ast::Node::new(
        "A".to_string(),
        ast::NodeKind::Default,
        Some("Start".to_string()),
        node_span,
    );

    let edge_span = ast::Span::new(6, 10);
    let my_edge = ast::Edge::new(
        ast::NodeRef::Id("A".to_string()),
        ast::EdgeKind::default_arrow(),
        None,
        ast::NodeRef::Id("B".to_string()),
        edge_span,
    );

    let statements = vec![
        ast::FlowchartStatement::Node(my_node),
        ast::FlowchartStatement::Edge(my_edge),
    ];

    let flowchart_diagram = ast::FlowchartDiagram {
        direction: Some(ast::Direction::TopToBottom),
        statements,
        title: None,
        nodes: std::collections::HashMap::new(), // Will be populated by parser/validator
        span: ast::Span::new(0, 100),
    };

    let diagram = ast::Diagram {
        diagram_type: ast::DiagramType::Flowchart,
        body: ast::DiagramBody::Flowchart(flowchart_diagram),
        span: ast::Span::new(0, 100),
    };

    println!("Example Diagram AST created: {:?}", diagram);
}

c) Testing This Component (Compilation & Basic Structure)

At this stage, we are defining data structures, not implementing logic. The primary “test” is to ensure that cargo check passes, meaning our Rust types are valid and well-formed. We can also add a small manual test in main.rs as shown above, or create a dedicated unit test file to ensure we can instantiate these types.

File: tests/ast_types_test.rs (Create this file)

use mermaid_analyzer::ast; // Assuming your project is named `mermaid_analyzer`

#[test]
fn test_span_creation_and_merge() {
    let span1 = ast::Span::new(0, 5);
    assert_eq!(span1.start, 0);
    assert_eq!(span1.end, 5);
    assert_eq!(span1.len(), 5);
    assert!(!span1.is_empty());

    let span2 = ast::Span::new(7, 10);
    let merged = span1.merge(&span2);
    assert_eq!(merged.start, 0);
    assert_eq!(merged.end, 10);
    assert_eq!(merged.len(), 10);

    let empty_span = ast::Span::new(5, 5);
    assert!(empty_span.is_empty());
    assert_eq!(empty_span.len(), 0);
}

#[test]
fn test_node_creation() {
    let node_id = "A".to_string();
    let node_label = Some("Start Node".to_string());
    let node_span = ast::Span::new(0, 15);
    let node = ast::Node::new(node_id.clone(), ast::NodeKind::Rounded, node_label.clone(), node_span);

    assert_eq!(node.id, node_id);
    assert_eq!(node.kind, ast::NodeKind::Rounded);
    assert_eq!(node.label, node_label);
    assert_eq!(node.span, node_span);
}

#[test]
fn test_edge_creation() {
    let source = ast::NodeRef::Id("A".to_string());
    let target = ast::NodeRef::IdWithLabel("B".to_string(), Some("End Node".to_string()));
    let edge_kind = ast::EdgeKind::default_arrow();
    let edge_label = Some("connects to".to_string());
    let edge_span = ast::Span::new(16, 30);

    let edge = ast::Edge::new(source.clone(), edge_kind, edge_label.clone(), target.clone(), edge_span);

    assert_eq!(edge.source, source);
    assert_eq!(edge.kind, edge_kind);
    assert_eq!(edge.label, edge_label);
    assert_eq!(edge.target, target);
    assert_eq!(edge.span, edge_span);
}

#[test]
fn test_subgraph_creation() {
    let subgraph_id = Some("sub1".to_string());
    let subgraph_label = Some("My Subgraph".to_string());
    let subgraph_span = ast::Span::new(0, 50);

    let inner_node = ast::Node::new(
        "X".to_string(),
        ast::NodeKind::Default,
        Some("Inner Node".to_string()),
        ast::Span::new(10, 20),
    );
    let statements = vec![ast::FlowchartStatement::Node(inner_node)];

    let subgraph = ast::Subgraph::new(
        subgraph_id.clone(),
        subgraph_label.clone(),
        statements.clone(),
        subgraph_span,
    );

    assert_eq!(subgraph.id, subgraph_id);
    assert_eq!(subgraph.label, subgraph_label);
    assert_eq!(*subgraph.statements, statements); // Dereference Box for comparison
    assert_eq!(subgraph.span, subgraph_span);
}

#[test]
fn test_diagram_root_creation() {
    let flowchart_diagram = ast::FlowchartDiagram {
        direction: Some(ast::Direction::TopToBottom),
        statements: vec![],
        title: None,
        nodes: std::collections::HashMap::new(),
        span: ast::Span::new(0, 0),
    };

    let diagram = ast::Diagram {
        diagram_type: ast::DiagramType::Flowchart,
        body: ast::DiagramBody::Flowchart(flowchart_diagram.clone()),
        span: ast::Span::new(0, 0),
    };

    assert_eq!(diagram.diagram_type, ast::DiagramType::Flowchart);
    assert_eq!(diagram.body.as_flowchart(), Some(&flowchart_diagram));
}

To run these tests, ensure your Cargo.toml specifies the library name if you’re building a library, or use cargo test --workspace if it’s a binary with tests. If your project is named mermaid-analyzer (as typically used in Cargo.toml), the use mermaid_analyzer::ast; line will work. If your Cargo.toml has name = "mermaid-cli", then use use mermaid_cli::ast;.

Run cargo test to execute these basic tests. They should pass, confirming our AST structures can be instantiated and hold the expected data.

Production Considerations

When designing core data structures like an AST, production readiness involves several key considerations:

  1. Memory Usage:

    • Recursive Types and Box: We’ve already addressed this with Subgraph using Box<Vec<FlowchartStatement>>. This is standard practice in Rust for recursive data structures to ensure known compile-time sizes and prevent stack overflows for deep trees.
    • String Ownership: We primarily use String for IDs and labels. While &str references could save allocations, the AST needs to own its data, especially since it’s built from tokens (which might be &str slices of the input) and needs to outlive the input string. String is the correct choice here.
    • HashMap for Nodes: The nodes HashMap in FlowchartDiagram is a semantic cache. It allows for efficient lookup of nodes by ID during validation and transformation, preventing repeated linear scans of the statements vector. This is a performance optimization for larger diagrams.
  2. Performance:

    • Immutability and Cloning: Our AST nodes are Clone, PartialEq, Eq, Hash. While Clone can be expensive for large structures, it’s often necessary for certain compiler passes (e.g., transformations that need to modify a copy). We should be mindful of when we clone and avoid it unnecessarily during performance-critical traversals.
    • Vec vs. LinkedList: Vec is generally preferred in Rust for cache locality and efficient iteration, even for insertions/deletions in the middle if they are infrequent. For ASTs, appending and iterating are common, making Vec a good default.
  3. Error Handling & Diagnostics:

    • Span Integration: The Span field in every significant AST node is paramount. It allows our diagnostic system (to be built in a later chapter) to precisely point to the location of errors or warnings.
    • Semantic vs. Syntax Errors: The AST itself represents the syntactic structure. Semantic checks (e.g., “node A is defined twice”, “edge refers to undefined node B”) happen in a separate validation pass after the AST is built. The AST’s role is to provide the data for these checks.
  4. Extensibility:

    • Enums for Diagram Types and Statements: Using enums like DiagramBody and FlowchartStatement makes it easy to add new diagram types or new statement kinds without altering existing code significantly, following the Open/Closed Principle.
    • Unknown Variants: Including Unknown variants in enums like NodeKind or FlowchartStatement allows the parser to gracefully handle unsupported or malformed constructs without crashing, enabling better error recovery.

Code Review Checkpoint

At this point, we have successfully defined the foundational data structures for our Mermaid AST, with a particular focus on Flowcharts.

What we’ve built:

  • src/ast/common.rs: Defined Span, Location, DiagramType, Diagram, and DiagramBody to provide general AST utilities and the top-level structure.
  • src/ast/flowchart.rs: Implemented the detailed AST for Flowcharts, including FlowchartDiagram, FlowchartStatement, Node, NodeKind, Edge, NodeRef, EdgeKind, EdgeArrow, LineStyle, Subgraph, and Direction.
  • src/ast/mod.rs: Consolidated and re-exported all AST components for easy access.
  • src/main.rs: Updated to include the ast module and a basic instantiation example to verify compilation.
  • tests/ast_types_test.rs: Added unit tests to ensure our AST structures can be correctly created and hold data.

These structures form the backbone of our Mermaid analyzer. The Diagram enum, in particular, will be the output of our parser (Chapter 5) and the input for our validator and formatter (Chapters 6 and 7). The careful design of NodeRef and the use of Box for Subgraph ensures both correctness and performance.

Common Issues & Solutions

  1. “Recursive type has infinite size” error:

    • Issue: This typically occurs when a struct contains a field of its own type, or a type that directly or indirectly refers back to itself, without using Box. For example, if Subgraph had statements: Vec<FlowchartStatement> and FlowchartStatement could contain Subgraph directly, without Box.
    • Solution: Use Box<T> for recursive fields. Box allocates T on the heap, giving it a known, fixed size on the stack (a pointer). We applied this to statements: Box<Vec<FlowchartStatement>> in Subgraph.
  2. Missing Debug, Clone, PartialEq, Eq, Hash traits:

    • Issue: When writing tests or needing to print, copy, compare, or use types in collections like HashMap or HashSet, you’ll encounter errors if these traits are not derived or manually implemented.
    • Solution: For simple data structures like most of our AST nodes, Rust’s #[derive(...)] macro makes this trivial. We’ve applied #[derive(Debug, Clone, PartialEq, Eq, Hash)] to almost all our AST types. Remember that Hash requires all fields to also implement Hash.
  3. Over-normalization in AST design:

    • Issue: Sometimes developers try to make the AST too clean or normalized, losing information about the original source text. For example, collapsing A["Label"] and A into a single Node { id: "A", label: Some("Label") } even if the original input was just A.
    • Solution: The AST should represent the parsed structure, not necessarily the ideal formatted structure. Our NodeRef enum (Id vs. IdWithLabel) is an example of preserving this distinction. Formatting and semantic validation come after the AST is built. The AST needs enough information to reconstruct the original (or a canonical) form.

Testing & Verification

To verify the work in this chapter:

  1. Run cargo check: Navigate to your project root and run cargo check. This will compile all your code without running tests. If there are any type errors or syntax issues in your AST definitions, cargo check will report them immediately. A clean cargo check indicates your Rust types are valid.
  2. Run cargo test: Execute cargo test to run the unit tests defined in tests/ast_types_test.rs. All tests should pass, confirming that you can instantiate and manipulate the AST structures as expected.
  3. Review main.rs output: If you included the example AST instantiation in src/main.rs, run cargo run. You should see the println! output confirming that the AST module compiled and your example diagram AST was created.

At this stage, you should have a solid, compilable, and testable foundation for representing Mermaid diagrams in Rust.

Summary & Next Steps

In this chapter, we have successfully designed and implemented the core Abstract Syntax Tree (AST) for our Mermaid analyzer. We started with common utilities like Span and DiagramType, then progressively built out the detailed structures for Flowchart diagrams, including nodes, edges, subgraphs, and directions. We paid close attention to Rust’s type system, using enums for extensibility and Box for recursive types, ensuring our AST is robust, memory-efficient, and strictly typed.

This AST is a critical milestone. It transforms the flat stream of tokens from our lexer into a hierarchical, semantically rich data structure that our tool can easily traverse, analyze, and modify. It is the central representation that will enable all subsequent compiler-like phases.

In the next chapter, Chapter 5: Building the Parser: From Tokens to AST, we will leverage the AST structures we’ve defined here. We will implement the parser, which will take the token stream generated by our lexer and combine those tokens into instances of the Diagram and FlowchartDiagram structs, populating our AST with actual parsed Mermaid content. This is where the real “compiler magic” begins!