Welcome to Chapter 4 of our journey to build a production-grade Mermaid analyzer and fixer! In the previous chapter, we laid the groundwork by creating a robust lexer that converts raw Mermaid text into a stream of meaningful tokens. Now, it’s time to elevate our understanding of the Mermaid structure by transforming these tokens into a rich, strongly typed Abstract Syntax Tree (AST).
The AST is the heart of any compiler or language tool. It’s an intermediate representation of the source code that captures its hierarchical structure and all its essential components, independent of the original text’s formatting. For our Mermaid tool, the AST will serve as the single source of truth for validation, formatting, and any future transformations. By building a precise and comprehensive AST, we ensure that our tool can accurately understand, analyze, and manipulate Mermaid diagrams, adhering strictly to their official specifications.
By the end of this chapter, you will have designed and implemented the core Rust data structures that represent a Mermaid diagram. We will focus primarily on the Flowchart diagram type, which covers many fundamental concepts applicable to other diagram types. This chapter will not involve parsing logic yet; instead, it’s about defining the shape of the data our parser will eventually produce. This incremental approach ensures that each component is well-defined and testable before integration.
Planning & Design: Structuring Our Mermaid AST
The Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code, where each node in the tree denotes a construct occurring in the source code. For our Mermaid analyzer, this means defining Rust enums and structs that mirror the grammar of Mermaid diagrams.
Core Principles for AST Design:
- Strictness: Every AST node must directly correspond to a valid Mermaid construct. No ambiguities, no “maybe” types.
- Strong Typing: Leverage Rust’s type system to enforce correctness at compile time.
- Completeness: The AST should be able to represent all valid Mermaid constructs for the supported diagram types.
- Extensibility: Design with future additions in mind (new diagram types, new features within existing types).
- Location Tracking (
Span): Each significant AST node should carrySpaninformation (start and end byte offsets in the source string) for precise error reporting and highlighting.
Component Architecture:
We’ll start by defining common types used across the AST, such as Span and DiagramType. Then, we’ll dive into the specific structures for Flowcharts.
File Structure:
We will organize our AST definitions within a dedicated src/ast module:
src/
├── main.rs
├── lexer.rs (from Chapter 3)
├── token.rs (from Chapter 3)
└── ast/
├── mod.rs
├── common.rs // General types like Span, Location, DiagramType
└── flowchart.rs // Flowchart-specific AST nodes
Step-by-Step Implementation
a) Setup/Configuration
First, let’s create the necessary files and define our initial common types.
Create the src/ast directory and the mod.rs and common.rs files within it.
mkdir -p src/ast
touch src/ast/mod.rs src/ast/common.rs src/ast/flowchart.rs
Now, let’s define the fundamental building blocks: Span and Location, which are crucial for associating AST nodes back to their original source code positions for diagnostics.
File: src/ast/common.rs
//! Common AST types used across different Mermaid diagram structures.
use std::fmt;
/// Represents a span of text in the source code, defined by start and end byte offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Span {
/// The starting byte offset of the span (inclusive).
pub start: usize,
/// The ending byte offset of the span (exclusive).
pub end: usize,
}
impl Span {
/// Creates a new `Span`.
pub fn new(start: usize, end: usize) -> Self {
Self { start, end }
}
/// Returns true if the span is empty (start == end).
pub fn is_empty(&self) -> bool {
self.start == self.end
}
/// Returns the length of the span.
pub fn len(&self) -> usize {
self.end.saturating_sub(self.start)
}
/// Merges two spans, returning a new span that covers both.
/// Panics if the spans are not logically ordered (e.g., `self` starts after `other` ends).
pub fn merge(&self, other: &Span) -> Self {
Span::new(self.start.min(other.start), self.end.max(other.end))
}
}
/// Represents a specific point in the source code, including line and column numbers.
/// While `Span` is for byte offsets, `Location` is for human-readable positions.
/// This will be computed from `Span` during diagnostic generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Location {
pub line: usize,
pub column: usize,
}
impl Location {
pub fn new(line: usize, column: usize) -> Self {
Self { line, column }
}
}
/// Represents the overall type of a Mermaid diagram.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DiagramType {
Flowchart,
Sequence,
Class,
State,
Gantt,
Pie,
Git,
C4c,
Mindmap,
Timeline,
Block,
Quadrant,
Requirement,
Erm,
Zen,
// Add other diagram types as they are officially supported and implemented
Unknown(String), // For future extensibility or unsupported types
}
impl fmt::Display for DiagramType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
DiagramType::Flowchart => write!(f, "flowchart"),
DiagramType::Sequence => write!(f, "sequenceDiagram"),
DiagramType::Class => write!(f, "classDiagram"),
DiagramType::State => write!(f, "stateDiagram-v2"),
DiagramType::Gantt => write!(f, "gantt"),
DiagramType::Pie => write!(f, "pie"),
DiagramType::Git => write!(f, "gitGraph"),
DiagramType::C4c => write!(f, "C4Context"),
DiagramType::Mindmap => write!(f, "mindmap"),
DiagramType::Timeline => write!(f, "timeline"),
DiagramType::Block => write!(f, "block-beta"),
DiagramType::Quadrant => write!(f, "quadrantChart"),
DiagramType::Requirement => write!(f, "requirementDiagram"),
DiagramType::Erm => write!(f, "erDiagram"),
DiagramType::Zen => write!(f, "zen-of-mermaid"),
DiagramType::Unknown(s) => write!(f, "unknown diagram type: {}", s),
}
}
}
/// The top-level structure representing a parsed Mermaid diagram.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Diagram {
/// The type of the diagram (e.g., Flowchart, Sequence).
pub diagram_type: DiagramType,
/// The specific body of the diagram, which varies by type.
pub body: DiagramBody,
/// The span covering the entire diagram declaration (e.g., `graph TD`).
pub span: Span,
}
/// An enum representing the specific body content for each diagram type.
/// This allows us to store different AST structures based on `DiagramType`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DiagramBody {
Flowchart(crate::ast::flowchart::FlowchartDiagram),
Sequence, // Placeholder for future sequence diagram AST
Class, // Placeholder for future class diagram AST
// Add other diagram bodies here
Empty, // Represents a diagram type declared but with no content (e.g., "graph TD\n")
}
impl DiagramBody {
/// Returns a reference to the FlowchartDiagram if this body is a Flowchart.
pub fn as_flowchart(&self) -> Option<&crate::ast::flowchart::FlowchartDiagram> {
match self {
DiagramBody::Flowchart(ref d) => Some(d),
_ => None,
}
}
/// Returns a mutable reference to the FlowchartDiagram if this body is a Flowchart.
pub fn as_flowchart_mut(&mut self) -> Option<&mut crate::ast::flowchart::FlowchartDiagram> {
match self {
DiagramBody::Flowchart(ref mut d) => Some(d),
_ => None,
}
}
}
Explanation:
Span: A simple struct to storestartandendbyte offsets. This is critical for error reporting, allowing us to pinpoint exact locations in the source code. Themergemethod is a utility for combining spans of multiple tokens/nodes.Location: WhileSpanis for machine-readable byte offsets,Locationwill store human-readable line and column numbers. This will be derived fromSpanand the source text when generating diagnostics.DiagramType: An enum to distinguish between different Mermaid diagram kinds (flowchart, sequence, etc.). This is the first level of structural differentiation.Diagram: The root of our AST. It holds theDiagramTypeand anenum DiagramBodywhich allows us to have different AST structures for different diagram types.DiagramBody: Thisenumis key for extensibility. When we implement sequence diagrams, we’ll addSequence(SequenceDiagram)here, ensuring type safety. We include helper methodsas_flowchartandas_flowchart_mutfor convenient downcasting.
Now, let’s update src/ast/mod.rs to make these common types publicly available within the ast module.
File: src/ast/mod.rs
//! Defines the Abstract Syntax Tree (AST) for Mermaid diagrams.
pub mod common;
pub mod flowchart;
pub use common::{Diagram, DiagramBody, DiagramType, Location, Span};
pub use flowchart::{
Direction, Edge, EdgeArrow, EdgeKind, FlowchartDiagram, FlowchartStatement, Node, NodeKind,
NodeRef, Subgraph,
};
b) Core Implementation - Flowchart AST
Now we define the specific AST nodes for Flowcharts. This will be the most complex part of this chapter.
File: src/ast/flowchart.rs
//! Defines the Abstract Syntax Tree (AST) for Mermaid Flowcharts.
use super::common::Span;
use std::collections::HashMap;
/// Represents a Flowchart diagram's overall structure.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FlowchartDiagram {
/// The primary direction of the flowchart (e.g., TD, LR).
pub direction: Option<Direction>,
/// A list of all statements within the flowchart body.
pub statements: Vec<FlowchartStatement>,
/// Optional title for the diagram.
pub title: Option<String>,
/// A map to store explicit node declarations or implicit nodes from edges.
/// Key: Node ID (e.g., "A", "node_1")
/// Value: The actual Node structure. This is a semantic detail, but useful for validation.
pub nodes: HashMap<String, Node>,
/// The span covering the flowchart's body content, excluding the initial `graph TD` declaration.
pub span: Span,
}
/// Represents a single statement within a Flowchart diagram.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum FlowchartStatement {
/// A declaration of a node, potentially with a specific shape and label.
Node(Node),
/// An edge connecting two nodes.
Edge(Edge),
/// A subgraph containing its own set of statements.
Subgraph(Subgraph),
/// An explicit direction declaration (e.g., `direction LR`).
Direction(Direction),
/// A comment line.
Comment(String, Span),
/// Placeholder for style definitions (e.g., `style A fill:#fff`).
Style(Span),
/// Placeholder for click definitions (e.g., `click A call myFunction()`).
Click(Span),
/// Represents a raw, unparsed line or statement that couldn't be categorized.
/// Useful for error recovery or future extensibility.
Unknown(String, Span),
}
/// Represents a node in a Flowchart.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node {
/// The unique identifier for the node (e.g., "A", "id1").
pub id: String,
/// The visual kind or shape of the node.
pub kind: NodeKind,
/// The label displayed on the node (optional, defaults to ID if not present).
/// Can be multi-line.
pub label: Option<String>,
/// The span covering the node definition (e.g., `A[Label]`).
pub span: Span,
}
impl Node {
pub fn new(id: String, kind: NodeKind, label: Option<String>, span: Span) -> Self {
Self {
id,
kind,
label,
span,
}
}
}
/// Represents the visual kind or shape of a node.
/// Based on Mermaid's official node shapes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum NodeKind {
/// `id` or `id[label]` - Default rectangle
Default,
/// `id(label)` - Rounded rectangle
Rounded,
/// `id((label))` - Circle
Circle,
/// `id>label]` - Rhombus
Rhombus,
/// `id{label}` - Hexagon
Hexagon,
/// `id{{label}}` - Subroutine
Subroutine,
/// `id[/label/]` - Asymmetric shape (parallelogram)
Asymmetric,
/// `id[\label\]` - Asymmetric shape (parallelogram alt)
AsymmetricAlt,
/// `id((label))` - Double circle
DoubleCircle,
/// `id>label]` - Stadium
Stadium,
/// `id[(label)]` - Cylinder
Cylinder,
/// `id{{{label}}}` - Cloud
Cloud,
/// `id[("label")]` - Trapeze
Trapeze,
/// `id(("label"))` - Double Trapeze
DoubleTrapeze,
/// `id[/label\]` - Flag
Flag,
/// `id[\label/]` - Alt Flag
AltFlag,
/// `id[|label|]` - Class style
Class,
/// `id[[label]]` - Square/Rectangle with double border
DoubleSquare,
/// `id[("label")]` - Bubble
Bubble,
/// Placeholder for an unknown or custom node kind.
Unknown,
}
/// Represents an edge connecting two nodes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Edge {
/// The source node of the edge.
pub source: NodeRef,
/// The kind of arrow and line style.
pub kind: EdgeKind,
/// The label displayed on the edge (optional).
pub label: Option<String>,
/// The target node of the edge.
pub target: NodeRef,
/// The span covering the entire edge definition (e.g., `A -- Text --> B`).
pub span: Span,
}
impl Edge {
pub fn new(
source: NodeRef,
kind: EdgeKind,
label: Option<String>,
target: NodeRef,
span: Span,
) -> Self {
Self {
source,
kind,
label,
target,
span,
}
}
}
/// Represents the reference to a node, which can be just an ID or an ID with an explicit label.
/// This is important because `A["My Label"]` defines node `A` and gives it a label,
/// but `A` could also be implicitly defined by an edge `A --> B`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeRef {
/// A simple node reference by its ID (e.g., `A`).
Id(String),
/// A node reference with an explicit label (e.g., `A["My Label"]`).
/// The inner String is the node ID, the Option<String> is the label.
IdWithLabel(String, Option<String>),
/// A raw identifier that might be a node ID or part of a label that needs further processing.
/// Used during parsing before full node resolution.
Raw(String),
}
impl AsRef<str> for NodeRef {
fn as_ref(&self) -> &str {
match self {
NodeRef::Id(s) => s.as_str(),
NodeRef::IdWithLabel(s, _) => s.as_str(),
NodeRef::Raw(s) => s.as_str(),
}
}
}
/// Represents the visual kind of an edge (line style and arrowheads).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EdgeKind {
/// The arrow type at the start of the edge.
pub start_arrow: EdgeArrow,
/// The arrow type at the end of the edge.
pub end_arrow: EdgeArrow,
/// Whether the line is solid, dotted, or thick.
pub line_style: LineStyle,
}
impl EdgeKind {
pub fn new(start_arrow: EdgeArrow, end_arrow: EdgeArrow, line_style: LineStyle) -> Self {
Self {
start_arrow,
end_arrow,
line_style,
}
}
/// Creates a default solid arrow edge.
pub fn default_arrow() -> Self {
Self::new(EdgeArrow::None, EdgeArrow::Normal, LineStyle::Solid)
}
/// Creates a default solid open edge.
pub fn default_open() -> Self {
Self::new(EdgeArrow::None, EdgeArrow::None, LineStyle::Solid)
}
}
/// Represents the type of arrowhead or tail at the start/end of an edge.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeArrow {
None, // No arrow head
Normal, // Standard arrow `-->`
Cross, // Cross `x--x`
Circle, // Circle `o--o`
Async, // Async arrow `-->>`
}
/// Represents the style of the line for an edge.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LineStyle {
Solid, // `---`
Dotted, // `-.->`
Thick, // `===`
}
/// Represents a subgraph within a Flowchart.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Subgraph {
/// The unique identifier for the subgraph (optional, can be implicit).
pub id: Option<String>,
/// The label displayed for the subgraph (optional).
pub label: Option<String>,
/// The statements contained within the subgraph.
/// We use `Box<Vec<...>>` to avoid recursive type with infinite size,
/// as a `Subgraph` can contain `FlowchartStatement`s, which can in turn be `Subgraph`s.
pub statements: Box<Vec<FlowchartStatement>>,
/// The span covering the entire subgraph definition.
pub span: Span,
}
impl Subgraph {
pub fn new(
id: Option<String>,
label: Option<String>,
statements: Vec<FlowchartStatement>,
span: Span,
) -> Self {
Self {
id,
label,
statements: Box::new(statements),
span,
}
}
}
/// Represents the primary direction of a flowchart or subgraph.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Direction {
TopToBottom, // TD
BottomToTop, // BT
LeftToRight, // LR
RightToLeft, // RL
// Add other directions if Mermaid supports them in the future
}
impl Direction {
/// Creates a `Direction` from a string, case-insensitive.
pub fn from_str(s: &str) -> Option<Self> {
match s.to_uppercase().as_str() {
"TD" => Some(Direction::TopToBottom),
"TB" => Some(Direction::TopToBottom), // TB is alias for TD
"BT" => Some(Direction::BottomToTop),
"LR" => Some(Direction::LeftToRight),
"RL" => Some(Direction::RightToLeft),
_ => None,
}
}
/// Returns the canonical string representation of the direction.
pub fn to_str(&self) -> &'static str {
match self {
Direction::TopToBottom => "TD",
Direction::BottomToTop => "BT",
Direction::LeftToRight => "LR",
Direction::RightToLeft => "RL",
}
}
}
Explanation of src/ast/flowchart.rs:
FlowchartDiagram: The root struct for a flowchart. It holds thedirection, aVecofFlowchartStatements, an optionaltitle, and aHashMapofnodes. ThenodesHashMapis a crucial semantic addition. While the parser builds the AST based on syntax, the validator will use this map to check for undefined nodes or duplicate IDs.FlowchartStatement: An enum that represents all possible top-level constructs within a flowchart (nodes, edges, subgraphs, direction, comments, etc.). This allows us to have a heterogeneous list of statements.Node: Represents a single node with itsid,kind(shape),label(optional), andspan.NodeKind: An enum covering all standard Mermaid node shapes. We includeUnknownfor robustness against future changes or malformed input.Edge: Connects asourceNodeRefto atargetNodeRefwith akind(arrow style) and an optionallabel.NodeRef: This is a critical design choice. A node can be referenced by just its ID (A) or by its ID with an explicit label (A["My Label"]). TheNodeRefenum handles both cases, ensuring we can capture the full information.Rawis used as an intermediate state during parsing.EdgeKind: A struct combiningstart_arrow,end_arrow, andline_styleto fully describe an edge’s visual appearance.EdgeArrow&LineStyle: Enums for the various arrow types and line styles Mermaid supports.Subgraph: Represents a nested block of flowchart statements. It usesBox<Vec<FlowchartStatement>>because aSubgraphcan containFlowchartStatements, which can in turn beSubgraphs. WithoutBox, Rust’s compiler would complain about potentially infinite size for theSubgraphtype.Boxallocates theVecon the heap, breaking the recursion.Direction: An enum for flowchart layout directions (TD,LR, etc.) with utility methods for conversion.
Finally, ensure src/main.rs (or src/lib.rs if you’re building a library) references the ast module.
File: src/main.rs
mod lexer;
mod token;
mod ast; // Add this line to include our new AST module
fn main() {
// This will be our CLI entry point later.
// For now, we can add a simple test to ensure AST types compile.
println!("Mermaid Analyzer - AST module compiled successfully!");
// Example of creating a simple AST node (for compilation check)
let node_span = ast::Span::new(0, 5);
let my_node = ast::Node::new(
"A".to_string(),
ast::NodeKind::Default,
Some("Start".to_string()),
node_span,
);
let edge_span = ast::Span::new(6, 10);
let my_edge = ast::Edge::new(
ast::NodeRef::Id("A".to_string()),
ast::EdgeKind::default_arrow(),
None,
ast::NodeRef::Id("B".to_string()),
edge_span,
);
let statements = vec![
ast::FlowchartStatement::Node(my_node),
ast::FlowchartStatement::Edge(my_edge),
];
let flowchart_diagram = ast::FlowchartDiagram {
direction: Some(ast::Direction::TopToBottom),
statements,
title: None,
nodes: std::collections::HashMap::new(), // Will be populated by parser/validator
span: ast::Span::new(0, 100),
};
let diagram = ast::Diagram {
diagram_type: ast::DiagramType::Flowchart,
body: ast::DiagramBody::Flowchart(flowchart_diagram),
span: ast::Span::new(0, 100),
};
println!("Example Diagram AST created: {:?}", diagram);
}
c) Testing This Component (Compilation & Basic Structure)
At this stage, we are defining data structures, not implementing logic. The primary “test” is to ensure that cargo check passes, meaning our Rust types are valid and well-formed. We can also add a small manual test in main.rs as shown above, or create a dedicated unit test file to ensure we can instantiate these types.
File: tests/ast_types_test.rs (Create this file)
use mermaid_analyzer::ast; // Assuming your project is named `mermaid_analyzer`
#[test]
fn test_span_creation_and_merge() {
let span1 = ast::Span::new(0, 5);
assert_eq!(span1.start, 0);
assert_eq!(span1.end, 5);
assert_eq!(span1.len(), 5);
assert!(!span1.is_empty());
let span2 = ast::Span::new(7, 10);
let merged = span1.merge(&span2);
assert_eq!(merged.start, 0);
assert_eq!(merged.end, 10);
assert_eq!(merged.len(), 10);
let empty_span = ast::Span::new(5, 5);
assert!(empty_span.is_empty());
assert_eq!(empty_span.len(), 0);
}
#[test]
fn test_node_creation() {
let node_id = "A".to_string();
let node_label = Some("Start Node".to_string());
let node_span = ast::Span::new(0, 15);
let node = ast::Node::new(node_id.clone(), ast::NodeKind::Rounded, node_label.clone(), node_span);
assert_eq!(node.id, node_id);
assert_eq!(node.kind, ast::NodeKind::Rounded);
assert_eq!(node.label, node_label);
assert_eq!(node.span, node_span);
}
#[test]
fn test_edge_creation() {
let source = ast::NodeRef::Id("A".to_string());
let target = ast::NodeRef::IdWithLabel("B".to_string(), Some("End Node".to_string()));
let edge_kind = ast::EdgeKind::default_arrow();
let edge_label = Some("connects to".to_string());
let edge_span = ast::Span::new(16, 30);
let edge = ast::Edge::new(source.clone(), edge_kind, edge_label.clone(), target.clone(), edge_span);
assert_eq!(edge.source, source);
assert_eq!(edge.kind, edge_kind);
assert_eq!(edge.label, edge_label);
assert_eq!(edge.target, target);
assert_eq!(edge.span, edge_span);
}
#[test]
fn test_subgraph_creation() {
let subgraph_id = Some("sub1".to_string());
let subgraph_label = Some("My Subgraph".to_string());
let subgraph_span = ast::Span::new(0, 50);
let inner_node = ast::Node::new(
"X".to_string(),
ast::NodeKind::Default,
Some("Inner Node".to_string()),
ast::Span::new(10, 20),
);
let statements = vec![ast::FlowchartStatement::Node(inner_node)];
let subgraph = ast::Subgraph::new(
subgraph_id.clone(),
subgraph_label.clone(),
statements.clone(),
subgraph_span,
);
assert_eq!(subgraph.id, subgraph_id);
assert_eq!(subgraph.label, subgraph_label);
assert_eq!(*subgraph.statements, statements); // Dereference Box for comparison
assert_eq!(subgraph.span, subgraph_span);
}
#[test]
fn test_diagram_root_creation() {
let flowchart_diagram = ast::FlowchartDiagram {
direction: Some(ast::Direction::TopToBottom),
statements: vec![],
title: None,
nodes: std::collections::HashMap::new(),
span: ast::Span::new(0, 0),
};
let diagram = ast::Diagram {
diagram_type: ast::DiagramType::Flowchart,
body: ast::DiagramBody::Flowchart(flowchart_diagram.clone()),
span: ast::Span::new(0, 0),
};
assert_eq!(diagram.diagram_type, ast::DiagramType::Flowchart);
assert_eq!(diagram.body.as_flowchart(), Some(&flowchart_diagram));
}
To run these tests, ensure your Cargo.toml specifies the library name if you’re building a library, or use cargo test --workspace if it’s a binary with tests. If your project is named mermaid-analyzer (as typically used in Cargo.toml), the use mermaid_analyzer::ast; line will work. If your Cargo.toml has name = "mermaid-cli", then use use mermaid_cli::ast;.
Run cargo test to execute these basic tests. They should pass, confirming our AST structures can be instantiated and hold the expected data.
Production Considerations
When designing core data structures like an AST, production readiness involves several key considerations:
Memory Usage:
- Recursive Types and
Box: We’ve already addressed this withSubgraphusingBox<Vec<FlowchartStatement>>. This is standard practice in Rust for recursive data structures to ensure known compile-time sizes and prevent stack overflows for deep trees. - String Ownership: We primarily use
Stringfor IDs and labels. While&strreferences could save allocations, the AST needs to own its data, especially since it’s built from tokens (which might be&strslices of the input) and needs to outlive the input string.Stringis the correct choice here. HashMapfor Nodes: ThenodesHashMapinFlowchartDiagramis a semantic cache. It allows for efficient lookup of nodes by ID during validation and transformation, preventing repeated linear scans of thestatementsvector. This is a performance optimization for larger diagrams.
- Recursive Types and
Performance:
- Immutability and Cloning: Our AST nodes are
Clone,PartialEq,Eq,Hash. WhileClonecan be expensive for large structures, it’s often necessary for certain compiler passes (e.g., transformations that need to modify a copy). We should be mindful of when we clone and avoid it unnecessarily during performance-critical traversals. Vecvs.LinkedList:Vecis generally preferred in Rust for cache locality and efficient iteration, even for insertions/deletions in the middle if they are infrequent. For ASTs, appending and iterating are common, makingVeca good default.
- Immutability and Cloning: Our AST nodes are
Error Handling & Diagnostics:
SpanIntegration: TheSpanfield in every significant AST node is paramount. It allows our diagnostic system (to be built in a later chapter) to precisely point to the location of errors or warnings.- Semantic vs. Syntax Errors: The AST itself represents the syntactic structure. Semantic checks (e.g., “node A is defined twice”, “edge refers to undefined node B”) happen in a separate validation pass after the AST is built. The AST’s role is to provide the data for these checks.
Extensibility:
- Enums for Diagram Types and Statements: Using enums like
DiagramBodyandFlowchartStatementmakes it easy to add new diagram types or new statement kinds without altering existing code significantly, following the Open/Closed Principle. UnknownVariants: IncludingUnknownvariants in enums likeNodeKindorFlowchartStatementallows the parser to gracefully handle unsupported or malformed constructs without crashing, enabling better error recovery.
- Enums for Diagram Types and Statements: Using enums like
Code Review Checkpoint
At this point, we have successfully defined the foundational data structures for our Mermaid AST, with a particular focus on Flowcharts.
What we’ve built:
src/ast/common.rs: DefinedSpan,Location,DiagramType,Diagram, andDiagramBodyto provide general AST utilities and the top-level structure.src/ast/flowchart.rs: Implemented the detailed AST for Flowcharts, includingFlowchartDiagram,FlowchartStatement,Node,NodeKind,Edge,NodeRef,EdgeKind,EdgeArrow,LineStyle,Subgraph, andDirection.src/ast/mod.rs: Consolidated and re-exported all AST components for easy access.src/main.rs: Updated to include theastmodule and a basic instantiation example to verify compilation.tests/ast_types_test.rs: Added unit tests to ensure our AST structures can be correctly created and hold data.
These structures form the backbone of our Mermaid analyzer. The Diagram enum, in particular, will be the output of our parser (Chapter 5) and the input for our validator and formatter (Chapters 6 and 7). The careful design of NodeRef and the use of Box for Subgraph ensures both correctness and performance.
Common Issues & Solutions
“Recursive type has infinite size” error:
- Issue: This typically occurs when a struct contains a field of its own type, or a type that directly or indirectly refers back to itself, without using
Box. For example, ifSubgraphhadstatements: Vec<FlowchartStatement>andFlowchartStatementcould containSubgraphdirectly, withoutBox. - Solution: Use
Box<T>for recursive fields.BoxallocatesTon the heap, giving it a known, fixed size on the stack (a pointer). We applied this tostatements: Box<Vec<FlowchartStatement>>inSubgraph.
- Issue: This typically occurs when a struct contains a field of its own type, or a type that directly or indirectly refers back to itself, without using
Missing
Debug,Clone,PartialEq,Eq,Hashtraits:- Issue: When writing tests or needing to print, copy, compare, or use types in collections like
HashMaporHashSet, you’ll encounter errors if these traits are not derived or manually implemented. - Solution: For simple data structures like most of our AST nodes, Rust’s
#[derive(...)]macro makes this trivial. We’ve applied#[derive(Debug, Clone, PartialEq, Eq, Hash)]to almost all our AST types. Remember thatHashrequires all fields to also implementHash.
- Issue: When writing tests or needing to print, copy, compare, or use types in collections like
Over-normalization in AST design:
- Issue: Sometimes developers try to make the AST too clean or normalized, losing information about the original source text. For example, collapsing
A["Label"]andAinto a singleNode { id: "A", label: Some("Label") }even if the original input was justA. - Solution: The AST should represent the parsed structure, not necessarily the ideal formatted structure. Our
NodeRefenum (Idvs.IdWithLabel) is an example of preserving this distinction. Formatting and semantic validation come after the AST is built. The AST needs enough information to reconstruct the original (or a canonical) form.
- Issue: Sometimes developers try to make the AST too clean or normalized, losing information about the original source text. For example, collapsing
Testing & Verification
To verify the work in this chapter:
- Run
cargo check: Navigate to your project root and runcargo check. This will compile all your code without running tests. If there are any type errors or syntax issues in your AST definitions,cargo checkwill report them immediately. A cleancargo checkindicates your Rust types are valid. - Run
cargo test: Executecargo testto run the unit tests defined intests/ast_types_test.rs. All tests should pass, confirming that you can instantiate and manipulate the AST structures as expected. - Review
main.rsoutput: If you included the example AST instantiation insrc/main.rs, runcargo run. You should see theprintln!output confirming that the AST module compiled and your example diagram AST was created.
At this stage, you should have a solid, compilable, and testable foundation for representing Mermaid diagrams in Rust.
Summary & Next Steps
In this chapter, we have successfully designed and implemented the core Abstract Syntax Tree (AST) for our Mermaid analyzer. We started with common utilities like Span and DiagramType, then progressively built out the detailed structures for Flowchart diagrams, including nodes, edges, subgraphs, and directions. We paid close attention to Rust’s type system, using enums for extensibility and Box for recursive types, ensuring our AST is robust, memory-efficient, and strictly typed.
This AST is a critical milestone. It transforms the flat stream of tokens from our lexer into a hierarchical, semantically rich data structure that our tool can easily traverse, analyze, and modify. It is the central representation that will enable all subsequent compiler-like phases.
In the next chapter, Chapter 5: Building the Parser: From Tokens to AST, we will leverage the AST structures we’ve defined here. We will implement the parser, which will take the token stream generated by our lexer and combine those tokens into instances of the Diagram and FlowchartDiagram structs, populating our AST with actual parsed Mermaid content. This is where the real “compiler magic” begins!