Introduction
Welcome to Chapter 7! In the previous chapters, we laid the groundwork for our Mermaid analysis tool by building a robust Lexer to tokenize input, a Parser to construct a strongly typed Abstract Syntax Tree (AST), and a Validator to perform initial syntax and semantic checks. With a validated AST in hand, we now move to the core of our linter and fixer: the Rule Engine.
This chapter is dedicated to designing and implementing a deterministic rule engine that can traverse our AST, identify potential issues (linting), and, if configured, apply safe, minimal, and reversible fixes directly to the AST. This engine will encapsulate our tool’s “intelligence” for enforcing Mermaid best practices and correcting common mistakes. We will define a Rule trait, allowing us to create modular and extensible checks and transformations. Our goal is to ensure that all fixes strictly adhere to Mermaid syntax specifications, never introducing invalid constructs or ambiguity.
By the end of this chapter, our tool will be able to operate in different modes: lint (report issues), fix (apply safe fixes), and strict (apply only guaranteed safe fixes, failing on any ambiguity). We will implement two foundational rules: one to ensure a graph declaration is present and another to normalize arrow syntax. This brings us a step closer to a production-ready linting and fixing utility for Mermaid.
Planning & Design
The Rule Engine is where we move beyond mere validation to active analysis and transformation. It needs to be flexible enough to accommodate various types of rules (from simple style checks to structural modifications) while maintaining strict determinism and safety.
Core Principles
- Modularity: Each linting or fixing concern should be encapsulated in its own Rule implementation.
- Determinism: Given the same input AST and rule set, the output AST and diagnostics must always be identical. Rule application order will be carefully managed.
- Safety: Fixes must be guaranteed to produce valid Mermaid. Ambiguous or potentially destructive fixes are forbidden, especially in strict mode.
- Idempotence: Applying the rule engine multiple times to an already corrected AST should yield no further changes.
- Reversibility (Conceptual): While we won’t implement an undo feature, the fixes should be minimal and easily understandable, making manual reversal straightforward if needed.
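The idempotence principle is easy to state and easy to break, so it helps to have a mechanical check for it. The sketch below is illustrative only: `is_idempotent`, `trim_trailing_whitespace`, and `append_comment` are hypothetical helpers that work on a plain `Vec<String>` rather than on our real AST.

```rust
/// Applies `fix` twice to a copy of `input` and reports whether the second
/// application was a no-op, which is the idempotence property described above.
fn is_idempotent<T, F>(input: &T, fix: F) -> bool
where
    T: Clone + PartialEq,
    F: Fn(&mut T) -> bool,
{
    let mut once = input.clone();
    fix(&mut once);

    let mut twice = once.clone();
    let changed_again = fix(&mut twice);

    // Idempotent means: the second run reports no change and alters nothing.
    !changed_again && once == twice
}

/// Hypothetical fix: strip trailing whitespace from every line.
/// Returns `true` only when at least one line was actually modified.
fn trim_trailing_whitespace(lines: &mut Vec<String>) -> bool {
    let mut changed = false;
    for line in lines.iter_mut() {
        let trimmed_len = line.trim_end().len();
        if trimmed_len != line.len() {
            line.truncate(trimmed_len);
            changed = true;
        }
    }
    changed
}

/// Hypothetical non-idempotent "fix": it appends a comment line on every run.
fn append_comment(lines: &mut Vec<String>) -> bool {
    lines.push("%% checked".to_string());
    true
}
```

A helper of this shape could back a unit test for every rule we add: a rule whose fix fails the check would keep the engine looping until it hits its pass limit.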
Architecture of the Rule Engine
The rule engine will operate on the AST produced by the parser and validated by the semantic analyzer. It will consist of a manager that orchestrates the application of multiple rules. Each rule will implement a common Rule trait, defining methods for checking and fixing the AST.
At a high level, the Rule Engine is made up of the following components:
- Rule Trait: Defines the interface for all rules. It will include methods like name(), check(), and apply_fix().
- RuleEngine (Manager): This struct will hold a collection of Box<dyn Rule> instances. It will provide methods to run linting and fixing passes.
- Check Pass: In this phase, all registered rules iterate over the AST to identify issues and generate Diagnostic messages. No modifications are made.
- Fix Pass: If in fix or strict mode, rules will attempt to modify the AST. This pass might be repeated until no more changes occur (idempotence).
- Diagnostics: All rules will contribute to a shared Diagnostics collection, which will then be reported to the user.
- Modes: The RuleEngine will adapt its behavior based on the requested mode (lint, fix, strict).
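Before we write the real engine, here is a deliberately tiny model of the two passes. Everything in it is an assumption made for illustration: `ToyAst` is just a list of strings, and `ToyRule`, `RequireHeader`, and `ToyEngine` are stand-ins for the types we define later in this chapter.

```rust
/// Toy stand-in for the AST: one string per statement.
type ToyAst = Vec<String>;

trait ToyRule {
    fn name(&self) -> &'static str;
    /// Check pass: read-only, pushes human-readable findings.
    fn check(&self, ast: &ToyAst, findings: &mut Vec<String>);
    /// Fix pass: may mutate the AST, returns whether anything changed.
    fn apply_fix(&self, ast: &mut ToyAst) -> bool;
}

struct RequireHeader;

impl ToyRule for RequireHeader {
    fn name(&self) -> &'static str {
        "require-header"
    }

    fn check(&self, ast: &ToyAst, findings: &mut Vec<String>) {
        if !ast.iter().any(|s| s.starts_with("graph ")) {
            findings.push(format!("{}: missing graph declaration", self.name()));
        }
    }

    fn apply_fix(&self, ast: &mut ToyAst) -> bool {
        if ast.iter().any(|s| s.starts_with("graph ")) {
            return false; // Nothing to do, so report "no change".
        }
        ast.insert(0, "graph TD".to_string());
        true
    }
}

struct ToyEngine {
    rules: Vec<Box<dyn ToyRule>>,
}

impl ToyEngine {
    /// Check pass over all rules; the AST is only borrowed immutably.
    fn lint(&self, ast: &ToyAst) -> Vec<String> {
        let mut findings = Vec::new();
        for rule in &self.rules {
            rule.check(ast, &mut findings);
        }
        findings
    }

    /// Repeats fix passes until one changes nothing (or the cap is reached).
    /// Returns the number of passes that were run.
    fn fix(&self, ast: &mut ToyAst, max_passes: usize) -> usize {
        let mut passes = 0;
        while passes < max_passes {
            passes += 1;
            let mut changed = false;
            for rule in &self.rules {
                changed |= rule.apply_fix(ast);
            }
            if !changed {
                break;
            }
        }
        passes
    }
}
```

For a diagram that lacks a header, `fix` runs two passes: the first inserts the declaration, the second confirms that nothing is left to change.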
File Structure
We’ll introduce a new module src/rules to house our rule definitions and the engine itself.
src/
├── main.rs
├── lexer/
│ └── ...
├── parser/
│ └── ...
├── ast/
│ └── ...
├── validator/
│ └── ...
├── diagnostics/
│ └── ...
└── rules/
    ├── mod.rs // Defines the Rule trait and RuleEngine struct
    ├── common.rs // Utility functions or shared types for rules
    ├── missing_graph_decl.rs // Rule for missing graph declaration
    └── arrow_normalization.rs // Rule for arrow syntax normalization
Step-by-Step Implementation
1. Setup Rule Module and Trait
First, let’s create the src/rules directory and define the Rule trait in src/rules/mod.rs. This trait will be the cornerstone for all our linting and fixing logic.
a) Create src/rules/mod.rs:
// src/rules/mod.rs
use crate::ast::MermaidAst;
use crate::diagnostics::{Diagnostic, DiagnosticLevel, Diagnostics};
use std::fmt::Debug;
// Declare the rule submodules (each rule lives in its own file)
pub mod missing_graph_decl;
pub mod arrow_normalization;
/// Defines the operation mode for the rule engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixMode {
/// Only report diagnostics, no modifications.
LintOnly,
/// Apply safe, reversible fixes.
ApplyFixes,
/// Apply only guaranteed safe and minimal fixes, fail on any ambiguity.
StrictFixes,
}
/// The trait that all linting and fixing rules must implement.
///
/// Rules are designed to be deterministic and operate on the AST.
/// They can produce diagnostics and, in fixing modes, modify the AST.
pub trait Rule: Debug + Send + Sync {
/// Returns the unique name of the rule.
fn name(&self) -> &'static str;
/// Checks the AST for issues and adds diagnostics.
/// This method should NOT modify the AST.
fn check(&self, ast: &MermaidAst, diagnostics: &mut Diagnostics);
/// Attempts to apply a fix to the AST.
/// Returns `true` if any modification was made, `false` otherwise.
/// This method should only be called in `ApplyFixes` or `StrictFixes` modes.
/// It should also add diagnostics for the applied fix.
fn apply_fix(&self, ast: &mut MermaidAst, diagnostics: &mut Diagnostics) -> bool;
/// Indicates which fix modes this rule supports.
/// By default, rules support `ApplyFixes`. Override for `StrictFixes`.
fn supports_fix_mode(&self, mode: FixMode) -> bool {
match mode {
FixMode::LintOnly => true, // All rules can lint
FixMode::ApplyFixes => true,
FixMode::StrictFixes => false, // By default, rules are not strict-fix safe
}
}
}
/// The Rule Engine orchestrates the application of various rules.
#[derive(Default)]
pub struct RuleEngine {
rules: Vec<Box<dyn Rule>>,
}
impl RuleEngine {
/// Creates a new RuleEngine with no rules registered.
pub fn new() -> Self {
Self { rules: Vec::new() }
}
/// Registers a new rule with the engine.
pub fn register_rule(&mut self, rule: Box<dyn Rule>) {
self.rules.push(rule);
}
/// Runs the rule engine in lint-only mode.
/// It checks the AST against all registered rules and collects diagnostics.
pub fn run_lint(&self, ast: &MermaidAst) -> Diagnostics {
let mut diagnostics = Diagnostics::new();
for rule in &self.rules {
rule.check(ast, &mut diagnostics);
}
diagnostics
}
/// Runs the rule engine to apply fixes to the AST.
///
/// It performs multiple passes until no more fixes can be applied by any rule
/// or a maximum number of passes is reached to prevent infinite loops.
///
/// Returns the collected diagnostics, including those for applied fixes.
pub fn run_fix(&self, ast: &mut MermaidAst, fix_mode: FixMode) -> Diagnostics {
let mut diagnostics = Diagnostics::new();
if fix_mode == FixMode::LintOnly {
// Fallback to lint-only if fix mode is incorrectly passed
return self.run_lint(ast);
}
const MAX_FIX_PASSES: usize = 10; // Prevent infinite loops
let mut changed_in_last_pass = true;
let mut pass_count = 0;
while changed_in_last_pass && pass_count < MAX_FIX_PASSES {
changed_in_last_pass = false;
pass_count += 1;
// First, run a check pass so the diagnostics reflect the AST as it stands at the
// start of this pass: an earlier fix may have resolved an issue or exposed a new one.
// Note that every pass appends to `diagnostics`, so an issue that no rule fixes
// (for example, one owned by a rule that does not support the current mode)
// is reported once per pass. Deduplicate before printing if that matters.
let mut current_pass_diagnostics = Diagnostics::new();
for rule in &self.rules {
rule.check(ast, &mut current_pass_diagnostics);
}
// Merge current pass diagnostics into the main diagnostics object
diagnostics.extend(current_pass_diagnostics);
// Then, attempt to apply fixes
for rule in &self.rules {
if rule.supports_fix_mode(fix_mode) {
if rule.apply_fix(ast, &mut diagnostics) {
changed_in_last_pass = true;
}
}
}
}
if changed_in_last_pass {
diagnostics.add_diagnostic(Diagnostic {
level: DiagnosticLevel::Warning,
code: "R000_FIX_LIMIT".to_string(),
message: format!("Reached maximum fix passes ({}). Some issues might remain. Please re-run the tool.", MAX_FIX_PASSES),
span: None,
help: Some("This usually indicates a complex interaction between rules or a rule that isn't fully idempotent. Consider manual inspection or re-running the tool.".to_string()),
});
}
diagnostics
}
/// Helper to register all default rules.
pub fn register_default_rules(&mut self) {
self.register_rule(Box::new(missing_graph_decl::MissingGraphDeclarationRule));
self.register_rule(Box::new(arrow_normalization::ArrowNormalizationRule));
// Register other rules here as they are implemented
}
}
Explanation of src/rules/mod.rs:
- FixMode Enum: Defines the three operational modes for our rule engine.
- Rule Trait:
  - name(): Provides a string identifier for the rule.
  - check(): This method is used for linting. It takes an immutable reference to the AST (&MermaidAst) and a mutable reference to Diagnostics. Rules add Diagnostic messages here.
  - apply_fix(): This method is used for fixing. It takes a mutable reference to the AST (&mut MermaidAst) and Diagnostics. It returns true if any change was made, false otherwise. This is crucial for the RuleEngine to know if another pass is needed.
  - supports_fix_mode(): Allows rules to declare their compatibility with ApplyFixes or StrictFixes. By default, rules are only safe for ApplyFixes.
- RuleEngine Struct:
  - rules: A Vec<Box<dyn Rule>> to store all registered rules. Using Box<dyn Rule> allows for polymorphism, meaning we can store different concrete rule types that all implement the Rule trait.
  - new(): Constructor.
  - register_rule(): Adds a rule to the engine.
  - run_lint(): Executes all check() methods on the AST.
  - run_fix(): This is the more complex method. It repeats a check pass followed by a fix pass until a pass makes no change. The MAX_FIX_PASSES cap bounds the loop, so rules that interact in unexpected ways or are not idempotent cannot make it run forever. Re-running check in each pass keeps diagnostics up to date with the modified AST.
  - register_default_rules(): A convenience method to easily register all built-in rules.
2. Implement the MissingGraphDeclarationRule
Many Mermaid diagrams implicitly assume a graph TD or flowchart TD declaration. While the Mermaid renderer often tolerates its absence, explicitly declaring it is a best practice for clarity and strictness. This rule will enforce that.
a) Create src/rules/missing_graph_decl.rs:
// src/rules/missing_graph_decl.rs
use super::{Rule, FixMode};
use crate::ast::{MermaidAst, Statement, GraphDeclaration, GraphOrientation};
use crate::diagnostics::{Diagnostic, DiagnosticLevel, Diagnostics};
use crate::span::Span; // Assuming Span is defined and used for locations
/// Rule to ensure that a Mermaid diagram starts with a graph or flowchart declaration.
/// If missing, it adds a default 'graph TD' declaration.
#[derive(Debug)]
pub struct MissingGraphDeclarationRule;
impl Rule for MissingGraphDeclarationRule {
fn name(&self) -> &'static str {
"missing-graph-declaration"
}
fn check(&self, ast: &MermaidAst, diagnostics: &mut Diagnostics) {
if !ast.statements.iter().any(|stmt| matches!(stmt, Statement::GraphDeclaration(_))) {
diagnostics.add_diagnostic(Diagnostic {
level: DiagnosticLevel::Warning,
code: "R701_MISSING_GRAPH_DECL".to_string(),
message: "Mermaid diagram is missing a 'graph' or 'flowchart' declaration.".to_string(),
span: ast.statements.first().map(|s| s.span()), // Point to the start of the file
help: Some("Consider adding 'graph TD' or 'flowchart TD' at the beginning of your diagram for clarity and strict compliance.".to_string()),
});
}
}
fn apply_fix(&self, ast: &mut MermaidAst, diagnostics: &mut Diagnostics) -> bool {
if !ast.statements.iter().any(|stmt| matches!(stmt, Statement::GraphDeclaration(_))) {
// Create a default 'graph TD' declaration.
// We'll use a dummy span as this is an inserted node.
// For a real-world scenario, you might want to infer a more accurate span.
let default_decl = Statement::GraphDeclaration(GraphDeclaration {
kind: "graph".to_string(), // Use "graph" as the default
orientation: Some(GraphOrientation::TD),
span: Span::new(0, 0), // Dummy span, as it's inserted
});
// Insert at the beginning of the statements
ast.statements.insert(0, default_decl);
diagnostics.add_diagnostic(Diagnostic {
level: DiagnosticLevel::Note,
code: "R701_MISSING_GRAPH_DECL_FIXED".to_string(),
message: "Added 'graph TD' declaration to the beginning of the diagram.".to_string(),
span: Some(Span::new(0, 0)), // Point to the very beginning
help: None,
});
true // Modification made
} else {
false // No modification needed
}
}
fn supports_fix_mode(&self, mode: FixMode) -> bool {
// This is a safe fix, so it can be applied in both ApplyFixes and StrictFixes modes.
matches!(mode, FixMode::ApplyFixes | FixMode::StrictFixes)
}
}
b) Update src/rules/mod.rs to include the new rule:
We already added pub mod missing_graph_decl; and registered it in register_default_rules().
Explanation of MissingGraphDeclarationRule:
- check(): It iterates through the AST’s top-level statements. If no Statement::GraphDeclaration is found, it adds a Warning diagnostic.
- apply_fix(): If the declaration is missing, it creates a Statement::GraphDeclaration for graph TD and inserts it at the beginning of ast.statements. It returns true to indicate a change.
- supports_fix_mode(): This fix is very safe and deterministic, so it supports StrictFixes.
3. Implement the ArrowNormalizationRule
Mermaid allows various arrow syntaxes (e.g., ---, ==>, --x), but for consistency and strictness, we might want to normalize them to a standard form like --> for simple edges. This rule will focus on standardizing simple directed arrows.
a) Create src/rules/arrow_normalization.rs:
// src/rules/arrow_normalization.rs
use super::{Rule, FixMode};
use crate::ast::{MermaidAst, Statement, Edge, EdgeArrow, EdgeLine, EdgeStyle};
use crate::diagnostics::{Diagnostic, DiagnosticLevel, Diagnostics};
use crate::span::Span;
/// Rule to normalize various arrow syntaxes to a standard form (e.g., `-->`).
/// This rule primarily targets simple directed arrows.
#[derive(Debug)]
pub struct ArrowNormalizationRule;
impl Rule for ArrowNormalizationRule {
fn name(&self) -> &'static str {
"arrow-normalization"
}
fn check(&self, ast: &MermaidAst, diagnostics: &mut Diagnostics) {
// This rule primarily identifies non-standard arrows that _could_ be normalized.
// It iterates through the AST to find edges and checks their arrow styles.
for statement in &ast.statements {
if let Statement::Edge(edge) = statement {
if let Some(ref edge_style) = edge.style {
// Flag directed arrows that are not already the standard solid form '-->'.
// (Solid, None, Arrow) is '-->' itself and must not match, and an open link
// such as '---' has no arrowhead, so it is not a directed arrow at all.
let is_non_standard_arrow = matches!(
(edge_style.line, edge_style.arrow_start, edge_style.arrow_end),
(EdgeLine::Dotted, EdgeArrow::None, EdgeArrow::Arrow) // e.g., "-.->"
);
if is_non_standard_arrow {
diagnostics.add_diagnostic(Diagnostic {
level: DiagnosticLevel::Warning,
code: "R702_NON_STANDARD_ARROW".to_string(),
message: format!("Non-standard arrow syntax detected: '{}'.", edge.raw_arrow()),
span: Some(edge.span()),
help: Some("Consider normalizing to '-->' for consistency.".to_string()),
});
}
}
}
}
}
fn apply_fix(&self, ast: &mut MermaidAst, diagnostics: &mut Diagnostics) -> bool {
let mut changed = false;
// We need to traverse the AST mutably. This can be complex for deeply nested structures.
// For simplicity here, we iterate top-level statements. A full implementation
// would likely use an AST visitor pattern to handle all levels.
for statement in &mut ast.statements {
if let Statement::Edge(edge) = statement {
// Capture the original arrow text before the style is mutated, so the note below reports what was replaced.
let original_arrow = edge.raw_arrow().to_string();
if let Some(ref mut edge_style) = edge.style {
// Only the dotted directed arrow is rewritten. (Solid, None, Arrow) is already '-->';
// matching it would report a change on every pass and the fix loop would never converge.
let should_normalize = matches!(
(edge_style.line, edge_style.arrow_start, edge_style.arrow_end),
(EdgeLine::Dotted, EdgeArrow::None, EdgeArrow::Arrow) // e.g., "-.->"
);
if should_normalize {
// Normalize to '-->'
edge_style.line = EdgeLine::Solid;
edge_style.arrow_start = EdgeArrow::None;
edge_style.arrow_end = EdgeArrow::Arrow;
diagnostics.add_diagnostic(Diagnostic {
level: DiagnosticLevel::Note,
code: "R702_ARROW_NORMALIZED".to_string(),
message: format!("Normalized arrow syntax to '-->'. Original was: '{}'", original_arrow),
span: Some(edge.span()),
help: None,
});
changed = true;
}
}
}
}
changed
}
fn supports_fix_mode(&self, mode: FixMode) -> bool {
// This fix is generally safe, but normalizing all arrows might not be
// desired in all 'strict' contexts if the original syntax was valid.
// For now, we'll keep it to ApplyFixes. A more advanced rule might
// distinguish between strictly equivalent forms and stylistic choices.
matches!(mode, FixMode::ApplyFixes)
}
}
b) Update src/rules/mod.rs to include the new rule:
We already added pub mod arrow_normalization; and registered it in register_default_rules().
Explanation of ArrowNormalizationRule:
- check(): It iterates through Edge statements and checks their EdgeStyle. If it finds a directed arrow that isn’t the plain solid form --> (e.g., -.->), it adds a Warning.
- apply_fix(): For identified non-standard arrows, it modifies the EdgeStyle to represent --> (solid line, no start arrow, end arrow). It returns true if a modification occurred.
- supports_fix_mode(): This rule is set to ApplyFixes only. Rewriting -.-> as --> keeps the direction of the edge but changes how it renders (a dotted line becomes solid), so it is a stylistic choice rather than a strictly equivalent rewrite. An open link such as --- has no arrowhead, so turning it into --> would add a direction and is not attempted at all. For StrictFixes, we might only allow normalization if the meaning is identical. This highlights the trade-off in strictness.
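The distinction between "already standard", "stylistic", and "out of scope" can be made explicit in code. The following is a sketch under stated assumptions: `Line`, `Head`, `Style`, `Mode`, and `Verdict` are simplified, hypothetical stand-ins for our AST and FixMode types, and only the end arrowhead is modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Line {
    Solid,  // "--"
    Dotted, // "-.-"
    Thick,  // "=="
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Head {
    None,
    Arrow,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Apply,
    Strict,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Style {
    line: Line,
    end: Head,
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    /// Already `-->`; nothing to do.
    Standard,
    /// Directed, but drawn differently; rewriting changes the rendering.
    Stylistic,
    /// No arrowhead; turning it into `-->` would add a direction.
    OutOfScope,
}

fn classify(style: Style) -> Verdict {
    match (style.line, style.end) {
        (Line::Solid, Head::Arrow) => Verdict::Standard,
        (Line::Dotted, Head::Arrow) | (Line::Thick, Head::Arrow) => Verdict::Stylistic,
        (_, Head::None) => Verdict::OutOfScope,
    }
}

/// A stylistic rewrite is allowed in the permissive mode only.
fn should_rewrite(style: Style, mode: Mode) -> bool {
    matches!((classify(style), mode), (Verdict::Stylistic, Mode::Apply))
}
```

Keeping the classification separate from the mode decision means a future strict-safe rewrite would only need a new `Verdict` variant, not a change to every call site.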
4. Integrate Rule Engine into the Main Application Flow
Now, let’s update src/main.rs to utilize the RuleEngine after parsing and initial validation. We’ll also need to add command-line arguments to select the operational mode.
a) Update src/main.rs:
We’ll use clap for command-line argument parsing. Add it to Cargo.toml if not already present:
# Cargo.toml
[dependencies]
clap = { version = "4", features = ["derive"] }
# ... other dependencies
Then, modify src/main.rs:
// src/main.rs
use clap::{Parser, Subcommand};
use std::fs;
use std::path::PathBuf;
// Import our modules
mod lexer;
mod parser;
mod ast;
mod diagnostics;
mod validator;
mod rules; // New!
mod span;
use lexer::Lexer;
use parser::Parser as MermaidParser; // Alias to avoid conflict with clap::Parser
use validator::AstValidator;
use diagnostics::DiagnosticLevel;
use rules::{RuleEngine, FixMode}; // New!
/// A strict linter and formatter for Mermaid diagrams.
#[derive(Parser, Debug)]
#[command(author, version, about = "Strict linter and formatter for Mermaid diagrams", long_about = None)]
struct Cli {
/// Path to the Mermaid file to process
#[arg(value_name = "FILE")]
file: PathBuf,
#[command(subcommand)]
command: Commands,
}
#[derive(Subcommand, Debug)]
enum Commands {
/// Checks the Mermaid file for errors and warnings (lint mode).
Lint {
/// Exit with a non-zero code if any errors or warnings are found.
#[arg(long)]
strict_exit: bool,
},
/// Attempts to fix common Mermaid syntax issues.
Fix {
/// Overwrite the original file with the fixed content.
#[arg(long)]
write: bool,
/// Apply only guaranteed safe and minimal fixes, fail on any ambiguity.
#[arg(long)]
strict: bool,
},
// Future: Add `format` command here.
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let cli = Cli::parse();
let source_code = fs::read_to_string(&cli.file)?;
let mut diagnostics = diagnostics::Diagnostics::new();
// 1. Lexing
let lexer = Lexer::new(&source_code);
let tokens = lexer.collect::<Vec<_>>();
// Note: Lexer errors are typically collected during tokenization if designed that way.
// For now, we assume a successful tokenization or handle basic errors in parser.
// 2. Parsing
let mut parser = MermaidParser::new(tokens);
let mut ast = match parser.parse() {
Ok(ast) => ast,
Err(parse_diagnostics) => {
diagnostics.extend(parse_diagnostics);
diagnostics.print_diagnostics(&source_code);
if diagnostics.has_errors() {
eprintln!("Error: Parsing failed. Exiting.");
std::process::exit(1);
}
// `parse()` returned no AST, so the later stages have nothing to work on.
// A more forgiving parser could return a partial AST alongside its diagnostics,
// which would let linting continue past recoverable errors.
return Err("Parsing failed and produced no AST. Cannot continue.".into());
}
};
diagnostics.extend(parser.into_diagnostics()); // Collect parser-specific diagnostics
// 3. Semantic Validation (from previous chapter)
let validator = AstValidator::new();
validator.validate(&ast, &mut diagnostics);
// 4. Rule Engine (Linting and Fixing) - NEW
let mut rule_engine = RuleEngine::new();
rule_engine.register_default_rules(); // Register all our default rules
match &cli.command {
Commands::Lint { strict_exit } => {
let lint_diagnostics = rule_engine.run_lint(&ast);
diagnostics.extend(lint_diagnostics);
diagnostics.print_diagnostics(&source_code);
if *strict_exit && diagnostics.has_errors_or_warnings() {
eprintln!("Linting failed with errors or warnings. Exiting strictly.");
std::process::exit(1);
} else if diagnostics.has_errors() {
eprintln!("Linting failed with errors. Exiting.");
std::process::exit(1);
}
}
Commands::Fix { write, strict } => {
let fix_mode = if *strict { FixMode::StrictFixes } else { FixMode::ApplyFixes };
let fix_diagnostics = rule_engine.run_fix(&mut ast, fix_mode);
diagnostics.extend(fix_diagnostics);
// Re-validate after fixing to catch any new issues introduced by fixes (shouldn't happen with safe fixes)
let mut post_fix_diagnostics = diagnostics::Diagnostics::new();
validator.validate(&ast, &mut post_fix_diagnostics);
diagnostics.extend(post_fix_diagnostics);
diagnostics.print_diagnostics(&source_code);
if diagnostics.has_errors() {
eprintln!("Error: Fixes were applied, but new errors were detected or existing errors could not be resolved. Exiting.");
std::process::exit(1);
}
// If fixes were applied and no errors, output the modified AST
if diagnostics.has_level(DiagnosticLevel::Note) { // Every applied fix is reported as a Note, whatever its code
println!("\n--- Fixed Mermaid Code ---");
// TODO: For now, we'll just print a debug representation or a simplified string.
// A proper formatter (Chapter 8) will convert AST back to clean Mermaid string.
// For now, let's just show a placeholder or a simplistic AST to string.
// This will be replaced by the Formatter in the next chapter.
println!("{:?}", ast); // Placeholder
println!("--------------------------");
if *write {
// This is where the formatter (Chapter 8) would convert `ast` back to a string
// For now, we'll use a crude representation or just indicate overwrite.
eprintln!("`--write` option specified. Original file would be overwritten with formatted content.");
eprintln!("(Actual file overwrite is pending implementation of the Formatter in Chapter 8.)");
// fs::write(&cli.file, ast_to_string_representation)?; // This will be the actual call
}
} else {
println!("\nNo fixes applied or no issues found that could be fixed.");
}
}
}
Ok(())
}
Explanation of src/main.rs changes:
- clap integration: Added Cli and Commands to define lint and fix subcommands with their respective options (--strict-exit, --write, --strict).
- RuleEngine instantiation: RuleEngine::new() and rule_engine.register_default_rules() are called.
- Command Handling:
  - Commands::Lint: Calls rule_engine.run_lint(). Diagnostics are printed, and strict_exit controls the exit code.
  - Commands::Fix: Determines FixMode based on the --strict flag. Calls rule_engine.run_fix().
- Post-Fix Validation: Crucially, after fixes are applied, the AST is re-validated to ensure fixes didn’t introduce new errors.
- Output: If fixes are applied, a placeholder message is printed. The actual conversion of the modified AST back to a Mermaid string will be handled by the Formatter in Chapter 8. The --write option is acknowledged but not fully implemented yet.
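The exit-code decision in the lint branch is easier to test when it is pulled out of main into a pure function. This is only a sketch: `Level` and `lint_exit_code` are hypothetical names, and `Level` mirrors just the part of DiagnosticLevel that the decision needs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Error,
    Warning,
    Note,
}

/// Returns the process exit code for a lint run.
/// Errors always fail; warnings fail only when `strict_exit` is set.
fn lint_exit_code(levels: &[Level], strict_exit: bool) -> i32 {
    let has_errors = levels.contains(&Level::Error);
    let has_warnings = levels.contains(&Level::Warning);
    if has_errors || (strict_exit && has_warnings) {
        1
    } else {
        0
    }
}
```

With a helper like this, main would only call `std::process::exit` with the returned code, and the policy itself could be covered by ordinary unit tests.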
5. Update Diagnostics to include Note level for fixes
We used DiagnosticLevel::Note for reporting successful fixes. Ensure our Diagnostics system (from Chapter 5) can handle and print Note level messages.
a) Update src/diagnostics.rs (if necessary):
Ensure DiagnosticLevel enum includes Note and print_diagnostics handles it.
// src/diagnostics.rs (Relevant parts, assuming previous implementation)
#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub enum DiagnosticLevel {
Error,
Warning,
Note, // Added this
Help, // Used as a sub-message, not a primary level
}
// ... other Diagnostic struct and methods ...
impl Diagnostics {
// ... existing methods ...
pub fn has_errors_or_warnings(&self) -> bool {
self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Error || d.level == DiagnosticLevel::Warning)
}
pub fn has_errors(&self) -> bool {
self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Error)
}
pub fn has_warnings(&self) -> bool {
self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Warning)
}
pub fn has_fix_notes(&self) -> bool {
self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Note && d.code.contains("FIXED"))
}
pub fn has_level(&self, level: DiagnosticLevel) -> bool {
self.diagnostics.iter().any(|d| d.level == level)
}
pub fn print_diagnostics(&self, source_code: &str) {
use colored::Colorize; // Assuming you have 'colored' crate for nice output
for diag in self.diagnostics.iter().filter(|d| d.level != DiagnosticLevel::Help) { // Don't print Help as primary
let level_str = match diag.level {
DiagnosticLevel::Error => "error".red().bold(),
DiagnosticLevel::Warning => "warning".yellow().bold(),
DiagnosticLevel::Note => "note".blue().bold(), // Handle Note
DiagnosticLevel::Help => continue, // Should not be printed as primary
};
let code_str = diag.code.bright_black();
eprintln!("{}: {} [{}]", level_str, diag.message, code_str);
if let Some(span) = diag.span {
let lines: Vec<&str> = source_code.lines().collect();
if let Some(line_num) = span.start_line(source_code) {
// The file name is not available inside Diagnostics, so only line and column are printed here.
eprintln!(" {} {}:{}", "-->".blue().bold(), line_num + 1, span.start_col(source_code) + 1);
// Print context line
if let Some(line_content) = lines.get(line_num) {
eprintln!(" {} | {}", (line_num + 1).to_string().blue(), line_content);
// Highlight the exact span
eprintln!(" {} {}{}", " ".blue(), " ".repeat(span.start_col(source_code)), "^".repeat(span.len()).green().bold());
}
}
}
if let Some(ref help) = diag.help {
eprintln!(" {} {}", "help:".cyan().bold(), help);
}
eprintln!(); // Blank line for separation
}
}
}
Note that print_diagnostics has no access to the CLI arguments, so it cannot look up the file name itself (a reference such as cli_args.file inside this method would not compile). Passing the path in as a parameter keeps Diagnostics independent of the CLI while still letting the location line name the file.
Let’s refine the diagnostic printing in src/diagnostics.rs to take the file path as an argument.
// src/diagnostics.rs (Corrected print_diagnostics signature)
use colored::Colorize;
use std::path::Path;
// ... DiagnosticLevel and Diagnostic struct ...
impl Diagnostics {
// ... existing methods ...
pub fn print_diagnostics(&self, source_code: &str, file_path: &Path) { // Added file_path
for diag in self.diagnostics.iter().filter(|d| d.level != DiagnosticLevel::Help) {
let level_str = match diag.level {
DiagnosticLevel::Error => "error".red().bold(),
DiagnosticLevel::Warning => "warning".yellow().bold(),
DiagnosticLevel::Note => "note".blue().bold(),
DiagnosticLevel::Help => continue,
};
let code_str = diag.code.bright_black();
eprintln!("{}: {} [{}]", level_str, diag.message, code_str);
if let Some(span) = diag.span {
let lines: Vec<&str> = source_code.lines().collect();
if let Some(line_num) = span.start_line(source_code) {
eprintln!(" {} {}:{}:{}", "-->".blue().bold(), file_path.display(), line_num + 1, span.start_col(source_code) + 1);
if let Some(line_content) = lines.get(line_num) {
eprintln!(" {} | {}", (line_num + 1).to_string().blue(), line_content);
eprintln!(" {} {}{}", " ".blue(), " ".repeat(span.start_col(source_code)), "^".repeat(span.len()).green().bold());
}
}
}
if let Some(ref help) = diag.help {
eprintln!(" {} {}", "help:".cyan().bold(), help);
}
eprintln!();
}
}
}
And update every print_diagnostics call in the main function (there are three) to pass the path:
// src/main.rs (Relevant part in main function)
diagnostics.print_diagnostics(&source_code, &cli.file);
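The printing code relies on the span being able to turn a position into a line and column. How Span does this was settled in an earlier chapter; the sketch below only shows one way such a lookup could work, assuming spans store byte offsets. `line_col` is a hypothetical free function, not part of our Span API.

```rust
/// Returns the zero-based (line, column) of a byte offset in `source`.
/// The column is counted in characters, not bytes.
/// Returns `None` if the offset is out of range or splits a character.
fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > source.len() || !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count();
    // The current line starts right after the last newline, or at 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count();
    Some((line, col))
}
```

Both values are zero-based, which matches the `+ 1` adjustments made when printing the location line.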
Production Considerations
- Rule Ordering: The order in which rules are applied can matter significantly. Some rules might create conditions that other rules then fix, or vice versa. The current RuleEngine applies rules in the order they are registered. For a complex tool, a dependency graph for rules or a fixed, documented ordering might be necessary to ensure deterministic behavior and optimal performance.
- Performance & AST Traversal: Rules often involve traversing the AST. For very large Mermaid diagrams, inefficient traversal (e.g., repeated full scans) can be slow. Using a dedicated AST visitor pattern (which can be implemented once and used by all rules) can optimize traversal. Our current rules iterate top-level statements; for deeply nested structures (like subgraphs), a recursive visitor would be essential.
- Error Handling within Rules: While the AST should be valid by the time rules run (thanks to the validator), rules should still be robust against unexpected AST structures. Using Option and Result types where appropriate is good practice.
- Idempotence and Fix Passes: The MAX_FIX_PASSES in run_fix is a safeguard against infinite loops. Ideally, each rule should be idempotent on its own, meaning applying it twice to the same AST yields no further changes. If MAX_FIX_PASSES is hit, it indicates a potential issue with rule interactions or a non-idempotent rule.
- Strictness Levels: The FixMode enum allows fine-grained control over how aggressive fixes are. This is crucial for a production tool where users might have varying tolerances for automatic modifications. The StrictFixes mode ensures that only truly unambiguous and safe changes are made.
- Extensibility: The Rule trait design makes it easy to add new rules without modifying the RuleEngine itself, promoting extensibility and maintainability.
- Logging: In a production environment, logging which rules were applied and what changes were made (especially in fix mode) is invaluable for debugging and auditing. Our current DiagnosticLevel::Note serves this purpose for user-facing output.
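One simple way to get a fixed, documented ordering is to give each rule an explicit priority and break ties by name, so the result no longer depends on registration order. The sketch below is an assumption about how that could look: `RuleInfo`, its `priority` field, and `execution_order` are hypothetical and not part of our Rule trait.

```rust
/// Hypothetical registration record: a rule name plus an explicit priority.
/// Lower priorities run first.
#[derive(Debug, Clone, Copy)]
struct RuleInfo {
    name: &'static str,
    priority: u8,
}

/// Returns rule names in execution order: by priority, then by name.
/// The name tie-break makes the order independent of registration order.
fn execution_order(mut rules: Vec<RuleInfo>) -> Vec<&'static str> {
    rules.sort_by_key(|rule| (rule.priority, rule.name));
    rules.into_iter().map(|rule| rule.name).collect()
}
```

Structural rules (such as inserting a missing declaration) would typically get a lower priority number than stylistic ones, so later rules always see a structurally complete AST.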
Code Review Checkpoint
At this stage, we have accomplished the following:
- Defined the Rule trait: This provides a clear, extensible interface for creating linting and fixing logic.
- Implemented the RuleEngine struct: This orchestrates the application of rules, manages different FixModes, and handles multiple passes for fixes.
- Created MissingGraphDeclarationRule: A concrete example of a rule that checks for and fixes the absence of a graph declaration.
- Created ArrowNormalizationRule: A rule that identifies and normalizes non-standard arrow syntax.
- Integrated the RuleEngine into src/main.rs: The CLI now supports lint and fix commands, leveraging the newly built engine.
- Updated Diagnostics: To include Note level for reporting successful fixes and to accept a file path for better output.
Files Created/Modified:
- src/rules/mod.rs (New directory and module)
- src/rules/missing_graph_decl.rs (New rule file)
- src/rules/arrow_normalization.rs (New rule file)
- src/main.rs (Modified for CLI, RuleEngine integration)
- src/diagnostics.rs (Modified for Note level and file_path in print_diagnostics)
- Cargo.toml (Added clap dependency)
This new component significantly enhances our tool’s capabilities, moving it from a pure validator to an active linter and fixer.
Common Issues & Solutions
Rules Conflict or Cause Infinite Loops:
- Issue: Two rules might continuously “fix” each other’s changes, or a rule might not be idempotent, leading to MAX_FIX_PASSES being hit.
- Solution:
- Test Thoroughly: Unit tests for each rule and integration tests for rule sets are crucial.
- Idempotency: Design each apply_fix method to make changes only if truly necessary and to produce the same AST if run repeatedly on an already fixed AST.
- Rule Ordering: If conflicts arise, experimenting with rule registration order in RuleEngine::register_default_rules() can sometimes resolve them. For complex interactions, a dependency system for rules might be needed.
- Clear Diagnostics: Ensure diagnostics for fixes clearly state what was changed, helping debug interactions.
- Limit StrictFixes: Rules that might lead to conflicts or non-idempotent behavior should not support StrictFixes.
Performance Degradation with Many Rules/Large ASTs:
- Issue: Each rule traversing the entire AST independently can become slow for large diagrams or many rules.
- Solution:
- Optimized Traversal: Implement a generic AST visitor pattern. Rules can then subscribe to specific AST node types, and the visitor traverses the AST once, calling relevant rule methods for each node.
- Lazy Evaluation: Some rules might only need to run if certain conditions are met, avoiding unnecessary work.
- Caching: If rules frequently query the AST for similar information, cache results for performance.
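The optimized-traversal idea can be sketched as follows. This is an illustration under stated assumptions, not the project's AST: the Node enum, the NodeVisitor trait, the walk function, and both example visitors are hypothetical names, and the "canonical arrow" comparison is only a stand-in for a real rule's logic.

```rust
/// Hypothetical, heavily simplified AST node kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    GraphDecl { orientation: String },
    Edge { from: String, arrow: String, to: String },
    Subgraph { name: String, children: Vec<Node> },
}

/// Rules override only the hooks for the node types they care about;
/// the default implementations do nothing.
pub trait NodeVisitor {
    fn visit_graph_decl(&mut self, _orientation: &str) {}
    fn visit_edge(&mut self, _from: &str, _arrow: &str, _to: &str) {}
}

/// Walks the tree once, offering each node to every registered visitor.
pub fn walk(nodes: &[Node], visitors: &mut [&mut dyn NodeVisitor]) {
    for node in nodes {
        match node {
            Node::GraphDecl { orientation } => {
                for v in visitors.iter_mut() {
                    v.visit_graph_decl(orientation);
                }
            }
            Node::Edge { from, arrow, to } => {
                for v in visitors.iter_mut() {
                    v.visit_edge(from, arrow, to);
                }
            }
            // Recurse into nested statements with the same visitor set.
            Node::Subgraph { children, .. } => walk(children, visitors),
        }
    }
}

/// Example visitor: counts every edge in the diagram.
#[derive(Default)]
pub struct EdgeCounter {
    pub edges: usize,
}

impl NodeVisitor for EdgeCounter {
    fn visit_edge(&mut self, _from: &str, _arrow: &str, _to: &str) {
        self.edges += 1;
    }
}

/// Example visitor: records edges whose arrow differs from a configured
/// canonical form (an illustrative check, not Mermaid's actual arrow rules).
pub struct ArrowCollector {
    pub canonical: &'static str,
    pub offenders: Vec<String>,
}

impl NodeVisitor for ArrowCollector {
    fn visit_edge(&mut self, from: &str, arrow: &str, to: &str) {
        if arrow != self.canonical {
            self.offenders.push(format!("{} {} {}", from, arrow, to));
        }
    }
}
```

A caller builds a slice such as [&mut counter, &mut arrows] and passes it to walk once. The traversal cost then scales with the size of the AST plus the per-node work of each visitor, instead of one full traversal per rule.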
Fixes Introduce New Syntax Errors:
- Issue: A poorly designed apply_fix method might transform a valid (though non-standard) AST into an invalid one.
- Solution:
- Post-Fix Validation: As implemented in main.rs, always re-run the semantic validator after applying fixes. If new errors appear, report them and potentially revert changes (though reverting is complex and outside the scope of this project).
- Strict Adherence to Grammar: Every fix must be explicitly validated against the official Mermaid grammar. Never guess or assume.
- Golden Tests: Crucially, implement golden tests (see below) that ensure fixed output is valid and matches expectations.
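One way to get post-fix validation without implementing a true revert is to apply the fix to a copy of the AST and commit it only if validation passes. The sketch below assumes the AST is cheap enough to clone. The Ast alias, the validate stand-in (it only checks for a leading graph declaration and for empty statements), and fix_with_validation are hypothetical names, not part of the chapter's code.

```rust
/// Hypothetical AST: one string per statement.
pub type Ast = Vec<String>;

/// Stand-in for the semantic validator; returns a list of error messages.
/// The two checks here are illustrative only.
pub fn validate(ast: &Ast) -> Vec<String> {
    let mut errors = Vec::new();
    if !ast.first().map_or(false, |s| s.starts_with("graph ")) {
        errors.push("missing graph declaration".to_string());
    }
    for (index, statement) in ast.iter().enumerate() {
        if statement.trim().is_empty() {
            errors.push(format!("statement {} is empty", index));
        }
    }
    errors
}

/// Applies `fix` to a clone of the AST and commits the result only when the
/// validator reports no errors. On failure the original AST is left untouched.
///
/// Returns Ok(true) if a change was committed, Ok(false) if the fix made no
/// change, and Err(errors) if the fixed copy failed validation.
pub fn fix_with_validation<F>(ast: &mut Ast, fix: F) -> Result<bool, Vec<String>>
where
    F: FnOnce(&mut Ast) -> bool,
{
    let mut candidate = ast.clone();
    if !fix(&mut candidate) {
        return Ok(false);
    }
    let errors = validate(&candidate);
    if errors.is_empty() {
        *ast = candidate;
        Ok(true)
    } else {
        Err(errors)
    }
}
```

The trade-off is memory and copy time per fix attempt, which may be acceptable for typical diagram sizes but is worth measuring before relying on it.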
Testing & Verification
Testing the rule engine involves multiple layers:
Unit Tests for Individual Rules:
- Verify check() method correctly identifies issues and produces diagnostics for various valid and invalid inputs.
- Verify apply_fix() method correctly transforms the AST and returns true when changes are made, and false otherwise.
- Test idempotence: Applying apply_fix twice should only result in changes on the first application.
- Test supports_fix_mode() behavior.
Example test structure for missing_graph_decl.rs:
```rust
// src/rules/missing_graph_decl.rs (within a #[cfg(test)] mod)
#[cfg(test)]
mod tests {
    use super::*;
    use crate::lexer::Lexer;
    use crate::parser::Parser;
    use crate::ast::{MermaidAst, Statement, GraphDeclaration, GraphOrientation};
    use crate::span::Span;

    fn parse_and_get_ast(input: &str) -> MermaidAst {
        let lexer = Lexer::new(input);
        let tokens = lexer.collect();
        let mut parser = Parser::new(tokens);
        parser.parse().expect("Failed to parse test input")
    }

    #[test]
    fn test_missing_graph_decl_check_warns() {
        let ast = parse_and_get_ast("A --> B");
        let mut diagnostics = Diagnostics::new();
        MissingGraphDeclarationRule.check(&ast, &mut diagnostics);
        assert_eq!(diagnostics.len(), 1);
        assert_eq!(diagnostics.get(0).unwrap().code, "R701_MISSING_GRAPH_DECL");
        assert_eq!(diagnostics.get(0).unwrap().level, DiagnosticLevel::Warning);
    }

    #[test]
    fn test_missing_graph_decl_check_no_warn() {
        let ast = parse_and_get_ast("graph TD\nA --> B");
        let mut diagnostics = Diagnostics::new();
        MissingGraphDeclarationRule.check(&ast, &mut diagnostics);
        assert!(diagnostics.is_empty());
    }

    #[test]
    fn test_missing_graph_decl_apply_fix() {
        let mut ast = parse_and_get_ast("A --> B");
        let mut diagnostics = Diagnostics::new();
        let changed = MissingGraphDeclarationRule.apply_fix(&mut ast, &mut diagnostics);
        assert!(changed);
        assert_eq!(ast.statements.len(), 2);
        assert!(matches!(ast.statements[0], Statement::GraphDeclaration(_)));
        assert_eq!(diagnostics.len(), 1);
        assert_eq!(diagnostics.get(0).unwrap().code, "R701_MISSING_GRAPH_DECL_FIXED");

        // Test idempotence
        let changed_again = MissingGraphDeclarationRule.apply_fix(&mut ast, &mut diagnostics);
        assert!(!changed_again); // Should not change again
    }

    #[test]
    fn test_missing_graph_decl_supports_strict_fix() {
        let rule = MissingGraphDeclarationRule;
        assert!(rule.supports_fix_mode(FixMode::ApplyFixes));
        assert!(rule.supports_fix_mode(FixMode::StrictFixes));
    }
}
```
Similar tests would be written for arrow_normalization.rs.
Integration Tests for RuleEngine:
- Test run_lint with various inputs, ensuring all relevant rules produce diagnostics.
- Test run_fix with inputs that require multiple passes, ensuring idempotence and correct final state.
- Test run_fix in StrictFixes mode, verifying that only rules supporting this mode are applied.
- Verify MAX_FIX_PASSES warning is generated if an infinite loop is simulated.
Golden Tests:
- These are critical for a linter/formatter. Create a directory of input.mmd files and corresponding expected_output.mmd files.
- The test reads input.mmd, runs the RuleEngine in fix mode, then compares the resulting (formatted) AST output against expected_output.mmd. This will be fully implemented when the formatter is ready in Chapter 8.
- Include tests for malformed inputs to ensure graceful error handling.
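Until the formatter exists, the shape of a golden-test harness can still be sketched with a pluggable transform standing in for "parse, fix, and format". The layout assumed here (one subdirectory per case, each holding input.mmd and expected_output.mmd) and the run_golden_dir function are illustrative choices, not something the project prescribes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `transform` over every case directory under `root` and returns the
/// names of the cases whose output differs from expected_output.mmd.
///
/// Assumed layout: root/<case_name>/input.mmd and
/// root/<case_name>/expected_output.mmd.
pub fn run_golden_dir<F>(root: &Path, transform: F) -> io::Result<Vec<String>>
where
    F: Fn(&str) -> String,
{
    let mut case_dirs: Vec<_> = fs::read_dir(root)?
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .map(|entry| entry.path())
        .filter(|path| path.is_dir())
        .collect();
    // Sort so that failures are reported in a deterministic order.
    case_dirs.sort();

    let mut failures = Vec::new();
    for dir in case_dirs {
        let input = fs::read_to_string(dir.join("input.mmd"))?;
        let expected = fs::read_to_string(dir.join("expected_output.mmd"))?;
        // Ignore trailing whitespace so a final newline does not cause noise.
        if transform(&input).trim_end() != expected.trim_end() {
            let name = dir
                .file_name()
                .map(|n| n.to_string_lossy().into_owned())
                .unwrap_or_default();
            failures.push(name);
        }
    }
    Ok(failures)
}
```

In a real test, the closure passed as transform would run the lexer, parser, RuleEngine in fix mode, and the formatter, and the test would assert that the returned list of failures is empty.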
Performance Benchmarks:
- Use Rust’s criterion crate or similar benchmarking tools to measure the performance of run_lint and run_fix on large, complex Mermaid diagrams. This helps identify performance bottlenecks as rules are added.
Summary & Next Steps
In this chapter, we successfully designed and implemented the core Rule Engine for our Mermaid analysis tool. We established a flexible Rule trait, allowing us to define modular linting and fixing logic. We built the RuleEngine itself, capable of running in lint, fix, and strict modes, and integrated it into our main application flow. We also implemented two practical rules: MissingGraphDeclarationRule and ArrowNormalizationRule, demonstrating how to identify and deterministically fix common Mermaid issues.
The tool can now process Mermaid code, tokenize it, parse it into an AST, validate it, and apply rule-based checks and transformations. While the AST can be modified, we currently lack a robust way to convert the modified AST back into a clean, formatted Mermaid string.
The next crucial step, covered in Chapter 8: The Formatter: Pretty-Printing the AST, will be to implement a production-grade formatter that takes our (potentially modified) AST and converts it back into a human-readable and syntactically correct Mermaid diagram string, adhering to consistent style guidelines. This will complete our compiler-like pipeline, enabling our tool to act as a full-fledged linter and auto-formatter.