General Trees
The first page in this module introduced the tree ADT but left out implementation details due to complexity. Now that you have some simpler trees under your belt, its time to take a look at a generic tree strcture and implement a Markdown (MD) header visualization program.
Table of Contents Generator
This section covers three examples that generate a properly indented table of contents (TOC) for Markdown (MD) documents in an Astro project such as this website. This section only provides limited snippets. Visit this DSA project’s associated repo on GitHub for more details.
Document Parsing
The first step is to parse the MD document. All the TOC generator examples in this section use the same regular expressions (regex) to parse the document, one for the YAML-formatted frontmatter and another for the MD headings. The primary regex for detecting headings is ^(#{1,6})\s+(.*)
, which matches sequences of 1 to 6 #
characters and captures both their count and the remainder of the line. The parsing function returns a tuple containing the title and a vector of Heading
types, which themselves contain fields for level
and title
to describe each node. When pretty-printed, this looks something like the following:
( "Location Tree Test", [ Heading { level: 2, title: "Landlocked" }, Heading { level: 3, title: "Switzerland" }, Heading { level: 4, title: "Geneva" }, Heading { level: 5, title: "Old Town" }, Heading { level: 6, title: "Cathédrale Saint-Pierre" }, Heading { level: 3, title: "Bolivia" }, Heading { level: 6, title: "Puerta del Sol" }, Heading { level: 6, title: "Puerta de la Luna" }, Heading { level: 2, title: "Islands" }, Heading { level: 3, title: "Marine" }, Heading { level: 4, title: "Australia" }, Heading { level: 3, title: "Fresh Water" } ] )
Simple Print
The first example in this journey simply prints an appropriately indented TOC based on the document’s MD heading values. The parsing function makes this trivial. This approach involves a single for
loop to print headings. Each line includes a space
formatter that multiplies a set spacing by the Heading.level
value to format the line.
println!("📄 {}\n │", title);for e in headings { let space = " ".repeat(e.level - 1); let line = format!(" │{}{}", space, e.title); println!("{line}")}println!(" │\n")
📄 Location Tree Test │ │ Landlocked │ Switzerland │ Geneva │ Old Town │ Cathédrale Saint-Pierre │ Bolivia │ Puerta del Sol │ Islands │ Marine │ Australia │ Fresh Water │
But… this approach doesn’t actually use any trees at all! If your goal is simply to render a table of contents hierarchically then its true that this is the simplest solution. The caveat is that the sexy box drawing notation in programs like tree
requires some tricky additions that Im not clever enough to implement without an actual tree structure. Naturally that’s where this is headed, so buckle up.
Base Structure
Once you’ve decided that you actually need a tree your next decision is to choose which kind. Documents can have any number of headings of each value so this example defines a hybrid n-ary tree construction and traversal approach that uses both linked and indexed-based structures. The result is a positional, link-based structure with nodes that contain indexed lists of pointers to track and process that node’s children. Again, it’s possible to use an indexed list like a Vec
as the tree’s base structure, but the classic approach highlights positional lists, and Im not trying to reinvent the wheel here.
You’ll need to access from any given node to its ancestors and its descendants. The result is something like a doubly-linked list. Unless you wanna futz with verbose RefCell
elements this likely means more unsafe
Rust. The goal is an acyclic structure, but construction operations do necessarily require navigation to ancestor nodes which can introduce cycles if not handled correctly. You’ll also need a way to drop
your pointers when the time comes. Great, so you need a bunch of Miri testing too.
Once you’re ready to start coding, you can define each tree instance as one root node pointer and a size field. This implementation aliases raw node pointers (positions) as Pos<T>
. The root points to a Node
that contains a pointer to its parent, a list of children, and an optional data field.
type Pos<T> = Option<*mut Node<T>>;
pub struct Tree<T> { pub root: Pos<T>, size: usize,}
...
pub struct Node<T> { parent: Pos<T>, children: Vec<Pos<T>>, data: Option<T>,}
The basic structure is generic over T
which needs a concrete definition for practical use. This example defines a Heading
type that includes level
and title
fields for each parsed heading.
pub struct Heading { level: usize, title: String,}
Easy! Ok, now for the fun part.
Tree Construction
Now that you have a base structure you’ll need to figure out how to actually go about constructing the tree. The type for this project, presented as a combination of traits and type-specific impl
blocks, requires the following operations:
add_child(parent, node)
Links a child node
to the provided parent
children(node)
Returns a collection of children from the provided node
get(node)
Returns an immutable reference to the data inside a node
parent(node)
Returns an immutable reference to the parent of the provided node
is_root(node)
Returns a Boolean if the provided node
is the tree’s root
You also need some way to instantiate the structure and add children. This requires implementation-specific new
and add_child
operations. The tree is a simple, static structure, so actually constructing the tree only requires the new
, add_child
, get
, and parent
operations.
N-ary Tree Traversal
To get the classic tree
-like structure you’ll want to visit the parent node before you visit its children. Naturally a pre-order traversal is the right choice for this project. Consider the following strict preorder traversal algorithm:
fn strict_preorder(position: &Pos<Heading>, prefix: &str) { if let Some(p) = position { // Visits the current node, prints all but ROOT let node = Node::get(Some(*p)); if node.unwrap().title != "ROOT".to_string() { println!(" {}{}", prefix, node.unwrap().title); }
// Gets the node's children, visits recursively let children: &Vec<Pos<Heading>> = unsafe { (*(*p)).children.as_ref() }; for e in children { strict_preorder(e, &format!("{} ", prefix)); } } else { println!("Not a valid position"); }}
This strict version produces something like the intended result, but makes box drawing characters tricky.
📄 Location Tree Test
Landlocked Switzerland Geneva Old Town Cathédrale Saint-Pierre Bolivia [] [] Puerta del Sol Puerta de la Luna Islands Marine Australia Fresh Water
Instead, the project uses a slightly modified traversal algorithm with a wrapper to hide the recursive cruft which produces the desired tree
-like formatting.
/** Serves as a wrapper for a modified preorder traversal function */pub fn pretty_print(name: &str, position: &Pos<Heading>) { //println!("\t[] {name}\n\t│"); println!("📄 {}\n\t│", name); preorderish(position, ""); println!("");}
/** Traverse the tree recursively, printing each node's title and children */fn preorderish(position: &Pos<Heading>, prefix: &str) { // Checks that the position (node) exists if let Some(p) = position { // Visit the node at the referenced position let children: &Vec<Pos<Heading>> = unsafe { (*(*p)).children.as_ref() }; let mut index = children.len();
// Recursively visit each child for e in children { let node = Node::get(*e).unwrap(); index -= 1; if index == 0 { println!("\t{}└── {}", prefix, node.title); preorderish(e, &format!("{} ", prefix)); } else { println!("\t{}├── {}", prefix, node.title); preorderish(e, &format!("{}│ ", prefix)); } } } else { println!("Not a valid position") }}
Putting It All Together
This particular implementation relies on the following chain of operations:
- The
parse_file
function takes a path to a Markdown file and leverages two regular expressions to return a tuple that contains the document title and a list ofHeading
types that contain the headings and their hierarchical values as H2, H3, etc. - The
construct
function takes the list of tuples and builds the tree structure according to each heading’s hierarchical value. - Finally, the “pretty print” function wraps the recursive
preorder
traversal algorithm which walks the tree and contains the box drawing output formatting.
As a convenience function, this example also includes a top-level, recursive navigator
function that takes a path and applies a recursive algorithm to visit all files within the directory. Upon encountering a file the navigator calls the previously mentioned chain of operations. In this way you can pretty-print the table of contents for an entire directory tree in one command. Recursion and trees really do go hand-in-hand.
The design criteria for the construction algorithm allows for generational skips in both directions. For descendants more than one generation removed from a parent, the algorithm inserts empty placeholder nodes to retain the heading values. Correct construction and traversal algorithms result in something a little more fancy.
📄 Location Tree Test │ ├── Landlocked │ ├── Switzerland │ │ └── Geneva │ │ └── Old Town │ │ └── Cathédrale Saint-Pierre │ └── Bolivia │ └── [] │ └── [] │ └── Puerta del Sol └── Islands ├── Marine │ └── Australia └── Fresh Water