Graph Structures

This group covers computational graph structures, and provides an introduction to several algorithms that solve common real-world graph problems. Vanilla graph structures can be some of the simplest structures to implement, and some of the most difficult to design correct algorithms for.

Computational graphs are not the same things as bar charts or mathematical function mappings. At their core, graphs are a way to represent relationships between object pairs in a way that allows you to model complex systems or networks. Graphs consist of a set of vertices (sometimes called nodes) and a collection of edges (sometimes called links or arcs).

Graph Semantics

Edges can be described with the following characteristics to make their use/utility/implementation classification easier to navigate.

This text defines edges as something like a tuple (u, v) where each element u and v represent a vertex (node). The vertices on either end of an edge are called end vertices or endpoints of the edge. Two vertices are said to be adjacent if they are connected by an edge. Similarly, edges are said to be incident to a vertex if the vertex is one of the edge’s endpoints. The degree of a vertex, denoted by deg(v), is the number of its indicent edges.

Graphs can have more than one edge connecting the same vertices. Situations where more than one edge exists between two vertices are called parallel edges. This is why each vertex has a collection of edges instead of a set of edges, because sets imply unique members. Edges that conned one vertex with itself is called a self loop. Graphs that have neither parallel edges nor self loops are called simple graphs. Simple graphs are by far the most common variant. A path is a sequence of vertices and edges that starts and ends with a vertex, and consists of edges that are incident to their predicessor and successor vertex. In a graph G, vertex v is reachabe from vertex u if a path exists between the two vertices.

A subgraph H is a graph whose vertices and edges are subsets of the vertices and edges of supergraph G. A spanning subgraph is a subgraph that contains all of the vertices of the supergraph.

Directed vs undirected

If edges are ordered, that is, the vertices have order where the domain model only makes sense if one vertex preceeds another (such as a model of inheritance), the edge is said to be directed. In a directed graph, also called a digraph, (u, v) != (v, u). If the vertices have no ordering constraint, the graph is said to be undirected. In an undirected graph (u, v) == (v, u). Graphs with some directed and some undirected edges are called mixed graphs. Mixed graph modeling is incredibly useful for real-world situations, like city streets where some streets have two-way traffic, and some streets only have one-way traffic. Following the mathematical origins, you can describe these relationships as symmetric or asymmetric, though it is much more common to refer to them as directed or undirected. Endpoints are referred to as origin and destination for directed edges, and their corresponding edges can be incoming or outgoing depending on the direction. The in-degree (indeg(v)) and out-degree (outdeg(v)) of a vertex are the number of incoming and outgoing edges.

In undirected graphs, reachability is symmetric, and asymmetric for directed graphs. That is, any path from u to v in an undirected graph can be reversed, which may not be possible with a directed graph.

Graphs can sometimes have parallel edges, or

Connected vs unconnected

A graph is connected if there exists a path between every pair of vertices. For example, a doubly-linked list is a connected graph. An n-ary tree where each node contains links to its children and its parents is technically a connected graph.

A directed graph is strongly connected if, for every pair of vertices u and v, there exists a path from u to v and from v to u. For example, a city street grid with no one-way streets, modeled with two directed edges per street, is a strongly connected directed graph.

The maximal connected subgraphs of an unconnected graph are called its connected components.

Cyclic vs acyclic

A graph is cyclic if it contains at least one cycle: a closed path that starts and ends at the same vertex and does not repeat any other vertices. This is distinct from a self-loop, which is an edge that connects a vertex to itself. Cycles are also different from bidirectional adjacency, such as within a doubly-linked list. For example, a doubly-linked list is acyclic because its bidirectional links represent undirected edges in a path graph, and there is no closed simple path involving more than one vertex.

A graph without cycles is called a forest, and forest without cycles is called a tree. Trees are a very common and very widely studied sub-topic within graph theory. Indeed this module contains anentire group on trees, which is generally more for pedagogical purposes than real-world relevancy.

On being simple

A graph is simple if it has no self-loops or parallel edges. A path is simple if it visits each vertex at most once. A cycle is simple if it visits each vertex at most once, except that the first and last vertex are the same.

Conversely, a multigraph (or pseudograph can have parallel edges, and a pseudograph is a multigraph with self-loops.

Weighted vs unweighted

Labeled vs unlabeled

The Graph ADT

The graph ADT isn’t overly complex and mostly contains operations to operate on vertices and edges.

Accessors

num_vertices() Returns the number of vertices in the graph
num_edges() Returns the number of edges in the graph
get_edge(u, v) Returns the edge between vertices u and v, this can be positional for directed graphs, and symmetric for undirected graphs
end_vertices(e) Returns an array containing the two endpoint vertides of edge e, this can be positional for directed graphs, and symmetric for undirected graphs
opposite(v, e) Returns the pair vertex of an edge e incident to vertex v
is_adjacent(u, v) Returns a Boolean to indicate if vertices u and v are adjacent
out_degree(v) Returns the number of outgoing edges from vertex v
in_degree(v) Returns the number of incoming edges from vertex v

Mutators

insert_vertex(x) Inserts a new vertex storing element x
insert_edge(u, v, x) Inserts a new edge from vertex u to vertex v storing element x such as a weight; in a directed graph, u and v are positional, but symmetric for undirected graphs
remove_vertex(v) Removes vertex v and all of its incident edges
remove_edge(e) Removes edge e

Factories (Iterators)

out_edges(v) Returns an iterator over outgoing edges from vertex v
in_edges(v) Returns an iterator over incoming edges from vertex v
vertices() Returns an iterator over all vertices in the graph
edges() Returns an iterator over all edges in the graph

Graph Families (structural constraints; what shapes are allowed?)

Acyclic Graphs

Trees (connected, acyclic)

One of the most well-known types of graph are computational trees. By strict definition, trees are a type of acyclic, connected graph.

Forests (disconnected trees)

Directed Acyclic Graphs (DAGs)

Build graphs

Dependency graphs

ASTs (often DAGs, not trees)

Cyclic Graphs

General directed graphs

Undirected cyclic graphs

Special Graph Families

Bipartite graphs

Planar graphs

Complete graphs

Multigraphs

Hypergraphs

Representations

This section describes several possible data structures for representing graphs. In all representations the structure maintains a list of vertices. The major difference appear in how they organize and represent edges.

Edge Lists

Edge lists represent unordered lists of all edges. Edge lists do not provide an efficient way to locate a specific edge (u, v) or edges incident to a specific vertex. In fact, while they may be the simplest, simple edge lists are definitely not the most efficient graph representations.

It is common to build edge lists with two positional (linked) lists to maintain fast access to a vertex’s position without the O(n) element shifts in a backing array during removals.

struct Node<V> {
    data: V,
    pos: Position<Node<V>> // Unique handle in the vertices list
}

struct Edge<V, E> {
    // References to the positions in the vertices list
    origin: Position<Node<V>>,
    destination: Position<Node<V>>,
    data: E,
    pos: Position<Edge<V, E>> // Unique handle in the edges list
}

pub struct EdgeList<V, E> {
    vertices: PosList<Node<V>>,
    edges: PosList<Edge<V, E>>
}

This design necessarily requires O(n + m) space for a graph with n vertices and m edges. The major downside of this representation is that operations like get_edge, out_degree, in_degree, outgoing_edges, and incoming_edges require O(m) running time because the structure has to search the edge list exhaustively. This notably also includes remove_vertex because once a vertex is removed, all of the associated edges that point to the removed vertex must also be removed.

Adjacency Lists

Adjacency lists maintain a separate list of edges that are incident to a vertex. This approach provides more efficient ways to find all edges incident to a given vertex. This provides O(n) access to adjacent vertices.

struct Vertex<T, X> {
    data: T,
    edges: Vec<Edge<T, X>>
}

struct Edge<T, X> {
    origin: Vertex<T, X>,
    destination: Vertex<T, X>,
    metadata: Option<X> // For weighted graphs
}

pub struct AdjacencyList<T, X> {
    vertices: Vec<Vertex<T, X>>
}

Pointer-Based Node Adjacency Lists

General n-ary trees (Vec of children, Rc/RefCell)

Binary Trees (left/right Box pointers) ← `BinTree<T>`

Contiguous (arena-based) Adjacency Lists

Adjacency Maps

Similar to an adjacency list, but the secondary contaier is a map, not just a list. That is, in addition to a base set of vertices, an adjacency map stores a separate map of edges incident to a given vertex with adjacent vertices as keys. This essentially provides O(1) access advantage over adjacency lists.

Adjacency Matrices

Provides O(1) worst case access to a specific edge (u, v) by maintaining an n * n matrix for a graph with n vertices.