Graph Structures
This group covers computational graph structures, and provides an introduction to several algorithms that solve common real-world graph problems. Vanilla graph structures can be some of the simplest structures to implement, and some of the most difficult to design correct algorithms for.
Computational graphs are not the same things as bar charts or mathematical function mappings. At their core, graphs are a way to represent relationships between object pairs in a way that allows you to model complex systems or networks. Graphs consist of a set of vertices (sometimes called nodes) and a collection of edges (sometimes called links or arcs).
Graph Semantics
Edges can be described with the following characteristics to make their use/utility/implementation classification easier to navigate.
This text defines edges as something like a tuple (u, v) where each element u and v represent a vertex (node). The vertices on either end of an edge are called end vertices or endpoints of the edge. Two vertices are said to be adjacent if they are connected by an edge. Similarly, edges are said to be incident to a vertex if the vertex is one of the edge’s endpoints. The degree of a vertex, denoted by deg(v), is the number of its indicent edges.
Graphs can have more than one edge connecting the same vertices. Situations where more than one edge exists between two vertices are called parallel edges. This is why each vertex has a collection of edges instead of a set of edges, because sets imply unique members. Edges that conned one vertex with itself is called a self loop. Graphs that have neither parallel edges nor self loops are called simple graphs. Simple graphs are by far the most common variant.
A path is a sequence of vertices and edges that starts and ends with a vertex, and consists of edges that are incident to their predicessor and successor vertex. In a graph G, vertex v is reachabe from vertex u if a path exists between the two vertices.
A subgraph H is a graph whose vertices and edges are subsets of the vertices and edges of supergraph G. A spanning subgraph is a subgraph that contains all of the vertices of the supergraph.
Directed vs undirected
If edges are ordered, that is, the vertices have order where the domain model only makes sense if one vertex preceeds another (such as a model of inheritance), the edge is said to be directed. In a directed graph, also called a digraph, (u, v) != (v, u). If the vertices have no ordering constraint, the graph is said to be undirected. In an undirected graph (u, v) == (v, u). Graphs with some directed and some undirected edges are called mixed graphs. Mixed graph modeling is incredibly useful for real-world situations, like city streets where some streets have two-way traffic, and some streets only have one-way traffic. Following the mathematical origins, you can describe these relationships as symmetric or asymmetric, though it is much more common to refer to them as directed or undirected. Endpoints are referred to as origin and destination for directed edges, and their corresponding edges can be incoming or outgoing depending on the direction. The in-degree (indeg(v)) and out-degree (outdeg(v)) of a vertex are the number of incoming and outgoing edges.
In undirected graphs, reachability is symmetric, and asymmetric for directed graphs. That is, any path from u to v in an undirected graph can be reversed, which may not be possible with a directed graph.
Graphs can sometimes have parallel edges, or
Connected vs unconnected
A graph is connected if there exists a path between every pair of vertices. For example, a doubly-linked list is a connected graph. An n-ary tree where each node contains links to its children and its parents is technically a connected graph.
A directed graph is strongly connected if, for every pair of vertices u and v, there exists a path from u to v and from v to u. For example, a city street grid with no one-way streets, modeled with two directed edges per street, is a strongly connected directed graph.
The maximal connected subgraphs of an unconnected graph are called its connected components.
Cyclic vs acyclic
A graph is cyclic if it contains at least one cycle: a closed path that starts and ends at the same vertex and does not repeat any other vertices. This is distinct from a self-loop, which is an edge that connects a vertex to itself. Cycles are also different from bidirectional adjacency, such as within a doubly-linked list. For example, a doubly-linked list is acyclic because its bidirectional links represent undirected edges in a path graph, and there is no closed simple path involving more than one vertex.
A graph without cycles is called a forest, and forest without cycles is called a tree. Trees are a very common and very widely studied sub-topic within graph theory. Indeed this module contains anentire group on trees, which is generally more for pedagogical purposes than real-world relevancy.
On being simple
A graph is simple if it has no self-loops or parallel edges. A path is simple if it visits each vertex at most once. A cycle is simple if it visits each vertex at most once, except that the first and last vertex are the same.
Conversely, a multigraph (or pseudograph can have parallel edges, and a pseudograph is a multigraph with self-loops.
Weighted vs unweighted
Labeled vs unlabeled
The Graph ADT
The graph ADT isn’t overly complex and mostly contains operations to operate on vertices and edges.
Accessors
num_vertices()Returns the number of vertices in the graphnum_edges()Returns the number of edges in the graphget_edge(u, v)Returns the edge between verticesuandv, this can be positional for directed graphs, and symmetric for undirected graphsend_vertices(e)Returns an array containing the two endpoint vertides of edgee, this can be positional for directed graphs, and symmetric for undirected graphsopposite(v, e)Returns the pair vertex of an edgeeincident to vertexvis_adjacent(u, v)Returns a Boolean to indicate if verticesuandvare adjacentout_degree(v)Returns the number of outgoing edges from vertexvin_degree(v)Returns the number of incoming edges from vertexv
Mutators
insert_vertex(x)Inserts a new vertex storing elementxinsert_edge(u, v, x)Inserts a new edge from vertexuto vertexvstoring elementxsuch as a weight; in a directed graph,uandvare positional, but symmetric for undirected graphsremove_vertex(v)Removes vertexvand all of its incident edgesremove_edge(e)Removes edgee
Factories (Iterators)
out_edges(v)Returns an iterator over outgoing edges from vertexvin_edges(v)Returns an iterator over incoming edges from vertexvvertices()Returns an iterator over all vertices in the graphedges()Returns an iterator over all edges in the graph
Graph Families (structural constraints; what shapes are allowed?)
Acyclic Graphs
Trees (connected, acyclic)
One of the most well-known types of graph are computational trees. By strict definition, trees are a type of acyclic, connected graph.
Forests (disconnected trees)
Directed Acyclic Graphs (DAGs)
Build graphs
Dependency graphs
ASTs (often DAGs, not trees)
Cyclic Graphs
General directed graphs
Undirected cyclic graphs
Special Graph Families
Bipartite graphs
Planar graphs
Complete graphs
Multigraphs
Hypergraphs
Representations
This section describes several possible data structures for representing graphs. In all representations the structure maintains a list of vertices. The major difference appear in how they organize and represent edges.
Edge Lists
Edge lists represent unordered lists of all edges. Edge lists do not provide an efficient way to locate a specific edge (u, v) or edges incident to a specific vertex. In fact, while they may be the simplest, simple edge lists are definitely not the most efficient graph representations.
It is common to build edge lists with two positional (linked) lists to maintain fast access to a vertex’s position without the O(n) element shifts in a backing array during removals.
struct Node<V> { data: V, pos: Position<Node<V>> // Unique handle in the vertices list}
struct Edge<V, E> { // References to the positions in the vertices list origin: Position<Node<V>>, destination: Position<Node<V>>, data: E, pos: Position<Edge<V, E>> // Unique handle in the edges list}
pub struct EdgeList<V, E> { vertices: PosList<Node<V>>, edges: PosList<Edge<V, E>>}This design necessarily requires O(n + m) space for a graph with n vertices and m edges. The major downside of this representation is that operations like get_edge, out_degree, in_degree, outgoing_edges, and incoming_edges require O(m) running time because the structure has to search the edge list exhaustively. This notably also includes remove_vertex because once a vertex is removed, all of the associated edges that point to the removed vertex must also be removed.
Adjacency Lists
Adjacency lists maintain a separate list of edges that are incident to a vertex. This approach provides more efficient ways to find all edges incident to a given vertex. This provides O(n) access to adjacent vertices.
struct Vertex<T, X> { data: T, edges: Vec<Edge<T, X>>}
struct Edge<T, X> { origin: Vertex<T, X>, destination: Vertex<T, X>, metadata: Option<X> // For weighted graphs}
pub struct AdjacencyList<T, X> { vertices: Vec<Vertex<T, X>>}Pointer-Based Node Adjacency Lists
General n-ary trees (Vec of children, Rc/RefCell)
Binary Trees (left/right Box pointers) ← BinTree<T>
Contiguous (arena-based) Adjacency Lists
Adjacency Maps
Similar to an adjacency list, but the secondary contaier is a map, not just a list. That is, in addition to a base set of vertices, an adjacency map stores a separate map of edges incident to a given vertex with adjacent vertices as keys. This essentially provides O(1) access advantage over adjacency lists.
Adjacency Matrices
Provides O(1) worst case access to a specific edge (u, v) by maintaining an n * n matrix for a graph with n vertices.