Module indexed_skip_list

Expand description

A safe, indexed skip list

§About

Skip lists are sorted, probabalistic structures made up of logically stacked lists of varying length to allow for truncated O(log(n)) navigation. Canonically linked lists are built from doubly-linked lists, but this is not a defining characteristic of the ADT. Regardless of the base list representation used, the navigational algorithm results in what is essentially a logical linked list.

Properly implemented skip lists provide O(log(n)) expected time complexity for search, insert, and removal operations. This provides a significant advantage over keeping sorted array- or link-based list invariants, which have worst-case O(n) removal (average O(n/2)) temporal performance. Skip lists are also simpler than self-balancing tree structures, which are commonly used for sorted list and map structures. Skip lists also generally provide easier and finer-grained control when adapted for concurrent operations. There is a reason Java’s concurrentSkipListMap is so popular.

§Design

This design uses Vec-backed storage for [SkipNode]s that contain a list (tower) of “next” values, and a single “previous” value that represent indexes within the backing vector.

The list features a dynamic max height h that is logarithmically proportional to the number of elements in the list n such that h = log(n) in the expected case. The logarithmic growth ensures that the average search, insertion, and deletion operations remain efficient, typically with expected O(log(n)) time complexity.

William Pugh’s original paper from 1990 conveniently spells out random level, search, insert, and remove operations as pseudocode that is used to guide this module’s design. Note that the pseudocode is modified from the original paper to fit the notation convention present in this module (that is, CLRS-style with ASCII characters), but is otherwise unchanged from the original paper.

§The Search Algorithm

The search algorithm as its presented in the original paper generalizes some public-facing operation for list search which returns either the node representing the value associated with a search_key or a failure/nil/None value to indicate that the search_key is not in the list.

0    Search(list, search_key)
1      x = list.header
2      // loop invariant: x.key < search_key
3      for i = list.level downto 1 do
4        while x.forward[i].key < search_key do
5          x = x.forward[i]
6      // x.key < search_key <= x.forward[1].key
7      x = x.forward[1]
8      if x.key == search_key then return x.value
9        else return failure

It makes sense to split this algorithm into two different pieces, roughly separated at line 5. In this implementation, the first part represents a private search operation skip_search(e) that returns a position that is strictly < search_key. This sub-routine is then re-used by the get(e), contains(e), insert(e), and remove(e) operations. If the list is empty, skip_search(e) returns 0. An empty list contains a single sentinel node, so there is always a previous node to insert a value, even if its the sentinel.

The second part of the algorithm simply represents a forward iteration and an equality check with the supplied search_key. This second phase is represented in a public get(e) operation that returns the value associated with the search_key, if it exists in the list. The equality check is crucial to determine whether the “next” node is actually the one being searched for.

§Insertion & Removal Algorithms

The insertion and deletion algorithms re-use much of the search algorithm’s first phase, so they can be abstracted into [SkipList::skip_search] operations which return the node that is strictly smaller than the “search_key”, which in this case is a new entry. The rest of the algorithm creates a new SkipNode, generates the “tower” len with a random number generator, populates the next node array for each level, and sets a singular previous node position. The

 0    Insert(list, search_key, newValue)
 1      local update[1..MaxLevel]
 2      x = list.header
 3      for i = list.level downto 1 do
 4        while x.forward[i].key < search_key do
 5          x = x.forward[i]
 6        // x.key < search_key <= x.forward[i].key
 7        update[i] = x

 8      x = x.forward[1]
 9      if x.key = search_key then x.value = newValue
10      else
11        lvl = randomLevel()
12        if lvl > list.level then
13          for i = list.level + 1 to lvl do
14            update[i] = list.header
15          list.level = lvl
16        x = makeNode(lvl, search_key, value)
17        for i = 1 to level do
18          x.forward[i] = update[i].forward[i]
19          update[i].forward[i] = x

This structure uses a contiguous backing structure instead of stable pointers/Position objects. As a result the list cannot strictly maintain the original design’s asymptotics. The major advantage of linked lists is O(1) node insertion/removal if a handle exists to the node. Contiguous lists generally require either O(n) moves for insertion/removal of arbitrary elements. However, there are two options to deal with this; either use a Vec::swap_remove operation for O(1) removals without wasting space, or using a free list to identify and fill holes after removal. For simplicity, this structure uses the first approach, meaning that indexes are not stable, and as such are not surfaced in the public API. This design keeps the space requirements in check, but changes the canonical O(log(n)) removal time to O(n * height), which is O(n * log(n)) expected, and O(n^2) worst case (even though the list’s height is technically capped).

Pugh’s original removal algorithm (which is altered slightly in this implementation):

 0    Delete(list, search_key)
 1      local update[1..MaxLevel]
 2      x = list.header
 3      for i = list.level downto 1 do
 4        while x.forward[i].key < search_key do
 5          x = x.forward[i]
 6        update[i] = x
 7      x = x.forward[1]
 8      if x.key = search_key then
 9        for i = 1 to list.level do
10          if update[i].forward[i] != x then break
11          update[i].forward[i] = x.forward[i]
12        free(x)
13        while list.level > 1 and
14          list.header.forward[list.level] == NIL do
15          list.level = list.level – 1

The remove(e) as it exists in this module:

 0    Delete(list, searchKey)
 
 1      local update[0..MaxLevel] = FindPredecessors(list, searchKey)
 2      target = update[0].forward[0]
 
 3      // Early return for elements not in the list
 4      if target = NIL or target.key != searchKey then
 5        return failure
 6      last = list.nodes.last

 7      // unlink from skip structure
 8      for i = 0 to list.level - 1 do
 9        if update[i].forward[i] = target then
10          update[i].forward[i] = target.forward[i]

11      if target.forward[0] != NIL then
12        target.forward[0].prev = target.prev

13      removed = swap_remove(list.nodes, target)

14      // fix relocated node (if any)
15      if target < list.nodes.length then

16      for each node in list.nodes do
17        replace all forward pointers = last with target

18      if node at target has forward[0] != NIL then
19        forward[0].prev = target

20      if node at target.prev = last then
21        node.prev = target

22      while list.level > 1 and list.header.forward[list.level - 1] = NIL do
23        list.level -= 1

24      return removed.value

§Visual Examples

An initial, empty skip list with one level and no data:

S0: HEAD -> None

Inserting the first node triggers an automatic tower level, even if it ends up empty. This provides the algorithm with a starting point:

S1: HEAD ----------> None
S0: HEAD -> [ 5 ] -> None

After inserting ['a', 'c', 'e', 'd', 'b', 'i', 'g', 'h', 'f'], the list’s SkipNodes might contain the following towers.

HEAD[0]: [1, 2, 9, 7]
a[1]: [5]
c[2]: [4, 4]
e[3]: [9]
d[4]: [3, 9]
b[5]: [2]
i[6]: []
g[7]: [8, 6, 6]
h[8]: [6]
f[9]: [7, 7, 7]

Note that its always possible to tell the last item in the list because its tower is empty. This makes sense, because the last element within the sorted arrangement can only point to None. As you can see by the index notation on the left-hand side of the table, the backing structure retains the insertion order; the backing structure remains unsorted.

The structure simply appends elements to the backing structure, so when printed the list retains its insertion order, not its sorted arrangement. As a result, the towers appear to contain rather nonsensical values. However, if you follow the indexes from the HEAD node, and re-arrange the nodes into lexicographically sorted order, which is what the navigational algorithms in the skiplist achieve, you get the following towers.

HEAD[0]: [1, 2, 9, 7]
a[1]: [5]
b[5]: [2]
c[2]: [4, 4]
d[4]: [3, 9]
e[3]: [9]
f[9]: [7, 7, 7]
g[7]: [8, 6, 6]
h[8]: [6]
i[6]: []

When you rotate the mapping 90 degrees you can start to visualize the skip list layers as logically linked lists formed by “next” element indexes.

L3: [ g[7] ] -> None
L2: [ f[9] ] -> [ g[7] ] -> [ i[6] ] -> None
L1: [ c[2] ] -> [ d[4] ] -> [ f[9] ] -> [ g[7] ] -> [ i[6] ] -> None
L0: [ a[1] ] -> [ b[5] ] -> [ c[2] ] -> [ d[4] ] -> [ e[3] ] -> [ f[9] ] -> [ g[7] ] -> [ h[8] ] -> [ i[6] ] -> None

Finally, if you extend each “next” index reference to align with its sorted position within the list, a classical skip list diagram of towers emerges.

L3: HEAD -------------------------------------------------------------------------> [ g[7] ] -------------------------> None
L2: HEAD -------------------------------------------------------------> [ f[9] ] -> [ g[7] ] -------------> [ i[6] ] -> None
L1: HEAD -------------------------> [ c[2] ] -> [ d[4] ] -------------> [ f[9] ] -> [ g[7] ] -------------> [ i[6] ] -> None
L0: HEAD -> [ a[1] ] -> [ b[5] ] -> [ c[2] ] -> [ d[4] ] -> [ e[3] ] -> [ f[9] ] -> [ g[7] ] -> [ h[8] ] -> [ i[6] ] -> None

§Example code

    let mut list = dsa_rust::sequences::indexed_skip_list::SkipList::<char>::new();

    // An unsorted list of values and a sorted version to compare against
    let values = ['a', 'c', 'e', 'd', 'b', 'i', 'g', 'h', 'f'];
    let sorted = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'];

    // Inserts unsorted values into the skip list with a consuming iterator
    for e in values.into_iter() {
        list.insert(e)
    }

    // Illustrates that the list exists as a sorted invariant
    for (i, e) in list.iter().enumerate() {
        assert_eq!(e, &sorted[i]);
    }

    // Illustrates the Kth function in a 0-indexed list.
    // That is, e occupies the 2 index for insertion order,
    // but is the 4th element in the 0-indexed sorted arrangement.
    assert_eq!(list.get_kth(4).unwrap(), &'e');

    // Query by range using Rust's RangeBounds semantics
    let val = ['c', 'd', 'e', 'f'];
    for (i, e) in list.range('c'..='f').enumerate() {
        assert_eq!(e, &val[i])
    }

Structs§

Iter
RangeIter
SkipList

Module indexed_skip_listCopy item path