Skip to content

Sorted Maps & Sets

Sorted Map ADT

The sorted map variant extends the map with the following operations:

  • first_entry() Returns the entry with the smallest relative key or null/None if the key does not exist in the map
  • last_entry() Returns the entry with the largest relative key or null/None if the key does not exist in the map
  • ceiling(k) Returns the next smallest key that is >= k, or null/None if the key does not exist in the map
  • floor(k) Returns the next largest key that is <= k, or null/None if the key does not exist in the map
  • upper(k) Returns the next largest key that is > k, or null/None if the key does not exist in the map
  • lower(k) Returns the next smallest key that is < k, or null/None if the key does not exist in the map
  • sub_map(k1, k2) Returns an iterable range of all entries with keys >= k1 and < k2

Sorted maps are good for, you guessed it, search operations. One example is using timestamps as keys and performing operations on elements before, after, or within a range of timestamps. One convenient example of using sorted maps is with lexicographically sorted tuple key-value pairs such as timestamped flight records. Sorted maps are also useful in maintaining maxima sets. Take for example a set of entries where the keys are “cost” and values are “performance”. You can write a function like best(cost) that returns the best performance for the value.

When implementing a sorted map you can use a vanilla list structure and just call it a map because who cares. Resize, add, and remove operations require O(n) time. The first/last operations require O(1), and get/ceiling/floor/higher/lower operations take O(log n) when using a binary search algorithm. The submap operation requires O(s + log n) where s represents the number of items included. On the other hand, you can implement sorted maps with skip lists that result in O(log n) queries and O((1 + r) log n) adds/updates where r is the number of removed entries.

Skip Lists

You can make a sorted table with a regular old index-based list, but it still may require O(n) time for update operations due to shifts, and linked lists dont allow for fast searches. To get around this you can use either a binary search tree (BST) or a skip list. Both BSTs and skip lists provide O(long n) expected (on average) time bounds for search, insert, update, and delete operations. The choice between a skip list and a BST comes down to two trade-offs. BSTs have better cache locality but are difficult to implement for concurrent operations. If you need a safe, concurrent map with

Skip lists are a collection of lists S0 to Sh. List S0 contains all entries in the skip list with keys sorted by some logical methodology as well as two sentinel entries representing keys lower and higher than all possible keys. Visually you can imagine the sentinel entries as -∞ and +∞. The magic of a skip list relies on the fact that each subsequent list contains some random number of elements of the base list S0, also between two sentinel entries -∞ and +∞. Each subsequent list contains roughly half the number of entries as the feeder list, with the specific entries (indexes) chosen at random. If list S0 for map M contains 10 entries (plus two sentinel entries), list S1 might contain 5 entries, S2 might contain 3 entries, S3 might contain 1 entry, and S4 would contain just the sentinel entries.

To visualize skip list operations you can start by layering each list S0 through Sh as sorted lists from left to right with S0 on the bottom under S1 through Sh at the top where h represents the height of the skip list. The result is a kind of grid where each horizontal row is a list Si that represents a level and each column of randomly chosen entries forms a tower.

The heart of the skip list is the skip_search(k) operation which finds the first key that is less than or equal to key k. The skip search operation starts at the “top left” position of the imaginary grid of layered lists S0 up to Sh, which is the sentinel node of Sh represented by key -∞. The operation first checks if key k is <= -∞. If the operation returns null/None, the operation terminates. Otherwise, the operation drops down a level with in the same tower to Sh-1. Next the operation advances to the right-most position p occupied by an entry at position p where key(p) <= k, which is called scanning forward. The operation repreats these steps until it finds the largest key at position p that is <= k. Algorithm:

skip_search(k):
p = head of highest list
while lower_list(p) != null // Drop to lower list
p = lower_list(p)
while k >= key(next(p)) do // Scan forward
p = next(p)
return p

Multimaps & mutlisets

A multimap is like a map, except that a single key can be associated with multiple different values. Multisets, also called bags, are sets that allow duplicates. The biggest takeaway here is how you define sameness/otherness. For example, are you counting the number of references to a single object, or are you counting multiple separate instances of bitwise equivalence? In multisets it is probably most common to list occurences and thus implement these structures with Entry<K, N> types where K is the element andN is the number of occurences. Multimaps are similar, except N is typically a list of values associated with key K. This might look more like Entry<K, V>.

// Base multimap struct
pub struct Entires<K, V> {
key: K,
vec: Vec<V>,
}

Multiset ADT

In addition to all the regular map operations, multisets contain the following specialized operations:

  • count(k) Returns the number of times key k appears in the set.
  • remove(k, n) Removes n number of occurences of key k in the set.
  • size() Returns the total number of elements in the set, including duplicates.

Multimap ADT

In addition to all the regular map operations, multimaps contain the following specialized operations:

  • remove_all(k) Returns an iterator over all key:value pairs for key k.
  • key_set(k) Returns an iterator over non-duplicative keys k in the map.
  • values(k) Returns an iterator over all the values for key k.