Skip to content

Maps & Sets

The previous chapter explored lists as fundamental storage containers as well as how they can be used to back more complex abstractions. One of the super powers of numerical indexing in contiguous memory list structures is the O(1) access to arbitrary elements in the list. The biggest drawback with lists is that you have to already know where information is in order to use it. Otherwise, using nodes in a list requires checking every item resulting in O(n) find operations.

This chapter explores how the concepts of key-value pairs can be used to retrieve arbitrary data without having to keep track of an index or position.

Key-Value Relationships

Key-value pairs lay at the heart of keyed structure types. To illustrate key-value associations and access, consider the following pseudocode set of person objects that include name, age, and favorite food. This set of objects is used as example data throughout the rest of this section.

┌───────────────────────────────────────────┐ │ {name: Brain, age: 37, food: tapioca} │ │ {name: Dichael, age: 22, food: ice cream} │ │ {name: Bobson, age: 36, food: hamburgers} │ └───────────────────────────────────────────┘

If you want to know how old Bobson is, you could put all the objects into a list and scan the names until you find the person you’re looking for. From there you can easily accesse any of the other fields for that person object, but ensuring you have the correct object is expensive, necessarily requiring O(n) time to ensure that you aren’t just stopping at the first match.

To reduce lookup times you could use an array-based structure and use indexes as a form of direct addressing in a lookup table. In a lookup table with direct indexing the indexes become keys to the objects in the array, allowing you to access each of the objects in O(1) time.

┌───────┬───────────────────────────────────────────┐ │ Index │ Person object │ ├───────┼───────────────────────────────────────────┤ │ 1 │ {name: Brain, age: 37, food: tapioca} │ │ 2 │ {name: Dichael, age: 22, food: ice cream} │ │ 3 │ {name: Bobson, age: 36, food: hamburgers} │ └───────┴───────────────────────────────────────────┘

The only problem is that now you have to remember the index of each object you want to access. Numerical indexes naturally do not carry any logical association with the underlying data.

Maps/dictionaries allow you to directly associate an abitrarily-typed key with an arbitrarily-complex value without relying on a secondary assocation. The only caveat is that your keys must be unique. Maps that allow multiple values to be associated with the same key are called multimaps.

Continuing the person object example, you could use the person’s name as the key and set the value to an object containing all the personal details without having to store any extra positional information about where the data is located. This key-value association is the superpower of maps/dictonaries.

┌─────────┬─────────────────────────────┐ │ Key │ Value │ ├─────────┼─────────────────────────────┤ │ Brain │ {age: 37, food: tapioca} │ │ Dichael │ {age: 22, food: ice cream} │ │ Bobson │ {age: 36, food: hamburgers} │ └─────────┴─────────────────────────────┘

Sorted variants (discussed in the next chapter) allow you to order keys in some defined way depending on the implementation. In this example, the map is sorted lexigraphically by key.

┌─────────┬─────────────────────────────┐ │ Key │ Value object │ ├─────────┼─────────────────────────────┤ │ Bobson │ {age: 36, food: hamburgers} │ │ Brain │ {age: 37, food: tapioca} │ │ Dichael │ {age: 22, food: ice cream} │ └─────────┴─────────────────────────────┘

The Map ADT

Maps, sometimes called dictionaries, are ADTs that store arbitrarily typed key-value pairs called entries. At its core the map ADT looks very similar to the list ADT or its variants. It includes functions to insert, remove, and iterate over members of the data structure in various ways.

  • size() Returns the number of entries in the map
  • is_empty() Returns a Boolean that indicates whether the map is empty
  • get(k) Returns the value v associa}ted with key k
  • put(k, v) Adds entry (k, v), overwriting any value v associated with an existing key k, returns old value; Optionally you can include a try_put(k, v) operation that checks if key k exists and does not overwrite any associated value v, potentially returning a bool or some error if key k already exists
  • remove(k) Removes the entry (k, v) associated with key k
  • keys() Returns an iterator that yields immutable references to all keys in the map; Optionally, a keys_mut() may return mutable references
  • values() Returns an iterator that yields immutable references to all values in the map; Optionally, a values_mut() operation may return mutable references
  • entry_set() Returns an iterator that yields references to all (k, v) entries in the map
  • contains(k) Returns a Boolean indicating the presence of key k in the map

As with other ADTs in this module, operations that return iterators may alternatively return iterable collections if you’ve got your heart set on that. Notice that the map ADT does not specify any sorted properties or operations. Creaing and maintaining sorted structures is covered in more depth in later pages. Once again, there are far more fundamental matters to discuss before discussing sorted structures.

The Set ADT

In many programming libraries, a set is implemented as a restricted form of a map. In mathematics, however, sets are foundational objects: a set S is a collection of distinct elements, often defined via a predicate P as S={x such that P(x)}, where order and multiplicity are irrelevant. In programming, a set is an abstract data type that enforces uniqueness and is typically optimized for efficient membership, insertion, and deletion operations.

The set type contains the following fundamental operations:

  • add(e) Adds the element e to the set
  • remove(e) Removes the entry e from the set
  • contains(e) Returns a Boolean if the map contains element e

Additionally, you may consider using the fundamental methods to implement the following opertations for common use cases:

  • iterator() One of the most common uses of a set is to iterate through its elements
  • union(T) -> Iter A union operation that returns an iterator over all unique borrowed of both self and T
  • extend(T) A union operation that mutates S such that S contains all unique values of S + T
  • intersection(T) -> Iter An Intersection operation that returns an iterator over all borrowed values in both self and T
  • retain_all(T) (intersect(T)) An intersection operation that mutates S such that S only keeps elements that are also in T
  • difference(T) -> Self A subtraction operation that returns a new set with only elements that are not in T
  • subtract(T) A subtraction operation that mutates S such that S only contains elements that are not in T
  • retain(|predicate|) A filter operation that mutates S to only contain values that adhere to the given predicate (lambda)

Multimaps & mutlisets

A multimap is like a map, except that a single key can be associated with multiple different values. Multisets, also called bags, are sets that allow duplicates. The biggest takeaway here is how you define sameness/otherness. For example, are you counting the number of references to a single object, or are you counting multiple separate instances of bitwise equivalence? In multisets it is probably most common to list occurences and thus implement these structures with Entry<K, N> types where K is the element andN is the number of occurences. Multimaps are similar, except N is typically a list of values associated with key K. This might look more like Entry<K, V>.

// Base multimap struct
pub struct Entires<K, V> {
key: K,
vec: Vec<V>,
}

Multiset ADT

In addition to all the regular map operations, multisets contain the following specialized operations:

  • count(k) Returns the number of times key k appears in the set.
  • remove(k, n) Removes n number of occurences of key k in the set.
  • size() Returns the total number of elements in the set, including duplicates.

Multimap ADT

In addition to all the regular map operations, multimaps contain the following specialized operations:

  • remove_all(k) Returns an iterator over all key:value pairs for key k.
  • key_set(k) Returns an iterator over non-duplicative keys k in the map.
  • values(k) Returns an iterator over all the values for key k.

Map & Set Representations

Because sets are often just restricted maps, they can be represented with the same basic designs. The most common unsorted map/set representation (by a wide margin) is the hash table. A map ADT represented with a hash table is commonly called a hash map, and a set ADT represented with a hash table is commonly called a hash set. It is possible to represent sorted maps naively using techniques already covered, but the most efficient and common representations for sorted maps rely on techniques and concepts covered in the next chapter on hierarchical structures.