
Search Trees

The heaps page introduced the idea of using trees to structure sorted data to obtain O(log(n)) look-ups, insertions, and removals. That strategy provided an excellent starting point for implementing priority queues. The heap structure is great for sorting collections or finding k-th elements, but its primary limitation is that it only provides cheap access to the minimum (or maximum) value, not to arbitrary values. This section extends the binary tree logic to implement a sorted map.

Binary Search Trees

The BST is essentially just a sorted binary tree where all nodes in a given node’s left sub-tree contain values less than the node’s value, and all nodes in the right sub-tree contain values greater than the node’s value. Or, if you prefer inequalities: for a node n with left sub-tree l and right sub-tree r, the standard BST maintains l < n < r. It is also possible to add a single “or equal to” to the inequality to allow duplicates, with either l < n <= r or l <= n < r. However, it is not possible to allow both sub-trees to use <=, as this makes the placement of duplicates ambiguous and leads to incorrect structuring. Mathematically this may be expressed as follows. For any node n, where L is the set of values in the left subtree of n and R is the set of values in the right subtree of n:

∀x ∈ L, x < n and ∀y ∈ R, y > n

For BSTs that allow duplicates (a policy decision), you can send duplicates to the right:

∀x ∈ L, x < n and ∀y ∈ R, y ≥ n

Or send duplicates to the left:

∀x ∈ L, x ≤ n and ∀y ∈ R, y > n

The BST ADT contains the following functions:

  • contains(v) Returns true if the tree contains v, or false if it does not
  • find_min() and find_max() Return the position p of the smallest and largest elements in the tree
  • insert(v) Inserts the value v into the correct (sorted) location, adjusting the tree as necessary to maintain order
  • remove(v) Removes a node with value v and adjusts the tree as necessary to maintain order
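A minimal sketch of this ADT in Rust follows, assuming i32 values and the strict l < n < r policy (duplicates are silently ignored). The type and function names here are illustrative, not a standard library API, and `remove` is omitted for brevity.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// insert(v): walk left for smaller values, right for larger values, and
// attach a new leaf at the first empty slot.
fn insert(node: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    match node {
        None => Some(Box::new(Node { value, left: None, right: None })),
        Some(mut n) => {
            if value < n.value {
                n.left = insert(n.left.take(), value);
            } else if value > n.value {
                n.right = insert(n.right.take(), value);
            }
            Some(n)
        }
    }
}

// contains(v): the same walk, but read-only.
fn contains(node: &Option<Box<Node>>, value: i32) -> bool {
    match node {
        None => false,
        Some(n) if value < n.value => contains(&n.left, value),
        Some(n) if value > n.value => contains(&n.right, value),
        Some(_) => true,
    }
}

// find_min(): follow left children until there are none.
fn find_min(node: &Option<Box<Node>>) -> Option<i32> {
    let n = node.as_ref()?;
    if n.left.is_some() {
        find_min(&n.left)
    } else {
        Some(n.value)
    }
}

fn main() {
    let mut root = None;
    for v in [23, 13, 45, 8, 17, 34] {
        root = insert(root, v);
    }
    assert!(contains(&root, 34));
    assert!(!contains(&root, 99));
    assert_eq!(find_min(&root), Some(8));
    println!("min = {:?}", find_min(&root));
}
```

Note that each operation follows a single root-to-leaf path, which is what bounds the cost by the height of the tree.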

This example illustrates two binary trees, but only the tree on the right is also a search tree. This is because, in the left tree, the value 34 is larger than 13, which violates the search tree property where all of the values in a node’s left sub-tree are smaller than (or equal to) the node’s value.

      23                 23
     /  \               /  \
   13    45           13    45
   / \                / \   /
  8   17             8  17 34
       \
        34

Balanced Search Trees

The structure of basic binary search trees depends on their insertion order. Consider the following scenarios involving different insertion orders of the same set of values.

let v = [23, 13, 39, 8, 17, 19, 31, 41, 43];

        23
       /  \
     13    39
    /  \   /  \
   8   17 31   41
        \        \
        19        43
let v = [31, 13, 23, 39, 41, 43, 8, 17, 19];

       31
      /  \
    13    39
    / \     \
   8   23    41
       /       \
     17         43
       \
        19

In the worst case, adding sorted values creates a BST that is no different from a linked list!

let v = [8, 13, 17, 19, 23, 31, 39, 41, 43];

  8
   \
    13
     \
      17
       \
        19
         \
          23
           \
            31
             \
              39
               \
                41
                 \
                  43

You can see that each scenario produces a tree of a different height. Similarly to the heap structure, the greatest benefit of using a search tree is the ability to reduce the number of comparisons when finding a particular node or position. Indeed, BSTs are also capable of O(1) find_min() operations (when the minimum is tracked, as noted in the analysis below) and O(log(n)) insert/remove operations, but they can find arbitrary keys in O(log(n)) time as well. This makes them preferable to heaps when implementing something like a sorted map.
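To make the height differences concrete, the following sketch (helper names are my own) inserts each of the three sequences above into a plain, unbalanced BST and measures the resulting height.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Plain BST insert: no rebalancing of any kind.
fn insert(node: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    match node {
        None => Some(Box::new(Node { value, left: None, right: None })),
        Some(mut n) => {
            if value < n.value {
                n.left = insert(n.left.take(), value);
            } else if value > n.value {
                n.right = insert(n.right.take(), value);
            }
            Some(n)
        }
    }
}

// Height as the greatest number of levels; an empty tree has height 0.
fn height(node: &Option<Box<Node>>) -> usize {
    node.as_ref()
        .map_or(0, |n| 1 + height(&n.left).max(height(&n.right)))
}

// Build a tree by inserting the values in the order given.
fn build(values: &[i32]) -> Option<Box<Node>> {
    values.iter().fold(None, |root, &v| insert(root, v))
}

fn main() {
    assert_eq!(height(&build(&[23, 13, 39, 8, 17, 19, 31, 41, 43])), 4);
    assert_eq!(height(&build(&[31, 13, 23, 39, 41, 43, 8, 17, 19])), 5);
    // sorted input degenerates into a linked list: height == n
    assert_eq!(height(&build(&[8, 13, 17, 19, 23, 31, 39, 41, 43])), 9);
    println!("heights: 4, 5, 9");
}
```

The same nine values produce heights of 4, 5, and 9 depending purely on insertion order, which is exactly the problem the balanced variants below solve.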

Restructures

Each of the balanced BST sub-types discussed in this section relies on a balance condition, together with operations that restore that condition and keep the height of the BST as close to O(log(n)) as is practical. These operations are known as restructures and consist of one or more rotations, depending on the balance condition and the state of the tree. Rotations essentially swap child and parent nodes and re-link children to minimize the sub-tree’s height.

Restructures can occur in four unique cases:

  1. Insertion at the left subtree of the left child of node a. (LL)
  2. Insertion at the right subtree of the left child of node a. (LR)
  3. Insertion at the left subtree of the right child of node a. (RL)
  4. Insertion at the right subtree of the right child of node a. (RR)

Keen eyes may notice that these four cases form two symmetrical pairs: a LL insertion is symmetrical to a RR insertion, and a LR insertion is symmetrical to a RL insertion. These pairs can be categorized as “outside” and “inside” operations. The splay tree, another BST covered on the next page, uses similar and potentially easier to imagine “zig-zig” and “zig-zag” terminology. The names come from a visualization of the insertion within a tree: LL and RR insertions occur on the outside of the visualized tree, while LR and RL insertions occur on the inside. It is important to note that not all insertions trigger balance conditions. For simplicity, consider the following illustrations of inside and outside insertions that do not trigger restructures. These diagrams illustrate the symmetry of the cases.

      8                                      8
     / \                                    / \
    7   10                                 7   10
   /                                            \
  4*  <- LL (outside) insertion                  12*  <- RR (outside) insertion

      8                                      8
     / \                                    / \
    6   10                                 6   10
        /                                    \
       9*  <- RL (inside) insertion           7*  <- LR (inside) insertion

For insertions that do violate balance conditions and trigger restructures, the restructure operation must cover all four insertion cases.

Single Rotations

You must rotate nodes to rebalance the tree after insertion. Rotations only affect the sub-tree along the insertion path. Conveniently, you can handle outside insertions with single rotations, and inside insertions with double rotations.

Consider inserting a new node 3 into the following (previously) balanced AVL tree. In this case, the new insertion represents a LL/outside insertion under node 4. The keen among you will notice that this causes not one but two nodes to violate the balanced height property. After inserting node 3, the nodes 7 and 8 both have left and right sub-trees that differ in height by more than one. It is customary to perform all rotations on the lowest possible node that violates the height property, which in this case is node 7. It can be said that the insertion triggers a right rotation at node 7. The pivot 4 is lifted into 7’s place, 7 is demoted to 4’s right child, and 4’s old right sub-tree (the values between 4 and 7, none in this example) is re-linked as 7’s new left sub-tree:

        8                 8
       / \               / \
      7   10     ->     4   10
     /                 / \
    4                 3   7
   /
  3*
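This re-linking can be sketched directly as a pointer swap on an `Option<Box<Node>>` layout (the names below are illustrative, not from any particular library): the pivot is the demoted node’s left child, and the pivot’s old right sub-tree becomes the demoted node’s new left sub-tree.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Small helper for building example trees.
fn leaf(value: i32) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left: None, right: None }))
}

// Right rotation: lift the left child (the pivot) one level and demote the
// old parent to the pivot's right child. The pivot's old right sub-tree
// (values between pivot and parent) moves over to the demoted parent.
fn rotate_right(mut node: Box<Node>) -> Box<Node> {
    let mut pivot = node.left.take().expect("right rotation needs a left child");
    node.left = pivot.right.take(); // in-between values change sides
    pivot.right = Some(node);       // old parent is demoted
    pivot
}

fn main() {
    // the unbalanced sub-tree rooted at 7 from the diagram: 7 -> 4 -> 3*
    let subtree = Box::new(Node {
        value: 7,
        left: Some(Box::new(Node { value: 4, left: leaf(3), right: None })),
        right: None,
    });
    let rotated = rotate_right(subtree);
    assert_eq!(rotated.value, 4);
    assert_eq!(rotated.left.as_ref().unwrap().value, 3);
    assert_eq!(rotated.right.as_ref().unwrap().value, 7);
    println!("new subtree root: {}", rotated.value);
}
```

A left rotation is the mirror image: take the right child as the pivot and demote the parent to its left.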

To illustrate single rotations, consider the case of adding elements [1..7] to an AVL tree in natural sequential order.

  1. The first rotation happens when you add 3, which causes node 1 to become unbalanced and results in a left rotation. It is customary to perform all rotations on the lowest (closest to the insertion) node that violates the height property.
  2. Adding 5 results in node 3 becoming unbalanced, which requires another left rotation.
  3. The next imbalance happens when adding 6, because it results in the root node 2 becoming unbalanced.
  4. Adding the final node 7 causes the last imbalance, at 5, resulting in the last leftward rotation.
  5. The final result is a perfectly balanced (and in this case complete) AVL tree.

1)       2)         3)           4)            5)
1         2          2            4             4
 \       / \        / \          / \           / \
  2     1   3      1   4        2   5         2   6
   \         \        / \      / \   \       / \ / \
   3*         4      3   5    1   3   6     1  3 5  7
               \          \            \
                5*         6*           7*

If you perform a similar operation on a slightly more complex tree, you can see the same mechanism in action. In this case the imbalance happens one level further up the tree, at the tree’s root. Notice that node 8, which lies between the pivot 7 and the old root 10, switches from being the right child of 7 to being the left child of 10:

        10                   7
       /  \                 / \
      7    13     ->      5    10
     / \                 /    /  \
    5   8               4    8    13
   /
  4*

Once more on an even more complex tree. This time, the example represents a RR/outside insertion. Finding the lowest node that violates the height balance property identifies node 39. The rotation swaps node 39 with its child in the direction of the imbalance, which is node 43. All nodes between those two switch from being left children to right children, or from right children to left children.

      23                          23
     /  \                        /  \
   17    39           ->       17    43
   / \   / \                   / \   / \
  9  19 26  43                9  19 39  50
           / \                   / \      \
         41   50               26   41     55
               \
                55*


All of these cases have the following things in common:

  1. Nodes always get inserted at leaf positions
  2. Insertions requiring rotations do not increase the overall height of the base tree
  3. Single rotations result in relinking no more than three nodes

Algorithm:

  1. Identify the lowest node whose balance factor has an absolute value of 2 or more.
  2. Rotate the node away from the heavy imbalance side.

Double Rotations

Double rotations are slightly more complex, but only just. In fact, double rotations simply perform single rotations twice. Fancy that!

Consider the following inside insertion caused by the sequence [2, 5, 3].

  1. The insertion of node 3 is identified as a RL insertion and causes an imbalance at node 2.
  2. The first rotation happens at node 5.
  3. The second rotation happens at node 2.
1)        2)         3)
2          2           3
 \          \         / \
  5    ->    3   ->  2   5
 /            \
3*             5
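The steps above can be sketched as two single rotations run back to back (the helper names `rotate_left`/`rotate_right` and the node layout are my own, not a standard API): right-rotate the child 5, then left-rotate the unbalanced node 2.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn leaf(value: i32) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left: None, right: None }))
}

// Single left rotation: lift the right child over the node.
fn rotate_left(mut node: Box<Node>) -> Box<Node> {
    let mut pivot = node.right.take().expect("left rotation needs a right child");
    node.right = pivot.left.take();
    pivot.left = Some(node);
    pivot
}

// Single right rotation: lift the left child over the node.
fn rotate_right(mut node: Box<Node>) -> Box<Node> {
    let mut pivot = node.left.take().expect("right rotation needs a left child");
    node.left = pivot.right.take();
    pivot.right = Some(node);
    pivot
}

fn main() {
    // state 1): 2 with right child 5, which has left child 3*
    let mut root = Box::new(Node {
        value: 2,
        left: None,
        right: Some(Box::new(Node { value: 5, left: leaf(3), right: None })),
    });
    // step 2): first rotation, to the right, at the child node 5
    root.right = Some(rotate_right(root.right.take().unwrap()));
    // step 3): second rotation, to the left, at the unbalanced node 2
    let root = rotate_left(root);
    assert_eq!(root.value, 3);
    assert_eq!(root.left.as_ref().unwrap().value, 2);
    assert_eq!(root.right.as_ref().unwrap().value, 5);
    println!("after double rotation, root = {}", root.value);
}
```

The first rotation turns the inside (RL) shape into an outside (RR) shape; the second is then an ordinary single rotation.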

Now look at a slightly more complex tree.

  1. The insertion of node 5 causes an imbalance at node 8 (an inside, LR insertion through 8’s left child 3).
  2. The first rotation happens at node 3, lifting 6; the second rotation happens at node 8, lifting 6 again to the root of the sub-tree.
          8                          6
         / \                        / \
        3   10                     3   8
       / \  / \         ->        / \ / \
      1   6 9  12                1  4 7  10
     / \ / \                    / \  \   / \
    0  2 4  7                  0  2   5 9  12
         \
          5*

Restructure Operations

It’s not too hard to combine single and double rotations into a single restructure() operation. Algorithmically, this presents as a “search-and-repair” strategy where you check the height difference between the left and right sub-trees at each node from the insertion point up to the root.

Balancing Implementation

Balancing is commonly implemented through algorithmic hooks, which represent specific places in the code where external code can “hook into” the algorithm. Hooks look like functions or interfaces and allow you to drop in any number of implementations, including implementations that do nothing at all! Using hooks for rebalance operations allows you to design a basic BST and convert it to different self-balancing sub-types without totally reimplementing the base BST. This page covers two of the most popular self-balancing BSTs: AVL trees and red-black trees.
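One way to express such a hook in Rust is a trait the base insert delegates to (a sketch under my own names, not an established API): a do-nothing implementation yields a plain BST, while an AVL implementation would perform rotations in the same slot.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// The hook: called on every node along the insertion path, bottom-up.
trait RebalanceHook {
    fn rebalance(node: Box<Node>) -> Box<Node>;
}

// A hook that does nothing at all: the tree stays a plain BST.
struct NoOp;
impl RebalanceHook for NoOp {
    fn rebalance(node: Box<Node>) -> Box<Node> {
        node
    }
}

// The base BST insert, generic over whichever hook is plugged in.
fn insert<H: RebalanceHook>(node: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    match node {
        None => Some(Box::new(Node { value, left: None, right: None })),
        Some(mut n) => {
            if value < n.value {
                n.left = insert::<H>(n.left.take(), value);
            } else if value > n.value {
                n.right = insert::<H>(n.right.take(), value);
            }
            Some(H::rebalance(n))
        }
    }
}

fn height(node: &Option<Box<Node>>) -> usize {
    node.as_ref()
        .map_or(0, |n| 1 + height(&n.left).max(height(&n.right)))
}

fn main() {
    let mut root = None;
    for v in [8, 13, 17, 19, 23] {
        root = insert::<NoOp>(root, v);
    }
    // with the do-nothing hook, sorted input degenerates into a right spine
    assert_eq!(height(&root), 5);
    println!("height with NoOp hook: {}", height(&root));
}
```

Swapping `NoOp` for a rotating implementation converts the same base tree into an AVL tree without touching the insert logic.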

AVL Trees

An Adelson-Velsky and Landis (AVL) tree is a binary search tree where the heights of each node’s two sub-trees differ by no more than one. This property holds recursively for every node in the tree. Keep in mind that a tree’s height represents its greatest number of levels. The following example is an AVL tree because the height of the root’s left child is four and the height of the root’s right child is three, a difference of one. The property also holds recursively for all nodes (sub-trees) in the greater structure.

Node 23: L: 4  R: 3              23
                                /  \
Node 13: L: 3  R: 2           13    39
                             /  \   /  \
Node 8:  L: 2  R: 1         8   17 31   43
                           / \  /         \
Node 3:  L: 1  R: 0       3  10 15        56
                         /
Node 1:  L: 0  R: 0     1
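Checking the AVL property is itself a short recursive function. The following sketch (helper names are mine) declares a tree valid only if every node’s sub-tree heights differ by at most one.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Plain BST insert, with no rebalancing, used here only to build examples.
fn insert(node: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    match node {
        None => Some(Box::new(Node { value, left: None, right: None })),
        Some(mut n) => {
            if value < n.value {
                n.left = insert(n.left.take(), value);
            } else if value > n.value {
                n.right = insert(n.right.take(), value);
            }
            Some(n)
        }
    }
}

fn height(node: &Option<Box<Node>>) -> i64 {
    node.as_ref()
        .map_or(0, |n| 1 + height(&n.left).max(height(&n.right)))
}

// The AVL property: at every node, |height(left) - height(right)| <= 1.
fn is_avl(node: &Option<Box<Node>>) -> bool {
    match node {
        None => true,
        Some(n) => {
            (height(&n.left) - height(&n.right)).abs() <= 1
                && is_avl(&n.left)
                && is_avl(&n.right)
        }
    }
}

fn main() {
    // sorted input produces a right spine: a valid BST, but not an AVL tree
    let mut spine = None;
    for v in [1, 2, 3, 4] {
        spine = insert(spine, v);
    }
    assert!(!is_avl(&spine));

    // the same values inserted in a balanced order satisfy the property
    let mut balanced = None;
    for v in [2, 1, 3, 4] {
        balanced = insert(balanced, v);
    }
    assert!(is_avl(&balanced));
    println!("AVL checks passed");
}
```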

Constructing AVL Trees

Take the sequence presented in the previous section on balanced trees.

let v = [31, 13, 23, 39, 41, 43, 8, 17, 19];

You can insert elements, starting with 31, until the tree violates the balanced height property. This happens after inserting the third element, 23. At this point, the sub-trees of root 31 violate the balanced tree property because the height of the root’s left child is 2 and the height of the root’s right child is 0. The algorithm trips a balance hook, which performs a double rotation (an LR insertion: 23 lands in the right sub-tree of the left child 13).

  31                 23
  /                 /  \
 13        ->     13    31
  \
   23

The insert operation adds 39, then trips the balance hook again when adding 41, which performs a single rotation at node 31:

   23                      23
  /  \                    /  \
 13    31        ->     13    39
        \                     / \
         39                 31   41
          \
           41*

The insert operation then adds 43, which trips the balance hook once more. The imbalance is now at the root (23’s right sub-tree has height 3 while its left has height 1), and a single left rotation lifts 39 to the root:

   23                            39
  /  \                          /  \
 13    39            ->       23    41
       /  \                  /  \     \
     31    41              13    31    43*
             \
              43*

The insert operation adds 8 and 17 without incident, but inserting 19 causes one final (LR) imbalance at node 23, and the resulting double rotation lifts 17:

         39                            39
        /  \                          /  \
      23    41                      17    41
     /  \     \          ->        /  \     \
   13    31    43                13    23    43
   / \                           /    /  \
  8   17                        8   19    31
       \
        19*
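The whole construction can be checked end to end with a sketch that pairs the plain BST insert with a height-based rebalancing step on every node along the insertion path (names are mine; real AVL nodes usually cache heights rather than recompute them). A careful bottom-up trace of this sequence fires rotations while inserting 23, 41, 43, and 19, leaving 39 at the root of a height-4 tree.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn height(node: &Option<Box<Node>>) -> i64 {
    node.as_ref()
        .map_or(0, |n| 1 + height(&n.left).max(height(&n.right)))
}

fn rotate_left(mut node: Box<Node>) -> Box<Node> {
    let mut pivot = node.right.take().expect("left rotation needs a right child");
    node.right = pivot.left.take();
    pivot.left = Some(node);
    pivot
}

fn rotate_right(mut node: Box<Node>) -> Box<Node> {
    let mut pivot = node.left.take().expect("right rotation needs a left child");
    node.left = pivot.right.take();
    pivot.right = Some(node);
    pivot
}

// The balance hook: repair a node whose sub-tree heights differ by 2+.
fn rebalance(mut node: Box<Node>) -> Box<Node> {
    let balance = height(&node.left) - height(&node.right);
    if balance > 1 {
        let l = node.left.take().unwrap();
        // inside (LR) shape: rotate the child left first
        node.left = Some(if height(&l.left) < height(&l.right) { rotate_left(l) } else { l });
        rotate_right(node)
    } else if balance < -1 {
        let r = node.right.take().unwrap();
        // inside (RL) shape: rotate the child right first
        node.right = Some(if height(&r.right) < height(&r.left) { rotate_right(r) } else { r });
        rotate_left(node)
    } else {
        node
    }
}

// BST insert that trips the hook on every node along the path, bottom-up.
fn insert(node: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    match node {
        None => Some(Box::new(Node { value, left: None, right: None })),
        Some(mut n) => {
            if value < n.value {
                n.left = insert(n.left.take(), value);
            } else if value > n.value {
                n.right = insert(n.right.take(), value);
            }
            Some(rebalance(n))
        }
    }
}

fn main() {
    let mut root = None;
    for v in [31, 13, 23, 39, 41, 43, 8, 17, 19] {
        root = insert(root, v);
    }
    assert_eq!(height(&root), 4);
    let root = root.unwrap();
    assert_eq!(root.value, 39);
    println!("root = {}, height = 4", root.value);
}
```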

Red-Black Trees

Analysis

This table assumes all binary search trees are properly balanced implementations, as the asymptotics for BSTs without balancing can quickly become skewed towards simple linked lists.

Operation       BST         Heap
Insert          log(n)      log(n)*
Insert (avg.)   log(n)      1**
Find any        log(n)      n
Find min/max    1***        1
Create          n log(n)    n
Delete          log(n)      log(n)

* Individual heap insert operations are at worst O(log(n)) for linked structures, but can be as bad as O(n) for array-based heaps when an insertion triggers a re-size, even though insertions amortize to O(log(n)).

** Insertions are really O(log(n)), but amortize to constant time in practice.

*** It is trivial to modify a BST to keep track of the min/max element; otherwise, finding min/max elements is O(log(n)). You update the tracked min/max whenever an operation changes that element: on insertion of a new extreme, swap the tracked value; on removal of the extreme, find the next-smallest (or next-largest) element.

Advantages of binary heap over a BST

  • Insertions into binary heaps amortize to O(1) time but remain O(log(n)) for BSTs.

  • There are also other heaps which reach O(1) amortized insertions, like the Fibonacci Heap, and even worst case, like the Brodal queue, although they may not be practical because of non-asymptotic performance.

  • Binary heaps can be efficiently implemented on top of either dynamic arrays or pointer-based trees, but BSTs can only be efficiently implemented with pointer-based trees. So for the heap it’s possible to choose the more space- (and cache-) efficient array implementation, if you can afford the occasional O(n) re-size operation.

  • Binary heap creation is O(n) worst case, O(n * log(n)) for BSTs.

Advantage of BST over binary heap

  • Search for arbitrary elements is O(log(n)) in BSTs, but O(n) in binary heaps that can only return the min/max in O(1) time.