Searching, Sorting, & Selection
// - An introduction to comparative ordering (sorting)
// - Practice manipulating list structures
// - Introduction to search algorithms
// - Introduction to order statistics and Kth element queries
// - Introduction to range queries
// - The skip list ADT
A large portion of this chapter deals with sorted structures. Sorted structures rely on some way to meaningfully compare elements. This page defines some terms regarding those ordering schemes.
Comparative Ordering
// - structural vs lexicographical vs comparative ordering
// - Partial order vs total order
// - Sorting collections vs sorted structures, e.g. ephemeral
//   vs persistent comparative ordering
// - Sorting imposes a total order on a finite sequence
Comparative ordering includes schemes such as numeric or lexicographic order. While structural ordering dictates how the data is physically or logically traversed, comparative ordering governs the correctness of operations like search, insertion, and range queries. Distinguishing the two allows us to reason about efficiency, correctness, and the design of both mutable and persistent data structures independently of their physical layout.
Ordering schemes are an important part of the logic of data structures and algorithms. For example, how do you compare String types? Should strings be ordered by length, or lexicographically?
Ordering schemes can be classified by the properties they exhibit. The main properties are as follows:
- Transitivity: If k1 <= k2 && k2 <= k3, then k1 <= k3. In a “strict ordering” this applies to the < operator, such that if k1 < k2 && k2 < k3, then k1 < k3. Transitivity of indifference extends comparison to elements that are only “roughly equivalent”
- Reflexivity: k <= k. In contrast, irreflexivity is when k !< k, and applies mostly to “strict ordering”
- Antisymmetry: If k1 <= k2 and k2 <= k1, then k1 == k2. This ensures that no two distinct elements can relate in both directions unless they are the same
- Asymmetry: A strict form of antisymmetry such that if k1 < k2, then k2 !< k1, mostly used in “strict ordering” to enforce a one-way relationship
- Comparability: Every pair of elements must be directly comparable (k1 <= k2 || k2 <= k1); this is the primary way to distinguish a “partial ordering” from a “total ordering”
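To make these properties concrete, here is a minimal Python sketch (the helper name `check_order` is my own) that brute-force checks all four properties of a comparator over a small finite domain:

```python
from itertools import product

def check_order(leq, domain):
    """Brute-force check the four ordering properties of `leq` over `domain`."""
    domain = list(domain)
    reflexive = all(leq(k, k) for k in domain)
    antisymmetric = all(not (leq(a, b) and leq(b, a)) or a == b
                        for a, b in product(domain, repeat=2))
    transitive = all(not (leq(a, b) and leq(b, c)) or leq(a, c)
                     for a, b, c in product(domain, repeat=3))
    comparable = all(leq(a, b) or leq(b, a)
                     for a, b in product(domain, repeat=2))
    return {"transitive": transitive, "reflexive": reflexive,
            "antisymmetric": antisymmetric, "comparability": comparable}

# <= on integers satisfies all four properties: a total order.
print(check_order(lambda a, b: a <= b, range(5)))

# Ordering strings by length alone stays transitive, reflexive, and
# comparable, but "ab" <= "cd" and "cd" <= "ab" while "ab" != "cd",
# so antisymmetry fails.
print(check_order(lambda a, b: len(a) <= len(b), ["a", "ab", "cd"]))
```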
You can combine these properties (and their variants) into the following major ordering type classifications:
| Type | Transitive | Reflexive | Antisymmetric | Comparability |
|---|---|---|---|---|
| Total Order | X | X | X | X |
| Partial Order | X | X | X | |
| Strict Order | X | (Irreflexive) | X (Asymmetric) | |
| Weak Order | X | X | | X |
| Quasi-order | X | X | | |
NOTE: Quasi-order is also known as “preorder” in some camps.
It is also worth distinguishing two further classifications: atomic and natural ordering.
Atomic (or classical alphanumeric) ordering is an element-wise ordering based on individual comparisons within each unit. For example, the number (unit) 69 is atomically ordered because 6 < 9. The number 420 is also atomically ordered because its elements follow a strict ordering: 4 > 2 > 0. The number 187 is not atomically ordered because 1 < 8 but 8 > 7, which breaks the monotonic ordering.

You can extend this logic to sorting as well. Consider the following sequence of units (integers), separated by spaces, which illustrates an atomic sorting of atomically ordered units:

1 10 11 2 23 5

This may be counter-intuitive for humans: how is 11 < 2? Atomic ordering does not consider the length of a unit, just the elements of the unit themselves. The sort compares units element by element, so think of 11 as [1, 1] and 2 as [2, _]. Because 1 < 2, 11 comes before 2 in an atomic sort. Similarly, 23 comes before 5 because 2 < 5.
By contrast, natural ordering compares each unit as a whole, in a way that is more “natural” for humans to grok. The same sequence in natural (sorted) order appears as:

1 2 5 10 11 23

Natural ordering extends to alphanumeric sequences and may include traditional numeral systems like hexadecimal, as well as arbitrary alphanumeric combinations like “z3”, where “1222” < “z3” in ASCII. Of course, these ordering schemes depend on the underlying encoding of the alpha characters.
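The contrast between the two orderings falls out directly in Python: sorting integers through their string representations gives the atomic (element-wise) order, while sorting the integers themselves gives the natural order.

```python
units = [11, 23, 5, 2, 10, 1]

# Atomic order: compare units element by element, like strings of digits.
atomic = sorted(units, key=str)

# Natural order: compare each unit as a whole number.
natural = sorted(units)

print(atomic)   # [1, 10, 11, 2, 23, 5]
print(natural)  # [1, 2, 5, 10, 11, 23]
```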
You can apply this logic to two additional common ordering schemes: lexicographical and dictionary ordering. Both compare words in atomic order, where each letter is an individual unit/value. The biggest difference is how each accounts for capital letters. In ASCII the capital letter P is 80, and the lower-case p is 112. Because dictionary ordering treats P and p as equivalent even though P != p, distinct strings can compare as equal; it is therefore not antisymmetric, and forms only a weak order. Lexicographical ordering, by contrast, is a case-sensitive total order extension of the alphabetic ordering in Unicode.

Using this logic you can compare Strings. For example, the String “Peter” translates to 80 101 116 101 114, “Paul” translates to 80 97 117 108, and “Abby” translates to 65 98 98 121 (with capital letters) in ASCII. Individually, none of the strings are atomically ordered: in Peter and Paul, 8 > 0 but 0 !> 1, and in Abby, 6 > 5 but 5 !> 9. However, it's easy to see that both a lexical (alphabetical, in this case) and a dictionary ordering place the three names (units) in the following sequence:

Abby Paul Peter
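The case-sensitivity difference only shows up when capitals and lower-case letters are mixed. A small sketch, using words chosen so the two orderings disagree (Python's default string comparison is the code-point-based lexicographic order; folding case with a key function approximates dictionary order):

```python
words = ["apple", "Banana", "cherry"]

# Lexicographic (case-sensitive): raw code points, so 'B' (66) sorts
# before 'a' (97) and 'c' (99).
print(sorted(words))                 # ['Banana', 'apple', 'cherry']

# Dictionary (case-insensitive): fold case before comparing. Note that
# distinct strings like "Paul" and "paul" would compare as equal here,
# which is why this scheme is not antisymmetric.
print(sorted(words, key=str.lower))  # ['apple', 'Banana', 'cherry']
```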
Sorting
// - Producing totally ordered collections, e.g. transforming
//   a sequence into a totally ordered sequence
// - Sorting a list vs maintaining a sorted invariant
// - Comparison-based list sorting has a lower bound of Ω(n * log(n))
// - Maintaining a sorted invariant using balanced tree structures requires
//   O(n log n) total build time, but supports O(log n) search,
//   predecessor/successor, and O(log n + k) range queries
// - Heaps can be built in O(n) time using Floyd’s algorithm, but heapsort
//   requires O(n log n) time due to repeated extractions
// - Unlike balanced trees, heaps do not support efficient range queries or
//   ordered traversal
// - This section covers two very rudimentary sorting algorithms
//   that are efficient for small sets; more advanced
//   algorithms are covered in the algorithm reference section
// - The more advanced algorithms cover the divide-and-conquer
//   technique, which is a good general-purpose technique
// - Stability: stable vs unstable sorting algorithms
// - In-place vs auxiliary space requirements
Insertion Sort
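As a sketch of the idea (Python, sorting a list in place; the function name is mine): insertion sort grows a sorted prefix one element at a time, shifting larger prefix elements right to make room for each new key. It is stable, in-place, O(n²) in the worst case, and close to O(n) on nearly sorted input.

```python
def insertion_sort(items):
    """Sort `items` in place by growing a sorted prefix one element at a time."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # Shift prefix elements greater than `key` one slot to the right.
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items
```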
Selection Sort
// - Simple and effective, but only on small sets
// - The algorithm reference module provides more practical
//   (and advanced) algorithms for sorting sequential structures
// - All comparison-based sorting algs are Ω(n * log(n));
//   by contrast, radix and bucket sort impose domain assumptions to run
//   in O(n + k) time
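A corresponding sketch of selection sort (Python, in place; the function name is mine): it repeatedly selects the minimum of the unsorted suffix and swaps it into place. It always performs Θ(n²) comparisons but at most n − 1 swaps, and the long-range swaps make it unstable as written.

```python
def selection_sort(items):
    """Sort `items` in place by repeatedly selecting the minimum
    of the unsorted suffix and swapping it to the front."""
    n = len(items)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if items[j] < items[min_idx]:
                min_idx = j
        items[i], items[min_idx] = items[min_idx], items[i]
    return items
```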
Searching
// - Exploiting totally ordered collections
// - Given a key k, determine if it exists in a collection,
//   and if so, return it
// - For unsorted collections you must either perform linear O(n)
//   scans or use key-based lookups
// - There are a handful of search algorithms, but the most
//   practical way to search a sorted collection is binary search
Binary Search
// - Necessarily performed on a sorted collection with
//   random access (like indexing or numerical positioning)
// - Used to find the lower bound (first element >= target) and upper
//   bound (first element > target) in O(log(n)) time
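The two bound queries above can be sketched as follows (Python over a sorted list; these mirror the semantics of the standard library's `bisect.bisect_left` and `bisect.bisect_right`). Each halves the search interval per step, giving O(log n) comparisons:

```python
def lower_bound(a, target):
    """Index of the first element >= target in sorted list `a`
    (len(a) if no such element exists)."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def upper_bound(a, target):
    """Index of the first element > target in sorted list `a`."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo

a = [1, 2, 2, 2, 5, 7]
print(lower_bound(a, 2), upper_bound(a, 2))  # 1 4
```

Note that membership, predecessor/successor, and range counting all reduce to these two bounds: `target` exists iff `lower_bound < upper_bound`, and `upper_bound - lower_bound` counts its occurrences.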
Selection
// - Rank-based queries in unsorted collections, e.g. Kth elements
// - What are order statistics? The element of rank k in a
//   totally ordered collection
// - What is selection? The algorithmic process of computing the
//   Kth order statistic in an unsorted collection
// - There are a handful of ways to perform selection, the easiest
//   being a simple linear scan of a sorted collection, but the
//   most common and practical way in unsorted collections is
//   the (randomized) quickselect algorithm
// - Sorting solves selection, but selection does not
//   require sorting
Quickselect
// - Used to find the kth smallest/largest element in an unsorted array
//   in expected O(n) time
// - Used for efficient partial order statistics
// - By comparison, kth values in sorted arrays are O(1)
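A minimal sketch of randomized quickselect (Python; this version builds auxiliary partition lists rather than partitioning in place, trading memory for clarity, and uses a 0-indexed rank). The random pivot gives expected O(n) time; an adversarial worst case is O(n²):

```python
import random

def quickselect(items, k):
    """Return the kth smallest element (0-indexed) of an unsorted sequence."""
    pivot = random.choice(items)
    # Three-way partition around the pivot; duplicates land in `eq`.
    lt = [x for x in items if x < pivot]
    eq = [x for x in items if x == pivot]
    gt = [x for x in items if x > pivot]
    if k < len(lt):
        return quickselect(lt, k)            # rank k lies left of the pivot
    if k < len(lt) + len(eq):
        return pivot                         # rank k is the pivot itself
    return quickselect(gt, k - len(lt) - len(eq))

print(quickselect([9, 1, 8, 2, 7, 3], 2))   # 3 (the 3rd smallest)
```

Only one side of the partition is recursed into, which is why the expected work forms a geometric series summing to O(n) rather than the O(n log n) of a full sort.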
The Skip List ADT
// - Introduction to the first sorted structure of the module
// - Probabilistic vs deterministic sorted structures
// - Design patterns??? (towers, layers, etc.)
// - Easy Kth element method and predecessor/successor search
// - Range query method with index semantics
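The “tower” design can be previewed with the level-selection routine run at each insertion. A hedged sketch, assuming the standard coin-flip scheme with promotion probability p = 1/2 and a capped maximum level (both values are conventional choices, not fixed by the ADT):

```python
import random

def random_level(p=0.5, max_level=16):
    """Choose a tower height for a new skip-list node.

    Each additional level is kept with probability p, so a node reaches
    level L with probability p**(L - 1). This geometric distribution is
    what gives skip lists their expected O(log n) search paths.
    """
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level
```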