
Lists

Sequential structures are so fundamental to modern programming that it can be difficult to choose where to start. Some of the earliest higher-level programming languages to support dynamic data structures, such as Information Processing Language (IPL), were effectively structured assembly languages that imposed list semantics on raw memory. These languages exposed machine-level storage addresses, links, and storage management directly, making dynamic memory layout an explicit part of the programming model. Fortran-era programming practices similarly relied on explicit memory and indexing to simulate dynamic structures. Lisp, whose name is a contraction of "list processing" and whose primary data structure was the list, later abstracted these ideas into a much richer, higher-level model complete with automatic memory management. Over decades, these features evolved into a powerful, flexible model for symbolic computation, macros, and functional programming that persists in modern languages and approaches.

Even today, computer memory is fundamentally a linear address space, represented by a sequence of addressable locations. At the lowest level, all data storage and access reduces to allocating ranges of address space and interpreting relationships between addresses. There is no intrinsic notion of objects or data structures in memory, only locations and references.

Lists provide the primary conceptual bridge between raw or virtualized memory address ranges and structured data. Lists model how collections of values are laid out, traversed, and related within a linear address space. Whether explicit or implicit, list-like organization underlies nearly all practical data structure implementations.

The List ADT

A list can be defined as a collection of elements (or nodes) in which each element (besides the first and last) has a single predecessor and a single successor, which you can follow sequentially from start to finish. In this way, the list is fundamentally ordered.

In the context of data structures, ordering can refer to two related but distinct concepts. First, there is structural ordering or layout, which describes how nodes or elements are arranged in memory. Second, there is comparative ordering of values, which is the logical ordering imposed on stored data according to some comparator. By this disambiguation, a logically ordered collection is generally synonymous with a sorted collection. Generally speaking, the ability to sort an ADT is not universally defined and represents either a specialized subtype or an intrinsic feature of its API. These two types of ordering exist independently: all data structures exhibit some structural order, but only some implement logical ordering as a sorted collection. Sorted collections are introduced with heaps in the hierarchical structures chapter.

The basic list ADT may include the following (non-exhaustive) set of operations:

  • add(e) / push(e) Adds an element e to the list (typically as an append operation)
  • add(i, e) / insert(i, e) Adds an element e to the list at the ith position/index
  • set(i, e) Mutates an element e at the ith position/index
  • get(i) / peek(i) Returns the value of or a reference to the data at the ith position/index
  • remove(i) Removes an element at the ith position/index
  • size() / length() Returns the number of elements currently in the list
  • capacity() Returns the total number of elements the list can contain
  • is_empty() Returns a Boolean indicating whether the list is empty
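The operations above might be sketched as a Rust trait; the trait name, signatures, and the Vec-backed implementation are illustrative assumptions, not a standard API.

```rust
/// A minimal sketch of the basic list ADT operations listed above.
/// Names and signatures are illustrative, not a standard library API.
pub trait List<E> {
    fn push(&mut self, e: E);              // add(e): append to the end
    fn insert(&mut self, i: usize, e: E);  // add(i, e): insert at index i
    fn set(&mut self, i: usize, e: E);     // set(i, e): overwrite index i
    fn get(&self, i: usize) -> Option<&E>; // get(i): read index i
    fn remove(&mut self, i: usize) -> Option<E>;
    fn size(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.size() == 0
    }
}

/// A trivial Vec-backed implementation to show the trait in use.
pub struct VecList<E>(Vec<E>);

impl<E> VecList<E> {
    pub fn new() -> Self {
        VecList(Vec::new())
    }
}

impl<E> List<E> for VecList<E> {
    fn push(&mut self, e: E) { self.0.push(e); }
    fn insert(&mut self, i: usize, e: E) { self.0.insert(i, e); }
    fn set(&mut self, i: usize, e: E) { self.0[i] = e; }
    fn get(&self, i: usize) -> Option<&E> { self.0.get(i) }
    fn remove(&mut self, i: usize) -> Option<E> {
        if i < self.0.len() { Some(self.0.remove(i)) } else { None }
    }
    fn size(&self) -> usize { self.0.len() }
}
```

Note that capacity() is omitted here because a Vec-backed list hides its capacity management; it reappears with the static array discussion below.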

Notice that the list ADT does not specify any sorted properties or operations. Creating and maintaining sorted structures is covered in more depth in later pages. Presently there are far more fundamental matters to discuss.

List Representations

At the representation level, designers realize list ordering in two primary ways:

  • Contiguous memory lists, where elements occupy adjacent addresses
  • Linked lists, where ordering is expressed through explicit references to arbitrary addresses

These two strategies, and the variations built around them, form the substrate for most higher-level data structures. Indeed all of the data structure representations in the rest of this module are backed by one of these two representation strategies.

Contiguous Storage

Contiguous memory structures encompass a range of designs, from minimal, arena-based allocation strategies navigable via manual pointer arithmetic, to more polished designs that include integer-based indexing or dynamic resizing logic. The common thread across all contiguous storage designs is that the structures occupy a continuous range of memory addresses.

Most contiguous memory lists also assume storage of a single data type, which evenly spaces elements in memory for simpler navigation and management. Additionally, most high-level contiguous memory lists provide indexing semantics over those uniformly sized types. This layout enables simplified O(1) access to arbitrary elements in the list because the memory offset of any element can be computed directly via pointer arithmetic. For example, to access the 5th element in a list of homogeneously sized types, you start at the memory address corresponding to the beginning of the structure and advance a pointer by 5 * size_of(element).
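That address arithmetic can be checked directly in code. This is a sketch assuming a homogeneous array; element_address is a hypothetical helper, not a standard API.

```rust
// Demonstrates that element i of a contiguous, homogeneously typed
// array lives at base_address + i * size_of::<T>().
// Illustrative only -- real code should use safe indexing.
fn element_address<T>(slice: &[T], i: usize) -> usize {
    slice.as_ptr() as usize + i * std::mem::size_of::<T>()
}
```

For a [u64; 8] array, the computed address of element 5 is exactly the base address plus 5 * 8 bytes, which matches the address of `&a[5]`.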

Containers with APIs that expose access via integer indices are called indexed structures. In fact, this convention is so common that the term “indexed structure” is often used synonymously with a contiguous memory structure.

Pointer arithmetic over contiguous memory is extremely efficient. Coupled with the contiguous region’s locality of reference—which results in more cache hits compared to linked structures with arbitrary pointer locations—this typically produces superior performance for iteration and random access.

The primary downside of contiguous structures is that adding or removing elements anywhere except at the end of the list requires O(n) data movement. For instance, inserting a book in the middle of a tightly packed bookshelf requires shifting all books to the right one space. In memory, while there may not be a literal “physical move,” elements must still be copied to make room for the insert.
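The O(n) data movement can be made concrete with a manual mid-buffer insert. This is a sketch assuming a fixed buffer with spare capacity; insert_with_shift is a hypothetical helper, not a standard API.

```rust
// Manual mid-array insert over a fixed buffer with spare capacity:
// shift elements [i..size) right by one slot, then write slot i.
// Returns the new size. Illustrative only; element type fixed to i32.
fn insert_with_shift(buf: &mut [i32], size: usize, i: usize, e: i32) -> usize {
    assert!(size < buf.len() && i <= size);
    buf.copy_within(i..size, i + 1); // the O(n) data movement
    buf[i] = e;
    size + 1
}
```

Inserting 3 at index 2 of a buffer holding [1, 2, 4, 5] shifts 4 and 5 rightward before writing the new element, yielding [1, 2, 3, 4, 5].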

Some designs attempt to reduce these linear shifts by relaxing the dense-indexing invariant. Techniques such as tracking slots via a free list or using sentinel values transform the structure into a sparse or arena-like representation, which introduces different iteration and access tradeoffs. These designs are uncommon and are not covered in this module.

Static Arrays

The first contiguous storage list representation most programmers come across is the humble array. Arrays (or array-like structures) are so fundamental to modern programming languages that they are often included as built-in language constructs with dedicated syntax. Specifically, array types in many languages borrow the (square) "bracketed indexing" convention of linear algebra and numerical analysis that matured in the mid-20th century. The actual definition of arrays can vary, but it's common to define classical (static) array structures as fixed-capacity allocations designed to store collections of homogeneously-typed elements that occupy contiguous regions of memory.

The range of possible index values in an array represents the number of elements you can fit within the allocated space, commonly referred to as the array's capacity. As you add elements to an empty array you change its size or length, but its capacity remains static. Most array-like structures are 0-indexed, meaning that an array with a capacity of n elements has a range of indexes corresponding to 0..n-1. For example, an array with three elements contains indexes 0, 1, and 2. Arrays can also be 1-indexed, but these are less common among modern language constructs. It's generally considered an error to access empty indexes within the structure's capacity. For example, if you instantiate an array a with a capacity of 5 and insert 2 elements, attempting to access index a[4] represents an illegal out-of-bounds operation.
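Rust's fixed arrays are always fully initialized, so this sketch uses a Vec with a reserved capacity to illustrate the size-versus-capacity distinction described above; the function name is illustrative.

```rust
// Illustrates size vs. capacity: a capacity-5 allocation holding only
// 2 elements. Accessing index 4 is within capacity but beyond size,
// so checked access reports it as out of bounds.
fn size_vs_capacity() -> (usize, usize, bool) {
    let mut a: Vec<i32> = Vec::with_capacity(5);
    a.push(10);
    a.push(20);
    // a[4] would panic here: the slot exists in the allocation,
    // but it holds no element.
    (a.len(), a.capacity(), a.get(4).is_none())
}
```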

The following code examples illustrate the similarities between built-in array syntax across languages. It's worth noting that the performance benefits of arrays primarily extend to arrays of tightly packed, statically-sized types. Array-like structures that contain dynamically-sized types may introduce pointer indirection, such that not all of the structure's data is stored contiguously. Python's list type, for example, uses array syntax with square brackets but actually stores duck-typed objects, or more accurately pointers to objects, which may live anywhere on the heap. Thus list = [1, "hello", 3.14, object()] is a valid assignment despite containing a grab-bag of types. Rust's [T; N] array stores primitives and other fixed-size types contiguously, but types like String place only a fixed-size header in the array while their underlying data lives on the heap.

The examples below fill an array with dynamically sized, heterogeneous types to demonstrate the potential pointer indirection. Notice that Java, C, and Rust make this explicit in their use of "objects".

Python
# List of object references (heterogeneous, heap-allocated)
# because Python's list type only stores object references
array = ["hello", 3, True]
Java
// Creates an array of Object types that each hold
// (point to) different sub-types
Object[] array = new Object[] {"hello", 3, true};
C
// Creates the array's members
const char *s = "hello";
int i = 3;
_Bool b = 1;
// Instantiates an array and fills it with pointers to the
// previously-created "objects"
void *array[3];
array[0] = (void *)s;
array[1] = &i;
array[2] = &b;
Rust
// Creates an enum called Object to wrap each primitive type
enum Object {
    Str(&'static str),
    Int(i32),
    Bool(bool),
}
// Creates an array of tagged Object values; only the Str variant
// holds a reference to data stored elsewhere
let array = [
    Object::Str("hello"),
    Object::Int(3),
    Object::Bool(true),
];

Rust imposes strict bounds on its array type [T; N] such that T: Sized and the array length N must be a compile-time constant (e.g., no runtime variables). Additionally, the expression used to initialize the array must be const-evaluable (or Copy) for the standard [expr; N] syntax. These constraints make Rust’s array type a truly fixed-size, classical array. The design rationale is rooted in the guarantees this provides: predictable stack layout without unintended indirection or hidden heap allocation, and the ability to use arrays as generic parameters without introducing runtime overhead. This enables zero-cost abstraction, giving the compiler full information for optimizations and performance guarantees.

Rust
// NOTE: Rust requires compile-time const-evaluable expressions for
// array initialization
#[derive(Default)] // Required for generic array initialization
pub struct Podium {
    data: [Option<Entry>; PODIUM_SIZE],
    size: usize,
}
impl Podium {
    /// Creates a list that contains `const PODIUM_SIZE` number of
    /// elements with indexes from 0 to (PODIUM_SIZE - 1)
    pub fn new() -> Podium {
        Podium {
            // Use a const block for element types that aren't Copy
            data: [const { None }; PODIUM_SIZE],
            //data: [None; new_capacity], // Illegal: runtime value
            size: 0,
        }
    }
    ...

Dynamic Arrays

Dynamic arrays are array-like structures that can grow or shrink the list's capacity throughout its lifetime. The resizing logic is design-specific: two different designs can both be considered dynamic arrays despite implementing different growth strategies and/or APIs. Automatic resizing generally involves checking the size of the list against its capacity before adding elements. If there is not enough room for the list's new size, the operation transparently copies the contents to a new region of memory with a larger capacity before adding the new element. Resizing operations are inherently expensive due to several factors, including a potential syscall to allocate a new range of addresses as well as the O(n) copy of data into the new block of memory. As a result, it is much more common to limit automatic resizing and potential reallocation to growth alone, and not perform it for the removal of elements.

When designing a dynamic array structure you can choose to resize arithmetically, adding a fixed number of slots to the list's capacity, or geometrically, scaling the capacity by some multiple of its current value. While the proof is outside the scope of this module, geometric resizing leads to amortized O(1) insertion despite the occasionally expensive O(n) move operations. It is also out of scope to derive the optimal growth factor, but it's common to see dynamic array implementations grow geometrically by a factor of two, doubling the list's capacity during growth operations.
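The doubling strategy can be sketched as a minimal dynamic array; DynArray and its methods are illustrative names, and the element type is fixed to i32 for brevity.

```rust
/// Minimal dynamic array sketch with doubling growth.
/// Illustrative only; names and strategy are assumptions, not std APIs.
pub struct DynArray {
    data: Box<[Option<i32>]>,
    size: usize,
}

impl DynArray {
    pub fn new() -> Self {
        DynArray { data: vec![None; 1].into_boxed_slice(), size: 0 }
    }

    pub fn capacity(&self) -> usize { self.data.len() }
    pub fn len(&self) -> usize { self.size }

    pub fn push(&mut self, e: i32) {
        if self.size == self.capacity() {
            // Geometric growth: allocate a block twice as large and
            // copy every element over (the occasional O(n) move).
            let mut bigger = vec![None; self.capacity() * 2].into_boxed_slice();
            for (i, slot) in self.data.iter_mut().enumerate() {
                bigger[i] = slot.take();
            }
            self.data = bigger;
        }
        self.data[self.size] = Some(e);
        self.size += 1;
    }

    pub fn get(&self, i: usize) -> Option<i32> {
        if i < self.size { self.data[i] } else { None }
    }
}
```

Starting from capacity 1, ten pushes trigger doublings through 2, 4, 8, and 16, so only four of the ten inserts pay the O(n) copy.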

It is similarly common to apply some analysis and a geometric rule to (predominantly manual) downsize operations. Again, it is out of scope to prove the sweet spot(s) (I'd love user input on this unfortunate math-related trend!), but it is common to halve the list's capacity when the list's size reaches a quarter of its current capacity.
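The quarter-capacity heuristic can be sketched as a pair of small functions; the names and exact thresholds are illustrative assumptions.

```rust
// Sketch of the common downsize heuristic described above: halve the
// capacity once size falls to a quarter of capacity. The gap between
// the grow trigger (size == capacity) and the shrink trigger leaves
// hysteresis, so repeated push/pop at a boundary doesn't thrash.
fn should_shrink(size: usize, capacity: usize) -> bool {
    capacity > 1 && size <= capacity / 4
}

fn shrunk_capacity(capacity: usize) -> usize {
    (capacity / 2).max(1)
}
```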

Resizing usually happens automatically from within insert and remove functions, or via stand-alone (manual) resize or trim functions on the type. For example, Java has trimToSize() on its dynamic array type ArrayList<T>, and Rust has shrink_to_fit() on Vec<T> for manual downsizing operations.
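A small demonstration of manual downsizing with Rust's Vec::shrink_to_fit; the wrapper function is illustrative, and the std docs note the resulting capacity may stay slightly above the length.

```rust
// Reserve far more capacity than needed, then manually trim it.
// shrink_to_fit is a real std API; the wrapper name is illustrative.
fn shrink_demo() -> (usize, usize) {
    let mut v: Vec<i32> = Vec::with_capacity(100);
    v.push(1);
    v.push(2);
    v.push(3);
    let before = v.capacity(); // at least 100
    v.shrink_to_fit();         // requests capacity close to len()
    (before, v.capacity())
}
```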

Linked Storage

Linked lists, by contrast to contiguous structures, are made of individual data objects, called nodes, that contain references or pointers to other nodes in the list. Each node exists at some arbitrary memory address within the program's virtual memory space, and the list exists only as a logical structure within memory. Linked lists typically do not have direct indexes and instead deal in relative terms such as "first", "next", "previous", and "last".

In general, it's not possible to directly access arbitrarily positioned nodes in the list without tracking a pointer to that node or its data directly. To provide additional utility, linked lists may include some sort of cursor mechanism that operates with one or more pointers to stable node addresses in the list. To quote the Rust documentation,

A Cursor is like an iterator, except that it can freely seek back-and-forth.

Such lists are called positional lists. The positions are exposed as safe handles to a node's data. The cursors themselves still have to traverse the list sequentially, but the ability to store handles lets you perform more complex operations involving navigation, list splitting, splicing, and efficient operations on ranges within the list. Traversal always starts at the "first" (head) node or the "last" (tail) node and follows each "next" and/or "previous" pointer to the desired node.

The superpower of linked lists is that you can mutate the list at arbitrary positions in fast O(1) time, though navigating to an arbitrary node still requires O(n) time. In practice, this only pays off if you're performing an exceptional number of mid-list mutations.

Positional lists may add the following (non-exhaustive) list of operations in addition to the basic list ADT functions:

  • insert(p, e) Adds an element e to the list at position p
  • set(p, e) Sets the position p to element e
  • get(p) / get_kth(p) Returns the data at position p
  • remove(p) / remove_kth(p) Removes (and optionally returns) an element at position p
  • find(e) / contains(e) Searches the list and typically returns the first position of an element e (a contains variant usually returns a Boolean instead)

It is common to categorize or define linked list variations based on how they're linked. For example, a singly-linked list contains only one pointer per node, resulting in only one direction of traversal. Singly-linked lists typically contain pointers to the "next" node in the list, meaning you can only traverse from the "first" (or "head") node to the "last" (or "tail") node. A doubly-linked list includes two pointers, "next" and "previous", so you can traverse the list forwards and backwards. A circularly-linked list connects the tail back to the head for a looping traversal.
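The node shapes behind these variants can be sketched in Rust; the type names are illustrative, and the doubly-linked back-pointer uses a raw pointer because two owning Box links to one node are impossible in safe Rust.

```rust
// A singly-linked node owns its successor through one `next` pointer.
#[allow(dead_code)]
struct SinglyNode<T> {
    data: T,
    next: Option<Box<SinglyNode<T>>>,
}

// A doubly-linked node needs a back-pointer too. The `prev` link
// cannot be a second owning Box (a node would have two owners), so
// implementations typically fall back to raw pointers or Rc/Weak.
#[allow(dead_code)]
struct DoublyNode<T> {
    data: T,
    next: Option<Box<DoublyNode<T>>>,
    prev: *mut DoublyNode<T>, // raw pointer; null at the head
}
```

A circularly-linked variant reuses either shape, with the tail's `next` pointing back at the head instead of terminating in None.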

Because linked lists require some amount of “pointer chasing” across memory, common operations such as iteration can be more expensive compared to contiguous structures. Indeed, unless you are storing incredibly large nodes or performing a truly spectacular amount of interior list mutations where large amounts of node data copies would be too expensive, linked lists almost always perform worse than contiguous structures at scale.

Basic linked lists remain relevant for their conceptual and pedagogical properties within DSA curricula. Despite being relatively rare in the “real world”, specialized linked lists are still commonly found in memory allocators and some common compiler operations.

Implementation Notes

List structures are ironically some of the simplest structures on a conceptual level, but some of the hardest to implement in Rust due to the necessary interaction with low-level memory arrangement and allocation strategies.

Singly-linked lists are as easy as stringing together Box<T> pointers, but implementing doubly-linked lists is particularly tricky because each node represents a mutable, shared resource—something that Rust goes to great lengths to prevent in its safe APIs. To get around this, you will likely end up relying on raw, mutable pointers within unsafe blocks.
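A minimal sketch of the Box-based singly-linked approach, safe Rust only; type and method names are illustrative.

```rust
/// Minimal safe singly-linked list built by stringing Box pointers
/// together. A sketch, not production code; head-only operations keep
/// everything O(1) without unsafe.
pub struct SinglyLinked<T> {
    head: Option<Box<Node<T>>>,
    len: usize,
}

struct Node<T> {
    data: T,
    next: Option<Box<Node<T>>>,
}

impl<T> SinglyLinked<T> {
    pub fn new() -> Self {
        SinglyLinked { head: None, len: 0 }
    }

    /// O(1) insert at the head: the new node takes ownership of the
    /// old head as its `next`.
    pub fn push_front(&mut self, data: T) {
        let node = Box::new(Node { data, next: self.head.take() });
        self.head = Some(node);
        self.len += 1;
    }

    /// O(1) remove at the head: ownership of the rest of the chain
    /// moves back into `head`.
    pub fn pop_front(&mut self) -> Option<T> {
        self.head.take().map(|node| {
            self.head = node.next;
            self.len -= 1;
            node.data
        })
    }

    pub fn len(&self) -> usize { self.len }
}
```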

Implementing contiguous list structures like Vec<T> requires operational knowledge of low-level APIs such as std::alloc and std::ptr, both of which also involve unsafe dereferencing. Of course, using unsafe Rust gives you access to all the foot-guns that Rust was meant to make obsolete.

The good news is that the Rust community developed Miri to keep you from injuring yourself too badly. Miri is a Rust interpreter that helps detect undefined behavior (UB) and even some memory leaks on top of your existing operational tests. The catch, naturally, is that it only checks what you give it, so it’s advantageous to write many tests for all sorts of use cases.

The low-level memory management required by classical list structures means it is not possible to truly implement them in all languages. Fortunately, just about every high-level programming language ships with list structures that serve similar purposes and allow you to back even more interesting data structures and algorithms. That being said, it may be worth taking a look at your favorite language's array implementation, because the concepts are fundamental to making data-oriented decisions about your own designs and compositions.

Conclusion

List structures provide the basic building blocks for data structure design by introducing memory layout and allocation concepts.

Linked lists are popular in classical DSA curricula partly because of the era in which the material matured and partly because they are a great way to introduce students to pointers and memory management. Despite their popularity in DSA courses, linked lists aren't used much in the real world. Things get funny down at the operating system level, where some specialized intrusive linked lists exist, but in day-to-day code, contiguous structures are much more common and usually more efficient. Specifically, modern architectures make heavy use of caching, which works best with contiguous, or more accurately localized (nearby), memory as opposed to the scattered access patterns of non-contiguous structures. The resizing logic of dynamic arrays also amortizes operations down to constant time complexities similar to those of linked list operations. Indeed, most of the theoretical benefits of linked lists are minimized by properly designed dynamic array structures.

According to the author(s) of Learn Rust With Entirely Too Many Linked Lists the Vec and VecDeque in Rust are,

…blatantly superior data structures for most workloads due to less frequent allocation, lower memory overhead, true random access, and cache locality.

The book goes on to name some cases where a linked list may be superior to a dynamic array. It should be noted that these use cases are rare, solve difficult/specialized problems, or don't actually relate to the language choice itself.

  • Intense amounts of splitting or merging in (very) large lists
  • Lock-free concurrency
  • Resource-constrained environments (kernels and embedded) that warrant intrusive lists
  • Pure functional languages with limited semantics and absence of mutation