Smart Pointers
The concept of smart pointers originates in C++. They were introduced as a way to manage memory and ensure safe operations, particularly in scenarios where manual memory management (e.g., using raw pointers) could lead to memory leaks or undefined behavior. Ownership as we know it in Rust, with its compiler-enforced rules, is unique to Rust, but the idea that programs should follow memory-safe design patterns that limit who has what access to which memory is not. Rust’s ownership rules make it difficult, but not impossible, to allocate memory that is never deallocated. Memory that never gets cleaned up is referred to as a memory leak. Rust still considers leaks memory safe: a leak wastes memory, but it cannot by itself cause undefined behavior.
In a very general sense smart pointers are any type that provides behavior beyond simple references. In this way smart pointers can be considered more of a design pattern than a distinct type. Often, smart pointers are used for automatic memory management (typically as some form of reference counting), ownership tracking, and implementing various patterns like lazy initialization or copy-on-write.
This section covers three smart pointer types provided by Rust:
- Box<T>: for allocating values on the heap; has a single owner and allows for mutable or immutable borrows
- std::rc::Rc<T>: a reference counting type that allows multiple ownership with immutable borrows only; provides a strong reference by default, which can be converted to a weak reference (std::rc::Weak<T>) with Rc::downgrade and later (potentially) converted back to a strong reference with Weak::upgrade; Weak::new also creates a weak reference directly, but one that points to no value, so upgrading it always returns None
- std::cell::Ref<T> / std::cell::RefMut<T>, accessed through std::cell::RefCell<T>: used when you need an immutable type but need to change an inner value of that type (interior mutability); enforces the borrowing rules at runtime instead of compile time; has a single owner; allows either multiple immutable borrows or a single mutable borrow
From JonHoo lecture:
Cell
- With Cell you can never get a reference to the value inside the Cell. This is what makes it safe: if you can only own the value, it’s always safe to mutate it because you’ll never screw up someone else’s references.
- Cell does not implement Sync, which means you cannot give a reference to a Cell to another thread.
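The Cell notes above can be sketched as follows. This is a minimal illustration, not taken from the lecture: the Counter type and its record method are hypothetical names. The point is that the value is only ever copied out with get and replaced with set, so no reference to the inner value is handed out even though record takes &self.

```rust
use std::cell::Cell;

/// Hypothetical counter type used only for illustration.
struct Counter {
    hits: Cell<u32>,
}

impl Counter {
    fn new() -> Self {
        Counter { hits: Cell::new(0) }
    }

    /// Takes `&self`, yet still updates the count through the `Cell`.
    fn record(&self) -> u32 {
        // `get` copies the value out; `set` moves a new value in.
        // No reference to the inner `u32` is ever handed out.
        let next = self.hits.get() + 1;
        self.hits.set(next);
        next
    }
}

fn main() {
    let counter = Counter::new();
    counter.record();
    counter.record();
    println!("hits = {}", counter.hits.get());
}
```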
The Box<T> Type
The Box<T> type is used to store data on the heap. It implements the Deref trait (among many, many others), which allows Box<T> values to be treated like references. See the module level documentation for more information. The ability to allocate data on the heap, manage lifetimes, and transfer ownership, all while still being able to dereference the data, is what makes Box<T> a smart pointer. In fact, the Deref trait is such a powerful mechanism that implementing it is sometimes all that’s needed to call a type a “smart pointer”.
Using Box<T>
Box<T> is most frequently used in the following situations:
- When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size.
- When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so. Transferring ownership of a large amount of data can take a long time because the data is copied around on the stack. To improve performance in this situation, we can store the large amount of data on the heap in a box. Then, only the small amount of pointer data is copied around on the stack, while the data it references stays in one place on the heap.
- When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type. This is known as a trait object and is covered in more depth in the trait objects section.
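The trait object case from the last bullet might look like the sketch below. The Shape trait, the Square and Rect types, and the total_area function are hypothetical names introduced only for illustration; the pattern itself (a collection of Box<dyn Trait> values) is standard Rust.

```rust
/// Hypothetical trait used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Sums the areas of shapes whose concrete types are only known at runtime.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // Each element is a fixed-size box, even though the boxed types differ in size.
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("total area = {}", total_area(&shapes));
}
```

The vector only needs to know the size of a Box<dyn Shape>, not the size of each concrete shape, which is also why the first bullet applies here.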
Box<T> is defined as a struct and has some handy methods. Boxes provide only indirection (storing a pointer instead of the value) and heap allocation (of the value). Perhaps the first method to cover is the new() method.
let b = Box::new(5); println!("b = {}", b);
In this case b, the instantiated box, is stored on the stack and can be accessed just like any other stack data. The actual data (the value 5) is stored on the heap. When b goes out of scope, both the box on the stack and the data on the heap are deallocated. This is because Box<T> implements the Drop trait. The example here is just for illustration, and Box<T> is not really used this way. Let’s explore some situations where it is used.
Dereferencing values
If we were to run the following code we’d get an error because, just like with regular references, b and 5 are different types.
let b = Box::new(5); assert_eq!(5, b); // Illegal!
error[E0277]: can't compare `{integer}` with `Box<{integer}>`
  --> src/cncpt/types/smart_pointers.rs:22:5
   |
22 |     assert_eq!(5, b);
   |     ^^^^^^^^^^^^^^^^ no implementation for `{integer} == Box<{integer}>`
   |
Because Box<T> implements the Deref trait, we can dereference it like any other reference.
let b = Box::new(5); assert_eq!(5, *b); // Legal
Compile time size
There are plenty of examples of types whose size cannot be known at compile time, but recursive types are a fun example.
If we try to define the following recursive type that emulates Lisp’s cons (construct function) as a precursor to linked lists, we’ll get an error.
enum List {
    Cons(i32, List),
    Nil,
}
The error indicates that the recursive type has an infinite size. This is a problem because we need to know how much memory to allocate at compile time.
error[E0072]: recursive type `smart_pointers::List` has infinite size
  --> src/cncpt/types/smart_pointers.rs:9:1
   |
9  | enum List {
   | ^^^^^^^^^
10 |     Cons(i32, List),
   |               ---- recursive without indirection
   |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
   |
10 |     Cons(i32, Box<List>),
   |               ++++    +
The help message suggests using “indirection”. This indicates that Rust wants us to store a pointer to the value instead of the value itself. We can do this with Box<T>, because it’s a smart pointer! More specifically, a pointer has a fixed size regardless of the size of the data it points to. Conceptually this is still a list holding other lists, but in memory each Cons holds a fixed-size pointer to the next item (a separate heap allocation) rather than containing the next item directly.
enum List {
    Cons(i32, Box<List>),
    Nil,
}
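To show the boxed definition in use, here is a small sketch that builds a three-element list and walks it. The sum helper is a hypothetical function added for illustration; it is not part of the original notes.

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

/// Hypothetical helper: walks the list and adds up the stored values.
fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a `&Box<List>`; deref coercion turns it into `&List`.
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    // Each `Box::new` places the rest of the list in its own heap allocation.
    let list = List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    );
    println!("sum = {}", sum(&list));
}
```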
The Rc<T> Type
The Rc<T> type is useful in situations where heap-allocated data needs to be read (immutably) from multiple parts of a program in the same thread, and we can’t determine at compile time which part will finish using the data last. Because we don’t know which one will finish first, we need to guarantee that the data is valid until none of them require it anymore. Graph structures are one such example, where multiple edges need to point to the same immutable node. Multiple ownership can be done with the reference counting type, appropriately abbreviated as Rc<T>. This type tracks the number of references and doesn’t run the destructor until none are left. This ensures that the underlying heap data doesn’t get freed until all references are finished.
use std::rc::Rc; // Ref counting type is not in the standard prelude
pub fn ref_counter() {
    let a = Rc::new("Hello");
    println!("There are {} live references to \"{}\".", Rc::strong_count(&a), &a);
    {
        let b = Rc::clone(&a);
        println!("There are {} live references to \"{}\".", Rc::strong_count(&a), &a);
    }
    println!("There are {} live references to \"{}\".", Rc::strong_count(&a), &a);
}
There are 1 live references to "Hello".
There are 2 live references to "Hello".
There are 1 live references to "Hello".
Interior Mutability With RefCell<T>
The RefCell<T> type is useful for situations that require the interior mutability pattern. Interior mutability is a design pattern in Rust that allows you to mutate data even when there are immutable references to that data. See the Ownership rules portion of the Ownership Rule Summary section for more details about this pattern.
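A minimal sketch of interior mutability with RefCell<T> follows. The Logger type and its methods are hypothetical names chosen for illustration. It shows mutation through &self, and that the borrowing rules are checked at runtime: try_borrow_mut fails while an immutable borrow is still alive.

```rust
use std::cell::RefCell;

/// Hypothetical logger: `log` takes `&self` but still appends to the buffer.
struct Logger {
    lines: RefCell<Vec<String>>,
}

impl Logger {
    fn new() -> Self {
        Logger { lines: RefCell::new(Vec::new()) }
    }

    fn log(&self, msg: &str) {
        // `borrow_mut` returns a `RefMut<Vec<String>>`; the borrow ends when it is dropped.
        self.lines.borrow_mut().push(msg.to_string());
    }

    fn count(&self) -> usize {
        // `borrow` returns a `Ref<Vec<String>>`.
        self.lines.borrow().len()
    }
}

fn main() {
    let logger = Logger::new();
    logger.log("first");
    logger.log("second");
    println!("logged {} lines", logger.count());

    // While a `Ref` is alive, a mutable borrow is refused at runtime.
    let reader = logger.lines.borrow();
    assert!(logger.lines.try_borrow_mut().is_err());
    drop(reader);
    assert!(logger.lines.try_borrow_mut().is_ok());
}
```

Calling borrow_mut instead of try_borrow_mut in the refused case would panic rather than return an error.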
Strong Vs Weak References
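As a rough sketch of how strong and weak references relate: a Weak<T> created with Rc::downgrade does not keep the value alive, and Weak::upgrade returns None once the last strong reference is gone. The read helper below is a hypothetical function added only for illustration.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical helper: reads the value through a weak reference if it is still alive.
fn read(weak: &Weak<String>) -> Option<String> {
    weak.upgrade().map(|rc| (*rc).clone())
}

fn main() {
    let strong = Rc::new(String::from("Hello"));

    // Downgrading adds a weak reference but leaves the strong count unchanged.
    let weak = Rc::downgrade(&strong);
    println!(
        "strong = {}, weak = {}",
        Rc::strong_count(&strong),
        Rc::weak_count(&strong)
    );
    println!("while alive: {:?}", read(&weak));

    // Dropping the last strong reference frees the value; the weak one cannot revive it.
    drop(strong);
    println!("after drop: {:?}", read(&weak));

    // `Weak::new` points at nothing, so upgrading it never succeeds.
    let empty: Weak<String> = Weak::new();
    println!("empty: {:?}", read(&empty));
}
```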
Common Traits In Smart Pointers
The smart pointers covered in this section typically use the following traits.
The Deref trait
The Deref trait has only one required method: deref(). The deref() method returns a reference to the inner value but doesn’t dereference it itself. Combined with the unary dereference operator *, it gives access to the inner value: for a type that implements Deref, *b is equivalent to *(b.deref()).
use std::ops::Deref; // needed to call deref() explicitly

let x: String = String::from("Hello");
// Deref coercion converts &String to &str here
let y: &str = &x;

// Compares String with &str; this works because String implements PartialEq<&str>
assert_eq!(String::from("Hello"), y);
assert_eq!(x, y);

let a: i8 = 5;
let b = Box::new(a);
assert_eq!(5, *b);

// The explicit deref() is superfluous; the dereference operator calls it under the hood on a type that implements Deref
assert_eq!(5, *b.deref());
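Implementing Deref for your own type is what lets it behave like a reference. The sketch below uses a hypothetical MyBox<T> wrapper, similar in spirit to the example in the Rust book, plus a hypothetical greeting_len function to show deref coercion at a function call. Unlike Box<T>, this wrapper does not allocate on the heap.

```rust
use std::ops::Deref;

/// Hypothetical wrapper type used only for illustration.
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    // Returns a reference to the inner value; `*my_box` then means `*(my_box.deref())`.
    fn deref(&self) -> &T {
        &self.0
    }
}

/// Takes a `&str`, so passing `&MyBox<String>` relies on deref coercion.
fn greeting_len(name: &str) -> usize {
    name.len()
}

fn main() {
    let boxed = MyBox(5);
    assert_eq!(5, *boxed);

    let name = MyBox(String::from("Rust"));
    // Coercion chain: &MyBox<String> -> &String -> &str
    assert_eq!(4, greeting_len(&name));
}
```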
The Drop trait
The Drop trait allows us to customize what happens when a value goes out of scope. This trait is often used to release resources like files or network connections. This is similar to calling the free function in C. The difference is that the Drop trait lets us define (implement) code that runs automatically when a value is about to go out of scope. The drop() method is the only required method for the trait, and indeed its only method. Variables are dropped in the reverse order of their creation.
The drop() method cannot be called directly. That is because it is called automatically when the value goes out of scope; if we could also call it early, the value would be cleaned up twice, which could cause a double free error. We can’t disable this automatic call, but we can use the std::mem::drop() function to drop a value early. This function takes ownership of its argument and does nothing else: the value goes out of scope at the end of the function, which runs the drop() method (from Drop). Because the value was moved into the function, the compiler will not drop it again at the end of its original scope.
Smart Pointers & Memory Leaks
One of the most common ways to leak memory is with cyclic/circular reference counting types. In these scenarios the (strong) reference count of an Rc<T> never reaches 0, so the program never deallocates the object/memory. The way around this is generally to use a weak reference counted type such as std::rc::Weak<T>. The Rust book uses a list implementation to illustrate this concept. Just another one of the many reasons why linked lists in Rust are a bad idea. The book does explain that these situations are rare though.
Creating reference cycles is not easily done, but it’s not impossible either. If you have `RefCell<T>` values that contain `Rc<T>` values or similar nested combinations of types with interior mutability and reference counting, you must ensure that you don’t create cycles; you can’t rely on Rust to catch them.
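A reference cycle and its weak-reference fix can be sketched as follows. The Node type and the three functions are hypothetical and are not the Rust book’s example. Each function returns a Weak handle to the first node so that, after the owners go out of scope, we can check whether the node was actually freed. Note that strong_cycle leaks memory on purpose.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node type: `next` is a strong link, `prev` is a weak back-link.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    prev: RefCell<Weak<Node>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node {
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    })
}

/// Links two nodes with strong references in both directions.
fn strong_cycle() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.next.borrow_mut() = Some(Rc::clone(&a)); // closes the cycle
    Rc::downgrade(&a)
} // `a` and `b` go out of scope, but each node still holds the other: leaked

/// Same shape, but the back-link is weak, so there is no cycle of strong references.
fn weak_back_link() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.prev.borrow_mut() = Rc::downgrade(&a);
    Rc::downgrade(&a)
}

fn main() {
    // The cycle keeps the strong count above zero, so the node is never freed.
    assert!(strong_cycle().upgrade().is_some());
    // With a weak back-link the node is freed when its owner goes out of scope.
    assert!(weak_back_link().upgrade().is_none());
}
```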