
Strings

Rust technically doesn’t have a string type in the core language, at least not as you might know it from other languages. This is due to a decision to treat strings closer to the way the computer actually handles character data rather than how you might think of strings as text. Instead, Rust provides an immutable primitive with all of Rust’s safety guarantees and a wrapper API for more convenient string operations. Both the core language’s primitive and its API represent UTF-8 encoded character values as bytes. The byte values can be translated to scalar values (char types), and ultimately to grapheme clusters (the closest thing to what you would call letters).
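To make the byte / scalar value / grapheme distinction concrete, here is a minimal sketch. The helper name byte_and_char_counts is made up for this illustration. The standard library can count bytes and chars; counting grapheme clusters needs an external crate, so it is only described in a comment.

```rust
// Hypothetical helper: counts the UTF-8 bytes and the Unicode scalar
// values (chars) in a string slice.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    // An "e" followed by U+0301 (combining acute accent).
    let text = "e\u{301}";
    let (bytes, chars) = byte_and_char_counts(text);
    // 'e' is one byte and U+0301 is two bytes in UTF-8.
    println!("{bytes} bytes, {chars} chars");
    // A reader sees a single letter here (one grapheme cluster), but the
    // standard library stops at chars.
}
```

The same text is three bytes, two chars, and one visible letter, which is why Rust makes you say which of these you mean.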

The str Type

To understand the basics of working with strings and string data in Rust you must first understand three concepts: slices, dynamically sized type handling in Rust, and the string slice primitive str. These concepts are closely aligned with ownership, so it may be helpful to read through this section and then revisit it after reading the Ownership section.

Strings and the string slice pick up where the discussion of Rust’s primitive types ends. You can read more about slices on the Primitives page. The TL;DR is that in Rust slices represent a view into a contiguous collection of elements, and that the str type is guaranteed to contain UTF-8 encoded data.

The string slice in Rust, identified by the str type name, is a special kind of slice that expands on the universal definition of a slice. String slices in Rust are immutable, dynamically sized, and always guaranteed to contain UTF-8 encoded data. These characteristics are what make working with raw string slices difficult and are why Rust includes the String wrapper for manipulating text. Before talking about the String type, let’s explore the implications for the raw string slice type str.

The str type is a dynamically sized primitive type. Because the compiler cannot know a dynamically sized type’s size at compile time, Rust requires all dynamically sized types (DSTs) to be used behind some kind of pointer. This means that you cannot create a value of type str directly. Instead, the string slice is most commonly seen in its (immutable) borrowed form, &str, which is a reference to UTF-8-encoded data in memory. Slices are references to a contiguous sequence of elements within a collection, such as a Vec. Slices do not have ownership because they are references. String slices can be declared and bound to variables. Typically, we aren’t going to declare slices unless a function or method requires them as a type.

let s: &str = "Hello, World"; // String literal bound to a referenced string slice
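A small sketch of what "dynamically sized" means in practice, assuming a made-up helper called parts. It shows that a &str carries both an address and a length, which is how Rust tracks the size at runtime.

```rust
use std::mem::size_of;

// Hypothetical helper: returns the pointer and the byte length that
// together make up a &str.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    // A bare `str` binding does not compile because its size is unknown:
    // let s: str = *"Hello";

    let s: &str = "Hello, World";
    let (_ptr, len) = parts(s);
    println!("length in bytes: {len}");

    // A &str is a "fat" pointer: an address plus a length.
    println!("{}", size_of::<&str>() == 2 * size_of::<usize>());
}
```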

String literals like the one used in our Hello World program get converted to slices at compile time and are stored in the program’s binary. String literals exist in static memory, something we’ll cover later.

fn main() {
    println!("Hello World!"); // String slice as a literal
}
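Because literals live in the program’s binary, a reference to one is valid for the whole run of the program. A minimal sketch, with a made-up function name greeting, spells that out using the 'static lifetime:

```rust
// Hypothetical function: returning a literal needs no owner, because the
// bytes are stored in the program binary for the entire run.
fn greeting() -> &'static str {
    "Hello World!"
}

fn main() {
    let g: &'static str = greeting();
    println!("{g}");
}
```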

Slices are immutable which is not always convenient or desirable. What if we want to work with a string of unknown size such as an input? What if we want to change the string? This is where the String (with a capital S) type comes in.

The String wrapper

The String type was added to the standard library to deal with string manipulations, though like most things in Rust a String binding is immutable by default. The String type has ownership of its data. The String type is actually a wrapper around the Vec<u8> type with some additional methods and guard rails that keep the bytes valid UTF-8. This String type is more in line with what we typically think of as a string from other languages. The String type has many of the same operations as Vec<T> due to this relationship.
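The relationship to a byte vector is easy to poke at. The sketch below uses a made-up helper, round_trip, to unwrap a String into its bytes and wrap it back up again; it also shows that a String has a length and a capacity, just like a Vec.

```rust
// Hypothetical helper: round-trips a String through its underlying bytes.
fn round_trip(s: String) -> String {
    // into_bytes() hands over the underlying Vec<u8>.
    let bytes: Vec<u8> = s.into_bytes();
    // from_utf8() re-checks that the bytes are valid UTF-8.
    String::from_utf8(bytes).expect("bytes came from a valid String")
}

fn main() {
    let mut s = String::with_capacity(16);
    s.push_str("Hello");
    println!("len = {}, capacity >= 16: {}", s.len(), s.capacity() >= 16);
    println!("{}", round_trip(s));
}
```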

Creating new Strings

Declaring and binding a String variable is easy with the new() method.

let mut s: String = String::new();

We can also create a string using a literal or a variable with the from() method.

let s: String = String::from("Hello"); //String from literal
let mut sp = String::from(&s); //String from a reference to another String

We can then use any of the handful of methods on our new String variable. To add characters we can “push” them onto it as string slices (&str).

sp.push_str(" world"); //We can only push to mutable variables!

All together with a print it may look something like this.

let s = String::from("Hello");
let mut sp = String::from(&s);
sp.push_str(" world");
println!("{}\n{}", s, sp);
Hello
Hello world

It is also possible to use the to_string() method on any type that implements the Display trait to create a new string. The to_string() method works the same way as the from() method. The choice between from() and to_string() is up to you and may depend on style or readability. Most of this text uses the from() method because it just looks cleaner to the author’s eye. Whatever you choose, stay consistent!

let data = "Im literal";
let s = data.to_string(); //Works with variables
let sp = "Im also literal".to_string(); //Also works with literals
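The Display part is worth a quick sketch of its own, since it means to_string() also works on things that are not strings at all. The Point type here is made up purely for illustration.

```rust
use std::fmt;

// Hypothetical type used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

// Implementing Display is what makes to_string() available on Point.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let n = 42.to_string(); // Integers implement Display
    let p = Point { x: 1, y: 2 }.to_string(); // So does our own type
    println!("{n} {p}");
}
```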

Pulling slices from Strings

It is possible to reference the data inside a String.

pub fn caller() {
    // Creates a String
    let name: String = String::from("Peter");
    // Creates a slice using explicit dereferencing
    let name_dereference: &str = &(*name)[..];
    // Creates a slice without dereferencing
    let name_reference: &str = &name[..];
    // Creates a slice of the first 4 bytes
    let first_four: &str = &name[..4];
    println!("First four bytes of {} is {}", name, first_four);
    print_greeting(name_reference);
}
fn print_greeting(name: &str) {
    println!("Hello, {}", name);
}
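In everyday code you rarely need to spell out the [..] range. As a rough sketch (the greeting function is made up for this example), a plain &name or name.as_str() gets you the same &str:

```rust
// Hypothetical helper that accepts any borrowed string data.
fn greeting(name: &str) -> String {
    format!("Hello, {name}")
}

fn main() {
    let name = String::from("Peter");
    // All three calls hand a &str to the function.
    println!("{}", greeting(&name)); // &String is coerced to &str
    println!("{}", greeting(name.as_str())); // Explicit conversion
    println!("{}", greeting("Peter")); // A literal is already a &str
    // name is only borrowed above, so it is still usable here.
    println!("{name}");
}
```

Taking &str as the parameter type is a common choice because it lets callers pass Strings and literals alike.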

Converting Strings to Vectors

Character encoding is a deep, dark abyss. Rust likes UTF-8 and so do I. Let’s look at how to convert Strings to Vec<u8> and back easily. Generally speaking, if you’re working with ASCII characters and performance is critical, working with the bytes is direct and efficient. To go back, Rust has a lossy conversion, from_utf8_lossy(), and a from_utf8() method that checks whether the byte array is valid UTF-8.

let s: String = "Peter™".to_string();
// Convert the whole string to a Vec<u8>
let bv1: Vec<u8> = s.as_bytes().to_vec();
// Takes the first 5 bytes
let bv2: Vec<u8> = s[..5].as_bytes().to_vec();
println!("{:?}\n{:?}", bv1, bv2);
// Convert [u8] to a String
// Lossy just inserts an unknown character for invalid UTF-8
let s2 = String::from_utf8_lossy(&bv1);
println!("{}", s2);
// Stock method does a UTF-8 validation check and requires error handling
if let Ok(s3) = String::from_utf8(bv1.clone()) {
    println!("{s3}");
}
[80, 101, 116, 101, 114, 226, 132, 162]
[80, 101, 116, 101, 114]
Peter™
Peter™
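The example above only exercises the happy path. Here is a minimal sketch of what the validation check looks like when the bytes are not valid UTF-8. The decode helper is made up for this illustration; it reports how many leading bytes were valid.

```rust
// Hypothetical helper: returns the decoded String, or the number of
// leading bytes that were valid UTF-8 before the first problem.
fn decode(bytes: Vec<u8>) -> Result<String, usize> {
    String::from_utf8(bytes).map_err(|e| e.utf8_error().valid_up_to())
}

fn main() {
    // 226, 132 start the three-byte encoding of '™', but the last byte
    // (162) is missing, so the sequence is incomplete.
    let truncated: Vec<u8> = vec![80, 101, 116, 101, 114, 226, 132];

    // The lossy conversion never fails; it substitutes U+FFFD instead.
    println!("{}", String::from_utf8_lossy(&truncated));

    // The strict conversion reports the problem so we can handle it.
    match decode(truncated) {
        Ok(s) => println!("valid: {s}"),
        Err(n) => println!("invalid UTF-8 after {n} valid bytes"),
    }
}
```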

Updating Strings

There are several ways to modify strings in Rust. This section covers the + operator, format! macro, push(), and push_str() methods.
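The two push methods are the simplest of the bunch, so here is a quick sketch of them first. The build function is made up for this illustration: push_str() appends a string slice, and push() appends a single char.

```rust
// Hypothetical helper: builds a string with push_str() and push().
fn build() -> String {
    let mut s = String::from("Hello");
    s.push_str(" world"); // Appends a &str
    s.push('!'); // Appends a single char (note the single quotes)
    s
}

fn main() {
    println!("{}", build());
}
```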

The + operator uses the add() method under the hood. It is important to see what this function is doing, because it results in some funky behaviors. Let’s take a look at the add() function’s signature for details.

fn add(self, s: &str) -> String

The signature takes ownership of self (otherwise it would be &self), which in this case is the base String. The function also takes a referenced slice (&str) and appends it to the String. Because the method takes ownership and String does not implement the Copy trait, the base String is moved and is no longer valid after the operation. This makes the operation more efficient than copying, because the existing buffer is reused. When the right-hand side is a &String, the compiler employs a process called dereference coercion to turn it into a &str, as if you had written &s[..] (we’ll cover this process later). String literals are already &str values, so they need no conversion. Additionally, the add() function only borrows its right-hand operand, so that value is still valid after any concatenation operations.

let s: String = String::from("Hello"); //Base String
let s_1 = s.clone();
let s2: &str = &String::from(" world"); //First example creates explicit &str
let concat = s + s2; //Moves s, s2 is already a reference
let concat2 = s_1 + s2; //Required cloned s due to move, s2 is a reference
println!("s + s2: {}\ns clone with reused s2: {}", concat, concat2);
let s3 = String::from("Hello"); //New base String
let s3_1 = s3.clone();
let s4 = String::from(" animals"); //String to add to base String
let s5 = " wildlings"; //Literal to add to base String
let concat3 = s3 + &s4; //Compiler can coerce String to &str
let concat4 = s3_1 + &s5; //Required s3 clone due to move
println!("s3 + s4: {}\ns3 clone + s5 literal: {}", concat3, concat4);
s + s2: Hello world
s clone with reused s2: Hello world
s3 + s4: Hello animals
s3 clone + s5 literal: Hello wildlings

Concatenating strings with the + operator can get unwieldy pretty quickly. There may be times when we want to format a string from a bunch of different elements, such as the time. For these applications we can use the format! macro.

let mut time_now: Time = Time::time_constructor(Time::get_system_time());
let period: String = Time::set_12h_period(&mut time_now, &offset);
let hour: i32 = Time::set_timezone(&mut time_now);
let minute: String = time_now.format_minute();
let second: String = time_now.format_second();
return format!("{hour}:{minute}:{second} {period}");
3:47:22 PM
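The Time type above comes from a larger program, so that snippet will not run on its own. As a self-contained sketch of the same idea, here is a made-up function, format_12h, that takes plain numbers (an assumption, not the original Time API) and builds the same kind of clock string with format!.

```rust
// Hypothetical helper: formats a 24-hour time as a 12-hour clock string.
fn format_12h(hour24: u32, minute: u32, second: u32) -> String {
    let period = if hour24 < 12 { "AM" } else { "PM" };
    // Hours 0 and 12 both display as 12 on a 12-hour clock.
    let hour = match hour24 % 12 {
        0 => 12,
        h => h,
    };
    // {:02} pads minutes and seconds with a leading zero to two digits.
    format!("{hour}:{minute:02}:{second:02} {period}")
}

fn main() {
    println!("{}", format_12h(15, 47, 22));
}
```

Unlike the + operator, format! only borrows its arguments, so every variable used inside the braces is still valid afterwards.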

Note that indexing into strings is often a bad idea because it’s not clear what the return type should be. As discussed, this could be a simple byte value, but it could also be a Unicode scalar value, a grapheme cluster, or a string slice. Rust therefore only allows slicing with byte ranges, and the range must fall on character boundaries. For example, the following code works because every character is a single byte in UTF-8.

let s = String::from("Peter Schmitz");
let given = &s[..5];
let family = &s[6..];
println!("Given name: {}\nFamily name: {}", given, family);
Given name: Peter
Family name: Schmitz

However, what if we want to access 2-byte characters?

let hello = "Здравствуйте";
let first_three = &hello[..3];
println!("First three characters: {}", first_three);
thread 'main' panicked at src/cncpt/types/compound/strings.rs:138:29:
byte index 3 is not a char boundary; it is inside 'д' (bytes 2..4) of `Здравствуйте`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To handle this, Rust provides the bytes() and chars() methods to access elements of a string.
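As a minimal sketch of how these methods avoid the panic above, the made-up helper first_chars uses char_indices() to find a safe byte offset before slicing. Note that it counts scalar values (chars), not grapheme clusters.

```rust
// Hypothetical helper: returns a slice holding the first `n` chars of `s`,
// cutting only on a char boundary.
fn first_chars(s: &str, n: usize) -> &str {
    // char_indices() yields (byte offset, char) pairs; the offset of the
    // n-th char is exactly where the slice has to end.
    match s.char_indices().nth(n) {
        Some((byte_index, _)) => &s[..byte_index],
        None => s, // Fewer than n chars: return the whole string
    }
}

fn main() {
    let hello = "Здравствуйте";
    println!("First three characters: {}", first_chars(hello, 3));
    // Each Cyrillic letter here takes two bytes in UTF-8.
    println!("{} bytes, {} chars", hello.bytes().count(), hello.chars().count());
    // get() returns None instead of panicking on a bad boundary.
    println!("{:?}", hello.get(..3));
}
```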