Strings
Rust technically doesn’t have a string type in the core language, at least not as you might know it from other languages. This is due to a decision to treat strings closer to the way the computer actually handles character data than to how you might think of strings as text. Instead, Rust provides an immutable primitive with all of Rust’s safety guarantees and a wrapper API for more convenient string operations. Both the core language’s primitive and its API represent text as UTF-8 encoded bytes. The byte values can be translated to scalar values (char types), and ultimately to grapheme clusters (the closest thing to what you would call letters).
The str Type
To understand the basics of working with strings and string data in Rust you must first understand three concepts: slices, dynamically sized type handling in Rust, and the string slice primitive str. This section contains some parallel concepts, so it may be helpful to read through it once and revisit it after reading the Ownership section, as these concepts are closely aligned.
Strings and the string slice pick up where the discussion of Rust’s primitive types ends. You can read more about slices on the Primitives page. The TL;DR is that in Rust slices represent a view into a contiguous collection of elements, and that the str type is guaranteed to contain UTF-8 encoded data.
The string slice in Rust, written as the type str, is a special kind of slice that expands on the general definition of a slice. String slices in Rust are immutable in their usual borrowed form, dynamically sized, and always guaranteed to contain UTF-8 encoded data. These characteristics are what make working with raw string slices difficult and are why Rust includes the String wrapper for manipulating text. Before talking about the String type, let’s explore the implications for the raw string slice type str.
The str type is a dynamically sized primitive type. Because Rust cannot know the size of a dynamically sized type (DST) at compile time, values of such types can only be used behind a pointer of some kind, such as a reference. This means that you cannot create a str value directly. Instead, the string slice is most commonly seen in its (immutable) borrowed form, &str, which is a reference to UTF-8 encoded data in memory. Slices are actually just references to a contiguous sequence of elements within a collection, such as a Vec. Slices do not have ownership because they are references. String slices can be declared and bound to variables. Typically speaking, we aren’t going to declare slices unless a function or method requires them as a type.
let s: &str = "Hello, World"; // String literal bound to a referenced string slice
String literals like the one used in our Hello World program get converted to slices at compile time and are stored in the program’s binary. String literals exist in static memory, something we’ll cover later.
fn main() {
    println!("Hello World!"); // String slice as a literal
}
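To make the “dynamically sized” point concrete, here is a minimal sketch. The helper name byte_len is made up for illustration; the size checks assume the usual layout of Rust references, where a reference to an unsized type carries a length next to the address.

```rust
use std::mem::size_of;

/// Hypothetical helper: accepts any borrowed string slice and returns its length in bytes.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // A `&str` is a "fat" pointer: an address plus a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // A reference to a sized type is a single pointer.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());

    // String literals have type `&'static str` and live in the program binary.
    let greeting: &'static str = "Hello, World";
    println!("{} is {} bytes long", greeting, byte_len(greeting));

    // The next line would not compile, because `str` has no size known at compile time:
    // let bare: str = *greeting;
}
```

The length stored in the reference is what lets Rust work with a str even though the type itself has no fixed size.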
Slices are immutable, which is not always convenient or desirable. What if we want to work with a string of unknown size such as an input? What if we want to change the string? This is where the String (with a capital S) type comes in.
The String wrapper
The String type was added to the standard library to deal with string manipulation, though like most things in Rust a String binding is immutable by default. The String type has ownership of its data. It is actually a wrapper around a Vec<u8> with some additional methods and guard rails that keep the contents valid UTF-8. This String type is more in line with what we typically think of as a string from other languages. It shares many operations with Vec<T> due to this relationship.
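The relationship between String and a byte vector can be seen directly. The following is a small sketch, with to_raw_bytes as a made-up helper name; it relies only on the standard into_bytes(), len(), and chars() methods.

```rust
/// Hypothetical helper: consumes a String and returns its underlying UTF-8 bytes.
fn to_raw_bytes(s: String) -> Vec<u8> {
    // No copy is needed: the String hands over the buffer it already owns.
    s.into_bytes()
}

fn main() {
    // "\u{e9}" is the letter e with an acute accent, which takes two bytes in UTF-8.
    let s = String::from("h\u{e9}llo");

    // `len` counts bytes, not characters.
    println!("bytes: {}, chars: {}", s.len(), s.chars().count());

    let bytes = to_raw_bytes(s);
    println!("{:?}", bytes);
}
```

Note that len() reports six bytes for five characters here, which is why byte positions and character positions should not be treated as interchangeable.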
Creating new Strings
Declaring and binding a String variable is easy with the new() method.
let mut s: String = String::new();
We can also create a string using a literal or a variable with the from() method.
let s: String = String::from("Hello"); // String from a literal
let mut sp = String::from(&s); // String from a reference to another String
We can then use any of the handful of methods on our new String variable. To add characters we can “push” information to it as a &str.
sp.push_str(" world"); //We can only push to mutable variables!
All together with a print it may look something like this.
let s = String::from("Hello");
let mut sp = String::from(&s);
sp.push_str(" world");
println!("{}\n{}", s, sp);

Hello
Hello world
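For completeness, here is a small sketch that combines push_str() with its single-character sibling push(). The function name build_greeting is invented for this example, and the with_capacity() call is an optional optimization rather than a requirement.

```rust
/// Hypothetical helper: builds a greeting by appending to an owned, mutable String.
fn build_greeting(name: &str) -> String {
    // Reserve room up front: "Hello, " is 7 bytes and '!' is 1 byte.
    let mut out = String::with_capacity(8 + name.len());
    out.push_str("Hello, "); // append a &str
    out.push_str(name);
    out.push('!'); // append a single char
    out
}

fn main() {
    println!("{}", build_greeting("world"));
}
```

Both methods need a mutable binding, which is why out is declared with let mut.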
It is also possible to use the to_string() method on any type that implements the Display trait to create a new string. For string slices, the to_string() method produces the same result as the from() method. The choice between from() and to_string() is up to you and may depend on style or readability. Most of this text uses the from() method because it just looks cleaner to the author’s eye. Whatever you choose, stay consistent!
let data = "Im literal";
let s = data.to_string(); // Works with variables
let sp = "Im also literal".to_string(); // Also works with literals
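Because to_string() comes from the Display trait, it also works on non-string types. The sketch below uses a made-up Point struct to show this; only the Display implementation is needed for to_string() to become available.

```rust
use std::fmt;

/// Hypothetical type used only to illustrate `Display`.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    // Built-in numeric types implement Display.
    let n: String = 42.to_string();
    // Our own type gets `to_string()` for free once Display is implemented.
    let p: String = Point { x: 1, y: 2 }.to_string();
    println!("{n} {p}");
}
```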
Pulling slices from Strings
It is possible to reference the data inside a String.
pub fn caller() {
    // Creates a String
    let name: String = String::from("Peter");
    // Creates a slice using explicit dereferencing
    let name_dereference: &str = &(*name)[..];
    // Creates a slice without dereferencing
    let name_reference: &str = &name[..];
    // Creates a slice of the first 4 bytes
    let first_four: &str = &name[..4];
    println!("First four bytes of {} is {}", name, first_four);
    print_greeting(name_reference);
}

fn print_greeting(name: &str) {
    println!("Hello, {}", name)
}
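In everyday code the full-range slice syntax is rarely written out. As a sketch, with shout as an invented function name, the following shows three equivalent ways to hand string data to a function that takes a &str.

```rust
/// Hypothetical function that only needs to read string data.
fn shout(name: &str) -> String {
    name.to_uppercase()
}

fn main() {
    let owned = String::from("Peter");

    let a = shout(&owned); // a &String is coerced to &str automatically
    let b = shout(owned.as_str()); // explicit conversion with as_str()
    let c = shout("Peter"); // a literal is already a &str

    println!("{a} {b} {c}");
    // `owned` is still usable here because it was only borrowed.
    println!("{owned}");
}
```

Taking &str as the parameter type is the more flexible choice, since callers can pass either a String reference or a literal.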
Converting Strings to Vectors
Character encoding is a deep, dark abyss. Rust likes UTF-8 and so do I. Let’s look at how to convert Strings to byte vectors (Vec<u8>) and back. Generally speaking, if you’re working with ASCII characters and performance is critical, going through as_bytes() is direct and efficient. Going the other way, Rust has a from_utf8() method, which checks whether the bytes are valid UTF-8 and returns an error if they are not.
let s: String = "Peter™".to_string();
// Convert the whole string to a Vec<u8>
let bv1: Vec<u8> = s.as_bytes().to_vec();
// Takes the first 5 bytes
let bv2: Vec<u8> = s[..5].as_bytes().to_vec();
println!("{:?}\n{:?}", bv1, bv2);
// Convert bytes back to a String
// Lossy just inserts an unknown character for invalid UTF-8
let s2 = String::from_utf8_lossy(&bv1);
println!("{}", s2);
// Stock method does a UTF-8 validation check and requires error handling
if let Ok(s3) = String::from_utf8(bv1.clone()) {
    println!("{s3}")
};

[80, 101, 116, 101, 114, 226, 132, 162]
[80, 101, 116, 101, 114]
Peter™
Peter™
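The example above only exercises the happy path. Here is a hedged sketch of what handling invalid bytes might look like; the decode function is a made-up name, and falling back to lossy decoding is just one possible policy.

```rust
/// Hypothetical helper: try strict decoding first, fall back to lossy decoding.
fn decode(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => {
            // The error keeps the original bytes, so nothing is lost.
            eprintln!("invalid UTF-8 after {} valid bytes", e.utf8_error().valid_up_to());
            String::from_utf8_lossy(e.as_bytes()).into_owned()
        }
    }
}

fn main() {
    println!("{}", decode(vec![80, 101, 116, 101, 114])); // valid ASCII
    println!("{}", decode(vec![80, 0xFF])); // 0xFF can never appear in UTF-8
}
```

In the invalid case the offending byte is replaced with U+FFFD, the Unicode replacement character.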
Updating Strings
There are several ways to modify strings in Rust. This section covers the + operator, the format! macro, and the push() and push_str() methods.
The + operator uses the add() method under the hood. It is important to see what this function is doing, because it results in some funky behaviors. Let’s take a look at the add() function’s signature for details.
fn add(self, s: &str) -> String {
The signature takes ownership of self (otherwise it would be &self), which in this case is the String on the left-hand side. The function also takes a string slice reference (&str) and appends it to that String. Because add() takes self by value and String does not implement the Copy trait, the left-hand String is moved into the call and can no longer be used afterward. This makes the operation more efficient than copying, as we’ve already seen. When the right-hand side is a &String, the compiler applies a process called deref coercion to convert it to a &str, as if you had written &s[..] (we’ll cover this process later); string literals are already &str values. Additionally, the underlying add() function only borrows the right-hand side, so that value is still valid after any concatenation operations.
let s: String = String::from("Hello"); // Base String
let s_1 = s.clone();
let s2: &str = &String::from(" world"); // First example creates explicit &str
let concat = s + s2; // Moves s, s2 is already a reference
let concat2 = s_1 + s2; // Required cloned s due to move, s2 is a reference
println!("s + s2: {}\ns clone with reused s2: {}", concat, concat2);

let s3 = String::from("Hello"); // New base String
let s3_1 = s3.clone();
let s4 = String::from(" animals"); // String to add to base String
let s5 = " wildlings"; // Literal to add to base String
let concat3 = s3 + &s4; // Compiler can coerce &String to &str
let concat4 = s3_1 + s5; // Required s3 clone due to move; a literal is already a &str
println!("s3 + s4: {}\ns3 clone + s5 literal: {}", concat3, concat4);

s + s2: Hello world
s clone with reused s2: Hello world
s3 + s4: Hello animals
s3 clone + s5 literal: Hello wildlings
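The same move-then-borrow rules apply when several + operations are chained. A minimal sketch, using an invented function name join_with_dash:

```rust
/// Hypothetical helper: takes ownership of `left` and only borrows `right`.
fn join_with_dash(left: String, right: &str) -> String {
    // Each `+` consumes the String on its left and returns a new owner,
    // so the chain reuses one growing buffer instead of copying `left`.
    left + "-" + right
}

fn main() {
    let a = String::from("tic");
    let b = String::from("tac");

    let joined = join_with_dash(a, &b);
    println!("{joined}");

    // `b` was only borrowed, so it is still valid. `a` was moved and is not.
    println!("{b}");
}
```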
Concatenating strings with the + operator can get unwieldy pretty quickly. There may be times when we want to format a string from a bunch of different elements, such as the time. For these applications we can use the format! macro.
let mut time_now: Time = Time::time_constructor(Time::get_system_time());
let period: String = Time::set_12h_period(&mut time_now, &offset);
let hour: i32 = Time::set_timezone(&mut time_now);
let minute: String = time_now.format_minute();
let second: String = time_now.format_second();
return format!("{hour}:{minute}:{second} {period}");
3:47:22 PM
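The snippet above depends on a Time type that is not shown here. As a self-contained stand-in, the following sketch formats a 12-hour clock string from plain integers; the function format_12h and its parameters are assumptions made for illustration and are not part of the original Time API.

```rust
/// Hypothetical helper: formats a 24-hour time as a 12-hour clock string.
fn format_12h(hour24: u32, minute: u32, second: u32) -> String {
    let period = if hour24 < 12 { "AM" } else { "PM" };
    // Map 0 and 12 to 12; everything else to 1 through 11.
    let hour = match hour24 % 12 {
        0 => 12,
        h => h,
    };
    // `:02` pads minutes and seconds to two digits with leading zeros.
    format!("{hour}:{minute:02}:{second:02} {period}")
}

fn main() {
    println!("{}", format_12h(15, 47, 22));
}
```

Unlike the + operator, format! only borrows its arguments, so every value passed to it remains usable afterward.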
Note that indexing or slicing into strings by position is often a bad idea because it’s not clear what the return type should be. As discussed, this could be a simple byte value, but it could also be a Unicode scalar value, a grapheme cluster, or a string slice. For example, the following code works because every character is a 1-byte ASCII character.
let s = String::from("Peter Schmitz");
let given = &s[..5];
let family = &s[6..];
println!("Given name: {}\nFamily name: {}", given, family);

Given name: Peter
Family name: Schmitz
However, what if we want to access 2-byte characters?
let hello = "Здравствуйте";
let first_three = &hello[..3];
println!("First three characters: {}", first_three);

thread 'main' panicked at src/cncpt/types/compound/strings.rs:138:29:
byte index 3 is not a char boundary; it is inside 'д' (bytes 2..4) of `Здравствуйте`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
To handle this, Rust provides the bytes() and chars() methods to access elements of a string.
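As a sketch of how these methods avoid the panic shown earlier, the following example takes the first few characters by locating a valid byte offset with char_indices(). The helper name first_chars is made up, and it counts Unicode scalar values, not grapheme clusters, so it can still split a visible “letter” that is built from several scalar values.

```rust
/// Hypothetical helper: returns the first `n` chars (Unicode scalar values) of `s`.
fn first_chars(s: &str, n: usize) -> &str {
    // Find the byte offset where the (n+1)-th char starts and cut there.
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // the string has n chars or fewer, so return all of it
    }
}

fn main() {
    let hello = "Здравствуйте";

    println!("chars: {}", hello.chars().count());
    println!("bytes: {}", hello.bytes().count());
    println!("first three: {}", first_chars(hello, 3));

    // `get` is the non-panicking form of slicing: it returns None
    // when a range boundary falls inside a character.
    println!("{:?}", hello.get(..3));
    println!("{:?}", hello.get(..4));
}
```

Using get() instead of direct slicing is a reasonable default whenever the byte offsets come from outside your control.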