In many programming languages, manipulating strings is a crucial aspect of writing applications. The Rust programming language, known for its performance and safety, is no different. This article provides an in-depth exploration of strings in Rust, including the special notations and "tricks" that could simplify your coding experience.
Understanding Basic Strings in Rust
At its most basic level, a string in Rust is a sequence of Unicode scalar values encoded as UTF-8 bytes. String literals are written between double quotes ("").
let s = "Hello, World!";
In this code snippet, s is a string that contains the text "Hello, World!".
String Literals and String Slices
In Rust, a string literal is a string slice (&str) that points to data stored in the compiled program's binary. That data is read-only, so the literal is immutable, and it stays valid for the entire run of the program. This is why string literals have the type &'static str and are sometimes referred to as 'static strings'.
let s: &'static str = "Hello, World!";
Here, s is a string slice pointing to the string literal "Hello, World!".
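To make the difference between a borrowed literal and an owned string concrete, here is a minimal sketch. The names GREETING and first_bytes are hypothetical and exist only for this illustration; they are not part of the standard library.

```rust
// Hypothetical constant for illustration; a literal always has type &'static str.
const GREETING: &str = "Hello, World!";

/// Returns the first `n` bytes of `s` as a sub-slice (no copying).
/// Standard slicing panics if `n` does not fall on a UTF-8 character boundary.
fn first_bytes(s: &str, n: usize) -> &str {
    &s[..n]
}

fn main() {
    // Borrowing from the literal: no allocation takes place.
    let hello = first_bytes(GREETING, 5);
    assert_eq!(hello, "Hello");

    // Converting to an owned String copies the bytes to the heap,
    // after which the text can be modified.
    let mut owned: String = GREETING.to_string();
    owned.push_str(" Again!");
    assert_eq!(owned, "Hello, World! Again!");

    println!("{hello} / {owned}");
}
```

The literal itself can never be changed in place; anything that needs mutation goes through an owned String as shown.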
Raw Strings
In Rust, an r prefix before a string literal denotes a raw string. Raw strings do not process escape sequences: a backslash is just a backslash, and the text is stored exactly as written. This is helpful when you want to avoid escaping backslashes, for example in regular expressions or Windows file paths.
let s = r"C:\Users\YourUser\Documents";
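As a quick check of the claim above, the following sketch compares a raw string with its escaped equivalent. The function names escaped_path and raw_path are made up for this example.

```rust
/// The same path written with escaped backslashes (hypothetical helper).
fn escaped_path() -> &'static str {
    "C:\\Users\\YourUser\\Documents"
}

/// The same path written as a raw string (hypothetical helper).
fn raw_path() -> &'static str {
    r"C:\Users\YourUser\Documents"
}

fn main() {
    // Both notations produce identical string data.
    assert_eq!(escaped_path(), raw_path());

    // In a raw string, \n is two characters: a backslash and the letter n.
    assert_eq!(r"\n".len(), 2);
    // In a regular string, \n is a single newline character.
    assert_eq!("\n".len(), 1);

    println!("{}", raw_path());
}
```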
Byte Strings
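Rust also has the concept of byte strings. They look like text strings, but they produce raw bytes instead of UTF-8 text: a byte string literal has the type &[u8; N] (a reference to a fixed-size byte array) rather than &str. You create one by prefixing a string literal with b. Only ASCII characters and escape sequences are allowed inside it.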
let bs: &[u8; 4] = b"test"; // bs is a byte array: [116, 101, 115, 116]
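The sketch below shows one way a byte string might be used as a slice and converted back to text. The helper as_text is a hypothetical name; std::str::from_utf8 is the standard library function it wraps.

```rust
use std::str;

/// Tries to view a byte slice as UTF-8 text (hypothetical helper).
/// Returns None when the bytes are not valid UTF-8.
fn as_text(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let bs: &[u8; 4] = b"test";

    // A reference to a byte array coerces to a byte slice.
    let slice: &[u8] = bs;
    assert_eq!(slice, &[116u8, 101, 115, 116]);

    // ASCII bytes are valid UTF-8, so the conversion succeeds.
    assert_eq!(as_text(bs), Some("test"));

    // A \xNN escape can produce any byte; 0xFF on its own is not valid UTF-8.
    assert_eq!(as_text(b"\xFF"), None);

    // A single byte can be written as a byte literal.
    assert_eq!(b'A', 65u8);
}
```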
Raw Byte Strings
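A raw byte string combines raw strings and byte strings: it produces bytes, and it does not process escape sequences. It is useful for byte patterns that contain many backslashes. Note that because escapes are not processed, a raw byte string cannot use \xNN to embed an arbitrary byte; for bytes that are not valid UTF-8, use a regular byte string such as b"\xFF". A raw byte string is created by prefixing a string literal with br.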
let raw_bs = br"\xFF"; // raw_bs is a byte array: [92, 120, 70, 70]
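To see how the br prefix changes the result, this small sketch places the same source text in a byte string and in a raw byte string. The function names are hypothetical.

```rust
/// Regular byte string: \xFF is an escape for the single byte 255.
fn escaped_byte() -> &'static [u8] {
    b"\xFF"
}

/// Raw byte string: the four characters \, x, F, F are stored as-is.
fn raw_bytes() -> &'static [u8] {
    br"\xFF"
}

fn main() {
    assert_eq!(escaped_byte(), &[255u8]);
    assert_eq!(raw_bytes(), &[92u8, 120, 70, 70]);

    // The raw form is equivalent to escaping the backslash by hand.
    assert_eq!(raw_bytes(), b"\\xFF");

    println!("{:?} vs {:?}", escaped_byte(), raw_bytes());
}
```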
Escaping in Raw Strings
If you need to include quotation marks in a raw string, add one or more # symbols between the r and the opening quote, and the same number after the closing quote.
let s = r#"This string contains "quotes"."#;
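The following sketch illustrates when one hash is enough and when more are needed. The function names one_hash and two_hashes are invented for this example.

```rust
/// One # is enough when the text contains plain double quotes.
fn one_hash() -> &'static str {
    r#"This string contains "quotes"."#
}

/// Two # are needed when the text itself contains a quote followed by one #.
fn two_hashes() -> &'static str {
    r##"A "# inside needs two hashes outside."##
}

fn main() {
    // The raw form equals the escaped form of the same text.
    assert_eq!(one_hash(), "This string contains \"quotes\".");

    // The sequence "# survives because the delimiter is now "##.
    assert!(two_hashes().contains("\"#"));

    println!("{}\n{}", one_hash(), two_hashes());
}
```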
Multiline Raw Strings
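Raw strings can be multiline. The content starts immediately after the opening quote and ends immediately before the closing quote, so any line break right after the opening quote or right before the closing quote is part of the string.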
let s = r####"
This string contains "quotes".
It also spans multiple lines.
"####;
Keep in mind that the number of hash symbols (#) before the opening quote and after the closing quote must be the same. Hashes are only required when the string itself contains a double quote; use enough of them that a quote followed by that many hashes never appears inside the string. Furthermore, within a raw string, tabs, line breaks, and other whitespace can be typed directly without escape sequences. (Thanks to Nirmalya Sengupta for his kind suggestion in the comments below.)
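One detail of multiline raw strings that is easy to miss is that the line breaks next to the delimiters are kept. The sketch below checks this; multiline is a hypothetical helper name, and calling trim is just one possible way to drop the surrounding line breaks.

```rust
/// A multiline raw string; the text begins right after the opening quote.
fn multiline() -> &'static str {
    r#"
This string contains "quotes".
It also spans multiple lines.
"#
}

fn main() {
    let s = multiline();

    // The line break after the opening quote is part of the string...
    assert!(s.starts_with('\n'));
    // ...and so is the one before the closing quote.
    assert!(s.ends_with('\n'));

    // Trimming removes the surrounding whitespace, leaving two lines of text.
    let trimmed = s.trim();
    assert_eq!(trimmed.lines().count(), 2);

    println!("{trimmed}");
}
```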
Unicode Strings
String literals in Rust can also contain any valid Unicode characters.
let s = "Hello, 世界!";
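Because strings are stored as UTF-8, the number of bytes and the number of characters can differ once non-ASCII text is involved. Here is a minimal sketch of that difference; byte_len and char_count are hypothetical wrappers around the standard len and chars methods.

```rust
/// Length of the string in bytes (what str::len reports).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values in the string.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let s = "Hello, 世界!";

    // "Hello, " is 7 bytes, each CJK character is 3 bytes, "!" is 1 byte.
    assert_eq!(byte_len(s), 14);
    assert_eq!(char_count(s), 10);

    // Slicing works on byte offsets and must land on character boundaries.
    assert_eq!(&s[7..10], "世");
    // The checked variant returns None for an offset inside a character.
    assert_eq!(s.get(7..8), None);
}
```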
Character Escapes
Regular (non-raw) string literals support several escape sequences:
\\ - Backslash
\" - Double quote
\n - Newline
\r - Carriage return
\t - Tab
\0 - Null
There are also Unicode escapes:
\u{7FFF} - Unicode character (variable length, up to 6 digits)
\u{1F600} - Unicode emoji
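To confirm what each escape sequence actually produces, the sketch below maps a string to its Unicode code points. The helper code_points is a hypothetical name used only here.

```rust
/// Returns the Unicode scalar value of every character in `s`
/// (hypothetical helper for inspecting escapes).
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Backslash, double quote, newline, carriage return, tab, null.
    assert_eq!(code_points("\\\"\n\r\t\0"), vec![92, 34, 10, 13, 9, 0]);

    // A Unicode escape yields exactly one character, whatever its digit count.
    assert_eq!(code_points("\u{7FFF}"), vec![0x7FFF]);
    assert_eq!(code_points("\u{1F600}"), vec![0x1F600]);

    // The escape and the character typed directly are the same string.
    assert_eq!("\u{1F600}", "😀");
    // In UTF-8 this emoji takes four bytes.
    assert_eq!("\u{1F600}".len(), 4);
}
```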
Conclusion
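In summary, Rust provides powerful and flexible tools for working with strings. From raw and byte strings to Unicode and escape sequences, the literal forms covered here let you choose the notation that best fits the data you are writing.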
Top comments (2)
Very useful!
It may be beneficial to your readers if you mention, that in multiline raw strings, the number of '#' is important because:
Playground
Just a suggestion.
Great point. Updated! Thanks a lot.