The Cow type is a mystery even for some intermediate-level Rust developers. Despite being defined as a simple two-variant enum
pub enum Cow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}
it requires the developer to understand ownership and lifetimes, as well as the equally mysterious Borrow and ToOwned traits. As a result, programmers avoid using Cow, which often leads to extra memory allocations (which are not cheap) and less efficient software.
What are the situations when you might consider using Cow? Why does it have such a strange name? Let's try to find some answers today!
A function rarely modifying the data
Let's start with the most common and straightforward use case for the Cow type. It is a good illustration of the situation in which most developers (including me!) encounter Cow for the first time.
Consider the following function, which accepts borrowed data (in this case &str) and returns a modified copy:
fn remove_whitespaces(s: &str) -> String {
    s.to_string().replace(' ', "")
}
fn main() {
    let value = remove_whitespaces("Hello world!");
    println!("{}", value);
}
As you can see, it does nothing but remove all spaces from the string. What is wrong with it? What if the string contains no spaces in 99.9% of calls? Or consider a slight variation of the function in which spaces are removed only when some other condition holds.
In such cases, we could avoid the to_string() call and the creation of an unnecessary copy of the string. However, to implement such logic we can use neither String nor &str as the return type: the former forces a memory allocation and the latter is immutable.
This is the moment when Cow plays its role. We can return Cow::Owned when the string is modified and Cow::Borrowed(s) otherwise:
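use std::borrow::Cow;
fn remove_whitespaces(s: &str) -> Cow<str> {
    if s.contains(' ') {
        // str::replace already returns a new String, so no extra to_string() is needed
        Cow::Owned(s.replace(' ', ""))
    } else {
        // Nothing to remove: return the original slice without allocating
        Cow::Borrowed(s)
    }
}
fn main() {
    let value = remove_whitespaces("Hello world!");
    println!("{}", value);
}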
The nice thing about Cow<str> is that it can always be dereferenced into &str later or converted into a String by calling into_owned. The into_owned method allocates memory only if the string was originally borrowed; an already owned string is returned as is.
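To make the two outcomes visible, here is a minimal sketch that checks which variant comes back and then uses the result both through dereferencing and through into_owned. The function name remove_spaces is an assumption chosen for this illustration; it follows the same logic as the function above.

```rust
use std::borrow::Cow;

// Hypothetical variant of the function above, used only for this illustration.
fn remove_spaces(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

fn main() {
    let untouched = remove_spaces("Hello");
    let modified = remove_spaces("Hello world!");

    // No spaces: the input slice is returned without any allocation.
    assert!(matches!(untouched, Cow::Borrowed(_)));
    // Spaces found: a new String was allocated.
    assert!(matches!(modified, Cow::Owned(_)));

    // Cow<str> dereferences to &str, so str methods work directly.
    assert_eq!(untouched.len(), 5);

    // into_owned moves the String out of Cow::Owned without copying it again.
    let owned: String = modified.into_owned();
    assert_eq!(owned, "Helloworld!");
}
```

Callers that only read the result never need to know which variant they received.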
A struct optionally owning the data
We often need to store references inside structs. Without that option, we are likely to end up cloning data unnecessarily.
Consider
struct User<'a> {
    first_name: &'a str,
    last_name: &'a str,
}
Wouldn't it be nice to be able to create a user with a static lifetime, User<'static>, that owns its data? This way we could implement a function do_something_with_user(user) that accepts the same struct regardless of whether the data is cloned or borrowed. Unfortunately, the only way to create a User<'static> is by using &'static str.
But what if we have a String? We can solve the problem by storing not &'a str but Cow<'a, str> inside the struct:
use std::borrow::Cow;
struct User<'a> {
    first_name: Cow<'a, str>,
    last_name: Cow<'a, str>,
}
This way, we can construct both owned and borrowed versions of the User struct:
impl<'a> User<'a> {
    pub fn new_owned(first_name: String, last_name: String) -> User<'static> {
        User {
            first_name: Cow::Owned(first_name),
            last_name: Cow::Owned(last_name),
        }
    }
    pub fn new_borrowed(first_name: &'a str, last_name: &'a str) -> Self {
        Self {
            first_name: Cow::Borrowed(first_name),
            last_name: Cow::Borrowed(last_name),
        }
    }
    pub fn first_name(&self) -> &str {
        &self.first_name
    }
    pub fn last_name(&self) -> &str {
        &self.last_name
    }
}
fn main() {
    // Static lifetime as it owns the data
    let user: User<'static> = User::new_owned("James".to_owned(), "Bond".to_owned());
    println!("Name: {} {}", user.first_name, user.last_name);
    // Static lifetime as it borrows 'static data
    let user: User<'static> = User::new_borrowed("Felix", "Leiter");
    println!("Name: {} {}", user.first_name, user.last_name);
    let first_name = "Eve".to_owned();
    let last_name = "Moneypenny".to_owned();
    // Non-static lifetime as it borrows the data
    let user = User::new_borrowed(&first_name, &last_name);
    println!("Name: {} {}", user.first_name, user.last_name);
}
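The do_something_with_user function mentioned earlier is not shown in the article, so the sketch below is only one possible shape for it. It also adds a hypothetical into_owned method (not part of the original struct) that turns a borrowed User<'a> into a User<'static>, cloning only the fields that are still borrowed.

```rust
use std::borrow::Cow;

struct User<'a> {
    first_name: Cow<'a, str>,
    last_name: Cow<'a, str>,
}

impl<'a> User<'a> {
    // Hypothetical helper: detaches the user from the borrowed data.
    // Fields that are already owned are moved, not cloned.
    fn into_owned(self) -> User<'static> {
        User {
            first_name: Cow::Owned(self.first_name.into_owned()),
            last_name: Cow::Owned(self.last_name.into_owned()),
        }
    }
}

// Assumed signature: works for borrowed and owned users alike.
fn do_something_with_user(user: &User<'_>) -> String {
    format!("{} {}", user.first_name, user.last_name)
}

fn main() {
    let first = String::from("Eve");
    let last = String::from("Moneypenny");

    let borrowed = User {
        first_name: Cow::Borrowed(first.as_str()),
        last_name: Cow::Borrowed(last.as_str()),
    };
    assert_eq!(do_something_with_user(&borrowed), "Eve Moneypenny");

    // After the conversion the user no longer depends on `first` and `last`.
    let owned: User<'static> = borrowed.into_owned();
    drop(first);
    drop(last);
    assert_eq!(do_something_with_user(&owned), "Eve Moneypenny");
}
```

A conversion like this is handy when a borrowed value has to outlive its source, for example when it is moved into another thread.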
A clone on write struct
The examples above illustrate only one side of Cow: the ability to represent data whose borrowed or owned status is determined not at compile time but at runtime.
But why was it named Cow then? Cow stands for clone on write (the Rust flavor of the better-known term copy on write).
The true power of Cow comes with the to_mut method. If the Cow is owned, it simply returns a mutable reference to the underlying data; if it is borrowed, the data is first cloned into the owned form.
This allows you to build interfaces around structures that lazily store references to the data and clone it only when a mutation is required for the first time.
Consider code that receives a buffer of data in the form of &[u8]. We would like to pass it through some logic that conditionally modifies the data (e.g. appends a few bytes) and then consume the buffer as &[u8]. As in the example above, we can't keep the buffer as &[u8] because we wouldn't be able to modify it, but converting it to Vec<u8> would make a copy every time.
We can achieve the required behavior by representing the data as Cow<[u8]>:
use std::borrow::Cow;
struct LazyBuffer<'a> {
    data: Cow<'a, [u8]>,
}
impl<'a> LazyBuffer<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self {
            data: Cow::Borrowed(data),
        }
    }
    pub fn data(&self) -> &[u8] {
        &self.data
    }
    pub fn append(&mut self, data: &[u8]) {
        self.data.to_mut().extend(data)
    }
}
This way we can pass borrowed data around without cloning up until the moment when (and if) we need to modify it:
fn main() {
    let data = vec![0u8; 10];
    // No memory copied yet
    let mut buffer = LazyBuffer::new(&data);
    println!("{:?}", buffer.data());
    // The data is cloned
    buffer.append(&[1, 2, 3]);
    println!("{:?}", buffer.data());
    // The data is not cloned on further attempts
    buffer.append(&[4, 5, 6]);
    println!("{:?}", buffer.data());
}
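The comments in main describe when the clone happens, but nothing in the output proves it. One way to check, sketched below, is to compare the address of the buffer's data with the address of the original vector. The is_borrowed helper is a hypothetical addition for this illustration, and extend_from_slice is used in place of extend purely as a matter of taste.

```rust
use std::borrow::Cow;

struct LazyBuffer<'a> {
    data: Cow<'a, [u8]>,
}

impl<'a> LazyBuffer<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data: Cow::Borrowed(data) }
    }

    fn data(&self) -> &[u8] {
        &self.data
    }

    fn append(&mut self, data: &[u8]) {
        // to_mut clones the borrowed slice into a Vec<u8> on the first call only.
        self.data.to_mut().extend_from_slice(data)
    }

    // Hypothetical helper, added only to observe the internal state.
    fn is_borrowed(&self) -> bool {
        matches!(self.data, Cow::Borrowed(_))
    }
}

fn main() {
    let data = vec![0u8; 10];
    let mut buffer = LazyBuffer::new(&data);

    // Still pointing at the original allocation.
    assert!(buffer.is_borrowed());
    assert_eq!(buffer.data().as_ptr(), data.as_ptr());

    buffer.append(&[1, 2, 3]);

    // The first mutation switched the buffer to its own allocation.
    assert!(!buffer.is_borrowed());
    assert_ne!(buffer.data().as_ptr(), data.as_ptr());
    assert_eq!(buffer.data().len(), 13);

    // The original vector is untouched.
    assert_eq!(data.len(), 10);
}
```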
Keep your own type inside it
Most likely you will end up using Cow<str> or Cow<[u8]>, but there are cases when you might want to store your own type inside it.
In order to use Cow with a user-defined type, you need to implement owned and borrowed versions of it. The owned and borrowed versions must be tied together by the following trait bounds:
- The owned version should implement the Borrow trait to produce a reference to the borrowed type.
- The borrowed version should implement the ToOwned trait to produce the owned type.
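For an ordinary sized type that implements Clone, both bounds are already satisfied by the standard library's blanket implementations (ToOwned for every T: Clone, and Borrow<T> for every T), so such a type can be placed in a Cow without any extra work. A minimal sketch, in which Config and with_retries are hypothetical names chosen for illustration:

```rust
use std::borrow::Cow;

// Hypothetical user-defined type; deriving Clone is all that Cow needs here.
#[derive(Clone, Debug, PartialEq)]
struct Config {
    retries: u32,
}

// Returns the base configuration untouched unless an override actually changes it.
fn with_retries(base: &Config, retries: Option<u32>) -> Cow<'_, Config> {
    let mut config = Cow::Borrowed(base);
    if let Some(n) = retries {
        if n != base.retries {
            // The clone happens here, and only here.
            config.to_mut().retries = n;
        }
    }
    config
}

fn main() {
    let base = Config { retries: 3 };

    let same = with_retries(&base, None);
    assert!(matches!(same, Cow::Borrowed(_)));

    let changed = with_retries(&base, Some(5));
    assert!(matches!(changed, Cow::Owned(_)));
    assert_eq!(changed.retries, 5);
    assert_eq!(base.retries, 3);
}
```

Manual implementations of the two traits are needed only when the borrowed and owned forms are different types, as with str and String.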
Implementing the Borrow trait is tricky and often requires unsafe code. Indeed, in order for the fn borrow(&self) -> &Borrowed; function to return a reference to the Borrowed type, this reference should either point to data stored inside self or be produced unsafely.
In practice this often means that the borrowed type is an unsized (also known as dynamically sized) type. The size of such types is not known at compile time, so they can only exist behind a pointer or a reference.
Have you ever wondered why we use &str everywhere and almost never use str? You can't find the definition of the str type in the standard library; it is a primitive type (part of the language). Since str is a dynamically sized type, it can only be used through a pointer type, such as &str. The trait object dyn T is another example of a dynamically sized type.
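The difference shows up in the size of the references themselves. The sketch below relies on how current Rust implementations represent references: a reference to a sized type is a single pointer, while a reference to a dynamically sized type also carries a length or a vtable pointer.

```rust
use std::fmt::Debug;
use std::mem::{size_of, size_of_val};

fn main() {
    // A reference to a sized type is a single pointer.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());

    // &str is a "fat" pointer: data pointer plus length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // &dyn Trait is also fat: data pointer plus vtable pointer.
    assert_eq!(size_of::<&dyn Debug>(), 2 * size_of::<usize>());

    // The size of the str value itself is only known at runtime.
    let s: &str = "hello";
    assert_eq!(size_of_val(s), 5);

    // `let copy: str = *s;` would not compile: a local variable must have
    // a size known at compile time.
}
```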
Imagine you would like to implement your own version of the String and str types.
use std::borrow::{Borrow, Cow};
use std::ops::Deref;
#[derive(Debug)]
struct MyString {
    data: String,
}
#[derive(Debug)]
#[repr(transparent)]
struct MyStr {
    data: str,
}
Since str is unsized, so is MyStr. You can then tie MyString and MyStr together in the same way String and str are tied:
impl Borrow<MyStr> for MyString {
    fn borrow(&self) -> &MyStr {
        // SAFETY: MyStr is #[repr(transparent)] over str, so both have the same layout
        unsafe { &*(self.data.as_str() as *const str as *const MyStr) }
    }
}
impl ToOwned for MyStr {
    type Owned = MyString;
    fn to_owned(&self) -> MyString {
        MyString {
            data: self.data.to_owned(),
        }
    }
}
The unsafe pointer cast inside the borrow method has probably drawn your attention. While it looks scary, it is a usual pattern in the standard library (have a look at, e.g., the Path type implementation). Since MyStr is a single-field struct annotated with #[repr(transparent)], it is guaranteed to have the same memory layout as its only field, str. It means we can safely cast a valid pointer to str to a pointer to MyStr and then convert it to a reference.
We could also optionally implement the Deref trait for convenience, and then store MyString and MyStr in a Cow, taking all the advantages it provides.
impl Deref for MyString {
    type Target = MyStr;
    fn deref(&self) -> &Self::Target {
        self.borrow()
    }
}
fn main() {
    let data = MyString { data: "Hello world".to_owned() };
    let borrowed_cow: Cow<'_, MyStr> = Cow::Borrowed(&data);
    println!("{:?}", borrowed_cow);
    let owned_cow: Cow<'_, MyStr> = Cow::Owned(data);
    println!("{:?}", owned_cow);
}
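The example above only constructs the two variants. The sketch below goes one step further and exercises clone on write with the custom pair. It is self-contained, so the types are repeated; the MyStr::new constructor and the as_str accessor are hypothetical additions that keep the unsafe cast in a single place.

```rust
use std::borrow::{Borrow, Cow};

struct MyString {
    data: String,
}

#[repr(transparent)]
struct MyStr {
    data: str,
}

impl MyStr {
    // Hypothetical safe constructor wrapping the pointer cast.
    fn new(s: &str) -> &MyStr {
        // SAFETY: MyStr is #[repr(transparent)] over str, so the layouts match.
        unsafe { &*(s as *const str as *const MyStr) }
    }

    // Hypothetical accessor used for the assertions below.
    fn as_str(&self) -> &str {
        &self.data
    }
}

impl Borrow<MyStr> for MyString {
    fn borrow(&self) -> &MyStr {
        MyStr::new(&self.data)
    }
}

impl ToOwned for MyStr {
    type Owned = MyString;
    fn to_owned(&self) -> MyString {
        MyString { data: self.data.to_owned() }
    }
}

fn main() {
    let mut cow: Cow<'_, MyStr> = Cow::Borrowed(MyStr::new("Hello"));
    assert!(matches!(cow, Cow::Borrowed(_)));

    // to_mut calls MyStr::to_owned and yields &mut MyString.
    cow.to_mut().data.push_str(" world");

    assert!(matches!(cow, Cow::Owned(_)));
    // Cow dereferences to MyStr through the Borrow implementation.
    assert_eq!(cow.as_str(), "Hello world");
}
```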
Borrow the type as dyn Trait
As mentioned above, a trait object is another example of a dynamically sized type. Somewhat surprisingly, we can use Cow in a similar manner to implement dynamic dispatch, similarly to Box<dyn Trait> and Arc<dyn Trait>.
Consider the following trait and struct implementations:
use std::borrow::{Borrow, Cow};
use std::fmt::Debug;
use std::ops::Deref;
trait MyTrait: Debug {
    fn data(&self) -> &str;
}
#[derive(Debug)]
struct MyString {
    data: String,
}
impl MyTrait for MyString {
    fn data(&self) -> &str {
        &self.data
    }
}
As MyString implements MyTrait, we can borrow &MyString as &dyn MyTrait:
impl<'a> Borrow<dyn MyTrait + 'a> for MyString {
    fn borrow(&self) -> &(dyn MyTrait + 'a) {
        self
    }
}
We can also convert any MyTrait implementation to MyString:
impl ToOwned for dyn MyTrait {
    type Owned = MyString;
    fn to_owned(&self) -> MyString {
        MyString {
            data: self.data().to_owned(),
        }
    }
}
Since we have defined Borrow and ToOwned, we can now put MyString into Cow<dyn MyTrait>:
fn main() {
    let data = MyString { data: "Hello world".to_owned() };
    let borrowed_cow: Cow<'_, dyn MyTrait> = Cow::Borrowed(&data);
    println!("{:?}", borrowed_cow);
    let owned_cow: Cow<'_, dyn MyTrait> = Cow::Owned(data);
    println!("{:?}", owned_cow);
}
The above could be useful to implement, e.g., a mutable vector of trait objects:
fn main() {
    let data = MyString { data: "Hello world".to_owned() };
    let cow1: Cow<'_, dyn MyTrait> = Cow::Borrowed(&data);
    let data = MyString { data: "Hello world".to_owned() };
    let cow2: Cow<'_, dyn MyTrait> = Cow::Owned(data);
    let mut vector: Vec<Cow<'_, dyn MyTrait>> = vec![cow1, cow2];
}
Implement a safe wrapper over an FFI type
The MyString example above is exciting but somewhat artificial. Let's consider a real-life pattern in which you would like to store your own type inside a Cow.
Imagine you are using a C library in your Rust project. Let's say you receive a buffer of data from the C code in the form of a pointer *const u8 and a length usize. Say you would like to pass the data through a layer of Rust logic, possibly modifying it (does it make you think of Cow?). Finally, you might want to access the data (modified or not) in Rust as &[u8] or pass it into another C function as a pointer *const u8 and a length usize. (Here we assume that this C function does not release the memory. If this assumption surprises you, consider reading the article 7 ways to pass a string between 🦀 Rust and C.)
As we would like to avoid cloning the data unnecessarily, we represent the buffer as the following struct:
use std::borrow::{Borrow, Cow};
use std::fmt::{Debug, Formatter};
use std::ops::Deref;
struct NativeBuffer {
    pub ptr: *const u8,
    pub len: usize,
}
This struct does not own its data; it borrows it from the C pointer with an unknown lifetime.
For convenience only, we can implement the traits to access the buffer as a &[u8] slice and print it:
impl Borrow<[u8]> for NativeBuffer {
    fn borrow(&self) -> &[u8] {
        // SAFETY: relies on ptr being valid for reads of len bytes while self is alive
        unsafe {
            std::slice::from_raw_parts(self.ptr, self.len)
        }
    }
}
impl Deref for NativeBuffer {
    type Target = [u8];
    fn deref(&self) -> &Self::Target {
        self.borrow()
    }
}
impl Debug for NativeBuffer {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        let data: &[u8] = self.borrow();
        write!(f, "NativeBuffer {{ data: {:?}, len: {} }}", data, self.len)
    }
}
In order to store the NativeBuffer in the Cow we first need to define the owning version of it:
#[derive(Debug)]
struct OwnedBuffer {
    owned_data: Vec<u8>,
    native_proxy: NativeBuffer,
}
impl ToOwned for NativeBuffer {
    type Owned = OwnedBuffer;
    fn to_owned(&self) -> OwnedBuffer {
        let slice: &[u8] = self.borrow();
        let owned_data = slice.to_vec();
        let native_proxy = NativeBuffer {
            ptr: owned_data.as_ptr(),
            len: owned_data.len(),
        };
        OwnedBuffer {
            owned_data,
            native_proxy,
        }
    }
}
The trick is to borrow the data as a slice and convert it to a Vec. We also need to store a NativeBuffer inside OwnedBuffer. It contains a pointer to the data inside the vector and its length, so that we can implement the Borrow trait:
impl Borrow<NativeBuffer> for OwnedBuffer {
    fn borrow(&self) -> &NativeBuffer {
        &self.native_proxy
    }
}
We can now define the method to mutate the data:
impl OwnedBuffer {
    pub fn append(&mut self, data: &[u8]) {
        self.owned_data.extend(data);
        self.native_proxy = NativeBuffer {
            ptr: self.owned_data.as_ptr(),
            len: self.owned_data.len(),
        };
    }
}
It is important to keep the native buffer pointer and length up to date after every mutation, because extending the vector may reallocate its storage.
We can finally put our borrowed buffer in a Cow and implement the conditional mutation logic, for example:
fn main() {
    // Simulates the data coming across FFI (from C)
    let data = vec![1, 2, 3];
    let ptr = data.as_ptr();
    let len = data.len();
    let native_buffer = NativeBuffer { ptr, len };
    let mut buffer = Cow::Borrowed(&native_buffer);
    // NativeBuffer { data: [1, 2, 3], len: 3 }
    println!("{:?}", buffer);
    // No data cloned
    assert_eq!(buffer.ptr, ptr);
    assert_eq!(buffer.len, len);
    if buffer.len > 1 {
        buffer.to_mut().append(&[4, 5, 6]);
        // OwnedBuffer { owned_data: [1, 2, 3, 4, 5, 6], native_proxy: NativeBuffer { data: [1, 2, 3, 4, 5, 6], len: 6 } }
        println!("{:?}", buffer);
        // Data is cloned
        assert_ne!(buffer.ptr, ptr);
        assert_eq!(buffer.len, len + 3);
    }
    let slice: &[u8] = &buffer;
    // [1, 2, 3, 4, 5, 6]
    println!("{:?}", slice);
}
The buffer is cloned only if its length is greater than 1.
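The last step of the scenario, handing the possibly modified data back to C as a pointer and a length, is not shown above. The sketch below illustrates it with a simpler alternative rather than the NativeBuffer approach: when the caller can promise how long the C memory stays valid, the raw parts can be wrapped in a plain Cow<[u8]>. Both sum_bytes (a Rust stand-in for a C function) and borrow_from_c are hypothetical names introduced for this illustration.

```rust
use std::borrow::Cow;

// Stand-in for a C function that reads the buffer without freeing it.
// It is defined in Rust here only to keep the example self-contained.
extern "C" fn sum_bytes(ptr: *const u8, len: usize) -> u32 {
    // SAFETY: the callers below always pass a valid pointer and length.
    let slice = unsafe { std::slice::from_raw_parts(ptr, len) };
    slice.iter().map(|&b| u32::from(b)).sum()
}

/// Wraps memory received from C without copying it.
///
/// # Safety
/// The caller must guarantee that `ptr` is valid for reads of `len` bytes
/// and that the memory is neither freed nor modified for the lifetime 'a.
unsafe fn borrow_from_c<'a>(ptr: *const u8, len: usize) -> Cow<'a, [u8]> {
    Cow::Borrowed(unsafe { std::slice::from_raw_parts(ptr, len) })
}

fn main() {
    // Simulates the data coming across FFI (from C).
    let data = vec![1u8, 2, 3];
    let mut buffer = unsafe { borrow_from_c(data.as_ptr(), data.len()) };

    // Unmodified data is passed back using the original pointer.
    assert_eq!(buffer.as_ptr(), data.as_ptr());
    assert_eq!(sum_bytes(buffer.as_ptr(), buffer.len()), 6);

    if buffer.len() > 1 {
        // The first mutation clones the bytes into a Vec<u8>.
        buffer.to_mut().extend_from_slice(&[4, 5, 6]);
    }

    // Modified data lives in a new allocation and is passed the same way.
    assert_ne!(buffer.as_ptr(), data.as_ptr());
    assert_eq!(sum_bytes(buffer.as_ptr(), buffer.len()), 21);
}
```

The trade-off is that the lifetime 'a is chosen by the caller and cannot be checked by the compiler, so this variant only fits when the validity of the C memory is well understood.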
Summary
I sincerely hope that this post has helped to demystify the Cow type and will increase its adoption in the Rust community! If you liked the article, please leave a reaction and consider reading my other posts!
Top comments (7)
That assumption is only safe if you add repr(transparent); otherwise the compiler is allowed to change the representation of the data. A reference to a str has a pointer and a length (aka a fat pointer), but the compiler doesn't guarantee that those will be in the same order. The same applies if there was a struct in there: the compiler is allowed to add padding bytes on either side of the inner field.
doc.rust-lang.org/nomicon/other-re...
Good point.
I am looking into the definition of the Path struct in standard library:
and the comment on top of it is really confusing.
Is there a French influence in Rust or something?
We love cows because half of the french cuisine is based on milk
and we have 42th great expressions using cows as a metaphore.
For example "It's amazing" can be said as "C'est vachement 🐄 bien"!
Good article! Especially last two sections were really insightful.
Getting the feeling there is still long way to go in becoming somewhat proficient on the Rust field.
Some topics are really confusing not in the conceptual way, but in the way they are implemented and used in Rust.
Programming in Rust is in some way really more about Rust, not about the programming itself :))
Though there is really good, nice guts feeling once you rein the horses.
I think it's worth it! No hurry.
Still a long way to master Rust but thank you for the effort you put into this. It has helped me a lot.
Hi! Thank you for the article.
Regarding the second use: A struct optionally owning the data.
What is the rationale/reason behind using User<'static> instead of simply User<'a>?
Even though not important, the associated functions seem to be redundant, don't they?
When you say 'a, what are the lifetime 'a bounds? Is it lifetime of &self?
If so, the return value would be valid only as long as the reference to self is valid, while we would like it to be valid for the rest of the program's lifetime (or until we drop it).