Want to see disappearing data in Go caused by an innocent append of data to an array? Can this happen in Rust? Check this out!
I was playing with Go and Rust. When I was reading slices intro in Go, the following question popped up in my head:
For Go:
When
append(...)
is safe?
After deeper thinking I realized, that the answer is the same as for the next question!
For Rust:
What's the difference between
Vec<SomeType>
and&[SomeType]
?
The text answers the questions giving an interesting insight into design differences between the two languages. The case study below shows data corruption caused by naive use of append(...)
.
Don't forget to lookup attached ⭐ GIT repo!
Album software in Go
Let's write simple album software to manage photos. This will be a web server in Go.
Meet Bob who is passionate about photography.
Bob likes to take photos. He gives them names, to be able to find them easily at drunken parties.
// Photo was taken in particular time using camera.
// Has non-unique name given by user.
type Photo struct {
ID int
Name string
DateTime time.Time
}
Bob groups photos into albums, such as "Nature", "Cars" or "Egypt".
// Album is collection of photos sorted by time.
// Has non-unique name given by user.
type Album struct {
ID int
Name string
Photos []Photo
}
He wants to easily find photos from his visit to Boston (which took place from 2019-06-15 to 2019-06-17).
// GetPhotos returns photos in given time interval.
func (album *Album) GetPhotos(start, end time.Time) []Photo {
var i, j int
// because album is already sorted by time,
// we just calculate i and j containing photos
// from the requested interval.
... // omitted
return p[i:j]
}
Bob wants to impress friends by showing vacation pictures with pyramids.
// GetHolidayPhotos gets photos from June.
func GetHolidayPhotos(album *Album, year int) []Photo {
photos := album.GetPhotos(
time.Date(year, 6, 1, 0, 0, 0, 0, time.UTC),
time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC),
)
return photos
}
Seems OK.
Just one more thing...
Bob would like also to show photos from the winter holidays at his mother's home. Fortunately, we can adapt GetHolidayPhotos
with append(...)
:
decemberPhotos := album.GetPhotos(
time.Date(year, 12, 1, 0, 0, 0, 0, time.UTC),
time.Date(year+1, 1, 1, 0, 0, 0, 0, time.UTC),
)
// photos already have photos from June.
photos = append(photos, decemberPhotos...)
Of course, we wrote nice unit tests like this:
func TestGetHolidayPhotos(t *testing.T) {
a := createAlbum() // factory creating example album
got := GetHolidayPhotos(a, 2021)
want := []Photo{
{
ID: 2,
Name: "Camellia",
DateTime: time.Date(2021, 6, 12, 14, 0, 0, 0, time.UTC),
},
..., // omitted
{
ID: 7,
Name: "More snow",
DateTime: time.Date(2021, 12, 15, 14, 0, 0, 0, time.UTC),
},
}
if !reflect.DeepEqual(got, want) {
t.Errorf("Got = %v, want %v", got, want)
}
}
Tests passing, git commit, push, done!
The next day...
Stars told us, that something is wrong.
You had a nightmare where Bob's photos from Egypt were eaten by mummies.
GetHolidayPhotos
is a query, in other words, it shouldn't modify the album. Let's make additional unit test:
func TestGetHolidayPhotosNotMutatingInput(t *testing.T) {
album := createAlbum()
// (1) we serialize album to JSON.
serializedAlbum, err := json.Marshal(album)
if err != nil {
t.Fatalf("Got error %v", err)
}
// (2) perform the query.
GetHolidayPhotos(album, 2021)
// (3) serialize after the query.
postmortemSerializedAlbum, err := json.Marshal(album)
if err != nil {
t.Fatalf("Got error %v", err)
}
// (4) serialized album shouldn't change.
if !bytes.Equal(postmortemSerializedAlbum, serializedAlbum) {
t.Errorf(
"Got postmortemSerializedAlbum = %v, want serializedAlbum = %v",
string(postmortemSerializedAlbum[:]),
string(serializedAlbum[:]),
)
}
}
What do you expect - will it pass?
$ go test
--- FAIL: TestGetHolidayPhotosNotMutatingInput (0.00s)
album_test.go:155: Got postmortemSerializedAlbum = {"ID":0, ...
FAIL
exit status 1
It failed! 💥 GetHolidayPhotos
is corrupting input data! 😱
What happened?
Let's look at what happens inside GetHolidayPhotos
in the failing test.
First, we get photos from June:
photos := album.GetPhotos( /* June date range */ )
Now photos
points to contents of album.Photos
- photos
is equal to album.Photos[2..4]
.
It's how Go sees photos
slice:
The diagram has a few parts:
- All items (gray, orange, blue) are in
album.Photos
. For examplealbum.Photos[2]
has a photo of camellia. -
Orange items are referenced by the
photos
slice. For example,photos[1]
has a photo of daisy. The length of the slice islen(photos) = 2
.
Next December photos are appended:
photos = append(photos, decemberPhotos...)
Blue elements are treated by Go as free slots for append
.
Let's see how the photos
slice looks after the append:
Some blue elements were replaced with photos from December. Thus slice photos
indeed contains photos from June and December.
However, as you can see from the diagram, this operation destroyed the contents of album.Photos
! Photo of "Forget Me Not" was discarded, and there is a duplicate photo of snow from 2021-12-11 in album.Photos[4]
and album.Photos[6]
.
This effect was already described from a technical perspective. Below I show a simple rule of thumb to avoid such issues from a semantic perspective.
What is array duality?
I propose in this article two forms in which you can meet arrays:
- owned - you can append to the array,
- borrowed - you cannot append to the array.
Owned are used to store data, which might shrink or expand. It could be a repository of users or a list of currently opened files.
Borrowed are pointing to another array or its subarray. It's a kind of view. Could be used to pass one sentence from a string to a function. Might be returned as a query result.
In our album software album.Photos
is owned array, because the album contains/manages photos.
On the other hand, album.GetPhotos(start_time, end_time)
gives borrowed array - it returns the slice album.Photos[i:j]
. This way it does not have to allocate a new array - instead, it points to the original array from the album.
Let's look, at how the duality looks in Go and Rust.
In Go
In Go, we need a bit of self-discipline from a developer. There is no clear indicator, of whether an array is owned or borrowed.
However, sometimes it's easy to infer this from context. In the case of Album
, we know that the album contains photos. We could add a method AddPhoto(photo Photo)
to make this explicit.
When we subslice an array, usually the result is a borrowed array, as in album.GetPhotos
.
The exception is subslicing used to pop elements from the end of the array:
// myPhotos[5], myPhotos[6], myPhotos[7], ..., are discarded.
myPhotos = myPhotos[:5]
In Rust
This whole "array duality" was inspired by the ownership model in Rust. The distinction between owned and borrowed array is explicitly marked by type.
In Rust owned array is represented by Vec<Photo>
. You can push and pop elements:
let mut photos = Vec::new();
photos.push(Photo { ... });
A borrowed array is represented by a slice &[Photo]
. You get it by subslicing an array:
let june_photos = &photos[2..5];
However, Rust forbids to append to borrowed array:
june_photos.push(Photo { ... });
gives compile error:
error[E0599]: no method named
push
found for reference&[Photo]
in the current scope
That's it!
Let's summarize it with a table:
owned | borrowed | |
---|---|---|
Can append? | Yes | No |
Go type | []Photo |
[]Photo |
Rust type | Vec<Photo> |
&[Photo] |
Bugfix
Let's make a walk-through and:
- ask whether a given array is owned or borrowed,
- make sure, that we do not append to borrowed arrays.
This way the bug in the Go server should be fixed.
The original code:
// GetHolidayPhotos gets photos from June and December.
func GetHolidayPhotos(album *Album, year int) []Photo {
photos := album.GetPhotos(
time.Date(year, 6, 1, 0, 0, 0, 0, time.UTC),
time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC),
)
// (1) album.GetPhotos gives *borrowed* array.
decemberPhotos := album.GetPhotos(
... // omitted
)
photos = append(photos, decemberPhotos...)
// (2) we are appending to *borrowed* array - BAD!
return photos
}
We must ensure, that we do not append(...)
to a borrowed array. Fixed code:
// GetHolidayPhotos gets photos from June and December.
func GetHolidayPhotos(album *Album, year int) []Photo {
var photos []Photo
// (1) we create a new empty array.
// It's *owned* array.
junePhotos := album.GetPhotos(
time.Date(year, 6, 1, 0, 0, 0, 0, time.UTC),
time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC),
)
photos = append(photos, junePhotos...)
// (2) Appending to *owned* array - OK.
decemberPhotos := album.GetPhotos(
time.Date(year, 12, 1, 0, 0, 0, 0, time.UTC),
time.Date(year+1, 1, 1, 0, 0, 0, 0, time.UTC),
)
photos = append(photos, decemberPhotos...)
// (3) Appending to *owned* array - OK.
return photos
}
Notice that appending to empty slice in Go is essentially a copy of the array in the argument.
Now test TestGetHolidayPhotosNotMutatingInput
passes!
Rewrite It in Rust!
Out of curiosity, I've ported the software to Rust.
Album
and Photo
structs are defined similarly as in Go. It's noteworthy that Album
has explicitly owned an array of photos:
/// Album is a collection of photos sorted by time.
/// Has a non-unique name given by the user.
#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize)]
pub struct Album {
pub id: u64,
pub name: String,
pub photos: Vec<Photo>,
}
album.get_photos(..)
returns borrowed array, notice returned type:
impl Album {
/// Returns photos in given time interval.
pub fn get_photos(
&self,
start: DateTime<Utc>,
end: DateTime<Utc>,
) -> &[Photo] {
let mut i: usize;
let mut j: usize;
// because album is already sorted by time,
// we just calculate i and j containing photos
// from the requested interval.
... // omitted
&self.photos[i..j]
}
}
And get_holiday_photos
:
/// Gets photos from June and December.
pub fn get_holiday_photos(album: &Album, year: i32) -> Vec<Photo> {
let mut photos = Vec::new();
// (1) we create a new empty array.
// It's *owned* array, because it's type is `Vec<Photo>`.
let june_photos = album.get_photos(
DateTime::from_utc(
NaiveDate::from_ymd(year, 6, 1).and_hms(0, 0, 0),
Utc,
),
DateTime::from_utc(
NaiveDate::from_ymd(year, 7, 1).and_hms(0, 0, 0),
Utc,
),
);
photos.extend_from_slice(june_photos);
// (2) Appending to *owned* array - OK.
let december_photos = album.get_photos(
... // omitted
);
photos.extend_from_slice(december_photos);
// (3) Appending to *owned* array - OK.
photos
}
Trade-off
This case study highlights the differences between Go and Rust.
Rust requires a developer to declare explicitly how data will be used. Thus the compiler can statically verify this promise - which is useful for a team of developers. Go requires more self-discipline (the most notable example is passing data among threads).
Sometimes a library can take advantage of such declarations - for instance, JSON deserialization can take a borrowed substring of input, instead of copying it (a.k.a. "zero-copy deserialization") - more in the documentation of serde library.
On the other hand, Go has a lower learning curve - no discussion about vectors, ownership, or async stuff. A lot of problems can be solved by defensive programming, like array copying before append or sharing memory by communicating. You do not have to know about the existence of cows in the standard library...
Cows?
Up to this point, I discussed two disjoint scenarios:
- always returning from function owned array,
- always returning from function borrowed array.
However, there is a third case: function might at runtime dynamically decide, whether to return owned or borrowed array.
Bob would like to show his holiday photos, but only from summer.
// GetHolidayPhotos gets photos from June and December.
func GetHolidayPhotos(
album *Album,
year int,
june, december bool,
) []Photo
Instead of allocating always a new (owned) array, we can sometimes return borrowed array:
junePhotos := album.GetPhotos(
... // omitted
)
// (1) `junePhotos` is borrowed slice from album.Photos.
if june && !december {
return junePhotos
// (2) returning in this case *borrowed* array.
}
// (3) Original code returning *owned* array.
As was shown in the table, owned and borrowed arrays have the same type in Go.
And in Rust? Which type should be returned?
/// Gets photos from June and December.
pub fn get_holiday_photos(
album: &Album,
year: i32,
june: bool,
december: bool,
) -> Vec<Photo> // or &[Photo] ?
Let's try similar optimization as in Go:
if june && !december {
return june_photos; // returning &[Photo]
}
gives error
error[E0308]: mismatched types
because Vec<Photo>
is a different thing than &[Photo]
.
This shows the steeper learning curve in Rust caused by the requirement of an explicit declaration of an array type (owned vs borrowed). To know how to solve this, you must be aware that cows exist and read the mysteriously named article "The Secret Life of Cows"!
Rust has a distinct type for such dynamic ownership: Cow<[Photo]>
, which might contain owned or borrowed array. The first time I've met a cow in the wild was search and replace string function in regex library.
For the sake of completeness, I give there updated table:
statically owned | statically borrowed | dynamic | |
---|---|---|---|
Can append? | Yes | No | No |
Go type | []Photo |
[]Photo |
[]Photo |
Rust type | Vec<Photo> |
&[Photo] |
Cow<[Photo]> |
Conclusions
The text described "array duality" in Go and Rust, showing when it is correct to append a new value to some array.
The way the concept is handled in these post-2010 languages shows philosophical differences between them. Rust requires a developer to explicitly state the duality of each array in code. On the other hand, Go doesn't burden developers with such annotations, instead requires self-discipline from a developer.
Top comments (4)
Fascinating post
Thanks! :)
Hi Michael, thanks for the explanation of the differences between the two approaches!
Have you explored copy() in Go? I wonder if that'd work for your use case.
I guess that you suggest using
copy()
inGetHolidayPhotos
.Yes, it could be implemented like this:
junePhotos
,decemberPhotos
.result := make([]Photo, len(junePhotos) + len(decemberPhotos))
junePhotos
anddecemberPhotos
toresult
, calculating right destination indices.This is correct. Notice, that it is required to calculate total output size before copying.
bytes.Join
applies a similar approach.However, in many cases it might not be so easy to calculate final output size - consider I/O with files or database. Then
append()
is a better solution.