Introduction
You've probably heard many times by now that C is not a memory-safe language, and that is absolutely correct. Buffer overflows and memory leaks are all over the place, and SEGFAULT has practically become a taboo word. In this small article we're going to show you some unsafe C code, and then we're going to rewrite it in Rust.
The problem
Consider this code:
// lifetimetest.c
#include <stdio.h>
#include <string.h>
char*
longest(char* a, char* b)
{
    if(strlen(a) > strlen(b)){
        return a;
    }
    return b;
}
int
main(void)
{
    char a[] = "Hello";
    char b[] = "KosherFoods";
    char* res;
    res = longest(a, b);
    printf("%s", res);
}
Here we define the function longest, which takes two character pointers (essentially strings) and returns the longer string.
If you compile and run this program:
$ gcc lifetimetest.c -o lft.out && ./lft.out
you should get this output:
KosherFoods
which is what we expected.
Now let's rewrite this program slightly:
#include <stdio.h>
#include <string.h>
char*
longest(char* a, char* b)
{
    if(strlen(a) > strlen(b)){
        return a;
    }
    return b;
}
int
main(void)
{
    char a[] = "Hello";
    char* res;
    {
        char b[] = "KosherFoods";
        res = longest(a, b);
    }
    printf("%s", res);
}
Here we moved b into an inner scope.
If you compile and run this program:
$ gcc lifetimetest.c -o lft.out && ./lft.out
you should get this output:
KosherFoods
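Nothing peculiar so far, at least on the surface. Strictly speaking, res already points to an object whose lifetime has ended, so this is undefined behaviour; it only appears to work because nothing has reused that memory yet.
Let's rewrite the program once again!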
#include <stdio.h>
#include <string.h>
char*
longest(char* a, char* b)
{
    if(strlen(a) > strlen(b)){
        return a;
    }
    return b;
}
int
main(void)
{
    char a[] = "Hello";
    char* res;
    {
        char b[] = "KosherFoods";
        res = longest(a, b);
    }
    char ohnoo[] = "Plan 9 from User Space";
    printf("%s", res);
}
Here we declared a new string variable after the inner scope.
If you compile and run this program:
$ gcc lifetimetest.c -o lft.out && ./lft.out
you may get this output (the behaviour is undefined, so the exact result depends on your compiler and flags):
Plan 9 from User Space
But... how could this happen?
Explanation
In C, strings are essentially character arrays, and when an array is passed to a function it decays to a pointer to its first element.
So longest doesn't return a copy of the string, but rather a pointer to the start of the string in memory. With this in mind let's continue. When we declare the variable b in the inner scope, it is stored in memory reserved for that scope (typically on the stack). Once the scope closes, the lifetime of b ends, which means the compiler is free to put any other variable in the memory where b once was. So res still points to, let's say, 0x000004 (where b once was), but now 0x000004 is the start of the ohnoo string. Reading through res at that point is undefined behaviour, so the output above is not guaranteed and may differ between compilers.
Rewrite in Rust
Let's discuss a couple of things before moving on to the implementation. Rust has the concept of a "lifetime": the region of code for which a reference is valid. The compiler checks lifetimes to prevent dangling references. With that in mind let's continue.
First let's implement the longest function:
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() > b.len() {
        a
    } else {
        b
    }
}
Here 'a is a lifetime parameter. It tells the compiler that the returned reference is only valid while both arguments are still valid, so both of them must live at least as long as 'a. With this in mind let's continue.
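To see what the annotation actually ties together, here's a small sketch using a hypothetical function first_word (it isn't part of the original program) whose result borrows only from a. Because b is left out of the returned lifetime, the compiler lets b go out of scope before the result is used:

```rust
// Hypothetical helper for illustration: the returned reference is tied
// to `a` only, so `_b` gets its own, unrelated (elided) lifetime.
fn first_word<'a>(a: &'a str, _b: &str) -> &'a str {
    a.split_whitespace().next().unwrap_or("")
}

fn main() {
    let a = String::from("Hello world");
    let result: &str;
    {
        let b = String::from("short-lived");
        // `b` does not take part in the return lifetime, so it may be
        // dropped at the end of this block without invalidating `result`.
        result = first_word(a.as_str(), b.as_str());
    }
    println!("First word: {}", result);
}
```

With longest, where both parameters share 'a, the result must not outlive either argument.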
Consider this code:
fn main() {
    let a = String::from("APCHIHBALONGERSTRING");
    let result: &str;
    {
        let b = String::from("Banana");
        result = longest(a.as_str(), b.as_str());
        println!("Longest: {}", result);
    }
}
If we type:
$ cargo run
we should get the following output:
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/lifetime_test`
Longest: APCHIHBALONGERSTRING
As you can see, the Rust compiler didn't complain.
Now let's try using the shorter-lived borrow in the outer scope:
fn main() {
    let a = String::from("APCHIHBALONGERSTRING");
    let result: &str;
    {
        let b = String::from("Banana");
        result = longest(a.as_str(), b.as_str());
    }
    println!("Longest: {}", result);
}
If we type:
$ cargo run
We should get the following output:
   Compiling lifetime_test v0.1.0 (/home/ernest/projects/lifetime_test)
error[E0597]: `b` does not live long enough
  --> src/main.rs:14:38
   |
14 |         result = longest(a.as_str(), b.as_str());
   |                                      ^^^^^^^^^^ borrowed value does not live long enough
15 |     }
   |     - `b` dropped here while still borrowed
16 |     println!("Longest: {}", result);
   |                             ------ borrow later used here
For more information about this error, try `rustc --explain E0597`.
error: could not compile `lifetime_test` due to previous error
This happens because result may borrow from b, but b is dropped when its scope ends, which makes the use of result in the outer scope (where it would have to outlive b) invalid.
As a result, we can't get the same undefined behaviour as we did in C unless we explicitly use the unsafe keyword.
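If the result really does have to outlive b, one option is to hand back an owned copy instead of a reference, much like copying the string into an outer buffer in C. Here's a minimal sketch, assuming a hypothetical longest_owned function that isn't part of the original program:

```rust
// Hypothetical variant of `longest`: it returns an owned String, so the
// result no longer borrows from either argument.
fn longest_owned(a: &str, b: &str) -> String {
    if a.len() > b.len() {
        a.to_owned()
    } else {
        b.to_owned()
    }
}

fn main() {
    let a = String::from("Hello");
    let result: String;
    {
        let b = String::from("KosherFoods");
        // The characters are copied into a new String before `b` is dropped.
        result = longest_owned(a.as_str(), b.as_str());
    }
    // prints "Longest: KosherFoods"
    println!("Longest: {}", result);
}
```

This compiles because result owns its data; the price is one extra allocation and copy.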
Conclusion
In this article we showed an example of why C is dangerous and tricky to use, then wrote the equivalent code in Rust and showed how Rust deals with such problems.
Note
Whilst I like Rust and I think it is great for low-level things like audio libraries, video game engines, web servers, etc., I still don't think it's a good idea to rewrite even parts of existing operating systems in Rust.
Top comments (4)
Your C code should be using const for all char*, e.g., char const*. That aside, the faults of C have been known for quite a long time.
Because?
1st, Rust's assembly output is a bit more complicated than that of C (or C++, even with things like classes and generics). That's not a problem when dealing with a game engine or a web server, but it's pretty important when writing an operating system (or when writing for an embedded system). With that said, I'm not saying Rust is slow; it's surprisingly fast. What I'm saying is that Rust's assembly is harder to debug than that of C or C++.
2nd, this part is a bit biased but hear me out: I don't think we NEED TO rewrite even parts of the current operating systems in Rust (for example, drivers). The reason for that is very simple: C provides the bare minimum for writing that sort of stuff, and I don't think we need anything more.
In spite of this, I still like Rust. I'm looking forward to new operating systems being written in Rust, but I oppose the idea of rewriting the current ones.
That could simply be due to an immature compiler. gcc way back in the 3.0 days was pretty poor too.
Rewriting anything just for the sake of rewriting it is generally a bad idea. There's simply no reason to replace debugged, stable code.
agreed