Rust generics are great, but they can bloat the final code, and have negative effects on compile times. This post discusses a "trick" to help fight those effects.
There exists a decently common pattern to combat these issues given a few preconditions are met. This pattern is used heavily in the standard library and popular crates.
However, when talking to a colleague recently they were unaware of this pattern (to be fair I wasn't fully aware of this pattern either until just a few years ago). This conversation made me wonder if there are others who also haven't yet seen this method which can have pretty big downstream affects!
Prelude
This post only addresses code bloat from generic functions, not generic structs.
Basics
Let's start by just showing a basic set of generics in Rust, and why some problems pop up.
When you use generic parameters in Rust, the compiler will actually generate a copy of your code for each concrete type it finds. For example:
fn genric<T>(param: T) {
// code...
}
fn main() {
generic(30); // T = i32
generic("foo"); // T = &'static str
}
Turns into something like:
fn genric_i32(param: i32) {
// code...
}
fn genric_str(param: &'static str) {
// code...
}
fn main() {
generic(30);
generic("foo");
}
While this produces extremely efficient code at runtime, there are a few potential downsides:
- If
generic<T>()
is of substantial size (in compiled binary form), that code is also duplicated leading to potential "bloat" - (Although I haven't gone spelunking into the compiler code to know for sure; I believe that) all the generated (roughly duplicate) code must also be compiled and optimized separately! This is a lot more code for
rustc
and LLVM to churn through -
generic<T>()
cannot be fully compiled until the compiler knows all the concrete types that will be used
Library Authors Beware
These concerns are most applicable to library authors, but can affect binary authors as well when their binary is split into a binary-consuming-an-internal-library.
Where this shows up is downstream consumers filing issues related to code bloat (it turned out many concrete types were used for T
), or slow compile times due to your library (your library heavily relies on generics that cannot be compiled "earlier" or the compiler is churning through all that extra code).
Preconditions
I mentioned earlier that the pattern we're about to discuss isn't a silver bullet and cannot be used in all circumstances. There is an important precondition that need to be met:
Important
The generic code must have a "preferred type" which usually means the generic
parameter is bounded and that bound represents some kind of type conversion
This should become more clear as we go along.
Immediate Dispatch to the Rescue
The solution is to immediately dispatch to a known type. In hindsight this may seem somewhat obvious, however if it doesn't that's OK! We're about to see this in action!
Note
This example will be extremely contrived in order to demonstrate the effects
In Action
First, lets define a trait that will be used as a generic bound.
trait Speak {
fn speak(&self) -> String;
}
Next, we define a generic function that can be used with any type that implements the Speak
trait.
fn generic_speak<T: Speak>(param: &T) {
println!("It says: {}", param.speak());
}
This could be thought of as our library boundary. Perhaps the library also has some concrete types it uses internally, but that is not a requirement.
It does not matter if the concrete types are internal, external, or a mix.
Lets define two concrete types that implement said trait as example:
struct Cat;
impl Speak for Cat {
fn speak(&self) -> String {
"meow".into()
}
}
struct Dog;
impl Speak for Dog {
fn speak(&self) -> String {
"woof".into()
}
}
Finally, we use that generic function with both a Cat
and a Dog
struct:
Note
We're creating a binary because it's easier to demonstrate :)
fn main() {
let whiskers = Cat;
let spot = Dog;
generic_speak(&whiskers);
generic_speak(&spot);
}
If we run it, we get what you probably expect:
$ cargo run --quiet
It says: meow
It says: woof
Counting Functions
Lets first back up and prove the claims I made at the beginning, that our generic function actually generates two concrete functions. To see that we can either cargo-llvm-lines
or cargo-bloat
.
I use both pretty extensively, so lets compare the output of both just for fun!
aside: cargo-llvm-lines
First, we'll use cargo-llvm-lines
to see the amount of LLVM IR generated:
Note
Both tools can generate a lot of output, so we could begrep
ing it down to
size but I'll make some readability edits instead.
$ cargo llvm-lines
Lines Copies Function name
----- ------ -------------
[ .. snip .. ]
80 (4.9%, 50.3%) 2 (4.9%, 17.1%) generic_speak
5 (0.3%, 98.3%) 1 (2.4%, 82.9%) <Cat as Speak>::speak
5 (0.3%, 98.6%) 1 (2.4%, 85.4%) <Dog as Speak>::speak
[ .. snip .. ]
Notice we do, in fact, have two copies of generic_speak
(as can be seen by the Copies
column) and one implementation function each for the concrete types (e.g. <Cat as Speak>::speak
which is the code from where we did impl
).
Speak for Cat
cargo-bloat
Contrasting the above with cargo-bloat
which shows the binary size of each function:
Note
cargo-bloat
by default only shows the largest 99 functions, but our example
is to tiny we tell it show us the largest 999 functions so we can be sure our
functions show up in the output.
$ cargo bloat -n 999
Analyzing target/debug/blog_demo
File .text Size Crate Name
[ .. snip .. ]
0.0% 0.1% 193B blog_demo generic_speak
0.0% 0.1% 193B blog_demo generic_speak
0.0% 0.0% 44B blog_demo <Dog as Speak>::speak
0.0% 0.0% 44B blog_demo <Cat as Speak>::speak
[ .. snip .. ]
Like carog-llvm-lines
we can see that we do, in fact, have two copies of generic_speak
and one implementation function each for the concrete types (e.g. <Cat as Speak>::speak
which is the code from where we did impl Speak for Cat
).
Note
For the rest of the post I'm going to omit the actual trait implementations
(<Dog as Speak>::speak
) for brevity, because those don't change, we still
implement those traits with actual code!
With cargo-bloat
we see that in the final binary both copies of generic_speak
are 193 bytes (for a total of 386 bytes).
Note
As the name implies I tend to prefercargo-bloat
when working on bloat
issues, because it looks at the final compiled binary size as opposed to just
the LLVM IR withcargo-llvm-lines
. Although I prefercargo-llvm-lines
when working on compile times.
Generic Bloat
Although contrived, one thing I like about this example is by pure line count, the generic_speak
function looks like almost no code! But this brings up another great source of bloat (especially when combined with the issue
described in this post!): macros!
aside: macros
Using an LSP expand function in my editor (although similar could be done with something like cargo-expand
) we see generic_speak
actually expands to something like this:
Note
You can't run this directly as it uses private rust internals, but it gives a
good view into the kind things macros expand to
fn generic_speak<T: Speak>(param: &T) {
{
{
std::io::_print(std::fmt::Arguments::new_v1(
&["It says: "],
&[std::fmt::ArgumentV1::new(
&(param.speak()),
std::fmt::Display::fmt,
)],
));
}
}
Trying Immediate Dispatch
If we take a step back we see that Speak::speak
just produces a String
and all the code inside generic_speak
really only needs that String
to operate.
The trick is that we We can add a private internal function that accepts the actual type we actually needed/wanted (i.e. String
in this case) instead of just sticking with the generic parameter.
Now generic_speak
looks like this:
fn generic_speak<T: Speak>(param: &T) {
fn generic_speak_string(param: String) {
println!("It says: {param}");
}
generic_speak_string(param.speak());
}
Note
The function could be external as well, it doesn't need to be defined within
the outer function scope. Although if it's not used anywhere else it makes
sense to define it within internal scope.
If we re-run cargo-bloat
we now see:
$ cargo bloat -n 999
Analyzing target/debug/blog_demo
File .text Size Crate Name
[ .. snip .. ]
0.0% 0.1% 160B blog_demo generic_speak::generic_speak_string
0.0% 0.0% 37B blog_demo generic_speak
0.0% 0.0% 37B blog_demo generic_speak
[ .. snip .. ]
We still have two concrete implementations of generic_speak
since we still have a generic function, however notice the actual code inside has gone down from 193 bytes to just 37 bytes (essentially enough to dispatch the other function). Now, all our "real code" lives in non-generic (and thus not
duplicated) generic_speak_string
internal function (160 bytes).
Doing the quick math of our duplicate generic functions + the new internal function we get a total 234 bytes versus the original total of 386 bytes!
This is just a contrived example, but imagine a real library with multiple generics and actually substantial sized functions!
Additionally, that inner function (generic_speak_string
) is able to be fully compiled right away since all it's types are fully known.
A Note on Inlining and Release Builds
You may have noticed if you compiled the examples in release mode, the functions don't appear at all!
$ cargo bloat -n 999 --release | grep speak
$
This is because the code is pretty trivial and Rust/LLVM are able to inline all the code and optimize this away. However, in a real world library that's not always possible.
Warning
When using this trick, you may need in some cases tell Rust/LLVM not to
inline your private wrapped function by using#[inline(never)]
if it turns
out the function is just getting inlined into all the generic functions
again. But that should only be done when you're sure that's the case, and the
additional code bloat is worse than the performance lost by not-inlining.
Why not just use a String
as the parameter instead of the generic?
Because this was a contrived example.
Also there are times you'll do something almost exactly like this purely for ergonomics. For example:
// Accepts anything that can be converted to &str cheaply
fn takes_stringish<S: AsRef<str>>(param: S) { /* code */ }
Sure, we could just take &str
as the parameter, but some types may be cheaply yet not ergonomically converted to a &str
. Using the generic parameter can give our library a nice ergonomic boost.
An Even More Contrived Example
At the risk of making this post too long, to show another example of one where it's less ergonomic to ask the user for exactly what we want.
Cases often comes up around type generics, e.g. we'd prefer to ask for a impl Iterator<Item = AsRef<str>>
when we're working with roughly a Vec<&str>
internally. Forcing the user to do the conversion isn't very ergonomic.
Let's do exactly this with our previous code, but instead of AsRef<str>
we'll use our Speak
as the bound.
If we change our generic_speak
and main
functions:
fn generic_speak<T: Speak>(param: impl Iterator<Item = T>) {
for item in param {
println!("It says: {}", item.speak());
}
}
fn main() {
use std::collections::HashSet;
let whiskers = vec![Cat, Cat];
let mut spots = HashSet::new();
spots.insert(Dog);
generic_speak(whiskers.into_iter());
generic_speak(spots.into_iter());
}
Notice we're taking two totally different collection types, a Vec<Cat>
and HashSet<Dog>
.
Re-running our example gives what we'd expect:
$ cargo run --quiet
It says: meow
It says: meow
It says: woof
And cargo-bloat
now shows:
$ cargo bloat -n 999
Analyzing target/debug/blog_demo
[ .. snip .. ]
0.0% 0.2% 440B blog_demo generic_speak
0.0% 0.1% 415B blog_demo generic_speak
[ .. snip .. ]
A total of 855 bytes.
Since you already know the drill, we can do the inner function dispatch thing (with a caveat listed below):
fn generic_speak<T: Speak>(param: impl Iterator<Item = T>) {
fn generic_speak_strings(params: Vec<String>) {
for item in params {
println!("It says: {item}");
}
}
generic_speak_strings(param.map(|s| s.speak()).collect());
}
To which cargo-bloat
reports:
$ cargo bloat -n 999
nalyzing target/debug/blog_demo
[ .. snip .. ]
0.0% 0.1% 405B blog_demo generic_speak::generic_speak_strings
0.0% 0.0% 80B blog_demo generic_speak
0.0% 0.0% 69B blog_demo generic_speak
0.0% 0.0% 65B blog_demo generic_speak::{{closure}}
0.0% 0.0% 65B blog_demo generic_speak::{{closure}}
[ .. snip .. ]
Even though we have these closures, it's still a total of 684 bytes instead of the original total of 855 bytes. Again, in this contrived example it's not that dramatic, but in the real world it is often quite dramatic as the original generic_speak
or later generic_speak_strings
could have quite a lot of code!
Warning
The Caveat! Conversion performance and allocations
I mentioned there is a caveat to the above. We created a whole new Vec<String>
to pass to the inner function which is another allocation and has it's own performance implications. However, perhaps this is a case were an extra allocation like that is acceptable compared to the code bloat and compile times. You'll have to be the judge in your specific case.
This is also partially only due to the strange contrived API I used for this example as well!
Conclusion
We learned that Rust will duplicate generic functions for all concrete types that use said function, which can cause a decent bit of code duplication and increase compile times.
By turning our generic function into a small wrapping shim, that immediately dispatches to an internal non-generic function only the shim gets duplicated while all our "real code" stays as a single logical function.
Top comments (0)