Rust procedural macros are one of the most exciting features of the language. They let you generate code at compile time, through a different mechanism than the monomorphization used for generics. With a few dedicated crates, you can build new code entirely from scratch.
I decided to write this article to share my experience because, even though resources on the topic are increasingly common, procedural macros are not really straightforward at first sight.
Let's see how it works.
Building a procedural derive macro
The operating principle of procedural macros is quite simple: take a piece of code, called an input TokenStream, convert it to an abstract syntax tree (AST) which represents the internal structure of that piece of code (using the syn::parse() method), build a new TokenStream from what you've got as input, and hand it back to the compiler as an output piece of code.
Using a procedural derive macro
A derive macro is used by declaring the #[derive()] attribute, like for example the well-known:
#[derive(Debug)]
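To make the usage side concrete, here is a minimal sketch relying only on the standard library's Debug derive. The Point struct and its values are just an illustration:

```rust
// Debug is a derive macro shipped with the standard library:
// the attribute asks the compiler to generate the `impl Debug` for us.
#[derive(Debug)]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    // The generated implementation is what makes the {:?} formatter available.
    println!("{:?}", p); // prints: Point { x: 1.0, y: 2.0 }
    println!("sum of fields: {}", p.x + p.y);
}
```

Nothing in the struct definition mentions formatting: all of that code is produced by the macro at compile time, which is exactly what a custom derive does too.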
Building a procedural derive macro
Suppose you want to create a WhoAmI derive macro, to just print out the name of the structure under the derive statement:
#[derive(WhoAmI)]
struct Point {
    x: f64,
    y: f64
}
What you need to do:
- create a brand new lib crate (procedural macros must be defined in their own crate, otherwise if you try to use the macro in the same one, you face the following error: can't use a procedural macro from the same crate that defines it)
$ cargo new --lib whoami
- add the required flag and dependencies to Cargo.toml:
[lib]
proc-macro = true
[dependencies]
syn = { version = "1.0.82", features = ["full", "extra-traits"] }
quote = "1.0.10"
- define a regular Rust function like this one in lib.rs:
use proc_macro::TokenStream; // no need to import a specific crate for TokenStream
// Generate a compile error to output the struct name
#[proc_macro_derive(WhoAmI)]
pub fn whatever_you_want(tokens: TokenStream) -> TokenStream {
    // convert the input tokens into an AST, specifically the one of a derive input
    let ast: syn::DeriveInput = syn::parse(tokens).unwrap();
    // panicking aborts the expansion, so no TokenStream is ever returned here
    panic!("My struct name is: <{}>", ast.ident);
}
Anything printed with the regular Rust macros (like println!()) just gets mixed into the build output, so a blunt way to surface a value is to panic with a message: the compiler stops and reports that message for you. This is neither convenient for debugging, nor a good way to fully understand the nuts and bolts of a procedural macro!
Now, in order to use that awesome macro (not really handy because it won't compile):
- you have to define a new crate:
$ cargo new thisisme
- add our macro crate as a dependency:
[dependencies]
# provided both crates are on the same directory level, otherwise replace by your crate's path
whoami = { path = "../whoami" }
- replace main.rs source code with:
// import our crate
use whoami::WhoAmI;
#[derive(WhoAmI)]
struct Point {
    x: f64,
    y: f64
}
fn main() {
    println!("Hello, world!");
}
- and compile the whole project:
error: proc-macro derive panicked
 --> src/main.rs:3:10
  |
3 | #[derive(WhoAmI)]
  |          ^^^^^^
  |
  = help: message: My struct name is: <Point>
You can watch the compiler emit the error message defined in the procedural macro.
Using the proc-macro2 crate for debugging and understanding procedural macros
The previous method is unwieldy to say the least, and it doesn't help you understand how to really take advantage of procedural macros, because you can't really debug the macro (although this may change in the future).
That's why the proc-macro2 crate exists: you can use its types, along with syn::parse2() (the counterpart of syn::parse()), in unit tests or regular binaries. You can then directly output the generated code to stdout or save it into a "*.rs" file to check its content.
Let's create a procedural macro prototype which auto-magically defines a function computing the sum of all fields of the Point structure.
- create a new binary crate
$ cargo new fields_sum
- add the dependencies:
syn = { version = "1.0.82", features = ["full", "extra-traits"] }
quote = "1.0.10"
proc-macro2 = "1.0.32"
Add the following code in the main.rs file:
// necessary for the TokenStream::from_str() implementation
use std::str::FromStr;
use proc_macro2::TokenStream;
use quote::{format_ident, quote};
use syn::ItemStruct;
fn main() {
    // struct sample
    let s = "struct Point { x : u16 , y : u16 }";
    // create a new token stream from our string
    let tokens = TokenStream::from_str(s).unwrap();
    // build the AST: note the syn::parse2() method rather than the syn::parse() one
    // which is meant for "real" procedural macros
    let ast: ItemStruct = syn::parse2(tokens).unwrap();
    // save our struct type for future use
    let struct_type = ast.ident.to_string();
    assert_eq!(struct_type, "Point");
    // we have 2 fields
    assert_eq!(ast.fields.len(), 2);
    // syn::Fields provides an iter() method, so we can iterate through the fields
    let mut iter = ast.fields.iter();
    // this is x
    let x_field = iter.next().unwrap();
    assert_eq!(x_field.ident.as_ref().unwrap(), "x");
    // this is y
    let y_field = iter.next().unwrap();
    assert_eq!(y_field.ident.as_ref().unwrap(), "y");
    // now the most tricky part: use the quote!() macro to generate code, aka a new
    // TokenStream
    // first, build our function name: point_summation
    let function_name = format_ident!("{}_summation", struct_type.to_lowercase());
    // and our argument type. If we don't use the format_ident!() macro, the function prototype
    // will be: pub fn point_summation (pt : "Point")
    let argument_type = format_ident!("{}", struct_type);
    // same for x and y
    let x = format_ident!("{}", x_field.ident.as_ref().unwrap());
    let y = format_ident!("{}", y_field.ident.as_ref().unwrap());
    // the quote!() macro is returning a new TokenStream. This TokenStream is returned to
    // the compiler in a "real" procedural macro
    let summation_fn = quote! {
        pub fn #function_name(pt: &#argument_type) -> u16 {
            pt.#x + pt.#y
        }
    };
    // output our function as Rust code
    println!("{}", summation_fn);
}
Now running our crate gives:
pub fn point_summation (pt : & Point) -> u16 { pt . x + pt . y }
So far, so good.
Combining TokenStreams
The previous example is straightforward because we knew in advance the number of fields in the struct.
What if we don't know it beforehand? Well, we can use a special construct of quote!() to generate the summation over all fields:
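// assumption: fields holds the identifiers of all named fields of the struct
let fields: Vec<_> = ast.fields.iter().filter_map(|f| f.ident.as_ref()).collect();
// create the list of tokens
// tokens type is: impl Iterator<Item = TokenStream>
let tokens = fields.iter().map(|i| quote!(pt.#i));
// the trick is made by: 0 #(+ #tokens)*
// which repeats the + sign on all tokens
let summation_fn = quote! {
    pub fn #function_name(pt: &#argument_type) -> u16 {
        0 #(+ #tokens)*
    }
};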
For a Point struct with four fields (x, y, z and t), the result is:
pub fn point_summation (pt : & Point) -> u16 { 0 + pt . x + pt . y + pt . z + pt . t }
Hope this helps!