Intro
Having read through the macros section of “The Book” (Chapter 19.6), I thought I would try to hack together a simple idea using macros as a way to get a proper feel for them.
The chapter was a little light, and declarative macros (using macro_rules!), which is what I’ll be using below, seemed like a potentially very nice feature of the language … the sort of thing that really makes the language malleable. Indeed, in poking around I’ve realised, perhaps naively, that macros are a pretty common tool for rust devs (or at least more common than I knew).
I’ll rant for a bit first, which those new to rust macros may find interesting or informative (it’s kinda a little tutorial) … to see the implementation, go to “Implementation (without using a macro)” heading and what follows below.
Using a macro
Well, “declarative macros” (with macro_rules!) were, I found, pretty useful and easy to get going with (such that it makes perfect sense that they’re used more frequently than I thought).
- It’s basically pattern matching on arbitrary code and then emitting new code through a templating-like mechanism (pretty intuitive).
- The type system and the rust-analyzer LSP understand what you’re emitting perfectly well in my experience. It really felt properly native to rust.
The Elements of writing patterns with “Declarative macros”
Use macro_rules! to declare a new macro
Yep, it’s also a macro!
Create a structure just like a match expression
- Except the pattern will match on the code provided to the new macro
- … And uses special syntax for matching on generic parts or fragments of the code
- … And it returns new code (not an expression or value).
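As a minimal sketch of that match-like shape (the macro name unit_symbol and its arms are made up for illustration, not taken from anything above), each arm below matches some literal tokens and emits a string literal in their place:

```rust
// Hypothetical macro: each arm is `(pattern) => { emitted code };`
// and the arms are tried top to bottom, much like a `match`.
macro_rules! unit_symbol {
    (seconds) => { "s" };
    (milliseconds) => { "ms" };
    (microseconds) => { "us" };
}

// The macro call is replaced by the emitted code at compile time,
// so this body is effectively just the literal "ms".
fn millis_symbol() -> &'static str {
    unit_symbol!(milliseconds)
}
```

Calling the macro with tokens that no arm matches (say `unit_symbol!(hours)`) is a compile error rather than a runtime one.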
Write a pattern as just rust code with “generic code fragment” elements
- You write the code you’re going to match on, but for the parts that you want to capture as they will vary from call to call, you specify variables (or more technically, “metavariables”).
- You can think of these as the “arguments” of the macro, as they’re the parts that are operated on while the rest is literally just static text/code.
- These variables will have a name and a type.
  - The name is prefixed with a dollar sign, like so: $GENERIC_CODE.
  - And its type follows a colon, as in ordinary rust: $GENERIC_CODE:expr
- These types are actually syntax specifiers. They specify what part of rust syntax will appear in the fragment.
- Presumably, they link right back into the rust parser and are part of how these macros integrate pretty seamlessly with the type system and borrow checker or compiler.
- Here’s a decent list from rust-by-example (you can get a full list in the rust reference on macro “metavariables”):
  - block
  - expr is used for expressions
  - ident is used for variable/function names
  - item
  - literal is used for literal constants
  - pat (pattern)
  - path
  - stmt (statement)
  - tt (token tree)
  - ty (type)
  - vis (visibility qualifier)
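To make a few of these specifiers concrete, here is a small sketch (make_getter and answer are hypothetical names chosen for illustration) that captures an ident, a ty and an expr and slots each into the matching position of a function definition:

```rust
// Hypothetical macro using three different fragment specifiers.
macro_rules! make_getter {
    ($fn_name:ident, $ret:ty, $body:expr) => {
        fn $fn_name() -> $ret {
            $body
        }
    };
}

// Emits: fn answer() -> i32 { 40 + 2 }
make_getter!(answer, i32, 40 + 2);
```

Passing something that doesn't parse as the named fragment (for example `40 + 2` where the ident is expected) is rejected when the macro is matched.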
So a basic pattern that matches on any single-field struct while capturing the struct’s name, its only field’s name, and its type would be:
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty
        }
    )
}
Now, $name, $field and $field_type will be captured for any single-field struct (and, presumably, the validity of the syntax enforced by the “fragment specifiers”).
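The pattern on its own isn't a complete macro yet, since it has no emitted code. As a rough sketch of a complete version (describe_struct, Wrapper and describe() are made-up names for illustration), this one re-emits the struct and adds a helper that reports what was captured:

```rust
// Hypothetical macro: re-emits the struct unchanged and adds a helper
// that reports the captured name, field and type as text.
macro_rules! describe_struct {
    (
        struct $name:ident {
            $field:ident: $field_type:ty
        }
    ) => {
        struct $name {
            $field: $field_type,
        }

        impl $name {
            fn describe() -> String {
                format!(
                    "{} {{ {}: {} }}",
                    stringify!($name),
                    stringify!($field),
                    stringify!($field_type)
                )
            }
        }
    };
}

describe_struct! {
    struct Wrapper {
        inner: u8
    }
}
```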
Capture any repeated patterns with + or *
- Yea, just like regex
- Wrap the repeated pattern in $( ... )
- Place whatever separating code that will occur between the repeats after the wrapping parentheses:
  - EG, a separating comma: $( ... ),
- Place the repetition counter/operator after the separator: $( ... ),+
Example
So, to capture multiple fields in a struct (expanding from the example above):
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type:ty ),*
        }
    )
}
- This will capture the first field and then any additional fields.
- The way you use these repeats mirrors the way they’re captured: they all get used in the same way and rust will simply repeat the new code for each captured repeat.
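A small sketch of a repeat being captured and then replayed (field_names and names() are hypothetical, for illustration only): the macro captures comma-separated `name: Type` pairs and emits one string per captured name.

```rust
// Hypothetical macro: captures `name: Type` pairs separated by commas
// and emits the captured names as an array of string slices.
macro_rules! field_names {
    ( $( $ff:ident : $ff_type:ty ),* ) => {
        [ $( stringify!($ff) ),* ]
    };
}

// Three repeats are captured, so the emitted array has three elements.
fn names() -> [&'static str; 3] {
    field_names!(a: i32, b: String, c: Vec<String>)
}
```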
Writing the emitted or new code
Use => as with match expressions
- Actually, it’s => { ... }, IE with braces (not sure why)
Write the new emitted code
- All the new code is simply written between the braces
- Captured “variables” or “metavariables” can be used just as they were captured: $GENERIC_CODE.
  - Except types aren’t needed here
- Captured repeats are expressed within wrapped parentheses just as they were captured: $( ... ),*, including the separator (which can be different from the one used in the capture).
  - The code inside the parentheses can differ from that captured (that’s the point after all), but at least one of the variables from the captured fragment has to appear in the emitted fragment so that rust knows which set of repeats to use.
- A useful feature here is that the repeats can be used multiple times, in different ways in different parts of the emitted code (the example at the end will demonstrate this).
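As a quick sketch of one capture being replayed twice in different shapes (total_and_names and example() are made-up names), the macro below captures `name = value` pairs once, then emits the values as a chain of additions and the names as a comma-separated array:

```rust
// Hypothetical macro: one repeated capture, two different uses.
macro_rules! total_and_names {
    ( $( $name:ident = $val:expr ),+ ) => {
        (
            // First use: each value is emitted with a leading `+`
            // and no separator between repeats.
            0 $( + $val )+,
            // Second use: only the names, comma separated.
            [ $( stringify!($name) ),+ ],
        )
    };
}

fn example() -> (i32, [&'static str; 3]) {
    total_and_names!(a = 1, b = 2, c = 3)
}
```

Each emitted repeat only needs to mention one of the captured variables, which is why the first use can ignore $name and the second can ignore $val.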
Example
For example, we could convert the struct to an enum where each field became a variant with an enclosed value of the same type as the corresponding struct field:
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type:ty ),*
        }
    ) => {
        enum $name {
            $field($field_type),
            $( $ff($ff_type) ),*
        }
    }
}
With the above macro defined … this code …
my_new_macro! {
    struct Test {
        a: i32,
        b: String,
        c: Vec<String>
    }
}
… will emit this code …
enum Test {
    a(i32),
    b(String),
    c(Vec<String>)
}
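To check that the emitted enum behaves like a hand-written one, here is a self-contained sketch that repeats the macro and then matches on the variants. The label() helper and the #[allow(...)] attribute are my additions for illustration (the attribute only silences warnings about lower-case variant names and unused variants).

```rust
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type:ty ),*
        }
    ) => {
        // Assumption, not part of the macro as written above:
        // quiets lints triggered by variant names such as `a`.
        #[allow(non_camel_case_types, dead_code)]
        enum $name {
            $field($field_type),
            $( $ff($ff_type) ),*
        }
    };
}

my_new_macro! {
    struct Test {
        a: i32,
        b: String,
        c: Vec<String>
    }
}

// Hypothetical helper: the emitted variants can be matched on as usual.
fn label(t: &Test) -> String {
    match t {
        Test::a(n) => format!("a holds {}", n),
        Test::b(s) => format!("b holds {}", s),
        Test::c(v) => format!("c holds {} strings", v.len()),
    }
}
```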
Application: “The code” before making it more efficient with a macro
Basically … a simple system for custom types to represent physical units.
The Concept (and a rant)
A basic pattern I’ve sometimes implemented on my own (without bothering with dependencies that is) is creating some basic representation of physical units in the type system. Things like meters or centimetres and degrees or radians etc.
If your code relies on such and performs conversions at any point, it is way too easy to fuck up, and therefore worth, IMO, creating some safety around. NASA provides an obvious warning (the Mars Climate Orbiter was lost after a mix-up between imperial and metric units). As does, IMO, common sense and experience: most scientists and physical engineers learn the importance of “dimensional analysis” of their calculations.
In fact, it’s the sort of thing that should arguably be built into any language that takes types seriously (like eg rust). I feel like there could be an argument that it’d be as reasonable as the numeric abstractions we’ve worked into programming??
At the bottom I’ll link whatever crates I found for doing a better job of this in rust (one of which seemed particularly interesting).
Implementation (without using a macro)
The essential design is (again, this is basic):
- A single type for a particular dimension (eg time or length)
- Method(s) for converting between units of that dimension
- Ideally, flags or constants of some sort for the units (thinking of enum variants here)
- These could be methods too
#[derive(Debug)]
pub enum TimeUnits {s, ms, us, }

#[derive(Debug)]
pub struct Time {
    pub value: f64,
    pub unit: TimeUnits,
}

impl Time {
    pub fn new<T: Into<f64>>(value: T, unit: TimeUnits) -> Self {
        Self {value: value.into(), unit}
    }

    fn unit_conv_val(unit: &TimeUnits) -> f64 {
        match unit {
            TimeUnits::s => 1.0,
            TimeUnits::ms => 0.001,
            TimeUnits::us => 0.000001,
        }
    }

    fn conversion_factor(&self, unit_b: &TimeUnits) -> f64 {
        Self::unit_conv_val(&self.unit) / Self::unit_conv_val(unit_b)
    }

    pub fn convert(&self, unit: TimeUnits) -> Self {
        Self {
            value: (self.value * self.conversion_factor(&unit)),
            unit
        }
    }
}
So, we’ve got:
- An enum TimeUnits representing the various units of time we’ll be using
- A struct Time that will be any given value of “time” expressed in any given unit
- With methods for converting from any unit to any other unit, the heart of which is a match expression on the unit that hardcodes the conversions (relative to the base unit of seconds … see the conversion_factor() method which generalises the conversion values).
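The arithmetic behind conversion_factor() can be sketched on its own. The constants and free functions below are hypothetical stand-ins for the match arms and method above: the factor is the current unit's size in seconds divided by the target unit's size in seconds.

```rust
// Each unit's size relative to the base unit (seconds),
// mirroring the arms of the match expression.
const S: f64 = 1.0;
const MS: f64 = 0.001;
const US: f64 = 0.000001;

// Hypothetical free function mirroring `conversion_factor()`.
fn factor(from: f64, to: f64) -> f64 {
    from / to
}

// 2.5 ms expressed in us: 2.5 * (0.001 / 0.000001), roughly 2500.
fn two_and_a_half_ms_in_us() -> f64 {
    2.5 * factor(MS, US)
}

// 1 s expressed in ms: 1.0 * (1.0 / 0.001), roughly 1000.
fn one_s_in_ms() -> f64 {
    1.0 * factor(S, MS)
}
```

Since these are floating point values, results are best compared with a small tolerance rather than for exact equality.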
Note: I’m using T: Into<f64> for the new() method and f64 for Time.value as that is the easiest way I know to accept either integers or floats as values. It works because i32 (and the other numeric types up to 32 bits wide, though not i64 or u64) can be converted lossless-ly to f64.
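A short sketch of what that bound accepts (to_f64 and examples() are hypothetical helpers with the same bound as new(), not part of the code above):

```rust
// Hypothetical helper with the same bound as `Time::new()`.
fn to_f64<T: Into<f64>>(value: T) -> f64 {
    value.into()
}

fn examples() -> [f64; 3] {
    [
        to_f64(3_i32),   // 32-bit integer, converted losslessly
        to_f64(2.5_f32), // narrower float
        to_f64(7_u8),    // small unsigned integer
        // to_f64(1_i64) would not compile: f64 does not implement
        // From<i64>, as not every i64 fits exactly in an f64.
    ]
}
```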
Obviously you can go further than this. But the essential point is that each dimension (time, length etc) needs to be a new type with all the desired functionality implemented manually or through some handy use of blanket trait implementations.
Defining a macro instead
For something pretty basic, the above is an annoying amount of boilerplate!! May as well rely on a dependency!?
Well, we can write the boilerplate once in a macro and then only provide the informative parts!
In the case of the above, the only parts that matter are:
- The name of the type/struct
- The name of the units enum type we’ll use (as they’ll flag units throughout the codebase)
- The names of the units we’ll use and their value relative to the base unit.
IE, for the above, we only need to write something like:
struct Time {
    value: f64,
    unit: TimeUnits,
    s: 1.0,
    ms: 0.001,
    us: 0.000001
}
Note: this isn’t valid rust! But that doesn’t matter, so long as we can write a pattern that matches it and emit valid rust from the macro, it’s all good! (Which means we can write our own little DSLs with native macros!!)
To capture this, all we need is what we’ve already done above: capture the first two fields and their types, then capture the remaining “field names” and their values in a repeating pattern.
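As a tiny sketch of matching that kind of “not quite rust” syntax on its own (unit_table and time_units() are made-up names), the macro below accepts `name: value` pairs and emits an array of (name, value) tuples:

```rust
// Hypothetical macro: `s: 1.0` is not a valid field declaration in rust,
// but the pattern only asks for an ident, a colon and an expression.
macro_rules! unit_table {
    ( $( $un:ident : $value:expr ),+ ) => {
        [ $( (stringify!($un), $value) ),+ ]
    };
}

fn time_units() -> [(&'static str, f64); 3] {
    unit_table!(s: 1.0, ms: 0.001, us: 0.000001)
}
```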
Implementation of the macro
The pattern
macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    )
}
- Note the repeating fragment doesn’t provide a type for the field, but instead captures an expression (expr) after it, despite being invalid rust.
The Full Macro
macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    ) => {
        #[derive(Debug)]
        pub struct $name {
            pub $v: f64,
            pub $u: $u_enum,
        }

        impl $name {
            fn unit_conv_val(unit: &$u_enum) -> f64 {
                match unit {
                    $(
                        $u_enum::$un => $value
                    ),+
                }
            }

            fn conversion_factor(&self, unit_b: &$u_enum) -> f64 {
                Self::unit_conv_val(&self.$u) / Self::unit_conv_val(unit_b)
            }

            pub fn convert(&self, unit: $u_enum) -> Self {
                // Use the captured field names ($v, $u) rather than
                // hard-coding `value` and `unit`, so any names work.
                Self {
                    $v: (self.$v * self.conversion_factor(&unit)),
                    $u: unit
                }
            }
        }

        #[derive(Debug)]
        pub enum $u_enum {
            $( $un ),+
        }
    }
}
Note the repeating capture is used twice here in different ways.
- The capture is: $( $un:ident : $value:expr ),+
And in the emitted code:
- It is used in the unit_conv_val method as: $( $u_enum::$un => $value ),+
  - Here the ident $un is being used as the variant of the enum that is defined later in the emitted code
  - Where $u_enum is also used without issue, as the name/type of the enum, despite not being part of the repeated capture but another variable captured outside of the repeated fragments.
- It is then used in the definition of the variants of the enum: $( $un ),+
  - Here, only one of the captured variables is used, which is perfectly fine.
Usage
Now all of the boilerplate above is unnecessary, and we can just write:
unit_gen!{
    struct Time {
        value: f64,
        unit: TimeUnits,
        s: 1.0,
        ms: 0.001,
        us: 0.000001
    }
}
Usage from main.rs:
use units::Time;
use units::TimeUnits::{s, ms, us};

fn main() {
    let x = Time{value: 1.0, unit: s};
    let y = x.convert(us);
    println!("{:?}", x);
    // Print the converted value (y), not x a second time
    println!("{:?}", y);
}
Output:
Time { value: 1.0, unit: s }
Time { value: 1000000.0, unit: us }
- Note how the struct and enum created by the emitted code are properly available from the module as though they were written manually or directly.
- In fact, my LSP (rust-analyzer) was able to autocomplete these immediately once the macro was written and called.
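The payoff of the macro is that a second dimension costs only a few lines. Below is a single-file sketch that reuses the same macro shape for a hypothetical Length type. Length, LengthUnits, the unit values and marathon_in_metres() are my own illustration, the #[allow(...)] attribute is an addition to quiet lints about lower-case and unused variants, and convert() uses the captured field names throughout.

```rust
macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    ) => {
        #[derive(Debug)]
        pub struct $name {
            pub $v: f64,
            pub $u: $u_enum,
        }

        impl $name {
            fn unit_conv_val(unit: &$u_enum) -> f64 {
                match unit {
                    $( $u_enum::$un => $value ),+
                }
            }

            fn conversion_factor(&self, unit_b: &$u_enum) -> f64 {
                Self::unit_conv_val(&self.$u) / Self::unit_conv_val(unit_b)
            }

            pub fn convert(&self, unit: $u_enum) -> Self {
                Self {
                    $v: self.$v * self.conversion_factor(&unit),
                    $u: unit,
                }
            }
        }

        // Assumption: added only to silence naming/unused-variant lints.
        #[allow(non_camel_case_types, dead_code)]
        #[derive(Debug)]
        pub enum $u_enum {
            $( $un ),+
        }
    };
}

// A second dimension, defined in a handful of lines.
unit_gen! {
    struct Length {
        value: f64,
        unit: LengthUnits,
        m: 1.0,
        cm: 0.01,
        km: 1000.0
    }
}

// Hypothetical usage: a distance in km converted to metres.
fn marathon_in_metres() -> Length {
    let marathon = Length { value: 42.195, unit: LengthUnits::km };
    marathon.convert(LengthUnits::m)
}
```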
Crates for unit systems
I did a brief search for actual units systems and found the following:
dimensioned
- Easily the most interesting to me (from my quick glance), as it seems to have created the most native and complete representation of physical units in the type system
- It creates, through types, a 7-dimensional space, one for each SI base unit
- This allows all possible units to be represented as a reduction to a point in this space.
  - EG, if the dimensions are ordered as [seconds, metres, kilograms, …], then the Newton, m.kg / s^2, would be [-2, 1, 1, 0, 0, 0, 0].
- This allows all units to be mapped directly to this consistent representation (interesting!!), and all operations to then be done easily and systematically.
Unfortunately, I’m not sure if the repository is still maintained.
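To illustrate the idea of units as points in a 7-dimensional space, here is a rough runtime sketch. It is not the crate's API (which, as far as I can tell, does this at the type level), and the axis ordering, the Dim alias and the function names are all assumptions of mine:

```rust
// Exponents of the seven SI base units. The ordering used here
// (seconds, metres, kilograms, then the remaining four) is an assumption.
type Dim = [i8; 7];

const SECOND: Dim = [1, 0, 0, 0, 0, 0, 0];
const METRE: Dim = [0, 1, 0, 0, 0, 0, 0];
const KILOGRAM: Dim = [0, 0, 1, 0, 0, 0, 0];

// Multiplying two quantities adds their exponents.
fn mul(a: Dim, b: Dim) -> Dim {
    let mut out = [0i8; 7];
    for i in 0..7 {
        out[i] = a[i] + b[i];
    }
    out
}

// Dividing two quantities subtracts their exponents.
fn div(a: Dim, b: Dim) -> Dim {
    let mut out = [0i8; 7];
    for i in 0..7 {
        out[i] = a[i] - b[i];
    }
    out
}

// Newton = m.kg / s^2
fn newton() -> Dim {
    div(mul(METRE, KILOGRAM), mul(SECOND, SECOND))
}
```

With this representation, checking that two quantities can be added is just an equality check on their exponent arrays.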
uom
- This might actually be good too, I just haven’t looked into it much
- It also seems to be currently maintained
F#
Interestingly, F# actually has a system built in!
- See learning documentation on F# here
- Also this older (2008) series of blogs on the feature here
Ah, you’re right. I’ve mainly worked through the sorted-chapter and thought the seq!()-macro would be a macro_rules thing, but apparently that’s a proc_macro-thing with TokenStream parsing and such, too. I didn’t even know that’s an option, although it makes perfect sense. 🙃
Yea, and proc_macro TokenStream macros definitely seem worthwhile knowing about without necessarily ever wanting to reach for them, at least not often.
Declarative macros though (using macro_rules! as in the top post) surprised me in how straightforward and useful they are. Basically boilerplate machines built right into the language. I’d previously gotten the impression that all macros were like proc_macro.
It’d be interesting to see some challenges with macro_rules!. I’m not sure there’s much scope to challenge people though … they’re pretty simple. But there are some tricks in the system AFAICT I didn’t touch on here.
The tt syntax type, which stands for “Token Tree” and accepts, I think, any single token or delimited group (and so, when repeated, any arbitrary series of tokens), can be powerful.
Together it seems you can put together a pseudo parser, with recursive calls passing in flags or markers to dictate which branch the call goes down. I found this suggestion on users.rust-lang to use a “switch” token along with the above tricks.
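A minimal sketch of that combination, assuming nothing beyond macro_rules! itself (tally and the @count marker are made-up names, not from the linked suggestion): the macro recurses over tt fragments, and a leading marker token decides which arms the recursive calls go down.

```rust
// Hypothetical recursive macro counting token trees.
macro_rules! tally {
    // Internal arms, selected by the `@count` marker ("switch") token.
    (@count) => { 0usize };
    (@count $head:tt $($tail:tt)*) => {
        1usize + tally!(@count $($tail)*)
    };
    // Public entry point: forwards everything to the `@count` arms.
    ($($tokens:tt)*) => { tally!(@count $($tokens)*) };
}

fn counts() -> (usize, usize, usize) {
    (
        tally!(),          // no tokens at all
        tally!(a b c),     // three single tokens
        tally!((x y z) w), // a parenthesised group counts as one tree
    )
}
```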
Yeah, I’m only looking into proc_macros, because I’m working on a library. In application code, I do think they’re essentially never going to be worth the complexity that they introduce. But in a library, I can deal with the complexity and hopefully my users don’t have to think about it.
Having said that, I actually don’t think proc_macros are insanely complex. There’s a bit of a learning curve to them, particularly the parsing with the syn-crate takes a moment to understand the concepts.
But once you’ve parsed things, you can use the quote-crate to do templating in quite a similar fashion as macro_rules. The thing is just that all the simple cases are covered by the simpler macro_rules, so you just wouldn’t reach for proc_macro most of the time in application code.
yea, and it would probably be worth just a quick hack to get a feel for it (procedural macros) at least once so you know what you can reach for when the time comes. As you say, it seems involved, but not really that insanely complex … and knowing the bits that make the language “your own” can be really valuable. Cheers for the workshop thing though, definitely worth knowing about!