Intro
Having read through the macros section of “The Book” (Chapter 19.6), I thought I would try to hack together a simple idea using macros as a way to get a proper feel for them.
The chapter was a little light, and declarative macros (using `macro_rules!`), which is what I’ll be using below, seemed like a potentially very nice feature of the language … the sort of thing that really makes the language malleable. Indeed, in poking around I’ve realised, perhaps naively, that macros are a pretty common tool for rust devs (or at least more common than I knew).
I’ll rant for a bit first, which those new to rust macros may find interesting or informative (it’s kinda a little tutorial) … to see the implementation, skip down to the “Implementation (without using a macro)” heading and what follows.
Using a macro
Well, I found “declarative macros” (with `macro_rules!`) pretty useful and easy to get going with (such that it makes perfect sense that they’re used more frequently than I thought).
- It’s basically pattern matching on arbitrary code and then emitting new code through a templating-like mechanism (pretty intuitive).
- The type system and the rust-analyzer LSP understand what you’re emitting perfectly well in my experience. It really felt properly native to rust.
The Elements of writing patterns with “Declarative macros”
Use `macro_rules!` to declare a new macro
Yep, it’s also a macro!
Create a structure just like a match expression
- Except the pattern will match on the code provided to the new macro
- … And uses special syntax for matching on generic parts or fragments of the code
- … And it returns new code (not an expression or value).
Write a pattern as just rust code with “generic code fragment” elements
- You write the code you’re going to match on, but for the parts that you want to capture as they will vary from call to call, you specify variables (or more technically, “metavariables”).
- You can think of these as the “arguments” of the macro, as they’re the parts that are operated on while the rest is literally just static text/code.
- These variables will have a name and a type.
- The name is prefixed with a dollar sign, like so: `$GENERIC_CODE`.
- And its type follows a colon, as in ordinary rust: `$GENERIC_CODE:expr`
- These types are actually syntax specifiers. They specify what part of rust syntax will appear in the fragment.
- Presumably, they link right back into the rust parser and are part of how these macros integrate pretty seamlessly with the type system and borrow checker or compiler.
- Here’s a decent list from rust-by-example (you can get a full list in the rust reference on macro “metavariables”):
- `block`
- `expr` is used for expressions
- `ident` is used for variable/function names
- `item`
- `literal` is used for literal constants
- `pat` (pattern)
- `path`
- `stmt` (statement)
- `tt` (token tree)
- `ty` (type)
- `vis` (visibility qualifier)
So a basic pattern that matches on any `struct` while capturing the `struct`’s name, its only field’s name, and its type would be:
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty
        }
    ) => {
        // (emitted code goes here; see "Writing the emitted or new code" below)
    }
}
Now, `$name`, `$field` and `$field_type` will be captured for any single-field `struct` (and, presumably, the validity of the syntax enforced by the “fragment specifiers”).
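To see this matcher in action, here’s a self-contained sketch (the expansion is my own toy addition, not from the book: it just re-emits the struct, so the captured fragments are visibly used):

```rust
// The matcher from above, with a toy expansion that re-emits the struct.
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty
        }
    ) => {
        struct $name {
            $field: $field_type,
        }
    };
}

// This invocation matches: one struct, one field.
my_new_macro! {
    struct Point {
        x: i64
    }
}

fn main() {
    let p = Point { x: 7 };
    println!("{}", p.x); // prints 7
}
```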
Capture any repeated patterns with `+` or `*`
- Yea, just like regex
- Wrap the repeated pattern in `$( ... )`
- Place whatever separating code that will occur between the repeats after the wrapping parentheses
  - EG, a separating comma: `$( ... ),`
- Place the repetition counter/operator after the separator: `$( ... ),+`
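As a quick illustration of repetition (a made-up `sum_all!` macro, not part of the examples below): capture one or more comma-separated expressions and emit their sum:

```rust
// Matches one or more comma-separated expressions; the expansion
// repeats `+ $x` once per captured expression.
macro_rules! sum_all {
    ( $( $x:expr ),+ ) => {
        0 $( + $x )+
    };
}

fn main() {
    // Each captured expr expands as a unit, so `3 * 4` stays grouped.
    let total = sum_all!(1, 2, 3 * 4);
    println!("{total}"); // prints 15
}
```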
Example
So, to capture multiple fields in a `struct` (expanding from the example above):
macro_rules! my_new_macro {
    (
        struct $name:ident {
            $field:ident: $field_type:ty,
            $( $ff:ident : $ff_type:ty ),*
        }
    ) => {
        // (emitted code goes here; see below)
    }
}
- This will capture the first field and then any additional fields.
- The way you use these repeats mirrors the way they’re captured: they all get used in the same way, and rust will simply repeat the new code for each repeated capture.
Writing the emitted or new code
Use `=>` as with match expressions
- Actually, it’s `=> { ... }`, IE with braces (not sure why)
Write the new emitted code
- All the new code is simply written between the braces
- Captured “variables” or “metavariables” can be used just as they were captured: `$GENERIC_CODE`.
  - Except types aren’t needed here
- Captured repeats are expressed within wrapped parentheses just as they were captured: `$( ... ),*`, including the separator (which can differ from the one used in the capture).
- The code inside the parentheses can differ from that captured (that’s the point after all), but at least one of the variables from the captured fragment has to appear in the emitted fragment so that rust knows which set of repeats to use.
- A useful feature here is that the repeats can be used multiple times, in different ways in different parts of the emitted code (the example at the end will demonstrate this).
Example
For example, we could convert the `struct` to an `enum` where each field becomes a variant with an enclosed value of the same type as the corresponding field:
macro_rules! my_new_macro {
(
struct $name:ident {
$field:ident: $field_type:ty,
$( $ff:ident : $ff_type: ty),*
}
) => {
enum $name {
$field($field_type),
$( $ff($ff_type) ),*
}
}
}
With the above macro defined … this code …
my_new_macro! {
struct Test {
a: i32,
b: String,
c: Vec<String>
}
}
… will emit this code …
enum Test {
a(i32),
b(String),
c(Vec<String>)
}
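And the emitted enum is ordinary rust. Written out by hand (with an `#[allow]` added by me to quiet the lowercase-variant warnings), it can be matched like any other enum:

```rust
// The enum as emitted above, written out by hand.
#[allow(non_camel_case_types)]
enum Test {
    a(i32),
    b(String),
    c(Vec<String>),
}

fn main() {
    let t = Test::b(String::from("hello"));
    // Each variant carries a value of the original field's type.
    let desc = match t {
        Test::a(n) => format!("int: {n}"),
        Test::b(s) => format!("string: {s}"),
        Test::c(v) => format!("vec of {} strings", v.len()),
    };
    println!("{desc}"); // prints "string: hello"
}
```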
Application: “The code” before making it more efficient with a macro
Basically … a simple system for custom types to represent physical units.
The Concept (and a rant)
A basic pattern I’ve sometimes implemented on my own (without bothering with dependencies that is) is creating some basic representation of physical units in the type system. Things like meters or centimetres and degrees or radians etc.
If your code relies on such and performs conversions at any point, it is way too easy to fuck up, and therefore worth, IMO, creating some safety around. NASA’s Mars Climate Orbiter provides an obvious warning. As does, IMO, common sense and experience: most scientists and physical engineers learn the importance of “dimensional analysis” of their calculations.
In fact, it’s the sort of thing that should arguably be built into any language that takes types seriously (like eg rust). I feel like there could be an argument that it’d be as reasonable as the numeric abstractions we’ve worked into programming??
At the bottom I’ll link whatever crates I found for doing a better job of this in rust (one of which seemed particularly interesting).
Implementation (without using a macro)
The essential design is (again, this is basic):
- A single type for a particular dimension (eg time or length)
- Method(s) for converting between units of that dimension
- Ideally, flags or constants of some sort for the units (thinking of enum variants here)
- These could be methods too
#[derive(Debug)]
pub enum TimeUnits {s, ms, us, }
#[derive(Debug)]
pub struct Time {
pub value: f64,
pub unit: TimeUnits,
}
impl Time {
pub fn new<T: Into<f64>>(value: T, unit: TimeUnits) -> Self {
Self {value: value.into(), unit}
}
fn unit_conv_val(unit: &TimeUnits) -> f64 {
match unit {
TimeUnits::s => 1.0,
TimeUnits::ms => 0.001,
TimeUnits::us => 0.000001,
}
}
fn conversion_factor(&self, unit_b: &TimeUnits) -> f64 {
Self::unit_conv_val(&self.unit) / Self::unit_conv_val(unit_b)
}
pub fn convert(&self, unit: TimeUnits) -> Self {
Self {
value: (self.value * self.conversion_factor(&unit)),
unit
}
}
}
So, we’ve got:
- An `enum` `TimeUnits` representing the various units of time we’ll be using
- A `struct` `Time` that will be any given `value` of “time” expressed in any given `unit`
- With methods for converting from any unit to any other unit, the heart of which is a match expression on the new unit that hardcodes the conversions (relative to a base unit of seconds … see the `conversion_factor()` method, which generalises the conversion values).
Note: I’m using `T: Into<f64>` for the `new()` method and `f64` for `Time.value`, as that is the easiest way I know to accept either integers or floats as values. It works because `i32` (and most other numerics) can be converted losslessly to `f64`.
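To sanity-check the design, here is the type again in compact, runnable form (the `#[derive]`s are as above; the `#[allow(non_camel_case_types)]` is my addition to quiet warnings about the lowercase variant names), with a small `main` exercising `new()` and `convert()`:

```rust
#[derive(Debug)]
#[allow(non_camel_case_types)]
pub enum TimeUnits { s, ms, us }

#[derive(Debug)]
pub struct Time {
    pub value: f64,
    pub unit: TimeUnits,
}

impl Time {
    pub fn new<T: Into<f64>>(value: T, unit: TimeUnits) -> Self {
        Self { value: value.into(), unit }
    }
    fn unit_conv_val(unit: &TimeUnits) -> f64 {
        match unit {
            TimeUnits::s => 1.0,
            TimeUnits::ms => 0.001,
            TimeUnits::us => 0.000_001,
        }
    }
    fn conversion_factor(&self, unit_b: &TimeUnits) -> f64 {
        Self::unit_conv_val(&self.unit) / Self::unit_conv_val(unit_b)
    }
    pub fn convert(&self, unit: TimeUnits) -> Self {
        Self { value: self.value * self.conversion_factor(&unit), unit }
    }
}

fn main() {
    // Integer and float inputs both work, via Into<f64>.
    let a = Time::new(5, TimeUnits::s);
    let b = Time::new(5.0, TimeUnits::s);
    assert_eq!(a.value, b.value);

    // 2.5 s should come out as 2500 ms.
    let c = Time::new(2.5, TimeUnits::s).convert(TimeUnits::ms);
    println!("{:?}", c);
}
```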
Obviously you can go further than this. But the essential point is that each unit needs to be a new type, with all the desired functionality implemented manually or through some handy use of blanket trait implementations.
Defining a macro instead
For something pretty basic, the above is an annoying amount of boilerplate!! May as well rely on a dependency!?
Well, we can write the boilerplate once in a macro and then only provide the informative parts!
In the case of the above, the only parts that matter are:
- The name of the type/`struct`
- The name of the units `enum` type we’ll use (as they’ll flag units throughout the codebase)
- The names of the units we’ll use and their value relative to the base unit.
IE, for the above, we only need to write something like:
struct Time {
value: f64,
unit: TimeUnits,
s: 1.0,
ms: 0.001,
us: 0.000001
}
Note: this isn’t valid rust! But that doesn’t matter, so long as we can write a pattern that matches it and emit valid rust from the macro, it’s all good! (Which means we can write our own little DSLs with native macros!!)
To capture this, all we need are what we’ve already done above: capture the first two fields and their types, then capture the remaining “field names” and their values in a repeating pattern.
Implementation of the macro
The pattern
macro_rules! unit_gen {
    (
        struct $name:ident {
            $v:ident: f64,
            $u:ident: $u_enum:ident,
            $( $un:ident : $value:expr ),+
        }
    ) => {
        // (full emitted code shown in the next section)
    }
}
- Note the repeating fragment doesn’t provide a type for the field, but instead captures an expression (`expr`) after it, despite this being invalid rust.
The Full Macro
macro_rules! unit_gen {
(
struct $name:ident {
$v:ident: f64,
$u:ident: $u_enum:ident,
$( $un:ident : $value:expr ),+
}
) => {
#[derive(Debug)]
pub struct $name {
pub $v: f64,
pub $u: $u_enum,
}
impl $name {
fn unit_conv_val(unit: &$u_enum) -> f64 {
match unit {
$(
$u_enum::$un => $value
),+
}
}
fn conversion_factor(&self, unit_b: &$u_enum) -> f64 {
Self::unit_conv_val(&self.$u) / Self::unit_conv_val(unit_b)
}
pub fn convert(&self, unit: $u_enum) -> Self {
    Self {
        $v: (self.$v * self.conversion_factor(&unit)),
        $u: unit
    }
}
}
#[derive(Debug)]
pub enum $u_enum {
$( $un ),+
}
}
}
Note the repeating capture is used twice here in different ways.
- The capture is: `$( $un:ident : $value:expr ),+`

And in the emitted code:
- It is used in the `unit_conv_val` method as: `$( $u_enum::$un => $value ),+`
  - Here the `ident` `$un` is being used as the variant of the `enum` that is defined later in the emitted code
  - Where `$u_enum` is also used without issue, as the name/type of the `enum`, despite not being part of the repeated capture but another variable captured outside of the repeated fragments.
- It is then used in the definition of the variants of the enum: `$( $un ),+`
  - Here, only one of the captured variables is used, which is perfectly fine.
Usage
Now all of the boilerplate above is unnecessary, and we can just write:
unit_gen!{
struct Time {
value: f64,
unit: TimeUnits,
s: 1.0,
ms: 0.001,
us: 0.000001
}
}
Usage from `main.rs`:
use units::Time;
use units::TimeUnits::{s, ms, us};
fn main() {
let x = Time{value: 1.0, unit: s};
let y = x.convert(us);
println!("{:?}", x);
println!("{:?}", y);
}
Output:
Time { value: 1.0, unit: s }
Time { value: 1000000.0, unit: us }
- Note how the `struct` and `enum` created by the emitted code are properly available from the module, as though they were written manually or directly.
- In fact, my LSP (rust-analyzer) was able to autocomplete these immediately once the macro was written and called.
Crates for unit systems
I did a brief search for actual unit systems and found the following.
dimensioned
- Easily the most interesting to me (from my quick glance), as it seems to have created the most native and complete representation of physical units in the type system
- It creates, through types, a 7-dimensional space, one for each SI base unit
- This allows all possible units to be represented as a reduction to a point in this space.
- EG, if the dimensions are `[s, m, kg, …]`, then the Newton, `m.kg / s^2`, would be `[-2, 1, 1, 0, 0, 0, 0]`.
- This allows all units to be mapped directly to this consistent representation (interesting!!), and all operations to then be done easily and systematically.
Unfortunately, I’m not sure if the repository is still maintained.
uom
- This might actually be good too, I just haven’t looked into it much
- It also seems to be currently maintained
F#
Interestingly, F#
actually has a system built in!
- See the learning documentation on F# here
- Also this older (2008) series of blogs on the feature here
For ~~figuring out how to write macros~~ anyone wanting to learn about more advanced macros beyond macro_rules, I can recommend this: https://github.com/dtolnay/proc-macro-workshop

Basically, you clone that repo, pick one of the projects, uncomment the first test in the respective `tests/progress.rs` file and read the steps in the respective unit test file. Then you try to implement a macro to fulfill the test.

It should be said that it isn’t spoon-feeding you; you will still need to read actual documentation for macros. But with its test harness, you get a quick feedback loop and it gives at least some pointers for where to start learning.
Nice!!
It seems that it covers mainly procedural macros, which for those who don’t know are different from what I cover here. They are more involved but more powerful.
Ah, you’re right. I’ve mainly worked through the sorted-chapter and thought the seq!()-macro would be a macro_rules thing, but apparently that’s a proc_macro-thing with TokenStream parsing and such, too. I didn’t even know that’s an option, although it makes perfect sense. 🙃
Yea, and proc_macro TokenStream macros definitely seem worthwhile knowing about without necessarily ever wanting to reach for them, at least not often.
Declarative macros though (using `macro_rules!` as in the top post) surprised me in how straightforward and useful they are. Basically boilerplate machines built right into the language. I’d previously gotten the impression that all macros were like `proc_macro`.

It’d be interesting to see some challenges with `macro_rules!`. I’m not sure there’s much scope to challenge people though … they’re pretty simple. But there are some tricks in the system AFAICT I didn’t touch on here.
- Multiple alternative patterns can be matched on in a single macro (just like match expressions)
- Patterns can match on invalid rust, where the `tt` syntax type, which stands for “Token Tree” and accepts, I think, any arbitrary series of tokens, can be powerful
- A macro can call itself recursively

Together it seems you can put together a pseudo parser, with recursive calls passing in flags or markers to dictate which branch the call goes down (I found this suggestion on users.rust-lang to use a “switch” token along with the above tricks).
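A tiny illustration of the recursion trick (a hypothetical `count_tokens!`, the classic “tt muncher”, not something from the top post): each call peels off one token tree and recurses on the rest:

```rust
// Base case: no tokens left. Recursive case: count one token tree
// (`tt`) and recurse on the remainder.
macro_rules! count_tokens {
    () => { 0usize };
    ( $first:tt $( $rest:tt )* ) => {
        1usize + count_tokens!( $( $rest )* )
    };
}

fn main() {
    // `(c d)` and `[e]` each count as a single token tree.
    let n = count_tokens!(a b (c d) [e]);
    println!("{n}"); // prints 4
}
```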
Yeah, I’m only looking into proc_macros, because I’m working on a library. In application code, I do think they’re essentially never going to be worth the complexity that they introduce. But in a library, I can deal with the complexity and hopefully my users don’t have to think about it.
Having said that, I actually don’t think proc_macros are insanely complex. There’s a bit of a learning curve to them, particularly the parsing with the syn-crate takes a moment to understand the concepts.
But once you’ve parsed things, you can use the quote-crate to do templating in quite a similar fashion as macro_rules. The thing is just that all the simple cases are covered by the simpler macro_rules, so you just wouldn’t reach for proc_macro most of the time in application code.

Yea, and it would probably be worth just a quick hack to get a feel for it (procedural macros) at least once so you know what you can reach for when the time comes. As you say, it seems involved, but not really that insanely complex … and knowing the bits that make the language “your own” can be really valuable. Cheers for the workshop thing though, definitely worth knowing about!
I’d definitely recommend looking into the `uom` crate. It uses a different and in my opinion much better approach to unit-tracking than you present here. Instead of storing the unit at runtime in a field, it uses generics to specify the unit.

So rather than having this:
struct Length { value: f64, unit: <some enum of all length units>, }
It does (conceptually but simplified) this:
struct Length<Unit>(f64);
This means that you can for instance do trait impls like this:
impl<U1, U2> Add<Length<U1>> for Length<U2> { // Convert the units and add them together }
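To make that concrete, here’s a minimal compilable sketch of the idea (all names here, like `LengthUnit` and `METRES_PER_UNIT`, are invented for illustration and are not uom’s actual API):

```rust
use std::marker::PhantomData;
use std::ops::Add;

// Each unit is a zero-sized marker type carrying its factor
// relative to a common base (metres here).
trait LengthUnit {
    const METRES_PER_UNIT: f64;
}

struct Millimetre;
struct Kilometre;

impl LengthUnit for Millimetre { const METRES_PER_UNIT: f64 = 0.001; }
impl LengthUnit for Kilometre  { const METRES_PER_UNIT: f64 = 1000.0; }

// The unit lives only in the type; at runtime this is just an f64.
struct Length<U: LengthUnit>(f64, PhantomData<U>);

impl<U: LengthUnit> Length<U> {
    fn new(v: f64) -> Self { Length(v, PhantomData) }
}

// Adding lengths in different units: convert the right-hand side
// into the left-hand side's unit, so the result is in U1.
impl<U1: LengthUnit, U2: LengthUnit> Add<Length<U2>> for Length<U1> {
    type Output = Length<U1>;
    fn add(self, rhs: Length<U2>) -> Length<U1> {
        let rhs_in_u1 = rhs.0 * U2::METRES_PER_UNIT / U1::METRES_PER_UNIT;
        Length::new(self.0 + rhs_in_u1)
    }
}

fn main() {
    let a = Length::<Millimetre>::new(500.0);
    let b = Length::<Kilometre>::new(2.0);
    let c = a + b; // result is in millimetres
    println!("{}", c.0); // 2 km + 500 mm = 2000500 mm
}
```

The conversion factors are resolved at compile time; the mixed-unit addition is only possible because the `Add` impl spells out how to reconcile the two unit parameters.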
So for instance if you had a variable `a: Length<Millimeter>` and `b: Length<Kilometer>`, you could add them together and still get the correct result, because the units will do the conversion during the addition, and all of this is ensured at compile-time. So you don’t even need to track that you’re using the same units for different variables; the units will sort themselves out automatically. This is much safer than trying to keep track of the unit dynamically at runtime.

That’s at least the idea as far as I understand, but I haven’t used the `uom` crate much, just read about it.

Oh, what I did here was a toy example, or a “shit, something would be better than nothing” approach for basic usage, where such is the value of having units represented in the type system: something can be better than nothing.
Yes, `uom` does seem good, and everything you say about it totally makes sense (I hadn’t actually thought that much about automatic conversions for arithmetic!). I haven’t dug into it at all, but it did have me a little concerned that one could run into some situation it doesn’t handle well (eg, does it work with arrays?).

`dimensioned` seemed nice too (especially as its approach didn’t seem to rely on conversions to base units as `uom` did), though it is likely unfinished or unmaintained … it was encouraging to see a simulation example in its documentation.

If I ever dig into `uom` more, I’ll definitely report back here. Cheers for the recommendation!

> does it work with arrays
Not sure what you mean - work with arrays how?
Actually … yea, I didn’t think about this much and probably misunderstood something from the `dimensioned` docs … so it’s probably not a thing, or not common at all …

But assigning a unit type of some sort to an array of values, not just a single or scalar value. It has its uses, and I’ve seen an application of this before. It could also probably be achieved with some basic wrapping (eg `Newtype` wrappers in rust).

Right, you can always do `vec![Length<YourUnitHere>; 5]` and then you have a vec of 5 length values with a certain unit (or you could do a simple array, but then you can’t change the length). But you won’t be able to have different units in different values in that array/vec, since that would be different types and you can’t have different types in an array/vec. You could have different types in a tuple, but then you can’t vary the length.

These limitations are usually not a problem, especially not for something like units where you can easily convert between them.
Oh yea … that works too of course!!
> But you won’t be able to have different units in different values in that array/vec
Oh that wasn’t the aim. The idea behind thinking about arrays was more to head in the direction of having unit-types along with numpy style arrays (ndarray being the only crate I know of for such a tool) … so that calculations and arithmetic with scalars and arrays can get pretty seamless, but with the safety of a units system too.
Consider also checking https://pola.rs/ for data frames and multidimensional data and such :)
For sure!