Introduction
Often as folks who create useful software things we tend to think of ourselves as people who write software for the mythical "user". A "user" clicks a button and
something magical happens. This is commonly reffered to as an abstraction.
Abstractions are all around us in software and clever programmers create good abstractions for other programmers often to manage complexity.
A really common example of this is an Application Programming Interface which allows two "applications" to share useful data with each other over some transport while being platform-agnostic as to how this data is used. Like an API, there are other interesting kinds of abstractions -- let's peel back the abstraction between the language creator and language user by inventing syntax!
This involves a subtle shift in the paradigm used to understand computation, at the core is the idea of viewing computation as data. I would guess for most people,
the typical mental model when reading at first is mostly procedural, a top-down scan with familiarity of syntax and semantics, then another important shift occurs in
understanding runtime execution with the introduction of concurrency and parallelism, here we'll be peeling back at the layer between compile time and runtime.
Compile time occurs when program text is being "parsed and transformed" into many forms all the way to bits and runtime
is when the program is actually executing ie "running", in this paradigm of viewing programs as textual input to other programs and to the program itself while running, is known as metaprogramming.
This distinction between what is "compile" and "runtime" is
a useful mental model illustrated here for simplicity, odds
are what's happening in your favorite language is probably
more interesting![11]
Before we begin, a caveat. Although this technique applies broadly to most modern languages -- implementations vary in feature parity, I'll try to primarily include alternate examples with go's reflection and rust's macro system while providing nods to Cpython[1], Ruby MRI[2] and some javascript [3]) but not typescript[4]
Computation is data
Consider for example the humble eval()
function:
// javascript
console.log(eval('2 + 2'));
# python
print(eval('2 + 2'))
# ruby
puts eval('2 + 2')
The computation 2 + 2
is represented as data, in this case a string
. That's kind of neat isn't it? however we can take this much futher.
If you're interested in further details of what's happening here, checkout crafting interpreters.
Since programming languages contain all sorts of elements like expressions and statements, we need some
way to hold information about what the program or computation is trying to do, this internal representation is most commonly known as an Abstract Syntax Tree.
At the risk of oversimplification, think of an AST as a way to meaningfully represent the textual source of a program that sometimes allows you to do something resembling string interpolation operations on your program's text source.
Prelude, why meta?
To illustrate this concept, lets see how one might add syntax to create a constructor for a dynamic array in elixir
.
First, some background. Elixir is a (mostly) functional language with (mostly) immutable datastructures, it doesn't encourage the use of
or provide a dynamic array out of the box like most functional languages, as the implementation of one
requires random access read/write via mutable state. Nor does it have "constructors", a typical pattern is creating structured data returned from
a function and "piping" it through several other functions:
defmodule MyApp.Array do
alias MyApp.Array
defstruct field: nil
def new(options \\ []) do
%Array{field: options}
end
end
iex(1)> MyApp.Array.new() |> do_stuff() |> do_other_stuff()
For this example, we're going to piggyback off the rust standard library's Vector by
creating a foreign function interface in elixir and alternatively a data structure implemented in the erlang stdlib in order to re-create something like vec!
As we'll see the "backend" implementation of the data structure is not important, the fact that it's in rust or erlang doesn't matter, what we're focused on is providing an easy to use syntactic abstraction
of a common datastructure.
Here's a simplified version pulled straight from the rust book of vec!
:
#[macro_export]
macro_rules! vec {
( $( $x:expr ),* ) => {
{
let mut temp_vec = Vec::new();
$(
temp_vec.push($x);
)*
temp_vec
}
};
}
Here we see a pattern match ( $( $x:expr ),* )
like our humble eval('2 + 2')
instead of representing the computation as a string, it's a tree like data-structure
where we can assert at compile time, if some code looks like what we think it looks like, replace it with what's in the match arm,
a process known as macro expansion
.
In elixir, we can write something similar, the pattern match is a three-element style tuple[5]:
{node, execution_context or meta_data, arguments}
Go and ruby share some superficial similarities as their metaprogramming api doesn't give you direct access to the AST. In ruby libraries like RSpec
,rails
router and erb
html templates often use metaprogramming techniques via "monkey patching" -- modifying at runtime various
properties of an object[6] and since in ruby's extremely dynamically typed[7] interpreted world there is no notion of "compile time expansion" instead you have powerful introspection and malleability at runtime giving rise to patterns like hooks
[8] to alter nearly almost anything about the language via object properties, syntax or not.
Take this small excerpt[9] from rspec-core of the describe
public api:
# @private
def self.expose_example_group_alias(name)
return if example_group_aliases.include?(name)
example_group_aliases << name
(class << RSpec; self; end).__send__(:define_method, name) do |*args, &example_group_block|
group = RSpec::Core::ExampleGroup.__send__(name, *args, &example_group_block)
RSpec.world.record(group)
group
end
expose_example_group_alias_globally(name) if exposed_globally?
end
There's alot happening but the important thing to note is RSpec::Core::ExampleGroup
is an object that is being modified at the test-runner's runtime which specifies the linguistic structure of the domain's specific language of what describe
means.
In go like ruby we have reflection
that allows runtime introspection, unlike ruby it is statically typed and compiled. Reflection gives a temporary "escape hatch" out of the rigid type system and allows modification based on dynamic interfaces
the most idiomatic example of this I can find are the printing family[10] functions like fmt.Sprintf
.
func (p *pp) doPrint(a []any) {
prevString := false
for argNum, arg := range a {
isString := arg != nil && reflect.TypeOf(arg).Kind() == reflect.String
// Add a space between two non-string arguments.
if argNum > 0 && !isString && !prevString {
p.buf.writeByte(' ')
}
p.printArg(arg, 'v')
prevString = isString
}
}
Building a (Dynamic) Array "constructor" in Elixir
Now, let's get hands on. Everything here lives in a mix project called ExVec
and defining the macro's public api:
defmodule ExVec do
alias ExVec.{Array, Vector}
defmacro __using__(implementation: impl) do
quote do
import unquote(__MODULE__)
Module.put_attribute(__MODULE__, :implementation, unquote(impl))
end
end
defmacro vec!([_h | _t] = args) do
quote bind_quoted: [args: args] do
dispatch_constructor(@implementation, args)
end
end
defmacro vec!({:.., _, [first, last]}) do
args = Range.new(first, last) |> Enum.to_list()
quote bind_quoted: [args: args] do
dispatch_constructor(@implementation, args)
end
end
def dispatch_constructor(impl, args) do
case impl do
:rust -> Vector.new(args)
:erlang -> Array.new(args)
_ -> raise "invalid constructor type, did you mean :rust?"
end
end
end
The ex_vec
library has two backends ExVec.Array
which is a thin wrapper around :array
and ExVec.Vector
which is a NIF wrapper that leverages rustler's Encoder
and Decoder
to encode an elixir List
as a Vec
then implementing interfaces for what an array might look like in elixir by defining:
- The
Access
behaviour - A protocol implementation of
Enumerable
By implementing these specifications we can safely use things from stdlib like Enum
and even Stream
and just like that in any other elixir project
and letting the client choose the backend while keep the macro's syntax:
defmodule MyApp.DoStuff do
alias ExVec.Vector
use ExVec, implementation: :rust
def len do
# serialised as a rust Vec<i32>
vec!(1..4) |> Enum.count()
vec!([1, 2, 3, 4]) |> Enum.count()
# plain old linked-list
[1, 2, 3, 4] |> Enum.count()
end
def random_access do
# O(1) read
my_array = vec!(0..10)
my_array[5]
# serialised random write access
Vector.get_and_update(my_array, 0, fn n -> {n, 42} end)
end
end
defmodule MyApp.DoOtherStuff do
use ExVec, implementation: :erlang
def len do
# this is an erlang :array!
vec!([1, 2, 3, 4]) |> Enum.count()
end
end
unfortunately as of the time of this writing, rustler
does not support generic type intefaces so I
guess this is impossible?
#[derive(Debug, NifStruct)]
#[module = "ExVec.Vector"]
pub struct Vector<T> {
fields: Vec<T>,o
size: isize
}
Therefore a serious limitation of this toy example is it only works for i32
integers :) I also glossed over some behaviours and protocols with defaults.
You can find the full source for this example here, please let me know if you have a comment, found a bug or typo. Thanks for reading!
References
[1] Python3's excellent ast
library: https://docs.python.org/3/library/ast.html
[2] RubyVM::AST : https://ruby-doc.org/core-trunk/RubyVM/AST.html
[3] Javascript(since ECMAScript6): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Reflect
[4] Typescript: https://basarat.gitbook.io/typescript/overview
[4] Go's AST : https://pkg.go.dev/go/ast
[5] Elixir's AST: https://github.com/elixir-lang/elixir/blob/d8f1a5d6b653c14ae44c6eacdbc8e9df7006d284/lib/elixir/pages/syntax-reference.md#the-elixir-ast
[6] The one true (useful) object to rule them all: https://ruby-doc.org/3.2.1/Object.html
[7] Ruby Extensions: https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Basic+Knowledge
[8] Awesome example of the hook pattern
into ruby's object lifecyle: https://github.com/rspec/rspec-core/blob/main/lib/rspec/core/hooks.rb
[9] RSpec public DSL module: https://github.com/rspec/rspec-core/blob/main/lib/rspec/core/dsl.rb
[10] doPrint: https://cs.opensource.google/go/go/+/refs/tags/go1.20:src/fmt/print.go;drc=261fe25c83a94fc3defe064baed3944cd3d16959;l=1204
[11] Just in Time compilation: https://en.wikipedia.org/wiki/Just-in-time_compilation
Top comments (0)