As a teenager and junior programmer, I heard a lot of talk about metaprogramming. Even though Wikipedia didn't exist yet and far less information was available on the internet than today, it was easy to look up the definition of metaprogramming. My problem was that the definition didn't tell me much. Over the years I've learned a lot more about it. In this blog post I'll explain what metaprogramming is and show various examples of it.
The Definition of Metaprogramming
Now, what is metaprogramming? When we're programming, i.e. writing program code, we're writing code for the program. Conversely, when we're metaprogramming, we are writing code for the code itself, that is, code that generates or changes other code.
Sounds confusing? Guess how confused I was when I was 14 years old and trying to learn all of this by myself...
Examples often help alleviate confusion. Hence, I will give lots of examples throughout this post. Let's start with an example in pseudocode.
Let's say we want to print the numbers from 1 to 10. Even a novice developer would think of using a loop:
for i = 1 to 10:
    print(i)
But would it be the same if we wrote all the print statements out?
print(1)
print(2)
print(3)
print(4)
print(5)
print(6)
print(7)
print(8)
print(9)
print(10)
This question is more involved than first meets the eye. However, as compiler optimizations and instruction set architectures (ISAs) are out of scope for this post, let's keep things simple and assume both versions are executed as written. In that case, the former has more compact code, but the latter runs faster as it has no comparisons or branches. This is a typical tradeoff between size and speed.
However, code readability matters, too. So, what if we would like to write the former, but end up with the code from the latter? This is something metaprogramming could solve. For example, our pseudocode with metaprogramming might look as follows:
META_FOR(i, 1, 10):
    print(i)
In our pseudocode language this would generate the 10 sequential print-statements in code.
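The same idea can be sketched in a real language with code that writes code. The following Python snippet is only an illustration and not an implementation of the pseudocode language above: it builds the ten print statements as source text and then executes that text. The helper name unroll_prints is made up for this example.

```python
def unroll_prints(start, stop):
    """Return source code containing one print statement per number."""
    lines = ['print({})'.format(i) for i in range(start, stop + 1)]
    return '\n'.join(lines)


# The generated program is plain text: "print(1)\nprint(2)\n..."
generated_source = unroll_prints(1, 10)

# Compile the generated text and run it; this prints the numbers 1 to 10.
exec(compile(generated_source, '<generated>', 'exec'))
```

The loop still exists, but it runs while the code is being generated, not while the generated code is being executed.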
Now that we have given the definition for metaprogramming and looked at an unrealistically simple example, let's look at some real examples.
C Preprocessor Directives
Many aspiring C programmers may not realize this at first, but the C preprocessor directives, such as #include and #define, are metaprogramming. The C preprocessor, often abbreviated CPP, runs before the C compiler executes.
The GNU C compiler (GCC) ships with the C preprocessor bundled. To run only the preprocessor, without running the compiler, we have to invoke gcc with -E.
As a simple example, let's define a macro VARIABLE_X as 5 and use a print statement to print it. The code might look as follows:
$ cat define-x.c
#define VARIABLE_X 5
printf("X = %d\n", VARIABLE_X);
Now, let's run the preprocessor and see the output. (We will have to use the switch -P to avoid clutter caused by line markers the preprocessor would add by default):
$ gcc -E -P define-x.c
printf("X = %d\n", 5);
The C preprocessor supports conditionals. We can write code that will end up different depending on definitions. For example, we could use #ifdef to generate something when OS_WINDOWS is defined and something else when OS_LINUX is defined. Note that we need to end the conditional with #endif.
$ cat ifdef-windows-linux.c
#ifdef OS_WINDOWS
printf("hello Bill\n");
#endif
#ifdef OS_LINUX
printf("hello Linus\n");
#endif
Now, if we don't set either, the generated code is empty:
$ gcc -E -P ifdef-windows-linux.c
If we define OS_WINDOWS, we get the following:
$ gcc -E -P -D OS_WINDOWS ifdef-windows-linux.c
printf("hello Bill\n");
and with OS_LINUX:
$ gcc -E -P -D OS_LINUX ifdef-windows-linux.c
printf("hello Linus\n");
We can also evaluate arithmetic expressions in #if directives:
$ cat if-math.c
#if (6 * 7) == 42
printf("all good\n");
#else
printf("is math broken?\n");
#endif
The output is as expected:
$ gcc -E -P if-math.c
printf("all good\n");
In fact, the C preprocessor is very powerful. It supports function-like macros, and with enough trickery even loop-like constructs can be emulated. However, just because something is possible doesn't mean it's a good idea. Some programs, such as the Linux kernel, make heavy use of the C preprocessor, but most programs are best off with minimal use of it. Granted, you can't do much without #include statements, and header files need include guards, but otherwise I recommend using the C preprocessor sparingly.
C++ Templates
C++ grew out of C and shares its preprocessor, so all of the C preprocessor macros work in C++, too. (Technically, the C preprocessor cares very little about the language of the source code, and you could run it over other kinds of text as well.)
In addition to the C preprocessor, C++ has templates. Unlike the C preprocessor, the C++ templates are a native construct of the language. Despite this, the C++ templates are metaprogramming. C++ templates generate code that is to be compiled.
Let's say we want to create a function that returns the sum of two integers. However, for whatever reason, we don't want to create a parameterized function taking two integers, but rather hard-code it as follows:
int sum23()
{
    return 2 + 3;
}
Now, if we want to create many of these functions, we will have to write a lot of code. Instead, we could use templates to have the compiler create the code for us:
template<int N, int M>
int template_sum()
{
    return N + M;
}
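To get an instance of the function sum23(), we take the address of the instantiated template. Note that there are no call parentheses after the template arguments, since those would call the function and yield an int instead of a function pointer:
int (*sum23)() = template_sum<2, 3>;
Note that preferably we would write
auto sum23 = template_sum<2, 3>;
but I wanted to make the type explicit here for maximum readability.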
The most common use of C++ templates is containers. Say you want to create a linked list container. In C++, without any kind of metaprogramming, i.e. without templates and the C preprocessor, you would have to rewrite the same container for every class you want it to support. In practice, you would end up copy-pasting a lot of code, which is suboptimal, to say the least.
The declaration of our linked list implementation might look as follows:
template<class T>
class linked_list;
The implementation of the linked list is out of scope for this post. However, assuming we had written it, we could create linked lists of integers, strings, etc.:
linked_list<int> my_integer_list;
linked_list<std::string> my_string_list;
Note that the templates create the equivalent of copy-pasted source code, i.e. there are separate classes for linked_list<int>, linked_list<std::string>, etc. for everything that is instantiated anywhere. Furthermore, as translation units, i.e. the source files that are each compiled into their own object file, are compiled separately, the same template may be instantiated again and again, which can increase compilation times significantly. There are some optimizations for this, but those are out of scope for this post.
While the C++ templates are extremely powerful, I would recommend using them sparingly. Myself, I make heavy use of the standard library containers as they're very good, but I rarely write any templates myself. Bugs related to templates are typically hard to solve. This is primarily because the compiler may not know if the bug is in the template code or in the usage of the template as either one could be the culprit.
Function Redefinition
In a language like C, declaring a function assigns the symbol used to address the function. Every further declaration of the function needs to have the same prototype or otherwise the compiler complains. For example, let's say we declare int my_sum(int, int) and later int my_sum(char, char):
$ cat my_sum.h
#pragma once
int my_sum(int a, int b);
int my_sum(char a, char b);
The compiler throws an error as expected:
$ gcc my_sum.c
In file included from my_sum.c:1:
my_sum.h:3:5: error: conflicting types for ‘my_sum’
3 | int my_sum(char a, char b);
| ^~~~~~
my_sum.h:2:5: note: previous declaration of ‘my_sum’ was here
2 | int my_sum(int a, int b);
| ^~~~~~
This makes sense. We want all our calls to my_sum() to be unambiguous. More importantly, in a language like C, functions cannot be redefined. Even if we write the exact same implementation twice, as follows:
$ cat my_sum.c
#include "my_sum.h"

int my_sum(int a, int b)
{
    return a + b;
}

int my_sum(int a, int b)
{
    return a + b;
}
the compiler throws an error:
$ gcc my_sum.c
my_sum.c:8:5: error: redefinition of ‘my_sum’
8 | int my_sum(int a, int b)
| ^~~~~~
my_sum.c:3:5: note: previous definition of ‘my_sum’ was here
3 | int my_sum(int a, int b)
| ^~~~~~
However, while this helps avoid confusion, it wouldn't have to be like this. Instead of functions, let's consider normal variables in C. Let's assume we declare the integer a and initially set it to 5. Later, we can set it to 7, and that is totally allowed. Loosely speaking, we are redefining the variable (strictly, we are assigning it a new value):
$ cat int_a.c
#include <assert.h>
int main(void)
{
    int a = 5;
    assert(a == 5);
    a = 7;
    assert(a == 5);
    return 0;
}
Compiling and running this fails as expected:
$ gcc int_a.c && ./a.out
a.out: int_a.c:7: main: Assertion `a == 5' failed.
Aborted (core dumped)
If we change the latter assertion to a == 7, the code runs fine.
Why is it that we can redefine variables, but not functions? Aren't the names we use to address functions symbols, just like variable names are?
Indeed, and in many languages, such as Python and JavaScript, function redefinition is perfectly allowed.
JavaScript Function Redefinition
Let's talk about JavaScript. Let's assume we want to write a function my_greeting() which returns a string containing a greeting message. For example, let's make the implementation as follows:
function my_greeting() {
    return 'hello';
}
Now, if we call the function and use the result as input for console.log(), the expected message is printed:
> console.log(my_greeting());
hello
Unlike in a language like C, we can redefine the function at will. Let's say we feel a little Spanish and want to change the greeting accordingly:
function my_greeting() {
    return 'hola que tal';
}
Now, if we use the same code as before, it will call the new implementation:
> console.log(my_greeting());
hola que tal
Not only can we redefine standalone functions, we can also redefine functions belonging to an object. Let's say we have a class Banana which has a function getColor() which returns 'green':
class Banana {
    getColor() {
        return 'green';
    }
}
Instantiating this class and calling getColor() returns the expected result:
> my_banana = new Banana();
Banana {}
> my_banana.getColor();
'green'
Now, we can change the implementation of the function. If we want to change the implementation of the single instance of the class, we can do as follows:
> my_banana.getColor = function() { return 'yellow'; }
[Function]
> my_banana.getColor();
'yellow'
Instead, if we want to change the implementation of all instances of the class, we need to change the class. To demonstrate this, let's first create an instance of Banana and call it original_banana:
> original_banana = new Banana();
Banana {}
> original_banana.getColor();
'green'
Now, let's change the implementation of the class method getColor. To do that, we have to use prototype:
> Banana.prototype.getColor = function() { return 'black'; }
Let's create a banana new_banana and see that it has the new color:
> new_banana = new Banana();
Banana {}
> new_banana.getColor();
'black'
Not only does the new banana have a new color, but the previously created banana uses the new implementation, too:
> original_banana.getColor();
'black'
However, recall that we changed the implementation on the instance my_banana. That implementation stays intact:
> my_banana.getColor();
'yellow'
Python Function Redefinition
In Python, we can redefine functions in much the same way as we did in JavaScript. Let's do the greeting example in Python. First, we define the original version of my_greeting:
def my_greeting():
    return 'hello'
Now, we can call it and print the result:
>>> print(my_greeting())
hello
Using the Python lambda syntax, we can redefine the function as follows:
>>> my_greeting = lambda: 'hola que tal'
Now, if we call it, it uses the new implementation:
>>> print(my_greeting())
hola que tal
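The reason this works is that a Python function name is an ordinary variable that happens to refer to a function object. The small sketch below makes that visible by redefining the function with a second def statement while keeping a reference to the old one. The name original_greeting is just chosen for this example.

```python
def my_greeting():
    return 'hello'


# Keep a second reference to the original function object.
original_greeting = my_greeting


# A second def statement with the same name simply rebinds the name.
def my_greeting():
    return 'hola que tal'


print(my_greeting())        # hola que tal
print(original_greeting())  # hello
```

The old function is not destroyed by the redefinition. It lives on for as long as something still refers to it.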
Likewise, in Python, we can redefine functions of class instances. Let's define the class Banana:
class Banana:
    def get_color(self):
        return 'green'
Let's create an instance of Banana, call it my_banana and call get_color():
>>> my_banana = Banana()
>>> my_banana.get_color()
'green'
Next, let's redefine get_color. To do that we need to import types and then call types.MethodType:
>>> import types
>>> my_banana.get_color = types.MethodType(lambda self: 'yellow', my_banana)
>>> my_banana.get_color()
'yellow'
Changing the implementation for all instances of the class is similar. Instead of assigning to the instance, we assign to the class. Here a plain function is enough and types.MethodType is not needed, because Python binds self automatically when the method is looked up through an instance. Let's create an instance original_banana, then change the implementation of Banana.get_color, create the instance new_banana and call get_color() on each of them:
>>> original_banana = Banana()
>>> Banana.get_color = lambda self: 'black'
>>> new_banana = Banana()
>>> original_banana.get_color()
'black'
>>> new_banana.get_color()
'black'
>>> my_banana.get_color()
'yellow'
Even though original_banana was instantiated before the redefinition of get_color, it uses the new implementation. However, the custom implementation in my_banana.get_color stays intact.
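Why the per-instance override survives can be seen from where Python stores it. The sketch below repeats the Banana class from above and is meant as an illustration only: the override lives in the instance's own attribute dictionary, which is consulted before the class, and deleting it uncovers the class implementation again.

```python
import types


class Banana:
    def get_color(self):
        return 'green'


my_banana = Banana()
my_banana.get_color = types.MethodType(lambda self: 'yellow', my_banana)

# The override is stored on the instance, not on the class.
stored_on_instance = 'get_color' in vars(my_banana)
overridden = my_banana.get_color()
print(stored_on_instance, overridden)  # True yellow

# Removing the instance attribute uncovers the class method again.
del my_banana.get_color
restored = my_banana.get_color()
print(restored)                        # green
```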
Python Dynamic Classes
In Python, we have a concept of metaclasses. A metaclass is the class of a class, and the built-in type is the default one. Using metaclasses, we can dynamically create classes.
Metaclasses have rather niche utility. As such, they are needed mostly in library development or similar. I will show an example of how metaclasses could be used. However, the example is rather long and, more importantly, not a recommended way of doing things. With the disclaimer out of the way, we are ready to proceed.
Let's say we want to create classes for various car models. The conventional (and preferred) way would be to create a single class for car models (called Model below), which would then include various information related to the car such as engine, price and whatever is needed. Furthermore, for our example, let's say we also have a class for the manufacturer:
class Model:
    def __init__(self, engine, price):
        self.engine = engine
        self.price = price

class Manufacturer:
    def __init__(self, models):
        self.models = models
Now, let's say we get some data from which we want to create instances, i.e. objects, of our classes:
cars_dict = {
    'Audi': {
        'A5': {'engine': 'ICE', 'price': 46000},
        'S5': {'engine': 'ICE', 'price': 55000},
        'e-tron': {'engine': 'electric', 'price': 102000},
    },
    'Porsche': {
        '911': {'engine': 'ICE', 'price': 214000},
        'Taycan': {'engine': 'electric', 'price': 194000},
    },
}
This data could then be read in a function as follows:
def create_cars_conventional(cars_dict):
    cars = {}
    for manufacturer in cars_dict:
        models = {}
        for model in cars_dict[manufacturer]:
            models[model] = Model(cars_dict[manufacturer][model]['engine'],
                                  cars_dict[manufacturer][model]['price'])
        cars[manufacturer] = Manufacturer(models)
    return cars
We could call this function and thereafter we could query the various instances for data:
>>> cars = create_cars_conventional(cars_dict)
>>> audi = cars['Audi']
>>> a5 = audi.models['A5']
>>> a5.engine
'ICE'
>>> a5.price
46000
For what it is worth, we could also directly query information of specific models:
>>> cars['Audi'].models['A5'].engine
'ICE'
However, what if we would like to create classes rather than instances of the cars? We can do that with metaprogramming. As classes are first-class citizens in Python, a class is an object. To create a class object, we call type() with the class name, a tuple of the inherited base classes and a dictionary of the class attributes. For instance, our "Audi A5" class might look like:
>>> A5 = type('A5', (), dict(engine='ICE', price=46000))
and now we can use it like:
>>> A5.engine
'ICE'
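To make the three arguments of type() more concrete, here is a small side-by-side sketch. The names A5Static, A5Dynamic and describe are made up for this comparison. The class statement and the type() call below produce classes that behave the same way.

```python
# Written with the usual class statement ...
class A5Static:
    engine = 'ICE'
    price = 46000

    def describe(self):
        return 'engine={}, price={}'.format(self.engine, self.price)


# ... and built dynamically: name, tuple of base classes, attribute dictionary.
def describe(self):
    return 'engine={}, price={}'.format(self.engine, self.price)


A5Dynamic = type('A5Dynamic', (object,),
                 {'engine': 'ICE', 'price': 46000, 'describe': describe})

print(A5Static().describe())   # engine=ICE, price=46000
print(A5Dynamic().describe())  # engine=ICE, price=46000
```

Functions placed in the attribute dictionary become methods, exactly as if they had been written inside a class body.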
However, we would probably like to have a function like create_cars_conventional which builds all classes for us. It could be as follows:
import types
def create_cars_classes(cars_dict):
    # Namespace object that will hold one attribute per generated class
    classes_all = types.SimpleNamespace()
    for manufacturer in cars_dict:
        for model in cars_dict[manufacturer]:
            # e.g. 'CarAudiA5'; a name such as 'CarAudie-tron' is not a valid
            # identifier and is reachable only through getattr()
            class_name = 'Car{}{}'.format(manufacturer, model)
            car_engine = cars_dict[manufacturer][model]['engine']
            car_price = cars_dict[manufacturer][model]['price']
            car_class = type(class_name, (), dict(engine=car_engine, price=car_price))
            setattr(classes_all, class_name, car_class)
    return classes_all
Now it can be used like:
>>> cars = create_cars_classes(cars_dict)
>>> mycar = cars.CarAudiA5
>>> mycar.engine
'ICE'
>>> mycar.price
46000
As said, this is mainly a niche feature which most programmers will never need. But it has its uses in library development and so forth.
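The examples above only ever call the built-in type. For completeness, here is a minimal sketch of what a custom metaclass could look like, assuming a library that wants to keep track of every class created through it. The names CarRegistry and registry are hypothetical and chosen for this example only.

```python
class CarRegistry(type):
    """Hypothetical metaclass that records every class created with it."""

    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry[name] = cls
        return cls


# Using the metaclass in a class statement ...
class AudiA5(metaclass=CarRegistry):
    engine = 'ICE'
    price = 46000


# ... or calling it directly, just like type().
Taycan = CarRegistry('Taycan', (), {'engine': 'electric', 'price': 194000})

print(sorted(CarRegistry.registry))  # ['AudiA5', 'Taycan']
```

Both classes end up in the registry without any explicit registration call, which is the kind of behind-the-scenes bookkeeping a library might want.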
Conclusion
In this blog post I gave a definition for metaprogramming. Furthermore, I showed examples of what qualifies as metaprogramming. As we moved towards more complicated examples, the distinction between what is and isn't metaprogramming became less obvious. Ultimately, I wanted to show that metaprogramming can get somewhat esoteric. Moreover, I wanted to show that metaprogramming is rarely needed and that solutions without metaprogramming should be considered first. Metaprogramming is mostly useful in libraries and similar scenarios, where we are effectively extending the programming language itself.