Writing your first pass with LLVM
What is a pass? Simply put, it is a thing that analyzes or transforms LLVM IR inputs.
What are we going to use? We will use opt. What is opt?
The opt command is the modular LLVM optimizer and analyzer. It takes LLVM source files as input, runs the specified optimizations or analyses on it, and then outputs the optimized file.
This is the so-called middle end. The input to opt is LLVM IR and the output is also LLVM IR.
Before we start writing our first pass, we need LLVM. We can install it or build it ourselves. I used LLVM 16.
We can install LLVM using brew install llvm on macOS or apt install llvm on Ubuntu. If you decide to build LLVM, you can follow the official instructions here: https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm
From this point on I will refer to the LLVM installation directory as LLVM_PATH. To make the rest easier to follow, you can export LLVM_PATH. In my case it was:
export LLVM_PATH=/opt/homebrew/opt/llvm@16/
To test this, we can invoke $LLVM_PATH/bin/opt --version, which should print something similar to this:
Homebrew LLVM version 16.0.6
Optimized build.
Default target: arm64-apple-darwin22.5.0
Host CPU: apple-m1
Let’s start with how we are going to invoke opt:
$LLVM_PATH/bin/opt -load-pass-plugin first -passes=hello -disable-output path/to/input.ll
We have to provide our plugin, where our pass (or multiple passes) is defined, and a list of the passes to run. The plugin is a shared library. opt will load our plugin using dlopen and try to find a predefined function in it to set the plugin up. In this case, it will look for the llvmGetPassPluginInfo function to initialize the plugin.
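To make the lookup step more concrete, here is a minimal sketch of loading a shared library and searching it for the entry point by name. It calls the POSIX dlopen and dlsym functions directly. The helper findPluginEntryPoint and the LookupResult enum are hypothetical names made up for this illustration, and this is not how opt itself is implemented.

```cpp
#include <dlfcn.h> // dlopen, dlsym, dlclose (POSIX)
#include <string>

// Possible outcomes of looking for the plugin entry point.
enum class LookupResult { LoadFailed, SymbolMissing, Found };

// Hypothetical helper, not part of LLVM: open a shared library and check
// whether it exports the C symbol that a pass plugin is expected to provide.
LookupResult findPluginEntryPoint(const std::string &path) {
  void *handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr)
    return LookupResult::LoadFailed; // file missing or not loadable

  // extern "C" on the plugin side keeps the name unmangled, so a plain
  // string lookup is enough.
  void *symbol = dlsym(handle, "llvmGetPassPluginInfo");
  LookupResult result =
      symbol != nullptr ? LookupResult::Found : LookupResult::SymbolMissing;

  dlclose(handle);
  return result;
}
```

A path that does not exist yields LookupResult::LoadFailed, and a library that loads but lacks the function yields LookupResult::SymbolMissing. This is also why the entry point has to be declared extern "C".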
From the official documentation
/// The public entry point for a pass plugin.
///
/// When a plugin is loaded by the driver, it will call this entry point to
/// obtain information about this plugin and about how to register its passes.
/// This function needs to be implemented by the plugin, see the example below:
///
///
/// extern "C" ::llvm::PassPluginLibraryInfo LLVM_ATTRIBUTE_WEAK
/// llvmGetPassPluginInfo() {
///   return {
///     LLVM_PLUGIN_API_VERSION, "MyPlugin", "v0.1", [](PassBuilder &PB) { ... }
///   };
/// }
///
This means that our plugin should implement this function. Let’s quickly check the return type:
struct PassPluginLibraryInfo {
  uint32_t APIVersion;
  const char *PluginName;
  const char *PluginVersion;
  /// The callback for registering plugin passes with a \c PassBuilder
  /// instance
  void (*RegisterPassBuilderCallbacks)(PassBuilder &);
};
So, the minimal version of our plugin, first.cpp, looks like this:
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Support/raw_ostream.h"

// Entry point that opt looks up after loading the plugin.
extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "First", "0.1", [](llvm::PassBuilder &PB) {
            // No passes are registered yet; just show that we were called.
            llvm::errs() << "Register pass builder callback :)\n";
          }};
}
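The lambda above can be returned inside PassPluginLibraryInfo because a lambda with an empty capture list converts implicitly to a plain function pointer, which is what the RegisterPassBuilderCallbacks member expects. The sketch below shows that conversion in isolation. FakePassBuilder, FakePluginInfo, getFakePluginInfo and runRegistration are hypothetical stand-ins so that it compiles without LLVM headers; they are not LLVM types.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::PassBuilder; it only counts callback runs.
struct FakePassBuilder {
  int callbacksRun = 0;
};

// Hypothetical stand-in with the same shape as PassPluginLibraryInfo.
struct FakePluginInfo {
  std::uint32_t APIVersion;
  const char *PluginName;
  const char *PluginVersion;
  void (*RegisterCallbacks)(FakePassBuilder &); // plain function pointer
};

// A lambda with an empty capture list converts implicitly to a function
// pointer, so it can initialise the last member directly.
FakePluginInfo getFakePluginInfo() {
  return {1, "First", "0.1", [](FakePassBuilder &PB) { ++PB.callbacksRun; }};
}

// What a driver would do after loading the plugin: call the stored pointer.
int runRegistration() {
  FakePluginInfo info = getFakePluginInfo();
  FakePassBuilder PB;
  info.RegisterCallbacks(PB);
  return PB.callbacksRun;
}
```

A lambda that captures something would no longer convert to a function pointer, so it would not fit this member. Any state the callback needs has to come from the PassBuilder argument or from globals.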
We can build it from the command line using:
$LLVM_PATH/bin/clang++ -std=c++17 first.cpp -shared -fno-rtti -fno-exceptions `$LLVM_PATH/bin/llvm-config --cppflags --ldflags --system-libs --libs core` -o first
Before invoking opt, we need an input LLVM IR file. In our case, we will create a simple C source file and compile it to LLVM IR. foo.c:
int foo(int a, int b) {
  return a + b;
}
Compile it using:
$LLVM_PATH/bin/clang -c -S -emit-llvm foo.c
Run opt with this command:
$LLVM_PATH/bin/opt -load-pass-plugin first -passes=hello -disable-output foo.ll
And, we should get something like this:
Register pass builder callback :)
/opt/homebrew/opt/llvm@16/bin/opt: unknown pass name 'hello'
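The first line shows that opt loaded the plugin and ran our callback. The error appears because no pass named hello has been registered yet. Now we can update the implementation of llvmGetPassPluginInfo to register our pass.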
extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "First", "0.1", [](llvm::PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](llvm::StringRef name, llvm::FunctionPassManager &FPM,
                   llvm::ArrayRef<llvm::PassBuilder::PipelineElement>) -> bool {
                  if (name == "hello") {
                    FPM.addPass(HelloPass{});
                    return true;
                  }
                  return false;
                });
          }};
}
registerPipelineParsingCallback has several overloads, but for now we are interested in the one for function passes, as you can see from the signature of the provided lambda:
llvm::StringRef name, llvm::FunctionPassManager &FPM, llvm::ArrayRef<llvm::PassBuilder::PipelineElement>
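A toy model may help to see why the callback returns a bool. Roughly speaking, a pass name that is not built in is offered to the registered callbacks in turn, and a callback returns true to claim it. If nobody claims the name, we get an error like the unknown pass name one above. ToyPassManager, ToyPassBuilder and makeBuilderWithHello below are hypothetical stand-ins written only for this illustration, and they are much simpler than the real LLVM classes.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::FunctionPassManager: it only records
// the names of the passes that were added.
struct ToyPassManager {
  std::vector<std::string> passes;
};

using ParsingCallback =
    std::function<bool(const std::string &, ToyPassManager &)>;

// Hypothetical stand-in for the part of PassBuilder that stores callbacks.
struct ToyPassBuilder {
  std::vector<ParsingCallback> callbacks;

  void registerPipelineParsingCallback(ParsingCallback cb) {
    callbacks.push_back(std::move(cb));
  }

  // Offer the name to each callback in turn; the first one that returns
  // true has claimed it. If none does, the name is unknown.
  bool parsePassName(const std::string &name, ToyPassManager &PM) const {
    for (const ParsingCallback &cb : callbacks)
      if (cb(name, PM))
        return true;
    return false;
  }
};

// Mirrors the plugin's lambda: claim "hello", decline everything else.
ToyPassBuilder makeBuilderWithHello() {
  ToyPassBuilder PB;
  PB.registerPipelineParsingCallback(
      [](const std::string &name, ToyPassManager &PM) {
        if (name == "hello") {
          PM.passes.push_back("HelloPass");
          return true;
        }
        return false;
      });
  return PB;
}
```

In this model, parsePassName("hello", PM) adds one entry to PM.passes and returns true, while any other name returns false and leaves PM untouched.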
If the provided pass name is hello, we add an instance of HelloPass to the pass manager. So, let’s see the definition of the pass, which has to appear above llvmGetPassPluginInfo in first.cpp.
struct HelloPass : llvm::PassInfoMixin<HelloPass> {
  llvm::PreservedAnalyses run(llvm::Function &F,
                              llvm::FunctionAnalysisManager &) {
    llvm::errs() << "Function name: " << F.getName() << '\n';
    return llvm::PreservedAnalyses::all();
  }
  static bool isRequired() { return true; }
};
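This pass is simple: it runs on functions and prints their names. We added our pass to the llvm::FunctionPassManager, so the manager invokes run on our pass for each function defined in the module, passing it an llvm::Function and an llvm::FunctionAnalysisManager. In our case we do not even use the llvm::FunctionAnalysisManager. We do not change anything when our pass is invoked, so we return llvm::PreservedAnalyses::all(). Returning true from isRequired() tells the pass manager not to skip the pass, which matters here because clang marks functions as optnone when compiling without optimizations.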
If we now compile our plugin again:
$LLVM_PATH/bin/clang++ -std=c++17 first.cpp -shared -fno-rtti -fno-exceptions `$LLVM_PATH/bin/llvm-config --cppflags --ldflags --system-libs --libs core` -o first
and run opt:
$LLVM_PATH/bin/opt -load-pass-plugin first -passes=hello -disable-output foo.ll
we get:
Function name: foo
Now let’s update our pass to print the number of arguments as well:
llvm::errs() << "Arg size: " << F.arg_size() << '\n';
Also, let’s add more functions to our foo.c file:
int foo(int a, int b) { return a + b; }
double bar(int a, char b, short c) { return a + b + c; }
int tar() { return 42; }
Compile the plugin, compile foo.c to LLVM IR, and run opt again. We will get:
Function name: foo
Arg size: 2
Function name: bar
Arg size: 3
Function name: tar
Arg size: 0
Top comments (2)
Interesting. I'm just working on my first LLVM project. If I understand correctly this is something you would use to review code or inject things like instrumentation etc. I guess you could add custom optimizations too but most of those are probably good enough already.
Did I get this right?
Well, to some extent you are right. But with custom optimisations/transformations you can do things like: implement obfuscation passes (obfuscate function names), inject function calls (for example inject printf or logging at the beginning of each function), remove function calls, change the control flow the way you want etc. I plan to write more about it in future posts. Thanks for the comment.