Elvis Oric

Posted on Aug 3, 2023 • Originally published at elvisoric.com on Aug 3, 2023

LLVM - Writing your first pass

#cpp #llvm #compiler #programming

Writing your first pass with LLVM

What is a pass? Simply put, a pass is a unit of work that analyzes or transforms LLVM IR.

What are we going to use? We will use opt. What is opt?

The opt command is the modular LLVM optimizer and analyzer. It takes LLVM source files as input, runs the specified optimizations or analyses on it, and then outputs the optimized file.

This is the so-called middle end. The input to opt is LLVM IR and the output is also LLVM IR.

Before we start writing our first pass, we need LLVM. We can install it or build it ourselves. I used LLVM 16.

We can install LLVM using brew install llvm on macOS or apt install llvm on Ubuntu. If you decide to build LLVM, you can follow the official instructions here: https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm

From this point on I will refer to the LLVM installation directory as LLVM_PATH. To make the rest easier to follow, you can export LLVM_PATH. In my case it was export LLVM_PATH=/opt/homebrew/opt/llvm@16/. To test this, we can invoke $LLVM_PATH/bin/opt --version, which should return something similar to this:

Homebrew LLVM version 16.0.6
  Optimized build.
  Default target: arm64-apple-darwin22.5.0
  Host CPU: apple-m1

Let’s start with how we are going to invoke opt:

$LLVM_PATH/bin/opt -load-pass-plugin first -passes=hello -disable-output path/to/input.ll

We have to provide our plugin, where our pass (or multiple passes) is defined, and a list of the passes to run. The plugin is a shared library. opt will load our plugin using dlopen and then look for a predefined function name in it to set up the plugin. In this case, it will look for the llvmGetPassPluginInfo function to initialize the plugin.
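To make the loading step concrete, here is a minimal sketch of the symbol lookup, assuming a POSIX system where dlopen and dlsym are available. The function name exportsPassPluginEntryPoint is hypothetical, and this is not opt's actual implementation; it only checks that a shared library exports llvmGetPassPluginInfo, without calling it.

```cpp
#include <dlfcn.h>   // POSIX dynamic loading: dlopen, dlsym, dlclose, dlerror
#include <iostream>

// Hypothetical helper: returns true if the shared library at `path` can be
// loaded and exports a symbol named llvmGetPassPluginInfo. The symbol is only
// looked up, never called, so no LLVM headers are needed for this check.
bool exportsPassPluginEntryPoint(const char *path) {
  void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    const char *reason = dlerror();
    std::cerr << "dlopen failed: " << (reason ? reason : "unknown error")
              << '\n';
    return false;
  }
  // The entry point is declared extern "C", so its name is not mangled and
  // can be looked up by its plain spelling.
  void *symbol = dlsym(handle, "llvmGetPassPluginInfo");
  bool found = (symbol != nullptr);
  dlclose(handle);
  return found;
}
```

A false result means either that the library could not be loaded or that the symbol is missing, for example because extern "C" was left out and the name got mangled.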

From the official documentation:

/// The public entry point for a pass plugin.
///
/// When a plugin is loaded by the driver, it will call this entry point to
/// obtain information about this plugin and about how to register its passes.
/// This function needs to be implemented by the plugin, see the example below:
///
/// 
/// extern "C" ::llvm::PassPluginLibraryInfo LLVM_ATTRIBUTE_WEAK
/// llvmGetPassPluginInfo() {
///   return {
///     LLVM_PLUGIN_API_VERSION, "MyPlugin", "v0.1", [](PassBuilder &PB) { ... }
///   };
/// }
///

This means that our plugin should implement this function. Let’s quickly check the return type:

struct PassPluginLibraryInfo {
  uint32_t APIVersion;
  const char *PluginName;
  const char *PluginVersion;
  /// The callback for registering plugin passes with a \c PassBuilder
  /// instance
  void (*RegisterPassBuilderCallbacks)(PassBuilder &);
};
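As a rough mental model of how a driver could consume this struct, here is a self-contained sketch. Everything prefixed with Toy, the constant kToyApiVersion, and the function acceptPlugin are hypothetical stand-ins, not LLVM's API; the sketch assumes the driver rejects a plugin whose API version does not match or whose callback is null, and otherwise invokes the callback.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for llvm::PassBuilder; it only counts registrations.
struct ToyPassBuilder {
  int callbacksRegistered = 0;
};

// Same shape as PassPluginLibraryInfo, but using the toy builder type.
struct ToyPluginInfo {
  std::uint32_t APIVersion;
  const char *PluginName;
  const char *PluginVersion;
  void (*RegisterPassBuilderCallbacks)(ToyPassBuilder &);
};

// Hypothetical value standing in for LLVM_PLUGIN_API_VERSION.
constexpr std::uint32_t kToyApiVersion = 1;

// Returns an empty string if the plugin is accepted, otherwise the reason it
// was rejected. On success the registration callback is invoked once.
std::string acceptPlugin(const ToyPluginInfo &info, ToyPassBuilder &builder) {
  if (info.APIVersion != kToyApiVersion)
    return "wrong API version";
  if (info.RegisterPassBuilderCallbacks == nullptr)
    return "missing registration callback";
  info.RegisterPassBuilderCallbacks(builder);
  return "";
}
```

Note that the last member is a plain function pointer, so a lambda used to fill it must not capture anything; only capture-less lambdas convert to function pointers.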

So, the minimal version of our plugin, first.cpp, looks like this:

#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Support/raw_ostream.h"

extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "First", "0.1", [](llvm::PassBuilder &PB) {
            llvm::errs() << "Register pass builder callback :)\n";
          }};
}

We can build it from the command line using:

$LLVM_PATH/bin/clang++ -std=c++17 first.cpp -shared -fno-rtti -fno-exceptions `$LLVM_PATH/bin/llvm-config --cppflags --ldflags --system-libs --libs core` -o first

Before invoking opt, we need an input LLVM IR file. In our case, we will create a simple C source file, foo.c, and compile it to LLVM IR:

int foo(int a, int b) { 
  return a + b; 
}

Compile it using:

$LLVM_PATH/bin/clang -c -S -emit-llvm foo.c

Run opt with this command:

$LLVM_PATH/bin/opt -load-pass-plugin first -passes=hello -disable-output foo.ll

We should get something like this:

Register pass builder callback :)
/opt/homebrew/opt/llvm@16/bin/opt: unknown pass name 'hello'

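The plugin was loaded and its callback ran, but no pass named hello has been registered yet, so opt rejects the name. Now we can update the implementation of llvmGetPassPluginInfo to register our pass.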

extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "First", "0.1", [](llvm::PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](llvm::StringRef name, llvm::FunctionPassManager &FPM,
                   llvm::ArrayRef<llvm::PassBuilder::PipelineElement>) -> bool {
                  if (name == "hello") {
                    FPM.addPass(HelloPass{});
                    return true;
                  }
                  return false;
                });
          }};
}

registerPipelineParsingCallback has several overloads, but for now we are interested in function passes, as you can see from the signature of the provided lambda:

llvm::StringRef name, llvm::FunctionPassManager &FPM,
llvm::ArrayRef<llvm::PassBuilder::PipelineElement>
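The true/false return value of the callback is what ties a name on the command line to a pass. The following self-contained sketch models that contract with hypothetical Toy types and a hypothetical parsePipeline function; it is a simplification that assumes a flat, comma-separated pipeline, not LLVM's actual parser.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a pass manager: it only records pass names.
struct ToyPassManager {
  std::vector<std::string> passes;
};

// Hypothetical callback type: return true if the callback recognised `name`
// and added a pass, false to let the next callback try.
using ToyParsingCallback =
    std::function<bool(const std::string &name, ToyPassManager &pm)>;

// Mirrors the lambda above: claim "hello", decline everything else.
bool helloCallback(const std::string &name, ToyPassManager &pm) {
  if (name == "hello") {
    pm.passes.push_back("HelloPass");
    return true;
  }
  return false;
}

// Splits `pipeline` on commas and offers each name to the callbacks in
// registration order. Stops at the first name that no callback claims,
// stores it in `unknownName`, and returns false.
bool parsePipeline(const std::string &pipeline,
                   const std::vector<ToyParsingCallback> &callbacks,
                   ToyPassManager &pm, std::string &unknownName) {
  std::size_t start = 0;
  while (start <= pipeline.size()) {
    std::size_t comma = pipeline.find(',', start);
    if (comma == std::string::npos)
      comma = pipeline.size();
    const std::string name = pipeline.substr(start, comma - start);

    bool handled = false;
    for (const ToyParsingCallback &callback : callbacks) {
      if (callback(name, pm)) {
        handled = true;
        break;
      }
    }
    if (!handled) {
      unknownName = name;
      return false;
    }
    start = comma + 1;
  }
  return true;
}
```

In this model, a name that every callback declines is what leads to an error like the "unknown pass name" message seen earlier.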

If the provided pass name is hello, we instantiate HelloPass. So, let’s see the definition of the pass.

struct HelloPass : llvm::PassInfoMixin<HelloPass> {
  llvm::PreservedAnalyses run(llvm::Function &F,
                              llvm::FunctionAnalysisManager &) {
    llvm::errs() << "Function name: " << F.getName() << '\n';
    return llvm::PreservedAnalyses::all();
  }

  static bool isRequired() { return true; }
};

This pass is simple: it runs on functions and prints their names. We added our pass to the llvm::FunctionPassManager, so the manager will invoke run on our pass, providing us with an llvm::Function and an llvm::FunctionAnalysisManager. In our case we do not even use the llvm::FunctionAnalysisManager. Returning true from isRequired() keeps the pass from being skipped on functions marked optnone, which is how clang marks functions when compiling without optimizations.

We do not change anything when our pass is invoked, so we return llvm::PreservedAnalyses::all().
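Notice that HelloPass does not override any virtual function; the manager only needs a type with a suitable run member. The sketch below models that idea with hypothetical Toy types and is not LLVM's implementation. As a simplification, the toy manager also loops over the functions itself, a job that in LLVM belongs to an adaptor between the module and function levels.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: just a name and an argument count.
struct ToyFunction {
  std::string name;
  std::size_t argCount;
};

// Toy pass in the spirit of HelloPass: it records one line per function
// instead of printing to llvm::errs().
struct ToyHelloPass {
  std::vector<std::string> *log; // not owned; must outlive the pass

  void run(const ToyFunction &f) {
    log->push_back("Function name: " + f.name);
  }
};

// Toy manager: accepts any type with a matching run() member, without
// requiring a common base class.
class ToyFunctionPassManager {
public:
  template <typename PassT>
  void addPass(PassT pass) {
    // Type-erase the pass behind a std::function that forwards to run().
    runners_.emplace_back(
        [p = std::move(pass)](const ToyFunction &f) mutable { p.run(f); });
  }

  // Runs every registered pass on every function, in registration order.
  void run(const std::vector<ToyFunction> &functions) {
    for (const ToyFunction &f : functions)
      for (auto &runner : runners_)
        runner(f);
  }

private:
  std::vector<std::function<void(const ToyFunction &)>> runners_;
};
```

Adding a ToyHelloPass and running the manager over a list of functions fills the log with one "Function name: ..." entry per function, in order.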

If we now compile our plugin again:

$LLVM_PATH/bin/clang++ -std=c++17 first.cpp -shared -fno-rtti -fno-exceptions `$LLVM_PATH/bin/llvm-config --cppflags --ldflags --system-libs --libs core` -o first

and run opt:

$LLVM_PATH/bin/opt -load-pass-plugin first -passes=hello -disable-output foo.ll

we get:

Function name: foo

Now let’s update our pass to print the number of arguments as well:

llvm::errs() << "Function Arg size: " << F.arg_size() << '\n';

Also, let’s add more functions to our foo.c file:

int foo(int a, int b) { return a + b; }
double bar(int a, char b, short c) { return a + b + c; }
int tar() { return 42; }

Now recompile the plugin, compile foo.c to LLVM IR again, and run opt. We will get:

Function name: foo
Function Arg size: 2
Function name: bar
Function Arg size: 3
Function name: tar
Function Arg size: 0

Top comments (2)

Shai Almog • Aug 4 '23

Interesting. I'm just working on my first LLVM project. If I understand correctly this is something you would use to review code or inject things like instrumentation etc. I guess you could add custom optimizations too but most of those are probably good enough already.

Did I get this right?

Elvis Oric • Aug 4 '23

Well, to some extent you are right. But with custom optimisations/transformations you can do things like: implement obfuscation passes (obfuscate function names), inject function calls (for example, inject printf or logging at the beginning of each function), remove function calls, change the control flow the way you want, etc. I plan to write more about it in future posts.
Thanks for the comment.
