Tomasz Wegrzanowski

Posted on Jan 21, 2022

100 Languages Speedrun: Episode 62: LLVM Intermediate Representation

#llvm #assembly

Creating a well performing programming language is just far too much work, and most of that work just repeats what other languages have done before.

So it would make so much sense for a new language to get transformed into some simple Intermediate Representation, and then something else picks it up and turns it into a high performance code specific to every target architecture like x86 or ARM or whatnot.

LLVM and its LLVM Intermediate Representation are the most popular way to do this.

History lesson

But before I get to the LLVM, a quick history lesson - why wasn't it done before?

In theory various VMs like most notably Java Virtual Machine can support multiple languages. However as it turns out, unless your language is extremely Java-shaped, compilation is really difficult and the code will run really poorly. So it works for something like Kotlin (which is basically Better Java), but attempts at making Ruby or Python run on a JVM were both difficult and performed poorly. (TruffleRuby runs fast on GraalVM, which is a new VM, not on the base JVM itself, it's all a complicated story).

And JVM was a relatively successful case, as far as these go. Many other attempts like Guile or ParrotVM just died, and nobody outside Microsoft really tried to target .NET VM.

The other popular target was compiling languages to JavaScript, so they can run in a browser, and that went far far worse than compiling for the JVM. Again, it worked for various "Better JavaScript" languages like CoffeeScript or ES6 JavaScript, but Ruby on JavaScript engine is crazy slow, even after making a lot of language changes to fit the JavaScript engine. Nobody even seriously thinks of making Ruby-compatible Ruby on JavaScript VM, as that would be impossibly slow. WASM also got zero traction outside cryptominers and other scams.

But before LLVM, we had the GNU Compiler Collection, an successful Open Source compiler collection with multiple programming language frontends and multiple architecture backends, so why people didn't use that?

And that's because the Free Software Foundation were fucking morons, and tried to obfuscate the intermediate representation as much as they could, to prevent people from using parts of GCC to do some spooky proprietary stuff. I'm not making this up, there's plenty of mails from Richard Stallman himself showing this idiocy was an explicit FCC policy, even way past the point where LLVM was taking over the compiler world. And it's not just new languages that need access to compiler internals. Editors need it for proper syntax highlighting, linters, refactoring tools, and all sorts of programmer utilities need this access. FSF got fuck all for their hardline approach, they just drove GCC into irrelevance. And that's how we got here.

Getting Started with LLVM

LLVM doesn't guarantee stability of its internal representation, and it's not really meant to be too human editable. It's also difficult to write complete programs in LLVM, as it doesn't have any I/O itself, it's all up to each language.

If you know C or C++ or some other low level language with LLVM backend, the easiest way to get started is to write some simple programs in that language, then tell the compiler to generate LLVM IR.

For example if we start with C Hello, World:

#include <stdio.h>

int main() {
  printf("Hello, World!\n");
  return 0;
}

And run clang -S -emit-llvm hello.c, we get this hello.ll:

; ModuleID = 'hello.c'
source_filename = "hello.c"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx11.0.0"

@.str = private unnamed_addr constant [15 x i8] c"Hello, World!\0A\00", align 1
; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i64 0, i64 0))
  ret i32 0
}
declare i32 @printf(i8*, ...) #1

attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 11, i32 1]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 7, !"PIC Level", i32 2}
!3 = !{!"Apple clang version 12.0.0 (clang-1200.0.32.29)"}

That's some LLVM IR, but it contains so much extra stuff as well.

But no matter, we can just compile it like regular C files, and run. If we do it this way, they'll be linked to C library by default, so we get access to C functions like printf.

$ clang -S -emit-llvm hello.c
$ clang hello.ll -o hello
$ ./hello
Hello, World!

Simplest LLVM IR program

Just like with assembly, let's start with a very simple program that just returns some exit code. In this case we're not doing any system calls, we just return from main function, and C library does the rest.

define i32 @main() {
  ret i32 7
}

This works, except it generates a warning message:

$ clang exit7.ll -o exit7
warning: overriding the module target triple with x86_64-apple-macosx11.0.0 [-Woverride-module]
1 warning generated.

So let's specify the target:

target triple = "x86_64-apple-macosx11.0.0"

define i32 @main() {
  ret i32 7
}

And run it:

$ clang exit7.ll -o exit7
$ ./exit7
$ echo $?
7

What's going on:

target directive specifies the target triple (CPU architecture; vendor; OS version). We could do some fancy cross-compiling here.
symbols all have prefixes, @main means global main
LLVM types are expressed in bit length, like i32 is 32bit integer.
ret i32 7 means return 32bit integer with value 7
define i32 @main() means define function main, with no arguments, and returning 32bit integer

Math

Before we do anything complicated like Hello Worlds or FizzBuzzes, let's see how LLVM adds some numbers. And to keep things simple, let's return the answer as process exit code:

target triple = "x86_64-apple-macosx11.0.0"

define i32 @main() {
  ; There are no "assign constant" instructions in the LLVM IR
  %1 = add i32 13, 0
  %2 = add i32 3, 0
  %3 = add i32 30, 0
  ; Now do the math
  %4 = mul i32 %1, %2
  %5 = add i32 %4, %3
  ret i32 %5
}

$ clang math.ll -o math
$ ./math
$ echo $?
69

What's going on:

You can have any number of local variables, they all start with sigils, @ are globals, % are locals
all names are prefixed in LLVM, so they don't conflict with keywords - this is useful, as LLVM wants to be language independent, so navigating what's a keyword in a language vs what's the keyword in LLVM IR etc. would be a lot more work than the usual compiler has to deal with
if you don't want to bother with names, you can just call them %1, %2 etc.
every local variable can have only one definition, you cannot reassign %1 later
there's nothing like setting a local variable to a constant, I'm using add i32 N, 0 here to fake it
the reason for it is that since everything has only one definition, if %1 is always equal 5, then LLVM can just mass replace every occurrence of %1 with 5
I'll get to how we can have "variable" variables later
mul i32 multiplies
add i32 adds

Optimizations

Now let's look inside math binary LLVM produced:

objdump -x86-asm-syntax=intel -d math

math:   file format Mach-O 64-bit x86-64


Disassembly of section __TEXT,__text:

0000000100003f60 _main:
100003f60: 31 c0                        xor eax, eax
100003f62: 89 c1                        mov ecx, eax
100003f64: 83 c1 0d                     add ecx, 13
100003f67: 89 c2                        mov edx, eax
100003f69: 83 c2 03                     add edx, 3
100003f6c: 83 c0 1e                     add eax, 30
100003f6f: 0f af ca                     imul    ecx, edx
100003f72: 01 c1                        add ecx, eax
100003f74: 89 c8                        mov eax, ecx
100003f76: c3                           ret

That's a lot of calculations, step by step, translating this to normal assignments and ignoring "add zero":

xor eax, eax - eax = 0
mov ecx, eax - ecx = 0
add ecx, 13 - ecx = 13
mov edx, eax - edx = 0
add edx, 3 - edx = 3
add eax, 30 - eax = 30
imul ecx, edx - eax (39) = ecx (13) * edx (3)
add ecx, eax - ecx (69) = ecx (30) + eax (39)
mov eax, ecx - eax (69) = ecx (69)
ret - return eax (69)

Let's turn on the optimizer and try again:

$ clang -O2 math.ll -o math
$ objdump -x86-asm-syntax=intel -d math

math:   file format Mach-O 64-bit x86-64


Disassembly of section __TEXT,__text:

0000000100003fb0 _main:
100003fb0: b8 45 00 00 00               mov eax, 69
100003fb5: c3                           ret

As you can see, our language frontend (whatever we used to generate LLVM IR code) didn't need to worry about optimizing anything, it can just generate extremely naive code, and LLVM will handle optimizations for us!

Hello, World!

Let's print a Hello, World! message.

target triple = "x86_64-apple-macosx11.0.0"

@hello = constant [15 x i8] c"Hello, World!\0A\00"

define i32 @main() {
  call i32 (i8*, ...) @printf(
    i8* getelementptr ([15 x i8], [15 x i8]* @hello, i64 0, i64 0)
  )
  ret i32 0
}
declare i32 @printf(i8*, ...)

$ clang hello2.ll -o hello2
$ ./hello2
Hello, World!

There's a call going on here.

we define global @hello to keep our message
we need to specify its type, constant [15 x i8], which means an 15-byte constant; i8 is a byte (8-bit integer)
c"Hello, World!\0A\00" is a string with a newline at the end, then a 0 byte, the way C functions want the strings to be
as @printf function is external, we need to declare its interface, it's declare i32 @printf(i8*, ...) - that means it returns i32, takes i8* (pointer to a string) first, and ... means it takes some extra arguments after that
call i32 (i8*, ...) @printf calls the @printf function, I spread it over multiple lines to make it more clear
now you'd think we could just pass @hello as argument, like it works in C; unfortunately LLVM IR has really convoluted pointer model, so there's a lot of getelementptr instructions in LLVM code

As we won't really be doing anything with getelementptr except pass the constant strings, I'll skip the detailed explanations. It's one of more complicated parts of LLVM IR.

Loop - stack memory version

Let's write a loop that prints numbers from 1 to 10.

The main problem is that we want to increment the i, but LLVM only allows one definition per variable.

There are two ways around it, to have "variable variables" in LLVM IR:

we can save to memory and load from memory
we can use phi nodes - then value of variable in a given block depends on which block we came from

Let's try both ways, first memory version. We'll use stack for our memory:

target triple = "x86_64-apple-macosx11.0.0"

@fnumber = constant [4 x i8] c"%d\0A\00"

define i32 @main() {
start:
  %i = alloca i32
  store i32 1, i32* %i
  br label %loop

loop:
  %i0 = load i32, i32* %i

  call i32 (i8*, ...) @printf(
    i8* getelementptr ([4 x i8], [4 x i8]* @fnumber, i64 0, i64 0),
    i32 %i0
  )

  %i1 = add i32 %i0, 1
  store i32 %i1, i32* %i
  %t = icmp sgt i32 %i1, 10

  br i1 %t, label %exit, label %loop

exit:
  ret i32 0
}
declare i32 @printf(i8*, ...)

So step by step:

we have three blocks - start, loop, and exit
start block sets up i=1 and goes to loop
loop block prints i, increments i, checks depending on if it's greater than 10 goes back to loop, or to exit
exit block returns 0

In the start block:

%i = alloca i32 allocates memory for i variable, of size i32 - that allocation will happen on the stack
store i32 1, i32* %i stores value 1 to where i is pointing to
br label %loop jumps to loop unconditionally - there's no fall-through, every block needs to end with a jump (br), return (ret) or something like that

In the loop block:

%i0 = load i32, i32* %i loads value of i to i0 - this is the only definition of i0, so it's fine
then we call printf passing two arguments, basically doing printf("%d\n", i0)
we calculate %i1 = i0 + 1 and store it to %i
we calculate if i1 is Signed Greater Than 10, and store returns to %t
we then have calculated branch, depending on value of %t, we go to exit or loop

The exit block just returns 0 like we did previously.

Loop - phi version

The other way, and actually much more in line with how compilers work, is to not store anything on stack, but use phi instruction to define a value. phi instruction contains a list of blocks and their associated values - its result will depend on where we came from.

target triple = "x86_64-apple-macosx11.0.0"

@fnumber = constant [4 x i8] c"%d\0A\00"

define i32 @main() {
start:
  br label %loop

loop:
  %i0 = phi i32 [ 1, %start ], [ %i1, %loop ]

  call i32 (i8*, ...) @printf(
    i8* getelementptr ([4 x i8], [4 x i8]* @fnumber, i64 0, i64 0),
    i32 %i0
  )

  %i1 = add i32 %i0, 1
  %t = icmp sgt i32 %i1, 10

  br i1 %t, label %exit, label %loop

exit:
  ret i32 0
}
declare i32 @printf(i8*, ...)

start block doesn't actually do anything, but we still label it for sake of phi.
in loop block, %i0 = phi i32 [ 1, %start ], [ %i1, %loop ] means that if we came from start, then i0 is 1, and if we came from loop it's i1
then we the rest of the loop block is identical to what we had before, except we don't store anything
exit block is just like before

Generated loop code

It's important to note that the shape of the LLVM IR code coming in, and the shape of generated assembly, really don't have much in common. LLVM will very aggressively optimize things. For example both loop codes end up like this:

Disassembly of section __TEXT,__text:

0000000100003ef0 _main:
100003ef0: 53                           push    rbx
100003ef1: 48 8d 1d ba 00 00 00         lea rbx, [rip + 186]
100003ef8: 48 89 df                     mov rdi, rbx
100003efb: be 01 00 00 00               mov esi, 1
100003f00: 31 c0                        xor eax, eax
100003f02: e8 8b 00 00 00               call    139 <dyld_stub_binder+0x100003f92>
100003f07: 48 89 df                     mov rdi, rbx
100003f0a: be 02 00 00 00               mov esi, 2
100003f0f: 31 c0                        xor eax, eax
100003f11: e8 7c 00 00 00               call    124 <dyld_stub_binder+0x100003f92>
100003f16: 48 89 df                     mov rdi, rbx
100003f19: be 03 00 00 00               mov esi, 3
100003f1e: 31 c0                        xor eax, eax
100003f20: e8 6d 00 00 00               call    109 <dyld_stub_binder+0x100003f92>
100003f25: 48 89 df                     mov rdi, rbx
100003f28: be 04 00 00 00               mov esi, 4
100003f2d: 31 c0                        xor eax, eax
100003f2f: e8 5e 00 00 00               call    94 <dyld_stub_binder+0x100003f92>
100003f34: 48 89 df                     mov rdi, rbx
100003f37: be 05 00 00 00               mov esi, 5
100003f3c: 31 c0                        xor eax, eax
100003f3e: e8 4f 00 00 00               call    79 <dyld_stub_binder+0x100003f92>
100003f43: 48 89 df                     mov rdi, rbx
100003f46: be 06 00 00 00               mov esi, 6
100003f4b: 31 c0                        xor eax, eax
100003f4d: e8 40 00 00 00               call    64 <dyld_stub_binder+0x100003f92>
100003f52: 48 89 df                     mov rdi, rbx
100003f55: be 07 00 00 00               mov esi, 7
100003f5a: 31 c0                        xor eax, eax
100003f5c: e8 31 00 00 00               call    49 <dyld_stub_binder+0x100003f92>
100003f61: 48 89 df                     mov rdi, rbx
100003f64: be 08 00 00 00               mov esi, 8
100003f69: 31 c0                        xor eax, eax
100003f6b: e8 22 00 00 00               call    34 <dyld_stub_binder+0x100003f92>
100003f70: 48 89 df                     mov rdi, rbx
100003f73: be 09 00 00 00               mov esi, 9
100003f78: 31 c0                        xor eax, eax
100003f7a: e8 13 00 00 00               call    19 <dyld_stub_binder+0x100003f92>
100003f7f: 48 89 df                     mov rdi, rbx
100003f82: be 0a 00 00 00               mov esi, 10
100003f87: 31 c0                        xor eax, eax
100003f89: e8 04 00 00 00               call    4 <dyld_stub_binder+0x100003f92>
100003f8e: 31 c0                        xor eax, eax
100003f90: 5b                           pop rbx
100003f91: c3                           ret

LLVM decided that the best way to loop this is basically with printf("%d\n", 1); printf("%d\n", 2); ... printf("%d\n", 10);.

Fibonacci

Here's Fibonacci in LLVM IR:

target triple = "x86_64-apple-macosx11.0.0"

@format = constant [14 x i8] c"fib(%d) = %d\0A\00"

define i32 @fib(i32 %a) {
start_fib:
  ; if (a <= 2) { fib_small } else { fib_big }
  %t = icmp sle i32 %a, 2
  br i1 %t, label %fib_small, label %fib_big

fib_big:
  ; f1 = fib(a-1)
  %a1 = sub i32 %a, 1
  %f1 = call i32 @fib(i32 %a1)

  ; f2 = fib(a-1)
  %a2 = sub i32 %a, 2
  %f2 = call i32 @fib(i32 %a2)

  ; return f1 + f2
  %e = add i32 %f1, %f2
  ret i32 %e

fib_small:
  ; return 1
  ret i32 1
}

define i32 @main() {
start:
  br label %loop

loop:
  %i0 = phi i32 [ 1, %start ], [ %i1, %loop ]
  %j = call i32 @fib(i32 %i0)

  call i32 (i8*, ...) @printf(
    i8* getelementptr ([14 x i8], [14 x i8]* @format, i64 0, i64 0),
    i32 %i0,
    i32 %j
  )

  %i1 = add i32 %i0, 1
  %t = icmp sgt i32 %i1, 20

  br i1 %t, label %exit, label %loop

exit:
  ret i32 0
}
declare i32 @printf(i8*, ...)

This doesn't really use any new techniques.

ifs use same basic blocks as loops - and they need same phi nodes if you assign something in both then and else branches and want to use it after the if.
unlike assembly, there's no "calling convention", saving registers, or any such concerns - we just do call and LLVM will figure out how to translate that into register actions when it's generating code

$clang -O2 fib.ll -o fib
$ ./fib
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765

FizzBuzz

And we obviously finish with the FizzBuzz:

target triple = "x86_64-apple-macosx11.0.0"

@fizz = constant [6 x i8] c"Fizz\0A\00"
@buzz = constant [6 x i8] c"Buzz\0A\00"
@fizzbuzz = constant [10 x i8] c"FizzBuzz\0A\00"
@fnumber = constant [4 x i8] c"%d\0A\00"

define i32 @main() {
start:
  br label %loop

loop:
  %i0 = phi i32 [ 1, %start ], [ %i1, %loop_end ]
  %m15 = srem i32 %i0, 15
  %e15 = icmp eq i32 %m15, 0
  br i1 %e15, label %fizzbuzz, label %not_fizzbuzz

fizzbuzz:
  call i32 (i8*, ...) @printf(
    i8* getelementptr ([10 x i8], [10 x i8]* @fizzbuzz, i64 0, i64 0)
  )
  br label %loop_end

not_fizzbuzz:
  %m5 = srem i32 %i0, 5
  %e5 = icmp eq i32 %m5, 0
  br i1 %e5, label %buzz, label %not_buzz

buzz:
  call i32 (i8*, ...) @printf(
    i8* getelementptr ([6 x i8], [6 x i8]* @buzz, i64 0, i64 0)
  )
  br label %loop_end

not_buzz:
  %m3 = srem i32 %i0, 3
  %e3 = icmp eq i32 %m3, 0
  br i1 %e3, label %fizz, label %not_fizz

fizz:
  call i32 (i8*, ...) @printf(
    i8* getelementptr ([6 x i8], [6 x i8]* @fizz, i64 0, i64 0)
  )
  br label %loop_end

not_fizz:
  call i32 (i8*, ...) @printf(
    i8* getelementptr ([4 x i8], [4 x i8]* @fnumber, i64 0, i64 0),
    i32 %i0
  )
  br label %loop_end

loop_end:
  %i1 = add i32 %i0, 1
  %t = icmp sgt i32 %i1, 100

  br i1 %t, label %exit, label %loop

exit:
  ret i32 0
}
declare i32 @printf(i8*, ...)

There's nothing new here other than srem for Signed REMinder (that is %) operator.

Should you use LLVM IR?

LLVM IR is far less human friendly than assembly, and it's really only meant to be generated by compiler frontends. If you make a typo in basic block structure, you get either completely meaningless error, or LLVM will crash during compilation.

It's only barely sufficiently human readable to let people creating compilers debug their compilers. There are some analysis tools, but all of that is also really meant mainly to debug your compiler.

So LLVM is a great tool for creating compilers, but LLVM IR is really not something you should be writing by hand.

And since this is recommendation section, don't be like the FSF. "Free Software" under centralized control that intentionally prevents it from reaching its full potential is doomed to fail.

Code

All code examples for the series will be in this repository.

Code for the LLVM Intermediate Representation episode is available here.

DEV Community