Creating a well performing programming language is just far too much work, and most of that work just repeats what other languages have done before.
So it would make so much sense for a new language to get transformed into some simple Intermediate Representation, and then something else picks it up and turns it into a high performance code specific to every target architecture like x86 or ARM or whatnot.
LLVM and its LLVM Intermediate Representation are the most popular way to do this.
History lesson
But before I get to the LLVM, a quick history lesson - why wasn't it done before?
In theory various VMs like most notably Java Virtual Machine can support multiple languages. However as it turns out, unless your language is extremely Java-shaped, compilation is really difficult and the code will run really poorly. So it works for something like Kotlin (which is basically Better Java), but attempts at making Ruby or Python run on a JVM were both difficult and performed poorly. (TruffleRuby runs fast on GraalVM, which is a new VM, not on the base JVM itself, it's all a complicated story).
And JVM was a relatively successful case, as far as these go. Many other attempts like Guile or ParrotVM just died, and nobody outside Microsoft really tried to target .NET VM.
The other popular target was compiling languages to JavaScript, so they can run in a browser, and that went far far worse than compiling for the JVM. Again, it worked for various "Better JavaScript" languages like CoffeeScript or ES6 JavaScript, but Ruby on JavaScript engine is crazy slow, even after making a lot of language changes to fit the JavaScript engine. Nobody even seriously thinks of making Ruby-compatible Ruby on JavaScript VM, as that would be impossibly slow. WASM also got zero traction outside cryptominers and other scams.
But before LLVM, we had the GNU Compiler Collection, an successful Open Source compiler collection with multiple programming language frontends and multiple architecture backends, so why people didn't use that?
And that's because the Free Software Foundation were fucking morons, and tried to obfuscate the intermediate representation as much as they could, to prevent people from using parts of GCC to do some spooky proprietary stuff. I'm not making this up, there's plenty of mails from Richard Stallman himself showing this idiocy was an explicit FCC policy, even way past the point where LLVM was taking over the compiler world. And it's not just new languages that need access to compiler internals. Editors need it for proper syntax highlighting, linters, refactoring tools, and all sorts of programmer utilities need this access. FSF got fuck all for their hardline approach, they just drove GCC into irrelevance. And that's how we got here.
Getting Started with LLVM
LLVM doesn't guarantee stability of its internal representation, and it's not really meant to be too human editable. It's also difficult to write complete programs in LLVM, as it doesn't have any I/O itself, it's all up to each language.
If you know C or C++ or some other low level language with LLVM backend, the easiest way to get started is to write some simple programs in that language, then tell the compiler to generate LLVM IR.
For example if we start with C Hello, World:
#include <stdio.h>
int main() {
printf("Hello, World!\n");
return 0;
}
And run clang -S -emit-llvm hello.c
, we get this hello.ll
:
; ModuleID = 'hello.c'
source_filename = "hello.c"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx11.0.0"
@.str = private unnamed_addr constant [15 x i8] c"Hello, World!\0A\00", align 1
; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1, align 4
%2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i64 0, i64 0))
ret i32 0
}
declare i32 @printf(i8*, ...) #1
attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}
!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 11, i32 1]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 7, !"PIC Level", i32 2}
!3 = !{!"Apple clang version 12.0.0 (clang-1200.0.32.29)"}
That's some LLVM IR, but it contains so much extra stuff as well.
But no matter, we can just compile it like regular C files, and run. If we do it this way, they'll be linked to C library by default, so we get access to C functions like printf
.
$ clang -S -emit-llvm hello.c
$ clang hello.ll -o hello
$ ./hello
Hello, World!
Simplest LLVM IR program
Just like with assembly, let's start with a very simple program that just returns some exit code. In this case we're not doing any system calls, we just return from main
function, and C library does the rest.
define i32 @main() {
ret i32 7
}
This works, except it generates a warning message:
$ clang exit7.ll -o exit7
warning: overriding the module target triple with x86_64-apple-macosx11.0.0 [-Woverride-module]
1 warning generated.
So let's specify the target:
target triple = "x86_64-apple-macosx11.0.0"
define i32 @main() {
ret i32 7
}
And run it:
$ clang exit7.ll -o exit7
$ ./exit7
$ echo $?
7
What's going on:
-
target
directive specifies the target triple (CPU architecture; vendor; OS version). We could do some fancy cross-compiling here. - symbols all have prefixes,
@main
means globalmain
- LLVM types are expressed in bit length, like
i32
is 32bit integer. -
ret i32 7
means return 32bit integer with value7
-
define i32 @main()
means define functionmain
, with no arguments, and returning 32bit integer
Math
Before we do anything complicated like Hello Worlds or FizzBuzzes, let's see how LLVM adds some numbers. And to keep things simple, let's return the answer as process exit code:
target triple = "x86_64-apple-macosx11.0.0"
define i32 @main() {
; There are no "assign constant" instructions in the LLVM IR
%1 = add i32 13, 0
%2 = add i32 3, 0
%3 = add i32 30, 0
; Now do the math
%4 = mul i32 %1, %2
%5 = add i32 %4, %3
ret i32 %5
}
$ clang math.ll -o math
$ ./math
$ echo $?
69
What's going on:
- You can have any number of local variables, they all start with sigils,
@
are globals,%
are locals - all names are prefixed in LLVM, so they don't conflict with keywords - this is useful, as LLVM wants to be language independent, so navigating what's a keyword in a language vs what's the keyword in LLVM IR etc. would be a lot more work than the usual compiler has to deal with
- if you don't want to bother with names, you can just call them
%1
,%2
etc. - every local variable can have only one definition, you cannot reassign
%1
later - there's nothing like setting a local variable to a constant, I'm using
add i32 N, 0
here to fake it - the reason for it is that since everything has only one definition, if
%1
is always equal5
, then LLVM can just mass replace every occurrence of%1
with5
- I'll get to how we can have "variable" variables later
-
mul i32
multiplies -
add i32
adds
Optimizations
Now let's look inside math
binary LLVM produced:
objdump -x86-asm-syntax=intel -d math
math: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000100003f60 _main:
100003f60: 31 c0 xor eax, eax
100003f62: 89 c1 mov ecx, eax
100003f64: 83 c1 0d add ecx, 13
100003f67: 89 c2 mov edx, eax
100003f69: 83 c2 03 add edx, 3
100003f6c: 83 c0 1e add eax, 30
100003f6f: 0f af ca imul ecx, edx
100003f72: 01 c1 add ecx, eax
100003f74: 89 c8 mov eax, ecx
100003f76: c3 ret
That's a lot of calculations, step by step, translating this to normal assignments and ignoring "add zero":
-
xor eax, eax
-eax = 0
-
mov ecx, eax
-ecx = 0
-
add ecx, 13
-ecx = 13
-
mov edx, eax
-edx = 0
-
add edx, 3
-edx = 3
-
add eax, 30
-eax = 30
-
imul ecx, edx
-eax (39) = ecx (13) * edx (3)
-
add ecx, eax
-ecx (69) = ecx (30) + eax (39)
-
mov eax, ecx
-eax (69) = ecx (69)
-
ret
-return eax (69)
Let's turn on the optimizer and try again:
$ clang -O2 math.ll -o math
$ objdump -x86-asm-syntax=intel -d math
math: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000100003fb0 _main:
100003fb0: b8 45 00 00 00 mov eax, 69
100003fb5: c3 ret
As you can see, our language frontend (whatever we used to generate LLVM IR code) didn't need to worry about optimizing anything, it can just generate extremely naive code, and LLVM will handle optimizations for us!
Hello, World!
Let's print a Hello, World!
message.
target triple = "x86_64-apple-macosx11.0.0"
@hello = constant [15 x i8] c"Hello, World!\0A\00"
define i32 @main() {
call i32 (i8*, ...) @printf(
i8* getelementptr ([15 x i8], [15 x i8]* @hello, i64 0, i64 0)
)
ret i32 0
}
declare i32 @printf(i8*, ...)
$ clang hello2.ll -o hello2
$ ./hello2
Hello, World!
There's a call going on here.
- we define global
@hello
to keep our message - we need to specify its type,
constant [15 x i8]
, which means an 15-byte constant; i8 is a byte (8-bit integer) -
c"Hello, World!\0A\00"
is a string with a newline at the end, then a 0 byte, the way C functions want the strings to be - as
@printf
function is external, we need to declare its interface, it'sdeclare i32 @printf(i8*, ...)
- that means it returnsi32
, takesi8*
(pointer to a string) first, and...
means it takes some extra arguments after that -
call i32 (i8*, ...) @printf
calls the@printf
function, I spread it over multiple lines to make it more clear - now you'd think we could just pass
@hello
as argument, like it works in C; unfortunately LLVM IR has really convoluted pointer model, so there's a lot ofgetelementptr
instructions in LLVM code
As we won't really be doing anything with getelementptr
except pass the constant strings, I'll skip the detailed explanations. It's one of more complicated parts of LLVM IR.
Loop - stack memory version
Let's write a loop that prints numbers from 1 to 10.
The main problem is that we want to increment the i
, but LLVM only allows one definition per variable.
There are two ways around it, to have "variable variables" in LLVM IR:
- we can save to memory and load from memory
- we can use
phi
nodes - then value of variable in a given block depends on which block we came from
Let's try both ways, first memory version. We'll use stack for our memory:
target triple = "x86_64-apple-macosx11.0.0"
@fnumber = constant [4 x i8] c"%d\0A\00"
define i32 @main() {
start:
%i = alloca i32
store i32 1, i32* %i
br label %loop
loop:
%i0 = load i32, i32* %i
call i32 (i8*, ...) @printf(
i8* getelementptr ([4 x i8], [4 x i8]* @fnumber, i64 0, i64 0),
i32 %i0
)
%i1 = add i32 %i0, 1
store i32 %i1, i32* %i
%t = icmp sgt i32 %i1, 10
br i1 %t, label %exit, label %loop
exit:
ret i32 0
}
declare i32 @printf(i8*, ...)
So step by step:
- we have three blocks -
start
,loop
, andexit
-
start
block sets upi=1
and goes toloop
-
loop
block printsi
, incrementsi
, checks depending on if it's greater than 10 goes back toloop
, or toexit
-
exit
block returns 0
In the start
block:
-
%i = alloca i32
allocates memory fori
variable, of sizei32
- that allocation will happen on the stack -
store i32 1, i32* %i
stores value1
to wherei
is pointing to -
br label %loop
jumps toloop
unconditionally - there's no fall-through, every block needs to end with a jump (br
), return (ret
) or something like that
In the loop
block:
-
%i0 = load i32, i32* %i
loads value ofi
toi0
- this is the only definition ofi0
, so it's fine - then we call
printf
passing two arguments, basically doingprintf("%d\n", i0)
- we calculate
%i1 = i0 + 1
and store it to%i
- we calculate if
i1
isSigned Greater Than
10
, and store returns to%t
- we then have calculated branch, depending on value of
%t
, we go toexit
orloop
The exit
block just returns 0 like we did previously.
Loop - phi version
The other way, and actually much more in line with how compilers work, is to not store anything on stack, but use phi
instruction to define a value. phi
instruction contains a list of blocks and their associated values - its result will depend on where we came from.
target triple = "x86_64-apple-macosx11.0.0"
@fnumber = constant [4 x i8] c"%d\0A\00"
define i32 @main() {
start:
br label %loop
loop:
%i0 = phi i32 [ 1, %start ], [ %i1, %loop ]
call i32 (i8*, ...) @printf(
i8* getelementptr ([4 x i8], [4 x i8]* @fnumber, i64 0, i64 0),
i32 %i0
)
%i1 = add i32 %i0, 1
%t = icmp sgt i32 %i1, 10
br i1 %t, label %exit, label %loop
exit:
ret i32 0
}
declare i32 @printf(i8*, ...)
-
start
block doesn't actually do anything, but we still label it for sake ofphi
. - in
loop
block,%i0 = phi i32 [ 1, %start ], [ %i1, %loop ]
means that if we came fromstart
, theni0
is1
, and if we came fromloop
it'si1
- then we the rest of the
loop
block is identical to what we had before, except we don't store anything -
exit
block is just like before
Generated loop code
It's important to note that the shape of the LLVM IR code coming in, and the shape of generated assembly, really don't have much in common. LLVM will very aggressively optimize things. For example both loop codes end up like this:
Disassembly of section __TEXT,__text:
0000000100003ef0 _main:
100003ef0: 53 push rbx
100003ef1: 48 8d 1d ba 00 00 00 lea rbx, [rip + 186]
100003ef8: 48 89 df mov rdi, rbx
100003efb: be 01 00 00 00 mov esi, 1
100003f00: 31 c0 xor eax, eax
100003f02: e8 8b 00 00 00 call 139 <dyld_stub_binder+0x100003f92>
100003f07: 48 89 df mov rdi, rbx
100003f0a: be 02 00 00 00 mov esi, 2
100003f0f: 31 c0 xor eax, eax
100003f11: e8 7c 00 00 00 call 124 <dyld_stub_binder+0x100003f92>
100003f16: 48 89 df mov rdi, rbx
100003f19: be 03 00 00 00 mov esi, 3
100003f1e: 31 c0 xor eax, eax
100003f20: e8 6d 00 00 00 call 109 <dyld_stub_binder+0x100003f92>
100003f25: 48 89 df mov rdi, rbx
100003f28: be 04 00 00 00 mov esi, 4
100003f2d: 31 c0 xor eax, eax
100003f2f: e8 5e 00 00 00 call 94 <dyld_stub_binder+0x100003f92>
100003f34: 48 89 df mov rdi, rbx
100003f37: be 05 00 00 00 mov esi, 5
100003f3c: 31 c0 xor eax, eax
100003f3e: e8 4f 00 00 00 call 79 <dyld_stub_binder+0x100003f92>
100003f43: 48 89 df mov rdi, rbx
100003f46: be 06 00 00 00 mov esi, 6
100003f4b: 31 c0 xor eax, eax
100003f4d: e8 40 00 00 00 call 64 <dyld_stub_binder+0x100003f92>
100003f52: 48 89 df mov rdi, rbx
100003f55: be 07 00 00 00 mov esi, 7
100003f5a: 31 c0 xor eax, eax
100003f5c: e8 31 00 00 00 call 49 <dyld_stub_binder+0x100003f92>
100003f61: 48 89 df mov rdi, rbx
100003f64: be 08 00 00 00 mov esi, 8
100003f69: 31 c0 xor eax, eax
100003f6b: e8 22 00 00 00 call 34 <dyld_stub_binder+0x100003f92>
100003f70: 48 89 df mov rdi, rbx
100003f73: be 09 00 00 00 mov esi, 9
100003f78: 31 c0 xor eax, eax
100003f7a: e8 13 00 00 00 call 19 <dyld_stub_binder+0x100003f92>
100003f7f: 48 89 df mov rdi, rbx
100003f82: be 0a 00 00 00 mov esi, 10
100003f87: 31 c0 xor eax, eax
100003f89: e8 04 00 00 00 call 4 <dyld_stub_binder+0x100003f92>
100003f8e: 31 c0 xor eax, eax
100003f90: 5b pop rbx
100003f91: c3 ret
LLVM decided that the best way to loop this is basically with printf("%d\n", 1); printf("%d\n", 2); ... printf("%d\n", 10);
.
Fibonacci
Here's Fibonacci in LLVM IR:
target triple = "x86_64-apple-macosx11.0.0"
@format = constant [14 x i8] c"fib(%d) = %d\0A\00"
define i32 @fib(i32 %a) {
start_fib:
; if (a <= 2) { fib_small } else { fib_big }
%t = icmp sle i32 %a, 2
br i1 %t, label %fib_small, label %fib_big
fib_big:
; f1 = fib(a-1)
%a1 = sub i32 %a, 1
%f1 = call i32 @fib(i32 %a1)
; f2 = fib(a-1)
%a2 = sub i32 %a, 2
%f2 = call i32 @fib(i32 %a2)
; return f1 + f2
%e = add i32 %f1, %f2
ret i32 %e
fib_small:
; return 1
ret i32 1
}
define i32 @main() {
start:
br label %loop
loop:
%i0 = phi i32 [ 1, %start ], [ %i1, %loop ]
%j = call i32 @fib(i32 %i0)
call i32 (i8*, ...) @printf(
i8* getelementptr ([14 x i8], [14 x i8]* @format, i64 0, i64 0),
i32 %i0,
i32 %j
)
%i1 = add i32 %i0, 1
%t = icmp sgt i32 %i1, 20
br i1 %t, label %exit, label %loop
exit:
ret i32 0
}
declare i32 @printf(i8*, ...)
This doesn't really use any new techniques.
-
if
s use same basic blocks as loops - and they need samephi
nodes if you assign something in boththen
andelse
branches and want to use it after theif
. - unlike assembly, there's no "calling convention", saving registers, or any such concerns - we just do
call
and LLVM will figure out how to translate that into register actions when it's generating code
$clang -O2 fib.ll -o fib
$ ./fib
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765
FizzBuzz
And we obviously finish with the FizzBuzz:
target triple = "x86_64-apple-macosx11.0.0"
@fizz = constant [6 x i8] c"Fizz\0A\00"
@buzz = constant [6 x i8] c"Buzz\0A\00"
@fizzbuzz = constant [10 x i8] c"FizzBuzz\0A\00"
@fnumber = constant [4 x i8] c"%d\0A\00"
define i32 @main() {
start:
br label %loop
loop:
%i0 = phi i32 [ 1, %start ], [ %i1, %loop_end ]
%m15 = srem i32 %i0, 15
%e15 = icmp eq i32 %m15, 0
br i1 %e15, label %fizzbuzz, label %not_fizzbuzz
fizzbuzz:
call i32 (i8*, ...) @printf(
i8* getelementptr ([10 x i8], [10 x i8]* @fizzbuzz, i64 0, i64 0)
)
br label %loop_end
not_fizzbuzz:
%m5 = srem i32 %i0, 5
%e5 = icmp eq i32 %m5, 0
br i1 %e5, label %buzz, label %not_buzz
buzz:
call i32 (i8*, ...) @printf(
i8* getelementptr ([6 x i8], [6 x i8]* @buzz, i64 0, i64 0)
)
br label %loop_end
not_buzz:
%m3 = srem i32 %i0, 3
%e3 = icmp eq i32 %m3, 0
br i1 %e3, label %fizz, label %not_fizz
fizz:
call i32 (i8*, ...) @printf(
i8* getelementptr ([6 x i8], [6 x i8]* @fizz, i64 0, i64 0)
)
br label %loop_end
not_fizz:
call i32 (i8*, ...) @printf(
i8* getelementptr ([4 x i8], [4 x i8]* @fnumber, i64 0, i64 0),
i32 %i0
)
br label %loop_end
loop_end:
%i1 = add i32 %i0, 1
%t = icmp sgt i32 %i1, 100
br i1 %t, label %exit, label %loop
exit:
ret i32 0
}
declare i32 @printf(i8*, ...)
There's nothing new here other than srem
for Signed REMinder
(that is %
) operator.
Should you use LLVM IR?
LLVM IR is far less human friendly than assembly, and it's really only meant to be generated by compiler frontends. If you make a typo in basic block structure, you get either completely meaningless error, or LLVM will crash during compilation.
It's only barely sufficiently human readable to let people creating compilers debug their compilers. There are some analysis tools, but all of that is also really meant mainly to debug your compiler.
So LLVM is a great tool for creating compilers, but LLVM IR is really not something you should be writing by hand.
And since this is recommendation section, don't be like the FSF. "Free Software" under centralized control that intentionally prevents it from reaching its full potential is doomed to fail.
Code
All code examples for the series will be in this repository.
Code for the LLVM Intermediate Representation episode is available here.
Top comments (0)