3 Equivalent Forms
LLVM-IR has 3 equivalent forms:
- textual form for developers to read and edit manually.
- binary form (also called bitcode form) for storing on disk (typically in .bc files; .o and .exe files normally hold native machine code).
- in-memory form for compilers and optimizers to use.
These 3 forms can be converted into each other without losing information.
Since the 3 forms are equivalent and only the textual form is human-readable, this article covers only the textual form.
Infinite Virtual Register Set
LLVM-IR uses a Virtual Register Set that has an infinite number of virtual registers.
When LLVM-IR code is translated to Native Code, the virtual registers are mapped to the real registers of the target machine.
Since the target machine has a finite number of registers, a real register whose value is no longer referenced is reused for a new virtual register.
Code 1 LLVM-IR Code for Infinite Virtual Register Set:
%a = add i64 9, 8 ; i64 is 64-bit integer type
%b = add i64 %a, 7 ; after ';' are Comments
%c = add i64 %b, %a ; LLVM-IR's register name must have a prefix: "%"
%d = add i64 %c, %b
%e = add i64 %d, %c
...
Code 2 Pseudo Code After Register Mapping:
AX = add i64 9, 8
BX = add i64 AX, 7
CX = add i64 BX, AX
AX = add i64 CX, BX
BX = add i64 AX, CX
...
Virtual Registers can be used to hold scalar values: integers, floating-point numbers, pointers.
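The register reuse shown in Code 2 can be sketched in Go. This is a simplified greedy mapping written only for illustration, not LLVM's actual register allocator; the `instr` type and the `mapRegisters` function are hypothetical names, and the sketch simply fails where a real compiler would spill values to memory.

```go
package main

import (
	"errors"
	"fmt"
)

// instr is a hypothetical, simplified instruction.
type instr struct {
	dst  string   // virtual register written by the instruction
	srcs []string // virtual registers read by the instruction
}

// mapRegisters gives every virtual register a physical register and reuses a
// physical register once the virtual register it held is never read again.
func mapRegisters(code []instr, phys []string) (map[string]string, error) {
	// lastUse[v] is the index of the last instruction that reads v.
	lastUse := map[string]int{}
	for i, in := range code {
		for _, s := range in.srcs {
			lastUse[s] = i
		}
	}

	free := append([]string(nil), phys...) // registers available right now
	holding := map[string]string{}         // live virtual register -> physical register
	result := map[string]string{}          // every virtual register -> physical register

	for i, in := range code {
		if len(free) == 0 {
			return nil, errors.New("out of physical registers")
		}
		reg := free[0]
		free = free[1:]
		holding[in.dst] = reg
		result[in.dst] = reg

		// Once the instruction has executed, release every register whose
		// virtual register is never read again.
		for _, v := range append([]string{in.dst}, in.srcs...) {
			r, live := holding[v]
			last, read := lastUse[v]
			if live && (!read || last <= i) {
				free = append(free, r)
				delete(holding, v)
			}
		}
	}
	return result, nil
}

func main() {
	// Code 1: constants are not registers, so they do not appear in srcs.
	code := []instr{
		{"%a", nil},
		{"%b", []string{"%a"}},
		{"%c", []string{"%b", "%a"}},
		{"%d", []string{"%c", "%b"}},
		{"%e", []string{"%d", "%c"}},
	}
	m, err := mapRegisters(code, []string{"AX", "BX", "CX"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, in := range code {
		fmt.Println(in.dst, "->", m[in.dst])
	}
}
```

With the three registers AX, BX and CX, the sketch maps %d to AX and %e to BX, the same reuse as in Code 2. With only two registers it reports an error at %c, because %a and %b are both still needed.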
`load` and `store` Instruction
The `load` instruction reads from memory, and the `store` instruction writes to memory. Apart from a few special cases such as atomic instructions and intrinsics, they are the only way to read and write memory.
Both `load` and `store` access memory through a pointer.
Code 3 LLVM-IR Code for `load` and `store`:
%a = load i64, i64* %ptr ; load i64 type value from memory pointed by i64* type pointer register %ptr
store i64 99, i64* %ptr ; store i64 type value 99 into memory pointed by i64* type pointer register %ptr
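For readers coming from Go, the two instructions correspond roughly to reading and writing through a pointer. The sketch below is only an analogy; the helper functions `load` and `store` are hypothetical names chosen to mirror the instructions.

```go
package main

import "fmt"

// load mirrors `%a = load i64, i64* %ptr`: read the value ptr points to.
func load(ptr *int64) int64 { return *ptr }

// store mirrors `store i64 99, i64* %ptr`: write v into the memory ptr points to.
func store(v int64, ptr *int64) { *ptr = v }

func main() {
	var slot int64 = 7 // a memory location; &slot plays the role of %ptr
	a := load(&slot)   // a is now a plain value, like a virtual register
	store(99, &slot)   // memory changes, a does not
	fmt.Println(a, slot) // prints "7 99"
}
```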
Three Address Form Assignment
In Three Address Form, an instruction performs a single operation, typically on two operands, and puts the result in one destination.
Every LLVM-IR assignment statement has exactly one `=`, with one and only one register on its left side.
The assignment statements in Code 1, Code 2 and Code 3 are all in Three Address Form.
Static Single Assignment
In LLVM-IR code, each register is assigned by exactly one assignment statement.
A loop can still execute that one statement many times, and a `phi` instruction can list several possible values on the right side of its `=`. Even so, SSA (Static Single Assignment) and TAF (Three Address Form) significantly simplify the data flow of LLVM-IR code, so many analyses can get the data-flow information they need without any sophisticated data-flow analysis.
But how can we transform non-SSA source code to SSA LLVM-IR code?
Code 4 non-SSA Go Code:
var a int64 = 99
a = 88
a = 77
a = 66
Code 5 Corresponding SSA LLVM-IR Code:
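%a = add i64 0, 99 ; LLVM-IR has no instruction that copies a constant into a register, so 'add 0' stands in for the assignment
%0 = add i64 0, 88
%1 = add i64 0, 77
%2 = add i64 0, 66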
When transforming non-SSA source code to SSA LLVM-IR code, the compiler automatically generates register names made of the `%` prefix and a number.
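The renaming itself can be sketched in Go for straight-line code. This is a minimal illustration under stated assumptions: the `assign` type, the `toSSA` function and the `name.N` naming scheme are all hypothetical, and the sketch does not handle branches or loops (those need `phi`).

```go
package main

import "fmt"

// assign is a hypothetical straight-line statement: dst = op(srcs...).
type assign struct {
	dst  string
	srcs []string
}

// toSSA gives every assignment target a fresh name and rewrites each read to
// the most recent name of the variable it refers to.
func toSSA(code []assign) []assign {
	count := map[string]int{}      // how many times each variable has been assigned
	current := map[string]string{} // variable -> its latest SSA name
	out := make([]assign, 0, len(code))
	for _, st := range code {
		srcs := make([]string, len(st.srcs))
		for i, s := range st.srcs {
			if name, ok := current[s]; ok {
				srcs[i] = name
			} else {
				srcs[i] = s // a constant or a name never assigned: keep as is
			}
		}
		name := st.dst
		if n := count[st.dst]; n > 0 {
			name = fmt.Sprintf("%s.%d", st.dst, n)
		}
		count[st.dst]++
		current[st.dst] = name
		out = append(out, assign{dst: name, srcs: srcs})
	}
	return out
}

func main() {
	// a = 99; a = a + 1; b = a; a = b + a
	code := []assign{
		{"a", []string{"99"}},
		{"a", []string{"a", "1"}},
		{"b", []string{"a"}},
		{"a", []string{"b", "a"}},
	}
	for _, st := range toSSA(code) {
		fmt.Println(st.dst, "=", st.srcs)
	}
}
```

After the rewrite every name is assigned exactly once, and each read refers to the one statement that produced its value.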
Global and Local Identifiers
LLVM-IR has two kinds of identifiers: Global Identifiers and Local Identifiers.
Global Identifiers begin with the `@` character; Local Identifiers begin with the `%` character.
Global Identifiers include Global Variable names and Function names.
Local Identifiers include Register names, Label names and User Defined Type names.
Global Variables will be illustrated later. Labels are similar to the labels in high-level languages.
Code 6 Go Code for Label:
var i int64 = 0
// 'loop' is a label
loop:
i++
println(i)
if i != 10 { goto loop }
// other code
Code 7 LLVM-IR Code for Label:
; 'enter', 'loop', 'end' are labels
enter:
br label %loop ; every block must end with a terminator instruction, so jump to 'loop' explicitly
loop:
%i1 = phi i64 [ 0, %enter ], [ %i2, %loop ] ; if control arrived from 'enter' %i1 = 0; if it arrived from 'loop' %i1 = %i2.
%i2 = add i64 %i1, 1
%cond = icmp ne i64 %i2, 10 ; if %i2 != 10 %cond is "true"; otherwise %cond is "false"
br i1 %cond, label %loop, label %end ; if %cond is "true" jump to label 'loop'; otherwise jump to label 'end'
end:
;code for label 'end'
The `icmp`, `br` and `phi` instructions will be illustrated later.
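Until then, the control flow of Code 7 can be walked by hand in Go. This is only a rough simulation under my own assumptions (the function name `runLoop` and the `pred`/`block` variables are hypothetical); its point is that `phi` chooses its value according to the block control arrived from.

```go
package main

import "fmt"

// runLoop steps through the blocks 'enter', 'loop' and 'end' of Code 7.
// pred records the block control came from, which is what phi inspects.
// It returns the final value of %i2 and how many times 'loop' was executed.
func runLoop() (int64, int) {
	var i1, i2 int64
	iterations := 0
	pred, block := "", "enter"
	for {
		switch block {
		case "enter":
			// control passes from 'enter' to 'loop'
			pred, block = "enter", "loop"
		case "loop":
			// phi: the initial value 0 when arriving from 'enter',
			// the previous %i2 when arriving from 'loop'
			if pred == "enter" {
				i1 = 0
			} else {
				i1 = i2
			}
			i2 = i1 + 1      // %i2 = add i64 %i1, 1
			cond := i2 != 10 // %cond = icmp ne i64 %i2, 10
			iterations++
			pred = "loop"
			if cond { // br i1 %cond, label %loop, label %end
				block = "loop"
			} else {
				block = "end"
			}
		case "end":
			return i2, iterations
		}
	}
}

func main() {
	final, n := runLoop()
	fmt.Println(final, n) // prints "10 10"
}
```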
A struct is a typical example of a User Defined Type.
Code 8 Go Code for Struct:
// 'usertype' is a user defined type name
type usertype struct{
a int8
b int16
c int32
d int64
}
Code 9 LLVM-IR Code for Struct:
%usertype = type { i8, i16, i32, i64 } ; '%usertype' is a user defined type name
`icmp` Instruction
`icmp` compares two values and returns a boolean (`i1`) value.
Syntax:
<result> = icmp <cond> <ty> <op1>, <op2>
`<cond>` is a keyword that indicates which kind of comparison to perform.
`<ty>` is the type of `<op1>` and `<op2>`.
`<op1>` and `<op2>` are the values to be compared. They can be of integer or pointer type (or vectors of those), and they must have the same type. Floating-point values are compared with the separate `fcmp` instruction.
`<cond>`s:
- eq: equal
- ne: not equal
- ugt: unsigned greater than
- uge: unsigned greater or equal
- ult: unsigned less than
- ule: unsigned less or equal
- sgt: signed greater than
- sge: signed greater or equal
- slt: signed less than
- sle: signed less or equal
Examples:
Code 10 Simple examples for `icmp`:
%result1 = icmp eq i32 4, 5 ; yields: %result1=false
%result2 = icmp ult i16 4, 5 ; yields: %result2=true
%result3 = icmp sgt i16 4, 5 ; yields: %result3=false
%result4 = icmp sge i16 4, 5 ; yields: %result4=false
Code 11 Complicated examples for `icmp`:
; Use <cond> ne(not equal) to compare float* type pointer %X and itself.
%result5 = icmp ne float* %X, %X ; yields: %result5=false
; Use <cond> ule(unsigned less or equal) to compare -4 and 5
; first reinterpret the i16 value -4 as unsigned: 65536 - 4 = 65532, then compare 65532 and 5
; 65532 > 5, so -4 ule 5 is false.
%result6 = icmp ule i16 -4, 5 ; yields: %result6=false
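The difference between the signed and unsigned conditions can be checked in Go, where converting an `int16` variable to `uint16` keeps the same 16 bits. The helper names `ule` and `sle` below are hypothetical and only mirror the `<cond>` keywords.

```go
package main

import "fmt"

// ule reads the 16 bits of a and b as unsigned numbers, like `icmp ule i16`.
func ule(a, b int16) bool { return uint16(a) <= uint16(b) }

// sle reads the same bits as signed numbers, like `icmp sle i16`.
func sle(a, b int16) bool { return a <= b }

func main() {
	x := int16(-4)
	fmt.Println(uint16(x)) // prints 65532: the same 16 bits read as unsigned
	fmt.Println(ule(x, 5)) // prints false: 65532 > 5
	fmt.Println(sle(x, 5)) // prints true: -4 <= 5
}
```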
Code 12 Vector example for `icmp`:
; When the operands are vectors, icmp compares them element by element:
; 1. Use <cond> ugt(unsigned greater than) to compare i16 type values 0 and 1; 0 < 1, so 0 ugt 1 is false.
; 2. Use <cond> ugt to compare i16 type values 3 and 2; 3 > 2, so 3 ugt 2 is true.
%result7 = icmp ugt <2 x i16> <i16 0, i16 3>, <i16 1, i16 2> ; yields: %result7 = <2 x i1> <i1 false, i1 true>
The result is a single register holding a <2 x i1> vector value, not a pointer. An instruction cannot define two registers at once, so a form such as `<%result7, %result8> = icmp ...` is not valid LLVM-IR.
LLVM-IR's documentation has no example of a vector compare, so I wrote this one myself from its description of the instruction.
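The element-by-element behaviour of a vector compare can be mimicked in Go with fixed-size arrays. This is only an analogy; `ugtVec` is a hypothetical helper, with `[2]uint16` standing in for `<2 x i16>` and `[2]bool` for `<2 x i1>`.

```go
package main

import "fmt"

// ugtVec mimics `icmp ugt` on two-element vectors: the lanes are compared
// pairwise and the result holds one boolean per lane.
func ugtVec(a, b [2]uint16) [2]bool {
	var out [2]bool
	for i := range a {
		out[i] = a[i] > b[i]
	}
	return out
}

func main() {
	fmt.Println(ugtVec([2]uint16{0, 3}, [2]uint16{1, 2})) // prints "[false true]"
}
```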
To Be Continued...
- `br` Instruction
- `phi` Instruction
- Global Variable