If you're a C/C++ developer, you've probably encountered this annoying linker error before:
undefined reference to 'symbol'
In this post, I explain exactly what this error means and why it occurs, going into the details of object files and the linking process.
Note: This post assumes a Linux environment.
TL;DR
This error means one of the following:
- You declared a function and called it without providing its definition.
- You included a library's header file and called a function declared in it, but you didn't link against the library itself.
C/C++ Compilation Steps
The compilation of a C/C++ program is carried out in these steps:
- Preprocessing: Takes the original C/C++ source file and produces an intermediate C/C++ source file.
- Compilation: Takes the intermediate C/C++ source file and produces an assembly code file.
- Assembly: Takes the assembly code file and produces an object file.
- Linking: Takes the object files and links them to produce the final executable file.
We'll examine a simple C program. We'll stop its compilation after step 2 "Compilation" to examine the output assembly file, and after step 3 "Assembly" to examine the output object file.
C/C++ to Assembly
I'll explain in the context of C. The same concepts apply to C++ as well.
Take a look at the following C program:
void defined_function() {}
void undefined_function();
void main()
{
    defined_function();
    undefined_function();
}
We define a function called defined_function with an empty body, we declare a function called undefined_function without defining it, and we call both functions from the main function.
Assume the program is in a file called main.c. We compile the program with gcc using the -S option to stop at the compilation step, and then we examine the output assembly file:
gcc -S main.c
The output assembly file main.s will look something like this:
.file "main.c"
.text
.globl defined_function
.type defined_function, @function
defined_function:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size defined_function, .-defined_function
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
call defined_function
movl $0, %eax
call undefined_function
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (GNU) 12.2.0"
.section .note.GNU-stack,"",@progbits
Before we continue, here's a quick assembly refresher. An assembly program contains a list of instructions that the CPU executes one by one. These include data movement instructions like mov, arithmetic instructions like add and sub, control transfer instructions like jmp and call, and many more.
An assembly program can also contain labels, written at the beginning of a line in the form "label:". Writing a label at the beginning of a line like this defines it. A label is simply a name for that location in memory, and other instructions such as jmp and call can refer to it. For example, "jmp label" means jump to the memory location named by label.
Now let's examine the assembly program above. It contains a lot of details that we don't need, so we'll focus only on the important parts.
Notice the labels "defined_function:" and "main:". These correspond to the functions defined in our C program. Also, notice the instructions "call defined_function" and "call undefined_function", which correspond to the function calls in our C program. Finally, notice that no label is defined for the undefined function undefined_function.
Assembly to Object File
We now compile the C file with gcc using the -c option to stop at the assembly step:
gcc -c main.c
Alternatively, we can assemble the assembly file using the GNU assembler:
as main.s -o main.o
Both approaches produce the same output object file, main.o.
Object File Format
You probably know that the assembly step transforms the assembly code to binary, but is this binary the only thing that is present in the object file? The answer is no.
Besides the machine code, the object file contains metadata about the program, such as section information, a symbol table, and relocation information. On Linux, object files use a format called ELF (Executable and Linkable Format). Other formats exist, such as the PE (Portable Executable) format used on Windows and the older a.out format. In this post, we'll focus on the ELF format.
How can we view the contents of an ELF file? On Linux, GNU binutils provides a utility called readelf that we can use. We're only interested in the symbol table for now, so we run readelf on the object file with the -s option, which displays only the symbol table:
readelf -s main.o
The output is as follows:
Symbol table '.symtab' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     7 FUNC    GLOBAL DEFAULT    1 defined_function
     4: 0000000000000007    27 FUNC    GLOBAL DEFAULT    1 main
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND undefined_function
Let's analyze this output.
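Notice that there are symbols for defined_function and main, and that their Ndx is a number. Ndx is the index of the section that contains the symbol (here 1, which is .text), and a section index means the symbol is defined. The assembler creates a defined symbol in the symbol table for each label defined in the assembly code. The exception is local labels that start with .L, such as .LFB0, which the GNU assembler leaves out of the symbol table by default. Because there are lines beginning with "defined_function:" and "main:" in the assembly code, they are defined symbols in the symbol table.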
Notice also that there is a symbol for undefined_function, and its Ndx is UND, which means undefined. The assembler creates an undefined symbol in the symbol table for each label that is referenced in instructions but not defined. Because undefined_function is referenced in an instruction ("call undefined_function") and there is no line beginning with "undefined_function:" in the assembly code, it is an undefined symbol in the symbol table.
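Also, notice that our three symbols defined_function, undefined_function, and main have a Bind of GLOBAL, which means they are global symbols. defined_function and main are global because of the .globl directives in the assembly code, and the GNU assembler treats all undefined symbols as external (global). This matters because only global symbols are visible to the linker across files: a LOCAL symbol can't be used to resolve a reference from another object file.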
Executables and Library Files
In the final stage of compilation, the linker combines object files and library files into an executable (or a shared library). An executable is a file with an entry point, where execution starts. A library is a collection of functions meant to be used by other programs, and it has no entry point. There are two types of libraries: static and dynamic (shared) libraries. In this post, we're only interested in static libraries. On Linux, a static library has a .a extension and is an archive, created with the ar tool, that bundles a collection of ELF object files. That's why static libraries are sometimes called static archives. The archive itself is not an ELF file, but each object file inside it is.
Linking
The inputs to linking are object files and library files. When linking, the linker reads the global symbols from all of its inputs. For each undefined symbol, the linker checks whether another input has a defined symbol with the same name. If every undefined symbol has a matching defined symbol, linking succeeds. Otherwise, the linker issues the error undefined reference to 'symbol' for each undefined symbol it couldn't match. (Static libraries are a partial exception: the linker only pulls in the archive members that define a symbol that is still undefined at that point. That's why a library usually has to come after the files that use it on the command line.)
In our example, if the only input to the linker is main.o, the linker will issue the following error:
undefined reference to 'undefined_function'
This is because undefined_function is an undefined symbol, and there is no defined symbol with the same name.
To fix this error, do one of the following: define undefined_function in main.c, compile main.c together with another C file that defines undefined_function, or link against a library that defines undefined_function.
Conclusion
In summary, each function defined in an input C file gets a defined symbol in the output object file, and each function that is declared and called but not defined gets an undefined symbol. During linking, when an undefined symbol has no defined symbol with the same name in any input, the linker issues the undefined reference error.
That's it. I hope you now really understand why this error occurs and I hope you have also gained insight into the content of object files and the linking process 🙂