C and C++ underlie many of the popular programming languages in use today: the implementations of languages such as Python and JavaScript are themselves written in C or C++. For example, the standard Python interpreter, CPython, is written in C, and V8, the most popular JavaScript engine, is written in C++. C/C++ also powers most of the underlying libraries used by Node.js. In other words, a large share of the open source software we rely on is built on C/C++.
One of the main reasons we prefer high-level languages like Python is the robust package management tooling they provide. We don't have to manage dependencies by hand, because pip does it for us. The same holds true for JavaScript. These languages also have robust build systems that make it easier to build and ship software.
C/C++ also has a few popular build systems, such as CMake and Bazel, which can help manage dependencies automatically. In this post, however, we will compile a C/C++ project without these tools in order to understand how things work internally.
We will be building a simple system logger that logs the total free RAM of the system every 5 seconds.
First things first! Let's create a project structure:
A project structure should be easy to understand and should isolate different functionality as much as possible to avoid confusion. Nothing stops us from using our own layout, but many open source projects built with C/C++ use this structure:
project_root
- include
- src
  - module-1
  - module-2
  - module-n
  - main.c/main.cc (depends on the project)
- Makefile
- README
- LICENSE
- misc files
Let's have a look at what each file/directory means:
- include: This is the place where all our header files live.
- src: The directory that contains all our source code. We can have multiple sub-directories/modules inside src. The file containing the main function can also live inside src.
- Makefile: Makefiles are used by the make command. We will be using make to build our project.
For our project we will have the following structure:
memlogger
- bin - will explain the need for this
- include
  - free_memory_api.h
  - file_writer.h
- src
  - free_memory_api
    - free_memory_api.c
  - file_writer
    - file_writer.c
  - main.c
- Makefile
Let's Code
We can start writing code once the project structure is set up. I will not explain the code in depth, to keep this post from getting too long; instead we will focus on the concepts.
What are header files?
Header files are blueprints of our actual C code. For every C module we write, it is good practice to provide a header file. The compiler uses these header files to learn which functions a module exports and what their signatures are. Once compilation is done, header files are not used anywhere; they are not part of the final binary. Header files really pay off when our project/module is used as a module in some other project: other programmers can simply include our header file to use the function declarations we exported.
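As a minimal sketch of the split between a declaration (what a header carries) and a definition (what a .c file carries), the single-file example below keeps both in one place. The function names bytes_in_kib and bytes_in_mib are hypothetical and not part of memlogger; the point is only that the caller compiles with nothing more than the declaration in view.

```c
/* Declaration: the part that would normally live in a header file.
   It gives the compiler the name, parameter types and return type. */
unsigned long long bytes_in_kib(unsigned long long kib);

/* A caller can be compiled with only the declaration above in view. */
unsigned long long bytes_in_mib(unsigned long long mib) {
    return bytes_in_kib(mib * 1024ULL);
}

/* Definition: the part that would normally live in a separate .c file
   and be connected to its callers at link time. */
unsigned long long bytes_in_kib(unsigned long long kib) {
    return kib * 1024ULL;
}
```

Moving the first declaration into a header and the last function into its own .c file would give the same layout that memlogger uses for its modules.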
Let us create free_memory_api.h as per the structure:
#ifndef FREE_MEMORY_API_H
#define FREE_MEMORY_API_H
// Guard names avoid a leading double underscore, which C reserves for the implementation
// This is our API function, which returns free memory in bytes
unsigned long long get_free_system_memory(void);
#endif
Let us create file_writer.h, which declares the file writer API:
#ifndef FILE_WRITER_API_H
#define FILE_WRITER_API_H
#include <stdio.h>
// Opens the log file for writing
FILE * open_log_file(char * path);
// We will use this function to write contents to the log file
void write_log_to_file(FILE * file, unsigned long long free_memory);
// Closes the log file
void close_log_file(FILE * file);
#endif
Let's define these APIs:
We declared the APIs we need, but we have not yet written the underlying code for them. We will now write the code for file logging and for getting free memory from the system. Before writing the code, we have to include the headers (blueprints) we declared earlier.
file_writer.c
#include "file_writer.h"
// Open the log file in append mode and return it
FILE * open_log_file(char * file_path) {
    FILE * fp = fopen(file_path, "a");
    return fp;
}
// Close the file
void close_log_file(FILE * fp) {
    if (fp) {
        fclose(fp);
    }
}
// Write a log entry into the file
void write_log_to_file(FILE * fp, unsigned long long free_memory) {
    if (fp) {
        fprintf(fp, "free_memory=%llu\n", free_memory);
    }
}
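Now let us define the free memory API, i.e. free_memory_api.c:
#include <sys/sysinfo.h>
#include "free_memory_api.h"
// Returns free RAM in bytes, or 0 if sysinfo() fails
unsigned long long get_free_system_memory() {
    struct sysinfo info;
    if (sysinfo(&info) < 0) {
        return 0;
    }
    // freeram is expressed in units of mem_unit bytes
    return (unsigned long long)info.freeram * info.mem_unit;
}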
And finally, main.c:
#include <stdio.h>
#include <unistd.h>
#include "file_writer.h"
#include "free_memory_api.h"
int main(int argc, char **argv) {
    if (argc < 2) {
        printf("Provide log file name\n");
        return 1; // non-zero exit status signals incorrect usage
    }
    unsigned long long free_memory = 0;
    // Log one entry every 5 seconds until the process is stopped
    while (1) {
        free_memory = get_free_system_memory();
        FILE * log = open_log_file(argv[1]);
        write_log_to_file(log, free_memory);
        close_log_file(log);
        sleep(5);
    }
}
Let's start building the project
Now that we have written the code, it is time to compile the project. Our project has multiple modules. These modules can be linked together to build a standalone executable, or we can build individual modules as shared libraries and link them together at runtime.
Building a static-monolithic executable
In this section, we build a single binary that can be shipped. There are many ways to build a C/C++ project; in this post we will only build a standalone executable, which is the easiest way of building a C project.
We use make, a command-line build utility available on Linux. A Makefile groups series of shell commands under named rules, and make runs the commands of whichever rule we ask for. Let's see our Makefile now.
COMPILER=gcc
# These rule names are task labels, not files that make should look for
.PHONY: file_writer free_memory_api project
# Recipe lines must start with a tab character
file_writer:
	@$(COMPILER) -c src/file_writer/*.c -Iinclude/ -o bin/file_writer.o
	@echo "Built file_writer.o"
free_memory_api:
	@$(COMPILER) -c src/free_memory_api/*.c -Iinclude/ -o bin/free_memory_api.o
	@echo "Built free_memory_api.o"
project:
	$(COMPILER) -c src/main.c -Iinclude/ -o bin/main.o
	@$(COMPILER) bin/free_memory_api.o bin/file_writer.o bin/main.o -o memlogger
	@echo "Finished building memlogger"
Even though we could build the entire project with a single command, I have divided the build into three phases:
- file_writer: This make rule generates the file_writer.o object file under ./bin.
- free_memory_api: This rule generates free_memory_api.o under ./bin.
- project: This builds the entire project. It generates main.o and links main.o with the other two object files to create a standalone executable called memlogger.
Let's execute these commands with make:
Step-1. file_writer.o
make file_writer
Step-2. free_memory_api.o
make free_memory_api
Step-3. Final binary:
make project
Executing the binary:
We can execute the binary by running it like a normal Linux executable:
./memlogger logs.txt
After 15 seconds, we see 3 entries in the log file:
free_memory=8322523136
free_memory=8330776576
free_memory=8335728640
That is approximately 8GB free out of 16GB of RAM, which is correct. Hurray! We created a small system logger.
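To check the arithmetic, here is a small sketch that converts a byte count into gigabytes. The helper names bytes_to_gb and bytes_to_gib are hypothetical and not part of memlogger. The sketch distinguishes decimal gigabytes (10^9 bytes) from binary gibibytes (2^30 bytes), since the two give slightly different figures for the same log entry.

```c
/* Decimal gigabytes: 1 GB = 10^9 bytes. */
double bytes_to_gb(unsigned long long bytes) {
    return (double)bytes / 1e9;
}

/* Binary gibibytes: 1 GiB = 2^30 bytes. */
double bytes_to_gib(unsigned long long bytes) {
    return (double)bytes / (double)(1ULL << 30);
}
```

For the first log entry, 8322523136 bytes works out to about 8.32 GB, or about 7.75 GiB.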
What happened under the hood?
It is important to understand the build process we used here. To understand that, we need to know the concept of object files.
What did we do in the Makefile?
We defined three rules. The first two each compile one module, and the third compiles main.c and then goes one step further by linking all three object files.
We used the gcc compiler (g++ would be used for C++ projects). Options used:
- -c: Tells the compiler to compile only and not link, since we link the object files explicitly in Step-3.
- -I: By default, the compiler searches for headers in standard locations. Since we defined our own headers, we use -I to add our custom include directory to the header search path.
- -o: The output file name.
What are object files?
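An object file is an ELF (Executable and Linkable Format) binary produced by the compiler. ELF is specified as part of the System V ABI and is the standard format for object files, executables and shared libraries on Linux, so every Linux distribution understands it. The contents of an object file depend on the target architecture it was compiled for. In layman's terms, an object file contains a mapping table and symbol definitions.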
- Mapping table or symbol table: The mapping table contains the set of symbols defined by the object file, each with an offset into the text section where the actual code for that symbol lives.
- Symbol definitions: This part contains the machine code of all our functions. So, to get the machine code of a function/symbol we take two steps: first, we look up the symbol table of the object file to get the symbol's offset in the text section; second, we go to that offset in the text section and read its code.
These are just layman's terms. ELF has its own precise terminology for both the mapping table and the symbol definitions, which can be confusing for a beginner.
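The two-step lookup described above can be sketched as a toy model in C. This is not how readelf or the linker is implemented; the struct and function names (toy_symbol, toy_lookup, toy_code_end) are hypothetical, and the offsets and sizes are copied from the readelf listing for file_writer.o shown in this post.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of one symbol-table entry: a name, the offset of its code
   inside the text section, and the size of that code in bytes. */
struct toy_symbol {
    const char *name;
    unsigned long offset;
    unsigned long size;
};

/* Values copied from the readelf listing for file_writer.o. */
static const struct toy_symbol file_writer_symbols[] = {
    { "open_log_file",     0x00, 41 },
    { "close_log_file",    0x29, 34 },
    { "write_log_to_file", 0x4b, 54 },
};

/* Step 1: find the entry by name. Returns NULL if the symbol is not
   defined in this (toy) object file. */
const struct toy_symbol *toy_lookup(const char *name) {
    size_t count = sizeof file_writer_symbols / sizeof file_writer_symbols[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(file_writer_symbols[i].name, name) == 0) {
            return &file_writer_symbols[i];
        }
    }
    return NULL;
}

/* Step 2: the code occupies [offset, offset + size) in the text section.
   This returns the offset just past the end of the symbol's code. */
unsigned long toy_code_end(const struct toy_symbol *sym) {
    return sym->offset + sym->size;
}
```

In this model the end of open_log_file (0 + 41 = 0x29) is exactly where close_log_file starts, which shows the functions sitting back to back in the text section. A name such as fopen is not found at all, because file_writer.o only references it and does not define it.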
Let's look at the contents of the file_writer.o object file. We use a tool called readelf, which is part of GNU binutils and is available on most Linux systems.
readelf -h bin/file_writer.o
Output:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 1168 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 13
Section header string table index: 12
These are ELF headers. Now let's see the Symbol table (or mapping table in our terms)
readelf --syms bin/file_writer.o
Output:
Symbol table '.symtab' contains 16 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS file_writer.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 6
9: 0000000000000000 41 FUNC GLOBAL DEFAULT 1 open_log_file
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fopen
12: 0000000000000029 34 FUNC GLOBAL DEFAULT 1 close_log_file
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fclose
14: 000000000000004b 54 FUNC GLOBAL DEFAULT 1 write_log_to_file
15: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fprintf
As we can see in the table, we have entries for open_log_file, close_log_file and write_log_to_file, which are the API functions we defined. Hurray! Our object file is correct. Also, if you observe carefully, fclose, fopen and fprintf are present too, but their Ndx column reads UND (undefined), which means their addresses are not known yet. The linker resolves them in Step-3, either by linking these functions statically or by recording them to be resolved dynamically at runtime. We will see the concept of shared libraries in the next part.
Similarly, we can run the same command on free_memory_api.o to see its symbol table. We get this output:
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS free_memory_api.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 6
9: 0000000000000000 89 FUNC GLOBAL DEFAULT 1 get_free_system_memory
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND sysinfo
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND __stack_chk_fail
We can see get_free_system_memory as a FUNC, and sysinfo, which is undefined.
What did we do in the final step?
In the first two steps, we compiled the modules and generated the object files, but they can't be executed: they are not linked yet, and they don't contain the main function, which is the entry point of any C/C++ program. We have two commands in the Makefile under the project rule (Step-3). The first command only compiles main.c into main.o. Let's try to run it. It has the main function, so will it run?
./bin/main.o
Output:
bash: ./bin/main.o: cannot execute binary file: Exec format error
We cannot run it because it is not an executable file; it is an object file. The final step is still remaining: linking all three object files to generate the final executable binary.
Before that, let's try to link only main.o and leave out the other two modules to see what happens:
gcc bin/main.o -o memlogger
Output:
bin/main.o: In function `main':
main.c:(.text+0x36): undefined reference to `get_free_system_memory'
main.c:(.text+0x4d): undefined reference to `open_log_file'
main.c:(.text+0x64): undefined reference to `write_log_to_file'
main.c:(.text+0x70): undefined reference to `close_log_file'
collect2: error: ld returned 1 exit status
This is exactly what was supposed to happen: main.o calls these functions, but the linker cannot find their definitions. So we need to link it with the remaining two object files.
gcc bin/free_memory_api.o bin/file_writer.o bin/main.o -o memlogger
Now the linker (invoked by gcc) looks into the symbol tables of file_writer.o and free_memory_api.o to resolve the functions that were undefined in our previous command. Since the symbol tables of these two object files define those symbols/functions, linking succeeds and the final executable is generated.
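Symbol resolution can be pictured with a toy model in C. This is only a sketch of the idea, not how ld works internally: the names toy_object, toy_is_defined and toy_count_unresolved are hypothetical, and the model tracks only the four project functions that main.o needs, ignoring the libc symbols.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of an object file: its name and the symbols it defines. */
struct toy_object {
    const char *file;
    const char *defined[4];   /* NULL-terminated list of defined symbols */
};

/* The three object files of memlogger and what each one defines. */
const struct toy_object toy_objects[] = {
    { "bin/main.o",            { "main" } },
    { "bin/free_memory_api.o", { "get_free_system_memory" } },
    { "bin/file_writer.o",     { "open_log_file", "close_log_file",
                                 "write_log_to_file" } },
};

/* The project functions that main.o calls but does not define. */
const char *const toy_needed_by_main[] = {
    "get_free_system_memory", "open_log_file",
    "write_log_to_file", "close_log_file",
};

/* Returns 1 if any of the first `count` objects defines `name`, else 0. */
int toy_is_defined(const struct toy_object *objects, size_t count,
                   const char *name) {
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < 4 && objects[i].defined[j] != NULL; j++) {
            if (strcmp(objects[i].defined[j], name) == 0) {
                return 1;
            }
        }
    }
    return 0;
}

/* Counts the needed symbols that none of the given objects defines. */
size_t toy_count_unresolved(const struct toy_object *objects, size_t count,
                            const char *const *needed, size_t needed_count) {
    size_t unresolved = 0;
    for (size_t i = 0; i < needed_count; i++) {
        if (!toy_is_defined(objects, count, needed[i])) {
            unresolved++;
        }
    }
    return unresolved;
}
```

Passing only the first object (main.o) leaves all four symbols unresolved, which mirrors the four "undefined reference" errors seen earlier; passing all three objects leaves none.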
Let's see the mapping table, or symbol table, of our memlogger binary:
readelf --syms memlogger
Output:
26: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
27: 0000000000000780 0 FUNC LOCAL DEFAULT 14 deregister_tm_clones
28: 00000000000007c0 0 FUNC LOCAL DEFAULT 14 register_tm_clones
29: 0000000000000810 0 FUNC LOCAL DEFAULT 14 __do_global_dtors_aux
30: 0000000000201010 1 OBJECT LOCAL DEFAULT 24 completed.7698
31: 0000000000200d88 0 OBJECT LOCAL DEFAULT 20 __do_global_dtors_aux_fin
32: 0000000000000850 0 FUNC LOCAL DEFAULT 14 frame_dummy
33: 0000000000200d80 0 OBJECT LOCAL DEFAULT 19 __frame_dummy_init_array_
34: 0000000000000000 0 FILE LOCAL DEFAULT ABS free_memory_api.c
35: 0000000000000000 0 FILE LOCAL DEFAULT ABS file_writer.c
36: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
37: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
38: 0000000000000c5c 0 OBJECT LOCAL DEFAULT 18 __FRAME_END__
39: 0000000000000000 0 FILE LOCAL DEFAULT ABS
40: 0000000000200d88 0 NOTYPE LOCAL DEFAULT 19 __init_array_end
41: 0000000000200d90 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
42: 0000000000200d80 0 NOTYPE LOCAL DEFAULT 19 __init_array_start
43: 0000000000000a78 0 NOTYPE LOCAL DEFAULT 17 __GNU_EH_FRAME_HDR
44: 0000000000200f80 0 OBJECT LOCAL DEFAULT 22 _GLOBAL_OFFSET_TABLE_
45: 0000000000000a30 2 FUNC GLOBAL DEFAULT 14 __libc_csu_fini
46: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
47: 0000000000201000 0 NOTYPE WEAK DEFAULT 23 data_start
48: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@@GLIBC_2.2.5
49: 0000000000201010 0 NOTYPE GLOBAL DEFAULT 23 _edata
50: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fclose@@GLIBC_2.2.5
51: 0000000000000a34 0 FUNC GLOBAL DEFAULT 15 _fini
52: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __stack_chk_fail@@GLIBC_2
53: 00000000000008dc 34 FUNC GLOBAL DEFAULT 14 close_log_file
54: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.2.5
55: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
56: 0000000000201000 0 NOTYPE GLOBAL DEFAULT 23 __data_start
57: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fprintf@@GLIBC_2.2.5
58: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
59: 0000000000201008 0 OBJECT GLOBAL HIDDEN 23 __dso_handle
60: 0000000000000a40 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used
61: 000000000000085a 89 FUNC GLOBAL DEFAULT 14 get_free_system_memory
62: 00000000000009c0 101 FUNC GLOBAL DEFAULT 14 __libc_csu_init
63: 0000000000201018 0 NOTYPE GLOBAL DEFAULT 24 _end
64: 0000000000000750 43 FUNC GLOBAL DEFAULT 14 _start
65: 0000000000201010 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
66: 0000000000000934 130 FUNC GLOBAL DEFAULT 14 main
67: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fopen@@GLIBC_2.2.5
68: 00000000000008fe 54 FUNC GLOBAL DEFAULT 14 write_log_to_file
69: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sysinfo@@GLIBC_2.2.5
70: 0000000000201010 0 OBJECT GLOBAL HIDDEN 23 __TMC_END__
71: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
72: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sleep@@GLIBC_2.2.5
73: 00000000000008b3 41 FUNC GLOBAL DEFAULT 14 open_log_file
74: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@@GLIBC_2.2
75: 0000000000000698 0 FUNC GLOBAL DEFAULT 11 _init
Since the symbol table is very big, I have pasted the entries starting from 26. As you can see, our final executable has the definitions from all three modules: open_log_file, get_free_system_memory, main and the rest now have real addresses. The libc functions are still marked UND, but their names now carry a version tag, such as fopen@@GLIBC_2.2.5. This means the code for these functions is not copied into our binary. Instead, they have to be resolved at runtime, and it is the responsibility of the Linux dynamic loader, ld.so, to link these symbols dynamically.
So this is it! We are done with the first part of the post. If you are reading this line, I would really like to thank you for reading the entire post. Keep the compliments coming even if you skipped everything and came here directly.
Thank you :) Have a good time.