Introduction
Object-Oriented Programming keeps rising in popularity because OOP languages provide a plethora of built-in functionalities like Object Creation, Garbage Collection, Function and Operator Overloading, Native Language support, Memory Protection and so on. Most of these are so crucial in programming that writing programs without them becomes a cumbersome task.
One of the most important features is the ability to create objects and write methods to interact with them. However, some procedural languages like C do not support this feature.
But then again, is there anything that is not possible in C?
Let's begin
Coming back to the topic: how are objects created, and how are they destroyed?
In this tutorial, we will learn how objects can be implemented in a procedural language, C.
So, sit back as we unfold the secrets behind the scenes of an object.
In many object-oriented languages, the well-known "new" operator and the "Garbage Collector" are responsible for the creation and deletion of objects, respectively.
Let's understand what the new operator does to create an object out of nothing. But before we do that, it will be wiser to build the required background.
Prerequisites:
- Pointers (I cannot emphasize this enough; it is a very crucial topic)
- Structures
- Static Keyword in C
- Virtual Memory (Memory Map of a running Process)
- Static Memory Allocation / Stack Memory
- Dynamic Memory Allocation / Heap Memory
1. Pointers
The most widely used concept in C. It will either bring your program to glory or break it into dust.
A pointer stores the address of an object (a variable, an array element, a function, and so on). The rest is handled by the compiler: checking that the address assigned with the address-of operator (&) matches the pointer's type, scaling pointer arithmetic by the size of the pointed-to type, and dereferencing.
Illustrations to show usage of Pointer in C=>
//Header file for the printf()
#include <stdio.h>
int main(void)
{
int a = 10;
//int pointer created and initialised to NULL
int *ptr = NULL;
ptr = &a;
//value in a is changed by the pointer
*ptr = 20;
printf("Value in a = %d\n",a);
//%p expects a void *, so the addresses are cast
printf("Address of a = %p\n",(void *)&a);
printf("Value pointed by ptr = %d\n",*ptr);
printf("Value in ptr = %p\n", (void *)ptr);
printf("Address of ptr = %p\n", (void *)&ptr);
return 0;
}
It is important to note that a pointer should have the same data type as the variable it points to. In the example above, int a has a pointer of type int * (ptr). The pointed-to type tells the compiler how many bytes to read when dereferencing and how far to move during pointer arithmetic: ptr + 1 points sizeof(int) bytes further, not one byte. In C, there can be a pointer to almost anything you can think of; even functions can be pointed to with the help of function pointers.
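A short sketch of the two ideas mentioned above, pointer arithmetic and function pointers. The function names sum_with_pointer(), add() and apply() are hypothetical helpers made up for this illustration.

```c
#include <stddef.h>

/* Sums count ints by walking a pointer over them. Because ptr is an
   int *, ptr++ advances by sizeof(int) bytes, not by one byte. */
int sum_with_pointer(const int *ptr, size_t count)
{
    int total = 0;
    const int *end = ptr + count;   /* one past the last element */
    while (ptr < end) {
        total += *ptr;              /* dereference: read the int */
        ptr++;                      /* move to the next int */
    }
    return total;
}

int add(int x, int y)
{
    return x + y;
}

/* op is a function pointer: it can point to any function that
   takes two ints and returns an int, such as add(). */
int apply(int (*op)(int, int), int x, int y)
{
    return op(x, y);
}
```

Because the compiler knows the pointer's type, the same ptr++ would move 8 bytes for a double * on most systems, which is why the pointer type has to match the data it points to.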
2. Structures
Structures group variables of different data types into a single unit. They are similar to classes in OOP, but they cannot hold functions directly. There are ways of adding functions to them using function pointers, but we won't be talking about that here.
Through the use of structures, we get much closer to achieving our goal of Object-Oriented Programming in C.
Illustrations to show usage of Structures in C=>
#include <stdio.h>
//structure for person
//typedef to make definition simpler
typedef struct person {
char *name;
int age;
char *address;
}person;
//printStruct() function prototype
void printStruct(person*);
int main(void)
{
//p1 variable of struct person data type
person p1;
p1.name = "Foo Bar";
p1.age = 21;
p1.address = "Street XYZ";
//address of p1 passed (pass by reference through a pointer)
printStruct(&p1);
return 0;
}
void printStruct(person *ptr)
{
printf("Name of Person = %s\n",ptr->name);
printf("Age of Person = %d\n",ptr->age);
printf("Address of Person = %s\n",ptr->address);
}
In the example above, we have created a struct named "person", and we have used typedef to keep the definition simpler. Creating a struct lets us group related data into a single unit. Now, if we want to pass the struct variable to another function, we don't have to pass its individual members; we just pass the struct variable and everything goes along with it. This idea of bundling related data together is at the heart of encapsulation, one of the pillars of OOP.
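Notice that printStruct() receives a pointer to p1 rather than a copy of it. C always passes arguments by value, so passing the address (C's way of achieving pass by reference) avoids copying the whole struct and lets the function work on the original. To accomplish this, we use the address-of operator (&) before the variable name to pass the address of the memory location where the struct person p1 exists.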
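To see why the pointer matters, compare a function that receives a copy of the struct with one that receives its address. This is a minimal sketch: birthday_by_value() and birthday_by_pointer() are hypothetical names, and name is declared const char * here because it points at string literals.

```c
typedef struct person {
    const char *name;
    int age;
} person;

/* Receives a copy of the caller's struct, so the change is lost on return. */
void birthday_by_value(person p)
{
    p.age++;
}

/* Receives the address of the caller's struct, so the change persists. */
void birthday_by_pointer(person *p)
{
    p->age++;
}
```

Calling birthday_by_value(p1) leaves p1.age untouched, while birthday_by_pointer(&p1) increments it, exactly like printStruct() reads the original p1 through its address.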
3. Static Keyword in C
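The static keyword in C is a storage-class specifier. Local variables have the auto storage class by default, but a local variable can be declared static instead. This does not change its scope (it is still visible only inside the block where it is defined), but it changes its lifetime: the variable exists for the entire run of the program and keeps its value between function calls. (At file scope, static instead gives a variable or function internal linkage, making it visible only inside that source file.)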
Illustrations to show usage of static in C=>
#include <stdio.h>
void func(void);
int main(void)
{
for(int i=0;i<5;i++){
func();
}
return 0;
}
void func(void)
{
/*int count is set to static,
due to which it is stored in the Data Segment
(BSS, since it is zero-initialised) and not in Stack Memory*/
static int count = 0;
count ++;
printf("%d\n",count);
}
Try the above program; it will print the numbers from 1 to 5. Also, try changing static int count = 0; to int count = 0; and the program will print 1 five times. By using static, the variable count is stored in the Data Segment rather than on the Stack, and it is initialised only once. Once this variable is created, it exists until the program ends, so the lifetime of a static variable is the lifetime of the program.
The static keyword is not needed for object creation as such, but it is required by the garbage collector we build later in this article. For now, it is enough to know that in this article the static keyword is used only for garbage collection.
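The effect of static on a local variable can be checked directly by putting both versions of the counter side by side. static_counter() and auto_counter() are hypothetical names for this sketch.

```c
/* count has static storage duration: it is initialised once and keeps
   its value between calls. */
int static_counter(void)
{
    static int count = 0;
    count++;
    return count;
}

/* Same body without static: count is a new auto variable on every call,
   re-initialised to 0 each time, so this always returns 1. */
int auto_counter(void)
{
    int count = 0;
    count++;
    return count;
}
```

Successive calls to static_counter() return 1, 2, 3, ..., while auto_counter() returns 1 every time.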
4. Virtual Memory
Understanding the memory map of a running process helps in better understanding how a program uses memory in its various segments.
A running process typically has at least the following segments in its (virtual) address space, listed roughly from the lowest address to the highest:
Text Segment: Also known as the Code Segment, it sits at the low end of the layout. It contains the machine instructions of your program, including code linked in from static libraries.
Data Segment: It is further divided into two segments: the Initialised Data Segment and the Uninitialised Data Segment (BSS).
The initialised data segment consists of two areas: read-only and read-write. Global and static variables that the programmer initialises with a non-zero value are stored in the read-write area, e.g. int i = 10, char a = 'A'. Constants such as string literals usually go in the read-only area. For example, in const char* s = "hello, world", the string literal "hello, world" is typically stored in the read-only area because modifying a string literal is undefined behaviour, not because of the const keyword; the const only means that the program may not modify the characters through s. The pointer s itself, if it is a global or static variable, is stored in the read-write area.
The uninitialised data segment (BSS) holds global and static variables that are not explicitly initialised in the source code (or are initialised to zero), e.g. int i or char a at file scope. They are set to zero before main() runs.
All global and static variables are stored in the Data Segment (including BSS), while ordinary local variables live in Stack Memory.
Heap Memory: This is a very crucial segment for any program, and every good programmer should know how to use it. Once you understand it thoroughly, writing programs for almost any application becomes much easier. This segment opens the door to many programming concepts, such as Object-Oriented Programming, Inter-Process Communication (Pipes, Sockets, Shared Memory, etc.), Virtual Machines, and so on.
There is no fixed limit on how much heap memory a process can use; it depends on the available memory (RAM plus swap), the size of the address space of the CPU architecture, and limits imposed by the operating system.
Program Break: This is the address where the heap currently ends. Addresses just above it are not mapped, so accessing them typically kills the program with a Segmentation Fault (SIGSEGV). Doesn't this contradict the statement above that there is no fixed limit on heap size? No, because the program break can be moved at run time with system calls like sbrk() or brk(), which shift it to the desired level. These are Unix system calls, not part of standard C, and malloc() normally calls them (or mmap()) on our behalf.
Shared Library: At some offset above the program break lies the memory-mapping region, where shared libraries (.so files) are loaded. They can be loaded and unloaded on demand.
Stack Memory: Another important segment, needed to understand how function calls work. Whenever a function calls another function, a stack frame is pushed containing the return address of the caller, the callee's local (auto) variables and, depending on the calling convention, some or all of the arguments (on x86-64, the first few arguments are passed in registers). Understanding Stack Memory helps in understanding recursion and how local variables work.
Command-Line Arguments and Environment Variables: At the top of the address space lie the arguments passed to main() from the command line (argv) and the environment variables of the process.
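One way to see these segments on your own machine is to print an address from each of them. This is a sketch assuming a typical Linux/x86-64 layout; print_memory_map() and the global variable names are hypothetical. The exact addresses vary between systems and, with address-space layout randomisation (ASLR) or tools like AddressSanitizer, between runs, so only their relative positions are meaningful.

```c
#include <stdio.h>
#include <stdlib.h>

int initialised_global = 10;   /* initialised data segment (read-write) */
int uninitialised_global;      /* BSS: zero-filled before main() runs */

/* Prints one address from several segments of the running process. */
void print_memory_map(void)
{
    const char *literal = "hello, world";   /* literal: usually read-only data */
    int local = 0;                          /* stack */
    int *heap = malloc(sizeof *heap);       /* heap */

    printf("string literal : %p\n", (void *)literal);
    printf("initialised    : %p\n", (void *)&initialised_global);
    printf("BSS            : %p\n", (void *)&uninitialised_global);
    printf("heap           : %p\n", (void *)heap);
    printf("stack          : %p\n", (void *)&local);

    free(heap);
}
```

On a typical Linux system the literal, initialised and BSS addresses are close together and low, the heap is somewhat above them, and the stack is far higher.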
5. Static Memory Allocation
Static memory allocation means that the compiler decides, at compile time, how much memory each variable needs and where it will live, just by looking at the variables defined in the source code. Depending on the variable's scope and storage class, it ends up either in the Data segment (global and static variables, reserved when the program is loaded) or in the Stack (local variables, reserved each time the enclosing function is called).
Illustrations to show usage of static memory allocation in C=>
#include <stdio.h>
//Global Variables stored in Data Segment
int i = 20;
char c = 'a';
void func(void);
int main(void)
{
//Local Variables stored in Stack Memory
int j = 100;
char name[] = "Hello, World";
int arr[] = {1,2,4,8,16,32,64,128};
func();
return 0;
}
void func(void)
{
//Static variable: zero-initialised, so it goes in the BSS part of the Data Segment
static int k = 0;
//String literal stored in Read-Only Data Segment
//char pointer in Stack Memory
const char * s = "Inside func";
}
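Because the compiler fixes the size of statically allocated objects at compile time, sizeof can report it without running anything. A small sketch reusing the arrays from the example above; ARRAY_LEN, name_bytes() and arr_len() are hypothetical names.

```c
#include <stddef.h>

/* Number of elements in a true array. Do not use it on a pointer:
   sizeof would then give the size of the pointer, not of the array. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

size_t name_bytes(void)
{
    char name[] = "Hello, World";   /* 12 characters plus the '\0' */
    return sizeof name;              /* worked out by the compiler: 13 */
}

size_t arr_len(void)
{
    int arr[] = {1, 2, 4, 8, 16, 32, 64, 128};
    return ARRAY_LEN(arr);           /* 8 */
}
```

This only works because the arrays are allocated statically; for memory obtained with malloc(), the program has to remember the size itself.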
6. Dynamic Memory Allocation
Memory allocated at run time is known as dynamic memory allocation. It is useful when we don't know beforehand how much memory is required to store the data. Dynamic memory allocation is the stepping stone towards understanding the new operator used in OOP languages. Memory allocated this way comes from the heap.
Illustrations to show usage of dynamic memory allocation in C=>
#include <stdio.h>
//Header file for malloc, calloc, realloc and free
#include <stdlib.h>
//Header file for mmap and munmap (POSIX, not standard C)
#include <sys/mman.h>
int main(void)
{
//Dynamically allocating space for 10 ints, uninitialised
int *ptr1 = malloc(10*sizeof(int));
//Dynamically allocating space for 10 ints, initialised to 0
int *ptr2 = calloc(10,sizeof(int));
if(ptr1 == NULL || ptr2 == NULL){
free(ptr1);
free(ptr2);
return 1;
}
//Resizing ptr1 to hold 20 ints; a temporary pointer avoids
//losing the original block if realloc fails
int *tmp = realloc(ptr1,20*sizeof(int));
if(tmp != NULL)
ptr1 = tmp;
//Mapping one whole page (4096 bytes on most systems) to store data
int *ptr3 = mmap(NULL,4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if(ptr3 != MAP_FAILED)
munmap(ptr3,4096);
//Releasing the heap memory
free(ptr1);
free(ptr2);
return 0;
}
Well, in C, memory can be dynamically allocated using functions like malloc(), calloc() and realloc(), whereas many higher-level programming languages use the new operator.
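One detail worth knowing about realloc(): if it fails, it returns NULL but leaves the original block allocated. Writing ptr1 = realloc(ptr1, ...) directly would then overwrite the only pointer to that block and leak it. A common idiom is to go through a temporary pointer, sketched here with a hypothetical helper grow_int_buffer().

```c
#include <stdlib.h>

/* Resizes *buf to hold new_count ints (new_count is assumed to be > 0).
   Returns 1 on success. On failure it returns 0 and leaves *buf pointing
   at the original, still-valid block, so nothing is leaked. */
int grow_int_buffer(int **buf, size_t new_count)
{
    int *tmp = realloc(*buf, new_count * sizeof **buf);
    if (tmp == NULL)
        return 0;
    *buf = tmp;
    return 1;
}
```

The existing contents are preserved up to the smaller of the old and new sizes. The sketch does not guard against new_count * sizeof(int) overflowing, which production code should check.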
Creating an Object in C
String as an Object
With all the above background, we can now move ahead and create our first object. Since C does not have a built-in string data type (a string in C is just a null-terminated array of char), it is a good idea to start with one.
As mentioned earlier, to get the object-like feel in C, the closest we can ever get is by making use of Structures.
Illustration of string struct=>
typedef struct string {
char* str;
size_t size;
}string;
The above struct consists of two members. The first member is a pointer to the memory area that will later be allocated dynamically. The second member stores the length of the string (the number of characters, not counting the terminating '\0'). Its data type is size_t, an unsigned integer type large enough to hold the size of any object (on 64-bit Linux it is unsigned long int).
Illustration to store string dynamically=>
#include <stdio.h>
#include <stdlib.h>
typedef struct string {
char* str;
size_t size;
}string;
int main(void)
{
//Creating two string objects and initialising them
string a = {NULL,0};
string b = {NULL,0};
a.str = "AB";
a.size = 2;
b.str = (char *)malloc(2*sizeof(char));
b.str[0] = 'C';
b.str[1] = '\0';
b.size++;
b.str = (char *)realloc(b.str,3*sizeof(char));
b.str[1] = 'D';
b.str[2] = '\0';
b.size++;
printf("[string a] %s is %zu characters in size\n",a.str,a.size);
printf("[string b] %s is %zu characters in size\n",b.str,b.size);
//Release the heap memory owned by b (a.str points to a string literal)
free(b.str);
return 0;
}
For string a, the string literal is typically stored in the read-only initialised Data Segment. For string b, the characters are stored on the heap, and for the rest of this article we follow the practice of storing data on the heap, for reasons that will become clear later. The code snippet stores three characters (C, D, \0) in b.str and keeps the length of the string in b.size. But the above code is far from good: it does not follow the DRY (Don't Repeat Yourself) principle. So, let's add some helper functions to automate the task of storing characters on the heap.
Helper functions for automating the task
Appending characters to the existing string
As the C language does not support operator overloading, the best we can do is create a function that appends characters to the end of the existing string.
Illustration to append characters to the existing string=>
#include <stdio.h>
#include <stdlib.h>
//Initialiser macro BLANK for an empty string object
#define BLANK {NULL,0}
typedef struct string {
char* str;
size_t size;
}string;
void append(string*,char);
int main(void)
{
string a = BLANK;
append(&a,'A');
append(&a,'B');
append(&a,'C');
printf("[string a] %s is %zu characters in size\n",a.str,a.size);
return 0;
}
//Function to append one character
//at a time to the existing string
void append(string* p,char c)
{
if(p->str == NULL){
//If the object is newly created then first malloc
//2 bytes: one for the character, one for '\0'
p->str = (char *)malloc(2*sizeof(char));
}
else{
//else, increasing the size of object to hold one more character
p->str = (char *)realloc(p->str,(p->size+2)*sizeof(char));
}
//Store the character
*(p->str+p->size) = c;
//Store the NULL terminator
*(p->str+p->size+1) = '\0';
//Increase the stored length of the string
p->size++;
}
Appending string to the existing string
Appending individual characters one at a time is cumbersome and leads to lengthy code, so the next helper appends a whole string by reusing append().
Illustration to append string to the existing string=>
#include <stdio.h>
#include <stdlib.h>
#define BLANK {NULL,0}
typedef struct string {
char* str;
size_t size;
}string;
void append(string*,char);
void appendStr(string*,char*);
int main(void)
{
string a = BLANK;
appendStr(&a,"Hello");
printf("[string a] %s is %zu characters in size\n",a.str,a.size);
return 0;
}
//Function to append a new string to the existing string
void appendStr(string* p,char* s)
{
//Check if the string has reached the end or not
while(*(s)!='\0'){
//Make use of the helper function by breaking
//the string into single characters
append(p,*(s));
//Move forward
s++;
}
}
void append(string* p,char c)
{
if(p->str == NULL){
p->str = (char *)malloc(2*sizeof(char));
}
else{
p->str = (char *)realloc(p->str,(p->size+2)*sizeof(char));
}
*(p->str+p->size) = c;
*(p->str+p->size+1) = '\0';
p->size++;
}
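appendStr() calls realloc() once per character, which is easy to follow but can be slow for long strings. As an alternative sketch, the whole string can be appended with a single allocation using strlen() and memcpy(). appendStrOnce() is a hypothetical name, and unlike the article's helpers it reports allocation failure through its return value.

```c
#include <stdlib.h>
#include <string.h>

typedef struct string {
    char *str;
    size_t size;
} string;

/* Appends the whole of s using one realloc() instead of one per character.
   Returns 0 on success, or -1 if the allocation fails, in which case the
   existing string is left unchanged. */
int appendStrOnce(string *p, const char *s)
{
    size_t len = strlen(s);
    /* realloc(NULL, n) behaves like malloc(n), so an empty object needs no special case */
    char *tmp = realloc(p->str, p->size + len + 1);
    if (tmp == NULL)
        return -1;
    memcpy(tmp + p->size, s, len + 1);   /* copies the terminating '\0' too */
    p->str = tmp;
    p->size += len;
    return 0;
}
```

The trade-off is that it no longer reuses append(), so any bookkeeping added to append() (such as registering the object with the garbage collector built later in this article) would have to be repeated here.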
Appending characters taken as input from the User
Sometimes, we don't know beforehand what data needs to be processed, such as the input taken from the user.
Illustration to append characters taken from the user to the existing string=>
#include <stdio.h>
#include <stdlib.h>
#define BLANK {NULL,0}
typedef struct string {
char* str;
size_t size;
}string;
void append(string*,char);
void appendStr(string*,char*);
void scanStr(string*);
int main(void)
{
string a = BLANK;
printf("Enter your name: ");
scanStr(&a);
printf("[string a] %s is %zu characters in size\n",a.str,a.size);
return 0;
}
//Function to store the string given by the user as Input
void scanStr(string* p)
{
//Create an int (not a char) to store single characters from the input,
//so that EOF can be told apart from every valid character
int c;
//Making use of getchar() function to take the input
while((c=getchar())!='\n' && c!=EOF){
//Make use of the helper function by breaking
//the string into single characters
append(p,c);
}
}
void appendStr(string* p,char* s)
{
while(*(s)!='\0'){
append(p,*(s));
s++;
}
}
void append(string* p,char c)
{
if(p->str == NULL){
p->str = (char *)malloc(2*sizeof(char));
}
else{
p->str = (char *)realloc(p->str,(p->size+2)*sizeof(char));
}
*(p->str+p->size) = c;
*(p->str+p->size+1) = '\0';
p->size++;
}
And that's how easy it is to create objects in C: just make a structure and write helper functions for it. This approach not only helps in building reliable programs but also makes adding features easy. As we saw, adding the appendStr() and scanStr() functions was simple because we had already created the append() helper function, which handles the low-level work of memory allocation and character storage. This technique makes it possible to write scalable programs. In higher-level programming languages this functionality is built in: the new operator creates the object, and appending is handled by an overloaded + operator.
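scanStr() is tied to stdin through getchar(). A variant that takes any FILE * stream can read from files as well, and it makes the EOF handling explicit. This is a sketch: scanStrFrom() is a hypothetical name, and append() here is a modified copy of the article's helper that checks for allocation failure.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct string {
    char *str;
    size_t size;
} string;

/* Appends one character; returns -1 (leaving p unchanged) if realloc fails.
   realloc(NULL, n) acts like malloc(n), so the first call needs no special case. */
static int append(string *p, char c)
{
    char *tmp = realloc(p->str, p->size + 2);   /* new char + '\0' */
    if (tmp == NULL)
        return -1;
    tmp[p->size] = c;
    tmp[p->size + 1] = '\0';
    p->str = tmp;
    p->size++;
    return 0;
}

/* Reads characters from 'in' up to a newline or end of input.
   c is an int, not a char, so EOF can be told apart from every
   valid character value. */
int scanStrFrom(string *p, FILE *in)
{
    int c;
    while ((c = fgetc(in)) != '\n' && c != EOF) {
        if (append(p, (char)c) != 0)
            return -1;
    }
    return 0;
}
```

scanStrFrom(&a, stdin) behaves like scanStr(&a). If the line is empty, a.str stays NULL, so callers should check it before printing.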
Memory Leak
When we compile the code with gcc filename.c -g -fsanitize=address and execute the program, AddressSanitizer reports a memory leak: the heap memory allocated for the string is never freed.
Destroying an Object in C
We have successfully created the string object, but the heap memory that we acquired is never released by our program. The operating system does reclaim it when the process exits, but memory that the program never frees is a memory leak, and in long-running programs leaks keep piling up. Leaks like this can be inspected using the Valgrind memory-leak detector or via Google's AddressSanitizer (ASan).
There are different methods of garbage collection, the most popular being mark and sweep. It works in two phases. In the mark phase, the collector starts from the roots (global and static variables in the Data Segment, the Stack, and CPU registers) and marks every heap block that is reachable from them. In the sweep phase, it frees every block that was not marked, since the program can no longer reach it.
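To make the two phases concrete, here is a toy mark-and-sweep collector. It is a simplified sketch, not how production collectors are built: every object is kept on a linked list, each object can reference at most two others, and the roots are passed in explicitly instead of being discovered by scanning the Data Segment, the Stack and the registers. All names (obj, obj_new(), gc_collect()) are hypothetical.

```c
#include <stdlib.h>

/* A toy heap object that can reference up to two other objects. */
typedef struct obj {
    int marked;
    struct obj *ref[2];
    struct obj *next;           /* links every allocated object together */
} obj;

static obj *all_objects = NULL; /* head of the list of all allocations */

obj *obj_new(void)
{
    obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->marked = 0;
    o->ref[0] = o->ref[1] = NULL;
    o->next = all_objects;      /* register the object */
    all_objects = o;
    return o;
}

/* Mark phase: flag o and everything reachable from it. */
static void mark(obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    mark(o->ref[0]);
    mark(o->ref[1]);
}

/* Sweep phase: free every unmarked object and clear the marks on the
   survivors. Returns the number of objects freed. */
static size_t sweep(void)
{
    size_t freed = 0;
    obj **link = &all_objects;
    while (*link != NULL) {
        obj *o = *link;
        if (!o->marked) {
            *link = o->next;    /* unlink the unreachable object */
            free(o);
            freed++;
        } else {
            o->marked = 0;      /* reset for the next collection */
            link = &o->next;
        }
    }
    return freed;
}

/* Runs one collection using an explicit list of root pointers. */
size_t gc_collect(obj **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    return sweep();
}
```

If a is a root that references b, and c is referenced by nothing, the first collection frees only c and a second collection frees nothing.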
However, we will be using the simplest possible method: storing the address of each newly created object in a table.
Destroying string object
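To destroy the string object, we need to write a monitoring program that keeps track of newly created objects. Once the program exits, we can free the allocated memory using the information collected by the monitor.
We will call it a precise garbage collector because it knows exactly which blocks were allocated with malloc() or realloc() and frees all of them. Strictly speaking, though, it frees memory only when the program ends, so it behaves more like an automatic cleanup registry than a full garbage collector.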
Writing the monitor program
The job of the monitor program is very straightforward: store the address of each newly created object when its memory is first allocated, and once the program exits, free that memory.
Illustration of a precise Garbage Collector=>
//pointer to the table
string **shadowTable = NULL;
//variable to store the number of objects
static size_t unqID=0;
//Function to create a table or increase the size of the table
void mon_allocShadow(void)
{
if(shadowTable==NULL)
shadowTable = (string **)malloc(sizeof(string*));
else
shadowTable = (string **)realloc(shadowTable,(unqID+1)*sizeof(string*));
}
//Function to delete the entries in the table and then deleting the table itself
void mon_freeShadow(void)
{
for(size_t i=0;i<unqID;i++){
free(shadowTable[i]->str);
shadowTable[i]->str = NULL;
shadowTable[i]->size = 0;
}
free(shadowTable);
shadowTable = NULL;
}
//Function to add entries in the table
void add_shadowEntry(string* p)
{
shadowTable[unqID] = p;
unqID++;
mon_allocShadow();
}
- shadowTable is basically like a phone book directory: it stores the addresses of all newly created objects. The directory itself takes some memory, and it grows as more and more objects are created. Its allocation is handled by the mon_allocShadow() helper function.
- The add_shadowEntry() helper function adds the address of a newly created object to the shadowTable.
- unqID keeps count of the total number of string objects in the program.
- Once the program finishes its execution, mon_freeShadow() is called to free all the objects listed in the directory and then delete the directory (shadowTable) itself.
Setting up Compiler Attributes
Since mon_allocShadow() and mon_freeShadow() are responsible for creating and destroying the table, they must be called at the beginning and at the end of the program respectively. With GCC and Clang this can be accomplished using function attributes (a compiler extension, not part of standard C):
//GCC/Clang extension: this function will be called before main() starts
void mon_allocShadow(void) __attribute__((constructor));
//GCC/Clang extension: this function will be called after main() returns
void mon_freeShadow(void) __attribute__((destructor));
The function marked as the constructor is called before main() starts, while the function marked as the destructor is called after main() returns or exit() is called. So now we don't need to worry about the garbage collector: it automatically starts monitoring and frees up space at the end.
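Since constructor and destructor are GCC/Clang extensions, here is a sketch of a portable alternative that uses only standard C: call an initialisation function at the start of main() and register the cleanup with atexit(), which runs registered functions (in reverse order of registration) when main() returns or exit() is called. monitor_init(), cleanup() and tracked are hypothetical names standing in for the shadow table code.

```c
#include <stdlib.h>

/* Stand-in for the shadow table: one tracked heap block. */
static int *tracked = NULL;

/* Registered with atexit(): runs after main() returns or exit() is called. */
static void cleanup(void)
{
    free(tracked);
    tracked = NULL;
}

/* Portable replacement for __attribute__((constructor)): call this at the
   top of main(). Returns 0 on success, -1 on failure. */
int monitor_init(void)
{
    tracked = malloc(sizeof *tracked);
    if (tracked == NULL)
        return -1;
    if (atexit(cleanup) != 0) {   /* registration failed: clean up now */
        free(tracked);
        tracked = NULL;
        return -1;
    }
    return 0;
}
```

Unlike the attribute approach, nothing runs before main() automatically, so the call has to be written explicitly; in exchange, the code compiles with any conforming C compiler. Neither approach runs its cleanup if the program is killed by a signal or calls abort().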
Merging it all together
Single File Program
Illustration of the complete program with built-in Garbage Collector=>
#include <stdio.h>
#include <stdlib.h>
#define BLANK {NULL,0}
void mon_allocShadow(void) __attribute__((constructor));
void mon_freeShadow(void) __attribute__((destructor));
typedef struct string {
char* str;
size_t size;
}string;
//To make the string object static
#define String static string
string **shadowTable = NULL;
static size_t unqID=0;
void append(string*,char);
void appendStr(string*,char*);
void scanStr(string*);
void mon_allocShadow(void);
void mon_freeShadow(void);
void add_shadowEntry(string*);
int main(void)
{
//Notice the use of capital 'S' in String,
//which makes the objects static: they live in the Data Segment
//instead of main's stack frame, so they still exist when the
//destructor runs after main() has returned.
String a = BLANK;
String b = BLANK;
appendStr(&a,"Hello, World!");
appendStr(&b,"Hi");
return 0;
}
void mon_allocShadow(void)
{
if(shadowTable==NULL){
printf("Invoking Garbage Collector for monitoring.\n");
shadowTable = (string **)malloc(sizeof(string*));
}
else{
shadowTable = (string **)realloc(shadowTable,(unqID+1)*sizeof(string*));
}
}
void mon_freeShadow(void)
{
for(size_t i=0;i<unqID;i++){
free(shadowTable[i]->str);
shadowTable[i]->str = NULL;
shadowTable[i]->size = 0;
printf("Freeing Object %zu\n",unqID-i);
}
free(shadowTable);
shadowTable = NULL;
printf("ShadowTable Deleted.\n");
}
void add_shadowEntry(string* p)
{
printf("Registering Object.\n");
shadowTable[unqID] = p;
unqID++;
printf("\tObject Registered in ShadowTable! Total Objects: %zu\n",unqID);
mon_allocShadow();
}
void scanStr(string* p)
{
int c;
while((c=getchar())!='\n' && c!=EOF){
append(p,c);
}
}
void appendStr(string* p,char* s)
{
while(*(s)!='\0'){
append(p,*(s));
s++;
}
}
void append(string* p,char c)
{
if(p->str == NULL){
p->str = (char *)malloc(2*sizeof(char));
//ADDED: This function will add new entry to the Table
add_shadowEntry(p);
}
else{
p->str = (char *)realloc(p->str,(p->size+2)*sizeof(char));
}
*(p->str+p->size) = c;
*(p->str+p->size+1) = '\0';
p->size++;
}
- You might be wondering how the Garbage Collector knows that the user has created an object. Well, the GC does not know when the object is created; it only becomes aware of the object when the object is used for the first time. You can see that we have modified the append() helper function slightly: the added add_shadowEntry() call updates the shadowTable whenever an object allocates memory for the first time.
- There is an advantage to registering objects in the shadowTable only when they are used: declared-but-unused objects never allocate heap memory, so they never occupy a slot in the table (and the compiler may even optimise them out entirely). Smart, right?
Refactoring String Object and Garbage Collector Library
The code for object creation and the code for the garbage collector can be moved into separate source files, letting the user focus on the main program.
Illustration for using header file=>
#include <stdio.h>
//This header file will contain all the helper functions
#include "strobj.h"
int main(void)
{
String result = BLANK;
appendStr(&result,"Hello, World");
printf("%s %zu\n",result.str,result.size);
return 0;
}
The complete source code is available in the ooc (Object-Oriented C) repository on GitHub; feel free to fork it.
To Compile
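First, compile the source files into object files (the warning and sanitizer flags belong to this step so that the code is checked and instrumented):
gcc -c -Wall -Werror -g -fsanitize=address -I./lib/ ./lib/strobj.c ./lib/garcol.c ./string_gc.c
Second, link the object files into an executable/binary file:
gcc -g -fsanitize=address -o string strobj.o garcol.o string_gc.o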
Conclusion
And we can call it a day!
With this behind-the-scenes look at objects, you can implement objects and their garbage collector in the programming language of your choice.
I hope this article gave you some insight into objects and garbage collectors.
Top comments
The purpose of a pointer is to index into an array.
Variables are effectively stored in arrays of length one.
Pointer arithmetic is very fundamental to C. :)
static doesn't affect scope -- it affects storage or linkage.
This may be how your system works, but it's not necessarily the case.
Why are you casting malloc?
b.str = malloc(2); is sufficient, as sizeof (char) must equal 1.
size_t can store the largest object size, which might be as small as 16k.
It is not related to the maximum integer value allowable by the system.
Note that sbrk and brk are not part of C.
This is incorrect.
"hello, world" is a string literal, and attempts to modify it produce undefined behavior, which means that it may be stored in read-only memory if the system feels like it.
The string literal is not affected by the const modifier on the char *s declaration.
What const char *s means is that you may not modify *s, not that *s points at something that cannot be modified.
In some cases the compiler may be able to deduce that no legitimate non-const pointers can be produced to an object and may then decide to store those differently, but it isn't required, and is independent of the behavior of string literals, which are defined that way mostly to allow consolidation.
Why are you writing *(s) rather than *s? this just makes it harder to read. It is sufficient to write while (*s) { ... } here.
Well, no -- you aren't -- there is no portable mechanism by which you can accomplish this.
These are not part of C -- these are extensions provided by GCC.
A garbage collector that only collects garbage when there is nothing except for garbage is not really a garbage collector.
It's also redundant since C does the same thing when main returns. :)
First of all, I would like to express my immense gratitude and thanks, as much as the size_t data type can hold, for reading my first ever article (which I posted publicly). It means a lot to me that someone took their precious time out to read my article, which we might say is of no use. As you might have deduced just by reading it, I am still a beginner in the field of programming and have a lot of things to learn (and to learn correctly, unlearning bad practices in programming). The only excuse I can give is that I never tried my level best and blamed my non-CS background (Electronics Engineering) for it the whole time (which of course is absolutely wrong; I realized this a few months ago when I started to write this article).
The points which you have given are like stepping stones for me and I will inspect each and every issue meticulously and do in-depth research. I won't let your time spent in finding bugs in this article go in vain, I will update this article with the right information as soon as I get them all resolved.
Again, thanks a lot for spending your time, I hope to see more such valuable conversations in future as well.
You are welcome. :)