Memory Management in Python
In languages such as C or C++, programmers must allocate and deallocate (free) memory by hand, which is tedious and error-prone. Python handles memory allocation and deallocation automatically at runtime.
Everything in Python is an object: strings, lists, and even functions and modules. Memory has to be allocated for every object. This task is done by the memory manager inside the Python Virtual Machine (PVM).
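A minimal sketch of the "everything is an object" claim. The function name greet and the list samples are made up for this illustration; sys.getsizeof reports the size of the object itself in bytes, and the exact numbers vary between Python versions and platforms.

```python
import math
import sys


def greet():
    """Hypothetical function, defined only so it can be inspected below."""
    return "hello"


# An integer, a string, a list, a function and a module.
samples = [42, "text", [1, 2, 3], greet, math]

for obj in samples:
    # Every value has a type and occupies some memory.
    print(type(obj).__name__, sys.getsizeof(obj))

# True: each of these values is an instance of `object`.
all_objects = all(isinstance(obj, object) for obj in samples)
print(all_objects)
```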
All objects are stored in a separate region of memory called the heap, which is allocated at runtime. The size of the heap is not fixed: it varies between systems, is bounded by the memory available on the machine (RAM), and grows or shrinks as the program requires.
The Operating System (OS) allocates memory to every program that runs on it. Python's raw memory allocator sits on top of the OS and oversees the memory available for all objects. On top of it, object-specific memory allocators operate on the same heap. These allocators apply different memory management policies depending on the type of object they serve.
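To watch the heap grow and shrink, one option is the standard tracemalloc module, which tracks memory blocks allocated by Python. This is only a sketch: the variable names are illustrative, and the absolute byte counts depend on the interpreter version and platform.

```python
import tracemalloc

tracemalloc.start()

before, _ = tracemalloc.get_traced_memory()   # (current, peak) in bytes

numbers = list(range(100_000))                # allocate many objects
during, _ = tracemalloc.get_traced_memory()

del numbers                                   # release them again
after, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(before, during, after)
```

The traced size rises after the list is created and falls again once the last reference to it is deleted.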
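From the bottom up, memory management in Python can be viewed as a stack of layers: the OS memory manager, then Python's raw memory allocator, then the object-specific allocators, and finally the objects themselves.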
Garbage Collection in Python
Python deletes objects from memory when the program no longer uses them. The simplest mechanism for this is reference counting: the interpreter keeps a count of the references to each object. As long as an object is in use, it is referenced at least once, so its reference count is at least 1. When the count drops to 0, the object can no longer be reached from the program and is deleted from memory. The garbage collector, exposed through the gc module, complements this mechanism.
However, reference counting has a catch. Suppose three objects A, B and C form a reference cycle: A references B, B references C, and C references A.
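Reference counts can be inspected with sys.getrefcount. The sketch below assumes CPython. The reported number includes a temporary reference created by the call itself, so only the differences between the readings are meaningful.

```python
import sys

data = []                            # one name refers to the list
base = sys.getrefcount(data)

alias = data                         # a second name for the same list
after_alias = sys.getrefcount(data)  # one higher than base

del alias                            # drop the extra reference
after_del = sys.getrefcount(data)    # back to base

print(base, after_alias, after_del)
```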
Even if the objects A, B and C are no longer used in the program, each still has a reference count of 1, so reference counting alone never frees them. To handle this situation, gc uses an algorithm that detects and removes the objects in such cycles.
The algorithm classifies objects into three generations. Newly created objects belong to generation 0. When the garbage collector sweeps through memory and an object survives, it is promoted to generation 1. If it survives a later sweep as well, it is promoted to generation 2. Younger generations are swept more often than older ones, so the garbage collector tends to delete younger objects (generations 0 and 1) rather than older objects (generation 2).
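The cycle of A, B and C can be reproduced in a few lines. This is a sketch assuming CPython: the Node class is hypothetical, and a weak reference is used only to observe whether object A is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.ref = None


gc.disable()                       # keep automatic collection out of the demo

a, b, c = Node("A"), Node("B"), Node("C")
a.ref, b.ref, c.ref = b, c, a      # A -> B -> C -> A

probe = weakref.ref(a)             # does not increase A's reference count
del a, b, c                        # no names left, yet each count is still 1

alive_before = probe() is not None
gc.collect()                       # the cycle detector frees the whole group
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)   # True False
```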
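Per-generation statistics are available through gc.get_stats(). A minimal sketch follows; the number of entries and the values are CPython implementation details and may differ between versions.

```python
import gc

# One dictionary per generation, youngest first.
per_generation = gc.get_stats()

for generation, stats in enumerate(per_generation):
    # "collections": how many times this generation was swept
    # "collected":   how many objects were freed in those sweeps
    print(generation, stats["collections"], stats["collected"])
```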
The garbage collector runs automatically, and Python schedules it using a number called the threshold. The threshold is the amount by which object allocations must exceed deallocations before a collection is triggered. We can read the current thresholds with the get_threshold() function of the gc module. When the condition (numberOfAllocations - numberOfDeallocations) > threshold is satisfied, the garbage collector runs automatically. However, if more and more objects are created and stay in use until the system runs out of memory, the garbage collector cannot free them, and an exception (MemoryError) is raised instead.
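The thresholds and the current counters can be read and changed from the gc module. This is a sketch; the default values depend on the Python version, so none are shown here, and the value 1000 is an arbitrary choice for the example.

```python
import gc

thresholds = gc.get_threshold()    # one threshold per generation
counts = gc.get_count()            # current counters per generation

print("thresholds:", thresholds)
print("counts:", counts)

# Raise the generation 0 threshold so automatic collection runs less often,
# keeping the other thresholds unchanged.
gc.set_threshold(1000, *thresholds[1:])
changed = gc.get_threshold()

# Restore the original settings.
gc.set_threshold(*thresholds)
```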
When a program creates no reference cycles, reference counting reclaims its objects and the automatic garbage collector is the most suitable choice. When a program creates many reference cycles, it can be worth calling the garbage collector manually so that the cycles are reclaimed at a time of your choosing. The collect() function of the gc module is useful in this case.
Manual garbage collection is typically done in one of two ways:
- Time-based - The garbage collector is called at fixed intervals of time.
- Event-based - The garbage collector is called in response to an event (for example, when the application loses network connectivity).
Keep in mind that running the garbage collector too frequently slows down program execution.
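The time-based and event-based approaches could be sketched as follows. TimedCollector and on_connection_lost are hypothetical names invented for this example, not part of any library; only gc.collect() and time.monotonic() come from the standard library.

```python
import gc
import time


class TimedCollector:
    """Hypothetical helper: run gc.collect() at most once per interval."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self.last_run = time.monotonic()

    def maybe_collect(self):
        """Collect if the interval has elapsed, otherwise do nothing."""
        now = time.monotonic()
        if now - self.last_run >= self.interval:
            self.last_run = now
            return gc.collect()    # number of unreachable objects found
        return None                # too soon, skip this round


def on_connection_lost():
    """Hypothetical event handler: collect while the application is idle."""
    return gc.collect()


# Time-based: call maybe_collect() from the program's main loop.
eager = TimedCollector(interval_seconds=0.0)
eager_result = eager.maybe_collect()     # interval elapsed, collection runs

patient = TimedCollector(interval_seconds=3600.0)
patient_result = patient.maybe_collect() # None: an hour has not passed

# Event-based: call the handler when the event fires.
event_result = on_connection_lost()
```

A longer interval keeps the cost of collection low, in line with the caution about running the garbage collector too often.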