Author:
discord: https://discord.gg/xQ8Xxfyfcz
zinx: https://github.com/aceld/zinx
github: https://github.com/aceld
aceld's home: https://yuque.com/aceld
Twitter: https://twitter.com/Aceld_
(Golang Triad)-I-Understanding the Golang Goroutine Scheduler GPM Model
(Golang Triad)-II-Comprehensive Analysis of Go's Hybrid Write Barrier Garbage Collection
Garbage Collection (GC) is an automatic memory management mechanism provided in programming languages, capable of automatically releasing unnecessary memory objects and relinquishing storage resources without the need for manual intervention by programmers. The support for GC mechanisms is widespread in modern programming languages, and the performance of GC is often considered one of the comparative metrics between different languages.
Throughout its evolution in GC, Golang has undergone several significant changes. Up to Go V1.8, the modifications to Golang's GC have been substantial. The specific major changes are as follows:
- Mark and Sweep method before Go V1.3.
- Three-color concurrent marking method in Go V1.5.
- "Strong-Weak" three-color invariant, insertion barrier, and deletion barrier in Go V1.5.
- Hybrid write barrier mechanism in Go V1.8.
This chapter will introduce the algorithmic models of each GC method, their respective advantages and disadvantages, and how Golang's GC has evolved step by step from logical requirements to the hybrid write barrier mode.
1 Mark-Sweep Algorithm in Go V1.3
Next, let's take a look at the conventional mark-sweep algorithm primarily used in Golang before version 1.3. This algorithm consists of two main steps:
(1) Mark Phase.
(2) Sweep Phase.
The process is quite straightforward: identify memory data to be cleared and then sweep them all at once. Below is a detailed explanation of the process.
1.1 Detailed Process of Mark and Sweep Algorithm
(1) In the first step, pause the program's business logic, categorize objects into reachable and unreachable, and then mark them accordingly, as shown in Figure 1.1.
The diagram represents the reachability relationship between the program and objects. Currently, the objects reachable by the program are objects 1, 2, 3, and objects 4 and 7, totaling five objects.
(2) In the second step, marking begins. The program identifies all the objects it can reach and marks them, as shown in Figure 1.2.
Figure 1.2
Therefore, objects 1, 2, 3, and objects 4 and 7, a total of five objects, are marked.
(3) In the third step, after marking is completed, the unmarked objects are then cleared, as shown in Figure 1.3.
Figure 1.3
The operation is quite simple, but there is one important point to note. The Mark and Sweep algorithm requires the program to pause during execution, known as STW (Stop The World). During STW, the CPU does not execute user code and is entirely dedicated to garbage collection. This process has a significant impact, making STW one of the biggest challenges and focal points for optimization in some garbage collection mechanisms. Therefore, during the third step, the program will temporarily halt any work and wait for the garbage collection to complete.
(4) In the fourth step, the pause stops, allowing the program to resume running. This process is then repeated in a loop until the program's lifecycle ends. This describes the Mark and Sweep algorithm.
1.2 Disadvantages of the Mark and Sweep Algorithm
The Mark and Sweep algorithm is straightforward and clear in its process, but it has some significant issues. The first is STW (Stop The World), which causes the program to pause and may lead to noticeable stuttering, an important concern. The second issue is that marking requires scanning the entire heap. The third issue is that clearing data can create heap fragmentation.
Prior to version 1.3, Go implemented garbage collection using this approach. The basic GC process involved first initiating an STW pause, then performing marking, followed by data collection, and finally stopping the STW, as shown in Figure 1.4.
From the above figure, it can be seen that the entire GC time is encapsulated within the STW period, which seems to cause the program pause to be too long, affecting the program's runtime performance. Therefore, in Go version 1.3, a simple optimization was made to advance the STW step, reducing the duration of the STW pause. As shown in Figure 1.5.
The figure mainly illustrates that the STW step was advanced because during the Sweep phase, an STW pause is not necessary. These objects are already unreachable, so there won't be any conflicts or issues during garbage collection. However, regardless of the optimization, Go version 1.3 still faces a significant issue: the Mark and Sweep algorithm pauses the entire program.
How does Go address this problem? In the next version, Go 1.5, the tricolor concurrent marking algorithm was introduced to optimize this issue.
2 Tricolor Marking in Go V1.5
In Golang, garbage collection mainly employs the tricolor marking algorithm. The GC process can run concurrently with other user goroutines, but a certain amount of STW time is still required. The tricolor marking algorithm determines which objects need to be cleared through three marking phases. This section will introduce the specific process.
2.1 The Process of Tricolor Marking
First, every newly created object is initially marked as "white," as shown in Figure 1.6.
Figure 1.6
As shown in the figure, the memory object relationships reachable by the program are illustrated on the left. The marking table on the right is used to record the current marking color classification of each object. It is important to note that the term "program" refers to a collection of objects that are root nodes. Therefore, if you expand the "program," you will get a representation similar to what is shown in Figure 1.7.
Figure 1.7
In the second step, at the start of each GC collection, the process begins by traversing all objects from the root nodes. Objects encountered during this traversal are moved from the white set to the "gray" set, as shown in Figure 1.8.
Figure 1.8
It is important to note that this traversal is a single-level, non-recursive process. It traverses one level of objects reachable from the program. As shown in the figure, if the currently reachable objects are object 1 and object 4, then by the end of this round of traversal, objects 1 and 4 will be marked as gray, and the gray marking table will include these two objects.
In the third step, the gray set is traversed. Objects referenced by gray objects are moved from the white set to the gray set. Afterward, the original gray objects are moved to the black set, as shown in Figure 1.9.
This traversal only scans gray objects. The first layer of objects reachable from the gray objects is moved from the white set to the gray set, such as objects 2 and 7. Meanwhile, the previously gray objects 1 and 4 are marked as black and moved from the gray marking table to the black marking table.
In the fourth step, repeat the third step until there are no objects left in the gray set, as shown in Figures 1.10.1 and 1.10.2.
Figure 1.10.1
Figure 1.10.2
Once all reachable objects have been traversed, the gray marking table will no longer contain any gray objects. At this point, all memory data will only have two colors: black and white. Black objects are the ones that are logically reachable (needed) by the program. These are the valid and useful data that support the normal operation of the program and cannot be deleted. White objects, on the other hand, are unreachable by the current program logic and are considered garbage data in memory, which needs to be cleared.
In the fifth step, all objects in the white marking table are reclaimed, which means collecting the garbage, as shown in Figure 1.11.
Figure 1.11
All white objects are deleted and reclaimed, leaving only the black objects, which are all the necessary dependencies.
This is the tricolor concurrent marking algorithm. It clearly demonstrates the characteristics of the tricolor marking. However, many concurrent processes might be scanned, and the memory of these concurrent processes might be interdependent. To ensure data safety during the GC process, an STW is added before starting the tricolor marking. After scanning and determining the black and white objects, the STW is lifted. However, it is evident that the performance of such a GC scan is quite low.
So, how does Go address the stuttering issue in the Mark and Sweep algorithm?
2.2 Tricolor Marking Without STW
To start with an example, if there is no STW, performance issues would be eliminated. But what would happen if the tricolor marking algorithm did not include STW? Let's explore this process in detail.
Based on the aforementioned tricolor concurrent marking algorithm, it is indeed dependent on STW. Without pausing the program, any changes in object reference relationships during the marking phase could affect the correctness of the marking results. Consider a scenario where the tricolor marking process does not use STW and what might occur.
Assume the initial state has already undergone the first round of scanning. Currently, objects 1 and 4 are black, objects 2 and 7 are gray, and the rest are white. Object 2 is pointing to object 3 through a pointer p, as shown in Figure 1.12.
Figure 1.12
Now, if the tricolor marking process does not initiate STW, any object may undergo read or write operations during the GC scan. As shown in Figure 1.13, before object 2 has been scanned, the already black-marked object 4 creates a pointer q and points to the white object 3.
Figure 1.13
Meanwhile, the gray object 2 removes the pointer p. As a result, the white object 3 is now effectively attached to the already scanned black object 4, as shown in Figure 1.14.
Figure 1.14
Then, following the normal logic of the tricolor marking algorithm, all gray objects are marked as black. Consequently, objects 2 and 7 are marked as black, as shown in Figure 1.15.
Figure 1.15
Next, the final step of the tricolor marking process is executed, where all white objects are treated as garbage and collected, as shown in Figure 1.16.
Figure 1.16
However, the final result is that object 3, which was legitimately referenced by object 4, is mistakenly collected by the GC. This is not the desired outcome for Golang's garbage collection or for developers.
2.3 Necessary Conditions for Triggering Unsafe Tricolor Marking
It can be seen that there are two situations in the tricolor marking algorithm that are undesirable:
Condition 1
: A white object is referenced by a black object (the white object is attached under the black object).
Condition 2
: The reachable relationship between a gray object and a white object it references is broken (the gray object loses the reference to the white object).
If both of the above conditions are met simultaneously, it results in object loss. Furthermore, in the scenario shown above, if the white object 3 has many downstream objects, they will all be cleared as well.
To prevent this phenomenon, the simplest method is STW, which directly prohibits other user programs from interfering with object reference relationships. However, the STW process leads to significant resource waste and greatly impacts all user programs. Is it possible to maximize GC efficiency and minimize STW time while ensuring no objects are lost? The answer is yes. By using a mechanism that attempts to break the above two necessary conditions, this can be achieved.
3 Barrier Mechanism in Go V1.5
The GC collector ensures that objects are not lost when one of the following two conditions is met: the "strong tricolor invariant" and the "weak tricolor invariant."
3.1 "Strong-Weak" Tricolor Invariants
(1)Strong Tricolor Invariant
No black object should reference a white object, as shown in Figure 1.17.
Figure 1.17
The Strong Tricolor Invariant essentially enforces that black objects are not allowed to reference white objects, preventing the accidental deletion of white objects.
(2)Weak Tricolor Invariant
All white objects referenced by black objects are in a gray protected state, as shown in Figure 1.18.
Figure 1.18
The Weak Tricolor Invariant
emphasizes that while black objects can reference white objects, those white objects must have references from other gray objects or have a chain of gray objects upstream in their reachable path. This means that although a white object referenced by a black object is in a potentially dangerous state of being deleted, the references from upstream gray objects protect it and ensure its safety.
To adhere to these two invariants, the GC algorithm has evolved to include two types of barriers: insertion barriers and deletion barriers.
3.2 Insertion Barrier
The specific operation of an insertion barrier is as follows: when object A references object B, object B is marked as gray (object B must be marked as gray when it is attached downstream of object A).
The insertion barrier effectively satisfies the strong tricolor invariant (preventing the situation where a black object references a white object, because the white object will be forcibly marked as gray). The pseudocode for the insertion barrier is as follows:
Add downstream object (current downstream slot, new downstream object ptr) {
// Step 1
Mark gray (new downstream object ptr)
// Step 2
current downstream slot = new downstream object ptr
}
The pseudocode for inserting a write barrier in different scenarios is as follows:
// A previously had no downstream, now add a downstream object B, and B is marked gray
A.Add downstream object(nil, B)
// A replaces the downstream object C with B, and B is marked gray
A.Add downstream object(C, B)
This pseudocode logic represents the write barrier. Memory slots for black objects can be located in two places: the stack and the heap. The stack is characterized by its small capacity but requires fast access due to frequent function calls. Therefore, the "insertion barrier" mechanism is not used for objects in stack space but is applied to objects in heap space. Next, a series of process diagrams will be used to simulate a detailed process to provide a clearer view of the overall flow for readers.
Currently, let's assume a program is newly created. In the stack space, there are objects 1, 2, 3, and 5. Object 1 references object 2, object 2 references object 3, object 3 has no downstream objects, and object 5 references object 2. In the heap space, object 4 references object 7, object 7 has no downstream objects, and object 6 references no objects and is not referenced by any objects. All these memory objects are marked as white, so the white marking table includes all these objects, as shown in Figure 1.19.
Figure 1.19
Following the tricolor marking process, traverse the Root Set (root node set) in a non-recursive manner, scanning only once. This traversal identifies the first layer of gray nodes, which are object 1 and object 4. These gray nodes are also added to the gray marking table, as shown in Figure 1.20.
Figure 1.20
Following the tricolor marking method, the next step is to traverse the objects in the gray marking table, which are objects 1 and 4. The reachable objects are marked from white to gray. Meanwhile, the traversed gray objects are marked as black, as shown in Figure 2.21.
Figure 2.21
Due to the nature of concurrency, at this moment, the external environment adds the white object 8 to the already black-marked object 4, and the white object 9 to the already black-marked object 1, as shown in Figure 1.22. Object 1 is in stack space, and according to the characteristics of the insertion barrier, to ensure performance, the creation of objects in stack space does not trigger the insertion barrier. However, object 4 is in heap space, so it will trigger the insertion barrier mechanism.
Figure 1.22
Due to the insertion write barrier mechanism (when a black object adds a white object, the white object is changed to gray), when object 4 in the heap adds object 8, object 8 is marked as gray, while object 9 remains white, as shown in Figure 1.23.
Figure 1.23
Next, the normal tricolor marking process continues, cycling through the steps until there are no gray nodes left. The resulting object states are shown in Figure 1.24: objects 1, 2, and 3 in the stack space are marked black, and objects 4, 7, and 8 in the heap space are marked black. The other objects, 9, 5, and 6, remain white.
Figure 1.24
At this point, the insertion barrier will not immediately trigger garbage collection but will instead perform an additional process. However, if the stack is not scanned, there may still be white objects referenced in the stack after the tricolor marking is complete (such as object 9 in the previous figure). Therefore, the stack needs to be re-scanned with the tricolor marking to ensure no objects are lost. This re-scan requires initiating an STW (Stop-The-World) pause until the tricolor marking of the stack space is finished.
All memory objects in the stack space are re-marked as white (objects 1, 2, 3, 9, and 5). During this process, STW is activated to protect these white objects, preventing any read or write operations on these protected objects by intercepting and blocking them, thus avoiding external interference (e.g., new white objects being added by black objects).
Meanwhile, other heap space objects will not trigger STW, ensuring the performance of GC collection in the heap space, as shown in Figure 1.25.
Figure 1.25
Next, within the STW-protected area, the tricolor marking process continues until all reachable white objects are scanned and no gray nodes remain. The final state is achieved where objects 1, 2, 3, and 9 are marked black. Since object 5 is still not scanned, it remains white, as shown in Figure 1.26.
Figure 1.26
This time, when all memory objects are either white or black, the STW will be stopped, and the protection layer will be released, as shown in Figure 1.27.
Figure 1.27
Finally, all remaining white nodes in the stack and heap space (objects 5 and 6) are scanned and collected. Typically, the STW duration for this stack space operation is around 10 to 100 milliseconds. After the final scan, all objects in memory will be black, as shown in Figure 1.28. This concludes the explanation of the tricolor marking and collection mechanism based on the insertion barrier.
Figure 1.28
The purpose of the insertion barrier is to ensure that when a black object is inserted, there is a gray object to protect it, or the inserted object is turned gray. The insertion barrier essentially satisfies the strong tricolor invariant, preventing mistakenly deleted white objects.
3.3 Deletion Barrier
The specific operation of a deletion barrier is as follows: if a deleted object is gray or white, it is marked as gray.
The deletion barrier essentially satisfies the weak tricolor invariant, aiming to ensure that the path from gray objects to white objects remains intact. The pseudocode for the deletion barrier is as follows:
Add downstream object (current downstream slot, new downstream object ptr) {
// Step 1
if (current downstream slot is gray || current downstream slot is white) {
// The slot being removed, mark it as gray
Mark gray (current downstream slot)
}
// Step 2
current downstream slot = new downstream object ptr
}
The pseudocode for the deletion barrier in different scenarios is as follows:
// Object A removes the reference to object B. B is removed by A and marked gray (if B was previously white)
Add downstream object(B, nil)
// Object A replaces downstream B with C. B is removed by A and marked gray (if B was previously white)
Add downstream object(B, C)
The deletion barrier ensures that when an object's reference is removed or replaced by an upstream object, the removed object is marked as gray. The purpose of marking it as gray is to prevent a white object from being mistakenly collected if it is later referenced by another black object. This mechanism ensures that the deleted object, which was not intended to be collected, is properly protected. The following diagrams will simulate a detailed process to provide a clearer view of the overall workflow.
As shown in Figure 1.29, before executing the deletion barrier tricolor marking, the current memory situation is as follows: In the stack space, object 1 references object 5, object 5 references object 2, object 2 references object 3, and object 3 has no downstream objects. In the heap space, object 4 references object 7, and object 7 has no downstream objects. Object 6 is not referenced by anyone and has no downstream objects. All these objects are marked as white and added to the white marking table.
Figure 1.29
Following the tricolor marking process, the Root Set is traversed in a non-recursive manner, marking the first layer of gray nodes: objects 1 and 4. These gray nodes are also added to the gray marking table, as shown in Figure 1.30.
Figure 1.30
If gray object 1 deletes the white object 5 without triggering the deletion barrier mechanism, white object 5, along with its downstream objects 2 and 3, will be disconnected from the main chain and eventually be collected, as shown in Figure 1.31.
Figure 1.31
However, with the current tricolor marking algorithm using the deletion barrier mechanism, deleted objects are marked as gray. This is to protect object 5 and its downstream objects (consider why protection is necessary—what unexpected issues could arise if object 5 were not marked as gray? For example, as shown in Figure 2.31, if object 1 deletes object 5 but object 5 remains white, and if the process does not include STW protection, there is a high chance that, during the deletion process, an already black-marked object might reference object 5. Object 5 is still a valid memory object needed by the program, but it could be mistakenly collected by the GC as a white object. This happens because the downstream objects of black objects are not protected). Therefore, object 5 is marked as gray, as shown in Figure 1.32.
Following the tricolor marking process, the next step is to traverse the gray marking table, which includes objects 1, 4, and 5. During this traversal, their reachable objects are marked gray from white. Additionally, the gray objects being traversed are marked as black. As shown in Figure 1.33, after this round of processing, objects 1, 4, and 5 are marked as black, objects 2 and 7 are marked as gray, and objects 3 and 6 remain white.
Figure 1.33
Continue the tricolor marking process until there are no more gray nodes. The final state, as shown in Figure 1.34, indicates that all nodes except object 6 are marked as black.
Figure 1.34
Finally, execute the collection and cleanup process, removing all white objects through GC, as shown in Figure 1.35.
The above describes the tricolor marking process using the deletion barrier. The deletion barrier still allows for garbage collection in a parallel state, but this method has lower precision. Even if an object is deleted, the last pointer pointing to it can still "survive" this round of garbage collection and will only be cleared in the next GC cycle.
4 Hybrid Write Barrier in Go V1.8
While both insertion and deletion write barriers can address the issue of non-parallel processing caused by STW to some extent, each has its own limitations:
Insertion Write Barrier: Requires STW at the end to re-scan the stack and mark the live white objects referenced on the stack.
Deletion Write Barrier: Has lower accuracy in garbage collection. At the start of GC, STW scans the heap to record an initial snapshot, which protects all live objects at that moment.
Go V1.8 introduces the Hybrid Write Barrier mechanism, which avoids the need for stack re-scanning, significantly reducing STW time. This mechanism combines the advantages of both insertion and deletion write barriers. This section will discuss the rules of the hybrid write barrier and analyze some scenarios that trigger this barrier.
4.1 Hybrid Write Barrier Rules
The specific operations of the hybrid write barrier generally need to follow the following conditions:
- At the start of GC, all objects on the stack are scanned and marked as black (there is no need for a second scan or STW).
- Any new objects created on the stack during GC are marked as black.
- Deleted objects are marked as gray.
- Added objects are marked as gray.
The hybrid write barrier effectively satisfies a variant of the weak tricolor invariant. The pseudocode is as follows:
Add downstream object(current downstream slot, new downstream ptr) {
// 1
Mark gray(current downstream slot) // Mark the current downstream object as gray when it is removed
// 2
Mark gray(new downstream ptr)
// 3
current downstream slot = new downstream ptr
}
Note: The barrier technique is not applied to stack objects to ensure stack operation efficiency. The hybrid write barrier is a GC barrier mechanism and is triggered only during GC execution.
Next, let's simulate the detailed process of the hybrid write barrier to provide a clearer understanding of the overall flow.
Assume the following scenario at the start of GC. The initial memory object structure is as follows: In the stack space, the Root Set references object 1, object 2 references object 3, and object 3 has no downstream references. Object 5 has no upstream objects but references object 8, which has no downstream references. In the heap space, the Root Set references object 4, which references object 7, and object 7 has no downstream references. Object 6 has no upstream objects and no downstream references. All these memory objects are marked as white and placed in the white marking table, as shown in Figure 1.36.
Figure 1.36
As GC begins, following the steps of the hybrid write barrier, the first step is to scan the stack area and mark all reachable objects as black. Therefore, when the stack scan is complete, objects 1, 2, and 3, which are reachable, are marked as black and added to the black marking table, as shown in Figure 1.37.
Figure 1.37
Next, we will analyze several scenarios involving the hybrid write barrier. This section will outline four scenarios, all starting from the point shown in Figure 2.37, where the stack space has been scanned and reachable objects are marked as black. The four scenarios are:
- Heap Deletion of Reference, Becoming a Stack Downstream
- Stack Deletion of Reference, Becoming a Stack Downstream
- Heap Deletion of Reference, Becoming a Heap Downstream
- Stack Deletion of Reference, Becoming a Heap Downstream
4.2 Scenario 1: Heap Deletion of Reference, Becoming a Stack Downstream
Scenario 1 mainly describes the case where an object is deleted from a heap object's references and becomes a downstream object of a stack object. The pseudocode is as follows:
// Pseudocode explanation:
// For example: HeapObject4 -> Object7 = Object7, meaning: Object7 is referenced by HeapObject4
StackObject1 -> Object7 = HeapObject7; // Attach heap object 7 to stack object 1 as a downstream
HeapObject4 -> Object7 = null; // HeapObject4 deletes reference to Object7
Now execute the first line of code from the above scenario: StackObject1 -> Object7 = HeapObject7
, which adds the white object 7 to the downstream of the black stack object 1. It is important to note that because the stack does not trigger the write barrier mechanism, the white object 7 is directly attached under the black object 1, and object 7 remains white. Now, scan object 4, marking object 4 as gray, as shown in Figure 1.38.
Figure 1.38
Then execute the second line of code from the above scenario: HeapObject4 -> Object7 = null
. The gray object 4 deletes the white object 7 (deletion means reassigning to null). Since object 4 is in the heap space, the write barrier is triggered, and the deleted object 7 is marked as gray, as shown in Figure 1.39.
Thus, from the scenario of Scene One, object 7 ends up being attached to the downstream of object 1. Since object 7 is gray, it won't be collected as garbage, thus it is protected. In the hybrid write barrier of Scene One, STW (Stop The World) will not be triggered again to re-scan the stack space objects. The subsequent process continues to follow the hybrid write barrier's tricolor marking logic. Eventually, objects 4 and 7 will be marked as black, and the GC will ultimately reclaim objects 5, 8, and 6.
4.3 Scenario Two: Stack Deletes Reference, Becomes Stack Downstream
Scenario Two primarily describes a situation where an object is deleted by one stack object and becomes the downstream of another stack object. The pseudocode is as follows:
// Pseudocode explanation:
// For example: stack object 2->object 3 = null, meaning: object 3 is unreferenced by object 2
new stack object 9; // Create a new object 9 in the stack space
object 9->object 3 = object 3; // Attach stack object 3 to the downstream of stack object 9
object 2->object 3 = null; // Object 2 deletes the reference to object 3
Now, execute the first line of code in the above scenario: "new stack object 9". According to the hybrid write barrier's rules, any new memory object created in the stack space will be marked as black. Here, we inherit the memory layout scenario described in Figure 1.37, so object 9 is currently a black object referenced by the program's root set, as shown in Figure 1.40.
Figure 1.40
Next, execute the second line of code in the above scenario: object 9->object 3 = object 3
. Object 9 adds downstream object 3. Since object 9 is in the stack space, this addition does not trigger the write barrier. Object 3 is directly attached to the downstream of object 9, as shown in Figure 1.41.
Figure 1.41
Finally, execute the third line of code in the above scenario: "object 2->object 3 = null". Object 2 will remove downstream object 3. Since object 2 is within the stack space, the write barrier mechanism is not triggered. Object 2 directly detaches object 3 from its downstream, as shown in Figure 1.42.
From the above process, we can see that in the hybrid write barrier mechanism, an object transferring from the downstream of one stack object to another stack object's downstream ensures the object's safety without the need to trigger the write barrier or the STW mechanism, since stack objects are all black. This highlights the ingenious design of the hybrid write barrier.
4.3 Scenario Three: Heap Reference Deletion, Becoming a Heap Downstream
Scenario three mainly describes a situation where an object is removed from the reference of one heap object and becomes the downstream of another heap object. The pseudocode is as follows:
// Pseudocode explanation:
// For example: heapObject4->object7 = object7, meaning object7 is referenced by heapObject4
heapObject10->object7 = heapObject7; // Attach heap object 7 to the downstream of heap object 10
heapObject4->object7 = null; // heap object 4 removes the reference to object 7
Now, let's restore the memory layout for scenario three. In the heap space, there is a black object 10 (whether it is black or not doesn't matter, because object 10 is a reachable heap object, and it will eventually be marked as black). Here, we do not consider the possibility of object 10 being any other color. Since black is a special case, if object 10 is white, its downstream will eventually be scanned and thus safe. Similarly, if object 10 is gray, its downstream is also safe. Only when object 10 is black, there is a risk of its downstream memory being unsafe. So, the current memory layout is as shown in Figure 1.43.
Now, executing the first line of code in the above scenario: heapObject10->object7 = heapObject7
, heapObject10 adds a downstream reference to the white heap object 7. Since object 10 is within the heap space, this write operation will trigger the barrier mechanism. According to the conditions of the hybrid write barrier, the added object will be marked as gray. Thus, the white object 7 will be marked as gray, indirectly protecting the white object 6, as shown in Figure 1.44.
Next, executing the second line of code in the above scenario: heapObject4->object7 = null
, the gray heap object 4 removes the downstream reference to heap object 7. Since object 4 is within the heap space, this operation triggers the barrier mechanism. According to the conditions of the hybrid write barrier, the removed object will be marked as gray. Therefore, object 7 will be marked as gray (even though it is already gray), as shown in Figure 1.45.
Through the processes described, the originally white object 7 has successfully been transferred from beneath heap object 4 to beneath black heap object 10. Both object 7 and its downstream object (object 6) are protected, and throughout this process, no STW was used, avoiding any delay in the program's performance.
4.4 Scenario Four: Stack Deletes Reference, Becomes Heap Downstream
Scenario Four primarily describes the case where an object is deleted from a stack object and becomes a downstream object of another heap object. The pseudocode is as follows:
// Pseudocode explanation:
// For example: HeapObject4 -> Object7 = Object7 means Object7 is referenced by HeapObject4
StackObject1 -> Object2 = null; // Remove StackObject1's downstream reference to Object2
HeapObject4 -> Object7 = StackObject2; // HeapObject4 removes reference to Object7 and references new downstream Object2
Continuing from the memory layout scenario in Figure 1.37, executing the first line of code, StackObject1 -> Object2 = null
, removes StackObject1's reference to StackObject2. Since StackObject1 is in the stack space, this operation does not trigger the write barrier mechanism. Consequently, StackObject1 directly deletes Object2 and all downstream objects associated with Object2, as shown in Figure 1.46.
Next, executing the second line of code in the scenario: HeapObject4 -> Object7 = StackObject2
, actually performs two actions:
It removes the previous downstream white object, Object7, from HeapObject4.
It adds StackObject2 as the new downstream object for HeapObject4, as shown in Figure 1.47.
Figure 1.47
When HeapObject4 performs the two actions described, it triggers the write barrier mechanism since HeapObject4 is in the heap space. According to the rules of the hybrid write barrier, deleted objects are marked as gray, and newly added objects are also marked as gray. Consequently, Object7 is marked as gray, thereby protecting its downstream object, Object6. Object2, being newly added, will also go through the gray marking process. Since Object2 is already black (and thus safe), it will remain black, as shown in Figure 1.48.
Figure 1.48
Ultimately, Object2, which was originally referenced by a stack object, has been successfully moved to be a downstream object of HeapObject4. The memory dependencies and safety states are maintained. After several iterations, Object1, Object4, Object2, and Object3 will all be marked as black, while Object7 and Object6 will also be marked as black in this round of GC. The white memory objects reclaimed in this round of GC will be Object5 and Object8.
However, a question arises: Why were Object7 and Object6, which are already disconnected from the program's Root Set, not reclaimed? This is due to the delay issue with the hybrid write barrier. To avoid triggering STW, some memory may be delayed in being reclaimed for one cycle. In the next round of GC, if Object7 and Object6 are not referenced by external objects, they will eventually be reclaimed as white garbage memory.
5 Summary
This chapter has discussed the evolution of the garbage collection (GC) mechanism in Golang, highlighting the continuous optimization efforts aimed at improving GC performance. The garbage collection in Golang has seen significant advancements, and while this chapter covers up to Go 1.8, subsequent versions of Go have also introduced further performance improvements. However, the major changes discussed in this chapter are pivotal.
Currently, garbage collection employs a combination of the tri-color marking and barrier mechanisms, with the primary performance impact stemming from the Stop The World (STW) mechanism. STW is necessary for memory safety, but the hybrid write barrier mechanism significantly reduces the need for STW, allowing for concurrent garbage collection without pausing the program.
Readers should note the key milestones in the evolution of Golang's GC:
Go 1.3: The traditional mark-and-sweep approach required STW, resulting in low efficiency.
Go 1.5: Introduced tri-color marking combined with insertion and deletion write barriers, where write barriers were applied to heap space but not stack space. This method required an additional stack scan after the initial scan, leading to moderate efficiency.
Go 1.8: Implemented tri-color marking with a hybrid write barrier mechanism, where the stack space did not require barriers while the heap space did. This approach minimized the need for STW and achieved higher efficiency.
Each of these approaches had its limitations, but through comparison, it is evident that the hybrid write barrier mechanism in Go 1.8 offers a substantial improvement in efficiency by minimizing the use of STW.
(Golang Triad)-I-Understanding the Golang Goroutine Scheduler GPM Model
(Golang Triad)-II-Comprehensive Analysis of Go's Hybrid Write Barrier Garbage Collection
Author:
discord: https://discord.gg/xQ8Xxfyfcz
zinx: https://github.com/aceld/zinx
github: https://github.com/aceld
aceld's home: https://yuque.com/aceld
Top comments (1)
The STW step would not be needed if it was decided to store pointers on the mirrored stack. Look at the SGCL for C++.