This blog aims to assist you in understanding the final concepts of Chapter:5 [Concurrency Control] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of
Chapter 5 Part-1 and basics of PostreSQL before we proceed to Chapter 5 Part-2, as it forms the foundation for our exploration.
So, Let's Start:
Transaction Snapshot
A transaction snapshot is a dataset that stored information about whether all transactions are active, at a certain point in time for an individual transaction.
PostgreSQL internally defines the textual representation format of transaction snapshots as '100: 100:'.
Examples of transaction snapshot representation in PostgreSQL is depicted in figure below:
Analysis of above transaction snapshot is given below:
The first example is '100: 100:'. This snapshot means the following:
- txids that are equal or less than 99 are not active because xmin is 100.
- txids that are equal or greater than 100 are active because xmax is 100.
The second example is '100:104:100,102'. This snapshot means the following:
- txids that are equal or less than 99 are not active.
- txids that are equal or greater than 104 are active.
txids 100 and 102 are active since they exist in the xip list, whereas txids 101 and 103 are not active.
Transaction snapshots are provided by the transaction manager.
In READ COMMITTED isolation level, a snapshot is obtained for each SQL command execution.
In REPEATABLE READ or SERIALIZABLE, a snapshot is obtained only when the first SQL command is executed.
The obtained transaction snapshot is used for visibility checks of tuples.
Active transactions in the snapshot are treated as in progress, even if they have been committed or aborted.
Transaction manager and transactions in PostgreSQL is depicted in figure below:
- The transaction manager always holds information about currently running transactions. Suppose that three transactions start one after another, and the isolation level of Transaction_A and Transaction_B are READ COMMITTED, and that of Transaction_C is REPEATABLE READ.
Visibility Check Rules
Visibility check rules are used to determine the visibility of tuples.
The rules consider the t_xmin and t_xmax values of the tuple, along with the Clog and the obtained transaction snapshot.
Discussion regarding t_ctid, which involves tuples updated more than twice within a transaction, is ignored.
The number of selected rules is ten, and they can be classified into three cases.
1.
Status of t_xmin is ABORTED
- Rule 1: If Status(t_xmin) = ABORTED ⇒ Invisible
2.
Status of t_xmin is IN_PROGRESS
Rule 2: If Status(t_xmin) = IN_PROGRESS ∧ t_xmin = current_txid ∧ t_xmax = INVAILD ⇒ Visible
Rule 3: If Status(t_xmin) = IN_PROGRESS ∧ t_xmin = current_txid ∧ t_xmax ≠ INVAILD ⇒ Invisible
Rule 4: If Status(t_xmin) = IN_PROGRESS ∧ t_xmin ≠ current_txid ⇒ Invisible
3.
Status of t_xmin is COMMITTED
Rule 5: If Status(t_xmin) = COMMITTED ∧ Snapshot(t_xmin) = active ⇒ Invisible
Rule 6: If Status(t_xmin) = COMMITTED ∧ (t_xmax = INVALID ∨ Status(t_xmax) = ABORTED) ⇒ Visible
Rule 7: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = IN_PROGRESS ∧ t_xmax = current_txid ⇒ Invisible
Rule 8: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = IN_PROGRESS ∧ t_xmax ≠ current_txid ⇒ Visible
Rule 9: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = COMMITTED ∧ Snapshot(t_xmax) = active ⇒ Visible
Rule 10: If Status(t_xmin) = COMMITTED ∧ Status(t_xmax) = COMMITTED ∧ Snapshot(t_xmax) ≠ active ⇒ Invisible
Visibility Check
PostgreSQL performs a visibility check, i.e. how heap tuples of the appropriate versions in a given transaction are selected.
PostgreSQL prevents the anomalies defined in the ANSI SQL-92 Standard: Dirty Reads, Repeatable Reads and Phantom Reads.
Scenario to describe visibility check in PostgreSQL is depicted in figure below:
Analysis of Visibility Check of above figure
SQL commands are executed in the following time sequence:
T1: Start transaction (txid 200)
T2: Start transaction (txid 201)
T3: Execute SELECT commands of txid 200 and 201
T4: Execute UPDATE command of txid 200
T5: Execute SELECT commands of txid 200 and 201
T6: Commit txid 200
T7: Execute SELECT command of txid 201
Preventing Lost Updates
A Lost Update, also known as a ww-conflict.
A Lost Update is an anomaly that occurs when concurrent transactions update the same rows, and it must be prevented in both the REPEATABLE READ and SERIALIZABLE levels.
Note that the READ COMMITTED level does not need to prevent Lost Updates.
Behavior of Concurrent UPDATE Commands
When UPDATE command is executed, the function ExecUpdate is internally invoked.
This function performs update operations for each of the target rows.
It has a while loop to update each row, and the inside of the while loop branches to three blocks according to the conditions.
Three internal blocks in ExecUpdate in PostgreSQL is depicted in figure below:
Serializable Snapshot Isolation
- Serializable Snapshot Isolation (SSI) has been embedded in SI since version 9.1 to realize a true SERIALIZABLE isolation level.
Basic Strategy for SSI Implementation
If a cycle with conflicts exists in the precedence graph, a serialization anomaly occurs.
The simplest anomaly used as an example is Write-Skew.
Part (1) shows a schedule where Transaction_A reads Tuple_B and Transaction_B reads Tuple_A.
Transaction_A then writes Tuple_A, and Transaction_B writes Tuple_B.
Two read-write conflicts create a cycle in the precedence graph, as shown in Fig. Part (2).
As a result, this schedule exhibits the serialization anomaly of Write-Skew.
Write-Skew schedule and its precedence graph in PostgreSQL is depicted in figure below:
- SSI implementation in PostgreSQL only needs to consider rw-conflicts.
PostgreSQL takes the following strategy for the SSI implementation:
1.
Record all objects (tuples, pages, relations) accessed by transactions as SIREAD locks.
2.
Detect rw-conflicts using SIREAD locks whenever any heap or index tuple is written.
3.
Abort the transaction if a serialization anomaly is detected by checking detected rw-conflicts.
Implementing SSI in PostgreSQL
SIREAD locks and rw-conflicts can be data structures to describe the SSI mechanism in PostgreSQL.
They are stored in shared memory.
1.
SIREAD locks
SIREAD lock, also known as a predicate lock, is a lock that stores information about which transactions have accessed an object.
It consists of an object and (virtual) txids.
SIREAD locks are created by the CheckTargetForConflictsOut function in SERIALIZABLE mode when a DML command is executed.
For example, if txid 100 reads Tuple_1, an SIREAD lock {Tuple_1, {100}} is created.
SIREAD locks have three levels: tuple, page, and relation.
SIREAD locks are also created when reading index pages in the Index-Only Scans feature.
2.
rw-conflicts
A rw-conflict is a triplet of an SIREAD lock and two txids that reads and writes the SIREAD lock.
The CheckTargetForConflictsIn function is invoked whenever either an INSERT, UPDATE or DELETE command is executed in SERIALIZABLE mode, and it creates rw-conflicts when detecting conflicts by checking SIREAD locks.
Both CheckTargetForConflictOut and CheckTargetForConflictIn functions, as well as the PreCommit_CheckForSerializationFailure function, which is invoked when the COMMIT command is executed in SERIALIZABLE mode, check serialization anomalies using the created rw-conflicts.
If they detect anomalies, only the first-committed transaction is committed and the other transactions are aborted (by the first-committer-win scheme).
Required Maintenance Processes
- PostgreSQL's concurrency control mechanism requires the following maintenance processes.
- Remove dead tuples and index tuples that point to corresponding dead tuples
- Remove unnecessary parts of the clog
- Freeze old txids
- Update FSM, VM, and the statistics
- In PostgreSQL, VACUUM processing is responsible for these processes.
I hope, this blog has helped you in understanding the final concepts of Concurrency Control in PostreSQL.
Check out summary of Chapter : 6
If you want to understand PostgreSQL In-Depth.
Top comments (0)