In the field of database systems, concurrency control is a crucial aspect to ensure that multiple transactions can execute simultaneously without causing inconsistencies or data corruption. To manage this, databases use various isolation levels and techniques like locking mechanisms to protect data integrity. However, these mechanisms may introduce certain issues known as Dirty Read, Non-repeatable Read, and Phantom Read.
𝟭. 𝗗𝗶𝗿𝘁𝘆 𝗥𝗲𝗮𝗱:
A dirty read occurs when a transaction reads uncommitted data from the database. In other words, it reads data that has been modified by another
transaction but not yet committed. This can lead to incorrect results being displayed to the user. To prevent this issue, databases generally use the Two-Phase Locking (2PL) protocol, which ensures that a transaction holds locks only for the duration of the transaction.
# Example using SQL and Python's psycopg2 library:
import psycopg2
conn = psycopg2.connect(database="your_db", user="your_user", password="your_password")
cursor = conn.cursor()
def transaction1():
cursor.execute("UPDATE table SET column=column+1 WHERE condition")
conn.commit() # commits the changes
def transaction2():
cursor.execute("SELECT * FROM table") # reads uncommitted data
print(cursor.fetchall())
𝟮. 𝗡𝗼𝗻-𝗿𝗲𝗽𝗲𝗮𝘁𝗮𝗯𝗹𝗲 𝗥𝗲𝗮𝗱:
A non-repeatable read occurs when a transaction re-reads data that has been modified by another concurrent transaction. This issue arises because databases do not guarantee that the data will remain unchanged between two reads of the same transaction, even if no other transactions are modifying it. To address this, 2PL is also used to prevent non-repeatable reads.
# Example using SQL and Python's psycopg2 library:
import psycopg2
conn = psycopg2.connect(database="your_db", user="your_user", password="your_password")
cursor = conn.cursor()
def transaction1():
cursor.execute("SELECT * FROM table WHERE condition") # reads data
rows = cursor.fetchall()
cursor.execute("UPDATE table SET column=column+1 WHERE condition") # modifies data
conn.commit() # commits the changes
cursor.execute(rows) # reads data again, but results can be different due to non-repeatable read
𝟯. 𝗣𝗵𝗮𝗻𝘁𝗼𝗺 𝗥𝗲𝗮𝗱:
A phantom read occurs when a transaction retrieves multiple rows from a query based on certain conditions, but later, when it re-executes the
same query with the same conditions, additional rows appear that were inserted by other concurrent transactions in the meantime. To address this
issue, databases use multiversion concurrency control (MVCC) and version store, which allow multiple versions of data to coexist until a commit
or rollback is performed.
# Example using SQL and Python's psycopg2 library:
import psycopg2
conn = psycopg2.connect(database="your_db", user="your_user", password="your_password")
cursor = conn.cursor()
def transaction1():
cursor.execute("SELECT * FROM table WHERE condition") # reads data
rows = cursor.fetchall()
cursor.execute("INSERT INTO table (column) VALUES ('some value')") # inserts new row
conn.commit() # commits the changes
cursor.execute("SELECT * FROM table WHERE condition") # re-reads data, but results can be different due to phantom read
To avoid these issues, databases use various techniques like 2PL, MVCC, and version store, depending on the isolation level chosen for the transaction. By ensuring data integrity, these mechanisms prevent dirty reads, non-repeatable reads, and phantom reads, allowing transactions to run concurrently without causing inconsistencies or data corruption.
𝗥𝗲𝗮𝗹 𝗟𝗶𝗳𝗲 𝗘𝘅𝗮𝗺𝗽𝗹𝗲
Imagine you are withdrawing money from your bank account. You have two options for checking the balance: Option A and Option B.
Option A: Dirty Read, Non-repeatable Read
- You check the balance of your account, showing $10,000.
- Another transaction transfers $5,000 from your account to another account without committing yet (dirty read).
- You again check the balance of your account and see $5,000.
- The transfer transaction gets rolled back due to an error.
- If you try to re-check the balance using Option A, it will show $10,000, but it should ideally be $5,000 (non-repeatable read).
Option B: Phantom Read
- You check all transactions in your account from last month, showing a total of $10,000.
- Another transaction makes a deposit of $2,000 into your account without committing yet.
- You try to re-check the transactions using Option B, but it will now show an additional $2,000 deposit (phantom read).
- If you try to re-execute the query, it will show both the original transactions and the new $2,000 deposit (phantom read persists).
In this example, Option A demonstrates dirty reads and non-repeatable reads, while Option B illustrates a phantom read. Databases employ
isolation levels, locking mechanisms, or multiversion concurrency control to prevent these issues and maintain data integrity during concurrent
transactions.
Chapter on Concurrency Controls : https://www.cs.uct.ac.za/mit_notes/database/pdfs/chp13.pdf
Top comments (0)