CodingBlocks
Designing Data-Intensive Applications – Weak Isolation and Snapshotting
Ever wonder how database backups work if new data is coming in while the backup is running? Hang with us while we talk about that, while Allen doesn’t stand a chance, Outlaw is in love, and Joe forgets his radio voice.
The full show notes for this episode are available at https://www.codingblocks.net/episode204.
News
- Thanks for the great reviews!
- Audible: Allison Williams
- Orlando Code Camp 2023 is coming up on March 25th 2023 (orlandocodecamp.com)
Weak Isolation levels
- If two transactions don’t touch the same data, they can be run in parallel.
- Race conditions occur when two different processes try to access or modify the same data at the same time.
- Concurrency bugs are hard to find and test for – it usually comes down to unlucky timing.
- Concurrency bugs can also be very difficult to understand because multiple parts of an application can be interacting with the database simultaneously and in unexpected ways.
- Single-user interactions with a database are hard enough; multiple interactions happening simultaneously make everything much more difficult.
- Databases try to make it look like interactions happen one at a time for that very reason – to simplify the work for a developer.
- Serializable isolation is a database guarantee that makes transactions look as if they happened serially – one after another.
- Isolation is not that simple in reality
- Serializable isolation comes at a performance cost
- For this reason, most databases choose not to use it
- Most databases use weaker isolation levels to protect against some concurrency issues but not all of them
- These aren’t just theoretical bugs
- Have resulted in large financial losses
- Investigations by financial auditors
- Customer data corruption
- It’s been a common theme that “use a relational db if you’re doing financial transactions” – however, since most databases run at weak isolation levels, that alone doesn’t guarantee things would have been perfect
- For this reason – you should understand the various weak (non-serializable) isolation levels
Read Committed
- Two guarantees
- When reading from the database, you will only see data that has been committed (no dirty reads)
- When writing to the database, you will only overwrite data that has been committed (no dirty writes)
- A second write is delayed until the first write’s transaction has been committed or aborted
- This does not protect against the incrementing race condition (a lost update) – ie. two processes read the same value at the same time, say 1; process one increments it and saves, so the stored value is 2; process two (which still has 1 in memory from its read) then does its increment and also stores 2. The value should have been 3, but because process two worked from a stale read, one increment was lost (see the sketch below)
- Avoiding this is discussed later in “Preventing Lost Updates”
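A minimal sketch of that race in plain Python – no real database here, and the counters dict and increment_counter function are just invented stand-ins for a table and an UPDATE based on an earlier SELECT:

import threading
import time

# A toy "table"; each read below only ever sees committed data (read committed),
# yet the read-modify-write cycle can still lose an update.
counters = {"likes": 1}

def increment_counter(name: str, delay: float) -> None:
    current = counters[name]      # read the committed value (like a SELECT)
    time.sleep(delay)             # simulate application work between read and write
    counters[name] = current + 1  # write back stale value + 1 (like an UPDATE)

# Two "transactions" both read 1 and both write 2, so one increment is lost.
t1 = threading.Thread(target=increment_counter, args=("likes", 0.1))
t2 = threading.Thread(target=increment_counter, args=("likes", 0.1))
t1.start(); t2.start()
t1.join(); t2.join()

print(counters["likes"])  # prints 2, not the expected 3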
Snapshot Isolation and Repeatable Read
- Addresses read skew, an example of a non-repeatable read
- The example given was a customer with two bank accounts: she gets her balance for account 1, then a transfer of $100 from account 2 to account 1 happens, and a few moments later she gets the balance for account 2… she has a pre-transfer value for account 1 and a post-transfer value for account 2, so it looks like she is missing $100 (a sketch of this scenario follows the list)
- This is acceptable in the read committed isolation as both account values were committed at the time of the reads.
- How could this happen? Here’s a quick example…
- Multiple queries were issued to get the different account values – get balance for account 1, get balance for account 2…behind the scenes, someone did a transfer from one account to the other
- This is a very temporary state
- There are situations, however, that can’t tolerate even this temporary inconsistency
- Backups, analytics queries and integrity checks
- Snapshot isolation is a typical solution to the problem
- Transactions read from a consistent snapshot – meaning every read within the transaction sees the state of the database as it was when the transaction started
- Very popular feature – supported by PostgreSQL, Oracle, SQL Server, MySQL with InnoDB
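A quick sketch of the read skew scenario, again simulated in plain Python rather than a real database. The accounts dict, the transfer function, and the $500 starting balances are all assumed for illustration:

# Both accounts start with $500, so the customer's true total is always $1,000.
accounts = {"account_1": 500, "account_2": 500}

def transfer(src: str, dst: str, amount: int) -> None:
    # In a real database both updates would commit atomically in one transaction,
    # but the reader below still issues its two queries around that commit.
    accounts[src] -= amount
    accounts[dst] += amount

balance_1 = accounts["account_1"]        # reader's first query: sees 500 (pre-transfer)

transfer("account_2", "account_1", 100)  # the transfer commits between the two reads

balance_2 = accounts["account_2"]        # reader's second query: sees 400 (post-transfer)

print(balance_1 + balance_2)  # 900, so it looks like $100 is missing
# Under snapshot isolation both reads would come from the same snapshot: 500 + 500 = 1000.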
How is snapshot isolation accomplished?
- Implementations typically use write locks to prevent dirty writes
- Reads never block writes, and writes never block reads
- Because there may be multiple transactions taking place at once, there may need to be multiple copies of database objects in play at once – this is referred to as multi-version concurrency control
- The difference between read committed and snapshot isolation is read committed will use a different snapshot for each read whereas snapshot isolation will use the same snapshot for multiple reads within the transaction
- They show an example of PostgreSQL’s implementation
- Found this README in Postgres – https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/README-SSI
- The implementation basically uses some metadata fields on a row – created_by and deleted_by fields which contain transaction IDs
- If you delete a row, that deleted_by field is updated; the row isn’t actually deleted at that point in time, but garbage collection will pick it up later and remove it physically from the table – at a time when it’s deemed it will no longer be accessed
- Updates are converted into a delete plus a create (similar to what you’d see if you’re familiar with triggers in something like SQL Server) – see the sketch below
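A toy model of that created_by / deleted_by bookkeeping. The RowVersion class and update_row function are invented for illustration and are not PostgreSQL's actual structures:

from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    data: dict
    created_by: int                    # txid of the transaction that inserted this version
    deleted_by: Optional[int] = None   # txid of the transaction that deleted/superseded it

def update_row(versions: list, new_data: dict, txid: int) -> None:
    # An update is translated into a delete of the current version plus a create.
    for v in versions:
        if v.deleted_by is None:
            v.deleted_by = txid                  # old version is marked deleted, not removed
    versions.append(RowVersion(new_data, txid))  # new version created by this transaction

balance_versions = [RowVersion({"balance": 500}, created_by=3)]
update_row(balance_versions, {"balance": 400}, txid=13)

# Both versions coexist until garbage collection removes the ones nobody can see anymore.
for version in balance_versions:
    print(version)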
Visibility for seeing a consistent snapshot
- Consistent snapshots work by following these rules:
- At the start of a transaction, the database makes a list of all transactions currently in progress; any writes from those transactions are ignored for reads
- Any writes made by transactions that were aborted are ignored
- Any writes made by a newer transaction id are ignored
- All other writes are available to read
- Another way of thinking about it – an object is visible if both of the following hold (see the sketch after this list):
- The transaction that created the object had already committed BEFORE the reader’s transaction started
- The object is either not marked for deletion OR, if it is marked for deletion, the deleting transaction had not yet committed when the reader’s transaction started
- Because the database is never truly updating/deleting values in place, a number of running transactions can continue to function from snapshots of those objects with very small overhead
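A minimal sketch of those visibility rules, reusing the toy RowVersion shape from the sketch above. The parameter names (reader_txid, in_progress_at_start, aborted_txids) are assumptions, and it presumes transaction IDs are handed out in increasing order:

def is_visible(version, reader_txid: int, in_progress_at_start: set, aborted_txids: set) -> bool:
    # The creating transaction must have committed before the reader started:
    # not aborted, not still in progress at the reader's start, and not newer than the reader.
    if version.created_by in aborted_txids:
        return False
    if version.created_by in in_progress_at_start:
        return False
    if version.created_by > reader_txid:
        return False

    # If the version is marked for deletion, it stays visible only while the deleting
    # transaction had not yet committed when the reader's transaction started.
    if version.deleted_by is not None:
        deleter_committed = (version.deleted_by not in aborted_txids
                             and version.deleted_by not in in_progress_at_start
                             and version.deleted_by <= reader_txid)
        if deleter_committed:
            return False

    return True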
Snapshot isolation and indexes
- Considering what we mentioned about the database storing multiple snapshots of state, how does this work with indexes?
- One way would be to have the index point to all versions of an object and have queries filter out the versions that aren’t visible to the current transaction; when garbage collection removes old versions, the corresponding index entries are removed as well
- Turns out, there are a lot of implementation details and performance thoughts to take into consideration depending on the database implementation
- They gave an example of how things are done in PostgreSQL – if multiple versions of the same object can fit on the same page file, nothing is done to the index
- Another approach, used by CouchDB and others, is an append-only / copy-on-write method that does not overwrite the existing page of the B-tree but rather creates a copy of the modified page. A copy of each parent is then made all the way up to the root page so it points to the new pages; any pages not impacted by the write operation don’t need to be touched
- If that sounded like it creates a new tree root for every write in the append-only B-tree, you’d be correct. By taking this approach, every individual root is a consistent snapshot of the database at that point in time (sketched below)
- The benefit of this approach is you don’t have to filter anything because every root node has only the transactions that belong in that snapshot
- You do need a background process to garbage collect and compact
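A small sketch of the copy-on-write idea, using a plain binary tree instead of a real B-tree just to show the path copying; all names here are invented:

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int, value: str) -> Node:
    # Never modify a node in place; copy every node on the path down to the key.
    # The old root is untouched and remains a consistent snapshot of its tree.
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)   # overwrite = a new copy with the new value

snapshot_1 = insert(None, 10, "a")
snapshot_2 = insert(snapshot_1, 5, "b")    # snapshot_1 is unchanged
snapshot_3 = insert(snapshot_2, 10, "c")   # snapshot_2 still sees "a" at key 10

print(snapshot_3.value, snapshot_2.value)  # prints: c a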
Repeatable read and naming confusion
- Unfortunately, snapshot isolation is known by many names
- Oracle calls it serializable
- PostgreSQL and MySQL call it repeatable read
- Why is the naming inconsistent? Because the SQL standard’s isolation levels are based on System R’s 1975 definitions, and snapshot isolation hadn’t been invented yet – so the standard never defined it!
- They HAD defined repeatable read which is very similar to snapshot isolation
- Unfortunately, when a relational database says “repeatable read”, the name doesn’t tell you what guarantees it actually provides
- It was called out that there is a formal definition of a repeatable read but most implementations don’t meet the definition
- “Nobody really knows what repeatable read means”
Resources We Like
- Orlando Code Camp 2023 (orlandocodecamp.com)
- The 12 Factor App and Google Cloud (cloud.google.com)
- Martin Kleppmann’s website (martin.kleppmann.com)
- David Foster Wallace – This is Water (youtube)
Tip of the Week
- “Infinite Jest” is an interesting book, but it’s not a good audio experience. Get the physical book this time. (amazon.com)
- Tamara Makes Games is a game dev on YouTube that has a lot of videos oriented around isometric, city-builder, and strategy games similar to Factorio. It’s a cool niche, and it’s a nice balance of code and visuals that are a delight to watch. (youtube)
- There’s a lot you can do with iTerm2’s status bar; it’s highly configurable, making it easy to show system resource monitors, shell information, and other miscellaneous items. (iterm2.com)
- Google has a tool named “container-diff” for analyzing and comparing container images. It can examine images along several different criteria, great for tracking down issues, like knowing why Docker isn’t caching a layer. (github.com)
- Xeol is a great utility for checking for end-of-life packages that you should get rid of. Thanks, gaprogman! (github)
- Using minikube? You can manage the space used by the VM by using minikube ssh to shell into the machine and then prune your images with docker builder prune. Alternatively, you can use eval $(minikube docker-env) to point your local docker CLI at minikube’s Docker daemon so you can just run docker builder prune (and any other docker commands) without the shell. (minikube.sigs.k8s.io)