CodingBlocks
Understanding Serial Transactions for Databases like Redis
We’ve got a new / old opening…Allen goes off / on script? Michael denies Joe the “swing” vote, and Joe is all in on AI assistance
Testing for concurrency issues is hard because it’s non-deterministic – basically you get unlucky due to the timing of things
Serializability
- The problems we’ve been discussing the past few episodes have been around since the 1970s
- The answer is always – just use serializable isolation!
- Serializable isolation is the strongest isolation level
  - The database prevents ALL race conditions
  - Even if transactions run in parallel, they’re guaranteed to produce the same result as if they had run one at a time, one after another
- If they’re so much better, why have/use weaker isolation levels?
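The guarantee above is easiest to see with a classic lost-update race. Here is a minimal Python sketch (the “database” is just a dict, and all names are made up): an interleaved schedule loses one of two increments, while a serial schedule — the behavior serializable isolation guarantees — keeps both.

```python
# Two read-modify-write "transactions" against a shared counter.
def increment(read, write):
    value = read()      # read the current value
    write(value + 1)    # write back the incremented value

db = {"counter": 0}
read = lambda: db["counter"]
write = lambda v: db.update(counter=v)

# Interleaved schedule: both transactions read before either writes
a = read()
b = read()
write(a + 1)
write(b + 1)                # clobbers the first write -> lost update
assert db["counter"] == 1   # only one increment took effect

# Serial schedule: the first transaction completes before the second starts
db["counter"] = 0
increment(read, write)
increment(read, write)
assert db["counter"] == 2   # both increments took effect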
Common Implementations
- Executing the transactions serially, actually
- Two-phase locking (2PL) – was one of the only real available solutions for several decades
- Optimistic concurrency control – things like serializable snapshot isolation
- We’ll be talking about these in terms of a single node database
Actual Serial Execution
- The easiest way to get rid of race conditions is to really just run things one after another – no concurrency
- This was only implemented for the first time around 2007 – prior to that, the performance was too poor
- This is truly a loop over transactions submitted to the db engine
- What changed to make it possible?
  - RAM became cheap enough to store entire active datasets in memory – when this is done, transactions can execute much faster because you don’t have to wait to load the data from disk
  - DB designers concluded that most OLTP transactions are usually short-lived and make a small number of reads and writes – so long-running read-only queries can be run on a consistent snapshot using snapshot isolation, outside of the serial execution loop
- Used by VoltDB/H-Store, Redis, and Datomic
- Sometimes single-threaded systems can perform better than concurrent ones simply because there’s no locking overhead
  - However, you’re bound by a single CPU core
  - Transactions need to be structured differently than in typical concurrent systems
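The “loop over transactions” idea can be sketched in a few lines of Python (all names here are hypothetical, not any real engine’s API): a single thread owns the in-memory dataset and drains a queue of submitted transactions one at a time, so no locks are ever needed.

```python
from queue import Queue

class SerialEngine:
    """A toy single-threaded engine: the entire dataset lives in RAM,
    and only this one loop ever touches it -- no locking required."""

    def __init__(self):
        self.data = {}        # entire active dataset in memory
        self.inbox = Queue()  # transactions submitted by clients

    def submit(self, txn):
        self.inbox.put(txn)

    def run(self):
        # The serial execution loop: one transaction at a time, start to finish
        while not self.inbox.empty():
            txn = self.inbox.get()
            txn(self.data)    # each txn is a function over the dataset

engine = SerialEngine()
engine.submit(lambda data: data.update(seats={"12A": None}))
engine.submit(lambda data: data["seats"].update({"12A": "alice"}))
engine.run()
print(engine.data["seats"]["12A"])  # -> alice
```

Because the two submitted transactions run back to back rather than interleaved, their combined effect is always the same as some serial order.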
Encapsulating transactions in stored procedures
- They talked about how early database implementations intended to make the entire user flow part of the transaction – to book a flight, a person would be shown a list of flights, they’d choose the one they want, and the booking would be stored
- The problem with that approach is it can take a long time for that flow to be completed
- For that reason, web applications limit transactions to a single HTTP request
- There can still be situations where the transaction requires multiple interactions between the application and the database
  - From the application, query to see if the seat is still available on the flight… OK, it is… now send another query to the db to update the seat to the customer… now query the db again to get any additional information
  - Doing it this way in a serial-execution db would be too slow because there’s too much network latency / waiting
- In a single-threaded serial-execution system, everything must be done all at once in a stored procedure
  - Keeping everything in memory and providing the stored proc everything it needs ensures the transaction is fast, without waiting on any network or disk I/O
  - There’s a great picture in the book, Figure 7-9, that describes this
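The one-round-trip idea can be sketched as follows — a hedged Python illustration, not any particular database’s API: the whole check-then-book flow is packaged as a single procedure that the engine runs in one call, with everything it needs supplied up front as parameters.

```python
# In-memory dataset owned by the (single-threaded) engine
data = {"seats": {"12A": None, "12B": "bob"}}

def book_seat(data, seat, customer):
    """A toy 'stored procedure': check availability and book in one shot.
    No network or disk I/O happens mid-transaction -- everything the
    transaction needs was passed in up front."""
    if data["seats"].get(seat) is not None:
        return {"ok": False, "reason": "seat taken"}
    data["seats"][seat] = customer
    return {"ok": True, "seat": seat, "customer": customer}

# One round trip: submit the procedure name plus parameters, get a result back
result = book_seat(data, "12A", "alice")
print(result)  # -> {'ok': True, 'seat': '12A', 'customer': 'alice'}

# A second attempt fails atomically inside the same procedure
print(book_seat(data, "12A", "carol"))  # -> {'ok': False, 'reason': 'seat taken'}
```

Contrast this with the three-round-trip version above (check, update, re-query), where network latency would stall the serial execution loop between each step.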
Pros and Cons of Stored Procedures
- They’ve been part of the SQL standard since… 1999
- Stored procedures sometimes get a bad rep
  - Each vendor’s implementation has its own language
  - The book mentions that these SQL-based languages haven’t kept up with other programming languages and look archaic in comparison
  - It’s hard to manage code stored on the database server
    - Harder to debug
    - More difficult to keep in source control
    - More difficult to test
    - More difficult to gather metrics for monitoring
  - Because databases are typically shared by many applications with a LOT of application code, non-performant stored procedure code can cause massive problems – usually worse problems than poorly written application code
- These issues have been, and can be, remedied
  - Modern serializable databases use general-purpose programming languages for stored procedures
    - VoltDB – Java/Groovy
    - Datomic – Java/Clojure
    - Redis – Lua
- When the database is in memory and the transactions are single-threaded, stored procedures can actually be quite good
  - Because there’s no network or disk I/O overhead, transactions can execute quickly on a single thread
- VoltDB also executes stored procedures for replication!
  - This means the stored procedures have to be deterministic – e.g., datetimes have to come from deterministic APIs
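A small Python sketch (hypothetical names, not VoltDB’s actual API) of why determinism matters for replication: if the procedure resolves the current time itself, each replica records a different value; if the coordinator resolves the time once and ships it as a parameter, every replica stays identical.

```python
from datetime import datetime, timezone

def book_nondeterministic(data, seat, customer):
    # BAD for replication: each replica calls now() at a slightly
    # different moment, so replicas diverge.
    data[seat] = (customer, datetime.now(timezone.utc))

def book_deterministic(data, seat, customer, booked_at):
    # GOOD: the timestamp is a parameter, so every replica that
    # executes this procedure produces the exact same state.
    data[seat] = (customer, booked_at)

# The coordinator resolves the time once and ships it with the transaction
ts = datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc)
replica_a, replica_b = {}, {}
book_deterministic(replica_a, "12A", "alice", ts)
book_deterministic(replica_b, "12A", "alice", ts)
assert replica_a == replica_b   # replicas stay in lockstep
```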
Partitioning
- As mentioned before, doing serial transactions means you are limited to a single core of a single CPU
  - Read-only transactions could run on a separate thread using snapshot isolation
- If you need high write throughput, the single thread on a single core could be a problematic bottleneck
- This is where partitioning comes into play – if you can divvy your data up in a way that would allow transactions to stay within a single partition, then you’ll have the ability to linearly scale your CPU cores/threads to the number of partitions you have
- If your transaction has to go across multiple partitions, then the stored procedure must ensure that each partition is handled appropriately to keep everything serialized properly
- VoltDB can handle transactions that span multiple partitions
  - Doing cross-partition writes is much slower than single-partition writes – VoltDB reports about 1,000 cross-partition writes per second
- Determining if transactions can occur on a single partition takes a bit of planning
  - Key-value data is likely a single-partition transaction
  - Data with multiple secondary indexes will likely require cross-partition transactions
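A rough Python sketch of the partitioning idea (simple hash partitioning; all names are made up for illustration): each partition owns its own in-memory dataset, so a transaction that touches only one partition can run on that partition’s single thread, and throughput scales with the number of partitions.

```python
import zlib

NUM_PARTITIONS = 4
partitions = [{} for _ in range(NUM_PARTITIONS)]

def partition_for(key):
    # Stable hash (crc32) so every node routes a key to the same partition
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def single_partition_write(key, value):
    # Touches exactly one partition -> runs entirely on that
    # partition's single thread, no coordination needed
    partitions[partition_for(key)][key] = value

def cross_partition_transfer(src, dst, amount):
    # May touch two partitions -> has to be coordinated across their
    # threads in lockstep, which is why cross-partition writes are
    # reported to be much slower
    partitions[partition_for(src)][src] -= amount
    partitions[partition_for(dst)][dst] += amount

single_partition_write("alice", 100)
single_partition_write("bob", 50)
cross_partition_transfer("alice", "bob", 30)
assert partitions[partition_for("alice")]["alice"] == 70
assert partitions[partition_for("bob")]["bob"] == 80
```

This also shows why key-value data partitions cleanly: the routing decision is a pure function of the key, whereas a secondary index would need to consult rows living in other partitions.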
Resources We Like
- Designing Data-Intensive Applications – https://www.codingblocks.net/get/designing-data-intensive-applications
Tips of the Episode
- Copilot Labs is an optional extension for GitHub Copilot that adds some nifty new features to VS Code with Copilot. It installs as a new sidebar icon and has 4 major features:
- Code explanation – What does this block of code do? Does the code I wrote do what I think it does?
- Code translation – Not familiar with a language you’re reading? Convert it to one that you do!
- IDE brushes – Modify existing code using a variety of brushes like you would in an art program – add types, fix bugs, improve readability or resilience, add documentation – and it looks like there’s a way to add custom brushes!
- Test generation – JS and TS only right now
- From Andrew: Ever wanted a Windows-like ALT+TAB experience on your Mac? Introducing AltTab for Mac – https://alt-tab-macos.netlify.app/
- A reason to use the terminal in Visual Studio Code
  - Any operations like a git status that show a list of files are easy to Ctrl / Cmd + click to open directly in the editor
- Bitwarden as a LastPass replacement – less than 1/3 the price
  - https://bitwarden.com/pricing/
  - https://www.lastpass.com/pricing