The previous post used the primary key to record read intents in YugabyteDB. Serializable isolation can be scalable, but it requires optimal schema and indexes with a good knowledge of how the database works.
Indexes define the predicate locks for PostgreSQL Serializable Snapshot Isolation, and the primary key defines the range locks used by YugabyteDB's Two-Phase Commit.
Here an example taken from nathanl/postgresql_serializable_isolation.sql which simply inserts two rows from two transactions:
create table users(
id serial not null primary key,
username varchar not null
);
-- create index on users(username asc);
-- session 1
begin transaction isolation level serializable;
select * from users where username = 'alice';
insert into users ("username") values ('alice');
-- session 2
\! psql -c '\timing on' -c "begin transaction isolation level serializable; select * from users where username = 'bob'; insert into users ("username") values ('bob'); commit;" &
-- session 1
\! sleep 1
commit;
\! sleep 1
select * from users;
I've run this at a serializable isolation level. Let's examine the performance in PostgreSQL and YugabyteDB, with and without an index on "username".
PostgreSQL Seq Scan
Without an index on "username", the SELECT does a Full Table Scan.
Session 2 can commit, but session 1 detects a conflict on commit:
COMMIT
Time: 2.844 ms
postgres=*# commit;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
This serializable error is a false positive. The two sessions read different values, and they are serializable. PostgreSQL employs predicate locks but only records the predicates used for the scan operation. The lock covers the entire table since it was read with a Seq Scan.
PostgreSQL Index Scan
I run the same with the index (uncomment the CREATE INDEX in the script), and there's no error:
COMMIT
Time: 2.581 ms
postgres=*# commit;
COMMIT
postgres=# \! sleep 1
postgres=# select * from users;
id | username
----+----------
1 | alice
2 | bob
In this case, PostgreSQL uses the index condition as a predicate lock, and they do not conflict between the two transactions.
Note that in this scenario, the "username" column should be declared unique instead of creating an additional index. This creates an implicit index and has the same effect:
create table users(
id serial not null primary ley,
username varchar not null unique
);
YugabyteDB
YugabyteDB uses Wait-on-Conflict to prevent the serializable error, allowing both transactions to commit even without the index.
yugabyte=*# commit;
COMMIT
COMMIT
Time: 926.484 ms
I waited one second before committing to the first session, and this delay is reflected in the response time of the second session, which waited for the first one. The reason for this is similar to what we have observed with PostgreSQL: the read lock intent is at the table level.
This is visible as STRONG_READ at the level of the "relation," which is blocked by the insert operation that acquires a WEAK_WRITE lock on the relation:
yugabyte=*# select
substr(ybdetails->>'transactionid',1,8) transactionid , substr(ybdetails->>'blocked_by',1,8) blocked_by,
relkind, locktype, mode, relname , ybdetails->'keyrangedetails'->'cols'
from pg_locks natural join (select oid relation, relname, relkind from pg_class) pg_class
order by transactionid, relkind desc, relname, case locktype when 'relation' then 1 when 'keyrange' then 2 when 'row' then 3 end, mode desc, locktype desc;
transactionid | blocked_by | relkind | locktype | mode | relname | ?column?
---------------+------------+---------+----------+----------------------+---------+----------
74e1329c | | r | relation | WEAK_READ,WEAK_WRITE | users | null
74e1329c | | r | relation | STRONG_READ | users | null
74e1329c | | r | row | STRONG_WRITE | users | ["1"]
74e1329c | | r | row | STRONG_READ | users | ["1"]
c5611a22 | ["74e132 | r | relation | STRONG_READ | users | null
The index will add the range lock intents in the index, but there's still a conflict with the table lock intents at "relation" level:
yugabyte=*# select
substr(ybdetails->>'transactionid',1,8) transactionid , substr(ybdetails->>'blocked_by',1,8) blocked_by,
relkind, locktype, mode, relname , ybdetails->'keyrangedetails'->'cols'
from pg_locks natural join (select oid relation, relname, relkind from pg_class) pg_class
order by transactionid, relkind desc, relname, case locktype when 'relation' then 1 when 'keyrange' then 2 when 'row' then 3 end, mode desc, locktype desc;
transactionid | blocked_by | relkind | locktype | mode | relname | ?column?
---------------+------------+---------+----------+----------------------+--------------------+---------------------------------------------
02a108e3 | | r | relation | WEAK_READ,WEAK_WRITE | users | null
02a108e3 | | r | relation | STRONG_READ | users | null
02a108e3 | | r | row | STRONG_WRITE | users | ["1"]
02a108e3 | | r | row | STRONG_READ | users | ["1"]
02a108e3 | | i | relation | WEAK_WRITE | users_username_idx | null
02a108e3 | | i | keyrange | WEAK_WRITE | users_username_idx | ["\"alice\""]
02a108e3 | | i | row | STRONG_WRITE | users_username_idx | ["\"alice\"", "\"H\\x80\\x00\\x00\\x01!\""]
5b82a9e3 | ["02a108 | r | relation | STRONG_READ | users | null
As a consequence of it, simply adding the index in YugabyteDB doesn't reduce the wait. If we want to minimize the latency for this use case, it is preferable to define the username as a primary key rather than the generated identifier:
create table users(
id serial not null unique,
username varchar not null primary key
);
In PostgreSQL, table rows are stored in heap tables, and all indexes, including the primary key, are considered secondary indexes. However, in YugabyteDB, the rows are stored in the primary key LSM Tree, so defining the primary key for the main access pattern is essential. There are no limitations because all secondary indexes are global and shared on their key.
Another reason the primary key matters in YugabyteDB is to improve serializable locks. In PostgreSQL, even if the scan predicates define the read intents, they are stored in memory. On the other hand, YugabyteDB is distributed and doesn't use a single-node shared memory, so it stores the intents with the table row in the primary key index.
With such a definition, both transactions can be committed without waiting for the other:
COMMIT
Time: 147.917 ms
yugabyte=*# commit;
COMMIT
Here are the locks just before the commits:
transactionid | blocked_by | relkind | locktype | mode | relname | ?column?
---------------+------------+---------+----------+----------------------+--------------+-----------------
350a076c | | r | relation | WEAK_READ,WEAK_WRITE | users | null
350a076c | | r | relation | WEAK_READ | users | null
350a076c | | r | row | STRONG_WRITE | users | ["\"alice\""]
350a076c | | r | row | STRONG_READ | users | ["\"alice\""]
350a076c | | r | row | STRONG_READ | users | ["\"alice\""]
350a076c | | i | relation | WEAK_READ,WEAK_WRITE | users_id_key | null
350a076c | | i | keyrange | WEAK_READ,WEAK_WRITE | users_id_key | ["1"]
350a076c | | i | row | STRONG_WRITE | users_id_key | ["1", "null"]
350a076c | | i | row | STRONG_READ | users_id_key | ["1", "null"]
97f5bd76 | | r | relation | WEAK_READ,WEAK_WRITE | users | null
97f5bd76 | | r | relation | WEAK_READ | users | null
97f5bd76 | | r | row | STRONG_WRITE | users | ["\"bob\""]
97f5bd76 | | r | row | STRONG_READ | users | ["\"bob\""]
97f5bd76 | | r | row | STRONG_READ | users | ["\"bob\""]
97f5bd76 | | i | relation | WEAK_READ,WEAK_WRITE | users_id_key | null
97f5bd76 | | i | keyrange | WEAK_READ,WEAK_WRITE | users_id_key | ["101"]
97f5bd76 | | i | row | STRONG_WRITE | users_id_key | ["101", "null"]
97f5bd76 | | i | row | STRONG_READ | users_id_key | ["101", "null"]
This situation does not involve any blocking. The STRONG_READ intents only affect individual rows because each transaction accesses different rows.
The serializable isolation level can be scalable. However, like many other aspects of SQL database performance, its performance depends on the definition of primary keys and secondary indexes. In SQL, "serializable" doesn't mean "serialized". The example above is serializable (with the same effect as if serialized), but the two transactions can run concurrently.
Top comments (2)
Thanks for this article as well, Franck 👏🏻
At this moment, I am digesting this sentence below because it sounds essential:
Now, besides the retry logic the application has to deal with, we developers must pay attention to how to design the schema to take the best out of the SERIALIZABLE isolation level in each database.
If you have other examples about this specific subject, please write new blog posts about it 🙏🏻
By the way, the Postgres documentation is AMAZING! It has some sentences about those issues with SERIALIZABLE isolation level well documented:
postgresql.org/docs/current/transa...