Dwi Wahyudi
Senior Software Engineer (Ruby, Golang, Java)
Isolation level is a feature to help us design better concurrency control in our application especially at the database level.
Overview
Previously we’ve covered another feature called row-level locking. Row-level locking is an obligatory feature to use when we want to design a safe concurrent application, preventing race-conditions.
Isolation level is another feature offered by common RDBMS, it is provided to make sure reading the same rows/records are consistent between each statement (not changed) by other concurrently running transactions.
In a concurrent application, especially at the database level, multiple transactions may access the same data at the same time (before commit/rollback), it is up to developers to control how much isolation we need between the transactions.
There are 4 effects that can be caused by concurrently running transactions not yet committed/rolled-back.
Dirty Read
Dirty read happens when a transaction is reading a data not yet committed from other transaction.
Connection A | Connection B |
---|---|
Begin Transaction | Begin Transaction |
Update wallets id 3 balance from 2000 to 3000 | |
Application still calculating, transaction not committed/rolled-back | Select wallets id = 3, get value 3000 |
Something wrong happened, rolled back | Processing data with value 3000 |
Connection A is rolled-back but yet Connection B is still using 3000 as the value (from the rolled-back transaction).
The isolation level for this to happen is read uncommitted.
Read committed isolation level fixes this, as this isolation level never sees either uncommitted data or changes committed by concurrent transactions during the query’s execution.
Dirty read is allowed in some other RDBMS, but not in PostgreSQL, so there’s no read uncommitted isolation in PostgreSQL.
Dirty read is only between different transactions, a transaction can see its own updated values (inside its own transaction scope).
Nonrepeatable Read
Nonrepeatable read happens when a transaction is re-reading data and finds it has been changed by another (committed) transaction.
Connection A | Connection B |
---|---|
Begin Transaction | Begin Transaction |
Select wallets id = 3, get value 2000 | Select wallets id = 3, get value 2000 |
Update wallets id 3 balance from 2000 to 3000 | |
Commit | |
Select wallets id = 3, get value 3000 |
Here we see that Connection B is finding out the second time it reads wallets id 3, the value is different.
The strictest isolation level for this to happen is read committed. This is the default isolation level in PostgreSQL.
Repeatable read isolation level fixes this, as PostgreSQL creates an internal snapshot to store the data before the transaction begins. Executing the same query will return the same data.
Phantom Read
Phantom read involves queries that return multiple rows/records, it happens when a transaction re-running a query (with the same WHERE
and or HAVING
conditions) returning a different set of rows, this is caused by another committed transaction.
Connection A | Connection B |
---|---|
Begin Transaction | Begin Transaction |
Select wallets, where value > 2500, returns id 3, 4, 5, 6 | |
Bulk Update wallets value for 10 rows | |
Commit | |
Select wallets, where value > 2500, returns id 3, 7, 8, 11, 19 |
As we can see here, the phantom reads is happening when Connection B runs the same query twice, turns out rows satisfying the query result has changed.
The strictest isolation level for this to happen is repeatable read.
Serialization anomaly isolation level fixes this, as PostgreSQL will force the transaction to run one by one.
This effect however is not possible in PostgreSQL.
Serialization Anomaly
Serialization anomaly happens when 2 transactions are affecting each other data that can cause wrong data to be processed in both transactions or even worse, some lost update may happen too. There might be duplication issue as well caused by this serialization.
Let’s say we have a table consisting of transaction journal. It has these fields/columns:
- User ID
- Amount (decimal data type, positive or negative)
- Timestamp
- other fields.
It records the money in and money of an account. Each night we want to disburse the money to the account holder, making the whole sum 0 again by creating the negative amount of the whole sum. In the application code we can just skip (and rollback) the disbursing process if amount is 0.
Connection A | Connection B |
---|---|
Begin Transaction | Begin Transaction |
Select sum(amount) from journals where user_id = ?, gets 300k | Select sum(amount) from journals where user_id = ?, gets 300k |
if sum < 10000 rollback | if sum < 10000 rollback |
Create a new disburse journal -300k | Create a new disburse journal -300k |
Commit | |
Commit |
With this particular effect, we can see that these 2 connections create duplicate journals. This 2 transactions operate under repeatable read isolation level, so querying with the same WHERE
query resulting in the same rows. Connection B doesn’t know about the new field committed by connection A, phantom read didn’t happen but serialization anomaly is happening and causing a bug, the user’s journal having twice negative 300k value.
If the final state depends on the order of running and committing these transactions, it is serialization anomaly.
Another example of this effect (this is a bad query anyway, don’t try this, this is made only for the sake of example):
Connection A | Connection B |
---|---|
Begin Transaction | Begin Transaction |
update journal set amount = 300 where user_id = ? | update journal set amount = 500 where user_id = ? (blocked by FOR NO KEY UPDATE) |
Commit | Lock from connection A released |
Commit |
The end result will be 500. The update to 300 is lost.
Using serializable isolation level, only 1 transaction will commit, the other will throw an error.
This isolation level is the strictest level, it is the best isolation level to guarantee the data consistency, but it is also the slowest, because this isolation level will process only 1 transaction at a time.
Implementation in PostgreSQL
We can adjust the isolation per transaction like this:
BEGIN ISOLATION LEVEL <isolation_level>;
statements
COMMIT;
Globally we can do something like this:
ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;
Or changes it in postgresql.conf
and restart the server.
We can then confirm the change by executing SHOW default_transaction_isolation;