Dwi Wahyudi

Senior Software Engineer (Ruby, Golang, Java)


Isolation level is a feature to help us design better concurrency control in our application especially at the database level.

Overview

Previously we’ve covered another feature called row-level locking. Row-level locking is an obligatory feature to use when we want to design a safe concurrent application, preventing race-conditions.

Isolation level is another feature offered by common RDBMS, it is provided to make sure reading the same rows/records are consistent between each statement (not changed) by other concurrently running transactions.

In a concurrent application, especially at the database level, multiple transactions may access the same data at the same time (before commit/rollback), it is up to developers to control how much isolation we need between the transactions.

There are 4 effects that can be caused by concurrently running transactions not yet committed/rolled-back.

Dirty Read

Dirty read happens when a transaction is reading a data not yet committed from other transaction.

Connection A Connection B
Begin Transaction Begin Transaction
Update wallets id 3 balance from 2000 to 3000
Application still calculating, transaction not committed/rolled-back Select wallets id = 3, get value 3000
Something wrong happened, rolled back Processing data with value 3000

Connection A is rolled-back but yet Connection B is still using 3000 as the value (from the rolled-back transaction).

The isolation level for this to happen is read uncommitted.

Read committed isolation level fixes this, as this isolation level never sees either uncommitted data or changes committed by concurrent transactions during the query’s execution.

Dirty read is allowed in some other RDBMS, but not in PostgreSQL, so there’s no read uncommitted isolation in PostgreSQL.

Dirty read is only between different transactions, a transaction can see its own updated values (inside its own transaction scope).

Nonrepeatable Read

Nonrepeatable read happens when a transaction is re-reading data and finds it has been changed by another (committed) transaction.

Connection A Connection B
Begin Transaction Begin Transaction
Select wallets id = 3, get value 2000 Select wallets id = 3, get value 2000
Update wallets id 3 balance from 2000 to 3000
Commit
Select wallets id = 3, get value 3000

Here we see that Connection B is finding out the second time it reads wallets id 3, the value is different.

The strictest isolation level for this to happen is read committed. This is the default isolation level in PostgreSQL.

Repeatable read isolation level fixes this, as PostgreSQL creates an internal snapshot to store the data before the transaction begins. Executing the same query will return the same data.

Phantom Read

Phantom read involves queries that return multiple rows/records, it happens when a transaction re-running a query (with the same WHERE and or HAVING conditions) returning a different set of rows, this is caused by another committed transaction.

Connection A Connection B
Begin Transaction Begin Transaction
Select wallets, where value > 2500, returns id 3, 4, 5, 6
Bulk Update wallets value for 10 rows
Commit
Select wallets, where value > 2500, returns id 3, 7, 8, 11, 19

As we can see here, the phantom reads is happening when Connection B runs the same query twice, turns out rows satisfying the query result has changed.

The strictest isolation level for this to happen is repeatable read.

Serialization anomaly isolation level fixes this, as PostgreSQL will force the transaction to run one by one.

This effect however is not possible in PostgreSQL.

Serialization Anomaly

Serialization anomaly happens when 2 transactions are affecting each other data that can cause wrong data to be processed in both transactions or even worse, some lost update may happen too. There might be duplication issue as well caused by this serialization.

Let’s say we have a table consisting of transaction journal. It has these fields/columns:

  • User ID
  • Amount (decimal data type, positive or negative)
  • Timestamp
  • other fields.

It records the money in and money of an account. Each night we want to disburse the money to the account holder, making the whole sum 0 again by creating the negative amount of the whole sum. In the application code we can just skip (and rollback) the disbursing process if amount is 0.

Connection A Connection B
Begin Transaction Begin Transaction
Select sum(amount) from journals where user_id = ?, gets 300k Select sum(amount) from journals where user_id = ?, gets 300k
if sum < 10000 rollback if sum < 10000 rollback
Create a new disburse journal -300k Create a new disburse journal -300k
Commit
Commit

With this particular effect, we can see that these 2 connections create duplicate journals. This 2 transactions operate under repeatable read isolation level, so querying with the same WHERE query resulting in the same rows. Connection B doesn’t know about the new field committed by connection A, phantom read didn’t happen but serialization anomaly is happening and causing a bug, the user’s journal having twice negative 300k value.

If the final state depends on the order of running and committing these transactions, it is serialization anomaly.

Another example of this effect (this is a bad query anyway, don’t try this, this is made only for the sake of example):

Connection A Connection B
Begin Transaction Begin Transaction
update journal set amount = 300 where user_id = ? update journal set amount = 500 where user_id = ? (blocked by FOR NO KEY UPDATE)
Commit Lock from connection A released
Commit

The end result will be 500. The update to 300 is lost.

Using serializable isolation level, only 1 transaction will commit, the other will throw an error.

This isolation level is the strictest level, it is the best isolation level to guarantee the data consistency, but it is also the slowest, because this isolation level will process only 1 transaction at a time.

Implementation in PostgreSQL

We can adjust the isolation per transaction like this:

BEGIN ISOLATION LEVEL <isolation_level>;

statements

COMMIT;

Globally we can do something like this:

ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;

Or changes it in postgresql.conf and restart the server.

We can then confirm the change by executing SHOW default_transaction_isolation;