Dwi Wahyudi

Senior Software Engineer (Ruby, Golang, Java)


Row-level locking is an obligatory feature to use in high-concurrency application to prevent prevalent race-condition issues. This article will cover it briefly.

Overview

We’ve covered similar topic in Go mutex article: Golang Mutex. Except this time, we’ll do locking in database level.

Concurrency itself is a great tool for processing plenty of similar tasks at the same time. An application typically has multiple connection instances to database (configured with certain pooling systems). And of course there might be multiple applications using the same database as well (although this is not recommended).

Typical scenario will be like this: an application serves hundreds or even thousands of requests (from many users) at the same time, and in order to serve the requests, for each request, server will communicate with the database, opening and using plenty of database connections at the same time. Multiple update or delete requests to the same data in database will certainly cause race condition.

Race condition itself is an issue that emerges when 2 or more concurrent instances/threads/coroutine/processes, etc access and update the same data causing loss-update (which surely makes such data broken). This of course won’t happen in non-concurrent system (which is non-existent by today standard).

Here’s an example on how race condition may occur, lost update is happening:

Connection 1 Connection 2 Value
10000
read 10000
read 10000
+4000 10000
+4000 10000
write 14000
write 14000

The ending value should have been 18.000 not 14.000, our customer will file a complaint to us because he/she’s losing 4.000 in the process.

Implementing Row-Level Locking

Some developers will try to fix this problem by having queueing system and having topic (let say in Kafka) as the ordering mechanism so only 1 transaction is calculated and processed at one time, however this won’t forever work because we can’t guarantee if there’s any other process/database connection currently updating the same data at the same time. For this reason, row-level locking is needed so that we can perfectly guarantee that only 1 database connection is accessing and updating the data at one time, other database connection will have to wait. Only one database connection can hold the lock.

Update here also includes deletion. Race condition may involve updating row/record that’s being deleted. Locking prevents such thing to happen.

Connection 1 Connection 2 Value
10000
Try to get row-level lock 10000
Lock acquired 10000
Read Try to get row-level lock 10000
+4000 Waiting for Connection 1 10000
write (lock released) 14000
Lock acquired 14000
Read 14000
+4000 14000
write (lock released) 18000

As we can see, it is now 18.000 as expected. Under an explicit transaction block (BEGIN COMMIT/ROLLBACK) locking will be released by connection 1 once transaction is committed/rolled-back, if the operations is not under a transaction than locking will be released when the UPDATE statement completed.

Let’s see the example of this:

BEGIN;
  SELECT * FROM wallets WHERE id = ? FOR UPDATE;
  UPDATE wallets SET balance = balance + ? WHERE id = ?;

FOR UPDATE here will attempt to acquire row-level locking for the current transaction in the current connection. If there’s another transaction in another connection (no matter where) wants to lock the very same wallets row/record, it will need to wait until the row-level lock is released by previous transaction. Concurrent transaction/connection can access other wallets id as long as it is not locked, hence the name row-level locking.

We must design our transaction block to be as fast as possible so it won’t hold the lock for too long.

There are 4 modes provided by PostgreSQL: FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR KEY SHARE.

In summary:

  • FOR UPDATE is an obligatory feature if we want to update any row/record value.
  • FOR NO KEY UPDATE is acquired by UPDATE that doesn’t use FOR UPDATE and transaction explicitly (using autocommit transaction).
    • If there’s a row locked by FOR UPDATE, any UPDATE statement will be blocked (and wait).
    • FOR NO KEY UPDATE is used if we want to update columns that are non-unique, this also means to update columns that are not foreign keys.
  • FOR SHARE is useful when when we want to lock a row (while reading it), and no other transaction may update it.
  • FOR KEY SHARE is similar to FOR SHARE but it allows update only to non-key columns.

And the following rules apply:

  • FOR UPDATE blocks all of them and blocked by all of them.
    • Will be automatically acquired by DELETE and UPDATE that changes key-fields.
  • FOR UPDATE and FOR NO KEY UPDATE are exclusive locks. They block each other.
  • FOR SHARE and FOR KEY SHARE are shared locks. They won’t block each other.
  • FOR KEY SHARE and FOR NO KEY UPDATE won’t block each other. We may update non-key columns (in connection 1) and in the same time not allowing the key columns to be updated (in connection 2).
  • All of these modes will not block common SELECT statement. A row locked by SELECT FOR UPDATE can be SELECTed by other connections.
Requested Mode FOR KEY SHARE FOR SHARE FOR NO KEY UPDATE FOR UPDATE
FOR KEY SHARE Х
FOR SHARE Х Х
FOR NO KEY UPDATE Х Х Х
FOR UPDATE Х Х Х Х

Updating this value globally is not recommended, we can set it up locally.

BEGIN;
  SET LOCAL lock_timeout = '3s';
  SELECT * FROM wallets WHERE id = ? FOR UPDATE;
  UPDATE wallets SET balance = balance + ? WHERE id = ?;

PostgreSQL offers atomic update like this:

UPDATE wallets SET balance = balance - ? WHERE id = ?;

This will use FOR NO KEY UPDATE under the hood. We may be tempted to use this code because it’s simple, but

  • If we want to apply some business logic like checking if the balance is enough or more than some required limit, then selecting it first using FOR UPDATE is the way.
  • This is quite dangerous and can cause deadlock if the updates between transactions are interleaved.
Connection 1 Connection 2
UPDATE wallets SET balance = balance - ? WHERE id = 1;
UPDATE wallets SET balance = balance - ? WHERE id = 2;
UPDATE wallets SET balance = balance - ? WHERE id = 2; UPDATE wallets SET balance = balance - ? WHERE id = 1;

Remember that locks are released after commit/rollback. This will cause deadlock ! Not recommended.

We’ll need to lock both rows in the beginning of transaction using FOR UPDATE.

Now here comes another question, how long can connection 2 waits for connection 1 to complete? It is configured by lock_timeout setting, by default, it is 0, there is no timeout.

NOWAIT & SKIP LOCKED

If we don’t want the query to wait for lock to be released and just immediately throws error, we can use NO WAIT.

BEGIN;
  SELECT * FROM wallets WHERE id = ? FOR UPDATE NOWAIT;
  UPDATE wallets SET balance = balance + ? WHERE id = ?;

SKIP LOCKED can be used to just return unlocked rows. SKIP LOCKED is rarely used as it returns inconsistent result (if some rows are locked).