Dwi Wahyudi
Senior Software Engineer (Ruby, Golang, Java)
Row-level locking is an obligatory feature to use in high-concurrency application to prevent prevalent race-condition issues. This article will cover it briefly.
Overview
We’ve covered similar topic in Go mutex article: Golang Mutex. Except this time, we’ll do locking in database level.
Concurrency itself is a great tool for processing plenty of similar tasks at the same time. An application typically has multiple connection instances to database (configured with certain pooling systems). And of course there might be multiple applications using the same database as well (although this is not recommended).
Typical scenario will be like this: an application serves hundreds or even thousands of requests (from many users) at the same time, and in order to serve the requests, for each request, server will communicate with the database, opening and using plenty of database connections at the same time. Multiple update or delete requests to the same data in database will certainly cause race condition.
Race condition itself is an issue that emerges when 2 or more concurrent instances/threads/coroutine/processes, etc access and update the same data causing loss-update (which surely makes such data broken). This of course won’t happen in non-concurrent system (which is non-existent by today standard).
Here’s an example on how race condition may occur, lost update is happening:
Connection 1 | Connection 2 | Value |
---|---|---|
10000 | ||
read | 10000 | |
read | 10000 | |
+4000 | 10000 | |
+4000 | 10000 | |
write | 14000 | |
write | 14000 |
The ending value should have been 18.000 not 14.000, our customer will file a complaint to us because he/she’s losing 4.000 in the process.
Implementing Row-Level Locking
Some developers will try to fix this problem by having queueing system and having topic (let say in Kafka) as the ordering mechanism so only 1 transaction is calculated and processed at one time, however this won’t forever work because we can’t guarantee if there’s any other process/database connection currently updating the same data at the same time. For this reason, row-level locking is needed so that we can perfectly guarantee that only 1 database connection is accessing and updating the data at one time, other database connection will have to wait. Only one database connection can hold the lock.
Update here also includes deletion. Race condition may involve updating row/record that’s being deleted. Locking prevents such thing to happen.
Connection 1 | Connection 2 | Value |
---|---|---|
10000 | ||
Try to get row-level lock | 10000 | |
Lock acquired | 10000 | |
Read | Try to get row-level lock | 10000 |
+4000 | Waiting for Connection 1 | 10000 |
write (lock released) | 14000 | |
Lock acquired | 14000 | |
Read | 14000 | |
+4000 | 14000 | |
write (lock released) | 18000 |
As we can see, it is now 18.000 as expected. Under an explicit transaction block (BEGIN COMMIT/ROLLBACK
) locking will be released by connection 1 once transaction is committed/rolled-back, if the operations is not under a transaction than locking will be released when the UPDATE
statement completed.
Let’s see the example of this:
BEGIN;
SELECT * FROM wallets WHERE id = ? FOR UPDATE;
UPDATE wallets SET balance = balance + ? WHERE id = ?;
FOR UPDATE
here will attempt to acquire row-level locking for the current transaction in the current connection. If there’s another transaction in another connection (no matter where) wants to lock the very same wallets
row/record, it will need to wait until the row-level lock is released by previous transaction. Concurrent transaction/connection can access other wallets id as long as it is not locked, hence the name row-level locking.
We must design our transaction block to be as fast as possible so it won’t hold the lock for too long.
There are 4 modes provided by PostgreSQL: FOR UPDATE
, FOR NO KEY UPDATE
, FOR SHARE
, FOR KEY SHARE
.
In summary:
FOR UPDATE
is an obligatory feature if we want to update any row/record value.FOR NO KEY UPDATE
is acquired byUPDATE
that doesn’t useFOR UPDATE
and transaction explicitly (using autocommit transaction).- If there’s a row locked by
FOR UPDATE
, anyUPDATE
statement will be blocked (and wait). FOR NO KEY UPDATE
is used if we want to update columns that are non-unique, this also means to update columns that are not foreign keys.
- If there’s a row locked by
FOR SHARE
is useful when when we want to lock a row (while reading it), and no other transaction may update it.FOR KEY SHARE
is similar toFOR SHARE
but it allows update only to non-key columns.
And the following rules apply:
FOR UPDATE
blocks all of them and blocked by all of them.- Will be automatically acquired by
DELETE
andUPDATE
that changes key-fields.
- Will be automatically acquired by
FOR UPDATE
andFOR NO KEY UPDATE
are exclusive locks. They block each other.FOR SHARE
andFOR KEY SHARE
are shared locks. They won’t block each other.FOR KEY SHARE
andFOR NO KEY UPDATE
won’t block each other. We may update non-key columns (in connection 1) and in the same time not allowing the key columns to be updated (in connection 2).- All of these modes will not block common
SELECT
statement. A row locked bySELECT FOR UPDATE
can beSELECT
ed by other connections.
Requested Mode | FOR KEY SHARE | FOR SHARE | FOR NO KEY UPDATE | FOR UPDATE |
---|---|---|---|---|
FOR KEY SHARE | Х | |||
FOR SHARE | Х | Х | ||
FOR NO KEY UPDATE | Х | Х | Х | |
FOR UPDATE | Х | Х | Х | Х |
Updating this value globally is not recommended, we can set it up locally.
BEGIN;
SET LOCAL lock_timeout = '3s';
SELECT * FROM wallets WHERE id = ? FOR UPDATE;
UPDATE wallets SET balance = balance + ? WHERE id = ?;
PostgreSQL offers atomic update like this:
UPDATE wallets SET balance = balance - ? WHERE id = ?;
This will use FOR NO KEY UPDATE
under the hood. We may be tempted to use this code because it’s simple, but
- If we want to apply some business logic like checking if the balance is enough or more than some required limit, then selecting it first using
FOR UPDATE
is the way. - This is quite dangerous and can cause deadlock if the updates between transactions are interleaved.
Connection 1 | Connection 2 |
---|---|
UPDATE wallets SET balance = balance - ? WHERE id = 1; | |
UPDATE wallets SET balance = balance - ? WHERE id = 2; | |
UPDATE wallets SET balance = balance - ? WHERE id = 2; | UPDATE wallets SET balance = balance - ? WHERE id = 1; |
Remember that locks are released after commit/rollback. This will cause deadlock ! Not recommended.
We’ll need to lock both rows in the beginning of transaction using FOR UPDATE
.
Now here comes another question, how long can connection 2 waits for connection 1 to complete? It is configured by lock_timeout
setting, by default, it is 0, there is no timeout.
NOWAIT
& SKIP LOCKED
If we don’t want the query to wait for lock to be released and just immediately throws error, we can use NO WAIT
.
BEGIN;
SELECT * FROM wallets WHERE id = ? FOR UPDATE NOWAIT;
UPDATE wallets SET balance = balance + ? WHERE id = ?;
SKIP LOCKED
can be used to just return unlocked rows. SKIP LOCKED
is rarely used as it returns inconsistent result (if some rows are locked).