====== SQL Transaction Anomalies — Concurrency Problems in Databases ======
> A comprehensive guide to the 5 anomalies that occur when multiple transactions run concurrently, with real-world examples and solutions.
----
===== Table of Contents =====
- [[#1_dirty_read|1. Dirty Read]]
- [[#2_non-repeatable_read|2. Non-repeatable Read]]
- [[#3_phantom_read|3. Phantom Read]]
- [[#4_lost_update|4. Lost Update]]
- [[#5_write_skew|5. Write Skew]]
- [[#6_isolation_levels_summary|6. Isolation Levels Summary]]
----
===== 1. Dirty Read =====
==== What is it? ====
Transaction A reads data that Transaction B has **modified but not yet committed**. If B rolls back, A has been working with a value that never officially existed — known as "phantom data".
> **Important:** At the default isolation level (`READ COMMITTED`), A still sees the **original** value — B's uncommitted changes are invisible. Dirty reads only occur when using `READ UNCOMMITTED`.
==== Real-world Example ====
Alice's account has `balance = 100`. B is transferring money and changes it to 200 but hasn't committed yet.
^ Step ^ Txn A ^ Txn B ^ DB (balance) ^
| 1 | | Modify 100 → 200 (not committed) | 100 *(B holds: 200)* |
| 2 | Read → sees **200** (dirty!) | | 100 |
| 3 | | **ROLLBACK** → reverts to 100 | 100 |
| 4 | A is using 200 — **wrong!** 200 never officially existed | | 100 |
==== Why is it wrong? ====
B rolled back, so 200 was erased from history. But A already read and used 200 — A is working with invalid data.
**If B committed instead of rolling back?** → No problem at all. 200 becomes the official value.
==== How to Fix ====
**Solution: Use `READ COMMITTED` (default in PostgreSQL, SQL Server)**
-- PostgreSQL / SQL Server: already READ COMMITTED by default
-- MySQL: set manually if needed
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT balance FROM accounts WHERE id = 1;
-- Always returns committed values, never dirty
COMMIT;
At `READ COMMITTED`, A always sees **100** even though B is holding 200 in an uncommitted transaction.
----
===== 2. Non-repeatable Read =====
==== What is it? ====
Within the same transaction, A reads **the same row** twice — but gets **different results** because B **updated or deleted** that row and committed between A's two reads.
==== Comparison with Phantom Read ====
^ ^ Non-repeatable Read ^ Phantom Read ^
| What is read? | Same **single row** | Same **set of rows** by condition |
| Cause | B **UPDATE/DELETE** the row | B **INSERT/DELETE** a new row |
| Result | Row's value changes | Number of rows changes |
==== Real-world Example ====
A is generating an audit report and needs to read the balance twice to verify.
^ Step ^ Txn A (auditing) ^ Txn B (transfer) ^ DB (balance) ^
| 1 | SELECT balance WHERE id=1 → **1000** | | 1000 |
| 2 | | UPDATE balance=500 WHERE id=1, **COMMIT** | 500 |
| 3 | SELECT balance WHERE id=1 → **500** 😱 | | 500 |
| 4 | Same transaction, same row — **two different results!** | | — |
==== Why is it wrong? ====
A is compiling a financial report — reads 1000 to calculate totals, then reads again to verify and sees 500. The report is completely wrong within the same transaction.
==== How to Fix ====
**Solution: Use `REPEATABLE READ` (default in MySQL)**
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;
-- → 1000
-- B updates and commits here
SELECT balance FROM accounts WHERE id = 1;
-- → still 1000, snapshot unchanged ✓
COMMIT;
`REPEATABLE READ` guarantees: a row read within a transaction will always return the same value as the first read — even if B updates or deletes it in between.
----
===== 3. Phantom Read =====
==== What is it? ====
Within the same transaction, A executes **the same query** twice — but gets **a different number of rows** because B **inserted or deleted** a row matching the query's condition and committed in between.
==== Real-world Example ====
A is generating a report of orders with amount greater than 100.
^ Step ^ Txn A (reporting) ^ Txn B ^ DB (orders) ^
| 1 | SELECT WHERE amount > 100 → **3 rows** | | 3 rows |
| 2 | | INSERT order amount=500, **COMMIT** | 4 rows |
| 3 | SELECT WHERE amount > 100 → **4 rows** 😱 | | 4 rows |
| 4 | Same transaction, same query — **two different results!** | | — |
==== Why is it wrong? ====
The 4th row is a "phantom" — it appeared mid-transaction. A is compiling a financial report and reads twice in the same transaction — the figures are inconsistent.
==== How to Fix ====
===== Option 1: SERIALIZABLE (strongest) =====
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT COUNT(*), SUM(amount)
FROM orders
WHERE amount > 100;
-- 1st read: 3 rows, total 1500
-- B inserts and commits here
SELECT COUNT(*), SUM(amount)
FROM orders
WHERE amount > 100;
-- 2nd read: still 3 rows, total 1500 — consistent ✓
COMMIT;
===== Option 2: SELECT FOR SHARE (range lock) =====
BEGIN;
SELECT COUNT(*), SUM(amount)
FROM orders
WHERE amount > 100
FOR SHARE;
-- Locks the entire range of matching rows
-- B cannot INSERT into this range until A commits
COMMIT;
-- Only after this can B insert
===== Option 3: Snapshot Isolation / MVCC (PostgreSQL) =====
-- PostgreSQL: REPEATABLE READ uses MVCC
-- Each transaction sees a snapshot from the moment BEGIN was called
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM orders WHERE amount > 100;
-- → 3 rows (snapshot created here)
-- B inserts and commits — A cannot see it, not in snapshot
SELECT COUNT(*) FROM orders WHERE amount > 100;
-- → still 3 rows ✓
COMMIT;
----
===== 4. Lost Update =====
==== What is it? ====
Two transactions both read **the same value**, both modify it, and one **overwrites the other's result** — one update is silently lost.
Unlike dirty read: B has **committed** properly, but the data is still wrong because A read too early and overwrote B's result.
==== Real-world Example ====
A warehouse has `qty = 10`. A sells 2 items, B sells 3 items — at the same time.
^ Step ^ Txn A ^ Txn B ^ DB (qty) ^
| 1 | Read → **10** | Read → **10** | 10 |
| 2 | | 10 − 3 = 7 → write **7**, commit | 7 |
| 3 | 10 − 2 = 8 → write **8**, commit | | **8** ❌ |
| 4 | Correct result should be **5** (10−3−2) — B's deduction of 3 is **lost** | | — |
==== Why is it wrong? ====
A read 10 before B committed. When A writes, it calculates from 10 (the old value) instead of 7 (the latest value) — completely overwriting B's result.
==== How to Fix ====
===== Option 1: Atomic Update — simplest ✅ =====
-- WRONG: read value then calculate in application code → race condition
SELECT qty FROM products WHERE id = 1; -- → 10
UPDATE products SET qty = 10 - 2 WHERE id = 1;
-- CORRECT: combine into one atomic SQL statement
UPDATE products
SET qty = qty - 2 -- database handles read+calculate+write atomically
WHERE id = 1;
**When to use:** When you only need simple arithmetic (add/subtract). No need to read the value in application code first.
**Limitation:** Cannot be used when you need to inspect the value first (e.g., checking if stock is sufficient before deducting).
----
===== Option 2: SELECT FOR UPDATE — most common =====
-- Txn A: lock the row on read
BEGIN;
SELECT qty FROM products
WHERE id = 1
FOR UPDATE; -- lock this row, B must wait
-- check if qty is sufficient
-- if yes, proceed with UPDATE
UPDATE products
SET qty = qty - 2
WHERE id = 1;
COMMIT; -- B is now allowed to proceed
-- Txn B also does SELECT FOR UPDATE on the same row
-- → B is blocked until A commits
-- → B reads qty=8 (already deducted by A) → calculates correctly
**When to use:** When you need to read first and check business logic (e.g., verify balance or stock before writing).
**Watch out for deadlocks:** If A locks row 1 then needs row 2, and B locks row 2 then needs row 1 → deadlock. Always lock rows in a consistent order.
----
===== Option 3: Optimistic Lock (version column) =====
-- Schema: add a version column
ALTER TABLE products ADD COLUMN version INT DEFAULT 0;
-- Step 1: read with version
SELECT qty, version FROM products WHERE id = 1;
-- → qty=10, version=5
-- Step 2: write with version condition
UPDATE products
SET
qty = qty - 2,
version = version + 1 -- increment version
WHERE
id = 1
AND version = 5; -- only update if version is still 5
-- rowsAffected = 1 → success
-- rowsAffected = 0 → someone else updated first, version changed → retry
# Retry logic in Python
def update_with_optimistic_lock(conn, product_id, qty_change):
MAX_RETRY = 3
for attempt in range(MAX_RETRY):
with conn.cursor() as cur:
# Step 1: read with version
cur.execute(
"SELECT qty, version FROM products WHERE id = %s",
(product_id,)
)
row = cur.fetchone()
qty, version = row[0], row[1]
# Step 2: business logic check
if qty < qty_change:
raise Exception("Insufficient stock")
# Step 3: write with version condition
cur.execute(
"""UPDATE products
SET qty = qty - %s, version = version + 1
WHERE id = %s AND version = %s""",
(qty_change, product_id, version)
)
if cur.rowcount == 1:
conn.commit()
print(f"Success after {attempt + 1} attempt(s)")
return # success
else:
conn.rollback()
print(f"Conflict on attempt {attempt + 1}, retrying...")
raise Exception("Failed after 3 retries")
// Retry logic in Node.js
async function updateWithOptimisticLock(pool, productId, qtyChange) {
const MAX_RETRY = 3;
for (let attempt = 0; attempt < MAX_RETRY; attempt++) {
const client = await pool.connect();
try {
// Step 1: read with version
const { rows } = await client.query(
'SELECT qty, version FROM products WHERE id = $1',
[productId]
);
const { qty, version } = rows[0];
// Step 2: business logic check
if (qty < qtyChange) throw new Error('Insufficient stock');
// Step 3: write with version condition
const result = await client.query(
`UPDATE products
SET qty = qty - $1, version = version + 1
WHERE id = $2 AND version = $3`,
[qtyChange, productId, version]
);
if (result.rowCount === 1) {
await client.query('COMMIT');
console.log(`Success after ${attempt + 1} attempt(s)`);
return; // success
}
// rowCount = 0 → conflict → loop continues and retries
} finally {
client.release();
}
}
throw new Error('Failed after 3 retries');
}
**When to use:** When conflicts are rare (few people modifying the same row at once). Great for REST APIs and high-concurrency systems.
**Limitation:** You must write retry logic yourself. If conflicts happen frequently, repeated retries become less efficient than `SELECT FOR UPDATE`.
----
===== Comparison of the 3 options =====
^ ^ Atomic Update ^ SELECT FOR UPDATE ^ Optimistic Lock ^
| Real lock? | No | Yes | No |
| Retry needed? | No | No | Yes (manual) |
| Deadlock risk? | No | Possible | No |
| Best for | Simple add/subtract | Need to check before write | Low conflict, many users |
----
===== 5. Write Skew =====
==== What is it? ====
The most subtle anomaly. A and B both read data, both check **a shared condition** — both see the condition is satisfied — both write to **different rows**. The result violates the condition, but neither transaction knows.
**Difference from Lost Update:** Lost update is two transactions writing to the **same row**. Write skew is two transactions writing to **different rows** but corrupting a shared invariant.
==== Example 1 — Hospital: at least 1 doctor must be on call ====
Rule: **total number of on-call doctors must be ≥ 1**. Currently 2 doctors are on call: Alice and Bob. Both request to go off call at the same time.
^ Step ^ Txn A (Alice goes off call) ^ Txn B (Bob goes off call) ^ on_call ^
| 1 | SELECT COUNT(*) WHERE on_call=true → **2** | | Alice✓ Bob✓ |
| 2 | | SELECT COUNT(*) WHERE on_call=true → **2** | Alice✓ Bob✓ |
| 3 | count=2 ≥ 1 → allowed. UPDATE Alice on_call=false | | Alice✗ Bob✓ |
| 4 | | count=2 ≥ 1 → allowed. UPDATE Bob on_call=false | Alice✗ Bob✗ |
| 5 | **No doctors on call! Rule violated!** | | **0** ❌ |
==== Example 2 — Meeting room double booking ====
Rule: a room cannot be booked twice at the same time slot.
^ Step ^ Txn A (person 1) ^ Txn B (person 2) ^ bookings ^
| 1 | SELECT COUNT(*) room 101 at 2pm → **0** | | empty |
| 2 | | SELECT COUNT(*) room 101 at 2pm → **0** | empty |
| 3 | 0 = available → INSERT booking A, commit | | 1 booking |
| 4 | | 0 = available → INSERT booking B, commit | **2 bookings** ❌ |
| 5 | **Room 101 at 2pm is double-booked! Nobody knows!** | | — |
==== Why is it wrong? ====
Each transaction looks correct in isolation — it only breaks when viewed as a whole. A and B write to different rows (Alice vs Bob, booking A vs booking B), so neither blocks the other.
==== How to Fix ====
===== Option 1: SERIALIZABLE (handles all write skew cases) =====
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE on_call = true;
-- database tracks this dependency
-- if conflict detected → throws serialization failure error → needs retry
UPDATE doctors SET on_call = false WHERE id = 1;
COMMIT;
===== Option 2: SELECT FOR UPDATE (lock all rows involved in the condition) =====
BEGIN;
-- Lock ALL on-call rows — B is blocked here
SELECT COUNT(*) FROM doctors
WHERE on_call = true
FOR UPDATE;
-- Re-check after locking
-- Only UPDATE if count is still > 1
UPDATE doctors SET on_call = false WHERE id = 1;
COMMIT;
===== Option 3: UNIQUE constraint (for booking-style problems) =====
-- Add a constraint to prevent duplicates
ALTER TABLE bookings
ADD CONSTRAINT unique_room_time
UNIQUE (room_id, start_time);
-- → Database automatically rejects duplicates and throws an error
-- No need to worry about transaction isolation level
**When to use each option:**
* `SERIALIZABLE` — when the invariant is complex and cannot be expressed as a UNIQUE constraint
* `SELECT FOR UPDATE` — when you know exactly which rows to lock
* `UNIQUE constraint` — when the problem can be modeled as a unique key (simplest and most efficient)
----
===== 6. Isolation Levels Summary =====
==== Which anomalies occur at each level? ====
^ Anomaly ^ READ UNCOMMITTED ^ READ COMMITTED ^ REPEATABLE READ ^ SERIALIZABLE ^
| Dirty read | ❌ Yes | ✅ No | ✅ No | ✅ No |
| Non-repeatable read | ❌ Yes | ❌ Yes | ✅ No | ✅ No |
| Phantom read | ❌ Yes | ❌ Yes | ❌ Yes* | ✅ No |
| Lost update | ❌ Yes | ❌ Yes | ❌ Yes | ✅ No** |
| Write skew | ❌ Yes | ❌ Yes | ❌ Yes | ✅ No |
> * MySQL `REPEATABLE READ` prevents phantom reads thanks to MVCC. PostgreSQL does not.
>
> ** `SERIALIZABLE` prevents lost updates but has significant overhead — in practice, use `FOR UPDATE` or atomic updates instead of raising the isolation level.
==== MySQL vs PostgreSQL defaults ====
^ Anomaly ^ MySQL (REPEATABLE READ) ^ PostgreSQL (READ COMMITTED) ^
| Dirty read | ✅ No | ✅ No |
| Non-repeatable read | ✅ No | ❌ Possible |
| Phantom read | ✅ No* | ❌ Possible |
| Lost update | ❌ Possible | ❌ Possible |
| Write skew | ❌ Possible | ❌ Possible |
> Both MySQL and PostgreSQL defaults **do not protect** against Lost Update and Write Skew — these must be handled in your application code.
==== How to set Isolation Level ====
-- For a single transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
-- SQL statements
COMMIT;
-- For the entire session (PostgreSQL)
SET default_transaction_isolation = 'repeatable read';
-- For the entire session (MySQL)
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
==== Practical recommendations ====
^ Situation ^ Recommended approach ^
| Typical web application | `READ COMMITTED` (default) is sufficient |
| Reports and analytics | `REPEATABLE READ` |
| Transfers, stock deduction | `READ COMMITTED` + `SELECT FOR UPDATE` |
| Finance and banking | `SERIALIZABLE` or `SELECT FOR UPDATE` |
| Simple increments/decrements | Atomic update (`SET qty = qty - 1`) |
| Many users, low conflict rate | Optimistic lock (version column) |
> **Rule of thumb:** Don't blindly raise to `SERIALIZABLE` — it significantly impacts performance. Most web applications are fine with `READ COMMITTED` + `SELECT FOR UPDATE` where needed.
----
''Document compiled from a conversation about SQL Transaction Anomalies.''