SQL Transaction Anomalies — Concurrency Problems in Databases

SQL Transaction Anomalies — Concurrency Problems in Databases

A comprehensive guide to the 5 anomalies that occur when multiple transactions run concurrently, with real-world examples and solutions.

1. Dirty Read
2. Non-repeatable Read
3. Phantom Read
4. Lost Update
5. Write Skew
6. Isolation Levels Summary

1. Dirty Read

What is it?

Transaction A reads data that Transaction B has modified but not yet committed. If B rolls back, A has been working with a value that never officially existed — known as “phantom data”.

Important: At the default isolation level (`READ COMMITTED`), A still sees the original value — B's uncommitted changes are invisible. Dirty reads only occur when using `READ UNCOMMITTED`.

Real-world Example

Alice's account has `balance = 100`. B is transferring money and changes it to 200 but hasn't committed yet.

Step	Txn A	Txn B	DB (balance)
1		Modify 100 → 200 (not committed)	100 (B holds: 200)
2	Read → sees 200 (dirty!)		100
3		ROLLBACK → reverts to 100	100
4	A is using 200 — wrong! 200 never officially existed		100

Why is it wrong?

B rolled back, so 200 was erased from history. But A already read and used 200 — A is working with invalid data.

If B committed instead of rolling back? → No problem at all. 200 becomes the official value.

How to Fix

Solution: Use `READ COMMITTED` (default in PostgreSQL, SQL Server)

-- PostgreSQL / SQL Server: already READ COMMITTED by default
-- MySQL: set manually if needed
 
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
 
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;
  -- Always returns committed values, never dirty
COMMIT;

At `READ COMMITTED`, A always sees 100 even though B is holding 200 in an uncommitted transaction.

2. Non-repeatable Read

What is it?

Within the same transaction, A reads the same row twice — but gets different results because B updated or deleted that row and committed between A's two reads.

Comparison with Phantom Read

	Non-repeatable Read	Phantom Read
What is read?	Same single row	Same set of rows by condition
Cause	B UPDATE/DELETE the row	B INSERT/DELETE a new row
Result	Row's value changes	Number of rows changes

Real-world Example

A is generating an audit report and needs to read the balance twice to verify.

Step	Txn A (auditing)	Txn B (transfer)	DB (balance)
1	SELECT balance WHERE id=1 → 1000		1000
2		UPDATE balance=500 WHERE id=1, COMMIT	500
3	SELECT balance WHERE id=1 → 500 😱		500
4	Same transaction, same row — two different results!		—

Why is it wrong?

A is compiling a financial report — reads 1000 to calculate totals, then reads again to verify and sees 500. The report is completely wrong within the same transaction.

How to Fix

Solution: Use `REPEATABLE READ` (default in MySQL)

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 
  SELECT balance FROM accounts WHERE id = 1;
  -- → 1000
 
  -- B updates and commits here
 
  SELECT balance FROM accounts WHERE id = 1;
  -- → still 1000, snapshot unchanged ✓
 
COMMIT;

`REPEATABLE READ` guarantees: a row read within a transaction will always return the same value as the first read — even if B updates or deletes it in between.

3. Phantom Read

What is it?

Within the same transaction, A executes the same query twice — but gets a different number of rows because B inserted or deleted a row matching the query's condition and committed in between.

Real-world Example

A is generating a report of orders with amount greater than 100.

Step	Txn A (reporting)	Txn B	DB (orders)
1	SELECT WHERE amount > 100 → 3 rows		3 rows
2		INSERT order amount=500, COMMIT	4 rows
3	SELECT WHERE amount > 100 → 4 rows 😱		4 rows
4	Same transaction, same query — two different results!		—

Why is it wrong?

The 4th row is a “phantom” — it appeared mid-transaction. A is compiling a financial report and reads twice in the same transaction — the figures are inconsistent.

How to Fix

Option 1: SERIALIZABLE (strongest)

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
 
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100;
  -- 1st read: 3 rows, total 1500
 
  -- B inserts and commits here
 
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100;
  -- 2nd read: still 3 rows, total 1500 — consistent ✓
 
COMMIT;

Option 2: SELECT FOR SHARE (range lock)

BEGIN;
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100
  FOR SHARE;
  -- Locks the entire range of matching rows
  -- B cannot INSERT into this range until A commits
 
COMMIT;
-- Only after this can B insert

Option 3: Snapshot Isolation / MVCC (PostgreSQL)

-- PostgreSQL: REPEATABLE READ uses MVCC
-- Each transaction sees a snapshot from the moment BEGIN was called
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 
  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → 3 rows (snapshot created here)
 
  -- B inserts and commits — A cannot see it, not in snapshot
 
  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → still 3 rows ✓
 
COMMIT;

4. Lost Update

What is it?

Two transactions both read the same value, both modify it, and one overwrites the other's result — one update is silently lost.

Unlike dirty read: B has committed properly, but the data is still wrong because A read too early and overwrote B's result.

Real-world Example

A warehouse has `qty = 10`. A sells 2 items, B sells 3 items — at the same time.

Step	Txn A	Txn B	DB (qty)
1	Read → 10	Read → 10	10
2		10 − 3 = 7 → write 7, commit	7
3	10 − 2 = 8 → write 8, commit		8 ❌
4	Correct result should be 5 (10−3−2) — B's deduction of 3 is lost		—

Why is it wrong?

A read 10 before B committed. When A writes, it calculates from 10 (the old value) instead of 7 (the latest value) — completely overwriting B's result.

How to Fix

Option 1: Atomic Update — simplest ✅

-- WRONG: read value then calculate in application code → race condition
SELECT qty FROM products WHERE id = 1;   -- → 10
UPDATE products SET qty = 10 - 2 WHERE id = 1;
 
-- CORRECT: combine into one atomic SQL statement
UPDATE products
SET qty = qty - 2          -- database handles read+calculate+write atomically
WHERE id = 1;

When to use: When you only need simple arithmetic (add/subtract). No need to read the value in application code first.

Limitation: Cannot be used when you need to inspect the value first (e.g., checking if stock is sufficient before deducting).

Option 2: SELECT FOR UPDATE — most common

-- Txn A: lock the row on read
BEGIN;
  SELECT qty FROM products
  WHERE id = 1
  FOR UPDATE;           -- lock this row, B must wait
 
  -- check if qty is sufficient
  -- if yes, proceed with UPDATE
  UPDATE products
  SET qty = qty - 2
  WHERE id = 1;
COMMIT;                 -- B is now allowed to proceed
 
-- Txn B also does SELECT FOR UPDATE on the same row
-- → B is blocked until A commits
-- → B reads qty=8 (already deducted by A) → calculates correctly

When to use: When you need to read first and check business logic (e.g., verify balance or stock before writing).

Watch out for deadlocks: If A locks row 1 then needs row 2, and B locks row 2 then needs row 1 → deadlock. Always lock rows in a consistent order.

Option 3: Optimistic Lock (version column)

-- Schema: add a version column
ALTER TABLE products ADD COLUMN version INT DEFAULT 0;
 
-- Step 1: read with version
SELECT qty, version FROM products WHERE id = 1;
-- → qty=10, version=5
 
-- Step 2: write with version condition
UPDATE products
SET
  qty     = qty - 2,
  version = version + 1       -- increment version
WHERE
  id      = 1
  AND version = 5;            -- only update if version is still 5
 
-- rowsAffected = 1 → success
-- rowsAffected = 0 → someone else updated first, version changed → retry

# Retry logic in Python
def update_with_optimistic_lock(conn, product_id, qty_change):
    MAX_RETRY = 3
 
    for attempt in range(MAX_RETRY):
        with conn.cursor() as cur:
            # Step 1: read with version
            cur.execute(
                "SELECT qty, version FROM products WHERE id = %s",
                (product_id,)
            )
            row = cur.fetchone()
            qty, version = row[0], row[1]
 
            # Step 2: business logic check
            if qty < qty_change:
                raise Exception("Insufficient stock")
 
            # Step 3: write with version condition
            cur.execute(
                """UPDATE products
                   SET qty = qty - %s, version = version + 1
                   WHERE id = %s AND version = %s""",
                (qty_change, product_id, version)
            )
 
            if cur.rowcount == 1:
                conn.commit()
                print(f"Success after {attempt + 1} attempt(s)")
                return  # success
            else:
                conn.rollback()
                print(f"Conflict on attempt {attempt + 1}, retrying...")
 
    raise Exception("Failed after 3 retries")

// Retry logic in Node.js
async function updateWithOptimisticLock(pool, productId, qtyChange) {
  const MAX_RETRY = 3;
 
  for (let attempt = 0; attempt < MAX_RETRY; attempt++) {
    const client = await pool.connect();
    try {
      // Step 1: read with version
      const { rows } = await client.query(
        'SELECT qty, version FROM products WHERE id = $1',
        [productId]
      );
      const { qty, version } = rows[0];
 
      // Step 2: business logic check
      if (qty < qtyChange) throw new Error('Insufficient stock');
 
      // Step 3: write with version condition
      const result = await client.query(
        `UPDATE products
         SET qty = qty - $1, version = version + 1
         WHERE id = $2 AND version = $3`,
        [qtyChange, productId, version]
      );
 
      if (result.rowCount === 1) {
        await client.query('COMMIT');
        console.log(`Success after ${attempt + 1} attempt(s)`);
        return; // success
      }
      // rowCount = 0 → conflict → loop continues and retries
    } finally {
      client.release();
    }
  }
  throw new Error('Failed after 3 retries');
}

When to use: When conflicts are rare (few people modifying the same row at once). Great for REST APIs and high-concurrency systems.

Limitation: You must write retry logic yourself. If conflicts happen frequently, repeated retries become less efficient than `SELECT FOR UPDATE`.

Comparison of the 3 options

	Atomic Update	SELECT FOR UPDATE	Optimistic Lock
Real lock?	No	Yes	No
Retry needed?	No	No	Yes (manual)
Deadlock risk?	No	Possible	No
Best for	Simple add/subtract	Need to check before write	Low conflict, many users

5. Write Skew

What is it?

The most subtle anomaly. A and B both read data, both check a shared condition — both see the condition is satisfied — both write to different rows. The result violates the condition, but neither transaction knows.

Difference from Lost Update: Lost update is two transactions writing to the same row. Write skew is two transactions writing to different rows but corrupting a shared invariant.

Example 1 — Hospital: at least 1 doctor must be on call

Rule: total number of on-call doctors must be ≥ 1. Currently 2 doctors are on call: Alice and Bob. Both request to go off call at the same time.

Step	Txn A (Alice goes off call)	Txn B (Bob goes off call)	on_call
1	SELECT COUNT(*) WHERE on_call=true → 2		Alice✓ Bob✓
2		SELECT COUNT(*) WHERE on_call=true → 2	Alice✓ Bob✓
3	count=2 ≥ 1 → allowed. UPDATE Alice on_call=false		Alice✗ Bob✓
4		count=2 ≥ 1 → allowed. UPDATE Bob on_call=false	Alice✗ Bob✗
5	No doctors on call! Rule violated!		0 ❌

Example 2 — Meeting room double booking

Rule: a room cannot be booked twice at the same time slot.

Step	Txn A (person 1)	Txn B (person 2)	bookings
1	SELECT COUNT(*) room 101 at 2pm → 0		empty
2		SELECT COUNT(*) room 101 at 2pm → 0	empty
3	0 = available → INSERT booking A, commit		1 booking
4		0 = available → INSERT booking B, commit	2 bookings ❌
5	Room 101 at 2pm is double-booked! Nobody knows!		—

Why is it wrong?

Each transaction looks correct in isolation — it only breaks when viewed as a whole. A and B write to different rows (Alice vs Bob, booking A vs booking B), so neither blocks the other.

How to Fix

Option 1: SERIALIZABLE (handles all write skew cases)

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
 
  SELECT COUNT(*) FROM doctors WHERE on_call = TRUE;
  -- database tracks this dependency
  -- if conflict detected → throws serialization failure error → needs retry
 
  UPDATE doctors SET on_call = FALSE WHERE id = 1;
 
COMMIT;

Option 2: SELECT FOR UPDATE (lock all rows involved in the condition)

BEGIN;
  -- Lock ALL on-call rows — B is blocked here
  SELECT COUNT(*) FROM doctors
  WHERE on_call = TRUE
  FOR UPDATE;
 
  -- Re-check after locking
  -- Only UPDATE if count is still > 1
  UPDATE doctors SET on_call = FALSE WHERE id = 1;
COMMIT;

Option 3: UNIQUE constraint (for booking-style problems)

-- Add a constraint to prevent duplicates
ALTER TABLE bookings
ADD CONSTRAINT unique_room_time
UNIQUE (room_id, start_time);
 
-- → Database automatically rejects duplicates and throws an error
-- No need to worry about transaction isolation level

When to use each option:

`SERIALIZABLE` — when the invariant is complex and cannot be expressed as a UNIQUE constraint
`SELECT FOR UPDATE` — when you know exactly which rows to lock
`UNIQUE constraint` — when the problem can be modeled as a unique key (simplest and most efficient)

6. Isolation Levels Summary

Which anomalies occur at each level?

Anomaly	READ UNCOMMITTED	READ COMMITTED	REPEATABLE READ	SERIALIZABLE
Dirty read	❌ Yes	✅ No	✅ No	✅ No
Non-repeatable read	❌ Yes	❌ Yes	✅ No	✅ No
Phantom read	❌ Yes	❌ Yes	❌ Yes*	✅ No
Lost update	❌ Yes	❌ Yes	❌ Yes

MySQL vs PostgreSQL defaults

Anomaly	MySQL (REPEATABLE READ)	PostgreSQL (READ COMMITTED)
Dirty read	✅ No	✅ No
Non-repeatable read	✅ No	❌ Possible
Phantom read	✅ No*	❌ Possible
Lost update	❌ Possible	❌ Possible
Write skew	❌ Possible	❌ Possible

Both MySQL and PostgreSQL defaults do not protect against Lost Update and Write Skew — these must be handled in your application code.

How to set Isolation Level

-- For a single transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
  -- SQL statements
COMMIT;
 
-- For the entire session (PostgreSQL)
SET default_transaction_isolation = 'repeatable read';
 
-- For the entire session (MySQL)
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

Practical recommendations

Situation	Recommended approach
Typical web application	`READ COMMITTED` (default) is sufficient
Reports and analytics	`REPEATABLE READ`
Transfers, stock deduction	`READ COMMITTED` + `SELECT FOR UPDATE`
Finance and banking	`SERIALIZABLE` or `SELECT FOR UPDATE`
Simple increments/decrements	Atomic update (`SET qty = qty - 1`)
Many users, low conflict rate	Optimistic lock (version column)

Rule of thumb: Don't blindly raise to `SERIALIZABLE` — it significantly impacts performance. Most web applications are fine with `READ COMMITTED` + `SELECT FOR UPDATE` where needed.

Document compiled from a conversation about SQL Transaction Anomalies.

Table of Contents

SQL Transaction Anomalies — Concurrency Problems in Databases

Table of Contents

1. Dirty Read

What is it?

Real-world Example

Why is it wrong?

How to Fix

2. Non-repeatable Read

What is it?

Comparison with Phantom Read

Real-world Example

Why is it wrong?

How to Fix

3. Phantom Read

What is it?

Real-world Example

Why is it wrong?

How to Fix

Option 1: SERIALIZABLE (strongest)

Option 2: SELECT FOR SHARE (range lock)

Option 3: Snapshot Isolation / MVCC (PostgreSQL)

4. Lost Update

What is it?

Real-world Example

Why is it wrong?

How to Fix

Option 1: Atomic Update — simplest ✅

Option 2: SELECT FOR UPDATE — most common

Option 3: Optimistic Lock (version column)

Comparison of the 3 options

5. Write Skew

What is it?

Example 1 — Hospital: at least 1 doctor must be on call

Example 2 — Meeting room double booking

Why is it wrong?

How to Fix

Option 1: SERIALIZABLE (handles all write skew cases)

Option 2: SELECT FOR UPDATE (lock all rows involved in the condition)

Option 3: UNIQUE constraint (for booking-style problems)

6. Isolation Levels Summary

Which anomalies occur at each level?

MySQL vs PostgreSQL defaults

How to set Isolation Level

Practical recommendations