User Tools

Site Tools


mysql:tx:transaction-anomalies

SQL Transaction Anomalies — Concurrency Problems in Databases

A comprehensive guide to the 5 anomalies that occur when multiple transactions run concurrently, with real-world examples and solutions.

Table of Contents

1. Dirty Read

What is it?

Transaction A reads data that Transaction B has modified but not yet committed. If B rolls back, A has been working with a value that never officially existed — known as “phantom data”.

Important: At the default isolation level (`READ COMMITTED`), A still sees the original value — B's uncommitted changes are invisible. Dirty reads only occur when using `READ UNCOMMITTED`.

Real-world Example

Alice's account has `balance = 100`. B is transferring money and changes it to 200 but hasn't committed yet.

Step Txn A Txn B DB (balance)
1 Modify 100 → 200 (not committed) 100 *(B holds: 200)*
2 Read → sees 200 (dirty!) 100
3 ROLLBACK → reverts to 100 100
4 A is using 200 — wrong! 200 never officially existed 100

Why is it wrong?

B rolled back, so 200 was erased from history. But A already read and used 200 — A is working with invalid data.

If B committed instead of rolling back? → No problem at all. 200 becomes the official value.

How to Fix

Solution: Use `READ COMMITTED` (default in PostgreSQL, SQL Server)

-- PostgreSQL / SQL Server: already READ COMMITTED by default
-- MySQL: set manually if needed
 
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
 
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;
  -- Always returns committed values, never dirty
COMMIT;

At `READ COMMITTED`, A always sees 100 even though B is holding 200 in an uncommitted transaction.


2. Non-repeatable Read

What is it?

Within the same transaction, A reads the same row twice — but gets different results because B updated or deleted that row and committed between A's two reads.

Comparison with Phantom Read

Non-repeatable Read Phantom Read
What is read? Same single row Same set of rows by condition
Cause B UPDATE/DELETE the row B INSERT/DELETE a new row
Result Row's value changes Number of rows changes

Real-world Example

A is generating an audit report and needs to read the balance twice to verify.

Step Txn A (auditing) Txn B (transfer) DB (balance)
1 SELECT balance WHERE id=1 → 1000 1000
2 UPDATE balance=500 WHERE id=1, COMMIT 500
3 SELECT balance WHERE id=1 → 500 😱 500
4 Same transaction, same row — two different results!

Why is it wrong?

A is compiling a financial report — reads 1000 to calculate totals, then reads again to verify and sees 500. The report is completely wrong within the same transaction.

How to Fix

Solution: Use `REPEATABLE READ` (default in MySQL)

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 
  SELECT balance FROM accounts WHERE id = 1;
  -- → 1000
 
  -- B updates and commits here
 
  SELECT balance FROM accounts WHERE id = 1;
  -- → still 1000, snapshot unchanged ✓
 
COMMIT;

`REPEATABLE READ` guarantees: a row read within a transaction will always return the same value as the first read — even if B updates or deletes it in between.


3. Phantom Read

What is it?

Within the same transaction, A executes the same query twice — but gets a different number of rows because B inserted or deleted a row matching the query's condition and committed in between.

Real-world Example

A is generating a report of orders with amount greater than 100.

Step Txn A (reporting) Txn B DB (orders)
1 SELECT WHERE amount > 100 → 3 rows 3 rows
2 INSERT order amount=500, COMMIT 4 rows
3 SELECT WHERE amount > 100 → 4 rows 😱 4 rows
4 Same transaction, same query — two different results!

Why is it wrong?

The 4th row is a “phantom” — it appeared mid-transaction. A is compiling a financial report and reads twice in the same transaction — the figures are inconsistent.

How to Fix

Option 1: SERIALIZABLE (strongest)

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
 
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100;
  -- 1st read: 3 rows, total 1500
 
  -- B inserts and commits here
 
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100;
  -- 2nd read: still 3 rows, total 1500 — consistent ✓
 
COMMIT;

Option 2: SELECT FOR SHARE (range lock)

BEGIN;
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100
  FOR SHARE;
  -- Locks the entire range of matching rows
  -- B cannot INSERT into this range until A commits
 
COMMIT;
-- Only after this can B insert

Option 3: Snapshot Isolation / MVCC (PostgreSQL)

-- PostgreSQL: REPEATABLE READ uses MVCC
-- Each transaction sees a snapshot from the moment BEGIN was called
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 
  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → 3 rows (snapshot created here)
 
  -- B inserts and commits — A cannot see it, not in snapshot
 
  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → still 3 rows ✓
 
COMMIT;

4. Lost Update

What is it?

Two transactions both read the same value, both modify it, and one overwrites the other's result — one update is silently lost.

Unlike dirty read: B has committed properly, but the data is still wrong because A read too early and overwrote B's result.

Real-world Example

A warehouse has `qty = 10`. A sells 2 items, B sells 3 items — at the same time.

Step Txn A Txn B DB (qty)
1 Read → 10 Read → 10 10
2 10 − 3 = 7 → write 7, commit 7
3 10 − 2 = 8 → write 8, commit 8
4 Correct result should be 5 (10−3−2) — B's deduction of 3 is lost

Why is it wrong?

A read 10 before B committed. When A writes, it calculates from 10 (the old value) instead of 7 (the latest value) — completely overwriting B's result.

How to Fix

Option 1: Atomic Update — simplest ✅

-- WRONG: read value then calculate in application code → race condition
SELECT qty FROM products WHERE id = 1;   -- → 10
UPDATE products SET qty = 10 - 2 WHERE id = 1;
 
-- CORRECT: combine into one atomic SQL statement
UPDATE products
SET qty = qty - 2          -- database handles read+calculate+write atomically
WHERE id = 1;

When to use: When you only need simple arithmetic (add/subtract). No need to read the value in application code first.

Limitation: Cannot be used when you need to inspect the value first (e.g., checking if stock is sufficient before deducting).


Option 2: SELECT FOR UPDATE — most common

-- Txn A: lock the row on read
BEGIN;
  SELECT qty FROM products
  WHERE id = 1
  FOR UPDATE;           -- lock this row, B must wait
 
  -- check if qty is sufficient
  -- if yes, proceed with UPDATE
  UPDATE products
  SET qty = qty - 2
  WHERE id = 1;
COMMIT;                 -- B is now allowed to proceed
 
-- Txn B also does SELECT FOR UPDATE on the same row
-- → B is blocked until A commits
-- → B reads qty=8 (already deducted by A) → calculates correctly

When to use: When you need to read first and check business logic (e.g., verify balance or stock before writing).

Watch out for deadlocks: If A locks row 1 then needs row 2, and B locks row 2 then needs row 1 → deadlock. Always lock rows in a consistent order.


Option 3: Optimistic Lock (version column)

-- Schema: add a version column
ALTER TABLE products ADD COLUMN version INT DEFAULT 0;
 
-- Step 1: read with version
SELECT qty, version FROM products WHERE id = 1;
-- → qty=10, version=5
 
-- Step 2: write with version condition
UPDATE products
SET
  qty     = qty - 2,
  version = version + 1       -- increment version
WHERE
  id      = 1
  AND version = 5;            -- only update if version is still 5
 
-- rowsAffected = 1 → success
-- rowsAffected = 0 → someone else updated first, version changed → retry
# Retry logic in Python
def update_with_optimistic_lock(conn, product_id, qty_change):
    MAX_RETRY = 3
 
    for attempt in range(MAX_RETRY):
        with conn.cursor() as cur:
            # Step 1: read with version
            cur.execute(
                "SELECT qty, version FROM products WHERE id = %s",
                (product_id,)
            )
            row = cur.fetchone()
            qty, version = row[0], row[1]
 
            # Step 2: business logic check
            if qty < qty_change:
                raise Exception("Insufficient stock")
 
            # Step 3: write with version condition
            cur.execute(
                """UPDATE products
                   SET qty = qty - %s, version = version + 1
                   WHERE id = %s AND version = %s""",
                (qty_change, product_id, version)
            )
 
            if cur.rowcount == 1:
                conn.commit()
                print(f"Success after {attempt + 1} attempt(s)")
                return  # success
            else:
                conn.rollback()
                print(f"Conflict on attempt {attempt + 1}, retrying...")
 
    raise Exception("Failed after 3 retries")
// Retry logic in Node.js
async function updateWithOptimisticLock(pool, productId, qtyChange) {
  const MAX_RETRY = 3;
 
  for (let attempt = 0; attempt < MAX_RETRY; attempt++) {
    const client = await pool.connect();
    try {
      // Step 1: read with version
      const { rows } = await client.query(
        'SELECT qty, version FROM products WHERE id = $1',
        [productId]
      );
      const { qty, version } = rows[0];
 
      // Step 2: business logic check
      if (qty < qtyChange) throw new Error('Insufficient stock');
 
      // Step 3: write with version condition
      const result = await client.query(
        `UPDATE products
         SET qty = qty - $1, version = version + 1
         WHERE id = $2 AND version = $3`,
        [qtyChange, productId, version]
      );
 
      if (result.rowCount === 1) {
        await client.query('COMMIT');
        console.log(`Success after ${attempt + 1} attempt(s)`);
        return; // success
      }
      // rowCount = 0 → conflict → loop continues and retries
    } finally {
      client.release();
    }
  }
  throw new Error('Failed after 3 retries');
}

When to use: When conflicts are rare (few people modifying the same row at once). Great for REST APIs and high-concurrency systems.

Limitation: You must write retry logic yourself. If conflicts happen frequently, repeated retries become less efficient than `SELECT FOR UPDATE`.


Comparison of the 3 options

Atomic Update SELECT FOR UPDATE Optimistic Lock
Real lock? No Yes No
Retry needed? No No Yes (manual)
Deadlock risk? No Possible No
Best for Simple add/subtract Need to check before write Low conflict, many users

5. Write Skew

What is it?

The most subtle anomaly. A and B both read data, both check a shared condition — both see the condition is satisfied — both write to different rows. The result violates the condition, but neither transaction knows.

Difference from Lost Update: Lost update is two transactions writing to the same row. Write skew is two transactions writing to different rows but corrupting a shared invariant.

Example 1 — Hospital: at least 1 doctor must be on call

Rule: total number of on-call doctors must be ≥ 1. Currently 2 doctors are on call: Alice and Bob. Both request to go off call at the same time.

Step Txn A (Alice goes off call) Txn B (Bob goes off call) on_call
1 SELECT COUNT(*) WHERE on_call=true → 2 Alice✓ Bob✓
2 SELECT COUNT(*) WHERE on_call=true → 2 Alice✓ Bob✓
3 count=2 ≥ 1 → allowed. UPDATE Alice on_call=false Alice✗ Bob✓
4 count=2 ≥ 1 → allowed. UPDATE Bob on_call=false Alice✗ Bob✗
5 No doctors on call! Rule violated! 0

Example 2 — Meeting room double booking

Rule: a room cannot be booked twice at the same time slot.

Step Txn A (person 1) Txn B (person 2) bookings
1 SELECT COUNT(*) room 101 at 2pm → 0 empty
2 SELECT COUNT(*) room 101 at 2pm → 0 empty
3 0 = available → INSERT booking A, commit 1 booking
4 0 = available → INSERT booking B, commit 2 bookings
5 Room 101 at 2pm is double-booked! Nobody knows!

Why is it wrong?

Each transaction looks correct in isolation — it only breaks when viewed as a whole. A and B write to different rows (Alice vs Bob, booking A vs booking B), so neither blocks the other.

How to Fix

Option 1: SERIALIZABLE (handles all write skew cases)

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
 
  SELECT COUNT(*) FROM doctors WHERE on_call = TRUE;
  -- database tracks this dependency
  -- if conflict detected → throws serialization failure error → needs retry
 
  UPDATE doctors SET on_call = FALSE WHERE id = 1;
 
COMMIT;

Option 2: SELECT FOR UPDATE (lock all rows involved in the condition)

BEGIN;
  -- Lock ALL on-call rows — B is blocked here
  SELECT COUNT(*) FROM doctors
  WHERE on_call = TRUE
  FOR UPDATE;
 
  -- Re-check after locking
  -- Only UPDATE if count is still > 1
  UPDATE doctors SET on_call = FALSE WHERE id = 1;
COMMIT;

Option 3: UNIQUE constraint (for booking-style problems)

-- Add a constraint to prevent duplicates
ALTER TABLE bookings
ADD CONSTRAINT unique_room_time
UNIQUE (room_id, start_time);
 
-- → Database automatically rejects duplicates and throws an error
-- No need to worry about transaction isolation level

When to use each option:

  • `SERIALIZABLE` — when the invariant is complex and cannot be expressed as a UNIQUE constraint
  • `SELECT FOR UPDATE` — when you know exactly which rows to lock
  • `UNIQUE constraint` — when the problem can be modeled as a unique key (simplest and most efficient)

6. Isolation Levels Summary

Which anomalies occur at each level?

Anomaly READ UNCOMMITTED READ COMMITTED REPEATABLE READ SERIALIZABLE
Dirty read ❌ Yes ✅ No ✅ No ✅ No
Non-repeatable read ❌ Yes ❌ Yes ✅ No ✅ No
Phantom read ❌ Yes ❌ Yes ❌ Yes* ✅ No
Lost update ❌ Yes ❌ Yes ❌ Yes

MySQL vs PostgreSQL defaults

Anomaly MySQL (REPEATABLE READ) PostgreSQL (READ COMMITTED)
Dirty read ✅ No ✅ No
Non-repeatable read ✅ No ❌ Possible
Phantom read ✅ No* ❌ Possible
Lost update ❌ Possible ❌ Possible
Write skew ❌ Possible ❌ Possible
Both MySQL and PostgreSQL defaults do not protect against Lost Update and Write Skew — these must be handled in your application code.

How to set Isolation Level

-- For a single transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
  -- SQL statements
COMMIT;
 
-- For the entire session (PostgreSQL)
SET default_transaction_isolation = 'repeatable read';
 
-- For the entire session (MySQL)
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

Practical recommendations

Situation Recommended approach
Typical web application `READ COMMITTED` (default) is sufficient
Reports and analytics `REPEATABLE READ`
Transfers, stock deduction `READ COMMITTED` + `SELECT FOR UPDATE`
Finance and banking `SERIALIZABLE` or `SELECT FOR UPDATE`
Simple increments/decrements Atomic update (`SET qty = qty - 1`)
Many users, low conflict rate Optimistic lock (version column)
Rule of thumb: Don't blindly raise to `SERIALIZABLE` — it significantly impacts performance. Most web applications are fine with `READ COMMITTED` + `SELECT FOR UPDATE` where needed.

Document compiled from a conversation about SQL Transaction Anomalies.

mysql/tx/transaction-anomalies.txt · Last modified: by phong2018