====== SQL Transaction Anomalies — Concurrency Problems in Databases ====== > A comprehensive guide to the 5 anomalies that occur when multiple transactions run concurrently, with real-world examples and solutions. ---- ===== Table of Contents ===== - [[#1_dirty_read|1. Dirty Read]] - [[#2_non-repeatable_read|2. Non-repeatable Read]] - [[#3_phantom_read|3. Phantom Read]] - [[#4_lost_update|4. Lost Update]] - [[#5_write_skew|5. Write Skew]] - [[#6_isolation_levels_summary|6. Isolation Levels Summary]] ---- ===== 1. Dirty Read ===== ==== What is it? ==== Transaction A reads data that Transaction B has **modified but not yet committed**. If B rolls back, A has been working with a value that never officially existed — known as "phantom data". > **Important:** At the default isolation level (`READ COMMITTED`), A still sees the **original** value — B's uncommitted changes are invisible. Dirty reads only occur when using `READ UNCOMMITTED`. ==== Real-world Example ==== Alice's account has `balance = 100`. B is transferring money and changes it to 200 but hasn't committed yet. ^ Step ^ Txn A ^ Txn B ^ DB (balance) ^ | 1 | | Modify 100 → 200 (not committed) | 100 *(B holds: 200)* | | 2 | Read → sees **200** (dirty!) | | 100 | | 3 | | **ROLLBACK** → reverts to 100 | 100 | | 4 | A is using 200 — **wrong!** 200 never officially existed | | 100 | ==== Why is it wrong? ==== B rolled back, so 200 was erased from history. But A already read and used 200 — A is working with invalid data. **If B committed instead of rolling back?** → No problem at all. 200 becomes the official value. ==== How to Fix ==== **Solution: Use `READ COMMITTED` (default in PostgreSQL, SQL Server)**


-- PostgreSQL / SQL Server: already READ COMMITTED by default
-- MySQL: set manually if needed

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN;
  SELECT balance FROM accounts WHERE id = 1;
  -- Always returns committed values, never dirty
COMMIT;

At `READ COMMITTED`, A always sees **100** even though B is holding 200 in an uncommitted transaction. ---- ===== 2. Non-repeatable Read ===== ==== What is it? ==== Within the same transaction, A reads **the same row** twice — but gets **different results** because B **updated or deleted** that row and committed between A's two reads. ==== Comparison with Phantom Read ==== ^ ^ Non-repeatable Read ^ Phantom Read ^ | What is read? | Same **single row** | Same **set of rows** by condition | | Cause | B **UPDATE/DELETE** the row | B **INSERT/DELETE** a new row | | Result | Row's value changes | Number of rows changes | ==== Real-world Example ==== A is generating an audit report and needs to read the balance twice to verify. ^ Step ^ Txn A (auditing) ^ Txn B (transfer) ^ DB (balance) ^ | 1 | SELECT balance WHERE id=1 → **1000** | | 1000 | | 2 | | UPDATE balance=500 WHERE id=1, **COMMIT** | 500 | | 3 | SELECT balance WHERE id=1 → **500** 😱 | | 500 | | 4 | Same transaction, same row — **two different results!** | | — | ==== Why is it wrong? ==== A is compiling a financial report — reads 1000 to calculate totals, then reads again to verify and sees 500. The report is completely wrong within the same transaction. ==== How to Fix ==== **Solution: Use `REPEATABLE READ` (default in MySQL)**


BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

  SELECT balance FROM accounts WHERE id = 1;
  -- → 1000

  -- B updates and commits here

  SELECT balance FROM accounts WHERE id = 1;
  -- → still 1000, snapshot unchanged ✓

COMMIT;

`REPEATABLE READ` guarantees: a row read within a transaction will always return the same value as the first read — even if B updates or deletes it in between. ---- ===== 3. Phantom Read ===== ==== What is it? ==== Within the same transaction, A executes **the same query** twice — but gets **a different number of rows** because B **inserted or deleted** a row matching the query's condition and committed in between. ==== Real-world Example ==== A is generating a report of orders with amount greater than 100. ^ Step ^ Txn A (reporting) ^ Txn B ^ DB (orders) ^ | 1 | SELECT WHERE amount > 100 → **3 rows** | | 3 rows | | 2 | | INSERT order amount=500, **COMMIT** | 4 rows | | 3 | SELECT WHERE amount > 100 → **4 rows** 😱 | | 4 rows | | 4 | Same transaction, same query — **two different results!** | | — | ==== Why is it wrong? ==== The 4th row is a "phantom" — it appeared mid-transaction. A is compiling a financial report and reads twice in the same transaction — the figures are inconsistent. ==== How to Fix ==== ===== Option 1: SERIALIZABLE (strongest) =====


SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;

  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100;
  -- 1st read: 3 rows, total 1500

  -- B inserts and commits here

  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100;
  -- 2nd read: still 3 rows, total 1500 — consistent ✓

COMMIT;

===== Option 2: SELECT FOR SHARE (range lock) =====


BEGIN;
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE amount > 100
  FOR SHARE;
  -- Locks the entire range of matching rows
  -- B cannot INSERT into this range until A commits

COMMIT;
-- Only after this can B insert

===== Option 3: Snapshot Isolation / MVCC (PostgreSQL) =====


-- PostgreSQL: REPEATABLE READ uses MVCC
-- Each transaction sees a snapshot from the moment BEGIN was called
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → 3 rows (snapshot created here)

  -- B inserts and commits — A cannot see it, not in snapshot

  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → still 3 rows ✓

COMMIT;

---- ===== 4. Lost Update ===== ==== What is it? ==== Two transactions both read **the same value**, both modify it, and one **overwrites the other's result** — one update is silently lost. Unlike dirty read: B has **committed** properly, but the data is still wrong because A read too early and overwrote B's result. ==== Real-world Example ==== A warehouse has `qty = 10`. A sells 2 items, B sells 3 items — at the same time. ^ Step ^ Txn A ^ Txn B ^ DB (qty) ^ | 1 | Read → **10** | Read → **10** | 10 | | 2 | | 10 − 3 = 7 → write **7**, commit | 7 | | 3 | 10 − 2 = 8 → write **8**, commit | | **8** ❌ | | 4 | Correct result should be **5** (10−3−2) — B's deduction of 3 is **lost** | | — | ==== Why is it wrong? ==== A read 10 before B committed. When A writes, it calculates from 10 (the old value) instead of 7 (the latest value) — completely overwriting B's result. ==== How to Fix ==== ===== Option 1: Atomic Update — simplest ✅ =====


-- WRONG: read value then calculate in application code → race condition
SELECT qty FROM products WHERE id = 1;   -- → 10
UPDATE products SET qty = 10 - 2 WHERE id = 1;

-- CORRECT: combine into one atomic SQL statement
UPDATE products
SET qty = qty - 2          -- database handles read+calculate+write atomically
WHERE id = 1;

**When to use:** When you only need simple arithmetic (add/subtract). No need to read the value in application code first. **Limitation:** Cannot be used when you need to inspect the value first (e.g., checking if stock is sufficient before deducting). ---- ===== Option 2: SELECT FOR UPDATE — most common =====


-- Txn A: lock the row on read
BEGIN;
  SELECT qty FROM products
  WHERE id = 1
  FOR UPDATE;           -- lock this row, B must wait

  -- check if qty is sufficient
  -- if yes, proceed with UPDATE
  UPDATE products
  SET qty = qty - 2
  WHERE id = 1;
COMMIT;                 -- B is now allowed to proceed

-- Txn B also does SELECT FOR UPDATE on the same row
-- → B is blocked until A commits
-- → B reads qty=8 (already deducted by A) → calculates correctly

**When to use:** When you need to read first and check business logic (e.g., verify balance or stock before writing). **Watch out for deadlocks:** If A locks row 1 then needs row 2, and B locks row 2 then needs row 1 → deadlock. Always lock rows in a consistent order. ---- ===== Option 3: Optimistic Lock (version column) =====


-- Schema: add a version column
ALTER TABLE products ADD COLUMN version INT DEFAULT 0;

-- Step 1: read with version
SELECT qty, version FROM products WHERE id = 1;
-- → qty=10, version=5

-- Step 2: write with version condition
UPDATE products
SET
  qty     = qty - 2,
  version = version + 1       -- increment version
WHERE
  id      = 1
  AND version = 5;            -- only update if version is still 5

-- rowsAffected = 1 → success
-- rowsAffected = 0 → someone else updated first, version changed → retry


# Retry logic in Python
def update_with_optimistic_lock(conn, product_id, qty_change):
    MAX_RETRY = 3

    for attempt in range(MAX_RETRY):
        with conn.cursor() as cur:
            # Step 1: read with version
            cur.execute(
                "SELECT qty, version FROM products WHERE id = %s",
                (product_id,)
            )
            row = cur.fetchone()
            qty, version = row[0], row[1]

            # Step 2: business logic check
            if qty < qty_change:
                raise Exception("Insufficient stock")

            # Step 3: write with version condition
            cur.execute(
                """UPDATE products
                   SET qty = qty - %s, version = version + 1
                   WHERE id = %s AND version = %s""",
                (qty_change, product_id, version)
            )

            if cur.rowcount == 1:
                conn.commit()
                print(f"Success after {attempt + 1} attempt(s)")
                return  # success
            else:
                conn.rollback()
                print(f"Conflict on attempt {attempt + 1}, retrying...")

    raise Exception("Failed after 3 retries")


// Retry logic in Node.js
async function updateWithOptimisticLock(pool, productId, qtyChange) {
  const MAX_RETRY = 3;

  for (let attempt = 0; attempt < MAX_RETRY; attempt++) {
    const client = await pool.connect();
    try {
      // Step 1: read with version
      const { rows } = await client.query(
        'SELECT qty, version FROM products WHERE id = $1',
        [productId]
      );
      const { qty, version } = rows[0];

      // Step 2: business logic check
      if (qty < qtyChange) throw new Error('Insufficient stock');

      // Step 3: write with version condition
      const result = await client.query(
        `UPDATE products
         SET qty = qty - $1, version = version + 1
         WHERE id = $2 AND version = $3`,
        [qtyChange, productId, version]
      );

      if (result.rowCount === 1) {
        await client.query('COMMIT');
        console.log(`Success after ${attempt + 1} attempt(s)`);
        return; // success
      }
      // rowCount = 0 → conflict → loop continues and retries
    } finally {
      client.release();
    }
  }
  throw new Error('Failed after 3 retries');
}

**When to use:** When conflicts are rare (few people modifying the same row at once). Great for REST APIs and high-concurrency systems. **Limitation:** You must write retry logic yourself. If conflicts happen frequently, repeated retries become less efficient than `SELECT FOR UPDATE`. ---- ===== Comparison of the 3 options ===== ^ ^ Atomic Update ^ SELECT FOR UPDATE ^ Optimistic Lock ^ | Real lock? | No | Yes | No | | Retry needed? | No | No | Yes (manual) | | Deadlock risk? | No | Possible | No | | Best for | Simple add/subtract | Need to check before write | Low conflict, many users | ---- ===== 5. Write Skew ===== ==== What is it? ==== The most subtle anomaly. A and B both read data, both check **a shared condition** — both see the condition is satisfied — both write to **different rows**. The result violates the condition, but neither transaction knows. **Difference from Lost Update:** Lost update is two transactions writing to the **same row**. Write skew is two transactions writing to **different rows** but corrupting a shared invariant. ==== Example 1 — Hospital: at least 1 doctor must be on call ==== Rule: **total number of on-call doctors must be ≥ 1**. Currently 2 doctors are on call: Alice and Bob. Both request to go off call at the same time. ^ Step ^ Txn A (Alice goes off call) ^ Txn B (Bob goes off call) ^ on_call ^ | 1 | SELECT COUNT(*) WHERE on_call=true → **2** | | Alice✓ Bob✓ | | 2 | | SELECT COUNT(*) WHERE on_call=true → **2** | Alice✓ Bob✓ | | 3 | count=2 ≥ 1 → allowed. UPDATE Alice on_call=false | | Alice✗ Bob✓ | | 4 | | count=2 ≥ 1 → allowed. UPDATE Bob on_call=false | Alice✗ Bob✗ | | 5 | **No doctors on call! Rule violated!** | | **0** ❌ | ==== Example 2 — Meeting room double booking ==== Rule: a room cannot be booked twice at the same time slot. ^ Step ^ Txn A (person 1) ^ Txn B (person 2) ^ bookings ^ | 1 | SELECT COUNT(*) room 101 at 2pm → **0** | | empty | | 2 | | SELECT COUNT(*) room 101 at 2pm → **0** | empty | | 3 | 0 = available → INSERT booking A, commit | | 1 booking | | 4 | | 0 = available → INSERT booking B, commit | **2 bookings** ❌ | | 5 | **Room 101 at 2pm is double-booked! Nobody knows!** | | — | ==== Why is it wrong? ==== Each transaction looks correct in isolation — it only breaks when viewed as a whole. A and B write to different rows (Alice vs Bob, booking A vs booking B), so neither blocks the other. ==== How to Fix ==== ===== Option 1: SERIALIZABLE (handles all write skew cases) =====


BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

  SELECT COUNT(*) FROM doctors WHERE on_call = true;
  -- database tracks this dependency
  -- if conflict detected → throws serialization failure error → needs retry

  UPDATE doctors SET on_call = false WHERE id = 1;

COMMIT;

===== Option 2: SELECT FOR UPDATE (lock all rows involved in the condition) =====


BEGIN;
  -- Lock ALL on-call rows — B is blocked here
  SELECT COUNT(*) FROM doctors
  WHERE on_call = true
  FOR UPDATE;

  -- Re-check after locking
  -- Only UPDATE if count is still > 1
  UPDATE doctors SET on_call = false WHERE id = 1;
COMMIT;

===== Option 3: UNIQUE constraint (for booking-style problems) =====


-- Add a constraint to prevent duplicates
ALTER TABLE bookings
ADD CONSTRAINT unique_room_time
UNIQUE (room_id, start_time);

-- → Database automatically rejects duplicates and throws an error
-- No need to worry about transaction isolation level

**When to use each option:** * `SERIALIZABLE` — when the invariant is complex and cannot be expressed as a UNIQUE constraint * `SELECT FOR UPDATE` — when you know exactly which rows to lock * `UNIQUE constraint` — when the problem can be modeled as a unique key (simplest and most efficient) ---- ===== 6. Isolation Levels Summary ===== ==== Which anomalies occur at each level? ==== ^ Anomaly ^ READ UNCOMMITTED ^ READ COMMITTED ^ REPEATABLE READ ^ SERIALIZABLE ^ | Dirty read | ❌ Yes | ✅ No | ✅ No | ✅ No | | Non-repeatable read | ❌ Yes | ❌ Yes | ✅ No | ✅ No | | Phantom read | ❌ Yes | ❌ Yes | ❌ Yes* | ✅ No | | Lost update | ❌ Yes | ❌ Yes | ❌ Yes | ✅ No** | | Write skew | ❌ Yes | ❌ Yes | ❌ Yes | ✅ No | > * MySQL `REPEATABLE READ` prevents phantom reads thanks to MVCC. PostgreSQL does not. > > ** `SERIALIZABLE` prevents lost updates but has significant overhead — in practice, use `FOR UPDATE` or atomic updates instead of raising the isolation level. ==== MySQL vs PostgreSQL defaults ==== ^ Anomaly ^ MySQL (REPEATABLE READ) ^ PostgreSQL (READ COMMITTED) ^ | Dirty read | ✅ No | ✅ No | | Non-repeatable read | ✅ No | ❌ Possible | | Phantom read | ✅ No* | ❌ Possible | | Lost update | ❌ Possible | ❌ Possible | | Write skew | ❌ Possible | ❌ Possible | > Both MySQL and PostgreSQL defaults **do not protect** against Lost Update and Write Skew — these must be handled in your application code. ==== How to set Isolation Level ====


-- For a single transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
  -- SQL statements
COMMIT;

-- For the entire session (PostgreSQL)
SET default_transaction_isolation = 'repeatable read';

-- For the entire session (MySQL)
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

==== Practical recommendations ==== ^ Situation ^ Recommended approach ^ | Typical web application | `READ COMMITTED` (default) is sufficient | | Reports and analytics | `REPEATABLE READ` | | Transfers, stock deduction | `READ COMMITTED` + `SELECT FOR UPDATE` | | Finance and banking | `SERIALIZABLE` or `SELECT FOR UPDATE` | | Simple increments/decrements | Atomic update (`SET qty = qty - 1`) | | Many users, low conflict rate | Optimistic lock (version column) | > **Rule of thumb:** Don't blindly raise to `SERIALIZABLE` — it significantly impacts performance. Most web applications are fine with `READ COMMITTED` + `SELECT FOR UPDATE` where needed. ---- ''Document compiled from a conversation about SQL Transaction Anomalies.''