====== SQL Isolation Levels ======

> A complete guide to understanding the 4 isolation levels in SQL — what they protect against, how they work internally, and when to use each one.

----

===== Table of Contents =====

  - [[#1_what_is_an_isolation_level|1. What is an Isolation Level?]]
  - [[#2_read_uncommitted|2. READ UNCOMMITTED]]
  - [[#3_read_committed|3. READ COMMITTED]]
  - [[#4_repeatable_read|4. REPEATABLE READ]]
  - [[#5_serializable|5. SERIALIZABLE]]
  - [[#6_how_isolation_works_internally|6. How Isolation Works Internally]]
  - [[#7_full_comparison_table|7. Full Comparison Table]]
  - [[#8_how_to_set_isolation_level|8. How to Set Isolation Level]]
  - [[#9_practical_guide_which_one_to_use|9. Practical Guide — Which One to Use]]

----

===== 1. What is an Isolation Level? =====

When multiple transactions run at the same time, they can interfere with each other and produce incorrect results. **Isolation level** is a setting that controls **how much one transaction is shielded from the effects of other concurrent transactions**.

The SQL standard defines 4 isolation levels, ordered from weakest to strongest:

<code>
READ UNCOMMITTED  →  READ COMMITTED  →  REPEATABLE READ  →  SERIALIZABLE
   (weakest)                                                   (strongest)
      ↑                                                             ↑
 highest performance                                      highest data safety
 lowest data safety                                      lowest performance
</code>

There is always a **trade-off**: stronger isolation = safer data, but more locking = slower performance. You pick the level that matches what your use case actually needs.

==== The 3 read anomalies defined by the SQL standard ====

Before diving in, these are the problems isolation levels are designed to prevent:

^ Anomaly ^ Description ^
| **Dirty read** | Reading uncommitted changes from another transaction |
| **Non-repeatable read** | Reading the same row twice and getting different values (someone updated it) |
| **Phantom read** | Running the same query twice and getting different rows (someone inserted/deleted) |

----

===== 2. READ UNCOMMITTED =====

==== What it does ====

The **lowest** isolation level. A transaction can read data that other transactions have modified but **not yet committed**. There are virtually no restrictions.

==== Anomalies ====

^ Dirty read ^ Non-repeatable read ^ Phantom read ^
| ❌ Can happen | ❌ Can happen | ❌ Can happen |

==== How it works ====

No read locks are acquired. A transaction reads directly from the latest in-memory state of the data, regardless of whether other transactions have committed or not.

==== Example ====

<code>
balance = 100

Txn B:  UPDATE balance = 200  (not committed yet)
Txn A:  SELECT balance  →  reads 200  ← dirty read
Txn B:  ROLLBACK         →  balance reverts to 100
Txn A:  is now working with 200, which never existed
</code>

==== SQL ====

<code sql>
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;
  -- May return uncommitted values from other transactions
COMMIT;
</code>

==== When to use ====

Almost **never** in production. The only theoretical use case is large read-only analytics queries where approximate results are acceptable and you want to avoid any locking overhead at all. In practice, even then there are better options.

----

===== 3. READ COMMITTED =====

==== What it does ====

The most **commonly used** isolation level. A transaction can only read data that has been **committed**. Any uncommitted changes from other transactions are invisible.

==== Anomalies ====

^ Dirty read ^ Non-repeatable read ^ Phantom read ^
| ✅ Prevented | ❌ Can happen | ❌ Can happen |

==== How it works ====

Each `SELECT` statement gets a **fresh snapshot** of committed data at the moment the statement runs — not at the moment the transaction began. This means if you run the same `SELECT` twice in one transaction, you may get different results if another transaction committed a change in between.

==== Example ====

<code>
balance = 1000

Txn A:  SELECT balance  →  1000          ← snapshot at statement time
Txn B:  UPDATE balance = 500, COMMIT
Txn A:  SELECT balance  →  500           ← new snapshot, sees B's commit
         same transaction, different result = non-repeatable read
</code>

==== SQL ====

<code sql>
-- Default in PostgreSQL and SQL Server
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;

  SELECT balance FROM accounts WHERE id = 1;
  -- → 1000

  -- another transaction commits balance = 500 here

  SELECT balance FROM accounts WHERE id = 1;
  -- → 500  (different result — non-repeatable read)

COMMIT;
</code>

==== When to use ====

The right default for **most web applications**. It prevents the most dangerous anomaly (dirty read) while keeping performance high. Used by default in:
  * **PostgreSQL** ✓
  * **SQL Server** ✓
  * **Oracle** ✓

----

===== 4. REPEATABLE READ =====

==== What it does ====

A transaction always sees the **same snapshot** of data as it did at the start of the transaction. If you read a row, that row will return the same value for the entire duration of the transaction — even if another transaction updates it and commits.

==== Anomalies ====

^ Dirty read ^ Non-repeatable read ^ Phantom read ^
| ✅ Prevented | ✅ Prevented | ❌ Can happen* |

> * MySQL's `REPEATABLE READ` also prevents phantom reads via MVCC. PostgreSQL's does not fully prevent them in all cases.

==== How it works ====

The database takes a **snapshot at the start of the transaction** (`BEGIN`). All `SELECT` statements in that transaction read from this fixed snapshot — they never see changes committed by other transactions after the transaction began.

<code>
Transaction starts → snapshot created → all reads use this snapshot
                                              ↓
                              other transactions commit changes
                                              ↓
                              those changes are invisible to this transaction
</code>

==== Example ====

<code>
balance = 1000

Txn A:  BEGIN  →  snapshot created (balance=1000)
Txn A:  SELECT balance  →  1000
Txn B:  UPDATE balance = 500, COMMIT
Txn A:  SELECT balance  →  still 1000  ← reads from snapshot, not current data
Txn A:  COMMIT
</code>

==== SQL ====

<code sql>
-- Default in MySQL (InnoDB)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;

  SELECT balance FROM accounts WHERE id = 1;
  -- → 1000  (snapshot created here)

  -- Txn B updates balance to 500 and commits

  SELECT balance FROM accounts WHERE id = 1;
  -- → still 1000  ✓  (reads from snapshot)

COMMIT;
</code>

==== Phantom read still possible (PostgreSQL) ====

Even though each row is stable, new rows can still appear:

<code sql>
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- PostgreSQL

  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → 3 rows

  -- Txn B inserts a new order with amount=500 and commits

  SELECT COUNT(*) FROM orders WHERE amount > 100;
  -- → 4 rows  ← phantom read! (PostgreSQL)
  -- → 3 rows  ← MySQL MVCC prevents this
</code>

==== When to use ====

Use when your transaction needs **consistent reads across multiple queries** — reports, analytics, financial summaries. Used by default in:
  * **MySQL (InnoDB)** ✓

----

===== 5. SERIALIZABLE =====

==== What it does ====

The **strongest** isolation level. Transactions behave as if they were executed **one at a time, serially** — even though they actually run concurrently. All anomalies are prevented, including write skew.

==== Anomalies ====

^ Dirty read ^ Non-repeatable read ^ Phantom read ^ Lost update ^ Write skew ^
| ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented |

==== How it works ====

There are two implementation approaches depending on the database:

**Approach 1 — Lock-based (traditional)**

The database places range locks on all rows and gaps scanned by a query. Other transactions cannot insert, update, or delete data in those ranges until the lock is released.

**Approach 2 — SSI (Serializable Snapshot Isolation — PostgreSQL)**

Transactions run freely using snapshots (no extra locking). The database tracks read/write dependencies between transactions. At commit time, if a conflict is detected that would violate serializability → one transaction is aborted with a `serialization failure` error and must retry.

==== Example — Write skew prevented ====

<code>
Rule: at least 1 doctor must be on call
Currently: Alice on_call=true, Bob on_call=true

Txn A (SERIALIZABLE): SELECT COUNT(*) WHERE on_call=true → 2, UPDATE Alice off call
Txn B (SERIALIZABLE): SELECT COUNT(*) WHERE on_call=true → 2, UPDATE Bob off call

Without SERIALIZABLE → both succeed → 0 doctors on call (rule violated)
With SERIALIZABLE    → one transaction is aborted → must retry → rule preserved
</code>

==== SQL ====

<code sql>
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;

  -- Any reads here are tracked for dependency
  SELECT COUNT(*) FROM doctors WHERE on_call = true;

  UPDATE doctors SET on_call = false WHERE id = 1;

COMMIT;
-- If a conflict is detected: ERROR: could not serialize access
-- → catch this error in your code and retry the transaction
</code>

==== Retry pattern for serialization failures ====

<code python>
import psycopg2

def run_serializable(conn, fn):
    MAX_RETRY = 5
    for attempt in range(MAX_RETRY):
        try:
            with conn.cursor() as cur:
                cur.execute("BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE")
                fn(cur)           # your business logic here
                cur.execute("COMMIT")
                return            # success
        except psycopg2.errors.SerializationFailure:
            conn.rollback()
            print(f"Serialization conflict, retrying... ({attempt + 1})")
    raise Exception("Failed after max retries")
</code>

<code javascript>
async function runSerializable(pool, fn) {
  const MAX_RETRY = 5;
  for (let attempt = 0; attempt < MAX_RETRY; attempt++) {
    const client = await pool.connect();
    try {
      await client.query('BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE');
      await fn(client);    // your business logic here
      await client.query('COMMIT');
      return;              // success
    } catch (err) {
      await client.query('ROLLBACK');
      // PostgreSQL serialization failure error code: 40001
      if (err.code === '40001') {
        console.log(`Serialization conflict, retrying... (${attempt + 1})`);
        continue;
      }
      throw err;           // different error, rethrow
    } finally {
      client.release();
    }
  }
  throw new Error('Failed after max retries');
}
</code>

==== When to use ====

Use for **critical financial operations** where correctness is non-negotiable:
  * Bank transfers
  * Booking systems (rooms, flights, tickets)
  * Inventory reservation under high concurrency
  * Any operation with complex shared invariants

**Performance note:** `SERIALIZABLE` is significantly slower under high concurrency because of the extra tracking or locking overhead. Only apply it to transactions that truly need it — not the entire application.

----

===== 6. How Isolation Works Internally =====

Databases implement isolation using two main techniques:

==== Technique 1 — Locking ====

The traditional approach. Transactions acquire locks on data they read or write. Other transactions that need the same data must wait.

<code>
Txn A reads row 1  →  acquires Shared Lock (S)
Txn B wants to write row 1  →  must wait for A to release S lock
Txn A commits  →  releases lock
Txn B proceeds
</code>

**Lock types:**

^ Lock ^ Who can hold it ^ Blocks ^
| Shared (S) | Multiple readers simultaneously | Writers only |
| Exclusive (X) | One writer at a time | All readers and writers |

**Downside:** Can cause **deadlocks** — A holds lock 1 waiting for lock 2, B holds lock 2 waiting for lock 1. Both wait forever. Databases detect and resolve this by aborting one transaction.

----

==== Technique 2 — MVCC (Multi-Version Concurrency Control) ====

Used by **PostgreSQL**, **MySQL InnoDB**, and Oracle. Instead of blocking readers, the database keeps **multiple versions** of each row. Each transaction reads from its own consistent snapshot.

<code>
Row versions stored in database:
  balance=1000  (committed at T=1)
  balance=500   (committed at T=5)
  balance=200   (committed at T=9)

Txn A started at T=3  →  always sees balance=1000
Txn B started at T=7  →  always sees balance=500
New query at T=10     →  sees balance=200
</code>

**Key benefit:** Readers never block writers, writers never block readers. Much higher concurrency than pure locking.

**Downside:** Old row versions accumulate and must be cleaned up (PostgreSQL calls this **VACUUM**).

----

==== Which technique does each database use? ====

^ Database ^ Technique ^ Notes ^
| PostgreSQL | MVCC + SSI for SERIALIZABLE | Readers never block writers |
| MySQL InnoDB | MVCC + gap locks | MVCC for reads, locks for writes |
| SQL Server | Locking (default) or MVCC (with RCSI) | MVCC must be enabled manually |
| Oracle | MVCC | Similar to PostgreSQL |

----

===== 7. Full Comparison Table =====

==== Anomaly protection by level ====

^ Isolation Level ^ Dirty Read ^ Non-repeatable Read ^ Phantom Read ^ Lost Update ^ Write Skew ^
| READ UNCOMMITTED | ❌ | ❌ | ❌ | ❌ | ❌ |
| READ COMMITTED | ✅ | ❌ | ❌ | ❌ | ❌ |
| REPEATABLE READ | ✅ | ✅ | ❌ (✅ MySQL) | ❌ | ❌ |
| SERIALIZABLE | ✅ | ✅ | ✅ | ✅ | ✅ |

==== Default level by database ====

^ Database ^ Default Level ^ Dirty Read ^ Non-repeatable Read ^ Phantom Read ^
| MySQL InnoDB | REPEATABLE READ | ✅ No | ✅ No | ✅ No* |
| PostgreSQL | READ COMMITTED | ✅ No | ❌ Possible | ❌ Possible |
| SQL Server | READ COMMITTED | ✅ No | ❌ Possible | ❌ Possible |
| Oracle | READ COMMITTED | ✅ No | ❌ Possible | ❌ Possible |

> * MySQL prevents phantom reads via MVCC at REPEATABLE READ level.

==== Performance vs safety trade-off ====

<code>
READ UNCOMMITTED  ──────────────────────────────►  SERIALIZABLE
        ↑                                                 ↑
   Fastest                                           Slowest
   Most concurrent                             Least concurrent
   Least safe                                     Most safe
</code>

----

===== 8. How to Set Isolation Level =====

==== For a single transaction ====

<code sql>
-- Set before BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
  -- your SQL here
COMMIT;

-- Or inline with BEGIN (PostgreSQL)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  -- your SQL here
COMMIT;
</code>

==== For the current session ====

<code sql>
-- PostgreSQL
SET default_transaction_isolation = 'repeatable read';

-- MySQL
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- SQL Server
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- (applies to all subsequent transactions in the session)
</code>

==== For all new connections (global default) ====

<code sql>
-- MySQL: set globally (requires SUPER privilege)
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- PostgreSQL: set in postgresql.conf
-- default_transaction_isolation = 'read committed'
</code>

==== In application code ====

<code python>
# Python (psycopg2 — PostgreSQL)
import psycopg2
from psycopg2 import extensions

conn = psycopg2.connect(...)
conn.set_isolation_level(extensions.ISOLATION_LEVEL_REPEATABLE_READ)
</code>

<code javascript>
// Node.js (pg — PostgreSQL)
const client = await pool.connect();
await client.query('BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE');
// ... your queries ...
await client.query('COMMIT');
</code>

<code java>
// Java (JDBC)
connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
connection.setAutoCommit(false);
// ... your queries ...
connection.commit();
</code>

<code go>
// Go (database/sql)
tx, err := db.BeginTx(ctx, &sql.TxOptions{
    Isolation: sql.LevelRepeatableRead,
})
</code>

----

===== 9. Practical Guide — Which One to Use =====

==== Decision flowchart ====

<code>
Does your transaction read data more than once
and needs consistent results throughout?
    │
    ├── No  → READ COMMITTED (default) is fine
    │
    └── Yes → Does it involve complex shared invariants
              (e.g. "total must be >= 1", "no double booking")?
                  │
                  ├── No  → REPEATABLE READ
                  │
                  └── Yes → SERIALIZABLE
                            (or SELECT FOR UPDATE for specific rows)
</code>

==== Recommendations by use case ====

^ Use Case ^ Recommended Level ^ Why ^
| Read a single row | `READ COMMITTED` | No repeated reads needed |
| Display a news feed or product listing | `READ COMMITTED` | Slight inconsistency is acceptable |
| Generate a financial report | `REPEATABLE READ` | Multiple reads must be consistent |
| Deduct stock on purchase | `READ COMMITTED` + `SELECT FOR UPDATE` | Need to check stock before deducting |
| Bank transfer (debit + credit) | `READ COMMITTED` + `SELECT FOR UPDATE` | Lock both accounts before writing |
| Doctor on-call scheduling | `SERIALIZABLE` | Complex invariant across multiple rows |
| Flight/hotel booking | `SERIALIZABLE` or `UNIQUE constraint` | No double booking allowed |
| Update a user profile | `READ COMMITTED` + Optimistic lock | Conflicts are rare |

==== The most practical approach for most apps ====

Most production applications **do not need to change the default isolation level**. Instead:

<code sql>
-- 1. Use READ COMMITTED (default) for regular queries

-- 2. Use SELECT FOR UPDATE only where race conditions could occur
BEGIN;
  SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
  -- check balance, then update
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- 3. Use atomic updates for simple increments/decrements
UPDATE products SET qty = qty - 1 WHERE id = 1;

-- 4. Use UNIQUE constraints to enforce uniqueness at the DB level
ALTER TABLE bookings ADD CONSTRAINT uq_room_time UNIQUE (room_id, slot);

-- 5. Only raise to SERIALIZABLE for truly complex invariants
</code>

==== Common mistakes to avoid ====

^ Mistake ^ Problem ^ Fix ^
| Using `READ UNCOMMITTED` | Dirty reads, unreliable data | Use `READ COMMITTED` or higher |
| Setting everything to `SERIALIZABLE` | Severe performance degradation | Only use where necessary |
| Read-then-write without locking | Lost update | Use `FOR UPDATE` or atomic update |
| Locking in inconsistent order | Deadlocks | Always lock rows in the same order |
| Ignoring serialization failure errors | Silent data corruption | Always catch and retry |

----

''For related topics, see: SQL Transaction Anomalies, Locking Strategies, MVCC Internals.''