====== SQL Isolation Levels ====== > A complete guide to understanding the 4 isolation levels in SQL — what they protect against, how they work internally, and when to use each one. ---- ===== Table of Contents ===== - [[#1_what_is_an_isolation_level|1. What is an Isolation Level?]] - [[#2_read_uncommitted|2. READ UNCOMMITTED]] - [[#3_read_committed|3. READ COMMITTED]] - [[#4_repeatable_read|4. REPEATABLE READ]] - [[#5_serializable|5. SERIALIZABLE]] - [[#6_how_isolation_works_internally|6. How Isolation Works Internally]] - [[#7_full_comparison_table|7. Full Comparison Table]] - [[#8_how_to_set_isolation_level|8. How to Set Isolation Level]] - [[#9_practical_guide_which_one_to_use|9. Practical Guide — Which One to Use]] ---- ===== 1. What is an Isolation Level? ===== When multiple transactions run at the same time, they can interfere with each other and produce incorrect results. **Isolation level** is a setting that controls **how much one transaction is shielded from the effects of other concurrent transactions**. The SQL standard defines 4 isolation levels, ordered from weakest to strongest: READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE (weakest) (strongest) ↑ ↑ highest performance highest data safety lowest data safety lowest performance There is always a **trade-off**: stronger isolation = safer data, but more locking = slower performance. You pick the level that matches what your use case actually needs. ==== The 3 read anomalies defined by the SQL standard ==== Before diving in, these are the problems isolation levels are designed to prevent: ^ Anomaly ^ Description ^ | **Dirty read** | Reading uncommitted changes from another transaction | | **Non-repeatable read** | Reading the same row twice and getting different values (someone updated it) | | **Phantom read** | Running the same query twice and getting different rows (someone inserted/deleted) | ---- ===== 2. READ UNCOMMITTED ===== ==== What it does ==== The **lowest** isolation level. A transaction can read data that other transactions have modified but **not yet committed**. There are virtually no restrictions. ==== Anomalies ==== ^ Dirty read ^ Non-repeatable read ^ Phantom read ^ | ❌ Can happen | ❌ Can happen | ❌ Can happen | ==== How it works ==== No read locks are acquired. A transaction reads directly from the latest in-memory state of the data, regardless of whether other transactions have committed or not. ==== Example ==== balance = 100 Txn B: UPDATE balance = 200 (not committed yet) Txn A: SELECT balance → reads 200 ← dirty read Txn B: ROLLBACK → balance reverts to 100 Txn A: is now working with 200, which never existed ==== SQL ==== SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; BEGIN; SELECT balance FROM accounts WHERE id = 1; -- May return uncommitted values from other transactions COMMIT; ==== When to use ==== Almost **never** in production. The only theoretical use case is large read-only analytics queries where approximate results are acceptable and you want to avoid any locking overhead at all. In practice, even then there are better options. ---- ===== 3. READ COMMITTED ===== ==== What it does ==== The most **commonly used** isolation level. A transaction can only read data that has been **committed**. Any uncommitted changes from other transactions are invisible. ==== Anomalies ==== ^ Dirty read ^ Non-repeatable read ^ Phantom read ^ | ✅ Prevented | ❌ Can happen | ❌ Can happen | ==== How it works ==== Each `SELECT` statement gets a **fresh snapshot** of committed data at the moment the statement runs — not at the moment the transaction began. This means if you run the same `SELECT` twice in one transaction, you may get different results if another transaction committed a change in between. ==== Example ==== balance = 1000 Txn A: SELECT balance → 1000 ← snapshot at statement time Txn B: UPDATE balance = 500, COMMIT Txn A: SELECT balance → 500 ← new snapshot, sees B's commit same transaction, different result = non-repeatable read ==== SQL ==== -- Default in PostgreSQL and SQL Server SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN; SELECT balance FROM accounts WHERE id = 1; -- → 1000 -- another transaction commits balance = 500 here SELECT balance FROM accounts WHERE id = 1; -- → 500 (different result — non-repeatable read) COMMIT; ==== When to use ==== The right default for **most web applications**. It prevents the most dangerous anomaly (dirty read) while keeping performance high. Used by default in: * **PostgreSQL** ✓ * **SQL Server** ✓ * **Oracle** ✓ ---- ===== 4. REPEATABLE READ ===== ==== What it does ==== A transaction always sees the **same snapshot** of data as it did at the start of the transaction. If you read a row, that row will return the same value for the entire duration of the transaction — even if another transaction updates it and commits. ==== Anomalies ==== ^ Dirty read ^ Non-repeatable read ^ Phantom read ^ | ✅ Prevented | ✅ Prevented | ❌ Can happen* | > * MySQL's `REPEATABLE READ` also prevents phantom reads via MVCC. PostgreSQL's does not fully prevent them in all cases. ==== How it works ==== The database takes a **snapshot at the start of the transaction** (`BEGIN`). All `SELECT` statements in that transaction read from this fixed snapshot — they never see changes committed by other transactions after the transaction began. Transaction starts → snapshot created → all reads use this snapshot ↓ other transactions commit changes ↓ those changes are invisible to this transaction ==== Example ==== balance = 1000 Txn A: BEGIN → snapshot created (balance=1000) Txn A: SELECT balance → 1000 Txn B: UPDATE balance = 500, COMMIT Txn A: SELECT balance → still 1000 ← reads from snapshot, not current data Txn A: COMMIT ==== SQL ==== -- Default in MySQL (InnoDB) SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; BEGIN; SELECT balance FROM accounts WHERE id = 1; -- → 1000 (snapshot created here) -- Txn B updates balance to 500 and commits SELECT balance FROM accounts WHERE id = 1; -- → still 1000 ✓ (reads from snapshot) COMMIT; ==== Phantom read still possible (PostgreSQL) ==== Even though each row is stable, new rows can still appear: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- PostgreSQL SELECT COUNT(*) FROM orders WHERE amount > 100; -- → 3 rows -- Txn B inserts a new order with amount=500 and commits SELECT COUNT(*) FROM orders WHERE amount > 100; -- → 4 rows ← phantom read! (PostgreSQL) -- → 3 rows ← MySQL MVCC prevents this ==== When to use ==== Use when your transaction needs **consistent reads across multiple queries** — reports, analytics, financial summaries. Used by default in: * **MySQL (InnoDB)** ✓ ---- ===== 5. SERIALIZABLE ===== ==== What it does ==== The **strongest** isolation level. Transactions behave as if they were executed **one at a time, serially** — even though they actually run concurrently. All anomalies are prevented, including write skew. ==== Anomalies ==== ^ Dirty read ^ Non-repeatable read ^ Phantom read ^ Lost update ^ Write skew ^ | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented | ==== How it works ==== There are two implementation approaches depending on the database: **Approach 1 — Lock-based (traditional)** The database places range locks on all rows and gaps scanned by a query. Other transactions cannot insert, update, or delete data in those ranges until the lock is released. **Approach 2 — SSI (Serializable Snapshot Isolation — PostgreSQL)** Transactions run freely using snapshots (no extra locking). The database tracks read/write dependencies between transactions. At commit time, if a conflict is detected that would violate serializability → one transaction is aborted with a `serialization failure` error and must retry. ==== Example — Write skew prevented ==== Rule: at least 1 doctor must be on call Currently: Alice on_call=true, Bob on_call=true Txn A (SERIALIZABLE): SELECT COUNT(*) WHERE on_call=true → 2, UPDATE Alice off call Txn B (SERIALIZABLE): SELECT COUNT(*) WHERE on_call=true → 2, UPDATE Bob off call Without SERIALIZABLE → both succeed → 0 doctors on call (rule violated) With SERIALIZABLE → one transaction is aborted → must retry → rule preserved ==== SQL ==== SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN; -- Any reads here are tracked for dependency SELECT COUNT(*) FROM doctors WHERE on_call = true; UPDATE doctors SET on_call = false WHERE id = 1; COMMIT; -- If a conflict is detected: ERROR: could not serialize access -- → catch this error in your code and retry the transaction ==== Retry pattern for serialization failures ==== import psycopg2 def run_serializable(conn, fn): MAX_RETRY = 5 for attempt in range(MAX_RETRY): try: with conn.cursor() as cur: cur.execute("BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE") fn(cur) # your business logic here cur.execute("COMMIT") return # success except psycopg2.errors.SerializationFailure: conn.rollback() print(f"Serialization conflict, retrying... ({attempt + 1})") raise Exception("Failed after max retries") async function runSerializable(pool, fn) { const MAX_RETRY = 5; for (let attempt = 0; attempt < MAX_RETRY; attempt++) { const client = await pool.connect(); try { await client.query('BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE'); await fn(client); // your business logic here await client.query('COMMIT'); return; // success } catch (err) { await client.query('ROLLBACK'); // PostgreSQL serialization failure error code: 40001 if (err.code === '40001') { console.log(`Serialization conflict, retrying... (${attempt + 1})`); continue; } throw err; // different error, rethrow } finally { client.release(); } } throw new Error('Failed after max retries'); } ==== When to use ==== Use for **critical financial operations** where correctness is non-negotiable: * Bank transfers * Booking systems (rooms, flights, tickets) * Inventory reservation under high concurrency * Any operation with complex shared invariants **Performance note:** `SERIALIZABLE` is significantly slower under high concurrency because of the extra tracking or locking overhead. Only apply it to transactions that truly need it — not the entire application. ---- ===== 6. How Isolation Works Internally ===== Databases implement isolation using two main techniques: ==== Technique 1 — Locking ==== The traditional approach. Transactions acquire locks on data they read or write. Other transactions that need the same data must wait. Txn A reads row 1 → acquires Shared Lock (S) Txn B wants to write row 1 → must wait for A to release S lock Txn A commits → releases lock Txn B proceeds **Lock types:** ^ Lock ^ Who can hold it ^ Blocks ^ | Shared (S) | Multiple readers simultaneously | Writers only | | Exclusive (X) | One writer at a time | All readers and writers | **Downside:** Can cause **deadlocks** — A holds lock 1 waiting for lock 2, B holds lock 2 waiting for lock 1. Both wait forever. Databases detect and resolve this by aborting one transaction. ---- ==== Technique 2 — MVCC (Multi-Version Concurrency Control) ==== Used by **PostgreSQL**, **MySQL InnoDB**, and Oracle. Instead of blocking readers, the database keeps **multiple versions** of each row. Each transaction reads from its own consistent snapshot. Row versions stored in database: balance=1000 (committed at T=1) balance=500 (committed at T=5) balance=200 (committed at T=9) Txn A started at T=3 → always sees balance=1000 Txn B started at T=7 → always sees balance=500 New query at T=10 → sees balance=200 **Key benefit:** Readers never block writers, writers never block readers. Much higher concurrency than pure locking. **Downside:** Old row versions accumulate and must be cleaned up (PostgreSQL calls this **VACUUM**). ---- ==== Which technique does each database use? ==== ^ Database ^ Technique ^ Notes ^ | PostgreSQL | MVCC + SSI for SERIALIZABLE | Readers never block writers | | MySQL InnoDB | MVCC + gap locks | MVCC for reads, locks for writes | | SQL Server | Locking (default) or MVCC (with RCSI) | MVCC must be enabled manually | | Oracle | MVCC | Similar to PostgreSQL | ---- ===== 7. Full Comparison Table ===== ==== Anomaly protection by level ==== ^ Isolation Level ^ Dirty Read ^ Non-repeatable Read ^ Phantom Read ^ Lost Update ^ Write Skew ^ | READ UNCOMMITTED | ❌ | ❌ | ❌ | ❌ | ❌ | | READ COMMITTED | ✅ | ❌ | ❌ | ❌ | ❌ | | REPEATABLE READ | ✅ | ✅ | ❌ (✅ MySQL) | ❌ | ❌ | | SERIALIZABLE | ✅ | ✅ | ✅ | ✅ | ✅ | ==== Default level by database ==== ^ Database ^ Default Level ^ Dirty Read ^ Non-repeatable Read ^ Phantom Read ^ | MySQL InnoDB | REPEATABLE READ | ✅ No | ✅ No | ✅ No* | | PostgreSQL | READ COMMITTED | ✅ No | ❌ Possible | ❌ Possible | | SQL Server | READ COMMITTED | ✅ No | ❌ Possible | ❌ Possible | | Oracle | READ COMMITTED | ✅ No | ❌ Possible | ❌ Possible | > * MySQL prevents phantom reads via MVCC at REPEATABLE READ level. ==== Performance vs safety trade-off ==== READ UNCOMMITTED ──────────────────────────────► SERIALIZABLE ↑ ↑ Fastest Slowest Most concurrent Least concurrent Least safe Most safe ---- ===== 8. How to Set Isolation Level ===== ==== For a single transaction ==== -- Set before BEGIN SET TRANSACTION ISOLATION LEVEL READ COMMITTED; BEGIN; -- your SQL here COMMIT; -- Or inline with BEGIN (PostgreSQL) BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- your SQL here COMMIT; ==== For the current session ==== -- PostgreSQL SET default_transaction_isolation = 'repeatable read'; -- MySQL SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- SQL Server SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- (applies to all subsequent transactions in the session) ==== For all new connections (global default) ==== -- MySQL: set globally (requires SUPER privilege) SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED; -- PostgreSQL: set in postgresql.conf -- default_transaction_isolation = 'read committed' ==== In application code ==== # Python (psycopg2 — PostgreSQL) import psycopg2 from psycopg2 import extensions conn = psycopg2.connect(...) conn.set_isolation_level(extensions.ISOLATION_LEVEL_REPEATABLE_READ) // Node.js (pg — PostgreSQL) const client = await pool.connect(); await client.query('BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE'); // ... your queries ... await client.query('COMMIT'); // Java (JDBC) connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ); connection.setAutoCommit(false); // ... your queries ... connection.commit(); // Go (database/sql) tx, err := db.BeginTx(ctx, &sql.TxOptions{ Isolation: sql.LevelRepeatableRead, }) ---- ===== 9. Practical Guide — Which One to Use ===== ==== Decision flowchart ==== Does your transaction read data more than once and needs consistent results throughout? │ ├── No → READ COMMITTED (default) is fine │ └── Yes → Does it involve complex shared invariants (e.g. "total must be >= 1", "no double booking")? │ ├── No → REPEATABLE READ │ └── Yes → SERIALIZABLE (or SELECT FOR UPDATE for specific rows) ==== Recommendations by use case ==== ^ Use Case ^ Recommended Level ^ Why ^ | Read a single row | `READ COMMITTED` | No repeated reads needed | | Display a news feed or product listing | `READ COMMITTED` | Slight inconsistency is acceptable | | Generate a financial report | `REPEATABLE READ` | Multiple reads must be consistent | | Deduct stock on purchase | `READ COMMITTED` + `SELECT FOR UPDATE` | Need to check stock before deducting | | Bank transfer (debit + credit) | `READ COMMITTED` + `SELECT FOR UPDATE` | Lock both accounts before writing | | Doctor on-call scheduling | `SERIALIZABLE` | Complex invariant across multiple rows | | Flight/hotel booking | `SERIALIZABLE` or `UNIQUE constraint` | No double booking allowed | | Update a user profile | `READ COMMITTED` + Optimistic lock | Conflicts are rare | ==== The most practical approach for most apps ==== Most production applications **do not need to change the default isolation level**. Instead: -- 1. Use READ COMMITTED (default) for regular queries -- 2. Use SELECT FOR UPDATE only where race conditions could occur BEGIN; SELECT balance FROM accounts WHERE id = 1 FOR UPDATE; -- check balance, then update UPDATE accounts SET balance = balance - 100 WHERE id = 1; COMMIT; -- 3. Use atomic updates for simple increments/decrements UPDATE products SET qty = qty - 1 WHERE id = 1; -- 4. Use UNIQUE constraints to enforce uniqueness at the DB level ALTER TABLE bookings ADD CONSTRAINT uq_room_time UNIQUE (room_id, slot); -- 5. Only raise to SERIALIZABLE for truly complex invariants ==== Common mistakes to avoid ==== ^ Mistake ^ Problem ^ Fix ^ | Using `READ UNCOMMITTED` | Dirty reads, unreliable data | Use `READ COMMITTED` or higher | | Setting everything to `SERIALIZABLE` | Severe performance degradation | Only use where necessary | | Read-then-write without locking | Lost update | Use `FOR UPDATE` or atomic update | | Locking in inconsistent order | Deadlocks | Always lock rows in the same order | | Ignoring serialization failure errors | Silent data corruption | Always catch and retry | ---- ''For related topics, see: SQL Transaction Anomalies, Locking Strategies, MVCC Internals.''