====== SQL Performance Optimization Guide ====== > A practical guide to optimizing SQL queries and database performance. ---- ===== Core Principles ===== When optimizing SQL, focus on these goals: * Reduce the number of scanned rows * Reduce unnecessary data transfer * Reduce expensive JOIN operations * Avoid repeated queries * Use indexes effectively * Cache frequently accessed data ---- ===== 1. Use EXPLAIN to Analyze Queries ===== Before optimizing a query, inspect its execution plan using `EXPLAIN`. ==== Why use EXPLAIN? ==== `EXPLAIN` shows: * Whether indexes are used * Full table scans * Join order * Number of scanned rows * Sorting operations * Temporary tables ==== Example ==== EXPLAIN SELECT * FROM orders WHERE customer_id = 10 ORDER BY created_at DESC; ==== Things to watch for ==== ^ Problem ^ Meaning ^ | Full table scan | Database scans entire table | | Using temporary | Temporary table created internally | | Using filesort | Expensive sorting operation | | Large row count | Too many rows scanned | | No index used | Missing or incorrect index | ==== Goal ==== Queries should: * Use indexes * Scan as few rows as possible * Avoid unnecessary sorting and temporary tables ---- ===== 2. Add Proper Indexes ===== Indexes are one of the biggest performance improvements in SQL. ==== Add indexes on commonly used columns ==== Create indexes for columns frequently used in: * `WHERE` * `JOIN` * `ORDER BY` * `GROUP BY` ==== Example ==== -- WHERE condition CREATE INDEX idx_orders_customer_id ON orders(customer_id); -- ORDER BY CREATE INDEX idx_orders_created_at ON orders(created_at); -- JOIN CREATE INDEX idx_orders_user_id ON orders(user_id); ==== Composite Index Example ==== If queries often filter by multiple columns: SELECT * FROM orders WHERE customer_id = 10 AND status = 'paid'; Use a composite index: CREATE INDEX idx_orders_customer_status ON orders(customer_id, status); ==== Important Notes ==== * Indexes improve reads but slow down writes (`INSERT`, `UPDATE`, `DELETE`) * Avoid adding indexes on every column * Remove unused indexes ---- ===== 3. Select Only Needed Columns ===== Avoid using: SELECT * FROM users; ==== Why is `SELECT *` bad? ==== * Transfers unnecessary data * Uses more memory * Prevents some index optimizations * Slower network response ==== Better ==== SELECT id, name, email FROM users; Only select the columns your application actually needs. ---- ===== 4. Use LIMIT for Large Queries ===== Fetching huge datasets is expensive. ==== Bad ==== SELECT * FROM logs ORDER BY created_at DESC; ==== Better ==== SELECT id, message, created_at FROM logs ORDER BY created_at DESC LIMIT 100; ==== Benefits ==== * Reduces memory usage * Faster response time * Less network transfer ---- ===== 5. Optimize JOIN Operations ===== JOINs are expensive when tables are large. ==== Ensure JOIN columns are indexed ==== SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id; Recommended indexes: CREATE INDEX idx_orders_user_id ON orders(user_id); Usually, primary keys are already indexed automatically. ==== Avoid unnecessary JOINs ==== Do not JOIN tables if the data is not needed. ==== Select only required columns ==== SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id; Avoid: SELECT * FROM orders o JOIN users u ON o.user_id = u.id; ---- ===== 6. Avoid N+1 Query Problems ===== The N+1 problem happens when an application executes one query to fetch a list, then runs additional queries for each item. ==== Bad Example ==== 1 query to get users + 100 queries to get each user's orders = 101 queries total ==== Example in application code ==== $users = User::all(); foreach ($users as $user) { echo $user->orders; } This may generate hundreds of SQL queries. ---- ==== Solution 1 — Eager Loading ===== Load related data in a single query. $users = User::with('orders')->get(); Benefits: * Fewer database queries * Faster response * Reduced database load ---- ==== Solution 2 — Use JOIN ===== Sometimes a JOIN is more efficient. SELECT users.name, orders.total FROM users JOIN orders ON users.id = orders.user_id; Use JOIN when: * You only need combined result data * You do not need full ORM relationship objects ---- ===== 7. Use Query Caching ===== Frequently executed queries should be cached. ==== Good candidates for caching ==== * Dashboard statistics * Product lists * Configuration data * Frequently accessed reports ==== Example Flow ==== Application ↓ Check cache ↓ If cache exists → return cached data If not: Query database Store result in cache Return result ==== Popular cache systems ==== * Redis * Memcached ==== Benefits ==== * Reduces database load * Faster response times * Better scalability ---- ===== 8. Consider Denormalization Carefully ===== Normalization improves data consistency, but too many JOINs can hurt performance. Sometimes it is beneficial to combine data from multiple tables into one table. ==== Example ==== Instead of: orders users products addresses You may store some duplicated data directly in `orders`: orders: customer_name customer_email shipping_address This reduces expensive JOIN operations. ---- ==== Benefits ==== * Faster reads * Simpler queries * Fewer JOINs ==== Downsides ==== * Data duplication * More storage usage * Harder updates * Risk of inconsistent data ==== Rule ==== Only denormalize after identifying real performance bottlenecks. ---- ===== 9. Pagination Instead of Loading Everything ===== Avoid loading all rows at once. ==== Bad ==== SELECT * FROM products; ==== Better ==== SELECT id, name, price FROM products LIMIT 20 OFFSET 0; ==== Even Better (Keyset Pagination) ==== SELECT id, name, price FROM products WHERE id > 100 ORDER BY id LIMIT 20; Keyset pagination is usually faster for large datasets. ---- ===== 10. Monitor Slow Queries ===== Enable slow query logging in production. ==== Monitor ==== * Queries taking too long * Queries scanning too many rows * Missing indexes * Frequent repeated queries ==== Tools ==== * MySQL Slow Query Log * PostgreSQL `pg_stat_statements` * APM tools (New Relic, Datadog) ---- ===== Optimization Checklist ===== Before deploying a query, check: * Did you run `EXPLAIN`? * Are indexes used correctly? * Are indexes missing on `WHERE`, `JOIN`, or `ORDER BY` columns? * Are you avoiding `SELECT *`? * Are you limiting returned rows? * Are JOINs necessary? * Is there an N+1 query problem? * Can caching help? * Can denormalization reduce heavy JOINs? * Is pagination implemented? ---- ===== Practical Recommendations ===== ^ Situation ^ Recommended Solution ^ | Slow `WHERE` query | Add index | | Slow sorting | Index `ORDER BY` columns | | Too many ORM queries | Use eager loading | | Heavy repeated reads | Add cache | | Complex JOINs on huge tables | Consider denormalization | | Large result sets | Use pagination | | Slow query unknown cause | Use `EXPLAIN` | ---- ===== Rule of Thumb ===== * First: measure with `EXPLAIN` * Second: add proper indexes * Third: reduce unnecessary data * Fourth: optimize query patterns * Fifth: cache expensive operations Do not optimize blindly — optimize based on actual bottlenecks. ----