====== SQL Performance Optimization Guide ======
> A practical guide to optimizing SQL queries and database performance.
----
===== Core Principles =====
When optimizing SQL, focus on these goals:
* Reduce the number of scanned rows
* Reduce unnecessary data transfer
* Reduce expensive JOIN operations
* Avoid repeated queries
* Use indexes effectively
* Cache frequently accessed data
----
===== 1. Use EXPLAIN to Analyze Queries =====
Before optimizing a query, inspect its execution plan using `EXPLAIN`.
==== Why use EXPLAIN? ====
`EXPLAIN` shows:
* Whether indexes are used
* Full table scans
* Join order
* Number of scanned rows
* Sorting operations
* Temporary tables
==== Example ====
EXPLAIN
SELECT *
FROM orders
WHERE customer_id = 10
ORDER BY created_at DESC;
==== Things to watch for ====
^ Problem ^ Meaning ^
| Full table scan | Database scans entire table |
| Using temporary | Temporary table created internally |
| Using filesort | Expensive sorting operation |
| Large row count | Too many rows scanned |
| No index used | Missing or incorrect index |
==== Goal ====
Queries should:
* Use indexes
* Scan as few rows as possible
* Avoid unnecessary sorting and temporary tables
----
===== 2. Add Proper Indexes =====
Indexes are one of the biggest performance improvements in SQL.
==== Add indexes on commonly used columns ====
Create indexes for columns frequently used in:
* `WHERE`
* `JOIN`
* `ORDER BY`
* `GROUP BY`
==== Example ====
-- WHERE condition
CREATE INDEX idx_orders_customer_id
ON orders(customer_id);
-- ORDER BY
CREATE INDEX idx_orders_created_at
ON orders(created_at);
-- JOIN
CREATE INDEX idx_orders_user_id
ON orders(user_id);
==== Composite Index Example ====
If queries often filter by multiple columns:
SELECT *
FROM orders
WHERE customer_id = 10
AND status = 'paid';
Use a composite index:
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);
==== Important Notes ====
* Indexes improve reads but slow down writes (`INSERT`, `UPDATE`, `DELETE`)
* Avoid adding indexes on every column
* Remove unused indexes
----
===== 3. Select Only Needed Columns =====
Avoid using:
SELECT *
FROM users;
==== Why is `SELECT *` bad? ====
* Transfers unnecessary data
* Uses more memory
* Prevents some index optimizations
* Slower network response
==== Better ====
SELECT id, name, email
FROM users;
Only select the columns your application actually needs.
----
===== 4. Use LIMIT for Large Queries =====
Fetching huge datasets is expensive.
==== Bad ====
SELECT *
FROM logs
ORDER BY created_at DESC;
==== Better ====
SELECT id, message, created_at
FROM logs
ORDER BY created_at DESC
LIMIT 100;
==== Benefits ====
* Reduces memory usage
* Faster response time
* Less network transfer
----
===== 5. Optimize JOIN Operations =====
JOINs are expensive when tables are large.
==== Ensure JOIN columns are indexed ====
SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id;
Recommended indexes:
CREATE INDEX idx_orders_user_id ON orders(user_id);
Usually, primary keys are already indexed automatically.
==== Avoid unnecessary JOINs ====
Do not JOIN tables if the data is not needed.
==== Select only required columns ====
SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id;
Avoid:
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;
----
===== 6. Avoid N+1 Query Problems =====
The N+1 problem happens when an application executes one query to fetch a list, then runs additional queries for each item.
==== Bad Example ====
1 query to get users
+ 100 queries to get each user's orders
= 101 queries total
==== Example in application code ====
$users = User::all();
foreach ($users as $user) {
echo $user->orders;
}
This may generate hundreds of SQL queries.
----
==== Solution 1 — Eager Loading =====
Load related data in a single query.
$users = User::with('orders')->get();
Benefits:
* Fewer database queries
* Faster response
* Reduced database load
----
==== Solution 2 — Use JOIN =====
Sometimes a JOIN is more efficient.
SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;
Use JOIN when:
* You only need combined result data
* You do not need full ORM relationship objects
----
===== 7. Use Query Caching =====
Frequently executed queries should be cached.
==== Good candidates for caching ====
* Dashboard statistics
* Product lists
* Configuration data
* Frequently accessed reports
==== Example Flow ====
Application
↓
Check cache
↓
If cache exists → return cached data
If not:
Query database
Store result in cache
Return result
==== Popular cache systems ====
* Redis
* Memcached
==== Benefits ====
* Reduces database load
* Faster response times
* Better scalability
----
===== 8. Consider Denormalization Carefully =====
Normalization improves data consistency, but too many JOINs can hurt performance.
Sometimes it is beneficial to combine data from multiple tables into one table.
==== Example ====
Instead of:
orders
users
products
addresses
You may store some duplicated data directly in `orders`:
orders:
customer_name
customer_email
shipping_address
This reduces expensive JOIN operations.
----
==== Benefits ====
* Faster reads
* Simpler queries
* Fewer JOINs
==== Downsides ====
* Data duplication
* More storage usage
* Harder updates
* Risk of inconsistent data
==== Rule ====
Only denormalize after identifying real performance bottlenecks.
----
===== 9. Pagination Instead of Loading Everything =====
Avoid loading all rows at once.
==== Bad ====
SELECT *
FROM products;
==== Better ====
SELECT id, name, price
FROM products
LIMIT 20 OFFSET 0;
==== Even Better (Keyset Pagination) ====
SELECT id, name, price
FROM products
WHERE id > 100
ORDER BY id
LIMIT 20;
Keyset pagination is usually faster for large datasets.
----
===== 10. Monitor Slow Queries =====
Enable slow query logging in production.
==== Monitor ====
* Queries taking too long
* Queries scanning too many rows
* Missing indexes
* Frequent repeated queries
==== Tools ====
* MySQL Slow Query Log
* PostgreSQL `pg_stat_statements`
* APM tools (New Relic, Datadog)
----
===== Optimization Checklist =====
Before deploying a query, check:
* Did you run `EXPLAIN`?
* Are indexes used correctly?
* Are indexes missing on `WHERE`, `JOIN`, or `ORDER BY` columns?
* Are you avoiding `SELECT *`?
* Are you limiting returned rows?
* Are JOINs necessary?
* Is there an N+1 query problem?
* Can caching help?
* Can denormalization reduce heavy JOINs?
* Is pagination implemented?
----
===== Practical Recommendations =====
^ Situation ^ Recommended Solution ^
| Slow `WHERE` query | Add index |
| Slow sorting | Index `ORDER BY` columns |
| Too many ORM queries | Use eager loading |
| Heavy repeated reads | Add cache |
| Complex JOINs on huge tables | Consider denormalization |
| Large result sets | Use pagination |
| Slow query unknown cause | Use `EXPLAIN` |
----
===== Rule of Thumb =====
* First: measure with `EXPLAIN`
* Second: add proper indexes
* Third: reduce unnecessary data
* Fourth: optimize query patterns
* Fifth: cache expensive operations
Do not optimize blindly — optimize based on actual bottlenecks.
----