Table of Contents
SQL Performance Optimization Guide
A practical guide to optimizing SQL queries and database performance.
Core Principles
When optimizing SQL, focus on these goals:
- Reduce the number of scanned rows
- Reduce unnecessary data transfer
- Reduce expensive JOIN operations
- Avoid repeated queries
- Use indexes effectively
- Cache frequently accessed data
1. Use EXPLAIN to Analyze Queries
Before optimizing a query, inspect its execution plan using `EXPLAIN`.
Why use EXPLAIN?
`EXPLAIN` shows:
- Whether indexes are used
- Full table scans
- Join order
- Number of scanned rows
- Sorting operations
- Temporary tables
Example
EXPLAIN SELECT * FROM orders WHERE customer_id = 10 ORDER BY created_at DESC;
Things to watch for
| Problem | Meaning |
|---|---|
| Full table scan | Database scans entire table |
| Using temporary | Temporary table created internally |
| Using filesort | Expensive sorting operation |
| Large row count | Too many rows scanned |
| No index used | Missing or incorrect index |
Goal
Queries should:
- Use indexes
- Scan as few rows as possible
- Avoid unnecessary sorting and temporary tables
2. Add Proper Indexes
Indexes are one of the biggest performance improvements in SQL.
Add indexes on commonly used columns
Create indexes for columns frequently used in:
- `WHERE`
- `JOIN`
- `ORDER BY`
- `GROUP BY`
Example
-- WHERE condition CREATE INDEX idx_orders_customer_id ON orders(customer_id); -- ORDER BY CREATE INDEX idx_orders_created_at ON orders(created_at); -- JOIN CREATE INDEX idx_orders_user_id ON orders(user_id);
Composite Index Example
If queries often filter by multiple columns:
SELECT * FROM orders WHERE customer_id = 10 AND STATUS = 'paid';
Use a composite index:
CREATE INDEX idx_orders_customer_status ON orders(customer_id, STATUS);
Important Notes
- Indexes improve reads but slow down writes (`INSERT`, `UPDATE`, `DELETE`)
- Avoid adding indexes on every column
- Remove unused indexes
3. Select Only Needed Columns
Avoid using:
SELECT * FROM users;
Why is `SELECT *` bad?
- Transfers unnecessary data
- Uses more memory
- Prevents some index optimizations
- Slower network response
Better
SELECT id, name, email FROM users;
Only select the columns your application actually needs.
4. Use LIMIT for Large Queries
Fetching huge datasets is expensive.
Bad
SELECT * FROM logs ORDER BY created_at DESC;
Better
SELECT id, message, created_at FROM logs ORDER BY created_at DESC LIMIT 100;
Benefits
- Reduces memory usage
- Faster response time
- Less network transfer
5. Optimize JOIN Operations
JOINs are expensive when tables are large.
Ensure JOIN columns are indexed
SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id;
Recommended indexes:
CREATE INDEX idx_orders_user_id ON orders(user_id);
Usually, primary keys are already indexed automatically.
Avoid unnecessary JOINs
Do not JOIN tables if the data is not needed.
Select only required columns
SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id;
Avoid:
SELECT * FROM orders o JOIN users u ON o.user_id = u.id;
6. Avoid N+1 Query Problems
The N+1 problem happens when an application executes one query to fetch a list, then runs additional queries for each item.
Bad Example
1 query to get users + 100 queries to get each user's orders = 101 queries total
Example in application code
$users = User::all(); foreach ($users as $user) { echo $user->orders; }
This may generate hundreds of SQL queries.
Solution 1 — Eager Loading
Load related data in a single query.
$users = User::with('orders')->get();
Benefits:
- Fewer database queries
- Faster response
- Reduced database load
Solution 2 — Use JOIN
Sometimes a JOIN is more efficient.
SELECT users.name, orders.total FROM users JOIN orders ON users.id = orders.user_id;
Use JOIN when:
- You only need combined result data
- You do not need full ORM relationship objects
7. Use Query Caching
Frequently executed queries should be cached.
Good candidates for caching
- Dashboard statistics
- Product lists
- Configuration data
- Frequently accessed reports
Example Flow
Application
↓
Check cache
↓
If cache exists → return cached data
If not:
Query database
Store result in cache
Return result
Popular cache systems
- Redis
- Memcached
Benefits
- Reduces database load
- Faster response times
- Better scalability
8. Consider Denormalization Carefully
Normalization improves data consistency, but too many JOINs can hurt performance.
Sometimes it is beneficial to combine data from multiple tables into one table.
Example
Instead of:
orders users products addresses
You may store some duplicated data directly in `orders`:
orders: customer_name customer_email shipping_address
This reduces expensive JOIN operations.
Benefits
- Faster reads
- Simpler queries
- Fewer JOINs
Downsides
- Data duplication
- More storage usage
- Harder updates
- Risk of inconsistent data
Rule
Only denormalize after identifying real performance bottlenecks.
9. Pagination Instead of Loading Everything
Avoid loading all rows at once.
Bad
SELECT * FROM products;
Better
SELECT id, name, price FROM products LIMIT 20 OFFSET 0;
Even Better (Keyset Pagination)
SELECT id, name, price FROM products WHERE id > 100 ORDER BY id LIMIT 20;
Keyset pagination is usually faster for large datasets.
10. Monitor Slow Queries
Enable slow query logging in production.
Monitor
- Queries taking too long
- Queries scanning too many rows
- Missing indexes
- Frequent repeated queries
Tools
- MySQL Slow Query Log
- PostgreSQL `pg_stat_statements`
- APM tools (New Relic, Datadog)
Optimization Checklist
Before deploying a query, check:
- Did you run `EXPLAIN`?
- Are indexes used correctly?
- Are indexes missing on `WHERE`, `JOIN`, or `ORDER BY` columns?
- Are you avoiding `SELECT *`?
- Are you limiting returned rows?
- Are JOINs necessary?
- Is there an N+1 query problem?
- Can caching help?
- Can denormalization reduce heavy JOINs?
- Is pagination implemented?
Practical Recommendations
| Situation | Recommended Solution |
|---|---|
| Slow `WHERE` query | Add index |
| Slow sorting | Index `ORDER BY` columns |
| Too many ORM queries | Use eager loading |
| Heavy repeated reads | Add cache |
| Complex JOINs on huge tables | Consider denormalization |
| Large result sets | Use pagination |
| Slow query unknown cause | Use `EXPLAIN` |
Rule of Thumb
- First: measure with `EXPLAIN`
- Second: add proper indexes
- Third: reduce unnecessary data
- Fourth: optimize query patterns
- Fifth: cache expensive operations
Do not optimize blindly — optimize based on actual bottlenecks.
