SQL Performance Optimization Guide

SQL Performance Optimization Guide

A practical guide to optimizing SQL queries and database performance.

Core Principles

When optimizing SQL, focus on these goals:

Reduce the number of scanned rows
Reduce unnecessary data transfer
Reduce expensive JOIN operations
Avoid repeated queries
Use indexes effectively
Cache frequently accessed data

1. Use EXPLAIN to Analyze Queries

Before optimizing a query, inspect its execution plan using `EXPLAIN`.

Why use EXPLAIN?

`EXPLAIN` shows:

Whether indexes are used
Full table scans
Join order
Number of scanned rows
Sorting operations
Temporary tables

Example

EXPLAIN
SELECT *
FROM orders
WHERE customer_id = 10
ORDER BY created_at DESC;

Things to watch for

Problem	Meaning
Full table scan	Database scans entire table
Using temporary	Temporary table created internally
Using filesort	Expensive sorting operation
Large row count	Too many rows scanned
No index used	Missing or incorrect index

Goal

Queries should:

Use indexes
Scan as few rows as possible
Avoid unnecessary sorting and temporary tables

2. Add Proper Indexes

Indexes are one of the biggest performance improvements in SQL.

Add indexes on commonly used columns

Create indexes for columns frequently used in:

`WHERE`
`JOIN`
`ORDER BY`
`GROUP BY`

Example

-- WHERE condition
CREATE INDEX idx_orders_customer_id
ON orders(customer_id);
 
-- ORDER BY
CREATE INDEX idx_orders_created_at
ON orders(created_at);
 
-- JOIN
CREATE INDEX idx_orders_user_id
ON orders(user_id);

Composite Index Example

If queries often filter by multiple columns:

SELECT *
FROM orders
WHERE customer_id = 10
AND STATUS = 'paid';

Use a composite index:

CREATE INDEX idx_orders_customer_status
ON orders(customer_id, STATUS);

Important Notes

Indexes improve reads but slow down writes (`INSERT`, `UPDATE`, `DELETE`)
Avoid adding indexes on every column
Remove unused indexes

3. Select Only Needed Columns

Avoid using:

SELECT *
FROM users;

Why is `SELECT *` bad?

Transfers unnecessary data
Uses more memory
Prevents some index optimizations
Slower network response

Better

SELECT id, name, email
FROM users;

Only select the columns your application actually needs.

4. Use LIMIT for Large Queries

Fetching huge datasets is expensive.

Bad

SELECT *
FROM logs
ORDER BY created_at DESC;

Better

SELECT id, message, created_at
FROM logs
ORDER BY created_at DESC
LIMIT 100;

Benefits

Reduces memory usage
Faster response time
Less network transfer

5. Optimize JOIN Operations

JOINs are expensive when tables are large.

Ensure JOIN columns are indexed

SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id;

Recommended indexes:

CREATE INDEX idx_orders_user_id ON orders(user_id);

Usually, primary keys are already indexed automatically.

Avoid unnecessary JOINs

Do not JOIN tables if the data is not needed.

Select only required columns

SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id;

Avoid:

SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;

6. Avoid N+1 Query Problems

The N+1 problem happens when an application executes one query to fetch a list, then runs additional queries for each item.

Bad Example

1 query to get users
+ 100 queries to get each user's orders
= 101 queries total

Example in application code

$users = User::all();
 
foreach ($users as $user) {
    echo $user->orders;
}

This may generate hundreds of SQL queries.

Solution 1 — Eager Loading

Load related data in a single query.

$users = User::with('orders')->get();

Benefits:

Fewer database queries
Faster response
Reduced database load

Solution 2 — Use JOIN

Sometimes a JOIN is more efficient.

SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;

Use JOIN when:

You only need combined result data
You do not need full ORM relationship objects

7. Use Query Caching

Frequently executed queries should be cached.

Good candidates for caching

Dashboard statistics
Product lists
Configuration data
Frequently accessed reports

Example Flow

Application
    ↓
Check cache
    ↓
If cache exists → return cached data
If not:
    Query database
    Store result in cache
    Return result

Popular cache systems

Redis
Memcached

Benefits

Reduces database load
Faster response times
Better scalability

8. Consider Denormalization Carefully

Normalization improves data consistency, but too many JOINs can hurt performance.

Sometimes it is beneficial to combine data from multiple tables into one table.

Example

Instead of:

orders
users
products
addresses

You may store some duplicated data directly in `orders`:

orders:
  customer_name
  customer_email
  shipping_address

This reduces expensive JOIN operations.

Benefits

Faster reads
Simpler queries
Fewer JOINs

Downsides

Data duplication
More storage usage
Harder updates
Risk of inconsistent data

Rule

Only denormalize after identifying real performance bottlenecks.

9. Pagination Instead of Loading Everything

Avoid loading all rows at once.

Bad

SELECT *
FROM products;

Better

SELECT id, name, price
FROM products
LIMIT 20 OFFSET 0;

Even Better (Keyset Pagination)

SELECT id, name, price
FROM products
WHERE id > 100
ORDER BY id
LIMIT 20;

Keyset pagination is usually faster for large datasets.

10. Monitor Slow Queries

Enable slow query logging in production.

Monitor

Queries taking too long
Queries scanning too many rows
Missing indexes
Frequent repeated queries

Tools

MySQL Slow Query Log
PostgreSQL `pg_stat_statements`
APM tools (New Relic, Datadog)

Optimization Checklist

Before deploying a query, check:

Did you run `EXPLAIN`?
Are indexes used correctly?
Are indexes missing on `WHERE`, `JOIN`, or `ORDER BY` columns?
Are you avoiding `SELECT *`?
Are you limiting returned rows?
Are JOINs necessary?
Is there an N+1 query problem?
Can caching help?
Can denormalization reduce heavy JOINs?
Is pagination implemented?

Practical Recommendations

Situation	Recommended Solution
Slow `WHERE` query	Add index
Slow sorting	Index `ORDER BY` columns
Too many ORM queries	Use eager loading
Heavy repeated reads	Add cache
Complex JOINs on huge tables	Consider denormalization
Large result sets	Use pagination
Slow query unknown cause	Use `EXPLAIN`

Rule of Thumb

First: measure with `EXPLAIN`
Second: add proper indexes
Third: reduce unnecessary data
Fourth: optimize query patterns
Fifth: cache expensive operations

Do not optimize blindly — optimize based on actual bottlenecks.

Table of Contents

SQL Performance Optimization Guide

Core Principles

1. Use EXPLAIN to Analyze Queries

Why use EXPLAIN?

Example

Things to watch for

Goal

2. Add Proper Indexes

Add indexes on commonly used columns

Example

Composite Index Example

Important Notes

3. Select Only Needed Columns

Why is `SELECT *` bad?

Better

4. Use LIMIT for Large Queries

Bad

Better

Benefits

5. Optimize JOIN Operations

Ensure JOIN columns are indexed

Avoid unnecessary JOINs

Select only required columns

6. Avoid N+1 Query Problems

Bad Example

Example in application code

Solution 1 — Eager Loading

Solution 2 — Use JOIN

7. Use Query Caching

Good candidates for caching

Example Flow

Popular cache systems

Benefits

8. Consider Denormalization Carefully

Example

Benefits

Downsides

Rule

9. Pagination Instead of Loading Everything

Bad

Better

Even Better (Keyset Pagination)

10. Monitor Slow Queries

Monitor

Tools

Optimization Checklist

Practical Recommendations

Rule of Thumb