User Tools

Site Tools


mysql:questions:how-to-optimize-sql

SQL Performance Optimization Guide

A practical guide to optimizing SQL queries and database performance.

Core Principles

When optimizing SQL, focus on these goals:

  • Reduce the number of scanned rows
  • Reduce unnecessary data transfer
  • Reduce expensive JOIN operations
  • Avoid repeated queries
  • Use indexes effectively
  • Cache frequently accessed data

1. Use EXPLAIN to Analyze Queries

Before optimizing a query, inspect its execution plan using `EXPLAIN`.

Why use EXPLAIN?

`EXPLAIN` shows:

  • Whether indexes are used
  • Full table scans
  • Join order
  • Number of scanned rows
  • Sorting operations
  • Temporary tables

Example

EXPLAIN
SELECT *
FROM orders
WHERE customer_id = 10
ORDER BY created_at DESC;

Things to watch for

Problem Meaning
Full table scan Database scans entire table
Using temporary Temporary table created internally
Using filesort Expensive sorting operation
Large row count Too many rows scanned
No index used Missing or incorrect index

Goal

Queries should:

  • Use indexes
  • Scan as few rows as possible
  • Avoid unnecessary sorting and temporary tables

2. Add Proper Indexes

Indexes are one of the biggest performance improvements in SQL.

Add indexes on commonly used columns

Create indexes for columns frequently used in:

  • `WHERE`
  • `JOIN`
  • `ORDER BY`
  • `GROUP BY`

Example

-- WHERE condition
CREATE INDEX idx_orders_customer_id
ON orders(customer_id);
 
-- ORDER BY
CREATE INDEX idx_orders_created_at
ON orders(created_at);
 
-- JOIN
CREATE INDEX idx_orders_user_id
ON orders(user_id);

Composite Index Example

If queries often filter by multiple columns:

SELECT *
FROM orders
WHERE customer_id = 10
AND STATUS = 'paid';

Use a composite index:

CREATE INDEX idx_orders_customer_status
ON orders(customer_id, STATUS);

Important Notes

  • Indexes improve reads but slow down writes (`INSERT`, `UPDATE`, `DELETE`)
  • Avoid adding indexes on every column
  • Remove unused indexes

3. Select Only Needed Columns

Avoid using:

SELECT *
FROM users;

Why is `SELECT *` bad?

  • Transfers unnecessary data
  • Uses more memory
  • Prevents some index optimizations
  • Slower network response

Better

SELECT id, name, email
FROM users;

Only select the columns your application actually needs.


4. Use LIMIT for Large Queries

Fetching huge datasets is expensive.

Bad

SELECT *
FROM logs
ORDER BY created_at DESC;

Better

SELECT id, message, created_at
FROM logs
ORDER BY created_at DESC
LIMIT 100;

Benefits

  • Reduces memory usage
  • Faster response time
  • Less network transfer

5. Optimize JOIN Operations

JOINs are expensive when tables are large.

Ensure JOIN columns are indexed

SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id;

Recommended indexes:

CREATE INDEX idx_orders_user_id ON orders(user_id);

Usually, primary keys are already indexed automatically.

Avoid unnecessary JOINs

Do not JOIN tables if the data is not needed.

Select only required columns

SELECT o.id, u.name
FROM orders o
JOIN users u ON o.user_id = u.id;

Avoid:

SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id;

6. Avoid N+1 Query Problems

The N+1 problem happens when an application executes one query to fetch a list, then runs additional queries for each item.

Bad Example

1 query to get users
+ 100 queries to get each user's orders
= 101 queries total

Example in application code

$users = User::all();
 
foreach ($users as $user) {
    echo $user->orders;
}

This may generate hundreds of SQL queries.


Solution 1 — Eager Loading

Load related data in a single query.

$users = User::with('orders')->get();

Benefits:

  • Fewer database queries
  • Faster response
  • Reduced database load

Solution 2 — Use JOIN

Sometimes a JOIN is more efficient.

SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id;

Use JOIN when:

  • You only need combined result data
  • You do not need full ORM relationship objects

7. Use Query Caching

Frequently executed queries should be cached.

Good candidates for caching

  • Dashboard statistics
  • Product lists
  • Configuration data
  • Frequently accessed reports

Example Flow

Application
    ↓
Check cache
    ↓
If cache exists → return cached data
If not:
    Query database
    Store result in cache
    Return result
  • Redis
  • Memcached

Benefits

  • Reduces database load
  • Faster response times
  • Better scalability

8. Consider Denormalization Carefully

Normalization improves data consistency, but too many JOINs can hurt performance.

Sometimes it is beneficial to combine data from multiple tables into one table.

Example

Instead of:

orders
users
products
addresses

You may store some duplicated data directly in `orders`:

orders:
  customer_name
  customer_email
  shipping_address

This reduces expensive JOIN operations.


Benefits

  • Faster reads
  • Simpler queries
  • Fewer JOINs

Downsides

  • Data duplication
  • More storage usage
  • Harder updates
  • Risk of inconsistent data

Rule

Only denormalize after identifying real performance bottlenecks.


9. Pagination Instead of Loading Everything

Avoid loading all rows at once.

Bad

SELECT *
FROM products;

Better

SELECT id, name, price
FROM products
LIMIT 20 OFFSET 0;

Even Better (Keyset Pagination)

SELECT id, name, price
FROM products
WHERE id > 100
ORDER BY id
LIMIT 20;

Keyset pagination is usually faster for large datasets.


10. Monitor Slow Queries

Enable slow query logging in production.

Monitor

  • Queries taking too long
  • Queries scanning too many rows
  • Missing indexes
  • Frequent repeated queries

Tools

  • MySQL Slow Query Log
  • PostgreSQL `pg_stat_statements`
  • APM tools (New Relic, Datadog)

Optimization Checklist

Before deploying a query, check:

  • Did you run `EXPLAIN`?
  • Are indexes used correctly?
  • Are indexes missing on `WHERE`, `JOIN`, or `ORDER BY` columns?
  • Are you avoiding `SELECT *`?
  • Are you limiting returned rows?
  • Are JOINs necessary?
  • Is there an N+1 query problem?
  • Can caching help?
  • Can denormalization reduce heavy JOINs?
  • Is pagination implemented?

Practical Recommendations

Situation Recommended Solution
Slow `WHERE` query Add index
Slow sorting Index `ORDER BY` columns
Too many ORM queries Use eager loading
Heavy repeated reads Add cache
Complex JOINs on huge tables Consider denormalization
Large result sets Use pagination
Slow query unknown cause Use `EXPLAIN`

Rule of Thumb

  • First: measure with `EXPLAIN`
  • Second: add proper indexes
  • Third: reduce unnecessary data
  • Fourth: optimize query patterns
  • Fifth: cache expensive operations

Do not optimize blindly — optimize based on actual bottlenecks.


mysql/questions/how-to-optimize-sql.txt · Last modified: by phong2018