Query Optimization

Comprehensive guide to writing fast, efficient SQL queries

Query optimization is the art and science of writing SQL that executes efficiently. Understanding how databases process queries enables you to write performant code from the start.

How Query Processing Works

Parsing

The database parses your SQL into an abstract syntax tree (AST), checking for syntax errors.

Rewriting

Query is transformed: views expanded, rules applied, subqueries flattened.

Optimization

The optimizer generates multiple execution plans and chooses the most efficient one based on statistics.

Execution

The chosen plan is executed, accessing data through the most efficient path.

EXPLAIN Basics

Reading and understanding execution plans

The Optimizer's Decision Process

Quick Reference: Optimization Checklist

Essential Optimization Techniques

1. Use Indexes Effectively

-- ✅ Index can be used
SELECT * FROM orders WHERE customer_id = 123;

-- ❌ Index cannot be used (function on indexed column)
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- ✅ Rewritten to use index
SELECT * FROM orders 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2025-01-01';

2. Limit Data Early

-- ❌ Filters after joining all data
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2024-01-01';

-- ✅ Filter before join (if optimizer doesn't do it)
SELECT o.*, c.name
FROM (
    SELECT * FROM orders WHERE created_at > '2024-01-01'
) o
JOIN customers c ON o.customer_id = c.id;

3. Choose the Right Join

-- Use EXISTS for existence checks (can short-circuit)
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);

-- Use JOIN when you need data from both tables
SELECT c.*, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id;

4. Avoid N+1 Queries

-- ❌ N+1 Pattern (in application code)
customers = db.query("SELECT * FROM customers")
for customer in customers:
    orders = db.query(f"SELECT * FROM orders WHERE customer_id = {customer.id}")

-- ✅ Single query with JOIN
SELECT c.*, o.*
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;

-- ✅ Or batch fetch
SELECT * FROM orders WHERE customer_id IN (1, 2, 3, 4, 5);

Cost is an arbitrary unit representing the estimated resources (CPU, I/O, memory) needed to execute a query. Lower is better, but absolute values are meaningless—only compare costs for the same query.

Scan Type	Relative Cost	When Used
Index-Only Scan	Lowest	All columns in index
Index Scan	Low	Few rows, random access OK
Bitmap Scan	Medium	Many rows, combines indexes
Sequential Scan	High	Most/all rows needed

Statistics Matter

The optimizer relies on table statistics to make decisions:

-- PostgreSQL: Update statistics
ANALYZE table_name;

-- MySQL: Update statistics
ANALYZE TABLE table_name;

-- SQL Server: Update statistics
UPDATE STATISTICS table_name;

-- View statistics
-- PostgreSQL
SELECT * FROM pg_stats WHERE tablename = 'orders';

-- Check when stats were last updated
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables;

Stale Statistics

Stale statistics lead to poor query plans. The optimizer might choose a sequential scan when an index scan would be faster, or vice versa. Run ANALYZE after large data changes.

Common Performance Problems

Database-Specific Optimization

-- Enable timing
\timing on

-- Detailed EXPLAIN
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT ...;

-- See actual I/O
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Force specific plan (for testing)
SET enable_seqscan = off;
SET enable_indexscan = off;
SET enable_hashjoin = off;

-- Auto-explain for slow queries
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';

-- Analyze query
EXPLAIN SELECT ...;
EXPLAIN ANALYZE SELECT ...;  -- MySQL 8.0.18+

-- Show query profile
SET profiling = 1;
SELECT ...;
SHOW PROFILE FOR QUERY 1;

-- Optimizer hints
SELECT /*+ INDEX(orders idx_customer) */ *
FROM orders WHERE customer_id = 123;

-- Force index
SELECT * FROM orders FORCE INDEX (idx_customer)
WHERE customer_id = 123;

-- Show execution plan
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Estimated plan
SET SHOWPLAN_ALL ON;
SELECT ...;
SET SHOWPLAN_ALL OFF;

-- Actual plan (graphical in SSMS)
SET STATISTICS PROFILE ON;
SELECT ...;

-- Query hints
SELECT * FROM orders WITH (INDEX(idx_customer))
WHERE customer_id = 123;

OPTION (RECOMPILE);
OPTION (OPTIMIZE FOR (@param = 'value'));

Performance Tuning Workflow

Identify Slow Queries

Enable slow query logging or use monitoring tools.

-- PostgreSQL: pg_stat_statements
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

Analyze with EXPLAIN

Get the execution plan for the slow query.

EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;

Identify Bottlenecks

Look for:

Sequential scans on large tables
High loop counts
Large sort operations
Estimate vs actual mismatches

Apply Optimizations

Add missing indexes
Rewrite query logic
Update statistics
Adjust database parameters

INSERT, UPDATE, DELETE, and MERGE statements for modifying database data

Understanding EXPLAIN

Master execution plan analysis to understand and optimize query performance

Query Optimization

How Query Processing Works

Parsing

Rewriting

Optimization

Execution

Quick Navigation

EXPLAIN Basics

Join Optimization

Subquery Optimization

Common Anti-patterns

The Optimizer's Decision Process

Quick Reference: Optimization Checklist

Planning Phase

Writing Phase

Verification Phase

Essential Optimization Techniques

1. Use Indexes Effectively

2. Limit Data Early

3. Choose the Right Join

4. Avoid N+1 Queries

Understanding Costs

Statistics Matter

Common Performance Problems

Database-Specific Optimization

Performance Tuning Workflow

Identify Slow Queries

Analyze with EXPLAIN

Identify Bottlenecks

Apply Optimizations

Verify Improvement

Next Steps

EXPLAIN Deep Dive

Indexing Strategies

On this page

Query Optimization

EXPLAIN Basics

Join Optimization

Subquery Optimization

Common Anti-patterns

Sequential Scans on Large Tables

High Row Estimates vs Actual

Expensive Sorts

Nested Loop with High Loops

EXPLAIN Deep Dive

Indexing Strategies

On this page