DevDocsDev Docs
Query Optimization

Understanding EXPLAIN

Master execution plan analysis to understand and optimize query performance

EXPLAIN is your primary tool for understanding how the database executes queries. Learning to read execution plans is essential for query optimization.

EXPLAIN Basics

-- Basic explain (estimated plan)
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- With actual execution times
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

-- With buffer/IO information
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 123;

-- Different output formats
EXPLAIN (FORMAT JSON) SELECT ...;
EXPLAIN (FORMAT YAML) SELECT ...;
EXPLAIN (FORMAT XML) SELECT ...;

-- All options
EXPLAIN (
    ANALYZE,      -- Actually run the query
    BUFFERS,      -- Show buffer usage
    COSTS,        -- Show cost estimates (default: on)
    TIMING,       -- Show actual timing (default: on with ANALYZE)
    VERBOSE,      -- Show additional info
    FORMAT TEXT   -- Output format
) SELECT ...;
-- Basic explain
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- With analyze (MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;

-- Extended explain
EXPLAIN EXTENDED SELECT * FROM orders WHERE customer_id = 123;

-- JSON format (more details)
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 123;

-- Tree format (MySQL 8.0.16+)
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 123;
-- Estimated execution plan
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO
SET SHOWPLAN_ALL OFF;

-- Actual execution plan with statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT * FROM orders WHERE customer_id = 123;

-- XML plan
SET SHOWPLAN_XML ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO
SET SHOWPLAN_XML OFF;

-- In SSMS: Ctrl+L for estimated, Ctrl+M for actual plan

Reading PostgreSQL EXPLAIN Output

Basic Structure

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
EXPLAIN Output Breakdown
Index Scan using idx_orders_customer on orders  (cost=0.42..8.44 rows=1 width=48) 
   (actual time=0.015..0.016 rows=1 loops=1) 
   Index Cond: (customer_id = 123)
   Buffers: shared hit=3
Planning Time: 0.085 ms
Execution Time: 0.032 ms

Understanding the Output

ComponentExampleMeaning
Node TypeIndex ScanHow data is accessed
Indexidx_orders_customerWhich index is used
Cost0.42..8.44Startup cost..total cost (arbitrary units)
Rowsrows=1Estimated rows to return
Widthwidth=48Average row size in bytes
Actual Time0.015..0.016Real execution time (ms)
Actual Rowsrows=1Real rows returned
Loopsloops=1Times this node executed
Buffersshared hit=3Pages read from cache

Key Components

Cost Explained

The cost is shown as (cost=startup..total):

Cost Example
(cost=0.42..8.44 rows=1 width=48)
--    ^^^^ ^^^^ --
--    |    └─ Total cost to return all rows
--    └─ Startup cost (before first row)
PartValueMeaning
Startup0.42Work before returning first row
Total8.44Total work to return all rows

Cost Factors

The optimizer uses seq_page_cost, random_page_cost, cpu_tuple_cost, and other parameters to calculate cost. You can view these:

SHOW seq_page_cost;       -- Default: 1.0 --
SHOW random_page_cost;    -- Default: 4.0 --
SHOW cpu_tuple_cost;      -- Default: 0.01 --

Important

Cost units are arbitrary - they're only useful for comparing plans within the same query, not across different queries.

Row Estimates

Row Estimate
(cost=0.42..8.44 rows=1 width=48)

The rows=1 is the estimated number of rows this node will return.

Why estimates matter:

  • Optimizer uses row estimates to choose plans
  • Wrong estimates → wrong plan → slow queries

Common causes of bad estimates:

  • Stale statistics (run ANALYZE)
  • Correlated columns
  • Complex expressions
  • Non-uniform data distribution

Actual Execution (EXPLAIN ANALYZE only)

Actual Metrics
(actual time=0.015..0.016 rows=1 loops=1)
--           ^^^^^ ^^^^^ ^^^^ ^^^^^^^ --
--           |     |     |    └─ Times node executed
--           |     |     └─ Actual rows returned
--           |     └─ Time to return all rows (ms)
--           └─ Time to return first row (ms)
PartValueMeaning
Start time0.015Time to return first row (ms)
End time0.016Time to return all rows (ms)
Rows1Actual rows returned
Loops1Times this node executed

Red Flag: Estimate Mismatch

When estimated and actual rows differ significantly:

Index Scan  (cost=... rows=100) (actual rows=10000) -- [!code error]
--                    ^^^^^^^          ^^^^^^^
--                    Expected 100     Got 10,000!

This means statistics are stale. Fix with:

ANALYZE table_name; 

Buffer Usage (BUFFERS option)

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
Buffer Statistics
Buffers: shared hit=3 read=2 dirtied=0 written=0
--       ^^^^^^^^^^ ^^^^^^ --
--       From cache From disk (slow!)
MetricMeaning
shared hitPages found in cache ✅ Fast
shared readPages read from disk ⚠️ Slow
dirtiedPages modified
writtenPages written to disk

More hits = faster. High read count means data isn't cached - consider if this is expected or if you need more shared_buffers.

Common Node Types

Scan Types

Join Types

Interpreting Complex Plans

Reading Plan Trees

Sort  (cost=100..110 rows=100)
  Sort Key: orders.created_at DESC
  ->  Hash Join  (cost=10..80 rows=100)
        Hash Cond: (orders.customer_id = customers.id)
        ->  Seq Scan on orders  (cost=0..50 rows=1000)
              Filter: (total > 100)
        ->  Hash
              ->  Index Scan on customers  (cost=0..20 rows=50)
                    Index Cond: (status = 'active')

Read from inside out, bottom to top:

  1. Index Scan on customers → finds active customers
  2. Hash → builds hash table from customers
  3. Seq Scan on orders → filters orders > 100
  4. Hash Join → matches orders to customers
  5. Sort → sorts result by date

Cost Accumulation

Parent node cost includes child costs.

Troubleshooting Patterns

Pattern: Wrong Row Estimate

Index Scan (rows=100) (actual rows=50000) -- [!code error]

Solution:

ANALYZE table_name; 
-- Or for specific columns:
ANALYZE table_name(column_name);

Pattern: Seq Scan on Large Table

Seq Scan on large_table (cost=0..100000) -- [!code warning]
  Filter: (indexed_column = 'value')

Solutions:

\di large_table -- Check if index exists

CREATE INDEX idx ON large_table(indexed_column); 

ANALYZE large_table; -- Check statistics

-- Check for function on column
WHERE LOWER(column) = 'value'
WHERE column = 'VALUE'

Pattern: High Loops Count

Nested Loop (actual loops=100000) -- [!code error]
  ->  Seq Scan on table_a
  ->  Index Scan on table_b

Solutions:

CREATE INDEX idx ON table_b(join_column); 

-- Or force hash join (if appropriate)
SET enable_nestloop = off;

Pattern: Sort on Disk

Sort (Sort Method: external merge Disk: 50MB) -- [!code warning]

Solutions:

SET work_mem = '256MB'; 

-- Or add index to avoid sort
CREATE INDEX idx ON table(sort_column); 

Visual Tools

explain.depesz.com

  1. Run EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
  2. Copy output to https://explain.depesz.com
  3. Get visual breakdown with problem areas highlighted

Features:

  • Highlights slow nodes
  • Shows exclusive vs inclusive time
  • Color-codes by performance

pgAdmin Graphical Explain

  1. Open Query Tool
  2. Write your query
  3. Click "Explain Analyze" (F7)
  4. View graphical plan

Features:

  • Visual node representation
  • Click nodes for details
  • Shows data flow

pev2 (Postgres Explain Visualizer 2)

https://explain.dalibo.com/

  1. Run EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
  2. Paste JSON output
  3. Get interactive visualization

Features:

  • Modern interface
  • Statistics comparison
  • Shareable links

Best Practices

EXPLAIN Checklist

  1. Always use ANALYZE for real data, not estimates
  2. Use BUFFERS to see I/O patterns
  3. Compare estimated vs actual rows to detect statistics issues
  4. Look for the most expensive nodes first
  5. Check for sequential scans on large tables
  6. Watch for high loop counts in nested loops
  7. Monitor buffer reads vs hits for caching efficiency

Next Steps

On this page