Understanding EXPLAIN
Master execution plan analysis to understand and optimize query performance
EXPLAIN is your primary tool for understanding how the database executes queries. Learning to read execution plans is essential for query optimization.
EXPLAIN Basics
-- Basic explain (estimated plan)
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- With actual execution times
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
-- With buffer/IO information
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 123;
-- Different output formats
EXPLAIN (FORMAT JSON) SELECT ...;
EXPLAIN (FORMAT YAML) SELECT ...;
EXPLAIN (FORMAT XML) SELECT ...;
-- All options
EXPLAIN (
ANALYZE, -- Actually run the query
BUFFERS, -- Show buffer usage
COSTS, -- Show cost estimates (default: on)
TIMING, -- Show actual timing (default: on with ANALYZE)
VERBOSE, -- Show additional info
FORMAT TEXT -- Output format
) SELECT ...;-- Basic explain
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- With analyze (MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
-- Extended explain
EXPLAIN EXTENDED SELECT * FROM orders WHERE customer_id = 123;
-- JSON format (more details)
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 123;
-- Tree format (MySQL 8.0.16+)
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 123;-- Estimated execution plan
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO
SET SHOWPLAN_ALL OFF;
-- Actual execution plan with statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT * FROM orders WHERE customer_id = 123;
-- XML plan
SET SHOWPLAN_XML ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO
SET SHOWPLAN_XML OFF;
-- In SSMS: Ctrl+L for estimated, Ctrl+M for actual planReading PostgreSQL EXPLAIN Output
Basic Structure
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;Index Scan using idx_orders_customer on orders (cost=0.42..8.44 rows=1 width=48)
(actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (customer_id = 123)
Buffers: shared hit=3
Planning Time: 0.085 ms
Execution Time: 0.032 msUnderstanding the Output
| Component | Example | Meaning |
|---|---|---|
| Node Type | Index Scan | How data is accessed |
| Index | idx_orders_customer | Which index is used |
| Cost | 0.42..8.44 | Startup cost..total cost (arbitrary units) |
| Rows | rows=1 | Estimated rows to return |
| Width | width=48 | Average row size in bytes |
| Actual Time | 0.015..0.016 | Real execution time (ms) |
| Actual Rows | rows=1 | Real rows returned |
| Loops | loops=1 | Times this node executed |
| Buffers | shared hit=3 | Pages read from cache |
Key Components
Cost Explained
The cost is shown as (cost=startup..total):
(cost=0.42..8.44 rows=1 width=48)
-- ^^^^ ^^^^ --
-- | └─ Total cost to return all rows
-- └─ Startup cost (before first row)| Part | Value | Meaning |
|---|---|---|
| Startup | 0.42 | Work before returning first row |
| Total | 8.44 | Total work to return all rows |
Cost Factors
The optimizer uses seq_page_cost, random_page_cost, cpu_tuple_cost, and other parameters to calculate cost. You can view these:
SHOW seq_page_cost; -- Default: 1.0 --
SHOW random_page_cost; -- Default: 4.0 --
SHOW cpu_tuple_cost; -- Default: 0.01 --Important
Cost units are arbitrary - they're only useful for comparing plans within the same query, not across different queries.
Row Estimates
(cost=0.42..8.44 rows=1 width=48)The rows=1 is the estimated number of rows this node will return.
Why estimates matter:
- Optimizer uses row estimates to choose plans
- Wrong estimates → wrong plan → slow queries
Common causes of bad estimates:
- Stale statistics (run
ANALYZE) - Correlated columns
- Complex expressions
- Non-uniform data distribution
Actual Execution (EXPLAIN ANALYZE only)
(actual time=0.015..0.016 rows=1 loops=1)
-- ^^^^^ ^^^^^ ^^^^ ^^^^^^^ --
-- | | | └─ Times node executed
-- | | └─ Actual rows returned
-- | └─ Time to return all rows (ms)
-- └─ Time to return first row (ms)| Part | Value | Meaning |
|---|---|---|
| Start time | 0.015 | Time to return first row (ms) |
| End time | 0.016 | Time to return all rows (ms) |
| Rows | 1 | Actual rows returned |
| Loops | 1 | Times this node executed |
Red Flag: Estimate Mismatch
When estimated and actual rows differ significantly:
Index Scan (cost=... rows=100) (actual rows=10000) -- [!code error]
-- ^^^^^^^ ^^^^^^^
-- Expected 100 Got 10,000!This means statistics are stale. Fix with:
ANALYZE table_name; Buffer Usage (BUFFERS option)
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;Buffers: shared hit=3 read=2 dirtied=0 written=0
-- ^^^^^^^^^^ ^^^^^^ --
-- From cache From disk (slow!)| Metric | Meaning |
|---|---|
| shared hit | Pages found in cache ✅ Fast |
| shared read | Pages read from disk ⚠️ Slow |
| dirtied | Pages modified |
| written | Pages written to disk |
More hits = faster. High read count means data isn't cached - consider if this is expected or if you need more shared_buffers.
Common Node Types
Scan Types
Join Types
Interpreting Complex Plans
Reading Plan Trees
Sort (cost=100..110 rows=100)
Sort Key: orders.created_at DESC
-> Hash Join (cost=10..80 rows=100)
Hash Cond: (orders.customer_id = customers.id)
-> Seq Scan on orders (cost=0..50 rows=1000)
Filter: (total > 100)
-> Hash
-> Index Scan on customers (cost=0..20 rows=50)
Index Cond: (status = 'active')Read from inside out, bottom to top:
- Index Scan on
customers→ finds active customers - Hash → builds hash table from customers
- Seq Scan on
orders→ filters orders > 100 - Hash Join → matches orders to customers
- Sort → sorts result by date
Cost Accumulation
Parent node cost includes child costs.
Troubleshooting Patterns
Pattern: Wrong Row Estimate
Index Scan (rows=100) (actual rows=50000) -- [!code error]Solution:
ANALYZE table_name;
-- Or for specific columns:
ANALYZE table_name(column_name);Pattern: Seq Scan on Large Table
Seq Scan on large_table (cost=0..100000) -- [!code warning]
Filter: (indexed_column = 'value')Solutions:
\di large_table -- Check if index exists
CREATE INDEX idx ON large_table(indexed_column);
ANALYZE large_table; -- Check statistics
-- Check for function on column
WHERE LOWER(column) = 'value'
WHERE column = 'VALUE'Pattern: High Loops Count
Nested Loop (actual loops=100000) -- [!code error]
-> Seq Scan on table_a
-> Index Scan on table_bSolutions:
CREATE INDEX idx ON table_b(join_column);
-- Or force hash join (if appropriate)
SET enable_nestloop = off;Pattern: Sort on Disk
Sort (Sort Method: external merge Disk: 50MB) -- [!code warning]Solutions:
SET work_mem = '256MB';
-- Or add index to avoid sort
CREATE INDEX idx ON table(sort_column); Visual Tools
explain.depesz.com
- Run
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) - Copy output to https://explain.depesz.com
- Get visual breakdown with problem areas highlighted
Features:
- Highlights slow nodes
- Shows exclusive vs inclusive time
- Color-codes by performance
pgAdmin Graphical Explain
- Open Query Tool
- Write your query
- Click "Explain Analyze" (F7)
- View graphical plan
Features:
- Visual node representation
- Click nodes for details
- Shows data flow
pev2 (Postgres Explain Visualizer 2)
- Run
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) - Paste JSON output
- Get interactive visualization
Features:
- Modern interface
- Statistics comparison
- Shareable links
Best Practices
EXPLAIN Checklist
- Always use ANALYZE for real data, not estimates
- Use BUFFERS to see I/O patterns
- Compare estimated vs actual rows to detect statistics issues
- Look for the most expensive nodes first
- Check for sequential scans on large tables
- Watch for high loop counts in nested loops
- Monitor buffer reads vs hits for caching efficiency