B-Tree Indexes

B-Tree (Balanced Tree) indexes are the default and most versatile index type in relational databases. Understanding how they work is essential for writing performant queries.

B-Tree Structure

Key Properties

Component	Description
Root Page	Top of the tree, entry point for all searches
Internal Pages	Navigation nodes, contain keys and child pointers
Leaf Pages	Contain actual index entries and row pointers
Keys	Indexed column values, sorted within each page
Pointers	References to child pages or table rows

Search Operation

Insert Operation

Search for correct leaf page
Insert key in sorted order
If page full, split page
Propagate splits up the tree if needed

Delete Operation

Search for key in leaf page
Remove key
If page less than half full, merge with neighbor
Propagate merges up if needed

Operation	Time Complexity	Description
Search	O(log n)	Tree height is logarithmic
Insert	O(log n)	Find location + possible splits
Delete	O(log n)	Find location + possible merges
Range Scan	O(log n + k)	Find start + scan k results

For a table with 1 billion rows:

B-Tree height ≈ 4-5 levels
Any lookup requires ≈ 4-5 page reads
With caching, often just 1-2 disk reads

B-Tree Operations in Detail

Equality Search

-- Exact match uses B-Tree efficiently
SELECT * FROM users WHERE id = 12345;

Range Scan

-- B-Tree excels at range queries
SELECT * FROM orders 
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';

Leaf Page Linking

Leaf pages are linked together, allowing efficient range scans without returning to internal pages.

Prefix Search

-- LIKE with prefix uses B-Tree
SELECT * FROM products WHERE name LIKE 'Apple%';

-- ❌ Cannot use B-Tree (starts with wildcard)
SELECT * FROM products WHERE name LIKE '%Apple';

Index Scan Types

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

-- Output:
-- Index Scan using idx_orders_customer on orders
--   Index Cond: (customer_id = 123)

When used:

Need columns not in the index
Relatively few rows returned
Random access to table is acceptable

-- All needed columns are in the index
CREATE INDEX idx_covering ON orders(customer_id) INCLUDE (total, status);

EXPLAIN ANALYZE
SELECT total, status FROM orders WHERE customer_id = 123;

-- Output:
-- Index Only Scan using idx_covering on orders
--   Index Cond: (customer_id = 123)

Benefits:

No table access needed
Much faster than Index Scan
Requires visibility map for MVCC (PostgreSQL)

-- Multiple conditions on different indexes
EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE status = 'active' OR customer_id IN (1, 2, 3);

-- Output:
-- Bitmap Heap Scan on orders
--   Recheck Cond: ((status = 'active') OR (customer_id = ANY('{1,2,3}')))
--   -> BitmapOr
--       -> Bitmap Index Scan on idx_status
--       -> Bitmap Index Scan on idx_customer

When used:

Combining multiple indexes
OR conditions
Large result sets where random I/O is expensive

Scan Type	Index Access	Table Access	Best For
Sequential	None	Full	Small tables, no index
Index Scan	Random	Random	Few rows, need all columns
Index Only	Random	None	Covered queries
Bitmap	Batch	Sequential	Many rows, multiple indexes

-- Force specific scan type (PostgreSQL)
SET enable_seqscan = off;
SET enable_indexscan = off;
SET enable_bitmapscan = off;

Composite B-Tree Indexes

Column Order Matters

CREATE INDEX idx_orders ON orders(status, created_at, customer_id);

Usable Query Patterns

-- ✅ Uses index: All prefix columns
SELECT * FROM orders WHERE status = 'active';
SELECT * FROM orders WHERE status = 'active' AND created_at > '2024-01-01';
SELECT * FROM orders 
WHERE status = 'active' AND created_at > '2024-01-01' AND customer_id = 123;

-- ✅ Uses index for status, then filter
SELECT * FROM orders WHERE status = 'active' AND customer_id = 123;

-- ⚠️ Less efficient: Skip first column
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- ❌ Cannot use index efficiently
SELECT * FROM orders WHERE customer_id = 123;

Ordering and Sorting

-- Index: (status, created_at DESC)
CREATE INDEX idx_orders_sorted ON orders(status, created_at DESC);

-- ✅ Uses index for ORDER BY
SELECT * FROM orders 
WHERE status = 'active' 
ORDER BY created_at DESC;

-- ✅ Also works
SELECT * FROM orders 
WHERE status = 'active' 
ORDER BY created_at ASC;  -- Backward scan

-- ❌ Cannot use single index
SELECT * FROM orders ORDER BY status ASC, created_at ASC;
-- If index is (status ASC, created_at DESC)

B-Tree Limitations

B-Tree Not Suitable For

Full-text search: Use GIN/GiST indexes instead
Pattern matching with leading wildcard: LIKE '%search%'
Array containment: Use GIN
Geometric/spatial data: Use GiST/SP-GiST
Very large text columns: High overhead

-- ❌ B-Tree cannot help with these
SELECT * FROM docs WHERE content LIKE '%search term%';
SELECT * FROM products WHERE tags @> ARRAY['electronics'];
SELECT * FROM locations WHERE point <-> '(0,0)' < 100;

Performance Optimization

Choose Selective Columns First

Put high-cardinality columns at the beginning of composite indexes.

-- ✅ Good: customer_id has high cardinality
CREATE INDEX idx_orders ON orders(customer_id, status);

-- ❌ Poor: status has low cardinality
CREATE INDEX idx_orders ON orders(status, customer_id);

Consider Index-Only Scans

Include frequently accessed columns to avoid table lookups.

CREATE INDEX idx_orders ON orders(customer_id) 
INCLUDE (total, status, created_at);

Use Partial Indexes for Subsets

Index only the rows you query.

-- Only index active orders
CREATE INDEX idx_active_orders ON orders(customer_id) 
WHERE status = 'active';

Monitor Index Bloat

B-Trees can become bloated after many updates/deletes.

-- PostgreSQL: Check bloat
SELECT 
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild bloated index
REINDEX INDEX CONCURRENTLY idx_name;

Database-Specific B-Tree Features

-- Deduplication (PostgreSQL 13+)
-- Automatically deduplicates duplicate keys

-- INCLUDE columns for covering indexes
CREATE INDEX idx ON orders(customer_id) 
INCLUDE (total, status);

-- Partial indexes
CREATE INDEX idx ON orders(customer_id) 
WHERE status = 'active';

-- Concurrent operations
CREATE INDEX CONCURRENTLY idx ON orders(customer_id);
REINDEX INDEX CONCURRENTLY idx;

-- Collation-specific indexes
CREATE INDEX idx ON users(name COLLATE "C");

-- InnoDB uses clustered index (primary key)
-- Secondary indexes store primary key values

-- Descending indexes (MySQL 8.0+)
CREATE INDEX idx ON orders(created_at DESC);

-- Invisible indexes (for testing)
CREATE INDEX idx ON orders(column) INVISIBLE;
ALTER INDEX idx ON orders INVISIBLE;

-- Prefix indexes for large columns
CREATE INDEX idx ON products(name(50));

-- Index hints
SELECT * FROM orders USE INDEX (idx_customer);
SELECT * FROM orders FORCE INDEX (idx_customer);

-- Clustered vs Nonclustered
CREATE CLUSTERED INDEX idx ON orders(id);
CREATE NONCLUSTERED INDEX idx ON orders(customer_id);

-- Included columns
CREATE INDEX idx ON orders(customer_id) 
INCLUDE (total, status);

-- Filtered indexes
CREATE INDEX idx ON orders(customer_id) 
WHERE status = 'active';

-- Online index operations
CREATE INDEX idx ON orders(customer_id) 
WITH (ONLINE = ON);

-- Compression
CREATE INDEX idx ON orders(customer_id) 
WITH (DATA_COMPRESSION = PAGE);

Next Steps

Hash Indexes

When B-Tree isn't the best choice

GiST & GIN

Full-text and specialized indexes

Query Optimization

Using indexes effectively

Hash Indexes

GiST & GIN

Query Optimization

On this page