Indexing Strategies
B-Tree Indexes
Deep dive into B-Tree index structure, operations, and optimization
B-Tree (Balanced Tree) indexes are the default and most versatile index type in relational databases. Understanding how they work is essential for writing performant queries.
B-Tree Structure
Key Properties
| Component | Description |
|---|---|
| Root Page | Top of the tree, entry point for all searches |
| Internal Pages | Navigation nodes, contain keys and child pointers |
| Leaf Pages | Contain actual index entries and row pointers |
| Keys | Indexed column values, sorted within each page |
| Pointers | References to child pages or table rows |
Search Operation
Insert Operation
- Search for correct leaf page
- Insert key in sorted order
- If page full, split page
- Propagate splits up the tree if needed
Delete Operation
- Search for key in leaf page
- Remove key
- If page less than half full, merge with neighbor
- Propagate merges up if needed
| Operation | Time Complexity | Description |
|---|---|---|
| Search | O(log n) | Tree height is logarithmic |
| Insert | O(log n) | Find location + possible splits |
| Delete | O(log n) | Find location + possible merges |
| Range Scan | O(log n + k) | Find start + scan k results |
For a table with 1 billion rows:
- B-Tree height ≈ 4-5 levels
- Any lookup requires ≈ 4-5 page reads
- With caching, often just 1-2 disk reads
B-Tree Operations in Detail
Equality Search
-- Exact match uses B-Tree efficiently
SELECT * FROM users WHERE id = 12345;Range Scan
-- B-Tree excels at range queries
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';Leaf Page Linking
Leaf pages are linked together, allowing efficient range scans without returning to internal pages.
Prefix Search
-- LIKE with prefix uses B-Tree
SELECT * FROM products WHERE name LIKE 'Apple%';
-- ❌ Cannot use B-Tree (starts with wildcard)
SELECT * FROM products WHERE name LIKE '%Apple';Index Scan Types
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
-- Output:
-- Index Scan using idx_orders_customer on orders
-- Index Cond: (customer_id = 123)When used:
- Need columns not in the index
- Relatively few rows returned
- Random access to table is acceptable
-- All needed columns are in the index
CREATE INDEX idx_covering ON orders(customer_id) INCLUDE (total, status);
EXPLAIN ANALYZE
SELECT total, status FROM orders WHERE customer_id = 123;
-- Output:
-- Index Only Scan using idx_covering on orders
-- Index Cond: (customer_id = 123)Benefits:
- No table access needed
- Much faster than Index Scan
- Requires visibility map for MVCC (PostgreSQL)
-- Multiple conditions on different indexes
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'active' OR customer_id IN (1, 2, 3);
-- Output:
-- Bitmap Heap Scan on orders
-- Recheck Cond: ((status = 'active') OR (customer_id = ANY('{1,2,3}')))
-- -> BitmapOr
-- -> Bitmap Index Scan on idx_status
-- -> Bitmap Index Scan on idx_customerWhen used:
- Combining multiple indexes
- OR conditions
- Large result sets where random I/O is expensive
| Scan Type | Index Access | Table Access | Best For |
|---|---|---|---|
| Sequential | None | Full | Small tables, no index |
| Index Scan | Random | Random | Few rows, need all columns |
| Index Only | Random | None | Covered queries |
| Bitmap | Batch | Sequential | Many rows, multiple indexes |
-- Force specific scan type (PostgreSQL)
SET enable_seqscan = off;
SET enable_indexscan = off;
SET enable_bitmapscan = off;Composite B-Tree Indexes
Column Order Matters
CREATE INDEX idx_orders ON orders(status, created_at, customer_id);Usable Query Patterns
-- ✅ Uses index: All prefix columns
SELECT * FROM orders WHERE status = 'active';
SELECT * FROM orders WHERE status = 'active' AND created_at > '2024-01-01';
SELECT * FROM orders
WHERE status = 'active' AND created_at > '2024-01-01' AND customer_id = 123;
-- ✅ Uses index for status, then filter
SELECT * FROM orders WHERE status = 'active' AND customer_id = 123;
-- ⚠️ Less efficient: Skip first column
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- ❌ Cannot use index efficiently
SELECT * FROM orders WHERE customer_id = 123;Ordering and Sorting
-- Index: (status, created_at DESC)
CREATE INDEX idx_orders_sorted ON orders(status, created_at DESC);
-- ✅ Uses index for ORDER BY
SELECT * FROM orders
WHERE status = 'active'
ORDER BY created_at DESC;
-- ✅ Also works
SELECT * FROM orders
WHERE status = 'active'
ORDER BY created_at ASC; -- Backward scan
-- ❌ Cannot use single index
SELECT * FROM orders ORDER BY status ASC, created_at ASC;
-- If index is (status ASC, created_at DESC)B-Tree Limitations
B-Tree Not Suitable For
- Full-text search: Use GIN/GiST indexes instead
- Pattern matching with leading wildcard:
LIKE '%search%' - Array containment: Use GIN
- Geometric/spatial data: Use GiST/SP-GiST
- Very large text columns: High overhead
-- ❌ B-Tree cannot help with these
SELECT * FROM docs WHERE content LIKE '%search term%';
SELECT * FROM products WHERE tags @> ARRAY['electronics'];
SELECT * FROM locations WHERE point <-> '(0,0)' < 100;Performance Optimization
Choose Selective Columns First
Put high-cardinality columns at the beginning of composite indexes.
-- ✅ Good: customer_id has high cardinality
CREATE INDEX idx_orders ON orders(customer_id, status);
-- ❌ Poor: status has low cardinality
CREATE INDEX idx_orders ON orders(status, customer_id);Consider Index-Only Scans
Include frequently accessed columns to avoid table lookups.
CREATE INDEX idx_orders ON orders(customer_id)
INCLUDE (total, status, created_at);Use Partial Indexes for Subsets
Index only the rows you query.
-- Only index active orders
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status = 'active';Monitor Index Bloat
B-Trees can become bloated after many updates/deletes.
-- PostgreSQL: Check bloat
SELECT
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated index
REINDEX INDEX CONCURRENTLY idx_name;Database-Specific B-Tree Features
-- Deduplication (PostgreSQL 13+)
-- Automatically deduplicates duplicate keys
-- INCLUDE columns for covering indexes
CREATE INDEX idx ON orders(customer_id)
INCLUDE (total, status);
-- Partial indexes
CREATE INDEX idx ON orders(customer_id)
WHERE status = 'active';
-- Concurrent operations
CREATE INDEX CONCURRENTLY idx ON orders(customer_id);
REINDEX INDEX CONCURRENTLY idx;
-- Collation-specific indexes
CREATE INDEX idx ON users(name COLLATE "C");-- InnoDB uses clustered index (primary key)
-- Secondary indexes store primary key values
-- Descending indexes (MySQL 8.0+)
CREATE INDEX idx ON orders(created_at DESC);
-- Invisible indexes (for testing)
CREATE INDEX idx ON orders(column) INVISIBLE;
ALTER INDEX idx ON orders INVISIBLE;
-- Prefix indexes for large columns
CREATE INDEX idx ON products(name(50));
-- Index hints
SELECT * FROM orders USE INDEX (idx_customer);
SELECT * FROM orders FORCE INDEX (idx_customer);-- Clustered vs Nonclustered
CREATE CLUSTERED INDEX idx ON orders(id);
CREATE NONCLUSTERED INDEX idx ON orders(customer_id);
-- Included columns
CREATE INDEX idx ON orders(customer_id)
INCLUDE (total, status);
-- Filtered indexes
CREATE INDEX idx ON orders(customer_id)
WHERE status = 'active';
-- Online index operations
CREATE INDEX idx ON orders(customer_id)
WITH (ONLINE = ON);
-- Compression
CREATE INDEX idx ON orders(customer_id)
WITH (DATA_COMPRESSION = PAGE);