Hash Indexes
Understanding hash indexes for equality lookups and their limitations
Hash indexes use a hash function to map keys to bucket locations, providing O(1) average-case lookups for equality comparisons.
How Hash Indexes Work
Hash Function Process
Creating Hash Indexes
-- Create hash index
CREATE INDEX idx_users_email_hash ON users USING HASH (email);
-- PostgreSQL 10+: Hash indexes are WAL-logged and crash-safe
-- Before PostgreSQL 10: Hash indexes were not recommended
-- Check index type
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';PostgreSQL Hash Index History
Before PostgreSQL 10, hash indexes were not WAL-logged and could be corrupted after a crash. Since PostgreSQL 10, they are fully supported and safe to use.
-- MySQL: Hash indexes only available in MEMORY storage engine
CREATE TABLE users_memory (
id INT PRIMARY KEY,
email VARCHAR(255),
INDEX USING HASH (email)
) ENGINE = MEMORY;
-- InnoDB uses adaptive hash index internally (automatic)
-- Show adaptive hash index status
SHOW ENGINE INNODB STATUS;
-- Cannot explicitly create hash index on InnoDB tables
-- InnoDB automatically creates adaptive hash indexes for frequently accessed pagesMySQL InnoDB
InnoDB doesn't support explicit hash indexes. It uses an internal "adaptive hash index" that automatically creates hash-like structures for hot data.
-- SQL Server: Hash indexes for memory-optimized tables only
CREATE TABLE users_memory
(
id INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
email NVARCHAR(255) NOT NULL INDEX idx_email HASH WITH (BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON);
-- Bucket count should be 1-2x expected unique values
-- Too few buckets = long chains = slow
-- Too many buckets = wasted memoryHash vs B-Tree Comparison
| Operation | Hash Index | B-Tree Index |
|---|---|---|
Equality = | ✅ O(1) average | ✅ O(log n) |
Range <, >, BETWEEN | ❌ Not supported | ✅ Efficient |
LIKE 'prefix%' | ❌ Not supported | ✅ Efficient |
ORDER BY | ❌ Not supported | ✅ Efficient |
MIN() / MAX() | ❌ Full scan needed | ✅ First/last entry |
IS NULL | ❌ Varies by DB | ✅ Supported |
-- Hash index works perfectly for:
SELECT * FROM users WHERE email = 'alice@example.com';
-- Hash index CANNOT help with:
SELECT * FROM users WHERE email LIKE 'alice%';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users ORDER BY email;
SELECT MIN(email) FROM users;| Metric | Hash | B-Tree |
|---|---|---|
| Equality lookup | Faster (O(1)) | Slower (O(log n)) |
| Memory usage | Higher (buckets) | Lower (pages) |
| Insert overhead | Lower | Higher (balancing) |
| Range queries | N/A | Efficient |
| Index size | Usually larger | Usually smaller |
When the Difference Matters
For most workloads, B-Tree performance is excellent. Hash indexes provide meaningful improvement mainly for:
- Very large tables (billions of rows)
- High-concurrency equality lookups
- No need for range queries on the column
✅ Good for Hash Indexes
-- Exact email lookups
SELECT * FROM users WHERE email = 'user@example.com';
-- Session token lookup
SELECT * FROM sessions WHERE token = 'abc123xyz';
-- UUID primary key lookups
SELECT * FROM entities WHERE uuid = 'a1b2c3d4-...';
-- High-cardinality equality checks
SELECT * FROM logs WHERE request_id = '12345';❌ Not Suitable for Hash Indexes
-- Range queries
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- Prefix matching
SELECT * FROM products WHERE sku LIKE 'ABC%';
-- Sorting
SELECT * FROM users ORDER BY email;
-- Low cardinality columns
SELECT * FROM orders WHERE status = 'pending';Hash Collisions
When multiple keys hash to the same bucket:
- All entries stored in the same bucket (chain)
- Lookup scans the chain to find exact match
- Too many collisions = degraded performance
Bucket Count Matters
In SQL Server memory-optimized tables, you specify bucket count:
- Too few buckets → Long chains → Slow lookups
- Too many buckets → Wasted memory
- Ideal: 1-2x the number of unique values
Hash Index Limitations
When to Use Hash Indexes
High-Volume Equality Lookups
When you have millions of exact-match queries per second.
-- Session/token lookups
SELECT * FROM sessions WHERE token = ?;
-- User authentication
SELECT * FROM users WHERE api_key = ?;Very Large Tables
When B-Tree depth becomes a bottleneck.
-- Tables with billions of rows
-- Hash: O(1) vs B-Tree: O(log n)
-- At 1 billion rows: 1 vs ~30 comparisonsMemory-Optimized Scenarios
In-memory databases where hash indexes shine.
-- SQL Server In-Memory OLTP
-- Redis-like caching patternsNo Range Query Requirements
Only when you're certain you'll never need range queries.
Practical Recommendations
General Guidance
For most use cases, B-Tree indexes are the better choice because:
- They support all query types
- Performance difference is usually negligible
- They're more flexible for future query patterns
- They're better supported across databases
Consider hash indexes only when:
- You have measured B-Tree performance as a bottleneck
- You only need equality lookups
- Your database fully supports hash indexes
- You understand the maintenance implications
Database Support Summary
| Database | Hash Index Support | Notes |
|---|---|---|
| PostgreSQL | ✅ Full (10+) | USING HASH, WAL-logged since v10 |
| MySQL | ⚠️ MEMORY only | InnoDB uses adaptive hash internally |
| SQL Server | ⚠️ In-Memory only | Memory-optimized tables only |
| SQLite | ❌ Not supported | B-Tree only |
| Oracle | ❌ Not supported | Cluster tables only |