Hash Indexes

Hash indexes use a hash function to map keys to bucket locations, providing O(1) average-case lookups for equality comparisons.

How Hash Indexes Work

Hash Function Process

Creating Hash Indexes

-- Create hash index
CREATE INDEX idx_users_email_hash ON users USING HASH (email);

-- PostgreSQL 10+: Hash indexes are WAL-logged and crash-safe
-- Before PostgreSQL 10: Hash indexes were not recommended

-- Check index type
SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'users';

PostgreSQL Hash Index History

Before PostgreSQL 10, hash indexes were not WAL-logged and could be corrupted after a crash. Since PostgreSQL 10, they are fully supported and safe to use.

-- MySQL: Hash indexes only available in MEMORY storage engine
CREATE TABLE users_memory (
    id INT PRIMARY KEY,
    email VARCHAR(255),
    INDEX USING HASH (email)
) ENGINE = MEMORY;

-- InnoDB uses adaptive hash index internally (automatic)
-- Show adaptive hash index status
SHOW ENGINE INNODB STATUS;

-- Cannot explicitly create hash index on InnoDB tables
-- InnoDB automatically creates adaptive hash indexes for frequently accessed pages

MySQL InnoDB

InnoDB doesn't support explicit hash indexes. It uses an internal "adaptive hash index" that automatically creates hash-like structures for hot data.

-- SQL Server: Hash indexes for memory-optimized tables only
CREATE TABLE users_memory
(
    id INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    email NVARCHAR(255) NOT NULL INDEX idx_email HASH WITH (BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON);

-- Bucket count should be 1-2x expected unique values
-- Too few buckets = long chains = slow
-- Too many buckets = wasted memory

Hash vs B-Tree Comparison

Operation	Hash Index	B-Tree Index
Equality `=`	✅ O(1) average	✅ O(log n)
Range `<`, `>`, `BETWEEN`	❌ Not supported	✅ Efficient
`LIKE 'prefix%'`	❌ Not supported	✅ Efficient
`ORDER BY`	❌ Not supported	✅ Efficient
`MIN()` / `MAX()`	❌ Full scan needed	✅ First/last entry
`IS NULL`	❌ Varies by DB	✅ Supported

-- Hash index works perfectly for:
SELECT * FROM users WHERE email = 'alice@example.com';

-- Hash index CANNOT help with:
SELECT * FROM users WHERE email LIKE 'alice%';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users ORDER BY email;
SELECT MIN(email) FROM users;

Metric	Hash	B-Tree
Equality lookup	Faster (O(1))	Slower (O(log n))
Memory usage	Higher (buckets)	Lower (pages)
Insert overhead	Lower	Higher (balancing)
Range queries	N/A	Efficient
Index size	Usually larger	Usually smaller

When the Difference Matters

For most workloads, B-Tree performance is excellent. Hash indexes provide meaningful improvement mainly for:

Very large tables (billions of rows)
High-concurrency equality lookups
No need for range queries on the column

✅ Good for Hash Indexes

-- Exact email lookups
SELECT * FROM users WHERE email = 'user@example.com';

-- Session token lookup
SELECT * FROM sessions WHERE token = 'abc123xyz';

-- UUID primary key lookups
SELECT * FROM entities WHERE uuid = 'a1b2c3d4-...';

-- High-cardinality equality checks
SELECT * FROM logs WHERE request_id = '12345';

❌ Not Suitable for Hash Indexes

-- Range queries
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- Prefix matching
SELECT * FROM products WHERE sku LIKE 'ABC%';

-- Sorting
SELECT * FROM users ORDER BY email;

-- Low cardinality columns
SELECT * FROM orders WHERE status = 'pending';

Hash Collisions

When multiple keys hash to the same bucket:

All entries stored in the same bucket (chain)
Lookup scans the chain to find exact match
Too many collisions = degraded performance

Bucket Count Matters

In SQL Server memory-optimized tables, you specify bucket count:

Too few buckets → Long chains → Slow lookups
Too many buckets → Wasted memory
Ideal: 1-2x the number of unique values

-- Session/token lookups
SELECT * FROM sessions WHERE token = ?;

-- User authentication
SELECT * FROM users WHERE api_key = ?;

Very Large Tables

When B-Tree depth becomes a bottleneck.

-- Tables with billions of rows
-- Hash: O(1) vs B-Tree: O(log n)
-- At 1 billion rows: 1 vs ~30 comparisons

Memory-Optimized Scenarios

In-memory databases where hash indexes shine.

-- SQL Server In-Memory OLTP
-- Redis-like caching patterns

No Range Query Requirements

Only when you're certain you'll never need range queries.

Practical Recommendations

General Guidance

For most use cases, B-Tree indexes are the better choice because:

They support all query types
Performance difference is usually negligible
They're more flexible for future query patterns
They're better supported across databases

Consider hash indexes only when:

You have measured B-Tree performance as a bottleneck
You only need equality lookups
Your database fully supports hash indexes
You understand the maintenance implications

Database Support Summary

Database	Hash Index Support	Notes
PostgreSQL	✅ Full (10+)	`USING HASH`, WAL-logged since v10
MySQL	⚠️ MEMORY only	InnoDB uses adaptive hash internally
SQL Server	⚠️ In-Memory only	Memory-optimized tables only
SQLite	❌ Not supported	B-Tree only
Oracle	❌ Not supported	Cluster tables only

How Hash Indexes Work

Hash Function Process

Creating Hash Indexes

Hash vs B-Tree Comparison

✅ Good for Hash Indexes

❌ Not Suitable for Hash Indexes

Hash Collisions

Hash Index Limitations

When to Use Hash Indexes

High-Volume Equality Lookups

Very Large Tables

Memory-Optimized Scenarios

No Range Query Requirements

Practical Recommendations

Database Support Summary

Next Steps

GiST & GIN Indexes

Advanced Indexes

On this page

Hash Indexes

No Range Queries

No Ordering Support

No Partial Matching

No Multi-Column Queries

GiST & GIN Indexes

Advanced Indexes

On this page