DevDocsDev Docs
Indexing Strategies

Hash Indexes

Understanding hash indexes for equality lookups and their limitations

Hash indexes use a hash function to map keys to bucket locations, providing O(1) average-case lookups for equality comparisons.

How Hash Indexes Work

Hash Function Process

Creating Hash Indexes

-- Create hash index
CREATE INDEX idx_users_email_hash ON users USING HASH (email);

-- PostgreSQL 10+: Hash indexes are WAL-logged and crash-safe
-- Before PostgreSQL 10: Hash indexes were not recommended

-- Check index type
SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'users';

PostgreSQL Hash Index History

Before PostgreSQL 10, hash indexes were not WAL-logged and could be corrupted after a crash. Since PostgreSQL 10, they are fully supported and safe to use.

-- MySQL: Hash indexes only available in MEMORY storage engine
CREATE TABLE users_memory (
    id INT PRIMARY KEY,
    email VARCHAR(255),
    INDEX USING HASH (email)
) ENGINE = MEMORY;

-- InnoDB uses adaptive hash index internally (automatic)
-- Show adaptive hash index status
SHOW ENGINE INNODB STATUS;

-- Cannot explicitly create hash index on InnoDB tables
-- InnoDB automatically creates adaptive hash indexes for frequently accessed pages

MySQL InnoDB

InnoDB doesn't support explicit hash indexes. It uses an internal "adaptive hash index" that automatically creates hash-like structures for hot data.

-- SQL Server: Hash indexes for memory-optimized tables only
CREATE TABLE users_memory
(
    id INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    email NVARCHAR(255) NOT NULL INDEX idx_email HASH WITH (BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON);

-- Bucket count should be 1-2x expected unique values
-- Too few buckets = long chains = slow
-- Too many buckets = wasted memory

Hash vs B-Tree Comparison

OperationHash IndexB-Tree Index
Equality =✅ O(1) average✅ O(log n)
Range <, >, BETWEEN❌ Not supported✅ Efficient
LIKE 'prefix%'❌ Not supported✅ Efficient
ORDER BY❌ Not supported✅ Efficient
MIN() / MAX()❌ Full scan needed✅ First/last entry
IS NULL❌ Varies by DB✅ Supported
-- Hash index works perfectly for:
SELECT * FROM users WHERE email = 'alice@example.com';

-- Hash index CANNOT help with:
SELECT * FROM users WHERE email LIKE 'alice%';
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users ORDER BY email;
SELECT MIN(email) FROM users;
MetricHashB-Tree
Equality lookupFaster (O(1))Slower (O(log n))
Memory usageHigher (buckets)Lower (pages)
Insert overheadLowerHigher (balancing)
Range queriesN/AEfficient
Index sizeUsually largerUsually smaller

When the Difference Matters

For most workloads, B-Tree performance is excellent. Hash indexes provide meaningful improvement mainly for:

  • Very large tables (billions of rows)
  • High-concurrency equality lookups
  • No need for range queries on the column

✅ Good for Hash Indexes

-- Exact email lookups
SELECT * FROM users WHERE email = 'user@example.com';

-- Session token lookup
SELECT * FROM sessions WHERE token = 'abc123xyz';

-- UUID primary key lookups
SELECT * FROM entities WHERE uuid = 'a1b2c3d4-...';

-- High-cardinality equality checks
SELECT * FROM logs WHERE request_id = '12345';

❌ Not Suitable for Hash Indexes

-- Range queries
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- Prefix matching
SELECT * FROM products WHERE sku LIKE 'ABC%';

-- Sorting
SELECT * FROM users ORDER BY email;

-- Low cardinality columns
SELECT * FROM orders WHERE status = 'pending';

Hash Collisions

When multiple keys hash to the same bucket:

  1. All entries stored in the same bucket (chain)
  2. Lookup scans the chain to find exact match
  3. Too many collisions = degraded performance

Bucket Count Matters

In SQL Server memory-optimized tables, you specify bucket count:

  • Too few buckets → Long chains → Slow lookups
  • Too many buckets → Wasted memory
  • Ideal: 1-2x the number of unique values

Hash Index Limitations

When to Use Hash Indexes

High-Volume Equality Lookups

When you have millions of exact-match queries per second.

-- Session/token lookups
SELECT * FROM sessions WHERE token = ?;

-- User authentication
SELECT * FROM users WHERE api_key = ?;

Very Large Tables

When B-Tree depth becomes a bottleneck.

-- Tables with billions of rows
-- Hash: O(1) vs B-Tree: O(log n)
-- At 1 billion rows: 1 vs ~30 comparisons

Memory-Optimized Scenarios

In-memory databases where hash indexes shine.

-- SQL Server In-Memory OLTP
-- Redis-like caching patterns

No Range Query Requirements

Only when you're certain you'll never need range queries.

Practical Recommendations

General Guidance

For most use cases, B-Tree indexes are the better choice because:

  1. They support all query types
  2. Performance difference is usually negligible
  3. They're more flexible for future query patterns
  4. They're better supported across databases

Consider hash indexes only when:

  1. You have measured B-Tree performance as a bottleneck
  2. You only need equality lookups
  3. Your database fully supports hash indexes
  4. You understand the maintenance implications

Database Support Summary

DatabaseHash Index SupportNotes
PostgreSQL✅ Full (10+)USING HASH, WAL-logged since v10
MySQL⚠️ MEMORY onlyInnoDB uses adaptive hash internally
SQL Server⚠️ In-Memory onlyMemory-optimized tables only
SQLite❌ Not supportedB-Tree only
Oracle❌ Not supportedCluster tables only

Next Steps

On this page