Indexing Strategies
GiST & GIN Indexes
Specialized indexes for full-text search, arrays, JSON, and geometric data
GiST (Generalized Search Tree) and GIN (Generalized Inverted Index) are specialized index types designed for complex data types that B-Tree cannot efficiently handle.
Index Type Overview
GIN (Generalized Inverted Index)
Best for:
- Full-text search
- Array containment queries
- JSONB queries
- Many keys per row
Characteristics:
- Inverted index structure
- Fast lookups for contained elements
- Slower to build/update
- Larger size on disk
GiST (Generalized Search Tree)
Best for:
- Geometric/spatial data
- Range types
- Nearest-neighbor searches
- Full-text search (alternative)
- Exclusion constraints
Characteristics:
- Tree structure with lossy keys
- Supports complex operators
- Faster to update than GIN
- Smaller than GIN
| Aspect | GIN | GiST |
|---|---|---|
| Structure | Inverted index | Tree with bounding keys |
| Build speed | Slower | Faster |
| Update speed | Slower | Faster |
| Lookup speed | Faster | Slower |
| Size | Larger | Smaller |
| Full-text | Preferred | Alternative |
| Geometry | Not suitable | Preferred |
| Arrays | Preferred | Not suitable |
| Nearest neighbor | Not suitable | Preferred |
| Exact match | Faster | Slower |
General Rule
- GIN: When you have many keys per row and exact containment queries
- GiST: When you need spatial, range, or nearest-neighbor queries
Full-Text Search Indexes
-- Create tsvector column for search
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector =
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''));
-- GIN index (recommended for most cases)
CREATE INDEX idx_articles_search_gin
ON articles USING GIN (search_vector);
-- GiST index (alternative, smaller but slower)
CREATE INDEX idx_articles_search_gist
ON articles USING GiST (search_vector);
-- Query
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Expression index (no separate column needed)
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english', title || ' ' || body));GIN vs GiST for Full-Text
- GIN: 3x faster for lookups, but 3x slower to update
- GiST: Faster updates, but slower lookups
- Use GIN for read-heavy workloads, GiST for write-heavy
-- Full-text index (InnoDB 5.6+)
CREATE FULLTEXT INDEX idx_articles_fulltext
ON articles(title, body);
-- Query with natural language mode
SELECT *, MATCH(title, body) AGAINST('database performance') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('database performance')
ORDER BY score DESC;
-- Query with boolean mode
SELECT *
FROM articles
WHERE MATCH(title, body) AGAINST('+database +performance -slow' IN BOOLEAN MODE);
-- Query expansion
SELECT *
FROM articles
WHERE MATCH(title, body) AGAINST('database' WITH QUERY EXPANSION);-- Create full-text catalog
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
-- Create full-text index
CREATE FULLTEXT INDEX ON articles(title, body)
KEY INDEX PK_articles
ON ftCatalog
WITH STOPLIST = SYSTEM;
-- Query with CONTAINS
SELECT * FROM articles
WHERE CONTAINS((title, body), 'database AND performance');
-- Query with FREETEXT (more flexible)
SELECT * FROM articles
WHERE FREETEXT((title, body), 'database performance optimization');
-- Ranked results
SELECT *, KEY_TBL.RANK
FROM articles
INNER JOIN CONTAINSTABLE(articles, (title, body), 'database') AS KEY_TBL
ON articles.id = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;JSON/JSONB Indexes
-- Full JSONB indexing
CREATE INDEX idx_data_gin ON documents USING GIN (data);
-- Supports these operators:
-- @> containment: '{"a": 1}' @> '{"a": 1}'
-- ? key exists: data ? 'key'
-- ?& all keys exist: data ?& array['a', 'b']
-- ?| any key exists: data ?| array['a', 'b']
-- Query examples
SELECT * FROM documents WHERE data @> '{"status": "active"}';
SELECT * FROM documents WHERE data ? 'email';
SELECT * FROM documents WHERE data ?& array['name', 'email'];-- Path operations index (smaller, more specific)
CREATE INDEX idx_data_path ON documents
USING GIN (data jsonb_path_ops);
-- Only supports @> operator
-- But smaller and faster for containment queries
SELECT * FROM documents WHERE data @> '{"user": {"role": "admin"}}';
-- For specific path queries, use expression index
CREATE INDEX idx_data_status ON documents ((data->>'status'));
SELECT * FROM documents WHERE data->>'status' = 'active';-- Index specific JSON paths (B-Tree on expression)
CREATE INDEX idx_user_email ON users ((data->>'email'));
CREATE INDEX idx_user_age ON users (((data->>'age')::int));
-- Query efficiently
SELECT * FROM users WHERE data->>'email' = 'user@example.com';
SELECT * FROM users WHERE (data->>'age')::int > 21;
-- GIN on nested path
CREATE INDEX idx_tags ON products USING GIN ((data->'tags'));
SELECT * FROM products WHERE data->'tags' @> '"electronics"';Array Indexes
-- GIN index on array column
CREATE INDEX idx_tags ON products USING GIN (tags);
-- Array operators supported
SELECT * FROM products WHERE tags @> ARRAY['electronics']; -- Contains
SELECT * FROM products WHERE tags && ARRAY['sale', 'new']; -- Overlaps
SELECT * FROM products WHERE tags <@ ARRAY['a', 'b', 'c']; -- Contained by
-- GIN with specific operator class
CREATE INDEX idx_tags_ops ON products
USING GIN (tags array_ops);Geometric/Spatial Indexes
-- GiST index for point data
CREATE INDEX idx_location ON places USING GiST (location);
-- Nearest neighbor (k-NN)
SELECT name, location <-> point(40.7128, -74.0060) AS distance
FROM places
ORDER BY location <-> point(40.7128, -74.0060)
LIMIT 10;
-- Within bounding box
SELECT * FROM places
WHERE location <@ box(point(40, -75), point(41, -73));
-- Distance within radius
SELECT * FROM places
WHERE location <-> point(40.7128, -74.0060) < 0.1;-- GiST for range types
CREATE TABLE reservations (
id SERIAL PRIMARY KEY,
room_id INT,
during TSTZRANGE -- timestamp range
);
CREATE INDEX idx_reservations_during
ON reservations USING GiST (during);
-- Overlap query
SELECT * FROM reservations
WHERE during && tstzrange('2024-01-15', '2024-01-20');
-- Contains
SELECT * FROM reservations
WHERE during @> '2024-01-16 14:00:00'::timestamptz;
-- Exclusion constraint (no overlapping reservations)
ALTER TABLE reservations ADD CONSTRAINT no_overlap
EXCLUDE USING GiST (room_id WITH =, during WITH &&);-- PostGIS spatial index
CREATE INDEX idx_geom ON geographic_data USING GiST (geom);
-- Find points within polygon
SELECT * FROM points
WHERE ST_Within(geom, ST_GeomFromText('POLYGON((...))', 4326));
-- Find nearest neighbors
SELECT name, ST_Distance(geom, ST_MakePoint(-74.006, 40.7128)) AS dist
FROM places
ORDER BY geom <-> ST_MakePoint(-74.006, 40.7128)
LIMIT 10;
-- Find within distance (meters)
SELECT * FROM places
WHERE ST_DWithin(
geom,
ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326),
1000 -- 1000 meters
);Performance Tuning
-- Fast update mode (PostgreSQL)
-- Trades query speed for faster inserts
CREATE INDEX idx_fts ON articles USING GIN (search_vector)
WITH (fastupdate = on); -- Default is on
-- gin_pending_list_limit controls pending list size
ALTER INDEX idx_fts SET (gin_pending_list_limit = 1024);
-- Disable fast update for read-heavy workloads
CREATE INDEX idx_fts ON articles USING GIN (search_vector)
WITH (fastupdate = off);
-- Check pending tuples
SELECT * FROM pg_stat_user_indexes WHERE indexrelname = 'idx_fts';-- Fillfactor for GiST
CREATE INDEX idx_geom ON places USING GiST (location)
WITH (fillfactor = 90);
-- Buffered builds for large tables
SET maintenance_work_mem = '1GB';
CREATE INDEX idx_geom ON large_table USING GiST (geom);
-- siglen for signature length (affects selectivity vs size)
CREATE INDEX idx_trgm ON texts USING GiST (content gist_trgm_ops(siglen=32));-- VACUUM to update visibility map
VACUUM ANALYZE table_name;
-- REINDEX bloated indexes
REINDEX INDEX CONCURRENTLY idx_fts;
-- Monitor index size
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) as size,
idx_scan,
idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%gin%' OR indexrelname LIKE '%gist%'
ORDER BY pg_relation_size(indexrelid) DESC;Choosing the Right Index
Identify Your Query Pattern
What operators do you use most?
| Operator | Recommended Index |
|---|---|
@@ (text search) | GIN (read-heavy) / GiST (write-heavy) |
@>, ?, ?&, `? | ` (JSON) |
@>, &&, <@ (arrays) | GIN |
<-> (nearest neighbor) | GiST |
&& (range overlap) | GiST |
@> (geometric containment) | GiST |
Consider Write vs Read Ratio
- Read-heavy: Use GIN
- Write-heavy: Use GiST
- Balanced: Benchmark both
Account for Size Constraints
GIN indexes are typically 2-3x larger than GiST.
Test Performance
Always benchmark with real data and queries.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('database');Summary Table
| Data Type | Recommended Index | Operators |
|---|---|---|
| tsvector (full-text) | GIN (reads) / GiST (writes) | @@ |
| jsonb | GIN | @>, ?, ?&, `? |
| array | GIN | @>, &&, <@ |
| geometry (point) | GiST | <->, <@, ~= |
| geometry (polygon) | GiST | &&, @>, <@ |
| range types | GiST | &&, @>, <@, `- |
| trigram | GIN / GiST | %, <-> |