Database Design
Guide to database design including normalization, relationships, schema design, and best practices for building maintainable databases
Good database design is crucial for application performance, data integrity, and maintainability. This guide covers normalization, relationships, schema design patterns, and best practices.
Database Design Process
- Requirements Analysis: Understand what data needs to be stored
- Conceptual Design: Create an Entity-Relationship Diagram (ERD)
- Logical Design: Normalize tables and define relationships
- Physical Design: Optimize for performance with indexes and partitioning
Entity-Relationship Modeling
Entities and Attributes
An entity represents a real-world object or concept. Attributes describe the entity.
Entity: Customer
Attributes:
- id (Primary Key)
- email
- first_name
- last_name
- created_atRelationship Types
One-to-One (1:1)
Each entity in A relates to exactly one entity in B.
-- User has one profile
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE user_profiles (
id SERIAL PRIMARY KEY,
user_id INTEGER UNIQUE REFERENCES users(id),
bio TEXT,
avatar_url VARCHAR(500)
);One-to-Many (1:N)
Each entity in A relates to multiple entities in B.
-- Customer has many orders
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
total DECIMAL(10, 2),
created_at TIMESTAMP DEFAULT NOW()
);Many-to-Many (M:N)
Entities in A relate to multiple entities in B, and vice versa.
-- Products can be in many orders, orders can have many products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
-- Junction/Bridge table
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
price DECIMAL(10, 2) NOT NULL,
UNIQUE(order_id, product_id)
);Normalization
Normalization organizes data to reduce redundancy and improve data integrity.
First Normal Form (1NF)
Rules:
- Each column contains atomic (indivisible) values
- Each column contains only one type of data
- Each row is unique (has a primary key)
-- ❌ Violates 1NF: Multiple values in one column
CREATE TABLE orders_bad (
id INTEGER,
products VARCHAR(500) -- "iPhone, MacBook, AirPods"
);
-- ✅ 1NF: Atomic values
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_name VARCHAR(255)
);Second Normal Form (2NF)
Rules:
- Is in 1NF
- All non-key attributes depend on the entire primary key (no partial dependencies)
-- ❌ Violates 2NF: product_name depends only on product_id, not order_id
CREATE TABLE order_items_bad (
order_id INTEGER,
product_id INTEGER,
product_name VARCHAR(255), -- Partial dependency
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
-- ✅ 2NF: Separate tables
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER REFERENCES products(id),
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);Third Normal Form (3NF)
Rules:
- Is in 2NF
- No transitive dependencies (non-key attributes depend only on the primary key)
-- ❌ Violates 3NF: city depends on zip_code, not customer_id
CREATE TABLE customers_bad (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(10),
city VARCHAR(100) -- Transitive dependency: city depends on zip_code
);
-- ✅ 3NF: Separate tables
CREATE TABLE zip_codes (
zip_code VARCHAR(10) PRIMARY KEY,
city VARCHAR(100),
state VARCHAR(50)
);
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
zip_code VARCHAR(10) REFERENCES zip_codes(zip_code)
);Boyce-Codd Normal Form (BCNF)
Rules:
- Is in 3NF
- Every determinant is a candidate key
-- ❌ Violates BCNF: instructor determines room, but instructor isn't a key
CREATE TABLE classes_bad (
student_id INTEGER,
course VARCHAR(100),
instructor VARCHAR(100),
room VARCHAR(50), -- room depends on instructor, not on (student_id, course)
PRIMARY KEY (student_id, course)
);
-- ✅ BCNF: Separate instructor-room relationship
CREATE TABLE instructors (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
room VARCHAR(50)
);
CREATE TABLE enrollments (
student_id INTEGER,
course VARCHAR(100),
instructor_id INTEGER REFERENCES instructors(id),
PRIMARY KEY (student_id, course)
);When to Denormalize
Sometimes denormalization improves performance:
-- Normalized: Requires JOIN for every query
SELECT o.id, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Denormalized: Customer name copied to orders for faster reads
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
customer_name VARCHAR(255), -- Denormalized
customer_email VARCHAR(255), -- Denormalized
total DECIMAL(10, 2)
);Denormalization trades storage and data consistency for read performance. Use it sparingly and maintain data integrity with triggers or application logic.
Primary Keys
Surrogate vs Natural Keys
-- Natural key: Real-world identifier
CREATE TABLE countries (
country_code CHAR(2) PRIMARY KEY, -- ISO code
name VARCHAR(100)
);
-- Surrogate key: Auto-generated identifier
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- No real-world meaning
email VARCHAR(255) UNIQUE
);| Type | Pros | Cons |
|---|---|---|
| Natural | Meaningful, no extra column | Can change, may be long |
| Surrogate | Stable, compact, fast | Extra column, meaningless |
UUID vs Auto-Increment
-- Auto-increment: Simple, sequential
CREATE TABLE orders (
id SERIAL PRIMARY KEY
);
-- UUID: Globally unique, non-sequential
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);| Type | Pros | Cons |
|---|---|---|
| Auto-increment | Compact, ordered, fast inserts | Predictable, replication issues |
| UUID | Globally unique, unpredictable | Larger (16 bytes), random I/O |
Foreign Keys
Defining Relationships
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
-- Foreign key constraint
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);Referential Actions
| Action | On DELETE | On UPDATE |
|---|---|---|
| CASCADE | Delete child rows | Update child FK values |
| RESTRICT | Prevent delete | Prevent update |
| SET NULL | Set FK to NULL | Set FK to NULL |
| SET DEFAULT | Set FK to default | Set FK to default |
| NO ACTION | Check at end of transaction | Check at end of transaction |
-- CASCADE: Delete order items when order is deleted
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id) ON DELETE CASCADE
);
-- SET NULL: Keep products but remove category reference
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL
);
-- RESTRICT: Prevent deleting customers with orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id) ON DELETE RESTRICT
);Common Schema Patterns
Hierarchical Data (Trees)
Adjacency List
-- Simple but requires recursive queries
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
parent_id INTEGER REFERENCES categories(id)
);
-- Query children
SELECT * FROM categories WHERE parent_id = 1;
-- Recursive query for full tree
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, ct.depth + 1
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree;Materialized Path
-- Store full path as string
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
path VARCHAR(500) -- "/1/5/12/"
);
-- Find all descendants
SELECT * FROM categories WHERE path LIKE '/1/5/%';
-- Find all ancestors
SELECT * FROM categories
WHERE '/1/5/12/' LIKE path || '%';Nested Sets
-- Store left and right boundaries
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
lft INTEGER NOT NULL,
rgt INTEGER NOT NULL
);
-- Find all descendants
SELECT * FROM categories
WHERE lft > 2 AND rgt < 13; -- Between parent's lft and rgt
-- Find all ancestors
SELECT * FROM categories
WHERE lft < 5 AND rgt > 8; -- Contains child's lft and rgtPolymorphic Associations
Single Table Inheritance
-- All types in one table
CREATE TABLE notifications (
id SERIAL PRIMARY KEY,
type VARCHAR(50), -- 'email', 'sms', 'push'
recipient VARCHAR(255),
subject VARCHAR(255), -- Only for email
phone_number VARCHAR(20), -- Only for sms
device_token VARCHAR(255), -- Only for push
message TEXT
);Class Table Inheritance
-- Base table
CREATE TABLE notifications (
id SERIAL PRIMARY KEY,
type VARCHAR(50),
recipient VARCHAR(255),
message TEXT
);
-- Type-specific tables
CREATE TABLE email_notifications (
id INTEGER PRIMARY KEY REFERENCES notifications(id),
subject VARCHAR(255),
html_body TEXT
);
CREATE TABLE sms_notifications (
id INTEGER PRIMARY KEY REFERENCES notifications(id),
phone_number VARCHAR(20)
);Polymorphic Junction
-- Polymorphic comments
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
body TEXT,
commentable_type VARCHAR(50), -- 'Post', 'Photo', 'Video'
commentable_id INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
-- Index for efficient lookups
CREATE INDEX idx_comments_polymorphic
ON comments (commentable_type, commentable_id);Audit/History Tables
-- Main table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
price DECIMAL(10, 2),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Audit table
CREATE TABLE products_audit (
audit_id SERIAL PRIMARY KEY,
product_id INTEGER,
name VARCHAR(255),
price DECIMAL(10, 2),
changed_at TIMESTAMP DEFAULT NOW(),
changed_by VARCHAR(100),
operation CHAR(1) -- 'I', 'U', 'D'
);
-- Trigger for audit
CREATE OR REPLACE FUNCTION audit_products() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'DELETE' THEN
INSERT INTO products_audit (product_id, name, price, operation)
VALUES (OLD.id, OLD.name, OLD.price, 'D');
RETURN OLD;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO products_audit (product_id, name, price, operation)
VALUES (NEW.id, NEW.name, NEW.price, 'U');
RETURN NEW;
ELSIF TG_OP = 'INSERT' THEN
INSERT INTO products_audit (product_id, name, price, operation)
VALUES (NEW.id, NEW.name, NEW.price, 'I');
RETURN NEW;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER products_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION audit_products();Soft Delete
-- Add deleted_at column
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255),
deleted_at TIMESTAMP -- NULL = active
);
-- Soft delete
UPDATE users SET deleted_at = NOW() WHERE id = 1;
-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;
-- View for active users only
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
-- Partial unique index
CREATE UNIQUE INDEX idx_users_email_active
ON users (email)
WHERE deleted_at IS NULL;Tags/Taxonomy
-- Many-to-many with metadata
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255),
body TEXT
);
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(100) UNIQUE,
slug VARCHAR(100) UNIQUE
);
CREATE TABLE post_tags (
post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (post_id, tag_id)
);
-- Find posts with specific tags
SELECT p.*
FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
JOIN tags t ON pt.tag_id = t.id
WHERE t.slug IN ('javascript', 'typescript');
-- Find posts with ALL specified tags
SELECT p.*
FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
JOIN tags t ON pt.tag_id = t.id
WHERE t.slug IN ('javascript', 'typescript')
GROUP BY p.id
HAVING COUNT(DISTINCT t.id) = 2;Schema Conventions
Naming Conventions
-- Tables: plural, snake_case
CREATE TABLE order_items (...);
-- Columns: singular, snake_case
CREATE TABLE users (
id SERIAL PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
created_at TIMESTAMP
);
-- Foreign keys: singular_table_id
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id)
);
-- Junction tables: alphabetical order or descriptive
CREATE TABLE post_tags (...); -- or tag_assignments
-- Indexes: idx_table_columns
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Constraints: chk_, fk_, uq_, pk_
CONSTRAINT chk_price_positive CHECK (price > 0)
CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id)
CONSTRAINT uq_users_email UNIQUE (email)Standard Columns
-- Recommended standard columns
CREATE TABLE entities (
-- Primary key
id SERIAL PRIMARY KEY,
-- or
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Timestamps
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Soft delete
deleted_at TIMESTAMP WITH TIME ZONE,
-- Audit
created_by INTEGER REFERENCES users(id),
updated_by INTEGER REFERENCES users(id)
);
-- Auto-update updated_at (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON entities
FOR EACH ROW EXECUTE FUNCTION update_updated_at();Best Practices
Do's
- Use constraints: Primary keys, foreign keys, NOT NULL, CHECK
- Normalize first: Then denormalize if needed for performance
- Index foreign keys: Foreign keys should usually be indexed
- Use appropriate data types: Don't use VARCHAR(255) for everything
- Add timestamps: created_at and updated_at are almost always useful
- Document your schema: Comments on tables and columns
-- Example with best practices
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(id),
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
total DECIMAL(10, 2) NOT NULL CHECK (total >= 0),
notes TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
COMMENT ON TABLE orders IS 'Customer orders';
COMMENT ON COLUMN orders.status IS 'Order status: pending, processing, shipped, delivered, cancelled';
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_created_at ON orders (created_at);Don'ts
- Don't use reserved words: Avoid naming tables/columns with SQL keywords
- Don't skip foreign keys: They ensure data integrity
- Don't over-normalize: Balance integrity with practicality
- Don't ignore NULLs: Explicitly decide nullable vs NOT NULL
- Don't use FLOAT for money: Use DECIMAL or integer cents
-- ❌ Don't
CREATE TABLE order ( -- Reserved word
float_price FLOAT, -- Imprecise for money
user VARCHAR(255) -- Reserved word
);
-- ✅ Do
CREATE TABLE orders (
price DECIMAL(10, 2) NOT NULL,
user_name VARCHAR(255)
);Schema Migration Tips
-- Adding columns (safe)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Adding NOT NULL column with default (safe)
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';
-- Renaming columns (may break queries)
ALTER TABLE users RENAME COLUMN name TO full_name;
-- Changing column type (may fail with data)
ALTER TABLE users ALTER COLUMN age TYPE INTEGER USING age::integer;
-- Adding foreign key (may fail if orphaned data exists)
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(id);
-- Dropping column (irreversible)
ALTER TABLE users DROP COLUMN deprecated_field;Always test migrations on a copy of production data before applying to production.