Database Design

Guide to database design including normalization, relationships, schema design, and best practices for building maintainable databases

Good database design is crucial for application performance, data integrity, and maintainability. This guide covers normalization, relationships, schema design patterns, and best practices.

Database Design Process

Requirements Analysis: Understand what data needs to be stored
Conceptual Design: Create an Entity-Relationship Diagram (ERD)
Logical Design: Normalize tables and define relationships
Physical Design: Optimize for performance with indexes and partitioning

Entity-Relationship Modeling

Entities and Attributes

An entity represents a real-world object or concept. Attributes describe the entity.

Entity: Customer
Attributes:
  - id (Primary Key)
  - email
  - first_name
  - last_name
  - created_at

Relationship Types

One-to-One (1:1)

Each entity in A relates to exactly one entity in B.

-- User has one profile
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE user_profiles (
    id SERIAL PRIMARY KEY,
    user_id INTEGER UNIQUE REFERENCES users(id),
    bio TEXT,
    avatar_url VARCHAR(500)
);

One-to-Many (1:N)

Each entity in A relates to multiple entities in B.

-- Customer has many orders
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT NOW()
);

Many-to-Many (M:N)

Entities in A relate to multiple entities in B, and vice versa.

-- Products can be in many orders, orders can have many products
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Junction/Bridge table
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    UNIQUE(order_id, product_id)
);

Normalization

Normalization organizes data to reduce redundancy and improve data integrity.

First Normal Form (1NF)

Rules:

Each column contains atomic (indivisible) values
Each column contains only one type of data
Each row is unique (has a primary key)

-- ❌ Violates 1NF: Multiple values in one column
CREATE TABLE orders_bad (
    id INTEGER,
    products VARCHAR(500)  -- "iPhone, MacBook, AirPods"
);

-- ✅ 1NF: Atomic values
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_name VARCHAR(255)
);

Second Normal Form (2NF)

Rules:

Is in 1NF
All non-key attributes depend on the entire primary key (no partial dependencies)

-- ❌ Violates 2NF: product_name depends only on product_id, not order_id
CREATE TABLE order_items_bad (
    order_id INTEGER,
    product_id INTEGER,
    product_name VARCHAR(255),  -- Partial dependency
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

-- ✅ 2NF: Separate tables
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF)

Rules:

Is in 2NF
No transitive dependencies (non-key attributes depend only on the primary key)

-- ❌ Violates 3NF: city depends on zip_code, not customer_id
CREATE TABLE customers_bad (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    zip_code VARCHAR(10),
    city VARCHAR(100)  -- Transitive dependency: city depends on zip_code
);

-- ✅ 3NF: Separate tables
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city VARCHAR(100),
    state VARCHAR(50)
);

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    zip_code VARCHAR(10) REFERENCES zip_codes(zip_code)
);

Boyce-Codd Normal Form (BCNF)

Rules:

Is in 3NF
Every determinant is a candidate key

-- ❌ Violates BCNF: instructor determines room, but instructor isn't a key
CREATE TABLE classes_bad (
    student_id INTEGER,
    course VARCHAR(100),
    instructor VARCHAR(100),
    room VARCHAR(50),  -- room depends on instructor, not on (student_id, course)
    PRIMARY KEY (student_id, course)
);

-- ✅ BCNF: Separate instructor-room relationship
CREATE TABLE instructors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    room VARCHAR(50)
);

CREATE TABLE enrollments (
    student_id INTEGER,
    course VARCHAR(100),
    instructor_id INTEGER REFERENCES instructors(id),
    PRIMARY KEY (student_id, course)
);

When to Denormalize

Sometimes denormalization improves performance:

-- Normalized: Requires JOIN for every query
SELECT o.id, c.name, c.email 
FROM orders o 
JOIN customers c ON o.customer_id = c.id;

-- Denormalized: Customer name copied to orders for faster reads
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    customer_name VARCHAR(255),  -- Denormalized
    customer_email VARCHAR(255),  -- Denormalized
    total DECIMAL(10, 2)
);

Denormalization trades storage and data consistency for read performance. Use it sparingly and maintain data integrity with triggers or application logic.

Primary Keys

Surrogate vs Natural Keys

-- Natural key: Real-world identifier
CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,  -- ISO code
    name VARCHAR(100)
);

-- Surrogate key: Auto-generated identifier
CREATE TABLE users (
    id SERIAL PRIMARY KEY,  -- No real-world meaning
    email VARCHAR(255) UNIQUE
);

Type	Pros	Cons
Natural	Meaningful, no extra column	Can change, may be long
Surrogate	Stable, compact, fast	Extra column, meaningless

UUID vs Auto-Increment

-- Auto-increment: Simple, sequential
CREATE TABLE orders (
    id SERIAL PRIMARY KEY
);

-- UUID: Globally unique, non-sequential
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);

Type	Pros	Cons
Auto-increment	Compact, ordered, fast inserts	Predictable, replication issues
UUID	Globally unique, unpredictable	Larger (16 bytes), random I/O

Foreign Keys

Defining Relationships

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    
    -- Foreign key constraint
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id)
        REFERENCES customers(id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);

Referential Actions

Action	On DELETE	On UPDATE
CASCADE	Delete child rows	Update child FK values
RESTRICT	Prevent delete	Prevent update
SET NULL	Set FK to NULL	Set FK to NULL
SET DEFAULT	Set FK to default	Set FK to default
NO ACTION	Check at end of transaction	Check at end of transaction

-- CASCADE: Delete order items when order is deleted
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id) ON DELETE CASCADE
);

-- SET NULL: Keep products but remove category reference
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER REFERENCES categories(id) ON DELETE SET NULL
);

-- RESTRICT: Prevent deleting customers with orders
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id) ON DELETE RESTRICT
);

Common Schema Patterns

Hierarchical Data (Trees)

Adjacency List

-- Simple but requires recursive queries
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    parent_id INTEGER REFERENCES categories(id)
);

-- Query children
SELECT * FROM categories WHERE parent_id = 1;

-- Recursive query for full tree
WITH RECURSIVE category_tree AS (
    SELECT id, name, parent_id, 0 AS depth
    FROM categories WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT c.id, c.name, c.parent_id, ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree;

Materialized Path

-- Store full path as string
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    path VARCHAR(500)  -- "/1/5/12/"
);

-- Find all descendants
SELECT * FROM categories WHERE path LIKE '/1/5/%';

-- Find all ancestors
SELECT * FROM categories 
WHERE '/1/5/12/' LIKE path || '%';

Nested Sets

-- Store left and right boundaries
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    lft INTEGER NOT NULL,
    rgt INTEGER NOT NULL
);

-- Find all descendants
SELECT * FROM categories
WHERE lft > 2 AND rgt < 13;  -- Between parent's lft and rgt

-- Find all ancestors
SELECT * FROM categories
WHERE lft < 5 AND rgt > 8;  -- Contains child's lft and rgt

Polymorphic Associations

Single Table Inheritance

-- All types in one table
CREATE TABLE notifications (
    id SERIAL PRIMARY KEY,
    type VARCHAR(50),  -- 'email', 'sms', 'push'
    recipient VARCHAR(255),
    subject VARCHAR(255),  -- Only for email
    phone_number VARCHAR(20),  -- Only for sms
    device_token VARCHAR(255),  -- Only for push
    message TEXT
);

Class Table Inheritance

-- Base table
CREATE TABLE notifications (
    id SERIAL PRIMARY KEY,
    type VARCHAR(50),
    recipient VARCHAR(255),
    message TEXT
);

-- Type-specific tables
CREATE TABLE email_notifications (
    id INTEGER PRIMARY KEY REFERENCES notifications(id),
    subject VARCHAR(255),
    html_body TEXT
);

CREATE TABLE sms_notifications (
    id INTEGER PRIMARY KEY REFERENCES notifications(id),
    phone_number VARCHAR(20)
);

Polymorphic Junction

-- Polymorphic comments
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    body TEXT,
    commentable_type VARCHAR(50),  -- 'Post', 'Photo', 'Video'
    commentable_id INTEGER,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Index for efficient lookups
CREATE INDEX idx_comments_polymorphic 
ON comments (commentable_type, commentable_id);

Audit/History Tables

-- Main table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10, 2),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Audit table
CREATE TABLE products_audit (
    audit_id SERIAL PRIMARY KEY,
    product_id INTEGER,
    name VARCHAR(255),
    price DECIMAL(10, 2),
    changed_at TIMESTAMP DEFAULT NOW(),
    changed_by VARCHAR(100),
    operation CHAR(1)  -- 'I', 'U', 'D'
);

-- Trigger for audit
CREATE OR REPLACE FUNCTION audit_products() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO products_audit (product_id, name, price, operation)
        VALUES (OLD.id, OLD.name, OLD.price, 'D');
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO products_audit (product_id, name, price, operation)
        VALUES (NEW.id, NEW.name, NEW.price, 'U');
        RETURN NEW;
    ELSIF TG_OP = 'INSERT' THEN
        INSERT INTO products_audit (product_id, name, price, operation)
        VALUES (NEW.id, NEW.name, NEW.price, 'I');
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION audit_products();

Soft Delete

-- Add deleted_at column
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255),
    deleted_at TIMESTAMP  -- NULL = active
);

-- Soft delete
UPDATE users SET deleted_at = NOW() WHERE id = 1;

-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;

-- View for active users only
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

-- Partial unique index
CREATE UNIQUE INDEX idx_users_email_active 
ON users (email) 
WHERE deleted_at IS NULL;

Tags/Taxonomy

-- Many-to-many with metadata
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    body TEXT
);

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) UNIQUE,
    slug VARCHAR(100) UNIQUE
);

CREATE TABLE post_tags (
    post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
    tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (post_id, tag_id)
);

-- Find posts with specific tags
SELECT p.*
FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
JOIN tags t ON pt.tag_id = t.id
WHERE t.slug IN ('javascript', 'typescript');

-- Find posts with ALL specified tags
SELECT p.*
FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
JOIN tags t ON pt.tag_id = t.id
WHERE t.slug IN ('javascript', 'typescript')
GROUP BY p.id
HAVING COUNT(DISTINCT t.id) = 2;

Schema Conventions

Naming Conventions

-- Tables: plural, snake_case
CREATE TABLE order_items (...);

-- Columns: singular, snake_case
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    created_at TIMESTAMP
);

-- Foreign keys: singular_table_id
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);

-- Junction tables: alphabetical order or descriptive
CREATE TABLE post_tags (...);  -- or tag_assignments

-- Indexes: idx_table_columns
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Constraints: chk_, fk_, uq_, pk_
CONSTRAINT chk_price_positive CHECK (price > 0)
CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id)
CONSTRAINT uq_users_email UNIQUE (email)

Standard Columns

-- Recommended standard columns
CREATE TABLE entities (
    -- Primary key
    id SERIAL PRIMARY KEY,
    -- or
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Timestamps
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    
    -- Soft delete
    deleted_at TIMESTAMP WITH TIME ZONE,
    
    -- Audit
    created_by INTEGER REFERENCES users(id),
    updated_by INTEGER REFERENCES users(id)
);

-- Auto-update updated_at (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at
BEFORE UPDATE ON entities
FOR EACH ROW EXECUTE FUNCTION update_updated_at();

Best Practices

Do's

Use constraints: Primary keys, foreign keys, NOT NULL, CHECK
Normalize first: Then denormalize if needed for performance
Index foreign keys: Foreign keys should usually be indexed
Use appropriate data types: Don't use VARCHAR(255) for everything
Add timestamps: created_at and updated_at are almost always useful
Document your schema: Comments on tables and columns

-- Example with best practices
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    status VARCHAR(20) NOT NULL DEFAULT 'pending' 
        CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
    total DECIMAL(10, 2) NOT NULL CHECK (total >= 0),
    notes TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

COMMENT ON TABLE orders IS 'Customer orders';
COMMENT ON COLUMN orders.status IS 'Order status: pending, processing, shipped, delivered, cancelled';

CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_created_at ON orders (created_at);

Don'ts

Don't use reserved words: Avoid naming tables/columns with SQL keywords
Don't skip foreign keys: They ensure data integrity
Don't over-normalize: Balance integrity with practicality
Don't ignore NULLs: Explicitly decide nullable vs NOT NULL
Don't use FLOAT for money: Use DECIMAL or integer cents

-- ❌ Don't
CREATE TABLE order (           -- Reserved word
    float_price FLOAT,         -- Imprecise for money
    user VARCHAR(255)          -- Reserved word
);

-- ✅ Do
CREATE TABLE orders (
    price DECIMAL(10, 2) NOT NULL,
    user_name VARCHAR(255)
);

Schema Migration Tips

-- Adding columns (safe)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Adding NOT NULL column with default (safe)
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';

-- Renaming columns (may break queries)
ALTER TABLE users RENAME COLUMN name TO full_name;

-- Changing column type (may fail with data)
ALTER TABLE users ALTER COLUMN age TYPE INTEGER USING age::integer;

-- Adding foreign key (may fail if orphaned data exists)
ALTER TABLE orders 
ADD CONSTRAINT fk_orders_customer 
FOREIGN KEY (customer_id) REFERENCES customers(id);

-- Dropping column (irreversible)
ALTER TABLE users DROP COLUMN deprecated_field;

Always test migrations on a copy of production data before applying to production.

On this page