Skip to content

Advanced Postgres

You know SELECT , INSERT , UPDATE , DELETE Now let's talk about the features that separate people who "know SQL" from people who make databases sing - transactions that save your ass when things go sideways, window functions that replace five nested subqueries with one elegant statement, and CTEs that turn unreadable 200-line queries into composable data pipelines

Transactions - ACID in action

A transaction groups multiple operations into a single atomic unit: either all of them succeed (COMMIT) or none of them take effect (ROLLBACK). This is how banks move money between accounts without losing a cent and how your e-commerce site doesn't charge customers for orders that failed to create

-- BEGIN a transaction
BEGIN;

-- Deduct from sender
UPDATE app.accounts
SET balance = balance - 100.00
WHERE id = 'account-ali';

-- Credit to receiver
UPDATE app.accounts
SET balance = balance + 100.00
WHERE id = 'account-omar';

-- If both succeeded , commit
COMMIT;

-- If anything went wrong , rollback (undoes both updates)
ROLLBACK;

Savepoints let you roll back part of a transaction without aborting the whole thing. Think of them as nested checkpoints within a larger operation where you can recover from specific failures

BEGIN;

INSERT INTO app.orders (user_id, total_amount) VALUES ('user-1', 50.00);

SAVEPOINT before_items;

INSERT INTO app.order_items (order_id, product_id, quantity, price)
VALUES ('order-1', 'product-xyz', 1, 50.00);

-- Something went wrong with the items , rollback to savepoint
ROLLBACK TO SAVEPOINT before_items;

-- Try again with corrected data
INSERT INTO app.order_items (order_id, product_id, quantity, price)
VALUES ('order-1', 'product-abc', 2, 25.00);

COMMIT;

Transaction isolation levels

Postgres implements four isolation levels that control what happens when transactions run concurrently and try to read/write the same data. The defaults are sane but understanding the edge cases will save you from the kind of bug that only manifests under load when two users simultaneously book the last flight seat

-- Read Committed (default) - prevents dirty reads
-- Each query in the transaction sees only committed data at the moment the query runs
-- Different queries in the same transaction can see different states of the data

-- Repeatable Read - prevents dirty reads AND non-repeatable reads
-- The transaction sees a snapshot of data taken at the first query
-- Prevents "same query , different results" within one transaction

-- Serializable - the strictest level
-- Transactions execute as if they ran one after another
-- May fail with "serialization failure" if conflicts are detected
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Check seat availability atomically
SELECT COUNT(*) FROM app.flight_seats
WHERE flight_id = 'flight-123' AND booked = FALSE;

-- If count > 0, book the seat
UPDATE app.flight_seats SET booked = TRUE
WHERE flight_id = 'flight-123' AND seat_number = '12A';

COMMIT;

Common Table Expressions (CTEs)

CTEs let you define temporary result sets within a query that you can reference like a virtual table. They make complex queries readable by breaking them into named steps instead of piling up subqueries like a drunk developer's dependency graph

-- Basic CTE - define a named subquery
WITH active_users AS (
    SELECT id, email, username
    FROM app.users
    WHERE is_active = TRUE
      AND last_login > NOW() - INTERVAL '30 days'
)
SELECT au.username, COUNT(o.id) AS order_count
FROM active_users au
LEFT JOIN app.orders o ON o.user_id = au.id
GROUP BY au.id, au.username
ORDER BY order_count DESC;

-- Multiple CTEs in one query
WITH
user_orders AS (
    SELECT user_id, COUNT(*) AS orders, SUM(total_amount) AS total_spent
    FROM app.orders
    GROUP BY user_id
),
user_sessions AS (
    SELECT user_id, COUNT(*) AS sessions
    FROM app.sessions
    GROUP BY user_id
)
SELECT u.email, uo.orders, uo.total_spent, us.sessions
FROM app.users u
LEFT JOIN user_orders uo ON uo.user_id = u.id
LEFT JOIN user_sessions us ON us.user_id = u.id
WHERE u.is_active = TRUE;

-- Recursive CTE - queries that reference themselves (hierarchical data)
WITH RECURSIVE org_tree AS (
    -- Base case: top-level managers
    SELECT id, name, manager_id, 1 AS level
    FROM app.employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step: direct reports
    SELECT e.id, e.name, e.manager_id, ot.level + 1
    FROM app.employees e
    JOIN org_tree ot ON ot.id = e.manager_id
)
SELECT * FROM org_tree ORDER BY level, name;

Window functions

Window functions perform calculations across a set of rows related to the current row, without collapsing rows into a single output like GROUP BY does. Each row keeps its identity while gaining access to aggregate values, rank positions, and running totals from its "window" of related data

-- ROW_NUMBER - assign a unique number to each row within a partition
SELECT
    email,
    role,
    created_at,
    ROW_NUMBER() OVER (PARTITION BY role ORDER BY created_at) AS join_order
FROM app.users;
-- Each role group gets its own numbered sequence from 1

-- RANK and DENSE_RANK - ranking with ties
SELECT
    username,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank
FROM user_spending;
-- RANK: 1, 2, 2, 4 (skips 3 when two are tied)
-- DENSE_RANK: 1, 2, 2, 3 (no gaps)

-- Running totals with SUM
SELECT
    created_at,
    total_amount,
    SUM(total_amount) OVER (ORDER BY created_at) AS running_revenue
FROM app.orders
WHERE status != 'cancelled';

-- Moving average (last 7 days)
SELECT
    created_at::date AS day,
    SUM(total_amount) AS daily_revenue,
    AVG(SUM(total_amount)) OVER (
        ORDER BY created_at::date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7day
FROM app.orders
GROUP BY created_at::date;

-- LAG and LEAD - access previous/next row values
SELECT
    created_at::date AS day,
    SUM(total_amount) AS revenue,
    LAG(SUM(total_amount), 1) OVER (ORDER BY created_at::date) AS prev_day,
    SUM(total_amount) - LAG(SUM(total_amount), 1) OVER (ORDER BY created_at::date) AS day_over_day_change
FROM app.orders
GROUP BY created_at::date;

Views - virtual tables

Views are saved SQL queries that behave like tables in SELECT statements but store no data themselves. They're security boundaries (hide columns , restrict rows), simplification layers (complex joins become single-table reads), and consistency tools (everyone uses the same definition)

-- Create a view that shows only active users without password hashes
CREATE VIEW app.view_active_users AS
SELECT id, email, username, role, created_at
FROM app.users
WHERE is_active = TRUE;

-- Query the view like a table
SELECT * FROM app.view_active_users ORDER BY created_at DESC;

-- Create a view joining multiple tables for reporting
CREATE VIEW app.order_summary AS
SELECT
    o.id AS order_id,
    u.email AS user_email,
    o.total_amount,
    o.status,
    o.created_at,
    COUNT(oi.id) AS item_count
FROM app.orders o
JOIN app.users u ON u.id = o.user_id
LEFT JOIN app.order_items oi ON oi.order_id = o.id
GROUP BY o.id, u.email;

Materialized Views - cached queries

Materialized views physically store the query results on disk and can be refreshed on demand. They're faster than regular views because they avoid executing the underlying query every time, at the cost of serving potentially stale data

-- Create a materialized view for an expensive reporting query
CREATE MATERIALIZED VIEW app.daily_revenue_mv AS
SELECT
    created_at::date AS day,
    COUNT(*) AS orders,
    SUM(total_amount) AS revenue,
    COUNT(DISTINCT user_id) AS unique_customers
FROM app.orders
WHERE status != 'cancelled'
GROUP BY created_at::date
WITH DATA;

-- Refresh it (takes a lock - use CONCURRENTLY for production)
REFRESH MATERIALIZED VIEW CONCURRENTLY app.daily_revenue_mv;
-- CONCURRENTLY requires a unique index on the MV but doesn't block reads

Triggers - automatic reactions

Triggers fire functions automatically when INSERT , UPDATE , DELETE , or TRUNCATE happens on a table. They're the database's version of event listeners - useful for audit logging, validation, and computed columns, but dangerous because they execute invisibly and can create unexpected side effects

-- Create a trigger function
CREATE OR REPLACE FUNCTION app.log_user_changes()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO app.audit_log (table_name, record_id, old_data, new_data, action, changed_by)
    VALUES (
        'users',
        NEW.id,
        CASE WHEN TG_OP = 'UPDATE' THEN row_to_json(OLD) ELSE NULL END,
        CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE row_to_json(NEW) END,
        TG_OP,
        current_user
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- Attach trigger to table
CREATE TRIGGER audit_users_changes
    AFTER INSERT OR UPDATE OR DELETE ON app.users
    FOR EACH ROW
    EXECUTE FUNCTION app.log_user_changes();

Triggers add latency to every DML operation they fire on. Don't put business logic in triggers unless you genuinely need database-enforced behavior that the application layer can't guarantee - it's a common pattern that creates "magic" side effects that new developers will spend hours debugging

prerequisites

db_03_postgres_indexes.md - you should understand how queries execute before you start composing them into CTEs and window functions that span multiple tables with complex aggregation logic


next → db_05_postgres_security.md