Scaling Backend Apps

The worst time to learn about scaling is when your app is on fire because some product launch brought in 10x traffic and your single Express process is crying in the corner. Scale isn't a feature you add; it's a constraint you design for from day one. Your architecture either scales or it doesn't, and retrofitting is expensive as hell.

Horizontal vs Vertical Scaling

Vertical Scaling (scale up): Give the same machine more power (more RAM, a faster CPU, a bigger SSD).

Pros: No code changes, simple, works for stateful apps
Cons: Hard ceiling (the biggest instance your cloud sells), downtime during the upgrade, single point of failure, cost climbs steeply at the high end
When to use: Legacy apps you can't refactor, stateful services you haven't decoupled yet, databases (for now)

Horizontal Scaling (scale out): Add more machines behind a load balancer.

Pros: Near-limitless scale, failover (one dies, others keep going), cost grows roughly linearly, can use cheaper instances
Cons: Requires stateless app design, adds network complexity, brings distributed-systems problems (eventual consistency, coordination)
When to use: Web servers, APIs, stateless microservices; anything that doesn't store local state

# Vertical — bigger box
# From t3.medium (2 vCPU, 4 GiB) → t3.xlarge (4 vCPU, 16 GiB)

# Horizontal — more boxes
# From 1 instance → 10 instances behind a load balancer

Stateless App Design: The Foundation of Scaling

A stateless app doesn't keep session data, cache entries, or any other per-user state in local memory or on the local filesystem.

// Bad: stateful (session stored in this process's memory)
const session = require('express-session')

app.use(session({
  secret: 'keyboard cat',
  resave: false,
  saveUninitialized: true
}))
// Session dies if this instance goes down
// Next request might hit a different instance, so the user gets logged out

// Good: stateless (session stored externally)
// Assumes connect-redis v7 and node-redis v4+
const RedisStore = require('connect-redis').default
const { createClient } = require('redis')

const redisClient = createClient({ url: process.env.REDIS_URL })
redisClient.connect().catch(console.error)   // node-redis v4+ needs an explicit connect

app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: 'keyboard cat',
  resave: false,
  saveUninitialized: true
}))
// Session survives instance restarts
// Any instance can serve any request

What NOT to store locally:

* User sessions: use Redis or another external session store
* File uploads: use S3 / CDN
* Cached data: use Redis / Memcached
* Logs: stream to centralized logging (stdout → collector)

What's OK to store locally:

* Application code: the actual JS files
* Static configuration: loaded at startup, doesn't change per-request
* Connection pools: database connections are OK since they point to external databases
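To make the rule concrete, here is a minimal sketch of two app instances sharing one external session store. `ExternalStore` is a hypothetical in-process stand-in for Redis (just a `Map`), and `createInstance` is an illustrative name, not an Express API. The point is only that neither instance holds session state itself.

```javascript
// Hypothetical stand-in for an external store such as Redis.
// In production this would be a network call, not a Map.
class ExternalStore {
  constructor() {
    this.data = new Map()
  }
  async get(key) {
    return this.data.has(key) ? this.data.get(key) : null
  }
  async set(key, value) {
    this.data.set(key, value)
  }
}

// Each "instance" keeps no session state of its own; it only holds a
// reference to the shared store, so any instance can serve any request.
function createInstance(name, store) {
  return {
    async login(sessionId, user) {
      await store.set(`sess:${sessionId}`, JSON.stringify({ user }))
      return { servedBy: name }
    },
    async whoami(sessionId) {
      const raw = await store.get(`sess:${sessionId}`)
      return { servedBy: name, session: raw ? JSON.parse(raw) : null }
    }
  }
}

async function demo() {
  const store = new ExternalStore()
  const a = createInstance('instance-1', store)
  const b = createInstance('instance-2', store)

  await a.login('abc123', 'ada')   // login request lands on instance 1
  return b.whoami('abc123')        // next request lands on instance 2
}
```

Because the session lives in the shared store, the second instance can answer for a login it never saw. Killing `instance-1` between the two calls would change nothing.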

Load Balancers: Distributing the Pain

flowchart TD
    Client[DNS / Client]
    LB[Load Balancer<br/>nginx / ALB]
    I1[Instance 1<br/>Node]
    I2[Instance 2<br/>Node]
    I3[Instance 3<br/>Node]

    Client --> LB
    LB --> I1
    LB --> I2
    LB --> I3

Nginx as Load Balancer:

upstream backend {
    least_conn;                           # Send to least busy instance
    server backend1:3000 weight=3;        # Can handle more traffic
    server backend2:3000;
    server backend3:3000 backup;          # Only used if others die
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Timeouts
        proxy_connect_timeout 5;
        proxy_read_timeout 30;
        proxy_send_timeout 30;
    }

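    location /health {
        proxy_pass http://backend;
        # Active checks: health_check is an NGINX Plus directive and needs a
        # shared-memory "zone" in the upstream block. Open-source nginx only
        # does passive checks (max_fails / fail_timeout on each server line).
        health_check interval=10s fails=3 passes=2;
    }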
}

Load balancing algorithms:

* Round Robin: distributes evenly regardless of load
* Least Connections: sends to the least busy instance
* IP Hash: same client always hits the same instance (session affinity; avoid this)
* Weighted: preferred instances get more traffic
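The selection logic behind three of these algorithms can be sketched in a few lines. This is a toy model, not how nginx implements them: the `instances` array, the `active` connection counts, and the simple string hash are all assumptions made for illustration.

```javascript
// Round robin: cycle through instances in order, ignoring load.
function roundRobin(instances) {
  let next = 0
  return () => {
    const chosen = instances[next]
    next = (next + 1) % instances.length
    return chosen
  }
}

// Least connections: pick the instance with the fewest active connections.
// Ties go to the instance listed first.
function leastConnections(instances) {
  return instances.reduce((best, inst) =>
    inst.active < best.active ? inst : best
  )
}

// IP hash: the same client IP always maps to the same instance,
// as long as the instance list does not change.
function ipHash(instances, ip) {
  let hash = 0
  for (const ch of ip) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0   // keep an unsigned 32-bit value
  }
  return instances[hash % instances.length]
}

// Hypothetical fleet with made-up connection counts
const instances = [
  { name: 'backend1', active: 12 },
  { name: 'backend2', active: 3 },
  { name: 'backend3', active: 7 }
]

const pick = roundRobin(instances)
const rrOrder = [pick(), pick(), pick(), pick()].map((i) => i.name)
const leastBusy = leastConnections(instances).name
const sticky = ipHash(instances, '203.0.113.9').name
```

Note how `ipHash` depends on `instances.length`: adding or removing an instance remaps most clients, which is one reason session affinity is a poor substitute for an external session store.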

Database Connection Pooling: Your Database Will Thank You

Every new connection costs memory and CPU. Connection pools reuse connections so your app doesn't melt.

// Bad: a new pool on every request, and it is never closed
app.get('/users', async (req, res) => {
  const pool = new Pool()        // New connection pool every request
  const result = await pool.query('SELECT * FROM users')
  res.json(result.rows)
})

// Good — reused pool
const { Pool } = require('pg')

const pool = new Pool({
  host: process.env.DB_HOST,
  port: parseInt(process.env.DB_PORT, 10),
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20,                        // Max 20 connections per instance
  idleTimeoutMillis: 30000,       // Close idle connections after 30s
  connectionTimeoutMillis: 2000   // Fail fast if DB is down
})

app.get('/users', async (req, res) => {
  const client = await pool.connect()
  try {
    const result = await client.query('SELECT * FROM users')
    res.json(result.rows)
  } finally {
    client.release()              // Return to pool
  }
})

Pool sizing rule of thumb:

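Pool size = (core_count * 2) + effective_spindle_count, where core_count is the database server's cores
For most Node apps: pool size of 10-30 per instance
Total connections = pool size × instance count, so use PgBouncer or a similar pooler at the database layer once that total nears the database's connection limit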
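A quick worked example shows why per-instance pool size and instance count have to be considered together. The numbers below (a 16-core database server, one effective spindle for SSD storage, 4 and 10 app instances) are hypothetical, and the helper names are illustrative.

```javascript
// Rule-of-thumb connection budget for the database server.
function dbConnectionBudget(dbCoreCount, effectiveSpindleCount) {
  return dbCoreCount * 2 + effectiveSpindleCount
}

// Split that budget across app instances, never going below 1 per instance.
function perInstancePoolSize(budget, instanceCount) {
  return Math.max(1, Math.floor(budget / instanceCount))
}

// Total connections the database sees for a given pool "max" setting.
function totalConnections(poolMax, instanceCount) {
  return poolMax * instanceCount
}

const budget = dbConnectionBudget(16, 1)            // 16 * 2 + 1 = 33
const perInstance = perInstancePoolSize(budget, 4)  // floor(33 / 4) = 8
const atTenInstances = totalConnections(20, 10)     // 20 * 10 = 200
```

With `max: 20` and 10 instances, the app can open 200 connections, which is double PostgreSQL's default `max_connections` of 100. That is the point where a pooler such as PgBouncer, or a smaller per-instance pool, becomes necessary.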

Caching Layers: Stop Hammering the Database

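const redis = require('redis')

// Assumes node-redis v4+: commands return promises and the client must be connected
const client = redis.createClient({ url: process.env.REDIS_URL })
client.on('error', (err) => console.error('Redis error', err))
client.connect().catch(console.error)

// Cache middleware (cache-aside with a 60-second TTL)
async function cacheMiddleware(req, res, next) {
  const key = `cache:${req.originalUrl}`

  try {
    const cached = await client.get(key)
    if (cached) {
      return res.json(JSON.parse(cached))
    }
  } catch (err) {
    // The cache is an optimization: on Redis errors, fall through to the handler
    console.error('Cache read failed', err)
  }

  // Wrap res.json so the response body is cached on the way out
  const originalJson = res.json.bind(res)
  res.json = function (data) {
    if (res.statusCode === 200) {
      // Fire and forget: don't delay the response on the cache write
      client.set(key, JSON.stringify(data), { EX: 60 }).catch(console.error)
    }
    return originalJson(data)
  }

  next()
}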

// Apply to expensive endpoints
app.get('/api/products', cacheMiddleware, async (req, res) => {
  const products = await db.query('SELECT * FROM products')
  res.json(products.rows)
})

Cache strategies:

* Cache-aside: app checks the cache first, falls back to the database
* Write-through: write to the cache AND the database in the same operation
* Write-behind: write to the cache, then write to the database asynchronously
* TTL-based: set an expiry so stale data purges itself
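The difference between cache-aside and write-through is easiest to see side by side. The sketch below uses in-memory stand-ins: `TtlCache` is a hypothetical class playing the role of Redis, and `db` is expected to be any object with `get` and `set` (a plain `Map` works). The injectable `now` function is only there so expiry can be exercised without waiting.

```javascript
// Hypothetical in-memory cache with per-entry TTL.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now
    this.entries = new Map()
  }
  get(key) {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {   // expired: purge and report a miss
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }
  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs })
  }
}

// Cache-aside: check the cache first, fall back to the database on a miss,
// then populate the cache for the next reader.
function readCacheAside(cache, db, key, ttlMs) {
  const hit = cache.get(key)
  if (hit !== undefined) return { value: hit, source: 'cache' }
  const value = db.get(key)
  cache.set(key, value, ttlMs)
  return { value, source: 'db' }
}

// Write-through: update the database and the cache in the same operation,
// so readers do not see a stale cached value for this key.
function writeThrough(cache, db, key, value, ttlMs) {
  db.set(key, value)
  cache.set(key, value, ttlMs)
}
```

With cache-aside alone, a write to the database leaves the old value in the cache until the TTL runs out. Write-through closes that window at the cost of an extra cache write on every update.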

CDN for Static Assets: Global Distribution

flowchart LR
    Tokyo[User in Tokyo] --> EdgeT[CDN Edge Tokyo]
    EdgeT --> Cached[Serves cached asset]
    London[User in London] --> EdgeL[CDN Edge London]
    EdgeL -->|first request| Origin[Origin Server<br/>us-east-1]

# Nginx with CDN-aware caching
location /static/ {
    expires 365d;
    add_header Cache-Control "public, immutable";

    # CDN headers
    add_header CDN-Cache-Control "public, max-age=31536000";
}

Never serve static assets from your Node process. That's what Nginx and a CDN are for.
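A year-long `immutable` cache lifetime is only safe if a changed file gets a new URL. One common way to arrange that is to put a content hash in the filename. A minimal sketch using Node's built-in `crypto` and `path` modules follows; the function name `fingerprint` and the 8-character hash length are illustrative choices, and in practice a build tool usually handles this step.

```javascript
const crypto = require('node:crypto')
const path = require('node:path')

// Build a cache-busting filename of the form "<base>.<hash><ext>" from the
// file's contents. New contents produce a new name, so the old cached copy
// is simply never requested again.
function fingerprint(filename, contents) {
  const hash = crypto
    .createHash('sha256')
    .update(contents)
    .digest('hex')
    .slice(0, 8)
  const ext = path.extname(filename)          // e.g. ".js"
  const base = path.basename(filename, ext)   // e.g. "app"
  return `${base}.${hash}${ext}`
}

const v1 = fingerprint('app.js', 'console.log("v1")')
const v2 = fingerprint('app.js', 'console.log("v2")')
```

Here `v1` and `v2` differ only in the hash segment, so a deploy that changes `app.js` changes its URL and the CDN fetches the new file from origin on first request.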

Auto-Scaling Strategies

Reactive scaling (metric threshold based): Scale out when CPU > 70% for 5 minutes, scale in when CPU < 30% for 10 minutes.

# Auto Scaling policy sketch (illustrative pseudo-config, not literal AWS CLI or CloudFormation syntax)
AutoScalingGroupName: myapp-asg
MinSize: 2
MaxSize: 20
Policies:
  - PolicyName: scale-out
    ScalingAdjustment: 2           # Add 2 instances
    Cooldown: 120                  # Wait 2 min before next action
    Metric:
      Name: CPUUtilization
      Threshold: 70
      Period: 300

  - PolicyName: scale-in
    ScalingAdjustment: -1
    Cooldown: 300
    Metric:
      Name: CPUUtilization
      Threshold: 30
      Period: 600
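The decision the two policies above encode can be sketched as a pure function. This is a simplified model: it takes the CPU samples for one evaluation window and returns the desired instance count, and it leaves out cooldown handling entirely. The `policy` object and function names are illustrative, not an AWS API.

```javascript
// Thresholds mirror the policies above: add 2 instances when average CPU is
// above 70%, remove 1 when it is below 30%, clamped to MinSize/MaxSize.
const policy = { min: 2, max: 20, outAbove: 70, outStep: 2, inBelow: 30, inStep: 1 }

function average(samples) {
  return samples.reduce((sum, s) => sum + s, 0) / samples.length
}

// cpuSamples: CPU percentages covering the evaluation window
// (300 seconds for scale-out, 600 seconds for scale-in in the policies above).
function desiredCapacity(current, cpuSamples, p = policy) {
  const avg = average(cpuSamples)
  let target = current
  if (avg > p.outAbove) target = current + p.outStep
  else if (avg < p.inBelow) target = current - p.inStep
  return Math.min(p.max, Math.max(p.min, target))
}

const afterSpike = desiredCapacity(4, [80, 75, 90])   // average above 70
const afterLull = desiredCapacity(4, [20, 25, 10])    // average below 30
```

Scaling out in larger steps than scaling in, as the policies do, is a deliberate asymmetry: being briefly over-provisioned costs money, while being under-provisioned costs requests.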

Predictive scaling (ML-based): AWS forecasts traffic from historical patterns and scales proactively, before the traffic hits.

Scheduled scaling (event-based): Scale out before Black Friday, scale back in after. Best for predictable traffic patterns.

Scaling Checklist: Before You Need It
