
Clustering and Scaling - One Thread Is Never Enough


Node Single-Thread Limitation

Node runs JavaScript on a single thread. That thread handles incoming requests, runs your code, and drives the event loop. If one request takes 500ms of CPU time, every other request waits. That's not a bug - it's the architecture. Each process has exactly one event loop, and all of your JavaScript runs through it.
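To see the blocking effect described above, here is a minimal sketch. The blockFor helper is hypothetical and simply burns CPU to stand in for a slow synchronous request handler. A timer scheduled for 0 ms cannot run until the synchronous work finishes.

```javascript
// Busy-wait on the CPU for roughly `ms` milliseconds.
// Hypothetical helper that stands in for heavy synchronous work in a handler.
function blockFor(ms) {
  const end = Date.now() + ms
  while (Date.now() < end) {
    // spin
  }
}

const scheduledAt = Date.now()
let timerDelayMs = null

// Ask for a callback "as soon as possible"
setTimeout(() => {
  timerDelayMs = Date.now() - scheduledAt
  console.log(`0 ms timer actually ran after ${timerDelayMs} ms`)
}, 0)

// The timer cannot fire until this call returns control to the event loop
blockFor(200)
```

The reported delay should be a little over 200 ms. Every pending request in the same process would have waited just as long.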

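You get parallelism from multiple processes (the cluster module, which builds on child_process.fork()) or from worker threads (for CPU-heavy tasks inside one process). With cluster, incoming connections are spread across the worker processes: by default the primary process does this, and optionally it can be left to the operating system.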
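For the worker-thread side, a minimal sketch using node:worker_threads. The function name sumInWorker and the summing workload are made up for illustration. The worker source is passed inline with eval: true only to keep the example in one file; a real app would more likely point Worker at a separate file.

```javascript
const { Worker } = require('node:worker_threads')

// Run a CPU-heavy loop off the main thread and resolve with its result.
// `sumInWorker` is a hypothetical name, not part of any library.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { parentPort, workerData } = require('node:worker_threads')
      let total = 0
      for (let i = 1; i <= workerData.n; i++) total += i
      parentPort.postMessage(total)
      `,
      { eval: true, workerData: { n } }
    )
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}

// The main thread stays free to serve requests while the worker computes
sumInWorker(1000).then((total) => {
  console.log(`Sum computed in worker: ${total}`)
})
```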

cluster Module: What It Does, When to Use

The cluster module forks the main process into multiple worker processes. Each worker runs its own event loop, has its own memory space, and handles requests independently.

const cluster = require('node:cluster')
const http = require('node:http')
const os = require('node:os')

const numCPUs = os.cpus().length

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} spawning ${numCPUs} workers`)

  // Fork workers - one per CPU core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork()
  }

  // Handle worker crashes
  cluster.on('exit', (worker, code, signal) => {
    if (code !== 0) {
      console.log(`Worker ${worker.process.pid} died (code: ${code}). Restarting...`)
      cluster.fork()
    }
  })
} else {
  // Workers share the same port
  http.createServer((req, res) => {
    res.writeHead(200)
    res.end(`Handled by worker ${process.pid}\n`)
  }).listen(8000)

  console.log(`Worker ${process.pid} started`)
}

When to cluster: CPU-bound operations, high-traffic APIs, any time one CPU core can't handle the load.

When NOT to cluster: if your app is I/O-bound (database queries, API calls) and the event loop is not saturated, a single process might be fine. Measure first, cluster second.
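One way to "measure first" is event loop utilization from node:perf_hooks. This is a sketch, not a rule: the 0.7 threshold and the helper names are assumptions, and the right cutoff depends on your workload.

```javascript
const { performance } = require('node:perf_hooks')

// Hypothetical cutoff: above this share of busy time, treat the loop as saturated
const BUSY_THRESHOLD = 0.7

function isSaturated(utilization, threshold = BUSY_THRESHOLD) {
  return utilization >= threshold
}

// Sample how busy the event loop was over a window of `windowMs` milliseconds
function sampleUtilization(windowMs) {
  const start = performance.eventLoopUtilization()
  return new Promise((resolve) => {
    setTimeout(() => {
      // Passing the earlier reading returns the delta since that reading
      const { utilization } = performance.eventLoopUtilization(start)
      resolve(utilization)
    }, windowMs)
  })
}

sampleUtilization(100).then((u) => {
  console.log(`Event loop utilization: ${u.toFixed(2)}, saturated: ${isSaturated(u)}`)
})
```

Run this inside the real server under realistic load. An idle script like this one will report a value near zero.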

Load Balancing: Round-Robin vs Custom

By default, the primary process listens on the port and distributes connections to workers using round-robin. This is the default on every platform except Windows, where distribution is left to the operating system.

Round-robin - each new connection goes to the next worker in sequence. Simple and fair, but it doesn't account for worker load.
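The selection rule itself is tiny. The sketch below is not Node's internal implementation, just an illustration of round-robin picking with made-up worker names.

```javascript
// Return a function that hands out workers in strict rotation
function createRoundRobin(workers) {
  let next = 0
  return function pick() {
    const worker = workers[next]
    next = (next + 1) % workers.length // wrap around after the last worker
    return worker
  }
}

const pick = createRoundRobin(['worker-1', 'worker-2', 'worker-3'])
const assigned = [1, 2, 3, 4, 5].map(() => pick())
console.log(assigned)
// [ 'worker-1', 'worker-2', 'worker-3', 'worker-1', 'worker-2' ]
```

Note that the rotation ignores how busy each worker is. A worker stuck on a slow request still gets its turn.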

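Scheduling policy - the cluster module exposes two policies: cluster.SCHED_RR (round-robin) and cluster.SCHED_NONE (leave distribution to the operating system). It has no hook for a custom scheduler, so anything smarter means doing the balancing yourself. Set the policy before the first fork: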

const cluster = require('node:cluster')

// Force round-robin on Windows (where it's not the default).
// This only takes effect if set before the first cluster.fork()
cluster.schedulingPolicy = cluster.SCHED_RR

// Or set via env variable:
// export NODE_CLUSTER_SCHED_POLICY=rr

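The problem with sticky sessions: a WebSocket is a single long-lived TCP connection, so once established it stays on one worker. Trouble starts when a client opens several separate connections that must reach the same worker, such as Socket.IO's HTTP long-polling handshake or requests that rely on in-memory session state. Round-robin scatters those across workers, so you need a custom approach.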

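// Sticky session balancing (simplified)
const cluster = require('node:cluster')
const http = require('node:http')
const net = require('node:net')
const os = require('node:os')

// Small numeric string hash, so the result works with % (stands in for a hash library)
function hashIP(ip) {
  let h = 0
  for (let i = 0; i < ip.length; i++) {
    h = (h * 31 + ip.charCodeAt(i)) >>> 0
  }
  return h
}

if (cluster.isPrimary) {
  const workers = []

  for (let i = 0; i < os.cpus().length; i++) {
    workers.push(cluster.fork())
  }

  // Custom TCP balancer with sticky sessions.
  // pauseOnConnect keeps the socket paused so no data is read before the handoff
  const server = net.createServer({ pauseOnConnect: true }, (socket) => {
    // Hash the client IP so the same client always lands on the same worker
    const clientIP = socket.remoteAddress || ''
    const workerIndex = hashIP(clientIP) % workers.length
    workers[workerIndex].send('sticky:connection', socket)
  })

  server.listen(8000)
} else {
  // Worker: never listens on the port itself, only accepts sockets from the primary
  const server = http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}\n`)
  })

  process.on('message', (msg, socket) => {
    if (msg !== 'sticky:connection' || !socket) return
    server.emit('connection', socket) // hand the socket to the HTTP server
    socket.resume() // start reading now that a handler is attached
  })
}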

Honestly though? Just use nginx for sticky sessions. It's more battle-tested than custom cluster routing

process.env.NODE_APP_INSTANCE

Node's cluster module does not set NODE_APP_INSTANCE itself. PM2 sets it for each process it starts in cluster mode (index starting from 0). With plain cluster.fork() you pass it yourself through the env argument:

const cluster = require('node:cluster')

if (cluster.isPrimary) {
  for (let i = 0; i < 4; i++) {
    const worker = cluster.fork({ NODE_APP_INSTANCE: i })

    worker.on('message', (msg) => {
      console.log(`Message from worker ${i}:`, msg)
    })
  }
} else {
  const instanceId = process.env.NODE_APP_INSTANCE

  // Each worker handles different data
  const shard = Number.parseInt(instanceId, 10) + 1

  // Worker 0 handles users 1-25, worker 1 handles 26-50, etc.
  const userRange = {
    start: shard * 25 - 24,
    end: shard * 25,
  }

  process.send({ pid: process.pid, instanceId, userRange })

  const app = require('express')()
  app.get('/users', (req, res) => {
    res.json({ shard: instanceId, range: userRange })
  })
  app.listen(8000)
}

Use NODE_APP_INSTANCE for sharding data, assigning unique per-worker resources, or coordinating timed jobs among workers.
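For the timed-jobs case, a common pattern is to let only instance 0 run the schedule so the job doesn't fire once per worker. A minimal sketch follows. The helper names are hypothetical, and treating a missing NODE_APP_INSTANCE as instance 0 (single-process mode) is an assumption.

```javascript
// True for the instance that should own scheduled jobs.
// Assumption: an unset NODE_APP_INSTANCE means a single process, which acts as leader.
function isLeaderInstance(env = process.env) {
  return Number.parseInt(env.NODE_APP_INSTANCE ?? '0', 10) === 0
}

// Start a repeating job only on the leader; other instances get null back
function scheduleOnLeader(job, intervalMs, env = process.env) {
  if (!isLeaderInstance(env)) return null
  return setInterval(job, intervalMs)
}

// Usage (hypothetical job):
// scheduleOnLeader(() => console.log('nightly cleanup'), 24 * 60 * 60 * 1000)
```

If instance 0 crashes, nothing runs the job until it is restarted. For jobs that must not be missed, a shared lock or a dedicated scheduler process is a sturdier design.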

PM2 for Process Management in Production

PM2 is a widely used production process manager for Node. It handles clustering, restarts, logging, and monitoring.

npm install -g pm2

# Start one instance per CPU core
pm2 start app.js -i max

# Start with specific instance count
pm2 start app.js -i 4

# Name the process
pm2 start app.js -i max --name "my-api"

PM2 ecosystem.config.js:

module.exports = {
  apps: [{
    name: 'api-server',
    script: './src/index.js',
    instances: 'max',
    exec_mode: 'cluster',
    env: {
      NODE_ENV: 'production',
      PORT: 3000,
    },
    env_development: {
      NODE_ENV: 'development',
    },
    max_memory_restart: '500M',
    log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
    error_file: './logs/error.log',
    out_file: './logs/output.log',
    merge_logs: true,
    autorestart: true,
    watch: false,
    max_restarts: 10,
    restart_delay: 1000,
  }],
}

# Start with config
pm2 start ecosystem.config.js

# Save process list for auto-restart on reboot
pm2 save
pm2 startup

# Monitoring
pm2 monit
pm2 status
pm2 logs
pm2 show api-server

# Graceful reload (zero downtime)
pm2 reload all

# Scale up/down without restart
pm2 scale api-server 8

PM2 handles worker crashes and memory-based restarts, and log rotation is available through the pm2-logrotate module. Don't write your own clustering script for production - use PM2 or a container orchestration platform like Kubernetes.
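A zero-downtime reload only works if the app shuts down cleanly when asked. PM2 signals the process with SIGINT by default and force-kills it after a timeout (the kill_timeout setting), so the app should stop accepting connections and exit on that signal. A minimal sketch, where createShutdownHandler and the 5000 ms default are assumptions:

```javascript
// Build a handler that closes the server, then exits.
// `exit` is injectable so the logic can be exercised without ending the process.
function createShutdownHandler(server, { timeoutMs = 5000, exit = process.exit } = {}) {
  let shuttingDown = false

  return function shutdown() {
    if (shuttingDown) return // ignore repeated signals
    shuttingDown = true

    // Give up and exit non-zero if open connections never drain
    const timer = setTimeout(() => exit(1), timeoutMs)
    timer.unref()

    // Stop accepting new connections; the callback runs once existing ones finish
    server.close(() => {
      clearTimeout(timer)
      exit(0)
    })
  }
}

// Usage with an HTTP server:
// const server = http.createServer(handler).listen(3000)
// process.on('SIGINT', createShutdownHandler(server))
```

Keep timeoutMs below PM2's kill timeout, otherwise PM2 kills the process before the handler's own fallback runs.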

OS Process Distribution: nginx + Multiple Node Processes

In production you'll often run nginx in front of multiple Node processes (or containers). nginx handles TLS termination, static file serving, and load balancing, while each Node process focuses on application logic.

# /etc/nginx/sites-available/api
upstream node_backend {
    # Sticky sessions by client IP (for Socket.IO-style handshakes).
    # Remove this line to fall back to the default round-robin
    ip_hash;

    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
    server 127.0.0.1:3004;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /etc/ssl/certs/example.crt;
    ssl_certificate_key /etc/ssl/private/example.key;

    # Proxy API requests to Node
    location /api/ {
        proxy_pass http://node_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # WebSocket support
    location /ws/ {
        proxy_pass http://node_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 86400;
    }

    # Serve static files directly (zero Node involvement)
    location /static/ {
        root /var/www/myapp;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Rate limiting (the "login" zone must be declared with limit_req_zone in the http block)
    location /api/auth/ {
        limit_req zone=login burst=5 nodelay;
        proxy_pass http://node_backend;
    }
}

# Run each Node process separately
node src/index.js --port=3001 &
node src/index.js --port=3002 &
node src/index.js --port=3003 &
node src/index.js --port=3004 &

Or with PM2 for each process:

pm2 start src/index.js --name "node-1" -- --port=3001
pm2 start src/index.js --name "node-2" -- --port=3002
pm2 start src/index.js --name "node-3" -- --port=3003
pm2 start src/index.js --name "node-4" -- --port=3004
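The commands above pass --port=NNNN to src/index.js, so the app has to read that flag itself. Node does not do this automatically. A minimal sketch of the parsing, where parsePort and the 3000 fallback are assumptions:

```javascript
// Read --port=NNNN from the command line, falling back to a default.
function parsePort(argv = process.argv.slice(2), fallback = 3000) {
  const flag = argv.find((arg) => arg.startsWith('--port='))
  if (!flag) return fallback

  const port = Number.parseInt(flag.slice('--port='.length), 10)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid --port value: ${flag}`)
  }
  return port
}

// Usage in src/index.js:
// const port = parsePort()
// http.createServer(handler).listen(port, '127.0.0.1')
```

Binding to 127.0.0.1 keeps the Node ports reachable only through nginx on the same host.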

Security: Privileged Port Binding, Root, and CAP_NET_BIND_SERVICE

On Linux, by default only root can bind to ports below 1024 (privileged ports). If you want your Node process on port 80 or 443 without root, you need the CAP_NET_BIND_SERVICE capability.

# Give node the ability to bind to privileged ports.
# Note: this applies to every script run with this node binary
sudo setcap cap_net_bind_service=+ep /usr/bin/node

# Verify
getcap /usr/bin/node
# /usr/bin/node = cap_net_bind_service+ep

Alternative - run nginx on 80/443 and proxy to Node on a high port (3000+):

This is safer because the Node process never needs root or extra capabilities, and nginx exposes a smaller attack surface than your Node app. If someone exploits a vulnerability in your Express routes, they land in an unprivileged process instead of one with root access.

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:3000;
    }
}

Security hardening for clustered Node apps:

  • Never run clusters as root (drop privileges after binding)
  • Use process.setgid() and process.setuid() after binding ports below 1024
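  • Cap per-worker resources: PM2's max_memory_restart restarts a worker that grows past a memory limit, and OS-level limits (ulimit -n, systemd LimitNOFILE, container memory limits) cap open files and memory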
  • Restrict inter-process communication - workers shouldn't share secrets
  • Implement health checks that kill unresponsive workers
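The privilege-dropping bullets can be sketched as below. The dropPrivileges helper is hypothetical, and 'nodeapp' is a made-up account name. The order matters: change groups first, because after setuid() the process no longer has permission to change them.

```javascript
// Drop root after the privileged port is bound.
// `proc` is injectable so the call order can be checked without being root.
function dropPrivileges({ user, group }, proc = process) {
  // getuid is not available on Windows; nothing to do unless running as root
  if (typeof proc.getuid !== 'function' || proc.getuid() !== 0) return false

  // Clear supplementary groups, then the primary group, then the user
  if (typeof proc.setgroups === 'function') proc.setgroups([group])
  proc.setgid(group)
  proc.setuid(user)
  return true
}

// Usage (hypothetical unprivileged account):
// server.listen(80, () => dropPrivileges({ user: 'nodeapp', group: 'nodeapp' }))
```

Call it from the listen callback so the port is already bound by the time root is given up.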

prerequisites

perf_01_profiling.md - profiling , flamegraphs , memory analysis

next -> perf_03_load_testing.md