Scaling Backend Apps
The worst time to learn about scaling is when your app is on fire because some product launch brought in 10x traffic and your single Express process is crying in the corner. Scale isn't a feature you add; it's a constraint you design for from day one. Your architecture either scales or it doesn't, and retrofitting is expensive as hell.
Horizontal vs Vertical Scaling
Vertical Scaling (scale up): Add more power to the same machine (more RAM, faster CPU, bigger SSD)
- Pros: No code changes, simple, works for stateful apps
- Cons: Hard limit (the biggest cloud instance available), downtime during the upgrade, single point of failure, cost tends to grow faster than capacity past a point
- When to use: Legacy apps you can't refactor, stateful services you haven't decoupled yet, databases (for now)
Horizontal Scaling (scale out): Add more machines behind a load balancer
- Pros: Near-limitless scale, failover (one dies, others keep going), cost grows roughly linearly, can use cheaper instances
- Cons: Requires stateless app design, adds network complexity, distributed systems problems (eventual consistency, coordination)
- When to use: Web servers, APIs, stateless microservices, and anything else that doesn't store local state
# Vertical: bigger box
# From t3.medium (2 vCPU, 4 GiB) → t3.xlarge (4 vCPU, 16 GiB)
# Horizontal: more boxes
# From 1 instance → 10 instances behind a load balancer
Stateless App Design: The Foundation of Scaling
A stateless app doesn't keep session data, cache entries, or any other per-user state on the local filesystem or in process memory.
const session = require('express-session')
// Bad: stateful (session stored in this process's memory)
app.use(session({
  secret: 'keyboard cat',
  resave: false,
  saveUninitialized: true
}))
// Session dies if this instance goes down
// Next request might hit a different instance, so the user gets logged out
// Good: stateless (session stored externally)
const RedisStore = require('connect-redis').default // connect-redis v7 export style
const { createClient } = require('redis')
const redisClient = createClient({ url: process.env.REDIS_URL })
redisClient.connect().catch(console.error) // node-redis v4+ clients must connect explicitly
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: 'keyboard cat',
  resave: false,
  saveUninitialized: true
}))
// Session survives instance restarts
// Any instance can serve any request
What NOT to store locally:
- User sessions: use Redis or another external session store
- File uploads: use S3 / CDN
- Cached data: use Redis / Memcached
- Logs: stream to centralized logging (stdout → collector)
What's OK to store locally:
- Application code: the actual JS files
- Static configuration: loaded at startup, doesn't change per-request
- Connection pools: database connections are OK since they point to external databases
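To make the difference concrete, here is a minimal simulation in plain Node with no dependencies. `createInstance` and `sharedStore` are hypothetical names invented for this sketch, and a plain `Map` stands in for Redis; it only illustrates why per-process memory breaks once requests can land on different instances.

```javascript
// Hypothetical simulation: each "instance" is one app process behind a load balancer.
// A plain Map stands in for Redis here; this is not a real session store.
function createInstance(name, externalStore) {
  const localSessions = new Map() // lives and dies with this process
  const store = externalStore || localSessions
  return {
    name,
    login(sessionId, user) { store.set(sessionId, user) },
    whoAmI(sessionId) { return store.get(sessionId) || null }
  }
}

// Stateful: every instance keeps its own sessions in memory.
const a = createInstance('a')
const b = createInstance('b')
a.login('sid-1', 'alice')
const statefulResult = b.whoAmI('sid-1') // the next request landed on the other instance

// Stateless: both instances read and write the same external store.
const sharedStore = new Map()
const c = createInstance('c', sharedStore)
const d = createInstance('d', sharedStore)
c.login('sid-1', 'alice')
const statelessResult = d.whoAmI('sid-1')

console.log({ statefulResult, statelessResult })
// → { statefulResult: null, statelessResult: 'alice' }
```

In the stateful case the user is effectively logged out as soon as the load balancer picks a different instance; with the shared store, any instance can answer.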
Load Balancers: Distributing the Pain
flowchart TD
    Client[DNS / Client]
    LB[Load Balancer<br/>nginx / ALB]
    I1[Instance 1<br/>Node]
    I2[Instance 2<br/>Node]
    I3[Instance 3<br/>Node]
    Client --> LB
    LB --> I1
    LB --> I2
    LB --> I3
Nginx as Load Balancer:
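upstream backend {
    zone backend 64k;                  # Shared memory zone, required for active health checks
    least_conn;                        # Send to the instance with the fewest active connections
    server backend1:3000 weight=3;     # Can handle more traffic
    server backend2:3000;
    server backend3:3000 backup;       # Only used if the others are down
}
server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 30s;
    }
    location /health {
        proxy_pass http://backend;
        # Active checks: health_check is an NGINX Plus directive.
        # Open-source nginx only has passive checks (max_fails / fail_timeout on each server line).
        health_check interval=10s fails=3 passes=2;
    }
}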
Load balancing algorithms:
- Round Robin: distributes evenly regardless of load
- Least Connections: sends to the least busy instance
- IP Hash: same client always hits the same instance (session affinity, avoid this)
- Weighted: preferred instances get more traffic
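The four algorithms above can be sketched as small picker functions. This is an illustration only: the function names and the `{ name, active, weight }` server shape are assumptions made for this example, and the weighted picker is the naive "repeat by weight" version rather than the smoother interleaving a real load balancer such as nginx uses.

```javascript
// Hypothetical in-memory pickers; a real load balancer tracks this state itself.
function roundRobin(servers) {
  let next = 0
  return () => {
    const server = servers[next]
    next = (next + 1) % servers.length
    return server
  }
}

function leastConnections(servers) {
  // Each server is { name, active }, where active is its open connection count.
  return () => servers.reduce((best, s) => (s.active < best.active ? s : best))
}

function ipHash(servers) {
  // Simple deterministic string hash: the same IP always maps to the same server.
  return (ip) => {
    let hash = 0
    for (const ch of ip) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0
    return servers[hash % servers.length]
  }
}

function weighted(servers) {
  // Repeat each server `weight` times, then round-robin over the expanded list.
  const expanded = servers.flatMap((s) => Array(s.weight || 1).fill(s))
  return roundRobin(expanded)
}

const servers = [
  { name: 'backend1', active: 4, weight: 3 },
  { name: 'backend2', active: 1, weight: 1 },
  { name: 'backend3', active: 7, weight: 1 }
]

const rr = roundRobin(servers)
const rrPicks = [rr(), rr(), rr(), rr()].map((s) => s.name)
// → ['backend1', 'backend2', 'backend3', 'backend1']

const lcPick = leastConnections(servers)().name
// → 'backend2'

const wr = weighted(servers)
const weightedPicks = Array.from({ length: 5 }, () => wr().name)
// → ['backend1', 'backend1', 'backend1', 'backend2', 'backend3']
```

Note how `ipHash` is the only picker that depends on who the client is, which is exactly what makes it a form of session affinity.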
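Database Connection Pooling: Your Database Will Thank You
Every new connection costs memory and CPU on both the app and the database (TCP setup, authentication, and in PostgreSQL a backend process per connection). Connection pools reuse open connections so your app doesn't melt.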
const { Pool } = require('pg')
// Bad: new pool per request
app.get('/users', async (req, res) => {
  const pool = new Pool() // New pool (and new connections) on every request, never closed
  const result = await pool.query('SELECT * FROM users')
  res.json(result.rows)
})
// Good: one pool, created once at startup and reused
const pool = new Pool({
  host: process.env.DB_HOST,
  port: parseInt(process.env.DB_PORT, 10),
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20,                       // Max 20 connections per instance
  idleTimeoutMillis: 30000,      // Close idle connections after 30s
  connectionTimeoutMillis: 2000  // Fail fast if DB is down
})
app.get('/users', async (req, res) => {
  const client = await pool.connect()
  try {
    const result = await client.query('SELECT * FROM users')
    res.json(result.rows)
  } finally {
    client.release() // Return to pool
  }
})
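Pool sizing rule of thumb:
Pool size = (core_count * 2) + effective_spindle_count, where both counts describe the database server, not the app host
For most Node apps: pool size of 10-30 per instance
Pools multiply: total connections = instances × pool size, and that total has to stay below the database's connection limit (max_connections in PostgreSQL)
Use PgBouncer or a similar pooler at the database layer when the total gets too high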
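The arithmetic behind the rule of thumb is worth running once with real numbers. The sketch below uses hypothetical figures (an 8-core database server, an assumed effective spindle count of 1, and 10 app instances with `max: 20` each) and compares the total against PostgreSQL's default `max_connections` of 100. The function names are made up for this example.

```javascript
// core_count and effective_spindle_count describe the database server, not the app host.
function suggestedPoolSize(dbCoreCount, effectiveSpindleCount) {
  return dbCoreCount * 2 + effectiveSpindleCount
}

// Worst case: every app instance fills its pool at the same time.
function totalConnections(instances, poolSizePerInstance) {
  return instances * poolSizePerInstance
}

// Hypothetical numbers for illustration only.
const suggested = suggestedPoolSize(8, 1) // 8 * 2 + 1 = 17
const total = totalConnections(10, 20)    // 10 * 20 = 200
const maxConnections = 100                // PostgreSQL default
const overLimit = total > maxConnections

console.log({ suggested, total, overLimit })
// → { suggested: 17, total: 200, overLimit: true }
```

With these numbers the app tier can ask for twice as many connections as the database accepts, which is the situation where a shared pooler in front of the database starts to pay off.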
Caching Layers: Stop Hammering the Database
const { createClient } = require('redis')
const client = createClient({ url: process.env.REDIS_URL })
client.on('error', (err) => console.error('Redis error', err))
client.connect().catch(console.error) // node-redis v4+: connect once, commands return promises
// Cache middleware
async function cacheMiddleware(req, res, next) {
  const key = `cache:${req.originalUrl}`
  try {
    const cached = await client.get(key)
    if (cached) {
      return res.json(JSON.parse(cached))
    }
  } catch (err) {
    // The cache is an optimization: on a Redis failure, fall through to the handler
  }
  // Wrap res.json so the response body is cached on its way out
  const originalJson = res.json.bind(res)
  res.json = function (data) {
    if (res.statusCode === 200) {
      // Cache successful responses for 60 seconds (fire and forget)
      client.setEx(key, 60, JSON.stringify(data)).catch(() => {})
    }
    return originalJson(data)
  }
  next()
}
// Apply to expensive endpoints
app.get('/api/products', cacheMiddleware, async (req, res) => {
  const products = await db.query('SELECT * FROM products')
  res.json(products.rows)
})
Cache strategies:
- Cache-aside: app checks cache first, falls back to database
- Write-through: write to cache AND database simultaneously
- Write-behind: write to cache, async write to database
- TTL-based: set expiry, stale data auto-purges
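Cache-aside combined with a TTL is the pattern most apps start with, so here is a dependency-free sketch of it. `createTtlCache` and `getOrLoad` are hypothetical helpers written for this example; the `Map` stands in for Redis or Memcached, and the injectable `now` function exists only so expiry can be exercised without waiting.

```javascript
// Minimal in-memory TTL cache; the Map stands in for Redis or Memcached.
function createTtlCache(now = Date.now) {
  const entries = new Map()
  return {
    get(key) {
      const entry = entries.get(key)
      if (!entry) return undefined
      if (now() >= entry.expiresAt) {
        entries.delete(key) // expired: treat it as a miss
        return undefined
      }
      return entry.value
    },
    set(key, value, ttlMs) {
      entries.set(key, { value, expiresAt: now() + ttlMs })
    }
  }
}

// Cache-aside: check the cache, fall back to the loader, then populate the cache.
async function getOrLoad(cache, key, ttlMs, loader) {
  const cached = cache.get(key)
  if (cached !== undefined) return cached
  const fresh = await loader() // e.g. the database query
  cache.set(key, fresh, ttlMs)
  return fresh
}
```

A call such as `getOrLoad(cache, 'products', 60000, () => db.query(...))` would hit the database at most once per minute per key, assuming `db` is whatever query client the app already has.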
CDN for Static Assets: Global Distribution
flowchart LR
    Tokyo[User in Tokyo] --> EdgeT[CDN Edge Tokyo]
    EdgeT --> Cached[Serves cached asset]
    London[User in London] --> EdgeL[CDN Edge London]
    EdgeL -->|first request| Origin[Origin Server<br/>us-east-1]
# Nginx with CDN-aware caching
location /static/ {
    expires 365d;
    add_header Cache-Control "public, immutable";
    # CDN headers
    add_header CDN-Cache-Control "public, max-age=31536000";
}
Never serve static assets from your Node process — that's what Nginx + CDN are for
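A one-year `immutable` cache lifetime is only safe if a file's URL changes whenever its contents change, which is usually done by putting a content hash in the file name. The sketch below shows one way to derive such a name with Node's built-in `crypto` and `path` modules; `fingerprint` is a hypothetical helper, and in practice a bundler normally does this step for you.

```javascript
const crypto = require('crypto')
const path = require('path')

// Build a content-hashed file name, e.g. app.js -> app.<8 hex chars>.js
function fingerprint(fileName, contents) {
  const hash = crypto.createHash('sha256').update(contents).digest('hex').slice(0, 8)
  const { name, ext } = path.parse(fileName)
  return `${name}.${hash}${ext}`
}

// Different contents give different names, so a deploy never reuses a cached URL.
const v1 = fingerprint('app.js', 'console.log("v1")')
const v2 = fingerprint('app.js', 'console.log("v2")')
console.log(v1 !== v2) // → true
```

Because the name is derived from the contents, unchanged files keep their URL across deploys and stay cached at the edge.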
Auto-Scaling Strategies
Reactive scaling (metric threshold based): Scale up when CPU > 70% for 5 minutes, scale down when CPU < 30% for 10 minutes
# Auto Scaling policy sketch (simplified; not literal AWS CLI or CloudFormation syntax)
AutoScalingGroupName: myapp-asg
MinSize: 2
MaxSize: 20
Policies:
  - PolicyName: scale-out
    ScalingAdjustment: 2     # Add 2 instances
    Cooldown: 120            # Wait 2 min before next action
    Metric:
      Name: CPUUtilization
      Threshold: 70          # Percent
      Period: 300            # Seconds (5 min)
  - PolicyName: scale-in
    ScalingAdjustment: -1    # Remove 1 instance
    Cooldown: 300            # Wait 5 min before next action
    Metric:
      Name: CPUUtilization
      Threshold: 30          # Percent
      Period: 600            # Seconds (10 min)
Predictive scaling (ML-based): AWS forecasts traffic from historical patterns and scales out proactively, before the traffic hits
Scheduled scaling (event-based): Scale up before Black Friday, scale down after; suits predictable traffic patterns
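The reactive policy described above boils down to a small decision function: compare the averaged metric against two thresholds, respect the cooldown, and clamp to the group's min and max size. The sketch below mirrors the numbers from the config in this section; the `policy` object shape and `desiredCapacity` are invented for illustration and are not an AWS API.

```javascript
// Hypothetical policy shape mirroring the thresholds above; not an AWS API.
const policy = {
  minSize: 2,
  maxSize: 20,
  scaleOut: { threshold: 70, adjustment: 2, cooldownSec: 120 },
  scaleIn: { threshold: 30, adjustment: -1, cooldownSec: 300 }
}

// avgCpu: average CPU % over the policy's evaluation period.
// secondsSinceLastAction: time elapsed since the last scaling action.
function desiredCapacity(current, avgCpu, secondsSinceLastAction, p = policy) {
  const clamp = (n) => Math.min(p.maxSize, Math.max(p.minSize, n))
  if (avgCpu > p.scaleOut.threshold && secondsSinceLastAction >= p.scaleOut.cooldownSec) {
    return clamp(current + p.scaleOut.adjustment)
  }
  if (avgCpu < p.scaleIn.threshold && secondsSinceLastAction >= p.scaleIn.cooldownSec) {
    return clamp(current + p.scaleIn.adjustment)
  }
  return current // between the thresholds, or still cooling down
}

console.log(desiredCapacity(4, 85, 600))  // → 6  (scale out by 2)
console.log(desiredCapacity(4, 85, 60))   // → 4  (cooldown not over yet)
console.log(desiredCapacity(4, 20, 600))  // → 3  (scale in by 1)
console.log(desiredCapacity(2, 20, 600))  // → 2  (already at minSize)
console.log(desiredCapacity(19, 90, 600)) // → 20 (clamped to maxSize)
```

Scaling out faster than scaling in (add 2, remove 1, with a longer scale-in cooldown) is a deliberate asymmetry: being briefly over-provisioned is cheaper than dropping requests.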
Scaling Checklist: Before You Need It
- App is stateless (sessions, cache, files stored externally)
- Database has connection pooling configured
- Static assets served via CDN
- Background jobs handled by message queue (not in-process)
- Health checks implemented for load balancer
- Graceful shutdown handles in-flight requests
- Rate limiting implemented before traffic spikes
- Caching layer in front of expensive queries
- Database indexes optimized for query patterns
- You've load-tested with realistic traffic patterns
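Of the checklist items, graceful shutdown is the one most often skipped, so here is a minimal sketch. It assumes `server` has the `close(callback)` shape of Node's `http.Server` (which is what Express's `app.listen()` returns); `createShutdown` and its `timeoutMs` and `exit` options are hypothetical names chosen for this example, with `exit` injectable so the logic can be tested without killing the process.

```javascript
// Sketch of a SIGTERM handler. `server` is anything shaped like Node's http.Server.
function createShutdown(server, { timeoutMs = 10000, exit = process.exit } = {}) {
  let shuttingDown = false
  return function shutdown() {
    if (shuttingDown) return // ignore repeated signals
    shuttingDown = true
    // Force-exit if in-flight requests have not drained in time.
    const timer = setTimeout(() => exit(1), timeoutMs)
    timer.unref() // don't let the timer itself keep the process alive
    // close() stops accepting new connections and calls back once existing ones end.
    server.close((err) => {
      clearTimeout(timer)
      exit(err ? 1 : 0)
    })
  }
}

// Usage with a real server (commented out so the sketch has no side effects):
// const server = app.listen(3000)
// process.on('SIGTERM', createShutdown(server))
```

Orchestrators and auto-scaling groups typically send SIGTERM before removing an instance, so handling it is what keeps scale-in events from dropping requests.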