Load Testing - Break It Before Production Does
Table of Contents
- Load Testing Tools: autocannon, k6, wrk, ab
- Metrics: RPS, Latency (p50/p95/p99), Error Rate
- Identifying Bottlenecks: CPU, I/O, Memory, Network
- Setting Up autocannon Test Scripts
- Interpreting Results
- Security: DoS Patterns, Rate Limiting Implications
Load Testing Tools: autocannon, k6, wrk, ab
autocannon - Node-native, scriptable, and well suited to HTTP benchmarking. Use it when you want to script complex scenarios in JavaScript and run them in the same ecosystem as your app
npm install -g autocannon
k6 - the professional choice. Scriptable in JavaScript, with excellent reporting, CI/CD integration, and support for large-scale distributed testing. Grafana Labs maintains it, and it is widely used for serious load testing
wrk - a fast C-based HTTP benchmarking tool. Minimal features but very high request throughput from a single machine. Good for raw throughput testing without the overhead of scripting
ab (Apache Bench) - the granddaddy of load testing. Pre-installed on many systems, with basic functionality. Use it for a quick smell test, not for serious benchmarking
# Quick smell test with ab
ab -n 1000 -c 100 http://localhost:3000/api/users
Metrics: RPS, Latency (p50/p95/p99), Error Rate
Requests Per Second (RPS) - how many requests your server handles per second. Higher is better, but the number alone is misleading without latency context
Latency percentiles:
- p50 (median) - the halfway point. Half of all requests are at least this slow
- p95 - 95% of requests are faster than this. The number your boss cares about
- p99 - 99% of requests are faster than this. The number that keeps you up at night
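To make the percentile definitions concrete, here is a minimal sketch that computes p50, p95, and p99 from raw latency samples using the nearest-rank method. The function names and sample values are illustrative assumptions; real load testing tools usually work from histograms and may round or interpolate differently.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. Expects an ascending array.
function percentile(sortedSamples, p) {
  if (sortedSamples.length === 0) throw new RangeError('no samples')
  const rank = Math.ceil((p * sortedSamples.length) / 100)
  return sortedSamples[Math.max(rank, 1) - 1]
}

// Summarize a list of latencies (in ms) into the three percentiles above.
function summarizeLatencies(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b)
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  }
}
```

With only ten samples, p95 and p99 both land on the slowest request, which is why short test runs give unreliable tail numbers.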
# autocannon output example
| Stat | 2.5% | 50% | 97.5% | 99% | Avg | Stdev | Max (ms) |
|---------|------|------|-------|-------|--------|--------|----------|
| Latency | 12ms | 28ms | 145ms | 210ms | 35.2ms | 28.1ms | 512ms |
| Stat | 1% | 2.5% | 50% | 97.5% | Avg | Stdev | Min |
|---------|------|------|-----|-------|-------|-------|-----|
| Req/Sec | 125 | 150 | 180 | 210 | 178.5 | 22.3 | 120 |
Requests: 10000 total, 10000 success, 0 errors
Duration: 56 seconds
Error rate - percentage of requests that returned non-2xx/3xx status codes or timed out. autocannon reports these separately as non2xx, errors, and timeouts. Any errors at all under normal load mean something is wrong
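As a rough sketch of how the error rate can be derived from raw counters, the function below divides failed requests by attempted requests. The field names (`completed`, `non2xx`, `connectionErrors`) are assumptions made for this example and are not the exact shape of any tool's result object; map your tool's counters onto them as appropriate.

```javascript
// Error rate as a percentage of attempted requests.
// completed:        responses received (including non-2xx ones)
// non2xx:           responses with a failing status code
// connectionErrors: requests that never got a response (resets, timeouts)
function errorRatePercent({ completed, non2xx, connectionErrors }) {
  const attempts = completed + connectionErrors
  if (attempts === 0) return 0
  return ((non2xx + connectionErrors) * 100) / attempts
}
```

For example, 9900 completed responses with 50 failing status codes plus 100 connection errors gives 150 failures out of 10000 attempts, or 1.5%.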
The relationship between concurrency and latency:
- Low concurrency (10-50 connections): tests baseline performance
- Medium concurrency (100-500): reveals queuing and contention
- High concurrency (1000+): finds bottlenecks in connection handling, thread pools, database pools
Identifying Bottlenecks: CPU, I/O, Memory, Network
CPU bottleneck - the process is compute-bound. CPU sits near 100% on one core, and flamegraphs show a few wide frames dominating the profile. The fix is algorithmic optimization or offloading to worker threads
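One way to reason about the link between concurrency, latency, and throughput is Little's Law (concurrency = throughput x latency). The sketch below assumes each connection has exactly one request in flight (no pipelining) and that the system is in a steady state; the function names are illustrative.

```javascript
// Throughput implied by N busy connections at a given average latency.
function expectedRps(connections, avgLatencyMs) {
  if (avgLatencyMs <= 0) throw new RangeError('avgLatencyMs must be positive')
  return (connections * 1000) / avgLatencyMs
}

// Average latency implied by N busy connections at a measured throughput.
function impliedLatencyMs(connections, rps) {
  if (rps <= 0) throw new RangeError('rps must be positive')
  return (connections * 1000) / rps
}
```

If 100 connections each see 50 ms latency, the server is handling about 2000 requests per second. Conversely, if throughput stays flat while connections rise, latency must grow in proportion, which is the queuing effect described above.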
# Watch CPU during load test
top -p $(pgrep -d',' -f "node")
# or
htop
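I/O bottleneck - the process spends most of its time waiting (database queries, file reads, API calls). With asynchronous I/O, response latency climbs while CPU stays low. With synchronous I/O, the event loop itself stalls, so event loop lag climbs too
const fs = require('fs')
// Blocking I/O in a request handler stalls every other request:
const data = fs.readFileSync('/large/file.json') // BLOCKS THE EVENT LOOP
// Non-blocking alternative (inside an async function):
// const data = await fs.promises.readFile('/large/file.json')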
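Event loop lag can be measured from inside the process with Node's built-in `perf_hooks` module. The following is a minimal sketch, not a production monitor: `sampleEventLoopDelay` and `blockFor` are hypothetical helper names, and the reported values include timer granularity, so treat them as relative indicators.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks')

// Sample event loop delay for durationMs and resolve with a summary in ms.
// The histogram itself records nanoseconds, hence the division by 1e6.
function sampleEventLoopDelay(durationMs, resolutionMs = 10) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: resolutionMs })
    histogram.enable()
    setTimeout(() => {
      histogram.disable()
      resolve({
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      })
    }, durationMs)
  })
}

// Simulate synchronous work that blocks the event loop for roughly ms.
function blockFor(ms) {
  const end = Date.now() + ms
  while (Date.now() < end) {
    // busy wait
  }
}
```

Calling `sampleEventLoopDelay(30000)` in the server while a load test runs, then logging the result, shows whether synchronous work is stalling the loop: a large `maxMs` alongside low CPU from other sources points at blocking calls such as the `readFileSync` above.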
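Memory bottleneck - heap grows during load, GC pressure increases, and the process eventually runs out of memory (OOM). Compare heap snapshots taken before and after the test to see what is accumulating
# Watch resident memory (RSS, in KB) of the running node processes.
# A separate `node -e` process would only report its own heap, not the server's.
watch -n 1 'ps -o pid,rss,args -p "$(pgrep -d, -x node)"'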
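Heap usage is only visible from inside the process being measured, so one option is to add a small sampler to the server itself. This is a sketch under that assumption; `heapUsedMb` and `startHeapSampler` are illustrative names, not part of any library.

```javascript
// Current V8 heap usage in megabytes, rounded to one decimal place.
function heapUsedMb() {
  const bytes = process.memoryUsage().heapUsed
  return Math.round((bytes / 1024 / 1024) * 10) / 10
}

// Log heap usage every intervalMs. Returns a function that stops the sampler.
function startHeapSampler(intervalMs = 1000, log = console.log) {
  const timer = setInterval(() => log(`heapUsed: ${heapUsedMb()} MB`), intervalMs)
  timer.unref() // do not keep the process alive just for sampling
  return () => clearInterval(timer)
}
```

Start the sampler when the server boots, run the load test, and watch whether the number returns to its baseline after the test ends. A heap that keeps climbing across repeated runs suggests a leak rather than normal allocation pressure.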
Network bottleneck - bandwidth saturation, connection limits, TLS handshake overhead. netstat or ss shows connection states
# Check connection states
ss -s
# Example output line (abridged):
# TCP: 1234 (estab 890, closed 200, orphaned 0, synrecv 0, timewait 144)
Setting Up autocannon Test Scripts
const autocannon = require('autocannon')

// Run one autocannon test and resolve with its result object
function runTest(options) {
  return new Promise((resolve, reject) => {
    const instance = autocannon(options, (err, result) => {
      if (err) reject(err)
      else resolve(result)
    })
    // Track progress
    autocannon.track(instance, { renderProgressBar: true })
  })
}

async function runLoadTest() {
  const url = 'http://localhost:3000'
  const results = []
  // Test various endpoints
  const endpoints = [
    { method: 'GET', path: '/api/users' },
    { method: 'GET', path: '/api/users/1' },
    { method: 'POST', path: '/api/login', body: { email: 'test@test.com', password: 'pass123' } },
    { method: 'GET', path: '/api/orders?page=1&limit=50' },
  ]

  // Warm up for 5 seconds and discard the result
  await runTest({ url: `${url}${endpoints[0].path}`, connections: 10, duration: 5 })

  // Run the endpoints one at a time so the tests do not compete with each other
  for (const endpoint of endpoints) {
    const result = await runTest({
      url: `${url}${endpoint.path}`,
      connections: 100,
      duration: 30,
      pipelining: 1,
      method: endpoint.method,
      body: endpoint.body ? JSON.stringify(endpoint.body) : undefined,
      headers: {
        'content-type': 'application/json',
        ...(endpoint.requiresAuth ? { authorization: 'Bearer test-token' } : {}),
      },
    })
    // Collect results
    results.push({
      endpoint: endpoint.path,
      method: endpoint.method,
      latency: {
        avg: result.latency.average,
        p50: result.latency.p50,
        // autocannon reports p97_5 (as in the output table above) rather than p95
        p97_5: result.latency.p97_5,
        p99: result.latency.p99,
        max: result.latency.max,
      },
      requests: {
        total: result.requests.total,
        average: result.requests.average,
      },
      errors: result.errors,
      timeouts: result.timeouts,
      non2xx: result.non2xx,
    })
  }
  return results
}

// Execute
runLoadTest()
  .then((results) => {
    console.log('\n=== LOAD TEST RESULTS ===\n')
    results.forEach((r) => {
      console.log(`\n${r.method} ${r.endpoint}`)
      console.log(`  Requests/sec: ${r.requests.average.toFixed(0)}`)
      console.log(`  Latency (p50/p97.5/p99): ${r.latency.p50}ms / ${r.latency.p97_5}ms / ${r.latency.p99}ms`)
      console.log(`  Max latency: ${r.latency.max}ms`)
      console.log(`  Errors: ${r.errors}, Timeouts: ${r.timeouts}, Non-2xx: ${r.non2xx}`)
    })
  })
  .catch((err) => {
    console.error(err)
    process.exitCode = 1
  })
Ramp up concurrency in steps to find the breaking point:
const autocannon = require('autocannon')

async function findBreakingPoint() {
  const url = 'http://localhost:3000'
  for (const connections of [10, 50, 100, 200, 500, 1000]) {
    console.log(`\n--- Testing with ${connections} concurrent connections ---`)
    const result = await autocannon({
      url,
      connections,
      duration: 15,
    })
    console.log(`RPS: ${result.requests.average.toFixed(0)}`)
    // autocannon reports p97_5 (as in the output table above) rather than p95
    console.log(`p50: ${result.latency.p50}ms | p97.5: ${result.latency.p97_5}ms | p99: ${result.latency.p99}ms`)
    console.log(`Errors: ${result.errors} | Timeouts: ${result.timeouts} | Non-2xx: ${result.non2xx}`)
    // Connection errors, timeouts, and non-2xx responses all count as failures
    if (result.errors > 0 || result.timeouts > 0 || result.non2xx > 0) {
      console.log(`\n!!! Breaking point found at ${connections} connections !!!`)
      break
    }
  }
}

findBreakingPoint().catch(console.error)
Interpreting Results
Good signs:
- Latency stays flat as concurrency increases (linear scaling)
- Error rate stays at zero
- RPS scales proportionally with connections
Bad signs:
- Latency jumps suddenly at a specific concurrency level (queue backing up)
- Errors appear as load increases (connection pool exhaustion, database saturation)
- RPS flatlines or drops after a certain point (bottleneck hit)
- High p99 but low p50 (some requests are getting stuck, possibly because of GC pauses or lock contention)
# Good - scales linearly
Connections: 10 RPS: 450 p99: 25ms
Connections: 100 RPS: 4200 p99: 32ms
Connections: 500 RPS: 18500 p99: 48ms
# Bad - hits a wall
Connections: 10 RPS: 450 p99: 25ms
Connections: 100 RPS: 4100 p99: 35ms
Connections: 200 RPS: 4500 p99: 120ms <-- wall at 200 connections
Connections: 500 RPS: 4800 p99: 890ms <-- queue backing up
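The "wall" in the second table can also be spotted programmatically. The sketch below flags the first step where throughput grows much more slowly than connections while p99 latency jumps. The function name and both thresholds (`minEfficiency`, `maxP99Growth`) are assumptions chosen for illustration; tune them for your own system.

```javascript
// steps: [{ connections, rps, p99 }] ordered by increasing connections.
// Returns the connection count where the wall is first seen, or null.
function findWall(steps, { minEfficiency = 0.7, maxP99Growth = 2 } = {}) {
  for (let i = 1; i < steps.length; i++) {
    const prev = steps[i - 1]
    const curr = steps[i]
    // 1.0 means RPS grew exactly as fast as the connection count did
    const efficiency = (curr.rps / prev.rps) / (curr.connections / prev.connections)
    const p99Growth = curr.p99 / prev.p99
    if (efficiency < minEfficiency && p99Growth > maxP99Growth) {
      return curr.connections
    }
  }
  return null
}

const scalesLinearly = [
  { connections: 10, rps: 450, p99: 25 },
  { connections: 100, rps: 4200, p99: 32 },
  { connections: 500, rps: 18500, p99: 48 },
]
const hitsAWall = [
  { connections: 10, rps: 450, p99: 25 },
  { connections: 100, rps: 4100, p99: 35 },
  { connections: 200, rps: 4500, p99: 120 },
  { connections: 500, rps: 4800, p99: 890 },
]
```

Fed the two data sets above, `findWall(scalesLinearly)` returns `null` and `findWall(hitsAWall)` returns `200`, matching the annotation in the table.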
What the wall usually means:
- Database connection pool exhausted (check pool size)
- libuv thread pool exhausted (crypto, compression, filesystem calls)
- Event loop blocked (slow synchronous code)
- Network bandwidth saturated (less likely on localhost)
Security: DoS Patterns, Rate Limiting Implications
Load testing reveals security characteristics you can't see in unit tests
Rate limiting under load: if your rate limiter keeps its counters in process memory, it breaks in clustered apps because each worker counts independently
// BAD - per-process rate limiting breaks under cluster
const rateLimit = require('express-rate-limit')
const limiter = rateLimit({
  windowMs: 60000,
  max: 100,
  // Each worker tracks independently - user can send 400 req/min with 4 workers
})
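// GOOD - use shared state (Redis)
const rateLimit = require('express-rate-limit')
// Depending on the installed rate-limit-redis version, RedisStore may be a named export instead
const RedisStore = require('rate-limit-redis')
const Redis = require('ioredis')

// Connects to localhost:6379 by default; pass connection options for other hosts
const client = new Redis()

const limiter = rateLimit({
  store: new RedisStore({
    // Forward raw Redis commands from the store to the ioredis client
    sendCommand: (...args) => client.call(...args),
  }),
  windowMs: 60000,
  max: 100,
})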
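The arithmetic behind the "400 req/min with 4 workers" comment can be checked with a small simulation. This sketch models each rate limiter as a plain in-memory counter and distributes requests round-robin across workers; `CounterStore` and `simulateAllowed` are hypothetical names, and window expiry is left out for brevity.

```javascript
// Minimal fixed-window counter keyed by client. Window reset is omitted.
class CounterStore {
  constructor() {
    this.hits = new Map()
  }

  increment(key) {
    const next = (this.hits.get(key) || 0) + 1
    this.hits.set(key, next)
    return next
  }
}

// Count how many of `requests` from one client get through the limiter.
// shared = false: every worker has its own store (in-memory limiter).
// shared = true:  all workers use one store (for example, Redis).
function simulateAllowed({ workers, max, requests, shared }) {
  const sharedStore = new CounterStore()
  const stores = Array.from({ length: workers }, () =>
    shared ? sharedStore : new CounterStore()
  )
  let allowed = 0
  for (let i = 0; i < requests; i++) {
    const store = stores[i % workers] // round-robin distribution
    if (store.increment('client-1') <= max) allowed++
  }
  return allowed
}
```

With 4 workers, a limit of 100, and 1000 requests from one client, the per-worker stores let 400 requests through, while the shared store allows only the intended 100.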
DoS pattern detection during load tests:
- Endpoints without auth that consume significant resources (search, report generation)
- Routes that trigger expensive database operations (JOINs across large tables)
- Unbounded query parameters that could exhaust memory
- File upload endpoints without size limits
// Identify DoS-vulnerable patterns during load testing
app.get('/api/search', async (req, res) => {
  const { q = '' } = req.query

  // VULNERABLE under load - SQL injection AND unbounded results:
  // const results = await db.query(
  //   `SELECT * FROM products WHERE name ILIKE '%${q}%'`
  // )

  // FIX: parameterize the query and cap the number of rows returned
  const results = await db.query(
    `SELECT * FROM products WHERE name ILIKE $1 LIMIT 100`,
    [`%${q}%`]
  )
  // Assumes a node-postgres style result object with a rows array
  res.json(results.rows)
})
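If the endpoint should honor a client-supplied `limit` instead of a hard-coded one, the value needs to be clamped before it reaches the database. A minimal sketch follows; `clampPageSize` and its default bounds are assumptions for illustration.

```javascript
// Turn an untrusted query-string value into a safe page size.
// Non-numeric input falls back to `fallback`; everything else is
// clamped to the range [1, max].
function clampPageSize(raw, { fallback = 20, max = 100 } = {}) {
  const parsed = Number.parseInt(raw, 10)
  if (Number.isNaN(parsed)) return fallback
  return Math.min(Math.max(parsed, 1), max)
}
```

The clamped value can then be passed as a second query parameter (for example `LIMIT $2`), so a request for `limit=5000000` costs the same as a request for `limit=100`.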
What your load tests should also cover:
- Slowloris-style attacks (slow headers, slow body)
- Connection exhaustion (open connections and never close them)
- Payload size attacks (send max-size bodies repeatedly)
- Concurrent authentication attempts (account lockout behavior)
// Test rate limiting behavior under load
test('rate limiter blocks after threshold', async () => {
  const promises = []
  // Send 101 requests - the 101st should be blocked
  for (let i = 0; i < 101; i++) {
    promises.push(
      fetch('http://localhost:3000/api/login', {
        method: 'POST',
        body: JSON.stringify({ email: 'test@test.com', password: 'test' }),
        headers: { 'content-type': 'application/json' },
      })
    )
  }
  const responses = await Promise.all(promises)
  const statusCodes = responses.map(r => r.status)
  const tooManyRequestsCount = statusCodes.filter(s => s === 429).length
  // Assert that at least some requests were rate-limited
  expect(tooManyRequestsCount).toBeGreaterThan(0)
})
Prerequisites
perf_02_cluster.md - clustering, PM2, nginx load balancing
next -> No next section - this is the end of the Testing & Debugging series