Profiling and Optimization - Make It Fast, Not Just Correct
Table of Contents
- Why Profile: Latency, Throughput, Memory
- Flamegraphs with 0x or clinic.js
- --prof and --prof-process
- Chrome DevTools Memory and Performance Tabs
- Memory Leak Patterns: Closures, Event Listeners, Caches
- Monitoring: process.memoryUsage, process.cpuUsage
Why Profile: Latency, Throughput, Memory
You don't know what's slow until you measure it. Gut feelings about performance are often wrong: the slow part is rarely where you expect it to be.
Latency - how long each operation takes. A high p99 means the slowest 1% of requests keep users waiting. Profile to find the bottlenecks.
Throughput - how many operations complete per second. Low throughput means you're leaving hardware capacity unused.
Memory - how much the process holds and whether it keeps growing. A leak grows until the process crashes, so find what is holding references and why.
The rule: profile first, optimize second. Never guess what's slow.
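To make "measure first" concrete, here is a minimal sketch that times a function repeatedly and reports p50/p99 latency and throughput. The helper names (percentile, measure) are illustrative, not part of any library, and a tight loop like this is only a rough stand-in for real load.

```javascript
const { performance } = require('node:perf_hooks')

// Nearest-rank percentile over an array of numbers (p between 0 and 100).
function percentile(samples, p) {
  if (samples.length === 0) return NaN
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Run fn `iterations` times and summarize per-call latency and throughput.
function measure(fn, iterations = 1000) {
  const latencies = []
  const begin = performance.now()
  for (let i = 0; i < iterations; i++) {
    const t0 = performance.now()
    fn()
    latencies.push(performance.now() - t0)
  }
  const totalMs = performance.now() - begin
  return {
    p50Ms: percentile(latencies, 50),
    p99Ms: percentile(latencies, 99),
    opsPerSec: iterations / (totalMs / 1000),
  }
}
```

Calling measure(() => JSON.parse('{"a":1}')) gives a quick baseline to compare against after a change; for anything user-facing, prefer measurements taken under realistic load.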
Flamegraphs with 0x or clinic.js
Flamegraphs are the gold standard for CPU profiling. The x-axis shows how often a stack appeared in the samples (wider = more CPU time), not the passage of time. The y-axis is call depth.
0x - single-command flamegraphs:
npm install -g 0x
# profile your app
0x app.js
# generate load, then Ctrl+C
# writes flamegraph.html into a <pid>.0x directory (pass -o to open it automatically)
clinic.js - more structured with multiple profiling tools:
npm install -g clinic
# Doctor - high-level health check
clinic doctor -- node app.js
# Flame - CPU flamegraph
clinic flame -- node app.js
# Bubbleprof - async latency visualization
clinic bubbleprof -- node app.js
# Heap profiler
clinic heapprofiler -- node app.js
Each tool generates an HTML report in the current directory. Open it in a browser and look for:
- Wide bars at the top of the flamegraph (hot functions)
- Deep call stacks that could be flattened
- Functions consuming disproportionate CPU time
// If you see this function wide in the flamegraph, you found your bottleneck
function parseLogFile(lines) {
  return lines.map(line => {
    const parts = line.split(',')
    return {
      timestamp: parseInt(parts[0], 10),
      level: parts[1],
      message: parts.slice(2).join(','),
      metadata: JSON.parse(parts[3] || '{}'),
    }
  })
}
--prof and --prof-process
The built-in V8 profiler works without any npm packages. It ships with Node, so there is nothing to install.
# Start profiling
node --prof app.js
# Let it run under load, then kill the process
# V8 writes a file named like isolate-0x<address>-v8.log (the exact name varies by Node version)
# Process the raw log into readable output
node --prof-process isolate-*.log > processed-profile.txt
Output looks like:
Statistical profiling result from isolate-*.log, (1234 ticks, 0 unaccounted, 0 excluded).
 [JavaScript]:
   ticks  total  nonlib   name
     342  27.7%   35.2%  Function: validateSchema /app/node_modules/ajv/dist/ajv.js:1:234
     156  12.6%   16.1%  Function: parseRequest /app/lib/parser.js:45:12
      89   7.2%    9.2%  Function: stringify /app/node_modules/fast-json-stringify/index.js:1:567
      67   5.4%    6.9%  Function: queryDatabase /app/lib/db.js:89:34
 [C++]:
   ticks  total  nonlib   name
     123  10.0%   12.7%  v8::internal::JsonParser<v8::internal::JsonParser<...>>
 [Summary]:
   ticks  total  nonlib   name
    1234 100.0%  100.0%  Total
What to look for:
- Functions consuming more than 10% of ticks - those are your optimization targets
- JSON work showing up in both JavaScript and C++ ticks - consider schema-compiled validation and faster serialization (fast-json-stringify)
- A large garbage collection share (real profiles list GC ticks in the summary) - excessive GC means memory pressure
Chrome DevTools Memory and Performance Tabs
The DevTools profiler we covered in debugging also works for performance analysis.
Performance tab workflow:
- Start with node --inspect-brk app.js
- Open chrome://inspect and connect
- Go to Performance tab
- Click Record, generate load, stop recording
- Analyze the flamechart, summary, and call tree
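When attaching DevTools by hand is impractical, the same kind of CPU profile can be captured from inside the process with the built-in inspector module. The sketch below assumes the work to profile is synchronous; profileSync and saveProfile are hypothetical helper names, and the output file name is an arbitrary choice.

```javascript
const inspector = require('node:inspector')
const fs = require('node:fs')

// Profile a synchronous function and resolve with the raw CPU profile object.
function profileSync(work) {
  return new Promise((resolve, reject) => {
    const session = new inspector.Session()
    session.connect()
    session.post('Profiler.enable', (enableErr) => {
      if (enableErr) return reject(enableErr)
      session.post('Profiler.start', (startErr) => {
        if (startErr) return reject(startErr)
        work()
        session.post('Profiler.stop', (stopErr, result) => {
          session.disconnect()
          if (stopErr) return reject(stopErr)
          resolve(result.profile)
        })
      })
    })
  })
}

// Hypothetical usage: save the profile so it can be loaded in the Performance tab.
async function saveProfile(work, file = 'app.cpuprofile') {
  const profile = await profileSync(work)
  fs.writeFileSync(file, JSON.stringify(profile))
  return file
}
```

The resulting .cpuprofile file can be loaded into the DevTools Performance tab and read like a recording made through the workflow above.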
Memory tab for heap snapshots:
- Go to Memory tab
- Select "Heap snapshot"
- Take snapshot, generate load, take another snapshot
- Compare snapshots to find what grew between them
Allocation instrumentation timeline:
Shows where objects are allocated over time. Filter by type to see which code path creates the most garbage
// Use Chrome DevTools to confirm if this pattern causes allocation pressure
function handleRequest(req, res) {
  // Each request allocates new objects (processed data, response body) -
  // intentional, but visible in the timeline
  const start = Date.now()
  const data = processData(req.body)
  res.json({
    data,
    processingTime: Date.now() - start,
  })
}
Memory Leak Patterns: Closures, Event Listeners, Caches
Memory leaks are the silent killers of Node.js production deployments.
Pattern 1 - Accidental closure capture:
// LEAKY
function createHandlers(db) {
  const handlers = []
  for (const table of ['users', 'orders', 'products']) {
    handlers.push(function() {
      return db.query(`SELECT * FROM ${table}`)
      // Each handler closes over 'table' - fine
      // But every handler also holds a reference to the same 'db',
      // so a large 'db' stays alive as long as any handler is reachable
    })
  }
  return handlers
}
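A closure keeps alive everything it references, so one common remedy is to copy out only the values the function needs. The sketch below contrasts the two shapes; makeHeavyCounter, makeLeanCounter, and the payload shape are hypothetical names chosen for illustration.

```javascript
// Retains the whole payload for as long as the returned function is reachable.
function makeHeavyCounter(payload) {
  return () => payload.items.length
}

// Copies out the one value it needs, so the closure no longer references payload.
function makeLeanCounter(payload) {
  const count = payload.items.length
  return () => count
}
```

Both functions return the same number. The difference only shows up in a heap snapshot, where the first variant keeps the payload in its retained set. The lean variant reports the count as it was when the closure was created, which is only appropriate when the value is not expected to change.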
Pattern 2 - Event listeners never removed:
class Monitor {
  start() {
    // LEAKY - attaching a listener but never removing it
    // (process.stdin stands in for any long-lived emitter)
    process.stdin.on('data', (data) => this.handleData(data))
    // Every call to start() adds another listener
    // Each listener captures 'this', keeping the Monitor instance alive
  }
  handleData(data) {
    this.lastData = data
  }
}
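One way to fix this pattern is to keep a stable reference to the handler so it can be removed again, and to guard against double registration. This is a sketch under the assumption that the emitter is passed in. The constructor argument and the started flag are additions for illustration, not part of the original class.

```javascript
const { EventEmitter } = require('node:events')

class Monitor {
  constructor(source) {
    this.source = source
    this.started = false
    // Bind once so the same function reference can be removed later.
    this.handleData = this.handleData.bind(this)
  }
  start() {
    if (this.started) return // repeated start() calls add nothing
    this.source.on('data', this.handleData)
    this.started = true
  }
  stop() {
    this.source.off('data', this.handleData)
    this.started = false
  }
  handleData(data) {
    this.lastData = data
  }
}

const source = new EventEmitter()
const monitor = new Monitor(source)
monitor.start()
monitor.start()
console.log(source.listenerCount('data')) // 1
monitor.stop()
console.log(source.listenerCount('data')) // 0
```

Once stop() has run, the emitter no longer references the instance, so the Monitor can be garbage collected when nothing else holds it.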
Pattern 3 - Unbounded caches:
const cache = new Map()
function getCachedData(key, fetchFn) {
  // LEAKY - the cache grows forever because nothing is ever evicted
  if (cache.has(key)) return cache.get(key)
  const data = fetchFn(key)
  cache.set(key, data)
  return data
}
// FIX - replace the Map declaration above with a cache bounded by size and age
const { LRUCache } = require('lru-cache')
const cache = new LRUCache({ max: 500, ttl: 1000 * 60 * 5 })
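To show what a size limit does mechanically, here is a dependency-free sketch of least-recently-used eviction built on Map insertion order. TinyLRU is a hypothetical name and this is not how lru-cache is implemented. It also has no TTL support, so prefer the library in real code.

```javascript
class TinyLRU {
  constructor(max) {
    this.max = max
    this.map = new Map()
  }
  get(key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key)
    this.map.set(key, value)
    return value
  }
  set(key, value) {
    this.map.delete(key)
    this.map.set(key, value)
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.map.keys().next().value
      this.map.delete(oldest)
    }
  }
}
```

Memory is now bounded by max entries regardless of how many distinct keys the application sees.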
Pattern 4 - Timers keeping references:
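// LEAKY
function startPolling() {
  // The interval handle is discarded, so nothing can ever call clearInterval()
  setInterval(async () => {
    const data = await fetch('/api/data')
    this.cache = data // 'this' stays reachable for as long as the timer runs
  }, 1000)
}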
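A timer stops pinning its owner once it is cleared, so a possible fix is to keep the handle and expose a stop method. In this sketch the Poller class and its injected fetchFn are hypothetical. The fetch function is passed in so the example does not depend on a particular URL.

```javascript
class Poller {
  constructor(fetchFn, intervalMs = 1000) {
    this.fetchFn = fetchFn
    this.intervalMs = intervalMs
    this.timer = null
  }
  start() {
    if (this.timer) return // already polling; do not stack intervals
    this.timer = setInterval(async () => {
      try {
        this.cache = await this.fetchFn()
      } catch (err) {
        console.error('poll failed:', err.message)
      }
    }, this.intervalMs)
    // Do not let the poller alone keep the process alive.
    this.timer.unref()
  }
  stop() {
    clearInterval(this.timer)
    this.timer = null
  }
}
```

After stop(), the event loop drops its reference to the callback, and the Poller becomes collectable once the rest of the application releases it.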
Detecting leaks in production:
const heapdump = require('heapdump')
// Log memory usage every minute; dump the heap once it crosses a threshold
setInterval(() => {
  const usage = process.memoryUsage()
  console.log({
    rss: `${(usage.rss / 1024 / 1024).toFixed(1)} MB`,
    heapTotal: `${(usage.heapTotal / 1024 / 1024).toFixed(1)} MB`,
    heapUsed: `${(usage.heapUsed / 1024 / 1024).toFixed(1)} MB`,
    external: `${(usage.external / 1024 / 1024).toFixed(1)} MB`,
  })
  if (usage.heapUsed > 500 * 1024 * 1024) {
    // 500MB threshold - dump heap and analyze
    // Note: this writes a new snapshot on every tick while above the threshold
    heapdump.writeSnapshot(`/tmp/heap-${Date.now()}.heapsnapshot`)
  }
}, 60000)
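Writing a snapshot blocks the process and produces a large file, so it can help to rate-limit dumps. The sketch below swaps in the built-in v8.writeHeapSnapshot() in place of the heapdump package and adds a cooldown. createHeapWatcher, its option names, and the default values are assumptions for illustration.

```javascript
const v8 = require('node:v8')

// Returns a checker that writes at most one snapshot per cooldown window.
function createHeapWatcher({
  thresholdBytes = 500 * 1024 * 1024,
  cooldownMs = 10 * 60 * 1000,
  writeSnapshot = () => v8.writeHeapSnapshot(),
} = {}) {
  let lastDumpAt = -Infinity
  return function check(heapUsed = process.memoryUsage().heapUsed, now = Date.now()) {
    if (heapUsed <= thresholdBytes) return null
    if (now - lastDumpAt < cooldownMs) return null
    lastDumpAt = now
    return writeSnapshot() // the built-in writer returns the snapshot file name
  }
}
```

A caller would create one watcher at startup and invoke it from the existing interval, for example const check = createHeapWatcher() followed by check() inside the callback.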
Monitoring: process.memoryUsage, process.cpuUsage
Node provides real-time metrics without external tools.
function printMetrics() {
  const mem = process.memoryUsage()
  const cpu = process.cpuUsage()
  console.log({
    // Resident Set Size - total memory assigned to process
    rss: `${(mem.rss / 1024 / 1024).toFixed(2)} MB`,
    // V8 heap
    heapTotal: `${(mem.heapTotal / 1024 / 1024).toFixed(2)} MB`,
    heapUsed: `${(mem.heapUsed / 1024 / 1024).toFixed(2)} MB`,
    // C++ objects outside V8 heap (buffers, typedarrays)
    external: `${(mem.external / 1024 / 1024).toFixed(2)} MB`,
    // Cumulative CPU time since process start; cpuUsage() reports
    // microseconds, converted to seconds here
    userCPUSeconds: (cpu.user / 1000000).toFixed(2),
    systemCPUSeconds: (cpu.system / 1000000).toFixed(2),
  })
  // Event loop lag (rough estimate: time until the next setImmediate runs)
  const start = Date.now()
  setImmediate(() => {
    const lag = Date.now() - start
    if (lag > 50) {
      console.warn(`Event loop lag detected: ${lag}ms`)
    }
  })
}
setInterval(printMetrics, 30000)
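process.cpuUsage() reports totals since the process started, which says little about current load. Passing the previous reading back in returns the difference, which can be turned into a utilization percentage. This is a sketch: cpuPercent and startCpuSampler are hypothetical helpers, and 100% here means one fully busy core.

```javascript
// Percentage of one core used, given a cpuUsage delta (microseconds) and elapsed wall time.
function cpuPercent(deltaUsage, elapsedMs) {
  const busyMicros = deltaUsage.user + deltaUsage.system
  return (busyMicros / (elapsedMs * 1000)) * 100
}

function startCpuSampler(intervalMs = 5000, report = console.log) {
  let lastUsage = process.cpuUsage()
  let lastTime = process.hrtime.bigint()
  const timer = setInterval(() => {
    const delta = process.cpuUsage(lastUsage) // diff since the previous reading
    const now = process.hrtime.bigint()
    const elapsedMs = Number(now - lastTime) / 1e6
    report(`cpu: ${cpuPercent(delta, elapsedMs).toFixed(1)}%`)
    // Advance the baseline by exactly the delta that was just reported.
    lastUsage = {
      user: lastUsage.user + delta.user,
      system: lastUsage.system + delta.system,
    }
    lastTime = now
  }, intervalMs)
  timer.unref() // sampling alone should not keep the process alive
  return timer
}
```

Values above 100% are possible when work runs on multiple threads, for example in the libuv thread pool or worker threads.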
Uptime and health endpoint:
const os = require('os')
// Assumes an Express-style 'app' defined elsewhere
app.get('/health', (req, res) => {
  const mem = process.memoryUsage()
  res.json({
    status: 'ok',
    uptime: process.uptime(),
    memory: {
      rss: Math.round(mem.rss / 1024 / 1024),
      heapUsed: Math.round(mem.heapUsed / 1024 / 1024),
      heapTotal: Math.round(mem.heapTotal / 1024 / 1024),
    },
    cpuLoad: os.loadavg(),
    pid: process.pid,
  })
})
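For event loop health, Node's perf_hooks module offers monitorEventLoopDelay(), which samples delay into a histogram measured in nanoseconds. The sketch below converts that histogram into millisecond fields that could be added to a health payload. eventLoopStats and startEventLoopMonitor are hypothetical helper names, and the 20 ms resolution is an arbitrary choice.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks')

// Convert a delay histogram (nanoseconds) into millisecond fields.
function eventLoopStats(histogram) {
  // Round to two decimals; an empty histogram can report NaN, which maps to 0.
  const toMs = (ns) => (Number.isFinite(ns) ? Math.round(ns / 1e4) / 100 : 0)
  return {
    meanMs: toMs(histogram.mean),
    p99Ms: toMs(histogram.percentile(99)),
    maxMs: toMs(histogram.max),
  }
}

// Start sampling; call histogram.disable() on shutdown.
function startEventLoopMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs })
  histogram.enable()
  return histogram
}
```

A handler could then include eventLoop: eventLoopStats(histogram) next to the memory fields, with the histogram created once at startup.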
Prerequisites
test_05_debugging.md - debugging, memory analysis, heap snapshots
next -> perf_02_cluster.md