Profiling and Optimization - Make It Fast, Not Just Correct

Table of Contents

Why Profile: Latency, Throughput, Memory
Flamegraphs with 0x or clinic.js
--prof and --prof-process
Chrome DevTools Memory and Performance Tabs
Memory Leak Patterns: Closures, Event Listeners, Caches
Monitoring: process.memoryUsage, process.cpuUsage

Why Profile: Latency, Throughput, Memory

You don't know what's slow until you measure it. Intuition about performance is unreliable - the slow part is often not where you expect it to be

Latency - how long each operation takes. High p99 means users are waiting. Profile to find the bottlenecks

Throughput - how many operations per second. Low throughput means you're leaving hardware on the table

Memory - how much the process holds on to over time. A leak grows until the process crashes. Find what's holding references and why

The rule: profile first, optimize second. Never guess what's slow
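Before reaching for a profiler, it can help to put numbers on latency and throughput. The following is a minimal sketch, not a benchmarking framework: `percentile` and `measure` are hypothetical helpers, and the JSON parsing workload is only a stand-in for the operation you actually care about.

```javascript
const { performance } = require('node:perf_hooks')

// Nearest-rank percentile over an ascending-sorted array of numbers
function percentile(sorted, p) {
  if (sorted.length === 0) return NaN
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))]
}

// Run fn `iterations` times and summarize latency (ms) and throughput (ops/sec)
function measure(fn, iterations = 1000) {
  const samples = []
  const begin = performance.now()
  for (let i = 0; i < iterations; i++) {
    const t0 = performance.now()
    fn()
    samples.push(performance.now() - t0)
  }
  const totalMs = performance.now() - begin
  samples.sort((a, b) => a - b)
  return {
    p50: percentile(samples, 50),
    p99: percentile(samples, 99),
    opsPerSec: iterations / (totalMs / 1000),
  }
}

// Hypothetical workload: replace with the operation under test
const stats = measure(() => JSON.parse('{"a":[1,2,3]}'))
console.log(stats)
```

A large gap between p50 and p99 suggests occasional stalls (GC pauses, blocking calls) rather than uniformly slow code, which is a hint about where to point the profiler.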

Flamegraphs with 0x or clinic.js

Flamegraphs are the gold standard for CPU profiling. The width of a frame on the x-axis is the share of samples it appeared in (wider = more CPU time), not the passage of time; the y-axis is call depth

0x - single-command flamegraphs:

npm install -g 0x

# profile your app
0x app.js

# generate load, then Ctrl+C
# writes flamegraph.html into a <pid>.0x directory; pass -o to open it automatically

clinic.js - more structured with multiple profiling tools:

npm install -g clinic

# Doctor - high-level health check
clinic doctor -- node app.js

# Flame - CPU flamegraph
clinic flame -- node app.js

# Bubbleprof - async latency visualization
clinic bubbleprof -- node app.js

# Heap profiler
clinic heapprofiler -- node app.js

Each tool generates an HTML report in the current directory. Open it in a browser and look for:

Wide bars at the top of the flamegraph (hot functions)
Deep call stacks that could be flattened
Functions consuming disproportionate CPU time

// If you see this function wide in the flamegraph, you found your bottleneck
function parseLogFile(lines) {
  return lines.map(line => {
    const parts = line.split(',')
    return {
      timestamp: parseInt(parts[0], 10),
      level: parts[1],
      message: parts.slice(2).join(','),
      metadata: JSON.parse(parts[3] || '{}'),
    }
  })
}

--prof and --prof-process

The V8 profiler is built into Node, so it is always available and needs no npm packages

# Start profiling
node --prof app.js

# Let it run under load, then stop the process
# V8 writes a tick log to the working directory, named like isolate-0xnnnnnnnnnnnn-v8.log

# Process the raw log into readable output
node --prof-process isolate-*.log > processed-profile.txt

Output looks roughly like this (illustrative and abridged):

Statistical profiling result from isolate-0xnnnnnnnnnnnn-v8.log, (1234 ticks, 0 unaccounted, 0 excluded).

 [JavaScript]:
   ticks  total  nonlib   name
    342   27.7%   35.2%  Function: validateSchema /app/node_modules/ajv/dist/ajv.js:1:234
    156   12.6%   16.1%  Function: parseRequest /app/lib/parser.js:45:12
     89    7.2%    9.2%  Function: stringify /app/node_modules/fast-json-stringify/index.js:1:567
     67    5.4%    6.9%  Function: queryDatabase /app/lib/db.js:89:34

 [C++]:
   ticks  total  nonlib   name
    123   10.0%   12.7%  v8::internal::JsonParser<v8::internal::JsonParser<...>>

 [Summary]:
   ticks  total  nonlib   name
   1234  100.0%  100.0%  Total

What to look for:

Functions consuming > 10% of ticks - those are your optimization targets
JSON parsing showing up in both JavaScript and C++ ticks - consider faster serialization (fast-json-stringify, schema compilation)
A large share of ticks spent in garbage collection - excessive GC means memory pressure

Chrome DevTools Memory and Performance Tabs

The DevTools profiler we covered in debugging also works for performance analysis

Performance tab workflow:

Start with node --inspect-brk app.js
Open chrome://inspect and connect
Go to Performance tab
Click Record, generate load, stop recording
Analyze the flamechart, summary, and call tree
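A CPU profile can also be captured from code with Node's built-in `inspector` module, which is handy when attaching DevTools by hand is impractical. This is a minimal sketch under stated assumptions: `post`, `profileCpu`, and `busyWork` are hypothetical helpers, and the output path in the temp directory is arbitrary. The resulting `.cpuprofile` file can be loaded into DevTools for the same flamechart and call tree views.

```javascript
const inspector = require('node:inspector')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Promise wrapper around the callback-style session.post
function post(session, method, params) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)))
  })
}

// Profile a (possibly async) function and return the raw CPU profile object
async function profileCpu(fn) {
  const session = new inspector.Session()
  session.connect()
  try {
    await post(session, 'Profiler.enable')
    await post(session, 'Profiler.start')
    await fn()
    const { profile } = await post(session, 'Profiler.stop')
    return profile
  } finally {
    session.disconnect()
  }
}

// Hypothetical CPU-bound workload standing in for real request handling
function busyWork() {
  let sum = 0
  for (let i = 0; i < 1e6; i++) sum += Math.sqrt(i)
  return sum
}

const pendingProfile = profileCpu(busyWork).then((profile) => {
  const file = path.join(os.tmpdir(), `app-${process.pid}.cpuprofile`)
  fs.writeFileSync(file, JSON.stringify(profile))
  console.log(`wrote ${file} (${profile.nodes.length} call tree nodes)`)
  return profile
})
```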

Memory tab for heap snapshots:

Go to Memory tab
Select "Heap snapshot"
Take snapshot, generate load, take another snapshot
Compare snapshots to find what grew between them

Allocation instrumentation timeline:

Shows where objects are allocated over time. Filter by type to see which code path creates the most garbage

// Use Chrome DevTools to confirm if this pattern causes allocation pressure
function handleRequest(req, res) {
  // Each request allocates new objects (processed data, response body) - expected, but visible in the timeline
  const start = Date.now()
  const data = processData(req.body)

  res.json({
    data,
    processingTime: Date.now() - start,
  })
}

Memory Leak Patterns: Closures, Event Listeners, Caches

Memory leaks are the silent killers of Node.js production deployments

Pattern 1 - Accidental closure capture:

// LEAKY
function createHandlers(db) {
  const handlers = []

  for (const table of ['users', 'orders', 'products']) {
    handlers.push(function() {
      return db.query(`SELECT * FROM ${table}`)
      // Each handler closes over 'table' - fine
      // Every handler also holds a reference to 'db', so 'db' and everything
      // it references stays alive for as long as any handler is reachable
    })
  }

  return handlers
}
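One way to limit what a long-lived closure retains is to copy out the small values it needs before the closure is created. The sketch below illustrates the general idea with hypothetical `makeSizeReporterLeaky` and `makeSizeReporter` helpers. It is not a drop-in fix for `createHandlers`, whose handlers genuinely need `db`.

```javascript
// LEAKY - the returned function references `buffer`, so the whole buffer
// stays reachable for as long as the function does
function makeSizeReporterLeaky(buffer) {
  return () => `payload is ${buffer.length} bytes`
}

// BETTER - copy the one number we need; the closure no longer references `buffer`
function makeSizeReporter(buffer) {
  const size = buffer.length
  return () => `payload is ${size} bytes`
}

const report = makeSizeReporter(Buffer.alloc(10 * 1024 * 1024))
console.log(report()) // payload is 10485760 bytes
```

Comparing heap snapshots taken before and after is a reasonable way to confirm that the large object is actually released in your own code.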

Pattern 2 - Event listeners never removed:

class Monitor {
  start() {
    // LEAKY - every call to start() adds another listener and none is ever removed
    // The arrow function captures 'this', so each listener keeps the Monitor instance alive
    process.stdin.on('data', (data) => this.handleData(data))
  }

  handleData(data) {
    this.lastData = data
  }
}
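A common remedy is to keep one stable reference to the listener and remove it when the work is done. The following is a minimal sketch, assuming the events come from an `EventEmitter` passed in as a hypothetical `source` parameter, with a `stop()` method added for cleanup.

```javascript
const { EventEmitter } = require('node:events')

class Monitor {
  constructor(source) {
    this.source = source
    // Keep one stable function reference so the same listener can be removed later
    this.onData = (data) => this.handleData(data)
    this.running = false
  }

  start() {
    if (this.running) return // repeated start() calls do not stack listeners
    this.source.on('data', this.onData)
    this.running = true
  }

  stop() {
    this.source.off('data', this.onData)
    this.running = false
  }

  handleData(data) {
    this.lastData = data
  }
}

// Usage with a stand-in emitter
const source = new EventEmitter()
const monitor = new Monitor(source)
monitor.start()
monitor.start()
source.emit('data', 'hello')
console.log(source.listenerCount('data')) // 1
monitor.stop()
console.log(source.listenerCount('data')) // 0
```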

Pattern 3 - Unbounded caches:

const cache = new Map()

function getCachedData(key, fetchFn) {
  // LEAKY - caches grow forever
  if (cache.has(key)) return cache.get(key)

  const data = fetchFn(key)
  cache.set(key, data)
  return data
}

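// FIX - use a bounded cache in place of the Map above (size limit plus TTL)
// Recent lru-cache versions use the named export; older ones export the class as the module itself
const { LRUCache } = require('lru-cache')
const cache = new LRUCache({ max: 500, ttl: 1000 * 60 * 5 })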
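To see what a size limit does in practice, here is a dependency-free sketch of a bounded cache built on `Map` insertion order. `BoundedCache` is a hypothetical name, and this is not how lru-cache is implemented. It only illustrates the eviction idea and has no TTL support.

```javascript
class BoundedCache {
  constructor(max) {
    this.max = max
    this.map = new Map() // Map iterates in insertion order: oldest entry first
  }

  get(key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    // Re-insert to mark the entry as most recently used
    this.map.delete(key)
    this.map.set(key, value)
    return value
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key)
    this.map.set(key, value)
    if (this.map.size > this.max) {
      // Evict the least recently used entry (the first key in iteration order)
      const oldest = this.map.keys().next().value
      this.map.delete(oldest)
    }
  }

  get size() {
    return this.map.size
  }
}

const bounded = new BoundedCache(2)
bounded.set('a', 1)
bounded.set('b', 2)
bounded.get('a')    // 'a' is now the most recently used
bounded.set('c', 3) // evicts 'b'
console.log([...bounded.map.keys()]) // [ 'a', 'c' ]
```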

Pattern 4 - Timers keeping references:

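// LEAKY - meant to be called as a method, e.g. service.startPolling()
function startPolling() {
  // The interval handle is discarded, so the timer can never be cleared
  setInterval(async () => {
    const res = await fetch('http://localhost:3000/api/data') // placeholder URL; Node's fetch needs an absolute one
    this.cache = await res.json() // the running timer keeps 'this' alive forever
  }, 1000)
}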
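Timers stop retaining their callbacks once they are cleared, so the usual fix is to keep the handle and expose a way to stop. The following is a minimal sketch, assuming a hypothetical `Poller` class whose `fetchData` function is supplied by the caller.

```javascript
class Poller {
  constructor(fetchData, intervalMs = 1000) {
    this.fetchData = fetchData // hypothetical async function supplied by the caller
    this.intervalMs = intervalMs
    this.timer = null
    this.cache = null
  }

  start() {
    if (this.timer) return // avoid stacking timers on repeated start()
    this.timer = setInterval(async () => {
      try {
        this.cache = await this.fetchData()
      } catch (err) {
        console.error('poll failed:', err.message)
      }
    }, this.intervalMs)
    // unref() lets the process exit even if stop() is never called
    this.timer.unref()
  }

  stop() {
    clearInterval(this.timer)
    this.timer = null
  }
}

const poller = new Poller(async () => ({ fetchedAt: Date.now() }), 1000)
poller.start()
poller.stop() // once stopped, the timer no longer keeps the Poller or its cache alive
```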

Detecting leaks in production:

const heapdump = require('heapdump')

// Write at most one snapshot: each dump pauses the process and the files are large
let dumped = false

// Log memory usage every minute and dump the heap once a threshold is crossed
setInterval(() => {
  const usage = process.memoryUsage()
  console.log({
    rss: `${(usage.rss / 1024 / 1024).toFixed(1)} MB`,
    heapTotal: `${(usage.heapTotal / 1024 / 1024).toFixed(1)} MB`,
    heapUsed: `${(usage.heapUsed / 1024 / 1024).toFixed(1)} MB`,
    external: `${(usage.external / 1024 / 1024).toFixed(1)} MB`,
  })

  if (!dumped && usage.heapUsed > 500 * 1024 * 1024) {
    // 500 MB threshold - dump heap and analyze offline
    dumped = true
    heapdump.writeSnapshot(`/tmp/heap-${Date.now()}.heapsnapshot`)
  }
}, 60000)
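Node also ships a built-in alternative, `v8.writeHeapSnapshot()`, which avoids the native add-on. The sketch below swaps it in for heapdump under stated assumptions: `createHeapWatcher` is a hypothetical helper, the temp directory destination is arbitrary, and the `write` parameter exists only so the threshold logic can be exercised without producing a real snapshot.

```javascript
const v8 = require('node:v8')
const os = require('node:os')
const path = require('node:path')

// Returns a check function that writes at most one snapshot once heapUsed
// crosses thresholdBytes. Returns the snapshot filename, or null if nothing was written.
function createHeapWatcher(thresholdBytes, write = v8.writeHeapSnapshot) {
  let dumped = false
  return function check(usage = process.memoryUsage()) {
    if (dumped || usage.heapUsed <= thresholdBytes) return null
    dumped = true
    const file = path.join(os.tmpdir(), `heap-${Date.now()}.heapsnapshot`)
    // writeHeapSnapshot is synchronous and blocks the event loop while it runs
    return write(file)
  }
}

const checkHeap = createHeapWatcher(500 * 1024 * 1024)
const heapTimer = setInterval(() => checkHeap(), 60000)
heapTimer.unref() // do not keep the process alive just for monitoring
```

The resulting `.heapsnapshot` file loads in the DevTools Memory tab like any other snapshot.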

Monitoring: process.memoryUsage, process.cpuUsage

Node provides real-time metrics without external tools

function printMetrics() {
  const mem = process.memoryUsage()
  const cpu = process.cpuUsage()

  console.log({
    // Resident Set Size - total memory assigned to process
    rss: `${(mem.rss / 1024 / 1024).toFixed(2)} MB`,

    // V8 heap
    heapTotal: `${(mem.heapTotal / 1024 / 1024).toFixed(2)} MB`,
    heapUsed: `${(mem.heapUsed / 1024 / 1024).toFixed(2)} MB`,

    // C++ objects outside V8 heap (buffers, typedarrays)
    external: `${(mem.external / 1024 / 1024).toFixed(2)} MB`,

    // CPU time consumed since process start (reported in microseconds, shown here in seconds)
    userCPUSeconds: (cpu.user / 1000000).toFixed(2),
    systemCPUSeconds: (cpu.system / 1000000).toFixed(2),
  })

  // Rough event loop lag: how long until the next setImmediate callback gets to run
  const start = Date.now()
  setImmediate(() => {
    const lag = Date.now() - start
    if (lag > 50) {
      console.warn(`Event loop lag detected: ${lag}ms`)
    }
  })
}

setInterval(printMetrics, 30000)
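For a less ad hoc lag measurement, Node's `perf_hooks` module provides `monitorEventLoopDelay`, which samples delay into a histogram. The following is a minimal sketch with a hypothetical `sampleEventLoopDelay` helper. The 10 ms resolution and 50 ms warning threshold are arbitrary choices, and an idle loop tends to report values near the resolution rather than zero, so compare against a baseline from your own service.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks')

// Sample event loop delay for durationMs and resolve with a summary in milliseconds
function sampleEventLoopDelay(durationMs = 1000) {
  const histogram = monitorEventLoopDelay({ resolution: 10 })
  histogram.enable()
  return new Promise((resolve) => {
    setTimeout(() => {
      histogram.disable()
      // Histogram values are reported in nanoseconds
      resolve({
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      })
    }, durationMs)
  })
}

const pendingDelay = sampleEventLoopDelay(200).then((delay) => {
  if (delay.p99Ms > 50) console.warn('Event loop lag detected:', delay)
  else console.log('Event loop delay:', delay)
  return delay
})
```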

Uptime and memory health endpoint:

const os = require('os')

// 'app' is assumed to be an Express application created elsewhere
app.get('/health', (req, res) => {
  const mem = process.memoryUsage()

  res.json({
    status: 'ok',
    uptime: process.uptime(),
    memory: {
      rss: Math.round(mem.rss / 1024 / 1024),
      heapUsed: Math.round(mem.heapUsed / 1024 / 1024),
      heapTotal: Math.round(mem.heapTotal / 1024 / 1024),
    },
    cpuLoad: os.loadavg(),
    pid: process.pid,
  })
})
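Note that `os.loadavg()` reports system-wide load and returns zeros on Windows. For a per-process figure, `process.cpuUsage()` deltas can be turned into a percentage. The following is a minimal sketch with a hypothetical `createCpuMeter` helper; the value is relative to one core, so it can exceed 100 when several threads are busy.

```javascript
// Returns a function that reports this process's CPU usage (percent of one core)
// over the time elapsed since the previous call
function createCpuMeter() {
  let lastUsage = process.cpuUsage()
  let lastTime = process.hrtime.bigint()
  return function cpuPercent() {
    const usage = process.cpuUsage() // cumulative user/system time in microseconds
    const now = process.hrtime.bigint() // monotonic clock in nanoseconds
    const usedUs = (usage.user - lastUsage.user) + (usage.system - lastUsage.system)
    const elapsedUs = Number(now - lastTime) / 1000
    lastUsage = usage
    lastTime = now
    if (elapsedUs <= 0) return 0
    return (usedUs / elapsedUs) * 100
  }
}

const cpuPercent = createCpuMeter()

// Burn CPU for roughly 50 ms so there is something to measure
let acc = 0
const until = Date.now() + 50
while (Date.now() < until) acc += Math.sqrt(acc + 1)

console.log(`CPU since meter creation: ${cpuPercent().toFixed(1)}%`)
```

In a health endpoint, one meter created at startup and read on each request would report usage since the previous request.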

Prerequisites

test_05_debugging.md - debugging, memory analysis, heap snapshots

next -> perf_02_cluster.md