

Core 11 - URL Module

Basic Idea

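URLs are how everything on the web is addressed. Node has two URL APIs: the legacy url.parse() and the WHATWG new URL(). They parse differently and behave differently, and only the WHATWG one follows the URL Standard that browsers implement, so on edge-case input the legacy parser can produce results a browser would not.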

Legacy vs WHATWG - The Two APIs

const url = require('url')

// LEGACY API - url.parse() (deprecated but still everywhere)
const legacy = url.parse('https://user:pass@example.com:8080/path?q=1#hash')
console.log(legacy) // abridged: the object also has slashes, host, query, path and href
// {
//   protocol: 'https:',
//   hostname: 'example.com',
//   port: '8080',
//   pathname: '/path',
//   search: '?q=1',
//   hash: '#hash',
//   auth: 'user:pass'
// }

// WHATWG API - new URL() (modern, spec-compliant)
const whatwg = new URL('https://user:pass@example.com:8080/path?q=1#hash')
console.log(whatwg) // abridged: the object also has href, origin, host and searchParams
// {
//   protocol: 'https:',
//   hostname: 'example.com',
//   port: '8080',
//   pathname: '/path',
//   search: '?q=1',
//   hash: '#hash',
//   username: 'user',
//   password: 'pass'
// }

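Use the WHATWG API unless you're maintaining code from 2014. url.parse() is deprecated but not removed, because too much legacy code depends on it. The biggest difference: WHATWG follows the browser URL Standard, while the legacy API follows Node's own older, more lenient parsing rules. The result objects differ too: WHATWG splits credentials into username and password, while the legacy object keeps them together in auth.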
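One practical difference when handling user input: new URL() throws a TypeError for anything it cannot parse as an absolute URL (including a bare relative path with no base), whereas url.parse('/path?x=1') quietly returns an object whose hostname is null. A minimal sketch of a wrapper that turns the throw into a null return; the name tryParseURL is hypothetical, not a Node API.

```javascript
// Hypothetical helper: parse untrusted input without letting the TypeError escape
function tryParseURL(input, base) {
  try {
    // base is optional; passing undefined is the same as omitting it
    return new URL(input, base)
  } catch {
    // new URL() throws a TypeError (code ERR_INVALID_URL) on unparseable input
    return null
  }
}

console.log(tryParseURL('/path?x=1'))                               // null: relative, no base
console.log(tryParseURL('/path?x=1', 'https://example.com').href)  // 'https://example.com/path?x=1'
console.log(tryParseURL('https://exa mple.com'))                   // null: space in the host
```

Returning null keeps the "is this a URL at all?" decision at the call site instead of scattering try/catch blocks around.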

URL Components

const myURL = new URL('https://user:pass@api.example.com:8443/v2/users?page=1&limit=10#section')

console.log('href:',       myURL.href)       // full URL
console.log('protocol:',   myURL.protocol)   // https: (with colon)
console.log('hostname:',   myURL.hostname)   // api.example.com
console.log('port:',       myURL.port)       // 8443
console.log('host:',       myURL.host)       // api.example.com:8443
console.log('pathname:',   myURL.pathname)   // /v2/users
console.log('search:',     myURL.search)     // ?page=1&limit=10
console.log('hash:',       myURL.hash)       // #section
console.log('username:',   myURL.username)   // user
console.log('password:',   myURL.password)   // pass (don't log this in production)
console.log('origin:',     myURL.origin)     // https://api.example.com:8443

// URL is mutable - you can change components
myURL.pathname = '/v3/products'
myURL.searchParams.set('sort', 'asc')
console.log(myURL.href) // https://user:pass@api.example.com:8443/v3/products?page=1&limit=10&sort=asc#section

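The WHATWG URL object is mutable: you can reassign components or modify searchParams, and href updates to match. origin is derived from protocol + host (credentials, path, query and hash are excluded), which makes it useful for CORS checks. host includes the port and hostname doesn't; mixing them up is a common source of confusion.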
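Two normalizations matter when comparing origins: the parser lowercases the hostname, and it drops the port when it is the scheme's default (443 for https:, 80 for http:), leaving port as an empty string. A minimal same-origin check, assuming both inputs are absolute URLs, can therefore lean on origin directly; sameOrigin is a hypothetical helper name.

```javascript
// Hypothetical helper: compare scheme + host + port via the parsed origin
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin
}

const u = new URL('https://Example.COM:443/a?x=1')
console.log(u.port)   // '' (443 is the default for https:, so it is dropped)
console.log(u.host)   // 'example.com'
console.log(u.origin) // 'https://example.com'

console.log(sameOrigin('https://example.com/a', 'https://EXAMPLE.com:443/b'))   // true
console.log(sameOrigin('https://example.com/', 'http://example.com/'))          // false: scheme differs
console.log(sameOrigin('https://example.com/', 'https://example.com:8443/'))    // false: port differs
console.log(sameOrigin('https://user:pw@example.com/', 'https://example.com/x')) // true: credentials ignored
```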

URLSearchParams

const myURL = new URL('https://example.com/api?name=mahmoud&age=25&active=true')

// get - first value
console.log(myURL.searchParams.get('name')) // mahmoud

// getAll - all values (duplicate keys)
myURL.searchParams.append('tag', 'admin')
myURL.searchParams.append('tag', 'dev')
console.log(myURL.searchParams.getAll('tag')) // ['admin', 'dev']

// has - existence check
console.log(myURL.searchParams.has('age')) // true

// set - overwrites existing
myURL.searchParams.set('active', 'false')

// delete - removes key
myURL.searchParams.delete('age')

// iteration yields [key, value] pairs; keys(), values() and entries() also exist
for (const [key, value] of myURL.searchParams) {
  console.log(key, '=', value)
}

// toString - serializes to query string
console.log(myURL.searchParams.toString()) // 'name=mahmoud&active=false&tag=admin&tag=dev'

// URLSearchParams also works standalone; a leading '?' is ignored
const params = new URLSearchParams('?q=search&page=2')
params.set('page', '3')
console.log(params.toString()) // 'q=search&page=3'

URLSearchParams handles encoding and decoding automatically. Duplicate keys are allowed and common (e.g., ?tag=admin&tag=dev): get() returns the first value and getAll() returns all of them. toString() produces a query string without the leading ?.
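To see the automatic encoding at work: URLSearchParams serializes with the application/x-www-form-urlencoded rules, so a space becomes + and characters such as &, = and + are percent-encoded; get() reverses this. That differs from encodeURIComponent(), which encodes a space as %20. The values below are made up for illustration.

```javascript
const params = new URLSearchParams()
params.set('q', 'tom & jerry = 1')
console.log(params.toString()) // 'q=tom+%26+jerry+%3D+1'

// decoding on the way back in: '+' turns back into a space
const parsed = new URLSearchParams('q=tom+%26+jerry+%3D+1')
console.log(parsed.get('q')) // 'tom & jerry = 1'

// encodeURIComponent uses %20 for spaces instead of '+'
console.log(encodeURIComponent('tom & jerry')) // 'tom%20%26%20jerry'

// a literal '+' in a value is escaped so it survives the round trip
params.set('expr', '1+1')
console.log(params.toString()) // 'q=tom+%26+jerry+%3D+1&expr=1%2B1'
```

Building query strings by hand with string concatenation is where these rules usually get broken; letting URLSearchParams serialize avoids that.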

url.format() and url.resolve()

const url = require('url')

// format - build URL from object (legacy API)
const formatted = url.format({
  protocol: 'https:',
  hostname: 'example.com',
  port: 9090,
  pathname: '/api/status',
  query: { format: 'json' }
})
console.log(formatted) // https://example.com:9090/api/status?format=json

// format with WHATWG URL
const myURL = new URL('https://example.com')
myURL.pathname = '/api'
myURL.searchParams.set('v', '2')
console.log(url.format(myURL, { fragment: false })) // https://example.com/api?v=2

url.format() accepts both legacy and WHATWG URLs. With a legacy URL object (or a plain object of components) it builds a URL string from the parts. With a WHATWG URL it serializes the URL and applies formatting options, such as fragment: false to drop the #hash.
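The same options are handy for keeping credentials out of logs (the password component shown earlier should never be logged). A sketch, assuming you want to log a WHATWG URL without its userinfo or fragment; redactForLog is a hypothetical name, while auth and fragment are options url.format() accepts for WHATWG URL objects.

```javascript
const url = require('url')

// Hypothetical helper: drop credentials and the fragment before logging a URL
function redactForLog(input) {
  return url.format(new URL(input), { auth: false, fragment: false })
}

console.log(redactForLog('https://user:s3cret@api.example.com/v2/users?page=1#top'))
// 'https://api.example.com/v2/users?page=1'
```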

URL Construction and Resolution

const url = require('url')

// url.resolve() - resolves relative URL against base (legacy)
const base = 'https://api.example.com/v2/'
const relative = url.resolve(base, 'users/123')
console.log(relative) // 'https://api.example.com/v2/users/123'

// relative starting with / replaces path
const relativeRoot = url.resolve(base, '/admin')
console.log(relativeRoot) // 'https://api.example.com/admin'

// WHATWG equivalent - new URL with relative
const resolved = new URL('users/123', 'https://api.example.com/v2/')
console.log(resolved.href) // 'https://api.example.com/v2/users/123'

url.resolve(base, relative) resolves a relative URL against an absolute base. The WHATWG equivalent is new URL(relative, base), which also validates the result and throws a TypeError on invalid input instead of returning a best-effort string. A relative path starting with / replaces the entire pathname of the base.
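Two details of relative resolution surprise people: the base's trailing slash decides which "directory" a relative path lands in, and a protocol-relative reference (//host/...) replaces the host entirely, which matters whenever the relative part comes from user input. The examples below use the WHATWG resolver; the hostnames are placeholders.

```javascript
const base = 'https://api.example.com/v2/'

// with a trailing slash, 'v2/' acts as a directory and the path is appended
console.log(new URL('users/123', base).href)
// 'https://api.example.com/v2/users/123'

// without it, 'v2' is treated like a file name and gets replaced
console.log(new URL('users/123', 'https://api.example.com/v2').href)
// 'https://api.example.com/users/123'

// '..' walks up one segment
console.log(new URL('../v1/status', base).href)
// 'https://api.example.com/v1/status'

// '//' starts a new authority: the base host is discarded
console.log(new URL('//evil.example/steal', base).href)
// 'https://evil.example/steal'
```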

Security: URL Parsing Differences

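const url = require('url')

// url.parse() and new URL() are separate parsers. For this input they agree:
// everything before '@' is userinfo, so the host is legit.com
const legacy = url.parse('https://evil.com@legit.com/path')
console.log('legacy hostname:', legacy.hostname) // 'legit.com'
console.log('legacy auth:', legacy.auth)         // 'evil.com'

const whatwg = new URL('https://evil.com@legit.com/path')
console.log('whatwg hostname:', whatwg.hostname) // 'legit.com'
console.log('whatwg username:', whatwg.username) // 'evil.com'
// On malformed or edge-case input the two parsers can disagree, so never
// validate with one parser and make the request with the other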

// Host confusion via Unicode (homograph attack)
const unicodeURL = new URL('https://googlе.com') // Cyrillic 'е' (U+0435) instead of Latin 'e'
console.log('hostname:', unicodeURL.hostname)     // punycode, starts with 'xn--' (not 'google.com')
console.log(unicodeURL.hostname === 'google.com') // false, even though it renders identically

// credentials are not part of the host: the part before '@' is only userinfo
const trick = new URL('https://bank.com@attacker.com/login')
console.log('origin:', trick.origin)     // 'https://attacker.com'
console.log('username:', trick.username) // 'bank.com'
// Reads like bank.com, but the request goes to attacker.com

// DEFENSE - parse once, then compare the canonical hostname exactly
function validateHostname(input, expected) {
  let parsed
  try {
    parsed = new URL(input)
  } catch {
    throw new Error(`invalid URL: ${input}`)
  }
  // hostname is already lowercased and punycode-encoded by the parser,
  // so a Cyrillic look-alike can never equal a plain ASCII expected value
  if (parsed.hostname !== expected) {
    throw new Error(`hostname mismatch: ${parsed.hostname}`)
  }
  return parsed
}

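URL parsing inconsistencies are a major attack vector. url.parse() and new URL() are different parsers and can return different hostnames for the same malformed input, so a check made with one may not match the request made with the other. Punycode homograph attacks use characters that look identical to Latin letters but are different Unicode code points; the WHATWG parser serializes such hostnames in their ASCII xn-- form, so compare that form, not what the user sees.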
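Because the WHATWG parser turns internationalized hostnames into punycode, any non-ASCII label shows up as an xn-- label in hostname. A coarse heuristic, assuming the domains you care about are plain ASCII, is to flag such hostnames and show both forms; legitimate internationalized domains exist, so treat this as a warning signal rather than a block. url.domainToUnicode() is a real Node API; flagPunycodeHost is a hypothetical helper.

```javascript
const url = require('url')

// Hypothetical heuristic: report hostnames that contain punycode labels
function flagPunycodeHost(input) {
  const { hostname } = new URL(input) // already ASCII/punycode-serialized
  const isIDN = hostname.split('.').some((label) => label.startsWith('xn--'))
  return {
    ascii: hostname,
    display: url.domainToUnicode(hostname), // the form a user would see rendered
    isIDN,
  }
}

const report = flagPunycodeHost('https://g\u043Eogle.com/') // Cyrillic 'о' (U+043E)
console.log(report.isIDN)                                 // true
console.log(report.display === 'google.com')              // false, though it looks the same
console.log(flagPunycodeHost('https://google.com/').isIDN) // false
```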

The @ sign in a URL introduces userinfo: user:pass@host. Phishers abuse this: https://legitimate.com@evil.com looks like legitimate.com, but legitimate.com is only the username and the request goes to evil.com. Always validate the hostname after parsing; never trust the user-facing URL string.
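Putting that advice together: parse first, reject URLs that carry credentials, and compare the parsed hostname exactly against an allowlist. Suffix checks such as hostname.endsWith('example.com') are a common mistake because they also accept evilexample.com. A sketch under those assumptions; isAllowedURL and the allowlist contents are hypothetical.

```javascript
// Hypothetical allowlist of exact, lowercase ASCII hostnames
const ALLOWED_HOSTS = new Set(['example.com', 'api.example.com'])

function isAllowedURL(input) {
  let parsed
  try {
    parsed = new URL(input)
  } catch {
    return false // unparseable input is never allowed
  }
  if (parsed.protocol !== 'https:') return false
  // userinfo is legal syntax but has no business in a link you are vetting
  if (parsed.username || parsed.password) return false
  // exact match on the canonical (lowercased, punycode) hostname
  return ALLOWED_HOSTS.has(parsed.hostname)
}

console.log(isAllowedURL('https://api.example.com/v2'))    // true
console.log(isAllowedURL('https://API.Example.com/v2'))    // true: hostname is lowercased
console.log(isAllowedURL('https://example.com@evil.com/')) // false: credentials, and host is evil.com
console.log(isAllowedURL('https://evilexample.com/'))      // false: no suffix matching
```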

Security: SSRF via URL Confusion

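// SSRF - Server-Side Request Forgery via URL
// what if userURL is 'http://127.0.0.1:6379/' (Redis) or
// 'http://169.254.169.254/' (cloud metadata endpoint)?
const dns = require('dns/promises')
const net = require('net')

// "this network", private (RFC 1918), loopback, link-local and
// IPv6 unique-local ranges
const blocked = new net.BlockList()
blocked.addSubnet('0.0.0.0', 8, 'ipv4')
blocked.addSubnet('10.0.0.0', 8, 'ipv4')
blocked.addSubnet('127.0.0.0', 8, 'ipv4')
blocked.addSubnet('169.254.0.0', 16, 'ipv4')
blocked.addSubnet('172.16.0.0', 12, 'ipv4')
blocked.addSubnet('192.168.0.0', 16, 'ipv4')
blocked.addAddress('::', 'ipv6')
blocked.addAddress('::1', 'ipv6')
blocked.addSubnet('fc00::', 7, 'ipv6')
blocked.addSubnet('fe80::', 10, 'ipv6')

async function fetchFromURL(userURL) {
  const target = new URL(userURL)

  // DEFENSE 1 - allowlist protocols before doing any network work
  if (!['http:', 'https:'].includes(target.protocol)) {
    throw new Error('protocol not allowed')
  }

  // DEFENSE 2 - resolve the hostname and check EVERY address it maps to
  // (IPv6 literals keep their brackets in hostname; strip them for lookup)
  const host = target.hostname.replace(/^\[(.*)\]$/, '$1')
  const addresses = await dns.lookup(host, { all: true })
  for (const { address, family } of addresses) {
    if (blocked.check(address, family === 6 ? 'ipv6' : 'ipv4')) {
      throw new Error(`target resolves to private IP: ${address}`)
    }
  }

  // fetch() resolves the name again (a DNS rebinding window) and follows
  // redirects by default; refuse redirects so a 302 to an internal
  // address cannot bypass the check above
  const response = await fetch(target.href, { redirect: 'error' })
  return response.text()
}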

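SSRF lets attackers make your server send requests to internal resources: the AWS metadata endpoint (169.254.169.254), Redis (127.0.0.1:6379), internal Kubernetes services. Validate the resolved IPs, not just the hostname, because a harmless-looking name can resolve to a private address. Even that check leaves a gap: fetch() resolves the name a second time, and a DNS rebinding attack can answer with a public IP for the check and a private one for the request. Closing it means connecting to the exact address you validated. Always restrict protocols: file://, ftp:// and gopher:// can reach things an HTTP-only design never intended.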
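Canonicalizing before checking matters here: the WHATWG host parser accepts decimal, hex and shortened IPv4 forms and serializes all of them as a dotted quad, so a string check on the raw input misses them. naiveBlocks is a deliberately bad, hypothetical filter shown only for contrast.

```javascript
// Deliberately naive, hypothetical filter: string matching on raw input
function naiveBlocks(input) {
  return input.includes('127.0.0.1') || input.includes('localhost')
}

const sneaky = [
  'http://2130706433/', // 127.0.0.1 as one decimal number
  'http://0x7f000001/', // the same value in hex
  'http://127.1/',      // shortened dotted form
]

for (const raw of sneaky) {
  const canonical = new URL(raw) // hostname becomes '127.0.0.1' in every case
  console.log(raw, '->', canonical.hostname,
    '| raw check:', naiveBlocks(raw),                // false: slips through
    '| canonical check:', naiveBlocks(canonical.href)) // true: caught
}
```

Checking the parsed form closes this particular hole, but the resolved-IP check is still what catches hostnames that point at private addresses.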

Summary

  • Use WHATWG new URL() over legacy url.parse() - spec-compliant, less buggy
  • URLSearchParams for query string manipulation - handles encoding
  • url.resolve() and new URL(relative, base) for relative URL resolution
  • URL parsing differences between APIs create security bugs
  • Always validate hostnames (punycode) and resolved IPs (SSRF)
  • Block private IPs, restrict protocols, and canonicalize before using

Prerequisites

next -> web_01_http.md