Core 11 - URL Module¶
Basic Idea¶
URLs are how everything on the web is addressed. Node has two URL APIs: the legacy url.parse() and the WHATWG new URL(). They parse differently, behave differently, and the legacy one gets enough edge cases wrong to be deprecated.
Legacy vs WHATWG - The Two APIs¶
const url = require('url')
// LEGACY API - url.parse() (deprecated but still everywhere)
const legacy = url.parse('https://user:pass@example.com:8080/path?q=1#hash')
console.log(legacy)
// abridged output - the real Url object also has slashes, host, query, path, href
// {
// protocol: 'https:',
// hostname: 'example.com',
// port: '8080',
// pathname: '/path',
// search: '?q=1',
// hash: '#hash',
// auth: 'user:pass'
// }
// WHATWG API - new URL() (modern, spec-compliant)
const whatwg = new URL('https://user:pass@example.com:8080/path?q=1#hash')
console.log(whatwg)
// abridged output - the real URL object also has href, origin, host, searchParams
// {
// protocol: 'https:',
// hostname: 'example.com',
// port: '8080',
// pathname: '/path',
// search: '?q=1',
// hash: '#hash',
// username: 'user',
// password: 'pass'
// }
Use the WHATWG API unless you're maintaining code from 2014. url.parse() is deprecated but not removed - too much legacy code depends on it. The biggest difference: WHATWG follows the browser URL spec, while the legacy API follows Node's own lenient parsing rules.
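One difference that shows up quickly in practice is how the two APIs treat input that is not a complete URL. The sketch below is illustrative only; tryParse is a hypothetical helper name, not part of Node.

```javascript
const url = require('url')

// tryParse is a hypothetical helper: a WHATWG URL on success, null on failure
function tryParse(input, base) {
  try {
    return new URL(input, base)
  } catch {
    return null // new URL() throws a TypeError on input it cannot parse
  }
}

const relativeInput = '/path?q=1'

// legacy: accepts the string and leaves the missing parts as null
const legacyResult = url.parse(relativeInput)
console.log(legacyResult.hostname) // null
console.log(legacyResult.pathname) // '/path'

// WHATWG: a relative string is only valid together with a base
console.log(tryParse(relativeInput)) // null
console.log(tryParse(relativeInput, 'https://example.com').href) // 'https://example.com/path?q=1'
```

The legacy parser never fails, so bad input travels further into the program before anyone notices. The WHATWG constructor forces the decision at the parsing step.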
URL Components¶
const myURL = new URL('https://user:pass@api.example.com:8443/v2/users?page=1&limit=10#section')
console.log('href:', myURL.href) // full URL
console.log('protocol:', myURL.protocol) // https: (with colon)
console.log('hostname:', myURL.hostname) // api.example.com
console.log('port:', myURL.port) // 8443
console.log('host:', myURL.host) // api.example.com:8443
console.log('pathname:', myURL.pathname) // /v2/users
console.log('search:', myURL.search) // ?page=1&limit=10
console.log('hash:', myURL.hash) // #section
console.log('username:', myURL.username) // user
console.log('password:', myURL.password) // pass (don't log this in production)
console.log('origin:', myURL.origin) // https://api.example.com:8443
// URL is mutable - you can change components
myURL.pathname = '/v3/products'
myURL.searchParams.set('sort', 'asc')
console.log(myURL.href) // https://user:pass@api.example.com:8443/v3/products?page=1&limit=10&sort=asc#section
The WHATWG URL object is mutable - you can reassign components or modify searchParams. The origin property is derived from protocol + host - useful for CORS checks. host includes the port, hostname doesn't - a common source of confusion.
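A related detail: the scheme's default port is dropped during parsing, so port can be an empty string even when the input spelled the port out. The following sketch shows that, plus a simple origin comparison. sameOrigin is a hypothetical helper name used only for illustration.

```javascript
// the default port for the scheme is dropped during parsing
const withDefaultPort = new URL('https://example.com:443/a')
console.log(withDefaultPort.port) // ''
console.log(withDefaultPort.host) // 'example.com'

// sameOrigin is a hypothetical helper: compares scheme + host + port only
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin
}

console.log(sameOrigin('https://example.com:443/a', 'https://example.com/b')) // true
console.log(sameOrigin('https://example.com/a', 'http://example.com/a')) // false
console.log(sameOrigin('https://example.com/a', 'https://example.com:8443/a')) // false
```

Comparing origin rather than raw strings means differences in path, query, or an explicit default port do not produce false mismatches.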
URLSearchParams¶
const myURL = new URL('https://example.com/api?name=mahmoud&age=25&active=true')
// get - first value
console.log(myURL.searchParams.get('name')) // mahmoud
// getAll - all values (duplicate keys)
myURL.searchParams.append('tag', 'admin')
myURL.searchParams.append('tag', 'dev')
console.log(myURL.searchParams.getAll('tag')) // ['admin', 'dev']
// has - existence check
console.log(myURL.searchParams.has('age')) // true
// set - overwrites existing
myURL.searchParams.set('active', 'false')
// delete - removes key
myURL.searchParams.delete('age')
// keys, values, entries - iterables
for (const [key, value] of myURL.searchParams) {
  console.log(key, '=', value)
}
// toString - serializes to query string
console.log(myURL.searchParams.toString()) // 'name=mahmoud&active=false&tag=admin&tag=dev'
// working directly with a query string (no URL object needed)
const params = new URLSearchParams('?q=search&page=2')
params.set('page', '3')
console.log(params.toString()) // 'q=search&page=3'
URLSearchParams handles encoding/decoding automatically. Duplicate keys are allowed and common (e.g., ?tag=admin&tag=dev). toString() produces a query string without the leading ?.
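To make the automatic encoding concrete, here is a small sketch with a value that contains characters that are special in a query string. The variable names and the example.com URL are placeholders.

```javascript
// a value containing a space, '&' and '=' is escaped on the way out
const params = new URLSearchParams()
params.set('q', 'a b&c=d')
console.log(params.toString()) // 'q=a+b%26c%3Dd'
console.log(params.get('q')) // 'a b&c=d' (decoded on the way back in)

// building from a plain object
const fromObject = new URLSearchParams({ q: 'node url', page: '2' })
console.log(fromObject.toString()) // 'q=node+url&page=2'

// attaching the result to a URL
const target = new URL('https://example.com/search')
target.search = fromObject.toString()
console.log(target.href) // 'https://example.com/search?q=node+url&page=2'
```

Note that spaces serialize as + rather than %20, because URLSearchParams uses form-style (application/x-www-form-urlencoded) encoding.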
url.format() and url.resolve()¶
const url = require('url')
// format - build URL from object (legacy API)
const formatted = url.format({
protocol: 'https:',
hostname: 'example.com',
port: 9090,
pathname: '/api/status',
query: { format: 'json' }
})
console.log(formatted) // https://example.com:9090/api/status?format=json
// format with WHATWG URL
const myURL = new URL('https://example.com')
myURL.pathname = '/api'
myURL.searchParams.set('v', '2')
console.log(url.format(myURL, { fragment: false })) // https://example.com/api?v=2
url.format() handles both legacy and WHATWG URLs. With legacy URL objects/options, it builds URLs from components. With WHATWG URLs, it applies formatting options (like suppressing the fragment).
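One place the WHATWG formatting options are handy is producing a URL that is safe to write to a log. The sketch below assumes credentials, query string, and fragment should all be dropped; redactForLog is a hypothetical helper name.

```javascript
const url = require('url')

// redactForLog is a hypothetical helper: strips userinfo, query and fragment
function redactForLog(input) {
  return url.format(new URL(input), {
    auth: false, // drop user:pass@
    search: false, // drop ?query
    fragment: false // drop #hash
  })
}

console.log(redactForLog('https://user:pass@example.com:8080/path?token=abc#frag'))
// 'https://example.com:8080/path'
```

Whether the query string belongs in logs depends on the application; it is dropped here on the assumption that it may carry tokens.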
URL Construction and Resolution¶
const url = require('url')
// url.resolve() - resolves relative URL against base (legacy)
const base = 'https://api.example.com/v2/'
const relative = url.resolve(base, 'users/123')
console.log(relative) // 'https://api.example.com/v2/users/123'
// relative starting with / replaces path
const relativeRoot = url.resolve(base, '/admin')
console.log(relativeRoot) // 'https://api.example.com/admin'
// WHATWG equivalent - new URL with relative
const resolved = new URL('users/123', 'https://api.example.com/v2/')
console.log(resolved.href) // 'https://api.example.com/v2/users/123'
url.resolve(base, relative) resolves relative URLs against an absolute base. The WHATWG new URL(relative, base) does the same thing using the spec's resolution rules, and throws a TypeError if the result is not a valid URL. A relative path starting with / replaces the entire pathname of the base.
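A few resolution cases are easy to get wrong, in particular the trailing slash on the base. The sketch below walks through them with new URL(relative, base); the hostnames are placeholders.

```javascript
// trailing slash on the base: the relative path is appended
const withSlash = new URL('users/123', 'https://api.example.com/v2/')
console.log(withSlash.href) // 'https://api.example.com/v2/users/123'

// no trailing slash: the last segment ('v2') is replaced
const withoutSlash = new URL('users/123', 'https://api.example.com/v2')
console.log(withoutSlash.href) // 'https://api.example.com/users/123'

// '..' climbs one segment
const parent = new URL('../v1/users', 'https://api.example.com/v2/')
console.log(parent.href) // 'https://api.example.com/v1/users'

// an absolute first argument ignores the base entirely
const absolute = new URL('https://other.example/x', 'https://api.example.com/v2/')
console.log(absolute.href) // 'https://other.example/x'
```

The last case matters when the first argument comes from user input: passing a base does not pin the result to that host, so the hostname still needs checking afterwards.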
Security: URL Parsing Differences¶
const url = require('url')
// For well-formed input, url.parse() and new URL() agree on the hostname
const legacy = url.parse('https://evil.com@legit.com/path')
console.log('legacy hostname:', legacy.hostname) // 'legit.com' ('evil.com' is the username)
const whatwg = new URL('https://evil.com@legit.com/path')
console.log('whatwg hostname:', whatwg.hostname) // 'legit.com' (same)
// Both say legit.com: everything before the @ is userinfo, not the host
// The parsers diverge on unusual or malformed input, where url.parse() is lenient and new URL() follows the spec
// Host confusion via unicode
const unicodeURL = new URL('https://googlе.com') // Cyrillic 'е' instead of latin 'e'
console.log('hostname:', unicodeURL.hostname) // 'xn--googl-3we.com' (punycode, not google.com)
console.log('href:', unicodeURL.href) // 'https://xn--googl-3we.com/'
// userinfo is dropped from origin
const withAuth = new URL('https://attacker:password@bank.com')
console.log('origin:', withAuth.origin) // 'https://bank.com'
// This request really does go to bank.com - 'attacker:password' is only the userinfo
// The phishing trick is the reverse: 'https://bank.com@attacker.com' goes to attacker.com
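// DEFENSE - validate hostnames explicitly
function validateHostname(hostname, expected) {
  // new URL() normalizes the host to lowercase ASCII (punycode),
  // so a homograph no longer compares equal to the real name
  let normalized
  try {
    normalized = new URL(`https://${hostname}/`).hostname
  } catch (err) {
    throw new Error(`invalid hostname: ${hostname}`)
  }
  // compare outside the try block so a mismatch is not re-reported as "invalid hostname"
  if (normalized !== expected) {
    throw new Error(`hostname mismatch: ${normalized}`)
  }
}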
URL parsing inconsistencies are a major attack vector. url.parse() and new URL() can return different hostnames for the same input. Punycode homograph attacks rely on characters that look identical but are different Unicode code points.
The @ sign in URLs creates auth context: user:pass@host. Phishers abuse this: https://legitimate.com@evil.com looks like legitimate.com but goes to evil.com. Always validate the hostname after parsing - never trust the user-facing URL string.
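As a rough illustration of validating the hostname after parsing, the sketch below compares a string-prefix check with an allowlist lookup on the parsed hostname. ALLOWED_HOSTS and isAllowedHost are hypothetical names, and the allowlist contents are placeholders.

```javascript
// hypothetical allowlist of exact hostnames
const ALLOWED_HOSTS = new Set(['legit.com', 'api.legit.com'])

// isAllowedHost is a hypothetical helper: parse first, then compare the hostname exactly
function isAllowedHost(input) {
  let parsed
  try {
    parsed = new URL(input)
  } catch {
    return false // unparseable input is never allowed
  }
  return ALLOWED_HOSTS.has(parsed.hostname)
}

const tricky = 'https://legit.com@evil.com/'
console.log(tricky.startsWith('https://legit.com')) // true - the string check is fooled
console.log(isAllowedHost(tricky)) // false - the parsed hostname is 'evil.com'
console.log(isAllowedHost('https://legit.com.evil.com/')) // false
console.log(isAllowedHost('https://LEGIT.com/path')) // true - hostname is lowercased by the parser
console.log(isAllowedHost('not a url')) // false
```

Exact matching against a set is deliberately strict. Suffix checks such as endsWith('legit.com') would also accept a host like notlegit.com.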
Security: SSRF via URL Confusion¶
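// SSRF - Server-Side Request Forgery via URL
const net = require('net')
const dns = require('dns/promises')

// IPv4 ranges that must never be reachable from a user-supplied URL
const isPrivateIPv4 = (ip) => {
  const parts = ip.split('.').map(Number)
  return parts[0] === 0 || // "this" network, includes 0.0.0.0
    parts[0] === 10 ||
    parts[0] === 127 || // loopback
    (parts[0] === 172 && parts[1] >= 16 && parts[1] <= 31) ||
    (parts[0] === 192 && parts[1] === 168) ||
    (parts[0] === 169 && parts[1] === 254) // link-local, includes the cloud metadata endpoint
}

async function fetchFromURL(userURL) {
  // what if userURL is 'http://127.0.0.1:6379/' (Redis) or
  // 'http://169.254.169.254/' (AWS metadata endpoint)?
  const target = new URL(userURL)
  // DEFENSE 1 - allowlist protocols (cheap check, so do it first)
  if (!['http:', 'https:'].includes(target.protocol)) {
    throw new Error('protocol not allowed')
  }
  // DEFENSE 2 - block private IPs
  // resolve the hostname to every address it maps to
  const addresses = await dns.lookup(target.hostname, { all: true })
  for (const { address } of addresses) {
    // this check only understands IPv4, so anything else is rejected outright
    if (net.isIP(address) !== 4 || isPrivateIPv4(address)) {
      throw new Error('target resolves to private or unsupported IP')
    }
  }
  // NOT fully safe yet: fetch() resolves the hostname again and follows redirects,
  // so DNS rebinding or a redirect can still reach an internal address
  const response = await fetch(target.href)
  return response.text()
}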
SSRF lets attackers make your server request internal resources: the AWS metadata endpoint (169.254.169.254), Redis (127.0.0.1:6379), internal Kubernetes services. Always validate resolved IPs, not just hostnames - DNS rebinding bypasses hostname checks. Always restrict protocols - file://, ftp://, gopher:// can be dangerous.
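Hand-written octet comparisons are easy to get subtly wrong and do not cover IPv6. One possible alternative, sketched below, is Node's built-in net.BlockList. The range list is an illustrative starting point rather than a complete inventory of reserved ranges, and isBlockedAddress is a hypothetical helper name.

```javascript
const net = require('net')

// illustrative (not exhaustive) set of internal ranges
const blocked = new net.BlockList()
blocked.addSubnet('0.0.0.0', 8, 'ipv4') // "this" network
blocked.addSubnet('10.0.0.0', 8, 'ipv4') // private
blocked.addSubnet('127.0.0.0', 8, 'ipv4') // loopback
blocked.addSubnet('169.254.0.0', 16, 'ipv4') // link-local, cloud metadata
blocked.addSubnet('172.16.0.0', 12, 'ipv4') // private
blocked.addSubnet('192.168.0.0', 16, 'ipv4') // private
blocked.addAddress('::1', 'ipv6') // IPv6 loopback
blocked.addSubnet('fc00::', 7, 'ipv6') // IPv6 unique local
blocked.addSubnet('fe80::', 10, 'ipv6') // IPv6 link-local

// isBlockedAddress is a hypothetical helper: true means "do not connect"
function isBlockedAddress(address) {
  const family = net.isIP(address) // 4, 6, or 0 when not an IP literal
  if (family === 0) return true // refuse anything that is not a plain IP
  return blocked.check(address, family === 6 ? 'ipv6' : 'ipv4')
}

console.log(isBlockedAddress('169.254.169.254')) // true
console.log(isBlockedAddress('172.31.255.255')) // true
console.log(isBlockedAddress('172.32.0.1')) // false
console.log(isBlockedAddress('::1')) // true
console.log(isBlockedAddress('8.8.8.8')) // false
```

The helper is meant to run on each address returned by the DNS lookup, not on the hostname string, and it fails closed on anything it does not recognize.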
Summary¶
- Use WHATWG new URL() over legacy url.parse() - spec-compliant, less buggy
- URLSearchParams for query string manipulation - handles encoding
- url.resolve() and new URL(relative, base) for relative URL resolution
- URL parsing differences between APIs create security bugs
- Always validate hostnames (punycode) and resolved IPs (SSRF)
- Block private IPs, restrict protocols, and canonicalize before using
Prerequisites¶
next -> web_01_http.md