NoSQL Databases
Not every workload needs ACID transactions and rigid schemas. NoSQL databases trade consistency guarantees for scale, flexibility, or speed, but those tradeoffs bite if you pick the wrong one for the job.
MongoDB with Mongoose
Mongoose adds schema validation on top of MongoDB's flexible document model:
const mongoose = require('mongoose')

const userSchema = new mongoose.Schema({
  email: {
    type: String,
    required: true,
    unique: true, // also creates a unique index on email
    lowercase: true,
    match: /^\S+@\S+\.\S+$/
  },
  password: { type: String, required: true }, // store a hash, never plaintext
  role: {
    type: String,
    enum: ['user', 'admin', 'moderator'],
    default: 'user'
  },
  profile: {
    firstName: String,
    lastName: String,
    avatar: String
  },
  createdAt: { type: Date, default: Date.now }
})

// Text index for name search (unique: true above already indexes email,
// so a second index on email would be a duplicate)
userSchema.index({ 'profile.firstName': 'text', 'profile.lastName': 'text' })

const User = mongoose.model('User', userSchema)

// Create (hashedPassword is assumed to come from your password-hashing step)
const user = await User.create({
  email: 'test@example.com',
  password: hashedPassword,
  profile: { firstName: 'Mahmoud' }
})

// Read
const found = await User.findOne({ email: 'test@example.com' }).lean()

// Update
await User.findByIdAndUpdate(user._id, { 'profile.lastName': 'Ali' })

// Delete
await User.findByIdAndDelete(user._id)
.lean() returns plain JavaScript objects instead of Mongoose documents. Without lean, every query hydrates each result into a full Mongoose document instance, which is expensive for read-heavy endpoints. The tradeoff is that lean results have no virtuals, getters, or save().
Queries and Aggregation
// Basic queries
const activeUsers = await User.find({
  role: 'user',
  createdAt: { $gte: new Date('2025-01-01') }
}).sort({ createdAt: -1 }).limit(10)

// Aggregation pipeline - powerful but complex
const stats = await User.aggregate([
  { $match: { role: 'user' } },
  { $group: {
    _id: { $dateToString: { format: '%Y-%m', date: '$createdAt' } },
    count: { $sum: 1 },
    // Assumes documents carry a numeric `age` field; the schema above
    // does not define one, so avgAge is null unless you add it
    avgAge: { $avg: '$age' }
  }},
  { $sort: { _id: -1 } }
])

// Text search (uses the text index on profile.firstName / profile.lastName)
const results = await User.find(
  { $text: { $search: 'mahmoud ali' } },
  { score: { $meta: 'textScore' } }
).sort({ score: { $meta: 'textScore' } })
Aggregation pipelines look like JSON, but they're full programs. A poorly written aggregation with no supporting index can scan millions of documents and kill performance, so put $match first and make sure it hits an index.
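To make the "full program" point concrete, here is roughly what the $match, $group, and $sort stages compute, written as plain JavaScript over an in-memory array. This is only an illustration of the logic, not how MongoDB executes it; the function name and sample data are made up, and the avgAge accumulator is left out for brevity.

```javascript
// Plain-JavaScript equivalent of the pipeline's logic.
// signupsByMonth and sampleUsers are hypothetical, for illustration only.
function signupsByMonth(users) {
  const buckets = new Map()
  for (const u of users) {
    if (u.role !== 'user') continue                      // $match
    const month = u.createdAt.toISOString().slice(0, 7)  // $dateToString '%Y-%m' (UTC)
    buckets.set(month, (buckets.get(month) || 0) + 1)    // $group with $sum: 1
  }
  return [...buckets]
    .map(([month, count]) => ({ _id: month, count }))
    .sort((a, b) => (a._id < b._id ? 1 : -1))            // $sort: { _id: -1 }
}

const sampleUsers = [
  { role: 'user', createdAt: new Date('2025-01-15T00:00:00Z') },
  { role: 'user', createdAt: new Date('2025-01-20T00:00:00Z') },
  { role: 'admin', createdAt: new Date('2025-01-21T00:00:00Z') },
  { role: 'user', createdAt: new Date('2025-02-03T00:00:00Z') }
]

console.log(signupsByMonth(sampleUsers))
// [ { _id: '2025-02', count: 1 }, { _id: '2025-01', count: 2 } ]
```

Every loop iteration here corresponds to a document the database has to read. An index on role lets $match skip documents before they reach $group; without one, the loop runs over the whole collection.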
Redis with ioredis
Redis is an in-memory data store, not a primary database:
const Redis = require('ioredis')
const redis = new Redis()

// String
await redis.set('user:123', JSON.stringify(userData), 'EX', 3600)
const cached = await redis.get('user:123')

// Hash - store object fields separately
await redis.hset('session:abc', {
  userId: 123,
  role: 'admin',
  ip: '192.168.1.1'
})
const session = await redis.hgetall('session:abc')

// List - ordered collection
await redis.lpush('recent:views:123', 'post:456')
await redis.ltrim('recent:views:123', 0, 9) // keep last 10

// Set - unique members
await redis.sadd('online:users', 'user:123', 'user:456')
const onlineCount = await redis.scard('online:users')

// Sorted Set - leaderboard / score-based
await redis.zadd('leaderboard', 100, 'user:123', 85, 'user:456')
const top10 = await redis.zrevrange('leaderboard', 0, 9, 'WITHSCORES')
Individual Redis commands are atomic, which is why Redis is a good fit for counters and rate limiting. But atomic doesn't mean durable: Redis without persistence (RDB snapshots or AOF) loses all data on restart.
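As a sketch of why atomic commands matter for rate limiting, here is a minimal fixed-window limiter built on INCR and EXPIRE. It takes the client as a parameter and assumes only that it exposes incr(key) and expire(key, seconds) returning promises, as ioredis does. The function name, key format, limit, and window are illustrative assumptions.

```javascript
// Fixed-window rate limiter sketch.
// isAllowed, the key format, and the default limit are hypothetical.
async function isAllowed(redis, clientId, limit = 100, windowSeconds = 60) {
  // All requests in the same time window share one counter key
  const windowIndex = Math.floor(Date.now() / 1000 / windowSeconds)
  const key = `ratelimit:${clientId}:${windowIndex}`

  // INCR is atomic: two concurrent requests can never get the same count
  const count = await redis.incr(key)
  if (count === 1) {
    // First hit in this window: let the counter clean itself up
    await redis.expire(key, windowSeconds)
  }
  return count <= limit
}
```

Usage would look like `if (!(await isAllowed(redis, req.ip))) return res.status(429).end()`, assuming an Express-style handler. INCR and EXPIRE are still two separate commands, so a crash between them can leave a counter without a TTL; because the window index is part of the key, that only leaks a stale key rather than blocking the client forever.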
Pub/Sub with Redis
// Publisher
const Redis = require('ioredis')
const publisher = new Redis()

async function notifyUser(userId, event) {
  await publisher.publish('user:notifications', JSON.stringify({
    userId,
    event,
    timestamp: Date.now()
  }))
}

// Subscriber (separate connection)
const subscriber = new Redis()
subscriber.subscribe('user:notifications')
subscriber.on('message', (channel, message) => {
  const data = JSON.parse(message)
  console.log('Notification for user:', data.userId)
  // Send WebSocket message, email, push notification
})
Redis pub/sub has no message persistence. If the subscriber disconnects, it misses every message sent while it was gone. Use Redis Streams when you need reliable messaging.
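For comparison, a minimal sketch of the same notification flow on Redis Streams. Entries stay in the stream after they are written, so a consumer that reconnects can resume from the last ID it processed. The functions take the client as a parameter and assume the ioredis call style for xadd and xread; the stream name and function names are illustrative assumptions.

```javascript
// Redis Streams sketch. STREAM, publishEvent, and readSince are hypothetical names.
const STREAM = 'user:notifications:stream'

async function publishEvent(redis, userId, event) {
  // '*' asks Redis to assign the entry ID; the call resolves to that ID
  return redis.xadd(STREAM, '*', 'userId', String(userId), 'event', event)
}

async function readSince(redis, lastId = '0') {
  // XREAD returns entries with an ID greater than lastId, or null if none
  const reply = await redis.xread('COUNT', 100, 'STREAMS', STREAM, lastId)
  if (!reply) return []

  // Reply shape: [[streamName, [[id, [field, value, ...]], ...]]]
  const [, entries] = reply[0]
  return entries.map(([id, fields]) => {
    const data = {}
    for (let i = 0; i < fields.length; i += 2) {
      data[fields[i]] = fields[i + 1]
    }
    return { id, ...data }
  })
}
```

A consumer would persist the id of the last entry it handled and pass it to readSince after a restart. Consumer groups (XREADGROUP with XACK) add per-consumer acknowledgement on top of this, and the stream should be trimmed so it does not grow without bound.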
Caching with Redis
async function getUser(id) {
  const cacheKey = `user:${id}`

  // Cache-aside: check cache first
  const cached = await redis.get(cacheKey)
  if (cached) {
    return JSON.parse(cached)
  }

  // Miss - query database (lean() returns a plain object, cheaper to serialize)
  const user = await User.findById(id).lean()

  // Write to cache, but don't cache a missing user
  if (user) {
    await redis.set(cacheKey, JSON.stringify(user), 'EX', 3600)
  }
  return user
}

// Invalidate on update
async function updateUser(id, data) {
  await User.findByIdAndUpdate(id, data)
  await redis.del(`user:${id}`) // invalidate cache
}
Cache invalidation is one of the hard problems in computer science. Just delete the cache key on updates and let the next read repopulate it; don't try to keep the cached copy in sync.
Elasticsearch Basics
const { Client } = require('@elastic/elasticsearch')
const es = new Client({ node: 'http://localhost:9200' })

// Index a document
await es.index({
  index: 'logs',
  body: {
    timestamp: new Date(),
    level: 'ERROR',
    message: 'Database connection failed',
    service: 'api-server'
  }
})

// Search
const { hits } = await es.search({
  index: 'logs',
  body: {
    query: {
      bool: {
        must: [
          { match: { level: 'ERROR' } },
          { range: { timestamp: { gte: 'now-1h' } } }
        ]
      }
    },
    aggs: {
      by_service: { terms: { field: 'service.keyword' } }
    }
  }
})
Elasticsearch is great for full-text search and log aggregation. Never use it as your primary database: it has no transactions and weak consistency guarantees (newly indexed documents only become searchable after the next index refresh).
NoSQL Injection
MongoDB is vulnerable to injection when you use $where or pass unvalidated query operators:
// VULNERABLE - $where injection
User.find({ $where: `this.email === '${email}'` })
// If email = "' || true || '"
// Query becomes: $where: "this.email === '' || true || ''"
// Returns ALL users

// VULNERABLE - query operator injection
User.find({ email: userInput })
// If userInput = { $gt: '' } (for example, from a JSON request body)
// Matches every user whose email > '' (basically every user)
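Mongoose schemas help because they cast plain values to the declared type, but they do not strip query operators: an object such as { $gt: '' } is still treated as an operator unless you validate the input or enable Mongoose's sanitizeFilter option. Raw MongoDB queries with user input are a script-kiddie's playground, so validate everything.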
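// SAFE - validate the input type BEFORE it reaches the query
if (typeof req.body.email !== 'string') {
  throw new Error('email must be a string')
}
const user = await User.findOne({ email: req.body.email })

// SAFER - also let Mongoose neutralize operator objects in filters
// (sanitizeFilter is available in Mongoose 6+; it wraps them in $eq)
mongoose.set('sanitizeFilter', true)

// NOT ENOUGH on its own - a typed schema
// Without the checks above, { $gt: '' } still passes through as an operator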
Prevent NoSQL injection by:
* Never passing user input directly into $where, $gte, $regex, $gt
* Validating that query parameters are the expected type
* Using Mongoose schemas instead of the raw MongoDB driver
* Sanitizing user input before building queries
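The checks in the list above can be sketched as a pair of small helpers that run before any filter is built. They are plain JavaScript with no database dependency; the helper names are hypothetical and not part of Mongoose or the MongoDB driver.

```javascript
// Input validation sketch. containsOperator, requireString, and
// buildEmailFilter are hypothetical helper names.

// True if any key at any depth starts with '$' (a MongoDB operator)
function containsOperator(value) {
  if (value === null || typeof value !== 'object') return false
  return Object.entries(value).some(
    ([key, nested]) => key.startsWith('$') || containsOperator(nested)
  )
}

// Throws unless the value is a plain string
function requireString(value, field) {
  if (typeof value !== 'string') {
    throw new TypeError(`${field} must be a string`)
  }
  return value
}

// Build the filter only from validated values
function buildEmailFilter(body) {
  return { email: requireString(body.email, 'email') }
}

console.log(containsOperator({ email: { $gt: '' } }))        // true
console.log(containsOperator({ email: 'test@example.com' })) // false
```

Type checks like requireString are the stricter option when you know exactly what a field should be. A recursive scan like containsOperator is better suited as a blanket guard on whole request bodies.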
Prerequisites
- db_02_sql.md - understand SQL before comparing NoSQL tradeoffs
next -> db_04_orms.md