MongoDB Intro
Schemaless sounds like freedom until you're debugging why half your documents have phoneNumber and the other half have phone_number, and your aggregation pipeline silently drops 30% of records because of a field name mismatch you introduced six months ago during a "quick refactor" you swore you'd clean up later
MongoDB is a document-oriented NoSQL database that stores data as JSON-like documents (BSON internally) in collections instead of rows in tables. Writes to a single document have always been atomic, and multi-document ACID transactions arrived in 4.0 for replica sets and 4.2 for sharded clusters, but the design still favors horizontal scalability, flexible schemas, and query patterns that map naturally to how your application code already structures data as objects and arrays
The document model
A MongoDB document is a JSON-like object stored as BSON (Binary JSON), which supports more data types than plain JSON: dates, ObjectIds, binary data, regular expressions, 64-bit integers, and 128-bit decimals. Geospatial data has no dedicated type; it is stored as ordinary GeoJSON objects or coordinate pairs and indexed specially. Each document lives in a collection (analogous to a table) and documents in the same collection can have different fields - that's the "schemaless" part
// A MongoDB document - this is exactly what you'd store
{
_id: ObjectId("507f1f77bcf86cd799439011"),
email: "omar@example.com",
username: "omar_hacker",
profile: {
displayName: "Omar the Tester",
avatar: "https://cdn.example.com/avatars/omar.jpg",
bio: "Breaking things since 2019"
},
roles: ["user", "beta_tester"],
metadata: {
signupIp: "192.168.1.100",
userAgent: "Mozilla/5.0...",
referralCode: "ALI2024"
},
createdAt: ISODate("2024-01-15T10:30:00Z"),
lastLogin: ISODate("2024-06-20T14:22:00Z")
}
Notice that the document is self-contained - the profile is nested, roles are an array, metadata is a sub-document. This is the document model's strength: related data lives together instead of being scattered across tables that have to be JOINed. The tradeoff is data duplication and harder cross-document queries
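Nested fields like these are addressed with dot notation in queries, for example `db.users.find({ "profile.displayName": "Omar the Tester" })`. The sketch below mimics that lookup on a plain JavaScript object so you can see how a path walks the document. `getPath` is a hypothetical helper, not part of MongoDB, and it ignores the way real dot notation fans out across arrays of sub-documents:

```javascript
// Simplified sketch: resolve a dot-notation path against a plain object.
// Hypothetical helper; real MongoDB path matching also descends into
// arrays of sub-documents, which this does not attempt.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const user = {
  email: "omar@example.com",
  profile: { displayName: "Omar the Tester" },
  roles: ["user", "beta_tester"],
};

console.log(getPath(user, "profile.displayName")); // "Omar the Tester"
console.log(getPath(user, "roles.1"));             // "beta_tester"
console.log(getPath(user, "metadata.signupIp"));   // undefined (no such branch)
```

A missing branch yields undefined rather than an error, which is also why a misspelled field name in a query tends to return nothing instead of failing loudly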
Installation
# Ubuntu / Debian - using the official MongoDB repo
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] \
https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | \
sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt update && sudo apt install -y mongodb-org
# The package does not start the service on its own
sudo systemctl start mongod
# macOS
brew tap mongodb/brew
brew install mongodb-community@7.0
brew services start mongodb-community@7.0
# Verify
mongosh --eval "db.version()"
# 7.0.x
mongosh - the shell
mongosh is the MongoDB Shell (replacing the old mongo CLI) and it's where you'll spend most of your direct database interaction time
# From your terminal: connect to a database
mongosh "mongodb://localhost:27017/mydb"
# Or connect with authentication
mongosh "mongodb://admin:password@localhost:27017/mydb?authSource=admin"
// Inside mongosh: show databases
show dbs
// Switch to a database (it is only created on the first write)
use mydb
// Show collections (tables)
show collections
// Basic commands in the shell
db.version() // Server version
db.stats() // Database statistics
db.serverStatus() // Server status (lots of info)
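One detail the authenticated connection string above glosses over: characters such as `@`, `:` and `/` in a username or password have to be percent-encoded, otherwise the URI is parsed incorrectly. A minimal sketch of assembling such a string follows. `buildMongoUri` and its parameter names are hypothetical, and the defaults simply mirror the example above:

```javascript
// Hypothetical helper: assemble a connection string with percent-encoded
// credentials. Host, port, and authSource defaults mirror the example above.
function buildMongoUri({ user, password, host = "localhost", port = 27017, db, authSource = "admin" }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${credentials}@${host}:${port}/${db}?authSource=${encodeURIComponent(authSource)}`;
}

console.log(buildMongoUri({ user: "admin", password: "p@ss:w/rd", db: "mydb" }));
// mongodb://admin:p%40ss%3Aw%2Frd@localhost:27017/mydb?authSource=admin
```

The result can be passed to mongosh in quotes exactly like the strings above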
Databases, collections, documents
In MongoDB, databases contain collections and collections contain documents. Unlike SQL where you CREATE TABLE with a predefined schema, MongoDB creates a collection implicitly when you insert the first document into it - which is convenient for prototyping and dangerous for production because a typo in your collection name creates a new collection instead of failing with "table not found"
// Switch to a database (it is only created on the first write)
use blog
// Insert a document - creates both the 'posts' collection and the document
db.posts.insertOne({
title: "MongoDB Security Best Practices",
content: "In this post...",
author: "omar_hacker",
tags: ["mongodb", "security", "nosql"],
published: true,
views: 0,
createdAt: new Date()
});
// View collections
show collections
// posts
// Find what we just inserted
db.posts.findOne()
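Because a misspelled collection name quietly creates a new collection, scripts that write to production can check that the target exists first. The guard below is a sketch under that assumption. `existingCollection` is a hypothetical name, while `db.getCollectionNames()` and `db.getCollection()` are the regular mongosh database methods:

```javascript
// Hypothetical guard: return a collection handle only if the collection
// already exists. `db` is the mongosh database object (or anything that
// exposes getCollectionNames() and getCollection()).
function existingCollection(db, name) {
  if (!db.getCollectionNames().includes(name)) {
    throw new Error(`Collection "${name}" does not exist; refusing to create it implicitly`);
  }
  return db.getCollection(name);
}

// In mongosh, after the insert above:
//   existingCollection(db, "posts").findOne()      // returns the post
//   existingCollection(db, "post").insertOne({})   // throws instead of creating "post"
```

Listing collection names costs a round trip, so this fits migration and admin scripts better than hot request paths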
JSON vs BSON
MongoDB stores documents as BSON internally but exposes them as JSON-like structures to the application. The difference matters for data types:
// JSON types: string, number, boolean, null, object, array
// BSON adds: ObjectId, Date, Binary, Regex, Code, Timestamp
// BSON-specific types in shell
ObjectId() // 12-byte unique identifier
ISODate("2024-06-20") // Date/time
BinData(0, "aGVsbG8=") // Binary data (subtype 0, base64 payload)
new RegExp("pattern", "i") // Regular expression
// ObjectId structure:
// 4 bytes: timestamp (creation time)
// 5 bytes: random value generated once per process
// 3 bytes: incrementing counter (starts at a random value)
// Example: 507f1f77bcf86cd799439011
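To see why the extra types matter, it helps to watch what plain JSON does to a date. The snippet below runs in Node or mongosh and uses only built-in JavaScript. The document fields are borrowed from the earlier examples:

```javascript
// Round-trip a document through plain JSON and inspect what survives.
const original = {
  title: "MongoDB Security Best Practices",
  createdAt: new Date("2024-06-20T14:22:00Z"),
};
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.createdAt instanceof Date); // true
console.log(typeof roundTripped.createdAt);      // "string"
console.log(roundTripped.createdAt);             // "2024-06-20T14:22:00.000Z"
```

After the round trip the date is just text. BSON keeps it typed, which is what date operators and TTL indexes rely on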
The _id field is required for every document. If you don't provide one, MongoDB generates an ObjectId automatically. You can use any unique value as _id (UUIDs, integers, strings) but ObjectIds are optimized for distributed generation without collisions
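Since the first four bytes of an ObjectId are a timestamp, the creation time can be read straight out of the id. mongosh already offers `ObjectId("...").getTimestamp()` for this. The sketch below does the same by hand on the hex string, with `objectIdToDate` being a hypothetical helper name:

```javascript
// Decode the creation time embedded in an ObjectId's first 4 bytes.
// Takes the 24-character hex string form; hypothetical helper name.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // big-endian seconds since the Unix epoch
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("507f1f77bcf86cd799439011").toISOString());
// 2012-10-17T21:13:27.000Z
```

This also means default _id values leak creation times to anyone who sees them, which is worth knowing before exposing them in URLs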
Collection creation options
While MongoDB creates collections automatically on insert, you should explicitly create collections with validation rules and options for production
// Create a collection with schema validation
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "username", "passwordHash", "createdAt"],
properties: {
email: {
bsonType: "string",
pattern: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
},
username: {
bsonType: "string",
minLength: 3,
maxLength: 30
},
role: {
enum: ["user", "admin", "moderator"]
},
isActive: {
bsonType: "bool"
}
}
}
},
validationAction: "error" // Reject documents that don't match
});
Schema validation is optional in MongoDB but you should always use it in production. "Schemaless" doesn't mean "structureless" - it means the database doesn't enforce it for you unless you explicitly configure it, and relying on application-level validation alone means every microservice interprets the schema differently until your data is a mess of competing formats
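Before adding a validator to an existing collection, it is worth measuring how far the data has already drifted. A rough way to do that is to count which top-level field names actually occur in a sample of documents. `fieldCounts` below is a hypothetical helper that works on any array of plain objects, such as the result of `db.users.find().limit(1000).toArray()` in mongosh:

```javascript
// Count how often each top-level field name occurs across a set of documents.
// Hypothetical helper; `docs` is any array of plain objects.
function fieldCounts(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return Object.fromEntries(counts);
}

const sample = [
  { email: "a@example.com", phoneNumber: "555-0100" },
  { email: "b@example.com", phone_number: "555-0101" },
  { email: "c@example.com", phoneNumber: "555-0102" },
];

console.log(fieldCounts(sample));
// { email: 3, phoneNumber: 2, phone_number: 1 }
```

Any field whose count is lower than the number of documents sampled is either optional or a naming variant that a validator would start rejecting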
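Security footgun - default MongoDB
Out of the box, MongoDB has no authentication enabled. The official packages have bound to localhost only since 3.6, but older installs and any deployment that sets bindIp to 0.0.0.0 (common in container setups) expose port 27017 on every network interface. Shodan indexes open MongoDB instances constantly, and ransom campaigns specifically target unsecured databases by copying or simply wiping the data and leaving a note that demands payment for its return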
# Immediately after installation:
# 1. Enable authentication in mongod.conf
# security:
# authorization: "enabled"
# 2. Bind to localhost only (or to a private interface address of the database host)
# net:
# bindIp: 127.0.0.1
# port: 27017
# 3. Create an admin user FIRST, then restart with auth enabled
mongosh
use admin
db.createUser({
user: "admin",
pwd: passwordPrompt(), // prompt for the password instead of typing it in plain text
roles: [{ role: "root", db: "admin" }]
})
# 4. Restart mongod with authentication enabled
sudo systemctl restart mongod
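The two settings above are easy to verify mechanically once mongod.conf has been parsed into an object (for instance with a YAML library, not shown here). The sketch below assumes that parsed shape. `auditMongodConfig` is a hypothetical helper, while `security.authorization`, `net.bindIp` and `net.bindIpAll` are the real configuration option names:

```javascript
// Hypothetical check over an already-parsed mongod.conf object.
// Returns a list of human-readable warnings; an empty list means no findings.
function auditMongodConfig(config) {
  const warnings = [];
  const net = config.net || {};
  const security = config.security || {};

  if (security.authorization !== "enabled") {
    warnings.push("security.authorization is not enabled");
  }
  // bindIp may hold a comma-separated list of addresses
  const bindIps = String(net.bindIp || "").split(",").map((ip) => ip.trim());
  if (net.bindIpAll === true || bindIps.includes("0.0.0.0") || bindIps.includes("::")) {
    warnings.push("mongod is bound to all network interfaces");
  }
  return warnings;
}

const risky = auditMongodConfig({ net: { bindIp: "0.0.0.0", port: 27017 } });
const hardened = auditMongodConfig({
  net: { bindIp: "127.0.0.1", port: 27017 },
  security: { authorization: "enabled" },
});

console.log(risky.length);    // 2
console.log(hardened.length); // 0
```

A check like this only inspects the config file. Command-line flags passed to mongod can override it, so treat a clean result as a hint rather than proof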
Prerequisites
You need basic JSON syntax and a terminal. Running MongoDB locally (or on a cloud Atlas instance) helps but isn't strictly required, since most of these concepts carry over to other databases
next → db_07_mongo_crud.md