Breach Databases

Breach databases are a goldmine for OSINT. They help you find email addresses, usernames, and compromised passwords for a target domain. Use this intel to spot password reuse, build targeted wordlists for password spraying, and understand how exposed an organization is.

1. Introduction to Breach Data

When a company is breached, the stolen data (usernames, emails, passwords) is often shared or sold in underground communities. Breach databases aggregate these dumps and make them searchable. For security professionals, they offer insight into:

Password Habits: Do employees reuse passwords across different services?
Username/Email Formats: What is the standard corporate email structure?
Exposed Credentials: Finding valid credentials that could grant initial access.
Phishing Targets: Identifying a list of valid employee emails.

Ethical Considerations

Using credentials found in data breaches to access systems without authorization is illegal. This information should only be used for authorized penetration testing and security assessments to identify risks like password reuse. Always operate within the rules of engagement.

2. Key Services and Tools

Services range from free and public to paid and private, with varying levels of data sensitivity.

Public and Freemium Services

Service	Description	Link
Have I Been Pwned (HIBP)	The most well-known service for checking emails against public breaches. Domain search is available once you verify control of the domain.	https://haveibeenpwned.com/
Intelligence X	A search engine that indexes the darknet, document sharing platforms, and more. Includes breach data.	https://intelx.io/
Leak-Lookup	A free data breach search engine with a large collection of databases.	https://leak-lookup.com/

Commercial (Paid) Services

Service	Description	Link
DeHashed	A powerful, fast search engine for breach data, often containing more extensive results than public services.	https://www.dehashed.com/
Snusbase	Another popular, index-based breach search engine with a large dataset.	https://snusbase.com/

3. Core Methodology

The process involves generating potential corporate emails and then querying them against breach databases.

Step 1: Enumerate Employee Names

Use OSINT sources to gather a list of employee names. LinkedIn is the primary source for this.

See the Social Media OSINT cheatsheet for more on this topic.

Example: Let's say we find employees "John Doe" and "Jane Smith" at example.com.

Step 2: Identify Corporate Email Format

Determine the company's email pattern. You can often guess this or find an example on their website. Common formats include:

- firstname.lastname@example.com (john.doe@example.com)
- f.lastname@example.com (j.doe@example.com)
- flastname@example.com (jdoe@example.com)
- firstname@example.com (john@example.com)

Step 3: Generate Potential Email Addresses

Create a list of potential emails using the discovered names and formats. You can do this manually or with a script.

Bash Snippet for Generating Emails:

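# Input: a file of pre-formatted local parts, one per line (e.g., "john.doe")
names_file="names.txt"
domain="example.com"

# The "|| [ -n ... ]" test keeps the last line even without a trailing newline
while IFS= read -r name || [ -n "$name" ]; do
  [ -z "$name" ] && continue  # skip blank lines
  echo "${name}@${domain}"
done < "$names_file" > emails.txt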

# Example names.txt:
# john.doe
# jane.smith

# Output emails.txt:
# john.doe@example.com
# jane.smith@example.com
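
The Bash snippet above assumes the names are already in one format. As a minimal sketch, the four formats listed in Step 2 could be generated straight from full names in Python. The function name `email_candidates` is hypothetical, and the sketch assumes simple "First Last" names with no middle names, suffixes, or non-ASCII handling.

```python
def email_candidates(full_name, domain):
    """Build candidate addresses for one "First Last" name."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return []  # need at least a first and a last name
    first, last = parts[0], parts[-1]
    local_parts = [
        f"{first}.{last}",     # firstname.lastname
        f"{first[0]}.{last}",  # f.lastname
        f"{first[0]}{last}",   # flastname
        first,                 # firstname
    ]
    return [f"{local}@{domain}" for local in local_parts]


if __name__ == "__main__":
    for name in ["John Doe", "Jane Smith"]:
        for address in email_candidates(name, "example.com"):
            print(address)
```

Once a real address confirms the company's format, the other three candidates per name can be dropped to keep the query volume down.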

Step 4: Query Breach Databases

Use the generated email list to query services like DeHashed or HIBP.

Using the DeHashed API (Conceptual):

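import requests

# Requires a DeHashed account email and API key.
# The endpoint and basic-auth scheme are conceptual; confirm them against the
# current DeHashed API documentation before use.
DEHASHED_EMAIL = "your_email"
DEHASHED_KEY = "your_api_key"

def query_dehashed(email):
    """Return the parsed JSON response for one email, or None on any failure."""
    try:
        response = requests.get(
            "https://api.dehashed.com/search",
            params={"query": f"email:{email}"},  # requests URL-encodes the value
            headers={"Accept": "application/json"},
            auth=(DEHASHED_EMAIL, DEHASHED_KEY),
            timeout=30,
        )
        if response.status_code == 200:
            return response.json()
    except (requests.RequestException, ValueError):
        pass  # network error or a body that is not valid JSON
    return None

# Loop through your email list
with open("emails.txt") as f:
    for line in f:
        email = line.strip()
        if not email:
            continue  # skip blank lines
        results = query_dehashed(email)
        if results and results.get("entries"):
            print(f"[*] Found results for: {email}")
            for entry in results["entries"]:
                # Fields may be missing from an entry, so avoid a KeyError
                print(f"  - Password Hash: {entry.get('hashed_password', 'n/a')}")
                print(f"  - Source: {entry.get('database_name', 'n/a')}")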
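
When the email list is long, it may help to skip duplicates and pause between requests. The sketch below is one possible shape for that loop, not a documented client: `query_all`, `query_fn`, and the one-second default delay are assumptions, and a function such as `query_dehashed` could be passed in as `query_fn`. The sample record in the demo is made up so the code runs offline.

```python
import time


def query_all(emails, query_fn, delay_seconds=1.0):
    """Query each unique address once, pausing between consecutive calls.

    query_fn is any callable that takes an email and returns a result or None.
    """
    results = {}
    seen = set()
    for raw in emails:
        email = raw.strip().lower()
        if not email or email in seen:
            continue  # skip blanks and duplicates
        if seen:
            time.sleep(delay_seconds)  # pause before every call except the first
        seen.add(email)
        hit = query_fn(email)
        if hit:
            results[email] = hit
    return results


if __name__ == "__main__":
    # Stand-in for a real API call (hypothetical data).
    fake_db = {"john.doe@example.com": {"entries": [{"database_name": "ExampleForum"}]}}
    found = query_all(
        ["John.Doe@example.com", "john.doe@example.com\n", "jane.smith@example.com"],
        fake_db.get,
        delay_seconds=0,
    )
    print(found)
```

Passing the query function as a parameter keeps the loop independent of any one service and makes it easy to test without network access.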

4. Analyzing Breach Data

Finding a breach is just the first step. The value is in the analysis.

Password Reuse: The primary goal. If an employee used Password123! on a breached forum, they might use Password123! or Password2024! for their corporate account. This is critical information for password spraying attacks.
Password Complexity Patterns: Do employees use simple, guessable passwords? Do they follow a pattern (e.g., CompanyName123)? This helps in building a highly targeted wordlist.
Username Enumeration: Even if the password is stored as a strong hash that resists cracking, a hit still confirms that the corporate email address and username exist.
PII Exposure: Breaches often contain personal information (DOB, address) that can be used for social engineering.
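
The pattern review described above can be partly automated. The following is a minimal sketch, assuming the recovered passwords are already available as a list of strings within an authorized engagement. The names `password_mask` and `summarize`, and the U/l/d/s notation, are illustrative choices rather than an established format.

```python
from collections import Counter


def password_mask(password):
    """Collapse a password into a structure string: U=upper, l=lower, d=digit, s=other."""
    mask = []
    for ch in password:
        if ch.isupper():
            mask.append("U")
        elif ch.islower():
            mask.append("l")
        elif ch.isdigit():
            mask.append("d")
        else:
            mask.append("s")
    return "".join(mask)


def summarize(passwords, keywords=()):
    """Count structure masks and how many passwords contain each keyword."""
    masks = Counter(password_mask(p) for p in passwords)
    keyword_hits = Counter()
    for p in passwords:
        for kw in keywords:
            if kw.lower() in p.lower():
                keyword_hits[kw] += 1
    return masks, keyword_hits


if __name__ == "__main__":
    sample = ["Password123!", "Password2024!", "CompanyName123"]
    masks, hits = summarize(sample, keywords=["password", "companyname"])
    print(masks.most_common())
    print(dict(hits))
```

Frequent masks and keyword hits point to the structures worth prioritizing in a targeted wordlist.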

5. Operational Security (OPSEC)

When interacting with breach databases, especially from a pentesting environment, consider the following:

Anonymity: Use a VPN or proxy when accessing these services, especially if you are not using an official, authorized account. You don't want your real IP associated with queries for a target company.
APIs vs. Web UI: APIs are generally better suited to automation and expose less client-side information than a web browser, which is subject to tracking scripts and fingerprinting.
Data Handling: Treat any data downloaded from these services as highly sensitive. Store it securely and delete it after the engagement is complete, following your organization's data handling policies.
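
One way to reduce handling risk is to mask secrets before they reach notes or reports. This is a minimal sketch under assumptions: the helper names are hypothetical, and the field names `password` and `hashed_password` are guesses at what a record might contain, so adjust them to the actual export format.

```python
def redact(secret, keep=2):
    """Mask a secret, keeping only the first `keep` characters for identification."""
    if not secret:
        return ""
    if len(secret) <= keep:
        return "*" * len(secret)
    return secret[:keep] + "*" * (len(secret) - keep)


def redact_entry(entry, secret_fields=("password", "hashed_password")):
    """Return a copy of a breach record with the secret fields masked."""
    return {
        key: redact(value) if key in secret_fields and isinstance(value, str) else value
        for key, value in entry.items()
    }


if __name__ == "__main__":
    record = {"email": "john.doe@example.com", "password": "Password123!"}
    print(redact_entry(record))
```

The original record is left unchanged, so the unmasked copy still needs to be stored and deleted according to the engagement's data handling policy.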

6. Notes and Pitfalls

False Positives: Just because an email is in a breach doesn't mean the password is still in use or that the employee is still with the company.
Password Hashes: You will often get password hashes, not plaintext passwords. You may need to use tools like Hashcat or John the Ripper to crack them, but this is often out of scope for a standard pentest unless password cracking is explicitly permitted.
Information Overload: For a large company, you may find thousands of breached credentials. Focus on recent breaches and accounts that appear to belong to privileged users (e.g., IT staff, executives).
Subscription Costs: The most effective services are commercial and require a subscription. Factor this into your team's budget.
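
Before deciding whether cracking is even worth raising with the client, it can help to know roughly what kind of hashes were returned. The sketch below guesses from prefix and length only. This is a heuristic: several algorithms share the same output length, so treat the result as a hint. The function name `guess_hash_format` is hypothetical.

```python
import re

_HEX = re.compile(r"[0-9a-fA-F]+")
_BCRYPT = re.compile(r"\$2[aby]\$\d{2}\$.{53}")

# Hex digest lengths for common unsalted hashes; other algorithms share them.
_HEX_LENGTHS = {32: "MD5 or NTLM", 40: "SHA-1", 64: "SHA-256", 128: "SHA-512"}


def guess_hash_format(value):
    """Return a rough guess at the hash format based on prefix and length only."""
    value = value.strip()
    if _BCRYPT.fullmatch(value):
        return "bcrypt"
    if _HEX.fullmatch(value):
        return _HEX_LENGTHS.get(len(value), "unknown hex")
    return "unknown"


if __name__ == "__main__":
    for candidate in ["5f4dcc3b5aa765d61d8327deb882cf99", "Password123!"]:
        print(candidate, "->", guess_hash_format(candidate))
```

A result of "unknown" often means the value is plaintext or a salted format this sketch does not cover.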

7. Quick Reference Table

Task	Tool / Service	Example / Note
Enumerate Employee Names	LinkedIn	Search for `People -> Current Company: "Example Inc."`
Generate Email Permutations	Custom Script (Bash/Python)	`f.lastname`, `firstname.lastname`, etc.
Check Public Breaches (Email)	HIBP	Good for a quick, free check on a single email.
Deep Breach Search	DeHashed, Snusbase	Requires a subscription but provides much more data, including password hashes.
Automate Queries	Service API	Use Python or Bash to loop through a list of emails and query the API.
Analyze Password Patterns	Manual Review	Look for company names, years, and common password structures.
