OSINT Introduction

Open Source Intelligence (OSINT) means collecting and analyzing info from public sources to get actionable intel. For pentesters and security pros, OSINT is the foundation of recon. It gives you a complete picture of a target's digital footprint without ever touching their systems.
1. What is OSINT?
OSINT isn't just Googling. It's a systematic process that includes:
- Collection: Grab raw data from tons of public sources.
- Processing: Structure the data, remove duplicates, get it ready for analysis.
- Analysis: Connect the dots, find patterns, turn raw data into real intelligence.
- Dissemination: Present your findings clearly (like a recon report).
For security pros, OSINT is your map of the attack surface. It shows you where to focus.
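As a small illustration of the Processing step, the sketch below normalizes and de-duplicates a list of collected hostnames. The function name and the sample data are made up for this example, and real pipelines usually need more cleanup rules than this.

```python
def normalize_hosts(raw_hosts):
    """Lowercase, trim whitespace and trailing dots, drop blanks, and de-duplicate."""
    seen = set()
    cleaned = []
    for host in raw_hosts:
        host = host.strip().lower().rstrip(".")
        if host and host not in seen:
            seen.add(host)
            cleaned.append(host)  # keep first-seen order
    return cleaned


# Hypothetical raw collection output with duplicates and formatting noise
raw = ["WWW.Example.com", "www.example.com.", " mail.example.com ", "", "mail.example.com"]
print(normalize_hosts(raw))
```

Keeping first-seen order makes it easier to trace a cleaned entry back to the source that produced it.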
Key OSINT Categories
| Category | Description | Examples of Sources |
|---|---|---|
| Technical OSINT | Information about a target's technology, infrastructure, and digital assets. | DNS records, IP addresses, SSL/TLS certificates, code repositories, Shodan, Censys. |
| Human OSINT (people-focused; loosely called HUMINT) | Information about the people associated with a target, including employees, executives, and developers. | LinkedIn, Twitter, forums, breach databases, personal blogs. |
| Business OSINT | Information about the organization itself, its structure, partners, and operations. | Company website, news articles, press releases, financial reports, job postings. |
| Geospatial OSINT (GEOINT) | Information about the physical location of a target's assets. | Google Maps, public webcams, photo metadata (EXIF data). |
2. The OSINT Mindset
Good OSINT needs the right mindset:
- Stay curious and persistent: The good stuff isn't on Google's first page. Dig deep, follow rabbit holes, keep pushing.
- Pivot like an analyst: Every piece of info is a pivot point. An employee name leads to their GitHub. A domain leads to an IP, which leads to other domains on the same server. Follow the trail.
- Verify everything: Not everything online is true or current. Corroborate from multiple sources.
- Watch your OPSEC: You're leaving digital footprints when doing OSINT. Use a VPN, dedicated research accounts (sock puppets), and be careful what you request and from where.
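The "verify everything" habit can be made mechanical. The sketch below, with a hypothetical function name and made-up observations, groups findings by the sources that reported them and flags a finding as corroborated only when at least two distinct sources agree. The threshold of two is an assumption, not a rule.

```python
from collections import defaultdict


def corroborate(observations, min_sources=2):
    """Group (finding, source) pairs and flag findings seen in enough distinct sources."""
    sources_by_finding = defaultdict(set)
    for finding, source in observations:
        sources_by_finding[finding].add(source)
    return {
        finding: {
            "sources": sorted(sources),
            "corroborated": len(sources) >= min_sources,
        }
        for finding, sources in sources_by_finding.items()
    }


# Hypothetical observations: (finding, where it was seen)
observations = [
    ("staging.example.com", "certificate transparency"),
    ("staging.example.com", "dns brute force"),
    ("jane.doe@example.com", "conference slide deck"),
]
report = corroborate(observations)
```

Anything left uncorroborated is a lead to check, not a fact to put in the report.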
3. Core OSINT Workflow
A structured workflow keeps you on track and ensures you don't miss anything.
- Define your goals: What do you want to find? "All subdomains," "email format," "developer GitHub profiles" - be specific.
- Gather seed data: Start with one piece of info - company name or main domain. That's your starting point.
- Expand and pivot: Use that seed to find more.
    - Company Name → Website, LinkedIn, News Articles
    - Domain Name → DNS Records, IPs, Subdomains, Technologies
    - LinkedIn Page → Employee Names, Job Titles, Tech Stack
    - Employee Name → Email Permutations, GitHub, Twitter
- Aggregate and analyze: Put everything in a structured format (mind map, spreadsheet, or dedicated tool). Find connections and identify high-value targets.
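The expand-and-pivot step is effectively a graph walk. Here is a minimal sketch that encodes the pivots listed above as a dictionary and does a breadth-first traversal from a seed. The key names are illustrative, and the `website` to `domain_name` edge is an assumption added so the company-name seed reaches the domain pivots.

```python
from collections import deque

# Illustrative pivot map: entity type -> entity types it can lead to.
PIVOTS = {
    "company_name": ["website", "linkedin_page", "news_articles"],
    "website": ["domain_name"],  # assumed edge, not in the list above
    "domain_name": ["dns_records", "ip_addresses", "subdomains", "technologies"],
    "linkedin_page": ["employee_names", "job_titles", "tech_stack"],
    "employee_names": ["email_permutations", "github", "twitter"],
}


def pivot_plan(seed, pivots=PIVOTS):
    """Return every entity type reachable from the seed, in breadth-first order."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in pivots.get(current, []):
            if nxt not in seen:  # avoid revisiting on cyclic maps
                seen.add(nxt)
                queue.append(nxt)
    return order


print(pivot_plan("domain_name"))
```

Breadth-first order matches the "broad first, then specific" habit: you exhaust the direct pivots from a seed before chasing second-order ones.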
4. Advanced OSINT Techniques
Data Collection Techniques
Web Scraping:
- Use libraries like BeautifulSoup (Python) or Scrapy to extract data from websites.
- Automate data collection from social media platforms using APIs (e.g., Twitter API, LinkedIn API).
Example:
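import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

# Print the target of every link on the page
for link in soup.find_all('a'):
    print(link.get('href'))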
Social Media Mining:
- Use tools like Maltego or SpiderFoot to gather information from social media profiles.
- Analyze connections and relationships between individuals and organizations.
Advanced Search Techniques
Google Dorking:
- Use advanced search operators to find specific information.
- Example Dorks:
# Find sensitive files
filetype:pdf site:example.com
# Find login pages
inurl:login site:example.com
Using APIs for OSINT:
- Leverage APIs from various platforms to gather data programmatically.
- Example: Using the Shodan API to look up a host by IP (replace {ip} and {API_KEY} with your own values):
curl "https://api.shodan.io/shodan/host/{ip}?key={API_KEY}"
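The same lookup can be scripted. The sketch below uses only the standard library, builds the URL from the curl example, and reduces a host record to a few fields. The field names (`ip_str`, `org`, `hostnames`, `ports`) are what Shodan host records commonly contain, but treat them as assumptions and check the response you actually get. The helper names are hypothetical.

```python
import json
import urllib.request


def shodan_host_url(ip, api_key):
    """Build the host-lookup URL used in the curl example."""
    return f"https://api.shodan.io/shodan/host/{ip}?key={api_key}"


def fetch_host(ip, api_key):
    """Fetch one host record; needs network access and a valid API key."""
    with urllib.request.urlopen(shodan_host_url(ip, api_key), timeout=10) as resp:
        return json.load(resp)


def summarize_host(host):
    """Reduce a host record (a dict) to the fields most useful for triage."""
    return {
        "ip": host.get("ip_str"),
        "org": host.get("org"),
        "hostnames": host.get("hostnames", []),
        "ports": sorted(host.get("ports", [])),
    }


# Made-up record shaped like a host response, used here instead of a live call
sample = {"ip_str": "203.0.113.10", "org": "Example Inc.", "hostnames": ["vpn.example.com"], "ports": [443, 22]}
print(summarize_host(sample))
```

In practice you would call `summarize_host(fetch_host(ip, api_key))` with the key read from an environment variable rather than hard-coded.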
Data Correlation and Analysis
Link Analysis:
- Use tools like Maltego to visualize relationships between entities.
- Identify connections between individuals, organizations, and infrastructure.
Data Aggregation:
- Combine data from multiple sources to create a comprehensive profile.
- Use spreadsheets or databases to organize and analyze findings.
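To make the aggregation idea concrete, here is a minimal sketch that merges records from several sources into one profile per person, keyed by email address. The record layout and function name are assumptions for illustration; a spreadsheet or database does the same job at larger scale.

```python
def merge_profiles(*sources):
    """Merge per-source record lists into one profile per email (case-insensitive)."""
    profiles = {}
    for records in sources:
        for record in records:
            key = record["email"].lower()
            profile = profiles.setdefault(key, {"email": key})
            for field, value in record.items():
                if field != "email" and value:
                    # First non-empty value wins; later sources only fill gaps
                    profile.setdefault(field, value)
    return profiles


# Hypothetical records from two sources
linkedin = [{"email": "Jane.Doe@example.com", "title": "DevOps Engineer"}]
github = [{"email": "jane.doe@example.com", "github": "jdoe", "title": ""}]
merged = merge_profiles(linkedin, github)
```

Letting the first non-empty value win is a simplification. If sources disagree, it is usually better to keep both values and resolve the conflict by hand.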
Real-World Case Study: OSINT in Action
Scenario: During a security assessment of a tech company, OSINT was used to gather information about employees and infrastructure.
Approach:
1. Social Media Analysis: Collected data from LinkedIn to identify key personnel.
2. Domain Analysis: Used whois and DNS enumeration to gather domain-related information.
3. Public Records: Searched for company filings and press releases to gather insights.
Findings:
- Discovered a staging environment exposed to the internet.
- Identified key personnel in the security team, who could be targeted for social engineering.
5. Essential OSINT Tools
While manual techniques are crucial, tools help automate and scale the collection process.
Google Dorking
Using advanced Google search operators is a fundamental OSINT skill.
| Dork | Description | Example |
|---|---|---|
| site:<domain> | Restricts search to a specific site. | site:example.com admin |
| inurl:<text> | Finds pages with specific text in the URL. | inurl:login site:example.com |
| intitle:<text> | Finds pages with specific text in the title. | intitle:"index of" site:example.com |
| filetype:<ext> | Searches for specific file types. | filetype:pdf site:example.com internal |
| "<text>" | Searches for an exact phrase. | "Example Inc." "API key" |
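When you run the same dorks against many targets, composing them in code avoids typos. The helper below is a minimal sketch that assembles a query string from the operators in the table. The function names are hypothetical, and it only builds the string; it does not submit anything to a search engine.

```python
def _quote(value):
    """Wrap multi-word values in double quotes so they stay one search term."""
    return f'"{value}"' if " " in value else value


def build_dork(site=None, inurl=None, intitle=None, filetype=None, phrases=(), keywords=()):
    """Assemble a search query from the operators in the table above."""
    operators = [("site", site), ("inurl", inurl), ("intitle", intitle), ("filetype", filetype)]
    parts = [f"{name}:{_quote(value)}" for name, value in operators if value]
    parts.extend(f'"{phrase}"' for phrase in phrases)  # exact-phrase terms
    parts.extend(keywords)                             # plain keywords
    return " ".join(parts)


print(build_dork(site="example.com", intitle="index of"))
print(build_dork(site="example.com", filetype="pdf", keywords=["internal"]))
```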
Frameworks and Platforms
- Maltego: A powerful graphical link analysis tool for visualizing relationships between pieces of information. It uses "transforms" to query various data sources.
- SpiderFoot: An open-source OSINT automation tool. You provide a target, and it runs dozens of modules to collect information on everything from subdomains to employee names.
- theHarvester: A classic command line tool for gathering emails, subdomains, hosts, employee names, open ports, and banners from different public sources.
# Example theHarvester usage (the sources accepted by -b vary by version)
theharvester -d example.com -l 500 -b google,bing,linkedin
Specialized Search Engines
- Shodan: The "search engine for hackers." Finds internet-connected devices (servers, webcams, IoT) using service banners instead of web content. You can search by IP, organization, or product.
- Censys: Similar to Shodan, Censys scans the internet and allows you to search for hosts and websites based on their configuration (e.g., find all servers using a specific TLS certificate).
6. Advanced Human OSINT (HUMINT) Techniques
Finding information about the people behind the technology is often the key to a successful engagement. Here are advanced techniques:
LinkedIn Advanced Analysis
Advanced Search Techniques:
# LinkedIn Sales Navigator Boolean search examples (write AND, OR, NOT in uppercase):
# - "security engineer" AND "example.com" AND "current"
# - "devops" AND "aws" AND "san francisco"
# - ("cto" OR "chief technology officer") AND "startup"   # parentheses needed: AND binds tighter than OR
# Field-style queries (illustrative; supported fields vary by LinkedIn product):
# (title:security OR title:infosec) AND company:example
# skills:(python OR golang) AND location:"new york"
Employee Enumeration Automation:
import requests
from bs4 import BeautifulSoup

def linkedin_employee_search(company):
    # Conceptual example only: LinkedIn requires authentication and restricts
    # scraping, and the CSS class names below are placeholders, not real markup.
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = f"https://www.linkedin.com/company/{company}/people/"
    response = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    employees = []
    for profile in soup.find_all('div', class_='profile-card'):
        name = profile.find('h3')
        title = profile.find('p', class_='subline')
        if name and title:  # skip cards missing either field
            employees.append({'name': name.text.strip(), 'title': title.text.strip()})
    return employees
Email Enumeration and Verification
Advanced Email Discovery:
# Using theHarvester with multiple sources
theharvester -d example.com -l 1000 -b google,bing,linkedin,twitter,github -v
# Hunter.io API integration
curl "https://api.hunter.io/v2/domain-search?domain=example.com&api_key=YOUR_KEY"
# Email format pattern analysis
# Common patterns: first.last@, flast@, firstl@, f.last@, first@
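The common patterns listed above are easy to generate for a known employee name. A minimal sketch follows; the function name is hypothetical, and it assumes non-empty ASCII first and last names. The output is a list of candidates to verify, not confirmed addresses.

```python
def email_candidates(first, last, domain):
    """Generate candidate addresses for the common corporate email patterns."""
    first, last = first.strip().lower(), last.strip().lower()
    local_parts = [
        f"{first}.{last}",     # first.last@
        f"{first[0]}{last}",   # flast@
        f"{first}{last[0]}",   # firstl@
        f"{first[0]}.{last}",  # f.last@
        first,                 # first@
    ]
    return [f"{local}@{domain}" for local in local_parts]


for candidate in email_candidates("Jane", "Doe", "example.com"):
    print(candidate)
```

Once one real address confirms the organization's format, you only need that single pattern for the rest of the employee list.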
Email Verification Techniques:
# Using hunter.io email verifier
curl "https://api.hunter.io/v2/email-verifier?email=test@example.com&api_key=YOUR_KEY"
# SMTP VRFY check (many mail servers disable VRFY; be careful with rate limiting)
python3 -c "import smtplib; server = smtplib.SMTP('mail.example.com', timeout=10); print(server.verify('test@example.com')); server.quit()"
Social Media Cross-Referencing
Username Enumeration:
# Using Sherlock for username discovery
sherlock username
# Social-analyzer for comprehensive analysis
social-analyzer --username "johndoe" --websites "twitter,github,instagram"
# Whatsmyname project integration
python whatsmyname.py -u johndoe
Breach Data Analysis:
# Have I Been Pwned API
curl "https://haveibeenpwned.com/api/v3/breachedaccount/test@example.com" \
-H "hibp-api-key: YOUR_KEY"
# DeHashed.com search
curl "https://api.dehashed.com/search?query=email:test@example.com" \
-u "email:api_key" -H "Accept: application/json"
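Advanced Automation Scripts
Comprehensive People Search:
import re
import subprocess
import requests

def comprehensive_people_search(domain, source='bing'):
    results = {}
    # Email discovery: theHarvester is a CLI tool, so run it and parse its output.
    # The sources accepted by -b vary by theHarvester version.
    proc = subprocess.run(['theHarvester', '-d', domain, '-b', source, '-l', '100'],
                          capture_output=True, text=True)
    pattern = r'[\w.+-]+@(?:[\w-]+\.)*' + re.escape(domain)
    results['emails'] = sorted(set(re.findall(pattern, proc.stdout, re.IGNORECASE)))
    # Social media discovery: the URL and its parameters are placeholders, not a
    # real API; substitute each platform's own search endpoint.
    social_results = {}
    for platform in ['twitter', 'github', 'linkedin']:
        try:
            response = requests.get('https://api.some-social-site.com/search',
                                    params={'q': domain, 'platform': platform}, timeout=10)
            social_results[platform] = response.json()
        except (requests.RequestException, ValueError):
            continue  # skip platforms that fail or return non-JSON
    results['social_media'] = social_results
    return results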
Geolocation Intelligence:
# EXIF data extraction from images
exiftool image.jpg
# Google Maps location analysis
# Use Google Earth Pro for advanced geospatial analysis
# Analyze geotagged social media posts
# WiFi network mapping
# Search the wigle.net database for the locations of known WiFi networks
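EXIF GPS coordinates are stored as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), while most mapping tools expect signed decimal degrees. The conversion is simple arithmetic. The function name and the sample coordinates below are made up for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus hemisphere (N/S/E/W) to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation
    return -value if ref.upper() in ("S", "W") else value


# Sample values as they might appear in GPSLatitude / GPSLongitude tags
lat = dms_to_decimal(40, 26, 46, "N")
lon = dms_to_decimal(79, 58, 56, "W")
print(f"{lat:.6f}, {lon:.6f}")
```

The resulting pair can be pasted straight into a map search box.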
Psychological Profiling
Behavioral Analysis:
- Analyze writing patterns across platforms
- Identify interests and hobbies from social media
- Determine technical skill level from code repositories
- Assess security awareness from online behavior
Threat Modeling:
- Identify potential social engineering targets
- Map organizational influence and decision-making
- Determine access levels based on job titles and responsibilities
- Identify disgruntled employees through sentiment analysis
Operational Security (OPSEC) for HUMINT
Sock Puppet Management:
- Create realistic online personas
- Maintain consistent backstories across platforms
- Use separate browsers and VPN connections
- Avoid cross-contamination between real and research identities
Legal and Ethical Considerations:
- Only collect publicly available information
- Respect privacy settings and terms of service
- Avoid harassment or unwanted contact
- Document all findings for legal compliance
LinkedIn Advanced Analysis
- Employee Enumeration: The primary source for finding current and former employees.
- Organizational Structure: Identify key personnel in IT, security, and development.
- Technology Stack: Job descriptions for developer and DevOps roles often explicitly list the technologies, frameworks, and cloud providers they use.
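Pulling a tech stack out of job descriptions can be as simple as keyword matching. The sketch below scans posting text for a short, hypothetical keyword list. The list and function name are assumptions, and a real list would be much longer and tuned to the target's industry.

```python
import re

# Hypothetical starter list; extend it for the target's industry
TECH_KEYWORDS = ["aws", "azure", "gcp", "kubernetes", "docker",
                 "terraform", "jenkins", "python", "golang", "postgresql"]


def extract_tech(text, keywords=TECH_KEYWORDS):
    """Return the keywords that appear as whole words in the posting text."""
    lowered = text.lower()
    found = []
    for keyword in keywords:
        # Lookarounds stop "aws" from matching inside words like "laws"
        if re.search(rf"(?<![a-z0-9]){re.escape(keyword)}(?![a-z0-9])", lowered):
            found.append(keyword)
    return found


posting = "We run Kubernetes on AWS, provision with Terraform, and write services in Python."
print(extract_tech(posting))
```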
GitHub / GitLab
- Developers often use personal accounts for work-related activities.
- Search for commits or issues made by corporate email addresses.
- See the Code Repositories OSINT cheatsheet for a deep dive.
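One way to find commits made with corporate email addresses is to clone a public repository, run `git log --format='%an <%ae>'`, and filter the author lines by domain. The sketch below parses that output. The function name and sample log are made up, and it assumes one author per line in exactly that format.

```python
import re
from collections import Counter

# Matches lines shaped like: Author Name <user@host>
AUTHOR_LINE = re.compile(r"^(?P<name>.+?) <(?P<email>[^<>@\s]+@[^<>\s]+)>$")


def corporate_authors(git_log_output, domain):
    """Count commits per (name, email) for authors whose email is at the given domain."""
    counts = Counter()
    suffix = "@" + domain.lower()
    for line in git_log_output.splitlines():
        match = AUTHOR_LINE.match(line.strip())
        if match and match["email"].lower().endswith(suffix):
            counts[(match["name"], match["email"].lower())] += 1
    return counts


# Made-up output of: git log --format='%an <%ae>'
log = "Jane Doe <jane.doe@example.com>\nJane Doe <Jane.Doe@example.com>\nBob <bob@gmail.com>\n"
print(corporate_authors(log, "example.com"))
```

Matching on the full `@domain` suffix keeps lookalike domains such as `notexample.com` out of the results.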
7. Notes and Pitfalls
- Information Overload: It's easy to drown in data. Stick to your objectives and focus on information that is relevant to the attack surface.
- False Positives and Outdated Info: Information on the internet can be old or incorrect. Always seek to verify critical findings from a second source.
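- Legal and Ethical Boundaries: OSINT is generally legal because it relies on public data, but rules on scraping, privacy, and data protection vary by jurisdiction and by each site's terms of service. Using that data for unauthorized access, harassment, or other malicious activities is illegal. Always operate ethically and within your legal mandate.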
- The Rabbit Hole: It's easy to spend days on OSINT. Know when to stop and move on to the next phase of reconnaissance (active scanning). A time-boxed approach is often effective.
8. Quick Reference: The OSINT Funnel
This table illustrates the pivoting process, from broad to specific.
| Starting Point | Pivot To | Tools / Sources |
|---|---|---|
| Company Name | Domain Name, Social Media Profiles, News Articles, Key Executives. | Google, LinkedIn, Crunchbase. |
| Domain Name | Subdomains, IP Addresses, DNS Records, Web Technologies, Web Archives. | subfinder, amass, whatweb, gau. |
| IP Address / CIDR | Open Ports, Hosted Domains (Reverse IP), Service Banners, Geolocation. | nmap, masscan, Shodan, whois. |
| Employee Name | Email Address, Username, Social Media Profiles, Code Commits, Breached Credentials. | Google, LinkedIn, theHarvester, DeHashed. |
| Email Address | Breached Passwords, Associated Accounts, Google Profile Picture. | HIBP, DeHashed, Google. |
9. Summary
OSINT is the cornerstone of effective reconnaissance, providing critical insights into a target's digital footprint without direct engagement. By systematically collecting, processing, and analyzing publicly available information from technical, human, business, and geospatial sources, security professionals can build comprehensive attack surface maps. Mastering OSINT techniques, tools, and mindset enables thorough reconnaissance while maintaining ethical boundaries and operational security. The ability to pivot between different data sources and connect disparate pieces of information is what transforms raw data into actionable intelligence for successful security assessments.