CDA.Wiki

# Open Source Intelligence (OSINT) Techniques

Open Source Intelligence (OSINT) refers to the systematic collection, processing, and analysis of publicly available information to produce actionable intelligence about targets, threats, and attack surfaces. OSINT exists because adversaries conduct extensive pre-attack research before engaging a target network, and defenders who fail to perform the same research operate blind to their own exposure. The problem OSINT solves is asymmetric information: attackers invest significant time understanding a target's digital footprint while defenders often have no clear picture of what information about them is already public. By applying structured OSINT techniques, security teams close that gap, discovering exposed credentials, misconfigured systems, leaked internal documentation, and personnel data before attackers do.

---

Definition and Scope

Open Source Intelligence is the collection and analysis of information derived exclusively from publicly available sources, without any unauthorized access, covert intrusion, or proprietary data feeds. The "open" in OSINT does not mean free or easy to find. It means the information exists in sources that are legally accessible to anyone: websites, public records, social media platforms, academic publications, government databases, code repositories, and broadcast media.

OSINT is distinct from several adjacent disciplines that are sometimes conflated with it. Signals Intelligence (SIGINT) involves the interception of communications, which requires legal authorization and specialized equipment. Human Intelligence (HUMINT) relies on interpersonal contact and source development. Cyber Threat Intelligence (CTI) may draw on OSINT as one input but also incorporates closed-source feeds, dark web monitoring, and vendor telemetry. Competitive intelligence is a business discipline that overlaps methodologically but serves commercial strategy rather than security operations.

What OSINT is NOT: it is not scanning or probing a target's infrastructure. Sending packets to a target's IP addresses, running active vulnerability scans, or querying a target's DNS servers directly are active reconnaissance techniques, not OSINT. OSINT relies entirely on data that the data subject or third parties have already made public, with no direct interaction with target systems.

Subtypes of OSINT relevant to security operations include: external attack surface mapping (identifying internet-facing assets), personnel OSINT (mapping employees, roles, and contact information), credential intelligence (identifying leaked usernames and passwords in breach data), supply chain OSINT (researching vendors and technology partners), and geospatial OSINT (using satellite imagery and location data for physical security assessments). Each subtype requires different tools, data sources, and analytical methods, though they share a common methodology grounded in source enumeration, data correlation, and structured analysis.

---

How It Works

OSINT collection follows a structured process that moves from target definition through data collection, processing, analysis, and reporting. Each phase builds on the previous one, and skipping steps produces incomplete or misleading intelligence.

Phase 1: Target Definition and Scope

Before any collection begins, the analyst defines the target scope precisely. For an organization, this includes the primary domain name, all known subsidiary domains, registered IP ranges, known brand names, key personnel, and relevant technology products. Scope creep is a real problem in OSINT operations: without boundaries, analysts can spend weeks following tangential leads. A clear scope statement, such as "all internet-facing assets associated with acme-corp.com and its confirmed subsidiaries," keeps collection focused and efficient.
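A scope statement is easiest to enforce when it is machine-checkable. The Python sketch below assumes scope is expressed as a set of apex domains plus a list of CIDR ranges; the function name `in_scope` and the example values (`acme-corp.com`, the `192.0.2.0/24` documentation range) are hypothetical, and real engagements usually also need explicit exclusions.

```python
import ipaddress
from typing import Iterable


def in_scope(asset: str, domains: Iterable[str], networks: Iterable[str]) -> bool:
    """Return True if a hostname or IP address falls inside the declared scope.

    Hostnames match an apex domain exactly or as a subdomain of it;
    IP addresses match if they sit inside any listed CIDR range.
    """
    try:
        ip = ipaddress.ip_address(asset)
    except ValueError:
        # Not an IP literal: treat it as a hostname and compare label-aligned suffixes,
        # so "notacme-corp.com" does not match "acme-corp.com".
        name = asset.lower().rstrip(".")
        apexes = [d.lower().rstrip(".") for d in domains]
        return any(name == d or name.endswith("." + d) for d in apexes)
    # Membership is False across IP versions (an IPv6 address is never in an IPv4 network).
    return any(ip in ipaddress.ip_network(n, strict=False) for n in networks)


SCOPE_DOMAINS = {"acme-corp.com"}   # hypothetical apex domain(s)
SCOPE_NETWORKS = ["192.0.2.0/24"]   # hypothetical (documentation) range

print(in_scope("staging-payments.acme-corp.com", SCOPE_DOMAINS, SCOPE_NETWORKS))  # True
print(in_scope("notacme-corp.com", SCOPE_DOMAINS, SCOPE_NETWORKS))                # False
```

Running every collected lead through a check like this before pursuing it is one simple way to keep scope creep visible.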

Phase 2: Passive DNS and Domain Enumeration

Domain enumeration is typically the first technical step in organizational OSINT. Analysts query WHOIS records to identify domain registration history, registrant contact information (often redacted since GDPR took effect in 2018, though historical WHOIS records remain useful), nameserver configurations, and registration patterns that may reveal additional related domains. Certificate transparency (CT) logs, searchable through services like crt.sh, record the publicly trusted TLS certificates issued for a domain, including certificates for subdomains that the organization never intended to publicize. Major browsers have required CT logging for publicly trusted certificates since 2018, so coverage is broad, although certificates from private CAs never appear and wildcard certificates reveal only the parent name. A single crt.sh query for a large enterprise often returns hundreds of subdomains, many pointing to staging environments, development servers, internal tools exposed accidentally, or decommissioned infrastructure still running vulnerable software.
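A minimal Python sketch of a crt.sh lookup, assuming crt.sh's JSON output (`output=json`), in which each record's `name_value` field holds the certificate's names separated by newlines. The `%.` prefix is crt.sh's wildcard syntax for "any name under this domain". The function names are illustrative; the public service is rate-limited and can be slow, so treat `fetch_crtsh` as a starting point rather than a robust client.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List


def fetch_crtsh(domain: str, timeout: float = 30.0) -> list:
    """Download CT records for every name under `domain` (network call)."""
    query = urllib.parse.quote(f"%.{domain}")  # "%" is URL-encoded as %25
    url = f"https://crt.sh/?q={query}&output=json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_subdomains(records: Iterable[dict], domain: str) -> List[str]:
    """Collect unique hostnames under `domain` from crt.sh JSON records."""
    domain = domain.lower().rstrip(".")
    names = set()
    for record in records:
        # One certificate can carry many names, one per line in name_value.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # a wildcard only proves the parent name exists
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Usage (network): extract_subdomains(fetch_crtsh("acme-corp.com"), "acme-corp.com")
```

The filtering step matters because certificates often bundle unrelated names; keeping only label-aligned matches avoids pulling third-party hosts into the inventory.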

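DNS brute-forcing tools such as dnsx and amass expand this further by resolving common subdomain patterns (www, vpn, mail, dev, and so on) to see which exist. Unlike certificate transparency searches, this step is not purely passive: every guess generates a live DNS query that ultimately reaches the target's authoritative nameservers, so it sits at the boundary of the active reconnaissance described above and should be covered by the engagement's authorization. Amass also offers a passive mode that draws only on third-party data sources.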
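The core of DNS brute-forcing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how dnsx or amass work internally: names are resolved through the operating system's resolver (`socket.getaddrinfo`), the wordlist is hypothetical, and a random-label probe is used to detect wildcard zones, which answer every name and would otherwise make every guess look valid. Because real resolution generates live DNS traffic, run it only against authorized scope; the `resolve` parameter lets the logic be tested offline.

```python
import secrets
import socket
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional

Resolver = Callable[[str], Optional[FrozenSet[str]]]


def system_resolve(name: str) -> Optional[FrozenSet[str]]:
    """Resolve a name with the OS resolver; None if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except (socket.gaierror, UnicodeError):
        return None
    # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the IP string.
    return frozenset(info[4][0] for info in infos)


def brute_force(domain: str, words: Iterable[str],
                resolve: Resolver = system_resolve) -> Dict[str, List[str]]:
    """Try each word as a subdomain label and keep names that resolve."""
    # Wildcard check: a random label should not exist. If it resolves, the zone
    # answers every name, and answers identical to the wildcard are ignored.
    wildcard = resolve(f"{secrets.token_hex(8)}.{domain}")
    found = {}
    for word in words:
        label = word.strip().lower()
        if not label:
            continue
        name = f"{label}.{domain}"
        addrs = resolve(name)
        if addrs and addrs != wildcard:
            found[name] = sorted(addrs)
    return found


# Usage (live DNS traffic): brute_force("acme-corp.com", ["www", "vpn", "mail", "dev"])
```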

Phase 3: Technology Fingerprinting and Asset Profiling

Once subdomains are enumerated, analysts profile each asset without touching it directly. Shodan and Censys maintain continuously updated indexes of internet-facing services, storing banner information, certificate details, open ports, and detected software versions. An analyst can query Shodan for all hosts returning a specific server banner associated with the target's IP ranges and immediately identify running software versions, exposed administrative interfaces, and misconfigured services, all without sending a single packet to the target.
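A minimal Python sketch of this kind of lookup, assuming the official `shodan` Python package, whose `Shodan(api_key).search(query)` returns a dictionary with a `matches` list; each match carries fields such as `ip_str` and `port`, plus `product` and `version` when Shodan detected them. The query uses Shodan's `net:` and `product:` filters, with one CIDR range per query. The helper names are hypothetical, and an API key is required for the network call.

```python
from typing import List, Optional, Tuple


def build_query(cidr: str, product: Optional[str] = None) -> str:
    """Compose a Shodan search string restricted to one CIDR range."""
    query = f"net:{cidr}"
    if product:
        # Quote the product so multi-word names stay a single filter value.
        query += f' product:"{product}"'
    return query


def search(api_key: str, query: str) -> dict:
    """Run the query through the Shodan API (network call; needs an API key)."""
    import shodan  # third-party package: pip install shodan

    return shodan.Shodan(api_key).search(query)


def summarize(results: dict) -> List[Tuple[str, int, str]]:
    """Flatten Shodan matches into (ip, port, software) rows for triage."""
    rows = []
    for match in results.get("matches", []):
        software = " ".join(
            part for part in (match.get("product"), match.get("version")) if part
        )
        rows.append((match.get("ip_str"), match.get("port"), software or "unknown"))
    return sorted(rows)


# Usage (network): summarize(search(API_KEY, build_query("192.0.2.0/24", "nginx")))
```

Because the data comes from Shodan's index rather than from live probes, the results reflect the last time Shodan crawled each host and may be stale.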

Job postings are a frequently underestimated OSINT source for technology stack identification. A posting for a "Senior DevOps Engineer" that lists required experience with specific SIEM platforms, cloud providers, identity providers, and monitoring tools tells an analyst which products the organization very likely runs and, by extension, which CVEs may be relevant.
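Extracting a technology inventory from posting text is largely keyword matching. The Python sketch below uses a small, hypothetical keyword-to-category map; a practical version would need a much larger curated list and some handling of version numbers and synonyms.

```python
import re

# Hypothetical keyword-to-category map; a real one would be much larger and curated.
TECH_KEYWORDS = {
    "Splunk": "SIEM",
    "Microsoft Sentinel": "SIEM",
    "Okta": "Identity provider",
    "Azure AD": "Identity provider",
    "AWS": "Cloud provider",
    "GCP": "Cloud provider",
    "Datadog": "Monitoring",
    "Terraform": "Infrastructure as code",
}


def extract_stack(posting: str, keywords: dict = TECH_KEYWORDS) -> dict:
    """Group technology names found in a job posting by category."""
    found = {}
    for term, category in keywords.items():
        # Word-boundary-style match so "AWS" does not fire inside "laws".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, posting, re.IGNORECASE):
            found.setdefault(category, []).append(term)
    return found
```

The same function can be pointed at the organization's own drafts before publication, which is the defensive use suggested in the takeaways.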

Phase 4: Personnel and Organizational Mapping

LinkedIn, corporate websites, conference speaker bios, and academic publications collectively expose an organization's internal structure to a degree that surprises most security teams when they see it documented. Analysts build org charts, identify IT and security personnel, and map relationships between employees and business units. This intelligence is directly actionable for social engineering campaigns, spear-phishing target selection, and business email compromise attacks.

Email address harvesting tools such as theHarvester and Hunter.io enumerate corporate email addresses across public data sources. Once an analyst has confirmed the address format (for example, firstname.lastname@acme-corp.com), they can check addresses built from that format against breach data services such as Have I Been Pwned, which reports which known breaches included a given address, and Dehashed, which can return the exposed credential records themselves. Breach data shows exposure, not validity: a password exposed in a 2019 breach of an unrelated service may still work on corporate systems if the employee reused it.
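A minimal Python sketch of the two steps above: building addresses from a confirmed format, then checking one against Have I Been Pwned. It assumes the HIBP v3 `breachedaccount` endpoint, which requires an `hibp-api-key` header and a user agent, returns a list of breach objects with a `Name` field, and answers 404 when the address appears in no known breach. The helper names and the default user-agent string are hypothetical; use this only on addresses your organization owns.

```python
import json
import re
import unicodedata
import urllib.error
import urllib.parse
import urllib.request
from typing import List


def normalize(part: str) -> str:
    """Lowercase a name part, fold accents to ASCII, and drop punctuation."""
    folded = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", folded.lower())


def make_address(first: str, last: str, domain: str, pattern: str = "{first}.{last}") -> str:
    """Build an address from a confirmed format such as "{first}.{last}" or "{f}{last}"."""
    given, family = normalize(first), normalize(last)
    local = pattern.format(first=given, last=family, f=given[:1], l=family[:1])
    return f"{local}@{domain}"


def hibp_breaches(email: str, api_key: str, user_agent: str = "osint-self-assessment") -> List[str]:
    """Return breach names for an address via the HIBP v3 API ([] if none)."""
    url = "https://haveibeenpwned.com/api/v3/breachedaccount/" + urllib.parse.quote(email)
    request = urllib.request.Request(
        url, headers={"hibp-api-key": api_key, "user-agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return [breach["Name"] for breach in json.load(response)]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # HIBP signals "not found in any breach" with 404
            return []
        raise
```

HIBP enforces rate limits per API key, so bulk checks should be paced; for whole domains, HIBP's verified domain search is usually the better fit.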

Phase 5: Correlation and Profile Assembly

The highest-value OSINT work is not collection but correlation. Maltego and SpiderFoot automate relationship mapping across disparate data sources, visualizing connections between domains, IP addresses, email addresses, social media profiles, and organizational entities. A single employee's GitHub profile may expose internal hostnames in commit history, AWS S3 bucket names in configuration files, and API keys in code that was meant to be cleaned before publication. None of these findings is obvious in isolation, but combined they reveal significant attack surface.

Concrete Scenario: During an external attack surface assessment, an analyst queries crt.sh for acme-corp.com and discovers a subdomain staging-payments.acme-corp.com that does not appear in the organization's documented asset inventory. A Shodan query for the associated IP shows an outdated nginx version with a known path traversal vulnerability. A search on GitHub for "acme-corp" in repository names reveals a public repository containing a configuration file with a plaintext database connection string pointing to that same staging server. The organization has no awareness of this exposure. The analyst now has a complete attack chain derived entirely from public data, discovered in under two hours.
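The correlation step in this scenario can be sketched as a graph search. This is a toy illustration, not how Maltego or SpiderFoot work internally: each finding links two entities and records the source that linked them, and a breadth-first search from the seed domain shows every entity reachable from it together with the chain of evidence. The IP address (from a documentation range) and the repository path `github.com/example/acme-config` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Finding = Tuple[str, str, str]  # (entity_a, entity_b, source that links them)


def build_graph(findings: Iterable[Finding]) -> Dict[str, Set[Tuple[str, str]]]:
    graph: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for a, b, source in findings:
        # Links are undirected: either entity can lead the analyst to the other.
        graph[a].add((b, source))
        graph[b].add((a, source))
    return graph


def trace_from(graph: Dict[str, Set[Tuple[str, str]]], seed: str) -> Dict[str, List[Tuple[str, str]]]:
    """Breadth-first search returning, for each reachable entity, the chain of
    (source, entity) hops that connects it back to the seed."""
    paths: Dict[str, List[Tuple[str, str]]] = {seed: []}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor, source in sorted(graph.get(node, ())):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [(source, neighbor)]
                queue.append(neighbor)
    return paths


FINDINGS = [  # hypothetical findings modeled on the scenario above
    ("acme-corp.com", "staging-payments.acme-corp.com", "crt.sh"),
    ("staging-payments.acme-corp.com", "203.0.113.5", "DNS"),
    ("203.0.113.5", "nginx (outdated)", "Shodan"),
    ("github.com/example/acme-config", "staging-payments.acme-corp.com", "GitHub config file"),
    ("unrelated.example.org", "198.51.100.7", "DNS"),
]

for entity, chain in trace_from(build_graph(FINDINGS), "acme-corp.com").items():
    print(entity, chain)
```

Breadth-first order means each printed chain is a shortest evidence path, which is usually the easiest one to present in a report.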

---

Why It Matters

Organizations that do not conduct regular OSINT assessments against themselves are making security decisions without complete information. Their incident response plans assume adversaries arrive at the network perimeter with no prior knowledge. In practice, sophisticated threat actors invest weeks or months in OSINT before executing an attack, using that intelligence to select the most effective initial access vector, identify high-value personnel to impersonate, and anticipate defensive controls.

The business impact of inadequate OSINT awareness is concrete. Stolen or leaked credentials are consistently reported as one of the most common initial access vectors in breach investigations. The 2021 Colonial Pipeline ransomware attack, attributed to DarkSide, reportedly began with the password for a legacy VPN account that had appeared in a batch of leaked passwords; the account had no multi-factor authentication. Continuous monitoring of breach data could plausibly have surfaced that credential before the attackers used it, enabling remediation.

Exposed source code repositories represent a second high-frequency risk category. Development teams routinely publish code to GitHub or GitLab with API keys, cloud credentials, or internal hostnames included inadvertently. These secrets are indexed within minutes of publication by automated scanning tools operated by both security researchers and threat actors. Organizations that do not monitor their own code repositories for secret exposure are typically unaware of this attack surface until after a breach.
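The basic idea behind secret scanning can be sketched with a few regular expressions. This Python sketch is deliberately small and is not a substitute for GitHub secret scanning, TruffleHog, or similar tools, which ship hundreds of rules, scan full commit history, and can verify whether a secret is live. It assumes only three illustrative rules and scans only the current working tree, so secrets that were committed and later deleted are missed.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Illustrative patterns only; dedicated scanners ship far more rules plus verification.
PATTERNS = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) plus 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Common PEM private key headers.
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
    # URLs with an embedded user:password@host, e.g. database connection strings.
    "credentialed_url": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@[^\s/]+", re.IGNORECASE),
}

Hit = Tuple[str, int, str]  # (file path, line number, rule name)


def scan_text(text: str, path: str = "<string>") -> List[Hit]:
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((path, lineno, rule))
    return hits


def scan_tree(root: str) -> List[Hit]:
    """Scan every readable UTF-8 file under root, skipping the .git directory.
    Only the current working tree is checked, not commit history."""
    hits = []
    for file in sorted(Path(root).rglob("*")):
        if file.is_file() and ".git" not in file.parts:
            try:
                text = file.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            hits.extend(scan_text(text, str(file)))
    return hits
```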

A common misconception about OSINT is that privacy measures like GDPR-mandated WHOIS redaction or social media privacy settings meaningfully limit attacker capability. Privacy controls reduce the convenience of data collection but do not eliminate it. Cached data, breach databases, third-party data aggregators, and historical records preserve information long after the original source is restricted. Defenders must treat OSINT as an ongoing, continuous function rather than a one-time assessment, because the internet's memory is long and adversaries are patient.

A second misconception is that OSINT is only useful for red teams. Blue teams, threat intelligence analysts, vulnerability management programs, and executive protection functions all depend on OSINT to do their jobs effectively.

---

CDA Perspective

The Cyber Defense Alliance approaches OSINT through the Planetary Defense Model (PDM), specifically within the Threat Intelligence Domain (TID). CDA's operational methodology, Predictive Defense Intelligence (PDI), is grounded in the principle: "See the threat before it sees you." OSINT is the mechanism by which that principle is operationalized.

CDA distinguishes between reactive and predictive OSINT postures. A reactive posture waits for indicators of compromise to appear in logs and then attempts to attribute them to known threat actors. A predictive posture continuously monitors the external information environment for signals that precede an attack: newly registered lookalike domains that mimic the organization's brand, credential exposure in fresh breach data, dark web forum discussions referencing the organization by name, and social media profiles impersonating executives. The aim is that by the time a threat actor engages a target's infrastructure, the predictive OSINT program has already flagged signs of that preparation.
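Flagging lookalike registrations is one signal a predictive program can automate. The Python sketch below assumes a feed of newly registered domain names is already available (how it is obtained is out of scope) and uses edit distance plus a tiny, hypothetical digit-for-letter substitution table. It is not CDA's detection logic: it compares only the leftmost label, ignores Unicode confusables, and a one-edit threshold will be noisy for short brand names.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical digit-to-letter substitutions; real homoglyph tables are far larger
# and include Unicode confusables, which this sketch ignores.
HOMOGLYPHS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def flag_lookalikes(new_domains: Iterable[str], brand: str,
                    exclude: Optional[Set[str]] = None, max_distance: int = 1) -> List[str]:
    """Return newly seen domains whose first label resembles the brand."""
    brand = brand.lower()
    owned = {d.lower().rstrip(".") for d in (exclude or set())}
    flagged = []
    for domain in new_domains:
        name = domain.lower().rstrip(".")
        if name in owned:
            continue  # the organization's own domains are not lookalikes
        # Naive: compare only the leftmost label (e.g. "acme-c0rp" in "acme-c0rp.net").
        label = name.split(".")[0].translate(HOMOGLYPHS)
        if brand in label or levenshtein(label, brand) <= max_distance:
            flagged.append(domain)
    return flagged
```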

CDA's TID methodology structures OSINT collection around three layers. The surface layer covers indexed web content, public records, and social media. The deep layer covers content that is publicly accessible but not indexed by standard search engines, including paste sites, public cloud storage buckets, and academic databases. The dark layer, while not strictly "open source" in the traditional sense, is monitored through authorized access to closed criminal forums and marketplaces where compromised credentials and internal documents are traded.

What CDA does differently is treat OSINT not as a project but as a persistent intelligence function. Most organizations conduct an OSINT assessment annually as part of a penetration test engagement and then file the report. CDA integrates continuous OSINT monitoring into the security operations cycle, with automated alerting for new certificate issuances on client domains, credential exposure in breach data feeds, and code repository secret scanning. This persistent posture means that the intelligence picture is current, not twelve months stale, and that the security team can act on findings before they become incidents.
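Automated alerting on new certificate issuances reduces to a diff between runs. A minimal Python sketch, assuming the current subdomain list comes from a CT query such as crt.sh and that a local JSON file (path and function name hypothetical) stores every name seen so far; a production pipeline would also need scheduling, locking, and delivery of the alert.

```python
import json
from pathlib import Path
from typing import Iterable, List


def diff_and_update(state_file: str, current_names: Iterable[str]) -> List[str]:
    """Return names not seen on earlier runs and persist the updated set.

    The first run only records a baseline and returns an empty list, so the
    initial inventory does not trigger a flood of alerts.
    """
    path = Path(state_file)
    current = set(current_names)
    if not path.exists():
        path.write_text(json.dumps(sorted(current)))
        return []
    previous = set(json.loads(path.read_text()))
    new = sorted(current - previous)
    # Keep names from earlier runs too, so a name that briefly disappears is not re-alerted.
    path.write_text(json.dumps(sorted(previous | current)))
    return new
```

Each non-empty return value is a candidate alert: a name that appeared in CT data since the previous run and has not yet been reviewed.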

CDA also applies OSINT outputs directly to the Vulnerability and Security Domain (VSD) by feeding discovered assets into the vulnerability management program. A subdomain that OSINT uncovers but the asset inventory omits cannot be patched, monitored, or decommissioned, because the security team does not know it exists. OSINT closes that gap systematically.
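Feeding OSINT results into vulnerability management starts with a reconciliation step. A minimal Python sketch, not CDA's tooling: it assumes both inputs are plain hostname lists, and the three bucket names are illustrative.

```python
from typing import Dict, Iterable, List


def reconcile(discovered: Iterable[str], inventory: Iterable[str]) -> Dict[str, List[str]]:
    """Compare OSINT-discovered hostnames with the documented asset inventory."""
    found = {h.lower().rstrip(".") for h in discovered}
    known = {h.lower().rstrip(".") for h in inventory}
    return {
        # Discovered but undocumented: confirm ownership, then add to scanning.
        "unknown": sorted(found - known),
        # Present in both: already covered by the vulnerability management program.
        "confirmed": sorted(found & known),
        # Documented but not observed externally: possibly internal-only or decommissioned.
        "not_observed": sorted(known - found),
    }
```

The `unknown` bucket is the gap described above; the `not_observed` bucket is a useful by-product, since it can point to inventory entries that no longer exist.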

---

Key Takeaways

Run a certificate transparency query against your primary domain on crt.sh monthly and add every discovered subdomain to your official asset inventory for vulnerability scanning and monitoring.
Search your organization's name, primary domain, and key employee email addresses against Have I Been Pwned and Dehashed at least quarterly; any exposed credentials must be force-reset and accounts audited for unauthorized access.
Configure GitHub secret scanning alerts or deploy a tool like TruffleHog against all repositories associated with your organization's GitHub organization account; treat any exposed secret as compromised and rotate it immediately, regardless of whether the repository is public or private.
Conduct a Shodan and Censys query against your registered IP ranges before every quarterly vulnerability management cycle to identify internet-facing services that are not in your documented inventory, particularly administrative interfaces and outdated software versions.
Brief your social media and HR teams on the OSINT value of job postings: postings that enumerate specific internal tools, software versions, and infrastructure platforms give adversaries an accurate technology inventory at no cost; audit job descriptions for operational security before publication.

---

External Attack Surface Management (EASM)
Threat Intelligence Platforms and Feeds
Credential Exposure Monitoring and Response
Subdomain Enumeration and DNS Reconnaissance
Social Engineering and Pretexting: Technical Foundations
Dark Web Monitoring and Underground Forum Intelligence

---

Sources

MITRE ATT&CK. "Reconnaissance, Tactic TA0043." MITRE Corporation. https://attack.mitre.org/tactics/TA0043/

National Institute of Standards and Technology. "Guide to Cyber Threat Information Sharing." NIST Special Publication 800-150. https://csrc.nist.gov/publications/detail/sp/800-150/final

CIS Controls Version 8. "Control 7: Continuous Vulnerability Management." Center for Internet Security. https://www.cisecurity.org/controls/v8

NIST. "Cybersecurity Framework 2.0, Identify Function: Asset Management and Risk Assessment." National Institute of Standards and Technology. https://www.nist.gov/cyberframework

MITRE ATT&CK. "Gather Victim Identity Information, Technique T1589." MITRE Corporation. https://attack.mitre.org/techniques/T1589/
