# httpx

httpx is a fast, multi-purpose HTTP probing toolkit developed by ProjectDiscovery that transforms raw lists of hosts, IP addresses, and CIDR ranges into structured, actionable data about live web services. It exists because asset discovery tools produce thousands of candidate targets, but raw hostnames tell defenders and testers almost nothing about what is actually running. httpx fills that gap by performing rapid HTTP and HTTPS requests at scale, extracting response metadata, fingerprinting technologies, and filtering results so that downstream tools and analysts work only with confirmed, live, characterized targets. Without a probing layer between discovery and scanning, security teams waste enormous time and compute resources against hosts that return nothing meaningful.

---

Definition and Scope

httpx is an open-source command-line tool written in Go, maintained by ProjectDiscovery as part of their reconnaissance toolkit ecosystem. Technically, it is an HTTP probing engine: it accepts a list of potential web-facing assets and determines which ones are actually serving HTTP or HTTPS traffic, then extracts structured metadata from those responses.

httpx is not a vulnerability scanner. It does not attempt to exploit weaknesses or deliver payloads. It is not a web crawler or spider; it does not follow links or map site structure. It is not a brute-force tool for directories or files. Confusing httpx with tools like Nikto, Burp Suite, or dirb is a common error. Those tools operate on already-confirmed targets; httpx is what confirms and characterizes targets so those tools can function efficiently.

httpx is also distinct from the Python library of the same name (a Python HTTP client library). The ProjectDiscovery httpx and the Python httpx share only a name; they serve entirely different purposes and communities.

Within the ProjectDiscovery ecosystem, httpx sits between subfinder or amass (which produce raw subdomains and IPs) and nuclei (which performs template-based vulnerability detection). It is the validation and enrichment layer. A practitioner running a full recon pipeline without httpx is essentially asking nuclei to scan thousands of dead or irrelevant hosts, which inflates scan time and degrades signal quality.

Modes of use within httpx include low-interaction probing (collecting basic response data with a minimal number of requests), authenticated probing (using custom headers or cookies), and screenshot-enabled probing, which invokes a headless browser to capture visual evidence of live services when the appropriate flags are set. The tool can also be run with narrowly scoped probes, such as TLS certificate extraction or favicon hashing alone, for rapid technology fingerprinting at scale.

---

How It Works

httpx accepts input through standard input (stdin), file arguments, or direct host/URL arguments. That input can include bare hostnames, IP addresses, CIDR notation blocks, or full URLs with explicit ports. The tool normalizes all inputs into request targets by prepending a scheme where none is given. By default it tries HTTPS first and falls back to HTTP if HTTPS is unreachable; it can also be configured to probe and report both schemes.
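The normalization step can be illustrated with a minimal Python sketch. This is not httpx's own code: the function name `expand_targets` and the `probe_both` switch are hypothetical, and the sketch skips network and broadcast addresses when expanding a CIDR block, which may differ from how httpx itself expands ranges.

```python
import ipaddress


def expand_targets(raw_inputs, probe_both=False):
    """Turn hostnames, IPs, CIDR blocks, and URLs into probe URLs (illustrative only)."""
    targets = []
    for item in raw_inputs:
        item = item.strip()
        if not item:
            continue
        # Full URLs already carry a scheme; keep them untouched.
        if item.startswith(("http://", "https://")):
            targets.append(item)
            continue
        hosts = [item]
        if "/" in item:
            try:
                # CIDR blocks expand to one target per usable address.
                network = ipaddress.ip_network(item, strict=False)
                hosts = [str(ip) for ip in network.hosts()]
            except ValueError:
                pass  # not a CIDR block; treat the value as a plain host
        for host in hosts:
            targets.append(f"https://{host}")
            if probe_both:
                targets.append(f"http://{host}")
    return targets
```

For example, `expand_targets(["10.0.0.0/30"])` yields one HTTPS target for each of the two usable addresses in that block.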

Connection and Protocol Handling

httpx establishes TCP connections using Go's native net/http stack with configurable concurrency. The default thread count (50 at the time of writing) can be raised to several hundred concurrent connections for large-scale probing. The tool negotiates TLS automatically, supports HTTP/1.1 and HTTP/2, and, when redirect following is enabled, follows redirects up to a configurable limit. Following redirects matters because many assets return 301 or 302 responses that point to the real application; capturing only the redirect response produces incomplete data.

Timeout values, retry counts, and connection limits are all tunable. In environments with unstable or rate-limited infrastructure, practitioners often lower concurrency and raise timeout thresholds to avoid false negatives from transient failures.
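The interaction between concurrency, timeouts, and retries can be sketched in Python. This is a generic illustration of the tuning trade-off, not httpx's implementation; `probe_with_retries`, `probe_all`, and the injected `fetch` callable are hypothetical names chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def probe_with_retries(fetch, url, retries=2, timeout=10.0, backoff=0.5):
    """Call fetch(url, timeout) until it succeeds or the retries run out."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url, timeout)
        except OSError as exc:  # covers timeouts and connection errors
            last_error = exc
            if attempt < retries:
                # Wait longer after each failure before trying again.
                time.sleep(backoff * (2 ** attempt))
    raise last_error


def probe_all(fetch, urls, workers=50, **retry_options):
    """Probe many URLs concurrently; unreachable hosts map to None."""
    def safe(url):
        try:
            return url, probe_with_retries(fetch, url, **retry_options)
        except OSError:
            return url, None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(safe, urls))
```

Lowering `workers` and raising `timeout` or `retries` trades speed for fewer hosts wrongly recorded as dead, which is the adjustment described above for unstable or rate-limited targets.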

Metadata Extraction

Once httpx receives a response, it extracts a configurable set of metadata fields from that response. The full list includes: HTTP status code, content length, content type, page title (parsed from the HTML title tag), web server header, all response headers, the response body (optionally), TLS certificate details including subject, issuer, SANs (Subject Alternative Names), and expiration dates, CNAME chain, CDN provider identification, favicon hash (using the Shodan-compatible MurmurHash3 algorithm), JARM fingerprint (a TLS fingerprint that identifies the server-side TLS stack), and technology stack identification powered by Wappalyzer signature matching.
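To make the favicon hash concrete, here is a minimal pure-Python sketch of the commonly described Shodan-style calculation: base64-encode the icon bytes with line breaks, then take the 32-bit MurmurHash3 of the encoded text as a signed integer. The function names are illustrative, and the sketch assumes this convention rather than reproducing httpx's source.

```python
import base64


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Unsigned 32-bit MurmurHash3 (x86 variant)."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    rounded = len(data) & ~3
    # Mix the input four bytes at a time.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    # Mix the remaining one to three bytes, if any.
    tail = data[rounded:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    # Final avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def favicon_hash(icon_bytes: bytes) -> int:
    """Signed 32-bit hash of the base64-encoded icon, as used for search pivots."""
    encoded = base64.encodebytes(icon_bytes)  # inserts a newline every 76 characters
    unsigned = murmur3_32(encoded)
    return unsigned - (1 << 32) if unsigned & 0x80000000 else unsigned
```

The signed conversion matters because search engines that index this hash typically store it as a signed 32-bit value, so negative numbers are expected.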

Each of these data points serves a distinct purpose. JARM fingerprints help identify specific web server implementations or load balancers even when banners are suppressed. Favicon hashes enable pivoting in Shodan or Censys to find other assets running the same software. TLS SANs reveal related domains that may not have appeared in subdomain enumeration. CDN identification shows which hosts are reachable directly and which sit behind a CDN or WAF, which affects how each should be assessed.
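As an illustration of using SANs as a discovery source, the sketch below collects certificate names that are not already in a known host list. The result layout (a `tls` object containing a `subject_an` list) is an assumption about the parsed JSON and should be checked against the output of the httpx version in use; `new_names_from_sans` is a hypothetical helper.

```python
def new_names_from_sans(results, known_hosts):
    """Return SAN entries not already present in the known host list."""
    known = {host.lower() for host in known_hosts}
    discovered = set()
    for result in results:
        # Assumed layout: {"tls": {"subject_an": ["a.example.com", ...]}}
        for name in result.get("tls", {}).get("subject_an", []):
            name = name.lower()
            if name.startswith("*."):
                name = name[2:]  # reduce a wildcard entry to its parent domain
            if name not in known:
                discovered.add(name)
    return sorted(discovered)
```

Names returned this way are candidates only; they still need to be resolved and probed before being treated as live assets.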

Filtering and Output

httpx supports powerful content-based filtering that controls which results are written to output. Practitioners can include or exclude results based on: HTTP status codes or ranges, content length (exact, minimum, or maximum), response body strings or regex patterns, page titles, and web server headers. This filtering capability is operationally significant. In a scan of 50,000 subdomains, the raw result set might include thousands of parked domains returning 200 status codes with generic parking page content. A single content-length filter or body-string exclusion drops those results and leaves only meaningful targets.
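The filtering logic can be sketched as a predicate applied to each parsed result. This mirrors the idea of status, length, and body filters but is not httpx's matcher implementation; the field names `status_code`, `content_length`, and `body` are assumptions about the result layout, and `keep_result` is a hypothetical helper.

```python
import re


def keep_result(result, allowed_status=None, min_length=None, exclude_body=None):
    """Return True if a result passes the status, length, and body filters."""
    if allowed_status is not None and result.get("status_code") not in allowed_status:
        return False
    if min_length is not None and result.get("content_length", 0) < min_length:
        return False
    # exclude_body is a regular expression; any match drops the result.
    if exclude_body is not None and re.search(exclude_body, result.get("body", "")):
        return False
    return True
```

A parked-domain exclusion then becomes a single call such as `keep_result(r, allowed_status={200, 403}, exclude_body=r"domain is for sale")`, where the pattern is whatever text the parking pages in a given data set share.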

Output formats include plain text (one result per line), JSON (full structured metadata per result), and CSV. JSON output is the most commonly used in automated pipelines because downstream tools and SIEMs can parse it directly.
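A downstream consumer of the JSON output might look like the following sketch, which assumes the output is written as one JSON object per line. The helper name `load_results` is hypothetical, and lines that do not parse are skipped rather than treated as fatal.

```python
import json


def load_results(lines):
    """Parse JSON Lines output into a list of dicts, skipping blank or malformed lines."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            results.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a truncated or partial line
    return results
```

Because the function accepts any iterable of strings, it works equally on an open file handle or on lines read from a pipe.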

A Concrete Pipeline Scenario

Consider a security team performing continuous external attack surface management for a financial services organization with 200 known root domains. The pipeline runs as follows:

1. subfinder and amass enumerate subdomains across all 200 root domains, producing a raw list of approximately 85,000 candidate subdomains.
2. That list pipes directly into httpx with the following options: a status code filter allowing 200, 301, 302, 403, and 500; JARM fingerprinting enabled; technology detection enabled; title extraction enabled; JSON output to a file.
3. httpx completes the probe in under 30 minutes using 200 concurrent threads. Of 85,000 inputs, approximately 12,000 return meaningful HTTP responses.
4. The JSON output feeds into a parsing script that flags any host whose JARM fingerprint has previously been associated with outdated Apache or IIS deployments, any host with a TLS certificate expiring within 14 days, and any host whose server header discloses a software version.
5. Those flagged hosts pass to nuclei for targeted vulnerability scanning.

Without httpx in that chain, nuclei would attempt to scan all 85,000 hosts, wasting hours of compute time and generating thousands of timeouts and connection errors against dead hosts.
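The parsing script in this scenario could be sketched as follows. It covers only the certificate-expiry and version-disclosure checks; the JARM check is left out because it depends on a signature list the scenario does not specify. The field names `tls.not_after` and `webserver` are assumptions about the parsed result, and `triage` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches banners such as "Apache/2.4.49" that expose a version number.
VERSION_IN_BANNER = re.compile(r"/\d+(\.\d+)+")


def triage(result, now=None, expiry_window_days=14):
    """Return the list of reasons a probed host should be flagged."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    not_after = result.get("tls", {}).get("not_after")
    if not_after:
        expires = datetime.fromisoformat(not_after.replace("Z", "+00:00"))
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)  # assume UTC if unspecified
        if expires - now <= timedelta(days=expiry_window_days):
            reasons.append("certificate expiring")
    if VERSION_IN_BANNER.search(result.get("webserver", "")):
        reasons.append("server version disclosed")
    return reasons
```

Hosts for which `triage` returns a non-empty list would be the ones handed on for targeted scanning.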

---

Why It Matters

The security impact of httpx is primarily about signal quality and operational speed. External attack surface management is only as useful as the data it produces. A reconnaissance pipeline that cannot distinguish live, characterized targets from dead or irrelevant hosts forces analysts to manually validate results, which eliminates the scale advantage that automated tooling is supposed to provide.

Reduction of Reconnaissance Blind Spots

One of the most consistent findings in post-incident reviews of major breaches is that the compromised asset was known to exist but was not being monitored. Forgotten subdomains, staging environments left public, and acquisition-era infrastructure that was never decommissioned all fall into this category. httpx, run continuously against the full known asset inventory, surfaces these forgotten assets before attackers do. When httpx returns an unexpected 200 from a host that should have been decommissioned six months prior, that is an immediate remediation priority.

The Real Cost of Skipping Probing

In 2020 and 2021, subdomain takeover attacks increased significantly as organizations accelerated cloud migrations and left dangling DNS records pointing to deprovisioned services. Tools like httpx can detect the signatures of takeover-vulnerable assets (for example, CNAME records pointing to unclaimed cloud service endpoints that return specific error messages) when configured with appropriate filters. Organizations that ran only passive DNS enumeration without an active probing layer missed these vulnerable conditions entirely until attackers exploited them.

A well-documented consequence pattern: an organization enumerates subdomains quarterly and relies only on DNS records to assess exposure. A subdomain created for a marketing campaign continues to have a live DNS record after the campaign ends, but the underlying cloud storage bucket or CDN endpoint is deleted. httpx, probing that subdomain, would return the cloud provider's "bucket not found" or "service not configured" error, which is a fingerprint of takeover vulnerability. Without that active probe, the dangling record sits undetected.
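A body-string check for such fingerprints might look like the sketch below. The two strings shown are widely reported error texts for a missing S3 bucket and an unclaimed GitHub Pages site; they are included only as illustrations, and any production list would need to be maintained and verified per provider. The `url` and `body` field names and the helper `takeover_candidates` are assumptions.

```python
# Illustrative fingerprints only; verify and extend before relying on them.
TAKEOVER_FINGERPRINTS = {
    "NoSuchBucket": "object storage bucket missing",
    "There isn't a GitHub Pages site here": "unclaimed GitHub Pages site",
}


def takeover_candidates(results):
    """Return (url, reason) pairs for responses matching a known error fingerprint."""
    candidates = []
    for result in results:
        body = result.get("body", "")
        for needle, reason in TAKEOVER_FINGERPRINTS.items():
            if needle in body:
                candidates.append((result.get("url"), reason))
                break  # one reason per host is enough for triage
    return candidates
```

A match is a lead rather than a confirmed finding: the DNS record and the provider-side claim status still need to be checked before remediation.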

Common Misconception

A frequent misconception is that httpx is only useful for offensive security or red team work. In practice, it is equally valuable for defensive asset management. Any organization running continuous external monitoring benefits from the same metadata extraction and filtering that penetration testers use to scope engagements. The difference is in how results are consumed: attackers look for weaknesses; defenders look for unauthorized or unexpected changes from a baseline.

---

CDA Perspective

CDA approaches httpx through the Planetary Defense Model's Vulnerability Surface Discovery (VSD) domain. VSD is the systematic process of identifying, characterizing, and quantifying every externally accessible surface that an organization exposes. httpx is a primary operational instrument within VSD because it produces the enriched, structured data that makes surface reduction decisions possible.

CDA's methodology, Continuous Surface Reduction (CSR), operates on the principle that every surface you expose is a surface we eliminate. httpx is not run once per quarter as part of a periodic assessment. CDA runs httpx continuously against client asset inventories on defined cadences, typically daily for high-priority asset classes and weekly for lower-priority ranges. The output is not delivered as a report; it feeds directly into CDA's surface inventory database, where each result is compared against the previous run to identify new assets (unexpected additions), changed assets (status code changes, header changes, certificate changes), and missing assets (hosts that were previously live and are now returning nothing, which can indicate unauthorized decommissioning or a takeover condition).
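The run-over-run comparison described here can be sketched as a diff keyed on URL. This is an illustrative outline, not CDA's tooling; the function name `diff_runs`, the `url` key, and the default watched fields are assumptions made for the example.

```python
def diff_runs(previous, current, watched=("status_code", "webserver", "title")):
    """Compare two probe runs and report new, changed, and missing assets."""
    prev = {r["url"]: r for r in previous}
    curr = {r["url"]: r for r in current}
    changed = {}
    for url in curr.keys() & prev.keys():
        # Record (old, new) pairs for every watched field that differs.
        delta = {
            field: (prev[url].get(field), curr[url].get(field))
            for field in watched
            if prev[url].get(field) != curr[url].get(field)
        }
        if delta:
            changed[url] = delta
    return {
        "new": sorted(curr.keys() - prev.keys()),
        "changed": changed,
        "missing": sorted(prev.keys() - curr.keys()),
    }
```

Certificate fields can be added to `watched` in the same way once their names in the stored results are known.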

What CDA does differently is correlation. Most teams run httpx and look at the output in isolation. CDA correlates httpx results with threat intelligence feeds, Shodan and Censys data, and internal asset management records. A host that httpx identifies as running a specific Apache version, combined with a Shodan record showing that version has been exposed for 180 days, combined with a MITRE ATT&CK technique mapping to known exploitation of that Apache version, produces an immediate prioritized action item rather than a line in a spreadsheet.

CDA also applies httpx's favicon hash and JARM fingerprint outputs to pivot: when an unexpected asset fingerprint appears in a client's scan, CDA checks whether the same fingerprint appears on other hosts in the client's IP space that were not in the known asset inventory. This catches shadow IT and unauthorized deployments that never made it into CMDB records.
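A fingerprint pivot of this kind can be sketched as a lookup over results from the wider IP space. The field names `host`, `favicon`, and `jarm` are assumptions about the stored results, and `pivot_on_fingerprint` is a hypothetical helper rather than part of any tool.

```python
def pivot_on_fingerprint(results, fingerprint, inventory, fields=("favicon", "jarm")):
    """Return hosts outside the inventory that share the given fingerprint."""
    known = set(inventory)
    matches = set()
    for result in results:
        host = result.get("host")
        if not host or host in known:
            continue
        # A host matches if any fingerprint field equals the value being pivoted on.
        if any(result.get(field) == fingerprint for field in fields):
            matches.add(host)
    return sorted(matches)
```

Each returned host is a candidate for shadow IT review, since it presents the same fingerprint but is absent from the recorded inventory.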

The operational output from CDA's httpx-based VSD process is a continuously updated, enriched asset surface map that drives remediation queues rather than periodic reports.

---

Key Takeaways

Run httpx as a continuous pipeline stage between subdomain enumeration and vulnerability scanning; do not treat it as a one-time reconnaissance step, because your external surface changes daily.
Configure httpx with JARM fingerprinting and favicon hash extraction enabled; these two fields enable Shodan and Censys pivoting that frequently surfaces related assets not found through DNS enumeration alone.
Use httpx's content-length and body-string filters aggressively to exclude parked domains, CDN default pages, and other low-signal responses before results reach downstream tools or analyst queues.
Parse TLS Subject Alternative Names from httpx JSON output as a secondary discovery source; SANs frequently reveal subdomains that passive enumeration missed because they never appeared in the public DNS datasets those tools rely on.
Baseline your httpx output and alert on deviations: a host that was returning 404 last week and is now returning 200 with a login page is a higher-priority investigation than a host that has been consistently live and unchanged.

---

Sources

MITRE ATT&CK, "Gather Victim Network Information: IP Addresses (T1590.005)," MITRE Corporation. Available at: https://attack.mitre.org/techniques/T1590/005/

NIST Special Publication 800-115, "Technical Guide to Information Security Testing and Assessment," National Institute of Standards and Technology, 2008. Available at: https://csrc.nist.gov/publications/detail/sp/800-115/final

CIS Controls v8, "Control 7: Continuous Vulnerability Management," Center for Internet Security, 2021. Available at: https://www.cisecurity.org/controls/v8

MITRE ATT&CK, "Active Scanning: Scanning IP Blocks (T1595.001)," MITRE Corporation. Available at: https://attack.mitre.org/techniques/T1595/001/

NIST Special Publication 800-137, "Information Security Continuous Monitoring (ISCM) for Federal Information Systems and Organizations," National Institute of Standards and Technology, 2011. Available at: https://csrc.nist.gov/publications/detail/sp/800-137/final
