# NetFlow vs sFlow Analysis

Network telemetry protocols exist because security teams cannot defend what they cannot observe. NetFlow and sFlow solve the fundamental problem of network blindness by exporting traffic metadata from routers, switches, and firewalls to centralized collectors where analysts can reconstruct what happened, when it happened, and between which systems. Neither protocol captures full packet payloads, which makes them lightweight enough for production deployment across high-volume infrastructure. Choosing between them is not a matter of preference but of operational fit: the protocol that matches your hardware constraints, traffic volume, retention requirements, and detection use cases will produce better security outcomes than a theoretically superior protocol deployed incorrectly or incompletely.

---

## Definition and Scope

NetFlow is a network telemetry protocol developed by Cisco in the mid-1990s that records metadata for every IP flow passing through a network device (or, where sampled NetFlow is configured, for a sampled subset of packets). A flow is commonly defined by a five-tuple: source IP address, destination IP address, source port, destination port, and IP protocol; classic Cisco NetFlow additionally keys on the Type of Service byte and the input interface. NetFlow v5 exports fixed-format records; NetFlow v9 introduced flexible templates; and IPFIX (IP Flow Information Export), standardized by the IETF in RFC 7011, extended those templates further and became the vendor-neutral successor to NetFlow v9. When security professionals say "NetFlow," they often mean any of these variants interchangeably.
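To make "fixed-format" concrete, the sketch below unpacks the 24-byte NetFlow v5 export header with Python's `struct` module. The field order follows the commonly documented v5 layout; the class name `V5Header`, the function `parse_v5_header`, and the sample values are illustrative, and the flow records that follow the header in a real datagram are not decoded here.

```python
import struct
from typing import NamedTuple

# NetFlow v5 export header: 24 bytes, network byte order, no padding.
V5_HEADER = struct.Struct("!HHIIIIBBH")


class V5Header(NamedTuple):
    version: int            # always 5 for NetFlow v5
    count: int              # number of flow records in this datagram
    sys_uptime_ms: int      # device uptime in milliseconds
    unix_secs: int          # export time, seconds since the epoch
    unix_nsecs: int         # residual nanoseconds
    flow_sequence: int      # running count of exported flows
    engine_type: int
    engine_id: int
    sampling_interval: int  # sampling mode and interval bits


def parse_v5_header(datagram: bytes) -> V5Header:
    """Decode the header only; reject anything that is not v5."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    header = V5Header(*V5_HEADER.unpack_from(datagram))
    if header.version != 5:
        raise ValueError(f"not NetFlow v5 (version={header.version})")
    return header


# Hand-built sample header (made-up values) standing in for a received datagram.
sample = V5_HEADER.pack(5, 2, 120_000, 1_700_000_000, 0, 42, 0, 0, 0)
print(parse_v5_header(sample))
```

Because every v5 field sits at a fixed offset, a single format string is enough. NetFlow v9 and IPFIX cannot be decoded this way: the collector must first receive the template that describes each record.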

sFlow, originally published as RFC 3176 and now maintained by sFlow.org as sFlow version 5, is a statistical sampling protocol. (RFC 5476 specifies PSAMP, a separate IETF packet-sampling protocol, not sFlow.) It does not track every flow. Instead, it samples on average one packet out of every N packets (where N is configurable) and forwards a copy of that packet header, along with interface counter statistics, to a collector. The sampling ratio is explicit and built into the protocol, allowing analysts to scale the reported numbers to estimate actual traffic volumes.

These protocols are not packet capture. They do not record application-layer payloads, TLS session contents, or DNS query strings unless additional metadata export is explicitly configured. They are also not SNMP. SNMP polls devices for counter data on a schedule; flow protocols push per-flow or per-sample records asynchronously as traffic occurs. IPFIX and sFlow are complementary to endpoint telemetry (EDR), not replacements. Security teams sometimes incorrectly assume flow data alone will catch everything; it will not catch encrypted command-and-control traffic that blends into normal HTTPS volume without additional behavioral baselining.

---

## How It Works

NetFlow mechanics: When a packet arrives at a NetFlow-enabled router or switch, the device checks whether an active flow entry already exists for that five-tuple in its flow cache. If yes, it increments the byte and packet counters for that entry. If no, it creates a new cache entry. The flow record remains active until one of several expiration conditions is met: the TCP FIN or RST flag signals connection termination, an idle timeout expires (typically 15 seconds of no traffic), or an active timeout forces export of long-running flows (typically 30 minutes). When a flow expires, the device sends the record to a configured collector, usually over UDP (port 2055 is a common convention for NetFlow; port 4739 is the IANA-registered port for IPFIX) or, for IPFIX, optionally over SCTP. The exported record contains the five-tuple, byte count, packet count, start and end timestamps, TCP flags observed across the entire flow, Type of Service (ToS) byte, and input and output interface indexes.
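The cache behaviour described above can be sketched in a few lines of Python. This is a simplified model, not how any particular device implements it: the class names, the 15-second and 30-minute constants (taken from the typical values quoted above), and the choice to export immediately on FIN or RST are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Flow key: (src IP, dst IP, src port, dst port, IP protocol)
FlowKey = Tuple[str, str, int, int, int]

IDLE_TIMEOUT_S = 15.0       # assumed idle timeout
ACTIVE_TIMEOUT_S = 1800.0   # assumed active timeout (30 minutes)
TCP_FIN, TCP_RST = 0x01, 0x04


@dataclass
class FlowEntry:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # OR of all flags seen during the flow


class FlowCache:
    def __init__(self) -> None:
        self.entries: Dict[FlowKey, FlowEntry] = {}
        self.exported: List[Tuple[FlowKey, FlowEntry]] = []

    def observe(self, key: FlowKey, length: int, now: float, tcp_flags: int = 0) -> None:
        """Update the matching entry, creating it on first sight."""
        entry = self.entries.get(key)
        if entry is None:
            entry = FlowEntry(first_seen=now, last_seen=now)
            self.entries[key] = entry
        entry.packets += 1
        entry.octets += length
        entry.last_seen = now
        entry.tcp_flags |= tcp_flags
        if tcp_flags & (TCP_FIN | TCP_RST):
            self._export(key)  # connection teardown ends the flow

    def expire(self, now: float) -> None:
        """Export flows that hit the idle or active timeout."""
        for key, entry in list(self.entries.items()):
            idle = now - entry.last_seen >= IDLE_TIMEOUT_S
            long_running = now - entry.first_seen >= ACTIVE_TIMEOUT_S
            if idle or long_running:
                self._export(key)

    def _export(self, key: FlowKey) -> None:
        self.exported.append((key, self.entries.pop(key)))


cache = FlowCache()
key = ("10.0.0.5", "203.0.113.45", 51512, 443, 6)
cache.observe(key, 1500, now=0.0)
cache.observe(key, 400, now=2.0)
cache.expire(now=10.0)  # idle for 8 s: entry stays in the cache
cache.expire(now=20.0)  # idle for 18 s: entry is exported
print(cache.exported)
```

The memory cost is visible in the model: `entries` grows with the number of concurrent flows, and that per-flow state is exactly what sFlow avoids.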

NetFlow v9 and IPFIX add configurable templates. A network engineer can define a template that exports additional fields: VLAN IDs for segmentation analysis, BGP autonomous system numbers for peering visibility, MPLS label stacks for service provider environments, and sampled flow direction (ingress versus egress). This template flexibility is why IPFIX became the IETF standard and why most modern security deployments prefer it over v5.

Modern IPFIX implementations can export application-layer metadata beyond the basic five-tuple. Deep Packet Inspection (DPI) modules in enterprise-class routers can extract HTTP hostnames, SSL certificate subject names, DNS query names, and even file transfer metadata (FTP, SMB) into flow records. This bridges the gap between basic connection metadata and application visibility without requiring full packet capture. A security analyst investigating suspicious TLS traffic can see in IPFIX records that connections to IP address 203.0.113.45 were actually attempts to reach "malicious-domain.example.com" based on the Server Name Indication (SNI) field extracted during the TLS handshake.

sFlow mechanics: sFlow operates at the hardware level on the switch ASIC. The sampling engine counts packets and, every N packets, copies the packet header (typically the first 128 bytes) along with the current interface counter values into an sFlow datagram. That datagram is sent via UDP to the sFlow collector (default port 6343). Because the switch is not maintaining per-flow state, there is no flow cache, no memory pressure, and no CPU overhead proportional to the number of concurrent connections. This makes sFlow practical on 10 Gbps, 40 Gbps, and 100 Gbps interfaces where maintaining a full flow cache would require dedicated hardware or cause performance degradation.

The packet sampling in sFlow is statistically representative but not comprehensive. At a 1:2000 sampling ratio, sFlow will see on average one packet out of every 2,000. For a DDoS attack generating 10 million packets per second, sFlow will export roughly 5,000 samples per second, which is more than sufficient to detect the attack and characterize its source distribution. For a low-and-slow exfiltration event generating 50 packets per minute, sFlow at 1:2000 will statistically observe approximately one sample every 40 minutes, which may be insufficient for reliable detection unless the device samples far more aggressively (a much smaller N).
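The arithmetic behind those two figures is easy to reproduce. The helper names below are illustrative, and the calculation treats sampling as a steady 1-in-N average, which ignores the randomness real sFlow agents add to the skip count.

```python
def samples_per_second(packets_per_second: float, sampling_n: int) -> float:
    """Average number of samples exported per second at 1:N sampling."""
    return packets_per_second / sampling_n


def seconds_between_samples(packets_per_second: float, sampling_n: int) -> float:
    """Average wait between two samples of the same traffic source."""
    return sampling_n / packets_per_second


def estimate_total_packets(samples_seen: int, sampling_n: int) -> int:
    """Scale a sample count back up to an estimate of real traffic."""
    return samples_seen * sampling_n


# DDoS case: 10 million packets per second at 1:2000
ddos_rate = samples_per_second(10_000_000, 2000)

# Low-and-slow case: 50 packets per minute at 1:2000
slow_gap_minutes = seconds_between_samples(50 / 60, 2000) / 60

print(f"DDoS: {ddos_rate:.0f} samples/s")
print(f"Exfiltration: one sample every {slow_gap_minutes:.0f} minutes on average")
```

The last helper is the scaling step analysts apply at the collector: 5,000 samples seen at 1:2000 imply roughly 10 million packets on the wire.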

Modern sFlow implementations compensate for this statistical limitation through adaptive sampling and extended metadata. Some switches can dynamically adjust sampling rates based on interface utilization: normal sampling at 1:10000 during business hours, but increased sampling at 1:1000 when anomalous traffic patterns are detected. Extended sFlow includes not just packet headers but also 802.1Q VLAN tags, 802.11 wireless metadata, and even application performance metrics from the switch's built-in DPI engine.

Collector and analysis pipeline: Both protocols require a collector to receive, parse, and store records. Open-source options include nfdump with nfcapd for NetFlow/IPFIX, sflowtool and Host sFlow for sFlow data, and the Elastic Stack with purpose-built flow ingest pipelines. Commercial platforms include Cisco Secure Network Analytics (formerly Stealthwatch), Kentik Detect, Plixer Scrutinizer, and SolarWinds NTA. The collector normalizes records from multiple vendor formats, applies geo-IP enrichment and ASN lookups, and feeds data into a SIEM or data lake for correlation.

The analysis pipeline typically operates in three stages. Real-time processing applies immediate threat intelligence (known-bad IP addresses, malicious domains) and generates alerts for high-confidence indicators. Near-real-time processing (5-15 minute windows) performs baseline deviation analysis: a host that normally sends 100 MB/hour outbound and suddenly sends 2 GB triggers an anomaly alert. Historical analysis runs daily or weekly to identify long-term trends, map network topology changes, and refine baseline models.
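A minimal sketch of the near-real-time stage, assuming hourly outbound totals per host are already available from the collector. The function name, the sample history, and the choice of a mean-plus-k-standard-deviations threshold with `k = 2` are illustrative; production systems usually need seasonality-aware baselines.

```python
from statistics import mean, stdev
from typing import Sequence


def is_volume_anomaly(history_mb: Sequence[float], current_mb: float, k: float = 2.0) -> bool:
    """Flag the current hour if it exceeds the baseline mean by k standard deviations."""
    if len(history_mb) < 2:
        raise ValueError("need at least two baseline observations")
    baseline = mean(history_mb)
    spread = stdev(history_mb)
    return current_mb > baseline + k * spread


# Hypothetical host that normally sends about 100 MB/hour outbound.
history = [95, 102, 98, 110, 100, 97, 105, 99]

print(is_volume_anomaly(history, 104))   # within normal variation
print(is_volume_anomaly(history, 2000))  # the 2 GB burst from the example above
```

Comparing against a per-host baseline rather than a fixed byte threshold is what lets the same rule work for a quiet workstation and a busy file server.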

Storage and retention strategies differ significantly between NetFlow and sFlow due to volume differences. As an illustrative example, a typical enterprise might generate 50,000 NetFlow records per second during business hours, consuming approximately 200 MB/hour when compressed. The same network monitored with sFlow at 1:5000 sampling might generate 80,000 sFlow records per second, with each record carrying more metadata, resulting in 400 MB/hour compressed. The counterintuitive result is that sFlow, despite sampling, can consume more storage than NetFlow because each sample includes complete packet headers rather than just flow summaries. Actual volumes depend heavily on traffic mix, sampling ratio, and the collector's storage format, so measure on your own collectors before sizing storage.
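For capacity planning, the hourly figures above can be projected over a retention window. The sketch assumes the quoted business-hours rate holds around the clock (which overestimates) and uses decimal gigabytes; the function name and the 90-day window are illustrative choices.

```python
def retention_gb(mb_per_hour: float, days: int) -> float:
    """Storage needed to keep `days` of flow data at a constant hourly rate."""
    return mb_per_hour * 24 * days / 1000


netflow_90d = retention_gb(200, 90)  # NetFlow example rate from the text
sflow_90d = retention_gb(400, 90)    # sFlow example rate from the text

print(f"NetFlow, 90 days: {netflow_90d:.0f} GB")
print(f"sFlow, 90 days:   {sflow_90d:.0f} GB")
```

Running the same calculation with rates measured on your own collector is a more reliable basis for a storage budget than any generic estimate.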

---

## Why It Matters

Without flow telemetry, network security monitoring is reactive and incomplete. Firewall logs show what was blocked; flow data shows what was allowed and what it did afterward. This distinction is critical for detecting threats that bypass perimeter controls: compromised credentials used for legitimate VPN access, insider threats moving data across internal segments, malware that establishes outbound connections on permitted ports, and supply chain compromises that communicate through authorized cloud services.

The practical consequence of operating without flow data is extended dwell time. Mandiant's M-Trends 2023 report put the global median dwell time at 16 days, and network telemetry is one of the levers for shrinking that window. Network-level visibility catches attacker behavior that endpoint agents miss: lateral movement between systems where no EDR is deployed (industrial control systems, embedded devices, legacy servers), data staging on network file shares accessed from multiple source IPs, and command-and-control beaconing from systems where the EDR was disabled or bypassed through living-off-the-land techniques.

Flow data also provides the temporal precision necessary for incident reconstruction. When a security team discovers that an internal server was compromised three weeks ago, endpoint logs may have been rotated and memory artifacts are long gone. Flow records, retained for 90 days, can reconstruct exactly which internal systems the compromised server contacted, when those contacts occurred, how much data was transferred, and whether the communication patterns match known attack frameworks. This capability transforms incident response from damage assessment ("what did they take?") to tactical countermeasures ("where else are they operating?").

A common misconception is that sFlow is less security-relevant than NetFlow because it samples rather than records everything. Security practitioners often assume that missing 99.9% of packets (at 1:1000 sampling) means missing 99.9% of security events. This assumption conflates packet-level visibility with event-level visibility. Most security events (malware communication, data exfiltration, lateral movement) involve hundreds or thousands of packets over time periods measured in minutes or hours. The statistical probability that sFlow will miss an event entirely approaches zero as the event duration increases. A 10-minute data transfer session will generate multiple sFlow samples even at 1:10000 sampling, providing sufficient visibility to detect, characterize, and investigate the event.
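The claim that the miss probability approaches zero can be checked with a short calculation. The sketch assumes each packet is sampled independently with probability 1/N, a simplification of how sFlow agents randomize their skip counts; the packet counts in the examples are hypothetical.

```python
def probability_missed(packets_in_event: int, sampling_n: int) -> float:
    """Chance that 1:N sampling captures no packet at all from an event."""
    return (1 - 1 / sampling_n) ** packets_in_event


def expected_samples(packets_in_event: int, sampling_n: int) -> float:
    """Average number of samples an event of this size produces."""
    return packets_in_event / sampling_n


# Hypothetical 10-minute transfer of 100,000 packets at 1:10000
print(probability_missed(100_000, 10_000), expected_samples(100_000, 10_000))

# Hypothetical short burst of 500 packets at 1:1000
print(probability_missed(500, 1_000), expected_samples(500, 1_000))
```

Under these assumptions the large transfer is almost certain to be sampled, while the 500-packet burst is missed more often than not. Event size relative to N, rather than the sampling ratio alone, determines visibility.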

Another misconception treats flow data as inherently noisy and requiring extensive tuning to be useful. Organizations that struggle with flow data volume typically make three configuration errors: enabling flow export on every interface without regard for analytical value, failing to aggregate data appropriately for their use cases, and attempting to alert on raw flow records rather than behavioral deviations. Properly configured flow analysis focuses on communication patterns (which hosts talk to which other hosts), volume deviations (sudden increases in data transfer), and temporal anomalies (connections during unusual hours) rather than individual flow records.

---

## CDA Perspective

The Cyber Defense Advancement framework approaches network telemetry through the Threat Intelligence and Defense (TID) domain, applying Predictive Defense Intelligence (PDI): see the threat before it sees you. Flow analysis is not treated as a reactive forensic tool but as a continuous, forward-looking intelligence feed that informs defender decisions before an attacker achieves their objective.

CDA methodology structures flow analysis around three operational layers that build predictive capability. First, perimeter flow analysis establishes organizational communication baselines: which external ASNs receive traffic from internal networks, which cloud services are authorized, and what the normal volume and timing patterns look like for each category of external communication. Deviations from these baselines trigger automated enrichment before human analysis: geolocation checks, passive DNS history, threat intelligence correlation, and ASN reputation scoring. This automated pre-processing reduces analyst triage time and ensures that genuine anomalies receive immediate attention.

Second, east-west flow analysis maps internal network segments and enforces expected communication patterns. CDA builds segment-to-segment communication matrices during an initial characterization period: database servers that communicate with application servers, workstations that access file servers, management networks that connect to infrastructure devices. Any communication outside these established patterns generates an alert, but more importantly, the matrices are updated continuously as legitimate architecture changes occur. This creates a living model of network topology that detects unauthorized access attempts and lateral movement in real time.
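One way to picture the segment-to-segment matrix is as a set of allowed (source segment, destination segment) pairs learned during the characterization period. The sketch below is an assumption about how such a matrix could be modelled, not CDA tooling; the segment names, address ranges, and function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical segment map; real deployments would load this from IPAM or a CMDB.
SEGMENTS = {
    "workstations": ipaddress.ip_network("10.10.0.0/16"),
    "app": ipaddress.ip_network("10.20.0.0/24"),
    "db": ipaddress.ip_network("10.30.0.0/24"),
}

Flow = Tuple[str, str]                      # (source IP, destination IP)
Pair = Tuple[Optional[str], Optional[str]]  # (source segment, destination segment)


def segment_of(ip: str) -> Optional[str]:
    """Return the name of the segment containing this address, if any."""
    addr = ipaddress.ip_address(ip)
    for name, network in SEGMENTS.items():
        if addr in network:
            return name
    return None


def build_matrix(baseline_flows: Iterable[Flow]) -> Set[Pair]:
    """Learn the allowed segment pairs from flows seen during characterization."""
    return {(segment_of(src), segment_of(dst)) for src, dst in baseline_flows}


def unexpected_flows(matrix: Set[Pair], flows: Iterable[Flow]) -> List[Flow]:
    """Return flows whose segment pair was never seen in the baseline."""
    return [
        (src, dst)
        for src, dst in flows
        if (segment_of(src), segment_of(dst)) not in matrix
    ]


baseline = [("10.10.1.5", "10.20.0.10"), ("10.20.0.10", "10.30.0.7")]
matrix = build_matrix(baseline)

today = [("10.10.1.9", "10.20.0.11"), ("10.10.1.9", "10.30.0.7")]
print(unexpected_flows(matrix, today))  # workstation talking directly to the database
```

Updating the matrix after an approved architecture change amounts to adding the new pair to the set, which keeps the model current without relearning the whole baseline.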

Third, flow data is correlated with identity and endpoint telemetry to build multi-source detection logic. A single flow anomaly might be a false positive; a flow anomaly combined with an off-hours authentication event and unusual process execution on the destination host indicates coordinated attack activity. CDA structures these correlations as detection rules that trigger during the reconnaissance and initial access phases of an attack, not after data exfiltration or system impact has occurred.
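The multi-source idea can be sketched as a time-window join on the host. Everything here is illustrative: the `Event` shape, the source labels `"flow"`, `"auth"`, and `"endpoint"`, and the 15-minute window are assumptions rather than a description of any specific SIEM rule language.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Event:
    source: str     # "flow", "auth", or "endpoint"
    host: str
    time: datetime


def correlated_hosts(
    events: Iterable[Event],
    window: timedelta = timedelta(minutes=15),
    required: FrozenSet[str] = frozenset({"flow", "auth", "endpoint"}),
) -> Set[str]:
    """Hosts where a flow anomaly has every required source within the window."""
    by_host: Dict[str, List[Event]] = {}
    for event in events:
        by_host.setdefault(event.host, []).append(event)

    hits: Set[str] = set()
    for host, host_events in by_host.items():
        for anchor in (e for e in host_events if e.source == "flow"):
            nearby = {e.source for e in host_events if abs(e.time - anchor.time) <= window}
            if required <= nearby:
                hits.add(host)
    return hits


t0 = datetime(2024, 1, 10, 2, 30)
events = [
    Event("flow", "db01", t0),                                # flow anomaly
    Event("auth", "db01", t0 - timedelta(minutes=5)),         # off-hours login
    Event("endpoint", "db01", t0 + timedelta(minutes=3)),     # unusual process
    Event("flow", "web02", t0),                               # flow anomaly with no corroboration
]
print(correlated_hosts(events))
```

Requiring all three sources trades some sensitivity for confidence: `web02` is not escalated here, which is the intended behaviour for a lone flow anomaly.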

What distinguishes CDA's approach is treating flow data as intelligence rather than logging. Intelligence drives proactive action; logs support reactive investigation. CDA methodology requires flow baselines to be reviewed on a defined cycle (weekly for perimeter patterns, daily for internal segment matrices) and anomaly detection rules to be updated based on current threat intelligence. This operational discipline ensures that flow analysis remains aligned with evolving attack techniques rather than detecting only historical threat patterns.

---

## Key Takeaways

- Deploy NetFlow v9 or IPFIX on internet-facing edges and internal segments hosting sensitive data; deploy sFlow on high-speed core interfaces where per-flow state tracking would degrade forwarding performance or exceed memory capacity.
- Baseline egress flow volumes per destination ASN and protocol, then alert when any internal host exceeds two standard deviations from its 30-day average outbound volume, regardless of the specific application or service being used.
- Configure sFlow sampling ratios proportional to interface speed: 1:1000 for 1 Gbps links, 1:5000 for 10 Gbps links, and 1:20000 or a sparser ratio for 100 Gbps links to maintain manageable collector load while preserving statistical detection capability.
- Retain compressed flow data for 90 days minimum to support incident timeline reconstruction; this retention period covers the median dwell time for advanced persistent threats and provides sufficient data for baseline establishment.
- Correlate flow anomalies with identity authentication logs and endpoint process execution events before escalating to incident response; multi-source detection significantly reduces false positive rates while improving detection confidence.

---

## Sources

IETF RFC 7011 — "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information." Internet Engineering Task Force. https://datatracker.ietf.org/doc/html/rfc7011

IETF RFC 5476 — "Packet Sampling (PSAMP) Protocol Specifications." Internet Engineering Task Force. https://datatracker.ietf.org/doc/html/rfc5476

NIST Special Publication 800-94 — "Guide to Intrusion Detection and Prevention Systems (IDPS)." National Institute of Standards and Technology. https://csrc.nist.gov/publications/detail/sp/800-94/final

Mandiant. "M-Trends 2023: A View from the Front Lines." Mandiant, 2023. https://www.mandiant.com/m-trends

MITRE ATT&CK Framework — "Network Service Discovery" (T1046) and "Data from Network Shared Drive" (T1039). MITRE Corporation. https://attack.mitre.org/
