detection-engineering: CDA.Wiki (Print)

# Detection Engineering

Definition

Detection engineering is the discipline of systematically designing, building, testing, and maintaining the rules and logic that cause a security system to alert when an attacker is present. It is not a product category. It is not a job title alone. It is a repeatable engineering practice applied to the problem of finding adversary behavior inside an environment.

The phrase matters because it draws a sharp line between passive monitoring and active, maintained detection. An organization that deploys a SIEM and imports a vendor rule pack on day one and never touches it again is monitoring. An organization that treats detection rules as code, runs them through a review process, tests them against real and synthetic data, measures their accuracy, and retires them when they become irrelevant is doing detection engineering.

The discipline emerged from frustration with the state of commercial detection out of the box. Vendor-supplied rules are written to minimize false positives across a generic customer base, which means they are tuned for breadth, not depth. An organization facing a specific adversary, using a specific set of tools and techniques, needs detection logic written to match that adversary, not a default rule set designed for the average enterprise.

Detection engineering is the operational core of TID (Threat Intelligence and Defense), CDA's atmosphere layer. The atmosphere filters what gets through. Detection rules define the filter. If the rules are weak, stale, or uncorrelated with real threat actor behavior, the atmosphere becomes transparent and adversaries pass through without generating a single alert.

How It Works

The Detection Rule Lifecycle

Every detection rule follows the same lifecycle. Skipping any stage reduces rule quality and increases analyst burden.

Hypothesis. A detection starts with a behavioral question: what does this technique look like in telemetry? The analyst or engineer identifies a specific adversary behavior, maps it to a MITRE ATT&CK technique, and asks what evidence that behavior leaves in logs. PowerShell spawning from a Word process is a hypothesis. Scheduled task creation from a non-standard directory is a hypothesis. The hypothesis drives everything downstream.

Development. The engineer writes the rule in the native query language of the SIEM or detection platform. Splunk uses SPL (Search Processing Language). Elastic uses its JSON-based Query DSL, alongside EQL (Event Query Language) and ES|QL. Microsoft Sentinel uses KQL (Kusto Query Language). At this stage, many teams also write the rule in Sigma, a vendor-neutral format that can be converted to each of these targets (covered below).

Testing. The rule must be validated against real data before it reaches production. Testing has two components: true positive validation (does the rule fire against a known-bad sample?) and false positive assessment (does the rule fire on normal, benign activity?). Testing against synthetic data (generated by an attack simulation tool like Atomic Red Team, which maps directly to ATT&CK techniques) provides repeatable results. Testing against a production log sample reveals environment-specific noise.
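The two testing components can be sketched as a small harness. This is a minimal illustration, not a real SIEM test framework: the rule is expressed as a Python predicate, and the field names (`parent_image`, `image`), the sample events, and the function names are all assumptions made for the example.

```python
from typing import Callable, Iterable

Event = dict[str, str]


def word_spawns_powershell(event: Event) -> bool:
    """Hypothetical rule: PowerShell launched with Word as the parent process."""
    parent = event.get("parent_image", "").lower()
    child = event.get("image", "").lower()
    return parent.endswith("\\winword.exe") and child.endswith("\\powershell.exe")


def evaluate(rule: Callable[[Event], bool],
             known_bad: Iterable[Event],
             benign: Iterable[Event]) -> dict[str, int]:
    """Count hits on known-bad samples and on benign samples."""
    known_bad, benign = list(known_bad), list(benign)
    return {
        "true_positives": sum(rule(e) for e in known_bad),
        "missed": sum(not rule(e) for e in known_bad),
        "false_positives": sum(rule(e) for e in benign),
    }


# Synthetic known-bad sample (for instance, captured while running a simulation).
KNOWN_BAD = [
    {"parent_image": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
     "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
]
# Benign samples standing in for a production log extract.
BENIGN = [
    {"parent_image": r"C:\Windows\explorer.exe",
     "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
    {"parent_image": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
     "image": r"C:\Windows\splwow64.exe"},
]

result = evaluate(word_spawns_powershell, KNOWN_BAD, BENIGN)
print(result)
```

A rule would pass this gate only when `missed` and `false_positives` are both zero, or when the false positive count falls within a threshold the team has agreed on.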

Deployment. Validated rules move to the production SIEM. Deployment through a pipeline, not manual copy-paste, ensures auditability. Every deployed rule has a version, an author, and a timestamp.

Tuning. Live production rules generate data: alert volume, false positive rate, analyst close rate, and mean time from rule fire to analyst disposition. Tuning uses this data to tighten rule logic, add suppression for known benign patterns, and improve precision without sacrificing recall. Tuning is continuous, not a one-time activity.
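As a rough sketch of how the tuning data might be summarized per rule, the example below computes alert volume, false positive rate, precision, and mean time to disposition from analyst verdicts. The `Disposition` record and its field names are hypothetical; a real program would pull these values from the SIEM or case management system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Disposition:
    """One closed alert (hypothetical record shape)."""
    rule_id: str
    fired_at: datetime
    closed_at: datetime
    verdict: str  # "true_positive" or "false_positive"


def rule_metrics(dispositions: list[Disposition]) -> dict[str, float]:
    """Summarize the closed alerts for a single rule."""
    total = len(dispositions)
    if total == 0:
        raise ValueError("no dispositions to summarize")
    false_pos = sum(d.verdict == "false_positive" for d in dispositions)
    minutes = [(d.closed_at - d.fired_at).total_seconds() / 60 for d in dispositions]
    return {
        "alert_volume": total,
        "false_positive_rate": false_pos / total,
        "precision": (total - false_pos) / total,
        "mean_minutes_to_disposition": mean(minutes),
    }


base = datetime(2024, 1, 1, 9, 0)
dispositions = [
    Disposition("proc-001", base, base + timedelta(minutes=10), "false_positive"),
    Disposition("proc-001", base, base + timedelta(minutes=20), "false_positive"),
    Disposition("proc-001", base, base + timedelta(minutes=30), "false_positive"),
    Disposition("proc-001", base, base + timedelta(minutes=60), "true_positive"),
]
metrics = rule_metrics(dispositions)
print(metrics)
```

Recall cannot be derived from alert dispositions alone, because missed attacks never produce an alert. It has to be estimated separately, for example by replaying known-bad test data against the tuned rule.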

Retirement. Rules that no longer detect relevant behavior, or that have been superseded by better rules, should be retired. Unmaintained rules accumulate into dead weight that consumes compute and clutters alert queues without providing detection value.

Detection-as-Code

Detection-as-code (DaC) applies software engineering discipline to detection rule management. Rules live in a git repository. Changes go through pull request review before deployment. Automated testing runs on every commit. The pipeline deploys passing rules and blocks failing ones. Rollback is a git revert.
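The "block failing rules" step could look something like the following check, run by the pipeline on every commit. It is a sketch under stated assumptions: the required metadata fields, the rule dictionary layout, and the sample queries are hypothetical, and loading rule files from the repository is left out.

```python
import re

# Hypothetical metadata every rule must carry before it can be deployed.
REQUIRED_FIELDS = ("id", "title", "author", "version", "technique", "query")
TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")  # e.g. T1059 or T1059.001


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems; an empty list means the rule passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not rule.get(name)]
    technique = rule.get("technique")
    if technique and not TECHNIQUE_ID.fullmatch(str(technique)):
        errors.append(f"malformed ATT&CK technique ID: {technique}")
    return errors


def gate(rules: list[dict]) -> tuple[list[str], dict[str, list[str]]]:
    """Split rules into deployable IDs and blocked IDs with their errors."""
    deployable, blocked = [], {}
    for rule in rules:
        errors = validate_rule(rule)
        name = str(rule.get("id", "<no id>"))
        if errors:
            blocked[name] = errors
        else:
            deployable.append(name)
    return deployable, blocked


RULES = [
    {"id": "proc-001", "title": "PowerShell spawned by Word", "author": "a.analyst",
     "version": "1.2.0", "technique": "T1059.001",
     "query": 'parent_process="WINWORD.EXE" process="powershell.exe"'},
    # Missing author and version, so the pipeline should block it.
    {"id": "task-007", "title": "Scheduled task from unusual path",
     "technique": "T1053", "query": 'task_path="C:\\Users\\*"'},
]

deployable, blocked = gate(RULES)
print(deployable, blocked)
```

In a CI job, a non-empty `blocked` result would fail the build, so the pull request cannot merge until the rule metadata is complete.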

The benefits extend beyond process. Version history shows who changed what rule, when, and why. PR comments capture the reasoning behind tuning decisions. Branching allows rule development without disrupting production. Automated testing catches regressions before they reach analysts.

Teams that have not implemented detection-as-code typically have: rules of unknown authorship, no record of when or why a rule was changed, no testing before deployment, and no rollback capability when a bad rule floods the SOC with false positives. The blast radius of a poorly written rule in a non-DaC environment is measured in analyst hours lost per day.

Sigma Rules: Vendor-Neutral Detection Logic

Sigma is an open format for detection rules that converts to SIEM-specific query languages. A Sigma rule describes a behavior in YAML: what log source to query, what fields to match, what conditions to combine. Sigma conversion tooling (pySigma and its command-line front end sigma-cli, which replaced the older sigmac converter; SigmaHQ is the project that maintains them) converts the rule to SPL, KQL, Elastic queries, Chronicle YARA-L, or other targets for which a backend exists.
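To make the rule structure concrete, the sketch below holds a Sigma-style rule as a Python dictionary that mirrors the YAML layout (log source, field matches, condition) and evaluates it against a single event. The evaluator is a deliberately tiny illustration written for this example, not pySigma: it handles only three string modifiers and two condition shapes, and the rule itself is illustrative rather than taken from a published rule set.

```python
# Sigma-style rule as a dict; in practice this would be a YAML file.
RULE = {
    "title": "PowerShell Spawned by Word (illustrative)",
    "status": "experimental",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {
            "ParentImage|endswith": "\\WINWORD.EXE",
            "Image|endswith": ["\\powershell.exe", "\\pwsh.exe"],
        },
        "condition": "selection",
    },
    "tags": ["attack.execution", "attack.t1059.001"],
    "level": "high",
}


def matches_field(event: dict, key: str, expected) -> bool:
    """Match one 'Field|modifier' entry; list values are OR-ed, case-insensitive."""
    field, _, modifier = key.partition("|")
    actual = str(event.get(field, "")).lower()
    candidates = expected if isinstance(expected, list) else [expected]
    for candidate in (str(c).lower() for c in candidates):
        if modifier == "endswith" and actual.endswith(candidate):
            return True
        if modifier == "startswith" and actual.startswith(candidate):
            return True
        if modifier == "contains" and candidate in actual:
            return True
        if modifier == "" and actual == candidate:
            return True
    return False


def rule_fires(rule: dict, event: dict) -> bool:
    """Evaluate conditions of the form 'a' or 'a and not b' (toy subset)."""
    detection = rule["detection"]
    results = {
        name: all(matches_field(event, k, v) for k, v in body.items())
        for name, body in detection.items() if name != "condition"
    }
    tokens = detection["condition"].split()
    if len(tokens) == 1:
        return results[tokens[0]]
    if len(tokens) == 4 and tokens[1:3] == ["and", "not"]:
        return results[tokens[0]] and not results[tokens[3]]
    raise ValueError(f"unsupported condition: {detection['condition']}")


event = {
    "ParentImage": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
}
print(rule_fires(RULE, event))
```

The same dictionary, serialized to YAML, is the kind of artifact a converter turns into a platform-specific query, which is why the detection logic stays separate from any one query language.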

The operational value is portability. A detection engineering team maintaining 500 Sigma rules can migrate SIEM platforms without rewriting the rule library from scratch. A rule shared by a threat intelligence vendor in Sigma format can be deployed in a new environment with far less manual translation, although field mappings and log source configuration usually still need adjusting. Sigma rules can be version-controlled, shared between organizations, and published to community repositories. The SigmaHQ GitHub repository maintains thousands of community-contributed rules mapped to ATT&CK techniques.

Detection engineering teams that skip Sigma and write rules only in native query languages create a single-platform dependency. When the SIEM changes, the rule library does not survive.

ATT&CK-Mapped Detection Coverage

MITRE ATT&CK is the standard taxonomy for adversary tactics, techniques, and sub-techniques. Detection coverage measurement asks: for each technique in the ATT&CK matrix, do we have a rule that detects it?

Coverage gaps are hunting priorities. If a team has no detection for T1055 (Process Injection), an attacker using any of its roughly a dozen sub-techniques will pass through the SIEM without generating an alert. The gap in detection coverage is a gap in the atmosphere. The team either builds a rule to close it, or accepts the risk explicitly.

Measuring ATT&CK coverage requires mapping each production rule to the technique(s) it detects. Tools like Atomic Red Team (test execution framework) and ATT&CK Navigator (coverage visualization) make this practical. The output is a heat map of the ATT&CK matrix showing which techniques have detection coverage, which have none, and which have high-confidence rules versus low-confidence ones.

Coverage metrics that detection engineering teams should track:

ATT&CK sub-technique coverage percentage
Rules per log source category
Mean time from new technique publication (in ATT&CK or threat intel) to deployed rule
False positive rate per rule (false positives per 1,000 alerts)
Alert volume trend per rule over 30/60/90 days
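The first two metrics in the list can be computed directly from a rule-to-technique mapping. The sketch below is illustrative: the rule IDs, the in-scope technique set, and the log source labels are assumptions, and a real inventory would be generated from the rule repository.

```python
from collections import Counter


def attack_coverage(rule_techniques: dict[str, set[str]], in_scope: set[str]) -> dict:
    """Compare techniques detected by at least one rule with the in-scope set."""
    detected = set().union(*rule_techniques.values())
    covered = detected & in_scope
    gaps = in_scope - detected
    percent = 100.0 * len(covered) / len(in_scope) if in_scope else 0.0
    return {"covered": sorted(covered), "gaps": sorted(gaps), "percent": percent}


def rules_per_log_source(rule_sources: dict[str, str]) -> Counter:
    """Count how many rules depend on each log source category."""
    return Counter(rule_sources.values())


# Hypothetical production rules mapped to the sub-techniques they detect.
RULE_TECHNIQUES = {
    "proc-001": {"T1059.001"},
    "proc-002": {"T1059.001"},
    "task-007": {"T1053.005"},
}
# Sub-techniques the team has decided are in scope for its threat model.
IN_SCOPE = {"T1055.001", "T1055.012", "T1059.001", "T1053.005"}

RULE_SOURCES = {
    "proc-001": "process_creation",
    "proc-002": "process_creation",
    "task-007": "windows_security",
}

report = attack_coverage(RULE_TECHNIQUES, IN_SCOPE)
by_source = rules_per_log_source(RULE_SOURCES)
print(report, by_source)
```

Here two rules cover the same sub-technique, so three rules yield only 50 percent coverage, and both process injection sub-techniques appear in `gaps`. Counting rules is not the same as measuring coverage.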

False Positive Management

The single greatest threat to SOC effectiveness is alert fatigue caused by high false positive rates. An analyst who closes 300 false positives per shift will start closing alerts without investigating them. When a real attack arrives, it gets closed as fast as the false positives before it.

False positive management is not optional tuning done after a rule is working. It is a first-class quality requirement. A rule that fires correctly on true positives but generates unacceptable false positive volume is a bad rule. The acceptable false positive rate depends on the environment and on analyst capacity: in a 10,000-endpoint enterprise, a rule that fires twice per day on benign activity may be tolerable, while one that fires 100 times per day is not.

Common false positive reduction techniques: baseline exclusions (exclude known-good processes, users, and hosts), threshold-based suppression (only alert after N occurrences in M minutes), enrichment-based filtering (suppress if the source IP is on an allowlist of known-good infrastructure), and time-based suppression (exclude alerts generated during known maintenance windows).
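Two of these techniques, baseline exclusions and threshold-based suppression, can be sketched together as follows. The class name, the per-host grouping, and the sample values are assumptions for illustration; SIEM platforms typically offer their own throttling and exclusion features, which would be used in practice.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class ThresholdSuppressor:
    """Alert only after `threshold` hits from one host within `window`.

    Events are assumed to arrive in time order.
    """

    def __init__(self, threshold: int, window: timedelta,
                 excluded_hosts: frozenset[str] = frozenset()):
        self.threshold = threshold
        self.window = window
        self.excluded_hosts = excluded_hosts
        self._recent: dict[str, deque] = defaultdict(deque)

    def should_alert(self, host: str, seen_at: datetime) -> bool:
        # Baseline exclusion: known-good hosts never alert.
        if host in self.excluded_hosts:
            return False
        hits = self._recent[host]
        hits.append(seen_at)
        # Drop hits that have fallen out of the sliding window.
        while hits and seen_at - hits[0] > self.window:
            hits.popleft()
        return len(hits) >= self.threshold


t0 = datetime(2024, 1, 1, 12, 0)
suppressor = ThresholdSuppressor(
    threshold=3,
    window=timedelta(minutes=5),
    excluded_hosts=frozenset({"patch-server-01"}),
)
decisions = [
    suppressor.should_alert("ws-042", t0),
    suppressor.should_alert("ws-042", t0 + timedelta(minutes=1)),
    suppressor.should_alert("ws-042", t0 + timedelta(minutes=2)),   # third hit in window
    suppressor.should_alert("ws-042", t0 + timedelta(minutes=10)),  # window has expired
    suppressor.should_alert("patch-server-01", t0),                 # excluded host
]
print(decisions)
```

As written, every further hit inside the window after the threshold is reached also alerts. A production version would likely add a cooldown so that one burst produces one alert.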

Cross-Domain Telemetry Dependencies

Detection rules only work when the underlying telemetry exists. A rule written to detect lateral movement via Windows Event ID 4624 (Logon Type 3) does nothing if Windows Security logs are not forwarded to the SIEM. A rule for detecting suspicious outbound DNS queries does nothing if DNS logs are not ingested.
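One way to surface this problem before deployment is to compare each rule's required log sources with what the SIEM actually ingests. The sketch below assumes hypothetical rule IDs and log source labels; how the list of ingested sources is obtained depends on the platform and is not shown.

```python
def find_blind_rules(rule_requirements: dict[str, set[str]],
                     ingested_sources: set[str]) -> dict[str, list[str]]:
    """Return each rule that cannot fire, with the log sources it is missing."""
    return {
        rule: sorted(needed - ingested_sources)
        for rule, needed in rule_requirements.items()
        if needed - ingested_sources
    }


# Hypothetical rules and the telemetry each one depends on.
RULE_REQUIREMENTS = {
    "lateral-4624-type3": {"windows_security"},
    "dns-suspicious-outbound": {"dns"},
    "word-spawns-powershell": {"process_creation"},
}
# What the SIEM is actually receiving today (assumed inventory).
INGESTED = {"process_creation", "firewall"}

blind = find_blind_rules(RULE_REQUIREMENTS, INGESTED)
print(blind)
```

A rule listed in `blind` is deployed but silent: it will never alert, and it will never produce a false positive either, which is why the gap tends to go unnoticed.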

This is the SPH (Security Posture and Hygiene) dependency that detection engineers must manage. Log source coverage, what is sending logs and what is not, is a terrain-layer problem that directly limits the atmosphere layer's ability to detect. VSD (Vulnerability and Surface Defense) also contributes: vulnerability context enriches alerts, allowing rules to prioritize detections on assets with known critical vulnerabilities.
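Vulnerability enrichment can be as simple as raising an alert's severity when the affected asset has an open critical vulnerability. The following is a minimal sketch; the severity scale, the `Alert` shape, and the host names are hypothetical, and the set of vulnerable hosts would come from a vulnerability scanner export in practice.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Alert:
    rule_id: str
    host: str
    severity: str


def prioritize(alert: Alert, critical_vuln_hosts: set[str]) -> str:
    """Raise severity one step when the host has an open critical vulnerability."""
    index = SEVERITY_ORDER.index(alert.severity)
    if alert.host in critical_vuln_hosts:
        index = min(index + 1, len(SEVERITY_ORDER) - 1)
    return SEVERITY_ORDER[index]


# Hosts with at least one unpatched critical finding (assumed scanner output).
CRITICAL_VULN_HOSTS = {"web-01"}

bumped = prioritize(Alert("proc-001", "web-01", "medium"), CRITICAL_VULN_HOSTS)
unchanged = prioritize(Alert("proc-001", "ws-042", "medium"), CRITICAL_VULN_HOSTS)
print(bumped, unchanged)
```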

IAT (Identity Access and Trust) provides identity telemetry: authentication events, privilege escalation indicators, and anomalous access patterns. Some of the highest-fidelity detections for insider threat and account compromise come from identity log sources (Microsoft Entra ID, formerly Azure AD; Okta; CyberArk). A detection engineering program that ignores identity telemetry has a permanent blind spot in the civilization layer below the atmosphere.

Why It Matters

Detection engineering failures are not abstract. They show up in breach timelines. The global median attacker dwell time in a breached environment, before detection, was 16 days according to Mandiant's M-Trends 2023 report. Organizations with mature detection engineering programs, with maintained ATT&CK-mapped rule libraries and active tuning, are positioned to detect adversaries in hours, not weeks.

CISOs care about detection engineering because it determines the mean time to detect (MTTD). MTTD drives mean time to respond (MTTR). MTTR determines how much of the environment an attacker can compromise before containment. A 16-day dwell time gives an attacker time to establish persistence across dozens of systems, exfiltrate terabytes, and set up ransomware staging. A 6-hour dwell time gives them almost nothing.

SOC analysts care about detection engineering because alert quality directly determines their working conditions. Analysts drowning in false positives cannot do real threat hunting. Organizations that treat detection rules as set-and-forget produce SOCs where analysts are primarily false positive processors, not threat investigators.

A common misconception is that buying more tools solves the detection problem. A SIEM without well-engineered rules is an expensive log storage system. An EDR left on default alerting fires mostly on known-bad signatures and generic behaviors, not on the techniques of the specific adversaries targeting the organization. Detection engineering is what turns those tools into an actual detection capability.

CDA Perspective

Detection engineering is the operational heart of TID (Threat Intelligence and Defense). CDA's PDI (Predictive Defense Intelligence) methodology applies here directly: "See the threat before it sees you." The word "before" is the engineering challenge. You cannot see the threat before it sees you if your detection rules are written for generic threats rather than the specific actors targeting your industry.

CDA's approach under PDI maps detection rules to the specific threat actor profiles relevant to each client's sector. A manufacturing organization faces different TTPs than a healthcare system. The detection library is built around the adversary's known playbook, not a vendor's generic signature set. ATT&CK serves as the shared vocabulary for this mapping.

Two TOP missions are directly relevant here. TID-B01 covers SIEM deployment and the initial rule set: getting the right log sources connected, establishing baseline detection coverage, and implementing the detection-as-code pipeline. TID-H01 covers detection rule tuning and ATT&CK coverage expansion: measuring current coverage, identifying gaps, writing rules for uncovered techniques, and reducing false positive rates to acceptable thresholds. Together, these missions build an atmosphere that actually filters adversary behavior.

On The Shield diagnostic, weak detection engineering shows up as low scores in the TID ring. The SPH ring score also reflects log source coverage, making SPH a necessary input to any TID improvement program. A client with a strong SPH score (good log coverage across endpoints, network, and identity) has the raw material to build strong TID detection. A client with weak SPH coverage cannot build reliable TID detection regardless of rule quality.

The PDM cross-domain dependency is explicit: TID detection only works when SPH telemetry is healthy, IAT identity logs are feeding the SIEM, and VSD vulnerability context is available for enrichment. Detection engineering is not a standalone TID activity; it is a cross-domain program.

Key Takeaways

Detection engineering treats detection rules as code: version-controlled, reviewed, tested, deployed through a pipeline, and maintained over time. Ad-hoc rule management produces alert fatigue and coverage gaps.
Sigma rules provide vendor-neutral portability. Writing detection logic in Sigma first, then compiling to SIEM-specific query languages, prevents single-platform lock-in and enables sharing across teams and tools.
ATT&CK-mapped coverage measurement turns detection quality from a feeling into a metric. Coverage gaps are hunting priorities. Mean time from technique publication to deployed rule is a performance indicator.
False positive rate is a first-class rule quality metric, not an afterthought. High false positive rates cause alert fatigue, which causes real threats to go undetected.
Detection rules depend on telemetry. Log source coverage is an SPH dependency that limits TID effectiveness. Detection engineering and posture hygiene must be managed together.

SIEM Architecture
EDR (Endpoint Detection and Response)
Threat Hunting
MITRE ATT&CK Framework
Behavioral Analytics

Sources

MITRE Corporation. MITRE ATT&CK Framework. https://attack.mitre.org/

SigmaHQ. Sigma: Generic Signature Format for SIEM Systems. https://github.com/SigmaHQ/sigma

Mandiant. M-Trends 2023 Special Report. Mandiant, 2023. https://www.mandiant.com/m-trends

Red Canary. Atomic Red Team. https://github.com/redcanaryco/atomic-red-team

CDA, LLC. Planetary Defense Model Master Reference. CDA Canon, 2026.
