# Security Operations Center (SOC) Design

Definition

A Security Operations Center is the organizational function responsible for continuous monitoring, detection, investigation, and response to cybersecurity threats across an organization's environment. The SOC is where threat detection becomes operational: SIEM alerts are triaged, incidents are investigated, threats are contained, and detection capabilities are improved based on operational findings.

SOC design is not about building a room with monitors on the wall. It is about designing the operational model that staffs, equips, and sustains a detection and response capability matched to the organization's risk profile. The decisions involved are consequential: build versus buy (in-house versus outsourced), staffing model (24/7 versus business hours with on-call), technology stack (which SIEM, which EDR, which SOAR), process maturity (reactive alert triage versus proactive threat hunting), and metrics (what does success look like and how is it measured).

Most organizations cannot afford or justify a fully internal 24/7 SOC. A minimum viable 24/7 SOC requires 5 to 6 Tier 1 analysts (covering three shifts with sick day and vacation coverage), 2 to 3 Tier 2 analysts, 1 Tier 3/hunt lead, and a SOC manager. Fully loaded cost: $800,000 to $1.5 million annually for personnel alone, before technology costs. This is why the managed detection and response (MDR) market exists: outsourcing all or part of the SOC function to a specialized provider.
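The Tier 1 headcount above follows from simple coverage arithmetic. The sketch below is a rough illustration only: the 40-hour work week and the 85% availability factor (time left after vacation, sick days, and training) are assumptions, not figures from this article.

```python
import math

HOURS_PER_WEEK = 24 * 7  # a 24/7 seat must be covered 168 hours per week


def analysts_per_seat(shift_hours_per_week: float = 40.0,
                      availability: float = 0.85) -> int:
    """Return the headcount needed to keep one seat staffed around the clock.

    availability is the assumed fraction of scheduled hours an analyst
    actually works after vacation, sick days, and training.
    """
    if shift_hours_per_week <= 0 or not 0 < availability <= 1:
        raise ValueError("shift hours must be positive and availability in (0, 1]")
    effective_hours = shift_hours_per_week * availability
    return math.ceil(HOURS_PER_WEEK / effective_hours)
```

Under these assumptions one continuously staffed Tier 1 seat needs 5 analysts, and 6 if availability drops to 75%, which is consistent with the 5 to 6 range cited above.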

How It Works

SOC Models

Internal SOC. The organization builds, staffs, and operates its own SOC. Full control over detection strategy, investigation depth, and response actions. Highest cost, highest customization. Appropriate for large enterprises with significant security budgets, complex environments, and regulatory requirements that demand internal control over security operations.

Outsourced SOC (MDR/MSSP). The organization contracts a managed detection and response (MDR) provider or managed security service provider (MSSP) to monitor, detect, and (in the case of MDR) respond to threats on the organization's behalf. Lower cost (shared infrastructure and analysts across multiple customers), faster deployment (weeks rather than months), but less customization and less direct control.

CDA's position on the MSSP model is documented extensively: "We don't monitor. We operate." The conventional MSSP model is structurally misaligned (the provider profits from monitoring, not from reducing the need for monitoring). CDA's MDR model (TID-C01: Managed Detection and Response) deploys operators on missions with defined objectives and completion states, not analysts on monitoring contracts with perpetual scope.

Hybrid SOC. The organization maintains a small internal security team that handles escalations, investigations, and strategic decisions, while outsourcing Tier 1 triage and 24/7 monitoring to an external provider. The hybrid model provides 24/7 coverage without the full staffing cost of an internal SOC, while maintaining internal expertise for high-severity incidents and strategic detection decisions.

The hybrid model is the most common approach for mid-market organizations. CDA's B2B engagement model supports this: CDA provides the operational detection and response capability (TID-C01), and the client's internal team handles business context, escalation decisions, and remediation coordination.

Technology Stack

The SOC technology stack includes four primary categories:

SIEM (Security Information and Event Management). The central analysis platform that ingests, normalizes, correlates, and alerts on log data from across the environment. SIEM is the SOC's primary tool. Major platforms include Microsoft Sentinel, Splunk, Elastic Security, Chronicle (Google), and LogRhythm. SIEM selection depends on the organization's environment (Microsoft-heavy environments benefit from Sentinel's native integration), data volume (high-volume environments require platforms with efficient data handling), and budget (cloud-native SIEMs like Sentinel and Chronicle are typically billed on consumption; traditional SIEMs like Splunk have historically been licensed by daily ingest volume, so pricing should be modeled against expected log volume in either case).
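To make "normalizes, correlates, and alerts" concrete, here is a minimal sketch of a correlation rule in plain Python. It is not the rule language of any SIEM named above. The raw field names (`ts`, `user`, `result`), the threshold of 5 failures, and the 10-minute window are all hypothetical choices for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def normalize(raw: dict) -> dict:
    """Map a raw log record onto a small common schema (hypothetical field names)."""
    return {
        "time": datetime.fromisoformat(raw["ts"]),
        "user": raw["user"].lower(),
        "outcome": "success" if raw["result"] in ("ok", "success") else "failure",
    }


def brute_force_alerts(events, threshold=5, window=timedelta(minutes=10)):
    """Alert when a user logs in successfully after `threshold` or more
    failed attempts that all fall inside `window`."""
    recent_failures = defaultdict(deque)
    alerts = []
    for event in sorted(events, key=lambda e: e["time"]):
        failures = recent_failures[event["user"]]
        # Drop failures that have aged out of the correlation window.
        while failures and event["time"] - failures[0] > window:
            failures.popleft()
        if event["outcome"] == "failure":
            failures.append(event["time"])
        elif len(failures) >= threshold:
            alerts.append({
                "rule": "failures_then_success",
                "user": event["user"],
                "time": event["time"],
                "failed_attempts": len(failures),
            })
            failures.clear()
    return alerts
```

Normalizing first (here, lowercasing the user name and mapping result codes to two outcomes) is what lets one rule correlate events from sources that log the same activity differently.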

EDR (Endpoint Detection and Response). Provides visibility and response capability at the endpoint level. EDR agents collect process, file, network, and registry telemetry from every managed endpoint and server. EDR enables the SOC to investigate endpoint-level activity, isolate compromised endpoints, and kill malicious processes without physical access. Major platforms include CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne, and Carbon Black.

SOAR (Security Orchestration, Automation, and Response). Automates repetitive SOC workflows: enriching alerts with threat intelligence, querying multiple data sources simultaneously, executing predefined response playbooks, and coordinating actions across SIEM, EDR, firewall, and identity platforms. SOAR reduces the time spent on manual, repetitive tasks and ensures consistent response execution. Platforms include Palo Alto XSOAR, Splunk SOAR, Microsoft Sentinel automation rules, and Tines.
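A playbook of the kind described above can be sketched as an ordered set of steps that always runs the same way. This is an illustration only and does not use the API of any SOAR platform listed. The `Alert` fields, the severity labels, and the response strings are hypothetical, and the isolation step only records the intended action rather than calling an EDR.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    host: str
    indicator: str
    severity: str
    notes: list = field(default_factory=list)
    actions: list = field(default_factory=list)


def enrich(alert: Alert, known_bad: set) -> str:
    """Step 1: annotate the alert with a threat intelligence verdict."""
    verdict = "malicious" if alert.indicator in known_bad else "unknown"
    alert.notes.append(f"intel verdict: {verdict}")
    return verdict


def run_playbook(alert: Alert, known_bad: set) -> Alert:
    """Run enrichment, then choose a response by fixed, documented rules."""
    verdict = enrich(alert, known_bad)
    if verdict == "malicious" and alert.severity == "high":
        # In practice this step would call the EDR platform's isolation API.
        alert.actions.append(f"isolate host {alert.host}")
    elif verdict == "malicious":
        alert.actions.append("escalate to Tier 2")
    else:
        alert.actions.append("queue for Tier 1 review")
    return alert
```

Because the decision logic lives in code rather than in an analyst's head, two identical alerts receive the same response regardless of shift or analyst.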

Threat intelligence platform (TIP). Aggregates threat intelligence from multiple sources (open source, commercial, government, industry sharing groups) and integrates it into SOC workflows. IOCs from threat feeds are matched against SIEM data. Threat actor profiles inform detection rule development and hunting hypotheses. Vulnerability intelligence prioritizes patching. Major TIPs include Recorded Future, Mandiant Advantage, ThreatConnect, and MISP (open source).
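Matching IOCs against SIEM data reduces, at its simplest, to a lookup of event fields in a set of known indicators. The sketch below assumes events are dictionaries with optional `dest_ip` and `file_hash` fields and that the feed is a mapping from indicator value to a source description. Both shapes are hypothetical and do not reflect any particular TIP's export format.

```python
def match_iocs(events, ioc_feed):
    """Return events whose destination IP or file hash appears in the feed.

    ioc_feed maps an indicator value to a description of where it came from.
    """
    hits = []
    for event in events:
        for field_name in ("dest_ip", "file_hash"):
            value = event.get(field_name)
            if value in ioc_feed:
                # Keep the original event and record why it matched.
                hits.append({**event, "matched_on": field_name,
                             "intel": ioc_feed[value]})
    return hits
```

A dictionary lookup keeps each check constant-time, which matters when a feed of many thousands of indicators is compared against a high-volume event stream.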

SOC Processes

Technology without process produces noise, not security. The SOC's operational processes determine its effectiveness:

Alert triage. Every SIEM alert enters the triage queue. Tier 1 analysts evaluate each alert against triage criteria: is this a known false positive? Does the context match a benign pattern? Is there corroborating evidence from other sources? Triage produces three outcomes: close as false positive (documented with reason), escalate to Tier 2 for investigation (documented with preliminary analysis), or execute a predefined playbook (automated or semi-automated response for well-understood alert types).

Triage efficiency is measured by the alert-to-incident ratio: the percentage of alerts that turn out to be genuine incidents requiring investigation. A healthy ratio is 5% to 15%. Below 5% suggests the detection rules are too noisy (producing excessive false positives). Above 15% suggests either that the rules are unusually well-tuned or that the organization is under sustained attack, so a ratio in that range warrants a closer look to determine which.
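The ratio and its interpretation bands can be expressed in a few lines. This is a minimal sketch: the 5% and 15% thresholds come from the text above, while the function names and the returned labels are illustrative.

```python
def alert_to_incident_ratio(total_alerts: int, genuine_incidents: int) -> float:
    """Percentage of alerts that were genuine incidents."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    if not 0 <= genuine_incidents <= total_alerts:
        raise ValueError("genuine_incidents must be between 0 and total_alerts")
    return 100.0 * genuine_incidents / total_alerts


def interpret_ratio(ratio_percent: float, low: float = 5.0, high: float = 15.0) -> str:
    """Classify a ratio against the healthy band described in the text."""
    if ratio_percent < low:
        return "noisy rules: tune to reduce false positives"
    if ratio_percent > high:
        return "well-tuned rules or sustained attack: review"
    return "healthy"
```

For example, 160 genuine incidents out of 2,000 alerts gives 8%, inside the healthy band, while 20 out of 1,000 gives 2% and points to rule tuning.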

Investigation. Tier 2 analysts investigate escalated alerts: querying additional data sources for corroborating evidence, examining endpoint telemetry for process and file activity, checking threat intelligence for known indicators, and building a timeline of events. Investigation produces a determination: confirmed incident (trigger IR process), suspicious but inconclusive (monitor and re-evaluate), or false positive requiring rule tuning.

Incident response. Confirmed incidents are managed through the incident response lifecycle. The SOC's role in IR varies by organization: in some, the SOC handles the full response cycle; in others, the SOC handles detection and initial containment, then hands off to a dedicated IR team for eradication and recovery. CDA's model integrates SOC operations and IR into the TID domain: the same operators who detect the threat respond to it, providing continuity of context.

Detection engineering. The SOC continuously improves its detection capabilities. New detection rules are developed from threat intelligence, incident findings, and hunting discoveries. Existing rules are tuned to reduce false positives. Detection coverage is tracked against the MITRE ATT&CK matrix. The detection engineering cycle is what transforms a SOC from a reactive alert-processing function into a proactive defense capability.

Shift handoff. For 24/7 SOCs, shift handoff is a critical process. The outgoing shift briefs the incoming shift on active investigations, pending escalations, environmental changes, and any anomalies observed during the shift. A poor handoff loses context. A good handoff ensures continuity.

SOC Metrics

SOC effectiveness is measured by operational metrics:

Mean time to detect (MTTD). The average time from an attacker's initial activity to the SOC's detection of that activity. MTTD depends on detection rule coverage, log collection completeness, and alert processing speed. Industry benchmark (Mandiant M-Trends 2024): a global median dwell time of 10 days. CDA target for managed clients: under 24 hours.

Mean time to respond (MTTR). The average time from detection to containment. MTTR depends on investigation speed, response process efficiency, and whether the SOC has the authority and capability to execute containment actions (endpoint isolation, account suspension) without waiting for change management approval. CDA target: under 4 hours for high-severity incidents.
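Both metrics are averages over per-incident timestamps, which the following sketch computes. The record layout (`first_activity`, `detected`, `contained`) and the two sample incidents are invented for illustration and are not real data.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Average elapsed hours between two timestamps across incidents."""
    durations = [
        (incident[end_key] - incident[start_key]).total_seconds() / 3600
        for incident in incidents
    ]
    return mean(durations)  # raises StatisticsError if there are no incidents


# Hypothetical incident records for illustration only.
incidents = [
    {"first_activity": datetime(2024, 3, 1, 8, 0),
     "detected": datetime(2024, 3, 1, 20, 0),
     "contained": datetime(2024, 3, 1, 23, 0)},
    {"first_activity": datetime(2024, 3, 5, 9, 0),
     "detected": datetime(2024, 3, 6, 9, 0),
     "contained": datetime(2024, 3, 6, 10, 0)},
]

mttd = mean_hours(incidents, "first_activity", "detected")  # 12 h and 24 h
mttr = mean_hours(incidents, "detected", "contained")       # 3 h and 1 h
```

With these sample records MTTD is 18 hours and MTTR is 2 hours. Note that MTTD relies on the time of first attacker activity, which is usually only established after the investigation, so the metric is computed retrospectively.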

Alert volume and false positive rate. Total alerts generated per day and the percentage that are false positives. High alert volume with a high false positive rate indicates noisy detection rules that require tuning. CDA's detection engineering program (TID-H01) targets a false positive rate under 20%.

Detection coverage. Percentage of MITRE ATT&CK techniques for which detection rules exist. Measured through the TID-R02 (Detection Coverage Assessment). CDA targets 60%+ coverage for managed clients, prioritized by the techniques most relevant to the client's threat profile.
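Coverage measured this way is a set calculation over technique IDs. The sketch below is an assumption about how such a tally could be kept, not a description of the TID-R02 method. The rule names are hypothetical, and the technique IDs in the usage note are standard ATT&CK identifiers.

```python
def detection_coverage(relevant_techniques, rules):
    """Return (coverage percent, sorted list of uncovered technique IDs).

    rules maps a rule name to the ATT&CK technique IDs it detects. Only
    techniques in relevant_techniques count toward coverage.
    """
    relevant = set(relevant_techniques)
    if not relevant:
        raise ValueError("at least one relevant technique is required")
    covered = set()
    for technique_ids in rules.values():
        covered.update(technique_ids)
    covered &= relevant  # ignore rules for techniques outside the threat profile
    gaps = sorted(relevant - covered)
    return 100.0 * len(covered) / len(relevant), gaps
```

Given a threat profile of T1059, T1078, T1110, T1486, and T1566 and rules covering T1059, T1110, and T1566, the function reports 60% coverage with T1078 and T1486 as gaps, which then feed the detection engineering backlog.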

Analyst utilization. The percentage of analyst time spent on high-value activities (investigation, hunting, detection engineering) versus low-value activities (false positive triage, manual data enrichment, report generation). SOAR automation should shift the balance toward high-value activities. Target: over 60% of analyst time on investigation, hunting, or engineering.
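Utilization can be tracked from a simple breakdown of logged hours. In this sketch the activity labels and the sample hours are hypothetical, and the split between high-value and low-value work follows the definition in the text above.

```python
HIGH_VALUE = {"investigation", "hunting", "detection_engineering"}


def utilization(hours_by_activity: dict) -> float:
    """Percentage of logged analyst hours spent on high-value activities."""
    total = sum(hours_by_activity.values())
    if total <= 0:
        raise ValueError("no hours logged")
    high_value_hours = sum(
        hours for activity, hours in hours_by_activity.items()
        if activity in HIGH_VALUE
    )
    return 100.0 * high_value_hours / total
```

A 40-hour week with 14 hours of investigation, 6 of hunting, 6 of detection engineering, 10 of false positive triage, and 4 of report generation yields 65%, just above the 60% target.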

Why It Matters

Detection Is the First Response

Every incident begins with detection. The faster the SOC detects the threat, the less time the attacker has to expand access, exfiltrate data, or deploy ransomware. A SOC that detects a threat in 4 hours gives the attacker 4 hours of dwell time. A SOC that detects in 10 days gives the attacker 10 days. The difference in outcomes is far more than proportional: 4 hours of attacker activity typically means one compromised system, while 10 days of attacker activity can mean full domain compromise.

The Build vs. Buy Decision

The SOC build-versus-buy decision is one of the most consequential security decisions an organization makes. Building an internal SOC provides maximum control but requires sustained investment in personnel (the scarcest resource in cybersecurity), technology, and operational maturity. Buying MDR provides faster time-to-value and lower initial cost but introduces dependency on a third party for a critical security function.

CDA's model offers a third option: mission-based operations. CDA does not provide perpetual monitoring contracts. CDA deploys operators on defined missions (TID-C01: MDR, 20 hours/month steady state) with measurable objectives. The engagement produces a functioning detection and response capability that the client can eventually internalize, not a dependency that the provider profits from perpetuating.

Regulatory Expectations

Compliance frameworks increasingly expect continuous monitoring capability. The NIST CSF 2.0 Detect (DE) function calls for continuous monitoring and detection processes. PCI DSS Requirement 10 requires logging and monitoring of access to system components and cardholder data. SOC 2 CC7 (System Operations) requires monitoring processes that detect anomalies. The SEC cybersecurity disclosure rules require public companies to describe their processes for assessing, identifying, and managing material cybersecurity risks. An organization without a SOC function (internal or outsourced) cannot satisfy these requirements.

CDA Perspective

The SOC function sits in the TID (Threat Intelligence and Defense) domain of the Planetary Defense Model. TID is the atmosphere: the detection layer that identifies threats before they reach the surface. The SOC is the weather station that reads the atmospheric data, interprets the signals, and issues warnings.

CDA's Predictive Defense Intelligence (PDI) methodology reframes SOC operations from reactive (wait for alerts, process alerts) to predictive (integrate threat intelligence, hunt proactively, engineer detections for the current threat landscape). "See the threat before it sees you."

Four TOP missions define the SOC lifecycle:

TID-B01 (SIEM Deployment and Tuning): Build the detection platform. Deploy SIEM. Connect log sources. Write initial detection rules. Configure alert workflows. 40 estimated hours.
TID-R02 (Detection Coverage Assessment): Assess what the SOC can detect. Map existing rules to ATT&CK. Identify gaps. Prioritize detection engineering backlog. 16 estimated hours.
TID-H01 (Detection Engineering Program): Continuously improve detection. Develop new rules. Tune existing rules. Test with adversary simulation. 32 estimated hours.
TID-C01 (Managed Detection and Response): Operate the SOC in steady state. 24/7 monitoring, alert triage, investigation, response, and reporting. 20 estimated hours per month.

Key Takeaways

SOC design involves consequential decisions: build vs. buy, staffing model, technology stack, process maturity, and success metrics.
A minimum viable 24/7 internal SOC costs $800K to $1.5M annually in personnel alone, before technology costs. Most mid-market organizations use hybrid or fully outsourced models.
SOC effectiveness is measured by MTTD (time to detect), MTTR (time to respond), detection coverage (ATT&CK percentage), and analyst utilization (high-value vs. low-value time).
SOAR automation shifts analyst time from repetitive triage to investigation, hunting, and detection engineering.
CDA's mission-based MDR model provides detection and response capability with defined objectives, not perpetual monitoring contracts.

Sources

Mandiant (Google Cloud). "M-Trends 2024: Special Report." Mandiant, April 2024. (Dwell time and detection statistics.)
SANS Institute. "2024 SOC Survey: Operations and Effectiveness." SANS, 2024.
National Institute of Standards and Technology (NIST). "Cybersecurity Framework (CSF) 2.0: DE (Detect)." U.S. Department of Commerce, 2024.
Gartner. "Market Guide for Managed Detection and Response Services." Gartner, 2024.
MITRE Corporation. "ATT&CK Framework: Detection and Data Sources." attack.mitre.org, updated continuously.
