Backup and Recovery Architecture
Definition
Backup and recovery architecture is the design, implementation, and operational maintenance of systems that create copies of data and enable restoration of that data after loss, corruption, or destruction. It is the last line of defense in the DPS (Data Protection and Sovereignty) domain: if every other security control fails, if the attacker breaches the perimeter, escalates privileges, and encrypts the environment, the backup is the control that determines whether the organization recovers or pays.
Backup is not a product. It is not a checkbox. It is an architecture decision with direct impact on organizational survivability. The difference between "we have backups" and "we have tested, immutable, geographically separated backups with defined recovery objectives" is the difference between an organization that survives a ransomware event and one that does not.
The Colonial Pipeline attack (2021) demonstrated this gap at national scale. The company paid roughly $4.4 million in ransom, in part because executives could not say how long it would take to bring systems back on their own. Reporting after the incident indicated that the decryption tool supplied by the attackers was so slow that the company kept restoring from its own backups anyway. The backups existed. What was missing was a tested, timed recovery process. Existence is not the same as readiness.
How It Works
The 3-2-1-1-0 Rule
The industry-standard backup architecture follows the 3-2-1-1-0 rule, an evolution of the original 3-2-1 rule:
3 copies of data. The production copy plus at least two backup copies. A single backup is a single point of failure. Two backups provide redundancy.
2 different media types. Store backups on at least two different storage media (disk and tape, disk and cloud, local NAS and object storage). Media diversification protects against technology-specific failures: a firmware bug that corrupts one storage type does not affect the other.
1 copy offsite. At least one backup copy must be stored in a geographically separate location. A fire, flood, or ransomware event that destroys the primary data center also destroys any backups stored in the same location. Geographic separation ensures that a site-level disaster does not eliminate both production and backup simultaneously.
1 copy immutable or air-gapped. At least one backup copy must be immutable (cannot be modified or deleted, even by an administrator) or air-gapped (physically disconnected from any network). This is the ransomware-specific evolution. Modern ransomware operators specifically target backup infrastructure: they identify backup servers, compromise backup administrator credentials, and delete or encrypt backup repositories before launching the encryption payload. Immutable backups survive this attack because they cannot be modified regardless of the attacker's privilege level.
0 errors after verification. Every backup must be verified through automated integrity checks and periodic restoration testing. A backup that completes without errors in the backup log may still be unrecoverable if the data is corrupted, the format is incompatible with the recovery environment, or a critical dependency is missing. Verification is the proof that the backup works.
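The five clauses above can be checked mechanically against a backup inventory. The following is a minimal sketch, not a reference implementation: the `Copy` record, its field names, and the site and media labels are all assumptions made for illustration, and a real inventory would come from the backup platform's own reporting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset. All field names are illustrative."""
    name: str
    media: str                      # e.g. "disk", "tape", "object-storage"
    site: str                       # physical location identifier
    is_production: bool = False
    immutable: bool = False
    air_gapped: bool = False
    verify_errors: Optional[int] = None   # None means "never verified"


def check_32110(copies: list[Copy], primary_site: str) -> dict[str, bool]:
    """Return a pass/fail result for each clause of the 3-2-1-1-0 rule."""
    backups = [c for c in copies if not c.is_production]
    return {
        "3_copies": len(copies) >= 3 and len(backups) >= 2,
        "2_media": len({c.media for c in backups}) >= 2,
        "1_offsite": any(c.site != primary_site for c in backups),
        "1_immutable_or_air_gapped": any(c.immutable or c.air_gapped for c in backups),
        # An unverified backup (None) counts as a failure, not a pass.
        "0_errors": bool(backups) and all(c.verify_errors == 0 for c in backups),
    }


inventory = [
    Copy("prod-db", media="disk", site="dc-east", is_production=True),
    Copy("local-repo", media="disk", site="dc-east", verify_errors=0),
    Copy("cloud-locked", media="object-storage", site="region-west",
         immutable=True, verify_errors=0),
]
result = check_32110(inventory, primary_site="dc-east")
```

Note the treatment of unverified backups: a copy that has never been verified fails the "0 errors" clause. That mirrors the point of the rule, which is that a clean backup log is not proof of recoverability.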
Recovery Objectives
Every backup architecture is designed around two metrics:
Recovery Time Objective (RTO). How quickly must the system be restored after a failure? An RTO of 4 hours means the organization expects the system to be operational within 4 hours of a disaster declaration. RTO determines the backup and recovery technology: a 4-hour RTO requires rapid restoration capabilities (disk-based backups, hot standby environments, automated failover). A 72-hour RTO can tolerate slower restoration (tape retrieval, manual rebuild).
Recovery Point Objective (RPO). How much data loss is tolerable? An RPO of 1 hour means the organization accepts losing up to 1 hour of data. RPO determines backup frequency: a 1-hour RPO requires at least hourly backups (or continuous data protection). A 24-hour RPO can tolerate daily backups.
RTO and RPO must be defined per system and per data classification tier. For example, a payment processing system might carry a 4-hour RTO and a 15-minute RPO, an employee intranet a 24-hour RTO and a 4-hour RPO, and a marketing website a 72-hour RTO and a 24-hour RPO. Applying the same RTO/RPO to every system either over-invests in non-critical systems (waste) or under-invests in critical systems (risk).
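Per-system objectives are easiest to enforce when they live in a structure that tooling can read. The sketch below encodes the example figures above and flags any system whose backup interval is longer than its RPO. The system keys, the `RecoveryObjective` type, and the `rpo_violations` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    rto: timedelta   # maximum tolerated time to restore
    rpo: timedelta   # maximum tolerated window of lost data


# Example values from the text; the system keys are illustrative.
OBJECTIVES = {
    "payment-processing": RecoveryObjective(rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    "employee-intranet": RecoveryObjective(rto=timedelta(hours=24), rpo=timedelta(hours=4)),
    "marketing-website": RecoveryObjective(rto=timedelta(hours=72), rpo=timedelta(hours=24)),
}


def rpo_violations(backup_intervals: dict[str, timedelta]) -> list[str]:
    """Return systems whose backup interval is longer than their RPO.

    Worst-case data loss is roughly one full interval (a failure just
    before the next backup runs), so the interval must not exceed the RPO.
    """
    return sorted(
        name for name, interval in backup_intervals.items()
        if interval > OBJECTIVES[name].rpo
    )


schedule = {
    "payment-processing": timedelta(hours=1),    # hourly backups vs. a 15-minute RPO
    "employee-intranet": timedelta(hours=4),
    "marketing-website": timedelta(hours=24),
}
print(rpo_violations(schedule))  # ['payment-processing']
```

The hourly schedule is adequate for the intranet and the website but not for payment processing, which would need continuous data protection or backups at least every 15 minutes.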
Backup Types
Full backup. A complete copy of all data. Provides the simplest recovery (restore the full backup) but consumes the most storage and takes the longest to complete. Typically run weekly or less frequently, supplemented by incremental or differential backups.
Incremental backup. Copies only the data that has changed since the last backup of any type. Fastest to complete and smallest storage footprint. Recovery requires the last full backup plus every subsequent incremental backup, making recovery slower and more complex.
Differential backup. Copies all data that has changed since the last full backup. Larger than incremental but simpler to recover (requires only the last full plus the last differential). A common middle ground between full and incremental.
Continuous data protection (CDP). Captures every change to data in real time or near-real time. CDP provides the lowest RPO of any backup type (seconds to minutes) and enables point-in-time recovery to any moment. It is typically reserved for critical systems where data loss tolerance is minimal: databases, email systems, financial platforms.
Snapshot-based backup. Creates point-in-time copies at the storage or hypervisor level. Snapshots are fast to create and enable rapid rollback. Commonly used in cloud and virtualized environments. Snapshots are not full backups on their own (they depend on the underlying storage being intact) but are effective as a complement to traditional backup.
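The recovery trade-off between incremental and differential backups comes down to how many backup sets must be applied during a restore. The sketch below computes that restore chain from a backup history. The `Backup` record and `restore_chain` function are hypothetical, and the model ignores CDP and snapshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    seq: int    # position in time order
    kind: str   # "full", "incremental", or "differential"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Return the backups needed, in apply order, to restore the latest state."""
    ordered = sorted(history, key=lambda b: b.seq)
    last_full = max((i for i, b in enumerate(ordered) if b.kind == "full"), default=None)
    if last_full is None:
        raise ValueError("no full backup to restore from")
    chain = [ordered[last_full]]
    tail = ordered[last_full + 1:]
    # The latest differential holds everything changed since the full backup,
    # so it supersedes any earlier incrementals or differentials.
    last_diff = max((i for i, b in enumerate(tail) if b.kind == "differential"), default=None)
    if last_diff is not None:
        chain.append(tail[last_diff])
        tail = tail[last_diff + 1:]
    # Every incremental after that point is required, in order.
    chain.extend(b for b in tail if b.kind == "incremental")
    return chain


weekly_incremental = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 7)]
weekly_differential = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 7)]
print(len(restore_chain(weekly_incremental)))   # 7
print(len(restore_chain(weekly_differential)))  # 2
```

With a weekly full and daily incrementals, a restore on day six applies seven backup sets, and a single damaged incremental breaks the chain. With daily differentials the same restore applies two.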
Immutability
Immutable backups cannot be modified, encrypted, or deleted for a defined retention period, regardless of who requests the modification. Even an administrator with full system access cannot alter an immutable backup.
Immutability is implemented through:
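Object lock (cloud storage). AWS S3 Object Lock, immutable storage for Azure Blob Storage, and Google Cloud Storage Bucket Lock provide WORM-style immutability. Once a backup object is written with a retention policy, it cannot be deleted or overwritten until the retention period expires. In S3 Object Lock compliance mode, not even the root account can shorten or remove the lock during the retention period. Governance mode, by contrast, can be bypassed by principals holding the right permission, so it protects against accidents more than against a compromised administrator.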
WORM storage (on-premises). Write Once Read Many storage, whether hardware-based (purpose-built WORM appliances) or software-based (immutable storage pools in backup platforms like Veeam, Cohesity, Rubrik), provides on-premises immutability.
Air-gapped backups. Physical disconnection from the network. Tape libraries stored in offsite vaults, removable media stored in fire-rated safes. Air-gapped backups are immutable by virtue of being unreachable: an attacker who compromises the network cannot reach a backup that is not on the network.
The operational trade-off: immutable backups consume more storage (you cannot delete them until the retention period expires) and require careful retention policy design (too short and you lose immutability protection before a slow-burn attack is detected; too long and storage costs escalate). The cost is justified. Immutable backup is the control that makes ransomware payment unnecessary.
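As one illustration of the object-lock approach and the retention trade-off, the sketch below builds the Object Lock arguments for an S3 `PutObject` request and checks a retention period against an assumed detection window. `ObjectLockMode` and `ObjectLockRetainUntilDate` are real `put_object` parameters in boto3; the helper names, the bucket and key arguments, and the detection-window figure are assumptions. The upload also assumes the target bucket was created with Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def object_lock_params(retention_days: int, now: Optional[datetime] = None) -> dict:
    """Build the Object Lock arguments for an S3 PutObject request."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    now = now or datetime.now(timezone.utc)
    return {
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
    }


def retention_covers_detection(retention_days: int, detection_window_days: int) -> bool:
    """True if a locked copy still exists when a slow-burn attack is noticed.

    detection_window_days is an assumption the organization must supply:
    how long an intruder could plausibly stay undetected.
    """
    return retention_days > detection_window_days


def upload_locked(bucket: str, key: str, body: bytes, retention_days: int) -> None:
    """Upload one backup object under a compliance-mode lock (requires boto3)."""
    import boto3  # imported lazily so the helpers above run without it

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=body, **object_lock_params(retention_days)
    )
```

A 14-day retention period against an assumed 30-day detection window fails `retention_covers_detection`: by the time the intrusion is noticed, every locked copy that predates it may already have expired. Because compliance mode cannot be shortened after the fact, the retention figure should be settled before the first object is written.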
Why It Matters
Ransomware Neutralization
Ransomware's business model depends on the victim having no alternative to paying. If the victim can restore from backup, the encryption half of the demand loses its leverage: the data is recoverable without the attacker's decryption key. (Extortion over stolen data is a separate problem that backups do not solve.) Investment in backup architecture directly lowers the attacker's expected payoff.
Sophisticated ransomware operators understand this. They specifically target backup infrastructure: identifying backup servers during the reconnaissance phase, compromising backup administrator credentials during the privilege escalation phase, and deleting or encrypting backup repositories before launching the encryption payload. The 3-2-1-1-0 rule with immutability is the architecture that survives this targeting.
Regulatory Requirements
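Backup and recovery requirements appear in most major compliance frameworks. NIST CSF 2.0 covers them under PR.DS (Data Security), which includes creating, protecting, and testing backups, and under RC.RP (Incident Recovery Plan Execution). ISO/IEC 27001:2022 Annex A 8.13 (Information Backup) requires backup copies to be maintained and regularly tested in line with a backup policy. PCI DSS requires incident response plans to cover data backup processes and business recovery procedures. HIPAA requires contingency planning, including a data backup plan and a disaster recovery plan. SOC 2 addresses recovery in CC7.5 (recovery from security incidents) and CC9.1 (risk mitigation for business disruption) and, where the Availability category is in scope, in A1.2 and A1.3 (backup processes and recovery plan testing).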
Auditors do not ask "do you have backups?" They ask "show me the last backup test results, the RTO/RPO definitions per system, the immutability configuration, and the offsite storage verification." The evidence requirements are operational, not architectural.
Business Continuity
Backup is not just a cybersecurity control. It is a business continuity control. Hardware failure, software bugs, human error, natural disasters, and cloud provider outages all produce data loss that backups recover. An organization with robust backup architecture is resilient against every category of data loss, not just ransomware.
The business case is straightforward. The cost of backup infrastructure (storage, software licenses, offsite replication, testing time) is a known, budgetable expense. The cost of unrecoverable data loss is potentially unbounded: business interruption, regulatory fines, legal liability, customer attrition, and in extreme cases, organizational failure.
CDA Perspective
Backup and recovery sits in the DPS (Data Protection and Sovereignty) domain of the Planetary Defense Model. DPS is the geological core: the last layer of defense around the organization's most critical asset. Backup is the control that operates when every outer layer has failed. If VSD did not stop the attacker from entering, if SPH did not stop them from moving laterally, if IAT did not stop them from escalating privileges, if TID did not detect them before they reached the data, the backup is the DPS control that still protects the organization's ability to recover.
CDA's Sovereign Data Protocol (SDP) governs backup architecture. "Your data lives where you decide. Period." This includes backup data. Where backups are stored, who can access them, and whether they are immutable are sovereignty decisions. An organization that stores backups on the same network as production data, managed by the same credentials, in the same geographic location, has not made a sovereignty decision. It has accepted a single point of failure.
Two TOP missions connect directly to backup:
- DPS-B04 (Backup and Recovery Architecture): Design and deploy the backup architecture: define RTO/RPO per system, implement the 3-2-1-1-0 model, configure immutability, establish offsite replication, and build the recovery procedures. 24 estimated hours.
- DPS-D02 (Backup Recovery Drill): Test the backup architecture under realistic conditions. Initiate a full restoration. Time it. Verify the data. Confirm that RTO and RPO are achievable. 12 estimated hours. This is the single most important test in the DPS domain. If it fails, the entire backup architecture is a paper exercise.
CDA runs the recovery drill with a stopwatch. From the moment the restoration is initiated to the moment the system is operational and verified: that is the actual RTO. If the actual RTO exceeds the defined RTO, the architecture needs modification. The stopwatch does not negotiate.
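The stopwatch logic is simple enough to automate. The sketch below times a restore and its verification end to end and compares the result with the defined RTO. `run_drill`, `DrillResult`, and the `restore` and `verify` callables are hypothetical stand-ins for whatever the backup platform actually exposes.

```python
import time
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable


@dataclass(frozen=True)
class DrillResult:
    elapsed: timedelta   # the actual RTO observed in the drill
    verified: bool       # did the restored data pass verification?
    rto: timedelta       # the defined RTO

    @property
    def passed(self) -> bool:
        return self.verified and self.elapsed <= self.rto


def run_drill(
    restore: Callable[[], None],
    verify: Callable[[], bool],
    rto: timedelta,
    clock: Callable[[], float] = time.monotonic,
) -> DrillResult:
    """Time restore plus verification end to end and compare against the RTO."""
    start = clock()
    restore()
    verified = verify()   # the clock keeps running until the data is verified
    elapsed = timedelta(seconds=clock() - start)
    return DrillResult(elapsed=elapsed, verified=verified, rto=rto)
```

Two choices matter here. The clock stops only after verification, because a restored system that has not been verified is not yet operational. And `time.monotonic` is the default clock because it is not affected by wall-clock adjustments during a multi-hour drill.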
The interaction with adjacent domains: SPH maintains the backup infrastructure (patching backup servers, monitoring agent health, verifying schedule execution). IAT controls who can access backup systems and, critically, who can modify or delete backup repositories (backup admin credentials must be managed through PAM with MFA). TID detects attacks targeting backup infrastructure (deletion of shadow copies, access to backup admin credentials, unusual backup job modifications). RGA mandates the backup program through compliance frameworks and defines the RTO/RPO requirements based on risk assessment.
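The TID detections mentioned above often begin as simple command-line matching on process-creation logs. The sketch below flags a few Windows commands commonly used to delete shadow copies and backup catalogs. The pattern list is illustrative and far from exhaustive, and `is_backup_tampering` is a hypothetical helper, not a rule from any specific detection product.

```python
import re

# Command lines commonly associated with destroying local recovery points
# on Windows. Illustrative only; production rules need broader coverage.
TAMPER_PATTERNS = [
    re.compile(r"\bvssadmin(\.exe)?\s+delete\s+shadows\b", re.IGNORECASE),
    re.compile(r"\bwmic(\.exe)?\s+shadowcopy\s+delete\b", re.IGNORECASE),
    re.compile(r"\bwbadmin(\.exe)?\s+delete\s+(catalog|systemstatebackup)\b", re.IGNORECASE),
]


def is_backup_tampering(command_line: str) -> bool:
    """Return True if the command line matches a known tampering pattern."""
    return any(p.search(command_line) for p in TAMPER_PATTERNS)


print(is_backup_tampering("vssadmin.exe Delete Shadows /All /Quiet"))  # True
print(is_backup_tampering("vssadmin list shadows"))                    # False
```

String matching of this kind is easy to evade, so it works best as one signal alongside credential-access and backup-job-modification alerts rather than as a standalone control.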
Key Takeaways
- Backup and recovery architecture is the last line of DPS defense. If every other control fails, the backup determines whether the organization recovers or pays.
- The 3-2-1-1-0 rule (3 copies, 2 media types, 1 offsite, 1 immutable/air-gapped, 0 errors after verification) is the architecture standard.
- RTO (time to recover) and RPO (data loss tolerance) must be defined per system and per data classification tier. One-size-fits-all recovery objectives either over-invest or under-invest.
- Immutable backups are the specific control that neutralizes ransomware targeting of backup infrastructure. Object lock, WORM storage, and air-gapped media provide immutability.
- CDA tests backup architecture with a stopwatch. DPS-D02 (Backup Recovery Drill) is the most important test in the DPS domain. If the backup cannot be restored within the defined RTO, the architecture needs revision.
Related Articles
- Data Protection and Sovereignty (DPS): The Geological Core
- Ransomware
- Incident Response Lifecycle
- Data Classification
- NIST Cybersecurity Framework (CSF) 2.0
- ISO 27001
Sources
- Cybersecurity and Infrastructure Security Agency (CISA). "Protecting Sensitive and Personal Information from Ransomware-Caused Data Breaches." CISA Fact Sheet, 2024.
- National Institute of Standards and Technology (NIST). "Cybersecurity Framework (CSF) 2.0: RC.RP (Incident Recovery Plan Execution), PR.DS (Data Security)." U.S. Department of Commerce, 2024.
- Veeam. "2024 Ransomware Trends Report." Veeam Software, 2024. (Statistics on ransomware targeting backup infrastructure.)
- International Organization for Standardization. "ISO/IEC 27001:2022, Annex A.8.13 (Information Backup)." ISO, 2022.
- U.S. Government Accountability Office. "Colonial Pipeline Cyberattack: DHS Needs to Better Manage Interagency Coordination." GAO-24-106486, December 2023.
Written by Evan Morgan