# Data Governance
## Definition
Data governance is the organizational framework of policies, roles, processes, and standards that ensures data is managed as a strategic asset throughout its lifecycle: creation, collection, storage, use, sharing, retention, and disposal. It answers the foundational questions that every data protection control depends on: what data do we have, where does it live, who owns it, who can access it, how long do we keep it, and what happens to it at end-of-life?
Data governance is broader than data security. Data security (encryption, DLP, access controls) protects data from unauthorized access. Data governance determines what data exists, how it is classified, who is responsible for it, and what the rules are for its lifecycle. Security implements the controls. Governance defines what the controls must achieve.
Without data governance, security operates blind. An organization that cannot answer "where does our customer PII reside?" cannot implement effective DLP (what patterns should it look for?), cannot scope encryption (what should be encrypted?), cannot conduct a breach investigation (what data was exposed?), and cannot comply with data subject access requests under GDPR or state privacy laws (where is this person's data?). Data governance provides the foundational knowledge that every DPS, IAT, and RGA control depends on.
## How It Works
### Governance Framework Components
Data inventory and mapping. Identify every category of data the organization collects, processes, and stores. Map where each data category resides: which databases, which file shares, which cloud storage, which SaaS applications, which third parties. The data inventory is the foundational artifact. Every subsequent governance decision references it.
Data mapping is particularly important for privacy compliance. GDPR Article 30 requires records of processing activities (RoPA) that document what personal data is processed, for what purpose, who receives it, whether it is transferred to third countries, and how long it is retained. U.S. state privacy laws (CCPA/CPRA, Virginia CDPA, Colorado CPA) do not prescribe a RoPA in the same form, but their notice, consumer-rights, and assessment obligations are hard to meet without an equivalent data map. An organization that cannot produce its data map cannot demonstrate compliance with privacy regulations.
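As a minimal sketch of what an inventory entry might hold, the example below models one record per data category and answers the "where does this data reside?" question from it. The `InventoryEntry` fields, the helper names, and the sample systems are illustrative assumptions, not a prescribed schema; a real inventory would normally live in a data catalog or GRC tool rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One data category in the inventory (field names are illustrative)."""
    category: str                      # e.g. "customer PII"
    purpose: str                       # why the data is processed
    locations: list[str]               # databases, shares, SaaS apps, ...
    shared_with: list[str] = field(default_factory=list)  # third parties
    retention: str = ""                # retention rule, free text here


def locations_of(inventory: list[InventoryEntry], category: str) -> list[str]:
    """Answer "where does this data category reside?" from the inventory."""
    found: list[str] = []
    for entry in inventory:
        if entry.category == category:
            found.extend(entry.locations)
    return found


def incomplete_entries(inventory: list[InventoryEntry]) -> dict[str, list[str]]:
    """List, per category, the mapping fields that are still empty."""
    gaps = {}
    for entry in inventory:
        missing = [name for name in ("purpose", "locations", "retention")
                   if not getattr(entry, name)]
        if missing:
            gaps[entry.category] = missing
    return gaps


# Hypothetical sample data for illustration only.
inventory = [
    InventoryEntry("customer PII", "order fulfilment",
                   ["crm-db", "exports-bucket"], ["payment processor"],
                   "account lifetime plus 2 years"),
    InventoryEntry("employee PII", "", ["hr-saas"]),
]
print(locations_of(inventory, "customer PII"))
print(incomplete_entries(inventory))
```

The second helper reflects the point that an inventory is only useful when it is complete: an entry with no documented purpose or retention rule is a gap to close before relying on the map.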
Data classification. Assign a classification level to each data category based on its sensitivity and the impact of unauthorized disclosure. A typical classification scheme:
- Restricted: disclosure would cause severe harm (financial loss, regulatory penalties, legal liability). Examples: payment card data, Social Security numbers, health records, trade secrets, credentials.
- Confidential: disclosure would cause significant harm. Examples: employee PII, customer contact information, financial reports, internal strategy documents, source code.
- Internal: disclosure would cause minor harm. Examples: internal communications, non-sensitive operational data, organizational charts.
- Public: disclosure causes no harm. Examples: marketing materials, published content, public financial filings.
Classification drives control selection: Restricted data requires encryption at rest and in transit, DLP monitoring, strict access controls, and audit logging. Public data requires none of these confidentiality controls. Without classification, every data element receives either the same controls (expensive and operationally impractical) or inconsistent controls (risky and non-compliant).
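The way classification drives control selection can be sketched as a lookup from level to required controls. In this hypothetical example, the Restricted and Public baselines follow the text above, while the Confidential and Internal baselines and all control names are assumptions made for illustration.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Baseline controls per level. RESTRICTED and PUBLIC mirror the article;
# CONFIDENTIAL and INTERNAL are assumed values for this sketch.
REQUIRED_CONTROLS = {
    Classification.RESTRICTED: {
        "encryption_at_rest", "encryption_in_transit",
        "dlp_monitoring", "strict_access_control", "audit_logging",
    },
    Classification.CONFIDENTIAL: {
        "encryption_at_rest", "encryption_in_transit", "access_control",
    },
    Classification.INTERNAL: {"access_control"},
    Classification.PUBLIC: set(),
}


def control_gaps(level: Classification, deployed: set[str]) -> set[str]:
    """Return the required controls that are not yet deployed."""
    return REQUIRED_CONTROLS[level] - set(deployed)


# A Restricted data store with only two of the five controls in place.
print(sorted(control_gaps(Classification.RESTRICTED,
                          {"encryption_at_rest", "audit_logging"})))
```

Keeping the mapping in one table means a change to the classification policy updates every downstream check at once, rather than each system carrying its own interpretation.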
Data ownership and stewardship. Assign accountability for each data category to a specific individual or role:
- Data owner: a business leader (not an IT role) who is accountable for the data's classification, access policies, retention rules, and compliance requirements. The VP of Finance owns financial data. The VP of HR owns employee data. The Chief Medical Officer owns patient data. Ownership means accountability: the data owner is responsible for decisions about who accesses the data, how long it is retained, and what happens when a breach exposes it.
- Data steward: an operational role that implements the data owner's decisions. The data steward manages the data quality, maintains the metadata, ensures classification labels are applied, and coordinates with IT and security to implement the access controls and retention policies the owner defines.
- Data custodian: the IT or security role that implements the technical controls. The database administrator who configures encryption, the security engineer who deploys DLP policies, and the cloud architect who configures access permissions are custodians. They implement the controls that the owner mandates and the steward coordinates.
Without defined ownership, data governance decisions are made by default: whoever created the data manages it however they choose, resulting in inconsistent classification, uncontrolled access, and unclear retention. The question "who is responsible for this data?" must have a specific, named answer for every data category.
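One way to test whether "who is responsible for this data?" has a named answer for every category is a simple completeness check over the role assignments. The sketch below assumes a hypothetical `Accountability` record with one field per role; the names and sample assignments are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Accountability:
    """Role assignments for one data category (illustrative structure)."""
    category: str
    owner: Optional[str] = None      # business leader
    steward: Optional[str] = None    # operational role
    custodian: Optional[str] = None  # IT or security role


def unassigned_roles(records: list[Accountability]) -> dict[str, list[str]]:
    """Map each category to the roles that have no named assignee."""
    gaps = {}
    for rec in records:
        missing = [role for role in ("owner", "steward", "custodian")
                   if not getattr(rec, role)]
        if missing:
            gaps[rec.category] = missing
    return gaps


# Hypothetical assignments for illustration only.
records = [
    Accountability("financial data", owner="VP of Finance",
                   steward="Finance data analyst", custodian="DBA team"),
    Accountability("marketing leads", custodian="Cloud platform team"),
]
print(unassigned_roles(records))
```

A category that appears in the result is being governed by default, which is the failure mode described above.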
Data lifecycle management. Define how data is managed from creation through disposal:
- Creation/collection: data is created or collected with a defined purpose and classified at the point of creation. The classification label is applied immediately, not retroactively.
- Storage: data is stored in locations appropriate to its classification. Restricted data is stored in encrypted, access-controlled repositories. Internal data may be stored on general-purpose file shares. The storage location must match the classification requirements.
- Use: data is accessed and used by authorized individuals for authorized purposes. Access controls enforce the authorized access. DLP monitors for unauthorized use. Audit logs record who accessed what and when.
- Sharing: data is shared with internal and external parties according to classification-specific rules. Restricted data is shared through encrypted channels with contractual protections (NDAs, BAAs, DPAs). Internal data may be shared within the organization without additional controls.
- Retention: data is retained for the period required by business need, regulatory mandate, or contractual obligation, and no longer. Retention policies define the period for each data category, for example: financial records (commonly 7 years; IRS guidance typically ranges from 3 to 7 years depending on the record and circumstances), HIPAA compliance documentation (6 years; retention of the medical records themselves is set by state law and is often longer), customer data (per the organization's privacy notice), and operational data (per business need).
- Disposal: data that has exceeded its retention period is securely destroyed. Disposal methods match the data's classification and the media type: physical media holding Restricted data is sanitized or destroyed through degaussing (magnetic media only), shredding, or cryptographic erasure (encrypted media, by destroying the keys). Electronic copies are securely deleted using methods that prevent recovery. Disposal is documented for audit evidence.
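The retention and disposal phases can be sketched as a date check against a retention schedule. The periods in `RETENTION_YEARS` are placeholder values, and the sketch assumes retention is counted in whole years from a single start date; in practice the clock often starts at a trigger event such as account closure, and the schedule comes from legal and compliance.

```python
from datetime import date

# Hypothetical retention schedule in years, for illustration only.
RETENTION_YEARS = {"financial_records": 7, "support_tickets": 2}


def retention_expiry(category: str, retention_start: date) -> date:
    """Date on which the retention period for a record ends."""
    years = RETENTION_YEARS[category]
    try:
        return retention_start.replace(year=retention_start.year + years)
    except ValueError:
        # Start date is 29 February and the target year is not a leap year.
        return retention_start.replace(year=retention_start.year + years,
                                       day=28)


def disposal_due(category: str, retention_start: date, today: date) -> bool:
    """True when the record has outlived its retention period."""
    return today > retention_expiry(category, retention_start)


print(disposal_due("financial_records", date(2015, 3, 1), date(2023, 1, 1)))
print(disposal_due("support_tickets", date(2023, 6, 1), date(2024, 6, 1)))
```

A check like this only flags records for disposal; the destruction itself, and the audit record of it, would follow the classification-specific method described above.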
### Data Governance Roles
Data Governance Committee. A cross-functional body (legal, compliance, IT, security, business unit representatives) that sets data governance policy, resolves classification disputes, approves data sharing agreements, and oversees the governance program. The committee provides the organizational authority that governance requires.
Chief Data Officer (CDO) or equivalent. The executive responsible for data governance strategy, data quality, and data-related compliance. In organizations without a CDO, the CISO, CIO, or General Counsel may own data governance (though this creates potential conflicts of interest: the CIO who owns data governance and IT infrastructure may optimize for convenience rather than governance).
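Privacy Officer / DPO. Responsible for privacy-specific governance: GDPR compliance, state privacy law compliance, data subject rights requests, privacy impact assessments, and privacy-by-design integration into new systems and processes. GDPR Article 37 requires a designated Data Protection Officer for public authorities and for organizations whose core activities involve large-scale systematic monitoring of individuals or large-scale processing of special categories of personal data.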
## Why It Matters
### Regulatory Compliance
Data governance is a prerequisite for compliance with every regulation that governs data: GDPR (records of processing, data subject rights, data protection by design), CCPA/CPRA (data mapping, consumer rights, opt-out mechanisms), HIPAA (ePHI inventory, minimum necessary standard, BAA requirements), PCI DSS (cardholder data flow documentation, data retention), and SEC cybersecurity disclosure rules (material incident determination requires knowing what data was affected).
Organizations that cannot produce their data inventory, classification scheme, and retention policies fail the first stage of every compliance assessment. The auditor asks "where does your customer data reside?" and the answer must be specific, documented, and current.
### Breach Impact Assessment
When a breach occurs, the incident response team must determine what data was exposed. This determination drives every subsequent decision: which notification laws apply, how many individuals are affected, what regulatory reports are required, and what the financial exposure is. Without a data inventory that maps data categories to systems, the IR team cannot answer these questions accurately. The breach notification says "we are investigating what data may have been affected" because the organization does not know what data was on the compromised system.
A mature data governance program enables immediate impact assessment: the compromised system is mapped to specific data categories, each category has a known classification and regulatory status, and the notification obligations are determinable within hours rather than weeks.
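The lookup that makes this fast can be sketched as two joins: compromised system to data categories, then data category to classification and regulatory regime. The system names, categories, and regime assignments below are hypothetical sample data; the real mapping comes from the data inventory, and the applicable notification obligations are a legal determination.

```python
# Hypothetical inventory extracts for illustration only.
SYSTEM_DATA_MAP = {
    "crm-db": ["customer_contact", "payment_card"],
    "wiki": ["internal_docs"],
}
CATEGORY_INFO = {
    "customer_contact": {"classification": "Confidential",
                         "regimes": ["GDPR", "CCPA/CPRA"]},
    "payment_card": {"classification": "Restricted",
                     "regimes": ["PCI DSS"]},
    "internal_docs": {"classification": "Internal", "regimes": []},
}


def assess_impact(compromised_systems: list[str]) -> dict[str, list[str]]:
    """Resolve compromised systems to data categories and regimes."""
    categories = sorted({cat for system in compromised_systems
                         for cat in SYSTEM_DATA_MAP.get(system, [])})
    regimes = sorted({regime for cat in categories
                      for regime in CATEGORY_INFO[cat]["regimes"]})
    # Systems missing from the inventory are exactly the blind spots
    # that slow an investigation down, so report them explicitly.
    unmapped = sorted(s for s in compromised_systems
                      if s not in SYSTEM_DATA_MAP)
    return {"categories": categories, "regimes": regimes,
            "unmapped_systems": unmapped}


print(assess_impact(["crm-db", "legacy-ftp"]))
```

Reporting unmapped systems separately is a deliberate choice in this sketch: an empty category list for an uninventoried system means "unknown", not "no sensitive data".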
### Data Minimization
Data governance enforces data minimization: collecting only the data needed for a specific purpose and retaining it only for the required period. Data minimization directly reduces breach impact (data that was never collected cannot be exposed) and compliance burden (data disposed of when its retention period expires is no longer subject to ongoing governance requirements).
Most organizations collect more data than they need, retain it longer than required, and store copies in more locations than they track. Data governance identifies this excess and provides the framework for reducing it. Every unnecessary data element eliminated is a data element that cannot be breached.
## CDA Perspective
Data governance sits at the intersection of DPS (Data Protection and Sovereignty) and RGA (Risk Governance and Assurance) in the Planetary Defense Model. DPS owns the protection dimension: encryption, DLP, access controls, and sovereignty. RGA owns the governance dimension: policies, roles, processes, and compliance. Data governance is the bridge between them: it defines what DPS must protect and what RGA must govern.
CDA's Sovereign Data Protocol (SDP) is built on data governance. "Your data lives where you decide. Period." That sovereignty decision requires knowing what data exists (inventory), what it is worth (classification), who is responsible for it (ownership), and where it resides (mapping). Without these governance foundations, sovereignty is an aspiration. With them, sovereignty is an enforceable operational state.
Four TOP missions connect to data governance:
- DPS-R01 (Data Inventory and Mapping): Discover and map all data assets. Identify every system that creates, processes, or stores sensitive data. Map data flows between systems, including flows to third parties and cloud platforms. 20 estimated hours.
- DPS-R02 (Data Classification Assessment): Evaluate and classify data categories. Assign classification levels. Identify data that is over-classified (creating unnecessary control burden) or under-classified (creating risk exposure). 12 estimated hours.
- DPS-B01 (Data Classification Program): Build the operational classification program: define the classification scheme, deploy classification tools (Microsoft Information Protection, Google Workspace labels), train data owners and stewards, and integrate classification with DLP policies. 24 estimated hours.
- RGA-B01 (Risk Management Framework): Includes data governance as a component of the overall risk management framework: data ownership assignments, retention policy development, and governance committee establishment. 32 estimated hours.
CDA approaches data governance with one principle: governance precedes protection. An organization that deploys DLP before classifying its data produces false positives (flagging data that is not sensitive) and false negatives (missing data that is sensitive but not classified). An organization that encrypts everything because it has not classified what needs encryption wastes resources on public data and may still miss Restricted data in unmanaged locations. Classification first. Protection second. Governance enables both.
## Key Takeaways
- Data governance is the organizational framework that manages data as a strategic asset: inventory, classification, ownership, lifecycle, and compliance.
- Data governance precedes and enables data security. Without knowing what data exists, where it resides, and how it is classified, security controls cannot be effectively deployed.
- Three accountability roles: data owner (business leader accountable for classification and access decisions), data steward (operational role implementing decisions), and data custodian (IT/security implementing technical controls).
- Data lifecycle management covers creation, storage, use, sharing, retention, and disposal. Each phase has governance requirements that vary by classification level.
- CDA's SDP principle: governance precedes protection. Classify first, protect second.
## Sources
- International Organization for Standardization. "ISO/IEC 38505-1:2017: Information Technology - Governance of IT - Governance of Data." ISO, 2017.
- European Parliament and Council. "General Data Protection Regulation (GDPR): Article 30 (Records of Processing Activities)." Official Journal of the European Union, 2016.
- DAMA International. "DAMA-DMBOK: Data Management Body of Knowledge, 2nd Edition." Technics Publications, 2017.
- National Institute of Standards and Technology (NIST). "Cybersecurity Framework (CSF) 2.0: GV.RM (Risk Management), PR.DS (Data Security)." U.S. Department of Commerce, 2024.
- International Association of Privacy Professionals (IAPP). "Data Mapping: A Foundation for Privacy Compliance." IAPP, 2024.