DP-900 | Chapter 91 of 101 | Objective 1.1

Data Mesh and Domain-Oriented Data Ownership

This chapter covers Data Mesh and domain-oriented data ownership, a modern architectural paradigm for managing data at scale. For the DP-900 exam, understanding data mesh is essential as it appears in questions about data architectures and governance (Objective 1.1). Approximately 5-10% of exam questions touch on modern data architectures, with data mesh being a key concept. You will need to distinguish data mesh from traditional data lakes and warehouses, and understand its core principles: domain ownership, data as a product, self-serve data platform, and federated computational governance.

25 min read
Intermediate
Updated May 31, 2026

Data Mesh as a Supermarket Franchise

Imagine a large supermarket chain that used to have a central warehouse supplying all stores. The central warehouse decided what to stock, how to label items, and who could access inventory data. As the chain grew, the warehouse became a bottleneck: stores couldn't get unique local products, and inventory decisions were slow. To solve this, the chain adopted a franchise model. Each store became its own domain, owning its inventory data. Store managers (domain teams) define their own data schemas, label products according to local preferences, and expose data through standardized APIs. A central franchise office provides shared infrastructure (like refrigerators and checkout systems) and governance rules (like food safety standards) but does not control individual store data. When a customer wants a product from another store, they request it through a data catalog that lists available items across all stores. Each store is responsible for the quality and freshness of its own data. This mirrors data mesh: each domain owns its data as a product, exposes it via standard interfaces, and leverages shared infrastructure (data platform) while adhering to global governance. The central warehouse (monolithic data lake) is replaced by a federated, domain-oriented architecture.

How It Actually Works

What is Data Mesh and Why Does It Exist?

Data mesh is an architectural and organizational paradigm for decentralized data management. It was introduced by Zhamak Dehghani in 2019 as a response to the failures of centralized data lakes and warehouses. In traditional architectures, a central data team is responsible for ingesting, cleaning, and serving data to the entire organization. This creates bottlenecks: the central team becomes a single point of failure, data becomes stale, and domain-specific context is lost. Data mesh flips this model by treating data as a product, owned by the domain teams that generate the data. Each domain is responsible for its own data pipelines, quality, and exposure. The central team instead provides a shared self-serve data platform and enforces global governance through federated computational governance.

How Data Mesh Works Internally

Data mesh is not a specific technology but a set of principles that guide architecture. The four core principles are:

Domain Ownership: Each business domain (e.g., sales, inventory, customer support) owns its data. Domain teams are responsible for collecting, processing, and serving their data as a product. This includes defining schemas, ensuring data quality, and managing access.

Data as a Product: Data is treated as a first-class product, not a byproduct. Each domain must ensure its data is discoverable, addressable, trustworthy, self-describing, interoperable, and secure. This means providing metadata, documentation, and SLA guarantees.

Self-Serve Data Platform: A shared infrastructure that abstracts the complexity of storage, compute, and networking. Domain teams use this platform to build and operate their data products without needing deep infrastructure expertise. The platform typically includes:

- Data storage: e.g., Azure Data Lake Storage (ADLS) Gen2, Azure Blob Storage
- Compute: e.g., Azure Databricks, Azure Synapse Analytics
- Data catalog: e.g., Microsoft Purview for metadata management and discovery
- Data integration: e.g., Azure Data Factory for orchestration
- Access control: e.g., Azure Active Directory (now Microsoft Entra ID) and role-based access control (RBAC)

Federated Computational Governance: Governance is not enforced by a central authority but is embedded into the platform. Policies (e.g., data privacy, encryption, retention) are automated and applied consistently across domains. For example, a policy might enforce that all data products containing personally identifiable information (PII) must be encrypted at rest and accessible only via approved roles.
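To make the PII example concrete, here is a minimal Python sketch of a policy expressed as code. The `DataProduct` fields, the `APPROVED_PII_ROLES` set, and the role names are hypothetical and exist only for illustration; a real platform would enforce the equivalent rule through services such as Azure Policy rather than a hand-written function.

```python
from dataclasses import dataclass, field

# Hypothetical set of roles allowed to read PII; not an Azure-defined list.
APPROVED_PII_ROLES = {"pii-analyst", "compliance-auditor"}


@dataclass
class DataProduct:
    name: str
    domain: str
    contains_pii: bool
    encrypted_at_rest: bool
    reader_roles: set = field(default_factory=set)


def pii_policy_violations(product: DataProduct) -> list:
    """Return human-readable violations of the PII policy for one product."""
    violations = []
    if not product.contains_pii:
        return violations  # The policy only applies to products holding PII.
    if not product.encrypted_at_rest:
        violations.append(f"{product.name}: PII must be encrypted at rest")
    unapproved = product.reader_roles - APPROVED_PII_ROLES
    if unapproved:
        violations.append(f"{product.name}: unapproved roles {sorted(unapproved)}")
    return violations


products = [
    DataProduct("sales-transactions", "sales", True, True, {"pii-analyst"}),
    DataProduct("support-tickets", "support", True, False, {"intern"}),
]
for p in products:
    print(p.name, pii_policy_violations(p))
```

The same function runs unchanged against every domain's products, which is the point of the principle: the rule is defined once and applied uniformly instead of being reviewed by hand per domain.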

Key Components, Values, Defaults, and Timers

While data mesh is conceptual, its implementation relies on specific Azure services with concrete configurations:

Azure Data Lake Storage Gen2: Default storage for data products. Enabling the hierarchical namespace is what makes a storage account ADLS Gen2; it adds directories and POSIX-style ACLs for fine-grained access control. Encryption at rest is enabled by default (Azure Storage Service Encryption, AES-256).

Azure Databricks: Common compute for data transformation. All-purpose (interactive) clusters created in the UI default to auto-termination after 120 minutes of inactivity; job clusters terminate when the job finishes.

Microsoft Purview: Data catalog and governance. Scans run on demand or on a recurring schedule that is configured per data source (for example, weekly or monthly) rather than at a fixed default interval. Purview supports automated classification of sensitive data types (e.g., credit card numbers, SSN).

Azure Policy: Enforces governance at scale. Policies are evaluated on resource creation and at regular intervals (default 24 hours).

Azure Role-Based Access Control (RBAC): Used to grant permissions to data products. Built-in data-plane roles include Storage Blob Data Owner, Storage Blob Data Contributor, and Storage Blob Data Reader. Access is denied unless a role assignment grants it.

Configuration and Verification Commands

To configure a data mesh on Azure, you would typically use Azure CLI or PowerShell. Example commands:

# Create a resource group for the data platform
az group create --name DataMeshPlatform --location eastus

# Create a storage account with hierarchical namespace
az storage account create --name meshdatalake --resource-group DataMeshPlatform --location eastus --sku Standard_RAGRS --kind StorageV2 --enable-hierarchical-namespace true

# Create a container for a domain (e.g., sales)
az storage container create --account-name meshdatalake --name sales-data --auth-mode login

# Assign RBAC role to a domain team
az role assignment create --assignee user@domain.com --role "Storage Blob Data Contributor" --scope "/subscriptions/<sub>/resourceGroups/DataMeshPlatform/providers/Microsoft.Storage/storageAccounts/meshdatalake/blobServices/default/containers/sales-data"

# Register a data product in Purview (via REST API or SDK)
# Purview API example (simplified):
# PUT https://{purviewaccount}.catalog.purview.azure.com/api/atlas/v2/types/typedefs

Verification commands:

# Check storage account encryption
az storage account show --name meshdatalake --query encryption

# List RBAC assignments
az role assignment list --scope "/subscriptions/<sub>/resourceGroups/DataMeshPlatform/providers/Microsoft.Storage/storageAccounts/meshdatalake"

# Check Purview scan status
# (via Purview Studio or REST API)
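The long `--scope` value in the RBAC commands above is easy to mistype, so a platform team might generate it instead. The following Python sketch builds the same container-level scope string and the matching `az role assignment create` argument list without executing anything. The helper names `container_scope` and `role_assignment_command` are hypothetical, and `<sub>` is left as the same placeholder used above.

```python
import shlex


def container_scope(subscription_id, resource_group, account, container):
    """Build the resource ID of a blob container for use as an RBAC scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
        f"/blobServices/default/containers/{container}"
    )


def role_assignment_command(assignee, role, scope):
    """Return the az CLI invocation as an argument list (not executed here)."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,
        "--role", role,
        "--scope", scope,
    ]


scope = container_scope("<sub>", "DataMeshPlatform", "meshdatalake", "sales-data")
cmd = role_assignment_command(
    "user@domain.com", "Storage Blob Data Contributor", scope
)
print(shlex.join(cmd))  # Shell-quoted form, suitable for review or logging.
```

Scoping the assignment to a single container rather than the whole storage account is what keeps one domain's team out of another domain's data.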

How Data Mesh Interacts with Related Technologies

Data mesh often coexists with:

Data Lake: The self-serve platform typically uses a data lake as the storage layer. However, in data mesh, the lake is not monolithic; each domain has its own container or folder with isolated access controls.

Data Warehouse: Some data products may be served via a data warehouse (e.g., Azure Synapse dedicated SQL pool) for low-latency queries. The warehouse becomes one of the output ports of a data product.

Data Lakehouse: A combination of data lake and warehouse. Data mesh can be implemented on a lakehouse architecture, with domains publishing data as Delta Lake tables.

Data Fabric: A related, broader concept that overlaps with data mesh principles but emphasizes metadata-driven data integration across on-premises and multi-cloud environments. Microsoft Fabric, Microsoft's SaaS analytics platform, can simplify building a data mesh, for example by grouping workspaces into domains.

Data Catalog: Essential for data mesh. Microsoft Purview serves as the central catalog where domain teams register their data products, making them discoverable.

Detailed Mechanism Steps

1. Domain team identifies a data product: For example, the sales domain decides to create a "Sales Transactions" data product.

2. Domain team provisions storage: Using the self-serve platform, they create a container in the data lake with appropriate RBAC.

3. Domain team builds pipelines: Using Azure Data Factory or Databricks, they ingest, clean, and transform sales data. The pipeline writes to the container.

4. Domain team registers the data product: They register metadata (schema, description, ownership, SLA) in Microsoft Purview, making it discoverable.

5. Domain team exposes data: They provide access via APIs (e.g., REST endpoints) or direct query interfaces (e.g., Azure Synapse serverless SQL).

6. Consumer discovers data product: A data scientist uses Purview to find the "Sales Transactions" product, reads the documentation, and requests access.

7. Access is granted: The domain team (or automated policy) grants the consumer RBAC permissions or API keys.

8. Consumer uses data: The consumer queries the data product using approved tools (e.g., Power BI, Azure Databricks).
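The register, discover, request, and grant steps can be sketched as a toy in-memory catalog. This is an illustration of the flow only: the `Catalog` class and its methods are hypothetical stand-ins and do not correspond to the Microsoft Purview API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner_domain: str
    description: str
    readers: set = field(default_factory=set)
    pending: set = field(default_factory=set)


class Catalog:
    """Toy catalog; a real deployment would use a service such as Purview."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):  # Step 4: make the product discoverable.
        self._entries[entry.name] = entry

    def search(self, keyword):  # Step 6: consumer discovers the product.
        kw = keyword.lower()
        return [
            e.name for e in self._entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        ]

    def request_access(self, name, user):  # Step 6: consumer asks for access.
        self._entries[name].pending.add(user)

    def approve(self, name, user, approver_domain):  # Step 7: owner grants it.
        entry = self._entries[name]
        if approver_domain != entry.owner_domain:
            raise PermissionError("only the owning domain can approve access")
        entry.pending.discard(user)
        entry.readers.add(user)

    def can_read(self, name, user):  # Step 8: checked before each query.
        return user in self._entries[name].readers


catalog = Catalog()
catalog.register(
    CatalogEntry("Sales Transactions", "sales", "Finalized sales with amounts")
)
print(catalog.search("sales"))
catalog.request_access("Sales Transactions", "data.scientist")
catalog.approve("Sales Transactions", "data.scientist", approver_domain="sales")
print(catalog.can_read("Sales Transactions", "data.scientist"))
```

Note that approval is checked against the owning domain, not a central team, which mirrors the ownership model described in the steps.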

Performance and Scale Considerations

Storage: ADLS Gen2 scales to petabytes. Each domain container can have millions of files. Use partition pruning for optimal query performance.

Compute: Azure Databricks clusters auto-scale based on workload. For high concurrency, use Delta Lake and optimize table layouts (e.g., Z-ordering).

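Catalog: Purview can handle thousands of data products. Scan frequency affects freshness. When metadata must be current between scans, push updates through Purview's Apache Atlas-compatible REST API instead of waiting for the next scheduled scan.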

Governance: Azure Policy can enforce compliance at scale. However, too many policies can slow down resource creation. Use policy exemptions sparingly.
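Partition pruning, mentioned under Storage above, means skipping files whose partition values cannot match the query filter. Query engines such as Spark do this automatically for partitioned tables; the plain-Python sketch below only illustrates the idea. The `year=`/`month=` folder layout and the file names are assumptions made for the example.

```python
def prune_partitions(paths, year, month):
    """Keep only files whose Hive-style partition folders match the filter."""
    wanted = {f"year={year}", f"month={month:02d}"}
    # A path qualifies when every wanted partition folder appears in it.
    return [p for p in paths if wanted.issubset(p.split("/"))]


files = [
    "sales-data/transactions/year=2024/month=04/part-0000.parquet",
    "sales-data/transactions/year=2024/month=05/part-0000.parquet",
    "sales-data/transactions/year=2024/month=05/part-0001.parquet",
    "sales-data/transactions/year=2023/month=05/part-0000.parquet",
]

to_scan = prune_partitions(files, year=2024, month=5)
print(f"scanning {len(to_scan)} of {len(files)} files")
```

Only the files that survive pruning are opened, which is why choosing partition columns that match common filters matters more as a domain container grows.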

Common Pitfalls

Over-fragmentation: Too many small data products can lead to management overhead. Aim for data products that align with business domains, not individual tables.

Lack of standardization: Without global governance, domains may produce inconsistent metadata, making discovery difficult. Enforce minimum metadata requirements.

Underestimating platform costs: The self-serve platform must be cost-optimized. Use reserved capacity for predictable workloads, and set budgets.

Ignoring data quality: Each domain must own data quality. Implement automated quality checks (e.g., using Azure Databricks Delta Live Tables expectations).
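One way to address the standardization pitfall is to reject registrations that lack a minimum set of metadata. The sketch below shows the idea in Python; the `REQUIRED_FIELDS` tuple is a hypothetical minimum, not a list defined by Purview or by the exam.

```python
# Hypothetical minimum metadata every data product must supply.
REQUIRED_FIELDS = ("name", "owner", "description", "schema", "sla")


def missing_metadata(metadata: dict) -> list:
    """Return required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]


candidate = {
    "name": "Sales Transactions",
    "owner": "sales-domain-team",
    "description": "",  # Present but empty, so it still counts as missing.
    "schema": {"transaction_id": "string", "amount": "decimal"},
}

problems = missing_metadata(candidate)
if problems:
    print("registration rejected, missing:", problems)
```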

Walk-Through

Step 1: Identify domain and data product

The first step is to identify a business domain (e.g., Sales, Inventory, Customer Support) and define a specific data product that domain will own. A data product is a curated, ready-to-use dataset that serves a business need. For example, the Sales domain might define 'Sales Transactions' as a data product containing all finalized sales with timestamps, amounts, and product IDs. The domain team must decide on the granularity, update frequency, and quality SLAs. This step requires business alignment and clear ownership. In Azure, this maps to creating a resource group or container dedicated to that domain's data products.

Step 2: Provision self-serve infrastructure

The domain team uses the shared self-serve data platform to provision storage and compute resources. Typically, this involves creating a storage account (ADLS Gen2) with hierarchical namespace for fine-grained access control. The team also sets up a compute environment such as Azure Databricks workspace or Azure Synapse workspace. The platform should provide templates or automation scripts to standardize provisioning. For example, a Terraform script might create a storage container with default RBAC roles for the domain team. The infrastructure must be isolated from other domains to prevent accidental cross-contamination.

Step 3: Build data pipelines and transformations

The domain team builds data pipelines to ingest, clean, transform, and load data into the storage container. They use tools like Azure Data Factory for orchestration or Azure Databricks for complex transformations. The pipeline should enforce data quality checks (e.g., null checks, range validations) and handle errors gracefully. The output is typically stored in open formats like Parquet or Delta Lake. The team also implements incremental updates to minimize latency. For example, a pipeline might run every hour to append new transactions. The pipeline code is version-controlled and tested.
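The null checks and range validations mentioned above might look like the following in plain Python. This is a sketch under stated assumptions: the column names and the amount bounds are invented for the example, and a production pipeline would more likely express the same rules in its transformation engine (for example, as Delta Live Tables expectations).

```python
REQUIRED_COLUMNS = ("transaction_id", "product_id", "timestamp")
MAX_AMOUNT = 1_000_000  # Hypothetical upper bound for a single sale.


def validate_row(row: dict) -> list:
    """Return the list of quality-rule failures for one record."""
    errors = [f"{c} is null" for c in REQUIRED_COLUMNS if row.get(c) is None]
    amount = row.get("amount")
    if amount is None:
        errors.append("amount is null")
    elif not (0 < amount <= MAX_AMOUNT):
        errors.append(f"amount {amount} out of range")
    return errors


def split_batch(rows):
    """Separate clean rows from rejected rows so bad data is quarantined."""
    good, rejected = [], []
    for row in rows:
        errs = validate_row(row)
        if errs:
            rejected.append((row, errs))
        else:
            good.append(row)
    return good, rejected


batch = [
    {"transaction_id": "t1", "product_id": "p9", "timestamp": "2024-05-01T10:00:00Z", "amount": 19.99},
    {"transaction_id": "t2", "product_id": None, "timestamp": "2024-05-01T10:05:00Z", "amount": 5.00},
    {"transaction_id": "t3", "product_id": "p2", "timestamp": "2024-05-01T10:07:00Z", "amount": -3.00},
]

good, rejected = split_batch(batch)
print(len(good), "rows loaded;", len(rejected), "rows quarantined")
```

Quarantining failed rows with their reasons, instead of aborting the run, is one way to "handle errors gracefully" while keeping the hourly append on schedule.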

Step 4: Register data product in catalog

Once the data is available, the domain team registers the data product in the enterprise data catalog (e.g., Microsoft Purview). They provide metadata: product name, description, owner, schema, data quality metrics, SLA (e.g., freshness 1 hour), and access request instructions. The catalog automatically scans the storage location to discover schema and classifications. The team also tags the data product with business terms (e.g., 'Sales', 'PII'). This step makes the product discoverable to other domains. Without registration, the data is effectively invisible.
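A freshness SLA such as "1 hour" is only useful if something checks it. The sketch below shows one way to test a product's last update time against its SLA; the function name `is_fresh` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated, max_age, now=None):
    """Return True when the product was updated within its freshness SLA."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


sla = timedelta(hours=1)  # The "freshness 1 hour" SLA from the text.
check_time = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)

on_time = is_fresh(check_time - timedelta(minutes=30), sla, now=check_time)
late = is_fresh(check_time - timedelta(minutes=90), sla, now=check_time)
print("on time:", on_time, "| late product fresh:", late)
```

Passing `now` explicitly keeps the check deterministic in tests; in a scheduled monitor it can be omitted so the current UTC time is used.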

Step 5: Expose data and manage access

The domain team exposes the data product through one or more interfaces: direct storage access (via RBAC), SQL endpoints (e.g., Azure Synapse serverless SQL), or REST APIs. They implement access controls using Azure Active Directory (now Microsoft Entra ID) and RBAC. Access requests are automated via the catalog: a consumer clicks 'Request Access', which triggers a workflow that grants permissions after approval. The domain team monitors usage and may throttle requests to prevent abuse. They also version the data product to support backward compatibility.
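Versioning for backward compatibility usually comes down to one rule: a new version may add columns but should not remove or retype existing ones. The sketch below checks that rule between two schema versions; representing a schema as a column-to-type dictionary is an assumption made for the example.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes in new_schema that would break consumers of old_schema."""
    problems = []
    for column, col_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"removed column: {column}")
        elif new_schema[column] != col_type:
            problems.append(
                f"type change on {column}: {col_type} -> {new_schema[column]}"
            )
    return problems  # Added columns are ignored; they do not break readers.


v1 = {"transaction_id": "string", "amount": "decimal", "product_id": "string"}
v2_ok = {**v1, "store_id": "string"}  # Additive change only.
v2_bad = {"transaction_id": "string", "amount": "float"}  # Retyped and removed.

print(breaking_changes(v1, v2_ok))
print(breaking_changes(v1, v2_bad))
```

A domain team could run a check like this before publishing, and release a new major version of the data product whenever the list is not empty.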

Step 6: Consume and provide feedback

Data consumers discover the data product through the catalog, review its documentation, and request access. Once granted, they use their preferred tools (e.g., Power BI, Azure Databricks, custom applications) to query the data. The domain team collects feedback on data quality, missing fields, or performance issues. They iterate on the data product to improve it. This feedback loop is crucial for the product to remain valuable. The platform logs all access for auditing and cost allocation.
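Access logs support cost allocation once they are aggregated per consuming domain. The following sketch assumes a simplified log record with `consumer_domain`, `product`, and `bytes_read` fields; real platform logs will have a different shape.

```python
from collections import Counter

# Hypothetical, simplified access-log records.
access_log = [
    {"consumer_domain": "marketing", "product": "Sales Transactions", "bytes_read": 500},
    {"consumer_domain": "finance", "product": "Sales Transactions", "bytes_read": 1200},
    {"consumer_domain": "marketing", "product": "Sales Transactions", "bytes_read": 300},
]


def bytes_by_consumer(log):
    """Total bytes read per consuming domain, for chargeback or showback."""
    totals = Counter()
    for entry in log:
        totals[entry["consumer_domain"]] += entry["bytes_read"]
    return dict(totals)


usage = bytes_by_consumer(access_log)
print(usage)
```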

What This Looks Like on the Job

Enterprise Scenario 1: Large Retail Chain

A multinational retailer with thousands of stores had a centralized data lake where all sales, inventory, and customer data was dumped. The central data team was overwhelmed: data quality was poor, and business units waited weeks for new reports. They adopted data mesh by assigning each business unit (e.g., North America Sales, Europe Sales, Supply Chain) as a domain. Each domain owns its data products. For example, the North America Sales domain created a 'Daily Sales by Store' data product updated hourly. They used Azure Data Lake Storage Gen2 with containers per domain, Azure Databricks for transformations, and Microsoft Purview for cataloging. The central team built a self-serve platform with automated provisioning via Azure DevOps pipelines. The result: time-to-insight dropped from weeks to hours, and data quality improved because domain teams had direct accountability. A common mistake was initially allowing domains to create too many small data products (e.g., per-store tables), leading to catalog clutter. They corrected by enforcing a minimum granularity (e.g., data products must cover at least a region).

Enterprise Scenario 2: Financial Services Firm

A bank needed to comply with strict regulations (e.g., GDPR, SOX) while enabling data scientists to build ML models. They implemented data mesh with federated governance. Each line of business (Retail Banking, Investment Banking, Risk) owned its data products. The platform enforced policies via Azure Policy: all data products containing PII must be encrypted at rest and have access logs enabled. The catalog (Purview) automatically classified sensitive data. A key challenge was ensuring that data products met SLAs for freshness (e.g., risk data must be updated within 15 minutes). They used Azure Monitor alerts to notify domain teams of SLA breaches. In production, they scaled to 50+ data products across 10 domains, with the platform handling 10 TB of new data daily. Misconfiguration of RBAC was a common issue: domain teams sometimes granted overly broad permissions (e.g., Contributor instead of Reader). They mitigated by using custom roles with least privilege.

Enterprise Scenario 3: Healthcare Provider

A hospital network wanted to share patient data for research while protecting privacy. They used data mesh with domain ownership: each department (Cardiology, Oncology, etc.) owned its clinical data products. The self-serve platform included Azure API Management to expose data via secure APIs. Governance policies enforced de-identification before data was served. A problem arose when the Cardiology domain created a data product that included raw lab results without de-identification, violating policy. The platform's automated scanning caught this because Purview classified the data as containing protected health information (PHI) and blocked access. The domain team had to rebuild the product with de-identification logic. This scenario highlights the importance of automated governance enforcement.

How DP-900 Actually Tests This

DP-900 Exam Focus on Data Mesh

The DP-900 exam (Objective 1.1) tests your understanding of modern data architectures, including data mesh. While the exam does not require deep implementation details, you must know:

Core Principles: The four principles of data mesh: domain ownership, data as a product, self-serve data platform, and federated computational governance. Be able to identify which principle is being described in a scenario.

Comparison with Traditional Architectures: Understand how data mesh differs from a centralized data lake or warehouse. The exam often presents a scenario and asks you to choose the best architecture.

Azure Services: Know which Azure services support a data mesh: Azure Data Lake Storage Gen2 (storage), Azure Databricks (compute), Microsoft Purview (catalog/governance), Azure Data Factory (integration), Azure Policy (governance), and Azure Active Directory, now Microsoft Entra ID (access control).

Benefits and Challenges: Data mesh improves scalability and domain agility but introduces complexity in governance and requires cultural change.

Common Wrong Answers and Traps

1. Confusing Data Mesh with Data Lake: The exam may describe a data lake and call it a data mesh. Remember: a data lake is a centralized repository, while data mesh is decentralized. Wrong answer: "Data mesh is a type of data lake."

2. Thinking Data Mesh Eliminates Central Teams: Data mesh does not eliminate central teams; it shifts their role to building the self-serve platform and governance. Wrong answer: "In data mesh, there is no central data team."

3. Ignoring Governance: Some candidates think data mesh means no governance. Actually, governance is federated and automated. Wrong answer: "Data mesh allows each domain to set its own security policies without oversight."

4. Mixing Up Data Product and Dataset: A data product is more than a dataset; it includes metadata, SLAs, and access controls. Wrong answer: "A data product is simply a table in a database."

Specific Numbers and Terms

Four principles: Domain ownership, data as a product, self-serve platform, federated governance.

Key term: "Data product" – a curated, documented, and governed dataset with an owner.

Azure services: Microsoft Purview (catalog), Azure Data Lake Storage (storage), Azure Databricks (compute).

Edge Cases and Exceptions

Small organizations: Data mesh is overkill for small teams; a simple data lake may suffice. The exam might ask when NOT to use data mesh.

Real-time data: Data mesh can support real-time data products using event hubs or streaming, but the exam focuses on batch.

Multi-cloud: Data mesh can span multiple clouds, but DP-900 focuses on Azure.

How to Eliminate Wrong Answers

If the scenario mentions a single central team responsible for all data, it's not data mesh.

If the scenario emphasizes that each business unit owns its data and exposes it as a product, it is data mesh.

If the answer includes "centralized governance," it's likely wrong; data mesh uses federated governance.

Key Takeaways

Data mesh has four principles: domain ownership, data as a product, self-serve platform, federated governance.

Domain teams are responsible for the quality, documentation, and access of their data products.

The self-serve platform provides shared infrastructure (storage, compute, catalog) that domains use to build data products.

Federated computational governance automates policy enforcement (e.g., encryption, classification) across all domains.

Microsoft Purview is the Azure service for data catalog and governance in a data mesh.

Data mesh is not a product but an architecture; it is implemented using services like ADLS Gen2, Azure Databricks, and Azure Policy.

Data mesh differs from a data lake by decentralizing ownership and treating data as a product.

Common exam trap: confusing data mesh with data lake; remember decentralization is key.

Data mesh is best suited for large organizations with multiple business domains that need autonomy.

Central platform team still exists to build and maintain the self-serve platform and global governance.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Data Mesh

Decentralized: each domain owns its data as a product

Data is treated as a product with SLAs and documentation

Governance is federated and automated via platform

Scalable across many domains without central bottleneck

Requires cultural shift and domain expertise

Data Lake

Centralized: single team owns all data

Data is a byproduct of ingestion; often lacks documentation

Governance is centralized and manual

Central team becomes bottleneck as data grows

Easier to implement initially but harder to scale

Watch Out for These

Mistake

Data mesh is a specific technology or product you can buy.

Correct

Data mesh is an architectural paradigm, not a product. It is implemented using a combination of technologies like Azure Data Lake Storage, Databricks, and Purview. No single vendor sells 'data mesh.'

Mistake

In data mesh, there is no central data team.

Correct

There is still a central platform team that builds and maintains the self-serve data platform and enforces global governance. Domain teams own data products, but the platform is centralized.

Mistake

Data mesh eliminates data governance.

Correct

Governance is federated and automated, not eliminated. Global policies (e.g., encryption, retention) are enforced by the platform, while domain teams manage access to their data products.

Mistake

Data mesh is the same as data fabric.

Correct

Data fabric is a related concept focused on metadata-driven data integration across hybrid and multi-cloud environments. Data mesh overlaps with it but is primarily an organizational model centered on domain ownership and data as a product; the two can be combined.

Mistake

Data mesh requires each domain to have its own storage and compute.

Correct

Domains share a common self-serve platform but have isolated storage containers and compute workspaces. They do not provision separate storage accounts or compute clusters from scratch; they use the platform's resources.


Frequently Asked Questions

What is data mesh in simple terms?

Data mesh is an architectural approach where each business domain (e.g., sales, marketing) owns and manages its own data as a product, rather than sending all data to a central team. Domain teams are responsible for data quality, documentation, and access. A shared platform provides common infrastructure (storage, compute, catalog) and automated governance. This avoids the bottleneck of a central data team and scales across many domains.

How does data mesh differ from a data lake?

In a data lake, a central team ingests and manages all data in a single repository. In a data mesh, each domain owns its data in isolated storage containers and exposes it as a product. Data mesh decentralizes ownership and treats data as a product with SLAs, while a data lake centralizes storage and often lacks product-level documentation. Data mesh also includes a self-serve platform and federated governance, which a data lake typically does not.

What Azure services are used to implement a data mesh?

Key Azure services include: Azure Data Lake Storage Gen2 (storage), Azure Databricks (compute), Microsoft Purview (data catalog and governance), Azure Data Factory (data integration), Azure Policy (automated governance), and Azure Active Directory (now Microsoft Entra ID) for access control. These services together provide the self-serve platform and governance needed for a data mesh.

Is data mesh suitable for small organizations?

No, data mesh is typically overkill for small organizations with few domains. It is designed for large enterprises with many independent business units that need autonomy. For small teams, a simple data lake or warehouse is more practical. The overhead of building a self-serve platform and enforcing federated governance may outweigh benefits.

What is a data product in data mesh?

A data product is a curated, documented, and governed dataset that is owned by a domain and treated as a product. It includes not only the data itself but also metadata (schema, description, owner), quality SLAs, access controls, and APIs. Data products are discoverable through a catalog and are designed to be consumed by other domains or applications.

Does data mesh eliminate the need for a central data team?

No, a central platform team is still needed to build and maintain the self-serve data platform, define global governance policies, and provide tools and templates. Domain teams own their data products, but the platform is centralized. The central team's role shifts from data management to platform engineering and governance.

What is federated computational governance?

Federated computational governance means that governance policies (e.g., data encryption, retention, classification) are automated and embedded into the self-serve platform. Policies are defined globally but enforced locally by the platform. For example, a policy might automatically encrypt all data products containing PII. This ensures consistency without requiring a central team to manually review each domain.
