AZ-900 | Chapter 122 of 127 | Objective 3.2

Microsoft Purview (Data Governance)

This chapter covers Microsoft Purview, Microsoft's unified data governance service, which helps organizations discover, classify, and manage their data across on-premises, multi-cloud, and SaaS environments. For the AZ-900 exam, Purview falls under Objective 3.2: Describe the features of governance and compliance in Azure, which this guide estimates at roughly 5-10% of the total questions. Purview is not tested in depth, but you must know its core purpose, its key capabilities (data catalog, data mapping, and data classification), and how it differs from other governance tools like Azure Policy and Azure Blueprints. The sections below break down what Purview does, how it works, and what the exam expects you to remember.

25 min read
Intermediate
Updated May 31, 2026

The Digital Librarian for Your Data

Imagine your company owns a massive library with millions of books, each containing sensitive information like customer addresses, financial records, and employee health data. Without a librarian, books are piled randomly, some are lost, and anyone can read any book, even those they shouldn't. Microsoft Purview is like hiring a team of expert digital librarians who do three things. First, they walk through every aisle, scanning each book's cover and table of contents to create a searchable catalog (this is data discovery and classification). Second, they set rules: 'Only managers can open salary books,' 'Health records must be locked in a special vault for six years then shredded' (this is data governance and retention policies). Third, they monitor every checkout: if someone tries to sneak a book out of the restricted section, the librarian immediately alerts security and logs the attempt (this is auditing and insider risk management). Crucially, the librarians don't move your books; they just tag them and watch the doors.

In Azure, Purview sits on top of your data sources (like Azure SQL, Blob Storage, or even on-premises SQL Server) and provides a unified control plane without copying the data. It uses scanners that connect to your data stores, extract metadata, and apply sensitivity labels from Microsoft Information Protection. The analogy goes beyond a filing cabinet: Purview actively scans, classifies, and applies policies across hybrid environments, just as a librarian catalogs, restricts, and audits every book in a vast library.

How It Actually Works

What is Microsoft Purview and What Business Problem Does It Solve?

Organizations today generate massive amounts of data spread across Azure Blob Storage, Azure SQL Database, on-premises SQL Server, Amazon S3, and SaaS apps like Salesforce. The challenge is not just storing data — it's knowing what data you have, where it lives, who can access it, and whether it complies with regulations like GDPR, HIPAA, or CCPA. Without a unified view, companies risk data breaches, non-compliance fines, and inefficiencies where data scientists spend 80% of their time finding and preparing data rather than analyzing it.

Microsoft Purview is a cloud-based data governance service that provides a unified data catalog and data mapping to solve these problems. It launched as Azure Purview and was rebranded in 2022, when Microsoft brought it together with the Microsoft 365 compliance portfolio under the Microsoft Purview name. On the exam, you may still see references to "Azure Purview"; both terms refer to the same service.

How Does Purview Work? A Step-by-Step Mechanism

Purview operates through five stages: scanning and classification, cataloging, lineage mapping, governance policies, and monitoring.

1. Scanning and Classification: Purview uses scanners, scan jobs that run on an integration runtime (Azure-managed for most cloud sources, self-hosted for on-premises ones), to connect to your data sources. You can register sources like Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics, on-premises SQL Server, Power BI, and even third-party clouds like Amazon S3 and Google BigQuery. The scanner reads the schema, sample data, and metadata (like column names, data types, and file paths). It then applies built-in classifiers to automatically detect sensitive data types such as credit card numbers, social security numbers, passport numbers, and medical record IDs. You can also create custom classifiers for proprietary data patterns.

2. Data Catalog and Search: After scanning, Purview builds a searchable data catalog, a central inventory of all your data assets. Each asset (like a table, file, or report) has metadata including its source, schema, classification labels, and lineage (how it was derived). Business users can search for data using keywords, browse by source, or filter by classification. The catalog supports glossary terms (business-friendly names and descriptions), so a table called "Cust_Contact_2024" can be tagged as "Customer Contact Information" with a definition and owner.

3. Data Mapping and Lineage: Purview automatically captures data lineage, the journey of data from source to transformation to consumption. For example, if data flows from an on-premises SQL Server through Azure Data Factory into Azure Synapse Analytics, Purview traces every step. This is critical for auditing and impact analysis: you can see exactly which reports depend on a particular table, or what happens if you change a source column.

4. Governance Policies: Purview integrates with Microsoft 365's sensitivity labels (from Microsoft Information Protection). You can apply labels like "Confidential" or "Highly Restricted" to data assets based on classification results. These labels can then drive controls defined in Microsoft 365 policies: for example, a label might prevent data from being copied to a personal OneDrive or require encryption. Purview also supports access policies (in preview at the time of writing) to restrict who can read or modify data at the source.

5. Monitoring and Insights: Purview provides dashboards showing the number of classified assets, sensitive data types found, and label coverage. You can set up alerts for anomalous activities, such as a user querying a large volume of sensitive data, via integration with Microsoft 365 Defender.
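The impact analysis described under lineage (step 3) amounts to walking a directed graph from a source asset to everything derived from it. The following is a minimal Python sketch of that idea; the asset names and the dictionary layout are invented for illustration and are not Purview's data model or API.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
# Asset names are invented for illustration only.
LINEAGE = {
    "onprem_sql.Customers": ["adf.CopyCustomers"],
    "adf.CopyCustomers": ["synapse.dim_customer"],
    "synapse.dim_customer": ["powerbi.SalesReport", "powerbi.ChurnReport"],
}

def downstream(asset, lineage=LINEAGE):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen

# Which assets are affected if the on-premises Customers table changes?
impacted = downstream("onprem_sql.Customers")
print(impacted)
```

Changing a column in the source table would, in this toy graph, touch the copy step, the Synapse table, and both reports, which is the kind of answer a lineage view is meant to give.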

Key Components, Tiers, and Pricing

Purview has two main components:

Microsoft Purview Data Map: The underlying engine that captures metadata and builds the catalog. It includes automatic scanning, classification, and lineage.

Microsoft Purview Data Catalog: The user-facing portal for searching, browsing, and governing data assets.

Pricing (as of 2024):

Purview is billed based on the amount of metadata stored and the number of data assets processed. This chapter describes a free tier with limited capacity (e.g., 1 GB of metadata storage and 100,000 assets). Beyond that, you pay per unit of metadata storage (e.g., $0.001 per GB per hour) and per asset processed (e.g., $0.01 per 1,000 assets). Treat these figures as illustrative examples rather than current list prices, and check the Azure pricing page before budgeting. Scanning on-premises sources may incur additional costs for the machine that hosts the self-hosted integration runtime.
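To see how per-hour and per-asset rates combine into a monthly figure, here is a small Python sketch using the example rates quoted above. It assumes a 730-hour month and that the free allowance is subtracted before billing; both are assumptions made for illustration, not documented billing rules.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

def estimate_monthly_cost(metadata_gb, assets_processed,
                          free_gb=1, free_assets=100_000,
                          rate_per_gb_hour=0.001,
                          rate_per_1000_assets=0.01):
    """Estimate a monthly bill from this chapter's example rates (illustrative only)."""
    # Assumption: only usage above the free allowance is billed.
    billable_gb = max(0, metadata_gb - free_gb)
    billable_assets = max(0, assets_processed - free_assets)
    storage = billable_gb * rate_per_gb_hour * HOURS_PER_MONTH
    processing = billable_assets / 1000 * rate_per_1000_assets
    return round(storage + processing, 2)

# 70 GB of metadata and 500,000 assets processed in a month
print(estimate_monthly_cost(70, 500_000))
```

Because storage is charged per hour, even a small per-GB rate is multiplied by roughly 730 over a month, so metadata volume tends to dominate the estimate.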

How It Compares to On-Premises Data Governance

On-premises, organizations might use third-party tools like Informatica or Collibra, which require significant hardware and manual effort. These tools often lack native integration with cloud sources and may not automatically classify data. Purview is fully managed, cloud-native, and integrates deeply with Azure services and Microsoft 365. It also supports hybrid environments — you can scan on-premises SQL Server using a self-hosted integration runtime, making it a true hybrid governance solution.

Azure Portal and CLI Touchpoints

In the Azure portal, you create a Microsoft Purview account as a resource. From there, you register data sources, configure scans, and manage the catalog. You can also use the Purview Governance Portal (a separate web UI) to browse assets and manage glossary terms. For automation, you can use Azure CLI (the az purview commands ship in the purview extension) or PowerShell, but most exam questions focus on portal-based configuration. A typical CLI command to create a Purview account:

az purview account create --name mypurview --resource-group myrg --location eastus

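To list the Purview accounts in a resource group:

az purview account list --resource-group myrg

The az purview extension documents account-level commands; registered data sources are typically listed in the governance portal or through the Purview REST API.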

Note: The CLI is rarely tested on AZ-900, but knowing it exists is helpful.

Concrete Business Scenario

A healthcare company uses Purview to scan its Azure SQL Database containing patient records. The scanner automatically detects columns with social security numbers and diagnosis codes, applying a "Highly Confidential" sensitivity label. The compliance team then searches the catalog to confirm all PHI (Protected Health Information) is labeled. They set up a data lineage view to see that patient data flows into Power BI reports for analytics. When a new regulation requires retention of records for 7 years, they use Purview's policy to enforce retention labels that prevent deletion. Without Purview, they would have to manually inventory databases and risk missing sensitive data.

Walk-Through

1

Create a Purview Account

In the Azure portal, search for 'Microsoft Purview' and click 'Create'. Fill in the subscription, resource group, and a unique account name. Choose a region (e.g., East US). You can also enable 'Event Hubs integration' for real-time monitoring. Click 'Review + Create'. Behind the scenes, Azure provisions a managed account with storage for metadata and a search index. The account is the top-level container for all governance activities. Default limits: up to 1000 data sources per account.

2

Register a Data Source

In the Purview Governance Portal (accessible from the account overview), go to 'Data Map' > 'Sources'. Click 'Register' and select the source type (e.g., Azure Blob Storage). Provide connection details like storage account name and authentication method (managed identity or account key). Purview does not copy the data; it only reads metadata and sample data. You can register up to 1000 sources per account. For on-premises sources, you need a self-hosted integration runtime installed on a local machine.

3

Configure and Run a Scan

After registering a source, create a scan and choose a scan rule set. You can scan all data or a subset (e.g., specific containers or folders). Select built-in classifiers or create custom ones. Set the scan frequency: once, weekly, or monthly. When you run the scan, Purview connects to the source, reads the schema, inspects sample data, and applies classifiers. Scan time depends on data volume; a few GB might take minutes. The results populate the data catalog with assets and classifications.

4

Browse and Curate the Data Catalog

Once scanned, go to 'Data Catalog' in the Purview portal. You can search for assets by name, classification, or glossary term. For each asset, you can view schema, lineage, and classifications. Curators can add business metadata: descriptions, glossary terms (e.g., 'Customer Data'), and contacts (e.g., data owner). This step is manual but critical for making data discoverable to business users. You can also export the catalog to Excel for offline review.

5

Apply Sensitivity Labels and Policies

Integrate with Microsoft 365 to synchronize sensitivity labels. In Purview, you can automatically apply labels based on classification (e.g., if 'Credit Card Number' found, apply 'Confidential'). This is done via 'Labeling policies' in the Data Map. Additionally, you can create 'Access policies' (preview) to restrict access at the source. For example, a policy can block non-owners from reading columns with PII. These policies are enforced when users query the data through services like Azure Synapse.
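The auto-labeling rule in step 5 ("if 'Credit Card Number' found, apply 'Confidential'") can be pictured as a lookup from classifications to labels, keeping the most restrictive match. The Python sketch below illustrates that logic only; the rule table, label names, and ordering are hypothetical and do not reflect how Purview stores or evaluates labeling policies.

```python
# Labels ordered from least to most restrictive (hypothetical names).
LABEL_ORDER = ["General", "Confidential", "Highly Confidential"]

# Hypothetical rule table: classification found by a scan -> label to apply.
LABEL_RULES = {
    "Credit Card Number": "Confidential",
    "U.S. Social Security Number": "Highly Confidential",
}

def label_for(classifications):
    """Return the most restrictive label triggered by any classification, or None."""
    matched = [LABEL_RULES[c] for c in classifications if c in LABEL_RULES]
    if not matched:
        return None
    # Rank candidates by their position in LABEL_ORDER and keep the highest.
    return max(matched, key=LABEL_ORDER.index)

print(label_for(["Credit Card Number"]))
print(label_for(["Credit Card Number", "U.S. Social Security Number"]))
```

Choosing the most restrictive label when several rules match is a design assumption of this sketch; it keeps an asset from being under-labeled when it contains more than one kind of sensitive data.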

What This Looks Like on the Job

Scenario 1: Financial Services Compliance

A global bank must comply with SOX and GDPR. They have customer data spread across Azure SQL Database, on-premises Oracle databases, and Salesforce. The compliance team uses Purview to register all sources and run scans. Purview automatically classifies account numbers, SWIFT codes, and personal identifiers. They create a custom classifier for internal customer IDs. Using the data catalog, auditors can search for all assets containing 'PII' and verify that retention labels (e.g., '7-year retention') are applied. The bank also uses lineage to trace how customer data flows into risk reports. Without Purview, they would need manual inventories and risk missing data in shadow IT systems. Cost: they pay for metadata storage and scanning. At the illustrative rates quoted earlier in this chapter ($0.001 per GB per hour and $0.01 per 1,000 assets), roughly 70 GB of metadata comes to about $50 per month, whereas 10 TB would run into thousands of dollars. Common mistake: forgetting to register all sources, leading to incomplete compliance coverage.

Scenario 2: Healthcare Data Discovery

A hospital network merges with another hospital and needs to inventory all patient data. They have Azure Data Lake Storage and on-premises file servers. They deploy Purview and scan the Data Lake. The built-in healthcare classifiers (e.g., 'Disease Name', 'Medical Record ID') automatically tag sensitive columns. The data governance team creates glossary terms like 'Patient Demographics' and assigns data stewards. They use Purview's insight reports to see that 30% of assets are unclassified, prompting additional scans. One issue: the scan of on-premises file servers fails because the self-hosted integration runtime is not configured with proper network access. After fixing it, they achieve full visibility. This scenario highlights the need for proper infrastructure setup for hybrid scanning.
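The "30% of assets are unclassified" figure in this scenario is a simple coverage ratio. As a rough Python sketch, assuming a hypothetical mapping of asset names to the classifications found on them (this is not Purview's insights API):

```python
def unclassified_share(assets):
    """Fraction of assets with no classification.

    `assets` maps an asset name to the list of classifications found on it.
    """
    if not assets:
        return 0.0
    unclassified = sum(1 for tags in assets.values() if not tags)
    return unclassified / len(assets)

# Hypothetical inventory: 10 assets, 3 of them without any classification.
inventory = {f"table_{i}": ["Medical Record ID"] for i in range(7)}
inventory.update({f"file_{i}": [] for i in range(3)})

print(f"{unclassified_share(inventory):.0%} unclassified")
```

A high ratio usually points to sources that were never scanned or to data types that need a custom classifier.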

Scenario 3: Data Democratization in Retail

A retail company wants to empower data analysts to find sales data without IT bottlenecks. They use Purview to catalog data from Azure Synapse Analytics and Power BI. Analysts search the catalog for 'Sales by Region' and find the correct table with lineage showing it comes from a cleaned dataset. They can see the owner and description. This reduces data discovery time from days to minutes. However, if the catalog is not kept up-to-date (e.g., scans are not scheduled), analysts may find stale assets. Purview supports incremental scans to keep metadata fresh. The company also uses sensitivity labels to mark 'Internal Only' on sales forecasts to prevent external sharing.
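The catalog search in this scenario can be approximated with a toy in-memory model: each asset carries a technical name, glossary terms, an owner, and classifications, and a search matches keywords against the name and glossary terms. The Python sketch below is illustrative only; the Asset fields, sample entries, and email addresses are hypothetical, and this is not the Purview search API.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    source: str
    owner: str
    glossary_terms: list = field(default_factory=list)
    classifications: list = field(default_factory=list)

# Hypothetical catalog entries, invented for illustration.
CATALOG = [
    Asset("Cust_Contact_2024", "Azure SQL Database", "data-steward@example.com",
          ["Customer Contact Information"], ["Email Address"]),
    Asset("fact_sales_region", "Azure Synapse Analytics", "sales-bi@example.com",
          ["Sales by Region"], []),
]

def search(keyword, classification=None, catalog=CATALOG):
    """Return names of assets whose name or glossary terms contain the keyword."""
    kw = keyword.lower()
    hits = []
    for asset in catalog:
        text = " ".join([asset.name, *asset.glossary_terms]).lower()
        matches_class = classification is None or classification in asset.classifications
        if kw in text and matches_class:
            hits.append(asset.name)
    return hits

print(search("sales by region"))
print(search("customer", classification="Email Address"))
```

The glossary term is what lets an analyst find fact_sales_region by typing a business phrase rather than the technical table name.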

How AZ-900 Actually Tests This

Exam Objective 3.2: Describe the features of governance and compliance in Azure

On the AZ-900 exam, expect only a handful of questions on Microsoft Purview (this guide estimates 2-3), typically asking about its purpose and capabilities. The exam does not test deep configuration details like scanning rule sets or custom classifiers. Instead, focus on these key points:

1. What Purview does: It provides a unified data catalog, data classification, and data lineage across hybrid and multi-cloud environments. It helps with governance, compliance, and data discovery.

2. Key terms: 'Data catalog', 'data map', 'data classification', 'sensitivity labels', 'lineage'. Know that Purview uses 'scanners' to discover data and 'classifiers' to identify sensitive data types.

3. Integration: Purview integrates with Microsoft 365 for sensitivity labels and with Azure services like Azure Data Factory for lineage. It can scan on-premises sources via a self-hosted integration runtime.

Common Wrong Answers and Why Candidates Choose Them

Wrong answer: 'Purview is used to enforce Azure Policy.' Why chosen: Candidates confuse governance tools. Azure Policy enforces compliance rules on Azure resources, while Purview governs data itself.

Wrong answer: 'Purview replaces Azure Blueprints.' Why chosen: Both are governance tools, but Blueprints orchestrate resource deployments, not data governance.

Wrong answer: 'Purview is a security tool like Microsoft Defender for Cloud (formerly Azure Security Center).' Why chosen: Purview has compliance aspects, but it is primarily for data governance, not security monitoring.

Wrong answer: 'Purview only works with Azure data sources.' Why chosen: Candidates forget it supports on-premises and multi-cloud sources.

Specific Terms and Values That Appear Verbatim

'Microsoft Purview' (formerly Azure Purview)

'Data Map' and 'Data Catalog'

'Built-in classifiers' for sensitive data types (e.g., credit card numbers, SSN)

'Lineage' to show data origin and transformations

'Self-hosted integration runtime' for on-premises scanning

Edge Cases and Tricky Distinctions

Purview vs. Azure Information Protection: Purview uses sensitivity labels from Microsoft Information Protection (MIP), which is part of Microsoft 365. Purview applies these labels to data assets in the catalog, not directly to files (that is done by the Azure Information Protection client).

Purview vs. Azure Data Catalog: Azure Data Catalog was a predecessor that is now deprecated. Purview is the replacement with enhanced capabilities.

Scanning frequency: Default is once; you can schedule periodic scans. The exam might ask about 'incremental scans' which only detect changes.
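The idea behind an incremental scan, processing only what changed since the last run, can be sketched in Python by comparing last-modified timestamps against the previous scan time. How Purview detects changes internally is not described here, so the timestamp comparison, the blob paths, and the dates are assumptions made purely for illustration.

```python
from datetime import datetime, timezone

def assets_to_rescan(last_modified, last_scan_time):
    """Return the names of assets modified after the previous scan, sorted."""
    return sorted(
        name for name, modified in last_modified.items()
        if modified > last_scan_time
    )

# Hypothetical blob paths and timestamps, invented for illustration.
last_scan = datetime(2024, 6, 1, tzinfo=timezone.utc)
blobs = {
    "sales/2024-05.csv": datetime(2024, 5, 31, tzinfo=timezone.utc),
    "sales/2024-06.csv": datetime(2024, 6, 15, tzinfo=timezone.utc),
    "hr/payroll.parquet": datetime(2024, 6, 2, tzinfo=timezone.utc),
}

print(assets_to_rescan(blobs, last_scan))
```

Only the two files touched after the last scan are returned, which is why incremental scans finish faster than full scans on large sources.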

Memory Trick: 'Purview = Data GPS'

Think of Purview as a GPS for your data: it maps where data lives (catalog), what it contains (classification), and how it got there (lineage). This helps you navigate and comply with regulations. When you see a question about discovering, classifying, or tracking data across sources, the answer is Purview. If the question is about resource compliance or deployment orchestration, it's Azure Policy or Blueprints.

Key Takeaways

Microsoft Purview is a unified data governance service for discovering, classifying, and managing data across on-premises, multi-cloud, and SaaS environments.

Purview uses scanners to read metadata and sample data; it does not copy or move your data.

Key capabilities: data catalog, data classification (built-in classifiers for sensitive data types), data lineage, and sensitivity labeling.

Purview integrates with Microsoft 365 for sensitivity labels and with Azure Data Factory for lineage.

For on-premises sources, you need a self-hosted integration runtime.

On the AZ-900 exam, know that Purview is for data governance, not resource governance (that's Azure Policy).

Purview was formerly called Azure Purview; both names refer to the same service.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Microsoft Purview

Governs data assets (tables, files, reports)

Provides data catalog, classification, and lineage

Scans hybrid and multi-cloud sources

Integrates with Microsoft 365 sensitivity labels

Helps with data discovery and compliance

Azure Policy

Governs Azure resource configurations (VMs, storage, etc.)

Enforces rules like 'require encryption' or 'restrict locations'

Applies to Azure resources only (not on-premises)

Uses policy definitions and initiatives

Helps with resource compliance and security baselines

Watch Out for These

Mistake

Microsoft Purview is only for Azure data sources.

Correct

Purview can scan non-Azure sources such as on-premises SQL Server, Amazon S3, and Google BigQuery. On-premises sources require a self-hosted integration runtime. It is a hybrid and multi-cloud governance tool.

Mistake

Purview automatically encrypts sensitive data.

Correct

Purview classifies and labels data, but encryption is handled by other services like Azure Storage encryption or Azure SQL TDE. Purview does not encrypt data itself.

Mistake

Purview is a security tool like Microsoft Defender for Cloud.

Correct

Purview is a data governance tool focused on cataloging, classification, and lineage. Security tools like Defender for Cloud focus on threat detection and vulnerability management. They complement each other but serve different purposes.

Mistake

You need to move your data into Purview for it to work.

Correct

Purview does not copy or move your data. It only reads metadata and sample data during scanning. Your data remains in its original source.

Mistake

Purview and Azure Policy are the same thing.

Correct

Azure Policy enforces compliance rules on Azure resources (e.g., 'all storage accounts must use HTTPS'). Purview governs data assets themselves (e.g., 'this table contains PII'). They are different tools under the governance umbrella.

Frequently Asked Questions

What is the main purpose of Microsoft Purview?

Microsoft Purview provides a unified data governance solution to help organizations discover, classify, and manage their data across hybrid and multi-cloud environments. It creates a searchable data catalog, automatically classifies sensitive data, and tracks data lineage. It is not a security tool but a governance tool. On the exam, remember that Purview is about understanding and controlling your data assets.

Does Purview support on-premises data sources?

Yes, Purview can scan on-premises data sources like SQL Server using a self-hosted integration runtime (SHIR). The SHIR is a software agent installed on a local machine that connects to Purview. This allows hybrid governance. The exam may test that Purview is not limited to Azure.

How does Purview classify sensitive data?

Purview uses built-in classifiers that detect patterns like credit card numbers, social security numbers, and medical record IDs. You can also create custom classifiers. During a scan, Purview samples data and applies these classifiers to label assets. The results are visible in the data catalog. The exam expects you to know that classification is automatic.
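Pattern-based classification of sampled values can be illustrated with a short Python sketch: a regular expression for SSN-shaped strings, a digit pattern plus a Luhn checksum for card numbers, and a threshold on how many sampled values must match before a column is tagged. The patterns, label strings, and the 60% threshold are assumptions for illustration; they are not Purview's built-in classifier definitions.

```python
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_PATTERN = re.compile(r"^\d{13,19}$")

def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify_value(value):
    """Return an illustrative classification name for one value, or None."""
    v = value.strip()
    if SSN_PATTERN.match(v):
        return "U.S. Social Security Number"
    compact = v.replace(" ", "").replace("-", "")
    if CARD_PATTERN.match(compact) and luhn_valid(compact):
        return "Credit Card Number"
    return None

def classify_column(sample, threshold=0.6):
    """Tag a column if at least `threshold` of the sampled values share one type."""
    counts = {}
    for value in sample:
        label = classify_value(value)
        if label:
            counts[label] = counts.get(label, 0) + 1
    if not counts:
        return None
    label, hits = max(counts.items(), key=lambda kv: kv[1])
    return label if hits / len(sample) >= threshold else None

print(classify_column(["123-45-6789", "987-65-4321", "n/a"]))
print(classify_column(["4111 1111 1111 1111", "4111-1111-1111-1111"]))
```

The checksum step is what separates a plausible card number from any 16-digit string, and the threshold keeps a single stray match from tagging an entire column.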

What is the difference between Purview Data Map and Data Catalog?

The Data Map is the backend engine that captures metadata, lineage, and classifications. The Data Catalog is the user-facing portal for searching, browsing, and governing assets. Think of the Data Map as the database and the Data Catalog as the search interface. The exam may use these terms interchangeably, but knowing the distinction helps.

Can Purview enforce access control on data?

Purview can apply sensitivity labels that integrate with Microsoft 365 compliance policies, which can enforce actions like encryption or access restrictions. Additionally, Purview has a preview feature for access policies that can restrict read/write at the source. However, for AZ-900, know that Purview's primary role is discovery and classification, not enforcement.

Is Purview free?

Purview has a free tier with limited capacity (e.g., 1 GB of metadata storage and 100,000 assets). Beyond that, you pay for metadata storage and asset processing. The exam does not require you to memorize pricing, but you should know it is a paid service with a free tier.

How does Purview differ from Azure Information Protection?

Azure Information Protection (AIP) is part of Microsoft 365 and focuses on classifying and protecting files and emails (e.g., applying labels to documents). Purview uses the same sensitivity labels but applies them to data assets in the catalog (e.g., tables and columns). AIP protects the data itself, while Purview helps you discover and manage it. On the exam, remember that Purview is for data governance at scale.
