This chapter covers Microsoft Fabric, Microsoft's unified SaaS platform for data and analytics. You will learn its core components, architecture, and how it integrates with existing Azure services. For the DP-900 exam, approximately 10-15% of questions touch on modern analytics platforms like Microsoft Fabric, focusing on its capabilities and how it simplifies the data analytics lifecycle. Understanding Fabric's unified lakehouse architecture and its key workloads is essential for answering scenario-based questions.
Imagine a large manufacturing company that previously had separate departments for sourcing raw materials, assembly, quality control, packaging, and shipping. Each department had its own building, its own management, its own inventory systems, and its own data silos. To produce a finished product, materials had to be physically moved from one building to another, with delays, lost items, and duplicated effort. Now, the company builds a single, massive factory with multiple specialized zones under one roof. The raw materials (data) come in through a receiving dock (OneLake). The assembly line (Data Factory) transforms them. A quality assurance lab (Data Engineering/Science) tests and refines. A packaging area (Power BI) creates the final reports. And the shipping dock (Real-Time Analytics) sends finished products to customers. All zones share a common conveyor belt system and a unified inventory catalog (OneLake). Workers can move seamlessly between zones because they use the same tools and processes. The factory manager (Fabric admin) can see the entire production flow in real time, identify bottlenecks, and reallocate resources instantly. This is what Microsoft Fabric does for data: it unifies all data and analytics services into a single SaaS platform, eliminating silos and providing a shared lake (OneLake), common governance, and integrated experiences.
What is Microsoft Fabric?
Microsoft Fabric is a unified, end-to-end analytics platform introduced in May 2023 that integrates multiple data and analytics services into a single software-as-a-service (SaaS) experience. It is built on a common foundation called OneLake, a single, multi-cloud data lake that serves as the single source of truth for all data within the organization. Fabric eliminates the need to stitch together separate services like Azure Data Lake Storage, Azure Synapse Analytics, Azure Data Factory, and Power BI, which previously required complex integration and separate management.
Why Microsoft Fabric Exists
Before Fabric, organizations typically built analytics solutions by combining multiple Azure services: Azure Data Lake Storage for data storage, Azure Data Factory for data ingestion, Azure Synapse Analytics for data warehousing, Azure Databricks for data engineering and data science, and Power BI for visualization. This approach led to several challenges:
- Data silos: Each service had its own storage, making it difficult to share data across teams.
- Complexity: Managing multiple services with different authentication, governance, and monitoring.
- Cost: Duplicate storage and compute resources.
- Latency: Moving data between services introduced delays.
Fabric addresses these by providing a unified platform where all workloads share the same underlying storage (OneLake), a common security and governance model, and a consistent user experience across data engineering, data science, data warehousing, real-time analytics, and business intelligence.
How Microsoft Fabric Works Internally
Fabric is built on a foundation of OneLake, a single, logical data lake for the whole tenant that presents one namespace even when its data is physically stored in more than one region. OneLake is built on top of Azure Data Lake Storage (ADLS) Gen2 but abstracts the underlying storage details. Every Fabric tenant gets exactly one OneLake, and it is provisioned automatically. Within OneLake, data is organized into workspaces (comparable to containers in a storage account) and items (the artifacts inside a workspace, such as lakehouses, warehouses, and reports).
Key architectural components:
- OneLake: The data lake that stores all data in open formats (Delta Parquet, CSV, etc.). It supports shortcuts, which are symbolic links to external data sources (e.g., Azure Data Lake Storage, Amazon S3) without moving data.
- Compute engines: Fabric provides multiple compute engines that can read from OneLake directly:
  - Spark (for data engineering and data science)
  - SQL (for data warehousing)
  - Power BI (for analytics and reporting)
  - Real-Time Analytics (Kusto Query Language)
- Data Factory: A cloud-based ETL/ELT service that orchestrates data movement and transformation.
- Semantic model: A business-level abstraction layer that defines metrics, relationships, and calculations for Power BI.
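To make the shortcut idea concrete, here is a minimal conceptual sketch in plain Python. It is not the Fabric API: the `OneLakeModel` class, its methods, and the sample paths are hypothetical, invented only to show that a shortcut records a pointer to external data rather than a copy of it.

```python
from dataclasses import dataclass, field


@dataclass
class OneLakeModel:
    """Toy model of a lake namespace (hypothetical, not a Fabric API)."""
    files: dict = field(default_factory=dict)      # path -> bytes stored in the lake
    shortcuts: dict = field(default_factory=dict)  # path -> (external store, external path)

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def add_shortcut(self, path: str, store: dict, external_path: str) -> None:
        # Only the pointer is recorded; no bytes are copied into the lake.
        self.shortcuts[path] = (store, external_path)

    def read(self, path: str) -> bytes:
        if path in self.shortcuts:
            store, external_path = self.shortcuts[path]
            return store[external_path]  # resolved at read time
        return self.files[path]

    def stored_bytes(self) -> int:
        # Counts only data physically held in the lake, not shortcut targets.
        return sum(len(data) for data in self.files.values())


# A dict stands in for an external bucket (for example, Amazon S3).
external_bucket = {"suppliers/2024.csv": b"id,name\n1,Contoso\n"}

lake = OneLakeModel()
lake.write("Files/sales.csv", b"store,amount\n7,120\n")
lake.add_shortcut("Files/suppliers.csv", external_bucket, "suppliers/2024.csv")

shortcut_data = lake.read("Files/suppliers.csv")
lake_size = lake.stored_bytes()
```

Reading through the shortcut returns the external bytes, while `lake_size` counts only `Files/sales.csv`. That is the storage saving the text describes: the supplier file is visible in the lake's namespace without being duplicated.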
Key Components and Defaults
Fabric includes the following workloads (also called experiences):
Lakehouse: A data lake that provides a relational database-like experience. It supports tables (managed and unmanaged) and files. The default storage format is Delta Lake (Parquet with transaction log).
Data Warehouse: A fully managed SQL data warehouse that uses the same OneLake storage. It supports T-SQL, and data is stored in Delta format.
Data Engineering: Provides Apache Spark notebooks and jobs for data transformation.
Data Science: Includes machine learning model training, scoring, and experiment tracking.
Real-Time Analytics: Uses Kusto Query Language (KQL) for high-speed analytics on streaming data.
Data Factory: Provides pipelines and dataflows for data integration.
Power BI: For creating reports and dashboards.
Default values and timers:
OneLake is automatically provisioned per tenant; no manual setup required.
By default, data in OneLake is stored in the tenant's home region; a workspace assigned to a capacity in another region stores its data in that capacity's region.
Shortcuts can point to external storage; they do not copy data by default.
Data retention for workspace deletion is 7 days (soft delete).
Spark pools have default configurations (e.g., 5 executors, 4 cores each) but can be customized.
Configuration and Verification
To create a Fabric workspace:
1. Navigate to the Fabric portal (app.powerbi.com or fabric.microsoft.com).
2. Click on "Workspaces" and select "New workspace."
3. Provide a name, description, and set access permissions.
4. Once created, you can add items like Lakehouse, Data Warehouse, etc.
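The same step can also be scripted. The sketch below only builds the HTTP request and does not send it. It assumes the Fabric REST API exposes workspace creation as a POST to `https://api.fabric.microsoft.com/v1/workspaces` with a `displayName` field; treat the endpoint and body shape as assumptions to verify against the current API reference. The helper name `build_create_workspace_request` and the placeholder token are hypothetical.

```python
import json
import urllib.request

# Assumed Fabric REST endpoint for creating a workspace; verify before use.
CREATE_WORKSPACE_URL = "https://api.fabric.microsoft.com/v1/workspaces"


def build_create_workspace_request(
    token: str, name: str, description: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create a workspace."""
    body = {"displayName": name}
    if description:
        body["description"] = description
    return urllib.request.Request(
        CREATE_WORKSPACE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_workspace_request(
    "<access-token>", "Sales Analytics", "Retail reporting workspace"
)
# Sending is left out on purpose: urllib.request.urlopen(request) would
# require a valid Microsoft Entra access token.
```

Keeping request construction separate from sending makes the payload easy to inspect before anything reaches a real tenant.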
To verify OneLake integration:
Use the OneLake file explorer (preview) to browse files.
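Use ADLS Gen2-compatible tools (such as Azure Storage Explorer or the Azure CLI storage commands), because OneLake exposes an ADLS Gen2-compatible API.
In a Lakehouse notebook, run SHOW TABLES in a Spark SQL cell (%%sql) to list the tables.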
How Fabric Interacts with Related Technologies
Fabric integrates with:
- Microsoft Entra ID (formerly Azure Active Directory): For authentication and authorization.
- Microsoft Purview (formerly Azure Purview): For data cataloging and governance.
- Azure DevOps: For CI/CD pipelines.
- Git: For version control of notebooks and reports.
- Microsoft 365: For embedding reports in Teams and SharePoint.
- Dynamics 365: For accessing business data.
Fabric also supports open standards like Delta Lake, Parquet, and T-SQL, making it compatible with third-party tools that support these formats.
Exam Relevance
For DP-900, you need to know:
Microsoft Fabric is a unified SaaS platform that integrates data storage, processing, and analytics.
OneLake is the single, shared data lake that eliminates data silos.
Fabric includes multiple workloads: Lakehouse, Data Warehouse, Data Engineering, Data Science, Real-Time Analytics, Data Factory, and Power BI.
Fabric supports open formats (Delta Lake, Parquet) and shortcuts to external data.
Fabric is designed for end-to-end analytics, from ingestion to visualization.
Provision a Fabric Tenant
To start using Microsoft Fabric, an organization must have a Fabric tenant. This is typically provisioned automatically when a user signs up for the Fabric free trial or when an administrator enables Fabric for the organization in the admin portal's tenant settings. The Fabric tenant is associated with a Microsoft Entra ID tenant (formerly Azure Active Directory, or Azure AD). During provisioning, OneLake is created in the tenant's home region (e.g., East US). This step happens once and does not require manual configuration of storage accounts. The tenant is the top-level container for all workspaces and items.
Create a Workspace
A workspace is a logical container for related items (lakehouses, warehouses, reports, etc.). To create one, navigate to the Fabric portal and click 'Workspaces' then 'New workspace'. Provide a name and optionally a description. You can also assign roles (Admin, Member, Contributor, Viewer) to users or groups. Workspaces are the primary way to organize and secure data and analytics assets. Each workspace has its own OneLake folder (under the tenant's OneLake) where all data is stored.
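Because every workspace maps to a folder in the tenant's OneLake, any file or table can be addressed with a predictable path. The helpers below sketch that addressing scheme, assuming OneLake's ADLS Gen2-compatible endpoint `onelake.dfs.fabric.microsoft.com` and the `<item>.<itemtype>` folder convention. The function names and the sample workspace and lakehouse names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfs_uri(workspace: str, item: str, item_type: str, relative_path: str) -> str:
    """Return an abfss:// URI (the form Spark uses) for a path inside a Fabric item."""
    path = relative_path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path}"


def onelake_https_url(workspace: str, item: str, item_type: str, relative_path: str) -> str:
    """Return the https:// form of the same location."""
    path = relative_path.lstrip("/")
    return f"https://{ONELAKE_HOST}/{workspace}/{item}.{item_type}/{path}"


# Hypothetical names; real tenants may also address workspaces and items by GUID.
table_uri = onelake_abfs_uri("SalesWorkspace", "RetailLakehouse", "Lakehouse", "Tables/sales")
file_url = onelake_https_url("SalesWorkspace", "RetailLakehouse", "Lakehouse", "/Files/raw/sales.csv")
```

The workspace plays the role a container plays in a storage account, and the item folder sits directly beneath it, which mirrors the tenant, workspace, item hierarchy described above.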
Create a Lakehouse
Inside a workspace, you can create a Lakehouse item. A Lakehouse is a data lake with a relational layer on top, and its tables are stored in Delta Lake format. When you create a Lakehouse, Fabric makes Spark compute available for it and automatically provisions a SQL analytics endpoint, which lets you query the tables with T-SQL (the endpoint is read-only; writes go through Spark or Data Factory). You can create tables using Spark, or load data via Data Factory. This step establishes the foundational storage and compute for data engineering and warehousing.
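Delta Lake tables are Parquet data files plus a transaction log, and the log is what enables ACID behavior and time travel. The toy model below illustrates the idea in plain Python. It is a simplification, not the real Delta protocol: actual log entries carry many more fields, and the file names and the `active_files` helper are hypothetical.

```python
import json

# Each entry stands for one commit in a table's transaction log.
# Real Delta logs record more metadata; this keeps only "add" and "remove".
log = [
    json.dumps([{"add": "part-000.parquet"}]),
    json.dumps([{"add": "part-001.parquet"}]),
    json.dumps([{"remove": "part-000.parquet"}, {"add": "part-002.parquet"}]),
]


def active_files(commits, version=None):
    """Replay commits 0..version and return the data files that make up the table."""
    last = len(commits) - 1 if version is None else version
    files = set()
    for commit in commits[: last + 1]:
        for action in json.loads(commit):
            if "add" in action:
                files.add(action["add"])
            elif "remove" in action:
                files.discard(action["remove"])
    return sorted(files)


latest = active_files(log)               # current state of the table
as_of_v1 = active_files(log, version=1)  # "time travel" to version 1
```

Replaying the whole log gives the current table (`part-001` and `part-002`), while stopping at version 1 reconstructs the earlier state (`part-000` and `part-001`). Because a commit either appears in the log or does not, readers never observe a half-applied change.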
Ingest Data Using Data Factory
Data Factory in Fabric allows you to create pipelines and dataflows to ingest data from various sources (e.g., Azure Blob Storage, SQL Server, Salesforce). You can use copy activities to move data into the Lakehouse. For example, you can copy a CSV file from an external blob store into a Lakehouse table. Data Factory supports both scheduled and event-triggered pipelines. The data is stored in OneLake in Delta Parquet format. You can also use shortcuts to reference external data without copying.
Transform Data with Spark
Once data is in the Lakehouse, you can use Spark notebooks (PySpark, Scala, SQL) to transform it. You can create a notebook in the workspace, attach it to the Lakehouse's Spark compute, and write transformation logic. Common transformations include filtering, aggregating, joining, and writing back to the Lakehouse as new tables. Spark runs on managed clusters that auto-scale. The transformed data remains in OneLake, ready for consumption by other workloads.
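The filter, join, and aggregate pattern can be sketched in plain Python on made-up rows. In a Fabric notebook the same steps would normally be written with PySpark DataFrame operations (`filter`, `join`, `groupBy(...).agg(...)`). The data and column names here are invented for illustration.

```python
from collections import defaultdict

sales = [
    {"store_id": 1, "amount": 120.0, "status": "completed"},
    {"store_id": 1, "amount": 80.0, "status": "refunded"},
    {"store_id": 2, "amount": 200.0, "status": "completed"},
    {"store_id": 2, "amount": 50.0, "status": "completed"},
]
stores = [
    {"store_id": 1, "region": "East"},
    {"store_id": 2, "region": "West"},
]

# Filter: keep completed sales only (DataFrame.filter in PySpark).
completed = [row for row in sales if row["status"] == "completed"]

# Join: attach each sale's region by store_id (DataFrame.join).
region_by_store = {s["store_id"]: s["region"] for s in stores}
joined = [{**row, "region": region_by_store[row["store_id"]]} for row in completed]

# Aggregate: total amount per region (DataFrame.groupBy(...).agg(...)).
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]

revenue_by_region = dict(totals)  # {"East": 120.0, "West": 250.0}
```

In a real notebook the final result would be written back to the Lakehouse as a new table rather than held in a dictionary.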
Create a Power BI Report
After data is prepared, you can create Power BI reports directly in Fabric. In the workspace, select 'New Power BI report' and choose the Lakehouse or Warehouse as the data source. The semantic model automatically detects relationships and measures. You can build visuals and dashboards. The report is stored as an item in the workspace. Users can view reports in the Fabric portal or embed them in other applications. This completes the analytics lifecycle from data ingestion to visualization.
Enterprise Scenario 1: Retail Analytics
A large retail chain with 500 stores uses Microsoft Fabric to unify sales data from point-of-sale systems, inventory data from warehouses, and customer data from CRM. Previously, they used separate Azure Data Lake Storage for raw data, Azure Databricks for processing, and Power BI for reporting. Data movement between services caused delays and inconsistencies. With Fabric, they create a single Lakehouse that stores all data. Data Factory pipelines ingest nightly sales data from each store's SQL Server database into the Lakehouse. Spark notebooks clean and aggregate the data. The Data Warehouse workload provides fast SQL queries for inventory analysts. Power BI reports are built directly on the Lakehouse, showing real-time inventory levels and sales trends. OneLake shortcuts allow referencing supplier data stored in an external S3 bucket without copying. The company reduced data latency from 24 hours to under 1 hour and cut storage costs by 40% by eliminating duplicate copies.
Scenario 2: Healthcare Analytics
A healthcare provider aggregates patient data from multiple electronic health record (EHR) systems, lab results, and wearable devices. They need to comply with HIPAA and ensure data governance. Fabric provides a unified platform where all data resides in OneLake with common security policies. The Real-Time Analytics workload ingests streaming heart rate data from wearables and makes it queryable with Kusto Query Language (KQL). The Data Science workload builds predictive models for patient readmission risk. The Data Warehouse stores structured patient records. All workloads share the same data, so a model trained in Data Science can be deployed to score new data in Real-Time Analytics. The organization uses Microsoft Purview for data cataloging and lineage. Common misconfiguration: forgetting to set appropriate row-level security on the Lakehouse, leading to unauthorized data access. Best practice: use workspace roles and item permissions to enforce least privilege.
Scenario 3: Financial Services
A global bank uses Fabric for risk analysis and regulatory reporting. They have petabytes of transaction data in on-premises Hadoop clusters. With Fabric, they create shortcuts to the on-premises data using OneLake's ability to connect to ADLS Gen2 and then to on-premises via Azure Stack HCI. Data Factory pipelines copy critical data into the Lakehouse. Data Engineering workload runs daily risk calculations using Spark. The Data Warehouse workload provides low-latency queries for compliance officers. Power BI reports are embedded in a custom web application for real-time risk dashboards. Performance consideration: Spark pool configuration must be tuned for the large data volumes; default settings may cause out-of-memory errors. Misconfiguration: not partitioning large tables, leading to slow queries. Fabric's auto-partitioning helps but explicit partitioning by date is recommended.
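The benefit of explicit partitioning by date can be illustrated with a small plain-Python model. This is a conceptual sketch, not Spark or Delta code: the rows are made up, and the two helper functions are hypothetical. A dictionary keyed by date stands in for a date-partitioned table.

```python
from collections import defaultdict

transactions = [
    {"date": "2024-01-01", "amount": 100},
    {"date": "2024-01-01", "amount": 250},
    {"date": "2024-01-02", "amount": 75},
    {"date": "2024-01-03", "amount": 300},
]

# Partitioned layout: one bucket of rows per date value.
partitions = defaultdict(list)
for row in transactions:
    partitions[row["date"]].append(row)


def total_unpartitioned(rows, date):
    """Scan every row; return (total, rows_scanned)."""
    total = sum(r["amount"] for r in rows if r["date"] == date)
    return total, len(rows)


def total_partitioned(parts, date):
    """Open only the matching partition; return (total, rows_scanned)."""
    rows = parts.get(date, [])
    return sum(r["amount"] for r in rows), len(rows)


full_scan = total_unpartitioned(transactions, "2024-01-01")
pruned_scan = total_partitioned(partitions, "2024-01-01")
```

Both calls return the same total, but the partitioned version touches only the rows for the requested date. On large tables that difference is what turns a slow full scan into a fast, pruned read.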
DP-900 Objective 3.1: Describe modern data analytics platforms
This objective covers understanding of platforms like Microsoft Fabric, Azure Synapse Analytics, and Azure Databricks. The exam focuses on the key characteristics and benefits of Fabric as a unified SaaS platform.
Common Wrong Answers and Traps
"Microsoft Fabric is a replacement for Azure Synapse Analytics" — This is partially true but misleading. Fabric is a broader platform that includes Synapse capabilities (data warehousing, Spark) but also adds OneLake, Data Factory, Power BI, and Real-Time Analytics. The exam may present Fabric and Synapse as separate options; Fabric is the newer, unified platform.
"OneLake is just another name for Azure Data Lake Storage" — OneLake is built on ADLS Gen2 but is a logical data lake that spans multiple regions and workspaces. It is not a separate storage account you manage. Candidates often confuse OneLake with a specific storage account.
"Fabric requires you to provision storage accounts manually" — Fabric automatically provisions OneLake; you do not create storage accounts. This is a common trap because other Azure services require manual storage setup.
"Fabric only works with structured data" — Fabric supports unstructured data (files) in Lakehouse and shortcuts. It is not limited to structured data.
Specific Numbers and Terms on the Exam
OneLake: The single, unified data lake.
Shortcuts: Symbolic links to external data without copying.
Delta Lake: The default storage format for tables.
Workspace: Logical container for items.
Lakehouse: Combines data lake and data warehouse.
Real-Time Analytics: Uses Kusto Query Language.
Data Factory: For ETL/ELT pipelines.
Power BI: For visualization.
Edge Cases and Exceptions
Fabric is not available in all Azure regions; check availability.
Free trial includes limited capacity; production requires paid capacity.
OneLake shortcuts do not support all external sources; only ADLS Gen2, Amazon S3, and Dataverse are supported initially.
Fabric does not replace Azure Databricks for advanced machine learning workflows; Databricks can still be used alongside Fabric.
How to Eliminate Wrong Answers
If a question asks about a unified data lake that eliminates silos, the answer is OneLake.
If a question mentions a SaaS platform that integrates all analytics workloads, the answer is Microsoft Fabric.
If a question describes a symbolic link to external data, the answer is shortcut.
If a question mentions a workload for streaming analytics with KQL, the answer is Real-Time Analytics.
Microsoft Fabric is a unified SaaS analytics platform that integrates data storage, processing, and visualization.
OneLake is the single, multi-cloud data lake that eliminates data silos and provides a single source of truth.
Fabric includes workloads: Lakehouse, Data Warehouse, Data Engineering, Data Science, Real-Time Analytics, Data Factory, and Power BI.
Data in OneLake is stored in open formats (Delta Lake, Parquet) and supports shortcuts to external data without copying.
Fabric automatically provisions OneLake per tenant; no manual storage account creation is needed.
Real-Time Analytics uses Kusto Query Language (KQL) for high-speed streaming analytics.
Shortcuts provide read-only access to external data sources like ADLS Gen2, Amazon S3, and Dataverse.
Workspaces are logical containers for organizing items (lakehouses, reports, etc.) and controlling access.
These come up on the exam all the time. Here's how to tell them apart.
Microsoft Fabric
Unified SaaS platform with OneLake as single data lake.
Includes Power BI, Data Factory, Real-Time Analytics, Data Science, and Data Engineering.
Automatically provisions storage; no manual setup.
Supports shortcuts to external data without copying.
Ideal for end-to-end analytics with a consistent user experience.
Azure Synapse Analytics
PaaS offering with separate storage (ADLS Gen2) and compute (SQL pools, Spark pools).
Focuses on data warehousing, big data analytics, and data integration.
Requires manual provisioning of storage accounts and compute resources.
Does not include built-in Power BI or Real-Time Analytics (KQL).
Best for large-scale data warehousing and complex ETL pipelines.
Mistake
Microsoft Fabric is just a rebranded version of Azure Synapse Analytics.
Correct
Fabric is a completely new platform built from the ground up as a SaaS offering. While it includes data warehousing capabilities similar to Synapse, it also integrates Power BI, Data Factory, Real-Time Analytics, and Data Science under a single unified experience with OneLake. Synapse remains available but is not part of Fabric.
Mistake
OneLake is a separate storage account that you must create and manage.
Correct
OneLake is automatically provisioned per tenant and is not a separate Azure resource you create. It is a logical data lake that abstracts underlying ADLS Gen2 storage. You do not need to manage storage accounts; Fabric handles it automatically.
Mistake
Fabric only supports structured data in tables.
Correct
Fabric supports both structured (tables) and unstructured (files) data. Lakehouse can store any file format (CSV, JSON, Parquet, images, etc.). Tables are stored in Delta Lake format, but raw files are also accessible.
Mistake
Shortcuts copy data into OneLake.
Correct
Shortcuts are symbolic links that point to external data sources. They do not copy data; they provide a read-only view of the external data. This saves storage costs and avoids duplication.
Mistake
Fabric can only be used with Azure data sources.
Correct
Fabric supports shortcuts to Amazon S3 and Dataverse, and Data Factory can connect to hundreds of on-premises and cloud sources (e.g., SQL Server, Salesforce, Google BigQuery). It is not limited to Azure.
Microsoft Fabric is a unified SaaS analytics platform that combines data lake, data warehouse, data integration, real-time analytics, and business intelligence into a single experience. Unlike Azure Synapse Analytics, which is a PaaS service requiring separate provisioning of storage and compute, Fabric provides a fully managed, unified environment with OneLake as the common data lake. Fabric also includes built-in Power BI and Real-Time Analytics (KQL), which Synapse does not.
No. Fabric automatically provisions OneLake, a logical data lake, for your tenant. You do not need to create or manage any storage accounts. OneLake is built on Azure Data Lake Storage Gen2 but is abstracted away.
A shortcut is a symbolic link within OneLake that points to an external data source (e.g., Azure Data Lake Storage, Amazon S3, Dataverse). It does not copy data; it provides a read-only view. This allows you to access external data without moving it, saving storage costs and avoiding duplication.
Fabric includes Lakehouse, Data Warehouse, Data Engineering (Spark), Data Science (ML), Real-Time Analytics (KQL), Data Factory (pipelines/dataflows), and Power BI. These workloads share the same OneLake storage and security model.
Yes. Fabric supports shortcuts to Amazon S3 and Dataverse. Data Factory can connect to hundreds of on-premises and cloud sources, including SQL Server, Oracle, Salesforce, and Google BigQuery.
The default storage format is Delta Lake (Parquet with a transaction log). This provides ACID transactions, schema enforcement, and time travel capabilities.
Fabric integrates with Microsoft Entra ID (formerly Azure Active Directory) for authentication and authorization. It also supports Microsoft Purview for data cataloging, lineage, and sensitivity labeling. Workspace roles and item-level permissions control access.