Microsoft Fabric is a unified SaaS platform that integrates data engineering, data integration, data warehousing, data science, real-time analytics, and business analytics. At its core lies OneLake, a single, logical, multi-cloud data lake that provides a common storage layer for all Fabric experiences. This chapter covers OneLake and Workspaces in depth, explaining their architecture, key components, and how they interact. For the DP-900 exam, the "Describe an analytics workload" domain is listed at roughly 25-30% of questions in recent skills outlines (check the current outline, since weightings change), with OneLake and Workspaces being central to understanding Fabric's data management. You will be tested on concepts like shortcuts, granular permissions, and the relationship between workspaces and OneLake items.
Imagine a city (Microsoft Fabric) where each department (Data Engineering, Data Science, Data Warehouse, etc.) needs to store and access water (data). Traditionally, each department would dig its own well (separate data store), leading to duplication and inconsistency. In Fabric, the city builds OneLake—a single, massive, centrally managed lake. Each department gets a designated area on the lake's shore (a workspace) with its own dock (shortcut) to access the water. The lake itself stores the water once in a common format (Delta-Parquet). When a department creates a table, it's like building a fenced-off section of the lake (a managed folder) that only they can edit. But other departments can create shortcuts—imagine a pipe that lets them view and query the water without moving it. The city also has a central zoning office (the OneLake portal) that tracks all sections and shortcuts. If a department wants to share its water with another, it simply grants permission to the shortcut. The key is that all water is stored in the lake, not in individual wells, so everyone sees the same version of truth. This eliminates data silos and ensures consistency, just as OneLake eliminates the need for separate data lakes.
What is Microsoft Fabric?
Microsoft Fabric is a Software-as-a-Service (SaaS) analytics platform that brings together multiple data and analytics services into a single, integrated environment. It was announced in May 2023 and became generally available in November 2023. Fabric replaces the need to manage separate Azure services like Azure Synapse Analytics, Azure Data Factory, Azure Data Lake Storage Gen2, and Power BI by providing a unified experience. The platform is built on a foundation of OneLake, a multi-cloud data lake that acts as the single source of truth for all data ingested, processed, and analyzed within Fabric.
OneLake: The Single, Logical Data Lake
OneLake is the central storage layer of Microsoft Fabric. It is a single, logical, multi-cloud data lake that is automatically provisioned for every Fabric tenant. Each Fabric tenant gets exactly one OneLake instance. OneLake is built on top of Azure Data Lake Storage (ADLS) Gen2 and uses the Delta-Parquet format as its native storage format. Delta-Parquet combines the columnar storage efficiency of Parquet with the ACID transaction capabilities of Delta Lake, enabling reliable, performant data operations.
Key characteristics of OneLake:
- Single logical lake: Despite being physically distributed across regions and clouds, OneLake presents a unified namespace. Users and services interact with a single logical endpoint.
- Multi-cloud: OneLake can span multiple Azure regions and even include data stored in other clouds (e.g., AWS S3) via shortcuts, though the primary storage is in Azure.
- Automatic provisioning: When a Fabric tenant is created, OneLake is automatically provisioned. No manual storage account creation is needed.
- Native Delta-Parquet: Tabular data in OneLake is stored as Delta-Parquet tables by default. This format is open-source and supported by many analytics engines. Raw files in other formats can still be stored in a Lakehouse's Files/ folder.
- Shortcuts: OneLake supports shortcuts, which behave like symbolic links to data stored within the same OneLake, in other OneLake instances, in ADLS Gen2, or in Amazon S3. Shortcuts avoid data movement and duplication.
- Granular permissions: Access to data in OneLake is managed through workspace roles and item-level permissions, not through storage account keys or SAS tokens.
Workspaces: Containers for Collaboration
In Microsoft Fabric, a workspace is a logical container that holds all the items (datasets, notebooks, pipelines, reports, etc.) related to a specific project or team. Workspaces are similar to Power BI workspaces but extended to support all Fabric experiences. Each workspace has its own folder structure within OneLake, allowing data to be organized and secured independently.
Key points about workspaces:
- Workspace identity: Each workspace is assigned a unique GUID. The OneLake folder path for a workspace is: https://onelake.dfs.fabric.microsoft.com/<workspace_guid>/.
- Items in a workspace: Workspaces contain items such as Lakehouses, Data Warehouses, Notebooks, Data Pipelines, Semantic Models, Reports, etc. Each item has its own subfolder under the workspace.
- Roles: Workspace access is controlled via roles: Admin, Member, Contributor, and Viewer. These roles determine what actions users can perform on items within the workspace.
- Capacity: Workspaces are associated with a Fabric capacity (SKU), which determines the compute resources available. Capacities can be shared across workspaces or dedicated to a single workspace.
How OneLake and Workspaces Interact
When you create a Lakehouse or Data Warehouse in a workspace, Fabric automatically creates a corresponding folder in OneLake. For example, a Lakehouse named "SalesLake" in workspace "SalesTeam" will have a folder at:
https://onelake.dfs.fabric.microsoft.com/<workspace_guid>/SalesLake.Lakehouse/
All tables created in that Lakehouse are stored as Delta-Parquet files within that folder. The folder structure is:
- Tables/ - contains managed tables (Fabric manages the lifecycle)
- Files/ - contains unmanaged files (user-managed, e.g., raw data)
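The path layout described above can be sketched as a small helper. This is a minimal illustration, assuming the `<workspace_guid>/<item_name>.<item_type>/...` pattern shown in this chapter; the function name `onelake_url` and the sample GUID are hypothetical, not part of any Fabric SDK.

```python
from urllib.parse import quote

# Host taken from the endpoint format shown in this chapter.
ONELAKE_DFS_HOST = "https://onelake.dfs.fabric.microsoft.com"


def onelake_url(workspace_guid, item_name, *parts, item_type="Lakehouse"):
    """Build a OneLake DFS URL for an item or a path inside it.

    Hypothetical helper: it only joins path segments, it does not call Fabric.
    """
    segments = [workspace_guid, f"{item_name}.{item_type}", *parts]
    # Percent-encode each segment so spaces or slashes in names cannot break the path.
    path = "/".join(quote(s.strip("/"), safe="") for s in segments if s)
    return f"{ONELAKE_DFS_HOST}/{path}"


# Placeholder GUID used purely for illustration.
workspace = "00000000-0000-0000-0000-000000000000"
managed_table = onelake_url(workspace, "SalesLake", "Tables", "orders")
raw_files = onelake_url(workspace, "SalesLake", "Files", "raw")
```

Keeping `item_type` as a keyword argument makes the same helper usable for other item kinds, although only the `.Lakehouse` suffix is confirmed by the text above.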
Shortcuts can be created within a Lakehouse or Data Warehouse to reference data stored elsewhere. For example, a shortcut can point to an ADLS Gen2 container, an Amazon S3 bucket, or another Lakehouse's table. The shortcut appears as a virtual table or folder, but the data remains in its original location.
Shortcuts: Virtual References to External Data
Shortcuts are a critical feature of OneLake. They allow you to access data stored outside of Fabric without moving or copying it. There are three main types of shortcuts:
1. Internal shortcuts: Point to data within the same OneLake (e.g., another workspace's Lakehouse).
2. ADLS Gen2 shortcuts: Point to data in Azure Data Lake Storage Gen2.
3. Amazon S3 shortcuts: Point to data in Amazon S3.
How shortcuts work:
When you create a shortcut, Fabric stores metadata about the source location (URL, credentials if needed) in OneLake.
When a query accesses the shortcut, Fabric reads the data directly from the source, applying the necessary authentication and authorization.
Shortcuts are read-only by default; you cannot write data through a shortcut to the source (except for certain internal shortcuts where write is allowed).
Shortcuts support Delta-Parquet tables and folders. For non-Delta formats, you can still access files but may need to use Spark or other engines to read them.
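The behaviour described above (metadata stored in OneLake, data read from the source at query time) can be modelled with a toy resolver. This is a conceptual sketch only: the `Shortcut` and `LakehouseCatalog` classes are hypothetical and do not mirror Fabric's internal implementation or any Fabric API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    name: str        # name shown inside the Lakehouse
    parent: str      # "Tables" or "Files"
    target_url: str  # where the data physically lives


class LakehouseCatalog:
    """Holds shortcut metadata only; no data is ever copied into it."""

    def __init__(self):
        self._shortcuts = {}

    def add_shortcut(self, shortcut):
        self._shortcuts[f"{shortcut.parent}/{shortcut.name}"] = shortcut

    def resolve(self, path):
        """Return the source location for a path, or None if the path is local."""
        for prefix, shortcut in self._shortcuts.items():
            if path == prefix or path.startswith(prefix + "/"):
                remainder = path[len(prefix):]
                return shortcut.target_url.rstrip("/") + remainder
        return None


catalog = LakehouseCatalog()
# Illustrative bucket name, not a real location.
catalog.add_shortcut(Shortcut("logs", "Files", "s3://example-bucket/transcode/"))

remote = catalog.resolve("Files/logs/2024/01.parquet")
local = catalog.resolve("Tables/orders")
```

The point of the model is that `resolve` only rewrites a path: a read through the shortcut goes to the source location, which is why deleting a shortcut leaves the source data untouched.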
Permissions and Security
Permissions in Fabric are managed at multiple levels:
- Tenant level: Admin settings control cross-tenant sharing, external data sharing, etc.
- Capacity level: Capacity admins manage which workspaces use a capacity.
- Workspace level: Roles (Admin, Member, Contributor, Viewer) control what users can do within the workspace.
- Item level: For specific items like Lakehouses, you can grant permissions (Read, ReadAll, Write, Execute) to users or groups.
- OneLake data access roles: In Lakehouses, you can define custom roles that restrict access to specific tables or rows (row-level security).
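Important: OneLake itself does not use storage account keys or SAS tokens. All access to OneLake is through Microsoft Entra ID (formerly Azure AD) authentication and Fabric permissions, which eliminates the risk of exposing storage keys. Credentials such as account keys or access keys come into play only when a shortcut connects to an external source like ADLS Gen2 or Amazon S3.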
Data Formats and Storage
All data in OneLake is stored in the Delta-Parquet format by default. Delta-Parquet provides:
- ACID transactions: Atomic, Consistent, Isolated, Durable operations.
- Schema enforcement: Ensures data conforms to defined schema.
- Time travel: Ability to query previous versions of data.
- Efficient compression: Parquet columnar storage reduces storage costs and improves query performance.
Fabric also supports other formats like CSV, JSON, and Avro for raw files in the Files/ section, but tables are always Delta-Parquet.
Integration with Fabric Experiences
OneLake and workspaces are the foundation for all Fabric workloads:
- Data Factory: Pipelines can ingest data into OneLake via shortcuts or direct copy.
- Synapse Data Engineering: Notebooks and Spark jobs read/write from OneLake.
- Synapse Data Warehouse: Warehouse tables are stored in OneLake as Delta-Parquet.
- Synapse Data Science: Models can access training data from OneLake.
- Synapse Real-Time Analytics: Streaming data lands in OneLake.
- Power BI: Semantic models can directly query OneLake data via Direct Lake mode, bypassing import or DirectQuery for high performance.
Configuration and Management
OneLake is automatically provisioned; there is no manual configuration required. However, you can manage:
- Shortcuts: Create, update, delete shortcuts via the Fabric portal or APIs.
- Permissions: Assign workspace roles and item permissions.
- Capacity: Assign or change the capacity associated with a workspace.
- Data retention: Fabric does not automatically delete data; you must manage lifecycle manually.
To verify OneLake connectivity, you can use tools like Azure Storage Explorer (with Fabric authentication) or the OneLake file explorer (preview). For example, to list files in a Lakehouse:
https://onelake.dfs.fabric.microsoft.com/<workspace_guid>/<lakehouse_name>.Lakehouse/Tables/
Limits and Quotas
Maximum number of workspaces per tenant: 1,000 (default, can be increased by support).
Maximum number of items per workspace: 10,000.
Maximum size of a single file: 4.75 TB (same as ADLS Gen2).
Shortcut source: Must be accessible via public endpoint or private endpoint with proper networking.
Delta-Parquet table maximum partition columns: 256.
Summary
OneLake and Workspaces form the backbone of Microsoft Fabric. OneLake provides a single, logical data lake with Delta-Parquet as the native format, while workspaces organize projects and teams. Shortcuts enable seamless access to external data without duplication. Understanding these concepts is essential for the DP-900 exam, especially questions about data storage, sharing, and security in Fabric.
Create Fabric Capacity
Before using Fabric, you must provision a Fabric capacity (SKU) in the Azure portal or through the Fabric admin portal. The capacity determines the compute resources available for all workspaces within that capacity. Capacities are available in tiers: F2, F4, F8, F16, F32, F64, F128, F256, F512, F1024, and F2048 (for Fabric) or equivalent Power BI Premium SKUs (P1-P4). Each SKU has a specific number of capacity units (CUs). For example, F64 provides 64 CUs. Pay-as-you-go capacities are billed per second for as long as the capacity is running, and a capacity can be paused to stop compute charges. Without a capacity, Fabric workspaces cannot run any compute operations; they can only store data.
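The SKU arithmetic can be sketched in a few lines. This sketch assumes that the number in every F SKU name equals its capacity-unit count, generalising the F64 = 64 CUs example above; the helper names are hypothetical.

```python
# Tiers as listed in this chapter.
FABRIC_SKUS = ("F2", "F4", "F8", "F16", "F32", "F64",
               "F128", "F256", "F512", "F1024", "F2048")


def capacity_units(sku):
    """Return the CU count for an F SKU, assuming the name encodes it (F64 -> 64)."""
    sku = sku.strip().upper()
    if sku not in FABRIC_SKUS:
        raise ValueError(f"Unknown Fabric SKU: {sku!r}")
    return int(sku[1:])


def smallest_sku_for(required_cus):
    """Pick the smallest listed SKU that provides at least the required CUs."""
    for sku in FABRIC_SKUS:  # tuple is ordered from smallest to largest
        if capacity_units(sku) >= required_cus:
            return sku
    raise ValueError(f"No listed SKU provides {required_cus} CUs")


f64_cus = capacity_units("F64")
sku_for_100 = smallest_sku_for(100)
```

A workload estimated at 100 CUs therefore lands on F128, since F64 is too small and the tiers double at each step.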
Create a Workspace
In the Fabric portal, you create a workspace by clicking 'Workspaces' > 'New workspace'. You provide a name, description, and optionally assign a capacity. The workspace is automatically assigned a unique GUID. Behind the scenes, Fabric creates a dedicated folder in OneLake for the workspace at `https://onelake.dfs.fabric.microsoft.com/<workspace_guid>/`. This folder will contain all items created within the workspace. You can also set the workspace's license mode (Fabric capacity, Power BI Premium, or shared capacity). For production, always assign a dedicated capacity to avoid performance contention.
Create a Lakehouse in the Workspace
Inside the workspace, you create a Lakehouse item. Click 'New' > 'Lakehouse', give it a name, and Fabric provisions a folder in OneLake named `<lakehouse_name>.Lakehouse`. This folder contains subfolders `Tables/` and `Files/`. The Lakehouse is initially empty. You can then ingest data using pipelines, notebooks, or direct upload. The Lakehouse supports Spark compute for data transformation. The Lakehouse's metadata (schema, shortcuts) is stored in the Fabric metastore (Hive Metastore compatible).
Ingest Data into the Lakehouse
Data can be ingested via Data Factory pipelines (copy activity), Notebooks (Spark), or direct upload. For example, using a pipeline, you can copy data from Azure SQL Database to the Lakehouse's `Files/` folder as Parquet files. Then, using a notebook, you can load the Parquet files into a Delta table in `Tables/`. The Delta table is stored as a directory of Parquet files plus a `_delta_log` folder that tracks transactions. The table is immediately available for querying via SQL Analytics endpoint or Spark.
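To make the role of the `_delta_log` folder concrete, the sketch below replays a toy transaction log to work out which Parquet files belong to the table at a given version. It is a simplified model under stated assumptions: it handles only JSON commit files with `add` and `remove` actions, ignores checkpoints, and omits the extra fields (size, statistics, partition values) that real Delta commits carry. The function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_commit(table_dir, version, actions):
    """Write one toy commit file: zero-padded version name, one JSON action per line."""
    log_dir = Path(table_dir) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    lines = "\n".join(json.dumps(action) for action in actions)
    (log_dir / f"{version:020d}.json").write_text(lines)


def active_files(table_dir, as_of_version=None):
    """Replay commits in order and return the data files that make up the table."""
    files = set()
    log_dir = Path(table_dir) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain versioned commit
        if as_of_version is not None and int(commit.stem) > as_of_version:
            break  # time travel: stop replaying at the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


def demo():
    """Build a three-commit table in a temp folder and read it at two versions."""
    with tempfile.TemporaryDirectory() as table_dir:
        write_commit(table_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
        write_commit(table_dir, 1, [{"add": {"path": "part-0001.parquet"}}])
        # Version 2 replaces the first file, as an update or compaction would.
        write_commit(table_dir, 2, [
            {"remove": {"path": "part-0000.parquet"}},
            {"add": {"path": "part-0002.parquet"}},
        ])
        latest = active_files(table_dir)
        version_1 = active_files(table_dir, as_of_version=1)
    return latest, version_1
```

Because old Parquet files are only marked as removed in the log rather than deleted immediately, replaying the log up to an earlier version reconstructs the table as it was, which is the mechanism behind Delta time travel.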
Create a Shortcut to External Data
To access data without moving it, create a shortcut. In the Lakehouse, right-click on `Tables/` or `Files/` and select 'New shortcut'. Choose the source type (Azure Data Lake Storage Gen2, Amazon S3, or internal). Provide the URL and authentication (e.g., account key for ADLS Gen2, access key for S3). Fabric validates the connection and creates a virtual folder or table. The shortcut appears in the Lakehouse as if it were local data. Queries read directly from the source. No data is copied to OneLake. Shortcuts can be refreshed or deleted independently.
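The same operation can be scripted rather than clicked through. The sketch below only builds the request URL and JSON body; it sends nothing. The endpoint and field names (`path`, `name`, `target`, `adlsGen2`, `oneLake`, `connectionId`) are assumptions based on the public Fabric REST reference for creating shortcuts and should be verified against the current documentation before use. The helper names and all IDs are hypothetical placeholders.

```python
import json

# Assumed base URL of the Fabric REST API.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def adls_target(location, subpath, connection_id):
    """Target block for an ADLS Gen2 source (field names are assumptions)."""
    return {"adlsGen2": {"location": location,
                         "subpath": subpath,
                         "connectionId": connection_id}}


def onelake_target(workspace_id, item_id, path):
    """Target block for an internal shortcut to another Fabric item (assumed shape)."""
    return {"oneLake": {"workspaceId": workspace_id,
                        "itemId": item_id,
                        "path": path}}


def shortcut_request(workspace_id, lakehouse_id, parent, name, target):
    """Return (url, body) for a create-shortcut call without sending it."""
    if parent.split("/")[0] not in ("Tables", "Files"):
        raise ValueError("Shortcuts live under Tables/ or Files/")
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {"path": parent, "name": name, "target": target}
    return url, body


url, body = shortcut_request(
    "ws-guid-placeholder", "lakehouse-guid-placeholder", "Files", "marketing_raw",
    adls_target("https://exampleaccount.dfs.core.windows.net",
                "/container/marketing", "connection-guid-placeholder"),
)
payload = json.dumps(body, indent=2)  # what would be sent as the POST body
```

Separating the target block from the request makes it easy to swap an external source for an internal one while keeping the rest of the call unchanged.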
Grant Permissions and Share Data
To allow other users to access the Lakehouse, assign workspace roles or item-level permissions. For example, a user with Contributor role can create new items and modify existing ones. A user with Viewer role can only read data. For fine-grained access, use OneLake data access roles to restrict access to specific tables or rows. Data can be shared across workspaces by creating shortcuts to other workspaces' Lakehouses. To share with external tenants, enable cross-tenant sharing in the admin portal and provide the shortcut URL.
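The role behaviour described above can be sketched as a simple ordered check. This is a deliberately simplified model: it assumes each role includes everything the roles below it can do, and it covers only the three actions named in this section. The `Role` enum, action names, and `can` helper are hypothetical and are not a Fabric API.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered from least to most privileged (simplifying assumption).
    VIEWER = 1
    CONTRIBUTOR = 2
    MEMBER = 3
    ADMIN = 4


# Minimum role needed for each action, following the text above:
# Viewers can only read; Contributors can create and modify items.
REQUIRED_ROLE = {
    "read_data": Role.VIEWER,
    "modify_items": Role.CONTRIBUTOR,
    "create_items": Role.CONTRIBUTOR,
}


def can(role, action):
    """Return True if the role meets the minimum level for the action."""
    return role >= REQUIRED_ROLE[action]


viewer_can_read = can(Role.VIEWER, "read_data")
viewer_can_edit = can(Role.VIEWER, "modify_items")
contributor_can_create = can(Role.CONTRIBUTOR, "create_items")
```

Real Fabric permissions are richer than a single ordering, since item-level permissions and OneLake data access roles can narrow what a workspace role allows.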
Scenario 1: Enterprise Data Lake Consolidation
A large retail company had multiple data lakes across different departments: marketing used Azure Data Lake Storage Gen2, finance used a separate ADLS account, and the analytics team used a third. This led to data silos and duplication. They adopted Microsoft Fabric with OneLake as the single logical lake. They created a workspace for each department but used shortcuts to point to the existing ADLS Gen2 containers. This allowed each department to keep their existing data pipelines while enabling cross-department queries without moving data. The data engineering team created a central Lakehouse with shortcuts to all departmental data, enabling company-wide reporting. Performance was good because queries read directly from the original sources. The main challenge was managing permissions: they had to ensure that shortcuts did not inadvertently expose sensitive financial data. They used OneLake data access roles to restrict access to specific tables within the finance shortcut.
Scenario 2: Real-Time Analytics with Streaming Data
A logistics company needed to analyze IoT sensor data from delivery trucks in real time. They used Fabric's Real-Time Analytics experience to ingest streaming data into a KQL database, which automatically landed data in OneLake as Delta-Parquet tables. The data was then available for historical analysis via a Lakehouse shortcut. They created a workspace named 'IoT' with a KQL database and a Lakehouse. The Lakehouse had a shortcut to the KQL database's OneLake folder. Power BI reports connected to the Lakehouse using Direct Lake mode, achieving sub-second query performance on millions of records. The key operational consideration was capacity sizing: the streaming ingestion consumed significant compute, so they used an F64 capacity and monitored CU usage. A misconfiguration occurred when they initially set the shortcut to point to the KQL database's raw folder instead of the curated tables, causing schema mismatch errors. They corrected it by pointing to the curated folder.
Scenario 3: Multi-Cloud Data Integration
A media company used both Azure and AWS. Their video transcoding logs were stored in Amazon S3, while user profiles were in Azure SQL Database. They used Fabric to unify analytics. They created a workspace 'MediaAnalytics' with a Lakehouse. They created an S3 shortcut to the S3 bucket containing the logs, using AWS access keys stored securely in Azure Key Vault (via Fabric's linked service). For the user profiles, they used a Data Factory pipeline to copy data from Azure SQL to the Lakehouse's Tables/ folder. The challenge was that the S3 shortcut was read-only, so they could not write transformed data back to S3. They solved this by processing the logs in a Spark notebook and writing results to the Lakehouse's managed tables. The cross-cloud setup introduced latency for S3 reads, but it was acceptable for daily batch reports. The main pitfall was that S3 shortcuts do not support Delta format natively; they had to store data as Parquet files and create external tables. This required additional schema management.
DP-900 Exam Focus: OneLake and Workspaces
Objective 3.1: Describe analytics workloads in Microsoft Fabric
This objective covers the core concepts of Fabric, including OneLake, workspaces, shortcuts, and data formats. Expect 4-6 questions related to this topic on the exam.
Common Wrong Answers and Why They Are Wrong
1. 'OneLake is a separate Azure storage account that you must provision.' - Wrong because OneLake is automatically provisioned when you create a Fabric tenant. You do not create or manage a storage account. This is a key distinction from Azure Data Lake Storage.
2. 'Shortcuts copy data into OneLake for better performance.' - Wrong because shortcuts are virtual references; they do not move or copy data. Data remains in the source location. The exam tests this distinction heavily.
3. 'Workspaces are equivalent to Azure resource groups.' - Wrong because workspaces are logical containers for Fabric items, not Azure resources. They have specific roles and permissions, and they are tied to Fabric capacities, not subscriptions.
4. 'OneLake only supports Delta-Parquet format.' - Partially wrong: Tables are Delta-Parquet, but OneLake also stores raw files (CSV, JSON, etc.) in the Files/ folder. The exam may ask about supported formats.
Specific Numbers and Terms That Appear on the Exam
Delta-Parquet: The native format for OneLake tables.
Shortcut types: Internal, ADLS Gen2, Amazon S3.
Workspace roles: Admin, Member, Contributor, Viewer.
OneLake endpoint format: https://onelake.dfs.fabric.microsoft.com/<workspace_guid>/<item_name>.Lakehouse/
Capacity SKUs: F2, F4, F8, F16, F32, F64, etc.
Maximum workspaces per tenant: 1,000 (default).
Direct Lake mode: Power BI mode that queries OneLake directly without import or DirectQuery.
Edge Cases and Exceptions
Shortcuts to S3 are read-only; you cannot write data to S3 through a shortcut.
Internal shortcuts can be writable if the source is a Lakehouse in the same tenant.
OneLake does not support versioning like ADLS Gen2; however, Delta Lake time travel is supported.
Cross-tenant shortcuts require admin approval and proper networking.
How to Eliminate Wrong Answers
If an answer mentions 'copying data' or 'moving data' in the context of shortcuts, it is wrong.
If an answer says you need to create a storage account, it is wrong.
If an answer says workspaces are the same as Power BI workspaces (without mentioning Fabric items), it is incomplete.
If an answer says OneLake only stores Delta-Parquet, it is wrong because Files/ can store other formats.
OneLake is the single, logical, multi-cloud data lake automatically provisioned for every Fabric tenant.
All tables in OneLake are stored in Delta-Parquet format, providing ACID transactions and time travel.
Workspaces are logical containers for Fabric items; each workspace has a unique GUID and folder in OneLake.
Shortcuts are virtual references to data in ADLS Gen2, Amazon S3, or other OneLake locations; they do not copy data.
Fabric capacity (SKU) determines compute resources available to workspaces; capacities range from F2 to F2048.
Permissions are managed via workspace roles (Admin, Member, Contributor, Viewer) and item-level permissions.
Direct Lake mode in Power BI enables high-performance querying of OneLake data without import or DirectQuery.
OneLake supports external access via Azure Storage Explorer and REST APIs using Azure AD authentication.
Maximum of 1,000 workspaces per tenant by default; maximum of 10,000 items per workspace.
Shortcuts to Amazon S3 are read-only; internal shortcuts may allow writes if source is a Lakehouse.
These come up on the exam all the time. Here's how to tell them apart.
OneLake
Single logical data lake per Fabric tenant
Automatically provisioned, no manual setup
Native Delta-Parquet format for tables
Supports shortcuts to external data sources
Access controlled via Fabric permissions and Azure AD
Azure Data Lake Storage Gen2
Multiple storage accounts can be created
Requires manual provisioning and configuration
Supports any file format, no native table format
No built-in shortcut capability
Access controlled via storage account keys, SAS tokens, or RBAC
Mistake
OneLake requires manual creation of an Azure Data Lake Storage account.
Correct
OneLake is automatically provisioned for every Fabric tenant. You do not need to create or manage any storage account. It is built on ADLS Gen2 but abstracted away.
Mistake
Shortcuts physically copy data into OneLake for faster access.
Correct
Shortcuts are virtual references. They do not move or copy data. Queries read directly from the source location. This avoids duplication and storage costs.
Mistake
Workspaces in Fabric are identical to Power BI workspaces.
Correct
While similar, Fabric workspaces can contain many more item types (Lakehouses, Notebooks, Pipelines, etc.) and are tied to Fabric capacities. They also have different role definitions.
Mistake
All data in OneLake is stored in Delta-Parquet format.
Correct
Tables are stored as Delta-Parquet, but the `Files/` folder in a Lakehouse can contain any file format (CSV, JSON, Parquet, etc.). Only tables are Delta-Parquet by default.
Mistake
OneLake can only be accessed from within Microsoft Fabric.
Correct
OneLake can be accessed externally via Azure Storage Explorer, REST APIs, and tools that support the ADLS Gen2 DFS endpoint. However, authentication requires Azure AD (Entra ID) tokens.
OneLake is a logical data lake built on top of ADLS Gen2. It is automatically provisioned per Fabric tenant and provides a unified namespace across regions and clouds. Unlike ADLS Gen2, you do not manage storage accounts, access keys, or SAS tokens. OneLake also natively supports Delta-Parquet tables and shortcuts to external data sources. ADLS Gen2 is a general-purpose storage service that requires manual configuration and offers more granular control over storage settings.
Shortcuts are virtual references that point to data stored in ADLS Gen2, Amazon S3, or other OneLake locations. When you create a shortcut, Fabric stores metadata about the source location and credentials. Queries read data directly from the source; no data is copied. Shortcuts appear as tables or folders in a Lakehouse. They are read-only for external sources (S3, ADLS) but can be writable for internal shortcuts within the same tenant.
Yes. OneLake supports the ADLS Gen2 DFS endpoint. You can use tools like Azure Storage Explorer, Azure CLI, or REST APIs to access data, provided you authenticate with Azure AD (Entra ID). The URL format is `https://onelake.dfs.fabric.microsoft.com/<workspace_guid>/<item_path>`. However, you cannot use storage account keys or SAS tokens; only Azure AD authentication is supported.
Direct Lake is a Power BI connectivity mode that allows semantic models to query data directly from OneLake without importing data or using DirectQuery. It provides the performance of import mode (sub-second queries) while keeping data in OneLake. It is ideal for large datasets that need frequent refreshes. Direct Lake requires a Fabric capacity and works best with Delta-Parquet tables.
Security in OneLake is managed through workspace roles (Admin, Member, Contributor, Viewer) and item-level permissions. For fine-grained access, you can use OneLake data access roles to restrict access to specific tables or rows (row-level security). All access is authenticated via Azure AD. There are no storage account keys or SAS tokens involved.
The default maximum is 1,000 workspaces per tenant. If you need more, you can contact Microsoft support to request an increase. Exceeding the limit will prevent creation of new workspaces until some are deleted or the limit is raised.
Yes. Power BI Premium capacities (P1-P4) can be used as Fabric capacities. When you assign a Power BI Premium capacity to a workspace, it enables all Fabric experiences, including OneLake. However, some advanced Fabric features may require higher SKUs.