This chapter covers Azure Data Lake Storage Gen2 (ADLS Gen2), a cloud storage service that combines the scalability and cost-efficiency of Azure Blob Storage with a hierarchical namespace and POSIX-like file system semantics. For the AZ-104 exam, understanding ADLS Gen2 is critical because it appears in questions about storage accounts, data analytics, and security—roughly 10-15% of storage-related questions touch on ADLS Gen2. You must know its architecture, how it differs from standard Blob Storage, and how to configure its security and performance features.
Imagine a massive warehouse where a company stores all its shipping containers. In a traditional flat warehouse (Blob Storage), containers are scattered randomly on the floor, each with a unique ID. To find a container from 'Customer A, Region East, Month January', you must search a global registry and then navigate directly to that container. There are no aisles or shelves—every container is equally accessible, but grouping or navigating by customer or region requires external indexing. Now, consider a hierarchical warehouse (Azure Data Lake Storage Gen2). The warehouse is organized with aisles, shelves, and bins. You can store containers in a path like /Customers/A/East/January/. To find all containers for Customer A, you simply walk down the 'Customers' aisle to the 'A' section. The warehouse manager (the storage service) maintains a directory tree, so listing contents of /Customers/A/ is fast and efficient—no need to scan every container. Permissions can be applied at the aisle, shelf, or bin level, just like POSIX ACLs. The key innovation is that the physical layout of the warehouse (the underlying blob storage) remains the same, but the warehouse adds a logical directory structure on top. This enables analytics tools like Spark and Hive to navigate the warehouse naturally, as if it were a file system, without expensive full scans. In short, ADLS Gen2 gives a flat blob store the organizational power of a file system, making it ideal for big data analytics.
What is Azure Data Lake Storage Gen2?
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on top of Azure Blob Storage. It is not a separate storage account type; rather, it is Blob Storage with the hierarchical namespace feature enabled. This feature introduces a file system abstraction that allows you to organize data into directories and subdirectories, just like a traditional file system. Under the hood, the data is still stored as blobs in a flat namespace, but the service maintains a directory tree that maps paths to blobs. This enables operations like renaming directories, listing contents of a directory, and setting permissions at the directory level, all without scanning all blobs.
Why does it exist?
Traditional Blob Storage is optimized for simple object storage where each blob has a unique name. It works well for content delivery, backups, and general-purpose storage. However, for big data analytics workloads (like those using Apache Spark, Hive, or Azure Databricks), applications expect a file system interface. They need to navigate directories, list files in a folder, and set permissions on directories. Without a hierarchical namespace, these operations become slow and expensive because the application must scan all blobs or maintain external metadata. ADLS Gen2 solves this by providing a native file system interface on top of blob storage, offering the best of both worlds: the scalability and cost of blob storage with the usability of a file system.
How does it work internally?
Conceptually, ADLS Gen2 can be pictured as two layers:
- Storage Layer: File contents are stored as block blobs, so each file is still a blob that the Blob APIs can address by its full path (e.g., mydata/2024/01/log.txt).
- Hierarchical Namespace Layer: Directories are real objects tracked in namespace metadata, not name prefixes that merely look like folders. Listing a directory consults that metadata for the directory's children instead of filtering every blob name in the container, which is why it stays fast as the account grows.
Key operations:
- Create Directory: Creates a directory object in the namespace. Through the Blob API, the directory appears as a zero-length blob carrying the metadata flag hdi_isfolder=true; it has no content and acts only as a placeholder.
- Rename Directory: Instead of renaming each blob individually, the service updates the metadata to point the old path to the new path. This is an atomic operation at the metadata level, making it very fast even for directories containing millions of files.
- List Directory: The service queries the metadata store for all blobs whose path starts with the directory prefix and returns the list. This is O(n) where n is the number of blobs in that directory, not the total number of blobs.
- Set Permissions: ACLs are stored as metadata on the directory or file blob. When a request is made, the service checks the ACLs along the path (from root to the target) to determine access.
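The difference between the two namespace models can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions, not how the service is implemented: `FlatStore` and `TreeStore` are hypothetical classes, and the return value of `rename_dir` simply counts how many entries had to be touched.

```python
"""Toy model contrasting a flat blob namespace with a directory tree."""


class FlatStore:
    """Flat namespace: every blob is keyed by its full name."""

    def __init__(self):
        self.blobs = {}

    def put(self, name, data):
        self.blobs[name] = data

    def list_dir(self, prefix):
        # Every blob name in the store is tested against the prefix.
        return sorted(n for n in self.blobs if n.startswith(prefix + "/"))

    def rename_dir(self, old, new):
        # One move per blob under the prefix.
        moved = 0
        for name in self.list_dir(old):
            self.blobs[new + name[len(old):]] = self.blobs.pop(name)
            moved += 1
        return moved


class TreeStore:
    """Hierarchical namespace: directories are nodes in a tree."""

    def __init__(self):
        self.root = {}

    def _walk(self, parts, create=False):
        node = self.root
        for part in parts:
            node = node.setdefault(part, {}) if create else node[part]
        return node

    def put(self, path, data):
        *dirs, leaf = path.split("/")
        self._walk(dirs, create=True)[leaf] = data

    def list_dir(self, path):
        # Only the children of one directory node are visited.
        return sorted(self._walk(path.split("/")))

    def rename_dir(self, old, new):
        # Re-attach one subtree; the files below it are not touched.
        *old_parent, old_name = old.split("/")
        *new_parent, new_name = new.split("/")
        subtree = self._walk(old_parent).pop(old_name)
        self._walk(new_parent, create=True)[new_name] = subtree
        return 1
```

Renaming `raw/2024` in the flat model touches every blob under that prefix, while the tree model moves a single node regardless of how many files sit beneath it.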
Key Components, Values, Defaults, and Timers
Hierarchical Namespace: Best enabled at storage account creation time. An existing flat-namespace account can be upgraded in place (for example with az storage account hns-migration start), but the change is one-way: once the hierarchical namespace is enabled, it cannot be disabled. Returning to a flat namespace means migrating data to a new account.
Storage Account Types: The hierarchical namespace is available on StorageV2 (general purpose v2) accounts and on premium block blob (BlockBlobStorage) accounts. It is not available on legacy BlobStorage or general purpose v1 accounts.
Performance Tiers: Standard (HDD-based) and Premium (SSD-based). Premium tier offers lower latency and higher throughput, suitable for high-performance analytics.
Redundancy Options: LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS. For analytics, LRS or ZRS is common; RA-GRS is used for read-heavy workloads with geo-redundancy.
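Access Control: Supports both Azure RBAC and POSIX-like ACLs. RBAC covers the management plane (e.g., creating a storage account) and, through data roles such as Storage Blob Data Reader, coarse-grained data access at account or container scope. ACLs give fine-grained data access on individual directories and files. Role assignments are evaluated first: if a role already authorizes the operation, ACLs are not checked; otherwise the ACLs decide. ACL entries only grant permissions, so there is no ACL "deny" that can override a role.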
Default ACLs: When a new file or directory is created, it inherits the default ACLs from its parent directory. This allows setting permissions for future objects.
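Umask: Not configurable through the portal or the CLI commands shown in this chapter. The service applies a fixed umask when new items inherit default ACLs (Microsoft's ACL documentation gives it as 007, which strips all permissions from 'other'), so use default ACLs to shape the permissions of new items.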
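Soft Delete: Blob soft delete and container soft delete can be enabled on accounts with a hierarchical namespace, although older guidance listed them as unsupported. Feature support on these accounts has changed over time, so confirm against Microsoft's Blob Storage feature support table.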
Encryption: Data is encrypted at rest by default using Microsoft-managed keys. Customer-managed keys are supported via Azure Key Vault.
Immutable Storage: Supported via time-based retention policies and legal holds.
Hierarchical Namespace Limitations: Once enabled, certain Blob Storage features are not available, including blob versioning, blob change feed, point-in-time restore, and object replication. Snapshot support has been limited to preview. Blob soft delete and blob inventory, which earlier guidance listed as unsupported, are now available. Lifecycle management handles tiering and expiry; it is not a substitute for versioning.
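The default ACL behavior described in the list above can be sketched as follows. This is a simplified model, not the service's implementation: `Node` and `create_child` are hypothetical names, ACLs are plain dictionaries, and the umask and mask entry are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_dir: bool
    access_acl: dict = field(default_factory=dict)   # checked on access
    default_acl: dict = field(default_factory=dict)  # template, directories only
    children: dict = field(default_factory=dict)


def create_child(parent: Node, name: str, is_dir: bool) -> Node:
    """Create a child and copy the parent's default ACL onto it."""
    child = Node(name=name, is_dir=is_dir)
    if parent.default_acl:
        # The template becomes the child's access ACL ...
        child.access_acl = dict(parent.default_acl)
        if is_dir:
            # ... and child directories pass the template further down.
            child.default_acl = dict(parent.default_acl)
    parent.children[name] = child
    return child


sales = Node("sales", is_dir=True)
before = create_child(sales, "old.csv", is_dir=False)

# Setting a default ACL affects only items created afterwards.
sales.default_acl = {"user:analyst": "r--"}
after = create_child(sales, "new.csv", is_dir=False)
subdir = create_child(sales, "2024", is_dir=True)
```

In this sketch `old.csv` keeps an empty ACL because it existed before the default ACL was set, which mirrors the point that default ACLs apply to future objects only.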
Configuration and Verification Commands
To create a storage account with hierarchical namespace enabled using Azure CLI:
az storage account create \
  --name mystorageaccount \
  --resource-group myResourceGroup \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true
To verify hierarchical namespace status:
az storage account show --name mystorageaccount --query isHnsEnabled
To create a container (filesystem) and a directory:
az storage fs create --name mycontainer --account-name mystorageaccount
az storage fs directory create --name mydir --file-system mycontainer --account-name mystorageaccount
To set ACLs on a directory:
az storage fs access set --path mydir --file-system mycontainer --permissions rwxr-xr-x --account-name mystorageaccount
To list ACLs:
az storage fs access show --path mydir --file-system mycontainer --account-name mystorageaccount
To upload a file:
az storage fs file upload --source localfile.txt --path mydir/file.txt --file-system mycontainer --account-name mystorageaccount
How it interacts with related technologies
Azure Blob Storage: ADLS Gen2 is built on Blob Storage, so the underlying blob APIs are still available. However, Microsoft recommends using the ADLS Gen2 REST API (which uses the dfs.core.windows.net endpoint) for hierarchical operations. The blob endpoint (blob.core.windows.net) still works for reading, writing, and listing data, but operations such as atomic directory rename and ACL management require the DFS endpoint.
Azure Data Lake Analytics: Retired. It was designed around the earlier ADLS Gen1 service, and Microsoft directs those workloads to Azure Synapse Analytics, which uses ADLS Gen2 as its storage layer.
Azure Databricks: Native support through the ABFS driver. Databricks can read and write directly using abfss:// URIs, or through DBFS (Databricks File System) when the filesystem is mounted.
Azure Synapse Analytics: Can query data directly from ADLS Gen2 using serverless SQL pool or dedicated SQL pool.
HDInsight: Can use ADLS Gen2 as the primary storage for Hadoop clusters.
Azure Data Factory: Supports ADLS Gen2 as both source and sink.
Power BI: Can connect to ADLS Gen2 using Power Query.
Azure Storage Firewall and Virtual Networks: Work with ADLS Gen2. You can restrict access to specific VNets or IP addresses.
Private Endpoints: Supported, allowing secure access from your VNet without traversing the internet.
Azure Policy: Can enforce hierarchical namespace at the subscription level.
Performance Considerations
Throughput: ADLS Gen2 can achieve higher throughput than standard Blob Storage for analytics workloads because it can parallelize reads and writes across multiple containers and directories. The hierarchical namespace reduces metadata overhead.
Latency: Premium tier offers single-digit millisecond latency for small I/O operations.
Scalability: No documented limit on the number of files or directories. For request-rate and capacity ceilings, refer to the published scalability targets for Blob Storage accounts instead of relying on a fixed figure.
Cost: Enabling the hierarchical namespace carries no separate fee, but Azure publishes separate price tables for flat-namespace and hierarchical-namespace accounts, and storage and transaction prices can differ between them. Check the pricing page for your region and redundancy option.
Security Details
- Authentication: Supports Azure AD (OAuth2) and shared key (account key). For production, Azure AD is recommended.
- Authorization: Azure RBAC and ACLs work together. RBAC governs the control plane (storage account management) and, through the data roles below, coarse-grained data access; ACLs add directory- and file-level control. The built-in data roles are:
  - Storage Blob Data Owner: Full access including ACL management.
  - Storage Blob Data Contributor: Read/write/delete access.
  - Storage Blob Data Reader: Read-only access.
- ACL Entries: Each ACL entry consists of a security principal (user, group, or service principal), a permission type (read, write, execute), and a scope (access ACL or default ACL). Maximum number of ACL entries per file or directory is 32 (including the four built-in entries: owner, owning group, mask, and other).
- Superuser: Any principal that holds the Storage Blob Data Owner role at the relevant scope is treated as a super-user and can read, write, and change ACLs regardless of the ACLs set on an item. Requests authorized with the shared key are likewise not subject to ACL checks.
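How role assignments and ACLs combine can be sketched in a few lines, following the order Microsoft documents: roles are checked first, and ACLs are consulted only when no role authorizes the operation. Everything here is a simplification for illustration. `ROLE_OPERATIONS`, `is_authorized`, and the dictionary shapes are assumptions, owner, group, and mask handling is left out, and real role definitions are lists of actions, not r/w/x letters.

```python
# Assumed, simplified mapping from built-in data roles to permission letters.
ROLE_OPERATIONS = {
    "Storage Blob Data Owner": {"r", "w", "x"},
    "Storage Blob Data Contributor": {"r", "w", "x"},
    "Storage Blob Data Reader": {"r", "x"},
}


def acl_permits(principal, acl, needed):
    """acl maps a principal (or 'other') to a string such as 'r-x'."""
    perms = acl.get(principal, acl.get("other", "---"))
    return needed in perms


def ancestors(path):
    """Yield '/', '/a', '/a/b' for the path '/a/b/c'."""
    parts = [p for p in path.split("/") if p]
    current = ""
    yield "/"
    for part in parts[:-1]:
        current += "/" + part
        yield current


def is_authorized(principal, roles, needed, path, acls):
    # Step 1: a role that covers the operation is sufficient on its own.
    for role in roles.get(principal, ()):
        if needed in ROLE_OPERATIONS.get(role, set()):
            return True
    # Step 2: otherwise ACLs decide. Execute is needed on every ancestor ...
    for directory in ancestors(path):
        if not acl_permits(principal, acls.get(directory, {}), "x"):
            return False
    # ... plus the requested permission on the target itself.
    return acl_permits(principal, acls.get(path, {}), needed)


acls = {
    "/": {"other": "--x"},
    "/customer1": {"sp1": "rwx", "other": "---"},
    "/customer1/data.csv": {"sp1": "rw-", "other": "---"},
    "/customer2": {"sp2": "rwx", "other": "---"},
    "/customer2/data.csv": {"sp2": "rw-", "other": "---"},
}
roles = {"admin": ["Storage Blob Data Owner"]}
```

With this data, `sp1` can read its own file through ACLs, is stopped at `/customer2` for lack of execute permission, and `admin` reaches everything through the role without any ACL entry.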
Lifecycle Management
ADLS Gen2 supports lifecycle management policies to tier data to cool or archive tiers based on last modification time. Rules are defined on the storage account using Azure CLI or the portal and can be scoped to a container or directory path with prefix filters. Note that once a blob is moved to archive tier, it must be rehydrated before reading, which can take hours.
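A lifecycle policy is a JSON document, so one way to keep rules consistent is to generate it. The sketch below builds a single tiering rule; the helper name `tiering_rule`, the rule name, the prefix, and the day thresholds are illustrative assumptions, while the field names follow the lifecycle management policy schema.

```python
import json


def tiering_rule(name, prefix, cool_after_days, archive_after_days):
    """Build one lifecycle rule that tiers block blobs under a prefix."""
    if archive_after_days <= cool_after_days:
        raise ValueError("archive threshold should be later than the cool threshold")
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "filters": {
                "blobTypes": ["blockBlob"],
                # The prefix starts with the container (filesystem) name.
                "prefixMatch": [prefix],
            },
            "actions": {
                "baseBlob": {
                    "tierToCool": {
                        "daysAfterModificationGreaterThan": cool_after_days
                    },
                    "tierToArchive": {
                        "daysAfterModificationGreaterThan": archive_after_days
                    },
                }
            },
        },
    }


policy = {
    "rules": [tiering_rule("tier-raw-events", "myfilesystem/raw/events", 30, 180)]
}
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file such as policy.json and applied with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.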
Monitoring
Azure Monitor: Provides metrics like egress, ingress, and latency. You can set alerts.
Storage Analytics Logging: Logs read, write, and delete operations. Can be enabled for the blob service (not the DFS endpoint).
Diagnostic Settings: Can be configured to send resource logs to Log Analytics, Event Hubs, or storage.
Common Exam Scenarios
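Enabling hierarchical namespace: Simplest at creation. An existing account can be upgraded, but the change is one-way; assuming it can be switched off again is the trap.
ACL vs RBAC: Role assignments are evaluated first, and ACLs are consulted only when no role authorizes the operation. ACLs apply only to the data plane and offer directory- and file-level granularity; RBAC covers the control plane plus data access at container scope and above.
Limitations: Versioning, change feed, and point-in-time restore are not supported with hierarchical namespace. Candidates often forget this.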
Performance: ADLS Gen2 is better than Blob Storage for analytics because of directory operations.
Security: Using Azure AD is preferred for production. Shared key should be avoided.
Conclusion
Azure Data Lake Storage Gen2 is a powerful enhancement to Blob Storage that brings file system semantics to the cloud. For the AZ-104 exam, focus on its architecture, configuration, security model, and limitations. Understanding these will help you answer questions about storage for big data analytics, data lake architectures, and hybrid scenarios.
Create Storage Account
Navigate to the Azure portal, select 'Create a resource', then 'Storage account'. Fill in the required fields: subscription, resource group, storage account name (must be globally unique, 3-24 characters, lowercase letters and numbers only), region, performance tier (Standard or Premium), redundancy (e.g., LRS, GRS). Under 'Advanced', toggle 'Hierarchical namespace' to Enabled. This is the critical step: once enabled, it cannot be disabled. Skipping it is recoverable, since an existing account can later go through a one-way upgrade, but setting it at creation avoids that extra work. Click 'Review + create' then 'Create'. The deployment takes a few minutes. After creation, verify that hierarchical namespace is enabled by checking the storage account properties or using Azure CLI: `az storage account show --name <account> --query isHnsEnabled`.
Create a Filesystem (Container)
In ADLS Gen2, a container is called a 'filesystem'. It is the top-level logical unit where data is stored. In the Azure portal, go to your storage account, under 'Data Lake Storage', select 'Containers'. Click '+ Container' and provide a name (e.g., 'myfilesystem'). The name must be lowercase, 3-63 characters, and can contain hyphens. You can also create it via CLI: `az storage fs create --name myfilesystem --account-name <account>`. The filesystem is equivalent to a Blob Storage container but optimized for hierarchical namespace operations.
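The naming rules for storage accounts and containers are easy to check before calling the portal or CLI. The following is a minimal sketch with hypothetical helper names. The container pattern also encodes two Azure rules not spelled out above: a name must start and end with a letter or number, and hyphens cannot be consecutive.

```python
import re

# 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")

# 3-63 characters, lowercase letters, digits, and single hyphens,
# starting and ending with a letter or digit.
CONTAINER_NAME = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_account_name(name: str) -> bool:
    return ACCOUNT_NAME.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    return CONTAINER_NAME.fullmatch(name) is not None


for candidate in ("mystorageaccount", "My-Account"):
    print(candidate, is_valid_account_name(candidate))
for candidate in ("myfilesystem", "my--filesystem"):
    print(candidate, is_valid_container_name(candidate))
```

A local check like this only catches format errors; global uniqueness of the account name can still fail at creation time.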
Create Directories and Upload Files
Within the filesystem, you can create directories to organize data. In the portal, navigate to the filesystem, click 'Add directory', and provide a path (e.g., 'sales/2024/January'). You can create nested directories in one go. Upload files by clicking 'Upload' and selecting a file, or use CLI: `az storage fs directory create --name sales/2024/January --file-system myfilesystem --account-name <account>` and `az storage fs file upload --source data.csv --path sales/2024/January/data.csv --file-system myfilesystem --account-name <account>`. The service stores the file as a blob with the full path as its name.
Set Access Control Lists (ACLs)
To secure data, you can set POSIX-like ACLs on directories and files. In the portal, role assignments are made under 'Access Control (IAM)' on the storage account or container, while ACLs are edited by selecting a directory or file in the container browser and choosing 'Manage ACL'. For ACLs, you can add entries for users or groups with read (r), write (w), and execute (x) permissions. Default ACLs can be set on directories to apply to future children. Use CLI: `az storage fs access set --path sales --file-system myfilesystem --permissions rwxr-xr-x --account-name <account>`. Role assignments are evaluated before ACLs; ACLs matter for principals that no role already authorizes, and they can only grant access.
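The nine-character permission string used with `--permissions` maps directly onto octal notation and onto the three base ACL entries. A minimal sketch with hypothetical helper names follows; it handles only r, w, x, and -, not the sticky bit.

```python
BITS = (("r", 4), ("w", 2), ("x", 1))


def triad_value(triad: str) -> int:
    """Convert one 'rwx'-style triad to its octal digit."""
    value = 0
    for char, (flag, bit) in zip(triad, BITS):
        if char == flag:
            value += bit
        elif char != "-":
            raise ValueError(f"unexpected character {char!r} in {triad!r}")
    return value


def split_triads(symbolic: str) -> list:
    if len(symbolic) != 9:
        raise ValueError("expected nine characters, e.g. 'rwxr-xr-x'")
    return [symbolic[i:i + 3] for i in range(0, 9, 3)]


def to_octal(symbolic: str) -> str:
    """'rwxr-xr-x' becomes '755'."""
    return "".join(str(triad_value(t)) for t in split_triads(symbolic))


def to_acl_spec(symbolic: str) -> str:
    """Express the same permissions as owner, group, and other ACL entries."""
    owner, group, other = split_triads(symbolic)
    for triad in (owner, group, other):
        triad_value(triad)  # validates the characters
    return f"user::{owner},group::{group},other::{other}"


print(to_octal("rwxr-xr-x"))     # 755
print(to_acl_spec("rwxr-xr-x"))  # user::rwx,group::r-x,other::r-x
```

The ACL-spec form is the one accepted by the `--acl` option of `az storage fs access set`, where named entries such as `user:<object-id>:r-x` can be appended for specific principals.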
Access Data from Analytics Services
Once data is stored, you can access it from services like Azure Databricks, HDInsight, or Azure Synapse. For example, in Databricks, you can mount the ADLS Gen2 filesystem using a service principal or access key: `dbutils.fs.mount(source="abfss://myfilesystem@myaccount.dfs.core.windows.net/", mount_point="/mnt/mydata", extra_configs=configs)`, where `configs` holds the authentication settings. The ABFS driver uses the `abfss://` scheme to communicate with the DFS endpoint over TLS. For Synapse, a serverless SQL pool can query external data in ADLS Gen2 using OPENROWSET. Performance benefits because the hierarchical namespace allows directory pruning and parallel reads.
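The pieces that a Databricks mount needs can be assembled ahead of time. The sketch below builds the `abfss://` source URI and the OAuth settings commonly passed as `extra_configs` for a service principal. The helper names and all argument values are placeholders, and the `dbutils.fs.mount` call is left as a comment because `dbutils` exists only inside a Databricks notebook.

```python
def abfss_uri(filesystem: str, account: str, path: str = "") -> str:
    """Build an ABFS URI that targets the DFS endpoint over TLS."""
    return (
        f"abfss://{filesystem}@{account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


def oauth_mount_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Hadoop ABFS settings for service principal (client credentials) auth."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


source = abfss_uri("myfilesystem", "myaccount")
configs = oauth_mount_configs("<client-id>", "<client-secret>", "<tenant-id>")

# Inside a Databricks notebook (not runnable elsewhere):
# dbutils.fs.mount(source=source, mount_point="/mnt/mydata", extra_configs=configs)
print(source)
```

In practice the client secret would come from a secret scope, not a literal string.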
Enterprise Scenario 1: Big Data Analytics Pipeline
A large e-commerce company uses ADLS Gen2 as the central data lake for all customer transactions, clickstream logs, and inventory data. Data arrives in real-time from Azure Event Hubs and is stored in ADLS Gen2 in a directory structure like /raw/events/{date}/{hour}/. The company uses Azure Databricks to process this data, transforming it into curated datasets stored in /curated/customers/ and /curated/sales/. The hierarchical namespace allows Databricks to efficiently list only the new files in a directory, rather than scanning all blobs. This reduces job startup time from minutes to seconds. They also set ACLs so that only the data engineering team has write access to /raw/, while analysts have read access to /curated/. Performance scales to hundreds of terabytes, and they use lifecycle management to move data older than 30 days to cool tier, reducing costs. Misconfiguration example: If the hierarchical namespace was not enabled at creation, they would need to run the one-way account upgrade or migrate the data to a new account before the pipeline could rely on directory operations, adding a planning and validation step to the project.
Enterprise Scenario 2: Multi-tenant Data Isolation
A SaaS provider hosts multiple customers in a single ADLS Gen2 account. Each customer's data is stored under a separate directory: /customer1/, /customer2/, etc. They use ACLs to isolate access: each customer's service principal has read/write access only to its own directory. RBAC is used for administrative tasks. The hierarchical namespace allows easy listing of all files for a customer without leaking data. They also use Azure Storage Firewall and Private Endpoints to ensure that only their VNet can access the storage account. Common pitfall: forgetting that role assignments are evaluated before ACLs. If a customer's service principal is inadvertently given a data role such as Storage Blob Data Reader at account or container scope, it can read every customer's directory no matter how the ACLs are set. They mitigate this by granting customer principals access through ACLs only, reserving data roles for administrators, and reviewing audit logs for unexpected access.
Enterprise Scenario 3: Hybrid Cloud with Azure Stack
A financial services firm uses ADLS Gen2 for on-premises analytics with Azure Stack Hub. They replicate data from on-premises Hadoop clusters to ADLS Gen2 in Azure for long-term retention and disaster recovery. The hierarchical namespace provides HDFS-compatible semantics, which eases migration from on-premises Hadoop file systems. They use Azure Data Factory to orchestrate incremental data loads. Performance consideration: network latency between on-premises and Azure can impact throughput, so they use Azure ExpressRoute for dedicated connectivity. Misconfiguration: If they used flat-namespace Blob Storage instead of ADLS Gen2, Hadoop jobs that depend on HDFS-like behavior such as atomic directory rename would run slowly or behave unexpectedly. They learned this the hard way after a migration project delay.
The AZ-104 exam tests Azure Data Lake Storage Gen2 primarily under Objective 2.2 'Configure Azure Storage security' and implicitly under 'Create and configure storage accounts' (Objective 2.1). Expect 2-3 questions that directly reference ADLS Gen2, often in the context of security, performance, or limitations.
Common Wrong Answers and Why Candidates Choose Them
'Hierarchical namespace can be switched off after it is enabled' – This is false. Candidates often confuse it with features that can be toggled freely, like soft delete. The setting is one-way: it can be chosen at account creation or applied to an existing account through an upgrade, but never removed.
'ADLS Gen2 supports every Blob Storage feature' – False. Blob versioning, change feed, and point-in-time restore are not supported with hierarchical namespace enabled. Candidates may assume all blob features carry over.
'ACLs override RBAC' – False. Role assignments are evaluated first, and a role that authorizes the operation is sufficient on its own. ACLs are consulted only when no role grants access, and they contain no deny entries. The accurate statement is 'RBAC is evaluated before ACLs; ACLs can add access but cannot take away what a role grants.'
'The blob and DFS endpoints are interchangeable' – False. The blob endpoint can read, write, and list data, but atomic directory rename and ACL management go through the DFS endpoint (dfs.core.windows.net).
Specific Numbers and Terms
Storage account type: StorageV2 (general purpose v2), or premium block blob (BlockBlobStorage) for the Premium tier.
Hierarchical namespace: Enable at creation or through a one-way upgrade; cannot be disabled.
ACL limit: 32 ACL entries per file/directory (including 4 built-in).
Default ACL: Applied to new children.
Performance tier: Premium for low latency; Standard for cost.
Redundancy: LRS, ZRS, GRS, RA-GRS, GZRS, RA-GZRS.
Unsupported features: Blob versioning, change feed, point-in-time restore.
Authentication: Azure AD recommended; shared key supported.
Edge Cases and Exceptions
Immutable storage: Supported via time-based retention policies and legal holds. However, if you enable immutable storage, you cannot delete or modify blobs during the retention period. This can conflict with lifecycle management.
Soft delete for blobs: Available on accounts with a hierarchical namespace, although older guidance listed it as unsupported. Verify how deleted directories and their contents are restored in a test account before relying on it.
Customer-managed keys: Supported, but if you revoke the key, access to data is blocked. This can cause cascading failures in analytics pipelines.
Zone-redundant storage (ZRS): Not available in all regions. Check regional availability.
How to Eliminate Wrong Answers
If an answer claims the hierarchical namespace can be disabled after it is enabled, eliminate it. Enabling it on an existing account is possible only through the one-way upgrade; older material may still describe migration to a new account as the only route.
If a question lists Blob Storage features and asks which work with ADLS Gen2, rule out versioning, change feed, and point-in-time restore first.
For security questions, remember that RBAC data roles apply at container scope or above, while ACLs apply to individual directories and files. If a scenario involves granting a user read access to one specific directory, the answer is ACL, not RBAC.
For performance, ADLS Gen2 is better for analytics because of directory operations. If a comparison question asks which storage is faster for listing files in a folder, ADLS Gen2 is correct.
Exam Tips
Memorize the list of unsupported features: 'No versioning, no change feed, no point-in-time restore.'
Know that ACLs have a limit of 32 entries.
Understand that the abfss:// driver is used by Spark/Hadoop.
Practice creating a storage account with CLI and verifying HNS.
ADLS Gen2 is Blob Storage with hierarchical namespace enabled; the setting is one-way and is simplest to choose at account creation.
Available on StorageV2 (general purpose v2) and premium block blob accounts.
Unsupported features include blob versioning, change feed, and point-in-time restore.
RBAC role assignments are evaluated before ACLs; ACLs only grant access and cannot override a role.
Use DFS endpoint (dfs.core.windows.net) for directory rename and ACL operations; the blob endpoint works but is limited.
Maximum 32 ACL entries per file/directory (including 4 built-in).
Default ACLs on directories apply to new child objects.
Azure AD authentication is recommended over shared key for production.
Lifecycle management policies can tier data to cool/archive tiers.
The Premium performance tier offers lower latency for analytics workloads.
These come up on the exam all the time. Here's how to tell them apart.
Azure Blob Storage (Flat Namespace)
No hierarchical directory structure; all blobs are at the same level.
Listing files in a 'folder' requires scanning all blobs or using prefixes, which is slower.
Supports blob snapshots, soft delete, change feed, and point-in-time restore.
Ideal for general-purpose storage, backups, and content delivery.
REST API uses blob.core.windows.net endpoint.
Azure Data Lake Storage Gen2
Hierarchical namespace with directories and subdirectories.
Directory operations (list, rename) are fast and efficient, even with millions of files.
Does not support blob versioning, change feed, or point-in-time restore.
Optimized for big data analytics workloads (Spark, Hive, Databricks).
REST API uses dfs.core.windows.net endpoint for file system operations.
Mistake
ADLS Gen2 is a separate type of storage account, like BlobStorage or FileStorage.
Correct
ADLS Gen2 is not a separate account type; it is a feature (hierarchical namespace) that can be enabled on a general-purpose v2 storage account. The account kind remains StorageV2.
Mistake
Hierarchical namespace can be turned on and off as needed, like soft delete.
Correct
Hierarchical namespace is a one-way setting. It can be enabled at storage account creation or applied to an existing account through an upgrade, but it cannot be disabled afterwards. To return to a flat namespace, you must migrate data to a new account.
Mistake
ADLS Gen2 supports all Blob Storage features, including versioning and point-in-time restore.
Correct
ADLS Gen2 does not support blob versioning, blob change feed, or point-in-time restore. Check Microsoft's feature support table before relying on any other Blob Storage feature in an account with the hierarchical namespace.
Mistake
The blob endpoint (blob.core.windows.net) can be used for all ADLS Gen2 operations, including directory rename.
Correct
The blob endpoint can read, write, and list data, but it does not support hierarchical namespace operations like atomic directory rename or ACL management. You must use the DFS endpoint (dfs.core.windows.net) for those operations.
Mistake
ACLs are evaluated before RBAC, and a restrictive ACL can block a user who holds a data role.
Correct
Role assignments are evaluated first. If a role authorizes the operation, the request succeeds and ACLs are not checked. ACLs are evaluated only when no role grants access, and they can only add permissions.
Yes, but only through a one-way upgrade. Hierarchical namespace is normally enabled during storage account creation; an existing flat-namespace account can be upgraded in place, and once enabled the feature cannot be disabled. Alternatively, create a new storage account with HNS enabled and migrate your data.
General-purpose v2 (StorageV2) and premium block blob accounts support ADLS Gen2. Legacy BlobStorage and general-purpose v1 accounts do not support the hierarchical namespace feature.
Not fully. Snapshot support with hierarchical namespace enabled has been limited to preview, and blob versioning is not supported. Treat both as limitations when planning data protection.
You can set POSIX-like ACLs using the Azure portal, CLI, or REST API. For example, using CLI: `az storage fs access set --path mydir --file-system mycontainer --permissions rwxr-xr-x --account-name myaccount`. You can also use Azure RBAC for control plane access.
The blob endpoint (blob.core.windows.net) is for standard Blob Storage operations. The DFS endpoint (dfs.core.windows.net) is for ADLS Gen2 file system operations like directory create, rename, and ACL management. For full ADLS Gen2 functionality, use the DFS endpoint.
Yes, Azure Databricks has native support for ADLS Gen2 via the `abfss://` driver. You can mount an ADLS Gen2 filesystem or access it directly using service principal authentication.
ADLS Gen2 supports LRS, GRS, RA-GRS, ZRS, GZRS, and RA-GZRS. The choice depends on your durability and availability requirements.