DP-900 · Chapter 77 of 101 · Objective 3.1

Azure Data Lake Storage Gen2 Hierarchical Namespace

This chapter covers Azure Data Lake Storage Gen2 (ADLS Gen2) hierarchical namespace, a key feature that turns Azure Blob Storage into file-system-like storage for big data analytics. For the DP-900 exam, this topic appears in Domain 3 (Analytics), Objective 3.1: Describe the core data workloads. Approximately 10-15% of exam questions touch on ADLS Gen2, focusing on its benefits over flat blob storage, how it enables POSIX-like access, and its integration with analytics services. You will need to understand the hierarchical namespace mechanism, its advantages for cost and performance, and common use cases like data lakes.

25 min read
Intermediate
Updated May 31, 2026

Library Filing System with Aisle Signs

Imagine a library where books are stored in a single giant pile in a warehouse. To find a specific book, you must consult a card catalog that lists every book's exact coordinates in the pile (e.g., 'shelf 47, stack 3, position 12'). This is like blob storage with a flat namespace: you can find an item quickly if you know its full name, but there are no real subject or author sections to browse. Now consider a library organized by aisles labeled 'Science,' 'History,' and 'Fiction,' and within each aisle, shelves labeled by subcategory (e.g., 'Physics,' 'Chemistry'). You can walk to the 'Science' aisle, then to the 'Physics' shelf, and find all books there. This is the hierarchical namespace. The aisle signs and shelf labels are like directories. You can navigate the structure, set permissions at any level (e.g., allow only certain patrons to access 'History'), and move an entire shelf to a different aisle without re-indexing every book. The hierarchical namespace enables this by storing directory entries as first-class objects, not just name prefixes. When you rename a folder, the system updates a single directory entry, not millions of blob names. This turns operations like 'move' or 'delete a folder' into single metadata operations whose cost does not grow with the number of files inside.

How It Actually Works

What is Azure Data Lake Storage Gen2?

Azure Data Lake Storage Gen2 (ADLS Gen2) is a set of capabilities dedicated to big data analytics, built on top of Azure Blob Storage. It combines the scalability and cost-effectiveness of blob storage with a hierarchical namespace that organizes objects into a directory hierarchy similar to a file system. The same data can therefore be accessed through object storage interfaces (the Blob REST API and SDKs) and through file system interfaces: the Data Lake Storage REST API on the DFS endpoint, the Hadoop Distributed File System (HDFS) interface via the Azure Blob File System (ABFS) driver, and, when enabled, NFS 3.0.
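The hierarchical namespace is layered on Blob Storage, so one account answers on two endpoints. The Blob endpoint serves the object APIs. The DFS endpoint serves the ABFS driver and the Data Lake APIs. The minimal sketch below only derives both URLs from an account name. It assumes the public Azure cloud suffix `core.windows.net`, and `storage_endpoints` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print both service endpoints of one storage account.
# Assumes the public Azure cloud (core.windows.net); sovereign clouds use other suffixes.
storage_endpoints() {
  local account="$1"
  printf 'blob: https://%s.blob.core.windows.net\n' "$account"  # object (Blob) APIs
  printf 'dfs: https://%s.dfs.core.windows.net\n' "$account"    # Data Lake / ABFS APIs
}

storage_endpoints mystorageaccount
```

On an HNS-enabled account, both endpoints reach the same stored files. This is what lets Blob-based tools and Hadoop-based tools share one data lake.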

Why Hierarchical Namespace Exists

Traditional Azure Blob Storage uses a flat namespace. Every blob sits directly in its container, and paths like container/folder1/file.txt are just a naming convention: 'folder1' is not a real directory object. This flat structure causes several issues for analytics workloads:
- Rename operations are O(n): Blob Storage has no rename operation, so renaming a 'folder' containing 1 million files means copying each blob to its new name and deleting the original. This can take minutes or hours.
- No atomic directory operations: Deleting a 'folder' requires listing and deleting each blob one by one, so a failure partway through leaves the folder half-deleted.
- No POSIX permissions: There are no directory- or file-level ACLs. Access is granted through Azure RBAC roles scoped to the account or container, or through shared access signatures.

ADLS Gen2 solves these by introducing a hierarchical namespace where directories are first-class objects. Each directory has its own metadata and can be manipulated atomically.

How It Works Internally

At the storage layer, ADLS Gen2 stores data in the same distributed system as blob storage, but the namespace is managed differently. The hierarchical namespace is implemented as a separate metadata layer that keeps track of the directory tree. When you create a directory, the system creates a directory entry object. When you rename a directory, only the directory entry's name is updated—the blobs within are not touched. This makes rename and delete operations O(1) (constant time) regardless of the number of objects inside.
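A toy model can illustrate the difference in rename cost. It is not how the service is implemented internally; it only counts name updates. It rests on two assumptions: in the flat model every blob stores its full path, and in the hierarchical model each file points at a parent directory entry. The function names are hypothetical, and the script needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model of renaming raw/ to staged/. Requires bash 4+ (associative arrays).

# Flat namespace: the "folder" exists only as a prefix inside each blob name,
# so every blob under it must be rewritten (real Blob Storage: copy + delete).
flat_rename_ops() {
  local n_files="$1" ops=0 i name
  declare -A blobs=()
  for ((i = 0; i < n_files; i++)); do
    blobs["raw/file$i.csv"]=1
  done
  for name in "${!blobs[@]}"; do       # key list is expanded once, before the loop
    blobs["staged/${name#raw/}"]=1     # write the blob under its new name
    unset "blobs[$name]"               # remove the old name
    ops=$((ops + 1))
  done
  echo "$ops"
}

# Hierarchical namespace: files reference a directory entry by id, so the
# rename touches only that one entry; resolved paths change automatically.
hns_rename_ops() {
  local n_files="$1" ops=0 i
  declare -A dir_name=([1]="raw")      # directory id -> directory name
  declare -A file_parent=()            # file name -> parent directory id
  for ((i = 0; i < n_files; i++)); do
    file_parent["file$i.csv"]=1
  done
  dir_name[1]="staged"                 # the whole rename: one metadata update
  ops=$((ops + 1))
  # Print the operation count and the path file0.csv now resolves to.
  echo "$ops ${dir_name[${file_parent[file0.csv]}]}/file0.csv"
}

flat_rename_ops 1000   # 1000
hns_rename_ops 1000    # 1 staged/file0.csv
```

The flat count grows with the number of files. The hierarchical count stays at one, which is the O(n) versus O(1) contrast described above.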

Key components:
- Storage account: ADLS Gen2 is enabled on a StorageV2 (general-purpose v2) or BlockBlobStorage account by setting the 'Hierarchical namespace' property to 'Enabled', normally at account creation. An existing account can also be upgraded in place, but the change is one-way: hierarchical namespace cannot be disabled afterward.
- Container: The top-level namespace in which data is organized. In ADLS Gen2, a container is equivalent to a file system root.
- Directory: A logical grouping of files and subdirectories. Directories are stored as metadata objects.
- File: A blob stored in the hierarchy. Files have both blob properties and file system attributes (e.g., POSIX permissions).

POSIX Access Control Lists (ACLs)

ADLS Gen2 supports both POSIX-style ACLs and Azure role-based access control (RBAC). POSIX ACLs can be set at the directory or file level and control access for the owning user, owning group, named users, named groups, and others. There are two types:
- Access ACLs: Control access to an object (a directory or a file).
- Default ACLs: Exist only on directories and act as templates for the access ACLs of new children created under them.

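Each ACL entry grants some combination of read (r), write (w), and execute (x). The entries for the owning user, owning group, and others are often written together in octal, one digit per entry (r=4, w=2, x=1). For example, 755 means rwx for the owner and r-x for both the owning group and others. When checking a request, the service works through the entries in a fixed order: owning user, then named users, then owning and named groups, then other. As a result, the owner's entry applies to the owner even if other entries are more generous. A mask entry caps what named users and groups can receive.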
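The octal shorthand packs the owner, owning-group, and other entries into three digits, each the sum of its r/w/x bit values. Below is a minimal sketch of the conversion in both directions. The function names are hypothetical, and the code assumes a well-formed 9-character string or 3-digit mode.

```bash
#!/usr/bin/env bash
# Bit values for the three positions of each rwx triplet.
PERM_BITS=(4 2 1)

# "rwxr-xr-x" -> "755": sum the bits of every non-dash character per triplet.
perm_to_octal() {
  local perms="$1" out="" i j digit
  for ((i = 0; i < 9; i += 3)); do
    digit=0
    for ((j = 0; j < 3; j++)); do
      if [[ ${perms:i+j:1} != "-" ]]; then
        digit=$((digit + PERM_BITS[j]))
      fi
    done
    out+="$digit"
  done
  echo "$out"
}

# "750" -> "rwxr-x---": test each bit of every digit.
octal_to_perm() {
  local octal="$1" out="" i d
  for ((i = 0; i < 3; i++)); do
    d=${octal:i:1}
    if ((d & 4)); then out+="r"; else out+="-"; fi
    if ((d & 2)); then out+="w"; else out+="-"; fi
    if ((d & 1)); then out+="x"; else out+="-"; fi
  done
  echo "$out"
}

perm_to_octal rwxr-xr-x   # 755
octal_to_perm 750         # rwxr-x---
```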

ABFS Driver and HDFS Compatibility

ADLS Gen2 exposes an HDFS-compatible interface through the Azure Blob File System (ABFS) driver, which translates Hadoop file system operations into REST calls against the ADLS Gen2 DFS endpoint. The URI scheme is abfs://<container>@<storageaccount>.dfs.core.windows.net/<path>. This allows tools like Apache Hadoop, Spark, and Hive to use ADLS Gen2 as if it were HDFS, typically by changing storage paths and configuration rather than application code.
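A small helper makes the URI format concrete. This sketch assumes the public-cloud DFS host suffix. It also accepts `abfss://`, the TLS-encrypted form of the same scheme, which is commonly used in practice. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Build an ABFS URI: <scheme>://<container>@<account>.dfs.core.windows.net/<path>
# $4 is optional: "abfs" (default, as in the text) or "abfss" (TLS).
abfs_uri() {
  local container="$1" account="$2" path="${3#/}" scheme="${4:-abfs}"
  printf '%s://%s@%s.dfs.core.windows.net/%s\n' \
    "$scheme" "$container" "$account" "$path"
}

abfs_uri mycontainer mystorageaccount /data/file.csv
# abfs://mycontainer@mystorageaccount.dfs.core.windows.net/data/file.csv
abfs_uri mycontainer mystorageaccount data/file.csv abfss
# abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/file.csv
```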

Performance and Cost Benefits

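Atomic rename and delete: Directory rename and delete are single atomic metadata operations. A job can therefore write output to a temporary directory and publish it with one rename, which keeps job latency low and prevents readers from seeing partial results.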

Parallel access: The hierarchical namespace enables efficient parallel processing by allowing jobs to split work by directory.

Cost: Storage costs are the same as blob storage (hot, cool, archive tiers). However, the hierarchical namespace can reduce transaction costs by eliminating the need to list and delete millions of blobs.
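To put numbers on the transaction-cost point, the arithmetic below counts REST calls for deleting a 'folder' of 1,000,000 blobs. It makes three assumptions:
- List Blobs returns at most 5,000 names per page.
- Each blob needs one Delete Blob call (batch requests are ignored).
- An HNS account deletes the directory with a single recursive call. In practice, the service may split very large deletes using a continuation token.

It counts calls only, because prices vary by tier and region. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Count REST calls needed to delete every blob under one "folder".
delete_call_counts() {
  local n_blobs="$1" page_size=5000
  local list_calls=$(( (n_blobs + page_size - 1) / page_size ))  # ceiling division
  local flat_calls=$(( list_calls + n_blobs ))                     # list pages + one delete each
  echo "flat=$flat_calls hns=1"
}

delete_call_counts 1000000   # flat=1000200 hns=1
```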

Interaction with Other Azure Services

Azure Data Factory: Can ingest data into ADLS Gen2 using copy activities.

Azure Databricks: Can read/write data using the ABFS driver.

Azure Synapse Analytics: Can query data in ADLS Gen2 using serverless SQL or dedicated SQL pools.

Azure HDInsight: Supports ADLS Gen2 as primary storage for clusters.

Azure Machine Learning: Can use ADLS Gen2 as a datastore.

Configuration and Verification

To create an ADLS Gen2 storage account with hierarchical namespace enabled using Azure CLI:

az storage account create \
    --name mystorageaccount \
    --resource-group myResourceGroup \
    --location eastus \
    --sku Standard_GRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true

To verify if hierarchical namespace is enabled on an existing account:

az storage account show \
    --name mystorageaccount \
    --resource-group myResourceGroup \
    --query isHnsEnabled

This returns true or false.
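In scripts, it helps to request plain text with `--output tsv` and normalize the answer. The sketch below assumes the value is piped in from the command above. Matching is case-insensitive, and an empty value (the property may come back null on accounts created without HNS) is treated as not enabled. It requires bash 4+ for `${var,,}`, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Read the isHnsEnabled value from stdin and print a normalized status.
hns_status() {
  local value=""
  read -r value || true          # empty input leaves value empty
  case "${value,,}" in           # lowercase (bash 4+)
    true) echo "enabled" ;;
    *)    echo "not enabled" ;;  # false, empty, or anything unexpected
  esac
}

# Usage against a real account (requires Azure CLI and a signed-in session):
#   az storage account show --name mystorageaccount \
#     --resource-group myResourceGroup --query isHnsEnabled --output tsv | hns_status
printf 'true\n' | hns_status   # enabled
```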

Default Values and Limits

Maximum storage account capacity: 5 PiB by default; a higher limit can be requested from Azure Support.

Maximum file size: about 190.7 TiB (50,000 blocks of up to 4,000 MiB each) with current service versions. Older service versions limited a block blob to about 4.75 TiB (50,000 blocks of 100 MiB).

Maximum directory depth: No documented limit, but practical limits due to path length (max 1024 characters).

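Request rate: There is no fixed limit on concurrent connections, but Azure publishes scalability targets, such as a maximum request rate per storage account and up to 500 requests per second for a single blob. Traffic beyond these targets is throttled with HTTP 503 (Server Busy) or 500 (Operation Timeout) responses.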
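The 190.7 TiB file-size figure above is straightforward arithmetic: a block blob holds at most 50,000 blocks, and current service versions allow blocks of up to 4,000 MiB. Here is a one-line check with awk, using 1 TiB = 1,048,576 MiB. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Maximum block blob size in TiB = blocks x MiB per block / 1,048,576 MiB per TiB.
max_blob_tib() {
  awk -v blocks="$1" -v mib="$2" 'BEGIN { printf "%.2f\n", blocks * mib / 1048576 }'
}

max_blob_tib 50000 4000   # 190.73
```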

Common Commands

To create a directory:

az storage fs directory create \
    --name mydirectory \
    --file-system mycontainer \
    --account-name mystorageaccount

To list files and directories:

az storage fs file list \
    --file-system mycontainer \
    --path mydirectory \
    --account-name mystorageaccount

To set POSIX ACLs:

az storage fs access set \
    --acl "user::rwx,group::r-x,other::r--" \
    --path mydirectory \
    --file-system mycontainer \
    --account-name mystorageaccount
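Writing ACL strings by hand is error-prone, so a helper can build the short-form string for `--acl` from an octal mode. This sketch assumes the `[scope:]type:id:permissions` entry format used by Data Lake Storage ACLs. With a second argument `default`, it also appends the same entries under the `default:` scope, so new children of a directory inherit them. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Build an ACL string such as "user::rwx,group::r-x,other::r--" from "754".
# Optional $2 == "default": also emit default-scope copies (directories only).
acl_from_octal() {
  local octal="$1" scope="${2:-}" spec="" i d perm
  local -a types=(user group other)
  for ((i = 0; i < 3; i++)); do
    d=${octal:i:1}
    perm=""
    if ((d & 4)); then perm+="r"; else perm+="-"; fi
    if ((d & 2)); then perm+="w"; else perm+="-"; fi
    if ((d & 1)); then perm+="x"; else perm+="-"; fi
    spec+="${spec:+,}${types[i]}::$perm"   # comma only between entries
  done
  if [[ $scope == default ]]; then
    # Prefix every existing entry with "default:" and append the copies.
    spec+=",default:${spec//,/,default:}"
  fi
  echo "$spec"
}

acl_from_octal 754           # user::rwx,group::r-x,other::r--
acl_from_octal 750 default
```

The output can be passed to the `--acl` argument of the `az storage fs access set` command shown above. That argument sets the full ACL on the path, so any entry you leave out of the string is not kept.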

Exam-Relevant Details

Hierarchical namespace can be enabled at account creation or later by upgrading an existing account. Either way the change is one-way: HNS cannot be disabled afterward.

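ADLS Gen2 is built on Blob Storage, so it inherits many blob features, such as access tiers, lifecycle management, and soft delete. Support for some blob features differs on HNS-enabled accounts, so check Microsoft's feature-support table before relying on one.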

The ABFS driver is the recommended way to access ADLS Gen2 from Hadoop/Spark; WASB (Windows Azure Storage Blob) driver is legacy and does not support hierarchical namespace.

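ADLS Gen2 supports both POSIX ACLs and RBAC. Azure role assignments are evaluated first: if a role grants the operation, access is allowed without checking ACLs. ACLs are evaluated only when no role grants the operation.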

NFS 3.0 is not enabled by default, even on an account with hierarchical namespace. It requires HNS and must be turned on as a separate protocol setting when the account is created.

Walk-Through

1

Enable HNS at Account Creation

When creating a new Azure Storage account of kind StorageV2 (general-purpose v2) or BlockBlobStorage, you can set the 'Hierarchical namespace' property to 'Enabled' in the Azure portal, or pass `--enable-hierarchical-namespace true` (short form `--hns true`) in Azure CLI. Enabling HNS is irreversible. An existing account without HNS can be upgraded in place rather than migrated. The upgrade first validates the account for incompatible features, and once it completes the account can never return to a flat namespace. The account creation process provisions the metadata layer for the hierarchical namespace alongside the blob storage backend.

2

Create a Container as Root

After the storage account is created, you create a container (equivalent to a file system root). In the Azure portal, you navigate to the 'Containers' blade and add a container. Using CLI: `az storage container create --name mycontainer --account-name mystorageaccount`. The container is the top-level namespace. All directories and files will reside under this container.

3

Create Directories and Upload Files

With the container ready, you create directories using the `az storage fs directory create` command or via Azure portal. Directories are metadata objects that organize files. When you upload a file, you can specify a directory path. The ABFS driver or REST API will create any intermediate directories automatically if they don't exist. Files are stored as block blobs but with additional file system attributes (POSIX permissions, modification time).
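The parent directories implied by an upload path can be listed mechanically. This sketch does not call Azure. It only prints, outermost first, the directory entries that must exist (or that the service creates) for a given file path. The function name and example path are hypothetical.

```bash
#!/usr/bin/env bash
# Print each directory implied by a file path, outermost first.
parent_dirs() {
  local path="${1#/}" prefix="" part
  local -a parts
  IFS='/' read -r -a parts <<< "$path"
  # Every component except the last (the file name) is a directory.
  for part in "${parts[@]:0:${#parts[@]}-1}"; do
    prefix+="${prefix:+/}$part"
    echo "$prefix"
  done
}

parent_dirs /logs/2025/03/01/events.json
# logs
# logs/2025
# logs/2025/03
# logs/2025/03/01
```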

4

Set POSIX ACLs on Directory

To control access, you set POSIX ACLs on a directory using `az storage fs access set` or the Azure portal. For example, to give read/write/execute to the owner, read/execute to the group, and read-only to others: `--acl "user::rwx,group::r-x,other::r--"`. To set default ACLs, add entries with the `default:` scope to the same string (for example, `default:user::rwx`). These ACLs are stored as metadata on the directory object. When a request arrives, the service checks Azure role assignments first. If a role grants the operation, access is allowed without consulting ACLs; otherwise the ACLs decide.
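For requests authorized with Microsoft Entra ID (formerly Azure AD), the documented evaluation order can be reduced to a small decision function. This is a simplified model. It assumes two yes/no inputs have already been worked out: whether an Azure role assignment grants the operation, and whether the ACLs grant it. It leaves out superusers, Shared Key and SAS authorization, and role-assignment conditions. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# $1: "yes" if an Azure role assignment grants the operation
#     (e.g. Storage Blob Data Reader for a read)
# $2: "yes" if the POSIX ACLs on the path grant it
authorize() {
  local rbac_grants="$1" acl_grants="$2"
  if [[ $rbac_grants == yes ]]; then
    echo "allow (RBAC; ACLs not evaluated)"   # a role grant wins outright
  elif [[ $acl_grants == yes ]]; then
    echo "allow (ACL)"                        # ACLs are consulted only now
  else
    echo "deny"
  fi
}

authorize yes no   # allow (RBAC; ACLs not evaluated)
authorize no yes   # allow (ACL)
authorize no no    # deny
```

The first case shows why an ACL cannot take away access that a data-plane role already grants.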

5

Access Data via ABFS Driver

Analytics services like Azure Databricks or HDInsight connect to ADLS Gen2 using the ABFS driver. Paths are addressed with the URI scheme `abfs://<container>@<storageaccount>.dfs.core.windows.net/<path>`. The driver translates HDFS calls (e.g., `mkdir`, `rename`, `getFileStatus`) to REST API calls against the DFS endpoint (`*.dfs.core.windows.net`). This enables existing Hadoop/Spark applications to work without modification.
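Going the other way, a URI can be split into its parts with plain parameter expansion, which is handy when checking cluster configuration. This sketch assumes a well-formed `abfs://` or `abfss://` URI. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Split <scheme>://<container>@<account>.dfs.core.windows.net/<path>.
parse_abfs_uri() {
  local uri="$1"
  local scheme="${uri%%://*}"
  local rest="${uri#*://}"                # container@host/path
  local authority="${rest%%/*}"           # container@host
  local container="${authority%%@*}"
  local host="${authority#*@}"
  local account="${host%%.*}"             # first label of the host name
  local path=""
  if [[ $rest == */* ]]; then
    path="${rest#*/}"
  fi
  printf 'scheme=%s container=%s account=%s path=%s\n' \
    "$scheme" "$container" "$account" "$path"
}

parse_abfs_uri abfs://mycontainer@mystorageaccount.dfs.core.windows.net/data/file.csv
# scheme=abfs container=mycontainer account=mystorageaccount path=data/file.csv
```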

6

Perform Atomic Operations

With hierarchical namespace, operations like renaming a directory are atomic and O(1). For example, `az storage fs directory move` renames the directory metadata object; no files are moved. Similarly, deleting a directory with `az storage fs directory delete` removes the directory and its contents atomically. This is a key exam point. Compare it with flat blob storage, where a rename means copying and deleting every blob under the prefix.

What This Looks Like on the Job

Enterprise Scenario 1: Data Lake for a Retail Company

A large retailer uses ADLS Gen2 as its central data lake to store clickstream logs, sales transactions, and inventory data. The hierarchical namespace is essential because data is organized by year/month/day (e.g., /logs/2025/03/01/). Each day's data arrives in a staging directory, is validated, and is then moved to the final directory. Without HNS, moving a day's folder (containing thousands of files) would require copying and deleting each file individually, which can take minutes. With HNS, the move is a single atomic directory rename. The company also sets POSIX ACLs that give the marketing team read access to sales data but not to inventory data. They use Azure Data Factory to orchestrate ingestion and Azure Databricks for processing. The storage account is configured with geo-redundant storage (GRS) for disaster recovery. A common oversight is not enabling HNS at creation, which later forces a one-way account upgrade or a data migration.
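The date-partitioned layout is easy to generate consistently. Below is a sketch of one possible naming convention. It assumes a `staging/` prefix for incoming data; the prefix and the function names are hypothetical and not part of the scenario.

```bash
#!/usr/bin/env bash
# Final directory for one day's data, e.g. logs/2025/03/01.
partition_dir() {
  local dataset="$1" y="$2" m="$3" d="$4"
  # 10# forces base 10 so "08" or "09" is not rejected as invalid octal.
  printf '%s/%04d/%02d/%02d\n' "$dataset" "$((10#$y))" "$((10#$m))" "$((10#$d))"
}

# Matching staging directory where the day lands before validation.
staging_dir() {
  printf 'staging/%s\n' "$(partition_dir "$@")"
}

partition_dir logs 2025 3 1     # logs/2025/03/01
staging_dir logs 2025 08 01     # staging/logs/2025/08/01
```

With HNS, promoting a validated day is one directory move from the staging path to the final path (for example, with `az storage fs directory move`). On a flat namespace, the same step is a copy and a delete for every file.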

Enterprise Scenario 2: Genomic Research with Fine-Grained Access

A research institute stores genomic sequencing data in ADLS Gen2. Each research project gets a dedicated container, and within it, directories for raw data, processed results, and publications. The hierarchical namespace allows setting default ACLs on the project root so that new files automatically inherit permissions: only the principal investigator has write access, while team members have read/execute. This is critical for compliance with data privacy regulations. The institute uses NFS 3.0 to mount the storage on Linux compute nodes for legacy tools, so it enabled the NFS 3.0 protocol feature on the storage account. For performance, the institute uses premium block blob storage for hot data to ensure sufficient throughput. A common pitfall is forgetting that NFS 3.0 requires the hierarchical namespace to be enabled. Another is forgetting that NFS 3.0 clients must connect from an approved virtual network (through a service endpoint or private endpoint) rather than over the public internet.

Enterprise Scenario 3: IoT Data Ingestion Pipeline

A smart city project collects sensor data from thousands of devices. Data arrives in real time via Azure Event Hubs and is stored in ADLS Gen2, partitioned by device ID and timestamp (e.g., /devices/sensor001/2025/03/01/hourly/). This directory layout lets users query a specific device's data with Azure Synapse serverless SQL by pointing the query at that device's directory. Lifecycle management policies move data older than 30 days to the cool tier and delete it after 1 year. Lifecycle rules select blobs by prefix filter (such as devices/) and act on the blobs (files) under that prefix rather than on directory objects. This works the same way whether or not HNS is enabled. A common misconfiguration is writing a rule as if it targeted a directory object, or omitting the container name from the prefix filter, so the rule matches nothing.
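The lifecycle rule in this scenario could look like the policy below. This sketch assumes the Azure Storage lifecycle management policy schema (rules with `filters` and `actions`), a container named `mycontainer`, and a hypothetical rule name. Note that `prefixMatch` values start with the container name. A shell function emits the JSON so the day counts can be parameterized.

```bash
#!/usr/bin/env bash
# Emit a lifecycle management policy: tier block blobs under a prefix to cool
# after $2 days without modification, and delete them after $3 days.
lifecycle_policy() {
  local prefix="$1" cool_days="$2" delete_days="$3"
  cat <<EOF
{
  "rules": [
    {
      "enabled": true,
      "name": "devicesCoolThenDelete",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["$prefix"]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": $cool_days },
            "delete": { "daysAfterModificationGreaterThan": $delete_days }
          }
        }
      }
    }
  ]
}
EOF
}

lifecycle_policy mycontainer/devices/ 30 365
```

To apply the policy, save it to a file, for example with `lifecycle_policy mycontainer/devices/ 30 365 > policy.json`. Then run `az storage account management-policy create --account-name mystorageaccount --resource-group myResourceGroup --policy @policy.json`.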

How DP-900 Actually Tests This

What DP-900 Tests on This Topic

DP-900 objective 3.1: 'Describe the core data workloads' includes understanding when to use Azure Data Lake Storage Gen2 vs. Blob Storage. The exam focuses on:

The key differentiator: hierarchical namespace vs. flat namespace.

Benefits: atomic directory operations, POSIX permissions, HDFS compatibility via ABFS driver.

Use cases: big data analytics, data lakes, any workload needing a file system structure.

Limitations: enabling it is one-way (an account cannot be returned to a flat namespace), and it is unnecessary for simple object storage (use plain blob storage).

Common Wrong Answers and Why Candidates Choose Them

1.

'ADLS Gen2 is a completely new storage type separate from Blob Storage.' This is false. ADLS Gen2 is built on Blob Storage—it's an evolution, not a replacement. Candidates assume it's a different service because of the name.

2.

'Hierarchical namespace enables faster read/write of individual files.' False. The performance benefit is for directory operations (rename, delete), not for individual file I/O. Candidates confuse the metadata layer with data throughput.

3.

'POSIX ACLs replace RBAC entirely.' False. Both coexist: Azure role assignments are evaluated first, and ACLs are checked only when no role grants the operation. Candidates think ACLs are the only access control.

4.

'ADLS Gen2 supports NFS v3 by default.' False. NFS v3 is an optional feature that must be enabled separately. Candidates assume all file system features are automatic.

Specific Numbers and Terms on the Exam

ABFS driver: The recommended driver for Hadoop/Spark access (not the legacy WASB driver).

DFS endpoint: *.dfs.core.windows.net (not *.blob.core.windows.net).

URI format: abfs://<container>@<account>.dfs.core.windows.net/<path>.

Hierarchical namespace: Enabled at account creation or through a one-way upgrade; it cannot be disabled.

Storage account types: StorageV2 or BlockBlobStorage only.

Edge Cases and Exceptions

Can you enable hierarchical namespace on an existing storage account? Yes, through a one-way upgrade; once upgraded, the account cannot return to a flat namespace.

Can you use ADLS Gen2 with Azure NetApp Files? No, those are separate.

Does ADLS Gen2 support Blob Storage features like lifecycle management? Yes, because it is built on Blob Storage.

Can you enable hierarchical namespace on a premium block blob storage account? Yes. BlockBlobStorage is the premium account type. It uses SSD-backed storage for low latency but does not offer the hot, cool, and archive access tiers.

How to Eliminate Wrong Answers

If a question asks about 'fast rename operations,' look for the answer that mentions 'hierarchical namespace' or 'atomic directory operations.' If the question is about 'POSIX permissions,' eliminate answers that mention 'only RBAC.' For 'HDFS compatibility,' the answer must include 'ABFS driver.' Always check if the answer mentions 'flat namespace' as a disadvantage—that's a key exam point.

Key Takeaways

ADLS Gen2 is built on Azure Blob Storage with a hierarchical namespace, enabled at account creation or through a one-way upgrade.

Hierarchical namespace enables O(1) atomic directory operations (rename, delete) regardless of number of files.

ADLS Gen2 supports POSIX ACLs for fine-grained access control in addition to RBAC.

The ABFS driver (abfs://) is used for HDFS compatibility; WASB driver does not support hierarchical namespace.

NFS 3.0 is an optional feature that must be enabled separately on the storage account.

Common use cases: data lakes, big data analytics, and any workload requiring a file system structure.

Enabling hierarchical namespace is one-way: existing accounts can be upgraded, but HNS can never be turned off.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Azure Blob Storage (Flat Namespace)

Flat namespace: no real directories, only prefix naming.

Rename operation is O(n): there is no rename, so each blob must be copied and deleted.

Delete operation is O(n): must list and delete each blob.

Access control via Azure RBAC (account or container scope) or shared access signatures; no directory-level ACLs.

Best for object storage, backups, and simple key-value data.

ADLS Gen2 (Hierarchical Namespace)

Hierarchical namespace: directories are first-class objects.

Rename operation is O(1): only directory metadata is updated.

Delete operation is atomic: directory and contents removed instantly.

Supports POSIX ACLs at directory and file level for fine-grained control.

Best for big data analytics, data lakes, and HDFS workloads.

Watch Out for These

Mistake

ADLS Gen2 is a completely new storage service separate from Azure Blob Storage.

Correct

ADLS Gen2 is built on top of Azure Blob Storage. It adds a hierarchical namespace and POSIX ACLs, but the underlying storage is still blob storage. You can use both blob APIs and DFS APIs on the same data.

Mistake

Hierarchical namespace improves read/write performance of individual files.

Correct

The hierarchical namespace does not affect the throughput or latency of reading or writing a single file. It improves directory-level operations: rename and delete become atomic metadata operations whose cost does not grow with the number of files.

Mistake

You can enable hierarchical namespace on any existing storage account.

Correct

Not every account qualifies, and the change is permanent. An existing account can be upgraded in place after a validation step confirms it uses no incompatible features. The upgrade is one-way: once HNS is enabled, the account cannot be reverted to a flat namespace.

Mistake

ADLS Gen2 automatically supports NFS 3.0 protocol.

Correct

NFS 3.0 is an optional feature that must be enabled separately on the storage account. It is not automatically available just because hierarchical namespace is enabled.

Mistake

POSIX ACLs replace Azure RBAC for access control.

Correct

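Both are used, in a fixed order. Azure role assignments are evaluated first: if a role (for example, Storage Blob Data Reader) grants the operation, access is allowed and ACLs are not checked. Only when no role grants the operation are the POSIX ACLs evaluated. As a result, ACLs can add access on top of RBAC but cannot restrict access that a role already grants.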


Frequently Asked Questions

What is the difference between Azure Blob Storage and Azure Data Lake Storage Gen2?

Azure Blob Storage is a scalable object store with a flat namespace, while ADLS Gen2 adds a hierarchical namespace and POSIX ACLs on top of blob storage. The hierarchical namespace allows atomic directory operations and HDFS compatibility via the ABFS driver. For DP-900, remember that ADLS Gen2 is for analytics workloads requiring a file system structure.

Can I enable hierarchical namespace on an existing storage account?

Yes. Besides enabling it at creation, you can upgrade an existing account in place. The upgrade is one-way: after hierarchical namespace is enabled, the account cannot be reverted to a flat namespace.

Does ADLS Gen2 support NFS?

Yes, but only if the NFS 3.0 protocol is explicitly enabled on the storage account. It is not enabled by default. The storage account must also have hierarchical namespace enabled.

What is the ABFS driver and why is it important?

The Azure Blob File System (ABFS) driver is the Hadoop-compatible driver for ADLS Gen2. It translates HDFS operations to REST calls against the DFS endpoint. It is important because it allows Apache Spark, Hive, and other Hadoop ecosystem tools to use ADLS Gen2 as their underlying storage without code changes.

How do POSIX ACLs and RBAC work together in ADLS Gen2?

Azure role assignments are evaluated first. If a role grants the requested operation, access is allowed and ACLs are not evaluated. If no role grants it, the POSIX ACLs determine whether access is allowed. ACLs therefore provide finer-grained access at the directory or file level for principals that do not already hold a broader data role.

What are the storage account types that support hierarchical namespace?

Only StorageV2 (general-purpose v2) and BlockBlobStorage account types support enabling hierarchical namespace. Other types like BlobStorage or FileStorage do not.

What is the URI format for accessing ADLS Gen2 from Hadoop?

The URI format is `abfs://<container>@<storageaccount>.dfs.core.windows.net/<path>`. For example: `abfs://mycontainer@mystorageaccount.dfs.core.windows.net/data/file.csv`.
