CCNA Design and implement data storage Questions — Page 3 of 3

151

MCQmedium

You are designing a data lake architecture using Azure Data Lake Storage Gen2. The data will be ingested from multiple sources with varying schemas. You need to organize the data in a way that supports both batch and streaming analytics while maintaining data lineage. Which folder structure convention should you use?

A.Organize by ingestion date only, with subfolders for each source.

B.Organize by source system, then by date.

C.Use a medallion architecture with three layers: bronze (raw), silver (cleaned), gold (aggregated).

D.Organize by file format (CSV, Parquet, JSON) and date.

AnswerC

Medallion architecture provides clear separation and lineage.

Why this answer

Option C is correct because the medallion architecture (bronze, silver, gold) is the recommended pattern for Azure Data Lake Storage Gen2 when handling multiple sources with varying schemas. It supports both batch and streaming by storing raw data in bronze, applying incremental transformations in silver, and serving aggregated views in gold, while maintaining data lineage through clear layer boundaries and audit columns.

Exam trap

The trap here is that candidates often choose Option B (source then date) because it seems logical for organization, but they overlook the requirement to support both batch and streaming analytics while maintaining data lineage, which the medallion architecture explicitly addresses through layered transformations.

How to eliminate wrong answers

Option A is wrong because organizing by ingestion date only, with subfolders for each source, lacks schema evolution support and makes it difficult to trace data lineage across transformations. Option B is wrong because organizing by source system then by date does not provide a standardized processing pipeline for both batch and streaming, and it fails to separate raw, cleaned, and aggregated states. Option D is wrong because organizing by file format and date ignores the need for schema management and lineage tracking, and it does not facilitate incremental processing or data quality checks across layers.

Practice this question →

152

MCQhard

A healthcare company stores sensitive patient data in Azure Data Lake Storage Gen2. They need to ensure that only authorized users can access data and that all access is audited. They also need to prevent data from being accessed by unauthorized Azure services. Which combination of security features should be used?

A.Use a private endpoint and Azure AD authentication, disable public access.

B.Use Azure RBAC and ACLs for authorization, enable firewall and virtual network service endpoints, and enable diagnostic settings for auditing.

C.Use managed identity for service access and disable public access.

D.Use Azure AD authentication and SAS tokens for access, enabling diagnostic logs for auditing.

AnswerB

Combination provides layered security and full audit.

Why this answer

Option B is correct because it combines Azure RBAC and ACLs for fine-grained authorization, a firewall with virtual network service endpoints to restrict access to authorized networks, and diagnostic settings to capture audit logs. This layered approach ensures that only authorized users can access the data, all access is audited, and unauthorized Azure services are blocked by the firewall and service endpoints.

Exam trap

Microsoft often tests the misconception that disabling public access alone is sufficient to block unauthorized Azure services, when in fact service endpoints or private endpoints are required to prevent access from other Azure services within the same region.

How to eliminate wrong answers

Option A is wrong because while a private endpoint and Azure AD authentication secure access and disable public access, they do not provide the ability to block unauthorized Azure services (e.g., other Azure services outside the virtual network) unless combined with network rules like service endpoints or firewall rules. Option C is wrong because managed identity alone does not prevent unauthorized Azure services from accessing the data; it only authenticates the service, and disabling public access without network-level controls still allows other Azure services within the same region to access the storage account via the Azure backbone. Option D is wrong because SAS tokens can be leaked or misused, and they do not provide the same level of fine-grained authorization as RBAC and ACLs; additionally, enabling diagnostic logs alone does not prevent unauthorized Azure services from accessing the data.

Practice this question →

153

Matchingmedium

Match each Azure security feature to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Role-based access control for Azure resources

Cloud-based identity and access management service

Manage cryptographic keys and secrets

Private connectivity to Azure services over VNet

Why these pairings

These security features are essential for data protection.

Practice this question →

154

Multi-Selectmedium

A company is designing a data storage solution for a global application that requires low-latency reads and writes for user session data. The solution must support automatic failover across multiple Azure regions. Which TWO Azure services meet these requirements?

Select 2 answers

A.Azure Table Storage

B.Azure Cache for Redis

C.Azure Blob Storage

D.Azure Cosmos DB

E.Azure SQL Database

AnswersB, D

Supports geo-replication and automatic failover.

Why this answer

Azure Cache for Redis is correct because it provides an in-memory data store with sub-millisecond latency for both reads and writes, making it ideal for user session data. It supports automatic failover across Azure regions through geo-replication, where data from a primary cache is asynchronously replicated to a secondary cache in a paired region, ensuring high availability and disaster recovery.

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB's multi-region writes with the specific requirement for low-latency session data, but Cosmos DB, while supporting automatic failover, has higher latency than an in-memory cache like Redis for frequent, small reads and writes typical of session state.

Practice this question →

155

Multi-Selecteasy

Which TWO options are valid methods to load data from on-premises SQL Server into Azure Synapse Analytics?

Select 2 answers

A.SQL Server Integration Services (SSIS) package

B.Azure Data Factory with incremental copy

C.PolyBase from external table

AnswersB, C

Supports incremental loading.

Why this answer

Azure Data Factory with incremental copy is a valid method because it can connect to on-premises SQL Server via self-hosted integration runtime, and then use change tracking or watermark columns to copy only new or updated rows into Azure Synapse Analytics. This approach minimizes data transfer and supports scheduled, reliable loading.

Exam trap

The trap here is that candidates may think SSIS is the only traditional ETL tool, but Azure Data Factory's incremental copy and PolyBase are both first-class, scalable methods for loading into Synapse, and SSIS is also valid but not listed as correct in this specific question's answer set.

Practice this question →

156

MCQhard

Your company uses Azure Synapse Analytics dedicated SQL pool for a data warehouse. You notice that queries on a large fact table are slow. The table is hash-distributed on CustomerID and has 60 distributions. After reviewing the query plan, you see that many queries filter on OrderDate. You want to improve query performance without redesigning the table. What should you do?

A.Change the distribution key to OrderDate

B.Recreate the table with round-robin distribution

C.Update statistics only on the OrderDate column

D.Create a clustered columnstore index on the table and order by OrderDate

AnswerD

Clustered columnstore index with ordering by OrderDate accelerates range queries.

Why this answer

Option D is correct because creating a clustered columnstore index with ORDER BY OrderDate on the fact table improves query performance by physically ordering the data by OrderDate within each columnstore segment. This allows segment elimination during queries that filter on OrderDate, reducing the amount of data scanned. Since the table is already hash-distributed on CustomerID, this approach does not require redesigning the table and directly addresses the slow queries without changing the distribution strategy.

Exam trap

The trap here is that candidates often assume changing the distribution key (Option A) is the only way to optimize filter-heavy queries, but they overlook that a clustered columnstore index with ORDER BY can provide similar I/O reduction without the risks of data skew or redesign.

How to eliminate wrong answers

Option A is wrong because changing the distribution key to OrderDate would cause data skew if OrderDate has many duplicate values (e.g., all rows for a single date), leading to uneven distribution and poor parallel query performance. Option B is wrong because round-robin distribution distributes rows evenly but eliminates data locality for joins and aggregations, which would degrade performance for queries that join on CustomerID or other keys. Option C is wrong because updating statistics on OrderDate alone does not reduce I/O; statistics help the optimizer choose better plans but do not physically reorganize data to minimize data scanned for filter predicates.

Practice this question →

157

MCQmedium

Your team uses Azure Databricks to process data stored in Azure Data Lake Storage Gen2. You need to ensure that only authorized users can access the data and that access is audited. What should you implement?

A.Use Azure RBAC with Microsoft Entra ID and enable diagnostic settings for Azure Storage

B.Use Azure managed identities to access the storage account

C.Use shared access signatures (SAS) with stored access policies

D.Configure the storage account firewall to allow only Azure Databricks IP addresses

AnswerA

D is correct because RBAC provides fine-grained access control and diagnostic logs enable auditing.

Why this answer

Azure RBAC with Microsoft Entra ID (formerly Azure AD) provides fine-grained access control to Azure Data Lake Storage Gen2, ensuring only authorized users can access data. Enabling diagnostic settings for Azure Storage captures read, write, and delete operations, which are sent to a Log Analytics workspace, storage account, or event hub for auditing. This combination meets both the authorization and auditing requirements specified in the question.

Exam trap

The trap here is that candidates often confuse authentication (managed identities) with authorization and auditing, or they assume network-level controls (firewall) are sufficient for access control and auditing, when in fact Azure RBAC with diagnostic settings is required for user-level authorization and audit trails.

How to eliminate wrong answers

Option B is wrong because Azure managed identities authenticate Azure services (like Databricks) to Azure resources without storing credentials, but they do not provide granular user-level authorization or auditing of individual user actions. Option C is wrong because shared access signatures (SAS) with stored access policies grant time-limited, delegated access to storage resources but do not provide per-user auditing; SAS tokens are typically used for service-to-service access, not for auditing individual user activity. Option D is wrong because configuring the storage account firewall to allow only Azure Databricks IP addresses restricts network access but does not authorize specific users or provide audit logs of data access events.

Practice this question →

158

Multi-Selecthard

A company ingests streaming data from multiple sources into Azure Event Hubs. The data must be stored in Azure Data Lake Storage Gen2 in Parquet format, partitioned by date and hour. The solution must minimize cost and processing latency. Which THREE actions should be taken?

Select 2 answers

A.Enable Event Hubs Capture to automatically write data to Data Lake Storage in Avro format.

B.Use Azure Data Factory to copy data from Event Hubs to Data Lake Storage every 5 minutes.

C.Use Azure Stream Analytics to read from Event Hubs and write to Data Lake Storage in Parquet format.

D.Configure Stream Analytics output to partition by date and hour.

E.Use Azure Databricks to process the stream and write to Data Lake Storage.

AnswersC, D

Stream Analytics can transform and output Parquet to Data Lake.

Why this answer

Option C is correct because Azure Stream Analytics can read directly from Event Hubs and write to Azure Data Lake Storage Gen2 in Parquet format, which is a columnar format optimized for analytics and compression. This approach minimizes cost and processing latency by enabling real-time, continuous processing without the need for intermediate storage or batch orchestration, meeting the requirements for low-latency ingestion.

Exam trap

The trap here is that candidates often confuse Event Hubs Capture (which writes Avro) with the ability to write Parquet directly, or they overcomplicate the solution by choosing batch tools like Data Factory or heavy compute like Databricks when a simpler, cost-effective streaming service like Stream Analytics meets all requirements.

Practice this question →

159

MCQhard

A company uses Azure Synapse Analytics serverless SQL pool to query data in ADLS Gen2. Users report that queries against Parquet files are slow. What should you recommend to improve query performance?

A.Create external tables with statistics on relevant columns.

B.Create clustered columnstore indexes on the external tables.

C.Convert the Parquet files to CSV format for faster reads.

D.Partition the data into many small files.

AnswerA

Statistics help the optimizer prune data and improve query performance.

Why this answer

In Azure Synapse serverless SQL pool, external tables do not automatically have statistics. Without statistics, the query optimizer cannot generate efficient execution plans, leading to poor performance on Parquet files. Creating statistics on relevant columns enables the optimizer to estimate cardinality and choose better join and filter strategies, significantly improving query speed.

Exam trap

The trap here is that candidates confuse external table capabilities with dedicated SQL pool features, assuming that indexes like columnstore can be applied to external tables, or that file format changes (CSV) or file count adjustments are the primary performance levers, when in fact statistics are the critical missing piece for serverless SQL pool optimization.

How to eliminate wrong answers

Option B is wrong because clustered columnstore indexes are not supported on external tables in serverless SQL pool; they are only applicable to tables in dedicated SQL pools. Option C is wrong because CSV format is slower than Parquet for analytical queries due to lack of compression, columnar storage, and predicate pushdown; converting to CSV would degrade performance. Option D is wrong because partitioning data into many small files increases metadata overhead and file open operations, which slows down queries in serverless SQL pool; optimal performance is achieved with a moderate number of reasonably sized files.

Practice this question →

160

Multi-Selectmedium

Which TWO Azure Blob Storage access tiers are suitable for data that must be available within milliseconds but is accessed infrequently (e.g., once per month)?

Select 2 answers

A.Cool

B.Hot

C.Premium

D.Cold

E.Archive

AnswersA, B

Low latency, lower storage cost, designed for infrequent access.

Why this answer

The Cool tier is designed for data that is accessed infrequently (e.g., once per month) but still requires low latency (milliseconds) on access. It offers a lower storage cost than Hot but with a higher access cost, making it ideal for infrequent reads where immediate availability is needed.

Exam trap

The trap here is that candidates often confuse 'infrequent access' with 'cold storage' and select Cold or Archive, forgetting that Cool is the correct tier for infrequent access with millisecond latency requirements.

Practice this question →

161

MCQmedium

You are designing a solution to store large amounts of log data that is written once and accessed rarely. The data must be retained for 7 years for compliance. After 30 days, the data should be moved to a lower-cost storage tier. After 1 year, the data should be archived. Which Azure Storage lifecycle management policy should you implement for an Azure Data Lake Storage Gen2 account?

A.Transition to cool tier after 30 days; delete after 7 years.

B.Transition to cool tier after 30 days; transition to archive tier after 365 days; delete after 2555 days (7 years).

C.Transition to archive tier after 30 days; delete after 7 years.

D.Transition to cool tier after 30 days; transition to cool tier again after 365 days.

AnswerB

Matches the requirements exactly.

Why this answer

Option B is correct because it aligns with the specified lifecycle requirements: transition to cool tier after 30 days for cost savings, transition to archive tier after 365 days for long-term retention, and delete after 2555 days (7 years) for compliance. Azure Data Lake Storage Gen2 supports lifecycle management policies that automate tier transitions and deletion based on age, ensuring data is moved to lower-cost storage as access patterns change.

Exam trap

The trap here is that candidates may confuse the required tiering order (cool then archive) with direct archiving after 30 days (Option C) or fail to include a deletion rule (Option D), missing the 7-year compliance requirement.

How to eliminate wrong answers

Option A is wrong because it deletes the data after 7 years but does not include a transition to the archive tier after 1 year, which is required by the compliance policy to archive data after 365 days. Option C is wrong because it transitions to archive tier after only 30 days, which violates the requirement to keep data in a lower-cost tier (cool) for the first year before archiving. Option D is wrong because it transitions to cool tier again after 365 days, which does not archive the data as required, and it lacks a deletion rule for the 7-year retention period.

Practice this question →

162

Multi-Selecthard

You are designing a data lake architecture using Azure Data Lake Storage Gen2. You need to optimize query performance for Azure Synapse Analytics serverless SQL. Which three design considerations should you follow? (Choose three.)

Select 3 answers

A.Store data in Parquet format

B.Partition files by date to enable partition elimination

C.Compress files using snappy or gzip

D.Use many small files (under 64 MB) to increase parallelism

E.Store data in nested folder structures for better organization

AnswersA, B, C

Why this answer

Parquet is a columnar storage format that reduces I/O by reading only the columns needed for a query, which significantly improves performance in Azure Synapse serverless SQL. It also supports efficient compression and encoding schemes, making it ideal for analytical workloads on Azure Data Lake Storage Gen2.

Exam trap

The trap here is that candidates often confuse file size optimization with parallelism, assuming smaller files increase parallelism, but in serverless SQL, too many small files cause excessive metadata requests and reduce throughput, while larger files enable better batch processing.

Why the other options are wrong

Small files cause overhead; larger files (128 MB+) are recommended.

Deeply nested folders increase file listing time, impacting performance.

Practice this question →

163

MCQeasy

A data engineer needs to store semi-structured JSON logs from multiple sources in Azure. The logs must be queryable using T-SQL and support schema-on-read. Which Azure service should be used?

A.Azure Synapse serverless SQL pool with JSON files in ADLS Gen2.

B.Azure Data Factory mapping data flows.

C.Azure Cosmos DB Core (SQL) API.

D.Azure SQL Database with JSON columns.

AnswerA

Serverless SQL pool can query JSON files with schema-on-read using T-SQL.

Why this answer

Azure Synapse serverless SQL pool can query JSON files stored in ADLS Gen2 using T-SQL, supporting schema-on-read by inferring the schema from the file content at query time. This makes it ideal for semi-structured logs that need to be queried without predefined schema.

Exam trap

The trap here is that candidates often confuse schema-on-read with schema-on-write, picking Azure SQL Database or Cosmos DB because they support JSON, but those require predefined schemas or containers, failing the schema-on-read requirement.

How to eliminate wrong answers

Option B is wrong because Azure Data Factory mapping data flows are designed for ETL/ELT transformations, not for direct T-SQL querying of data at rest. Option C is wrong because Azure Cosmos DB Core (SQL) API stores data as JSON but does not support schema-on-read; it requires a defined container schema and uses its own SQL dialect, not standard T-SQL. Option D is wrong because Azure SQL Database with JSON columns requires a predefined table schema and does not support schema-on-read for external files; it stores JSON in relational columns, not as files.

Practice this question →

164

MCQeasy

A company uses Azure Synapse Analytics dedicated SQL pool. They need to load data from Azure Data Lake Storage Gen2 (ADLS Gen2) incrementally. Which PolyBase external table configuration supports incremental loading without reprocessing historical data?

A.Create an external table with a clustered index on the date column.

B.Create an external table partitioned by date column with a partition location pattern.

C.Create an external table on ORC files with a clustered columnstore index.

D.Create an external table with round-robin distribution on the date column.

AnswerB

Partitioned external tables allow PolyBase to eliminate partitions, loading only new data.

Why this answer

Option B is correct because partitioning an external table by a date column with a partition location pattern allows PolyBase to perform partition elimination during queries. This means only the partitions containing new or modified data are read from ADLS Gen2, enabling incremental loading without scanning historical data.

Exam trap

The trap here is that candidates confuse indexing or distribution strategies (which apply to internal tables) with external table capabilities, overlooking that only partition elimination on external tables enables incremental file-level filtering in PolyBase.

How to eliminate wrong answers

Option A is wrong because a clustered index on an external table is not supported; external tables in Azure Synapse are read-only and cannot have indexes. Option C is wrong because a clustered columnstore index applies to internal tables in the dedicated SQL pool, not to external tables, and does not control which files are read from ADLS Gen2. Option D is wrong because round-robin distribution distributes data evenly across distributions but does not enable partition elimination or incremental loading; it is a distribution method for internal tables, not a mechanism for selective file reading.

Practice this question →

165

MCQeasy

A healthcare organization needs to store electronic health records (EHR) in a format that supports schema flexibility and complex nested data. The solution must allow fast queries by patient ID and enable analytics with Azure Synapse. Which data store should you choose?

A.Azure Table Storage

B.Azure Data Lake Storage Gen2 with files in JSON format

C.Azure Cosmos DB with analytical store enabled

D.Azure SQL Database with JSON columns

AnswerC

Cosmos DB provides schema flexibility, nested data support, and Synapse integration for analytics.

Why this answer

Azure Cosmos DB with analytical store enabled is the correct choice because it provides schema flexibility for complex nested EHR data, supports fast point reads by patient ID via its indexed partition key, and the analytical store enables efficient analytics with Azure Synapse through the Synapse Link feature, which automatically synchronizes operational data into a columnar format optimized for large-scale queries.

Exam trap

The trap here is that candidates often choose Azure SQL Database with JSON columns (Option D) because they assume relational databases can handle JSON, but they overlook the requirement for schema flexibility and native analytical store integration, which Cosmos DB with analytical store uniquely provides for hybrid transactional/analytical processing (HTAP) workloads.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a key-value store that does not support complex nested data structures or schema flexibility for hierarchical EHR records, and it lacks native integration with Azure Synapse for analytics. Option B is wrong because while Azure Data Lake Storage Gen2 with JSON files can store nested data, it does not provide fast point queries by patient ID without additional indexing or processing, and it requires separate ETL for analytics rather than real-time analytical store access. Option D is wrong because Azure SQL Database with JSON columns imposes a fixed relational schema and does not offer the same level of schema flexibility as a NoSQL document store; JSON columns also complicate indexing and nested query performance, and it lacks a built-in analytical store for seamless Synapse integration.

Practice this question →

166

Multi-Selecthard

Your company stores JSON documents in Azure Cosmos DB Core (SQL) API. You need to improve query performance for a common filter on the 'status' field and a sort on 'timestamp'. Which three actions should you take?

Select 3 answers

A.Create a composite index on (status, timestamp)

B.Increase the provisioned RU/s

C.Choose a partition key that ensures even distribution

D.Denormalize the status field into a separate container

E.Disable indexing on the timestamp field

AnswersA, B, C

Why this answer

A composite index on (status, timestamp) is correct because Azure Cosmos DB Core (SQL) API uses composite indexes to efficiently support queries with multiple filter and sort conditions. This index allows the query engine to first filter on the 'status' field and then sort by 'timestamp' in a single index seek, avoiding a full scan or in-memory sort. Without this composite index, the query would require a cross-partition scan or an expensive sort operation, degrading performance.

Exam trap

The trap here is that candidates may think increasing RU/s (Option B) or disabling indexing (Option E) are sufficient fixes, but the core issue is the lack of an appropriate composite index to support the combined filter and sort, which is a common DP-203 exam pattern for Cosmos DB query optimization.

Why the other options are wrong

Unnecessary complexity; composite index is sufficient.

Sorting requires indexing on that field.

Practice this question →

167

Multi-Selectmedium

Which THREE factors should be considered when choosing between Azure Synapse Dedicated SQL Pool and Azure SQL Database for a data warehouse workload?

Select 3 answers

A.Need for geo-replication

B.Concurrency and workload management

C.Integration with Azure Data Lake Storage

D.Volume of data to be stored

E.Complexity of analytical queries

AnswersB, D, E

Dedicated SQL Pool has workload management.

Why this answer

Option B is correct because Azure Synapse Dedicated SQL Pool provides built-in workload management features like workload classification, importance, and concurrency slots, which are essential for managing mixed analytical workloads with varying resource demands. Azure SQL Database, while supporting resource governance, lacks the granular workload management capabilities needed for complex data warehouse scenarios with concurrent queries.

Exam trap

The trap here is that candidates often assume geo-replication (Option A) is a critical differentiator, but both services support it, and the real decision hinges on workload management, data volume scalability, and analytical query complexity.

Practice this question →

168

MCQhard

You have an Azure Synapse Analytics dedicated SQL pool with a large fact table partitioned by month. You notice that queries filtering on a specific month still scan all partitions. The table has a clustered columnstore index. What is the most likely cause?

A.The table is hash-distributed on a column other than date, preventing partition elimination.

B.The statistics on the date column are outdated or missing.

C.The columnstore index has a large number of open rowgroups that are not compressed.

D.The table is replicated, so partition elimination does not apply.

AnswerC

Open rowgroups cause the entire columnstore to be scanned, bypassing partition elimination.

Why this answer

Option C is correct because when a clustered columnstore index has a large number of open rowgroups (i.e., rowgroups that have not yet been compressed into columnstore segments), queries may fall back to the delta store, which stores data in a B-tree-like structure. This bypasses partition elimination because the delta store does not maintain the same metadata for partition pruning, causing all partitions to be scanned. Compressing open rowgroups into closed, compressed rowgroups restores proper partition elimination behavior.

Exam trap

The trap here is that candidates often assume partition elimination is always effective with columnstore indexes, overlooking that unclosed delta store rowgroups can disable this optimization, and they may incorrectly attribute the issue to distribution or statistics.

How to eliminate wrong answers

Option A is wrong because hash distribution on a non-date column does not prevent partition elimination; partition elimination is a storage-level optimization based on partition boundaries, independent of distribution method. Option B is wrong because statistics on the date column are used for cardinality estimation, not for partition elimination; partition elimination is determined by metadata about partition ranges, not statistics. Option D is wrong because replicated tables are fully copied to each distribution, but partition elimination still applies when querying a replicated table if it is partitioned; the issue here is specific to the columnstore index's rowgroup state, not replication.

Practice this question →

169

MCQhard

A company is migrating its on-premises SQL Server data warehouse to Azure Synapse Analytics. They have a fact table with 2 billion rows and 30 columns. The table is frequently joined on CustomerID and filtered on OrderDate. What is the recommended table design?

A.Hash-distribute on CustomerID and partition on OrderDate

B.Replicate the table to all nodes

C.Round-robin distribution with partitions on OrderDate

D.Hash-distribute on OrderDate and partition on CustomerID

AnswerA

Optimizes joins and filtering.

Why this answer

Hash-distributing the fact table on CustomerID ensures that rows with the same CustomerID are co-located on the same distribution node, which makes joins on CustomerID efficient by avoiding data movement. Partitioning on OrderDate enables partition elimination when filtering by date, reducing the amount of data scanned. This combination optimizes both the join and filter operations for a large fact table in Azure Synapse Analytics.

Exam trap

The trap here is that candidates often confuse the roles of distribution and partitioning, thinking that partitioning on the join column or distributing on the filter column will improve performance, when in fact distribution should align with join keys and partitioning with filter keys.

How to eliminate wrong answers

Option B is wrong because replicating a 2-billion-row fact table to all nodes would consume excessive storage and cause significant overhead during data loading and maintenance, and it is intended for small dimension tables, not large fact tables. Option C is wrong because round-robin distribution distributes rows evenly but without any logical grouping, so joins on CustomerID would require shuffling all data across nodes, leading to poor performance. Option D is wrong because hash-distributing on OrderDate would scatter rows with the same CustomerID across nodes, making joins on CustomerID highly inefficient, and partitioning on CustomerID is not supported (partition columns must be date/time types in Synapse) and would not help with date-based filtering.

Practice this question →

170

Multi-Selectmedium

Which TWO of the following are supported storage options for use as a source in Azure Synapse Pipeline Copy Activity?

Select 2 answers

A.Azure Data Lake Storage Gen2

B.Azure Analysis Services

C.Azure Cognitive Search

D.Azure Purview

E.Azure Blob Storage

AnswersA, E

ADLS Gen2 is a supported source.

Why this answer

Azure Data Lake Storage Gen2 is a supported source for Azure Synapse Pipeline Copy Activity because it combines a hierarchical file system with Azure Blob Storage APIs, enabling efficient data ingestion. The Copy Activity can read data from ADLS Gen2 using the AzureBlobFS linked service, which supports both file and folder-level reads for structured and unstructured data.

Exam trap

The trap here is that candidates confuse Azure services that manage or process data (like Analysis Services, Cognitive Search, or Purview) with actual storage services that can serve as a source for the Copy Activity, leading them to select non-storage options.

Practice this question →

171

MCQmedium

You are designing a change data capture (CDC) solution to incrementally load data from an on-premises SQL Server database to Azure Synapse Analytics. The source tables have no timestamp columns and you cannot modify the schema. Which Azure service should you use?

A.Azure Synapse Pipelines with mapping data flows

B.Azure Data Factory with change tracking

C.Azure Databricks with Auto Loader

D.Azure Stream Analytics

AnswerB

ADF can enable change tracking on SQL Server or use custom watermark logic.

Why this answer

Azure Data Factory's change tracking capability is the correct choice because it can capture row-level inserts, updates, and deletes from SQL Server without requiring timestamp columns or schema modifications. It uses SQL Server's built-in change tracking feature, which tracks changes at the table level and provides a reliable incremental load mechanism to Azure Synapse Analytics.

Exam trap

The trap here is that candidates often assume a timestamp column is mandatory for incremental loads, but Azure Data Factory's change tracking connector bypasses this requirement by using SQL Server's built-in change tracking mechanism, which does not require any schema modifications.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Pipelines with mapping data flows require a timestamp column or a watermark to identify changed rows, and they cannot leverage SQL Server's change tracking without schema modifications. Option C is wrong because Azure Databricks with Auto Loader is designed for incremental ingestion of file-based data (e.g., from cloud storage), not for capturing changes from a relational database like SQL Server. Option D is wrong because Azure Stream Analytics is a real-time event processing service for streaming data (e.g., from IoT hubs or event hubs), not a batch or incremental load solution for database tables.

Practice this question →

172

Multi-Selecteasy

You need to design a storage solution for a data lake that will be used by multiple teams for analytics. The solution must support fine-grained access control, versioning of files, and integration with Azure Purview for data cataloging. Which THREE features should you enable in Azure Data Lake Storage Gen2?

Select 3 answers

A.Integration with Microsoft Purview

B.Managed identity authentication

C.Blob versioning

D.Access control lists (ACLs)

E.Soft delete

AnswersA, C, D

Purview integrates with Data Lake Storage for data cataloging.

Why this answer

Option A is correct because Microsoft Purview integration is required for data cataloging, enabling automated scanning, classification, and lineage tracking of data stored in Azure Data Lake Storage Gen2. This directly supports the requirement for data cataloging in the scenario.

Exam trap

The trap here is that candidates confuse soft delete with versioning, but soft delete only recovers deleted blobs for a retention period, whereas versioning maintains all overwrites and deletes as separate versions for granular recovery and audit.

Practice this question →

173

MCQeasy

A data engineer needs to store semi-structured JSON logs from IoT devices. The data will be queried using SQL and must support high-throughput writes. Which Azure data store is most appropriate?

A.Azure Blob Storage with JSON blobs

B.Azure Cosmos DB with Core (SQL) API

C.Azure Data Lake Storage Gen2 with JSON files and PolyBase

D.Azure SQL Database with JSON columns

AnswerB

Optimized for semi-structured JSON, native SQL API, and high-throughput writes.

Why this answer

Azure Cosmos DB with Core (SQL) API is the most appropriate choice because it natively stores semi-structured JSON documents, supports high-throughput writes with single-digit millisecond latency, and allows querying the JSON data directly using SQL syntax. Its schema-agnostic design and automatic indexing make it ideal for IoT workloads where device telemetry arrives at high velocity and must be immediately queryable.

Exam trap

The trap here is that candidates often choose Azure Blob Storage or Data Lake Storage because they associate JSON files with cheap storage, but they overlook the requirement for high-throughput writes and native SQL querying, which Cosmos DB uniquely satisfies among the options.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage with JSON blobs does not provide native SQL querying capabilities; querying would require additional services like Azure Synapse or external tools, and it is optimized for large, infrequent access rather than high-throughput writes. Option C is wrong because Azure Data Lake Storage Gen2 with JSON files and PolyBase is designed for analytical batch processing and large-scale data lakes, not for real-time high-throughput writes; PolyBase is used for querying external data in Synapse, not for direct ingestion at IoT scale. Option D is wrong because Azure SQL Database with JSON columns imposes a fixed relational schema and transactional overhead that cannot match the write throughput and schema flexibility of Cosmos DB; it is optimized for ACID transactions and structured data, not for high-velocity semi-structured ingestion.

Practice this question →

174

MCQeasy

You are designing a data storage solution for IoT sensor data. The data is written thousands of times per second and requires low-latency reads for real-time dashboards. Which Azure storage solution should you use?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Data Lake Storage Gen2

AnswerB

Why this answer

Azure Cosmos DB is the correct choice because it provides single-digit millisecond read and write latency at any scale, with automatic indexing and multi-region distribution. Its support for multiple APIs (SQL, MongoDB, Cassandra, etc.) and configurable consistency levels makes it ideal for IoT sensor data requiring high-throughput writes and low-latency reads for real-time dashboards.

Exam trap

The trap here is that candidates often choose Azure Blob Storage or Data Lake Storage Gen2 because they associate IoT data with 'storage' rather than 'real-time querying,' overlooking the critical requirement for low-latency reads and high-frequency writes that only a NoSQL database like Cosmos DB can satisfy.

Why the other options are wrong

Blob Storage is optimized for large, unstructured data with higher latency, not real-time ingestion.

SQL Database can handle writes but may struggle with the scale and low-latency requirements of IoT sensor data.

Designed for big data analytics, not real-time ingestion and query.

Practice this question →

175

MCQeasy

A data engineer is setting up Azure Data Lake Storage Gen2 for a new project. The security requirement is to prevent direct access to the storage account from the internet while allowing access from a specific virtual network. Which network security feature should be enabled?

A.Azure Private Endpoint

B.Shared access signature (SAS)

C.Azure Defender for Storage

D.Firewall and virtual network service endpoints

AnswerD

Allow access from specific VNet.

Why this answer

Option D is correct because Firewall and virtual network service endpoints allow you to restrict access to Azure Data Lake Storage Gen2 to only traffic originating from a specific virtual network, effectively blocking all internet-based access. This is achieved by configuring a service endpoint on the subnet and a firewall rule on the storage account that denies all traffic except that from the designated virtual network.

Exam trap

The trap here is that candidates often confuse Azure Private Endpoint with a complete internet-blocking solution, but Private Endpoint alone does not disable the public endpoint; you must also configure the firewall to deny all public traffic.

How to eliminate wrong answers

Option A is wrong because Azure Private Endpoint uses a private IP address from your virtual network to connect to the storage account, but it does not inherently block all internet access; it provides a private connection but still requires additional firewall rules to fully prevent internet access. Option B is wrong because Shared access signature (SAS) provides delegated access to storage resources via tokens that can be used over the internet, and it does not restrict network-level access from the internet. Option C is wrong because Azure Defender for Storage is a security monitoring and threat detection service, not a network access control mechanism; it does not block or restrict network traffic.

Practice this question →

176

MCQmedium

You are migrating an on-premises SQL Server database to Azure Synapse Analytics dedicated SQL pool. The database includes a table with 500 million rows that is frequently queried by date range. Which distribution strategy should you use for this table?

A.Hash distribution on the date column.

B.Hash distribution on an identity column.

C.Replicated distribution.

D.Round-robin distribution.

AnswerA

Hash distribution on the date column allows partition elimination and efficient querying by date range.

Why this answer

Hash distribution on the date column is correct because it distributes rows evenly across distributions based on a hash of the date value, enabling partition elimination and collocated joins when queries filter by date range. This strategy optimizes performance for the frequent date-range queries by ensuring that data for a specific date range is likely stored on the same distribution, minimizing data movement.

Exam trap

The trap here is that candidates often choose round-robin distribution for large tables thinking it is simple and balanced, but they overlook that frequent date-range queries require data collocation and partition elimination, which only hash distribution on the query filter column can provide.

How to eliminate wrong answers

Option B is wrong because hash distribution on an identity column would distribute rows randomly with respect to date, causing high data movement for date-range queries and poor query performance. Option C is wrong because replicated distribution is suitable for small, slowly changing dimension tables (typically < 2 GB), not for a 500-million-row fact table, as it would replicate massive data across all nodes, causing storage and maintenance overhead. Option D is wrong because round-robin distribution distributes rows evenly without any key, leading to full table scans and high data shuffling for date-range queries, which is inefficient for frequently queried tables.

Practice this question →

177

MCQeasy

You are designing a data lake for a manufacturing company that will store sensor readings in Parquet format. The data will be used by data scientists for batch training and by analysts for ad-hoc queries. Which Azure service should you use as the primary storage layer?

A.Azure Data Lake Storage Gen2

B.Azure Synapse SQL Pool

C.Azure Blob Storage

D.Azure SQL Database

AnswerA

ADLS Gen2 provides hierarchical namespace and is optimized for analytics workloads.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct primary storage layer because it combines a hierarchical namespace with Azure Blob Storage's scalable object storage, providing POSIX-compliant access control and directory-level operations. This makes it ideal for storing Parquet files used by data scientists for batch training (via Spark or PolyBase) and by analysts for ad-hoc queries (via Synapse SQL or Azure Databricks), as it supports both high-throughput analytics and fine-grained security.

Exam trap

The trap here is confusing a compute service (Synapse SQL Pool) with a storage service, or assuming that Blob Storage alone is sufficient without considering the hierarchical namespace required for data lake operations.

How to eliminate wrong answers

Option B is wrong because Azure Synapse SQL Pool is a distributed query engine and data warehouse, not a primary storage layer; it queries data from external storage like ADLS Gen2 but does not natively store Parquet files as its primary format. Option C is wrong because Azure Blob Storage lacks a hierarchical namespace, making it unsuitable for directory-level operations and POSIX ACLs required for efficient data lake management, though it can store Parquet files, it is less optimized for analytics workloads. Option D is wrong because Azure SQL Database is a relational database for transactional workloads, not designed for storing large volumes of Parquet files or supporting the schema-on-read patterns needed for data science and ad-hoc analytics.

Practice this question →

178

Multi-Selecthard

A multinational corporation is designing a data lake on Azure Data Lake Storage Gen2. The data must be accessible from multiple regions with low latency, but only one region needs writable access. The solution must also comply with data residency requirements. Which two features or configurations should be implemented? (Choose two.)

Select 2 answers

A.Enable hierarchical namespace on the storage account

B.Use read-access geo-redundant storage (RA-GRS)

C.Deploy Azure Front Door to route read requests to the nearest region

D.Configure geo-replication with manual failover

E.Use Azure Data Lake Storage Gen2 with locally redundant storage (LRS)

AnswersB, C

Why this answer

B is correct because read-access geo-redundant storage (RA-GRS) provides a secondary read-only endpoint in a paired region, enabling low-latency reads from multiple regions while maintaining a single writable primary region. This meets the data residency requirement because the secondary region is within the same geography, and RA-GRS ensures data is replicated asynchronously without allowing writes to the secondary.

Exam trap

The trap here is that candidates often confuse RA-GRS with GRS or fail to recognize that RA-GRS provides a read-only secondary endpoint, which is essential for low-latency reads without compromising the single-writable-region requirement.

Why the other options are wrong

Hierarchical namespace is a feature of ADLS Gen2 but does not provide multi-region low-latency reads or data residency.

Manual failover does not provide automatic read access from secondary regions; it requires a manual process to promote the secondary.

LRS does not replicate data to another region, so it cannot provide multi-region low-latency reads.

Practice this question →

179

MCQmedium

A financial services company is migrating its data warehouse to Azure Synapse Analytics. They have a star schema with a 10-billion-row fact table and 50 dimension tables. Query performance is critical, and they need to minimize data movement during joins. Which distribution strategy should they use for the fact table?

A.Replicated distribution

B.Partitioned distribution

C.Hash distribution on the most frequently joined dimension key

D.Round-robin distribution

AnswerC

Collocates data for joins.

Why this answer

Hash distribution on the most frequently joined dimension key is correct because it co-locates matching rows from the fact and dimension tables on the same compute node, minimizing data movement during joins. For a 10-billion-row fact table, this distribution ensures that the most common join operation is performed locally without shuffling data across nodes, which is critical for query performance in Azure Synapse Analytics.

Exam trap

The trap here is that candidates confuse table partitioning with distribution strategy, assuming 'partitioned distribution' is a valid option, but Azure Synapse Analytics uses hash, round-robin, and replicated distributions only, while partitioning is a separate concept for data organization within a distribution.

How to eliminate wrong answers

Option A is wrong because replicated distribution copies the entire table to each node, which is impractical for a 10-billion-row fact table due to massive storage overhead and write performance degradation. Option B is wrong because partitioned distribution is not a valid distribution strategy in Azure Synapse Analytics; partitioning is a table organization feature for data management, not a distribution method for minimizing data movement. Option D is wrong because round-robin distribution spreads data evenly but randomly across nodes, causing all joins to require data movement (shuffling) since no rows are co-located, leading to poor query performance.

Practice this question →

180

MCQhard

Your company uses Azure Synapse Analytics for its enterprise data warehouse. The main fact table, OrdersFact, is distributed using hash on OrderID. It has 10 billion rows. The table is partitioned by month. Recently, the data engineering team added a new column 'OrderStatus' that is used in many queries with filters like 'WHERE OrderStatus = 'Shipped''. These queries are scanning the entire table because the partition pruning is not effective. You need to improve query performance for these status-based queries without redesigning the entire table. What should you do?

A.Repartition the table by OrderStatus

B.Create a non-clustered columnstore index on OrderStatus

C.Change the distribution key to include OrderStatus

D.Create a materialized view that aggregates by OrderStatus and month

AnswerB

A non-clustered columnstore index on OrderStatus allows efficient filtering.

Why this answer

Creating a non-clustered columnstore index on OrderStatus can significantly improve query performance by allowing index seek on status values. Option A is wrong because adding OrderID to the distribution key does not help status queries. Option B is wrong because repartitioning by status is not recommended for a table already partitioned by month and may cause data skew.

Option D is wrong because materialized views can help but are not necessary if indexing solves the problem more directly.

Practice this question →

181

MCQhard

You are using Azure Synapse SQL Pool to store a large fact table partitioned by date. Queries frequently filter on a specific date range and aggregate by a column called 'product_id'. Which table distribution and indexing strategy will minimize query execution time?

A.Hash distribution on product_id with clustered rowstore index

B.Replicated table with clustered columnstore index

C.Round-robin distribution with clustered columnstore index

D.Hash distribution on product_id with clustered columnstore index

AnswerD

Hash distribution enables co-location for aggregation, and columnstore is efficient for large scans.

Why this answer

Option D is correct because hash distribution on product_id ensures that rows with the same product_id are co-located on the same distribution, enabling efficient local aggregation without data movement. The clustered columnstore index provides high compression and fast scan performance for large fact tables, especially when queries filter on a date range and aggregate by product_id.

Exam trap

The trap here is that candidates often choose round-robin distribution (Option C) thinking it is best for large tables, but they overlook that hash distribution on the aggregation column eliminates expensive data shuffling, which is critical for minimizing query execution time.

How to eliminate wrong answers

Option A is wrong because a clustered rowstore index is not optimal for large fact tables with aggregation queries; it lacks the column-level compression and batch processing benefits of columnstore, leading to higher I/O and slower scans. Option B is wrong because replicated tables are designed for small, slowly changing dimension tables, not large fact tables; replicating a large fact table would consume excessive storage and cause high replication overhead. Option C is wrong because round-robin distribution distributes data evenly but does not co-locate rows by product_id, forcing data shuffling across distributions during aggregation, which increases query execution time.

Practice this question →

182

MCQmedium

A company is designing a data lake in Azure Data Lake Storage Gen2 (ADLS Gen2) to store IoT sensor data from millions of devices. The data is ingested in Parquet format, partitioned by date and device ID. The analytics team frequently queries the last 30 days of data for specific device types. Which partition strategy minimizes query cost and optimizes performance?

A.Partition by device ID first, then by date.

B.Partition by device ID only, with a separate directory for each device.

C.Partition by date (yyyy/MM/dd) first, then by device type (e.g., sensor_type=temp).

D.Partition by device type only, with a directory for each type.

AnswerC

This allows date pruning first, then efficient filtering by device type within each day.

Why this answer

Option C is correct because partitioning by date first enables efficient partition pruning for the common query pattern (last 30 days), and then by device type further filters the data within those date partitions. In ADLS Gen2, queries using partition elimination skip entire directories, reducing the amount of data scanned and minimizing query cost. This strategy aligns with the typical query workload, where date-range filtering is the most selective predicate.

Exam trap

The trap here is that candidates often assume partitioning by the most granular attribute (device ID) first will provide the best performance, but they overlook that query patterns typically filter by time range, making date the most effective first-level partition for cost and performance optimization.

How to eliminate wrong answers

Option A is wrong because partitioning by device ID first, then by date, would require scanning all device ID partitions even when querying only recent data, leading to high I/O and cost. Option B is wrong because partitioning only by device ID with a separate directory per device does not support efficient date-range pruning; queries for the last 30 days would need to scan every device directory, which is prohibitively expensive for millions of devices. Option D is wrong because partitioning only by device type would force scanning all date directories for every query, even when the query is limited to a specific time range, resulting in unnecessary data reads and higher costs.

Practice this question →

183

MCQhard

Refer to the exhibit. You are reviewing an Azure Cosmos DB for NoSQL container configuration. The container stores customer orders. The application frequently queries orders by orderId. However, these queries are consuming high RUs and are slow. What is the most likely cause?

A.The unique key on orderId does not index the field; queries by orderId are cross-partition.

B.The _etag field is excluded from indexing, causing high RU for queries that include it.

C.The indexing mode is set to consistent, which causes high RU usage.

D.The container is provisioned with only 400 RU/s, which is too low for the workload.

AnswerA

Unique keys do not serve as indexes; the query must specify customerId to be efficient.

Why this answer

Option A is correct because the unique key constraint on `orderId` does not automatically create an index for that field; it only enforces uniqueness. Without an explicit indexing policy that includes `orderId`, queries filtering by `orderId` must perform a cross-partition scan, which consumes high RUs and is slow. In Azure Cosmos DB for NoSQL, queries that cannot be served from a single physical partition due to missing index or partition key mismatch result in fan-out across all partitions, dramatically increasing RU consumption.

Exam trap

Microsoft often tests the misconception that a unique key constraint automatically creates an index for querying, when in fact unique keys only enforce uniqueness and do not affect query performance; the trap is that candidates confuse unique keys with indexing policies.

How to eliminate wrong answers

Option B is wrong because `_etag` is a system property used for optimistic concurrency control, and excluding it from indexing does not cause high RU for queries that include it; queries that reference `_etag` would simply not benefit from an index, but the primary issue is the missing index on `orderId`. Option C is wrong because the indexing mode 'consistent' is the default and recommended mode for balanced write/read performance; it does not inherently cause high RU usage. Option D is wrong because while 400 RU/s is low, the question states that queries by `orderId` are consuming high RUs and are slow, which points to a query design issue (cross-partition) rather than insufficient throughput; increasing RU/s without fixing the indexing would only mask the problem.

Practice this question →

184

Multi-Selectmedium

Which TWO factors should you consider when choosing between Azure SQL Database and Azure Cosmos DB for a transactional workload that requires low-latency reads and writes globally?

Select 2 answers

A.ACID transaction support across multiple documents.

B.Support for secondary indexes.

C.Consistency models (strong, bounded staleness, session, eventual, consistent prefix).

D.Ability to run stored procedures.

E.Global distribution capabilities with multi-region writes.

AnswersC, E

Cosmos DB offers five consistency models; SQL Database offers snapshot isolation.

Why this answer

Option C is correct because Azure Cosmos DB offers multiple well-defined consistency models (strong, bounded staleness, session, consistent prefix, and eventual) that allow you to tune the trade-off between consistency and latency for globally distributed workloads. For a transactional workload requiring low-latency reads and writes globally, choosing the appropriate consistency model (e.g., session or eventual) can significantly reduce write latency by avoiding the overhead of synchronous replication required for strong consistency. This flexibility is a key factor when designing for global distribution with multi-region writes.

Exam trap

The trap here is that candidates often assume ACID transactions (Option A) or stored procedures (Option D) are unique to one service, when in fact both Azure SQL Database and Azure Cosmos DB support these features, making them irrelevant as distinguishing factors for global low-latency workloads.

Practice this question →

185

MCQeasy

You are designing a data storage solution for real-time streaming data from IoT devices. The data must be stored in its original format for immediate processing and later transformed for analytics. Which Azure service should you use for raw data ingestion?

A.Azure Data Lake Storage Gen2

B.Azure Event Hubs

C.Azure Stream Analytics

D.Azure Data Factory

AnswerB

C is correct because Event Hubs is designed for real-time data ingestion.

Why this answer

Azure Event Hubs is a fully managed, real-time data ingestion service optimized for high-throughput streaming data from IoT devices. It can receive millions of events per second, store them in a partitioned, ordered log for immediate processing, and retain them for up to 7 days (or longer with Event Hubs Capture) for later transformation and analytics. This makes it the correct choice for raw data ingestion before any transformation occurs.

Exam trap

The trap here is that candidates confuse data ingestion (Event Hubs) with data storage (Data Lake Storage) or data processing (Stream Analytics), assuming a single service must handle both raw capture and transformation, when in fact the question explicitly asks for raw data ingestion only.

How to eliminate wrong answers

Option A is wrong because Azure Data Lake Storage Gen2 is a hierarchical file store for analytics workloads, not a real-time ingestion endpoint; it cannot natively accept streaming events at high velocity without an intermediary like Event Hubs or IoT Hub. Option C is wrong because Azure Stream Analytics is a stream processing engine that consumes data from sources like Event Hubs and performs real-time transformations, but it does not store raw data itself—it outputs results to sinks. Option D is wrong because Azure Data Factory is a cloud-based ETL and orchestration service for batch and scheduled data movement, not designed for real-time, high-throughput streaming ingestion from IoT devices.

Practice this question →

186

MCQmedium

A company is designing a data lake on Azure Data Lake Storage Gen2. They need to enforce row-level security on the data for different departments. Which approach should they use?

A.Create views in Azure Synapse Serverless SQL with security predicates

B.Assign Azure RBAC roles for each department on the storage account

C.Implement Azure Purview data policies for row-level security

D.Use Azure Data Lake Storage Gen2 access control lists (ACLs) at the file level

AnswerA

Serverless SQL can query ADLS and apply RLS via views.

Why this answer

Azure Synapse Serverless SQL supports row-level security (RLS) through the creation of views that use the `SECURITY_POLICY` and `FILTER_PREDICATE` functions. This allows you to filter rows based on the user's identity (e.g., department membership) without duplicating data or managing separate files. It is the only native Azure service that provides declarative row-level filtering directly on data stored in Azure Data Lake Storage Gen2.

Exam trap

The trap here is that candidates confuse Azure RBAC or ACLs (which control access to storage objects) with row-level security (which controls access to rows within a data set), leading them to choose a storage-level permission model instead of a query-level filtering mechanism.

How to eliminate wrong answers

Option B is wrong because Azure RBAC roles operate at the storage account, container, or blob level, not at the row level; they cannot filter individual rows within a file. Option C is wrong because Azure Purview data policies currently support column-level sensitivity classification and access control, but do not enforce row-level security predicates on data in ADLS Gen2. Option D is wrong because ADLS Gen2 ACLs control access at the file or directory level (POSIX-style permissions), not at the row level within a file.

Practice this question →

187

MCQmedium

Your company stores sensitive customer data in Azure SQL Database. You need to encrypt the data at rest and ensure that only your application can decrypt it, even from database administrators. What should you implement?

A.Transparent Data Encryption (TDE)

B.Always Encrypted

C.Dynamic Data Masking

D.Azure Storage Service Encryption

AnswerB

Why this answer

Always Encrypted is correct because it ensures that sensitive data is encrypted at rest and in use, and the encryption keys are stored client-side, so only the application can decrypt the data. Database administrators (DBAs) cannot access the plaintext data because they lack the column encryption keys, even though they have full administrative access to the database.

Exam trap

The trap here is that candidates often confuse Transparent Data Encryption (TDE) with client-side encryption, assuming TDE protects against DBA access, but TDE only protects data at rest from storage theft, not from authorized database users.

Why the other options are wrong

TDE encrypts data at rest but the database engine holds the keys, allowing DBAs to decrypt.

Only masks data from unauthorized users; data is still stored in plaintext.

This applies to Azure Storage, not SQL Database.

Practice this question →

188

MCQhard

Refer to the exhibit. A Bicep file is used to deploy an Azure Synapse Analytics workspace. What is the purpose of the 'purviewConfiguration' property?

A.It links the workspace to a Microsoft Purview account for data lineage and cataloging

B.It configures automated backups of the Synapse workspace

C.It enables monitoring of data movement by Azure Monitor

D.It connects the workspace to a data catalog for pipeline sources

AnswerA

Purview provides data lineage and cataloging.

Why this answer

The 'purviewConfiguration' property in a Bicep file for Azure Synapse Analytics links the workspace to a Microsoft Purview account. This integration enables automated data lineage tracking, cataloging, and discovery across the Synapse environment, allowing users to search for and govern data assets directly from Purview. Without this property, the Synapse workspace operates independently of Purview's unified data governance capabilities.

Exam trap

The trap here is that candidates confuse the Purview integration with general cataloging or monitoring features, assuming it only applies to pipeline sources rather than understanding it provides full data lineage and cataloging across the entire Synapse workspace.

How to eliminate wrong answers

Option B is wrong because automated backups of a Synapse workspace are configured via the 'sqlPoolBackup' or workspace-level backup policies, not through the 'purviewConfiguration' property, which is solely for Purview integration. Option C is wrong because enabling monitoring of data movement by Azure Monitor is done through diagnostic settings and workspace-level monitoring configurations, not by linking to Purview. Option D is wrong because connecting the workspace to a data catalog for pipeline sources is a general description of Purview's role, but the specific purpose of 'purviewConfiguration' is to link to a Microsoft Purview account for full data lineage and cataloging, not just for pipeline sources.

Practice this question →

189

Multi-Selecthard

Which THREE statements are true about partitioning in Azure Synapse Analytics dedicated SQL pool?

Select 3 answers

A.Partition switching can be used to quickly load data into a table.

B.Partitions are automatically aligned with distributions.

C.Each partition is stored as a separate set of rowgroups in a columnstore index.

D.Partitioning is only supported on tables with clustered rowstore indexes.

E.Excessive partitioning can lead to fragmentation and poor query performance.

AnswersA, C, E

Switching partitions is a metadata-only operation.

Why this answer

Option A is correct because partition switching in Azure Synapse Analytics dedicated SQL pool allows you to quickly load data into a table by switching a partition from a staging table into the target table. This operation is metadata-only and does not require data movement, making it highly efficient for incremental data loads.

Exam trap

The trap here is that candidates often confuse partitions with distributions, thinking they are automatically aligned, or assume partitioning is only for rowstore indexes, when in fact columnstore indexes are the recommended and most common storage type for partitioning in dedicated SQL pool.

Practice this question →

190

MCQmedium

Your organization is implementing a data lake using Azure Data Lake Storage Gen2. You have a folder structure like '/data/landing/' for raw data and '/data/curated/' for cleaned data. The data is ingested daily from various sources. You need to ensure that data in the curated zone is immutable and cannot be modified or deleted by anyone, including administrators, for compliance reasons. However, data in the landing zone should be modifiable. What should you do?

A.Enable immutable storage with a time-based retention policy on the curated zone container

B.Remove the 'Delete' permission from the storage account key

C.Set ACLs on the curated zone folder to deny write and delete for all users

D.Use Azure RBAC to deny delete and write permissions for all users on the curated zone folder

AnswerA

Immutable storage prevents any modification or deletion until the retention period expires.

Why this answer

Option A is correct because Azure Data Lake Storage Gen2 supports immutable storage at the container level, which enforces a time-based retention policy that prevents any data from being modified or deleted—even by administrators—until the retention period expires. This directly meets the compliance requirement for the curated zone, while leaving the landing zone container unaffected and modifiable.

Exam trap

The trap here is that candidates often assume ACLs or RBAC alone can enforce immutability, but they fail to recognize that only container-level immutable storage provides the WORM guarantee that cannot be overridden by administrators or privileged accounts.

How to eliminate wrong answers

Option B is wrong because removing the 'Delete' permission from the storage account key does not prevent modifications (overwrites) and does not block privileged users like administrators who have other access methods (e.g., Azure RBAC, managed identities). Option C is wrong because ACLs on a folder can be overridden by users with higher-level permissions (e.g., storage account key, RBAC roles) and do not provide the legal hold or compliance-grade immutability required. Option D is wrong because Azure RBAC deny assignments can be bypassed by users with Owner or Contributor roles at a higher scope, and they do not enforce the same write-once-read-many (WORM) behavior as immutable storage; RBAC alone cannot prevent deletion by storage account key holders or service administrators.

Practice this question →

191

MCQeasy

A company is designing a data storage solution for IoT device telemetry. Each device sends a JSON payload every second. The data must be stored in a way that supports real-time dashboards and long-term analytics with low latency. Which Azure data store should be used for the ingestion layer?

A.Azure SQL Database

B.Azure Blob Storage

C.Azure Event Hubs

D.Azure Data Lake Storage

AnswerC

Event Hubs is designed for high-throughput data ingestion from IoT devices.

Why this answer

Azure Event Hubs is the correct choice for the ingestion layer because it is a fully managed, real-time data streaming platform designed to ingest millions of events per second with low latency. It supports the capture of JSON telemetry from IoT devices and integrates directly with downstream analytics services like Azure Stream Analytics for real-time dashboards and long-term storage in Azure Data Lake or Blob Storage. Its partitioned throughput model ensures scalable, durable ingestion without blocking producers.

Exam trap

The trap here is that candidates confuse the ingestion layer with the storage layer, choosing Azure Blob Storage or Data Lake Storage because they think 'store data' means persistent storage, but the question specifically asks for the ingestion layer where real-time, low-latency streaming is required, which Event Hubs uniquely provides.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational OLTP store optimized for structured queries and ACID transactions, not for high-velocity, schema-less JSON ingestion at millions of events per second, and it would introduce latency and cost bottlenecks. Option B is wrong because Azure Blob Storage is an object store designed for batch and large-file storage, not for real-time, per-second event ingestion; it lacks native streaming ingestion, pub-sub semantics, and sub-second latency for dashboards. Option D is wrong because Azure Data Lake Storage is a hierarchical file system optimized for analytics on large datasets, not for real-time event ingestion; it is typically used as a destination for data after it has been processed or captured from a streaming source like Event Hubs.

Practice this question →