Microsoft Azure Data Fundamentals DP-900 DP-900 Questions 601–675 | Page 9/14

601

MCQeasy

Refer to the exhibit. You have a Power BI measure defined as shown. What does this measure return?

A.The count of online sales transactions.

B.The sum of Amount for each product sold online.

C.The total sales amount for all channels.

D.The total sales amount for online channel only.

AnswerD

CALCULATE with filter on Channel='Online'.

Why this answer

Option A is correct. The CALCULATE function evaluates the SUM of Amount for rows where Channel is "Online". Option B is wrong because it sums all sales.

Option C is wrong because it sums Amount per product, not total. Option D is wrong because it counts rows.

Full explanation →

602

MCQhard

A company stores terabytes of historical sales data as Parquet files in Azure Data Lake Storage Gen2. Business analysts need to run ad-hoc SQL queries that involve complex joins and aggregations over this data. They want to avoid provisioning a dedicated cluster or moving data into a separate database. The queries must be executed using standard T-SQL syntax. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure Synapse Analytics serverless SQL pool

C.Azure Databricks

D.Azure HDInsight

AnswerB

The serverless SQL pool can query data in the data lake directly using T-SQL, scales on demand, and charges per query, fitting the requirement of no provisioning and no data movement.

Why this answer

Azure Synapse Analytics serverless SQL pool (B) is the correct choice because it allows you to query Parquet files directly in Azure Data Lake Storage Gen2 using standard T-SQL syntax without provisioning any dedicated cluster or moving data. It automatically scales compute resources to handle complex joins and aggregations on terabytes of data, charging only for the data processed. This matches the requirement for ad-hoc, serverless querying with familiar T-SQL.

Exam trap

The trap here is that candidates often confuse 'serverless SQL pool' with 'dedicated SQL pool' (Option A), assuming both require provisioning, or they mistakenly think Databricks (Option C) supports standard T-SQL, when it actually uses Spark SQL or Python.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning a dedicated cluster with fixed compute resources, incurring ongoing costs even when idle, and it typically involves moving data into the pool's managed tables, which violates the requirement to avoid provisioning a dedicated cluster or moving data. Option C is wrong because Azure Databricks uses Spark SQL or Python notebooks, not standard T-SQL syntax, and requires a running cluster (provisioned compute) to execute queries, which contradicts the serverless and T-SQL requirements. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster that requires provisioning and managing a cluster, uses HiveQL or Spark SQL (not standard T-SQL), and is designed for batch processing rather than ad-hoc interactive SQL queries.

Full explanation →

603

MCQeasy

A company wants to provide self-service analytics to business users who need to create reports and dashboards from data in Azure Synapse Analytics. Which tool should you recommend?

A.Power BI

B.Microsoft Excel

C.Azure Synapse Studio

D.Azure Data Studio

AnswerA

Power BI is designed for business users to create interactive reports and dashboards.

Why this answer

Power BI is the correct tool because it is designed specifically for self-service analytics, enabling business users to create interactive reports and dashboards from data stored in Azure Synapse Analytics. Power BI connects directly to Synapse via its built-in connector, allowing users to build visualizations without writing code or relying on IT. This aligns with the requirement for business users to perform ad-hoc analysis and reporting.

Exam trap

The trap here is that candidates may confuse Azure Synapse Studio (a development tool) with a reporting tool, overlooking that Power BI is the designated Microsoft solution for self-service business intelligence and dashboards.

How to eliminate wrong answers

Option B is wrong because Microsoft Excel, while capable of basic data analysis and charting, lacks the native connectivity and interactive dashboard capabilities required for self-service analytics on Azure Synapse Analytics; it is not designed for real-time, large-scale data visualization. Option C is wrong because Azure Synapse Studio is a development and management interface for data engineers and data scientists to build pipelines, write SQL scripts, and manage Spark jobs, not a self-service reporting tool for business users. Option D is wrong because Azure Data Studio is a lightweight database management tool for querying and developing with SQL Server and Azure SQL, focused on developers and DBAs, not on creating business reports and dashboards.

Full explanation →

604

MCQhard

Refer to the exhibit. You are reviewing an ARM template snippet for an Azure storage account. You need to ensure that the storage account supports POSIX-like permissions for data lake workloads. Which property must be enabled?

A."allowBlobPublicAccess": false

B."hierarchicalNamespace": {"enabled": true}

C."kind": "DataLakeStorageGen2"

D."keySource": "Microsoft.Storage"

AnswerB

Enabling hierarchical namespace is required for Data Lake Storage Gen2 and POSIX ACLs.

Why this answer

The exhibit shows that hierarchicalNamespace is enabled. This is required for Data Lake Storage Gen2, which supports POSIX ACLs. The question asks which property must be enabled; the answer is the hierarchical namespace.

Option A is wrong because encryption key source is separate. Option C is wrong because data lake storage gen2 is not a property but the result. Option D is wrong because blob public access is unrelated.

Full explanation →

605

MCQhard

A company uses Azure Data Lake Storage Gen2 to store analytics data. The security team requires that all data access be audited, including read, write, and delete operations. Which feature should you enable?

A.Enable Microsoft Purview.

B.Enable Azure Monitor for the storage account.

C.Enable Storage Analytics logging.

D.Enable Microsoft Defender for Cloud.

AnswerC

Storage Analytics logs track all operations.

Why this answer

Storage Analytics logging captures detailed information about successful and failed requests to a storage account, including read, write, and delete operations. This log data can be used for auditing and analysis, meeting the security team's requirement to audit all data access. The logs are stored in a $logs container within the same storage account.

Exam trap

The trap here is that candidates may confuse Azure Monitor (which provides metrics and alerts) with Storage Analytics logging (which provides detailed request-level logs), or think that Microsoft Purview or Defender for Cloud can fulfill the auditing requirement when they serve different purposes.

How to eliminate wrong answers

Option A is wrong because Microsoft Purview is a data governance and catalog service, not an auditing tool for storage access logs. Option B is wrong because Azure Monitor provides metrics and alerts for storage accounts but does not log individual read, write, and delete operations at the request level. Option D is wrong because Microsoft Defender for Cloud is a security posture management and threat protection service, not a feature for auditing granular data access operations.

Full explanation →

606

MCQmedium

A company uses Azure SQL Database and wants to ensure that a specific query always uses a particular index. What should they do?

A.Update statistics on the table

B.Use a query hint to force the index

C.Rebuild the index

D.Enable Query Store

AnswerB

Query hints like WITH (INDEX(index_name)) force the optimizer to use a specific index.

Why this answer

Option B is correct because query hints, specifically the INDEX hint, allow you to force the query optimizer to use a particular index for a specific query. In Azure SQL Database, this is done by adding `OPTION (TABLE HINT (table_name, INDEX (index_name)))` to the query, overriding the optimizer's default index selection.

Exam trap

The trap here is that candidates confuse index maintenance (rebuilding, updating stats) with query plan control, thinking that a well-maintained index will automatically be used, when in fact the optimizer may still choose a different index based on cost estimates.

How to eliminate wrong answers

Option A is wrong because updating statistics helps the optimizer make better decisions but does not force a specific index; it only improves the accuracy of cardinality estimates. Option C is wrong because rebuilding an index defragments it and updates statistics, but it does not guarantee the query will use that index; the optimizer may still choose a different index. Option D is wrong because Query Store is a monitoring and troubleshooting feature that tracks query performance and plan changes, but it cannot force a specific index for a query; it can only force a specific query plan, not an index within that plan.

Full explanation →

607

MCQhard

Refer to the exhibit. The JSON shows an Azure Policy definition. Which effect should be used to proactively prevent creation of storage accounts without encryption?

A.AuditIfNotExists

B.Deny

C.Disabled

D.Append

AnswerB

Deny prevents non-compliant resource creation.

Why this answer

The 'Deny' effect is correct because it proactively blocks the creation or update of a storage account that does not meet the encryption requirement, preventing non-compliant resources from being provisioned. This aligns with Azure Policy's ability to enforce compliance at resource creation time, rather than auditing or remediating after the fact.

Exam trap

The trap here is that candidates often confuse 'AuditIfNotExists' with a proactive block, not realizing it only logs non-compliance after the resource is created, whereas 'Deny' is the only effect that prevents creation entirely.

How to eliminate wrong answers

Option A (AuditIfNotExists) is wrong because it only logs a compliance warning when a storage account lacks encryption, but does not prevent its creation; it is a reactive audit effect. Option C (Disabled) is wrong because it turns off the policy entirely, allowing any storage account to be created without encryption. Option D (Append) is wrong because it adds additional fields to a resource during creation or update, but it cannot block a request; it is used to add tags or settings, not to deny non-compliant resources.

Full explanation →

608

MCQmedium

A financial application requires strict consistency and transaction support (ACID). Which Azure data service is most appropriate for storing its core transactional data?

A.Azure Cosmos DB

B.Azure SQL Database

C.Azure Table Storage

D.Azure Data Lake Storage

AnswerB

Azure SQL Database is a relational database engine that fully supports ACID properties for transactional workloads.

Why this answer

Azure SQL Database is a fully managed relational database service that provides full ACID (Atomicity, Consistency, Isolation, Durability) transaction support through its SQL Server engine. It is the correct choice for a financial application requiring strict consistency and transactional integrity, as it guarantees that all transactions are processed reliably and adhere to the ACID properties.

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB's 'consistency levels' with full ACID transaction support, not realizing that Cosmos DB sacrifices strict transactional guarantees for global scalability and low latency.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that offers multiple consistency models (e.g., eventual, session, bounded staleness) but does not provide full ACID transaction support across multiple documents or partitions; it is optimized for global distribution and low latency, not strict transactional consistency. Option C is wrong because Azure Table Storage is a NoSQL key-value store that does not support ACID transactions; it offers only eventual consistency and lacks the relational integrity and transaction management required for core financial data. Option D is wrong because Azure Data Lake Storage is a massively scalable data lake for big data analytics, not a transactional database; it does not support ACID transactions or provide the relational query capabilities needed for core transactional data.

Full explanation →

609

MCQmedium

You are designing a data solution for a retail company that needs to store transactional data (orders, payments) with strong consistency and support for complex joins. The data volume is moderate but expected to grow. Which Azure service should you choose?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure SQL Database

D.Azure Synapse Analytics

AnswerC

Azure SQL Database provides full relational capabilities with strong consistency.

Why this answer

Azure SQL Database is a fully managed relational database that provides ACID transactions with strong consistency and supports complex joins via T-SQL. It is ideal for transactional workloads like orders and payments where data integrity and relational queries are critical, and it scales elastically to accommodate growing data volumes.

Exam trap

The trap here is that candidates confuse 'scalability' with 'suitability for transactional workloads' and choose Cosmos DB for its global distribution, overlooking that strong consistency and complex joins are not its core strengths.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that prioritizes horizontal scaling and low latency over strong consistency (defaulting to eventual consistency unless configured for higher cost) and does not natively support complex joins across multiple entities. Option B is wrong because Azure Table Storage is a key-value NoSQL store with no support for joins, foreign keys, or ACID transactions, making it unsuitable for relational transactional data. Option D is wrong because Azure Synapse Analytics is a big data analytics service designed for large-scale data warehousing and complex analytical queries, not for OLTP workloads requiring real-time transactional consistency and frequent small writes.

Full explanation →

610

MCQmedium

A retail company needs to run complex SQL queries on petabytes of historical sales data stored in Parquet files in Azure Data Lake Storage Gen2. They want a solution that provides fast query performance without managing infrastructure, and they prefer a pay-per-query pricing model. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Synapse Serverless SQL pool

D.Azure HDInsight with Hive

AnswerC

Serverless SQL pool is ideal for querying data lakes with a pay-per-query model and no infrastructure management.

Why this answer

Azure Synapse Serverless SQL pool is correct because it allows querying petabytes of Parquet files in Azure Data Lake Storage Gen2 using T-SQL without provisioning any infrastructure, and it charges per terabyte of data processed (pay-per-query). This matches the requirements for fast query performance on historical sales data with a serverless, consumption-based pricing model.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics dedicated SQL pool (provisioned, always-on) with the serverless SQL pool (pay-per-query), or assume that Azure SQL Database can handle big data analytics on Parquet files, when it is designed for OLTP workloads and lacks native support for querying external data lakes without additional services like PolyBase.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning and managing dedicated compute resources (e.g., DWUs), incurring ongoing costs regardless of query usage, and does not offer a pay-per-query model. Option B is wrong because Azure SQL Database is a relational database service designed for transactional workloads, not for querying petabytes of Parquet files in Data Lake Storage Gen2, and it lacks native support for serverless querying of external data formats. Option D is wrong because Azure HDInsight with Hive requires managing a Hadoop cluster (provisioning VMs, scaling, patching), does not offer a pay-per-query pricing model, and incurs costs for running the cluster even when idle.

Full explanation →

611

Multi-Selecthard

Which THREE of the following are benefits of using a columnar storage format like Parquet for analytical workloads?

Select 3 answers

A.Enforced referential integrity constraints

B.Reduced I/O when querying a subset of columns

C.Optimized for frequent row updates

D.Better compression ratios due to similar data types in columns

E.Support for predicate pushdown to skip irrelevant data

AnswersB, D, E

Reading fewer columns reduces I/O and speeds up queries.

Why this answer

Option B is correct because columnar storage formats like Parquet store data by column rather than by row. When a query only needs a subset of columns (e.g., SELECT SUM(sales) FROM table), the storage engine reads only the relevant column chunks from disk, dramatically reducing I/O compared to reading entire rows. This is a key performance advantage for analytical workloads that aggregate or filter on specific columns.

Exam trap

The trap here is that candidates confuse the benefits of columnar storage (optimized for read-heavy, aggregate queries) with row-oriented storage benefits (optimized for frequent updates and transactional integrity), leading them to incorrectly select Option C.

Full explanation →

612

MCQeasy

A social media application stores user posts as JSON documents in Azure Cosmos DB. Each post includes fields such as postId, userId, content, timestamp, and an array of tags. The development team wants to query posts by userId and timestamp range using a SQL-like syntax. Which Azure Cosmos DB API should they choose?

A.A. Azure Cosmos DB for MongoDB API

B.B. Azure Cosmos DB for NoSQL API (Core SQL API)

C.C. Azure Cosmos DB for Table API

D.D. Azure Cosmos DB for Apache Cassandra API

AnswerB

The NoSQL API natively supports JSON documents and provides a SQL-like query language (SELECT ... WHERE ...). It can efficiently query on userId and timestamp fields, making it the optimal choice.

Why this answer

The Azure Cosmos DB for NoSQL API (Core SQL API) is the correct choice because it natively supports SQL-like querying (SELECT, WHERE, ORDER BY) over JSON documents. The team's requirement to query posts by userId and timestamp range using SQL-like syntax is directly supported by this API, which treats each JSON document as an item and allows filtering on nested fields like userId and timestamp. Other APIs either lack native SQL-like syntax or are optimized for different data models (e.g., MongoDB uses a JSON-like query language, Table API uses OData, Cassandra uses CQL).

Exam trap

The trap here is that candidates confuse 'SQL-like syntax' with any API that supports querying, but only the Core SQL API provides native SQL SELECT statements over JSON documents, while other APIs use different query languages (e.g., MongoDB's query operators, Cassandra's CQL) that are not SQL-like in the standard sense.

How to eliminate wrong answers

Option A is wrong because the Azure Cosmos DB for MongoDB API uses MongoDB's query language (e.g., db.posts.find({userId: ..., timestamp: ...})) rather than SQL-like syntax; it does not support SQL SELECT statements. Option C is wrong because the Azure Cosmos DB for Table API is designed for key-value and wide-column data with OData-based queries, not for querying nested JSON fields like userId and timestamp with SQL syntax. Option D is wrong because the Azure Cosmos DB for Apache Cassandra API uses CQL (Cassandra Query Language), which is similar to SQL but has limitations (e.g., no range queries on non-primary key columns without an index) and is not standard SQL; it is optimized for wide-column stores, not JSON documents.

Full explanation →

613

MCQmedium

A global social media app uses Azure Cosmos DB (NoSQL API) to store user profile data. The app is read-heavy and requires the fastest possible read performance worldwide. The data is updated by users and eventual consistency is acceptable because immediate consistency is not critical for profile views. Which consistency level should they choose to minimize read latency?

A.Strong

B.Bounded staleness

C.Session

D.Eventual

AnswerD

Eventual consistency provides the weakest guarantee but the lowest read latency. Reads are served from any replica without waiting for write propagation, making it ideal for read-heavy workloads where immediate consistency is not required.

Why this answer

Eventual consistency offers the lowest read latency because it allows reads from any replica without waiting for confirmation that the data is the most recent version. Since the app is read-heavy and eventual consistency is acceptable, this consistency level minimizes latency by not requiring any synchronization or staleness bounds.

Exam trap

The trap here is that candidates often assume Strong or Bounded staleness are required for any data that is updated, but the question explicitly states eventual consistency is acceptable, making Eventual the optimal choice for minimizing read latency.

How to eliminate wrong answers

Option A is wrong because Strong consistency requires reads to return the most recent write, which forces synchronization across replicas and increases latency, especially globally. Option B is wrong because Bounded staleness, while more relaxed than Strong, still imposes a maximum staleness bound (time or operations) that requires coordination, adding latency compared to Eventual. Option C is wrong because Session consistency guarantees monotonic reads and writes within a single client session, which introduces overhead to maintain session context and does not provide the lowest possible read latency.

Full explanation →

614

MCQmedium

A global e-commerce company needs to store user session data (key-value pairs) for a web application hosted in multiple Azure regions. The data must support low-latency reads and writes (under 10 ms) and be automatically replicated across regions for high availability. The development team also requires the ability to query sessions by user ID using a simple key lookup and occasionally filter by secondary attributes such as timestamp. Which Azure data store should they choose?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure SQL Database

D.Azure Cache for Redis

AnswerA

Cosmos DB provides global distribution, low latency, automatic indexing, and multiple consistency models, making it ideal for globally distributed key-value workloads that need secondary query support.

Why this answer

Azure Cosmos DB is correct because it provides globally distributed, multi-region writes with automatic replication, guaranteeing low-latency reads and writes under 10 ms at the 99th percentile. Its key-value API (Table API or SQL API) supports simple key lookups by user ID and secondary indexing on attributes like timestamp, meeting all stated requirements.

Exam trap

The trap here is that candidates often confuse Azure Cache for Redis as a durable data store for session data, overlooking that it is primarily a caching layer and lacks the built-in multi-region replication and durability guarantees required for high availability in a global e-commerce scenario.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because it does not support automatic multi-region replication for high availability; it is a single-region service with optional read-access geo-redundant storage (RA-GRS) that only provides read replicas, not active multi-region writes. Option C (Azure SQL Database) is wrong because it is a relational database that introduces overhead for simple key-value lookups and does not natively support multi-region writes with sub-10 ms latency without complex configuration. Option D (Azure Cache for Redis) is wrong because it is an in-memory cache, not a durable data store; data is lost on failure unless persistence is configured, and it lacks native multi-region replication for active-active writes.

Full explanation →

615

MCQeasy

A company runs a real-time dashboard in Power BI that displays sales data from Azure Synapse Analytics. The dashboard must show data with less than 5 seconds of latency. Which Azure service should be used to ingest streaming sales events into Azure Synapse Analytics?

A.Azure SQL Database

B.Azure Stream Analytics

C.Azure Data Factory

D.Azure Databricks

AnswerB

Azure Stream Analytics is a real-time analytics service that can process streaming data with sub-second latency and output directly to Azure Synapse Analytics.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, capable of ingesting high-velocity streaming sales events and outputting them to Azure Synapse Analytics with sub-second latency. This meets the requirement of less than 5 seconds of latency for the Power BI dashboard.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a batch ETL tool) with a real-time streaming service, or they assume Azure SQL Database can handle streaming ingestion, but neither supports the required sub-5-second latency for continuous data flow into Synapse Analytics.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database for OLTP workloads, not a stream ingestion service, and it cannot natively process streaming data with low latency into Synapse. Option C is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service designed for batch data movement and transformation, not real-time streaming with sub-5-second latency. Option D is wrong because Azure Databricks is an analytics platform for big data processing and machine learning, but it is not optimized for low-latency stream ingestion into Synapse; it typically requires additional streaming tools like Structured Streaming and adds overhead.

Full explanation →

616

MCQmedium

A company stores backup files in Azure Blob Storage. The backup files are accessed frequently for the first 30 days, then only rarely for the next six months. After one year, the files must be retained for compliance but are never accessed. The company wants to minimize storage costs. Which solution should they use?

A.Manually move files between storage accounts

B.Use Azure Blob Storage lifecycle management policies

C.Use Azure File Sync

D.Use Azure NetApp Files

AnswerB

Lifecycle management policies can automatically move blobs to cooler tiers (Cool, Archive) based on age, optimizing cost.

Why this answer

Azure Blob Storage lifecycle management policies allow you to automatically transition blobs to cooler tiers (e.g., from Hot to Cool after 30 days, then to Archive after one year) and delete blobs after a specified period, all without manual intervention. This directly matches the access pattern: frequent access for 30 days, rare access for six months, and never accessed after one year, minimizing storage costs by using the most cost-effective tier for each phase.

Exam trap

The trap here is that candidates may confuse Azure File Sync or Azure NetApp Files as viable storage options for backups, but these services are designed for active file sharing and high-performance workloads, not for cost-optimized, tiered archival of rarely accessed blob data.

How to eliminate wrong answers

Option A is wrong because manually moving files between storage accounts is labor-intensive, error-prone, and does not leverage Azure's built-in tiering or automation, leading to higher operational costs and potential compliance gaps. Option C is wrong because Azure File Sync is designed for synchronizing on-premises file servers with Azure file shares, not for managing blob lifecycle or tier transitions; it does not support blob storage tiers or automated deletion based on age. Option D is wrong because Azure NetApp Files is a high-performance, enterprise-grade NFS/SMB file share service for demanding workloads, not a cost-optimized solution for infrequently accessed backup blobs; it is significantly more expensive than blob storage tiers and lacks lifecycle management for archival.

Full explanation →

617

MCQeasy

A company plans to implement a near-real-time analytics solution for streaming IoT sensor data. Which Azure service should they use to ingest and process the data streams?

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure Data Lake Storage Gen2

D.Azure Stream Analytics

AnswerD

Designed for real-time stream processing and analytics.

Why this answer

Azure Stream Analytics is a real-time event processing engine designed to ingest, process, and analyze high-velocity streaming data from sources like IoT sensors. It supports SQL-based queries to transform and route data streams to outputs such as Power BI or Azure Synapse, making it ideal for near-real-time analytics.

Exam trap

The trap here is that candidates often confuse batch-oriented services like Azure Data Factory or storage services like Data Lake Storage Gen2 with real-time stream processing, overlooking that Stream Analytics is the dedicated service for near-real-time data stream ingestion and analysis.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data integration service for batch data movement and orchestration, not designed for real-time stream ingestion. Option B is wrong because Azure Synapse Analytics is a unified analytics platform for large-scale data warehousing and big data analytics, but it relies on separate streaming services like Stream Analytics for real-time ingestion. Option C is wrong because Azure Data Lake Storage Gen2 is a scalable data lake storage solution for storing structured and unstructured data, not a stream processing engine.

Full explanation →

618

MCQhard

A company is designing an enterprise analytics solution. They store raw data in its original format in a scalable repository, apply schema and transformations at read time, and also maintain a curated layer that enforces ACID transactions for data reliability. This architecture combines the flexibility of a data lake with the reliability of a data warehouse. Which term best describes this modern data architecture?

A.Data lakehouse

B.Data mart

C.Operational database

D.Data pipeline

AnswerA

Correct. The data lakehouse architecture combines a data lake's flexibility (schema-on-read) with a data warehouse's ACID transactions and performance, often implemented with Delta Lake.

Why this answer

The data lakehouse architecture combines the flexibility of a data lake (storing raw data in its original format in a scalable repository) with the reliability of a data warehouse (enforcing ACID transactions in a curated layer). This allows schema-on-read transformations while maintaining data integrity, making it the correct term for the described design.

Exam trap

The trap here is that candidates may confuse a data lakehouse with a data lake or data warehouse, missing the key combination of raw storage, schema-on-read, and ACID transactions that defines this modern architecture.

How to eliminate wrong answers

Option B is wrong because a data mart is a subset of a data warehouse focused on a specific business domain, not an architecture that combines a data lake with ACID transactions. Option C is wrong because an operational database is designed for real-time transaction processing (OLTP) and does not typically store raw data in a scalable repository or apply schema-on-read transformations. Option D is wrong because a data pipeline is a process for moving and transforming data between systems, not an architecture that combines storage and ACID reliability.

Full explanation →

619

Multi-Selecthard

Which TWO options are valid ways to secure data at rest in Azure Synapse Analytics?

Select 2 answers

A.Customer-managed keys (CMK) with TDE

B.Transparent Data Encryption (TDE)

C.Always Encrypted

D.Column-level security

E.Dynamic Data Masking

AnswersA, B

Provides additional control over encryption keys.

Why this answer

Customer-managed keys (CMK) with TDE is a valid method to secure data at rest in Azure Synapse Analytics because it allows you to bring your own key (BYOK) to encrypt the database encryption key (DEK), which is stored in Azure Key Vault. This provides an additional layer of control and separation of duties, ensuring that only authorized users can access the encryption keys and thus the underlying data.

Exam trap

The trap here is that candidates often confuse Dynamic Data Masking or Column-level security with data-at-rest encryption, but these are access control or obfuscation features, not encryption mechanisms that protect data stored on disk.

Full explanation →

620

MCQmedium

A company uses Azure SQL Database for an e-commerce application. The Orders table has millions of rows. Queries frequently filter on OrderDate and OrderStatus, and sort by OrderDate descending. Which indexing strategy will most improve query performance?

A.Create a clustered index on OrderDate and a non-clustered index on OrderStatus

B.Create a non-clustered index on (OrderDate, OrderStatus) and keep the existing clustered index on OrderID

C.Create a clustered index on OrderID and a non-clustered index on (OrderStatus, OrderDate)

D.Create a non-clustered index on (OrderDate DESC, OrderStatus) and keep the existing clustered index on OrderID

AnswerD

This index is a covering index that supports the filter on both columns and the sort order, allowing the query to retrieve data without additional sorting.

Why this answer

Option D creates a covering index for the most common query pattern: filtering on OrderDate and OrderStatus, and sorting by OrderDate descending. By specifying DESC in the index key, the index is ordered in the same direction as the sort, allowing SQL Server to avoid a sort operation and retrieve rows in order directly from the index. This non-clustered index can satisfy the query entirely without touching the clustered index (OrderID), reducing I/O and improving performance.

Exam trap

The trap here is that candidates assume any index on the filtered columns will help, but they overlook the importance of index key order matching the sort direction (DESC) to avoid a sort operation, which is a common performance pitfall in Azure SQL Database.

How to eliminate wrong answers

Option A is wrong because creating a clustered index on OrderDate would physically reorder the table by OrderDate, which can cause page splits and fragmentation due to frequent inserts, and it does not include OrderStatus for filtering, so queries would still need to look up rows. Option B is wrong because the non-clustered index on (OrderDate, OrderStatus) is not ordered descending, so queries sorting by OrderDate DESC would require an expensive sort operation; also, the clustered index on OrderID is fine, but the index order does not match the query sort. Option C is wrong because a non-clustered index on (OrderStatus, OrderDate) does not support the sort by OrderDate DESC efficiently (the leading column is OrderStatus, not OrderDate), and the clustered index on OrderID offers no benefit for date-range filtering.

Full explanation →

621

MCQmedium

You are designing a data storage solution for a social media application that stores user profile pictures and uploaded photos. The solution must support high throughput and be optimized for reading and writing large binary objects. Which Azure data service should you recommend?

A.Azure Files

B.Azure Blob Storage

C.Azure SQL Database

D.Azure Cosmos DB

AnswerB

Azure Blob Storage is designed for storing large amounts of unstructured data, such as images, and offers high throughput.

Why this answer

Azure Blob Storage is optimized for storing large amounts of unstructured data, such as images and videos, and provides high throughput for read/write operations. Option A is wrong because Azure Cosmos DB is a NoSQL database for structured data, not optimized for large binary objects. Option B is wrong because Azure Files is a fully managed file share for SMB/NFS, not ideal for high-throughput binary object storage.

Option D is wrong because Azure SQL Database is a relational database for structured data.

Full explanation →

622

Multi-Selectmedium

Which TWO features are available in Azure SQL Database to help protect data at rest?

Select 2 answers

A.Transparent Data Encryption (TDE)

B.Dynamic Data Masking

C.Always Encrypted

D.Auditing

E.Row-Level Security

AnswersA, C

TDE encrypts the entire database at rest.

Why this answer

Option A and Option C are correct. Transparent Data Encryption encrypts the database files at rest. Always Encrypted encrypts sensitive columns at rest and in use.

Option B is wrong because Dynamic Data Masking does not encrypt data; it masks it in query results. Option D is wrong because Auditing is for tracking, not encryption. Option E is wrong because Row-Level Security controls access, not encryption.

Full explanation →

623

MCQmedium

A company stores sensor data in Azure Blob Storage. The data is appended every minute and rarely modified. The compliance team requires that blobs older than 90 days be moved to a more cost-effective storage tier, and blobs older than 365 days be deleted. Which solution should you recommend?

A.Use Azure Backup to set a retention policy for the storage account.

B.Configure a Blob Storage lifecycle management policy.

C.Enable Blob Soft Delete and set retention days to 365.

D.Move the blobs to Azure Files and set a file retention policy.

AnswerB

Lifecycle policies automate tiering and deletion.

Why this answer

A lifecycle management policy can automatically transition blobs to cooler tiers (e.g., Cool after 90 days) and delete them after 365 days. Option B is wrong because Azure Files is for file shares. Option C is wrong because Blob Storage does not have a built-in backup retention policy for this scenario.

Option D is wrong because Soft Delete is for recovery, not automated tiering or deletion.

Full explanation →

624

MCQeasy

A company stores terabytes of customer support chat transcripts in JSON format. The data is rarely modified and needs to be accessed by analysts using SQL queries. The analysts do not want to manage servers or provision throughput. Which Azure service should be used to store and query this data?

A.Azure Blob Storage (with Azure Data Lake Storage Gen2) and query using Azure Synapse Serverless SQL

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure SQL Database

AnswerA

Correct. This combination provides cost-effective storage and serverless SQL querying without infrastructure management.

Why this answer

Azure Blob Storage with Azure Data Lake Storage Gen2 provides a cost-effective, scalable solution for storing large volumes of JSON data in its native format. By using Azure Synapse Serverless SQL, analysts can query this data directly with standard T-SQL without provisioning any infrastructure or managing throughput, meeting the requirement for serverless, on-demand querying of rarely modified data.

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB's SQL API with traditional SQL querying, overlooking the requirement to avoid provisioning throughput, or they assume Azure Table Storage supports SQL queries when it only supports key-value lookups via REST or OData.

How to eliminate wrong answers

Option B is wrong because Azure Cosmos DB is a NoSQL database designed for globally distributed, low-latency access and requires provisioning throughput (RU/s), which contradicts the requirement to avoid managing throughput. Option C is wrong because Azure Table Storage is a key-value store that does not support SQL queries natively; it uses OData or REST APIs, not SQL. Option D is wrong because Azure SQL Database is a relational database that requires provisioning a server and managing throughput (DTUs or vCores), which violates the requirement to not manage servers or provision throughput.

Full explanation →

625

MCQmedium

A company uses Azure SQL Database for an order management system. The 'Orders' table has millions of rows and is queried frequently with filters on OrderDate and CustomerID. The table currently has a clustered index on OrderID. Which action will most improve query performance for these frequent filters?

A.Create a non-clustered index on OrderDate and CustomerID

B.Create a clustered index on OrderDate

C.Create a non-clustered index on OrderID

D.Partition the table by CustomerID

AnswerA

Correct. A non-clustered index on the columns used in WHERE clauses (OrderDate, CustomerID) allows the database engine to quickly locate rows without scanning the entire table.

Why this answer

The frequent filters on OrderDate and CustomerID require a covering index that includes both columns. A non-clustered index on (OrderDate, CustomerID) allows SQL Server to perform an index seek for queries filtering on those columns, avoiding full clustered index scans on the existing clustered index on OrderID. This directly reduces I/O and improves query response times.

Exam trap

The trap here is that candidates often assume partitioning alone solves query performance issues, but without an appropriate index, partitioning only helps with data management and partition elimination, not with efficient row-level filtering for specific column combinations.

How to eliminate wrong answers

Option B is wrong because changing the clustered index to OrderDate would reorganize the entire table's physical order, which could slow down other queries that rely on the current OrderID ordering and would not directly benefit the specific filter on CustomerID. Option C is wrong because creating a non-clustered index on OrderID duplicates the existing clustered index's key column, offering no performance gain for filters on OrderDate and CustomerID. Option D is wrong because partitioning the table by CustomerID improves manageability and partition elimination for range scans, but it does not create a seekable index structure for the specific combination of OrderDate and CustomerID; queries would still require scanning all partitions unless an appropriate index exists.

Full explanation →

626

MCQmedium

A company runs an e-commerce application on Azure SQL Database. The application experiences unpredictable traffic spikes during flash sales and promotional events. The company wants to automatically scale compute resources based on actual demand and pay only for the resources consumed. Which Azure SQL Database deployment option best meets these requirements?

A.Serverless

B.Provisioned DTU

C.Provisioned vCore

D.Hyperscale

AnswerA

Correct. Serverless compute automatically scales and pauses, ideal for intermittent, unpredictable workloads.

Why this answer

The Serverless deployment option for Azure SQL Database automatically scales compute resources (vCores) based on actual demand, pausing the database during idle periods and resuming on the first connection. This model charges per second for the compute used, making it ideal for unpredictable traffic spikes like flash sales, as it eliminates the need to over-provision and ensures you pay only for consumed resources.

Exam trap

The trap here is that candidates confuse Hyperscale's storage scalability with compute auto-scaling, or assume Provisioned tiers can automatically scale without manual intervention, when in fact only Serverless provides automatic compute scaling and per-second billing for intermittent workloads.

How to eliminate wrong answers

Option B is wrong because Provisioned DTU uses a fixed, pre-allocated compute and storage bundle that cannot automatically scale based on demand; you must manually change the service tier or use elastic pools, which still require upfront sizing. Option C is wrong because Provisioned vCore also uses a fixed number of vCores that must be manually scaled up or down, and it charges for the provisioned compute even when idle, not per-second consumption. Option D is wrong because Hyperscale is designed for very large databases (up to 100 TB) with fast scaling of storage and read replicas, but its compute tier is provisioned (not serverless) and does not auto-pause or charge per-second for compute; it targets high-throughput workloads, not intermittent bursty traffic.

Full explanation →

627

MCQmedium

A gaming application stores player profiles as JSON documents. Each profile has standard fields like playerId, username, and email, but also optional fields such as achievements and gamePreferences. The application needs to query profiles by playerId with low latency and also run SQL-like queries to find players with specific achievements. Which Azure Cosmos DB API should they choose?

A.A: Table API

B.B: MongoDB API

C.C: SQL (Core) API

D.D: Cassandra API

AnswerC

The SQL API provides native support for JSON documents, low-latency point reads by partition key (playerId), and the ability to run SQL-like queries on document fields such as achievements.

Why this answer

The SQL (Core) API is the best choice because it natively supports JSON documents with flexible schemas (including optional fields like achievements and gamePreferences), provides low-latency point reads by playerId using the id field as the partition key, and enables SQL-like queries (e.g., SELECT * FROM c WHERE ARRAY_CONTAINS(c.achievements, 'specific_achievement')) without requiring a separate indexing or translation layer.

Exam trap

The trap here is that candidates often confuse the MongoDB API's JSON document support with SQL-like query capability, but the question specifically requires SQL-like queries, which only the SQL (Core) API provides natively.

How to eliminate wrong answers

Option A is wrong because the Table API is designed for key-value storage with a fixed schema (entities with properties), not for querying nested JSON arrays like achievements, and it lacks native SQL-like query support for complex JSON structures. Option B is wrong because the MongoDB API uses MongoDB's query language (e.g., db.collection.find({achievements: 'specific_achievement'})) rather than SQL-like syntax, and the question explicitly requires SQL-like queries. Option D is wrong because the Cassandra API uses CQL (Cassandra Query Language) which is not SQL-like in the relational sense, and it is optimized for wide-column storage with a flat schema, not for querying nested JSON documents or optional fields.

Full explanation →

628

MCQeasy

A bank processes online fund transfers. Each transaction must ensure that either both the debit from the sender's account and the credit to the receiver's account occur, or if any part fails, the entire transaction is rolled back. Which ACID property does this guarantee?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity guarantees that all operations in a transaction are treated as a single unit. If any operation fails, the entire transaction is rolled back, leaving data unchanged. This directly applies to the bank scenario.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this fund transfer scenario, atomicity guarantees that both the debit and credit operations either complete successfully together or are fully rolled back if any part fails, preventing partial updates that could leave the system in an inconsistent state.

Exam trap

Microsoft often tests atomicity by describing a multi-step operation and asking which ACID property ensures the 'all-or-nothing' behavior, and the trap here is that candidates confuse atomicity with consistency, thinking that consistency alone prevents partial updates, when in fact atomicity is the property that enforces the rollback of incomplete transactions.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, triggers), but it does not inherently guarantee the all-or-nothing behavior of the debit and credit pair. Option C (Isolation) is wrong because isolation controls how concurrent transactions are executed to prevent interference (e.g., dirty reads), not the atomic completion of a single transaction's operations. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it does not address the rollback of a failed transaction.

Full explanation →

629

MCQeasy

Your organization has a data lake on Azure Data Lake Storage Gen2 containing petabytes of raw clickstream data. Data scientists need to run exploratory analysis using Python and Spark, but they are not experienced with cluster management or infrastructure. The IT team wants to minimize administrative overhead while providing a collaborative notebook environment. Additionally, the solution must integrate with Microsoft Purview for data cataloging and lineage. Which Azure service should you recommend?

A.Azure Databricks

B.Azure Data Science Virtual Machine

C.Azure Synapse Analytics (Synapse Studio)

D.Azure HDInsight (Spark cluster)

AnswerA

Databricks provides a collaborative notebook environment, automated cluster management, and integrates with Microsoft Purview.

Why this answer

Azure Databricks is the correct choice because it provides a fully managed, collaborative notebook environment optimized for Apache Spark, allowing data scientists to run Python and Spark-based exploratory analysis without managing clusters. It integrates natively with Azure Data Lake Storage Gen2 for accessing petabytes of clickstream data and supports Microsoft Purview for automated data cataloging and lineage tracking, minimizing administrative overhead.

Exam trap

The trap here is that candidates may choose Azure Synapse Analytics because it also supports Spark and notebooks, but they overlook that Databricks is purpose-built for collaborative data science with minimal infrastructure management, while Synapse is optimized for data warehousing and ETL workloads.

How to eliminate wrong answers

Option B (Azure Data Science Virtual Machine) is wrong because it is a pre-configured VM that requires manual cluster management and scaling, lacking the serverless Spark capabilities and collaborative notebook environment needed for petabyte-scale analysis. Option C (Azure Synapse Analytics) is wrong because while it offers Spark pools and notebooks, it is primarily designed for enterprise data warehousing and ETL, and its collaborative notebook experience is less mature than Databricks, with higher administrative overhead for cluster management. Option D (Azure HDInsight) is wrong because it requires manual cluster provisioning, configuration, and scaling, and does not provide a built-in collaborative notebook environment, increasing administrative burden for data scientists unfamiliar with infrastructure.

Full explanation →

630

MCQeasy

A company maintains a database of customer orders that are updated frequently. They also store aggregated monthly sales reports that are generated once and then only read. Which statement correctly distinguishes these two types of data workloads?

A.Transactional data is optimized for write operations, and analytical data is optimized for read operations.

B.Transactional data must always be stored in non-relational databases, and analytical data in relational databases.

C.Analytical data always requires real-time processing, whereas transactional data is batch-processed.

D.Transactional data is read-only and analytical data is frequently updated.

AnswerA

This is correct. OLTP systems are designed for efficient writes, while OLAP systems are designed for complex reads.

Why this answer

Option A is correct because transactional workloads (like the frequently updated customer orders) are optimized for write-heavy operations, ensuring ACID compliance and data integrity, while analytical workloads (like the read-only monthly sales reports) are optimized for read-heavy operations, often using columnar storage or pre-aggregated data to speed up queries. This distinction aligns with the core difference between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) systems in Azure, such as Azure SQL Database for transactional data and Azure Synapse Analytics for analytical data.

Exam trap

The trap here is that candidates confuse the typical characteristics of OLTP and OLAP, mistakenly thinking analytical data requires real-time processing or that transactional data is read-only, when in fact the opposite is true for each.

How to eliminate wrong answers

Option B is wrong because transactional data can be stored in both relational databases (e.g., Azure SQL Database) and non-relational databases (e.g., Azure Cosmos DB), and analytical data is often stored in relational or specialized columnar stores (e.g., Azure Synapse), not exclusively in one type. Option C is wrong because analytical data typically uses batch processing (e.g., nightly ETL jobs) rather than real-time processing, while transactional data requires real-time or near-real-time processing for individual write operations. Option D is wrong because transactional data is frequently updated (write-heavy), not read-only, and analytical data is typically read-only or updated in bulk during refresh cycles, not frequently updated.

Full explanation →

631

MCQmedium

A retail company needs to analyze sales transactions as they occur to detect fraud patterns and immediately block suspicious orders. They also need to run daily batch reports on historical sales data. Which combination of Azure services should they use to meet both real-time and batch processing requirements?

A.Azure Stream Analytics for real-time processing and Azure Synapse Analytics for batch analytics

B.Azure Data Factory for both real-time and batch processing

C.Azure Logic Apps for real-time processing and Azure Synapse Analytics for batch analytics

D.Azure Stream Analytics for both real-time and batch processing

AnswerA

Stream Analytics handles real-time insights, Synapse Analytics handles large-scale batch queries.

Why this answer

Azure Stream Analytics is purpose-built for real-time data streaming and can process sales transactions as they occur to detect fraud patterns and block suspicious orders immediately. Azure Synapse Analytics provides a unified analytics platform that can run large-scale batch queries on historical sales data for daily reports, making this combination ideal for both real-time and batch processing needs.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's orchestration capabilities with real-time processing, or assume that a single service like Stream Analytics can handle both streaming and batch analytics, when in fact each service is specialized for a distinct workload type.

How to eliminate wrong answers

Option B is wrong because Azure Data Factory is an orchestration and ETL service for data movement and transformation, not a real-time stream processing engine; it cannot process transactions as they occur with sub-second latency. Option C is wrong because Azure Logic Apps is designed for workflow automation and integration, not for high-throughput, low-latency real-time stream analytics required for fraud detection. Option D is wrong because Azure Stream Analytics is optimized for real-time stream processing and does not natively support batch analytics on historical data; it lacks the SQL-based analytical engine and large-scale query capabilities of a dedicated batch analytics service like Synapse.

Full explanation →

632

MCQmedium

You are designing a solution to store IoT device telemetry data. Each message is a small JSON payload (1-2 KB). The data is written once and read frequently for real-time dashboards. Which Azure data store should you use?

A.Azure SQL Database

B.Azure Cosmos DB

C.Azure Blob Storage

D.Azure Table Storage

AnswerB

Cosmos DB offers low-latency reads for JSON.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model database service that offers single-digit millisecond read and write latencies at any scale, making it ideal for real-time dashboards consuming IoT telemetry. Its support for JSON documents natively aligns with the small JSON payloads, and its ability to handle high-throughput writes (once) and low-latency reads (frequently) without schema management fits the workload perfectly.

Exam trap

The trap here is that candidates often choose Azure Blob Storage because they associate 'JSON payloads' with 'files,' overlooking that Blob Storage lacks the low-latency query and indexing capabilities required for real-time dashboards, while Cosmos DB is purpose-built for such operational workloads.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database that requires a fixed schema and is optimized for complex queries and transactions, not for the high-velocity, schema-less JSON ingestion typical of IoT telemetry. Option C is wrong because Azure Blob Storage is designed for storing large, unstructured binary objects (e.g., images, videos, backups) and does not provide the sub-second query latency or indexing needed for real-time dashboards; it is better suited for archival or batch processing of telemetry data. Option D is wrong because Azure Table Storage is a key-value store that lacks native JSON support, advanced indexing, and the low-latency read capabilities required for real-time dashboards; it is more appropriate for simple, high-volume structured data with limited query patterns.

Full explanation →

633

MCQhard

A retail company ingests daily sales data from multiple stores as CSV files stored in Azure Blob Storage. The data must be cleaned and transformed using Spark, then loaded into Azure Synapse Analytics for large-scale reporting. The pipeline must run on a schedule, handle failures with retries, and minimize manual intervention. Which combination of Azure services should they use to orchestrate and execute this pipeline?

A.Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.

B.Azure Stream Analytics, Azure Data Lake Storage, and Power BI.

C.Azure Functions, Azure SQL Database, and Azure Analysis Services.

D.Azure Logic Apps, Azure HDInsight, and Azure Cosmos DB.

AnswerA

Correct. ADF orchestrates the pipeline, Databricks performs Spark-based transformations, and Synapse Analytics serves as the data warehouse for reporting.

Why this answer

Option A is correct because Azure Data Factory provides the orchestration and scheduling layer, Azure Databricks executes the Spark-based cleaning and transformation, and Azure Synapse Analytics serves as the target data warehouse for large-scale reporting. This combination supports retry policies for failure handling and minimizes manual intervention through automated pipeline execution.

Exam trap

The trap here is that candidates may confuse Azure Databricks with HDInsight or overlook the need for a dedicated orchestration service like Data Factory, assuming that a compute service alone can handle scheduling and retries.

How to eliminate wrong answers

Option B is wrong because Azure Stream Analytics is designed for real-time stream processing, not batch CSV ingestion and Spark transformations, and Power BI is a visualization tool, not a data transformation or orchestration service. Option C is wrong because Azure Functions is a serverless compute service unsuitable for complex Spark transformations, Azure SQL Database lacks the large-scale analytics capabilities of Synapse, and Azure Analysis Services is for semantic modeling, not data ingestion or transformation. Option D is wrong because Azure Logic Apps is a workflow automation tool for simple integrations, not robust pipeline orchestration with retries, Azure HDInsight is a managed Hadoop/Spark service but lacks the integrated orchestration and scheduling of Data Factory, and Azure Cosmos DB is a NoSQL database not designed for large-scale reporting workloads like Synapse Analytics.

Full explanation →

634

MCQmedium

A company is migrating a 500 GB financial database to Azure. The database requires low read/write latency, supports a high number of concurrent transactions, and must have a Recovery Point Objective (RPO) of less than 5 seconds and a Recovery Time Objective (RTO) of less than 30 minutes. The company is willing to pay more for these guarantees. Which Azure SQL Database service tier should they choose?

A.General Purpose

B.Business Critical

C.Hyperscale

D.Serverless (General Purpose)

AnswerB

Business Critical uses multiple synchronous replicas to achieve low latency, an RPO of less than 5 seconds, and an RTO of approximately 30 minutes, meeting the requirements.

Why this answer

Business Critical is the correct choice because it provides the lowest read/write latency through always-on secondary replicas and uses local SSD storage, which is essential for high-concurrency transactional workloads. It also guarantees an RPO of less than 5 seconds via synchronous data replication and an RTO of under 30 minutes, meeting the strict recovery requirements.

Exam trap

The trap here is that candidates often choose Hyperscale because it supports large databases and fast scaling, but they overlook that its RPO is not as tight as Business Critical's synchronous replication, and its read/write latency can be higher due to the page server architecture.

How to eliminate wrong answers

Option A is wrong because General Purpose uses remote blob storage with higher latency and asynchronous replication, resulting in an RPO of up to 5 minutes and RTO of 5-10 minutes, which fails the sub-5-second RPO requirement. Option C is wrong because Hyperscale, while offering fast scaling and low latency for reads, has an RPO of up to 5 seconds (not guaranteed under 5 seconds) and an RTO of up to 10 minutes, but its architecture is optimized for large databases with high throughput rather than ultra-low latency for high-concurrency OLTP; also, its RPO is not as tight as Business Critical's synchronous commit. Option D is wrong because Serverless (General Purpose) inherits the same latency and recovery limitations as General Purpose, with an RPO of up to 5 minutes and RTO of 5-10 minutes, and its auto-pause feature can introduce additional cold-start delays, making it unsuitable for strict RPO/RTO guarantees.

Full explanation →

635

Multi-Selectmedium

A data engineering team is building a data pipeline to run daily batch loads from an on-premises SQL Server to Azure Synapse Analytics. The pipeline must include data transformation using a visual interface with no coding, and must support schema mapping and data validation. Which THREE Azure services should be used together?

Select 3 answers

A.Azure Synapse Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure Blob Storage

E.Azure Analysis Services

AnswersA, B, D

Synapse is the target serving layer for analytics.

Why this answer

Azure Synapse Analytics is the correct destination for the pipeline because it is a cloud-based data warehouse that supports high-performance analytics on large-scale data, making it ideal for daily batch loads from SQL Server. It integrates natively with Azure Data Factory for orchestration and Azure Blob Storage for staging, enabling schema mapping and data validation through visual interfaces without coding.

Exam trap

The trap here is that candidates often assume Azure Databricks is required for transformations, but the question explicitly requires a visual interface with no coding, which Azure Data Factory's Mapping Data Flows provide, not Databricks' notebook-based approach.

Full explanation →

636

MCQmedium

A logistics company collects sensor data from delivery trucks. Each sensor sends a JSON message that includes a fixed set of core fields (truck ID, timestamp) but also includes optional fields such as temperature, humidity, and engine diagnostics depending on the sensor type. The JSON structure varies between messages. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

Correct. Semi-structured data does not have a rigid schema but has some organizational properties (tags, markers) to separate data elements. JSON with optional fields is a classic example.

Why this answer

The JSON messages contain a fixed set of core fields (truck ID, timestamp) but also include optional fields that vary per message, meaning the data has a flexible schema. This mixture of structured fields and variable attributes is the defining characteristic of semi-structured data, which does not require a rigid schema like a relational table but still has organizational properties (e.g., key-value pairs). In Azure, this type of data is commonly stored in services like Azure Cosmos DB or Azure Blob Storage with JSON format.

Exam trap

The trap here is that candidates often mistake any data with a consistent core set of fields as 'structured data', overlooking that the presence of optional, varying fields makes it semi-structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed, predefined schema (e.g., columns in a SQL table) with consistent fields across all records, but the JSON messages here have optional fields that vary. Option C is wrong because unstructured data has no predefined structure or schema (e.g., raw video files, plain text), whereas JSON has a defined key-value format. Option D is wrong because relational data specifically refers to data organized into tables with rows and columns linked by foreign keys, which is not the case for JSON messages with varying fields.

Full explanation →

637

MCQeasy

A data analyst needs to create a real-time dashboard in Power BI that refreshes every second from an Azure Stream Analytics job. Which Power BI feature should they use?

A.Scheduled refresh

B.Streaming dataset

C.DirectQuery

D.Import mode

AnswerB

Streaming datasets handle real-time data.

Why this answer

Option B is correct: Power BI streaming datasets support real-time refresh from Stream Analytics. Option A is wrong because scheduled refresh is for periodic data. Option C is wrong because DirectQuery is for interactive queries, not real-time streaming.

Option D is wrong because import mode is for static data.

Full explanation →

638

MCQmedium

Your organization stores IoT sensor data as JSON blobs in Azure Blob Storage. You need to query this data using SQL statements without moving the data. Which Azure service should you use?

A.Azure SQL Database

B.Azure Cosmos DB

C.Azure Data Lake Storage Gen2

D.Azure Synapse Serverless SQL

AnswerD

Azure Synapse Serverless SQL can query JSON files in Blob Storage using T-SQL.

Why this answer

Option A is correct because Azure Synapse Serverless SQL can query JSON data directly from Blob Storage using OPENROWSET with SQL. Option B is wrong because Azure Cosmos DB is a NoSQL database, not a query service over Blob Storage. Option C is wrong because Azure Data Lake Storage is a storage service, not a query engine.

Option D is wrong because SQL Database is a relational database, not for querying files in Blob Storage.

Full explanation →

639

MCQhard

A global e-commerce platform uses a combination of relational and NoSQL databases. The order management system requires ACID transactions across multiple tables (Orders, OrderItems, Inventory). The product catalog uses a flexible schema to accommodate varying product attributes and is read-heavy. The session store requires low-latency key-value lookups with eventual consistency. Which of the following pairings of data stores best matches these requirements?

A.Order management: Azure Cosmos DB (NoSQL API) - Product catalog: Azure SQL Database - Session store: Azure Table Storage

B.Order management: Azure SQL Database - Product catalog: Azure Cosmos DB (NoSQL API) - Session store: Azure Cache for Redis

C.Order management: Azure Table Storage - Product catalog: Azure SQL Database - Session store: Azure Cosmos DB (NoSQL API)

D.Order management: Azure Cosmos DB (Table API) - Product catalog: Azure Cache for Redis - Session store: Azure SQL Database

AnswerB

Azure SQL Database provides strong ACID transactions for orders. Cosmos DB with NoSQL API offers flexible schema and low-latency reads for the product catalog. Azure Cache for Redis delivers sub-millisecond key-value lookups ideal for session state with eventual consistency.

Why this answer

Option B is correct because Azure SQL Database provides full ACID transaction support across multiple tables, making it ideal for order management. Azure Cosmos DB (NoSQL API) offers a flexible schema and high read throughput for the product catalog. Azure Cache for Redis delivers sub-millisecond key-value lookups with eventual consistency, perfect for session storage.

Exam trap

The trap here is that candidates often assume NoSQL databases like Cosmos DB can handle ACID transactions across multiple tables, but in reality, Cosmos DB only guarantees atomicity within a single document or stored procedure, not across separate containers or tables.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB (NoSQL API) does not support multi-table ACID transactions across separate containers; it only offers single-document atomicity. Option C is wrong because Azure Table Storage lacks ACID transaction support across multiple tables, and Azure SQL Database is not optimized for flexible-schema, read-heavy product catalogs. Option D is wrong because Azure Cosmos DB (Table API) also lacks multi-table ACID transactions, Azure Cache for Redis is not designed for persistent, flexible-schema catalog storage, and Azure SQL Database is not suitable for low-latency key-value session stores with eventual consistency.

Full explanation →

640

MCQmedium

A smart building company stores sensor data from thousands of IoT devices as JSON documents in Azure Cosmos DB using the NoSQL API. Each document contains fields: deviceId (string), timestamp (datetime), temperature (float), humidity (float), and additional device-specific fields (e.g., motionDetected, CO2level). The most common query is: SELECT * FROM c WHERE c.deviceId = 'sensor-123' AND c.timestamp >= '2025-01-01' AND c.timestamp < '2025-02-01' ORDER BY c.timestamp DESC. Which indexing strategy will provide the best performance for this query?

A.Use the default indexing policy that automatically indexes all properties

B.Create a composite index on (deviceId ASC, timestamp DESC)

C.Disable indexing for all properties to speed up writes

D.Create a spatial index on the deviceId field

AnswerB

This composite index matches the query predicates: first seeks on deviceId equality, then efficiently performs a range scan on timestamp in descending order, avoiding an in-memory sort.

Why this answer

Option B is correct because the query filters on `deviceId` (equality) and `timestamp` (range with ORDER BY DESC). A composite index on `(deviceId ASC, timestamp DESC)` allows Cosmos DB to efficiently locate the partition for the device and then scan the timestamp range in descending order without an in-memory sort, minimizing RU consumption and latency.

Exam trap

The trap here is that candidates assume the default indexing policy is sufficient for all queries, but they miss that composite indexes are required to efficiently support queries that combine equality filters on one property with range filters and ORDER BY on another property.

How to eliminate wrong answers

Option A is wrong because the default indexing policy indexes all properties individually, which does not optimize the combined filter on `deviceId` and `timestamp` with an ORDER BY clause, leading to higher RU usage and potential full scans. Option C is wrong because disabling indexing entirely would force every query to perform a full sequential scan of all documents, dramatically increasing RU cost and latency, especially for range queries. Option D is wrong because a spatial index is designed for geospatial queries (e.g., ST_DISTANCE, ST_WITHIN) and has no relevance to filtering on `deviceId` and `timestamp`.

Full explanation →

641

MCQmedium

A data analyst needs to create interactive dashboards that display real-time data from Azure SQL Database. Which Microsoft tool should they use?

A.Microsoft Excel

B.Microsoft Copilot

C.Azure Data Studio

D.Power BI

AnswerD

Business analytics tool for real-time dashboards.

Why this answer

Power BI is the correct tool because it is designed specifically for creating interactive dashboards and reports, and it supports real-time data connectivity to Azure SQL Database through DirectQuery or streaming datasets. This allows the data analyst to visualize live data without manual refreshes, meeting the requirement for real-time dashboards.

Exam trap

The trap here is that candidates may confuse Azure Data Studio (a database management tool) with a visualization tool, or assume Microsoft Excel is sufficient for real-time dashboards, when Power BI is the only option that natively supports interactive, real-time visualizations with Azure SQL Database.

How to eliminate wrong answers

Option A is wrong because Microsoft Excel is a spreadsheet application that can connect to Azure SQL Database but lacks native support for real-time interactive dashboards; it requires manual data refresh or Power Query, and its visualization capabilities are limited compared to dedicated BI tools. Option B is wrong because Microsoft Copilot is an AI assistant integrated into various Microsoft products (like Power BI or Azure) to help generate content or code, but it is not a standalone tool for creating dashboards or connecting to live data sources. Option C is wrong because Azure Data Studio is a cross-platform database management and query tool for Azure SQL Database, primarily used for writing T-SQL queries, managing databases, and developing scripts; it does not provide dashboard or real-time visualization capabilities.

Full explanation →

642

MCQeasy

A data engineer is classifying data types collected from three sources for a data lake. Source 1: Customer records from a SQL database exported as CSV files with fixed columns (CustomerID, Name, Address). Source 2: Product reviews obtained via API as JSON documents with varying fields (e.g., some reviews include 'rating' and 'verified_purchase', others include 'comment'). Source 3: Scanned handwritten order forms saved as TIFF images. Which statement correctly categorizes these data by structure?

A.Source 1: Structured; Source 2: Semi-structured; Source 3: Unstructured

B.Source 1: Structured; Source 2: Structured; Source 3: Unstructured

C.Source 1: Semi-structured; Source 2: Structured; Source 3: Unstructured

D.Source 1: Structured; Source 2: Unstructured; Source 3: Semi-structured

AnswerA

Correct. CSV files with fixed columns are structured. JSON with varying fields is semi-structured. TIFF images are unstructured.

Why this answer

Option A is correct because Source 1 (CSV from SQL) has a fixed schema with defined columns, making it structured data. Source 2 (JSON from API) allows varying fields per document, which is the hallmark of semi-structured data. Source 3 (TIFF images) contains no inherent schema or machine-readable structure, classifying it as unstructured data.

Exam trap

The trap here is that candidates confuse CSV files (which are structured when they have a fixed schema) with semi-structured data, or assume JSON is always structured because it has key-value pairs, ignoring that varying fields make it semi-structured.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies Source 2 (JSON with varying fields) as structured, ignoring that JSON documents with optional or varying fields do not enforce a rigid schema like a SQL table. Option C is wrong because it mislabels Source 1 (CSV with fixed columns) as semi-structured, whereas CSV with a consistent schema is structured, and it also mislabels Source 2 as structured instead of semi-structured. Option D is wrong because it classifies Source 2 (JSON) as unstructured, but JSON has key-value pairs and a defined format, making it semi-structured, and it mislabels Source 3 (TIFF images) as semi-structured, but images lack any inherent data structure.

Full explanation →

643

MCQmedium

A data analyst needs to run ad-hoc SQL queries on terabytes of CSV files stored in Azure Data Lake Storage Gen2. The queries are infrequent and unpredictable. The analyst wants to pay only for the amount of data processed by each query, and does not want to manage any compute or storage infrastructure. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure Data Factory

C.Azure Synapse Serverless SQL pool

D.Azure Analysis Services

AnswerC

Serverless SQL pool allows on-demand SQL querying of data in the data lake, paying only for the data processed per query, with zero infrastructure management.

Why this answer

Azure Synapse Serverless SQL pool (C) is the correct choice because it allows querying data in Azure Data Lake Storage Gen2 using T-SQL without provisioning any compute resources. It charges per terabyte of data processed, making it ideal for infrequent, unpredictable ad-hoc queries, and it eliminates infrastructure management.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure Synapse Analytics dedicated SQL pool, assuming both require provisioning compute, but the serverless option is specifically designed for on-demand, pay-per-query workloads with no infrastructure management.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning and managing dedicated compute resources, incurring costs even when idle, which contradicts the pay-per-query and no-management requirements. Option B is wrong because Azure Data Factory is an orchestration and ETL service, not a SQL query engine; it cannot run ad-hoc SQL queries directly on CSV files in Data Lake Storage Gen2. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic models and pre-aggregated data, not designed for direct querying of raw CSV files with T-SQL, and it requires managing a dedicated server instance.

Full explanation →

644

MCQeasy

A company stores customer data in a SQL table with fixed columns (CustomerID, Name, Email, SignupDate). They also store product images as JPEG files and application logs as JSON documents. Which of the following correctly classifies each data type?

A.SQL table: structured, JPEG: unstructured, JSON: semi-structured

B.SQL table: structured, JPEG: semi-structured, JSON: unstructured

C.SQL table: semi-structured, JPEG: unstructured, JSON: structured

D.SQL table: unstructured, JPEG: structured, JSON: semi-structured

AnswerA

Correct. SQL tables have a fixed schema (structured), JPEG files have no inherent schema (unstructured), and JSON documents have a flexible schema (semi-structured).

Why this answer

Option A is correct because a SQL table with fixed columns enforces a rigid schema, making it structured data. JPEG files are binary blobs with no internal schema, classifying them as unstructured. JSON documents use key-value pairs with flexible schemas, which is the definition of semi-structured data.

Exam trap

The trap here is confusing semi-structured data (like JSON) with unstructured data (like images), or assuming that any file format with a standard (like JPEG) is semi-structured, when in fact JPEG is purely binary and unstructured.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies JPEG as semi-structured (JPEG is binary and lacks schema) and JSON as unstructured (JSON has a flexible schema, making it semi-structured). Option C is wrong because it classifies the SQL table as semi-structured (SQL tables with fixed columns are structured, not semi-structured) and JSON as structured (JSON is semi-structured, not rigidly structured). Option D is wrong because it classifies the SQL table as unstructured (SQL tables are highly structured) and JPEG as structured (JPEG files have no schema).

Full explanation →

645

MCQeasy

A hospital stores patient records. Each record includes a PatientID (integer), Name (text), DateOfBirth (date), and MRI scan images (binary files). Which classification best describes the MRI scan images?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Streaming data

AnswerC

Unstructured data has no predefined data model. Binary files like images, videos, and audio files fall into this category.

Why this answer

MRI scan images are binary files that lack a predefined data model or schema, making them unstructured data. Unlike structured data (e.g., rows in a SQL table) or semi-structured data (e.g., JSON with tags), binary image files cannot be easily queried or organized using traditional relational database tools without additional processing.

Exam trap

Microsoft often tests the misconception that any data stored in a database (e.g., as a BLOB) is structured, but the classification depends on the data's internal format, not its storage location.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with rows and columns, such as a PatientID integer in a relational table, which does not apply to binary image files. Option B is wrong because semi-structured data has organizational properties like tags or key-value pairs (e.g., JSON or XML), whereas MRI images are raw binary blobs without inherent metadata structure. Option D is wrong because streaming data refers to continuous data flows from sources like IoT sensors or log streams, not static binary files stored in a database.

Full explanation →

646

MCQeasy

A company uses Azure Synapse Analytics to run large-scale batch processing jobs every night. The jobs currently take 6 hours to complete, but the business requires completion within 4 hours. Which action should the company take to improve job performance?

A.Replace PolyBase with Azure Data Factory for data movement.

B.Migrate from serverless SQL pool to dedicated SQL pool.

C.Move the underlying data to Azure Data Lake Storage Gen2.

D.Increase the data warehouse units (DWUs) for the dedicated SQL pool.

AnswerD

Scaling up DWUs allocates more compute resources, reducing job duration.

Why this answer

Increasing the data warehouse units (DWUs) for the dedicated SQL pool scales the compute resources (CPU, memory, and I/O bandwidth) available to the Synapse SQL pool. This directly reduces the execution time of batch processing jobs by allowing more parallel processing, enabling the 6-hour job to complete within the required 4-hour window.

Exam trap

The trap here is that candidates confuse storage optimization (e.g., moving to ADLS Gen2) with compute scaling, or assume that changing data movement tools (PolyBase vs. Data Factory) will fix performance, when the core issue is insufficient compute capacity for the batch workload.

How to eliminate wrong answers

Option A is wrong because replacing PolyBase with Azure Data Factory does not inherently improve query performance; PolyBase is used for high-performance data loading and querying external data, while Data Factory is an orchestration tool—neither addresses the compute bottleneck causing the slow batch jobs. Option B is wrong because migrating from serverless SQL pool to dedicated SQL pool would change the architecture but does not guarantee faster performance without scaling; serverless SQL pool is designed for ad-hoc queries and small-scale processing, not for large-scale batch jobs that require dedicated, scalable compute resources. Option C is wrong because moving data to Azure Data Lake Storage Gen2 improves storage performance and scalability but does not directly accelerate query execution within Synapse Analytics; the bottleneck is compute capacity, not storage location.

Full explanation →

647

MCQmedium

A smart home company stores sensor readings from thousands of devices in Azure Cosmos DB. Each reading includes a deviceID, timestamp (ISO format), sensor type, and value. The most common query retrieves all readings for a specific device within a time range. To minimize Request Units (RU) consumption and ensure even data distribution, which property should be chosen as the partition key?

A.A) deviceID

B.B) timestamp

C.C) sensor type

D.D) value

AnswerA

deviceID has high cardinality (many unique values) and is always used in the query filter, leading to efficient partition routing and even distribution.

Why this answer

DeviceID is the correct partition key because it is the primary filter in the most common query (all readings for a specific device within a time range). Partitioning by deviceID ensures that all readings for a single device are stored in the same logical partition, making queries highly efficient by targeting a single partition. It also provides even data distribution across physical partitions, as thousands of devices will have roughly equal numbers of readings, minimizing RU consumption.

Exam trap

The trap here is that candidates often choose timestamp because they think it naturally orders data by time, but they overlook that the most common query filters by deviceID first, and using timestamp as the partition key would cause cross-partition queries for every device-specific time range, dramatically increasing RU costs.

How to eliminate wrong answers

Option B (timestamp) is wrong because using timestamp as the partition key would cause all readings with the same timestamp (e.g., same second) to land in the same partition, creating hot spots and uneven distribution, and queries for a specific device would need to fan out across many partitions. Option C (sensor type) is wrong because sensor types are typically few (e.g., temperature, humidity), leading to a small number of large partitions (hot partitions) and poor query performance for device-specific queries. Option D (value) is wrong because values are highly varied and not used as a filter in the common query, making it a poor choice for partition key—it would scatter each device's data across many partitions, increasing RU consumption for range queries.

Full explanation →

648

MCQeasy

Your organization has a large dataset of customer transactions stored in Azure Blob Storage as CSV files. You need to run ad-hoc SQL queries on this data without loading it into a database. Which Azure service should you use?

A.Azure Data Factory

B.Azure SQL Database

C.Azure Synapse Serverless SQL pool

D.Azure Analysis Services

AnswerC

Serverless SQL pool queries data directly from Blob Storage using T-SQL, with no loading required.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from files in Azure Blob Storage using standard T-SQL syntax, without needing to load or move the data into a database. It uses a pay-per-query model and supports CSV, Parquet, and JSON formats, making it ideal for ad-hoc analytical queries over large datasets stored in data lakes.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a data movement/orchestration tool) with a query engine, or assume Azure SQL Database can query external files via PolyBase (which requires loading into external tables, not direct ad-hoc querying).

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data orchestration service, not a SQL query engine; it cannot run ad-hoc SQL queries directly against files. Option B is wrong because Azure SQL Database requires data to be loaded into its relational storage before querying, which contradicts the requirement to query without loading. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic models and multidimensional analysis, not designed for direct SQL queries over raw CSV files in Blob Storage.

Full explanation →

649

MCQmedium

A social media company stores user posts in Azure Cosmos DB. Each post document contains fields like postId, userId, content, timestamp, and an array of comments. The comments array can grow large (hundreds per post), and the application frequently retrieves a post without its comments to display in a feed. To optimize read performance and minimize request units (RU) consumption, which data modeling approach should the company adopt?

A.A. Store comments in a separate container to isolate the data.

B.B. Store comments as separate documents and reference them from the post document via a comments array of IDs.

C.C. Use a vertical partition within the same document to separate the comments array.

D.D. Migrate the data to Azure SQL Database to use normalized tables and indexes.

AnswerB

This approach decouples comments from the post document. When retrieving a post for the feed, the application reads only the post document, avoiding the large comments array. This reduces RU consumption and improves latency. Comments can be loaded on demand when needed.

Why this answer

Option B is correct because storing comments as separate documents and referencing them via an array of IDs in the post document allows the application to retrieve the post without comments in a single point read, consuming minimal request units (RUs). This avoids loading the large comments array when only the post metadata is needed for the feed, significantly reducing RU consumption and improving read performance in Azure Cosmos DB.

Exam trap

The trap here is that candidates may think embedding the comments array is always optimal for performance, but they overlook that reading the entire document with a large array wastes RUs when only the post metadata is needed, making reference-based modeling more efficient for this access pattern.

How to eliminate wrong answers

Option A is wrong because storing comments in a separate container would require cross-container queries or application-level joins, increasing RU cost and latency, and losing the benefit of document co-location. Option C is wrong because Azure Cosmos DB does not support vertical partitions within a document; the comments array is already part of the document, and separating it logically does not reduce RU consumption when reading the entire document. Option D is wrong because migrating to Azure SQL Database is unnecessary and contradicts the requirement to optimize non-relational data; it would introduce schema rigidity and higher latency for the social media use case.

Full explanation →

650

MCQhard

You are designing a multi-tenant SaaS application using Azure SQL Database. Each tenant has its own database. You need to perform maintenance across all databases efficiently. Which feature should you use?

A.Elastic pools

B.Failover groups

C.SQL Server Agent

D.Elastic Jobs

AnswerD

Elastic Jobs automate administrative tasks across multiple databases.

Why this answer

Option C is correct because Elastic Jobs allows executing T-SQL scripts across multiple databases. Option A is wrong because elastic pools are for resource sharing, not for maintenance. Option B is wrong because failover groups are for high availability.

Option D is wrong because SQL Agent is not available in Azure SQL Database.

Full explanation →

651

MCQhard

Your company has an Azure SQL Database that stores customer orders. You notice that long-running reports are causing blocking on the transactional tables. Which approach would minimize impact on transaction processing while still allowing reporting?

A.Use the READ UNCOMMITTED isolation level for reporting queries

B.Increase the service tier to Business Critical

C.Shard the database by customer region

D.Create a read-only replica in Azure SQL Database Hyperscale

AnswerD

Offloads reporting to a separate replica.

Why this answer

Option C is correct because creating a read-only replica offloads reporting queries. Option A is wrong because increasing database throughput does not eliminate blocking. Option B is wrong because READ UNCOMMITTED might cause dirty reads.

Option D is wrong because distributing data across databases (sharding) adds complexity and may not solve blocking.

Full explanation →

652

MCQmedium

A company wants to analyze customer feedback from surveys and social media. The data includes both structured (ratings) and unstructured (comments) text. They plan to use Azure Cognitive Services for sentiment analysis. Which service should they use for text analytics?

A.Azure Synapse Analytics

B.Azure Cosmos DB

C.Azure AI Language

D.Azure Machine Learning

AnswerC

Provides pre-built sentiment analysis for text.

Why this answer

Azure AI Language (formerly part of Azure Cognitive Services) provides pre-built text analytics capabilities, including sentiment analysis, key phrase extraction, and language detection. This service is specifically designed to process unstructured text data like survey comments and social media posts, making it the correct choice for analyzing customer feedback.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics or Azure Machine Learning as general-purpose analytics tools, overlooking that Azure AI Language is the dedicated, pre-built service for text analytics tasks like sentiment analysis.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is a big data analytics platform for data warehousing and data integration, not a service for performing sentiment analysis on text. Option B is wrong because Azure Cosmos DB is a NoSQL database service for storing and querying structured and semi-structured data, not a text analytics service. Option D is wrong because Azure Machine Learning is a platform for building, training, and deploying custom machine learning models, which is overkill and not the pre-built service designed for sentiment analysis.

Full explanation →

653

Multi-Selecthard

Which THREE of the following are valid considerations when choosing between Azure Blob Storage and Azure Data Lake Storage Gen2 for a big data analytics workload?

Select 3 answers

A.ADLS Gen2 can be optimized for high-throughput analytics workloads

B.ADLS Gen2 supports a hierarchical namespace for folder-level organization

C.Blob Storage provides POSIX-compliant access control lists (ACLs)

D.ADLS Gen2 cannot use Blob Storage APIs

E.Blob Storage supports lifecycle management policies

AnswersA, B, E

ADLS Gen2 is designed for big data analytics with high throughput.

Why this answer

ADLS Gen2 supports a hierarchical namespace, POSIX-like permissions, and is cost-effective for both hot and cool tiers. Blob Storage lacks hierarchical namespace by default. Both support lifecycle management.

ADLS Gen2 can be used with Blob APIs but also has additional features.

Full explanation →

654

MCQhard

A company uses Azure SQL Database with the Business Critical service tier. They notice increased latency during peak hours. They need to improve performance without changing the application code. Which action should they take?

A.Increase the number of vCores

B.Add a read replica

C.Enable auto-pause

D.Change the service tier to General Purpose

AnswerA

Scaling up provides more CPU/memory to handle peak load.

Why this answer

Option B is correct because increasing the number of vCores (scale up) provides more resources to handle peak load without code changes. Option A is wrong because adding read replicas helps read scalability but not write latency. Option C is wrong because changing to General Purpose might reduce performance.

Option D is wrong because enabling auto-pause is for serverless, not for performance.

Full explanation →

655

MCQmedium

A mobile game stores player achievements in Azure Cosmos DB. Each player has a PlayerID, and achievements are stored as JSON documents with varying fields. The most common query retrieves all achievements for a specific player. To ensure low latency and efficient throughput, which property should be chosen as the partition key?

A.PlayerID

B.Timestamp

C.AchievementType

D.Region

AnswerA

PlayerID is the most common query filter, has high cardinality, and evenly distributes data across partitions, resulting in efficient queries and throughput.

Why this answer

PlayerID is the correct partition key because the most common query retrieves all achievements for a specific player, and partitioning on PlayerID ensures that all documents for a given player are stored in the same physical partition. This allows the query to target a single partition, minimizing cross-partition queries and providing low latency and efficient throughput.

Exam trap

The trap here is that candidates often choose a high-cardinality key like Timestamp without considering the query pattern, mistakenly thinking any unique value is good, but the partition key must align with the most frequent query filter to avoid cross-partition overhead.

How to eliminate wrong answers

Option B (Timestamp) is wrong because using Timestamp as the partition key would scatter each player's achievements across multiple partitions, forcing cross-partition queries for the common 'all achievements for a player' query, increasing latency and RU consumption. Option C (AchievementType) is wrong because it would group achievements of the same type together, but a player's achievements span multiple types, again requiring cross-partition queries to retrieve all achievements for a player. Option D (Region) is wrong because it is unrelated to the player-centric query pattern; it would distribute a single player's data across partitions based on region, causing the same cross-partition query issue.

Full explanation →

656

MCQmedium

A social media application stores user profile data as JSON documents. Each user's document has a different structure, with fields that vary based on user activity. The application needs to query these documents efficiently using SQL-like syntax and support high write throughput. Which Azure data store is most appropriate for this workload?

A.Azure SQL Database

B.Azure Blob Storage

C.Azure Cosmos DB

D.Azure Table Storage

AnswerC

Azure Cosmos DB is a globally distributed, multi-model NoSQL database that supports JSON documents natively. It allows flexible schemas, SQL-like querying, and high throughput, making it ideal for this scenario.

Why this answer

Azure Cosmos DB is the most appropriate choice because it natively supports storing and querying JSON documents with varying schemas, offers SQL-like query syntax via its core (SQL) API, and provides guaranteed low-latency reads/writes at any scale with automatic indexing of all fields. Its multi-model nature and configurable consistency levels make it ideal for high-throughput workloads like a social media application.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key-value capabilities with document database features, overlooking that Table Storage does not support JSON documents, nested fields, or SQL-like queries, whereas Cosmos DB is explicitly designed for such workloads.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database requires a fixed relational schema, making it inefficient for storing JSON documents with varying structures; while it supports JSON functions, it is not optimized for high write throughput on schema-less data. Option B is wrong because Azure Blob Storage is designed for unstructured binary or text data (like images or logs) and does not support SQL-like querying of individual JSON documents or efficient point queries on document fields. Option D is wrong because Azure Table Storage is a key-value store that does not support JSON documents natively, lacks SQL-like query syntax, and requires a flat schema with predefined partition and row keys, making it unsuitable for querying nested JSON fields.

Full explanation →

657

MCQeasy

You have an Azure Blob Storage container configured with the JSON snippet shown in the exhibit. What does the 'publicAccess' setting of 'Blob' allow?

A.Anonymous users can write blobs

B.No anonymous access is allowed

C.Anonymous users can list blobs in the container

D.Anonymous users can read blobs if they know the blob URL

AnswerD

'Blob' level allows anonymous read access to individual blobs, but not listing.

Why this answer

The 'Blob' level of public access allows anonymous read access to blobs only; container metadata is not accessible. 'Container' level would allow anonymous listing of blobs. 'None' disables public access. 'Storage' is not a valid value.

Full explanation →

658

MCQhard

You are designing a solution to store large binary files (up to 100 GB each) that are frequently read but rarely updated. The data must be accessible via HTTPS and support concurrent reads. Which Azure data store should you use?

A.Azure Files

B.Azure Cosmos DB

C.Azure NetApp Files

D.Azure Blob Storage

AnswerD

Supports large blobs, HTTPS access, and concurrent reads.

Why this answer

Option B is correct because Azure Blob Storage supports large blobs (up to 190.7 TiB) and is optimized for read-heavy workloads with HTTPS access and concurrent reads. Option A is wrong because Azure Files has a maximum file size of 4 TiB and is designed for file shares, not large binary blobs. Option C is wrong because Cosmos DB is for NoSQL transactional data, not large binary files.

Option D is wrong because Azure NetApp Files is for high-performance file workloads, but more complex and expensive for simple blob storage.

Full explanation →

659

MCQmedium

You need to choose a data storage solution for a global e-commerce platform that requires single-digit millisecond read and write latencies across multiple regions. The data is semi-structured and includes user profiles and product catalogs. Which Azure service should you use?

A.Azure Redis Cache

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure SQL Database

AnswerB

Azure Cosmos DB offers global distribution and low latency for semi-structured data.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model database service that guarantees single-digit millisecond read and write latencies at the 99th percentile, regardless of the number of regions. It supports semi-structured data natively through its document (JSON) API, making it ideal for user profiles and product catalogs that require low-latency access across multiple geographic regions.

Exam trap

The trap here is that candidates often confuse Azure Redis Cache's in-memory speed with the need for persistent, globally distributed storage, overlooking that Redis Cache is not designed for durable, multi-region data storage with consistency guarantees.

How to eliminate wrong answers

Option A is wrong because Azure Redis Cache is an in-memory data store designed primarily for caching and session state, not for persistent, globally distributed storage of semi-structured data with multi-region write capabilities. Option C is wrong because Azure Table Storage is a NoSQL key-value store that offers only eventual consistency by default and does not provide guaranteed single-digit millisecond latencies across multiple regions or native global distribution. Option D is wrong because Azure SQL Database is a relational database that requires a fixed schema, making it less suitable for semi-structured data, and its global replication options (e.g., failover groups) do not guarantee single-digit millisecond latencies for writes across multiple regions.

Full explanation →

660

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool for large-scale data warehousing. They have a fact table with billions of rows and frequently run queries that filter by a date range and join with a product dimension table. Which table distribution and partitioning strategy will minimize data movement and improve query performance?

A.Round-robin distribution with no partitioning

B.Hash-distribute on ProductID with partitioning on Date

C.Replicate the fact table on all distributions and partition on ProductID

D.Hash-distribute on Date with partitioning on ProductID

AnswerB

Hash-distribution on ProductID co-locates rows with the same ProductID, enabling efficient joins with the product dimension. Partitioning on the Date column enables partition elimination for date range queries, reducing the amount of data scanned.

Why this answer

Hash-distributing the fact table on ProductID ensures that rows for the same product are co-located on the same distribution, minimizing data movement when joining with the product dimension table. Partitioning on Date allows partition elimination for date-range filters, reducing the amount of data scanned. This combination directly addresses the query pattern of date-range filtering and product joins.

Exam trap

The trap here is that candidates often confuse the roles of distribution and partitioning, thinking that partitioning on the join key (ProductID) will improve join performance, when in fact hash distribution on the join key is what co-locates data for joins, while partitioning on the filter column (Date) enables partition elimination.

How to eliminate wrong answers

Option A is wrong because round-robin distribution distributes rows evenly without any logical grouping, causing all joins to require data shuffling across distributions, which is highly inefficient for large fact tables. Option C is wrong because replicating a billion-row fact table on all distributions would consume excessive storage and memory, and partitioning on ProductID does not help with date-range filtering. Option D is wrong because hash-distributing on Date scatters rows for the same product across distributions, forcing data movement during the join with the product dimension table, and partitioning on ProductID does not enable partition elimination for date-range filters.

Full explanation →

661

MCQhard

A retail company uses Azure Data Lake Storage Gen2 to store raw clickstream data. They need to process this data using Azure Databricks to create hourly aggregated reports. The data pipeline must minimize costs while meeting a five-minute processing SLA. What is the most cost-effective compute option?

A.Use interactive clusters with autoscaling

B.Use job clusters with pool-based allocation

C.Use Azure Synapse Serverless SQL pools

D.Provision a dedicated SQL pool in Azure Synapse

AnswerB

Job clusters with pools reduce startup latency and cost, ideal for scheduled jobs.

Why this answer

Job clusters with pool-based allocation are the most cost-effective compute option for this scenario because job clusters are ephemeral—they start only when a job runs and terminate automatically after completion, eliminating idle costs. Pool-based allocation further reduces startup latency by maintaining a warm pool of pre-initialized VMs, enabling the pipeline to meet the five-minute SLA without paying for always-on compute.

Exam trap

The trap here is that candidates often confuse interactive clusters (always-on, for exploration) with job clusters (ephemeral, for automation), and assume that any 'pool' feature increases cost rather than reducing it, leading them to incorrectly select interactive clusters with autoscaling as the cheaper option.

How to eliminate wrong answers

Option A is wrong because interactive clusters are designed for ad-hoc exploration and remain running until manually terminated, incurring continuous costs even when idle, which contradicts the cost-minimization requirement. Option C is wrong because Azure Synapse Serverless SQL pools are a query engine for data lakes, not a compute option for running Databricks jobs; they cannot execute Databricks notebooks or Spark transformations. Option D is wrong because provisioning a dedicated SQL pool in Azure Synapse is a provisioned, always-on resource that incurs high fixed costs and is not designed for Databricks-based processing, making it unsuitable for a cost-sensitive, batch-oriented pipeline.

Full explanation →

662

MCQhard

A company uses Azure Table storage to store session state for a web application. They notice that read latency increases during peak hours. Which design change should they implement to reduce latency?

A.Change to Azure Blob storage

B.Store large attributes in a separate table

C.Switch to Azure Queue storage

D.Use a partition key that distributes load evenly, such as UserID

AnswerD

Even distribution avoids hot partitions and reduces latency.

Why this answer

Option D is correct because Azure Table storage partitions data based on the partition key. Using a partition key that distributes load evenly, such as UserID, ensures that read requests are spread across multiple partition servers, preventing hot partitions and reducing latency during peak hours.

Exam trap

The trap here is that candidates may confuse Azure Table storage with other Azure storage services (Blob, Queue) or focus on data size optimization (Option B) instead of understanding how partition key design directly impacts read performance in a partitioned NoSQL store.

How to eliminate wrong answers

Option A is wrong because Azure Blob storage is designed for unstructured data (e.g., images, videos) and does not provide the low-latency, key-value access pattern needed for session state. Option B is wrong because storing large attributes in a separate table does not address the root cause of read latency—it may even increase complexity and latency due to additional table lookups. Option C is wrong because Azure Queue storage is a messaging service for asynchronous communication, not a low-latency storage solution for session state reads.

Full explanation →

663

MCQmedium

A data analyst needs to create a real-time dashboard in Power BI that displays sales data from an Azure SQL Database. The dashboard must update every 10 minutes without manual refresh. Which Power BI feature should they use?

A.DirectQuery mode

B.Scheduled refresh with Import mode

C.Streaming datasets

D.Paginated reports

AnswerA

DirectQuery queries the source directly, enabling near real-time updates.

Why this answer

DirectQuery mode is correct because it allows Power BI to query the Azure SQL Database directly for each visual interaction, ensuring the dashboard reflects the latest data without requiring a manual refresh. Since the requirement is for updates every 10 minutes, DirectQuery can be configured to auto-refresh at that interval via the 'Automatic page refresh' setting, which sends T-SQL queries to the database on a timer. This avoids the latency and storage overhead of importing data, making it ideal for near-real-time monitoring.

Exam trap

The trap here is that candidates confuse 'real-time dashboard' with 'streaming datasets' (Option C), but streaming datasets require a push-based architecture, not a pull from an existing database like Azure SQL Database, which DirectQuery handles natively.

How to eliminate wrong answers

Option B (Scheduled refresh with Import mode) is wrong because Import mode requires a scheduled refresh (minimum 30 minutes for shared capacity, 1 minute for Premium) and stores a copy of the data in Power BI, which introduces latency and does not support sub-10-minute updates without Premium. Option C (Streaming datasets) is wrong because streaming datasets are designed for real-time data ingestion from sources like Azure Stream Analytics or IoT Hub, not for querying an existing Azure SQL Database; they require pushing data via the Power BI REST API, not pulling from a database. Option D (Paginated reports) is wrong because paginated reports are intended for pixel-perfect, print-ready layouts (e.g., invoices) and do not support automatic, timer-based dashboard updates; they require manual refresh or subscription-based rendering.

Full explanation →

664

MCQmedium

A ride-sharing application uses Azure Cosmos DB for trip data. Each trip record contains TripID (unique), DriverID, RiderID, TripDate, and other details. The most common query retrieves all trips for a specific driver within a given date range. Which partition key should be chosen to minimize Request Unit (RU) consumption and ensure even data distribution?

A.TripID

B.DriverID

C.TripDate

D.RiderID

AnswerB

DriverID aligns with the most common query pattern. All trips for a given driver are stored together, allowing single-partition queries. This minimizes RU consumption if the number of trips per driver is within the 20 GB logical partition limit.

Why this answer

DriverID is the optimal partition key because the most common query filters on DriverID and a date range. Partitioning by DriverID ensures that all trips for a specific driver are stored in the same physical partition, making the query a single-partition operation that consumes minimal Request Units (RUs). It also provides even data distribution across partitions because each driver generates a roughly similar number of trips, avoiding hot spots.

Exam trap

The trap here is that candidates often pick TripDate because it seems logical for date-range queries, but they overlook that the primary filter is DriverID, and partitioning by TripDate would cause cross-partition queries and potential hot spots on high-traffic dates.

How to eliminate wrong answers

Option A is wrong because TripID is unique per trip, which would cause each query to fan out across all partitions (cross-partition query), increasing RU consumption and latency. Option C is wrong because TripDate can lead to hot partitions (e.g., all trips on a single day hitting one partition) and does not directly support the primary filter on DriverID, forcing cross-partition queries. Option D is wrong because RiderID is not used in the most common query filter, so partitioning by RiderID would still require a cross-partition query to find trips by DriverID, wasting RUs.

Full explanation →

665

MCQhard

A retail company processes petabytes of sales transaction data stored in Azure Data Lake Storage Gen2. They need to run recurring complex queries that involve large joins and aggregations. The queries must consistently complete within a fixed time window overnight. The company wants predictable performance and costs. Which Azure service should they use?

A.Azure Synapse Analytics Serverless SQL pool

B.Azure Synapse Analytics Dedicated SQL pool

C.Azure SQL Database

D.Azure Analysis Services

AnswerB

Dedicated SQL pool provisions reserved compute resources, enabling consistent query performance and predictable costs for recurring, complex, large-scale analytics workloads like overnight batch processing.

Why this answer

Azure Synapse Analytics Dedicated SQL pool provides reserved, fixed compute resources that ensure predictable performance and cost for recurring complex queries involving large joins and aggregations. It is designed for petabyte-scale data warehousing workloads with consistent SLAs, making it ideal for overnight batch processing within a fixed time window.

Exam trap

The trap here is that candidates confuse serverless SQL pool's flexibility with dedicated SQL pool's predictability, overlooking that serverless is designed for ad-hoc exploration, not consistent, fixed-time batch processing.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics Serverless SQL pool is a pay-per-query service with variable performance that depends on data volume and concurrency, making it unsuitable for predictable, fixed-time workloads. Option C is wrong because Azure SQL Database is a transactional OLTP database not optimized for petabyte-scale analytics or complex, large-scale joins and aggregations. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic models and in-memory analytics, not a direct query engine for raw petabyte-scale data in Data Lake Storage Gen2.

Full explanation →

666

MCQeasy

You need to query data stored in Azure Cosmos DB for NoSQL using SQL-like syntax. Which feature should you use?

A.Use Azure SQL Database elastic query

B.Use Power BI DirectQuery

C.Use the SQL API built into Cosmos DB

D.Use Azure Synapse Analytics Serverless SQL pool

AnswerC

Cosmos DB's SQL API allows querying JSON documents with SQL-like syntax.

Why this answer

Azure Cosmos DB for NoSQL provides a native SQL API that allows you to query JSON documents using SQL-like syntax. This API translates standard SQL queries into Cosmos DB's internal query engine, enabling you to SELECT, filter, and project data directly from containers without any additional services or connectors.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics Serverless SQL pool (which can also query Cosmos DB) with the native Cosmos DB SQL API, but the question specifically asks for the feature built into Cosmos DB for NoSQL, not an external query service.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database elastic query is used to query data across multiple Azure SQL databases, not for querying Cosmos DB NoSQL data. Option B is wrong because Power BI DirectQuery is a connection mode for real-time analytics from Power BI, not a feature for directly querying Cosmos DB with SQL-like syntax. Option D is wrong because Azure Synapse Analytics Serverless SQL pool can query Cosmos DB via the Synapse Link feature, but it is not the built-in SQL API of Cosmos DB itself and requires additional configuration.

Full explanation →

667

MCQhard

A retail company uses Azure Data Lake Storage Gen2 as a data lake and Azure Databricks for ETL. They notice that a Spark job reading Parquet files from the data lake fails with an 'Access Denied' error when the job runs as a service principal. The service principal has Storage Blob Data Contributor role on the storage account. What is the most likely cause?

A.The service principal needs the Storage Blob Data Owner role instead.

B.The service principal lacks the necessary permissions on the Data Lake Storage Gen2 endpoint (dfs.core.windows.net).

C.The storage account has a firewall enabled that blocks the Databricks cluster IP.

D.The service principal does not have the Storage Blob Data Contributor role assigned at the container level.

AnswerB

The 'Access Denied' error occurs when using the DFS endpoint without proper RBAC assignment for that endpoint.

Why this answer

Option B is correct because Azure Data Lake Storage Gen2 uses a hierarchical namespace and a separate endpoint (dfs.core.windows.net) for data plane operations. Even though the service principal has the Storage Blob Data Contributor role (which grants read/write permissions via the blob endpoint), the job may be accessing the data through the DFS endpoint, which requires explicit permissions on that endpoint. The 'Access Denied' error typically occurs when the service principal is not granted the necessary RBAC role at the storage account level for the DFS endpoint, or when the role assignment is not properly propagated.

Exam trap

The trap here is that candidates often assume the Storage Blob Data Contributor role grants full access to all data plane operations, but they overlook that Azure Data Lake Storage Gen2 requires explicit permissions on the DFS endpoint for hierarchical namespace operations, which is a common point of failure in Spark-based ETL jobs.

How to eliminate wrong answers

Option A is wrong because the Storage Blob Data Owner role is not required; the Storage Blob Data Contributor role already provides read/write/delete permissions on data, and the issue is about endpoint-specific access, not role level. Option C is wrong because the question does not mention any firewall configuration, and a firewall blocking the Databricks cluster IP would cause a different error (e.g., network connectivity failure) rather than an 'Access Denied' error specifically tied to the service principal. Option D is wrong because the Storage Blob Data Contributor role is assigned at the storage account level, which by default applies to all containers; the error is not about container-level assignment but about the service principal lacking permissions on the DFS endpoint.

Full explanation →

668

MCQhard

Refer to the exhibit. You are configuring a custom role in Azure RBAC for a team that needs to read and list blobs in a storage account. The JSON snippet shows the permissions assigned. After assigning this role to a user, they report they cannot see the storage account in the Azure portal. What is the most likely cause?

A.The dataActions should be actions instead of dataActions.

B.The role does not include read permission on the storage account resource.

C.The role is not assigned at the subscription scope.

D.The user needs the Contributor role to view the storage account.

AnswerB

Missing Microsoft.Storage/storageAccounts/read action prevents seeing the account.

Why this answer

Option C is correct because the role only includes dataActions for blob read, but lacks read access to the storage account resource itself (e.g., Microsoft.Storage/storageAccounts/read). To see the storage account in the portal, the user needs read permissions on the resource. Option A is wrong because the role is scoped to the storage account, not subscription level.

Option B is wrong because portal access does not require Contributor role; Reader role suffices. Option D is wrong because dataActions are correctly used for blob-level permissions.

Full explanation →

669

Drag & Drophard

A company is building a modern data warehouse on Azure using a lakehouse approach. Arrange the following steps in the correct order to implement a typical pipeline that starts with raw data ingestion and ends with business reporting.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

1Step 1

2Step 2

3Step 3

4Step 4

Why this order

The lakehouse pipeline starts with raw data ingestion, followed by transformation and loading into a curated layer, then creating tables for querying, and finally reporting with Power BI.

Full explanation →

670

MCQeasy

Your company is implementing a data governance solution using Microsoft Purview. The data catalog must automatically scan and classify sensitive data in Azure SQL Database, Azure Synapse Analytics, and Amazon S3. The company uses Microsoft Entra ID for identity management. You need to ensure that the Purview managed identity can authenticate to these data sources. Which authentication method should you configure for the Amazon S3 connection?

A.AWS IAM authentication

B.SQL Authentication

C.Windows Authentication

D.Microsoft Entra ID authentication

AnswerA

Amazon S3 uses AWS IAM for authentication. The Purview managed identity must be granted access via an IAM role.

Why this answer

Amazon S3 is an external cloud storage service that does not support Microsoft Entra ID, SQL Authentication, or Windows Authentication. To authenticate Purview's managed identity to S3, you must configure AWS IAM authentication, which allows Purview to assume an IAM role with permissions to read the S3 bucket metadata and data for scanning and classification.

Exam trap

The trap here is that candidates may assume Microsoft Entra ID authentication works for all data sources because the question mentions Entra ID for identity management, but Amazon S3 is an AWS service that requires AWS IAM, not Microsoft's identity system.

How to eliminate wrong answers

Option B (SQL Authentication) is wrong because SQL Authentication is used for Azure SQL Database and Azure Synapse Analytics, not for Amazon S3, which is a non-relational object store. Option C (Windows Authentication) is wrong because Windows Authentication is only applicable to on-premises SQL Server or Azure services integrated with Active Directory, not to AWS S3. Option D (Microsoft Entra ID authentication) is wrong because Amazon S3 does not support Microsoft Entra ID; it uses AWS IAM for identity and access management.

Full explanation →

671

MCQeasy

A media company stores user profile images in Azure Blob Storage. Regulators require that the images cannot be deleted or overwritten for a period of 90 days after upload. Which Azure Blob Storage feature should the company enable to meet this requirement?

A.A: Soft delete

B.B: Immutable storage with a time-based retention policy

C.C: Access tiers (Hot, Cool, Archive)

D.D: Lifecycle management rules

AnswerB

Immutable storage enforces a write-once-read-many (WORM) state, preventing any modification or deletion during the retention period, exactly as required.

Why this answer

Immutable storage with a time-based retention policy (also known as WORM – Write Once, Read Many) prevents blobs from being deleted or overwritten for a specified retention interval. By setting a 90-day policy, the company ensures that user profile images remain unmodifiable and undeletable during that period, directly satisfying the regulatory requirement.

Exam trap

The trap here is that candidates often confuse soft delete (which only recovers deleted blobs) with immutable storage (which prevents both deletion and overwrite during the retention period), leading them to choose soft delete when the requirement explicitly prohibits overwrites as well.

How to eliminate wrong answers

Option A is wrong because soft delete only protects blobs from accidental deletion by retaining them for a configurable period after deletion, but it does not prevent overwrites or guarantee immutability for a fixed duration. Option C is wrong because access tiers (Hot, Cool, Archive) control storage cost and retrieval latency based on data access patterns, but they offer no protection against deletion or overwrite. Option D is wrong because lifecycle management rules automate transitions between access tiers or deletion based on age or conditions, but they do not enforce a write-once, read-many (WORM) state that blocks modifications or deletions.

Full explanation →

672

Multi-Selecthard

Which THREE are benefits of using Azure SQL Database serverless compute tier?

Select 3 answers

A.Guaranteed high availability with 99.99% SLA

B.Billing per second for compute usage

C.Auto-scaling compute based on workload

D.Auto-pause during periods of inactivity

E.Ideal for high-throughput, latency-sensitive applications

AnswersB, C, D

Billed per second of compute usage.

Why this answer

Options A, B, and E are correct. Serverless auto-scales, pauses during inactivity, and charges per second. Option C is wrong because serverless does not provide a fixed SLA for compute.

Option D is wrong because serverless is not designed for high throughput workloads; it may have cold start delays.

Full explanation →

673

MCQeasy

Refer to the exhibit. You have a CSV file stored in Azure Blob Storage. You want to query this file using Azure Synapse Serverless SQL. Which OPENROWSET option should you use?

A.FORMAT = 'JSON'

B.FORMAT = 'PARQUET'

C.FORMAT = 'CSV'

D.FORMAT = 'DELTA'

AnswerC

CSV format is correct for CSV files.

Why this answer

Option D is correct because FORMAT = 'CSV' specifies the file format for CSV files. Option A is wrong because FORMAT = 'JSON' is for JSON files. Option B is wrong because FORMAT = 'PARQUET' is for Parquet files.

Option C is wrong because FORMAT = 'DELTA' is for Delta Lake.

Full explanation →

674

MCQmedium

A data engineering team needs to build a pipeline that ingests streaming data from IoT devices into Azure Data Lake Storage Gen2. The data arrives as JSON messages. They want to use a service that can capture the streaming data in near real-time and store it as files in the data lake without writing custom code for the ingestion. Which Azure service should they use?

A.Azure Data Factory

B.Azure Event Hubs with Capture

C.Azure Stream Analytics

D.Azure Synapse Pipelines

AnswerB

Event Hubs Capture automatically writes the incoming stream of events to Azure Data Lake Storage in near real-time without any custom code, meeting the requirement exactly.

Why this answer

Azure Event Hubs with Capture is the correct choice because it natively ingests streaming JSON data from IoT devices in near real-time and automatically writes the data to Azure Data Lake Storage Gen2 as files without requiring any custom code. The Capture feature automatically persists the event stream to the specified storage destination at defined time or size intervals, making it ideal for serverless, code-free ingestion.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the primary ingestion service for raw data capture, when in fact Stream Analytics is a processing engine that requires a query and output sink, whereas Event Hubs Capture provides direct, code-free persistence of raw streams.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a code-free ETL orchestration service for batch data movement and transformation, not designed for real-time streaming ingestion from IoT devices. Option C is wrong because Azure Stream Analytics is a real-time analytics and processing engine that requires a query to transform data before output, and it does not natively capture raw streaming data to files without custom code. Option D is wrong because Azure Synapse Pipelines is built on Azure Data Factory and inherits the same batch-oriented orchestration limitations, lacking native real-time streaming capture capabilities.

Full explanation →

675

Matchingmedium

Match each Azure data consistency model to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Reads always see the latest write

Reads may lag behind writes by up to K versions or T time

Consistent reads within a client session

Reads never see out-of-order writes

No ordering guarantee, eventually consistent

Why these pairings

Azure Cosmos DB offers five consistency levels.

Full explanation →

Microsoft Azure Data Fundamentals DP-900 (DP-900) — Questions 601–675