Microsoft Azure Data Fundamentals DP-900 Questions 301–375 | Page 5/14

301

MCQeasy

A company uses Azure Table Storage to store user session data. The data must be encrypted at rest. What should you do?

A.No action is required; Azure Storage encrypts data at rest by default.

B.Enable Azure Storage Service Encryption (SSE).

C.Use Azure SQL Database Transparent Data Encryption (TDE).

D.Implement client-side encryption before storing data.

AnswerA

Azure Storage automatically encrypts all data.

Why this answer

Azure Table Storage, as part of Azure Storage, automatically encrypts all data at rest using Azure Storage Service Encryption (SSE) with 256-bit AES encryption. This encryption is enabled by default for all new and existing storage accounts, including Table Storage, and cannot be disabled. Therefore, no additional action is required to meet the encryption-at-rest requirement.

Exam trap

The trap here is that candidates may think encryption at rest requires explicit configuration (like enabling SSE or TDE), not realizing that Azure Storage encrypts all data at rest by default, making options B, C, and D unnecessary or incorrect for this specific scenario.

How to eliminate wrong answers

Option B is wrong because Azure Storage Service Encryption (SSE) is already enabled by default for all Azure Storage accounts, including Table Storage; explicitly enabling it is unnecessary and redundant. Option C is wrong because Transparent Data Encryption (TDE) is a feature specific to Azure SQL Database and SQL Server, not applicable to Azure Table Storage, which is a non-relational, key-value store. Option D is wrong because client-side encryption is an optional, additional layer of security for scenarios requiring end-to-end encryption, but it is not required to achieve encryption at rest, which is already handled server-side by Azure Storage by default.


302

Multi-Selectmedium

Which TWO are valid use cases for Azure Stream Analytics?

Select 2 answers

A.Building and training a machine learning model

B.Orchestrating complex data pipelines with dependencies

C.Real-time fraud detection on credit card transactions

D.Processing IoT sensor data and alerting when thresholds are exceeded

E.Batch processing of historical sales data

AnswersC, D

Stream Analytics can process streaming transactions in real time to detect fraud.

Why this answer

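Azure Stream Analytics is a real-time event processing engine designed for analyzing high volumes of fast-moving streaming data. Option C is correct because Stream Analytics can process credit card transactions in real time, applying pattern matching and anomaly detection to identify potentially fraudulent activity as it occurs. Option D is correct for the same reason: IoT sensor readings arrive as a continuous stream, and a Stream Analytics query can compare each reading against a threshold and raise an alert as soon as it is exceeded.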

Exam trap

The trap here is that candidates confuse real-time stream processing (Stream Analytics) with batch processing (Azure Synapse) or pipeline orchestration (Azure Data Factory), leading them to select options that describe different Azure services.
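The threshold-alerting pattern in option D can be illustrated with a small sketch. This is plain Python, not Stream Analytics itself; the `Reading` type, the device names, and the threshold value are all hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    device_id: str
    temperature: float


def alerts(readings: Iterable[Reading], threshold: float) -> Iterator[str]:
    """Yield an alert as soon as a reading exceeds the threshold."""
    for reading in readings:  # events are handled one at a time, as they arrive
        if reading.temperature > threshold:
            yield f"ALERT {reading.device_id}: {reading.temperature}"


# Hypothetical sensor events
stream = [
    Reading("sensor-1", 21.5),
    Reading("sensor-2", 80.2),
    Reading("sensor-1", 22.0),
]

for message in alerts(stream, threshold=75.0):
    print(message)  # ALERT sensor-2: 80.2
```

The point of the sketch is that each event is evaluated when it arrives, rather than after a complete data set has been collected, which is what separates this workload from batch processing.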


303

MCQeasy

Refer to the exhibit. You are designing a fact table for a data warehouse. The table will store sales transactions with daily granularity. Which column would be most appropriate as the distribution column in a hash-distributed table in Azure Synapse Analytics?

A.SalesAmount

B.CustomerKey

C.ProductKey

D.OrderDate

AnswerB

CustomerKey has high cardinality and is frequently used in joins, making it a good distribution column.

Why this answer

CustomerKey (B) is the most appropriate distribution column because it has high cardinality and is frequently used in joins with dimension tables, so rows spread evenly across the distributions in Azure Synapse Analytics. A hash-distributed table needs a column with many unique values to avoid data skew, and CustomerKey, the key that links each sale to the customer dimension, meets this requirement.

Exam trap

Microsoft often tests the misconception that any column with high cardinality is suitable for hash distribution, but the trap here is that the column must also be frequently used in joins and evenly distribute data, not just have many unique values.

How to eliminate wrong answers

Option A (SalesAmount) is wrong because it is a measure column with continuous values that would cause data skew and poor query performance due to uneven distribution. Option C (ProductKey) is wrong because while it has high cardinality, it is less frequently used in join operations compared to CustomerKey, and using it may lead to suboptimal distribution for common sales analysis queries. Option D (OrderDate) is wrong because it has low cardinality (only 365 distinct values per year) and would cause severe data skew, as all transactions on the same date would hash to the same distribution, leading to hot spots and degraded performance.
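The effect of cardinality on skew can be sketched with a small simulation. It assumes 60 distributions (the number a dedicated SQL pool uses) and uses MD5 purely as a deterministic stand-in for the engine's internal hash function; the row counts and key values are hypothetical.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def bucket(value: str) -> int:
    # MD5 is only a deterministic stand-in for the engine's internal hash
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def skew(keys) -> float:
    """Rows in the fullest distribution divided by the ideal even share."""
    keys = list(keys)
    counts = Counter(bucket(str(key)) for key in keys)
    return max(counts.values()) / (len(keys) / DISTRIBUTIONS)


ROWS = 36_500
customer_keys = range(ROWS)                   # hypothetical: very high cardinality
order_dates = [i % 365 for i in range(ROWS)]  # hypothetical: 365 distinct days

customer_skew = skew(customer_keys)
date_skew = skew(order_dates)
print(f"CustomerKey skew: {customer_skew:.2f}")
print(f"OrderDate skew:   {date_skew:.2f}")
```

A skew value of 1.0 would mean a perfectly even spread. With only 365 distinct values, whole days of rows land together in the same distribution, so the fullest distribution ends up well above the even share.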


304

MCQmedium

A company uses Azure SQL Database for its order management system. The database has a table named Orders with columns OrderID (INT, PRIMARY KEY), CustomerID (INT), OrderDate (DATE), TotalAmount (DECIMAL). Queries that filter by OrderDate are slow. The database administrator observes that the nonclustered index on OrderDate has high fragmentation and many page splits. Which action will most likely improve query performance for these date-based queries?

A.Rebuild the nonclustered index on OrderDate with a FILLFACTOR of 80.

B.Change the data type of TotalAmount from DECIMAL to FLOAT.

C.Remove the clustered index on OrderID and create a clustered index on OrderDate.

D.Add a columnstore index on the OrderDate column.

AnswerA

Rebuilding with a lower fill factor reserves free space on pages, reducing page splits and fragmentation, thus improving performance.

Why this answer

Rebuilding the nonclustered index on OrderDate with a FILLFACTOR of 80 reduces page splits by leaving free space in each leaf-level page. This accommodates future insertions and updates that modify the OrderDate values, lowering fragmentation and improving query performance for date-based filters.

Exam trap

The trap here is that candidates may think changing the clustered index to OrderDate (Option C) is the best solution, but they overlook that this would disrupt the primary key and cause even more fragmentation for a table with frequent inserts, whereas rebuilding with a lower fill factor directly addresses page splits without altering the table's physical design.

How to eliminate wrong answers

Option B is wrong because changing TotalAmount from DECIMAL to FLOAT does not address fragmentation or page splits on the OrderDate index; it could introduce rounding errors and is irrelevant to date-based query performance. Option C is wrong because removing the clustered index on OrderID (the primary key) and creating a clustered index on OrderDate would reorganize the entire table by date, which might improve range scans but would severely degrade point lookups by OrderID and cause excessive page splits due to non-sequential date inserts; it is not the most targeted fix. Option D is wrong because a columnstore index is designed for analytical/aggregation workloads on large tables, not for improving point lookup or range filter performance on a single column in an OLTP system; it would add overhead without addressing fragmentation.
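As a rough sketch of what option A looks like in practice, the helper below builds the T-SQL rebuild statement and shows how much of each leaf page is left free. The index name `IX_Orders_OrderDate` is hypothetical, and the string formatting is for illustration only, not for untrusted input.

```python
def rebuild_index_sql(index: str, table: str, fill_factor: int) -> str:
    """Return a T-SQL statement that rebuilds an index with the given fill factor."""
    if not 1 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 1 and 100")
    return f"ALTER INDEX {index} ON {table} REBUILD WITH (FILLFACTOR = {fill_factor});"


def free_space_percent(fill_factor: int) -> int:
    """Percentage of each leaf page left empty right after the rebuild."""
    return 100 - fill_factor


# IX_Orders_OrderDate is a hypothetical index name
print(rebuild_index_sql("IX_Orders_OrderDate", "dbo.Orders", 80))
print(free_space_percent(80))  # 20
```

With a fill factor of 80, each leaf page starts 80% full, so new rows can be placed in the remaining 20% before a page split becomes necessary.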


305

MCQmedium

A social media startup needs to store user sessions as key-value pairs. Each session has a unique session ID, and the data needs to be globally distributed across multiple Azure regions to support low-latency reads for users worldwide. The development team expects heavy write throughput and needs flexible schema. Which Azure data store should they choose?

A.Azure Table Storage

B.Azure Blob Storage

C.Azure Cosmos DB

D.Azure Cache for Redis

AnswerC

Azure Cosmos DB supports global distribution, multiple consistency levels, and flexible schema, making it ideal for globally distributed key-value workloads with high throughput.

Why this answer

Azure Cosmos DB is the correct choice because it provides globally distributed, multi-region writes and reads with turnkey global distribution, supports flexible schema via its document model, and offers multiple consistency levels to balance performance and data integrity. It is designed for high-throughput, low-latency workloads like user sessions, with session IDs serving as natural partition keys for efficient key-value lookups.

Exam trap

The trap here is that candidates often confuse Azure Cache for Redis (a caching layer) with a durable, globally distributed data store, overlooking that session data requiring persistence and global replication needs a database like Cosmos DB, not an in-memory cache.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store but lacks native global distribution with multi-region writes and low-latency reads across regions; it is region-bound and requires manual replication. Option B is wrong because Azure Blob Storage is optimized for unstructured binary or text data (e.g., images, logs) and does not support key-value access patterns with high write throughput or flexible schema for session data. Option D is wrong because Azure Cache for Redis is an in-memory cache, not a durable data store; it is designed for temporary caching and cannot guarantee persistence or global distribution for session data that must survive restarts or be replicated across regions.


306

MCQeasy

A company receives customer order data from its online store in a CSV file. Each line contains fields like OrderID, CustomerName, Product, Quantity, and OrderDate. This data is best described as:

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Transactional data

AnswerA

Correct because the CSV file has a fixed schema with consistent fields per record, making it structured.

Why this answer

A is correct because the CSV file contains data that conforms to a strict tabular schema with predefined columns (OrderID, CustomerName, Product, Quantity, OrderDate) and consistent data types per column. This rigid, row-and-column format with a fixed schema is the defining characteristic of structured data, which can be directly loaded into a relational database or Azure SQL Database without transformation.

Exam trap

The trap here is that candidates confuse 'transactional data' (a workload type) with 'structured data' (a format classification), leading them to pick D because the data describes orders, even though the question explicitly asks about the data's format, not its business purpose.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML, Parquet) allows flexible schema variations, such as missing fields or nested structures, whereas CSV enforces a fixed number of columns per row and a consistent order. Option C is wrong because unstructured data (e.g., text files, images, videos) has no predefined schema or organization, while CSV has a clear row/column structure. Option D is wrong because transactional data refers to a type of workload (OLTP) that records business transactions, not a data format classification; the CSV file itself is a structured data format regardless of whether it contains transactional records.
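A short sketch makes the "fixed schema" point concrete. The sample rows are hypothetical; only the column names come from the question.

```python
import csv
import io

# Hypothetical sample rows using the columns named in the question
raw = """OrderID,CustomerName,Product,Quantity,OrderDate
1001,Ana Diaz,Keyboard,2,2024-03-01
1002,Lee Wong,Monitor,1,2024-03-02
"""

expected_columns = ["OrderID", "CustomerName", "Product", "Quantity", "OrderDate"]

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Every record exposes exactly the same columns in the same order
same_schema = all(list(row.keys()) == expected_columns for row in rows)
print(same_schema)  # True

# Each column holds one consistent type, so it can be cast uniformly
quantities = [int(row["Quantity"]) for row in rows]
print(quantities)  # [2, 1]
```

Because every row has the same columns and each column can be cast to one type, the file maps directly onto a relational table.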


307

MCQeasy

A business analyst needs to explore and create interactive visualizations of sales data stored in Azure Data Lake Storage Gen2 without writing SQL code. Which Azure service is best suited for this drag-and-drop data exploration?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Databricks

D.Microsoft Power BI

AnswerD

Power BI offers a user-friendly, drag-and-drop interface to connect to data lakes, transform data visually, and build interactive reports and dashboards without writing custom code.

Why this answer

Microsoft Power BI is the correct choice because it provides a drag-and-drop interface for creating interactive visualizations and exploring data without requiring SQL code. It can directly connect to Azure Data Lake Storage Gen2 using Power Query or the ADLS connector, enabling business analysts to build reports and dashboards through intuitive visual interactions.

Exam trap

The trap here is that candidates may confuse Azure Data Factory's visual pipeline designer with a drag-and-drop exploration tool, but Data Factory is for data movement and orchestration, not interactive visualization.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time event processing engine that requires SQL-like queries (Stream Analytics Query Language) to analyze streaming data, not a drag-and-drop visualization tool. Option B is wrong because Azure Data Factory is a cloud-based ETL and data integration service that orchestrates data movement and transformation using pipelines, not an interactive visualization or exploration tool. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform that requires writing code (Python, Scala, SQL) or using notebooks for data exploration, not a no-code drag-and-drop interface.


308

MCQeasy

A company wants to store JSON documents that need to be queried with high throughput and low latency globally. Which Azure data service is most appropriate?

A.Azure Table Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Blob Storage

AnswerB

Cosmos DB provides global distribution, low latency, and native JSON support.

Why this answer

Azure Cosmos DB is the most appropriate service because it is a globally distributed, multi-model database that natively supports JSON documents and provides guaranteed single-digit-millisecond latency at the 99th percentile, along with high throughput via configurable request units (RUs). Its turnkey global distribution enables low-latency reads and writes across multiple Azure regions, making it ideal for globally queried JSON workloads.

Exam trap

The trap here is that candidates confuse Azure Table Storage's key-value model with JSON document support, or assume Azure SQL Database's JSON functions make it suitable for globally distributed, high-throughput JSON workloads, missing Cosmos DB's core differentiator of turnkey global distribution and guaranteed low latency.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store that stores data in entity/partition structures, not native JSON documents, and it lacks global distribution with guaranteed low-latency SLAs. Option C is wrong because Azure SQL Database is a relational database that stores data in tables with a fixed schema, not as native JSON documents, and while it supports JSON functions, it is not designed for globally distributed, high-throughput JSON queries with multi-region write capabilities. Option D is wrong because Azure Blob Storage is an object storage service for unstructured binary data, not a queryable database; it cannot natively query JSON documents with low latency and high throughput.


309

MCQmedium

A company is designing a multi-tenant SaaS application. Each tenant has its own relational database, but the total number of tenants is expected to grow rapidly. The company wants to manage all databases efficiently and optimize costs by sharing resources among tenants with low usage. What should they use?

A.SQL Server on Azure Virtual Machines

B.Azure SQL Managed Instance

C.Azure Database for MySQL with elastic pools

D.Azure SQL Database elastic pools

AnswerD

Elastic pools share resources among databases, ideal for multi-tenant SaaS.

Why this answer

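Option D is correct because elastic pools in Azure SQL Database let multiple databases share one set of resources, which optimizes cost when many tenants have low or uneven usage. Option A (SQL Server on Azure Virtual Machines) requires provisioning and managing the VMs yourself. Option B (Azure SQL Managed Instance) targets lift-and-shift migrations that need instance-level SQL Server features rather than low-cost pooling across many tenant databases.

Option C (Azure Database for MySQL) does not have elastic pools.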
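The cost argument for pooling can be sketched with simple arithmetic. The tenant names and usage figures below are hypothetical and use arbitrary resource units rather than real Azure pricing or eDTU/vCore values.

```python
# Hypothetical hourly usage (arbitrary resource units) for three tenant databases
usage = {
    "tenant_a": [5, 40, 5, 5],
    "tenant_b": [5, 5, 45, 5],
    "tenant_c": [35, 5, 5, 5],
}

# Sizing each database separately means paying for every individual peak
separate_capacity = sum(max(hours) for hours in usage.values())

# Sizing one shared pool means paying for the busiest combined hour only
combined_per_hour = [sum(hour) for hour in zip(*usage.values())]
pooled_capacity = max(combined_per_hour)

print(separate_capacity, pooled_capacity)  # 120 55
```

Because the tenants peak at different times, the shared pool needs far less capacity than the sum of the individual peaks. When all tenants peak together, this advantage shrinks.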


310

Matchingmedium

Match each Azure data migration tool to its use case.


Concepts

Matches

Migrate databases to Azure with minimal downtime

Copy blobs or files to/from Azure Storage

Offline data transfer for large datasets

Ship physical disks to Azure datacenter

Orchestrate data movement and transformation

Why these pairings

Azure offers various tools for data migration scenarios.


311

MCQhard

You are reviewing an ARM template for an Azure SQL Database deployment. What is the maximum size of the database?

A.5 GB

B.50 GB

C.5 MB

D.500 GB

AnswerA

5,368,709,120 bytes = 5 GB.

Why this answer

The template expresses the database's maximum size in bytes (in ARM templates this is normally the maxSizeBytes property): 5,368,709,120. Dividing by 1,024³ (1,073,741,824 bytes per GB) gives exactly 5 GB, which is the correct answer. The size comes from the value set in the template, not from a fixed limit of a particular service tier.

Exam trap

The trap here is that the size is given in bytes, so candidates who miscount the digits or mix up units can land on 5 MB, 50 GB, or 500 GB instead of 5 GB.

How to eliminate wrong answers

Option B (50 GB) is wrong because 50 GB would be 53,687,091,200 bytes, ten times the value in the template. Option C (5 MB) is wrong because 5 MB is only 5,242,880 bytes. Option D (500 GB) is wrong because 500 GB would be 536,870,912,000 bytes, one hundred times the value in the template.
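The conversion behind the answer is easy to check. This sketch assumes binary units (1 GB = 1,024³ bytes), which is what makes 5,368,709,120 bytes come out to exactly 5 GB; the function name is hypothetical.

```python
def bytes_to_gb(size_bytes: int) -> float:
    """Convert bytes to gigabytes using binary units (1 GB = 1024**3 bytes)."""
    return size_bytes / 1024**3


print(bytes_to_gb(5_368_709_120))  # 5.0

# The distractors, expressed in bytes for comparison
print(50 * 1024**3)   # 53687091200  (50 GB)
print(5 * 1024**2)    # 5242880      (5 MB)
print(500 * 1024**3)  # 536870912000 (500 GB)
```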


312

MCQmedium

A data analyst needs to run ad-hoc SQL queries on petabytes of Parquet files stored in Azure Data Lake Storage Gen2. The queries are infrequent and highly selective. The analyst wants to pay only for the data scanned by each query and does not want to provision any compute resources. They also need to create views to simplify future queries for other analysts. Which Azure service should they use?

A.Azure Data Factory

B.Azure Synapse Serverless SQL pool

C.Azure Synapse Dedicated SQL pool

D.Azure Databricks

AnswerB

Correct. Azure Synapse Serverless SQL pool allows querying data in ADLS Gen2 using T-SQL, charges per data scanned, and supports creating views.

Why this answer

Azure Synapse Serverless SQL pool (correct answer) is a pay-per-query service that charges only for the data scanned, requires no provisioning of compute resources, and supports creating views over external data in Azure Data Lake Storage Gen2. It uses T-SQL to query Parquet files directly, making it ideal for infrequent, highly selective ad-hoc queries without managing infrastructure.

Exam trap

The trap here is that candidates often confuse Azure Synapse Dedicated SQL pool (provisioned, always-on compute) with Serverless SQL pool (pay-per-query, no provisioning), or assume Azure Databricks can run ad-hoc SQL without a running cluster, but Databricks requires an active cluster and does not offer pay-per-query billing for data scanned.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data orchestration service, not a query engine; it does not support running ad-hoc SQL queries directly on Parquet files or pay-per-query billing. Option C is wrong because Azure Synapse Dedicated SQL pool requires provisioning and paying for reserved compute resources even when idle, contradicting the requirement to pay only for data scanned and not provision compute. Option D is wrong because Azure Databricks requires provisioning a cluster (compute) and incurs costs for cluster uptime, not per-query data scanned; it also does not natively support creating T-SQL views for other analysts without additional setup.


313

MCQhard

A financial analytics company has two distinct data processing workloads. The first workload ingests real-time stock trade data from a message queue, calculates moving averages every minute, and updates a dashboard for traders. The second workload receives daily CSV files containing end-of-day trade summaries, transforms them using Python scripts, and loads the results into a data warehouse for monthly reporting. Which statement correctly characterizes these workloads?

A.First workload: Stream processing, Second workload: Batch processing

B.First workload: Batch processing, Second workload: Stream processing

C.First workload: OLTP, Second workload: OLAP

D.First workload: Transactional processing, Second workload: Analytical processing

AnswerA

Real-time stock trade analysis with moving averages is a classic stream processing workload (low latency, continuous). End-of-day CSV file processing is batch processing (scheduled, bulk).

Why this answer

Option A is correct because the first workload processes real-time stock trade data from a message queue and calculates moving averages every minute, which is a classic stream processing pattern (continuous, low-latency data ingestion and computation). The second workload handles daily CSV files with end-of-day summaries, transforms them with Python scripts, and loads results into a data warehouse for monthly reporting, which is a classic batch processing pattern (scheduled, high-latency processing of bounded data sets).

Exam trap

The trap here is that candidates confuse 'real-time' with 'transactional processing' (OLTP) or 'analytical processing' (OLAP), when the correct distinction is between stream processing (continuous, low-latency) and batch processing (scheduled, high-latency).

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: the first workload is clearly stream processing (real-time, message queue), not batch processing, and the second workload is batch processing (daily files, scheduled transformation), not stream processing. Option C is wrong because OLTP (Online Transaction Processing) refers to systems that handle high-volume, low-latency transactions (e.g., order entry), not real-time analytics; the first workload is stream processing, not OLTP. Option D is wrong because 'transactional processing' is synonymous with OLTP, not stream processing, and 'analytical processing' is synonymous with OLAP, not batch processing; the first workload is stream processing, and the second is batch processing.
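The contrast between the two workloads can be sketched in a few lines. This is a simplification: the stream function emits a per-minute window average rather than a true moving average, and the timestamps, prices, and symbols are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_minute_average(trades):
    """Stream style: emit the average price each time a minute window closes."""
    current_minute, prices = None, []
    for timestamp_s, price in trades:  # trades arrive in time order
        minute = timestamp_s // 60
        if current_minute is not None and minute != current_minute:
            yield current_minute, mean(prices)
            prices = []
        current_minute = minute
        prices.append(price)
    if prices:  # flush the final, still-open window
        yield current_minute, mean(prices)


def daily_summary(rows):
    """Batch style: process a complete, bounded set of rows in one pass."""
    totals = defaultdict(float)
    for symbol, amount in rows:
        totals[symbol] += amount
    return dict(totals)


# Hypothetical (timestamp in seconds, price) events
trades = [(0, 100.0), (30, 102.0), (65, 101.0), (119, 103.0)]
print(list(per_minute_average(trades)))  # [(0, 101.0), (1, 102.0)]

# Hypothetical end-of-day rows
print(daily_summary([("MSFT", 10.0), ("MSFT", 5.0), ("AAPL", 7.0)]))
```

The stream function produces results while data is still arriving, whereas the batch function only runs once the whole day's file is available.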


314

MCQmedium

Your organization uses Microsoft Fabric to build a data lakehouse. Data engineers need to transform data using Spark and store results in Delta Lake format. Which Fabric component should they use?

A.Dataflows Gen2

B.Pipelines

C.Notebooks

D.Semantic models

AnswerC

Notebooks support Spark and Delta Lake.

Why this answer

Notebooks in Microsoft Fabric provide an interactive environment for writing and executing Spark code, which is required for transforming data using Spark. The results can be directly written to Delta Lake format, making Notebooks the correct component for this task.

Exam trap

The trap here is that candidates may confuse Pipelines (which orchestrate activities) with the actual compute engine (Notebooks) that runs Spark transformations, leading them to select Pipelines as the component for executing Spark code.

How to eliminate wrong answers

Option A is wrong because Dataflows Gen2 are used for low-code data transformation using Power Query, not for running Spark code. Option B is wrong because Pipelines are used for orchestrating and scheduling data movement and transformation activities, but they do not execute Spark transformations themselves. Option D is wrong because Semantic models are used for defining business logic and measures for reporting in Power BI, not for data transformation or Spark execution.


315

MCQmedium

A gaming company stores player scores in Azure Cosmos DB using the NoSQL API. Each document contains fields: PlayerID (unique to the player), GameID, Score, Timestamp. The most common query is: 'Retrieve all scores for a specific GameID, ordered by Score descending.' Which property should be chosen as the partition key to minimize Request Unit (RU) consumption?

A.PlayerID

B.GameID

C.Score

D.Timestamp

AnswerB

GameID is the attribute used in the query filter. Choosing it as the partition key ensures that all scores for a given game are co-located in one logical partition, so the query can be served from that single partition, which minimizes RU cost.

Why this answer

GameID is the correct partition key because the most common query filters on GameID, and Cosmos DB routes queries to the exact physical partition(s) containing that GameID's data. This minimizes RU consumption by avoiding cross-partition fan-out, as the query engine can target a single partition. Using any other field would force scanning multiple partitions, increasing RU cost.

Exam trap

The trap here is that candidates often pick a unique key like PlayerID thinking it ensures even distribution, but they overlook that the query pattern (filtering by GameID) must drive the partition key choice to avoid cross-partition queries.

How to eliminate wrong answers

Option A is wrong because partitioning by PlayerID spreads each game's scores across as many logical partitions as there are players, so the GameID-based query must fan out across partitions. Option C is wrong because Score is a high-cardinality, frequently updated value that would cause inefficient query routing, as the query filters on GameID, not Score. Option D is wrong because Timestamp is monotonically increasing, which creates a hot partition on the latest timestamp and does not align with the query filter on GameID, forcing cross-partition scans.
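A toy model shows why the query filter should drive the partition key. It counts logical partitions only and ignores how Cosmos DB maps them onto physical partitions; the documents and the `partitions_touched` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical score documents
docs = [
    {"PlayerID": "p1", "GameID": "g1", "Score": 50},
    {"PlayerID": "p2", "GameID": "g1", "Score": 90},
    {"PlayerID": "p3", "GameID": "g2", "Score": 70},
    {"PlayerID": "p1", "GameID": "g2", "Score": 30},
]


def partitions_touched(documents, partition_key, game_id):
    """Count logical partitions holding at least one match for the GameID filter."""
    partitions = defaultdict(list)
    for doc in documents:
        partitions[doc[partition_key]].append(doc)
    return sum(
        any(doc["GameID"] == game_id for doc in group)
        for group in partitions.values()
    )


print(partitions_touched(docs, "GameID", "g1"))    # 1
print(partitions_touched(docs, "PlayerID", "g1"))  # 2

# The query itself: all scores for one game, highest first
top_scores = sorted(
    (doc["Score"] for doc in docs if doc["GameID"] == "g1"), reverse=True
)
print(top_scores)  # [90, 50]
```

With GameID as the key, the matching documents sit together. With PlayerID as the key, the number of partitions to visit grows with the number of players who played the game.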


316

MCQmedium

A social media company stores user session data. Each session record must be quickly looked up by user ID and must have strong consistency so that once a session is written, subsequent reads always return the latest data. The company expects billions of session records globally and needs low-latency reads/writes. Which Azure data store best meets these requirements?

A.Azure Cosmos DB (SQL API)

B.Azure Blob Storage

C.Azure Table Storage

D.Azure Cache for Redis

AnswerA

Correct. Cosmos DB provides low-latency, globally distributed key-value access with multiple consistency levels including strong consistency.

Why this answer

Azure Cosmos DB with SQL API is the correct choice because it offers single-digit millisecond read/write latencies at any scale, global distribution, and tunable consistency levels including strong consistency. Strong consistency ensures that once a write is acknowledged, all subsequent reads return the latest data, which is critical for session state where stale reads could cause authentication or authorization failures. Cosmos DB also supports automatic indexing and partitioning by user ID, enabling fast lookups across billions of records.

Exam trap

The trap here is that candidates are drawn to Azure Table Storage's low cost and key-value model and overlook the global, low-latency requirement. Table Storage is strongly consistent within its primary region, but it has no turnkey multi-region distribution, and reads from a geo-redundant secondary can lag behind the primary.

How to eliminate wrong answers

Option B (Azure Blob Storage) is wrong because it is designed for unstructured binary or text data (e.g., images, videos, backups) and is optimized for throughput rather than low-latency key-value lookups of individual records. Option C (Azure Table Storage) is wrong because, while it supports key-value lookups, it serves reads and writes from a single primary region; a geo-redundant secondary is replicated asynchronously, so it cannot combine global low-latency access with strong consistency. Option D (Azure Cache for Redis) is wrong because it is an in-memory cache: it provides low latency, but data is volatile unless persistence is configured, which makes it unsuitable as a durable primary store for billions of session records.


317

Multi-Selecteasy

A retail company operates an online store. When a customer places an order, the system immediately updates inventory and payment records. Separately, the company's business analysts run weekly reports that aggregate sales data to identify trends. Which two terms correctly describe these workloads?

Select 2 answers

A.Batch processing and real-time processing

B.OLTP and OLAP

C.Structured and Unstructured data

D.Data ingestion and data transformation

AnswersA, B

OLTP and OLAP (option B) are the precise terms for these two workload types. Batch processing means handling data in large scheduled batches, while real-time processing handles data as it arrives; the order updates happen as each order arrives and the weekly reports run as a batch, but these labels describe how the data is processed rather than the type of workload.

Why this answer

The order processing system that immediately updates inventory and payment records is a classic Online Transaction Processing (OLTP) workload, which handles high volumes of small, real-time transactions with ACID guarantees. The weekly sales aggregation reports run by business analysts are an Online Analytical Processing (OLAP) workload, which involves complex queries over large historical datasets to support business intelligence. Option B correctly pairs these two distinct processing paradigms.

Exam trap

The trap here is that candidates confuse the processing mode (batch vs. real-time) with the workload type (OLTP vs. OLAP), failing to recognize that OLTP is inherently real-time and OLAP is typically batch-oriented, but the question specifically asks for the correct terms that describe the workloads themselves.
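The two workload types can be illustrated with an in-memory SQLite database standing in for the store's systems. The table layout, product name, and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (product TEXT PRIMARY KEY, stock INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, amount REAL, week INTEGER);
    INSERT INTO inventory VALUES ('keyboard', 10);
""")


def place_order(product: str, amount: float, week: int) -> None:
    """OLTP style: one small transaction that updates several records atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE inventory SET stock = stock - 1 WHERE product = ?", (product,)
        )
        conn.execute(
            "INSERT INTO orders (product, amount, week) VALUES (?, ?, ?)",
            (product, amount, week),
        )


place_order("keyboard", 49.0, 1)
place_order("keyboard", 49.0, 1)
place_order("keyboard", 51.0, 2)

# OLAP style: a read-only aggregation over the accumulated history
weekly_sales = conn.execute(
    "SELECT week, SUM(amount) FROM orders GROUP BY week ORDER BY week"
).fetchall()
stock = conn.execute("SELECT stock FROM inventory").fetchone()[0]
print(weekly_sales, stock)  # [(1, 98.0), (2, 51.0)] 7
```

In a real deployment the two workloads usually run on separate systems (an operational database and a data warehouse), so that heavy analytical queries do not slow down order processing.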


318

MCQmedium

A company has an existing on-premises SQL Server database that is 500 GB in size. The database uses SQL Server Agent jobs for scheduled maintenance and linked servers to query data from a remote SQL Server instance. The company wants to migrate to Azure with minimal application changes and needs automated backups and patching. Which Azure SQL service should they choose?

A.Azure SQL Database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machines

D.Azure Database for PostgreSQL

AnswerB

Azure SQL Managed Instance provides high compatibility with on-premises SQL Server, including support for SQL Agent jobs and linked servers. It also includes automated backups, patching, and high availability, meeting all requirements.

Why this answer

Azure SQL Managed Instance is correct because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs and linked servers, while offering automated backups and patching. This minimizes application changes during migration, unlike other Azure SQL options that lack these features.

Exam trap

The trap here is that candidates often choose Azure SQL Database for its simplicity, overlooking its lack of SQL Server Agent and linked server support, which are critical for the described workload.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database does not support SQL Server Agent jobs or linked servers, requiring significant application changes. Option C is wrong because SQL Server on Azure Virtual Machines requires manual management of backups and patching, contradicting the need for automated maintenance. Option D is wrong because Azure Database for PostgreSQL is a different database engine, incompatible with SQL Server Agent jobs and linked servers.


319

Multi-Selectmedium

Which TWO of the following statements about Azure SQL Database elastic pools are true?

Select 2 answers

A.They store actual database data

B.They allow multiple databases to share a fixed set of resources

C.They are cost-effective for databases with unpredictable usage patterns

D.They cannot be scaled after creation

E.They are limited to a single Azure SQL Database server

AnswersB, C

Elastic pools allocate resources to a pool of databases.

Why this answer

Options B and C are correct. Elastic pools let multiple databases share a set of pooled resources and are cost-effective for databases with variable, unpredictable usage patterns. Option A is wrong because elastic pools do not store data; the databases within the pool do.

Option D is wrong because elastic pools can be scaled up or down after creation. Option E is marked wrong in the answer key; note, however, that every database in a pool must reside on the same logical server, so this option is debatable.

320

MCQmedium

You need to design a data storage solution for a global e-commerce application that must support ACID transactions and require minimal latency for point lookups by a unique key. Which Azure data service should you use?

A.Azure Table Storage

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Cosmos DB

AnswerD

Cosmos DB provides low-latency point lookups and ACID transactions within a logical partition.

Why this answer

Azure Cosmos DB is the correct choice because it provides global distribution with multi-region writes, supports ACID transactions within a single logical partition through its transactional batch API, and offers single-digit millisecond latency for point reads by a unique key (the item id plus its partition key). This makes it well suited to a global e-commerce application that needs both transactional guarantees and low-latency lookups.

Exam trap

The trap here is that candidates often assume Azure SQL Database is the only ACID-compliant option, overlooking Cosmos DB's transactional batch support and its superior global low-latency capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage does not support ACID transactions (it only offers entity-level atomicity) and has higher latency for point lookups compared to Cosmos DB. Option B is wrong because Azure SQL Database, while fully ACID-compliant, is not designed for global distribution with minimal latency; it requires read replicas and manual failover, and its point lookup latency is higher than Cosmos DB's single-digit millisecond SLA. Option C is wrong because Azure Blob Storage is an object store for unstructured data, does not support ACID transactions, and point lookups by unique key are not its primary access pattern (it uses HTTP-based REST operations with higher latency).

321

MCQmedium

A social media application stores user posts in Azure Cosmos DB. Each post has fields: PostID (unique), UserID, Timestamp, Content, LikesCount. The application frequently queries for all posts by a specific UserID ordered by Timestamp descending. To minimize Request Unit (RU) consumption, which partition key and indexing strategy should be used?

A.Partition key: UserID, and create a composite index on (UserID, Timestamp DESC)

B.Partition key: Timestamp, and sort by UserID in the query

C.Partition key: PostID, and use ORDER BY Timestamp

D.Partition key: UserID, and use ORDER BY PostID

AnswerA

This design localizes all posts for a user in one partition and uses an index that directly supports the filter and sort order.

Why this answer

Option A is correct because UserID is the attribute every query filters on, so using it as the partition key keeps all of a user's posts in one logical partition and avoids cross-partition queries. Adding a composite index on (UserID, Timestamp DESC) lets the engine return the posts already ordered from the index instead of performing a full scan or a separate sort, which minimizes RU consumption.
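As a rough illustration of Option A, the sketch below builds an indexing policy document in the JSON shape Cosmos DB uses for composite indexes. The property paths mirror the field names in the question; the variable names and the query text are assumptions made for this example, and nothing here talks to a real account.

```python
import json

# Partition key path assumed from the question's field names.
PARTITION_KEY_PATH = "/UserID"

# Indexing policy with one composite index: UserID ascending, Timestamp descending.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "compositeIndexes": [
        [
            {"path": "/UserID", "order": "ascending"},
            {"path": "/Timestamp", "order": "descending"},
        ]
    ],
}

# Parameterized query (illustrative). Listing UserID in ORDER BY keeps the sort
# order aligned with the composite index definition above.
query = (
    "SELECT * FROM c WHERE c.UserID = @userId "
    "ORDER BY c.UserID ASC, c.Timestamp DESC"
)

print(json.dumps(indexing_policy, indent=2))
```

The policy would be supplied when the container is created or updated; how much RU it saves depends on the data and is not something this sketch measures.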

Exam trap

The trap here is that candidates often pick a partition key based on the ORDER BY column (Timestamp) without realizing that the filter column (UserID) should be the partition key to avoid cross-partition queries, and that a composite index is needed to avoid an expensive sort.

How to eliminate wrong answers

Option B is wrong because Timestamp as a partition key would cause hot partitions (e.g., all posts from a trending time) and the query would need to scatter across partitions to filter by UserID, increasing RU. Option C is wrong because PostID as a partition key would scatter each user's posts across many partitions, forcing a cross-partition query with ORDER BY Timestamp that requires a costly sort across partitions. Option D is wrong because using ORDER BY PostID does not satisfy the requirement to order by Timestamp descending, and even with UserID as partition key, the query would need to sort posts by PostID instead of Timestamp, which is incorrect and inefficient.

322

MCQmedium

A global social media app uses Azure Cosmos DB (NoSQL API) to store user profile data. The app is read-heavy and must serve content with the lowest possible latency to users worldwide. The data is updated by users, and the business has determined that eventual consistency is acceptable because immediate consistency after a write is not critical for profile views. Which consistency level should they choose to minimize read latency?

A.Eventual

B.Strong

C.Bounded staleness

D.Session

AnswerA

Eventual consistency provides the best read performance and availability, with no ordering guarantees.

Why this answer

Eventual consistency is the correct choice because it offers the lowest read latency by allowing reads to return data immediately without waiting for replication to complete. Since the app is read-heavy, global, and can tolerate eventual consistency for profile views, this consistency level minimizes the time to serve content by not imposing any ordering or staleness guarantees on replicas.

Exam trap

The trap here is that candidates often choose Session consistency because it is the default for many Azure Cosmos DB SDKs, but the question explicitly asks for the lowest read latency with eventual consistency acceptable, making Eventual the correct answer despite Session being a common default.

How to eliminate wrong answers

Option B (Strong) is wrong because it requires all replicas to agree on the latest write before any read can proceed, which adds significant latency, especially across global regions, and is unnecessary given the business's acceptance of eventual consistency. Option C (Bounded staleness) is wrong because it imposes a maximum staleness window (e.g., 5 seconds or 10 operations), which still introduces a replication delay and higher read latency compared to eventual, and is overkill for a scenario where any staleness is acceptable. Option D (Session) is wrong because it guarantees monotonic reads and writes within a single client session, which adds overhead to maintain session context and does not minimize read latency globally; it is designed for per-session consistency, not for lowest-latency global reads.

323

MCQmedium

A team is designing a data pipeline to process streaming sensor data from IoT devices. The data must be ingested, transformed in real time, and stored in a time-series database. Which combination of Azure services should they use?

A.Azure IoT Hub, Azure Data Lake Storage, and Azure Databricks

B.Azure IoT Hub, Azure Stream Analytics, and Azure Data Explorer

C.Azure Event Hubs, Azure Functions, and Azure SQL Database

D.Azure Event Hubs, Azure Synapse Pipelines, and Azure Cosmos DB

AnswerB

IoT Hub ingests device data, Stream Analytics performs real-time transformations, and Data Explorer is a time-series database for fast analytics.

Why this answer

Option B is correct because Azure IoT Hub ingests streaming sensor data from IoT devices, Azure Stream Analytics provides real-time transformation and analysis of the data streams, and Azure Data Explorer (ADX) is a fully managed time-series database optimized for high-velocity telemetry data. This combination directly addresses the requirement for ingestion, real-time transformation, and time-series storage.

Exam trap

The trap here is that candidates often confuse Azure Data Explorer with Azure Data Lake Storage or Azure SQL Database, assuming any storage service can handle time-series data, but ADX is the only Azure service purpose-built for high-ingestion-rate time-series analytics with features like materialized views and data sharding.

How to eliminate wrong answers

Option A is wrong because Azure Data Lake Storage is a hierarchical file store for batch/analytics, not a time-series database, and Azure Databricks is primarily for batch and interactive analytics, not real-time stream processing with low-latency time-series storage. Option C is wrong because Azure SQL Database is a relational OLTP database not optimized for time-series workloads, and Azure Functions is event-driven compute, not a dedicated stream processing service for real-time transformations. Option D is wrong because Azure Synapse Pipelines is an orchestration tool for data movement and transformation, not real-time stream processing, and Azure Cosmos DB is a multi-model NoSQL database that lacks native time-series optimizations like automatic retention policies and downsampling.

324

MCQeasy

A retail company stores customer transaction data in a relational database. Each transaction is recorded with a fixed schema including TransactionID, CustomerID, ProductID, Quantity, and TotalAmount. Which type of data does this represent?

A.Unstructured data

B.Semi-structured data

C.Structured data

D.Binary data

AnswerC

Structured data conforms to a predefined schema, typically stored in rows and columns in a relational database. The fixed schema of TransactionID, CustomerID, etc., makes this structured data.

Why this answer

Option C is correct because the data conforms to a fixed schema with defined columns (TransactionID, CustomerID, ProductID, Quantity, TotalAmount) and data types, which is the defining characteristic of structured data. In a relational database, this schema enforces consistency and allows for efficient querying using SQL, making it a classic example of structured data.
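To make the contrast concrete, here is a minimal sketch that uses Python's built-in SQLite module as a stand-in for a relational database (an assumption for illustration; the question does not name an engine). The table enforces the fixed schema from the question, while the JSON record, with its hypothetical `GiftNote` field, shows how semi-structured data can carry extra attributes freely.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured data: every row must fit this predefined schema.
conn.execute(
    """CREATE TABLE Transactions (
        TransactionID INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL,
        ProductID     INTEGER NOT NULL,
        Quantity      INTEGER NOT NULL,
        TotalAmount   REAL    NOT NULL
    )"""
)
conn.execute("INSERT INTO Transactions VALUES (1, 10, 200, 2, 39.98)")

# A row that omits required columns is rejected by the schema.
try:
    conn.execute(
        "INSERT INTO Transactions (TransactionID, CustomerID) VALUES (2, 11)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Semi-structured data: a self-describing record may add or omit fields.
semi_structured = json.loads(
    '{"TransactionID": 3, "CustomerID": 12, "GiftNote": "Happy birthday"}'
)

print(rejected, sorted(semi_structured))
```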

Exam trap

The trap here is that candidates may confuse 'structured data' with 'semi-structured data' because both have some organization, but the key differentiator is the rigid, predefined schema enforced by the relational database versus the flexible, self-describing schema of semi-structured formats like JSON or XML.

How to eliminate wrong answers

Option A is wrong because unstructured data has no predefined schema or organization (e.g., text files, images, videos), whereas this data has a fixed schema. Option B is wrong because semi-structured data has some organizational properties but does not conform to a rigid schema (e.g., JSON, XML with flexible tags), while this data uses a strict relational schema. Option D is wrong because binary data refers to raw byte sequences (e.g., executable files, images), not tabular data with typed columns.

325

MCQhard

A data analyst needs to run ad-hoc SQL queries on petabytes of log data stored as Parquet files in Azure Data Lake Storage Gen2. The queries join multiple tables and require high concurrency from multiple analysts. The solution should minimize cost by only paying for queries executed. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Synapse Dedicated SQL pool

C.Azure HDInsight with Spark

D.Azure Databricks

AnswerA

Serverless SQL pool allows querying data lake files with T-SQL, charges per TB of data processed, and scales automatically for concurrency.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows running ad-hoc T-SQL queries directly on Parquet files in Azure Data Lake Storage Gen2 without provisioning any infrastructure. It uses a pay-per-query billing model, making it cost-effective for sporadic, high-concurrency workloads where you only want to pay for the compute resources consumed during query execution.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'Dedicated SQL pool' or choose Spark-based services like Databricks or HDInsight, failing to recognize that the key requirement is pay-per-query billing for ad-hoc SQL on data lake storage, which only Serverless SQL pool provides.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Dedicated SQL pool requires provisioning and paying for dedicated compute resources 24/7, even when no queries are running, which contradicts the requirement to minimize cost by only paying for queries executed. Option C is wrong because Azure HDInsight with Spark involves provisioning a persistent cluster with fixed compute nodes, incurring ongoing costs regardless of query activity, and is more suited for batch processing and ETL rather than ad-hoc SQL queries. Option D is wrong because Azure Databricks also requires a running cluster (even with auto-termination, there is overhead) and is optimized for Spark-based analytics and machine learning, not for serverless SQL-on-demand with pay-per-query billing.

326

MCQhard

Your company, Contoso Ltd., operates a global e-commerce platform. The data engineering team ingests over 10 TB of raw clickstream data daily into Azure Data Lake Storage Gen2. The data is partitioned by date and hour. Business analysts need to query this data using Azure Synapse Serverless SQL to generate daily sales reports. However, the reports are taking over 30 minutes to run, and the team needs to improve query performance without moving data to a dedicated SQL pool. You are asked to recommend a solution. Which action should you take?

A.Convert the data from JSON to Parquet format and apply Snappy compression.

B.Use Azure Data Factory to copy the data into Azure SQL Database and create indexes.

C.Create a dedicated SQL pool and distribute the data across 60 distributions.

D.Create external tables using a partition elimination strategy and ensure the data is partitioned by date.

AnswerD

Partition elimination allows the serverless SQL engine to read only the partitions needed for the query, significantly reducing data scanned and improving performance.

Why this answer

Option D is correct because Azure Synapse Serverless SQL can use external tables with partition elimination to skip irrelevant partitions (e.g., date/hour folders) during query execution. This reduces the amount of data scanned, directly improving query performance without moving data. Partition elimination works by filtering on the partition column (e.g., date) in the WHERE clause, allowing the query engine to read only the necessary files.
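The idea behind partition elimination can be sketched without any Azure service: if files are laid out in date and hour folders, a filter on the date lets the reader skip every folder that cannot match. The folder layout, file names, and helper functions below are hypothetical and only model the pruning step, not how the serverless engine implements it.

```python
from datetime import date


def partition_path(day: date, hour: int) -> str:
    """Build a hypothetical data lake path partitioned by date and hour."""
    return f"clicks/date={day.isoformat()}/hour={hour:02d}/part-0000.parquet"


def prune(paths: list[str], wanted_day: date) -> list[str]:
    """Keep only the files whose date folder matches the filter."""
    folder = f"date={wanted_day.isoformat()}/"
    return [p for p in paths if folder in p]


# Three days of hourly files: 3 * 24 = 72 files in total.
all_files = [
    partition_path(date(2024, 1, d), h) for d in (14, 15, 16) for h in range(24)
]

# A daily report only needs one day's folders.
selected = prune(all_files, date(2024, 1, 15))

print(len(all_files), len(selected))
```

In this toy layout a one-day report touches a third of the files; with months of history in the lake, the fraction skipped would be far larger.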

Exam trap

The trap here is that candidates often assume converting file format (Parquet) alone is sufficient, but the question specifically targets reducing data scanned via partition elimination, which is a more direct optimization for partitioned data in serverless SQL.

How to eliminate wrong answers

Option A is wrong because while converting to Parquet with Snappy compression can improve performance, it does not address the root cause of scanning far more data than each daily report needs; partition elimination is more impactful for reducing data scanned. Option B is wrong because copying the data into Azure SQL Database moves it out of the data lake, which runs counter to the goal of querying it in place with serverless SQL, and it introduces additional cost and latency. Option C is wrong because creating a dedicated SQL pool explicitly violates the requirement to not move data to a dedicated SQL pool; it also involves provisioning and managing separate compute resources.

327

MCQeasy

A retail company plans to store product catalog data that includes product ID, name, description, price, and a varying set of attributes (e.g., size, color, material). The application requires low-latency reads and writes, global distribution, and the ability to handle schema flexibility. Which Azure data store is best suited for this workload?

A.Azure SQL Database

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure Blob Storage

AnswerB

Azure Cosmos DB is a globally distributed, multi-model NoSQL database that supports flexible schemas, low-latency reads/writes, and global distribution, making it well-suited for product catalog data.

Why this answer

Azure Cosmos DB is the best choice because it provides low-latency reads and writes (single-digit milliseconds at the 99th percentile), global distribution with multi-region writes, and automatic schema flexibility through its document model. It supports varying product attributes (e.g., size, color, material) without requiring schema changes, and its turnkey global distribution ensures data is replicated across regions for fast access.

Exam trap

The trap here is that candidates often confuse Azure Table Storage with Cosmos DB because both are NoSQL, but Table Storage lacks global distribution with multi-region writes and the low-latency guarantees required for this workload, while Cosmos DB is the only option that combines schema flexibility, global distribution, and low-latency reads/writes.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational store with a fixed schema, requiring ALTER TABLE statements to add new attributes, which cannot handle the varying set of attributes efficiently and does not natively support global distribution with multi-region writes. Option C is wrong because Azure Table Storage is a key-value store that lacks native support for global distribution with multi-region writes and does not provide the same low-latency guarantees as Cosmos DB (Table Storage is optimized for high throughput but not single-digit millisecond latency at scale). Option D is wrong because Azure Blob Storage is designed for unstructured binary data (e.g., images, videos) and does not support querying on individual attributes like product ID or price, nor does it provide schema flexibility for document-like data.

328

Matchingmedium

Match each Azure data security feature to its description.

Concepts

Matches

Encrypts data at rest automatically

Encrypts data in use and in transit

Identity-based access control

IP-level network restrictions

Private IP connectivity over Azure backbone

Why these pairings

These are important data security mechanisms in Azure.

329

MCQmedium

A financial application stores transactions in an Azure SQL Database table with columns: TransactionID (clustered index), AccountID, TransactionDate, Amount. Queries frequently filter on AccountID and TransactionDate together. The table contains millions of rows. Which index strategy will most improve query performance for these filters?

A.Clustered index on (AccountID, TransactionDate)

B.Nonclustered index on (TransactionDate)

C.Nonclustered index on (AccountID) INCLUDE (TransactionDate)

D.Nonclustered index on (AccountID, TransactionDate)

AnswerD

Correct. A composite nonclustered index on (AccountID, TransactionDate) enables an index seek on AccountID and a range scan on TransactionDate within that account's index entries, minimizing I/O.

Why this answer

Option D creates a nonclustered index on (AccountID, TransactionDate) whose key matches both filter columns. SQL Server can seek directly to the matching rows using the composite key order, avoiding a full table scan; a key lookup is needed only for columns the index does not carry, such as Amount. This is the most efficient strategy because the index is sorted by AccountID first, then TransactionDate, matching the query predicate exactly.
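The effect of a composite index can be observed locally with a small experiment. The sketch below uses SQLite from Python's standard library as a stand-in for Azure SQL Database (an assumption: the engines differ, and SQLite has no clustered/nonclustered distinction), and the index name and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions ("
    "TransactionID INTEGER PRIMARY KEY, AccountID INTEGER, "
    "TransactionDate TEXT, Amount REAL)"
)

# Composite index keyed on the two filter columns, in that order.
conn.execute(
    "CREATE INDEX IX_Account_Date ON Transactions (AccountID, TransactionDate)"
)

# Invented sample data: 1000 rows spread over 100 accounts.
rows = [
    (i, i % 100, f"2024-01-{(i % 28) + 1:02d}", float(i)) for i in range(1, 1001)
]
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", rows)

# Ask the planner how it would run a query that filters on both columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Amount FROM Transactions "
    "WHERE AccountID = ? AND TransactionDate >= ?",
    (42, "2024-01-15"),
).fetchall()

# The last column of each plan row holds the human-readable step description.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The printed plan should mention the index by name, which indicates a search on the composite key rather than a scan of the whole table.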

Exam trap

The trap here is that candidates often choose Option A (changing the clustered index) because they think it will be faster for all queries, but they overlook the negative impact on the existing primary key and the fact that a composite nonclustered index is sufficient and less disruptive.

How to eliminate wrong answers

Option A is wrong because changing the clustered index to (AccountID, TransactionDate) would reorganize the entire table's physical order, potentially harming performance for other queries that rely on the existing TransactionID clustered index (e.g., range scans or joins on TransactionID). Option B is wrong because a nonclustered index on TransactionDate alone cannot efficiently filter on AccountID; it would require scanning all rows for each AccountID or performing a key lookup for each match. Option C is wrong because a nonclustered index on AccountID with TransactionDate as an included column only helps when filtering solely on AccountID; it does not support seeking on both columns together, as the included column is not part of the index key and cannot be used for range or equality filtering on TransactionDate.

330

MCQeasy

A company wants to build a data lake on Azure for storing structured, semi-structured, and unstructured data. The solution must support fast queries on structured data without moving data to a separate store. Which Azure service should they use?

A.Azure Blob Storage

B.Azure SQL Database

C.Azure Data Lake Storage Gen2

D.Azure Cosmos DB

AnswerC

ADLS Gen2 combines blob storage with a hierarchical namespace and is optimized for analytics.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct choice because it combines a hierarchical namespace with Azure Blob Storage, enabling both a data lake for unstructured and semi-structured data and fast, SQL-like queries on structured data using Azure Synapse Analytics or PolyBase without moving data. It supports the ABFS (Azure Blob File System) driver for high-performance analytics and integrates directly with query engines like Apache Spark and Azure Synapse SQL.

Exam trap

The trap here is that candidates often pick Azure Blob Storage because it is the underlying storage for ADLS Gen2, but they miss that ADLS Gen2's hierarchical namespace is required for fast, directory-level queries and direct SQL access without moving data.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage lacks a hierarchical namespace by default, making it inefficient for directory-level operations and fast queries on structured data without additional indexing or data movement. Option B is wrong because Azure SQL Database is a relational database for structured data only, not a data lake for storing unstructured or semi-structured data, and it requires data to be loaded into its tables rather than queried in place. Option D is wrong because Azure Cosmos DB is a NoSQL database designed for globally distributed, low-latency access to semi-structured data, but it does not support a data lake architecture for unstructured data or fast SQL queries on structured data without moving data to a separate store.

331

MCQmedium

A data engineer needs to process raw clickstream data from multiple websites that is stored in Azure Blob Storage as JSON files. The processing must run automatically every hour, transform the data into a structured format for reporting, and handle schema changes in the source data without manual intervention. Which Azure service should be used?

A.Azure Stream Analytics with a reference data input.

B.Azure Data Factory with a Mapping Data Flow.

C.Azure SQL Database with a stored procedure.

D.Azure Logic Apps with a JSON parser.

AnswerB

Mapping Data Flows support schema drift and can be scheduled to run on a recurring basis, making it ideal for this scenario.

Why this answer

Azure Data Factory with a Mapping Data Flow is correct because it provides a code-free, visual data transformation environment that can run on a scheduled trigger (every hour), handle schema drift automatically via schema drift options in Mapping Data Flows, and process JSON files from Azure Blob Storage into a structured format for reporting. This meets all requirements: scheduled execution, transformation, and schema evolution without manual intervention.
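Schema drift handling can be pictured as building the output columns from whatever fields actually arrive, rather than from a fixed list. The plain-Python sketch below illustrates only that idea; the function name and sample records are hypothetical, and it is not how Mapping Data Flows are implemented.

```python
def to_rows(records: list[dict]) -> tuple[list[str], list[list]]:
    """Flatten JSON-like records into a table, adding columns as new fields appear."""
    columns: list[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # a drifted (new) field becomes a new column
    # Records that lack a column get None in that position.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


# The second hourly batch introduces a field the first batch never had.
hour_1 = [{"url": "/home", "user": "u1"}]
hour_2 = [{"url": "/cart", "user": "u2", "referrer": "ad-17"}]

columns, rows = to_rows(hour_1 + hour_2)
print(columns)
print(rows)
```

No code change was needed to absorb the new `referrer` field, which is the property the question calls handling schema changes without manual intervention.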

Exam trap

The trap here is that candidates may confuse Azure Stream Analytics (real-time) with batch processing, or think Azure Logic Apps can handle complex data transformations, when in fact Data Factory is the correct service for scheduled, schema-drift-tolerant ETL on Azure.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is designed for real-time stream processing (e.g., sub-second latency) and uses a reference data input for static lookups, not for batch processing of hourly JSON files with schema drift handling. Option C is wrong because Azure SQL Database with a stored procedure requires manual schema changes to the stored procedure or table when the source JSON schema changes, and it cannot natively handle schema drift from JSON files without additional ETL logic. Option D is wrong because Azure Logic Apps is a workflow orchestration service for integrating APIs and services, not a data transformation engine; its JSON parser can parse JSON but lacks the ability to handle schema drift, run complex transformations, or process large-scale data efficiently on a schedule.

332

MCQhard

Your team uses Azure SQL Database and wants to implement row-level security (RLS) to restrict access to sales data by region. Which type of data workload characteristic does RLS primarily address?

A.Concurrency

B.Consistency

C.Security

D.Durability

AnswerC

RLS is a security feature that restricts data access based on user identity.

Why this answer

Row-level security (RLS) in Azure SQL Database restricts data access at the database engine level by applying a security predicate that filters rows based on user attributes, such as region. This directly addresses the security characteristic of a data workload by ensuring that users can only see data they are authorized to view, without requiring application-level changes.

Exam trap

The trap here is that candidates confuse security (access control) with concurrency (multi-user access) or consistency (data integrity), because RLS involves filtering rows during queries, which might superficially resemble managing concurrent access or ensuring data correctness.

How to eliminate wrong answers

Option A is wrong because concurrency refers to the ability of multiple users to access data simultaneously without conflicts, which is managed by locking and isolation levels, not by row-level filtering. Option B is wrong because consistency ensures that data remains accurate and valid across transactions (e.g., via ACID properties), whereas RLS does not enforce data integrity rules. Option D is wrong because durability guarantees that committed transactions persist even after a system failure, typically achieved through transaction logs and backups, not through access control predicates.

333

MCQmedium

A startup develops a mobile application that stores user preferences as simple key-value pairs. The app is only used in North America, and the team needs low-latency reads and writes with minimal cost. They do not require global distribution or complex querying. Which Azure data store should they choose?

A.Azure Cosmos DB (SQL API)

B.Azure Cosmos DB (Table API)

C.Azure Table Storage

D.Azure SQL Database

AnswerC

Azure Table Storage is a cost-efficient NoSQL key-value store that offers low-latency access for simple data without complex querying. It is the best choice given the single-region requirement and cost constraint.

Why this answer

Azure Table Storage is the correct choice because it provides a cost-effective, low-latency key-value store for simple data like user preferences, with no need for global distribution or complex querying. It offers low-latency reads and writes within a single region, and its pricing, based on stored capacity and transaction count, keeps costs minimal for a startup. The Table API in Azure Cosmos DB would be overkill and more expensive for this North America-only, non-distributed scenario.

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB Table API with Azure Table Storage, assuming the Cosmos DB version is always better, but they fail to consider the cost implications and the fact that Azure Table Storage is sufficient for simple, single-region key-value workloads without global distribution.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB (SQL API) is a globally distributed, multi-model database designed for complex queries and high throughput, which is unnecessary and costly for simple key-value pairs with no global distribution requirement. Option B is wrong because Azure Cosmos DB (Table API) provides the same key-value functionality as Azure Table Storage but with global distribution and higher throughput guarantees, leading to significantly higher costs for a single-region, low-traffic app. Option D is wrong because Azure SQL Database is a relational database with full SQL support, which is overkill for simple key-value pairs and incurs higher costs due to its provisioned compute and storage model.

334

MCQmedium

A company runs an e-commerce application on Azure SQL Database. The application experiences heavy read traffic from reporting dashboards that query the same tables as the transactional workload. This causes performance degradation for the application. The company needs a solution that offloads reporting queries to a read-only copy that stays synchronized within minutes, without impacting transactional performance. Which Azure SQL Database feature should they use?

A.Auto-failover groups

B.Active geo-replication

C.Elastic pools

D.Query Performance Insight

AnswerB

Correct. Active geo-replication provides a readable secondary database that can serve reporting queries while the primary handles transactional workloads. The replication lag is typically within minutes.

Why this answer

Active geo-replication creates a readable secondary replica of the Azure SQL Database that can be used for read-only workloads like reporting dashboards. The secondary replica stays synchronized with the primary database using asynchronous replication, typically within seconds to minutes, ensuring near-real-time data without impacting transactional performance on the primary.

Exam trap

The trap here is that candidates often confuse auto-failover groups with active geo-replication, not realizing that failover groups are built on top of geo-replication and exist primarily to orchestrate failover for a group of databases, whereas the capability the scenario asks for, a readable secondary for offloading reads, is provided by active geo-replication itself.

How to eliminate wrong answers

Option A is wrong because auto-failover groups exist to provide high availability and disaster recovery by coordinating failover of multiple databases; the readable secondary they rely on is created by the underlying active geo-replication, which is the feature that actually offloads the read traffic. Option C is wrong because elastic pools are a pricing and resource management model for pooling resources across multiple databases, not a feature for creating read-only replicas. Option D is wrong because Query Performance Insight is a diagnostic tool for analyzing query performance and identifying bottlenecks, not a mechanism to offload read traffic to a separate copy.

335

MCQhard

A company uses Azure Blob Storage to store video files for a streaming service. The files are accessed frequently for the first 30 days after upload, then rarely after. The company wants to minimize storage costs while maintaining fast access for frequently accessed files. What should they implement?

A.Azure Content Delivery Network (CDN)

B.Azure Files shares

C.Blob lifecycle management policies

D.Geo-redundant storage (GRS)

AnswerC

Lifecycle management automates moving blobs between tiers (Hot, Cool, Archive) based on age, optimizing cost while keeping frequently accessed data in Hot tier.

Why this answer

Blob lifecycle management policies allow you to automatically transition blobs to cooler, cheaper access tiers (e.g., from Hot to Cool or Archive) based on age. This directly addresses the requirement: after 30 days of frequent access, the policy moves the video files to a lower-cost tier, reducing storage costs while keeping the Hot tier available for the initial high-access period.
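
As a rough illustration, the rule described above could be expressed as a lifecycle policy document, built here as a Python dictionary and serialized to JSON. The rule name, the `videos/` prefix, and the 180-day archive threshold are assumptions made for the example; only the 30-day boundary comes from the scenario.

```python
import json

# Hypothetical lifecycle policy: the rule name, prefix, and 180-day archive
# threshold are illustrative assumptions, not values from the question.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "cool-videos-after-30-days",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["videos/"],
                },
                "actions": {
                    "baseBlob": {
                        # Move to Cool once a blob is more than 30 days past its last modification.
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        # Optionally move to Archive much later.
                        "tierToArchive": {"daysAfterModificationGreaterThan": 180},
                    }
                },
            },
        }
    ]
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

New uploads stay in the Hot tier for the busy first month, and the policy handles the tier change without any application code.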

Exam trap

The trap here is that candidates often confuse cost optimization with performance acceleration, mistakenly choosing Azure CDN (Option A) because it improves access speed, when the question explicitly asks for minimizing storage costs while maintaining fast access for frequently accessed files.

How to eliminate wrong answers

Option A is wrong because Azure CDN is a content delivery network that caches content at edge locations for faster delivery, not a storage cost optimization mechanism; it does not automatically change the storage tier of the source blobs. Option B is wrong because Azure Files shares provide SMB/NFS file shares for shared access, not a tiering solution for blob storage cost management; they are a different storage service entirely. Option D is wrong because Geo-redundant storage (GRS) replicates data to a secondary region for disaster recovery, which increases storage costs and does not address the need to reduce costs for infrequently accessed data.

336

MCQmedium

A DBA runs the above KQL query in Azure Monitor for an Azure SQL Database. The query returns no results. What is the most likely reason?

A.Query Store is not enabled on the database

B.The AzureDiagnostics table does not contain SQL data

C.The category name is misspelled

D.The KQL syntax is incorrect

AnswerA

QueryStoreRuntimeStatistics is populated only when Query Store is enabled.

Why this answer

Option A is correct because Query Store must be enabled on the database for QueryStoreRuntimeStatistics data to be generated. Option B is wrong because AzureDiagnostics is the common table for Azure SQL diagnostic logs. Option C is wrong because the category name is correct.

Option D is wrong because the syntax is valid.

337

MCQhard

You are a data engineer for a global gaming company. The company collects telemetry data from millions of players in real time. Each telemetry event is a JSON object containing player ID, game session ID, event type, timestamp, and a payload of up to 5 KB. The data must be stored for 90 days for real-time analytics and then moved to long-term storage for 5 years for historical analysis. The real-time analytics require querying by player ID and event type with sub-second latency. The long-term storage must be cost-effective and support batch analytics. You need to design a storage solution. Which combination of Azure services should you use to meet these requirements?

A.Store all telemetry in Azure Data Lake Storage Gen2 with a hierarchical namespace, and use Azure Synapse Serverless SQL for real-time queries.

B.Ingest telemetry into Azure Stream Analytics, output to Azure SQL Database for 90 days, then export to Azure Data Lake Storage.

C.Ingest telemetry into Azure Event Hubs and then store in Azure Blob Storage with a lifecycle management policy to delete after 90 days.

D.Ingest telemetry into Azure Cosmos DB with TTL set to 90 days, then use Azure Data Factory to copy expired data to Azure Blob Storage.

AnswerD

Cosmos DB provides low-latency queries; TTL automatically removes data; Blob Storage is cost-effective for long-term.

Why this answer

Option D is correct because Azure Cosmos DB provides sub-second query latency on player ID and event type, with a Time-to-Live (TTL) setting that automatically expires data after 90 days. Azure Data Factory can then copy the data to Azure Blob Storage (Cool or Archive) for cost-effective long-term storage; because TTL deletes expired items, the copy has to run before they expire. Option C is wrong because Azure Event Hubs is for ingestion, not queryable storage, and its lifecycle policy deletes the data after 90 days instead of retaining it for 5 years.

Option B is wrong because Azure Stream Analytics is for processing, not storage. Option A is wrong because Azure Data Lake Storage suits batch analytics, but for real-time sub-second queries Cosmos DB is the better fit.
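
The 90-day TTL arithmetic can be sketched as follows. Cosmos DB TTL values are expressed in seconds and counted from an item's last-modified time; the helper names and the one-day archive lead time are assumptions for illustration, not part of any SDK.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TTL_SECONDS = 90 * SECONDS_PER_DAY  # 90 days expressed in seconds


def is_expired(last_modified_ts: int, now_ts: int, ttl_seconds: int = TTL_SECONDS) -> bool:
    """Return True once the item has outlived its TTL, measured from its last-modified time."""
    return now_ts - last_modified_ts > ttl_seconds


def due_for_archive(last_modified_ts: int, now_ts: int, lead_days: int = 1) -> bool:
    """Hypothetical helper: flag items within `lead_days` of expiry so a copy job can export them first."""
    age = now_ts - last_modified_ts
    return TTL_SECONDS - lead_days * SECONDS_PER_DAY <= age <= TTL_SECONDS


print(TTL_SECONDS)  # 7776000
```

The second helper reflects the ordering constraint in this design: data has to be copied to long-term storage while it still exists.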

338

MCQhard

A multinational e-commerce company uses Azure SQL Database active geo-replication to replicate a critical inventory database to a secondary region. During a regional outage, the application automatically fails over to the secondary database. After the primary region recovers, the administrator wants to make the original primary the main database again without losing any data modifications made on the secondary during the outage. What should the administrator do?

A.Drop the geo-replication relationship, then recreate the secondary from the current primary.

B.Perform a forced failover to switch back to the original primary.

C.Initiate a planned failover to switch back to the original primary.

D.Delete the secondary database and restore the original primary from a backup taken before the outage.

AnswerC

A planned failover (graceful failover) synchronizes all data between replicas before switching roles, ensuring zero data loss.

Why this answer

Option C is correct because a planned failover (also known as graceful failover) in Azure SQL Database active geo-replication is designed to switch roles between the primary and secondary databases without data loss. After the original primary region recovers, initiating a planned failover synchronizes all data from the current primary (the former secondary) to the original primary, making it the new primary while preserving all modifications made during the outage. This operation ensures zero data loss because it forces a final synchronization before the role swap.

Exam trap

The trap here is confusing a planned failover (graceful, no data loss) with a forced failover (unplanned, potential data loss), leading candidates to incorrectly choose Option B when they need to preserve all modifications made on the secondary during an outage.

How to eliminate wrong answers

Option A is wrong because dropping the geo-replication relationship and reseeding a secondary from the current primary only rebuilds a copy of the data; it does not return the primary role to the original region, and the database has no geo-secondary while reseeding runs. Option B is wrong because a forced failover (also called unplanned failover) is intended for disaster scenarios and can cause data loss; it does not perform a final synchronization and would not guarantee that all modifications from the secondary are preserved when switching back. Option D is wrong because deleting the secondary database and restoring the original primary from a backup taken before the outage would lose all data modifications made on the secondary during the outage, defeating the purpose of geo-replication.

339

MCQeasy

A company uses Azure Synapse Analytics to run large-scale data transformations. They need to optimize costs for predictable workloads that run every night. Which Azure feature should they configure?

A.Implement dedicated SQL pool pause and resume

B.Enable always-on availability

C.Enable data compression on tables

D.Configure auto-scale

AnswerA

Pause/resume stops compute when not in use, ideal for scheduled workloads.

Why this answer

An Azure Synapse Analytics dedicated SQL pool supports pause and resume, which stops compute billing while preserving data in storage. For predictable nightly workloads, pausing the pool during idle hours eliminates compute costs, and resuming it shortly before the job runs brings the compute back online. This directly optimizes cost for scheduled, non-continuous workloads.

Exam trap

The trap here is that candidates confuse auto-scale (which scales compute up/down while running) with pause/resume (which stops compute entirely), failing to recognize that predictable idle periods benefit from complete compute suspension rather than dynamic scaling.

How to eliminate wrong answers

Option B is wrong because always-on availability is a high-availability feature for SQL Server, not a cost optimization mechanism for Synapse Analytics. Option C is wrong because data compression reduces storage costs and improves I/O performance, but does not address compute cost optimization for idle periods. Option D is wrong because auto-scale adjusts resources dynamically based on load, which is useful for variable workloads but does not eliminate compute costs during predictable idle windows such as the hours between nightly runs.

340

MCQeasy

A company stores an employee database in a relational database. The Employees table includes columns: EmployeeID (integer), FirstName (text), LastName (text), HireDate (date), and a column called Photo which stores the employee's photo as a binary large object (BLOB). Which statement best describes the data types in this table?

A.All columns store structured data.

B.The Photo column stores unstructured data, while the other columns store structured data.

C.All columns store unstructured data.

D.The HireDate column stores semi-structured data.

AnswerB

Structured data is organized with a fixed schema; the integer, text, and date columns all have a fixed type and format. The Photo column contains binary image data with no inherent structure, making it unstructured data.

Why this answer

The Photo column stores a binary large object (BLOB), which is unstructured data because it does not have a predefined schema or format that can be easily queried or indexed by relational operations. In contrast, EmployeeID, FirstName, LastName, and HireDate are all structured data types (integer, text, date) that conform to a fixed schema and support direct querying, sorting, and indexing. This distinction is fundamental in Azure data services, where structured data is typically stored in Azure SQL Database or Azure Synapse, while unstructured BLOBs are better suited for Azure Blob Storage.
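
A minimal sketch of the distinction, using Python's built-in `sqlite3` module as a stand-in for a relational database (the sample rows and the fake photo bytes are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  TEXT NOT NULL,
        LastName   TEXT NOT NULL,
        HireDate   TEXT NOT NULL,  -- ISO 8601 date string
        Photo      BLOB            -- opaque bytes; the engine does not interpret them
    )
    """
)

# Placeholder bytes standing in for an image file.
fake_photo = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"not-a-real-image"
conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
             (1, "Ada", "Lovelace", "2021-03-15", fake_photo))
conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
             (2, "Alan", "Turing", "2019-07-01", fake_photo))

# Structured columns can be filtered and sorted directly.
recent = conn.execute(
    "SELECT FirstName, LastName FROM Employees WHERE HireDate >= ? ORDER BY HireDate",
    ("2020-01-01",),
).fetchall()

# The BLOB column comes back as raw bytes that only an image decoder can interpret.
photo = conn.execute("SELECT Photo FROM Employees WHERE EmployeeID = 1").fetchone()[0]

print(recent)
print(type(photo).__name__, len(photo))
```

The typed columns take part in `WHERE` and `ORDER BY`, while the photo is stored and returned as an uninterpreted byte string.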

Exam trap

The trap here is that candidates may assume all columns in a relational database are structured, overlooking that BLOB columns store unstructured binary data, which is a key distinction tested in the DP-900 exam under core data concepts.

How to eliminate wrong answers

Option A is wrong because it claims all columns store structured data, but the Photo column as a BLOB is unstructured binary data without a fixed schema. Option C is wrong because it states all columns store unstructured data, but EmployeeID, FirstName, LastName, and HireDate have explicit data types (integer, text, date) that are structured and schema-bound. Option D is wrong because the HireDate column stores a date value, which is structured data, not semi-structured data (semi-structured data would be something like JSON or XML with flexible schema).

341

MCQmedium

A company uses Azure SQL Database for an order management system. They have a table 'Orders' with columns: OrderID (PK), CustomerID, OrderDate, TotalAmount. Queries that filter on OrderDate are slow. They create a nonclustered index on OrderDate. However, after many inserts, the index becomes fragmented and page splits occur frequently. Which action should the DBA take to maintain query performance?

A.Rebuild the index online

B.Drop and recreate the index

C.Add a clustered index on OrderDate

D.Change the index to a clustered columnstore index

AnswerA

Rebuilding the index defragments the index and can be done online to avoid blocking queries.

Why this answer

Option A is correct because rebuilding the index online removes the accumulated fragmentation without blocking concurrent queries, which is critical for a production order management system. An ALTER INDEX ... REBUILD operation with the ONLINE option builds a fresh copy of the index B-tree with compacted, contiguous pages, reducing logical fragmentation and restoring query performance on OrderDate filters. Specifying a lower fill factor during the rebuild leaves free space on each page, which can reduce future page splits.

Exam trap

The trap here is that candidates often confuse index maintenance actions, thinking a drop/recreate is simpler, or they incorrectly assume a clustered index on the filtered column always improves performance, ignoring the impact on write-heavy OLTP workloads.

How to eliminate wrong answers

Option B is wrong because dropping and recreating the index is a heavier operation that requires exclusive locks, causing downtime; it also loses any index metadata or statistics that might be in use, and the same effect can be achieved with a rebuild. Option C is wrong because adding a clustered index on OrderDate would physically reorder the entire table by that column, which could improve range scans but would also slow down inserts due to page splits on the clustered key, and it changes the table's physical structure unnecessarily. Option D is wrong because a clustered columnstore index is designed for large-scale analytical workloads (data warehousing) and is not suitable for an OLTP order management system with frequent inserts and point lookups; it would degrade performance for the typical order queries.

342

MCQmedium

A company uses Azure SQL Database for an employee management system. The Employees table has 10 million rows and a clustered index on EmployeeID (the primary key). Queries that filter employees by Department and then sort by HireDate are very slow. Which indexing strategy will most improve performance for these queries?

A.Create a nonclustered index on (Department, HireDate) and include the other needed columns as included columns.

B.Create a nonclustered index on (HireDate, Department) with no included columns.

C.Create a clustered index on Department.

D.Drop the existing clustered index and recreate a clustered columnstore index.

AnswerA

This index supports both the filter (Department) and the sort order (HireDate). Using included columns makes it a covering index for the query, eliminating costly lookups to the clustered index.

Why this answer

A nonclustered index on (Department, HireDate) with included columns is optimal because it supports both the WHERE clause filter on Department and the ORDER BY on HireDate as a covering index. The index key order matches the query's filter and sort requirements, allowing SQL Server to perform a single index seek and avoid key lookups by including all needed columns. This eliminates the need to scan the clustered index or sort rows after filtering.
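
The effect of key order and covering columns can be observed in a small sketch. It uses SQLite through Python's `sqlite3` module as a stand-in, not SQL Server: SQLite has no `INCLUDE` clause, so the extra column is appended to the index key instead, and the index name and `LastName` column are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, Department TEXT, HireDate TEXT, LastName TEXT)"
)

# Filter column first, sort column second; LastName is appended so the
# index alone can answer the query (SQLite's substitute for INCLUDE).
conn.execute("CREATE INDEX ix_dept_hire ON Employees (Department, HireDate, LastName)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT LastName, HireDate FROM Employees WHERE Department = ? ORDER BY HireDate",
    ("Finance",),
).fetchall()

# The fourth column of each plan row holds the human-readable step description.
plan_text = " | ".join(row[3] for row in plan)
print(plan_text)
```

The plan should report a search using a covering index and no temporary B-tree for the `ORDER BY`, which is the same seek-without-sort behavior the explanation attributes to the (Department, HireDate) index.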

Exam trap

The trap here is that candidates often choose Option B because they think any index on both columns will help, but they overlook that the key column order must match the WHERE clause filter first to enable an efficient seek, not just the sort order.

How to eliminate wrong answers

Option B is wrong because the index key order (HireDate, Department) does not match the query filter on Department first, so SQL Server cannot efficiently seek on Department; it would require scanning or sorting. Option C is wrong because creating a clustered index on Department would reorder the entire table by Department, which is not the primary key and would break the existing clustered index on EmployeeID, likely degrading other queries and not directly optimizing the sort on HireDate. Option D is wrong because a clustered columnstore index is designed for large-scale analytical workloads (data warehousing) and not for point lookups or ordered retrieval in an OLTP employee management system; it would worsen performance for the described query pattern.

343

MCQhard

A financial services company uses a dedicated SQL pool in Azure Synapse Analytics to run large-scale analytical queries. During peak hours, complex aggregations consume excessive resources, causing slower performance for other users. The company needs to ensure that critical scheduled management reports always receive guaranteed resources and complete within a predictable timeframe, while less important ad-hoc queries do not interfere. Which feature should they implement to manage query resource allocation?

A.Result set caching

B.Columnstore indexes

C.Table distribution

D.Workload management

AnswerD

Workload management in Azure Synapse Analytics includes workload classification and workload groups. It allows administrators to assign queries to different resource classes based on importance, ensuring critical queries get guaranteed resources and isolation from other workloads.

Why this answer

Workload management in Azure Synapse Analytics allows you to classify, assign, and govern resources for queries by using workload groups and importance levels. By configuring workload groups, you can guarantee resources for critical scheduled management reports (e.g., assigning high importance) while limiting or deprioritizing less important ad-hoc queries, ensuring predictable completion times during peak hours.

Exam trap

The trap here is that candidates often confuse performance optimization features (caching, indexing, distribution) with resource governance, assuming any performance improvement feature can solve contention, when only workload management directly controls resource allocation and prioritization.

How to eliminate wrong answers

Option A is wrong because result set caching stores query results in SSD storage to reduce latency for repeated queries, but it does not allocate or guarantee compute resources for specific workloads. Option B is wrong because columnstore indexes improve compression and query performance for analytical workloads by storing data column-wise, but they do not manage resource allocation or prioritization among concurrent queries. Option C is wrong because table distribution (e.g., hash, round-robin, replicated) optimizes data placement across distributions to reduce data movement, but it does not control resource consumption or priority for different query types.

344

MCQmedium

A data engineering team is designing a modern data warehouse using Azure Synapse Analytics. They want to follow a lakehouse architecture where raw data is stored in its native format and then processed and curated for reporting. Which component in Azure Synapse Analytics is primarily used to store raw data in its original format without requiring a schema?

A.Dedicated SQL pool

B.Serverless SQL pool

C.Synapse Pipeline

D.Data Lake Storage Gen2

AnswerD

Azure Data Lake Storage Gen2 is the underlying storage that holds raw data in its native format, supporting the lakehouse architecture.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct component because it provides a scalable, hierarchical file system that can store raw data in any native format (e.g., CSV, JSON, Parquet) without requiring a predefined schema. This aligns with the lakehouse architecture's requirement to ingest and persist raw data as-is before any transformation or curation.

Exam trap

The trap here is that candidates often confuse the role of a Serverless SQL pool (which can query raw data) with the actual storage layer, leading them to incorrectly select Option B instead of recognizing that ADLS Gen2 is the persistent, schema-less storage component.

How to eliminate wrong answers

Option A is wrong because a Dedicated SQL pool is a relational database engine that requires data to be loaded into structured tables with a defined schema, making it unsuitable for storing raw, schema-less data. Option B is wrong because a Serverless SQL pool is a query engine that reads data from external storage (like ADLS Gen2) but does not itself store data; it is used for on-demand querying, not persistent raw storage. Option C is wrong because Synapse Pipeline is an orchestration and data movement service (ETL/ELT) that moves and transforms data between sources and destinations, but it does not provide persistent storage for raw data.

345

MCQhard

Refer to the exhibit. You create an external table in Azure SQL Database. Which data source is being used?

A.Azure Blob Storage

B.Azure Files

C.Azure SQL Database

D.Azure Data Lake Storage Gen2

AnswerA

The location 'https://mystorageaccount.blob.core.windows.net/container' indicates Blob Storage.

Why this answer

The exhibit shows an external table referencing a data source with the LOCATION set to 'https://mystorage.blob.core.windows.net/...', which is the endpoint for Azure Blob Storage. In Azure SQL Database, external tables are created over external data sources that point to Azure Blob Storage or Azure Data Lake Storage, but the URL format 'blob.core.windows.net' specifically indicates Azure Blob Storage. The CREATE EXTERNAL TABLE statement uses this data source to read data stored as files (e.g., CSV, Parquet) in the blob container.

Exam trap

The trap here is that candidates confuse Azure Blob Storage with Azure Data Lake Storage Gen2 because both can store files, but the endpoint URL (blob.core.windows.net vs. dfs.core.windows.net) is the key differentiator in the exhibit.

How to eliminate wrong answers

Option B is wrong because Azure Files uses the file.core.windows.net endpoint and is accessed via SMB protocol, not the blob.core.windows.net URL shown in the exhibit. Option C is wrong because an external data source that points to another database uses a logical server name (ending in database.windows.net) as its LOCATION, not the blob.core.windows.net URL shown in the exhibit. Option D is wrong because Azure Data Lake Storage Gen2 uses the dfs.core.windows.net endpoint (or a blob endpoint with a hierarchical namespace), not the standard blob.core.windows.net URL shown in the exhibit.
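
The endpoint check used to eliminate these options can be sketched as a small lookup on the host name. The function name is hypothetical, and the mapping only covers the three suffixes mentioned in this explanation; a Data Lake Storage Gen2 account that is reached through its blob endpoint would still be reported as Blob Storage.

```python
from urllib.parse import urlparse

# Host-name suffixes for the storage endpoints discussed above.
ENDPOINT_SUFFIXES = {
    ".blob.core.windows.net": "Azure Blob Storage",
    ".dfs.core.windows.net": "Azure Data Lake Storage Gen2",
    ".file.core.windows.net": "Azure Files",
}


def classify_storage_endpoint(url: str) -> str:
    """Guess the storage service from the host-name suffix of a LOCATION URL."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, service in ENDPOINT_SUFFIXES.items():
        if host.endswith(suffix):
            return service
    return "unknown"


print(classify_storage_endpoint("https://mystorageaccount.blob.core.windows.net/container"))
```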

346

MCQhard

A company runs an e-commerce application on Azure SQL Database. The database experiences high transaction volume during business hours (9 AM to 6 PM) but very low activity at night and on weekends. They want to optimize costs by paying only for the compute resources used, while ensuring the database can automatically scale up during peak periods and scale down (or pause) during idle times. Which Azure SQL Database purchasing model and compute tier should they choose?

A.DTU-based purchasing model

B.vCore-based purchasing model with provisioned compute tier

C.vCore-based purchasing model with serverless compute tier

D.vCore-based purchasing model with Hyperscale service tier

AnswerC

Serverless automatically scales compute based on load and pauses the database during inactivity, charging only for the compute used. This perfectly matches the requirement to minimize costs during low-usage periods while handling peak traffic.

Why this answer

The vCore-based purchasing model with serverless compute tier is correct because it automatically scales compute resources based on workload demand and can pause during idle periods, charging only for consumed compute and storage. This matches the requirement of high transaction volume during business hours and low activity at night/weekends, optimizing costs by eliminating charges for unused compute capacity.
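
To see why paying only for active compute matters here, a back-of-the-envelope calculation of the busy share of a week helps. It assumes the 9 AM to 6 PM window applies Monday to Friday and treats all other hours as idle; real serverless billing also depends on the configured auto-pause delay and on any residual activity, which this sketch ignores.

```python
HOURS_PER_WEEK = 7 * 24

# 9 AM to 6 PM, assumed to apply on five weekdays.
busy_hours = (18 - 9) * 5
idle_hours = HOURS_PER_WEEK - busy_hours
busy_share = busy_hours / HOURS_PER_WEEK

print(f"busy: {busy_hours} h, idle: {idle_hours} h, busy share: {busy_share:.1%}")
```

Under these assumptions the database is busy for roughly a quarter of the week, so a provisioned tier would bill compute for many hours in which little work is done.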

Exam trap

The trap here is that candidates often confuse the Hyperscale service tier with serverless, but Hyperscale focuses on storage scalability and fast recovery, not compute auto-scaling or pausing, making it unsuitable for cost optimization during idle periods.

How to eliminate wrong answers

Option A is wrong because the DTU-based purchasing model uses a fixed bundle of compute, storage, and I/O resources, which cannot automatically scale up/down or pause based on demand, leading to over-provisioning during idle times. Option B is wrong because the vCore-based provisioned compute tier requires a fixed number of vCores allocated continuously, even during low-activity periods, and does not support auto-scaling or pausing, resulting in higher costs. Option D is wrong because the vCore-based Hyperscale service tier is designed for large databases with high scalability and fast recovery, not for cost optimization through auto-scaling and pausing; it uses provisioned compute and does not offer serverless capabilities.

347

Multi-Selectmedium

Which THREE are characteristics of structured data? (Choose three.)

Select 3 answers

A.Has a predefined schema

B.Consists of audio and video files

C.Uses JSON or XML format

D.Stored in relational databases

E.Organized in rows and columns

AnswersA, D, E

Schema is defined before data is stored.

Why this answer

Structured data has a predefined schema, meaning the data types, relationships, and constraints are defined before data is entered. This schema ensures consistency and enables efficient querying, which is why relational databases enforce a fixed schema through table definitions and constraints like primary keys and foreign keys.

Exam trap

The trap here is that candidates confuse semi-structured formats like JSON and XML with structured data, but structured data requires a rigid schema enforced by the database, not just a self-describing format.

348

MCQmedium

You need to store telemetry data from millions of devices. Each record includes a device ID, timestamp, and metric value. The data will be queried by device ID and time range. Which Azure data store is best suited for this scenario?

A.Azure Data Explorer

B.Azure SQL Database

C.Azure Storage Queues

D.Azure Cosmos DB

AnswerA

ADX is built for high-performance time-series data ingestion and querying, making it ideal for telemetry.

Why this answer

Azure Data Explorer (ADX) is optimized for time-series data and can ingest high volumes of telemetry, with fast queries on time ranges and device IDs. Azure Cosmos DB is good for real-time apps but less efficient for large-scale time-series analytics. Azure SQL Database is relational and may not scale as well.

Azure Storage Queues are for messaging, not storage/query.

349

MCQhard

A global e-commerce company uses Azure Cosmos DB with multiple write regions to handle high traffic from users worldwide. For their order processing system, they must guarantee that once an order is recorded, all subsequent reads from any region see the most up-to-date order status. However, they also need low write latency globally. Which configuration should they choose to meet these requirements?

A.Use multi-region writes with strong consistency

B.Use single-region writes with strong consistency

C.Use multi-region writes with bounded staleness consistency

D.Use single-region writes with eventual consistency and implement application-level conflict resolution

AnswerB

Correct. Strong consistency provides immediate global consistency, but it requires a single write region. This trade-off meets the guarantee at the cost of slightly higher write latency for remote users.

Why this answer

Strong consistency with single-region writes ensures that all reads in any region return the most recent write, because Cosmos DB replicates writes synchronously to all regions when strong consistency is configured. This guarantees linearizability: once an order is committed, every subsequent read sees that update. Single-region writes avoid the conflict-resolution overhead of multi-region writes while still providing low write latency within the primary region, and reads from secondary regions are served from locally replicated data that is kept fully consistent.

Exam trap

The trap here is that candidates assume multi-region writes are needed for global low-latency writes, but they overlook that strong consistency cannot be combined with multi-region writes, and that single-region writes with strong consistency still provide low write latency in the primary region while guaranteeing immediate read freshness across all regions.

How to eliminate wrong answers

Option A is wrong because multi-region writes with strong consistency is not supported in Azure Cosmos DB; strong consistency can only be used with a single write region. Option C is wrong because bounded staleness consistency allows reads to lag behind writes by a configured number of versions or a configured time interval, which violates the requirement that all subsequent reads see the most up-to-date order status immediately. Option D is wrong because eventual consistency does not guarantee that reads return the latest write; it only guarantees that replicas will converge over time, and application-level conflict resolution cannot enforce immediate global read freshness.

350

MCQhard

Refer to the exhibit. A team is deploying an Azure Storage container using an ARM template. The template sets publicAccess to 'None'. However, after deployment, users report they cannot access data even with a valid SAS token. What is the most likely cause?

A.The container disables SAS tokens by default

B.The storage account firewall is blocking all traffic

C.The storage account requires RBAC permissions, not SAS

D.The SAS token was generated with insufficient permissions

AnswerD

A valid SAS token with proper permissions should work regardless of publicAccess='None'.

Why this answer

The ARM template sets publicAccess to 'None', which only disables anonymous public access to the container. It does not affect SAS token access. The most likely cause is that the SAS token was generated with insufficient permissions (e.g., missing read, list, or write permissions) or with a restricted scope (e.g., limited to a specific blob or with a short expiry), preventing users from accessing the data even though the container is private.
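
One way to check for this cause is to inspect the `sp` (signed permissions) field of the SAS query string. The sketch below does that with the standard library only; the token is a made-up placeholder with a fake signature, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs


def sas_permissions(sas_token: str) -> set:
    """Return the permission letters carried in the token's `sp` field."""
    params = parse_qs(sas_token.lstrip("?"))
    return set(params.get("sp", [""])[0])


def missing_permissions(sas_token: str, required: str) -> set:
    """Return the required permission letters that the token does not grant."""
    return set(required) - sas_permissions(sas_token)


# Placeholder token: container scope (sr=c) with write permission only (sp=w).
token = "sv=2022-11-02&sr=c&sp=w&se=2030-01-01T00:00:00Z&sig=PLACEHOLDER"

print(sorted(missing_permissions(token, "rl")))  # ['l', 'r']
```

A token like this one is correctly signed and unexpired, yet reads and listings fail because it grants neither `r` nor `l`.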

Exam trap

The trap here is that candidates often confuse 'publicAccess = None' with disabling all forms of access, including SAS tokens, but in reality, SAS tokens are a separate authorization mechanism that remains functional on private containers.

How to eliminate wrong answers

Option A is wrong because Azure Storage containers do not have a setting to disable SAS tokens by default; SAS tokens are always available as a delegation mechanism unless explicitly revoked via a stored access policy or firewall rules. Option B is wrong because the storage account firewall blocking all traffic would prevent all access, including SAS token access, but the question states users cannot access data 'even with a valid SAS token', implying the firewall is not the issue (if it were, the SAS token would also be blocked). Option C is wrong because RBAC permissions are not required for SAS token access; SAS tokens provide delegated access independent of RBAC, and the storage account does not enforce RBAC-only access unless configured with Azure AD authentication and disabling shared key access.

351

Multi-Selecthard

Your company is designing a big data analytics solution on Azure. The solution must ingest streaming data from IoT devices, store the data in its raw format, and then use a distributed processing engine to transform the data before loading it into a serving layer for reporting. Which TWO Azure services should you include in the design?

Select 2 answers

A.Azure Blob Storage

B.Azure Event Hubs

C.Azure Databricks

D.Azure Data Factory

E.Azure Synapse Analytics

AnswersB, C

Event Hubs ingests streaming data from IoT devices.

Why this answer

Azure Event Hubs is a fully managed, real-time data ingestion service that can ingest millions of events per second from IoT devices. It is the correct choice for streaming data ingestion because it supports protocols like AMQP and HTTPS, and it decouples event producers from consumers, allowing the raw data to be captured and stored before processing.

Exam trap

The trap here is that candidates often confuse Azure Event Hubs with Azure Blob Storage or Azure Data Factory for streaming ingestion, mistakenly thinking that any storage or ETL service can handle real-time IoT data, when in fact only a dedicated event ingestion service like Event Hubs provides the necessary throughput, partitioning, and protocol support for streaming workloads.

Full explanation →

352

MCQeasy

A company stores customer information in a SQL database table with columns: CustomerID, FirstName, LastName, Email, SignupDate. They also store product images as JPEG files in Azure Blob Storage. Which statement correctly describes the types of data involved?

A.Customer data is unstructured, product images are semi-structured.

B.Customer data is structured, product images are unstructured.

C.Both are structured.

D.Customer data is semi-structured, product images are unstructured.

AnswerB

Customer data is stored in a relational table with a fixed schema, making it structured. JPEG images have no inherent structure, making them unstructured.

Why this answer

Customer data stored in a SQL database table with defined columns (CustomerID, FirstName, LastName, Email, SignupDate) is structured because it adheres to a fixed schema with rows and columns. Product images stored as JPEG files in Azure Blob Storage are unstructured because they lack a predefined data model and are stored as binary large objects (BLOBs) without a schema. Option B correctly identifies this distinction.

Exam trap

The trap here is confusing 'unstructured' with 'semi-structured' — candidates often misclassify JPEG images as semi-structured because they have metadata (e.g., EXIF), but the data itself (pixel values) has no schema, making it unstructured, while semi-structured data like JSON has a self-describing structure.

How to eliminate wrong answers

Option A is wrong because customer data in a SQL table is structured, not unstructured, and product images are unstructured, not semi-structured. Option C is wrong because product images are unstructured, not structured; only the customer data is structured. Option D is wrong because customer data is structured, not semi-structured; semi-structured data (e.g., JSON, XML) has tags or markers but no rigid schema, whereas a SQL table has a fixed schema.
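
The three categories named above can be made concrete with a small sketch. All values below are made-up illustrations (the names, e-mail addresses, and JSON fields are hypothetical), and the byte string is filler rather than a real image.

```python
import json

# Structured: every row has the same columns, in the same order.
CUSTOMER_COLUMNS = ("CustomerID", "FirstName", "LastName", "Email", "SignupDate")
customer_rows = [
    (1, "Ada", "Lovelace", "ada@example.com", "2024-01-15"),
    (2, "Alan", "Turing", "alan@example.com", "2024-02-03"),
]
rows_fit_schema = all(len(row) == len(CUSTOMER_COLUMNS) for row in customer_rows)

# Semi-structured: each document names its own fields, and the fields can differ.
documents = [
    json.loads('{"id": 1, "name": "Ada", "tags": ["vip"]}'),
    json.loads('{"id": 2, "name": "Alan", "address": {"city": "London"}}'),
]
field_sets = [sorted(doc.keys()) for doc in documents]

# Unstructured: opaque bytes with no rows, columns, or named fields.
# The first two bytes are the JPEG start-of-image marker; the rest is filler.
jpeg_bytes = bytes([0xFF, 0xD8]) + bytes(16)

print(rows_fit_schema)
print(field_sets)
print(len(jpeg_bytes), "bytes")
```

The table rows can be validated against one schema, the JSON documents describe themselves but disagree on fields, and the image bytes expose nothing to query without extra processing.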

Full explanation →

353

MCQmedium

A data analyst needs to query large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are ad-hoc and infrequent. The analyst wants to run SQL queries directly on the data without creating any storage or compute infrastructure, and only pay for the amount of data processed. They also need to create T-SQL views to simplify queries for Power BI reports. Which Azure service should they use?

A.Azure SQL Database

B.Azure Synapse Serverless SQL pool

C.Azure HDInsight with Spark

D.Azure Databricks

AnswerB

Correct. It is a serverless query service that can directly query Parquet files in ADLS Gen2 using T-SQL, charges per query, and supports views.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows querying data directly from Azure Data Lake Storage Gen2 using T-SQL without provisioning any compute or storage infrastructure. It supports ad-hoc, infrequent queries with a pay-per-query pricing model based on the amount of data processed, and it enables the creation of T-SQL views that can be used directly by Power BI for reporting.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'no infrastructure at all' and incorrectly choose Azure SQL Database or HDInsight, failing to recognize that Synapse Serverless SQL pool specifically provides a T-SQL interface over data lake storage with a pay-per-query model, while the other options require persistent compute resources.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database requires provisioning a dedicated database with pre-allocated compute and storage resources, incurring ongoing costs regardless of query activity, and it does not directly query Parquet files in Data Lake Storage Gen2 without data movement. Option C is wrong because Azure HDInsight with Spark requires creating and managing a cluster (compute infrastructure) and is billed by the hour for running nodes, not per-query data processed, and it does not natively support T-SQL views for Power BI. Option D is wrong because Azure Databricks requires an active cluster (compute infrastructure) and is billed per DBU (Databricks Unit) based on cluster runtime, not per-query data scanned, and it uses Spark SQL or Python rather than T-SQL for view creation.

Full explanation →

354

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool to store sales data. The fact table contains billions of rows and is hash-distributed on ProductID. Queries aggregate sales by store and product for the current month and join with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries are slow due to data movement during joins. Which design change will most reduce data movement and improve query performance?

A.Change the fact table to round-robin distribution.

B.Replicate the Store dimension table and the Product dimension table.

C.Change the hash distribution key of the fact table to StoreID.

D.Implement a clustered columnstore index on the fact table.

AnswerB

Correct. Replicating small dimension tables across all distributions eliminates data movement during joins, as each distribution already has the full dimension data.

Why this answer

Replicating the Store and Product dimension tables across all compute nodes eliminates the need to shuffle data during joins with the large fact table. Since both dimension tables are small enough (10,000 and 500,000 rows) for a full copy to be kept on each compute node, replication avoids costly data movement and significantly improves query performance for aggregations that join on multiple dimensions.

Exam trap

The trap here is that candidates often focus on indexing or distribution key changes (like C or D) without recognizing that data movement during joins is the root cause, and that replicating small dimension tables is the most direct solution to eliminate that movement.

How to eliminate wrong answers

Option A is wrong because changing the fact table to round-robin distribution would distribute rows randomly, forcing full data movement for every join and aggregation, which would worsen performance. Option C is wrong because changing the hash distribution key to StoreID would only optimize joins on StoreID, but the queries also join on ProductID, and the fact table is already hash-distributed on ProductID; changing the key would not eliminate data movement for the Product dimension join and could even increase it. Option D is wrong because implementing a clustered columnstore index improves compression and scan performance but does not reduce data movement during joins; the bottleneck here is data shuffling, not storage or I/O.
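
A toy simulation can illustrate why replication removes data movement. It is only a sketch under stated assumptions: the four-distribution layout, the modulo hash, and the generated rows are all hypothetical (a real dedicated SQL pool uses 60 distributions and its own hash function), and real engines may broadcast the dimension instead of shuffling the fact rows.

```python
NUM_DISTRIBUTIONS = 4  # illustrative only; a dedicated SQL pool uses 60


def distribution_of(key: int) -> int:
    """Toy stand-in for the engine's hash function."""
    return key % NUM_DISTRIBUTIONS


# Hypothetical fact rows as (product_id, store_id) pairs.
fact_rows = [(p, s) for p in range(1, 41) for s in range(1, 6)]


def rows_moved_for_store_join(store_dim_replicated: bool) -> int:
    """Count fact rows that must leave their distribution to join on store_id.

    The fact table is hash-distributed on product_id, as in the question.
    """
    moved = 0
    for product_id, store_id in fact_rows:
        fact_location = distribution_of(product_id)
        if store_dim_replicated:
            # A full copy of Store is available locally, so the join is local.
            continue
        # Store is hash-distributed on store_id; its matching row may be elsewhere.
        if distribution_of(store_id) != fact_location:
            moved += 1
    return moved


print("hash-distributed Store:", rows_moved_for_store_join(False))
print("replicated Store:", rows_moved_for_store_join(True))
```

With the Store table hash-distributed, most fact rows sit on a different distribution from their matching Store row; with a replicated Store table, none need to move.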

Full explanation →

355

Multi-Selecteasy

Which TWO of the following Azure services are categorized as Platform as a Service (PaaS) for data storage?

Select 2 answers

A.Azure Cosmos DB

B.Azure Synapse Analytics dedicated SQL pool

C.Azure Data Lake Storage Gen2

D.Azure SQL Database

E.Azure Virtual Machines with SQL Server

AnswersA, D

Azure Cosmos DB is a fully managed NoSQL PaaS database.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database service that provides turnkey global distribution, elastic scaling, and multi-model support (document, key-value, graph, column-family). As a PaaS offering, it abstracts infrastructure management (hardware provisioning, patching, and replication), allowing developers to focus on data modeling and application logic. Azure SQL Database is the second correct choice: it is a fully managed relational database engine in which Microsoft handles patching, backups, and high availability.

Exam trap

The trap here is that candidates often mistake Azure Data Lake Storage Gen2 (a file and object storage service, not a managed database) for a managed database PaaS, or they mistakenly think Azure Synapse Analytics dedicated SQL pool is a primary data storage service rather than an analytics engine that typically queries data stored elsewhere.

Full explanation →

356

Multi-Selectmedium

Which TWO Azure services can be used to perform real-time data ingestion and processing? (Choose two.)

Select 2 answers

A.Azure SQL Database

B.Azure Event Hubs

C.Azure Blob Storage

D.Azure Data Factory

E.Azure Stream Analytics

AnswersB, E

Ingests real-time data streams.

Why this answer

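Azure Event Hubs is a fully managed, real-time data ingestion service that can ingest millions of events per second from any source, using AMQP, HTTPS, or Apache Kafka protocol. It is designed for high-throughput, low-latency event streaming, making it ideal for real-time data ingestion and processing pipelines. Azure Stream Analytics is the second correct choice: it is a fully managed stream processing engine that runs SQL-like queries over events as they arrive, typically reading from Event Hubs or IoT Hub.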

Exam trap

The trap here is that candidates often confuse batch processing services like Azure Data Factory or storage services like Blob Storage with real-time ingestion, forgetting that real-time requires event-driven, low-latency ingestion and processing capabilities.

Full explanation →

357

MCQhard

A healthcare organization stores patient records in Azure SQL Database. To comply with HIPAA, they need to encrypt sensitive columns like Social Security Numbers (SSNs) at rest and ensure that only authorized users can decrypt them. Which feature should they implement?

A.Always Encrypted

B.Transparent Data Encryption (TDE)

C.Dynamic Data Masking

D.Row-Level Security

AnswerA

Always Encrypted encrypts sensitive columns at the client side, keeping data encrypted in the database.

Why this answer

Option A is correct because Always Encrypted encrypts sensitive columns on the client side, so the data stays encrypted in the database and only authorized clients holding the column encryption keys can decrypt it. Option B is wrong because TDE encrypts the entire database at rest, not individual columns, and decrypts data transparently for any user who can query it. Option C is wrong because Dynamic Data Masking obfuscates data in query results but does not encrypt it. Option D is wrong because Row-Level Security restricts row access but does not encrypt columns.

Full explanation →

358

MCQmedium

A company runs a customer-facing web application that uses an Azure SQL Database. The database experiences highly variable workloads: high traffic during business hours and low traffic at night and on weekends. The company wants to pay only for the compute resources consumed and automatically scale compute capacity based on demand, while maintaining the ability to pause during inactivity. Which Azure SQL Database service tier should they choose?

A.Hyperscale

B.Serverless

C.Provisioned (General Purpose)

D.Business Critical

AnswerB

Serverless automatically scales compute based on demand and can pause when inactive, billing only for the vCores used per second. This is ideal for intermittent, variable workloads.

Why this answer

The Serverless tier is designed for workloads with variable traffic and idle periods, as it automatically scales compute resources based on demand and can pause the database during inactivity, charging only for consumed compute and storage. This matches the requirement to pay only for resources used and to pause when there is no traffic, such as at night and weekends.

Exam trap

The trap here is that candidates may confuse the Serverless tier's auto-scaling and pausing with the Hyperscale tier's storage scalability, but Hyperscale does not support compute pausing and is designed for continuous high-throughput workloads, not variable demand with idle periods.

How to eliminate wrong answers

Option A is wrong because Hyperscale is built for very large databases (up to 100 TB) with high throughput and fast scaling of storage, but it does not support auto-pausing and is not cost-effective for variable workloads with idle periods. Option C is wrong because Provisioned (General Purpose) uses a fixed compute size that must be manually scaled, and it cannot pause automatically, so you pay for allocated resources even during low traffic. Option D is wrong because Business Critical is a high-performance tier with low latency and built-in replicas, but it also uses fixed provisioning without auto-pausing, leading to higher costs during idle times.

Full explanation →

359

MCQmedium

A company runs an e-commerce application on Azure SQL Database. The Orders table has millions of rows. Queries that filter on CustomerID and order by OrderDate DESC are slow. The table currently has a clustered index on OrderID (the primary key). Which indexing strategy will most improve performance for these queries?

A.Create a nonclustered index on CustomerID including OrderDate.

B.Create a nonclustered index on (CustomerID, OrderDate DESC) with included columns for other needed columns.

C.Rebuild the clustered index to be on CustomerID.

D.Create a nonclustered index on OrderDate.

AnswerB

This covering index includes both the filter column and the sort column in the correct order, allowing efficient index seek and eliminating the need for a separate sort operation.

Why this answer

Option B is correct because creating a nonclustered index on (CustomerID, OrderDate DESC) with included columns allows the query to filter on CustomerID and sort by OrderDate in descending order using a single index seek, avoiding a sort operation. This leverages the index's key order to directly return rows in the desired order, which is critical for performance on large tables in Azure SQL Database.

Exam trap

The trap here is that candidates often think including a column in an index (as an included column) is sufficient for sorting, but only key columns determine the physical order of rows in the index, so a nonclustered index with OrderDate as an included column cannot eliminate the need for a sort operation.

How to eliminate wrong answers

Option A is wrong because including OrderDate as an included column does not make it part of the index key, so the index cannot provide sorted output for OrderDate DESC; the database would still need to perform a sort after filtering on CustomerID. Option C is wrong because rebuilding the clustered index on CustomerID would force the table to be physically ordered by CustomerID, which may help filtering but would not efficiently support ordering by OrderDate DESC, and it could degrade other queries that rely on the primary key OrderID. Option D is wrong because a nonclustered index on OrderDate alone does not support filtering on CustomerID, requiring a full scan or key lookup for each row, which is inefficient for millions of rows.
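
The effect of key order on sorting can be observed with a small experiment. This sketch uses SQLite from Python's standard library as a stand-in, not Azure SQL Database: SQLite has no INCLUDE clause, its planner differs from SQL Server's, and the table and index names are hypothetical. It only demonstrates the general principle that a sort column must be part of the index key for the sort step to disappear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, total REAL)"
)

QUERY = (
    "SELECT order_id, order_date FROM orders "
    "WHERE customer_id = ? ORDER BY order_date DESC"
)


def plan_for(query: str) -> str:
    """Return SQLite's query plan for the query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


# Index on the filter column only: matching rows still have to be sorted.
conn.execute("CREATE INDEX ix_customer ON orders (customer_id)")
plan_filter_only = plan_for(QUERY)

# Composite key (filter column first, then sort column), as in option B.
conn.execute("DROP INDEX ix_customer")
conn.execute(
    "CREATE INDEX ix_customer_date ON orders (customer_id, order_date DESC)"
)
plan_composite = plan_for(QUERY)

print(plan_filter_only)
print(plan_composite)
```

The first plan includes a temporary B-tree for the ORDER BY, while the second reads rows from the composite index already in the requested order.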

Full explanation →

360

MCQmedium

A logistics company needs to analyze GPS data from delivery trucks in real time to detect delays and reroute deliveries. The GPS data is streamed into Azure Event Hubs. They also need to combine this live data with static route information stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Analysis Services

D.Azure Synapse Pipelines

AnswerB

Correct. Azure Stream Analytics is a real-time analytics service that can process streaming data from Event Hubs and combine it with reference data from SQL Database to detect patterns and trigger actions.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, capable of ingesting data from Azure Event Hubs, applying SQL-like queries to detect delays, and joining live streams with static reference data from Azure SQL Database. This enables the logistics company to analyze GPS data as it arrives and trigger rerouting actions with minimal latency.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory or Synapse Pipelines, mistakenly thinking that any data integration or pipeline service can handle real-time streaming, when in fact those services are batch-oriented and lack the low-latency stream processing engine required for live GPS analysis.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service for batch data movement and transformation, not for real-time stream processing. Option C is wrong because Azure Analysis Services is an analytical engine for building semantic models and performing OLAP queries on historical data, not for processing live streaming data. Option D is wrong because Azure Synapse Pipelines (part of Azure Synapse Analytics) are primarily used for orchestrating batch data integration and ETL workflows, not for real-time stream analytics.

Full explanation →

361

MCQeasy

A company uses Azure Data Lake Storage Gen2 to store raw data files. Data engineers need to transform this data using a serverless approach without managing infrastructure. Which Azure service should they use?

A.Azure Data Factory with mapping data flows.

B.Azure HDInsight with Apache Spark clusters.

C.Azure Synapse Analytics with serverless SQL pool.

D.Azure Databricks with serverless SQL warehouses.

AnswerD

Serverless SQL warehouses provide on-demand compute without infrastructure management.

Why this answer

Option D is correct because Azure Databricks with serverless SQL warehouses provides a fully serverless compute option for transforming data stored in Azure Data Lake Storage Gen2. It eliminates infrastructure management by automatically scaling compute resources based on workload demands, making it ideal for ad-hoc transformations without provisioning clusters.

Exam trap

The trap here is that candidates often confuse 'serverless SQL pool' in Synapse with a general-purpose transformation tool, but it is limited to SQL-based queries and lacks the flexibility of Databricks for complex, multi-step data engineering pipelines.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory with mapping data flows is a code-free data transformation service, but it is not fully serverless; it requires an Azure Integration Runtime (IR) that runs on managed virtual machines, which still involves some infrastructure overhead and is not a pure serverless approach. Option B is wrong because Azure HDInsight with Apache Spark clusters requires explicit provisioning and management of cluster nodes, including sizing, scaling, and patching, which contradicts the serverless requirement. Option C is wrong because Azure Synapse Analytics with serverless SQL pool is serverless for querying data in the lake, but it is designed for on-demand SQL queries over Parquet, CSV, or JSON files, not for general-purpose data transformations like complex ETL pipelines that Databricks handles natively.

Full explanation →

362

MCQeasy

A startup is building a new application and needs a relational database that supports JSON data, automatic scaling, and a serverless compute tier to minimize costs during low usage periods. Which Azure data service should they choose?

A.Azure Database for PostgreSQL serverless

B.Azure Cosmos DB

C.Azure SQL Database serverless tier

D.Azure SQL Managed Instance

AnswerC

Azure SQL Database serverless supports JSON, auto-scaling, and pauses when idle.

Why this answer

Option C is correct because Azure SQL Database serverless tier supports JSON, automatic scaling, and pauses during inactivity. Option A is wrong because Azure Database for PostgreSQL does not offer a serverless compute tier with automatic scaling and pausing; PostgreSQL's JSON support is not the limiting factor. Option B is wrong because Azure Cosmos DB is NoSQL, not relational. Option D is wrong because Azure SQL Managed Instance does not have a serverless tier.

Full explanation →

363

MCQmedium

A manufacturing company uses IoT sensors to collect temperature and vibration data from machinery. They need to analyze the streaming data in real time to detect anomalies and trigger alerts. Additionally, they need to run complex historical queries on months of sensor data to identify equipment failure patterns. They want a single Azure service that can handle both real-time stream processing and large-scale batch analytics using a unified query language, minimizing the need for separate technologies. Which Azure service should they use?

A.Azure Stream Analytics

B.Azure Data Lake Storage Gen2

C.Azure Synapse Analytics

D.Azure HDInsight

AnswerC

Azure Synapse Analytics is a unified analytics platform that supports both stream processing (via Spark Structured Streaming in Apache Spark pools) and batch analytics with T-SQL and Apache Spark, all within a single service. Spark SQL can serve as the common query language for both workloads, which minimizes architecture complexity.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified platform that combines stream processing (via Spark Structured Streaming in Synapse Spark pools) and large-scale batch analytics (via Synapse SQL and Spark), with Spark SQL available as a common query language for both. Synapse Pipelines are not the streaming component; they orchestrate batch data movement. This eliminates the need for separate technologies, directly addressing the requirement for a single service to handle both streaming and historical batch analysis.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as a one-stop solution for both streaming and batch, overlooking its lack of native batch analytics capabilities, while Azure Synapse Analytics is designed specifically to unify these workloads.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a dedicated real-time stream processing service that lacks native support for large-scale batch analytics on historical data; it primarily processes streaming data with limited historical query capabilities. Option B is wrong because Azure Data Lake Storage Gen2 is a storage service, not a compute or analytics engine, so it cannot perform real-time stream processing or batch analytics on its own. Option D is wrong because Azure HDInsight requires separate cluster configurations for stream processing (e.g., Kafka, Storm) and batch analytics (e.g., Spark, Hive), and does not offer a unified query language across both workloads, increasing complexity.

Full explanation →

364

MCQmedium

A company uses Azure Cosmos DB with the Cassandra API for a time-series telemetry application. The data model uses a composite primary key (device_id, timestamp). The application queries telemetry for a specific device within a time range. Recently, query performance has degraded as data volume grows. The RU consumption per query is high. Which action should they take to improve performance and reduce RU cost?

A.Change the partition key to timestamp.

B.Increase the provisioned RU/s on the container.

C.Add a secondary index on the timestamp column.

D.Change the clustering order to descending on timestamp.

AnswerD

Descending order makes time-range queries efficient by reading only relevant rows.

Why this answer

Changing the clustering order to descending on timestamp (option D) allows the most recent data to be retrieved first without scanning all rows. Option B (increasing the provisioned RU/s) adds capacity but does not make each query more efficient. Option C (adding a secondary index) may increase RU consumption. Option A (changing the partition key to timestamp) would scatter device data across partitions, hurting queries by device.

Full explanation →

365

MCQhard

You are implementing a data pipeline that ingests millions of events per second from IoT devices. The pipeline must tolerate failures and guarantee exactly-once processing. Which Azure service should you use to ingest the events?

A.Azure IoT Hub

B.Azure Event Hubs

C.Azure Service Bus

D.Azure Queue Storage

AnswerB

Event Hubs can ingest millions of events per second. It provides at-least-once delivery; combining consumer checkpointing with idempotent processing yields effectively exactly-once results.

Why this answer

Azure Event Hubs is the correct choice because it is a big data streaming platform and event ingestion service designed for high-throughput scenarios, capable of ingesting millions of events per second. Event Hubs itself delivers events at least once; exactly-once processing is achieved downstream by combining checkpointing and partition-based offset management with idempotent consumers. Its built-in replication and availability zone support provide fault tolerance.

Exam trap

The trap here is that candidates confuse Azure IoT Hub with Event Hubs because both handle IoT data, but IoT Hub is for device management and control, not for high-throughput event ingestion with exactly-once guarantees.

How to eliminate wrong answers

Option A is wrong because Azure IoT Hub is optimized for device management and bi-directional communication with IoT devices, not for high-throughput event ingestion at millions of events per second; it has lower throughput limits and is not designed for exactly-once processing at that scale. Option C is wrong because Azure Service Bus is a message broker for enterprise messaging with features like topics and queues, but it is not built for high-throughput event streaming and has lower throughput ceilings, making it unsuitable for millions of events per second. Option D is wrong because Azure Queue Storage is a simple message queue for decoupling application components with at-least-once delivery semantics and limited throughput, not supporting exactly-once processing or the high ingestion rates required.
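
How checkpointing and idempotent consumers combine into effectively exactly-once processing can be sketched without any Azure SDK. Everything here is a hypothetical in-memory model: `Partition` stands in for one event hub partition, the crash is simulated, and the event IDs and values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy append-only log standing in for one event hub partition."""
    events: list = field(default_factory=list)  # (event_id, value) pairs


@dataclass
class Consumer:
    checkpoint: int = 0                         # next offset to read after a restart
    seen_ids: set = field(default_factory=set)  # record used to drop duplicates
    total: int = 0                              # side effect that must not be doubled

    def process(self, partition, crash_before_checkpoint_at=None):
        offset = self.checkpoint
        while offset < len(partition.events):
            event_id, value = partition.events[offset]
            if event_id not in self.seen_ids:   # idempotency: skip redelivered events
                self.total += value
                self.seen_ids.add(event_id)
            if offset == crash_before_checkpoint_at:
                return                          # simulated crash: progress not saved
            offset += 1
            self.checkpoint = offset            # persist progress


partition = Partition(events=[(0, 10), (1, 20), (2, 30)])
consumer = Consumer()
consumer.process(partition, crash_before_checkpoint_at=1)  # event 1 handled, not checkpointed
consumer.process(partition)                                # restart re-reads event 1
print(consumer.total, consumer.checkpoint)
```

After the restart the consumer sees event 1 a second time (at-least-once delivery), but the ID check keeps the running total from counting it twice.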

Full explanation →

366

MCQmedium

Refer to the exhibit. A data engineer needs to query the orders.csv file using Azure Synapse Serverless SQL. What is the most efficient way to access this data?

A.Use PolyBase to create external table

B.Use OPENROWSET in Serverless SQL

C.Copy data to Azure SQL Database using ADF

D.Load data into a dedicated SQL pool

AnswerB

OPENROWSET can query files directly without loading.

Why this answer

Azure Synapse Serverless SQL is designed for on-demand querying of data stored in data lakes without provisioning compute or loading the data into a database. The OPENROWSET function with the BULK option allows direct querying of CSV files using T-SQL, making it the most efficient method for ad-hoc analysis of the orders.csv file without data movement or schema management.

Exam trap

The trap here is that candidates often confuse PolyBase (which is for dedicated SQL pools) with Serverless SQL's OPENROWSET, or assume that data must be moved to a database before querying, missing the serverless paradigm of query-in-place.

How to eliminate wrong answers

Option A is wrong because PolyBase is used to create external tables in dedicated SQL pools, not in Serverless SQL, and requires defining external data sources and file formats, adding unnecessary overhead for a simple query. Option C is wrong because copying data to Azure SQL Database using ADF involves data movement and additional costs, which is inefficient for a one-time or ad-hoc query. Option D is wrong because loading data into a dedicated SQL pool requires provisioning and managing a dedicated resource, which is overkill and costly for querying a single CSV file.

Full explanation →

367

MCQmedium

A company wants to build a modern data warehouse using a lakehouse architecture. They need to store raw data in its native format (e.g., CSV, JSON, Parquet) and also support BI reporting on curated, transformed data. They want to use a single storage layer for both raw and curated data. Which Azure service should they use as the core storage layer?

A.Azure SQL Database

B.Azure Synapse Analytics

C.Azure Data Lake Storage Gen2

D.Azure Cosmos DB

AnswerC

ADLS Gen2 offers a hierarchical namespace and can store both raw and curated data in any format, making it the ideal storage layer for a lakehouse architecture.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct choice because it provides a single, unified storage layer that can store raw data in its native format (CSV, JSON, Parquet) in a hierarchical namespace, while also serving as the foundation for curated, transformed data used in BI reporting. It combines the scalability and cost-effectiveness of Azure Blob Storage with the file system semantics and ACLs needed for analytics workloads, making it the ideal core storage layer for a lakehouse architecture.

Exam trap

The trap here is that candidates confuse Azure Synapse Analytics (a compute/query service) with a storage layer, when the question explicitly asks for the 'core storage layer' that holds both raw and curated data, which is ADLS Gen2.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database service designed for structured, transactional data and cannot natively store raw files like CSV or JSON in their native format without schema-on-write, nor does it support a lakehouse architecture. Option B is wrong because Azure Synapse Analytics is an analytics service that provides querying and processing capabilities (e.g., serverless SQL, Spark pools), but it is not a storage layer; it relies on ADLS Gen2 or other storage for the actual data. Option D is wrong because Azure Cosmos DB is a NoSQL database optimized for globally distributed, low-latency transactional workloads and is not designed for storing large volumes of raw files in native formats or for lakehouse-style analytics.

Full explanation →

368

MCQeasy

A healthcare organization stores patient records in a relational database table with fixed columns for PatientID, Name, and DateOfBirth. Additionally, they store clinical notes as free-form text files for each patient visit. Which statement correctly classifies these data types?

A.Both patient records and clinical notes are examples of unstructured data.

B.Patient records are structured data, and clinical notes are unstructured data.

C.Both patient records and clinical notes are examples of structured data.

D.Patient records are unstructured data, and clinical notes are semi-structured data.

AnswerB

Patient records have defined columns (structured), while clinical notes are free-form text (unstructured).

Why this answer

Patient records stored in a relational database table with fixed columns (PatientID, Name, DateOfBirth) conform to a predefined schema, making them structured data. Clinical notes stored as free-form text files lack a fixed schema or organization, which classifies them as unstructured data. Option B correctly identifies this distinction.

Exam trap

The trap here is that candidates confuse 'free-form text' with semi-structured data (e.g., JSON or XML), but semi-structured data has tags or key-value pairs, whereas free-form text has no inherent structure at all.

How to eliminate wrong answers

Option A is wrong because patient records in a relational table with fixed columns are structured data, not unstructured. Option C is wrong because clinical notes as free-form text files have no predefined schema, so they are unstructured, not structured. Option D is wrong because patient records are structured (not unstructured), and clinical notes are unstructured (not semi-structured, as they lack tags or metadata that would make them semi-structured like JSON or XML).

Full explanation →

369

MCQeasy

A company processes sales transactions in real-time from a retail website. Each transaction is recorded as a row in a relational database. Additionally, the company stores weekly sales reports as PDF files. Which statement correctly describes these data types?

A.Transactions are unstructured, reports are semi-structured.

B.Transactions are structured, reports are unstructured.

C.Both are structured because they are files.

D.Both are unstructured because they are digital.

AnswerB

Correct. Transactions have a rigid schema (structured), and PDF files lack a predefined schema (unstructured).

Why this answer

Transactions are structured because they are stored as rows in a relational database, which imposes a fixed schema with defined columns and data types. Weekly sales reports as PDF files are unstructured because they lack a predefined data model and cannot be easily queried using SQL without additional processing. Option B correctly identifies this distinction.

Exam trap

The trap here is that candidates confuse 'file format' with 'data structure', assuming all files are structured, when in fact PDFs are unstructured binary files that lack the row/column schema of relational data.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: transactions are structured (not unstructured) and reports are unstructured (not semi-structured). Option C is wrong because not all files are structured; PDF files are binary blobs without a row/column schema, unlike relational database tables. Option D is wrong because being digital does not imply unstructured; structured data like relational tables is also digital but has a rigid schema.

370

MCQeasy

You need to store JSON documents that are frequently read and written by a web application. Each document is about 10 KB. The solution must provide low-latency access globally. Which Azure data store should you use?

A.Azure Cosmos DB

B.Azure Blob Storage

C.Azure SQL Database

D.Azure Table Storage

AnswerA

Cosmos DB is a globally distributed NoSQL database for JSON.

Why this answer

Azure Cosmos DB is a globally distributed NoSQL database that stores JSON documents natively and provides low-latency reads and writes. Option B is wrong because Blob Storage is object storage, not optimized for frequent reads and writes of individual small documents. Option C is wrong because SQL Database is relational. Option D is wrong because Table Storage is designed for key-value data rather than JSON documents.

371

MCQmedium

A company stores historical sensor data in Azure Blob Storage. The data is accessed only a few times per year for compliance audits, but when requested, it must be available for reading within 15 minutes. The company wants to minimize storage costs. Which blob access tier should they use?

A.Hot

B.Cool

C.Archive

D.Premium

AnswerB

Correct. The Cool tier provides low storage cost for infrequently accessed data and, as an online tier, keeps the data immediately readable, which easily meets the 15-minute requirement.

Why this answer

The Cool tier is the optimal choice because it combines low storage cost with online access: blobs can be read immediately, well within the 15-minute availability requirement. Archive is an offline tier, so blobs must be rehydrated before they can be read, which can take up to 15 hours at standard priority and therefore breaks the 15-minute requirement. Hot and Premium are more expensive and unnecessary for data accessed only a few times per year.

Exam trap

The trap here is that candidates often choose Archive for its lowest storage cost without considering the mandatory rehydration delay, which can take up to 15 hours and violates the 15-minute availability requirement.

How to eliminate wrong answers

Option A (Hot) is wrong because it is designed for frequently accessed data and has higher storage costs, making it cost-inefficient for data accessed only a few times per year. Option C (Archive) is wrong because it has the lowest storage cost but requires a rehydration process that can take up to 15 hours, exceeding the 15-minute availability requirement. Option D (Premium) is wrong because it is optimized for low-latency, high-transaction workloads (e.g., Azure Virtual Machine disks) and incurs the highest storage cost, which is unnecessary for infrequent compliance audits.
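The elimination logic above can be sketched as a small Python function. The cost ranks are illustrative orderings rather than real prices, access and early-deletion charges are ignored, and the Tier type and cheapest_tier function are hypothetical names introduced only for this example. Premium is left out because the explanation already treats it as the most expensive option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    relative_storage_cost: int  # illustrative rank only: 1 = cheapest
    online: bool                # online tiers can be read without rehydration


# Illustrative ranks, not real prices.
TIERS = [
    Tier("Archive", 1, online=False),
    Tier("Cool", 2, online=True),
    Tier("Hot", 3, online=True),
]


def cheapest_tier(max_wait_minutes: float,
                  archive_rehydration_minutes: float = 15 * 60) -> str:
    """Pick the lowest-cost tier whose worst-case wait fits the deadline."""
    candidates = []
    for tier in TIERS:
        # Online tiers are readable at once; Archive must be rehydrated first.
        worst_case_wait = 0 if tier.online else archive_rehydration_minutes
        if worst_case_wait <= max_wait_minutes:
            candidates.append(tier)
    return min(candidates, key=lambda t: t.relative_storage_cost).name


print(cheapest_tier(max_wait_minutes=15))       # Cool
print(cheapest_tier(max_wait_minutes=24 * 60))  # Archive
```

With a 15-minute deadline, Archive is filtered out by its rehydration time and Cool wins on cost; relaxing the deadline to a full day would make Archive acceptable.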

372

MCQmedium

You need to choose a data store for a mobile app that requires real-time synchronization of user preferences across devices. The data is small per user and key-value oriented. Which Azure service is most appropriate?

A.Azure Cosmos DB

B.Azure Cache for Redis

C.Azure Blob Storage

D.Azure SQL Database

AnswerA

Cosmos DB offers low latency, global distribution, and key-value API.

Why this answer

Azure Cosmos DB is the most appropriate choice because it provides global distribution, low-latency reads and writes, and automatic conflict resolution, which are essential for real-time synchronization of user preferences across devices. Its key-value API (e.g., Table API or Core SQL API with a simple partition key) efficiently handles small, per-user data with a key-value orientation, ensuring that changes made on one device are quickly reflected on others.

Exam trap

The trap here is that candidates often confuse Azure Cache for Redis as a primary data store for persistent, synchronized user preferences, overlooking its transient nature and lack of built-in conflict resolution for multi-device scenarios.

How to eliminate wrong answers

Option B (Azure Cache for Redis) is wrong because it is an in-memory cache designed for temporary, volatile data with limited persistence options; it does not provide built-in conflict resolution or durable, globally distributed synchronization for user preferences that must persist across sessions. Option C (Azure Blob Storage) is wrong because it is optimized for large, unstructured binary objects (e.g., images, videos) and lacks the low-latency, key-value access patterns and real-time sync capabilities needed for small, frequently updated user preferences. Option D (Azure SQL Database) is wrong because it is a relational database that requires a fixed schema and is not optimized for simple key-value workloads; its overhead and lack of native conflict resolution make it unsuitable for real-time synchronization of small, per-user key-value data.
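The synchronization idea can be illustrated in plain Python without any Azure SDK. The sketch below models preferences as key-value records and resolves concurrent writes with a last-writer-wins rule, which is one conflict resolution policy Cosmos DB supports for multi-region writes. The PreferenceWrite type, the logical timestamps, and the device names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceWrite:
    user_id: str    # acts as the key for the user's data
    key: str        # preference name, e.g. "theme"
    value: str
    timestamp: int  # hypothetical logical clock; higher means later
    device: str


def last_writer_wins(writes):
    """Keep, for every (user_id, key) pair, the write with the highest timestamp."""
    latest = {}
    for w in writes:
        slot = (w.user_id, w.key)
        if slot not in latest or w.timestamp > latest[slot].timestamp:
            latest[slot] = w
    return {slot: w.value for slot, w in latest.items()}


writes = [
    PreferenceWrite("u1", "theme", "light", 100, "phone"),
    PreferenceWrite("u1", "theme", "dark", 105, "tablet"),
    PreferenceWrite("u1", "language", "en", 101, "phone"),
]
merged = last_writer_wins(writes)
print(merged[("u1", "theme")])     # dark
print(merged[("u1", "language")])  # en
```

The later write from the tablet replaces the earlier one from the phone, so every device converges on the same value for each preference.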

373

MCQmedium

A social networking application stores user profiles as JSON documents. Each profile can have different fields (e.g., education, work history, interests) depending on what the user fills in. The application also needs to traverse friend connections as a graph to recommend new friends. The development team wants to use a single Azure Cosmos DB account for both workloads. Which combination of Azure Cosmos DB APIs should they choose?

A.SQL API for profiles and Gremlin API for graph

B.MongoDB API for profiles and SQL API for graph

C.Gremlin API for both profiles and graph

D.Table API for profiles and Gremlin API for graph

AnswerA

Correct. The SQL API provides flexible querying of JSON documents, and the Gremlin API enables graph traversal, so together they cover both workloads.

Why this answer

The SQL API (formerly the DocumentDB API, now named the API for NoSQL) is optimized for storing and querying JSON documents with flexible schemas, making it ideal for user profiles with varying fields. The Gremlin API is designed specifically for graph traversal queries, which is what recommending friends based on friend connections requires. Using both APIs allows the application to handle both workloads efficiently within a single service.

Exam trap

The trap here is that candidates assume the Gremlin API can handle both document and graph workloads because it stores properties as JSON-like data, but they overlook that it lacks the flexible querying and indexing capabilities of the SQL API for semi-structured JSON documents.

How to eliminate wrong answers

Option B is wrong because the MongoDB API is designed for MongoDB-compatible document workloads, and the SQL API has no native graph traversal, so using it for the friend graph would require hand-written traversal logic. Option C is wrong because while the Gremlin API can store JSON-like properties on vertices and edges, it is not optimized for flexible-schema document queries, making it a poor fit for storing and querying user profiles with arbitrary fields. Option D is wrong because the Table API is a key-value store whose entities are flat sets of properties, so it is a poor fit for nested JSON documents with varying fields; the Gremlin half of the option is fine, but the profile half is not.
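The two workloads in this question have different shapes, which a short Python sketch can show. Profiles are modeled as JSON-like dictionaries with varying fields, and friend connections as an adjacency map; the user names, fields, and both helper functions are hypothetical, and no Cosmos DB SDK is involved.

```python
# Document side: profiles are JSON-like dicts whose fields vary per user.
profiles = {
    "ana": {"id": "ana", "education": ["BSc"], "interests": ["chess"]},
    "ben": {"id": "ben", "work_history": ["Contoso"]},
    "cy": {"id": "cy"},
    "dee": {"id": "dee", "interests": ["chess", "hiking"]},
}

# Graph side: undirected friend edges stored as an adjacency map.
friends = {
    "ana": {"ben"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ben"},
    "dee": {"ben"},
}


def users_with_field(field):
    """Document-style query: which profiles contain a given optional field?"""
    return sorted(uid for uid, doc in profiles.items() if field in doc)


def recommend_friends(user):
    """Graph-style query: friends of friends who are not already friends."""
    direct = friends.get(user, set())
    candidates = set()
    for friend in direct:
        candidates |= friends.get(friend, set())
    return sorted(candidates - direct - {user})


print(users_with_field("interests"))  # ['ana', 'dee']
print(recommend_friends("ana"))       # ['cy', 'dee']
```

The first query filters documents by their contents, while the second follows edges two hops out; a document API suits the former and a graph API the latter.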

374

MCQeasy

A company needs to migrate a large on-premises SQL Server database to Azure. The migration must have minimal downtime and support ongoing replication. Which Azure service should they use?

A.Azure Data Box

B.Azure Data Factory

C.Azure SQL Database

D.Azure Database Migration Service

AnswerD

Database Migration Service supports online migrations with continuous replication.

Why this answer

Azure Database Migration Service (DMS) is designed for online migrations with minimal downtime and supports ongoing replication from SQL Server to Azure. A typical workflow uses the Data Migration Assistant (DMA) to assess the source database, then uses DMS to perform the initial load and keep syncing changes until cutover, which enables near-zero downtime.

Exam trap

The trap here is that candidates confuse the target service (Azure SQL Database) with the migration tool, or assume Data Factory can handle live replication, when in fact only DMS provides the necessary online migration and ongoing sync capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Data Box is a physical data transfer appliance for offline bulk data migration, not suitable for minimal downtime or ongoing replication. Option B is wrong because Azure Data Factory is an ETL and orchestration service for data movement and transformation, not a dedicated migration tool with built-in replication and minimal downtime capabilities. Option C is wrong because Azure SQL Database is the target platform, not a migration service; it does not handle the migration process or replication itself.

375

MCQmedium

A company uses Azure SQL Database for an e-commerce platform. The 'Orders' table has millions of rows with columns OrderID (primary key), CustomerID, OrderDate, and TotalAmount. Queries often filter by CustomerID (equality) and OrderDate (range). Currently, these queries are slow. Which index should be created to improve performance?

A.A nonclustered index on OrderID

B.A nonclustered index on (CustomerID, OrderDate)

C.A nonclustered index on OrderDate

D.A clustered index on CustomerID

AnswerB

This composite index matches the query predicates. CustomerID is the equality column, and OrderDate is the range column. The index allows the database engine to efficiently locate rows for a specific customer and then scan a small range of dates.

Why this answer

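The query pattern filters by CustomerID (equality) and OrderDate (range). A composite nonclustered index on (CustomerID, OrderDate) lets SQL Database seek directly to the entries for the matching CustomerID and then scan only the requested OrderDate range, which is stored in sorted order within that customer's entries. This avoids a full table scan. If the query selects columns that are not in the index, such as TotalAmount, the engine still performs key lookups, but only for the narrow set of matching rows. Putting the equality column first and the range column second is what lets the B-tree serve both predicates with a single seek.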

Exam trap

The trap here is that candidates often choose a single-column index on OrderDate (Option C) thinking it covers the range filter, but they overlook that without CustomerID as the leading key, the index cannot narrow down to a specific customer, so the engine must read every order in the date range across all customers and filter afterwards.

How to eliminate wrong answers

Option A is wrong because indexing OrderID (the primary key) does not help queries filtering by CustomerID and OrderDate; the primary key is already clustered by default, and a nonclustered index on it would be redundant for this workload. Option C is wrong because a single-column index on OrderDate cannot narrow down by CustomerID first, so the engine would have to read every order in the date range, for all customers, and check CustomerID on each one. Option D is wrong because a table can have only one clustered index, and the primary key on OrderID occupies it by default, so this option would mean re-clustering the whole table; even then, an index keyed on CustomerID alone leaves OrderDate unordered within each customer, so the date range could not be located with a seek.
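The effect of the composite index can be observed with a small experiment. The sketch below uses Python's standard-library sqlite3 module as a stand-in for Azure SQL Database, so the query planner and plan output differ from SQL Server's; the index name and the sample parameter values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        CustomerID  INTEGER NOT NULL,
        OrderDate   TEXT    NOT NULL,   -- ISO-8601 dates sort correctly as text
        TotalAmount REAL    NOT NULL
    )
    """
)

# Equality column first, range column second.
conn.execute(
    "CREATE INDEX IX_Orders_CustomerID_OrderDate ON Orders (CustomerID, OrderDate)"
)

# Ask the planner how it would run the typical query.
plan_rows = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT OrderID, OrderDate, TotalAmount
    FROM Orders
    WHERE CustomerID = ? AND OrderDate BETWEEN ? AND ?
    """,
    (42, "2024-01-01", "2024-03-31"),
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan_rows)
print(plan_text)
```

The printed plan should report a search that uses IX_Orders_CustomerID_OrderDate rather than a scan of the whole table. In Azure SQL Database, the corresponding T-SQL statement would be CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate ON Orders (CustomerID, OrderDate).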
