Microsoft Azure Data Fundamentals DP-900 Questions 451–525

451

MCQeasy

A retail company stores product inventory data in a fixed-schema table with columns for ProductID, ProductName, QuantityInStock, and ReorderLevel. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Streaming data

AnswerA

Correct - The data has a fixed schema organized in rows and columns, which is the definition of structured data.

Why this answer

This data is classified as structured data because it conforms to a fixed schema with clearly defined columns (ProductID, ProductName, QuantityInStock, ReorderLevel) and data types, stored in a relational table. Structured data is highly organized, easily queryable via SQL, and follows a rigid schema, which matches the description of the inventory table.

Exam trap

The trap here is that candidates may confuse structured data with semi-structured data because both involve some organization, but the key distinction is that structured data requires a rigid, predefined schema (like a fixed-schema table), while semi-structured data allows schema flexibility (e.g., JSON with optional fields).

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML, or CSV with flexible schemas) does not enforce a fixed schema or strict column definitions, whereas this table has a predefined schema. Option C is wrong because unstructured data (e.g., text files, images, or videos) lacks any predefined data model or organization, unlike the tabular inventory data. Option D is wrong because streaming data refers to continuous, real-time data flows (e.g., IoT sensor data or clickstreams), not static data stored in a table.
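
The distinction between structured and semi-structured data can be sketched with Python's standard library. This is only an illustration: SQLite stands in for a relational table, and the table name and sample values are hypothetical.

```python
import json
import sqlite3

# Structured: a fixed schema; every row has the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory ("
    "ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL, "
    "QuantityInStock INTEGER NOT NULL, ReorderLevel INTEGER NOT NULL)"
)
conn.execute("INSERT INTO Inventory VALUES (1, 'Widget', 40, 10)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Inventory)")]

# Semi-structured: self-describing documents whose fields may differ.
docs = [json.loads(s) for s in (
    '{"ProductID": 1, "ProductName": "Widget"}',
    '{"ProductID": 2, "ProductName": "Gadget", "Tags": ["new"]}',
)]
field_sets = {frozenset(d) for d in docs}
```

Every row in the table must supply the same four columns, while the two JSON documents carry different sets of fields without any schema change.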

452

MCQmedium

A company is evaluating deployment options for a new business application that requires a fully managed relational database. The application must support high availability with automatic failover and horizontal scaling for read-heavy workloads. The development team wants to minimize administrative overhead and prefers a PaaS solution that offers built-in read scale-out. Which Azure SQL deployment option should they choose?

A.SQL Server on Azure Virtual Machines

B.Azure SQL Managed Instance

C.Azure SQL Database (single database)

D.Azure SQL Database Elastic Pool

AnswerC

This is a fully managed PaaS service that, in the Premium and Business Critical tiers, includes a built-in read-only replica for read scale-out. It also supports automatic failover, optionally with zone-redundant configurations, meeting the high availability and read scaling needs with minimal administration.

Why this answer

Azure SQL Database (single database) is the correct choice because it is a fully managed PaaS relational database with built-in high availability and automatic failover (99.99% SLA). In the Premium and Business Critical tiers it also provides built-in read scale-out: connections that specify ApplicationIntent=ReadOnly are routed to a read-only replica, which offloads read-heavy workloads from the primary. The service handles patching, backups, and replication automatically, which minimizes administrative overhead.

Exam trap

The trap here is that candidates are drawn to Azure SQL Managed Instance because of its near-complete SQL Server compatibility. That compatibility matters for lift-and-shift migrations, but the scenario describes a new application with no instance-level requirements, so the lighter-weight single database is the better fit.

How to eliminate wrong answers

Option A is wrong because SQL Server on Azure Virtual Machines is an IaaS solution that requires manual configuration of high availability (e.g., Always On Availability Groups) and read scale-out, and it does not reduce administrative overhead the way a PaaS service does. Option B is wrong because Azure SQL Managed Instance, although a PaaS service whose Business Critical tier also offers a read-only replica, is aimed at migrating existing SQL Server workloads that need instance-level features; a new application gains nothing from that extra scope. Option D is wrong because an elastic pool is a resource-sharing and cost-optimization model for multiple databases with varying usage; the scenario describes one application database, so pooling adds nothing.
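
As a rough sketch of how an application opts in to read scale-out, the helper below builds an ADO.NET-style connection string and appends ApplicationIntent=ReadOnly for read-only traffic. The function name, server, and database are hypothetical placeholders, and credentials are omitted.

```python
def connection_string(server, database, read_only=False):
    """Build an ADO.NET-style connection string (server/database are placeholders)."""
    parts = [f"Server=tcp:{server},1433", f"Database={database}", "Encrypt=True"]
    if read_only:
        # Signals read-only intent so the service can route to a readable replica.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"

reporting = connection_string("example.database.windows.net", "Sales", read_only=True)
writes = connection_string("example.database.windows.net", "Sales")
```

Keeping two connection strings lets the application send reporting queries to the replica while writes continue to go to the primary.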

453

MCQmedium

Refer to the exhibit. You are reviewing an Azure Cosmos DB account configuration. Which API is this account configured to use?

A.Table API

B.SQL (Core) API

C.Cassandra API

D.MongoDB API

AnswerC

EnableCassandra indicates Cassandra API.

Why this answer

The 'capabilities' array includes 'EnableCassandra', which indicates the Cassandra API is enabled. Option A is wrong because a Table API account would list the 'EnableTable' capability. Option B is wrong because SQL (Core) is the default API and is not enabled through an API-specific capability such as 'EnableCassandra'.

Option D is wrong because a MongoDB API account would list 'EnableMongo'.
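
The lookup described above can be sketched as a small function. The exhibit itself is not reproduced here, so the sample account dictionary is a hypothetical stand-in, and the fallback to SQL (Core) when no API capability is listed is an assumption of this sketch.

```python
# Capability names as they appear in the account's "capabilities" array.
CAPABILITY_TO_API = {
    "EnableCassandra": "Cassandra API",
    "EnableTable": "Table API",
    "EnableMongo": "MongoDB API",
}

def detect_api(account: dict) -> str:
    """Return the API implied by an account description (hypothetical helper)."""
    for cap in account.get("capabilities", []):
        api = CAPABILITY_TO_API.get(cap.get("name"))
        if api:
            return api
    if account.get("kind") == "MongoDB":
        return "MongoDB API"
    return "SQL (Core) API"  # assumed default when no API capability is listed

# Hypothetical account description shaped like the exhibit.
exhibit_like = {"kind": "GlobalDocumentDB", "capabilities": [{"name": "EnableCassandra"}]}
detected = detect_api(exhibit_like)
```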

454

MCQhard

Match each ACID property with its correct description. Properties: - Atomicity - Consistency - Isolation - Durability Descriptions: 1. Transactions appear to execute one after the other, even if they are concurrent. 2. Once a transaction is committed, the changes are permanently saved and survive failures. 3. A transaction either completes fully or is rolled back entirely. 4. A transaction brings the database from one valid state to another, obeying all rules. Which option correctly maps each property to its description?

A.Atomicity → 3, Consistency → 4, Isolation → 1, Durability → 2

B.Atomicity → 4, Consistency → 3, Isolation → 2, Durability → 1

C.Atomicity → 2, Consistency → 1, Isolation → 3, Durability → 4

D.Atomicity → 1, Consistency → 2, Isolation → 4, Durability → 3

AnswerA

This is the correct mapping of ACID properties to their standard definitions.

Why this answer

Option A is correct because it accurately maps each ACID property to its definition. Atomicity ensures a transaction is all-or-nothing (3), Consistency guarantees the database moves from one valid state to another (4), Isolation makes concurrent transactions appear serial (1), and Durability ensures committed changes persist even after a failure (2). These are the standard definitions used in Azure SQL Database and other relational database systems.

Exam trap

The trap here is that candidates confuse the definitions of Consistency and Atomicity, often thinking Consistency means 'all-or-nothing' rather than 'valid state transitions,' or they swap Isolation with Durability by misremembering the 'permanent save' concept.

How to eliminate wrong answers

Option B is wrong because it swaps Atomicity and Consistency: Atomicity is about all-or-nothing execution, not bringing the database to a valid state (which is Consistency). Option C is wrong because it assigns Durability to 'transactions appear to execute one after the other' (Isolation) and Atomicity to 'changes are permanently saved' (Durability), completely inverting the properties. Option D is wrong because it maps Atomicity to 'transactions appear to execute one after the other' (Isolation) and Isolation to 'brings the database from one valid state to another' (Consistency), mixing up the core definitions.
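
Atomicity and consistency can be seen in a few lines using SQLite from Python's standard library. This is a minimal sketch, not Azure SQL Database; the Accounts table and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (Name TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(amount: int) -> bool:
    """Move money from alice to bob as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Accounts SET Balance = Balance + ? WHERE Name = 'bob'", (amount,))
            conn.execute("UPDATE Accounts SET Balance = Balance - ? WHERE Name = 'alice'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(500)  # the debit violates the CHECK constraint on alice's balance
balances = dict(conn.execute("SELECT Name, Balance FROM Accounts"))
```

The credit to bob succeeds first, but when the debit breaks the rule that balances cannot go negative (consistency), the whole transaction is rolled back (atomicity), so both balances are unchanged.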

455

MCQmedium

You design a data solution for an e-commerce platform. Transactional data must be stored with ACID compliance for order processing, while clickstream data from the website will be used for analytics. Which combination of Azure data services best meets these needs?

A.Azure Cosmos DB for transactions; Azure SQL Database for analytics

B.Azure SQL Database for transactions; Azure Synapse Analytics for analytics

C.Azure Blob Storage for transactions; Azure Data Lake Storage for analytics

D.Azure Database for MySQL for transactions; Azure Analysis Services for analytics

AnswerB

Azure SQL Database is ACID-compliant; Synapse Analytics is for big data analytics.

Why this answer

Azure SQL Database provides full ACID compliance for transactional workloads like order processing, ensuring data integrity. Azure Synapse Analytics is optimized for large-scale analytics on clickstream data, offering massively parallel processing (MPP) and integration with data lakes. This combination separates OLTP and OLAP workloads efficiently.

Exam trap

The trap here is that candidates often assume Azure Cosmos DB (Option A) is ACID-compliant because it supports multi-document transactions within a single partition, but it does not guarantee full ACID across partitions, making it unsuitable for strict order processing.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that offers configurable consistency levels (not full ACID across all operations) and is not ideal for strict ACID-compliant order processing; Azure SQL Database is transactional but not optimized for large-scale analytics like Synapse. Option C is wrong because Azure Blob Storage is an object store with no support for multi-object ACID transactions and is unsuitable for order processing; Azure Data Lake Storage is for raw data storage, not interactive analytics. Option D is wrong because Azure Database for MySQL provides ACID compliance but Azure Analysis Services is a semantic modeling layer (not a scalable analytics engine) and lacks the MPP capabilities needed for clickstream analytics.

456

MCQeasy

You need to provide temporary access to a specific blob in Azure Blob Storage for a limited time. The access should be time-limited and require no authentication from the user. Which mechanism should you use?

A.Storage account keys

B.Anonymous public access

C.Azure RBAC roles

D.Shared access signatures (SAS)

AnswerD

SAS tokens can be scoped to a specific blob with a defined expiration time, providing secure delegated access.

Why this answer

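Shared access signatures (SAS) provide time-limited, delegated access to storage resources without sharing the account key; the token travels in the URL, so the user does not sign in. Option A is wrong because storage account keys are long-lived secrets that grant full access to the account and do not expire on their own. Option B is wrong because anonymous public access cannot be time-limited; it is configured per container and stays open to everyone until it is turned off.

Option C is wrong because Azure RBAC grants access to authenticated identities, so the user would have to sign in.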
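
The idea behind a signed, expiring token can be illustrated with a toy example. This is not the Azure SAS format or algorithm, only a sketch of the principle under that assumption; the secret, function names, and blob path are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"account-key-kept-server-side"  # hypothetical signing key

def make_token(blob_path: str, expires_at: int) -> str:
    """Sign the blob path and expiry so neither can be altered by the holder."""
    message = f"{blob_path}\n{expires_at}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{expires_at}.{signature}"

def is_valid(blob_path: str, token: str, now: int) -> bool:
    """Accept the token only for the signed path and only before expiry."""
    expires_text, _, signature = token.partition(".")
    if not expires_text.isdigit():
        return False
    expected = make_token(blob_path, int(expires_text)).partition(".")[2]
    return hmac.compare_digest(signature, expected) and now < int(expires_text)

token = make_token("reports/q1.pdf", expires_at=1000)
```

The holder never sees the key, can use the token only for the one blob it was issued for, and loses access once the expiry time passes.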

457

MCQeasy

Your company uses Azure Synapse Analytics to run analytical queries on large datasets. You need to ensure that queries against a frequently accessed fact table perform well without impacting other workloads. Which feature should you use?

A.Create materialized views on the fact table.

B.Enable result set caching for the database.

C.Partition the fact table by a frequently filtered column.

D.Use workload classification to prioritize the queries.

AnswerB

Result set caching stores query results for repeated execution without recomputation.

Why this answer

Option B is correct. Result set caching stores query results in the dedicated SQL pool, so repeated queries are answered from the cache instead of being recomputed, which reduces contention with other workloads. Option A is wrong because materialized views require upfront storage and ongoing maintenance.

Option C is wrong because partitioning changes how the table is laid out but does not avoid recomputing repeated queries. Option D is wrong because workload classification prioritizes queries; it does not cache results.

458

MCQeasy

A healthcare company stores patient records in a relational database with fixed columns (PatientID, Name, DOB, BloodType). Medical images such as X-rays are stored as DICOM files. Clinical notes are stored as free-text documents. Which of the following correctly classifies these data types from most structured to least structured?

A.Patient records (structured), DICOM files (structured), Clinical notes (unstructured)

B.Patient records (structured), DICOM files (semi-structured), Clinical notes (unstructured)

C.Patient records (semi-structured), DICOM files (unstructured), Clinical notes (structured)

D.Patient records (unstructured), DICOM files (semi-structured), Clinical notes (structured)

AnswerB

Correct. Patient records are structured because they reside in a relational table with fixed columns. DICOM files have a standard format with metadata tags, making them semi-structured. Clinical notes as free text are unstructured.

Why this answer

Patient records in a fixed-column relational database are structured data because they conform to a rigid schema with defined data types. DICOM files are semi-structured because they contain a structured header with metadata tags (e.g., patient ID, study date) alongside an unstructured binary image payload. Clinical notes as free-text documents are unstructured because they lack a predefined schema or organization, making them difficult to query without natural language processing.

Exam trap

The trap here is that candidates often misclassify DICOM files as fully structured due to their standardized header, overlooking the unstructured binary image payload that makes them semi-structured.

How to eliminate wrong answers

Option A is wrong because DICOM files are not fully structured; they have a structured header but also contain unstructured binary image data, making them semi-structured. Option C is wrong because patient records in a relational database are structured, not semi-structured, and clinical notes are unstructured, not structured. Option D is wrong because it reverses the two ends of the scale: patient records are structured, not unstructured, and clinical notes are unstructured, not structured; only its DICOM classification is right.

459

Multi-Selecteasy

Which TWO storage tiers are available in Azure Blob Storage for general-purpose v2 storage accounts? (Choose two.)

Select 2 answers

A.Cool

B.Frozen

C.Standard

D.Cold

E.Hot

AnswersA, E

Cool is optimized for infrequently accessed data, and Hot is optimized for frequently accessed data.

Why this answer

Option A (Cool) and Option E (Hot) are correct. Archive is also an access tier but is not listed here; Premium is a separate performance tier chosen at the account level. Option B is wrong because 'Frozen' is not a tier.

Option C is wrong because 'Standard' is an account performance tier, not a blob access tier. Option D is treated as wrong by this question's answer key, which predates the Cold access tier that Azure Blob Storage has since added.

460

MCQeasy

You are designing a solution to store JSON documents from a web application. Each document is about 10 KB and must be queried by a unique ID. Which Azure data store is most appropriate?

A.Azure SQL Database

B.Azure Table Storage

C.Azure Cosmos DB

D.Azure Blob Storage

AnswerC

Cosmos DB is a fully managed NoSQL database with native JSON support, indexing, and fast point reads by ID.

Why this answer

Azure Cosmos DB is a NoSQL database that natively stores JSON documents and provides low-latency point reads by ID. Option A is wrong because Azure SQL Database is relational and expects a defined schema. Option D is wrong because Azure Blob Storage can hold JSON files but is not optimized for querying documents by ID.

Option B is wrong because Azure Table Storage is a key-value store that holds flat entities rather than JSON documents and offers fewer query features.

461

MCQeasy

Refer to the exhibit. You have created this Azure Data Factory pipeline. When you run it, the copy activity fails with a connectivity error. What is the most likely missing component?

A.The Azure SQL Database firewall must allow Azure services

B.A self-hosted integration runtime is not installed on premises

C.The SQL query is invalid

D.The on-premises SQL Server must have a public endpoint

AnswerB

The SelfHostedIR reference requires an actual installed IR agent on a machine that can access the on-premises SQL Server.

Why this answer

The pipeline uses a SelfHostedIR integration runtime to connect to the on-premises SQL Server. For the copy to work, a self-hosted integration runtime must be installed on a machine that can reach that server. Option D is wrong because a public endpoint is not needed; the self-hosted IR connects from inside the private network.

Option A is wrong because Azure SQL Database firewall rules are not the issue (the connectivity error concerns the on-premises source). Option C is wrong because the query is valid, and an invalid query would not surface as a connectivity error.

462

MCQhard

A company has both transactional and analytical workloads on the same SQL Server database. They want to move to Azure and separate these workloads to improve performance. They need a solution that supports both workloads without duplicating data. What should they do?

A.Use Azure Data Factory to move data between two Azure SQL Databases.

B.Use Azure Synapse Link for SQL to replicate data in near real-time to an analytical store.

C.Use Azure SQL Database for transactions and Azure SQL Data Warehouse for analytics, with periodic data copy.

D.Use Azure SQL Database for both workloads.

AnswerB

Synapse Link replicates data in near real time without ETL pipelines.

Why this answer

Option B is correct because Azure Synapse Link for SQL replicates transactional data from Azure SQL Database to Azure Synapse Analytics in near real time, so analytical queries run there without hand-built ETL. Option D is wrong because keeping both workloads on Azure SQL Database does not separate them. Option C is wrong because pairing Azure SQL Database with a data warehouse through periodic copies means building and maintaining copy jobs, and the analytical data goes stale between runs.

Option A is wrong because Azure Data Factory is a batch ETL and orchestration service, not a near-real-time replication feature.

463

MCQmedium

A data engineer needs to build a pipeline that runs every hour, copies new sales data from an on-premises SQL Server to Azure Data Lake Storage Gen2, transforms the data using PySpark, and then loads it into Azure Synapse Analytics dedicated SQL pool. Which Azure service should be used to orchestrate the entire pipeline?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Logic Apps

D.Azure Databricks

AnswerA

Azure Data Factory provides orchestration and scheduling for data pipelines. It can copy data from on-premises sources, run custom processing (like PySpark on Databricks), and load results into Synapse Analytics.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data integration service designed to orchestrate complex pipelines. It can copy data from on-premises SQL Server via a self-hosted integration runtime, trigger the pipeline on an hourly schedule, execute PySpark transformations in Azure Databricks or HDInsight, and load the results into Azure Synapse Analytics dedicated SQL pool—all within a single, managed orchestration workflow.

Exam trap

The trap here is that candidates confuse Azure Databricks (a compute/transform service) with an orchestration service, forgetting that ADF is the dedicated tool for scheduling, copying, and managing the full pipeline lifecycle.

How to eliminate wrong answers

Option B is wrong because Azure Stream Analytics is a real-time stream processing engine for data from sources like IoT Hub or Event Hubs; it does not support scheduled batch orchestration, on-premises data copy via self-hosted IR, or PySpark transformations. Option C is wrong because Azure Logic Apps is a low-code workflow service for integrating SaaS applications and APIs, not designed for big data ETL pipelines with PySpark or direct loading into Synapse dedicated SQL pool. Option D is wrong because Azure Databricks is an analytics platform for running PySpark jobs. It can schedule its own jobs, but it does not provide a self-hosted integration runtime for copying data from on-premises SQL Server and is not the service for managing end-to-end pipeline dependencies across services; in this pipeline it is a compute target, not the orchestrator.

464

MCQmedium

A company runs an e-commerce application on Azure SQL Database. The database has a table named Orders with columns: OrderID (int, primary key), CustomerID (int), OrderDate (datetime), TotalAmount (decimal). The application frequently runs the following query: SELECT * FROM Orders WHERE CustomerID = 12345 AND OrderDate BETWEEN '2025-01-01' AND '2025-01-31' ORDER BY OrderDate DESC. The table contains 10 million rows. Which index would best optimize this query?

A.A nonclustered index on OrderDate only.

B.A nonclustered index on (CustomerID, OrderDate DESC) including TotalAmount as included column.

C.A clustered index on (OrderDate, CustomerID).

D.A nonclustered index on (OrderDate DESC) only.

AnswerB

This composite index covers both filter conditions (CustomerID equality, OrderDate range) and includes TotalAmount to avoid key lookups. The descending order helps with ORDER BY OrderDate DESC without additional sorting.

Why this answer

Option B is correct because the query filters on both CustomerID and OrderDate, so a composite nonclustered index on (CustomerID, OrderDate DESC) allows SQL Server to perform an index seek on CustomerID and then an ordered range scan on OrderDate, avoiding a sort operation. Because OrderID is the clustered primary key, it is carried in every nonclustered index automatically, so with TotalAmount as an included column the index covers all four columns that SELECT * returns and the query can be satisfied entirely from the index without key lookups to the clustered index.

Exam trap

The trap here is that candidates often think a single-column index on the most selective column (OrderDate) is sufficient, but they overlook that the query's equality filter on CustomerID must be the leading key column to enable an efficient seek, and that including the SELECT column avoids key lookups.

How to eliminate wrong answers

Option A is wrong because an index on OrderDate alone lets SQL Server seek the January date range but then forces a key lookup on every order in that range, across all customers, just to check CustomerID. Option C is wrong because a clustered index on (OrderDate, CustomerID) would order the entire table by OrderDate first, making seeks on CustomerID inefficient and requiring a scan of all rows for that date range; also, changing the clustered index from the primary key (OrderID) could impact other queries and insert performance. Option D is wrong because an index on OrderDate DESC only suffers the same issue as Option A: it cannot efficiently locate rows for a specific CustomerID, resulting in a scan or bookmark lookup.
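
The key-column ordering can be tried out locally with SQLite from Python's standard library. This is only a sketch under stated assumptions: SQLite is not SQL Server, it has no INCLUDE clause, and its query plans read differently, but it shows the optimizer picking the composite index when the equality column leads. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "OrderDate TEXT, TotalAmount REAL)"
)
# Equality column first, then the range/sort column.
conn.execute(
    "CREATE INDEX IX_Orders_Customer_Date ON Orders (CustomerID, OrderDate DESC)"
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Orders "
    "WHERE CustomerID = ? AND OrderDate BETWEEN ? AND ? ORDER BY OrderDate DESC",
    (12345, "2025-01-01", "2025-01-31"),
).fetchall()
# The fourth column of each plan row holds the human-readable step description.
plan_text = " ".join(row[3] for row in plan)
```

Printing plan_text shows which index the engine chose; on SQL Server the equivalent check would be the actual execution plan.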

465

MCQmedium

An e-commerce company runs a data pipeline that reads all orders from the previous hour, aggregates total sales per product category, and writes the results to a reporting database. The pipeline executes at the start of every hour. Which type of data processing workload does this pipeline represent?

A.Batch processing

B.Stream processing

C.Transactional processing

D.Interactive processing

AnswerA

The pipeline processes a batch of data (hourly orders) on a schedule, which is batch processing.

Why this answer

This pipeline reads all orders from the previous hour, aggregates total sales per product category, and writes results to a reporting database at the start of every hour. This is a classic batch processing workload because data is collected over a fixed time window (one hour) and processed as a single, scheduled job, not continuously. Batch processing is ideal for non-real-time, high-volume data transformations like hourly sales aggregation.

Exam trap

The trap here is that candidates confuse scheduled batch processing with stream processing because both can handle time-windowed aggregations, but batch processes data in discrete, scheduled chunks while stream processes data continuously as it arrives.

How to eliminate wrong answers

Option B is wrong because stream processing handles data in real-time or near-real-time as it arrives, not on a fixed hourly schedule. Option C is wrong because transactional processing (OLTP) focuses on individual, atomic transactions (e.g., placing an order) and does not involve aggregating data over a time window. Option D is wrong because interactive processing involves user-driven queries or operations that return results immediately, not scheduled batch jobs.
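
A scheduled batch job of this kind can be sketched in a few lines of Python. The function name and sample orders are hypothetical; a real pipeline would read from and write to databases rather than in-memory lists.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_previous_hour(orders, run_time):
    """Total sales per category for the full hour before run_time's hour."""
    window_end = run_time.replace(minute=0, second=0, microsecond=0)
    window_start = window_end - timedelta(hours=1)
    totals = defaultdict(float)
    for placed_at, category, amount in orders:
        if window_start <= placed_at < window_end:
            totals[category] += amount
    return dict(totals)

orders = [
    (datetime(2025, 1, 1, 9, 15), "toys", 20.0),
    (datetime(2025, 1, 1, 9, 50), "toys", 5.0),
    (datetime(2025, 1, 1, 9, 59), "books", 12.5),
    (datetime(2025, 1, 1, 10, 1), "books", 99.0),  # belongs to the next batch
]
result = aggregate_previous_hour(orders, datetime(2025, 1, 1, 10, 0))
```

The job sees a closed, bounded set of records each time it runs, which is what separates it from a stream processor that handles each event as it arrives.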

466

MCQmedium

A global online gaming company needs a data store for player game session logs. Each log record has a SessionID (unique), PlayerID, GameID, StartTime, EndTime, and a JSON payload containing variable game state details. The company requires low-latency writes for millions of concurrent sessions and wants to query by PlayerID and time range. Schema flexibility is important because game state details change frequently. Which Azure data store should they choose?

A.Azure Cosmos DB with the NoSQL API

B.Azure Table Storage

C.Azure Blob Storage

D.Azure SQL Database

AnswerA

Supports flexible JSON schemas, high throughput, low-latency queries, and global distribution.

Why this answer

Azure Cosmos DB with the NoSQL API is the correct choice because it provides low-latency writes (single-digit milliseconds at the 99th percentile) for millions of concurrent sessions, supports schema-flexible JSON documents that can accommodate frequently changing game state payloads, and enables efficient queries by PlayerID and time range using a composite index or a partition key like PlayerID combined with a time-based sort order.

Exam trap

The trap here is that candidates often choose Azure Table Storage because they think it is 'NoSQL' and 'fast,' but they overlook that it stores only flat properties, has no native JSON support, and indexes only the partition and row keys, which makes querying variable game state payloads by PlayerID and time range awkward.

How to eliminate wrong answers

Option B is wrong because Azure Table Storage, although schemaless, stores entities as flat sets of typed properties with no native support for nested JSON, so the variable game state would have to be flattened or stored as an opaque string; it also indexes only PartitionKey and RowKey, which limits efficient queries. Option C is wrong because Azure Blob Storage is designed for unstructured binary or text data, not for low-latency, indexed queries by PlayerID and time range; it lacks native query capabilities and would require additional services like Azure Data Lake or external indexing. Option D is wrong because Azure SQL Database enforces a fixed relational schema, which cannot accommodate the frequently changing game state details without costly schema migrations, and its write throughput is limited compared to Cosmos DB's horizontal scaling for millions of concurrent sessions.

467

MCQhard

You need to upload a 500 GB file to Azure Blob Storage. The network connection is unreliable. Which feature should you use to ensure the upload completes successfully?

A.Configure a lifecycle management policy to tier the file.

B.Use the Put Block and Put Block List operations with a block size that handles retries.

C.Use Azure File Sync to synchronize the file to the cloud.

D.Use AzCopy with the /Z parameter to resume the upload.

AnswerB

Block-level upload allows resuming.

Why this answer

Option B is correct because the Put Block and Put Block List operations allow you to upload a large file as a series of individual blocks. Each block can be retried independently if the network fails, and the final Put Block List assembles the blocks into a single blob. This block-level retry mechanism ensures the upload completes despite an unreliable connection.

Exam trap

The trap here is that candidates focus on AzCopy's resume capability, which is a feature of a client tool, while the question asks for the storage feature that makes resilient uploads possible: staging blocks with Put Block and committing them with Put Block List, so that only failed blocks are retried.

How to eliminate wrong answers

Option A is wrong because lifecycle management policies are used to automatically tier or delete blobs based on age or access patterns, not to handle upload retries or reliability. Option C is wrong because Azure File Sync is designed for continuous synchronization of files between on-premises servers and Azure file shares, not for a one-time upload of a single large file with retry handling. Option D is wrong because AzCopy with the /Z parameter (replaced by the azcopy jobs resume command in AzCopy v10) is a client tool that resumes a failed transfer; it is not itself the Blob Storage feature that provides block-level retries through Put Block and Put Block List.
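
The block-level retry pattern can be simulated without any Azure dependency. The class and function below are hypothetical in-memory stand-ins that only borrow the operation names; they do not call the Azure Storage REST API or SDK.

```python
import random

class FlakyBlobStore:
    """In-memory stand-in for a blob service; staging fails at random."""
    def __init__(self, failure_rate, seed=0):
        self._rng = random.Random(seed)
        self._failure_rate = failure_rate
        self._staged = {}
        self.blob = None

    def put_block(self, block_id, data):
        if self._rng.random() < self._failure_rate:
            raise ConnectionError("simulated network drop")
        self._staged[block_id] = data

    def put_block_list(self, block_ids):
        # Commit: assemble the staged blocks in the order given.
        self.blob = b"".join(self._staged[i] for i in block_ids)

def upload(store, data, block_size, max_attempts=50):
    block_ids = []
    for offset in range(0, len(data), block_size):
        block_id = f"{offset:016d}"
        for _ in range(max_attempts):
            try:
                store.put_block(block_id, data[offset:offset + block_size])
                break  # this block is staged; earlier blocks are untouched
            except ConnectionError:
                continue  # retry only the failed block
        else:
            raise RuntimeError(f"block {block_id} failed {max_attempts} times")
        block_ids.append(block_id)
    store.put_block_list(block_ids)

store = FlakyBlobStore(failure_rate=0.5)
payload = bytes(range(256)) * 10
upload(store, payload, block_size=100)
```

A dropped connection costs at most one block's worth of retransmission, and nothing becomes visible as a blob until the final commit.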

468

MCQhard

An e-commerce company runs a transactional database on Azure SQL Database. During peak shopping seasons, they experience performance degradation due to high read traffic on product catalog tables. The company wants to offload read queries to a read-only copy of the database without affecting write performance. What should they implement?

A.Configure Active Geo-Replication to create a readable secondary replica.

B.Enable Read Scale-Out on the database.

C.Implement Azure Cache for Redis to cache product catalog data.

D.Create a failover group with a secondary region.

AnswerA

Active Geo-Replication provides readable secondaries that can handle read-only queries.

Why this answer

Option A is correct because Active Geo-Replication creates a readable secondary database, in the same or a different region, that can serve read-only queries without adding load to the primary. Option D (failover groups) is aimed at disaster recovery and coordinated failover rather than read scaling. Option B (Read Scale-Out) routes read-only connections to a built-in replica, but it is available only in certain service tiers (Premium, Business Critical, and Hyperscale), and the scenario does not state the tier.

Option C (Azure Cache for Redis) adds a cache but not a read-only copy of the database.

469

MCQhard

You are the database administrator for a global e-commerce company that uses Azure SQL Database. The company has a single database in the West US region. The database hosts a table named Orders with 500 million rows. The table is clustered on OrderId (uniqueidentifier). The most critical query is a daily report that aggregates total sales by product category for the previous day. This query currently takes over 30 minutes to run and causes performance degradation on the primary database. The report must be available by 6:00 AM local time each day, and the data must be no more than 24 hours old. You need to design a solution that minimizes impact on the transactional workload and improves report query performance. What should you do?

A.Partition the Orders table by OrderDate and create a partitioned view for the report

B.Add a nonclustered columnstore index on the primary database for the Orders table

C.Create a nonclustered index on the primary database covering the columns used in the report query

D.Create a readable secondary replica in the same region and run the report query against the replica

AnswerD

A readable secondary replica offloads the reporting workload from the primary, so the 30-minute report query no longer competes with transactions.

Why this answer

Option D is correct because running the report against a readable secondary replica moves the query off the primary, so it no longer competes with the transactional workload. Option C is wrong because a covering index on the primary adds write overhead and the report still runs on the primary. Option A is wrong because partitioning alone won't isolate the reporting workload.

Option B is wrong because a nonclustered columnstore index on the primary may speed up the aggregation, but the report still runs on the transactional database and the index adds maintenance cost to it.

470

MCQeasy

A company wants to run complex analytics queries across petabytes of data stored in Azure Data Lake Storage. They need a serverless option that supports T-SQL. Which Azure service should they use?

A.Azure SQL Database serverless

B.Azure Analysis Services

C.Azure Databricks

D.Azure Synapse Serverless SQL pool

AnswerD

Serverless SQL pool provides T-SQL interface over data in Data Lake Storage, with pay-per-query pricing.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it provides a serverless, on-demand query service that allows you to run T-SQL queries directly against data stored in Azure Data Lake Storage (ADLS). It supports complex analytics over petabytes of data without provisioning any infrastructure, and it uses T-SQL as the query language, meeting all the stated requirements.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'Azure SQL Database serverless' (Option A) because of the name, but fail to recognize that Azure SQL Database serverless is a transactional database, not a data lake query engine, and does not support querying external storage like ADLS with T-SQL.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database serverless is a serverless compute tier for a relational database, but it is designed for transactional workloads and does not natively query data stored in Azure Data Lake Storage; it requires data to be loaded into the database first. Option B is wrong because Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data modeling and semantic layers, but it does not support direct T-SQL queries against ADLS; it uses DAX or MDX and requires data to be imported or queried via a gateway. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform that supports SQL queries via Spark SQL, but it does not use T-SQL; it uses Spark SQL syntax and requires a cluster to be running, even if auto-terminating, making it not a true serverless T-SQL option.

471

MCQmedium

A retail company wants to analyze customer clickstream data in real-time to detect patterns and trigger personalized offers. They also store the raw clickstream data in Azure Data Lake Storage for later batch analysis. Which Azure service should they use for the real-time processing component?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Batch

D.Azure Data Lake Analytics

AnswerB

Azure Stream Analytics processes streaming data in real time using SQL-like queries, making it suitable for real-time analytics and event-driven responses.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time data processing and analytics on streaming data, such as clickstream events. It can ingest data from sources like Azure Event Hubs, apply SQL-like queries to detect patterns, and output results to triggers or storage, all with sub-second latency. This matches the requirement for real-time pattern detection and personalized offer triggering.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a batch ETL tool) with real-time processing, or they assume Azure Data Lake Analytics can handle streaming data because it works with Data Lake Storage, but it is strictly a batch service.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data integration service for orchestrating and moving data between stores, but it does not perform real-time stream processing; it operates on scheduled or event-driven batch pipelines. Option C is wrong because Azure Batch is a job scheduling and compute management service for running large-scale parallel and high-performance computing (HPC) workloads, not for real-time stream analytics. Option D is wrong because Azure Data Lake Analytics is a batch analytics service that uses U-SQL to process data stored in Azure Data Lake Storage, but it is not designed for real-time or streaming data processing.
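
To make the idea of SQL-like queries over a stream more concrete, the following is a minimal Python sketch of a tumbling-window count, one kind of aggregation a streaming engine applies continuously. It is an illustration only, not Stream Analytics code: the event tuples, the `tumbling_window_counts` name, and the 10-second window are all assumptions made for this example.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=10):
    """Count events per (window start, page) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, page) tuples. The tuple shape
    and the 10-second default are hypothetical choices for this sketch.
    """
    counts = Counter()
    for timestamp, page in events:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, page)] += 1
    return dict(counts)

# A short, finite stand-in for a clickstream; a real stream never ends.
clicks = [(1, "shoes"), (4, "shoes"), (9, "hats"), (12, "shoes"), (19, "shoes")]
window_counts = tumbling_window_counts(clicks)
```

A downstream rule could then trigger an offer when a page's count within one window crosses a threshold.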

472

MCQmedium

A media company stores raw video footage as blobs in Azure Blob Storage. After processing, the raw footage is kept for compliance purposes and is accessed only a few times per year. The company wants to minimize storage costs while ensuring the data is durable and can be restored within 24 hours if needed. Which Azure Blob Storage access tier should they use?

A.Hot

B.Cool

C.Cold

D.Archive

AnswerD

The Archive tier offers the lowest storage cost for data that is rarely accessed and can tolerate a retrieval latency of up to 15 hours. This matches the company's requirements of access only a few times per year and a 24-hour recovery window.

Why this answer

The Archive tier is the correct choice because it offers the lowest storage cost for data that is rarely accessed (a few times per year) and can tolerate a retrieval latency of up to 15 hours, which is well within the 24-hour restoration requirement. Azure Blob Storage's Archive tier is designed for long-term retention, compliance, and backup scenarios where durability is maintained through geo-redundant replication options, and data can be rehydrated to an online tier (e.g., Hot or Cool) within the specified time frame.

Exam trap

The trap here is that candidates often mistake the Cold tier for the cheapest option. Cold is a separate online tier: its storage costs more than Archive, but its data can be read immediately. Archive is the lowest-cost tier for data that can tolerate a rehydration delay of hours, which fits the 24-hour restore window.

How to eliminate wrong answers

Option A is wrong because the Hot tier is optimized for frequent access and has the highest storage cost, making it unsuitable for data accessed only a few times per year. Option B is wrong because the Cool tier is designed for data accessed infrequently (e.g., every 30 days); it costs more to store than Archive, and its immediate access is something a 24-hour restore window does not need. Option C is wrong because the Cold tier, while cheaper than Cool, is still more expensive than Archive and is intended for data accessed roughly once every 90 days, not for data accessed only a few times per year with a 24-hour retrieval tolerance.
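
The elimination logic above can be sketched as a small rule-of-thumb function. This is a study aid under stated assumptions, not Microsoft guidance: the function name `suggest_access_tier` is hypothetical, the 30-day and 90-day thresholds are taken from the explanation above, and the 15-hour figure is the rehydration latency quoted for Archive.

```python
def suggest_access_tier(days_between_reads, max_restore_hours):
    """Suggest a Blob Storage access tier from rough access and restore needs.

    Thresholds are illustrative assumptions taken from the explanation text,
    not official sizing rules.
    """
    if days_between_reads >= 90:
        # Archive is offline; rehydration can take up to about 15 hours,
        # so it only fits when the restore window is at least that long.
        return "Archive" if max_restore_hours >= 15 else "Cold"
    if days_between_reads >= 30:
        return "Cool"
    return "Hot"

# The scenario: read a few times per year, restore within 24 hours.
compliance_footage_tier = suggest_access_tier(days_between_reads=120, max_restore_hours=24)
```

With the scenario's inputs the sketch returns "Archive"; shrinking the restore window to one hour would push the same data to "Cold".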

473

MCQeasy

A retail company stores years of historical sales data in Azure Data Lake Storage Gen2 as Parquet files. Business analysts need to run complex SQL queries over this data to identify sales trends, and they want to visualize the results in Power BI dashboards. They prefer to avoid moving data into a separate database to minimize storage costs and latency. Which Azure service should they use to query the data directly in the lake?

A.Azure Synapse Analytics

B.Azure SQL Database

C.Azure Data Factory

D.Azure Analysis Services

AnswerA

Correct. The serverless SQL pool in Azure Synapse Analytics can query Parquet files directly in the data lake, supports T-SQL, and integrates with Power BI.

Why this answer

Azure Synapse Analytics provides the serverless SQL pool capability that allows you to query data directly in Azure Data Lake Storage Gen2 using T-SQL without moving or copying the data. This enables business analysts to run complex SQL queries over Parquet files in the lake and connect the results to Power BI for visualization, minimizing storage costs and latency by avoiding a separate database.

Exam trap

The trap here is that candidates may confuse Azure Data Factory as a query service because it can transform data, but it is an orchestration tool, not an interactive SQL query engine for ad-hoc analysis.

How to eliminate wrong answers

Option B (Azure SQL Database) is wrong because it requires importing data into a relational database, which incurs additional storage costs and latency from data movement, contradicting the requirement to query data directly in the lake. Option C (Azure Data Factory) is wrong because it is an ETL and data orchestration service, not a query engine; it cannot run interactive SQL queries directly against Parquet files in the lake. Option D (Azure Analysis Services) is wrong because it is a semantic modeling and OLAP engine that requires data to be loaded into its in-memory cache from a source, not a direct query service for data in the lake.

474

Multi-Selecteasy

A car manufacturing company has two data processing systems: one system processes real-time sensor data from assembly lines to immediately detect equipment failures, and another system processes historical production records to generate monthly efficiency reports. Which two types of data processing workloads best describe these systems?

Select 2 answers

A.Stream processing and batch processing

B.OLTP and OLAP

C.Online processing and offline processing

D.Transactional processing and analytical processing

AnswersA, D

Correct. Real-time sensor analysis is stream processing; historical reports are batch processing.

Why this answer

Stream processing handles real-time sensor data to detect equipment failures immediately, as it processes data continuously with low latency. Batch processing is ideal for historical production records to generate monthly efficiency reports, as it processes large volumes of data at scheduled intervals. These two workloads directly match the definitions of stream and batch processing in Azure data services like Azure Stream Analytics and Azure Synapse Analytics.

Exam trap

Microsoft often tests the distinction between stream/batch and OLTP/OLAP by making candidates confuse real-time transaction processing (OLTP) with real-time stream processing, but OLTP is for individual record updates, not continuous sensor data streams.
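
The difference between the two workloads can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names and made-up readings: the batch function runs once over data that is already stored, while the stream function reacts to each reading as it arrives.

```python
def batch_monthly_average(records):
    """Batch: run once, on a schedule, over the full stored data set."""
    return sum(records) / len(records)

def stream_failure_alerts(readings, limit):
    """Stream: inspect each reading as it arrives and yield an alert at once."""
    for position, value in enumerate(readings):
        if value > limit:
            yield position, value

# Hypothetical sample data for the sketch.
historical_output = [90, 110, 100]           # stored production records
live_temperatures = iter([70, 72, 131, 71])  # stand-in for a live sensor feed

average = batch_monthly_average(historical_output)
alerts = list(stream_failure_alerts(live_temperatures, limit=120))
```

The batch result is only available after the whole data set has been read, whereas the alert for the third reading is produced before the fourth reading is even consumed.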

475

MCQmedium

A company is migrating a 2-TB on-premises SQL Server database to Azure. The database uses SQL Server Agent jobs for scheduled maintenance, relies on linked servers to query data from another SQL Server instance, and requires cross-database queries within the same instance. The company wants a fully managed PaaS service that minimizes application code changes and provides automatic backups and patching. Which Azure SQL service should they choose?

A.Azure SQL Database (Single Database)

B.Azure SQL Database (Elastic Pool)

C.Azure SQL Managed Instance

D.SQL Server on Azure Virtual Machine

AnswerC

Azure SQL Managed Instance provides built-in support for SQL Server Agent, linked servers, and cross-database queries, with automatic backups and patching, while being fully managed PaaS.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs, linked servers, and cross-database queries within the same instance. As a fully managed PaaS service, it offers automatic backups, patching, and high availability while minimizing application code changes, unlike Azure SQL Database which lacks instance-scoped features.

Exam trap

The trap here is that candidates often confuse Azure SQL Database (which is database-level PaaS) with Azure SQL Managed Instance (which is instance-level PaaS), overlooking that linked servers, Agent jobs, and cross-database queries require instance-scoped functionality not available in Azure SQL Database.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database (Single Database) does not support SQL Server Agent jobs, linked servers, or cross-database queries within the same instance; it is a database-level PaaS service, not instance-scoped. Option B is wrong because Azure SQL Database (Elastic Pool) shares the same limitations as Single Database—it still lacks instance-level features like Agent jobs and linked servers, and only provides resource pooling for multiple databases. Option D is wrong because SQL Server on Azure Virtual Machine is an IaaS service, not fully managed PaaS; it requires the customer to manage patching, backups, and high availability, contradicting the requirement for a fully managed service.

476

MCQmedium

A startup is building a global user session store. Each session consists of a simple key (session ID) and a value (user data as a JSON string). The application requires low-latency reads and writes from any Azure region, and the data must be durable. Which Azure service is best suited for this scenario?

A.Azure Cosmos DB (Table API)

B.Azure Table Storage

C.Azure Redis Cache

D.Azure Blob Storage

AnswerA

Cosmos DB with Table API provides a globally distributed, low-latency, fully managed key-value store. It supports automatic scaling and multi-region writes, fitting the startup's requirements.

Why this answer

Azure Cosmos DB (Table API) is the best fit because it provides global, multi-region writes with tunable consistency, guaranteed single-digit-millisecond latency for reads and writes at the 99th percentile, and full durability with automatic replication across any number of Azure regions. The Table API offers a key-value store interface (session ID as partition key, JSON value) while also supporting schema flexibility and SLA-backed performance, which is critical for a global user session store.

Exam trap

The trap here is that candidates often confuse Azure Table Storage (a simple, regional key-value store) with Azure Cosmos DB Table API (a globally distributed, low-latency, SLA-backed service), assuming both offer the same global performance and durability, when in fact only Cosmos DB provides multi-region writes and guaranteed latency.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because it is a regional service that does not natively support multi-region writes or global distribution with low-latency reads from any region; it also lacks the SLA-guaranteed single-digit-millisecond latency that Cosmos DB provides. Option C (Azure Redis Cache) is wrong because it is an in-memory cache that is not durable by default (data can be lost on node failure unless Redis persistence is enabled, which still sacrifices performance) and does not offer the same durability guarantees as a fully managed database service. Option D (Azure Blob Storage) is wrong because it is designed for large, unstructured binary objects (blobs) and does not provide a low-latency key-value API for simple session lookups; its read/write latency is significantly higher than Cosmos DB or Redis, making it unsuitable for real-time session access.

477

MCQhard

A company has an Azure SQL Database with an 'Orders' table containing millions of rows. The table has a clustered index on OrderID (primary key). Queries frequently filter by CustomerID (equality) and OrderDate (range). These queries are slow and cause high logical reads. Which index strategy will most improve performance for these specific queries?

A.Create a non-clustered index on (CustomerID, OrderDate).

B.Rebuild the clustered index on (OrderDate, CustomerID).

C.Create a non-clustered index on OrderDate.

D.Create a filtered index on OrderDate for recent dates.

AnswerA

This composite index matches the query pattern exactly: it allows index seek on CustomerID and then range scan on OrderDate, providing optimal performance.

Why this answer

A non-clustered index on (CustomerID, OrderDate) matches queries that filter by CustomerID (equality) and OrderDate (range). It allows SQL Server to perform an index seek on CustomerID, then a range scan on OrderDate. If the index also covers the selected columns (for example through INCLUDE columns), the query never touches the clustered index; otherwise each matching row needs a key lookup. Either way, logical reads drop dramatically compared to a full clustered index scan.

Exam trap

The trap here is that candidates often think a filtered index or a single-column index is sufficient, but they overlook that the query has both an equality and a range predicate, requiring a composite index that supports both in the correct order (equality first, range second) to achieve optimal seek + range scan performance.

How to eliminate wrong answers

Option B is wrong because a table can have only one clustered index, so re-clustering on (OrderDate, CustomerID) means replacing the OrderID clustered index and rebuilding the table along with its non-clustered indexes; the new key would also lead with the range column, so an efficient equality seek on CustomerID would still not be possible. Option C is wrong because a single-column index on OrderDate only supports range scans on date, but still requires key lookups to filter by CustomerID, leading to high logical reads. Option D is wrong because a filtered index on OrderDate for recent dates only helps queries with a date predicate on recent rows; it does not support the CustomerID equality filter and misses older data, so it is not a general solution for the described workload.
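
The effect of a composite index can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in, since SQLite is not SQL Server and its planner and plan output differ; the table layout, the sample data, and the index name `ix_orders_customer_date` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT, total REAL)"
)
# 1,000 made-up orders spread over 50 customers and 28 days.
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, total) VALUES (?, ?, ?)",
    [(i % 50, f"2024-01-{(i % 28) + 1:02d}", 10.0) for i in range(1000)],
)
# Composite index: equality column first, range column second.
conn.execute("CREATE INDEX ix_orders_customer_date ON orders (customer_id, order_date)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_id, total FROM orders "
    "WHERE customer_id = ? AND order_date BETWEEN ? AND ?",
    (7, "2024-01-05", "2024-01-20"),
).fetchall()
# The last column of each plan row holds the human-readable step description.
plan_text = " ".join(row[3] for row in plan_rows)
print(plan_text)
```

The printed plan should name the composite index in a search step rather than a full table scan. In SQL Server the equivalent check would be the actual execution plan and the logical-reads counter.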

478

MCQmedium

A data analytics team stores sales transaction data in Parquet files in Azure Data Lake Storage Gen2. They want to run complex analytical queries that join this data with dimension tables stored in Azure Synapse Analytics dedicated SQL pool. The team prefers not to move or copy the data from the data lake. Which feature should they use to query the data lake data directly?

A.Azure Data Factory pipelines

B.PolyBase external tables

C.Azure Stream Analytics

D.Azure Databricks notebooks

AnswerB

PolyBase enables Synapse to create external tables that query data in the data lake without moving it.

Why this answer

PolyBase external tables in Azure Synapse Analytics dedicated SQL pool allow you to query data stored in Azure Data Lake Storage Gen2 (ADLS Gen2) directly using T-SQL, without moving or copying the data. This is the correct feature because it enables complex analytical joins between the Parquet files in the data lake and the dimension tables in the dedicated SQL pool, leveraging the external table's ability to read Parquet format natively.

Exam trap

The trap here is that candidates often confuse PolyBase with Azure Data Factory pipelines, thinking that any query across data lake and Synapse requires a data movement pipeline, but PolyBase provides direct T-SQL querying without copying data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory pipelines are used for data movement, orchestration, and transformation, not for directly querying data in place; they would require copying or moving data to run queries. Option C is wrong because Azure Stream Analytics is a real-time stream processing service for analyzing streaming data (e.g., from IoT devices or event hubs), not for querying static Parquet files in a data lake. Option D is wrong because Azure Databricks notebooks are an interactive analytics environment that can query data in ADLS Gen2, but they require a separate compute cluster and do not integrate directly with Synapse dedicated SQL pool for T-SQL-based joins with dimension tables; they are not the native Synapse feature for in-place querying.

479

MCQhard

A retail company uses Azure SQL Database to store inventory data. They notice excessive blocking and deadlocks during peak hours. Which design change would best reduce these issues?

A.Implement read replicas for reporting queries

B.Use the READ UNCOMMITTED isolation level

C.Add appropriate indexes to reduce lock duration

D.Scale up to a higher service objective

AnswerC

Indexes reduce scan times, thereby shortening lock duration and reducing contention.

Why this answer

Adding appropriate indexes reduces the number of rows scanned during queries, which shortens lock duration and lowers the chance of blocking and deadlocks. In Azure SQL Database, indexes help queries become more efficient by using seeks instead of scans, minimizing the time locks are held on resources.

Exam trap

The trap here is that candidates often confuse scaling up (more resources) with performance tuning, but the DP-900 exam tests understanding that blocking and deadlocks are primarily caused by inefficient query execution, not insufficient hardware.

How to eliminate wrong answers

Option A is wrong because read replicas offload reporting traffic but do not reduce blocking or deadlocks on the primary database; they only separate read workloads. Option B is wrong because READ UNCOMMITTED avoids blocking by reading dirty data but does not reduce deadlocks or blocking for write operations, and it introduces data consistency issues. Option D is wrong because scaling up to a higher service objective increases resources (CPU, IO, memory) but does not address the root cause of inefficient queries that cause long-held locks.

480

MCQmedium

You are designing a data pipeline that ingests sales transactions from an on-premises SQL Server database into Azure Synapse Analytics for reporting. The data must be processed incrementally every hour with minimal latency. Which Azure service should you use to orchestrate the pipeline?

A.Azure Logic Apps

B.Azure Databricks

C.Azure Functions

D.Azure Data Factory

AnswerD

Azure Data Factory is purpose-built for ETL and data orchestration, supporting incremental loads from on-premises.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data orchestration service designed specifically for building complex, schedule-driven pipelines. It natively supports incremental data loading from on-premises SQL Server via self-hosted integration runtime, and can trigger pipelines on an hourly schedule with minimal latency, making it ideal for this scenario.

Exam trap

The trap here is that candidates confuse orchestration services with compute or processing services, assuming Azure Databricks or Azure Functions can handle scheduling and data movement, when in fact Azure Data Factory is the dedicated PaaS orchestrator for such pipelines.

How to eliminate wrong answers

Option A is wrong because Azure Logic Apps is a workflow automation service for integrating apps and services, not designed for heavy data movement or complex ETL orchestration; it lacks native support for self-hosted integration runtime and incremental data loading from on-premises databases. Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform for big data processing and machine learning, not a pipeline orchestration service; while it can process data, it requires additional tooling for scheduling and orchestration. Option C is wrong because Azure Functions is a serverless compute service for running event-driven code, not a data pipeline orchestrator; it lacks built-in connectors for on-premises SQL Server and does not provide scheduling or monitoring capabilities for complex data movement.
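
Incremental loading is commonly implemented with a high-watermark pattern, which can be sketched in plain Python. This is a generic illustration, not the Azure Data Factory API: the `incremental_load` function, the `modified` field, and the sample rows are all hypothetical.

```python
def incremental_load(source_rows, last_watermark):
    """Return rows modified after the watermark, plus the advanced watermark."""
    new_rows = [row for row in source_rows if row["modified"] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((row["modified"] for row in new_rows), default=last_watermark)
    return new_rows, new_watermark

# Hypothetical source table; `modified` stands in for a last-updated timestamp.
source = [
    {"id": 1, "modified": 100},
    {"id": 2, "modified": 160},
    {"id": 3, "modified": 220},
]

# First hourly run picks up only rows changed since watermark 150.
first_batch, watermark = incremental_load(source, last_watermark=150)
# A second run with no new changes copies nothing.
second_batch, watermark_after = incremental_load(source, last_watermark=watermark)
```

Storing the watermark between runs is what keeps each hourly run small: only the rows changed since the previous run are moved.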

481

MCQmedium

A global e-commerce platform uses Azure Cosmos DB to store product inventory data. Customers add items to their cart, which reduces the available inventory count. The application requires that after a customer adds an item, any subsequent read of that product's inventory from any region in the world must reflect the reduced count immediately. Which Cosmos DB consistency level should be used?

A.Eventual consistency

B.Consistent prefix consistency

C.Session consistency

D.Strong consistency

AnswerD

Correct. Strong consistency provides linearizability, ensuring every read sees the most recent write globally.

Why this answer

Strong consistency ensures that any read operation returns the most recent write, regardless of the region. Since the application requires that after a customer adds an item, any subsequent read of that product's inventory from any region must reflect the reduced count immediately, Strong consistency is the only level that guarantees linearizability and zero staleness across all replicas.

Exam trap

The trap here is that candidates often assume Session consistency is sufficient because it provides 'read your writes' within a session, but the question explicitly requires immediate global visibility for any subsequent read from any region, which only Strong consistency can guarantee.

How to eliminate wrong answers

Option A is wrong because Eventual consistency allows reads to return stale data for an unbounded period, which would not guarantee immediate visibility of the reduced inventory count. Option B is wrong because Consistent prefix consistency only guarantees that reads never see out-of-order writes, but it does not guarantee that the read returns the latest write; stale data can still be returned. Option C is wrong because Session consistency guarantees monotonic reads and writes only within the context of a single client session; other clients or regions outside the session could still see stale data.
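
The contrast between strong and eventual consistency can be illustrated with a toy two-region store. This is a deliberately simplified sketch, not the Cosmos DB API or its replication protocol: the class `ToyReplicatedStore`, the region names, and the explicit `replicate` step are assumptions made for illustration.

```python
class ToyReplicatedStore:
    """Two-region toy store used only to contrast consistency behaviors."""

    def __init__(self, strong, initial):
        self.strong = strong
        self.regions = {"east": dict(initial), "west": dict(initial)}
        self.pending = []  # writes not yet copied to other regions

    def write(self, region, key, value):
        self.regions[region][key] = value
        for other in self.regions:
            if other == region:
                continue
            if self.strong:
                # Strong: every region is updated before the write returns.
                self.regions[other][key] = value
            else:
                # Eventual: replication to other regions is deferred.
                self.pending.append((other, key, value))

    def replicate(self):
        """Apply deferred writes, as background replication eventually would."""
        for region, key, value in self.pending:
            self.regions[region][key] = value
        self.pending.clear()

    def read(self, region, key):
        return self.regions[region][key]

strong_store = ToyReplicatedStore(strong=True, initial={"sku-1": 10})
strong_store.write("east", "sku-1", 9)
strong_read = strong_store.read("west", "sku-1")

eventual_store = ToyReplicatedStore(strong=False, initial={"sku-1": 10})
eventual_store.write("east", "sku-1", 9)
stale_read = eventual_store.read("west", "sku-1")
eventual_store.replicate()
caught_up_read = eventual_store.read("west", "sku-1")
```

In the strong store the other region sees the reduced count of 9 immediately; in the eventual store it still reads 10 until replication catches up, which is exactly the staleness the scenario rules out.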

482

MCQmedium

A data engineering team needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Apache Spark with Python code. They want a fully managed service that provides serverless Spark pools, meaning no clusters to manage and automatic scaling. Which Azure service should they use?

A.Azure HDInsight

B.Azure Databricks

C.Azure Synapse Analytics with serverless Spark pools

D.Azure Machine Learning

AnswerC

Correct. Azure Synapse Analytics provides serverless Apache Spark pools that scale automatically and charge per use.

Why this answer

Azure Synapse Analytics with serverless Spark pools is the correct choice because it provides a fully managed, serverless Apache Spark environment that automatically scales and eliminates the need to manage clusters. This service directly supports transforming large datasets in Azure Data Lake Storage Gen2 using Python code with Spark, meeting the team's requirement for a no-cluster-management, auto-scaling solution.

Exam trap

The trap here is that candidates often confuse Azure Databricks as the only serverless Spark option, but Azure Synapse Analytics also offers serverless Spark pools that are fully managed and integrated with Azure Data Lake Storage Gen2, making it the correct answer for this specific scenario.

How to eliminate wrong answers

Option A is wrong because Azure HDInsight requires manual cluster management and provisioning, not serverless; it is a managed Hadoop/Spark service but still involves cluster lifecycle management. Option B is wrong because Azure Databricks, while offering serverless Spark, is a separate platform with its own workspace and pricing model, not the native Azure Synapse Analytics serverless Spark pool that integrates directly with Azure Data Lake Storage Gen2. Option D is wrong because Azure Machine Learning is focused on building, training, and deploying machine learning models, not on general-purpose data transformation with Apache Spark.

483

MCQmedium

A company is migrating a 1.5 TB on-premises SQL Server database to Azure. The database relies on SQL Server Agent jobs for daily ETL processes and uses linked servers to query data from another on-premises SQL Server database. The company wants a fully managed PaaS service that requires minimal application changes. Which Azure SQL service should they choose?

A.Azure SQL Database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machines

D.Azure Synapse Analytics

AnswerB

Correct. Azure SQL Managed Instance supports SQL Server Agent, linked servers, and provides high compatibility with on-premises SQL Server while being a fully managed PaaS service.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs and linked servers, while being a fully managed PaaS service. This minimizes application changes, as the migration can leverage the existing database code and features without significant rework.

Exam trap

The trap here is that candidates often choose Azure SQL Database because it is the most well-known PaaS option, failing to recognize that its lack of SQL Server Agent and linked server support would require significant application changes, which the question explicitly wants to minimize.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a PaaS service but lacks support for SQL Server Agent jobs and linked servers, requiring significant application changes to replace these features with alternatives like elastic jobs or external tables. Option C is wrong because SQL Server on Azure Virtual Machines is an IaaS service, not fully managed PaaS, requiring the company to manage the VM and SQL Server, including patching and backups, which contradicts the requirement for minimal management overhead. Option D is wrong because Azure Synapse Analytics is a distributed analytics service designed for large-scale data warehousing and big data workloads, not for transactional OLTP workloads with SQL Server Agent jobs and linked servers, and would require major application redesign.

484

MCQeasy

A manufacturing company collects sensor data from equipment on the factory floor. The data is generated continuously and must be processed immediately to detect anomalies and trigger alerts. Which type of data processing workload best describes this scenario?

A.Batch processing

B.Stream processing

C.Transactional processing

D.Analytical processing

AnswerB

Stream processing handles data in real time as it arrives, making it suitable for scenarios requiring immediate alerts and actions.

Why this answer

B is correct because the scenario requires continuous data ingestion and immediate processing to detect anomalies and trigger alerts, which is the defining characteristic of stream processing. Technologies like Azure Stream Analytics or Apache Kafka are designed to handle unbounded data streams with low-latency processing, unlike batch processing which operates on static datasets at scheduled intervals.

Exam trap

The trap here is that candidates confuse 'stream processing' with 'batch processing' because both can involve large volumes of data, but the key differentiator is the requirement for immediate, continuous processing versus scheduled, deferred processing.

How to eliminate wrong answers

Option A is wrong because batch processing processes data in large, discrete chunks at scheduled times, which cannot meet the 'immediately' requirement for real-time anomaly detection. Option C is wrong because transactional processing focuses on ACID-compliant operations for individual transactions (e.g., order entry), not continuous sensor data streams. Option D is wrong because analytical processing typically involves historical data aggregation and reporting (e.g., OLAP cubes), not real-time event-driven alerting.

485

MCQhard

A company uses Azure SQL Database for its e-commerce platform. During a traffic spike, queries against the Orders table become slow. The table has 10 million rows and is clustered on OrderId. The most common query filters by CustomerId and OrderDate range. Which index change would most improve performance?

A.Create a clustered index on CustomerId

B.Partition the table by OrderId

C.Create a nonclustered index on (CustomerId, OrderDate)

D.Create a nonclustered index on (OrderDate, CustomerId)

AnswerC

This index supports efficient seek on CustomerId and range scan on OrderDate, directly addressing the slow query.

Why this answer

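Option C is correct because a nonclustered index on (CustomerId, OrderDate) leads with the equality column and follows with the range column, so the engine can seek to one customer and then scan only that customer's date range. Option A is wrong because the table is already clustered on OrderId and can have only one clustered index; re-clustering on CustomerId would mean rebuilding the whole table, and the new key would still not include OrderDate. Option B is wrong because partitioning by OrderId does not help queries that filter by CustomerId. Option D is wrong because a nonclustered index on (OrderDate, CustomerId) leads with the range column, so the engine must read every entry in the date range and filter by CustomerId afterwards.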
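
Why column order matters in a composite index can be sketched with sorted lists and binary search. This is an analogy, not how SQL Server stores its B-trees: the sample data, the integer "days", and the `entries_scanned` helper are assumptions made to keep the example small.

```python
from bisect import bisect_left

# 5 customers x 10 order days; days are plain integers to keep the sketch small.
orders = [(customer, day) for customer in range(1, 6) for day in range(1, 11)]

def entries_scanned(index, low_key, high_key):
    """Number of index entries in the contiguous slice [low_key, high_key)."""
    return bisect_left(index, high_key) - bisect_left(index, low_key)

# Two candidate "indexes": the same entries, sorted by different key orders.
customer_first = sorted(orders)                                    # (customer, day)
date_first = sorted((day, customer) for customer, day in orders)   # (day, customer)

# Query: customer = 3 AND day BETWEEN 4 AND 6, which matches 3 rows.
scanned_customer_first = entries_scanned(customer_first, (3, 4), (3, 7))
# With the date leading, the slice must span days 4..6 for every customer.
scanned_date_first = entries_scanned(date_first, (4,), (7,))
```

With the equality column first, the matching rows form one contiguous slice of 3 entries. With the date first, the slice covers 15 entries, of which 12 belong to other customers and are discarded.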

486

MCQmedium

A company uses Azure SQL Database for an e-commerce application. They need to ensure that if the primary region fails, the database can be failed over to a secondary region with minimal data loss. Which feature should they enable?

A.Active geo-replication

B.Point-in-time restore

C.Auto-failover groups

D.Read-access geo-redundant storage (RA-GRS)

AnswerA

Active geo-replication provides a readable secondary in another region with low RPO.

Why this answer

Option A is correct because active geo-replication maintains an asynchronously replicated, readable secondary database in a different region and supports failover with a low recovery point objective (RPO) of about 5 seconds. Option B is wrong because point-in-time restore recovers a database to an earlier moment from backups; it is not a regional failover mechanism. Option C is wrong because auto-failover groups build on active geo-replication and add coordinated failover, but the core feature that provides minimal data loss is geo-replication. Option D is wrong because read-access geo-redundant storage (RA-GRS) is a storage redundancy option, relevant here to backups, not to live database failover.

487

MCQeasy

A retail company receives a continuous stream of customer orders from their website via Azure Event Hubs. They also receive daily inventory updates from suppliers as CSV files uploaded to Azure Blob Storage. The company needs to calculate real-time order fulfillment availability by joining the streaming orders with the latest inventory snapshot. Additionally, they generate nightly sales reports from historical order data. Which Azure service should they use for the real-time processing component?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Synapse Pipelines

AnswerB

Azure Stream Analytics is a real-time analytics service that can process streaming data from sources like Event Hubs, join with reference data, and output results with sub-minute latency.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time data processing, allowing you to join streaming data from Event Hubs with static or reference data (like the latest inventory snapshot from Blob Storage) using SQL-like queries. This enables the calculation of real-time order fulfillment availability as orders arrive, which is the core requirement.

Exam trap

The trap here is that candidates often choose Azure Databricks because they associate it with 'real-time' processing, but Stream Analytics is the simpler, more cost-effective, and purpose-built service for this exact pattern of joining streaming data with static reference data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an orchestration and ETL service for batch data movement and transformation, not for real-time stream processing. Option C is wrong because Azure Databricks is a big data analytics platform that can handle both batch and streaming workloads, but it is overkill for this specific real-time join scenario and requires more complex setup (e.g., Spark Structured Streaming) compared to the simpler, purpose-built Stream Analytics. Option D is wrong because Azure Synapse Pipelines are used for orchestrating data movement and transformation within Azure Synapse Analytics, primarily for batch workloads, not real-time stream processing.
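
The stream-to-reference join described above can be illustrated with a minimal Python sketch. This is not Stream Analytics query syntax; it only mimics the pattern of enriching each arriving order with a static lookup. The names `inventory_snapshot`, `check_availability`, and the sample orders are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical reference data: latest inventory snapshot keyed by product ID.
inventory_snapshot: Dict[str, int] = {"P1": 5, "P2": 0}


def check_availability(
    orders: Iterable[Tuple[str, str, int]],
    inventory: Dict[str, int],
) -> Iterator[Tuple[str, bool]]:
    """Yield (order_id, can_fulfil) for each order as it arrives."""
    for order_id, product_id, quantity in orders:
        # Look up the reference data; unknown products count as out of stock.
        in_stock = inventory.get(product_id, 0)
        yield order_id, in_stock >= quantity


# Hypothetical stream of (order_id, product_id, quantity) events.
order_stream = [("o1", "P1", 2), ("o2", "P2", 1), ("o3", "P9", 1)]
results = list(check_availability(order_stream, inventory_snapshot))
print(results)
```

In a real job the reference data would be refreshed from Blob Storage on a schedule, which this sketch does not model.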

488

MCQhard

A company runs a financial application on Azure SQL Database. The Transactions table has a clustered columnstore index to support fast analytical queries on large historical datasets. However, the application also ingests a high volume of new transactions each second, and the columnstore index is causing performance degradation for these real-time inserts. The workload is hybrid (OLTP and OLAP). Which feature should the company implement to improve insert performance while still enabling efficient analytical queries on the table?

A.In-memory OLTP

B.Elastic Query

C.Hyperscale service tier

D.Convert the table to a rowstore heap with a nonclustered columnstore index

AnswerD

A nonclustered columnstore index on a rowstore table allows efficient OLTP inserts into the rowstore while the columnstore index periodically processes batches for analytical performance, achieving a balanced hybrid workload.

Why this answer

Option D is correct because converting the table to a rowstore heap with a nonclustered columnstore index allows the table to handle high-volume singleton inserts efficiently (rowstore heap) while still enabling fast analytical queries via the nonclustered columnstore index. This hybrid approach separates the OLTP insert path from the OLAP read path, avoiding the overhead of maintaining a clustered columnstore index during real-time ingestion.

Exam trap

The trap here is that candidates often assume a clustered columnstore index is always the best choice for analytical queries, overlooking the significant insert performance penalty it imposes on high-volume OLTP workloads, and fail to recognize that a nonclustered columnstore index on a heap can provide the same analytical benefits without the insert bottleneck.

How to eliminate wrong answers

Option A is wrong because In-memory OLTP is designed to accelerate OLTP transactions by storing tables in memory, but it does not directly address the performance degradation caused by a clustered columnstore index during inserts; it would require redesigning the table as a memory-optimized table and does not inherently support columnstore analytics. Option B is wrong because Elastic Query is used to query data across multiple Azure SQL databases or external data sources, not to improve insert performance on a single table. Option C is wrong because the Hyperscale service tier provides scalable storage and compute for large databases, but it does not change the fundamental behavior of a clustered columnstore index; inserts into a clustered columnstore index still incur overhead from delta store management and compression.

489

MCQmedium

A database system must ensure that when a transfer of funds between two accounts is processed, if the system crashes after debiting the first account but before crediting the second, the database automatically undoes the debit. This property is best described as:

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures that all operations in a transaction complete or none do; a crash triggers an automatic rollback, undoing the partial debit.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. If the system crashes after debiting one account but before crediting the other, the database's transaction log records the partial changes, and during recovery, the database engine (e.g., SQL Server's ARIES recovery model) performs an automatic rollback of the uncommitted transaction, undoing the debit to maintain atomicity.

Exam trap

The trap here is that candidates confuse atomicity with consistency, thinking that maintaining a correct total balance (consistency) is what undoes the debit, but atomicity is the property that specifically handles the rollback of incomplete transactions after a crash.

How to eliminate wrong answers

Option B is wrong because consistency ensures that a transaction brings the database from one valid state to another, enforcing integrity constraints (e.g., total balance remains constant), but it does not inherently handle crash recovery or undo partial changes. Option C is wrong because isolation controls how concurrent transactions interact (e.g., via locking or snapshot isolation), preventing dirty reads or lost updates, but it does not address crash recovery or rollback of incomplete transactions. Option D is wrong because durability guarantees that once a transaction is committed, its changes persist even after a crash (e.g., via write-ahead logging), but it does not undo uncommitted changes; durability applies only to committed transactions.
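
Atomicity can be observed in a small sketch using Python's built-in `sqlite3` module rather than Azure SQL Database. The failure here is a simulated application error, not a real system crash, and the `accounts` table and `transfer` helper are hypothetical; the point is only that the debit does not survive when the transaction fails to complete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move funds between accounts inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # the transaction was already rolled back by the context manager


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


transfer(conn, "A", "B", 30, fail_midway=True)
after_failure = balances(conn)   # the debit was undone
transfer(conn, "A", "B", 30)
after_success = balances(conn)   # both updates were applied together
print(after_failure, after_success)
```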

490

MCQhard

A company uses Azure Stream Analytics to process IoT data from thousands of devices. The output is written to Azure SQL Database for reporting. Recently, the job latency increased significantly. The company suspects that the SQL Database is throttling writes. Which action should the company take to reduce latency?

A.Change the input serialization from JSON to Avro.

B.Switch the output to Azure Cosmos DB with sufficient RU/s and use change feed to sync to SQL Database.

C.Increase the batch size of writes to Azure SQL Database.

D.Increase the number of Streaming Units for the Stream Analytics job.

AnswerB

Cosmos DB offers higher write throughput; change feed can asynchronously sync to SQL.

Why this answer

The correct answer is B because the latency is caused by Azure SQL Database throttling writes due to its row-based storage and limited write throughput. By switching the output to Azure Cosmos DB with sufficient Request Units per second (RU/s), the Stream Analytics job can write at high speed without throttling, and the change feed can then asynchronously sync data to Azure SQL Database for reporting, decoupling the write bottleneck.

Exam trap

The trap here is that candidates often assume increasing compute resources (Streaming Units) or batch sizes will fix any performance issue, but the real bottleneck is the output sink's write throttling, which requires a decoupled architecture like Cosmos DB with change feed.

How to eliminate wrong answers

Option A is wrong because changing input serialization from JSON to Avro reduces input data size and parsing overhead, but does not address the output write throttling to Azure SQL Database. Option C is wrong because increasing the batch size of writes to Azure SQL Database may help marginally but does not resolve the fundamental throttling issue; Azure SQL Database still enforces DTU or vCore limits that cap write throughput, and larger batches can increase lock contention and deadlock risks. Option D is wrong because increasing the number of Streaming Units (SUs) for the Stream Analytics job increases input processing throughput but does not alleviate the output sink bottleneck; the job will still be throttled by Azure SQL Database's write limits.

491

MCQeasy

A company stores massive amounts of unstructured log data as text files in Azure Blob Storage. The logs are written once and accessed only a few times per month for compliance audits. When accessed, the data must be available within 15 minutes. The company's priority is minimizing storage costs. Which Azure Blob Storage access tier should they use?

A.Hot

B.Cool

C.Archive

D.Premium

AnswerB

Cool tier provides low storage cost for infrequently accessed data with immediate retrieval, meeting the cost and availability requirements.

Why this answer

The Cool access tier is optimal because the logs are accessed infrequently (a few times per month) but require retrieval within 15 minutes. Cool tier offers lower storage costs than Hot while still supporting near-instant access, making it the best balance for minimizing storage costs with occasional compliance audits.

Exam trap

The trap here is that candidates often choose Archive for cost minimization without considering the rehydration latency requirement, mistakenly assuming all infrequently accessed data qualifies for Archive regardless of retrieval time constraints.

How to eliminate wrong answers

Option A is wrong because the Hot tier has higher storage costs, which contradicts the company's priority of minimizing storage costs for data that is rarely accessed. Option C is wrong because the Archive tier has the lowest storage cost but requires rehydration times of up to 15 hours, exceeding the 15-minute availability requirement. Option D is wrong because the Premium tier is designed for high-performance, low-latency access with sub-millisecond latency and higher costs, which is unnecessary and cost-prohibitive for infrequently accessed log data.

492

MCQmedium

Your team is designing a database for a financial application that requires ACID transactions. You are considering Azure Cosmos DB and Azure SQL Database. Which service should you choose?

A.Azure SQL Database

B.Azure Table Storage

C.Azure Cosmos DB

D.Azure Database for MySQL

AnswerA

Provides full ACID transactions.

Why this answer

Option A is correct because Azure SQL Database provides full ACID transactions. Option B is wrong because Azure Table Storage does not provide the full ACID transaction support the application requires. Option C is wrong because Azure Cosmos DB provides more limited transactional support, scoped to a single logical partition.

Option D is wrong because Azure Database for MySQL does provide ACID transactions, but the question asks you to choose between Cosmos DB and SQL Database.

493

MCQmedium

Refer to the exhibit. You are reviewing an ARM template for an Azure SQL Database deployment. What is the maximum size (in GB) of the database?

A.256 GB

B.500 GB

C.250 GB

D.268 GB

AnswerC

268435456000 bytes = 250 GB.

Why this answer

Option C is correct because 268435456000 bytes equals 250 GB (268435456000 / 1024^3 = 250). Option A is wrong because 256 GB would be 274877906944 bytes. Option B is wrong because 500 GB would be 536870912000 bytes.

Option D is wrong because 268 GB is what you get by dividing by 10^9 instead of 1024^3; it looks close, but 250 GB is the exact value.
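
The conversion can be checked with a few lines of Python. The exhibit itself is not reproduced here, so the variable name `max_size_bytes` is an assumption; the byte value is the one quoted in the explanation.

```python
# Value quoted in the explanation; the variable name is hypothetical.
max_size_bytes = 268_435_456_000

BYTES_PER_GB_BINARY = 1024 ** 3   # the convention used in the answer
BYTES_PER_GB_DECIMAL = 10 ** 9    # the convention behind the "268 GB" distractor

size_binary = max_size_bytes / BYTES_PER_GB_BINARY
size_decimal = max_size_bytes / BYTES_PER_GB_DECIMAL

print(size_binary)    # 250.0
print(size_decimal)   # 268.435456

# Byte counts for the other two distractors.
print(256 * BYTES_PER_GB_BINARY)  # 274877906944
print(500 * BYTES_PER_GB_BINARY)  # 536870912000
```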

494

MCQeasy

A retail company runs a nightly job that reads all sales transactions from the previous day from an operational database, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload does this nightly job represent?

A.Real-time processing

B.Batch processing

C.Stream processing

D.Transactional processing

AnswerB

Batch processing involves processing a defined set of data at scheduled intervals, such as nightly aggregation jobs.

Why this answer

The nightly job processes data in discrete, scheduled batches—reading all sales transactions from the previous day, aggregating them, and writing results to a data warehouse. This is the classic definition of batch processing, where data is collected over a period and processed together in a single job run. In Azure, this workload maps to services like Azure Data Factory or Azure Synapse Pipelines executing scheduled pipelines.

Exam trap

The trap here is that candidates confuse 'scheduled' or 'periodic' processing with stream processing, but the key differentiator is that batch processing works on a bounded dataset (all data from the previous day) while stream processing works on an unbounded, continuous flow of data.

How to eliminate wrong answers

Option A is wrong because real-time processing requires data to be processed immediately as it arrives (e.g., sub-second latency), not after a 24-hour delay. Option C is wrong because stream processing continuously processes unbounded data streams (e.g., from Azure Stream Analytics or Event Hubs), not a fixed set of historical records from a previous day. Option D is wrong because transactional processing handles individual, atomic operations (e.g., OLTP in Azure SQL Database) with ACID guarantees, not bulk aggregation of historical data.
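
As a rough illustration of the bounded-dataset idea, the following Python sketch aggregates one day's transactions in a single run. It stands in for a scheduled pipeline and is not Azure Data Factory code; the `transactions` rows and the `run_nightly_batch` function are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational data: (sale date, category, store, amount).
transactions = [
    (date(2024, 1, 1), "Toys", "Store-1", 20.0),
    (date(2024, 1, 1), "Toys", "Store-1", 5.0),
    (date(2024, 1, 1), "Books", "Store-2", 12.5),
    (date(2024, 1, 2), "Toys", "Store-1", 99.0),  # belongs to the next night's batch
]


def run_nightly_batch(rows, business_date):
    """Aggregate one bounded day of sales by (category, store)."""
    totals = defaultdict(float)
    for sold_on, category, store, amount in rows:
        if sold_on == business_date:
            totals[(category, store)] += amount
    return dict(totals)


summary = run_nightly_batch(transactions, date(2024, 1, 1))
print(summary)
```

The job sees a fixed set of rows and finishes; a stream processor would instead keep running as new events arrive.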

495

MCQhard

A financial services company processes real-time stock trade data from multiple exchanges. Trades are ingested into Azure Event Hubs. The company needs to compute a 5-minute sliding window average of trade prices per stock symbol and ensure that each trade is processed exactly once within the window. The aggregated results must be stored in Azure SQL Database for historical reporting and also sent to a Power BI dashboard for near real-time visualization. Which Azure service should be used for the real-time processing?

A.Azure Stream Analytics

B.Azure Databricks with Structured Streaming

C.Azure Data Factory

D.Azure Event Hubs

AnswerA

Correct. Stream Analytics is designed for complex event processing with windowed aggregations and supports exactly-once delivery. It can output to multiple sinks, including SQL Database and Power BI, in near real-time.

Why this answer

Azure Stream Analytics is the correct choice because it is purpose-built for real-time stream processing with native support for time-based windowing (e.g., 5-minute sliding window) and exactly-once semantics when used with Azure Event Hubs as input and Azure SQL Database as output. It can directly compute the sliding window average of trade prices per stock symbol and output results to both Azure SQL Database for historical storage and Power BI for near real-time visualization, all without requiring additional code or infrastructure management.

Exam trap

The trap here is that candidates often confuse Azure Event Hubs (a data ingestion service) with a processing engine, or assume that Azure Databricks is the only option for streaming analytics, overlooking the simpler, fully managed, and cost-effective Azure Stream Analytics for straightforward windowed aggregations.

How to eliminate wrong answers

Option B (Azure Databricks with Structured Streaming) is wrong because while it can process streaming data, it is a more complex, code-intensive solution that requires cluster management and does not natively guarantee exactly-once processing out-of-the-box without additional configuration; it is overkill for this specific sliding window aggregation task. Option C (Azure Data Factory) is wrong because it is an orchestration and ETL service for batch data movement and transformation, not a real-time stream processing engine; it cannot compute sliding window averages on live trade data. Option D (Azure Event Hubs) is wrong because it is a data ingestion and event streaming platform, not a compute service; it cannot perform the aggregation or windowing logic required to compute the average trade price.
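
The windowed average itself can be sketched in plain Python. This is a simplification, not Stream Analytics semantics: it assumes trades arrive in timestamp order, emits a result only when a trade arrives, and uses hypothetical names (`sliding_averages`, `WINDOW_SECONDS`) and sample data.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300  # 5-minute window


def sliding_averages(trades):
    """Yield (timestamp, symbol, average price over the last 5 minutes).

    Assumes trades are (timestamp_seconds, symbol, price) in timestamp order.
    """
    windows = defaultdict(deque)   # per-symbol queue of (timestamp, price)
    sums = defaultdict(float)      # per-symbol running total of prices in the window
    for ts, symbol, price in trades:
        window = windows[symbol]
        window.append((ts, price))
        sums[symbol] += price
        # Evict trades that fell out of the window (ts - 300, ts].
        while window[0][0] <= ts - WINDOW_SECONDS:
            _, old_price = window.popleft()
            sums[symbol] -= old_price
        yield ts, symbol, sums[symbol] / len(window)


trades = [
    (0, "MSFT", 100.0),
    (120, "MSFT", 110.0),
    (200, "AAPL", 50.0),
    (310, "MSFT", 120.0),  # the trade at t=0 has left the window by now
]
averages = list(sliding_averages(trades))
print(averages)
```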

496

MCQhard

A multinational corporation uses Azure Synapse Analytics serverless SQL pool to query data in Azure Data Lake Storage. The security team requires that access to specific columns containing personally identifiable information (PII) be restricted based on the user's role. Which feature should be implemented?

A.Row-level security (RLS)

B.Column-level security

C.Azure Purview data classification

D.Dynamic data masking

AnswerB

Column-level security allows granting or denying access to specific columns, preventing unauthorized users from querying PII columns.

Why this answer

Column-level security (CLS) in Azure Synapse Analytics serverless SQL pool allows you to restrict access to specific columns containing PII based on the user's role or identity. By granting or denying SELECT permissions on individual columns, you ensure that only authorized users can read the sensitive data, while a query from any other user that references a restricted column fails with a permission error. This directly meets the requirement to restrict column access by role.

Exam trap

The trap here is that candidates often confuse Dynamic data masking with column-level security, but DDM only masks data at the presentation layer and does not prevent access to the underlying column, whereas CLS actually denies permission to read the column.

How to eliminate wrong answers

Option A is wrong because Row-level security (RLS) restricts access to entire rows based on a predicate, not specific columns, so it cannot limit visibility of PII columns. Option C is wrong because Azure Purview data classification is a metadata and governance tool that identifies and labels sensitive data, but it does not enforce access restrictions on columns. Option D is wrong because Dynamic data masking (DDM) obfuscates data at query time for non-privileged users but does not prevent access to the underlying column data; privileged users can still see the original values, and it does not provide role-based column-level restriction.

497

MCQhard

An e-commerce application uses Azure SQL Database. The Orders table stores millions of rows with columns: OrderID (primary key, clustered index), CustomerID, OrderDate, OrderStatus, TotalAmount. Queries frequently filter on OrderDate and OrderStatus, and sort results by OrderDate DESC. Which indexing strategy will most improve query performance for these filters and sort?

A.Create a nonclustered index on OrderDate only.

B.Create a nonclustered index on OrderDate, OrderStatus and include other columns needed by the query.

C.Create a clustered columnstore index on the table.

D.Create a nonclustered index on OrderStatus only.

AnswerB

A composite nonclustered index on (OrderDate, OrderStatus) allows the query to filter on both columns efficiently and provides the data already sorted by OrderDate (the leading key). Including other columns avoids key lookups, making the query even faster.

Why this answer

Option B is correct because a nonclustered index on (OrderDate, OrderStatus) supports both the filter and the sort in a single index seek/scan. SQL Server can use the index to locate rows matching both predicates and return them already sorted by OrderDate DESC without a separate sort operation, which is critical for performance on millions of rows.

Exam trap

The trap here is that candidates often think a single-column index on the most filtered column (OrderDate) is sufficient, overlooking that the second filter (OrderStatus) and the sort order require a composite index to avoid extra processing.

How to eliminate wrong answers

Option A is wrong because an index on OrderDate only does not cover the OrderStatus filter, forcing key lookups or a full scan to evaluate the status predicate, which is inefficient for large tables. Option C is wrong because a clustered columnstore index is optimized for analytical/aggregation workloads, not for point lookups or ordered retrieval of individual rows; it would degrade performance for the described transactional queries. Option D is wrong because an index on OrderStatus only does not help with the OrderDate sort, requiring a separate sort operation after filtering, and it does not support the date range filter efficiently.

498

MCQmedium

A company plans to migrate an on-premises SQL Server database to Azure. The database currently uses SQL Server Agent jobs for scheduled maintenance tasks, cross-database queries, and query store for performance tuning. The database size is 500 GB and needs to scale to 10 TB eventually. They want a managed service that requires minimal application changes. Which Azure relational database service should they choose?

A.Azure SQL Managed Instance

B.Azure SQL Database (single database)

C.Azure SQL Database Hyperscale

D.Azure Database for SQL Server

AnswerA

Azure SQL Managed Instance offers the highest compatibility with on-premises SQL Server, supporting SQL Agent, cross-database queries within the instance, and Query Store. It provides up to 16 TB of storage, meeting the size requirements.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs, cross-database queries, and Query Store, while being a fully managed PaaS service. It can grow to the required 10 TB (up to 16 TB in some configurations) with minimal application changes, because it exposes nearly the same T-SQL surface area as on-premises SQL Server and is deployed inside a virtual network (VNet).

Exam trap

The trap here is that candidates often confuse Azure SQL Database (single database) with SQL Managed Instance, assuming that Hyperscale's large storage capacity compensates for missing features like SQL Server Agent and cross-database queries, but the exam tests the specific feature requirements (Agent jobs, cross-database queries) that only Managed Instance fully supports.

How to eliminate wrong answers

Option B (Azure SQL Database single database) is wrong because it does not support SQL Server Agent jobs or cross-database queries (except via elastic queries or external tables), and its maximum size is 4 TB (or 100 TB with Hyperscale, which still lacks Agent and cross-database support). Option C (Azure SQL Database Hyperscale) is wrong because, while it supports large databases up to 100 TB, it does not support SQL Server Agent jobs or cross-database queries, and it requires application changes for connection strings and some T-SQL features. Option D (Azure Database for SQL Server) is wrong because no Azure service has that name. The name resembles Azure Database for MySQL and Azure Database for PostgreSQL, and the intended trap is confusing it with SQL Server on Azure Virtual Machines, which is IaaS rather than a fully managed service.

499

MCQeasy

A media company stores large video files in Azure Blob Storage. The videos are accessed frequently for the first 30 days after upload, then rarely for the next 180 days. After that, they are only needed for compliance and are never accessed. Which access tier should be used for the first 30 days to minimize costs while maintaining low latency?

A.Hot tier

B.Cool tier

C.Archive tier

D.Premium tier

AnswerA

Correct. The Hot tier is optimized for frequent read/write access and offers the lowest latency, which is needed for daily active video editing and streaming.

Why this answer

The Hot tier is the correct choice for the first 30 days because it provides the lowest latency access and highest throughput for frequently accessed data, which matches the requirement of frequent access during this period. While the Hot tier has the highest storage cost per GB, it has no retrieval costs, making it cost-effective for high-access patterns. The other tiers introduce either retrieval fees (Cool), high latency (Archive), or unnecessary cost (Premium) for this use case.

Exam trap

The trap here is that candidates often choose the Cool tier thinking it saves money on storage for the first 30 days, but they overlook the retrieval costs and the fact that Hot tier is actually cheaper for frequently accessed data due to zero retrieval fees.

How to eliminate wrong answers

Option B (Cool tier) is wrong because although it has lower storage cost, it incurs a retrieval cost per GB and has slightly higher latency than Hot, making it suboptimal for frequent access during the first 30 days. Option C (Archive tier) is wrong because it has the lowest storage cost but retrieval times can take hours (up to 15 hours for standard priority), which violates the low-latency requirement for frequent access. Option D (Premium tier) is wrong because it is designed for high-performance block blob workloads with consistent low latency and higher cost, but it is overkill and more expensive than Hot for standard video file access.

500

Multi-Selecteasy

Which TWO Azure services can be used to perform data transformation in a serverless manner? (Choose two.)

Select 2 answers

A.Azure Databricks with Apache Spark

B.Azure Synapse serverless SQL pool

C.Azure Data Factory mapping data flows

D.Azure SQL Database

E.Azure HDInsight with Hive

AnswersB, C

Serverless SQL query engine for data lakes.

Why this answer

Azure Synapse serverless SQL pool (Option B) is correct because it allows you to query and transform data directly from data lake files (e.g., Parquet, CSV) using T-SQL without provisioning any dedicated compute resources. It uses a pay-per-query model, making it inherently serverless for data transformation tasks.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'managed' or 'cloud-based,' incorrectly selecting services like Azure Databricks or HDInsight which still require explicit cluster provisioning and management, whereas the exam specifically tests the pay-per-query, no-provisioning model of serverless SQL pool and mapping data flows.

501

MCQmedium

A financial services company stores petabytes of transaction data in Parquet format in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and aggregate billions of rows, with results expected within seconds. The company wants to use a massively parallel processing (MPP) engine that supports T-SQL and can be paused to reduce costs during off-hours. They also need native integration with Azure Data Factory and Power BI. Which Azure service should they use?

A.Azure Synapse Analytics

B.Azure HDInsight

C.Azure Databricks

D.Azure SQL Database

AnswerA

Synapse Analytics provides MPP architecture, T-SQL support, pause capability, and tight integration with Azure Data Factory and Power BI, making it ideal for large-scale data warehousing.

Why this answer

Azure Synapse Analytics (formerly SQL DW) is the correct choice because it provides a massively parallel processing (MPP) engine that distributes data across 60 distributions, enabling complex T-SQL queries on petabyte-scale Parquet data with results in seconds. It supports native T-SQL, can be paused to reduce costs during off-hours, and offers built-in integration with Azure Data Factory and Power BI through its SQL endpoints and linked service connectors.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics with Azure SQL Database or Azure Databricks, not realizing that only Synapse combines MPP architecture, native T-SQL support, pause capability, and direct integration with Azure Data Factory and Power BI for petabyte-scale analytics.

How to eliminate wrong answers

Option B (Azure HDInsight) is wrong because it is a managed Hadoop/Spark cluster that does not natively support T-SQL (it uses HiveQL or Spark SQL) and cannot be paused; it must be stopped or deleted to save costs, and its integration with Power BI is indirect via Hive ODBC. Option C (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform that does not support T-SQL (it uses Spark SQL or Python/Scala) and cannot be paused; it runs on clusters that must be terminated to stop billing, and while it integrates with Azure Data Factory and Power BI, it lacks the native T-SQL MPP engine required for the described workload. Option D (Azure SQL Database) is wrong because it is a single-node relational database designed for OLTP workloads, not MPP; it cannot handle petabyte-scale data or distribute queries across multiple nodes, and it cannot be paused (only stopped, which incurs storage costs).

502

MCQeasy

A company stores IoT sensor data as JSON files in Azure Blob Storage. A data analyst needs to run ad-hoc SQL queries on these files without moving the data and without provisioning any compute clusters. The analyst wants to pay only for the amount of data processed by each query. Which Azure service should they use?

A.Azure SQL Database

B.Azure Synapse Serverless SQL pool

C.Azure Cosmos DB

D.Azure Data Factory

AnswerB

Correct. Azure Synapse Serverless SQL pool can query JSON files in Blob Storage using T-SQL and charges per data processed.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without provisioning any compute clusters. It uses a pay-per-query model where you are billed only for the amount of data processed, making it ideal for ad-hoc SQL queries on JSON files stored in Blob Storage without data movement.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming any SQL-capable service can query files in Blob Storage, but only the serverless pool provides pay-per-query billing and direct file access without provisioning compute.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a provisioned relational database service that requires you to import data into it and pay for reserved compute, not for data processed per query. Option C is wrong because Azure Cosmos DB is a NoSQL database that requires data to be ingested into its containers and does not support ad-hoc SQL queries directly on files in Blob Storage without provisioning throughput. Option D is wrong because Azure Data Factory is an ETL and data integration service, not a SQL query engine; it cannot run ad-hoc SQL queries directly on files without moving or transforming data.

503

MCQmedium

A library management system uses Azure SQL Database. The Books table has 500,000 rows with columns: BookID (primary key, clustered), Title, Author, ISBN, PublishedYear, CopiesAvailable. Queries frequently filter by Author and then sort results by PublishedYear in descending order. The queries also return the Title and CopiesAvailable columns. Which indexing strategy will most improve query performance for these operations?

A.Create a nonclustered index on (Author, PublishedYear DESC) and include (Title, CopiesAvailable)

B.Create a nonclustered index on Author only

C.Create a nonclustered index on PublishedYear DESC

D.Keep only the existing clustered index on BookID

AnswerA

This covering index supports the filter and sort without accessing the base table.

Why this answer

Option A is correct because it creates a covering nonclustered index on (Author, PublishedYear DESC) that directly supports the filter (Author) and sort (PublishedYear DESC) operations. Including Title and CopiesAvailable as non-key columns makes the index covering, meaning all required columns are in the index leaf level, so SQL Server can satisfy the query entirely from the index without key lookups to the clustered index. This minimizes I/O and improves query performance.

Exam trap

The trap here is that candidates often think any index on the filtered column (Author) is sufficient, overlooking the need to also cover the sort order and include all returned columns to avoid key lookups.

How to eliminate wrong answers

Option B is wrong because an index on Author only would support the filter but not the sort on PublishedYear DESC, requiring a separate sort operation in the query plan. Option C is wrong because an index on PublishedYear DESC alone does not support the filter on Author, so SQL Server would still need to scan or seek on Author separately. Option D is wrong because the existing clustered index on BookID is not useful for filtering by Author or sorting by PublishedYear, leading to a full table scan and sort.
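
The effect of a covering index can be observed with Python's built-in `sqlite3` module. This is only an analogy for the Azure SQL Database case: SQLite has no INCLUDE clause, so the sketch appends Title and CopiesAvailable as trailing key columns instead, and the index name and sample author are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Books (
        BookID INTEGER PRIMARY KEY,
        Title TEXT,
        Author TEXT,
        ISBN TEXT,
        PublishedYear INTEGER,
        CopiesAvailable INTEGER
    )
    """
)
# SQLite stand-in for: (Author, PublishedYear DESC) INCLUDE (Title, CopiesAvailable).
conn.execute(
    """
    CREATE INDEX IX_Books_Author_Year
    ON Books (Author, PublishedYear DESC, Title, CopiesAvailable)
    """
)

query = """
    SELECT Title, CopiesAvailable
    FROM Books
    WHERE Author = ?
    ORDER BY PublishedYear DESC
"""
# Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("Ursula K. Le Guin",))
plan = " | ".join(row[3] for row in rows)
print(plan)
```

The printed plan should report a search using the covering index, with no separate sorting step for the ORDER BY.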

504

MCQmedium

A company uses Azure Synapse Analytics to run a data warehouse. They need to load 500 GB of historical data from Azure Blob Storage into a staging table. They want the fastest load performance with minimal administrative overhead. Which method should they use?

A.Use SQL Server Integration Services (SSIS)

B.Use PolyBase with the COPY INTO statement

C.Use Azure Data Factory with Copy activity

D.Use the bcp utility

AnswerB

PolyBase parallel loading provides fastest throughput with minimal management.

Why this answer

PolyBase and the COPY INTO statement are the fastest ways to load large volumes of data into Azure Synapse Analytics because both use the Massively Parallel Processing (MPP) architecture to read data directly from Azure Blob Storage in parallel across the compute nodes, bypassing any single-node bottleneck. Strictly speaking they are two separate load mechanisms (COPY INTO is a single T-SQL statement, whereas PolyBase loads through external tables), but the option groups them together. Either way, the load is driven by native T-SQL, so there are no external tools or orchestration to manage.

Exam trap

The trap here is that candidates often assume Azure Data Factory is always the fastest for data movement because of its visual interface and parallelization, but they overlook that PolyBase's direct integration with Synapse's MPP engine provides superior performance for warehouse loading without intermediate data routing.

How to eliminate wrong answers

Option A is wrong because SQL Server Integration Services (SSIS) runs on an integration runtime outside the warehouse and cannot exploit Synapse's MPP parallelism on its own, making it significantly slower for 500 GB loads, and it requires managing an SSIS catalog and packages, adding administrative overhead. Option C is wrong because Azure Data Factory with Copy activity adds an orchestration layer on top of the load: unless the Copy activity is configured to use PolyBase or the COPY statement under the hood, it moves data through an integration runtime rather than reading it directly on Synapse's compute nodes, and it requires pipeline monitoring and configuration overhead in either case. Option D is wrong because the bcp utility is a client-side command-line tool that pushes data through a single client connection, making it very slow for 500 GB and unsuitable for bulk loading into a distributed data warehouse.

505

Multi-Selecthard

Which THREE are valid reasons to choose Azure SQL Managed Instance over Azure SQL Database?

Select 3 answers

A.Simpler high-availability configuration

B.Need for SQL Server Agent and CLR integration

C.Desire for automated backups

D.Need for instance-level features like Service Broker or Database Mail

E.Requirement for cross-database queries within the same instance

AnswersB, D, E

Managed Instance supports SQL Agent and CLR, which are not available in Azure SQL Database single database.

Why this answer

Options B, D, and E are correct. Managed Instance offers near 100% compatibility with the SQL Server database engine, so it supports SQL Server Agent and CLR integration, instance-level features such as Service Broker and Database Mail, and cross-database queries between databases in the same instance. Option C is wrong because both services provide automated backups.

Option A is wrong because both services provide built-in high availability; Managed Instance does not have a simpler HA setup.

506

MCQeasy

A marketing company collects data from social media feeds including text posts, images, and videos. The data arrives in various formats with no fixed structure or schema. This type of data is best described as:

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerC

Unstructured data has no predefined schema and includes free-form text, images, videos, etc. Social media feeds typically contain this type of data.

Why this answer

Unstructured data lacks a predefined data model or schema, which is exactly the case for text posts, images, and videos that arrive in varied formats. Unlike structured or semi-structured data, it cannot be easily organized into rows and columns or parsed with tags, which is why option C is correct for this scenario.

Exam trap

The trap here is that candidates confuse semi-structured data (e.g., JSON with tags) with unstructured data, but the key differentiator is the complete absence of any schema or metadata markers in the described social media feeds.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with rows and columns (e.g., a SQL table), which does not apply to free-form text, images, or videos. Option B is wrong because semi-structured data has some organizational properties like tags or key-value pairs (e.g., JSON, XML), but the data described has no fixed structure or schema at all. Option D is wrong because relational data is a subset of structured data stored in tables with defined relationships, which is not the case for heterogeneous social media feeds.

507

MCQeasy

A healthcare organization stores medical imaging files (DICOM) that are actively used by radiologists for the first 30 days. After 30 days, the files are accessed infrequently for up to 5 years. After 5 years, they must be retained for legal compliance but are accessed very rarely. The organization wants to minimize storage costs. Which strategy should they use to manage the data lifecycle in Azure Blob Storage?

A.Store all files in the Hot tier and use lifecycle management to move to the Archive tier after 5 years.

B.Store files in the Hot tier, move to Cool tier after 30 days, then to Archive tier after 5 years.

C.Store all files in the Archive tier from the beginning to minimize cost.

D.Store all files in the Cool tier to balance cost and access.

AnswerB

This lifecycle management strategy aligns with access patterns: Hot for active use, Cool for infrequent access, Archive for long-term retention, minimizing overall storage cost.

Why this answer

Option B is correct because it aligns the data lifecycle with the access patterns: Hot tier for frequent initial access, Cool tier for infrequent access after 30 days, and Archive tier for long-term compliance after 5 years. Azure Blob Storage lifecycle management policies can automate these transitions, minimizing costs by using the most cost-effective tier for each phase.
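
The tiering in Option B could be expressed as a lifecycle management policy. The sketch below builds such a policy document in Python and prints it as JSON; the rule name is hypothetical, five years is approximated as 1,825 days, and using last-modified age (rather than creation or last-access time) is an assumption, not something the question specifies.

```python
import json

FIVE_YEARS_IN_DAYS = 5 * 365  # approximation; ignores leap days

policy = {
    "rules": [
        {
            "enabled": True,
            "name": "dicom-tiering",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {
                        # Hot -> Cool once the blob is older than 30 days.
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        # Cool -> Archive once the blob is older than ~5 years.
                        "tierToArchive": {
                            "daysAfterModificationGreaterThan": FIVE_YEARS_IN_DAYS
                        },
                    }
                },
            },
        }
    ]
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

New blobs land in the account's default tier (Hot here), so no explicit rule is needed for the first 30 days.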

Exam trap

The trap here is that candidates often assume the Archive tier is always the cheapest option from day one, ignoring the high retrieval costs and latency for actively used data, or they overlook the need for a graduated tier strategy to match changing access patterns.

How to eliminate wrong answers

Option A is wrong because moving directly to Archive after 5 years leaves files in the Hot tier for the entire 5 years, incurring high storage costs for infrequently accessed data. Option C is wrong because storing all files in the Archive tier from the beginning would cause high retrieval costs and latency for the first 30 days when radiologists need active access: archived blobs are offline, and rehydrating one can take hours (up to 15 hours at standard priority). Option D is wrong because the Cool tier is not cost-optimal for the first 30 days of active use (Hot tier is cheaper for frequent access) and does not provide the lowest cost for the 5+ year retention period (Archive tier is cheaper).

508

MCQmedium

A retail company wants to analyze years of historical sales data stored as CSV files in Azure Blob Storage. The analytics solution must be serverless, allow T-SQL queries without managing infrastructure, and integrate directly with Power BI. Which Azure service should the company use?

A.Azure SQL Database

B.Azure Synapse Analytics serverless SQL pool

C.Azure Cosmos DB

D.Azure Analysis Services

AnswerB

The serverless SQL pool in Azure Synapse Analytics can query data in data lakes and Blob Storage using T-SQL without managing any infrastructure, and it integrates with Power BI.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it provides a serverless, on-demand query service that can directly query CSV files stored in Azure Blob Storage using T-SQL without requiring any infrastructure management. It integrates natively with Power BI via the T-SQL endpoint, enabling direct data visualization from the queried files.

Exam trap

The trap here is that candidates often confuse Azure SQL Database (a provisioned database) with a serverless query service, or they mistakenly think Azure Analysis Services can directly query raw files, when in fact it requires pre-loaded data models.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database service that requires provisioning and managing a database instance, not a serverless query service over files in Blob Storage. Option C is wrong because Azure Cosmos DB is a NoSQL database designed for globally distributed, low-latency workloads and does not support T-SQL queries or direct querying of CSV files in Blob Storage. Option D is wrong because Azure Analysis Services is a semantic modeling service that requires data to be loaded into a model and does not directly query CSV files in Blob Storage using T-SQL.

509

MCQhard

A financial services company runs a critical application on Azure SQL Managed Instance. They need to ensure that in the event of a regional outage, the database can be failed over to a secondary region with minimal data loss and automatic failover. The secondary region should not be used for read traffic during normal operations. Which configuration meets these requirements?

A.Create an automatic failover group with readable secondary enabled

B.Use Azure SQL Database backup to blob storage and restore to another region

C.Implement active geo-replication and manually initiate failover

D.Configure a failover group with automatic failover policy and set the secondary to non-readable

AnswerD

Failover groups with automatic failover and readable secondary disabled meet the requirements.

Why this answer

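Option D is correct because a failover group with an automatic failover policy and a non-readable secondary provides cross-region disaster recovery with minimal data loss and keeps read traffic off the secondary. Option A is wrong because enabling the readable secondary conflicts with the requirement that the secondary region not serve read traffic during normal operations. Option C is wrong because manually initiated failover does not meet the automatic failover requirement; in addition, active geo-replication is a feature of Azure SQL Database, and Managed Instance relies on failover groups for geo-replication.

Option B is wrong because restoring from backup does not provide automatic failover and incurs higher data loss and a longer recovery time.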

510

MCQmedium

A social media application stores user profiles in Azure Cosmos DB using the NoSQL API. Each profile includes UserID, Name, Email, and an array of Posts. The most common query retrieves a user's profile by UserID. The application requires strong consistency for writes so that once a profile is updated, all subsequent reads see the latest data. To minimize Request Unit (RU) consumption, which partition key should be chosen?

A.UserID

B.Email

C.Name

D.A synthetic partition key combining UserID and Region

AnswerA

UserID is unique and high cardinality, distributing data evenly. Retrieving by UserID becomes a point read, consuming the fewest RUs.

Why this answer

UserID is the correct partition key because it is the primary filter in the most common query (retrieving a profile by UserID), ensuring each query targets a single logical partition. This minimizes cross-partition queries and RU consumption. Additionally, UserID provides high cardinality and even distribution, which prevents hot partitions. Strong consistency is configured separately through the account's consistency level and does not depend on the choice of partition key.
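
To make the routing argument concrete, here is a toy model in plain Python, not the Cosmos DB SDK. The partition count, the SHA-256-based routing, and the sample profiles are all assumptions for illustration; the real service uses its own hashing and partition management. The sketch only shows why a lookup on the partition key visits one partition while a lookup on any other field fans out.

```python
import hashlib

PHYSICAL_PARTITIONS = 4  # hypothetical partition count


def partition_for(key_value: str) -> int:
    """Map a partition key value to a partition index (toy hash routing)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITIONS


def build_store(profiles, partition_key_field):
    """Place each profile in the partition chosen by its partition key value."""
    store = [[] for _ in range(PHYSICAL_PARTITIONS)]
    for profile in profiles:
        store[partition_for(profile[partition_key_field])].append(profile)
    return store


def find_by_user_id(store, partition_key_field, user_id):
    """Return (profile, partitions_visited) for a lookup by UserID."""
    if partition_key_field == "UserID":
        candidates = [partition_for(user_id)]  # routed to a single partition
    else:
        candidates = range(PHYSICAL_PARTITIONS)  # fan-out to every partition
    visited = 0
    match = None
    for index in candidates:
        visited += 1
        for profile in store[index]:
            if profile["UserID"] == user_id:
                match = profile
    return match, visited


# Hypothetical sample data.
profiles = [
    {"UserID": f"user-{n}", "Name": f"Name {n}", "Email": f"user{n}@example.com"}
    for n in range(20)
]

by_user_id = build_store(profiles, "UserID")
by_email = build_store(profiles, "Email")

print(find_by_user_id(by_user_id, "UserID", "user-7")[1])  # partitions visited
print(find_by_user_id(by_email, "Email", "user-7")[1])     # partitions visited
```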

Exam trap

The trap here is that candidates often choose a synthetic key or a secondary attribute like Email, thinking they need to avoid hot partitions, but they overlook that the most common query pattern and the need for minimal RU consumption dictate using the primary query filter as the partition key.

How to eliminate wrong answers

Option B (Email) is wrong because while Email is unique, it is not the primary query filter; using it would require an additional index lookup or cross-partition query for the most common operation, increasing RU cost. Option C (Name) is wrong because Name is not unique and has low cardinality, leading to large partitions and potential hot spots, which degrades performance and RU efficiency. Option D (A synthetic partition key combining UserID and Region) is wrong because it adds unnecessary complexity and could cause cross-partition queries if Region is not consistently used in the query filter; it also risks uneven data distribution if Region is skewed.

511

MCQhard

A financial analytics company stores petabytes of transaction data in Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple large tables and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for ETL orchestration. They require a massively parallel processing (MPP) engine to handle the scale. Which Azure service should they choose?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Cosmos DB

D.Azure Analysis Services

AnswerA

Correct. The dedicated SQL pool in Azure Synapse Analytics is an MPP engine optimized for large-scale analytical workloads. It can query data directly in ADLS Gen2 via PolyBase, supports complex joins, and integrates with Power BI and Azure Data Factory.

Why this answer

Azure Synapse Analytics dedicated SQL pool is the correct choice because it provides a massively parallel processing (MPP) engine that distributes data across 60 distributions, enabling fast execution of complex SQL queries on petabyte-scale data stored in Parquet files in Azure Data Lake Storage Gen2. It natively integrates with Power BI for visualization and Azure Data Factory for ETL orchestration, meeting all stated requirements.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics dedicated SQL pool with Azure SQL Database, assuming both are 'SQL' and thus interchangeable, but the key differentiator is the MPP architecture required for petabyte-scale workloads.

How to eliminate wrong answers

Option B (Azure SQL Database) is wrong because it is a single-node, general-purpose relational database that lacks MPP architecture and cannot efficiently handle petabyte-scale data or complex joins across large tables within the required response times of a few seconds. Option C (Azure Cosmos DB) is wrong because it is a NoSQL database designed for globally distributed, low-latency access to semi-structured data, not for running complex SQL joins on petabyte-scale relational data stored in Parquet files. Option D (Azure Analysis Services) is wrong because it is an OLAP engine that provides in-memory analytics and semantic modeling, but it does not execute SQL queries directly against raw data in Data Lake Storage; it requires pre-processed data and lacks the MPP engine needed for petabyte-scale query processing.

512

MCQhard

Your organization is migrating a large on-premises SQL Server database to Azure SQL Database. The database is used by a critical line-of-business application that requires near-zero downtime during migration. The application uses a mix of read and write operations, and you need to minimize the cutover time. The database is about 2 TB in size. You plan to use Azure Database Migration Service (DMS) for the migration. The source SQL Server is version 2019 Enterprise Edition. The target is a Business Critical tier Azure SQL Database. You need to choose the appropriate migration method and configuration. Which of the following actions should you take?

A.Use offline migration mode with DMS on Standard tier

B.Use online migration mode with DMS on Basic tier

C.Use online migration mode with DMS on Premium tier

D.Use offline migration mode with DMS on Premium tier, then manually copy remaining changes

AnswerC

Online mode provides near-zero downtime; Premium tier handles large volumes efficiently.

Why this answer

Option C is correct because online migration mode with continuous sync allows near-zero downtime by replicating ongoing changes from source to target until cutover, and the Premium tier for DMS provides better performance for large databases. Option A is wrong because offline migration requires the application to be down for the duration of the migration. Option B is wrong because a Basic tier DMS instance is not suited to a 2 TB online migration.

Option D is wrong because offline migration still requires downtime, and manually copying the remaining changes is error-prone and lengthens the cutover.

513

Multi-Selectmedium

Which TWO of the following are benefits of using Azure Table Storage over Azure Blob Storage for storing semi-structured data?

Select 2 answers

A.Supports querying by partition key and row key

B.Designed for key-value storage and retrieval

C.Provides automatic indexing of all attributes

D.Supports REST API access

E.Offers higher throughput for large files

AnswersA, B

Table Storage is optimized for key-based queries, unlike Blob Storage.

Why this answer

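Options A and B are correct. Table Storage is a key-value store that automatically indexes the partition key and row key, so queries by key are efficient. Blob Storage is built for unstructured data and has no built-in key-based querying of entities. Option C is wrong because Table Storage indexes only the partition key and row key, not every attribute. Option D is wrong because both services offer REST APIs, so REST access is not a differentiator.

Option E is wrong because it is Blob Storage, not Table Storage, that offers higher throughput for large files.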

514

MCQeasy

A company collects data from three sources: Source A: Customer records from a relational database with fixed columns (CustomerID, Name, Address). Source B: Social media posts in JSON format with varying fields (e.g., some posts have 'likes', others have 'shares'). Source C: Handwritten notes saved as scanned images in TIFF format. Which statement correctly categorizes the data by structure?

A.Source A: Structured, Source B: Semi-structured, Source C: Unstructured

B.Source A: Structured, Source B: Unstructured, Source C: Semi-structured

C.Source A: Semi-structured, Source B: Structured, Source C: Unstructured

D.Source A: Semi-structured, Source B: Unstructured, Source C: Structured

AnswerA

This correctly identifies structured data (customer records with fixed columns), semi-structured data (JSON with variable fields), and unstructured data (images with no inherent structure).

Why this answer

Source A's relational database with fixed columns (CustomerID, Name, Address) enforces a strict schema, making it structured data. Source B's JSON format allows varying fields like 'likes' or 'shares' per record, which is the hallmark of semi-structured data (self-describing, schema-on-read). Source C's scanned TIFF images are binary blobs with no inherent internal structure for querying, classifying them as unstructured data.

This matches the standard DP-900 categorization: structured (fixed schema), semi-structured (flexible schema), unstructured (no schema).
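
A small sketch of what "flexible schema" means for Source B, using Python's json module. The two sample posts are hypothetical; the point is only that each document names its own fields, so readers of the data have to tolerate fields that are missing.

```python
import json

# Hypothetical social media posts with different optional fields.
raw_posts = [
    '{"id": 1, "text": "New store opening!", "likes": 12}',
    '{"id": 2, "text": "Weekend sale", "shares": 4}',
]
posts = [json.loads(raw) for raw in raw_posts]

# Each document carries its own field names; there is no table-wide schema.
shared_fields = set(posts[0]) & set(posts[1])
optional_fields = (set(posts[0]) | set(posts[1])) - shared_fields

for post in posts:
    # .get() tolerates a field that this particular post does not have.
    print(post["id"], post.get("likes", 0), post.get("shares", 0))
```

A relational table like Source A would instead require every row to supply the same columns, and a TIFF scan like Source C exposes no fields at all.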

Exam trap

Microsoft often tests the misconception that 'JSON is unstructured because it looks like text' or that 'scanned images are semi-structured because they have metadata,' but the DP-900 definition hinges on whether the data has a fixed schema (structured), flexible schema (semi-structured), or no schema (unstructured).

How to eliminate wrong answers

Option B is wrong because it misclassifies Source B (JSON with varying fields) as unstructured, but JSON is the classic example of semi-structured data due to its key-value pairs and flexible schema. Option C is wrong because it labels Source A (relational database with fixed columns) as semi-structured, but relational databases enforce a rigid schema (rows and columns) that defines structured data. Option D is wrong because it calls Source A semi-structured (should be structured) and Source C structured (should be unstructured), completely reversing the correct categorization.

515

MCQmedium

A manufacturing company deploys IoT sensors on equipment in a factory. They need to monitor sensor data in real time to detect anomalies and trigger immediate alerts. They also need to store years of historical sensor data for monthly capacity planning reports that involve complex aggregations. The company wants a cost-effective solution that minimizes data movement between storage and compute. Which combination of Azure services should they use for real-time processing and historical batch analytics?

A.Azure Stream Analytics for real-time processing, Azure Data Lake Storage Gen2 for historical storage, and Azure Synapse Analytics for batch queries.

B.Azure Data Factory for real-time processing, Azure Cosmos DB for historical storage, and Power BI for batch queries.

C.Azure Functions for real-time processing, Azure Table Storage for historical storage, and Azure Analysis Services for batch queries.

D.Azure Event Hubs for real-time processing, Azure SQL Database for historical storage, and Azure Machine Learning for batch queries.

AnswerA

This combination correctly pairs a real-time stream processing engine (Stream Analytics) with a scalable data lake (Data Lake Storage) and an analytics service (Synapse Analytics) that can query the lake directly, minimizing data movement.

Why this answer

Azure Stream Analytics is purpose-built for real-time processing of streaming data from IoT sensors, enabling immediate anomaly detection and alerting. Azure Data Lake Storage Gen2 provides cost-effective, scalable storage for years of historical sensor data, while Azure Synapse Analytics (formerly SQL Data Warehouse) can run complex aggregations directly against that data without moving it, minimizing data movement and cost.

Exam trap

The trap here is that candidates often confuse data ingestion services (like Event Hubs) with real-time processing engines (like Stream Analytics), or they pick a database like Cosmos DB or SQL Database for historical storage without considering cost and aggregation performance at scale.

How to eliminate wrong answers

Option B is wrong because Azure Data Factory is an orchestration and data movement service, not a real-time stream processing engine; Azure Cosmos DB is a NoSQL database optimized for low-latency transactional workloads, not cost-effective long-term storage for large-scale historical analytics. Option C is wrong because Azure Functions is a serverless compute service that can process events but lacks built-in stream analytics capabilities like windowing and temporal joins; Azure Table Storage is a key-value store that does not support complex aggregations efficiently. Option D is wrong because Azure Event Hubs is a data ingestion service, not a real-time processing engine; Azure SQL Database is not cost-effective for storing years of high-volume sensor data, and Azure Machine Learning is for building predictive models, not for running batch aggregation queries.

516

MCQeasy

Your team is migrating a data warehouse to Azure Synapse Analytics. You need to ensure that the data model supports both historical trend analysis and current-day reporting with minimal storage redundancy. Which table design pattern should you use?

A.Single flat table containing all attributes

B.Wide table with repeated customer attributes per order

C.Highly normalized design with many tables

D.Star schema with dimension and fact tables

AnswerD

Star schema is the standard for data warehousing, enabling efficient queries and reducing storage redundancy.

Why this answer

The star schema is the correct choice because it separates business processes into fact tables (for measures like sales quantities) and dimension tables (for descriptive attributes like customer or date). This design directly supports both historical trend analysis (by joining facts with the date dimension) and current-day reporting (by filtering on the latest date). It minimizes storage redundancy by storing each descriptive attribute once in a dimension table instead of repeating it on every fact row. Azure Synapse Analytics is optimized for star schemas, leveraging columnstore indexes and distributed tables to accelerate such queries.
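
The following minimal sketch shows a star schema answering both kinds of question. It uses Python's built-in sqlite3 module as a stand-in for a Synapse SQL pool, and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT, City TEXT);
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT, Year INTEGER);
CREATE TABLE FactSales (
    DateKey INTEGER REFERENCES DimDate(DateKey),
    CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
    Quantity INTEGER,
    Amount REAL
);
INSERT INTO DimCustomer VALUES (1, 'Contoso Ltd', 'Seattle'), (2, 'Fabrikam', 'Austin');
INSERT INTO DimDate VALUES (20230115, '2023-01-15', 2023), (20240115, '2024-01-15', 2024);
INSERT INTO FactSales VALUES
    (20230115, 1, 2, 20.0),
    (20240115, 1, 1, 10.0),
    (20240115, 2, 5, 50.0);
""")

# Historical trend: total sales per year via the date dimension.
trend = conn.execute(
    "SELECT d.Year, SUM(f.Amount) FROM FactSales f "
    "JOIN DimDate d ON d.DateKey = f.DateKey "
    "GROUP BY d.Year ORDER BY d.Year"
).fetchall()

# Current-day report: sales per customer on the most recent date.
latest = conn.execute(
    "SELECT c.Name, SUM(f.Amount) FROM FactSales f "
    "JOIN DimCustomer c ON c.CustomerKey = f.CustomerKey "
    "WHERE f.DateKey = (SELECT MAX(DateKey) FROM FactSales) "
    "GROUP BY c.Name ORDER BY c.Name"
).fetchall()

print(trend)
print(latest)
```

Customer names and cities live once in DimCustomer; the fact rows carry only keys and measures, which is where the storage saving over a flat or wide table comes from.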

Exam trap

The trap here is that candidates often confuse 'normalization' (Option C) with data warehouse best practices, not realizing that star schemas intentionally denormalize dimensions to optimize for read-heavy analytical queries, while highly normalized designs are better suited for OLTP systems, not Azure Synapse Analytics.

How to eliminate wrong answers

Option A is wrong because a single flat table containing all attributes would cause massive data duplication and poor query performance, as every row repeats customer and product details for each order, leading to high storage costs and slow analytical scans. Option B is wrong because a wide table with repeated customer attributes per order introduces significant redundancy and update anomalies, making it inefficient for both historical analysis and current reporting, and it contradicts the goal of minimal storage redundancy. Option C is wrong because a highly normalized design with many tables (e.g., 3NF) requires complex joins across numerous tables, which degrades query performance in a data warehouse context and is not optimized for the analytical workloads that Synapse is designed for.

517

MCQmedium

A company receives real-time clickstream data from its website via Azure Event Hubs. They need to detect fraudulent clicks within seconds and also produce daily aggregate reports of visitor statistics for historical analysis. Which combination of Azure services should they use for the real-time detection and the daily aggregation, respectively?

A.Azure Stream Analytics for real-time detection; Azure Data Factory for daily aggregation

B.Azure Databricks for both real-time detection and daily aggregation

C.Azure Synapse Analytics for real-time detection; Azure Blob Storage for daily aggregation

D.Azure Functions for real-time detection; Azure SQL Database for daily aggregation

AnswerA

Stream Analytics is purpose-built for real-time stream processing, enabling low-latency fraud detection. Data Factory can orchestrate and schedule the daily batch pipeline to aggregate data, possibly using Azure Databricks or SQL.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing, making it ideal for detecting fraudulent clicks within seconds from Event Hubs. Azure Data Factory is a cloud-based ETL service that can orchestrate and execute daily aggregation jobs on historical data, such as producing visitor statistics reports from stored clickstream data.
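
Stream processing engines typically detect this kind of pattern by counting events per time window. The plain-Python sketch below illustrates a tumbling-window count; it is not Stream Analytics query syntax, and the window size, threshold, and click events are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 10  # hypothetical tumbling-window size
MAX_CLICKS = 3       # hypothetical per-window click threshold


def flag_suspicious(clicks):
    """Return the (window_start, ip) pairs whose click count exceeds MAX_CLICKS.

    `clicks` is an iterable of (timestamp_seconds, ip_address) tuples.
    """
    counts = Counter()
    for timestamp, ip in clicks:
        # Tumbling windows are fixed-size and non-overlapping.
        window_start = (timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(window_start, ip)] += 1
    return {key for key, count in counts.items() if count > MAX_CLICKS}


clicks = [
    (1, "10.0.0.1"), (2, "10.0.0.1"), (3, "10.0.0.1"), (4, "10.0.0.1"),
    (5, "10.0.0.2"), (12, "10.0.0.1"),
]
suspicious = flag_suspicious(clicks)
print(suspicious)
```

The daily visitor report, by contrast, is a batch job over stored data and has no latency requirement, which is why it is handed to a scheduled pipeline.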

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Databricks or Azure Functions for real-time processing, or think that Azure Blob Storage alone can perform aggregation, when in fact the question tests the specific pairing of a stream-processing service with a batch orchestration service.

How to eliminate wrong answers

Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform that can handle both real-time and batch workloads, but it is not the most cost-effective or simplest choice for the specific combination of real-time detection and daily aggregation; the question expects the optimal pair of services, and Databricks is overkill for simple aggregation. Option C is wrong because Azure Synapse Analytics is a unified analytics service that excels at large-scale data warehousing and big data analytics, but it is not designed for real-time stream processing (it can ingest streams via pipelines but lacks native low-latency detection capabilities); Azure Blob Storage is a storage service, not a compute service for aggregation. Option D is wrong because Azure Functions is a serverless compute service that can process events in real time, but it is not optimized for high-throughput stream processing and lacks built-in windowing and state management for complex fraud detection; Azure SQL Database can store and aggregate data but is not a dedicated orchestration or ETL service for daily batch aggregation.

518

MCQhard

A company stores sensitive customer data in Azure Blob Storage. They need to ensure that data at rest is encrypted using a customer-managed key that is stored in Azure Key Vault. Additionally, they want to prevent data from being accessed by unauthorized users even if the storage account key is compromised. Which combination should they use?

A.Enable customer-managed keys and use Azure Defender for Storage

B.Enable customer-managed keys and use Azure Monitor

C.Enable infrastructure encryption and use Azure Backup

D.Enable storage account encryption and use Azure Sentinel

AnswerA

Customer-managed keys encrypt data at rest; Defender for Storage provides threat detection, but to prevent unauthorized access, you should also use Azure AD authentication and RBAC.

Why this answer

Option A is the closest match. Enabling customer-managed keys (CMK) stored in Azure Key Vault encrypts data at rest with a key the customer controls, and Azure Defender for Storage (now part of Microsoft Defender for Cloud) adds threat detection that alerts on suspicious access. Note that Defender detects rather than blocks: if the storage account key is compromised, the best practice for preventing unauthorized access is Azure AD authentication with RBAC, not CMK alone. Among the options given, however, A is the only one that pairs CMK with a security control aimed at unauthorized access.

Option B is wrong because Azure Monitor collects metrics and logs but provides neither threat detection nor access control. Option C is wrong because infrastructure encryption adds a second layer of encryption rather than customer-managed keys, and Azure Backup is irrelevant to access control. Option D is wrong because default storage account encryption uses Microsoft-managed keys unless CMK is configured, and Azure Sentinel is for security monitoring, not access control.

519

MCQmedium

A company has 12 SQL Server databases, each about 30 GB. The databases experience unpredictable load spikes during the day. The company wants to migrate to Azure SQL Database to reduce administrative overhead and optimize costs by sharing resources among the databases. Which deployment option should they choose?

A.Single database with provisioned DTU

B.Elastic pool

C.Managed Instance

D.SQL Server on Azure Virtual Machine

AnswerB

Elastic pools allow databases to share resources, reducing cost and handling unpredictable spikes efficiently.

Why this answer

Elastic pools are designed to share resources (eDTUs or vCores) among multiple databases with unpredictable load spikes. Because the databases do not all peak simultaneously, the pool's total resource allocation can be lower than the sum of the individual peak requirements, which optimizes cost. A pool also reduces administrative overhead by providing a single management point for scaling and monitoring all databases in the pool.
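
The cost argument comes down to simple arithmetic: size for the peak of the combined load rather than the sum of each database's peak. The numbers below are hypothetical load figures in DTU-like units, chosen only to illustrate the calculation.

```python
# Hypothetical hourly load for three databases over six hours.
loads = {
    "db1": [10, 80, 10, 10, 10, 10],
    "db2": [10, 10, 10, 80, 10, 10],
    "db3": [10, 10, 10, 10, 10, 80],
}

# Sizing each database separately means paying for every individual peak.
sum_of_peaks = sum(max(series) for series in loads.values())

# A shared pool only has to cover the busiest hour of the combined load.
peak_of_sum = max(sum(hour) for hour in zip(*loads.values()))

print(sum_of_peaks, peak_of_sum)  # 240 100
```

If the spikes did coincide, the two figures would converge and the pool would offer little saving.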

Exam trap

The trap here is that candidates often choose Single Database (Option A) thinking it is simpler, but they miss that elastic pools are specifically designed for cost optimization when multiple databases have variable load patterns that peak at different times, not for isolated workloads.

How to eliminate wrong answers

Option A is wrong because a single database with provisioned DTU would require each database to be sized for its peak load, leading to over-provisioning and higher costs, and it does not share resources across databases. Option C is wrong because Managed Instance is a full SQL Server instance in Azure, which still requires managing instance-level resources and does not provide the same resource-sharing efficiency as an elastic pool; it is also overkill for 12 small databases and incurs higher administrative overhead. Option D is wrong because SQL Server on Azure Virtual Machine requires full administrative control over the OS and SQL Server, negating the goal of reducing administrative overhead, and it does not offer built-in resource sharing or elastic scaling across databases.

520

MCQmedium

A logistics company stores shipment tracking data as JSON documents in Azure Cosmos DB. Each document contains fields like trackingId, origin, destination, status, weight, and optional fields (estimatedDelivery, carrierNotes). The application needs to perform low-latency lookups by trackingId and also run queries to find all shipments that have a specific origin and status. Which Azure Cosmos DB API should they choose?

A.Table API

B.SQL (Core) API

C.MongoDB API

D.Gremlin API

AnswerB

Correct. The SQL API supports querying JSON documents with a SQL-like syntax, indexes all properties automatically by default, and handles both point reads and filter queries efficiently.

Why this answer

The SQL (Core) API is the best choice because it provides native support for querying JSON documents with a SQL-like syntax, enabling efficient low-latency lookups by trackingId (using a partition key) and flexible queries on fields like origin and status. It also supports indexing on any JSON property, making queries on optional fields like estimatedDelivery or carrierNotes performant without schema management.
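The two access patterns described above can be sketched in plain Python. This is only an in-memory stand-in, not the Azure SDK: the sample documents, the helper names point_lookup and find_by_origin_and_status, and the values in them are all made up for illustration. The query string shows roughly what the equivalent SQL API filter would look like, with @origin and @status as query parameters.

```python
# In-memory stand-in for a container of shipment documents.
# No Azure SDK calls are made; all sample data below is hypothetical.
shipments = [
    {"trackingId": "T-1001", "origin": "Seattle", "destination": "Denver",
     "status": "InTransit", "weight": 12.5, "estimatedDelivery": "2024-05-02"},
    {"trackingId": "T-1002", "origin": "Seattle", "destination": "Austin",
     "status": "Delivered", "weight": 3.0},
    {"trackingId": "T-1003", "origin": "Seattle", "destination": "Boston",
     "status": "InTransit", "weight": 7.2, "carrierNotes": "Fragile"},
    {"trackingId": "T-1004", "origin": "Miami", "destination": "Denver",
     "status": "InTransit", "weight": 1.1},
]

# Roughly the filter query the SQL API would receive (shown for reference only).
FILTER_QUERY = "SELECT * FROM c WHERE c.origin = @origin AND c.status = @status"

# Pattern 1: point lookup by key, modelled here as a dictionary lookup.
by_tracking_id = {doc["trackingId"]: doc for doc in shipments}


def point_lookup(tracking_id):
    """Return the single document with this trackingId, or None."""
    return by_tracking_id.get(tracking_id)


# Pattern 2: filter on two properties, regardless of which optional fields exist.
def find_by_origin_and_status(origin, status):
    """Return every document whose origin and status both match."""
    return [d for d in shipments
            if d["origin"] == origin and d["status"] == status]


if __name__ == "__main__":
    print(point_lookup("T-1002"))
    print([d["trackingId"] for d in find_by_origin_and_status("Seattle", "InTransit")])
```

Note that the documents carry different optional fields (estimatedDelivery, carrierNotes) and both access patterns still work, which is the property the question is testing.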

Exam trap

The trap here is assuming that every JSON document API is equivalent. Candidates may pick the MongoDB API because they associate MongoDB with JSON, but it is a wire-protocol compatibility option aimed at existing MongoDB applications, while the SQL API is the native choice for document queries on Azure Cosmos DB.

How to eliminate wrong answers

Option A is wrong because the Table API is a key-value store that models data as flat entities addressed by partition key and row key, not as JSON documents, so it is a poor fit for document queries on multiple fields like origin and status. Option C is wrong because the MongoDB API, while supporting JSON documents, introduces unnecessary overhead and complexity for a workload that can be handled natively by the SQL API, and it does not offer the same SQL-like querying for the described patterns. Option D is wrong because the Gremlin API is a graph traversal API intended for highly connected data (e.g., social networks, recommendation engines), not for simple document lookups and property-based queries on shipment data.

521

MCQmedium

A software company develops a multi-tenant SaaS application. They deploy a separate Azure SQL Database for each tenant. The databases are small (2-5 GB) and have highly variable loads — some tenants use the app heavily during the day, others at night. The company wants to maximize resource utilization and minimize costs by allowing databases to share a pool of resources, while still maintaining a predictable performance per database. Which Azure SQL Database deployment option should they choose?

A.Single database with DTU purchasing model

B.Single database with vCore purchasing model

C.Elastic pool

D.Azure SQL Managed Instance

AnswerC

Correct. Elastic pools share resources across multiple databases, ideal for variable, low-average usage patterns, and provide cost savings.

Why this answer

C is correct because an elastic pool allows multiple Azure SQL databases with variable and unpredictable usage patterns to share a fixed pool of resources (eDTUs or vCores), maximizing resource utilization and minimizing cost. The pool provides a predictable performance per database through per-database min/max resource limits, which is ideal for the described multi-tenant SaaS scenario with small databases and highly variable loads.
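The cost argument can be checked with a little arithmetic. The sketch below uses invented numbers (four tenants, a 50 eDTU peak, a 5 eDTU idle load, and made-up peak hours) purely to illustrate why a shared pool can be sized below the sum of individual peaks. It is not a sizing tool, and real pools come in fixed sizes with their own limits.

```python
# Hypothetical hourly demand (in eDTUs) for four tenant databases.
# All numbers are illustrative assumptions, not Azure sizing guidance.
def hourly_demand(peak_hours, peak=50, idle=5):
    """Return 24 hourly values: `peak` during peak_hours, `idle` otherwise."""
    return [peak if hour in peak_hours else idle for hour in range(24)]


day_hours = set(range(9, 17))                        # 09:00 to 16:59
night_hours = set(range(21, 24)) | set(range(0, 5))  # 21:00 to 04:59

tenants = {
    "tenant_a": hourly_demand(day_hours),
    "tenant_b": hourly_demand(day_hours),
    "tenant_c": hourly_demand(night_hours),
    "tenant_d": hourly_demand(night_hours),
}

# Single databases: each one must be provisioned for its own peak.
sum_of_peaks = sum(max(load) for load in tenants.values())

# Elastic pool: the pool only has to cover the busiest combined hour.
pooled_peak = max(
    sum(load[hour] for load in tenants.values()) for hour in range(24)
)

if __name__ == "__main__":
    print("Sum of individual peaks:", sum_of_peaks)
    print("Peak of combined load:  ", pooled_peak)
```

With these assumed loads the individual peaks add up to 200 eDTUs, while the busiest combined hour needs only 110, because the day and night tenants never peak together.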

Exam trap

The trap here is that candidates often confuse elastic pools with single databases, thinking that the vCore model alone provides elasticity, but vCore single databases still allocate dedicated resources per database and lack the shared-pool cost benefit that elastic pools offer for multi-tenant SaaS workloads.

How to eliminate wrong answers

Option A is wrong because a single database with the DTU purchasing model allocates dedicated resources to one database, which cannot be shared across tenants, leading to wasted capacity and higher costs when loads are variable. Option B is wrong because a single database with the vCore purchasing model also provides dedicated resources per database, lacking the resource pooling and cost efficiency needed for many small databases with intermittent usage. Option D is wrong because Azure SQL Managed Instance is a fully managed instance of SQL Server with fixed resources, designed for lift-and-shift migrations or complex enterprise workloads, not for pooling many small databases with variable loads.

522

MCQmedium

A data engineer needs to build a data pipeline that runs daily to copy sales data from an on-premises SQL Server to Azure Synapse Analytics. Which Azure service should they use to orchestrate the pipeline?

A.Azure Analysis Services

B.Azure Data Factory

C.Azure Databricks

D.Azure HDInsight

AnswerB

Correct. Azure Data Factory is a cloud-based ETL and data integration service built to orchestrate and schedule data pipelines.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based data integration service specifically designed to orchestrate and automate data pipelines. It supports scheduled triggers (e.g., daily runs) and provides native connectors to copy data from on-premises SQL Server (via Self-Hosted Integration Runtime) to Azure Synapse Analytics, making it the ideal tool for this ETL/ELT workload.
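To make the "scheduled trigger plus copy activity" idea concrete, here is a rough sketch of how such a pipeline might look in ADF's JSON authoring format, built as Python dictionaries. The pipeline, dataset, and trigger names and the start time are hypothetical, and the field names are written from memory of the ADF JSON format, so they should be checked against the current schema before use.

```python
import json

# Hypothetical names; none of these come from the question itself.
PIPELINE_NAME = "CopySalesToSynapse"

# A pipeline with one Copy activity: on-premises SQL Server to Synapse.
# The source dataset is assumed to use a linked service that connects
# through a self-hosted integration runtime.
pipeline = {
    "name": PIPELINE_NAME,
    "properties": {
        "activities": [
            {
                "name": "CopySales",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "OnPremSalesTable", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SynapseSalesTable", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "SqlServerSource"},
                    "sink": {"type": "SqlDWSink"},
                },
            }
        ]
    },
}

# A schedule trigger that runs the pipeline once per day.
trigger = {
    "name": "DailySalesTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T02:00:00Z",  # assumed start time
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": PIPELINE_NAME,
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
    print(json.dumps(trigger, indent=2))
```

The point of the sketch is the separation of concerns: the trigger owns the daily schedule, and the pipeline owns the copy step.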

Exam trap

The trap here is that candidates may confuse Azure Data Factory with Azure Databricks or HDInsight because all three can work with data. The question specifically asks for orchestration of a scheduled copy pipeline, which is ADF's primary purpose; Databricks and HDInsight are general-purpose analytics platforms.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an analytical engine for creating semantic models and performing data analysis (e.g., OLAP cubes), not a pipeline orchestration or data movement service. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform primarily used for big data processing, machine learning, and interactive analytics; while it can move data, it lacks the native scheduling and copy-activity orchestration that ADF provides for this specific daily pipeline requirement. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service for running big data frameworks (e.g., Hive, HBase, Storm) and is not designed for simple scheduled data copying between SQL Server and Synapse; it would require additional setup and is overkill for this task.

523

MCQeasy

Which classification of data describes information that has a fixed schema and is organized into rows and columns, such as data found in a relational database table?

A.Unstructured data

B.Semi-structured data

C.Structured data

D.Transformed data

AnswerC

Structured data conforms to a fixed schema, typically in tables with rows and columns. This is the standard format for relational database systems.

Why this answer

Structured data is defined by a fixed schema, where each data element adheres to a predefined data type and relationship, organized into rows and columns. This is the fundamental model of a relational database table, such as those in Azure SQL Database or SQL Server, where constraints like primary keys and foreign keys enforce the schema.
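The difference between an enforced schema and a flexible one is easy to see in a small experiment. The sketch below uses SQLite only because it ships with Python, not because it behaves exactly like Azure SQL Database (SQLite is looser about column types). The Inventory table and the sample rows are hypothetical.

```python
import json
import sqlite3

# Structured: the table definition fixes the columns and their constraints.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Inventory (
        ProductID       INTEGER PRIMARY KEY,
        ProductName     TEXT    NOT NULL,
        QuantityInStock INTEGER NOT NULL
    )
    """
)
conn.execute("INSERT INTO Inventory VALUES (1, 'Widget', 40)")


def try_insert(sql):
    """Run an INSERT and report 'ok' or the name of the sqlite3 error raised."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.Error as exc:
        return type(exc).__name__


# A row that omits a required column value is rejected by the schema.
missing_value = try_insert(
    "INSERT INTO Inventory (ProductID, ProductName) VALUES (2, 'Gadget')"
)

# A row that brings an extra, undeclared column is rejected as well.
unknown_column = try_insert(
    "INSERT INTO Inventory (ProductID, ProductName, QuantityInStock, Color) "
    "VALUES (3, 'Gizmo', 5, 'red')"
)

# Semi-structured: each JSON document carries its own set of keys.
docs = [
    json.loads('{"id": 1, "name": "Widget"}'),
    json.loads('{"id": 2, "name": "Gadget", "tags": ["new"], "color": "red"}'),
]
key_sets = [sorted(doc) for doc in docs]

if __name__ == "__main__":
    print("Missing value: ", missing_value)
    print("Unknown column:", unknown_column)
    print("JSON key sets: ", key_sets)
```

Both malformed inserts fail against the table, while the two JSON documents parse without complaint even though their keys differ.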

Exam trap

Microsoft often tests the distinction between structured and semi-structured data, where candidates mistakenly classify JSON or XML as structured because it has some organization, but the key differentiator is the rigid, predefined schema enforced by the database, not just the presence of tags or keys.

How to eliminate wrong answers

Option A is wrong because unstructured data has no predefined schema or organization, such as text files, images, or videos, and cannot be stored directly in rows and columns. Option B is wrong because semi-structured data has some organizational properties (like tags or key-value pairs) but does not enforce a rigid schema; examples include JSON or XML files, which are not strictly row-and-column. Option D is wrong because 'transformed data' is not a classification of data by structure; it refers to data that has been processed or altered from its original form, such as through ETL operations, and does not describe a schema-based organization.

524

MCQmedium

A retail company needs to analyze clickstream data from their website in real time to detect fraudulent activity and also run complex historical queries on months of data to identify shopping trends. They want a single service that can handle both streaming and batch analytics using a unified query language, minimizing data movement. Which Azure service should they use?

A.Azure Stream Analytics

B.Azure Synapse Analytics

C.Azure HDInsight

D.Azure Data Explorer

AnswerD

Azure Data Explorer (ADX) is built for real-time analytics on streaming data and interactive queries on large historical datasets using Kusto Query Language (KQL), making it the right choice for this combined workload.

Why this answer

Azure Data Explorer (ADX) is designed for real-time analytics on streaming data and can also handle complex historical queries over large volumes of data using the Kusto Query Language (KQL). It minimizes data movement by ingesting streaming data directly and storing it in a columnar format optimized for both real-time and batch queries, making it the ideal single service for this scenario.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because it is explicitly marketed for real-time streaming, but they overlook the requirement for complex historical queries and a unified query language, which ADX uniquely satisfies with KQL.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that uses a SQL-like language, but it is not optimized for complex historical queries over months of data and typically requires a separate storage layer (e.g., Azure Data Lake) for batch analytics, increasing data movement. Option B is wrong because Azure Synapse Analytics is a unified analytics platform that supports both streaming and batch workloads, but it relies on T-SQL and is designed more for large-scale data warehousing and big data processing, not for the low-latency, high-frequency clickstream analytics that ADX excels at with KQL. Option C is wrong because Azure HDInsight is a managed Hadoop/Spark cluster that can handle streaming and batch analytics, but it requires multiple components (e.g., Spark Streaming, Hive) and does not offer a single unified query language; it also involves significant data movement between storage and compute layers.

525

MCQmedium

A company uses Azure SQL Database and needs to run complex analytical queries that scan large amounts of data. The queries are experiencing performance issues. Which Azure service should they use to offload the analytical workload?

A.Azure SQL Database (Hyperscale tier)

B.Azure Analysis Services

C.Azure Data Lake Storage

D.Azure Synapse Analytics dedicated SQL pool

AnswerD

Correct. A dedicated SQL pool in Azure Synapse Analytics is designed for analytical workloads that scan large amounts of data.

Why this answer

Option D is correct because an Azure Synapse Analytics dedicated SQL pool (formerly Azure SQL Data Warehouse) is optimized for analytical workloads. Option A is wrong because Azure SQL Database, even in the Hyperscale tier, is primarily a transactional service, not an analytical one. Option B is wrong because Azure Analysis Services is for semantic models, not direct query offloading. Option C is wrong because Azure Data Lake Storage is for data lakes, not relational analytics.
