Microsoft Azure Data Fundamentals DP-900 DP-900 Questions 826–900 | Page 12/14

826

MCQeasy

The exhibit shows a firewall rule configuration for an Azure SQL Server. A developer reports that a client app running on IP 10.0.0.5 cannot connect to the database. What is the most likely cause?

A.The firewall rules are not in valid JSON format

B.The first rule only allows a single IP (10.0.0.0), not the entire subnet

C.The second rule overrides the first rule

D.The firewall rule uses IPv4 addresses, but the client is using IPv6

AnswerB

The first rule specifies start and end IP both 10.0.0.0, so only that exact IP is allowed; 10.0.0.5 is blocked.

Why this answer

Option B is correct because the first rule allows only IP 10.0.0.0 (single IP), not the subnet. Option A is wrong because Azure SQL supports both IPv4 and IPv6. Option C is wrong because the rules are valid JSON.

Option D is wrong because the client IP is within the 192.168.1.0/24 range but the first rule is more restrictive.

Full explanation →

827

MCQhard

Refer to the exhibit. An administrator runs an Azure CLI command to show the status of a Synapse SQL pool. The output shown is returned. What does this output indicate about the SQL pool?

A.The SQL pool was restored from a backup.

B.The SQL pool is online and available for queries.

C.The SQL pool is paused.

D.The SQL pool is currently being restored.

AnswerB

Both status and provisioning state indicate success.

Why this answer

The Azure CLI command `az synapse sql pool show` returns the provisioning state of the SQL pool. The output shows `"provisioningState": "Online"`, which directly indicates that the pool is provisioned, active, and ready to accept queries. This is the standard status for a running dedicated SQL pool in Azure Synapse Analytics.

Exam trap

The trap here is that candidates may confuse the `provisioningState` field with other status indicators like `status` or `state` in different Azure resources, or assume that `Online` could mean a restore completed successfully, but the output explicitly shows the current state, not the history of operations.

How to eliminate wrong answers

Option A is wrong because the output does not contain any fields such as `restorePointInTime`, `sourceDatabaseId`, or `recovery` that would indicate a restore operation; the `Online` state simply means the pool is running. Option C is wrong because a paused SQL pool would show `"provisioningState": "Paused"` or `"status": "Paused"` in the output, not `Online`. Option D is wrong because a pool being restored would show a provisioning state of `"Restoring"` or `"InProgress"`, not `Online`.

Full explanation →

828

MCQeasy

A company uses Azure Synapse Analytics to run large-scale analytics on sales data. They need to ensure that the workload can automatically scale based on demand without manual intervention. What feature should they configure?

A.Auto-pause and auto-resume

B.Read Scale-out replicas

C.Elastic Database Pools

D.Manual scale-up during peak hours

AnswerA

Enables automatic scaling and cost optimization by pausing when idle and resuming on demand.

Why this answer

Auto-pause and auto-resume is the correct feature because Azure Synapse Analytics (dedicated SQL pool) supports automatic scaling through the ability to pause the compute resources when idle and resume them on demand, which effectively scales the workload based on demand without manual intervention. This feature reduces costs by stopping compute billing during inactivity and automatically resumes when a query or activity is detected, meeting the requirement for automatic scaling.

Exam trap

The trap here is that candidates confuse auto-pause/auto-resume with manual scaling or other database scaling features (like read replicas or elastic pools) that are specific to Azure SQL Database, not Azure Synapse Analytics.

How to eliminate wrong answers

Option B is wrong because Read Scale-out replicas are a feature of Azure SQL Database (and Azure SQL Managed Instance) that offloads read-only workloads to a replica, not a mechanism for automatic scaling of compute resources in Azure Synapse Analytics. Option C is wrong because Elastic Database Pools are a shared resource model for managing multiple Azure SQL Databases with variable usage patterns, not a feature of Azure Synapse Analytics for scaling a single analytics workload. Option D is wrong because manual scale-up during peak hours requires human intervention, which directly contradicts the requirement for automatic scaling without manual intervention.

Full explanation →

829

MCQmedium

A retail company collects sales data from multiple stores. Data is ingested into Azure Data Lake Storage Gen2 as CSV files. The data team needs to run ad-hoc SQL queries on this data without moving it, and they want to pay only for the amount of data processed. They also need to integrate with Power BI for visualization. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Data Lake Analytics

D.Azure Synapse Serverless SQL pool

AnswerD

Serverless SQL pool enables querying files in Data Lake with standard T-SQL, paying only for data processed, and integrates with Power BI.

Why this answer

Azure Synapse Serverless SQL pool (option D) is correct because it allows querying data directly from Azure Data Lake Storage Gen2 using T-SQL without moving the data, and it uses a pay-per-query model where you are billed only for the amount of data processed. It also integrates seamlessly with Power BI for visualization, making it ideal for ad-hoc SQL queries on CSV files.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure Synapse Analytics dedicated SQL pool, mistakenly thinking both require provisioning and pay for compute, or they overlook that Azure Data Lake Analytics is deprecated and not the correct service for ad-hoc SQL queries on data lakes.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool requires provisioning and paying for dedicated compute resources (even when idle), and it is designed for large-scale data warehousing with persistent storage, not for ad-hoc pay-per-query scenarios on existing data lakes. Option B is wrong because Azure SQL Database is a fully managed relational database that requires data to be imported and stored within it, and it does not support querying data directly from Azure Data Lake Storage Gen2 without moving it. Option C is wrong because Azure Data Lake Analytics uses U-SQL (a combination of SQL and C#) and is a separate analytics service that processes data in a data lake, but it is not a SQL-based query service for ad-hoc queries and does not offer the same pay-per-query model as Serverless SQL pool; it has been deprecated in favor of Azure Synapse Serverless SQL pool.

Full explanation →

830

MCQeasy

A retail company processes customer orders throughout the day. Each order involves inserting a new record into a database table, updating inventory counts, and deleting temporary cart data. At the end of each week, the company runs a query that aggregates all orders by product category and region to generate a sales report. Which of the following best describes these two workloads?

A.Order processing is OLAP; weekly reporting is OLTP

B.Order processing is batch processing; weekly reporting is streaming processing

C.Order processing is OLTP; weekly reporting is OLAP

D.Both workloads are OLTP

AnswerC

Order processing is transactional (OLTP) and the weekly report is analytical (OLAP). This is a correct distinction between the two common data workload patterns.

Why this answer

Order processing involves frequent, small transactions (inserts, updates, deletes) that are typical of Online Transaction Processing (OLTP) workloads, which prioritize data integrity and low-latency writes. The weekly sales report aggregates large volumes of historical data by product category and region, which is characteristic of Online Analytical Processing (OLAP) workloads that support complex queries and data summarization. Option C correctly identifies these two distinct workload types.

Exam trap

The trap here is that candidates confuse the terms OLTP and OLAP, mistakenly thinking that any database operation is OLTP or that reporting is always OLTP, when in fact the key differentiator is the workload pattern—transactional vs. analytical.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: order processing is OLTP (not OLAP) due to its transactional nature, and weekly reporting is OLAP (not OLTP) because it involves heavy aggregation over historical data. Option B is wrong because order processing is not batch processing—it occurs in real-time as each order is placed, and weekly reporting is not streaming processing; it is a scheduled batch job that runs at fixed intervals. Option D is wrong because both workloads are not OLTP; the weekly reporting query performs large-scale aggregations that would degrade OLTP performance and is designed for OLAP systems.

Full explanation →

831

MCQhard

A retail company ingests clickstream data from its e-commerce website into Azure Event Hubs. They need to detect customer journey patterns in real time within seconds and also prepare aggregated data for daily trend reports stored in Azure Data Lake Storage Gen2. The real-time processing must handle high throughput and support complex temporal queries like sessionization. The daily aggregation should be cost-effective and use serverless compute. Which combination of Azure services should they use?

A.Azure Stream Analytics for real-time processing and Azure Data Factory for daily batch aggregation

B.Azure Functions for real-time processing and Azure Databricks for daily batch aggregation

C.Azure Stream Analytics for real-time processing and Azure Batch for daily batch aggregation

D.Azure Data Lake Analytics for real-time processing and Azure Data Factory for daily batch aggregation

AnswerA

Correct. Stream Analytics handles real-time complex event processing. Data Factory can orchestrate serverless batch transformations and load data into Data Lake Storage.

Why this answer

Azure Stream Analytics is ideal for real-time processing of high-throughput clickstream data from Event Hubs, supporting complex temporal queries like sessionization with low latency (seconds). Azure Data Factory provides cost-effective, serverless orchestration for daily batch aggregation, efficiently moving and transforming data to Azure Data Lake Storage Gen2 without managing infrastructure.

Exam trap

The trap here is confusing Azure Functions (serverless compute) with Azure Stream Analytics (dedicated stream processing) for real-time analytics, and assuming Azure Batch (parallel job execution) is equivalent to Azure Data Factory (orchestrated data integration) for batch aggregation, leading candidates to overlook the specific requirements for high-throughput temporal queries and serverless cost-effectiveness.

How to eliminate wrong answers

Option B is wrong because Azure Functions is not designed for high-throughput, stateful real-time stream processing with complex temporal queries like sessionization; it's better suited for event-driven, short-lived tasks. Option C is wrong because Azure Batch is a job scheduler for parallel compute workloads, not a serverless data integration service for daily batch aggregation; it requires managing compute pools and lacks the built-in data movement and transformation capabilities of Data Factory. Option D is wrong because Azure Data Lake Analytics is deprecated and not suitable for real-time processing; it was designed for batch U-SQL jobs, not low-latency stream processing, and lacks native Event Hubs input support.

Full explanation →

832

MCQeasy

A data engineer needs to load data from an on-premises SQL Server database to Azure Synapse Analytics every hour with minimal latency. Which Azure service should they use?

A.Azure Databricks

B.Azure Data Factory

C.Azure SQL Database

D.Azure HDInsight

AnswerB

Cloud-based ETL service that can run pipelines every hour with low latency.

Why this answer

Azure Data Factory (ADF) is the correct choice because it provides a fully managed, code-free ETL service that can connect to on-premises SQL Server via self-hosted integration runtime, and load data into Azure Synapse Analytics with low latency using a scheduled trigger (e.g., every hour). ADF supports incremental data loading and parallel copy activities, minimizing latency while handling the required frequency.

Exam trap

The trap here is that candidates often confuse Azure Data Factory with Azure Databricks or HDInsight, assuming any big data or analytics service can handle scheduled data ingestion, but only ADF is purpose-built for orchestration and low-latency data movement from on-premises sources.

How to eliminate wrong answers

Option A is wrong because Azure Databricks is an Apache Spark-based analytics platform designed for big data processing and machine learning, not a dedicated data ingestion or orchestration service; it lacks native scheduling and on-premises connectivity for hourly low-latency loads without additional setup. Option C is wrong because Azure SQL Database is a relational database service, not a data integration or orchestration tool; it cannot directly load data from on-premises SQL Server into Synapse Analytics on a schedule. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service for big data analytics, not a data movement or orchestration service; it requires custom scripting and manual scheduling to perform hourly loads, adding complexity and latency.

Full explanation →

833

MCQeasy

A company stores customer data in a SQL Server database table with columns: CustomerID (integer), Name (varchar), Email (varchar), SignupDate (date). All rows adhere to this schema. Which type of data does this represent?

A.Structured data

B.Unstructured data

C.Semi-structured data

D.Transactional data

AnswerA

Correct. The data is stored in a relational table with a fixed schema, which is the definition of structured data.

Why this answer

This data is structured because it conforms to a fixed schema with clearly defined columns (CustomerID, Name, Email, SignupDate) and data types (integer, varchar, date). In SQL Server, structured data is stored in tables with rows and columns, enabling efficient querying via T-SQL and indexing. The consistent adherence to the schema across all rows is the hallmark of structured data.

Exam trap

The trap here is that candidates confuse the content of the data (e.g., customer information) with its structure, or mistakenly think that any data in a database is automatically structured, ignoring the distinction between structured, semi-structured, and unstructured formats.

How to eliminate wrong answers

Option B is wrong because unstructured data has no predefined schema or organization (e.g., text files, images, videos), whereas this table has a rigid schema. Option C is wrong because semi-structured data (e.g., JSON, XML) allows schema flexibility and nested structures, but this table enforces fixed columns and data types. Option D is wrong because transactional data refers to records of business transactions (e.g., sales orders, payments), not the general classification of data format; this table could store transactional data, but the question asks about the type of data based on its structure.

Full explanation →

834

MCQmedium

A company runs a mission-critical SQL Server database on-premises. They plan to migrate to Azure SQL Database and need to choose the appropriate service tier. The database is currently 500 GB and is expected to grow to 8 TB within two years. The workload is read-heavy with many concurrent users, and they require fast scaling of compute resources without significant downtime. Which Azure SQL Database service tier should they choose?

A.General Purpose

B.Business Critical

C.Hyperscale

D.Serverless

AnswerC

Hyperscale tier supports databases up to 100 TB, allows fast scaling of compute resources with minimal downtime, and is optimized for read-heavy workloads with high concurrency. It is the best fit for databases that exceed 4 TB and require rapid scaling.

Why this answer

Hyperscale is the correct choice because it supports databases up to 100 TB, far exceeding the expected 8 TB growth, and provides fast scaling of compute resources without downtime by using a distributed architecture with separate compute and storage nodes. Its read-heavy workload with many concurrent users benefits from Hyperscale's multiple readable replicas and buffer pool extension, ensuring high performance and availability.

Exam trap

The trap here is that candidates often confuse the 4 TB limit of General Purpose and Business Critical with the 100 TB limit of Hyperscale, or assume that Serverless is suitable for any workload that needs scaling, ignoring its auto-pausing behavior and lack of support for high concurrency and consistent performance.

How to eliminate wrong answers

Option A is wrong because General Purpose has a maximum database size of 4 TB, which cannot accommodate the expected growth to 8 TB, and its compute scaling requires downtime. Option B is wrong because Business Critical also has a 4 TB size limit and, while offering higher performance, does not support the required 8 TB growth or fast compute scaling without downtime. Option D is wrong because Serverless is designed for intermittent, unpredictable workloads with auto-pausing, not for a mission-critical, read-heavy, high-concurrency workload that requires consistent performance and fast scaling without downtime.

Full explanation →

835

Drag & Dropmedium

Drag and drop the steps to perform a point-in-time restore of an Azure SQL Database in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Point-in-time restore uses automated backups to recover a database to a specific time within the retention period.

Full explanation →

836

MCQeasy

A company ingests streaming data from social media feeds and needs to process and analyze the data in real time. Which Azure service should they use to capture the stream?

A.Azure Stream Analytics

B.Azure IoT Hub

C.Azure Event Hubs

D.Azure Data Lake Storage

AnswerC

Azure Event Hubs is a scalable event ingestion service for streaming data.

Why this answer

Azure Event Hubs is a fully managed, real-time data ingestion service designed to capture and process millions of events per second from sources like social media feeds. It provides a scalable, low-latency endpoint for streaming data, making it the correct choice for capturing the stream before further analysis.

Exam trap

The trap here is that candidates confuse Azure Stream Analytics (a processing service) with Event Hubs (an ingestion service), or assume IoT Hub is suitable for non-IoT streaming data due to its similar event ingestion capability.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a stream processing engine that analyzes data in motion, not a capture/ingestion service; it typically consumes from Event Hubs or IoT Hub. Option B is wrong because Azure IoT Hub is specifically built for bidirectional communication with IoT devices, not for general-purpose social media stream ingestion, and it lacks the high-throughput, multi-protocol ingestion capabilities of Event Hubs. Option D is wrong because Azure Data Lake Storage is a hierarchical file store for batch and analytics workloads, not a real-time streaming capture service; it cannot ingest streaming data directly without an intermediary like Event Hubs or Stream Analytics.

Full explanation →

837

MCQmedium

A manufacturing company stores IoT sensor data as JSON documents in Azure Cosmos DB. Each document contains a device ID, a timestamp, and a varying set of sensor readings. The application frequently queries data by device ID and a time range to retrieve all readings for a specific device over a period. The development team wants to use an API that supports SQL-like queries on this JSON data. Which Azure Cosmos DB API should they choose?

A.Azure Cosmos DB Core (SQL) API

B.Azure Cosmos DB MongoDB API

C.Azure Cosmos DB Cassandra API

D.Azure Cosmos DB Gremlin API

AnswerA

The Core (SQL) API natively supports JSON documents and allows querying using SQL syntax, which is exactly what the application needs.

Why this answer

The Azure Cosmos DB Core (SQL) API is the correct choice because it natively supports querying JSON documents using SQL-like syntax, which aligns with the requirement to run SQL-like queries on JSON data. This API provides a rich query language for filtering by device ID and timestamp ranges, making it ideal for the described IoT scenario where documents have varying sensor readings.

Exam trap

The trap here is that candidates may confuse the MongoDB API's support for JSON documents with SQL-like querying, but MongoDB uses its own query language (e.g., db.collection.find()) rather than SQL syntax, which is a key distinction tested in the DP-900 exam.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because it uses MongoDB's query language (based on BSON and MongoDB operators), not SQL-like syntax, and would require the team to adapt to a different query paradigm. Option C (Cassandra API) is wrong because it uses CQL (Cassandra Query Language) and is designed for wide-column store data, not for querying JSON documents with varying schemas. Option D (Gremlin API) is wrong because it is a graph traversal API used for graph data models, not for SQL-like queries on JSON documents.

Full explanation →

838

MCQeasy

A company is developing a web application that stores user profiles as JSON documents. The application needs to query these documents using SQL-like queries, and must support automatic indexing of all properties. They want a fully managed, globally distributed NoSQL database with low latency. Which Azure Cosmos DB API should they use?

A.Table API

B.Cassandra API

C.SQL API

D.Gremlin API

AnswerC

The SQL API allows you to store JSON documents and query them using SQL-like syntax, with automatic indexing of all properties. It is ideal for document-based applications.

Why this answer

The SQL API (formerly DocumentDB API) is the correct choice because it natively supports querying JSON documents with SQL-like syntax (SELECT * FROM c WHERE c.property = value). It automatically indexes all properties by default, provides a fully managed, globally distributed NoSQL database with low-latency reads and writes, and is designed specifically for document-based workloads like user profiles.

Exam trap

The trap here is that candidates often confuse the SQL API with the Table API because both support querying, but the Table API lacks SQL-like syntax and automatic indexing of all properties, making it unsuitable for JSON document workloads.

How to eliminate wrong answers

Option A is wrong because the Table API is designed for key-value storage with a schema-less table structure, not for querying JSON documents with SQL-like queries; it uses OData and REST-based queries, not SQL. Option B is wrong because the Cassandra API is optimized for wide-column stores using the Cassandra Query Language (CQL), which is similar to SQL but does not natively support JSON document queries or automatic indexing of all properties. Option D is wrong because the Gremlin API is built for graph databases and uses the Gremlin traversal language for navigating relationships, not for SQL-like queries on JSON documents.

Full explanation →

839

MCQmedium

A data engineering team needs to analyze petabytes of historical sales data stored in Azure Data Lake Storage Gen2. They require the ability to run complex SQL queries that join multiple tables and need high performance. The solution must separate compute from storage to allow independent scaling of resources. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool

B.Azure SQL Database

C.Azure Cosmos DB

D.Azure Table Storage

AnswerA

Correct. It is built for petabyte-scale data warehousing with separate compute and storage, ideal for complex analytical queries.

Why this answer

Azure Synapse Analytics dedicated SQL pool is designed for petabyte-scale data warehousing, providing massively parallel processing (MPP) to run complex SQL queries across multiple tables with high performance. It separates compute from storage, allowing independent scaling of compute resources without moving data, which aligns with the requirement for decoupled scaling.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's familiar SQL interface with the ability to handle petabyte-scale analytics, overlooking the fundamental architectural difference between OLTP and MPP data warehouse systems.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a relational database service for OLTP workloads, not designed for petabyte-scale analytics or independent compute-storage separation. Option C is wrong because Azure Cosmos DB is a NoSQL database optimized for low-latency, globally distributed applications, not for complex SQL joins on petabytes of historical data. Option D is wrong because Azure Table Storage is a key-value NoSQL store for semi-structured data, lacking SQL query capabilities and MPP architecture for large-scale analytics.

Full explanation →

840

MCQeasy

A company stores customer data in a SQL Server table with fixed columns (CustomerID, Name, Email, SignupDate). The company also stores application logs as JSON documents and marketing images as JPEG files. Which data type describes the customer data?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerA

Correct. A SQL table with defined columns and data types is the classic example of structured data.

Why this answer

Customer data stored in a SQL Server table with fixed columns (CustomerID, Name, Email, SignupDate) follows a rigid schema where each row has the same set of columns with defined data types. This conforms to the relational model, making it structured data. Structured data is organized into rows and columns with a fixed schema, enabling efficient querying via SQL.

Exam trap

The trap here is that candidates confuse 'relational data' (a storage model) with 'structured data' (a data type category), leading them to pick D instead of A, even though the question explicitly asks for the data type.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not enforce a fixed schema; it allows flexible key-value pairs or nested structures, which does not match the fixed-column SQL Server table. Option C is wrong because unstructured data (e.g., JPEG images, plain text files) lacks a predefined data model or organization, unlike the tabular customer data. Option D is wrong because 'relational data' is not a data type category in the DP-900 core data concepts; it describes a storage model (relational databases) that can hold structured data, but the question asks for the data type, not the storage model.

Full explanation →

841

MCQmedium

A manufacturing company collects sensor data from thousands of IoT devices. Each sensor reading includes a timestamp, device ID, and a variable set of measurements (e.g., temperature, pressure, vibration) that differ by device type. The company needs to store this data in a globally distributed NoSQL database that supports low-latency writes and flexible schema. Which Azure data store should they choose?

A.Azure SQL Database

B.Azure Cosmos DB with the NoSQL API

C.Azure Cache for Redis

D.Azure Database for PostgreSQL

AnswerB

Azure Cosmos DB NoSQL API stores JSON documents with a flexible schema, supports global distribution, and provides low-latency writes, making it ideal for IoT sensor data with variable attributes.

Why this answer

Azure Cosmos DB with the NoSQL API is the correct choice because it is a globally distributed, multi-model database service that supports low-latency writes at scale, a flexible schema (schemaless), and automatic indexing of variable sensor measurements. Its multi-region write capability and configurable consistency levels meet the requirements of high-throughput IoT ingestion from thousands of devices.

Exam trap

The trap here is that candidates often confuse Azure Cache for Redis as a primary database for IoT data, but it is an in-memory cache without durability guarantees, not a globally distributed NoSQL store for persistent sensor readings.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational database with a fixed schema, which cannot easily handle the variable set of measurements per device type and does not natively support global distribution with low-latency writes at IoT scale. Option C is wrong because Azure Cache for Redis is an in-memory data store primarily used for caching and session state, not a durable, globally distributed NoSQL database for persistent sensor data storage. Option D is wrong because Azure Database for PostgreSQL is a relational database with a fixed schema and limited global distribution capabilities compared to Cosmos DB, making it unsuitable for flexible schema and low-latency multi-region writes.

Full explanation →

842

MCQmedium

A manufacturing company collects temperature and vibration data from thousands of sensors. The data is streamed to Azure Event Hubs. The company wants to store all this raw data in Azure Data Lake Storage Gen2 for future batch analytics. They need a solution that automatically writes the streaming data to the data lake in near real-time, without requiring any custom code for the write operation. Which Azure feature should they use?

A.Azure Stream Analytics job output to Azure Data Lake Storage Gen2

B.Azure Event Hubs Capture

C.Azure Data Factory Copy Activity

D.Azure Synapse Pipelines

AnswerB

Event Hubs Capture automatically captures streaming data into Azure Blob Storage or Azure Data Lake Storage Gen2 without any custom code. It writes data in Avro format and is ideal for long-term storage and batch analytics.

Why this answer

Azure Event Hubs Capture is the correct choice because it automatically writes streaming data from Event Hubs to Azure Data Lake Storage Gen2 in near real-time without requiring any custom code. It integrates directly with Event Hubs to buffer and write data in Avro format, meeting the requirement for a no-code, automated solution.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the only way to output Event Hubs data to storage, overlooking Event Hubs Capture which provides a simpler, code-free alternative for raw data persistence.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics requires a job definition and query logic to output to Data Lake Storage Gen2, which involves custom code (SQL-like queries) and is not a fully automatic write operation without configuration. Option C is wrong because Azure Data Factory Copy Activity is a batch-oriented data movement tool that requires scheduling or triggers to copy data, not a near real-time streaming solution, and it does not natively integrate with Event Hubs for continuous streaming. Option D is wrong because Azure Synapse Pipelines are designed for orchestration and ETL in a Synapse workspace, not for automatic, code-free streaming writes from Event Hubs to Data Lake Storage Gen2.

Full explanation →

843

MCQeasy

A retail company maintains a database of customer information including CustomerID, Name, Address, and Phone. Each record follows the same fixed schema. This type of data is best described as:

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerA

Structured data has a fixed schema, like a table with defined columns.

Why this answer

Structured data conforms to a fixed schema where each record has the same fields (CustomerID, Name, Address, Phone) and data types, making it ideal for relational database storage. This rigid, tabular format allows efficient querying using SQL and enforces consistency across all rows.

Exam trap

The trap here is that candidates confuse 'relational data' (a storage model) with 'structured data' (a data type), leading them to select Option D, but the DP-900 exam categorizes data by its structure, not by the database system used to store it.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not enforce a fixed schema; fields can vary between records, unlike the uniform schema described. Option C is wrong because unstructured data (e.g., images, videos, text files) has no predefined structure or schema, whereas customer records with fixed fields are clearly organized. Option D is wrong because 'relational data' is not a data type category in the DP-900 taxonomy; it refers to a database model that stores structured data, but the question asks for the data type itself, not the storage model.

Full explanation →

844

Multi-Selecthard

Which THREE are valid Azure data storage services that support semi-structured data?

Select 3 answers

A.Azure Cosmos DB

B.Azure SQL Database

C.Azure Table Storage

D.Azure Blob Storage

E.Azure Data Lake Storage

AnswersA, C, D

Supports JSON documents and semi-structured data.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database service that natively supports semi-structured data through its flexible schema model, allowing documents, key-value pairs, and graph data. It provides multiple APIs (SQL, MongoDB, Cassandra, Gremlin, and Table) to interact with semi-structured data, making it a valid choice for this question.

Exam trap

The trap here is that candidates may incorrectly assume Azure SQL Database supports semi-structured data because it can store JSON in columns, but it still requires a predefined relational schema and does not natively handle flexible schemas like a true NoSQL service.

Full explanation →

845

MCQmedium

A social media application uses Azure Cosmos DB to store user posts. When a user publishes a new post, they immediately refresh their feed and expect to see their own post right away. However, the application can tolerate temporary staleness for posts from other users. Which Azure Cosmos DB consistency level should the app use for the read operations that display the feed?

A.Strong

B.Bounded staleness

C.Session

D.Eventual

AnswerC

Session consistency guarantees that within the same client session, reads will see the latest writes. This means the user will always see their own post immediately, while reads of other users' posts may be stale. This is the most cost-effective and correct choice.

Why this answer

Session consistency guarantees monotonic reads, writes, and read-your-writes within a single client session. Because the user expects to see their own post immediately after publishing, but can tolerate staleness for others' posts, Session consistency provides the exact guarantee needed: the user's own writes are immediately visible to them, while other users' posts may be slightly stale.

Exam trap

The trap here is that candidates often pick Eventual consistency because they see 'tolerate temporary staleness' and forget that the user's own post must be immediately visible, which requires at least read-your-writes — a guarantee that Session consistency provides but Eventual does not.

How to eliminate wrong answers

Option A is wrong because Strong consistency would force all replicas to agree on the latest write before any read returns, which adds latency and is unnecessary since the app tolerates staleness for other users' posts. Option B is wrong because Bounded staleness allows a configurable lag (time or operations) but applies uniformly to all reads, not just those from other users, and would still impose a stricter guarantee than needed for the user's own posts. Option D is wrong because Eventual consistency does not guarantee read-your-writes, so the user might not see their own post immediately after refreshing, which violates the stated requirement.

Full explanation →

846

MCQhard

A data warehouse team uses Azure Synapse Analytics dedicated SQL pool to serve both business executives running weekly reports and data scientists running complex ad-hoc queries on large fact tables. The ad-hoc queries often consume excessive resources and degrade performance for the weekly reports. The team needs to ensure that the weekly reports always get guaranteed resources regardless of other concurrent queries. Which Synapse feature should they use?

A.Workload classification

B.Result set caching

C.Materialized views

D.Columnstore indexes

AnswerA

Correct. Workload classification assigns queries to workload groups with resource limits, ensuring critical queries get guaranteed resources.

Why this answer

Workload classification in Azure Synapse Analytics dedicated SQL pool allows the team to assign incoming queries to specific workload groups with predefined resource allocations. By classifying the weekly report queries into a group with guaranteed minimum resources (e.g., using `CREATE WORKLOAD CLASSIFIER` with `IMPORTANCE` and `REQUEST_MIN_RESOURCE_PERCENT`), the team ensures those queries always receive the necessary resources, even when ad-hoc data scientist queries are running concurrently. This directly addresses the need for predictable performance for critical reports.

Exam trap

The trap here is that candidates often confuse performance optimization features (like caching, materialized views, or indexes) with resource governance features, mistakenly believing that making queries faster inherently guarantees resource availability, whereas workload classification is the only option that provides explicit resource isolation and guarantees.

How to eliminate wrong answers

Option B (Result set caching) is wrong because it only caches query results for repeated executions, which does not guarantee resources for the weekly reports; it can improve performance for identical queries but does not prevent resource contention. Option C (Materialized views) is wrong because they pre-compute and store aggregated data to speed up queries, but they do not provide resource guarantees or isolation; they can be used alongside workload management but are not a solution for resource contention. Option D (Columnstore indexes) is wrong because they improve compression and query performance for large fact tables by using columnar storage, but they do not allocate or guarantee resources for specific workloads; they are a storage optimization, not a resource management feature.

Full explanation →

847

MCQeasy

A data file contains records for customer orders. Each record has fields for OrderID, CustomerID, and OrderDate that are present in every record. However, some records include an optional 'DiscountCode' field, and others include an optional 'GiftMessage' field. The file is stored in JSON format. Which type of data does this file represent?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Transactional data

AnswerB

Correct. Semi-structured data has some organizational properties (like tags or keys) but allows variations in the schema. JSON documents with optional fields fall into this category.

Why this answer

The JSON file contains records with a fixed set of fields (OrderID, CustomerID, OrderDate) that are always present, but also includes optional fields (DiscountCode, GiftMessage) that may appear in some records but not others. This mix of a consistent schema with flexible, self-describing fields is the hallmark of semi-structured data. JSON itself is a semi-structured format because it uses key-value pairs and allows nested or optional attributes without requiring a rigid schema.

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' because they see optional fields and think the data has no structure, but the presence of a consistent base schema (OrderID, CustomerID, OrderDate) clearly distinguishes it as semi-structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema (e.g., a relational table with predefined columns), but this JSON file allows optional fields that may be missing from some records, violating the strict schema requirement. Option C is wrong because unstructured data has no predefined structure or organization (e.g., raw text, images, audio), whereas this file has a consistent base schema with OrderID, CustomerID, and OrderDate in every record. Option D is wrong because transactional data refers to data that records events or transactions (like orders), but this is a classification of data content, not a classification of data structure; the question asks about the type of data based on its format, not its business use.

Full explanation →

848

MCQhard

A company is migrating their on-premises data warehouse, which is built on a Netezza appliance, to Azure. The data warehouse contains over 10 terabytes of data and supports complex BI queries with multiple joins and aggregations. The company requires a cloud-based solution that provides massively parallel processing (MPP) to handle large-scale queries efficiently. They also need to integrate with existing ETL tools like Azure Data Factory and provide native connectivity to Power BI. Which Azure service should they choose?

A.Azure SQL Database

B.Azure Databricks

C.Azure Synapse Analytics dedicated SQL pool

D.Azure HDInsight

AnswerC

Dedicated SQL pool provides MPP capabilities for fast query performance on large datasets. It is designed for cloud data warehousing and seamlessly integrates with Azure Data Factory and Power BI.

Why this answer

Azure Synapse Analytics dedicated SQL pool is the correct choice because it provides massively parallel processing (MPP) architecture designed for petabyte-scale data warehousing, exactly matching the 10+ TB requirement. It natively integrates with Azure Data Factory for ETL and offers built-in Power BI connectivity via the T-SQL endpoint, supporting complex BI queries with multiple joins and aggregations.

Exam trap

The trap here is that candidates often confuse Azure Databricks (a Spark-based analytics platform) with a data warehouse, overlooking that Synapse dedicated SQL pool is the only option that provides native MPP, T-SQL support, and direct Power BI connectivity for large-scale BI workloads.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a single-node, general-purpose relational database that lacks MPP architecture and cannot efficiently handle 10+ TB of complex BI queries with massive parallelism. Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform optimized for data engineering and machine learning, not a dedicated data warehouse; it does not provide native T-SQL-based MPP for large-scale BI workloads and requires additional configuration for Power BI connectivity. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service focused on big data processing and batch analytics, not a dedicated MPP data warehouse; it lacks native T-SQL support and optimized query execution for complex BI joins and aggregations, and its Power BI integration is indirect through Spark connectors.

Full explanation →

849

MCQmedium

Your company stores customer data in Azure Blob Storage. To comply with data residency regulations, you must ensure data is replicated within the same Azure region. Which replication option should you choose?

A.Zone-redundant storage (ZRS)

B.Locally-redundant storage (LRS)

C.Geo-redundant storage (GRS)

D.Read-access geo-redundant storage (RA-GRS)

AnswerB

LRS replicates within a single region, keeping data resident.

Why this answer

Locally-redundant storage (LRS) replicates data three times within a single physical location in the same Azure region, ensuring data residency compliance by never copying data outside that region. This is the only option that guarantees all replicas stay within one region without any cross-region or cross-zone replication.

Exam trap

The trap here is that candidates often confuse 'replication within the same region' with 'zone-redundant storage' (ZRS) because ZRS also stays within the region, but the question's emphasis on 'data residency' and 'same region' is designed to test whether you know that LRS is the simplest and most restrictive option that keeps all copies in a single location, while ZRS still uses multiple zones which may be considered separate data centers for some compliance definitions.

How to eliminate wrong answers

Option A is wrong because Zone-redundant storage (ZRS) replicates data synchronously across three Azure availability zones within the same region, which still satisfies data residency but is not the simplest or most cost-effective choice when only intra-region replication is required; however, the question asks for the option that ensures data is replicated within the same region, and ZRS does that, but LRS is more directly aligned with the 'same region' requirement without zone-level distribution. Option C is wrong because Geo-redundant storage (GRS) replicates data to a secondary region that is hundreds of miles away, violating data residency regulations that require data to stay within a single region. Option D is wrong because Read-access geo-redundant storage (RA-GRS) also replicates data to a secondary region and additionally provides read access to that secondary copy, which still breaks the data residency constraint.

Full explanation →

850

MCQeasy

A ride-sharing company processes trip requests from customers. Each trip is recorded as a single transaction that updates the driver's status, calculates the fare, and logs the ride. At the end of each month, the company runs reports that aggregate millions of trips to determine average wait times and revenue per driver. Which pair of terms best describes these two distinct workloads?

A.OLTP and OLAP

B.Batch processing and stream processing

C.ETL and ELT

D.Relational and non-relational

AnswerA

Correct. OLTP handles the individual trip transactions, while OLAP handles the monthly reporting and aggregation.

Why this answer

The first workload (trip request processing) is a classic OLTP (Online Transaction Processing) system because each trip is a single, atomic transaction that updates driver status, calculates fare, and logs the ride in real time. The second workload (monthly aggregation reports) is OLAP (Online Analytical Processing) because it queries millions of historical trip records to compute averages and revenue summaries. These two patterns have fundamentally different data storage and query optimization requirements, making OLTP and OLAP the correct pair.

Exam trap

The trap here is that candidates confuse the processing method (batch/stream) with the workload type (OLTP/OLAP), but the question specifically asks for the pair that best describes the distinct workloads—transactional updates vs. analytical reporting—which is the classic OLTP vs. OLAP distinction.

How to eliminate wrong answers

Option B is wrong because batch processing and stream processing describe data ingestion patterns, not the transactional vs. analytical nature of the workloads; the trip requests are processed individually (not in batches or streams), and the monthly reports are batch analytics, but the question asks for the pair that best describes the distinct workloads, not the processing method. Option C is wrong because ETL and ELT are data integration processes (Extract, Transform, Load vs. Extract, Load, Transform) used to move data between systems, not the fundamental workload types themselves.

Option D is wrong because relational and non-relational refer to database models (structured tables vs. flexible schemas), which are orthogonal to the transactional vs. analytical distinction; both workloads could be implemented using either model.

Full explanation →

851

MCQhard

A healthcare analytics company receives continuous streams of patient monitoring data from IoT devices. The data must be processed in near real-time to detect critical events (e.g., abnormal heart rate). Processed data is then stored in a columnar format for historical analysis and reporting by data analysts using SQL. Which combination of Azure services should they use for ingestion, processing, and storage?

A.Azure Event Hubs, Azure Stream Analytics, Azure Synapse Analytics

B.Azure IoT Hub, Azure Data Factory, Azure SQL Data Warehouse

C.Azure Event Hubs, Azure Stream Analytics, Azure Cosmos DB

D.Azure Blob Storage, Azure Databricks, Azure Table Storage

AnswerA

Event Hubs ingests data in real-time. Stream Analytics processes the stream to detect events and transform data. Synapse Analytics provides a columnar data warehouse for historical analysis. This combination fits the requirements exactly.

Why this answer

Azure Event Hubs is designed for high-throughput, low-latency ingestion of streaming data from millions of IoT devices. Azure Stream Analytics provides a SQL-based, near real-time processing engine to detect critical events like abnormal heart rates. Azure Synapse Analytics (formerly SQL Data Warehouse) offers a columnar storage format (e.g., columnstore indexes) optimized for historical analysis and SQL-based reporting by data analysts.

Exam trap

The trap here is that candidates often confuse Azure IoT Hub with Event Hubs for high-volume event ingestion, or assume Cosmos DB is suitable for columnar analytics storage, but IoT Hub is for device management and Cosmos DB is row-oriented NoSQL, not optimized for SQL-based historical reporting.

How to eliminate wrong answers

Option B is wrong because Azure IoT Hub is primarily for bidirectional device management and communication, not optimized for high-throughput event ingestion; Azure Data Factory is a batch-oriented ETL service, not a real-time stream processor; and Azure SQL Data Warehouse (now part of Synapse) is correct for storage but the ingestion and processing components are mismatched. Option C is wrong because Azure Cosmos DB is a NoSQL document database with row-oriented storage, not a columnar format suitable for historical SQL analytics and reporting. Option D is wrong because Azure Blob Storage is object storage without native streaming ingestion; Azure Databricks is a big data analytics platform but not a dedicated near real-time stream processor like Stream Analytics; and Azure Table Storage is a NoSQL key-value store, not columnar or SQL-queryable for analysts.

Full explanation →

852

MCQhard

A financial services company runs critical end-of-day reports in an Azure Synapse Analytics dedicated SQL pool. These reports require guaranteed resource allocation and must complete within a fixed time window. However, ad-hoc analytical queries from data scientists often consume resources, causing contention and delaying the critical reports. Which feature should the company implement to ensure the critical reports always receive sufficient resources?

A.A. Create a workload group for the critical reports with a high importance setting and assign a minimum percentage of resources.

B.B. Enable result set caching on all queries to reduce execution time.

C.C. Implement materialized views for the aggregations used in the critical reports.

D.D. Use hash distribution for the fact tables to improve query parallelism.

AnswerA

Correct. Workload groups with importance and resource allocation ensure that critical queries get priority and guaranteed resources, preventing ad-hoc queries from starving them.

Why this answer

Option A is correct because workload groups in Azure Synapse Analytics dedicated SQL pool allow you to assign a minimum percentage of resources (e.g., CPU and memory) to a specific workload, ensuring guaranteed resource allocation. By setting high importance for the critical reports, the system prioritizes them over ad-hoc queries, preventing resource contention and ensuring they complete within the fixed time window.

Exam trap

The trap here is that candidates often confuse performance optimization features (caching, materialized views, distribution) with resource governance, which is the only mechanism to guarantee resource allocation and priority in a shared environment.

How to eliminate wrong answers

Option B is wrong because result set caching only reduces execution time for repeated queries by storing results, but it does not guarantee resource allocation or prevent contention from concurrent ad-hoc queries. Option C is wrong because materialized views improve query performance by pre-computing aggregations, but they do not provide resource isolation or priority for critical workloads. Option D is wrong because hash distribution improves parallelism for large fact tables, but it does not address resource contention or guarantee resource allocation for specific queries.

Full explanation →

853

MCQmedium

A mobile gaming company is building a new feature that stores player profiles and game settings as key-value pairs. The development team is most familiar with SQL queries and wants to minimize the learning curve. They require low-latency reads and writes, and the data does not require complex joins. Which Azure Cosmos DB API should they choose?

A.A. Core (SQL) API

B.B. Azure Cosmos DB for Table API

C.C. Azure Cosmos DB for MongoDB API

D.D. Azure Cosmos DB for Cassandra API

AnswerA

Correct. The Core (SQL) API uses a SQL-like query language, which aligns with the team's SQL skills. It handles key-value data with low latency and is the most straightforward choice.

Why this answer

The Core (SQL) API is the correct choice because it provides native support for SQL queries, which aligns with the development team's familiarity with SQL and minimizes the learning curve. It stores data in JSON documents with key-value pairs, supports low-latency reads and writes, and does not require complex joins, making it ideal for player profiles and game settings.

Exam trap

The trap here is that candidates may choose the Azure Cosmos DB for Table API (Option B) because they associate 'key-value pairs' with Table storage, but the question emphasizes SQL familiarity and low-latency reads/writes, which the Core (SQL) API directly supports with its native SQL query capability.

How to eliminate wrong answers

Option B is wrong because the Azure Cosmos DB for Table API uses OData and REST-based queries, not SQL, and is designed for key-value storage with a flat schema, which would require the team to learn a new query syntax. Option C is wrong because the Azure Cosmos DB for MongoDB API uses MongoDB's query language (based on JSON-like documents with BSON), not SQL, and would introduce a learning curve for a team familiar only with SQL. Option D is wrong because the Azure Cosmos DB for Cassandra API uses CQL (Cassandra Query Language), which is similar to SQL but has distinct syntax and limitations (e.g., no JOINs, no aggregate functions without specific configurations), requiring adaptation and not minimizing the learning curve.

Full explanation →

854

MCQhard

Your company uses Azure SQL Database to power a global application. You need to ensure that users in Europe and Asia have low-latency read access to product data, while writes are synchronized across all regions. What should you configure?

A.Use Azure Traffic Manager to route users to the nearest Azure SQL Database instance.

B.Migrate to Azure Cosmos DB for multi-region writes.

C.Create a failover group that includes all regions.

D.Configure Active Geo-Replication with readable secondaries in Europe and Asia.

AnswerD

Active Geo-Replication provides readable secondaries for low-latency reads.

Why this answer

Active Geo-Replication for Azure SQL Database allows you to configure readable secondary replicas in different Azure regions. This provides low-latency read access for users in Europe and Asia by directing their read traffic to the nearest secondary, while writes are synchronized asynchronously to all secondaries, ensuring data consistency across regions.

Exam trap

The trap here is that candidates may confuse failover groups (which provide a single readable secondary for disaster recovery) with Active Geo-Replication (which supports multiple readable secondaries for distributed read scaling), or mistakenly think Traffic Manager alone can solve the read latency issue without database-level replication.

How to eliminate wrong answers

Option A is wrong because Azure Traffic Manager is a DNS-based traffic load balancer that routes users to endpoints, but it does not provide the underlying database replication or readable secondaries needed for low-latency reads and synchronized writes. Option B is wrong because migrating to Azure Cosmos DB is unnecessary; the requirement is for relational data (Azure SQL Database), and Cosmos DB is a NoSQL database, not a relational solution. Option C is wrong because a failover group is designed for high availability and disaster recovery, not for providing low-latency read access across multiple regions; it uses a single readable secondary and does not support multiple readable secondaries for distributed read workloads.

Full explanation →

855

MCQeasy

A company stores customer records in a relational database table with fixed columns (CustomerID, Name, Email). They also store product reviews as JSON documents that may contain varying fields such as Rating, Comment, and optional Tags. Additionally, they store product images as JPEG files. Which of the following correctly orders these data types from most structured to least structured?

A.JSON documents, relational table, image files

B.Relational table, JSON documents, image files

C.Image files, relational table, JSON documents

D.Relational table, image files, JSON documents

AnswerB

This is correct: relational tables have a fixed schema (most structured), JSON allows flexible fields (semi-structured), and image files are binary with no inherent structure (unstructured).

Why this answer

Relational tables enforce a fixed schema with predefined columns and data types, making them the most structured. JSON documents have a flexible schema where fields like Tags are optional, placing them in the middle. Image files are binary blobs with no inherent structure, making them the least structured.

Option B correctly orders these from most structured (relational table) to least structured (image files).

Exam trap

The trap here is that candidates often confuse semi-structured JSON with unstructured data, or assume that all data with a format (like JPEG headers) is structured, but the key distinction is schema rigidity and queryability.

How to eliminate wrong answers

Option A is wrong because it places JSON documents ahead of relational tables, but JSON documents have a flexible schema (optional fields, varying types) while relational tables enforce a rigid schema with fixed columns and constraints, making tables more structured. Option C is wrong because it places image files first, but image files are unstructured binary data with no schema, while relational tables and JSON documents both have some level of structure. Option D is wrong because it places image files before JSON documents, but JSON documents have a defined structure (key-value pairs, nesting) whereas image files are completely unstructured binary blobs.

Full explanation →

856

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool for a large data warehouse. The fact table contains billions of rows and is hash-distributed on ProductID. Frequent queries join this fact table with a small Store dimension table (10,000 rows) and a medium-sized Product dimension table (500,000 rows). The queries aggregate sales by store and product for recent months, but run slowly due to data movement during joins. Which design change will most reduce data movement and improve query performance?

A.Replicate the Store dimension table

B.Change the distribution of the fact table to round-robin

C.Change the distribution key of the fact table to StoreID

D.Add a nonclustered index on the StoreID column in the fact table

AnswerA

Correct. Replicating a small dimension table (less than 1 GB) copies it to all distributions, eliminating data movement when joining with the fact table. This directly addresses the performance issue.

Why this answer

Replicating the small Store dimension table (10,000 rows) across all compute nodes eliminates the need to shuffle data during joins with the fact table. In Azure Synapse dedicated SQL pool, replicated tables store a full copy on each distribution, so queries that join a replicated table with a distributed fact table avoid costly data movement, significantly improving performance for frequent aggregation queries.

Exam trap

The trap here is that candidates often think changing the distribution key or adding an index will solve data movement, but they overlook that replicating the small dimension table is the most direct and cost-effective way to eliminate shuffling for frequent joins.

How to eliminate wrong answers

Option B is wrong because changing the fact table to round-robin distribution would distribute rows randomly without any hash key, which would force full data movement for every join and aggregation, making performance worse. Option C is wrong because changing the distribution key to StoreID would co-locate fact rows with the same StoreID on the same distribution, but the Store dimension is small and already a candidate for replication; more importantly, the fact table is large and hash-distributed on ProductID for other workloads, and changing the key could break existing query patterns and still require movement for ProductID-based joins. Option D is wrong because adding a nonclustered index on StoreID in the fact table does not reduce data movement during joins; indexes improve local data access but do not affect the distribution-level data shuffling required when tables are on different distributions.

Full explanation →

857

MCQhard

A company uses Azure Databricks for data engineering. They need to ensure that only authorized users can access the workspace, and they want to use single sign-on (SSO) with their existing identity provider. Which integration should they configure?

A.Microsoft Defender XDR

B.Microsoft Intune

C.Azure Key Vault

D.Microsoft Entra ID (Azure AD)

AnswerD

Provides SSO and identity management for Azure Databricks.

Why this answer

Microsoft Entra ID (Azure AD) is the identity and access management service that provides SSO capabilities for Azure Databricks. By integrating Azure Databricks with Entra ID, you can enforce conditional access policies and authenticate users via your existing identity provider using protocols like SAML 2.0 or OAuth 2.0, ensuring only authorized users access the workspace.

Exam trap

The trap here is that candidates may confuse Azure Key Vault (a secrets store) with identity management, or assume Microsoft Defender XDR or Intune handle SSO, when only Microsoft Entra ID provides the federation and authentication services required for single sign-on.

How to eliminate wrong answers

Option A is wrong because Microsoft Defender XDR is a security analytics and threat protection suite, not an identity provider or SSO integration service. Option B is wrong because Microsoft Intune is a mobile device management (MDM) and mobile application management (MAM) service, not used for configuring SSO or identity federation. Option C is wrong because Azure Key Vault is a secrets management service for storing keys, certificates, and passwords, not an identity provider or SSO solution.

Full explanation →

858

MCQhard

A data engineer needs to implement a solution that provides near real-time analytics on clickstream data. The data arrives as JSON events and must be queryable with sub-second latency using SQL-like queries. The solution should minimize operational overhead. Which Azure service should they use?

A.Azure Stream Analytics

B.Azure Analysis Services

C.Azure Synapse Analytics

D.Azure Data Explorer

AnswerD

ADX is designed for near real-time analytics on streaming data, supports SQL-like KQL, and delivers sub-second query performance.

Why this answer

Azure Data Explorer (ADX) is designed for interactive analytics on large volumes of streaming and historical data with sub-second query latency using Kusto Query Language (KQL), which supports SQL-like syntax. It natively ingests JSON events, provides near real-time analytics, and minimizes operational overhead as a fully managed, serverless service.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a real-time processing engine) with Azure Data Explorer (an interactive analytics database), failing to recognize that the requirement for 'sub-second latency using SQL-like queries' on stored data points to a query engine, not a stream processor.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks (e.g., Power BI, Event Hubs) but does not natively support sub-second interactive SQL queries on stored data; it is designed for continuous queries, not ad-hoc analytics. Option B is wrong because Azure Analysis Services is an OLAP engine for semantic models and multidimensional cubes, not designed for raw clickstream JSON ingestion or sub-second query latency on streaming data. Option C is wrong because Azure Synapse Analytics is a big data analytics platform optimized for large-scale batch and interactive queries using dedicated SQL pools, but it incurs higher operational overhead and is not purpose-built for near real-time, sub-second latency on high-velocity streaming JSON events.

Full explanation →

859

Multi-Selecthard

Which THREE components are part of Microsoft Fabric's end-to-end analytics platform? (Choose three.)

Select 3 answers

A.Synapse Data Engineering

B.Azure Machine Learning

C.OneLake

D.Power BI

E.Azure DevOps

AnswersA, C, D

A workload in Fabric for data transformation.

Why this answer

Synapse Data Engineering is a core component of Microsoft Fabric, providing a unified platform for data ingestion, transformation, and orchestration using Spark and pipelines. It integrates seamlessly with OneLake for storage and Power BI for visualization, forming part of Fabric's end-to-end analytics solution.

Exam trap

The trap here is that candidates may confuse Azure Machine Learning as part of Fabric because both involve AI/analytics, but Fabric's scope is limited to integrated data engineering, lakehouse, and BI components, excluding dedicated ML services.

Full explanation →

860

MCQmedium

A company stores terabytes of web server log data in CSV files in Azure Data Lake Storage Gen2. Data analysts need to run ad-hoc SQL queries on this data to analyze user behavior patterns. The queries are complex, involve joins across multiple files, and the analysts prefer not to move the data into a separate store. Which Azure service should they use?

A.Azure Data Factory

B.Azure Synapse Serverless SQL pool

C.Azure SQL Database

D.Azure HDInsight

AnswerB

Correct. Serverless SQL pool in Azure Synapse allows you to query data in Data Lake Storage using T-SQL without moving it, enabling ad-hoc analysis with minimal setup.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows analysts to run T-SQL queries directly against CSV files stored in Azure Data Lake Storage Gen2 without moving the data. It uses a distributed query engine to process complex joins across multiple files, making it ideal for ad-hoc analytics on large-scale log data.

Exam trap

The trap here is that candidates confuse Azure Data Factory's data movement capabilities with query execution, or assume that any SQL-capable service (like Azure SQL Database) can query external files without data import, but only Synapse Serverless SQL pool provides native, serverless SQL querying over Data Lake Storage.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and orchestration service, not a query engine; it cannot run ad-hoc SQL queries directly on data. Option C is wrong because Azure SQL Database requires data to be imported into its relational store, violating the requirement to not move data. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster that requires provisioning and is overkill for ad-hoc SQL queries; it also typically involves moving data or setting up a separate compute layer.

Full explanation →

861

MCQeasy

A retail chain collects sales data from all its stores at the end of each business day by exporting CSV files from each store's database. The data is then combined and analyzed to generate daily sales reports. Which type of data processing does this describe?

A.Batch processing

B.Real-time processing

C.Stream processing

D.Interactive query

AnswerA

Batch processing handles data in discrete, scheduled batches, which matches the daily collection and reporting cycle.

Why this answer

This describes batch processing because sales data is collected from each store at the end of the business day, exported as CSV files, and then combined and analyzed in a scheduled, non-continuous manner. Batch processing is ideal for large volumes of data that are processed at periodic intervals, such as daily sales reports, rather than requiring immediate action.

Exam trap

The trap here is that candidates confuse 'daily export' with 'real-time' because they think 'daily' implies frequent updates, but batch processing is defined by the scheduled, non-continuous nature of the data collection and processing, not the frequency.

How to eliminate wrong answers

Option B (Real-time processing) is wrong because real-time processing requires data to be processed immediately as it arrives, with sub-second latency, which does not match the end-of-day CSV export and batch analysis. Option C (Stream processing) is wrong because stream processing handles continuous, unbounded data flows (e.g., from IoT sensors or clickstreams) and processes each event incrementally, not by collecting files at a fixed time. Option D (Interactive query) is wrong because interactive query refers to ad-hoc, on-demand exploration of data (e.g., using SQL against a data warehouse), not a scheduled, automated daily report generation from exported files.

Full explanation →

862

MCQmedium

A company receives daily sales data from multiple retail stores as CSV files that are uploaded to Azure Blob Storage. The data must be cleansed, validated, and aggregated before being loaded into Azure Synapse Analytics for reporting. The transformations involve complex business logic and must run reliably every night. The company wants a service that can orchestrate and execute the entire pipeline with minimal development effort. Which Azure service should they use?

A.Azure Data Factory with mapping data flows

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Logic Apps

AnswerA

Azure Data Factory provides schedule-based orchestration and mapping data flows to perform complex transformations without coding. It integrates seamlessly with Azure Synapse Analytics for loading transformed data.

Why this answer

Azure Data Factory with mapping data flows is correct because it provides a code-free, visual interface for building complex data transformations (cleansing, validation, aggregation) that can be orchestrated on a schedule. Mapping data flows execute at scale on Azure Databricks clusters without requiring manual Spark code, making it ideal for nightly batch ETL pipelines with minimal development effort.

Exam trap

The trap here is that candidates often confuse Azure Data Factory with Azure Logic Apps because both are 'orchestration' services, but Logic Apps is for API/application integration (HTTP, Office 365, etc.) and cannot perform large-scale data transformations or run Spark-based data flows.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is designed for real-time stream processing (e.g., IoT telemetry, live dashboards) and does not natively support scheduled batch orchestration or complex multi-step transformations on CSV files in Blob Storage. Option C (Azure Databricks) is wrong because while it can perform the required transformations, it requires writing custom Spark code (Scala, Python, or SQL) and managing clusters, which contradicts the 'minimal development effort' requirement. Option D (Azure Logic Apps) is wrong because it is a low-code workflow service for integrating SaaS applications and APIs, not for running large-scale data transformations or executing complex business logic on big datasets.

Full explanation →

863

MCQeasy

A company wants to run SQL queries on data stored in Azure Cosmos DB for NoSQL. Which API should they use?

A.Core (SQL) API

B.Gremlin API

C.Cassandra API

D.MongoDB API

AnswerA

Supports SQL queries on NoSQL data.

Why this answer

The Core (SQL) API is the native API for Azure Cosmos DB for NoSQL, designed to query JSON documents using a SQL-like syntax. Since the requirement is to run SQL queries on data stored in Azure Cosmos DB for NoSQL, this API directly supports that need without requiring any protocol translation or schema mapping.

Exam trap

The trap here is that candidates often confuse 'SQL queries' with the Cassandra API because both use a SQL-like language, but Cassandra uses CQL, not standard SQL, and is designed for a different data model (wide-column vs. document).

How to eliminate wrong answers

Option B (Gremlin API) is wrong because it is used for graph data models and queries using the Apache TinkerPop graph traversal language, not for SQL queries on NoSQL documents. Option C (Cassandra API) is wrong because it implements the Apache Cassandra wire protocol for wide-column stores and uses CQL (Cassandra Query Language), not standard SQL. Option D (MongoDB API) is wrong because it provides compatibility with MongoDB's document model and query syntax (e.g., BSON, find(), aggregate()), not SQL.

Full explanation →

864

MCQhard

A data engineering team uses Azure Data Factory to orchestrate an ETL pipeline that loads data from an on-premises SQL Server to Azure Synapse Analytics. The pipeline fails intermittently with timeout errors during the copy activity. The network is stable. What should they do first to resolve the issue?

A.Use a staging copy via Azure Blob Storage

B.Increase the copy activity timeout setting

C.Configure a self-hosted integration runtime

D.Downgrade the Azure Synapse dedicated SQL pool to Basic tier

AnswerB

Increasing timeout allows the copy to complete without timing out.

Why this answer

The correct answer is B because the copy activity in Azure Data Factory has a default timeout of 7 days, but intermittent timeout errors during a stable network connection indicate that the copy operation is taking longer than the configured timeout. Increasing the timeout setting directly addresses the root cause by allowing the activity to complete without prematurely failing, assuming the data volume or complexity is causing longer execution times.

Exam trap

The trap here is that candidates often confuse timeout errors with connectivity or performance issues, leading them to choose staging or integration runtime changes, when the direct fix is adjusting the activity's timeout property.

How to eliminate wrong answers

Option A is wrong because using a staging copy via Azure Blob Storage is a performance optimization for large data transfers or to enable PolyBase, not a solution for timeout errors—it does not change the timeout duration of the copy activity. Option C is wrong because a self-hosted integration runtime is required for connecting to on-premises data sources, but the question states the pipeline already uses one (since it connects to on-premises SQL Server), so reconfiguring it does not resolve timeout errors. Option D is wrong because downgrading the Azure Synapse dedicated SQL pool to Basic tier reduces performance and may worsen timeout issues, as it provides fewer resources and slower data ingestion, contradicting the goal of resolving timeouts.

Full explanation →

865

MCQmedium

A manufacturing company installs IoT sensors on equipment in a factory. Each sensor sends a reading (device ID, timestamp, temperature, vibration) every second. The application must store these readings with extremely low write latency, support queries for the latest reading per device, and allow range queries over the last hour for a specific device. The development team expects high throughput writes (millions per day) and does not require complex joins. Which Azure data store is most appropriate for this workload?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure Blob Storage

D.Azure SQL Database

AnswerA

Correct. Azure Cosmos DB provides low-latency writes, flexible schema, and supports point reads and range queries with automatic indexing. It is designed for high-throughput, globally distributed workloads like IoT time-series data.

Why this answer

Azure Cosmos DB is the most appropriate because it offers single-digit millisecond write and read latencies at any scale, which is critical for the high-throughput, low-latency IoT sensor ingestion described. Its support for automatic indexing and efficient point reads (by device ID and timestamp) enables fast retrieval of the latest reading per device, while its native time-to-live (TTL) and range query capabilities on the timestamp field allow efficient queries over the last hour for a specific device. Additionally, Cosmos DB's schema-agnostic, non-relational model fits the simple key-value structure of sensor readings without requiring complex joins.

Exam trap

The trap here is that candidates often choose Azure Table Storage because it is a low-cost, schema-less NoSQL option, but they overlook its lack of guaranteed single-digit millisecond latency and the need for manual partition key design to avoid throttling under high-throughput IoT workloads.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because, while it can handle high throughput and is schema-less, it does not provide single-digit millisecond write latency guarantees and lacks native support for efficient range queries on timestamps with the same low latency as Cosmos DB; its partition key design would require careful planning to avoid hot partitions under millions of writes per day. Option C (Azure Blob Storage) is wrong because it is optimized for large, unstructured binary objects (blobs) and not for high-frequency, small record writes with sub-second latency; it also does not support point queries or range queries on individual records without additional indexing layers like Azure Data Lake Storage. Option D (Azure SQL Database) is wrong because, although it supports range queries and indexing, it introduces relational overhead and higher write latency compared to Cosmos DB, and its fixed schema and transaction costs are not ideal for the high-throughput, schema-flexible IoT workload described.

Full explanation →

866

MCQhard

Refer to the exhibit. A database administrator runs this KQL query in Azure Monitor Log Analytics. The query returns no results. What is the most likely reason?

A.The summarize operator syntax is wrong

B.The ResourceType filter is incorrect

C.The render command is not supported

D.The time range is incorrect

AnswerB

Azure SQL Database diagnostics use ResourceType 'MICROSOFT.SQL/SERVERS/DATABASES', not 'AZURESQLDB'.

Why this answer

The KQL query filters on `ResourceType` with a value that does not match any actual Azure resource type (e.g., a typo or incorrect casing). Since Azure Monitor Log Analytics stores resource types in a specific format (e.g., 'microsoft.compute/virtualmachines'), an incorrect filter will return zero results even if data exists. The query syntax, render command, and time range are all valid, so the filter is the most likely cause.

Exam trap

Microsoft often tests the candidate's understanding that KQL filters are case-sensitive and that resource type values must exactly match the Azure Resource Manager format, leading candidates to overlook a simple typo or casing error.

How to eliminate wrong answers

Option A is wrong because the `summarize` operator syntax is correct: it uses `count()` as an aggregation function, which is valid. Option C is wrong because the `render` command is supported in Azure Monitor Log Analytics for visualizing results (e.g., `render timechart`). Option D is wrong because the time range is not specified in the query, so it defaults to the last 24 hours, which is a valid range and would not cause zero results unless no data exists in that period.

Full explanation →

867

MCQeasy

Your company wants to use Microsoft Fabric to create a unified analytics platform. Which component in Microsoft Fabric provides a lake-centric, collaborative, and governed data foundation?

A.Power BI

B.Data Factory

C.OneLake

D.Synapse Data Engineering

AnswerC

OneLake is the lake-centric data foundation in Fabric.

Why this answer

OneLake is the correct answer because it is the single, unified, lake-centric data foundation in Microsoft Fabric. It provides a multi-cloud, SaaS-based data lake that is automatically provisioned for every Fabric tenant, enabling collaborative and governed access to data without data duplication, while supporting open formats like Delta Parquet.

Exam trap

The trap here is that candidates confuse the tool that provides the data foundation (OneLake) with the workloads that operate on top of it (like Synapse Data Engineering or Data Factory), or mistake Power BI's role as a visualization layer for the underlying storage and governance layer.

How to eliminate wrong answers

Option A is wrong because Power BI is a business intelligence and visualization tool, not a data lake or storage foundation; it consumes data from sources like OneLake but does not provide the lake-centric foundation itself. Option B is wrong because Data Factory is a data integration and orchestration service for pipelines and data movement, not a governed, collaborative data lake. Option D is wrong because Synapse Data Engineering is a workload for building and managing data transformation pipelines (e.g., using Spark or notebooks), but it relies on OneLake as its underlying storage and governance layer, not the other way around.

Full explanation →

868

MCQmedium

A social media analytics company needs to store large amounts of user activity logs. Each log entry contains a timestamp, user ID, activity type, and a dynamic set of custom attributes (e.g., page viewed, time spent). The application requires low-latency writes and point reads by a composite key (user ID and timestamp). The data is rarely updated after insertion. The company wants a fully managed NoSQL database that supports serverless throughput and automatic expiration of old logs (TTL). Which Azure Cosmos DB API should they choose?

A.Table API

B.NoSQL API (Core/SQL API)

C.Cassandra API

D.Gremlin API

AnswerA

The Table API is built for key-value stores and supports a schema-less design with composite keys (PartitionKey + RowKey). It also supports serverless throughput and TTL (time-to-live) to automatically delete old entries, fitting the activity log use case.

Why this answer

The Table API is the correct choice because it provides a fully managed, serverless NoSQL database with automatic TTL (Time-to-Live) for data expiration, low-latency point reads and writes by a composite key (partition key + row key), and is optimized for storing large volumes of structured log data with dynamic attributes. It supports the exact requirements: high-throughput writes, point queries by user ID and timestamp, and automatic expiration of old logs without manual intervention.

Exam trap

The trap here is that candidates often choose the NoSQL API (Core/SQL API) because it is the most well-known Cosmos DB API, but they overlook that the Table API is specifically optimized for high-volume, low-latency key-value workloads with composite keys and automatic TTL, making it the correct choice for log data with dynamic attributes.

How to eliminate wrong answers

Option B (NoSQL API) is wrong because while it supports serverless throughput and TTL, it is designed for document-based data with flexible schemas and requires a partition key and sort key for point reads, but it does not natively support composite key queries as efficiently as the Table API's row key design; however, the primary reason it is not the best fit is that the Table API is more cost-effective and simpler for log data with dynamic attributes. Option C (Cassandra API) is wrong because it is based on the Cassandra distributed database, which uses a different data model (wide-column stores) and does not support serverless throughput in the same way; it also requires more manual management of consistency and replication, and while it supports TTL, it is not the simplest fully managed option for this use case. Option D (Gremlin API) is wrong because it is a graph database API designed for traversing relationships between entities (e.g., social networks, recommendation engines), not for storing and querying time-series log data with composite keys; it lacks native support for TTL and serverless throughput in the same manner as the Table API.

Full explanation →

869

MCQeasy

You need to migrate an on-premises SQL Server database to Azure SQL Managed Instance with minimal downtime. Which tool should you use?

A.Azure Data Factory

B.SQL Server Integration Services (SSIS)

C.BACPAC export and import

D.Azure Database Migration Service (DMS)

AnswerD

DMS supports online migrations with minimal downtime.

Why this answer

Option A is correct because Azure DMS supports online migration with minimal downtime. Option B is wrong because BACPAC is for offline export/import. Option C is wrong because SSIS is for integration.

Option D is wrong because Azure Data Factory is for data movement but not optimized for online migration.

Full explanation →

870

Multi-Selectmedium

Which TWO of the following are correct descriptions of data processing workloads in Azure?

Select 2 answers

A.Streaming processing is used for interactive queries on historical data.

B.Streaming processing is used to process data at rest.

C.Streaming processing is used to process data in real time as it arrives.

D.Batch processing is used to process data in real time as it arrives.

E.Batch processing is used to process large volumes of data at scheduled intervals.

AnswersC, E

Streaming processes data continuously in real time.

Why this answer

Option C is correct because streaming processing in Azure (e.g., Azure Stream Analytics, Event Hubs, or Kafka on HDInsight) is designed to ingest, analyze, and act on data in near real-time as it arrives, often with sub-second latency. This is fundamentally different from batch processing, which handles data at rest.

Exam trap

The trap here is that candidates confuse 'streaming' with 'interactive querying' or assume batch can handle real-time data, but Azure explicitly separates these workloads based on data state (in motion vs. at rest) and latency requirements.

Full explanation →

871

MCQmedium

A company has an existing IoT application that uses Apache Cassandra for time-series sensor data. They want to migrate to Azure's fully managed NoSQL database service while continuing to use the Cassandra Query Language (CQL) and benefiting from global distribution and low latency. Which Azure Cosmos DB API should they use?

A.Core (SQL) API

B.MongoDB API

C.Cassandra API

D.Gremlin API

AnswerC

Correct. The Cassandra API provides compatibility with the Apache Cassandra wire protocol and CQL, enabling a seamless migration from a Cassandra cluster to a globally distributed, fully managed service.

Why this answer

The Cassandra API for Azure Cosmos DB is wire-protocol-compatible with Apache Cassandra, meaning you can use existing CQL (Cassandra Query Language) tools, drivers, and code with minimal changes. It provides a fully managed, globally distributed NoSQL database with low-latency reads and writes, which directly matches the company's requirement to migrate from self-managed Cassandra while preserving their CQL-based application logic.

Exam trap

The trap here is that candidates may confuse the 'Cassandra API' with the 'Core (SQL) API' because both support SQL-like syntax, but only the Cassandra API uses the native CQL wire protocol and wide-column storage model required for time-series sensor data.

How to eliminate wrong answers

Option A is wrong because the Core (SQL) API uses a SQL-like query language and a different data model (JSON documents with optional schema), not the Cassandra Query Language (CQL), so existing CQL code would not work. Option B is wrong because the MongoDB API uses the MongoDB wire protocol and BSON document model, which is incompatible with CQL and Cassandra's table/partition-key structure. Option D is wrong because the Gremlin API is designed for graph databases using the Apache TinkerPop graph traversal language, not for time-series sensor data modeled in Cassandra's wide-column format.

Full explanation →

872

MCQhard

A company ingests raw clickstream data as JSON files into Azure Data Lake Storage Gen2. Data scientists need to explore the data interactively using Python notebooks, and the BI team needs to create reports from aggregated datasets derived from this data. The solution must be serverless, scale automatically, and minimize administration. Which Azure service should they choose?

A.A. Azure Synapse Analytics (serverless SQL pool)

B.B. Azure Databricks

C.C. Azure HDInsight with Spark

D.D. Azure Data Lake Analytics

AnswerB

Azure Databricks provides collaborative notebooks with Python, Scala, and SQL support. Its serverless mode pools resources dynamically, scales automatically, and handles the full data science lifecycle from exploration to transformation.

Why this answer

Azure Databricks is correct because it provides a serverless, interactive Apache Spark environment that data scientists can use with Python notebooks for exploratory analysis, and it can produce aggregated datasets for BI reporting. It scales automatically and minimizes administration by managing the cluster lifecycle, making it ideal for ad-hoc data exploration on raw JSON files in Azure Data Lake Storage Gen2.

Exam trap

The trap here is that candidates often confuse serverless SQL pools (Synapse) as suitable for interactive Python exploration, but they are designed for SQL-based querying, not notebook-based data science workflows.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics (serverless SQL pool) is optimized for T-SQL queries over data in the lake, not for interactive Python notebook-based exploration by data scientists. Option C is wrong because Azure HDInsight with Spark requires manual cluster configuration, scaling, and ongoing administration, violating the serverless and minimize administration requirements. Option D is wrong because Azure Data Lake Analytics is a legacy service that uses U-SQL, not Python notebooks, and is being phased out in favor of serverless Spark offerings like Databricks.

Full explanation →

873

MCQhard

A global gaming company stores player profiles in Azure Cosmos DB. Each profile document contains PlayerID (unique), PlayerName, Email, and a nested array of Achievements. The most common query is to look up a player by PlayerID and retrieve their achievements. The company needs strong consistency for reads and writes to ensure that when a player earns an achievement, it is immediately visible. Which partition key and consistency level should they choose?

A.A. Partition key: PlayerID; Consistency: Eventual

B.B. Partition key: PlayerID; Consistency: Strong

C.C. Partition key: Achievements; Consistency: Strong

D.D. Partition key: Email; Consistency: Bounded staleness

AnswerB

PlayerID makes each player's document a single partition, enabling efficient lookups. Strong consistency ensures that once a write is committed, subsequent reads return the latest data.

Why this answer

Option B is correct because PlayerID is the natural partition key for the most common query (lookup by PlayerID), ensuring efficient single-partition queries. Strong consistency is required to guarantee that when a player earns an achievement, the write is immediately visible to all subsequent reads, which is critical for the gaming scenario.

Exam trap

The trap here is that candidates may confuse 'most common query' with 'partition key' and pick a non-query column (like Achievements or Email) or choose a weaker consistency level, not realizing that Strong consistency is required for immediate visibility and that PlayerID is the optimal partition key for the lookup pattern.

How to eliminate wrong answers

Option A is wrong because Eventual consistency does not guarantee immediate visibility of writes, which violates the requirement that achievements are immediately visible after earning. Option C is wrong because Achievements is a nested array, not a top-level property, and using it as a partition key would cause inefficient cross-partition queries and potential hot partitions. Option D is wrong because Email is not the primary query key (PlayerID is), and Bounded staleness, while stronger than Eventual, still allows a configurable lag, which does not meet the strict 'immediately visible' requirement.

Full explanation →

874

MCQmedium

The exhibit shows a T-SQL query against an Azure SQL Database. What is the purpose of the HAVING clause in this query?

A.To sort the result set by TotalSales descending

B.To join two tables

C.To filter groups after aggregation

D.To filter rows before grouping

AnswerC

HAVING filters groups based on aggregate conditions.

Why this answer

The HAVING clause is used in T-SQL to filter groups after the GROUP BY clause has performed aggregation. In this query, it restricts the result set to only those product categories whose total sales (SUM(Amount)) exceed 1000, which is a condition on the aggregated value, not on individual rows.

Exam trap

The trap here is that candidates often confuse HAVING with WHERE, mistakenly thinking HAVING filters individual rows before grouping, when in fact WHERE performs that role and HAVING only applies after aggregation.

How to eliminate wrong answers

Option A is wrong because sorting the result set is done by the ORDER BY clause, not HAVING. Option B is wrong because joining tables is accomplished with JOIN clauses (e.g., INNER JOIN, LEFT JOIN), not HAVING. Option D is wrong because filtering rows before grouping is the role of the WHERE clause, which operates on individual rows before aggregation; HAVING filters after aggregation.

Full explanation →

875

MCQmedium

A marketing company ingests streaming data from social media feeds into Azure Event Hubs. They want to perform real-time sentiment analysis on the data and store the results in Azure SQL Database for immediate dashboarding. They also need to aggregate the raw data over longer time windows and store it in Azure Data Lake Storage for historical trend analysis. Which combination of Azure services should they use for the two processing paths?

A.Azure Stream Analytics for real-time analysis and Azure Data Factory for batch aggregation

B.Azure Databricks for both real-time analysis and batch aggregation

C.Azure Stream Analytics for both real-time analysis and batch aggregation

D.Azure Data Factory for real-time analysis and Azure Databricks for batch aggregation

AnswerA

Azure Stream Analytics handles real-time processing and outputs to SQL Database. Azure Data Factory can schedule batch pipelines to read raw data from Event Hubs (or captured data) and aggregate it into Azure Data Lake Storage.

Why this answer

Azure Stream Analytics is ideal for real-time sentiment analysis on streaming data from Event Hubs, as it can process data in-motion with low latency and output directly to Azure SQL Database for immediate dashboarding. Azure Data Factory is the correct choice for batch aggregation over longer time windows, as it can orchestrate and execute periodic data movement and transformation jobs to load aggregated data into Azure Data Lake Storage for historical analysis.

Exam trap

The trap here is that candidates often assume a single service like Stream Analytics or Databricks can handle both real-time and batch processing equally well, but the exam expects you to recognize that Stream Analytics excels at real-time streaming while Data Factory is the appropriate managed service for scheduled batch aggregation in a cost-effective, serverless manner.

How to eliminate wrong answers

Option B is wrong because Azure Databricks can handle both real-time and batch processing, but it is not the most cost-effective or simplest solution for this specific scenario; Stream Analytics is purpose-built for real-time streaming without the overhead of a Spark cluster, and Data Factory is a managed orchestration service better suited for scheduled batch aggregation than Databricks. Option C is wrong because Azure Stream Analytics is designed for real-time stream processing and does not natively support scheduled batch aggregation over long time windows; it lacks the orchestration and scheduling capabilities needed for periodic batch jobs. Option D is wrong because Azure Data Factory is not a real-time streaming service; it is an ETL and data orchestration tool that cannot perform low-latency sentiment analysis on streaming data, and Azure Databricks, while capable of batch processing, is overkill for simple scheduled aggregation compared to Data Factory.

Full explanation →

876

MCQhard

A company needs to migrate 10 on-premises SQL Server databases (each 50–200 GB) to Azure. The databases frequently run cross-database queries using three-part names (e.g., DB1.dbo.table) and rely on SQL Server Agent for maintenance tasks. They want to minimize management overhead and share resources across databases to reduce costs. Which Azure SQL deployment option should they choose?

A.Azure SQL Database elastic pool

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machine

D.Azure SQL Database single database

AnswerB

Managed Instance supports cross-database queries, SQL Server Agent, and allows multiple databases in one instance sharing resources.

Why this answer

Azure SQL Managed Instance is correct because it provides native support for cross-database queries using three-part names (e.g., DB1.dbo.table) and SQL Server Agent for maintenance tasks, which are critical requirements. It also offers a fully managed platform that minimizes management overhead while allowing resource sharing across databases within the instance, reducing costs compared to single databases or VMs.

Exam trap

The trap here is that candidates often choose Azure SQL Database elastic pool (Option A) because they think it supports cross-database queries and SQL Agent, but it actually lacks native three-part name support and SQL Agent, which are only available in Managed Instance.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database elastic pool supports cross-database queries only via elastic query (which requires external data sources and is not native three-part name support) and does not include SQL Server Agent, so it cannot meet the maintenance task requirement. Option C is wrong because SQL Server on Azure Virtual Machine requires full management of the OS and SQL Server (patching, backups, HA), increasing management overhead, and does not inherently share resources across databases to reduce costs like a managed instance does. Option D is wrong because Azure SQL Database single database does not support cross-database queries using three-part names (it requires elastic query or external tables) and lacks SQL Server Agent, making it unsuitable for the stated needs.

Full explanation →

877

MCQmedium

A company runs a sales analytics workload on Azure Synapse Analytics. They notice that queries against the fact table are slow. The fact table is hash-distributed on the SalesDate column. Which design change would most likely improve query performance?

A.Change distribution key to the primary key column

B.Replicate the fact table to all nodes

C.Change distribution to round-robin

D.Drop all indexes on the fact table

AnswerB

Replication avoids data movement during joins and is effective for smaller fact tables.

Why this answer

Option D is correct because hash-distributing on a high-cardinality date column can cause data skew and slow queries. Replicating the fact table avoids shuffling and improves performance for smaller tables. Option A is wrong because round-robin distribution is for staging tables, not fact tables.

Option B is wrong because dropping indexes makes queries slower. Option C is wrong because the distribution key should be a frequently joined column, not necessarily the primary key.

Full explanation →

878

MCQmedium

A healthcare application stores patient vital signs readings. Each reading is a JSON document with fields: PatientID, Timestamp, HeartRate, BloodPressure (systolic and diastolic). The application frequently queries for all readings of a specific patient within a time range, and the schema varies occasionally (e.g., new optional fields are added). How should this data be classified?

A.Structured

B.Semi-structured

C.Unstructured

D.Relational

AnswerB

Semi-structured data uses tags or markers (like JSON) to separate data elements and allows schema flexibility.

Why this answer

The data is semi-structured because it is stored as JSON documents, which have a flexible schema that can vary between records (e.g., new optional fields can be added). JSON documents are self-describing and do not require a fixed schema like relational tables, but they still have organizational properties (fields like PatientID, Timestamp) that distinguish them from unstructured data like plain text or images. The application's queries on specific fields (PatientID, Timestamp) further confirm the data has structure, but the schema flexibility rules out structured or relational classifications.

Exam trap

The trap here is that candidates confuse 'structured' with 'having fields'—they see PatientID and Timestamp and assume it must be structured, but the key differentiator is schema flexibility (optional fields, varying structure) which defines semi-structured data.

How to eliminate wrong answers

Option A is wrong because structured data requires a rigid, predefined schema (e.g., fixed columns and data types in a SQL table), but JSON documents allow schema variation and optional fields, which violates the strict schema constraint. Option C is wrong because unstructured data has no predefined data model or organization (e.g., raw text files, images, videos), whereas JSON documents have named fields and a hierarchical structure that can be parsed and queried. Option D is wrong because relational data is a subset of structured data that enforces relationships through foreign keys and normalization, but JSON documents in this scenario are not stored in relational tables and do not enforce referential integrity or a fixed schema.

Full explanation →

879

MCQmedium

A global gaming company uses Azure Cosmos DB to store player scores and profiles. The application reads and writes player data from multiple regions worldwide. The company wants to ensure that when a player updates their high score in one region, any subsequent read from another region will always see the latest value, even if there is network latency between regions. Which consistency level should they choose?

A.Strong

B.Bounded staleness

C.Session

D.Eventual

AnswerA

Strong consistency ensures reads always see the latest committed write across all regions, meeting the requirement.

Why this answer

Strong consistency ensures that any read operation returns the most recent write across all regions. In Azure Cosmos DB, this is achieved by synchronously replicating writes to all replicas before acknowledging the write, guaranteeing that subsequent reads in any region always see the latest value, regardless of network latency.

Exam trap

The trap here is that candidates often confuse 'strong consistency' with 'global replication speed,' assuming that eventual or session consistency might be sufficient if the application can tolerate slight delays, but the question explicitly requires that any subsequent read from another region always sees the latest value, which only strong consistency guarantees.

How to eliminate wrong answers

Option B (Bounded staleness) is wrong because it allows reads to lag behind writes by a configurable time interval or number of versions, meaning a read in another region might not see the latest high score immediately. Option C (Session) is wrong because it guarantees monotonic reads and writes only within a single client session; reads from a different region or session could return stale data. Option D (Eventual) is wrong because it offers no ordering guarantees and reads may return outdated values for an unbounded period, which violates the requirement that any subsequent read always sees the latest value.

Full explanation →

880

MCQhard

Your application stores millions of small log entries in Azure Table Storage. Queries by partition key and row key are fast, but you also need to query by timestamp across partitions. The query performance is slow. What is the best way to improve query performance?

A.Migrate the data to Azure Cosmos DB and use the SQL API.

B.Use Azure Cognitive Search to index the Table Storage data.

C.Create a new table with the timestamp as the partition key and copy data there for time-based queries.

D.Add a secondary index on the timestamp column.

AnswerC

This allows efficient queries by timestamp using partition key design.

Why this answer

Creating a separate table with timestamp as the partition key allows efficient range queries across all data. Option A is wrong because indexing options in Table Storage are limited. Option B is wrong because Cosmos DB is more expensive and may not be necessary.

Option D is wrong because a secondary index is not supported in Table Storage.

Full explanation →

881

MCQmedium

A financial services company runs a single SQL Server database that is 6 TB in size and handles a high volume of concurrent transactions. The database needs to support near real-time analytics without impacting OLTP performance. The company wants to migrate to Azure SQL Database and requires fast scale-out for read workloads, as well as the ability to independently scale compute and storage. Which Azure SQL Database service tier should they choose?

A.Serverless

B.Hyperscale

C.Business Critical

D.General Purpose

AnswerB

Hyperscale supports databases up to 100 TB, allows independent scaling of compute and storage, and provides up to four readable replicas to offload reporting and analytics workloads.

Why this answer

Hyperscale is the correct choice because it is designed for databases up to 100 TB, supports high-volume concurrent transactions, and provides near real-time read scale-out via named replicas that offload read workloads without affecting OLTP performance. It also allows independent scaling of compute (vCores) and storage (auto-scaled), meeting the company's requirements for fast scale-out and decoupled resources.

Exam trap

The trap here is that candidates may confuse Business Critical's high availability features with read scale-out, but Business Critical does not provide dedicated read replicas for analytics workloads, and its storage is not independently scalable from compute.

How to eliminate wrong answers

Option A is wrong because Serverless is for intermittent, unpredictable workloads with auto-pause and auto-scale compute, not for high-volume concurrent transactions requiring near real-time analytics and fast scale-out. Option C is wrong because Business Critical provides high availability and low latency via Always On Availability Groups but does not support independent scaling of compute and storage (storage is tied to compute) and lacks the read scale-out architecture of Hyperscale. Option D is wrong because General Purpose is a balanced tier for typical workloads but does not support fast scale-out for read workloads (no named replicas) and has a maximum database size of 4 TB, which is insufficient for a 6 TB database.

Full explanation →

882

MCQeasy

A business user wants to ask natural language questions about their data in Power BI and get answers without writing DAX. Which Power BI feature should they use?

A.Copilot for Microsoft 365

B.Q&A visual

C.Power Automate

D.Quick Insights

AnswerB

The Q&A visual allows users to type natural language questions and get answers.

Why this answer

The Q&A visual in Power BI allows users to type natural language questions about their data and receive answers in the form of charts or tables, without needing to write DAX expressions. It uses an underlying natural language engine that interprets the query and automatically generates the appropriate visual or summary. This directly matches the business user's requirement for a no-code, natural language interface.

Exam trap

The trap here is that candidates may confuse the Q&A visual with Quick Insights, because both involve automated analysis, but Quick Insights is a one-click automated pattern discovery tool, not an interactive natural language query interface.

How to eliminate wrong answers

Option A is wrong because Copilot for Microsoft 365 is an AI assistant integrated into Microsoft 365 apps (like Word, Excel, Teams) and does not provide a dedicated natural language query interface within Power BI reports. Option C is wrong because Power Automate is a workflow automation tool for creating flows between services, not a feature for asking natural language questions about data in Power BI. Option D is wrong because Quick Insights automatically generates visualizations and patterns from a dataset without user input, but it does not allow users to ask specific natural language questions; it is an automated, non-interactive analysis.

Full explanation →

883

MCQmedium

Refer to the exhibit. The exhibit shows a query used in Azure Cosmos DB. What is the correct interpretation of this query?

A.It is a SQL Server query that selects from a table named 'c'.

B.It queries an Azure Cosmos DB container for items where the age property exceeds 25.

C.It executes a stored procedure in Azure Cosmos DB.

D.It retrieves all documents from the Azure Cosmos DB database named 'c'.

AnswerB

The query uses the FROM clause with alias 'c' and filters on age.

Why this answer

In Azure Cosmos DB SQL API, 'c' is an alias for each item in the container. The query retrieves all items where the 'age' property is greater than 25. Option A is wrong because it's not SQL Server.

Option C is wrong because it's not filtering documents. Option D is wrong because it's not a stored procedure.

Full explanation →

884

MCQmedium

A media company stores large video files and associated metadata (title, duration, tags) as JSON documents. The application requires low-latency streaming of videos to users worldwide and the ability to quickly query metadata by tag. Which combination of Azure services should the company use?

A.Azure Blob Storage for videos and Azure Cosmos DB for metadata

B.Azure Blob Storage for both videos and metadata

C.Azure Cosmos DB for videos and Azure Table Storage for metadata

D.Azure Files for videos and Azure SQL Database for metadata

AnswerA

Correct. Blob Storage handles large video files efficiently, while Cosmos DB provides fast, indexed querying on flexible JSON metadata.

Why this answer

Azure Blob Storage is optimized for storing large binary objects like video files, offering high-throughput streaming via HTTP/HTTPS and integration with CDN for low-latency global delivery. Azure Cosmos DB provides single-digit millisecond read and write latencies with automatic indexing, making it ideal for quickly querying JSON metadata by tag using SQL or MongoDB API. This combination separates storage concerns (blobs for raw video, document DB for structured metadata) to meet both streaming and query performance requirements.

Exam trap

The trap here is that candidates may assume a single service (like Blob Storage or Cosmos DB) can handle both data types, but the exam tests understanding that each Azure service has specific strengths—blobs for large binary objects and Cosmos DB for low-latency document queries—and that mixing them is the correct architectural pattern.

How to eliminate wrong answers

Option B is wrong because storing metadata as blobs in Azure Blob Storage would require loading entire JSON documents to query by tag, resulting in high latency and no native indexing or query capabilities. Option C is wrong because Azure Cosmos DB is not designed for storing large video files (it has a 2 MB document size limit per item) and Azure Table Storage lacks native JSON document querying and indexing for tag-based searches. Option D is wrong because Azure Files uses SMB protocol optimized for file shares, not streaming, and Azure SQL Database is a relational store that would require schema design and normalization for JSON metadata, adding complexity and latency compared to a native document database.

Full explanation →

885

MCQhard

Your organization has a data warehouse in Azure Synapse Analytics. You need to load data from Azure Blob Storage daily, transforming it using a data flow. Which Azure service should you use for the ETL process?

A.Azure Databricks

B.Azure Data Factory

C.Azure Logic Apps

D.Azure Synapse Pipelines

AnswerB

Offers mapping data flows for visual ETL without coding.

Why this answer

Azure Data Factory (ADF) is the correct choice because it provides native integration with Azure Synapse Analytics and Azure Blob Storage, and it includes a visual data flow designer for transforming data without writing code. ADF's mapping data flows execute at scale on Spark clusters, making it ideal for daily ETL workloads that require both ingestion and transformation.

Exam trap

The trap here is that candidates confuse Azure Synapse Pipelines (which is just ADF inside Synapse) as a separate service, but the correct Azure service name for the ETL tool is Azure Data Factory, not Synapse Pipelines.

How to eliminate wrong answers

Option A is wrong because Azure Databricks is a big data analytics platform that requires you to write code (Python, Scala, SQL) to build transformations, and it does not have a native, no-code data flow designer like ADF; it is overkill for a simple daily load with transformations. Option C is wrong because Azure Logic Apps is a workflow automation service designed for integrating SaaS applications and orchestrating business processes, not for performing data transformations at scale or loading data into a data warehouse. Option D is wrong because Azure Synapse Pipelines is actually built on top of Azure Data Factory and shares the same engine, but the standalone service name for the ETL tool is Azure Data Factory; Synapse Pipelines is a feature within Synapse, not a separate service, and the question asks for the Azure service, which is Azure Data Factory.

Full explanation →

886

Multi-Selecteasy

Which TWO are benefits of using Azure Synapse Analytics for a data warehouse workload?

Select 2 answers

A.Ability to query data in the data lake using serverless SQL

B.Native support for MongoDB data sources

C.Unified experience for data integration, warehousing, and big data analytics

D.Automatic indexing of all data

E.Built-in email alerts for query performance

AnswersA, C

Serverless SQL pool allows querying data lake data.

Why this answer

Azure Synapse Analytics provides a serverless SQL pool that allows you to query data directly from your data lake (e.g., Azure Data Lake Storage Gen2) without needing to load it into a dedicated SQL pool. This enables cost-effective, on-demand querying of large-scale data in open formats like Parquet or CSV, making it a key benefit for data warehouse workloads that integrate lake and warehouse patterns.

Exam trap

The trap here is that candidates may confuse 'unified experience' (which is correct) with features like automatic indexing or native NoSQL support, which are not part of Synapse's core data warehouse capabilities.

Full explanation →

887

MCQhard

A company uses Azure Databricks to process data stored in Azure Data Lake Storage Gen2. They need to enforce fine-grained access control on files and folders based on user identity. Which security feature should they implement?

A.Storage account firewall rules

B.Shared access signatures (SAS)

C.Access control lists (ACLs)

D.Azure RBAC roles on the storage account

AnswerC

ACLs provide fine-grained POSIX permissions on files and folders.

Why this answer

Access control lists (ACLs) on Azure Data Lake Storage Gen2 provide POSIX-compliant, fine-grained permissions at the file and folder level. This allows you to grant read, write, or execute permissions to specific users or groups, which is exactly what is needed for enforcing identity-based access control on individual files and folders.

Exam trap

The trap here is that candidates often confuse Azure RBAC (which controls management-plane access) with ACLs (which control data-plane access at the file/folder level), leading them to select RBAC when fine-grained data access is required.

How to eliminate wrong answers

Option A is wrong because storage account firewall rules control network-level access (IP addresses or virtual networks), not user-identity-based permissions on files or folders. Option B is wrong because shared access signatures (SAS) grant time-limited, delegated access to storage resources via a token, but they do not enforce access based on the user's identity; the token itself is the credential. Option D is wrong because Azure RBAC roles on the storage account provide coarse-grained control over management operations (e.g., read keys, list containers) but cannot enforce fine-grained permissions on individual files or folders within a container.

Full explanation →

888

Multi-Selecthard

A company plans to migrate an on-premises SQL Server database to Azure. The database relies on SQL Agent jobs for nightly maintenance, uses Service Broker for asynchronous messaging, and requires high CPU and memory resources. The company wants to minimize application code changes. Which two Azure SQL deployment options meet all these requirements? (Choose two.)

Select 2 answers

A.Azure SQL Database Single Database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machine

D.Azure SQL Database Elastic Pool

AnswersB, C

Managed Instance provides built-in support for SQL Agent and Service Broker, and allows scaling to meet high resource requirements with minimal code changes.

Why this answer

Azure SQL Managed Instance is correct because it provides near-100% compatibility with on-premises SQL Server, including SQL Agent jobs and Service Broker, while offering high CPU and memory resources. This minimizes application code changes by preserving existing features and configurations.

Exam trap

The trap here is that candidates assume Azure SQL Database Single Database or Elastic Pool can support all SQL Server features, but they lack SQL Agent and Service Broker, forcing code changes that the question explicitly wants to minimize.

Full explanation →

889

MCQmedium

A company uses Azure SQL Database for an e-commerce system. The Orders table has millions of rows with a clustered index on OrderID (the primary key). Queries that filter on OrderDate and CustomerID to find recent orders for a specific customer are very slow. Which indexing strategy will most improve the performance of these queries?

A.Create a nonclustered index on OrderID only

B.Create separate nonclustered indexes on OrderDate and CustomerID

C.Create a nonclustered composite index on (OrderDate, CustomerID)

D.Create a clustered index on CustomerID instead of OrderID

AnswerC

A composite index on both columns allows the database to find rows matching both filter conditions in a single index seek. This is the most efficient strategy for queries that filter on multiple columns together.

Why this answer

The query filters on both OrderDate and CustomerID, so a composite nonclustered index on (OrderDate, CustomerID) allows SQL Server to perform a single index seek to locate the matching rows without touching the clustered index until the final key lookup. This dramatically reduces I/O compared to scanning the entire clustered index or using multiple separate indexes.

Exam trap

The trap here is that candidates often think separate indexes on each filter column are sufficient, not realizing that a composite index is far more efficient for queries that filter on multiple columns together, because it avoids the need for index intersection or multiple lookups.

How to eliminate wrong answers

Option A is wrong because creating a nonclustered index on OrderID only does not help filter on OrderDate or CustomerID; the query would still need to scan the clustered index. Option B is wrong because separate indexes on OrderDate and CustomerID would require SQL Server to choose one index for a seek and then filter the other column in a bookmark lookup, or perform an index intersection, both of which are less efficient than a single composite index that covers both filter columns. Option D is wrong because changing the clustered index to CustomerID would reorder the entire table by CustomerID, which could improve queries filtering by CustomerID alone but would not help the OrderDate filter and would disrupt the primary key's order, potentially harming other queries that rely on OrderID ordering.

Full explanation →

890

Multi-Selecteasy

Which TWO of the following are features of Azure Cosmos DB that help ensure high availability?

Select 2 answers

A.Data encryption at rest

B.Change feed

C.Automatic failover

D.Point-in-time restore

E.Multi-region writes

AnswersC, E

Automatic failover ensures continuity if a region fails.

Why this answer

Cosmos DB supports multi-region writes and automatic failover for high availability. Point-in-time restore and change feed are features but not directly for high availability.

Full explanation →

891

MCQmedium

A company stores customer data in a relational database. The database design includes a rule that every order must be associated with a valid customer ID that exists in the Customers table. This rule is an example of which data concept?

A.Referential integrity

B.Data normalization

C.Entity integrity

D.Data consistency

AnswerA

Referential integrity uses foreign keys to ensure values in one table match primary keys in another, exactly as described.

Why this answer

Referential integrity ensures that relationships between tables remain consistent. In a relational database, a foreign key constraint enforces that every order's customer ID must match an existing customer ID in the Customers table, preventing orphaned records. This rule directly implements referential integrity as defined by the SQL standard (e.g., via FOREIGN KEY constraints).

Exam trap

The trap here is that candidates often confuse referential integrity with entity integrity, mistakenly thinking that any rule involving a 'valid ID' is about primary keys, when in fact it is about foreign key relationships between tables.

How to eliminate wrong answers

Option B is wrong because data normalization is a design process to reduce data redundancy and avoid anomalies (e.g., 1NF, 2NF, 3NF), not a rule that enforces valid cross-table relationships. Option C is wrong because entity integrity ensures that the primary key of a table is unique and not null, which applies to the Customers table's customer ID column, not to the foreign key relationship from Orders to Customers. Option D is wrong because data consistency is a broader property of the database state (e.g., ensuring all constraints are satisfied), not a specific constraint type; referential integrity is one mechanism to achieve consistency, but the rule itself is a referential integrity constraint.

Full explanation →

892

MCQmedium

A company stores customer transaction data in Azure Blob Storage. The data is rarely accessed after 30 days, but must be retained for 7 years for compliance. Which access tier minimizes storage cost while meeting the retention requirement?

A.Hot tier

B.Cool tier

C.Premium tier

D.Archive tier

AnswerD

Archive tier is the lowest-cost option for data that is rarely accessed and requires long-term retention.

Why this answer

The Archive tier is the correct choice because it offers the lowest storage cost for data that is rarely accessed, which aligns with the scenario where data is accessed infrequently after 30 days but must be retained for 7 years. Azure Blob Storage's Archive tier is designed for long-term retention with a retrieval latency of several hours, making it cost-effective for compliance-driven data that does not require immediate access.

Exam trap

The trap here is that candidates may choose the Cool tier thinking it balances cost and access, but they overlook that the Archive tier is significantly cheaper for data that is accessed less than once a year, which is typical for 7-year compliance retention.

How to eliminate wrong answers

Option A is wrong because the Hot tier is optimized for frequent access and has the highest storage cost, which would be wasteful for data that is rarely accessed after 30 days. Option B is wrong because the Cool tier is designed for data accessed infrequently (e.g., every 30 days or more) but still has higher storage costs than Archive and is not the most cost-effective for 7-year retention with rare access. Option C is wrong because the Premium tier is for high-performance, low-latency access (e.g., via Azure Virtual Machines) and is the most expensive, making it unsuitable for rarely accessed compliance data.

Full explanation →

893

MCQhard

A financial services company stores account balances in Azure SQL Database (strong consistency) and transaction audit logs in Azure Cosmos DB (eventual consistency by default). A compliance requirement demands that when a transaction is rolled back in the SQL database, the corresponding audit log entries in Cosmos DB must also be removed within a short time frame. Which term best describes the difficulty of maintaining this constraint?

A.ACID compliance

B.Idempotency

C.Distributed transaction coordination

D.Schema flexibility

AnswerC

Correct. Managing atomicity across heterogeneous stores that lack native distributed transaction support requires custom compensation logic or a saga pattern.

Why this answer

Option C is correct because the scenario requires coordinating a rollback across two distinct data stores—Azure SQL Database (ACID-compliant, strong consistency) and Azure Cosmos DB (eventual consistency by default). This cross-system transactional consistency is a classic distributed transaction coordination problem, often addressed via patterns like the two-phase commit (2PC) or the saga pattern, but not natively supported between these two services without custom orchestration.

Exam trap

The trap here is that candidates confuse ACID compliance (which is a property of a single database) with the ability to maintain atomicity across multiple independent data stores, leading them to select Option A instead of recognizing the need for distributed transaction coordination.

How to eliminate wrong answers

Option A is wrong because ACID compliance applies to a single database system (like Azure SQL Database) and does not extend to coordinating transactions across heterogeneous data stores. Option B is wrong because idempotency ensures that repeated operations produce the same result, but it does not solve the problem of atomically removing audit logs across two systems when a rollback occurs. Option D is wrong because schema flexibility (e.g., schema-agnostic design in Cosmos DB) is unrelated to transactional consistency or cross-store coordination.

Full explanation →

894

MCQmedium

A travel booking application stores user itineraries in Azure Cosmos DB using the NoSQL API. Each itinerary document contains: UserID (unique to user), ItineraryID, Destination, BookingDate, and a nested array of Activities. The most common query is: 'Retrieve all itineraries for a specific UserID sorted by BookingDate descending.' To minimize Request Unit (RU) consumption, which partition key should be chosen?

A.ItineraryID

B.UserID

C.Destination

D.BookingDate

AnswerB

UserID is the most common filter. All documents for one user are stored on the same partition, making the query single-partition and highly efficient.

Why this answer

UserID is the correct partition key because the most common query filters on UserID, ensuring that all itineraries for a specific user are stored in the same physical partition. This allows the query to target a single partition, minimizing cross-partition queries and reducing Request Unit (RU) consumption. A partition key that aligns with the primary query filter is essential for optimal performance and cost efficiency in Azure Cosmos DB.

Exam trap

The trap here is that candidates often choose a unique identifier like ItineraryID as the partition key, thinking it ensures even distribution, but they overlook that the query pattern (filtering by UserID) requires the partition key to match the filter to avoid expensive cross-partition queries.

How to eliminate wrong answers

Option A (ItineraryID) is wrong because ItineraryID is unique per document, leading to a high-cardinality partition key that distributes each itinerary across different partitions; queries for a specific UserID would then require a fan-out to all partitions, increasing RU consumption. Option C (Destination) is wrong because it does not directly align with the query filter on UserID, and multiple users may share the same destination, causing hot partitions and inefficient cross-partition queries. Option D (BookingDate) is wrong because it is a time-based attribute that can create hot partitions (e.g., all bookings on the same date) and does not support the primary query pattern of filtering by UserID, forcing cross-partition scans.

Full explanation →

895

MCQmedium

A company runs an e-commerce application backed by an on-premises SQL Server database. They plan to migrate to Azure SQL Database and require automatic failover across two Azure regions for disaster recovery. The application must continue to connect using the same connection string after a failover, with no code changes. Which feature should they implement?

A.Active Geo-Replication

B.Elastic pools

C.Failover groups

D.SQL Server on Azure Virtual Machine with Always On Availability Groups

AnswerC

Failover groups enable automatic asynchronous replication and automatic failover across regions. The application connects to a listener endpoint that remains unchanged after failover, requiring no code changes.

Why this answer

Failover groups (Option C) enable automatic, geo-redundant failover across two Azure regions while providing a single read-write listener endpoint that remains unchanged after failover. This ensures the application can continue using the same connection string without any code modifications, meeting the stated requirement for disaster recovery with zero application changes.

Exam trap

The trap here is that candidates often confuse Active Geo-Replication with Failover groups, not realizing that only Failover groups provide a single, unchanged connection string endpoint for automatic failover, while Active Geo-Replication requires manual connection string updates.

How to eliminate wrong answers

Option A is wrong because Active Geo-Replication provides asynchronous replication but does not offer a single, unchanged connection string endpoint; the application would need to manually update the connection string to point to the new primary after failover. Option B is wrong because Elastic pools are a resource management feature for scaling multiple databases within a single server, not a disaster recovery or failover mechanism. Option D is wrong because SQL Server on Azure Virtual Machine with Always On Availability Groups requires managing a listener endpoint and typically involves more complex configuration; it does not provide the same seamless, automatic connection string continuity as Failover groups in Azure SQL Database.

Full explanation →

896

MCQmedium

A company plans to migrate a 500 GB SQL Server database from on-premises to Azure SQL Database. They require minimal downtime during the migration. Which approach should they use?

A.Export a BACPAC file and import to Azure SQL Database

B.Use Azure Database Migration Service with online migration mode

C.Use Azure Site Recovery to replicate the database

D.Create a full database backup and restore to Azure SQL Database

AnswerB

Correct. Online migration via DMS synchronizes changes in near real-time, allowing the source to stay online until cutover, minimizing downtime.

Why this answer

Azure Database Migration Service (DMS) with online migration mode is the correct approach because it supports minimal-downtime migrations by continuously replicating ongoing changes from the source SQL Server to Azure SQL Database using the transactional replication technology. This allows the source database to remain operational during the migration, and only a brief cutover is needed at the end to switch applications to the target.

Exam trap

The trap here is that candidates often confuse offline backup/restore or BACPAC methods (which are simpler but cause downtime) with the online migration capability of DMS, assuming any Azure tool can achieve minimal downtime without understanding the underlying replication mechanism.

How to eliminate wrong answers

Option A is wrong because exporting a BACPAC file is an offline process that requires the database to be idle or in read-only mode, causing significant downtime for a 500 GB database. Option C is wrong because Azure Site Recovery is designed for disaster recovery of entire VMs or workloads, not for granular database migration with schema and data consistency, and it cannot perform a live migration of a SQL Server database to Azure SQL Database. Option D is wrong because restoring a full database backup to Azure SQL Database is an offline operation that requires the source database to be taken offline or in read-only mode during the backup and restore, resulting in substantial downtime.

Full explanation →

897

MCQmedium

A company uses Azure Data Factory to copy data from an on-premises SQL Server to Azure Data Lake Storage Gen2. The transfer must be accelerated using WAN optimization. Which Data Factory feature should the company enable?

A.Use the Copy Activity with automatic partitioning.

B.Deploy a self-hosted integration runtime on-premises.

C.Enable parallel copy within the Copy Activity.

D.Use staging with Azure Blob Storage and PolyBase.

AnswerD

Staging allows data to be transferred efficiently using PolyBase or COPY statement, which can optimize transfer over WAN.

Why this answer

Option D is correct because staging with Azure Blob Storage and PolyBase enables WAN optimization by using Azure Blob as an intermediate staging area, allowing Data Factory to leverage PolyBase's high-throughput bulk loading into Azure Data Lake Storage Gen2. This approach reduces the load on the on-premises network by transferring data first to Blob Storage over the WAN, then using PolyBase's parallel, optimized transfer within Azure, which is designed for large-scale data movement and can bypass typical WAN bottlenecks.

Exam trap

The trap here is that candidates often confuse general performance features like parallel copy (Option C) or connectivity solutions like self-hosted IR (Option B) with the specific WAN optimization technique of staging with PolyBase, which is designed to offload heavy data movement to Azure's internal network.

How to eliminate wrong answers

Option A is wrong because automatic partitioning in Copy Activity is used to split data into chunks for parallel processing, not for WAN optimization or accelerating transfers over a wide area network. Option B is wrong because deploying a self-hosted integration runtime on-premises is necessary for connectivity to on-premises SQL Server but does not inherently provide WAN optimization; it handles data movement but does not accelerate it. Option C is wrong because enabling parallel copy within the Copy Activity increases throughput by using multiple threads, but it does not specifically address WAN optimization or reduce latency over a wide area network; it is a general performance tuning feature.

Full explanation →

898

Multi-Selecteasy

A manufacturing company uses IoT sensors to monitor machine temperature. The data is analyzed immediately to trigger alerts if temperature exceeds a threshold. The same data is also stored and later analyzed to identify long-term trends. Which two terms best describe these data processing approaches?

Select 2 answers

A.Real-time processing for alerts, batch processing for trend analysis

B.Stream processing for alerts, transactional processing for trend analysis

C.Batch processing for alerts, real-time processing for trend analysis

D.OLAP for alerts, OLTP for trend analysis

AnswersA, B

Correct. Alerts require immediate action (real-time), while trend analysis typically uses accumulated data processed periodically (batch).

Why this answer

Option A is correct because the scenario describes two distinct processing requirements: immediate alerting on temperature thresholds requires real-time (or stream) processing to minimize latency, while long-term trend analysis can be performed on stored data using batch processing, which is efficient for large historical datasets. Real-time processing handles data as it arrives with low latency, and batch processes data in bulk at scheduled intervals.

Exam trap

The trap here is that candidates confuse 'real-time' with 'batch' based on the word 'immediately' but fail to recognize that trend analysis is inherently a batch workload, leading them to reverse the pairings in options C or D.

Full explanation →

899

MCQeasy

A retail company stores customer data in a relational database table with columns for CustomerID, Name, and Email. Product reviews are stored as JSON documents where each document contains review text and a rating. Product images are stored as binary files in Azure Blob Storage. Which of the following correctly categorizes these data types in order: relational table, JSON documents, binary images?

A.Structured, semi-structured, unstructured

B.Semi-structured, structured, unstructured

C.Unstructured, semi-structured, structured

D.Structured, unstructured, semi-structured

AnswerA

Correct. Relational tables are structured (fixed schema), JSON is semi-structured (flexible schema), and binary images are unstructured (no schema).

Why this answer

A is correct because relational tables enforce a fixed schema (columns with defined data types), making them structured data. JSON documents have a flexible schema (key-value pairs) but still contain metadata, classifying them as semi-structured. Binary image files in Azure Blob Storage have no inherent structure or schema, making them unstructured data.

This matches the order: structured, semi-structured, unstructured.

Exam trap

The trap here is that candidates often confuse semi-structured data (like JSON) with unstructured data because JSON appears 'flexible,' but it still has a defined key-value structure, whereas truly unstructured data (binary blobs) has no schema at all.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies the relational table as semi-structured and the JSON documents as structured; relational tables are strictly structured (fixed schema) while JSON documents are semi-structured (self-describing, schema-on-read). Option C is wrong because it reverses the entire order, labeling binary images as structured and the relational table as unstructured; binary images have no schema or metadata, making them unstructured, not structured. Option D is wrong because it places JSON documents as unstructured and binary images as semi-structured; JSON documents have a defined structure (key-value pairs) and are semi-structured, while binary images lack any structure and are unstructured.

Full explanation →

900

Multi-Selectmedium

Which TWO of the following are characteristics of structured data? (Choose two.)

Select 2 answers

A.No predefined schema

B.Stored in rows and columns

C.Fixed schema

D.Key-value pairs

E.Schema-on-read

AnswersB, C

Structured data is typically tabular.

Why this answer

Structured data is organized in a tabular format with rows and columns, which is the defining characteristic of relational databases like SQL Server or Azure SQL Database. This structure enforces a fixed schema, meaning the data types and relationships are defined before data is entered, ensuring consistency and enabling efficient querying via SQL.

Exam trap

Microsoft often tests the distinction between 'fixed schema' (structured) and 'schema-on-read' (semi-structured), and candidates mistakenly associate key-value pairs with structured data instead of NoSQL.

Full explanation →

Microsoft Azure Data Fundamentals DP-900 (DP-900) — Questions 826–900