Microsoft Azure Data Fundamentals DP-900 DP-900 Questions 751–825 | Page 11/14

751

MCQmedium

A company plans to migrate a 2-TB on-premises SQL Server database to Azure. The database uses SQL Server Agent jobs for scheduled maintenance and requires automatic failover across Azure regions. The company wants a fully managed service with minimal application changes. Which Azure SQL service should they choose?

A.Azure SQL Database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machines

D.Azure Synapse Analytics

AnswerB

SQL Managed Instance supports SQL Server Agent, can scale up to 16 TB, and provides auto-failover groups for cross-region high availability.

Why this answer

Azure SQL Managed Instance is correct because it provides near 100% compatibility with SQL Server, including support for SQL Server Agent jobs, and offers automatic failover across Azure regions via failover groups. It is a fully managed service that requires minimal application changes, unlike Azure SQL Database which lacks SQL Server Agent and has limited cross-region failover capabilities.

Exam trap

The trap here is that candidates often choose Azure SQL Database because it is the most well-known fully managed service, overlooking the specific requirement for SQL Server Agent jobs and automatic cross-region failover, which Managed Instance uniquely supports.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database does not support SQL Server Agent jobs and has limited compatibility with on-premises SQL Server features, requiring application changes. Option C is wrong because SQL Server on Azure Virtual Machines is not a fully managed service; it requires manual management of the OS and SQL Server, including high availability setup. Option D is wrong because Azure Synapse Analytics is designed for large-scale data warehousing and analytics, not for transactional OLTP workloads with SQL Server Agent jobs and automatic failover.

Full explanation →

752

Drag & Dropmedium

Drag and drop the steps to load data into Azure Synapse Analytics using PolyBase in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

PolyBase loading involves defining external data source, file format, external table, then using CTAS to move data into the warehouse.

Full explanation →

753

MCQmedium

A company runs a web application on Azure SQL Database that experiences unpredictable spikes in traffic. They want to automatically adjust compute resources based on demand without manual intervention and without over-provisioning. Which Azure SQL Database feature should they use?

A.Serverless compute tier

B.Active geo-replication

C.Hyperscale service tier

D.Read scale-out

AnswerA

Correct - Serverless automatically scales compute on-demand and pauses during inactivity, matching the need for handling unpredictable spikes without manual intervention.

Why this answer

The Serverless compute tier for Azure SQL Database automatically scales compute resources based on workload demand, pausing databases during idle periods and resuming them when traffic spikes occur. This eliminates the need for manual intervention and prevents over-provisioning by charging only for the compute used per second, making it ideal for unpredictable traffic patterns.

Exam trap

The trap here is that candidates confuse the Hyperscale service tier's storage scalability with compute auto-scaling, but Hyperscale requires manual vCore adjustment and does not support auto-pause, whereas Serverless is specifically designed for unpredictable, intermittent workloads with automatic compute scaling.

How to eliminate wrong answers

Option B (Active geo-replication) is wrong because it focuses on disaster recovery and read-scale availability by replicating data to a secondary region, not on dynamic compute scaling based on demand. Option C (Hyperscale service tier) is wrong because it provides high scalability for storage and fast backup/restore but requires manual scaling of compute resources (vCores) and does not auto-pause or auto-scale compute like Serverless. Option D (Read scale-out) is wrong because it offloads read-only queries to a secondary replica for performance, but it does not automatically adjust compute resources or handle unpredictable traffic spikes without manual configuration.

Full explanation →

754

MCQhard

A data analyst needs to run ad-hoc SQL queries on large datasets stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent and the data volume varies. The analyst wants to pay only for the amount of data processed per query and does not want to manage any infrastructure. They also need to create views in T-SQL to simplify queries for Power BI reports. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Data Lake Analytics

C.Azure HDInsight

D.Azure Databricks

AnswerA

Correct. It provides serverless T-SQL querying on data lakes with pay-per-query pricing, supports views, and works with Power BI.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows running ad-hoc T-SQL queries directly on Parquet files in Azure Data Lake Storage Gen2 without provisioning any infrastructure. It uses a pay-per-query model, charging only for the amount of data processed, and supports creating T-SQL views that can be used directly by Power BI for simplified reporting.

Exam trap

The trap here is that candidates may confuse Azure Data Lake Analytics (which also processes data in ADLS Gen2) with a serverless SQL option, but it does not support T-SQL or views, making it unsuitable for the analyst's requirement to create T-SQL views for Power BI.

How to eliminate wrong answers

Option B (Azure Data Lake Analytics) is wrong because it uses U-SQL, not T-SQL, and requires managing a job submission model rather than providing a serverless SQL endpoint for ad-hoc queries. Option C (Azure HDInsight) is wrong because it requires provisioning and managing a Hadoop cluster (infrastructure), and does not offer a serverless pay-per-query model for SQL queries on Parquet files. Option D (Azure Databricks) is wrong because it is primarily a Spark-based analytics platform that requires cluster management and uses Spark SQL or Python, not native T-SQL, and does not support creating T-SQL views for Power BI without additional configuration.

Full explanation →

755

MCQhard

You are designing a data solution for a healthcare application that requires ACID transactions for patient records and needs to run complex analytics queries. Which combination of Azure services should you recommend?

A.Azure Cosmos DB for transactions, Power BI for analytics

B.Azure Database for MySQL for transactions, Azure Analysis Services for analytics

C.Azure Blob Storage for transactions, Azure Machine Learning for analytics

D.Azure SQL Database for transactions, Azure Synapse Analytics for analytics

AnswerD

Azure SQL Database provides ACID transactions, and Azure Synapse Analytics can run complex analytics queries.

Why this answer

Azure SQL Database provides full ACID (Atomicity, Consistency, Isolation, Durability) transaction support, which is essential for healthcare patient records where data integrity is critical. Azure Synapse Analytics is a cloud-based analytics service that can run complex queries against large datasets, including those from Azure SQL Database, using its massively parallel processing (MPP) architecture. This combination allows transactional and analytical workloads to coexist without compromising performance or consistency.

Exam trap

The trap here is that candidates often confuse 'analytics' with visualization tools like Power BI or OLAP cubes, failing to recognize that complex analytics queries require a dedicated MPP engine like Synapse, not just a reporting layer.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that does not guarantee full ACID transactions across multiple documents (it offers single-document atomicity only), and Power BI is a visualization tool, not an analytics engine capable of running complex queries directly. Option B is wrong because Azure Analysis Services is an OLAP engine for pre-aggregated data, not designed for running complex ad-hoc analytics queries on raw transactional data; it requires a separate data warehouse or model. Option C is wrong because Azure Blob Storage is an object store with no transaction support (it lacks ACID properties), and Azure Machine Learning is for building predictive models, not for running complex analytics queries on transactional data.

Full explanation →

756

MCQhard

A company has a Power BI dashboard that refreshes daily from an Azure SQL Database. During refresh, the database experiences high CPU usage that impacts transactional applications. They need to minimize impact while keeping the dashboard up-to-date. What should they do?

A.Disable the Query Store in the database

B.Create a read-only user for Power BI

C.Set Power BI to use a lower-capacity license

D.Configure Azure SQL Database with a readable secondary replica

AnswerD

Readable secondaries allow read workloads without impacting the primary.

Why this answer

Configuring Azure SQL Database with a readable secondary replica offloads read-only workloads, such as Power BI refreshes, to the secondary replica. This eliminates CPU contention on the primary replica, protecting transactional applications from performance degradation while keeping the dashboard up-to-date with near-real-time data.

Exam trap

The trap here is that candidates often confuse user permissions (read-only user) with workload isolation, not realizing that read-only replicas are required to physically separate read and write workloads at the infrastructure level.

How to eliminate wrong answers

Option A is wrong because disabling the Query Store does not reduce CPU usage from Power BI refreshes; it only removes query performance insights and may even degrade plan stability. Option B is wrong because creating a read-only user does not change the physical execution of queries—Power BI will still run on the primary replica and consume CPU. Option C is wrong because a lower-capacity Power BI license (e.g., changing from Premium to Pro) affects dataset size limits and refresh frequency, not the database CPU load during refresh.

Full explanation →

757

MCQeasy

A retail company receives real-time data from IoT sensors in its warehouses. Each sensor sends a JSON payload containing a device ID, timestamp, and temperature reading. A data engineer needs to classify this data for storage planning. Which data type best describes the JSON payload?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

JSON is a classic example of semi-structured data. It uses key-value pairs and can have nested structures, but it does not enforce a rigid schema. This flexibility is ideal for IoT payloads where fields may vary over time.

Why this answer

The JSON payload is considered semi-structured data because it has organizational properties (key-value pairs, nested structure) that provide a schema, but it does not conform to a rigid tabular schema like a relational database. JSON allows flexible fields and varying data types, which is characteristic of semi-structured data.

Exam trap

The trap here is that candidates confuse 'structured' with 'has a format' — JSON has a clear structure, but it is not rigidly tabular, so it falls under semi-structured, not structured data.

How to eliminate wrong answers

Option A is wrong because structured data requires a strict, predefined schema (e.g., relational tables with fixed columns and data types), whereas JSON allows optional fields and nested objects. Option C is wrong because unstructured data lacks any predefined structure or metadata (e.g., raw text, images, video), while JSON has a clear key-value format. Option D is wrong because relational data specifically refers to data organized into tables with rows and columns, enforced by constraints and relationships, which JSON does not inherently provide.

Full explanation →

758

MCQhard

Refer to the exhibit. You are analyzing a Kusto query in Azure Data Explorer. The query is intended to return the top 5 event types that caused the most property damage in Florida. However, the query returns an error. What is the most likely cause?

A.The where clause must specify a numeric value.

B.The summarize operator cannot use sum aggregation.

C.The table or column names are incorrect.

D.The top operator requires an order by clause.

AnswerC

The error is likely due to missing table or column; the query syntax is otherwise valid.

Why this answer

The query returns an error because the table or column names referenced in the query do not match the actual schema in Azure Data Explorer. In Kusto Query Language (KQL), if a table name like 'Events' or a column like 'PropertyDamage' does not exist in the database, the query will fail with a 'semantic error' indicating an unknown table or column. This is the most likely cause given that the query logic (where, summarize, top) is syntactically correct.

Exam trap

The trap here is that candidates may assume the error is due to a syntax or operator misuse (like top needing order by or sum being invalid), when in reality the error stems from a simple schema mismatch—a common oversight when reading queries without verifying the underlying data model.

How to eliminate wrong answers

Option A is wrong because the where clause in KQL can filter on string columns using equality or pattern matching (e.g., 'State == "Florida"'), not only numeric values. Option B is wrong because the summarize operator fully supports the sum() aggregation function for numeric columns, which is a standard and valid operation. Option D is wrong because the top operator in KQL does not require an explicit order by clause; it internally sorts by the specified column(s) in descending order and returns the top N rows.

Full explanation →

759

MCQeasy

A social media application stores user sessions as JSON documents. Each session document has fields like sessionId, userId, startTime, endTime, and a list of pageviews. The application needs to quickly retrieve a session by its sessionId and also run queries like 'find all sessions for a user in the last 24 hours' using SQL-like syntax. The data has no fixed schema; different sessions may include additional optional fields like 'deviceType' or 'promotionCode'. Which Azure data store should the company use?

A.Azure Cosmos DB with SQL API

B.Azure Table Storage

C.Azure SQL Database

D.Azure Blob Storage

AnswerA

Correct. Azure Cosmos DB with SQL API natively stores JSON documents, supports schema-agnostic data, provides SQL-like querying, and offers low-latency point reads by sessionId.

Why this answer

Azure Cosmos DB with SQL API is the correct choice because it natively supports storing JSON documents with flexible schemas, allows fast point reads by sessionId using a unique identifier, and enables SQL-like queries (e.g., filtering by userId and startTime) with automatic indexing. Its schema-agnostic design handles optional fields like deviceType or promotionCode without requiring schema changes, and it provides low-latency reads essential for real-time session retrieval.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key-value simplicity with JSON document support, but Table Storage does not provide SQL-like querying or native JSON handling, making Cosmos DB the only option that combines flexible schema, SQL syntax, and fast point reads.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because it is a key-value store that does not support SQL-like query syntax or native JSON document storage; it requires a fixed schema with partition and row keys, making it unsuitable for querying by arbitrary fields like userId and startTime without expensive scans. Option C (Azure SQL Database) is wrong because it enforces a fixed relational schema, requiring predefined columns for all fields, which contradicts the requirement for a flexible schema with optional fields; adding new fields like deviceType would require ALTER TABLE statements. Option D (Azure Blob Storage) is wrong because it is an object store for unstructured binary or text data, not designed for interactive queries or indexing; retrieving a specific session by sessionId would require scanning all blobs or maintaining an external index, and it does not support SQL-like queries.

Full explanation →

760

MCQmedium

A company has an on-premises SQL Server 2016 database that uses SQL Server Agent jobs and cross-database queries across multiple databases. They want to migrate to Azure with minimal application changes, requiring native VNet integration and near 100% compatibility with the on-premises SQL Server instance. Which Azure deployment option should they choose?

A.Azure SQL Database single database

B.Azure SQL Database elastic pool

C.Azure SQL Managed Instance

D.SQL Server on Azure Virtual Machines

AnswerC

Correct. Azure SQL Managed Instance is designed for lift-and-shift migrations, offering high compatibility, SQL Server Agent, cross-database queries, and native VNet integration, all while being a platform-as-a-service (PaaS) offering.

Why this answer

Azure SQL Managed Instance is correct because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs and cross-database queries, while offering native VNet integration. This minimizes application changes during migration, unlike other Azure SQL options that lack these features.

Exam trap

The trap here is that candidates often choose SQL Server on Azure Virtual Machines (Option D) thinking it offers the highest compatibility, but they overlook that Azure SQL Managed Instance provides the same compatibility with less management overhead and native VNet integration, making it the optimal choice for minimal application changes.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database single database does not support SQL Server Agent jobs or cross-database queries across multiple databases, requiring significant application rewrites. Option B is wrong because Azure SQL Database elastic pool also lacks SQL Server Agent and cross-database query support, and is designed for resource sharing among single databases, not for compatibility with on-premises features. Option D is wrong because SQL Server on Azure Virtual Machines requires manual management of VNet integration and does not provide the same level of managed service or near 100% compatibility without additional configuration, though it can support these features if properly set up.

Full explanation →

761

MCQmedium

A social media application stores user posts in Azure Cosmos DB using the NoSQL API. Each document includes: PostID (unique), UserID, Timestamp, Content. The most common query is: 'Get all posts for a specific UserID, sorted by Timestamp descending.' Which partition key should be chosen to distribute load evenly across physical partitions while also supporting this query efficiently?

A.PostID

B.UserID

C.Timestamp

D.Content

AnswerB

UserID groups all posts for a user into one logical partition, making the query efficient. With many users, the load is balanced across physical partitions, avoiding hot spots.

Why this answer

UserID is the correct partition key because it evenly distributes write operations across physical partitions (each user has a unique ID) and directly supports the most common query: filtering by UserID. With UserID as the partition key, the query 'Get all posts for a specific UserID, sorted by Timestamp descending' becomes a single-partition query (using the partition key in the WHERE clause), which is efficient and avoids cross-partition fan-out. This design also allows Cosmos DB to use the Timestamp field as a sort key within each logical partition, enabling efficient sorting without additional indexing overhead.

Exam trap

The trap here is that candidates often choose a unique identifier like PostID (Option A) thinking it guarantees even distribution, but they overlook that the partition key must also match the most frequent query filter to avoid cross-partition queries and high RU costs.

How to eliminate wrong answers

Option A is wrong because PostID is unique per document, which would create a separate logical partition for each post, leading to an extremely high number of small partitions and poor query performance for the common query (which filters by UserID, not PostID). Option C is wrong because Timestamp is a high-cardinality, monotonically increasing value; using it as a partition key would cause all new posts to land on a single hot partition (the latest timestamp), creating a throughput bottleneck and uneven load distribution. Option D is wrong because Content is a large, variable-length string with no guarantee of even distribution; it would result in unpredictable partition sizes and cannot efficiently support the required filter on UserID.

Full explanation →

762

MCQmedium

You are a data engineer for a financial services company. The company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. They have a fact table named Transactions that contains 2 billion rows. The table is hash-distributed on the AccountID column. Users run reports that aggregate transaction amounts by date and account type. The reports are slow. Upon investigation, you find that the distribution is highly skewed because a few accounts have millions of transactions. You need to improve query performance without redesigning the entire schema. Which action should you take?

A.Change the distribution key to a column with more unique values, such as TransactionID

B.Change the distribution to round-robin

C.Create a clustered columnstore index on the table

D.Replicate the Transactions table to all distributions

AnswerA

A column with high cardinality reduces skew and improves parallel processing.

Why this answer

The correct answer is A because changing the distribution key to TransactionID, which has far more unique values than AccountID, will eliminate the data skew that is causing performance degradation. In a hash-distributed table, a skewed distribution key leads to some distributions holding a disproportionate amount of data, causing parallel query execution to be bottlenecked by the largest distribution. By using a highly unique column like TransactionID, the data will be evenly distributed across all 60 distributions, enabling balanced parallelism and faster aggregation queries.

Exam trap

The trap here is that candidates often assume that a clustered columnstore index (Option C) is the universal fix for slow queries, but they fail to recognize that data skew in a hash-distributed table is a distribution-level problem that columnstore indexes cannot solve.

How to eliminate wrong answers

Option B is wrong because changing to round-robin distribution would distribute rows evenly but would eliminate data collocation benefits, causing all queries that filter or join on AccountID to require data movement across distributions, which would severely degrade performance. Option C is wrong because the table already has a clustered columnstore index (the default for dedicated SQL pool tables), and while columnstore indexes improve compression and scan performance, they do not address the root cause of data skew in a hash-distributed table. Option D is wrong because replicating a 2-billion-row fact table to all distributions would consume excessive storage and cause significant overhead during data loading and maintenance, and it is not a supported or practical action for large fact tables in Azure Synapse Analytics.

Full explanation →

763

Multi-Selecthard

Which THREE components are typically part of a modern data warehouse architecture on Azure? (Choose three.)

Select 3 answers

A.Azure Data Factory

B.Azure Synapse Analytics

C.Azure Stream Analytics

D.Azure Data Lake Storage Gen2

E.Azure Cosmos DB

AnswersA, B, D

Orchestrates data ingestion and transformation pipelines.

Why this answer

Azure Data Factory is correct because it serves as the cloud-based ETL (Extract, Transform, Load) service that orchestrates and automates data movement and transformation across various sources and destinations. In a modern data warehouse architecture, Data Factory is used to ingest raw data from on-premises or cloud sources, transform it using mapping data flows or external compute (e.g., Azure Databricks), and load it into the data warehouse or data lake for analytics. It provides a code-free visual interface or SDK-based control for scheduling and monitoring pipelines, making it essential for the ingestion and preparation layer.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a real-time processing service) with a batch data warehouse component, or mistakenly think Azure Cosmos DB can serve as an analytical data store due to its multi-model capabilities, but it lacks the columnar storage and MPP architecture required for modern data warehousing.

Full explanation →

764

MCQeasy

You are storing log files from multiple applications in Azure Blob Storage. Each log file is a text file with timestamp data. You need to query logs for a specific date range using SQL. Which Azure service can query these files directly?

A.Azure Stream Analytics

B.Azure Data Lake Storage

C.Azure Synapse Serverless SQL

D.Azure Analysis Services

AnswerC

Serverless SQL can query files in Blob Storage using T-SQL.

Why this answer

Option B is correct because Azure Synapse Serverless SQL can query text files in Blob Storage using OPENROWSET. Option A is wrong because Azure Stream Analytics is for real-time streaming, not batch query. Option C is wrong because Azure Analysis Services is for semantic models.

Option D is wrong because Azure Data Lake Storage is storage, not query.

Full explanation →

765

MCQhard

A company's application uses Microsoft SQL Server with multiple databases that need to run complex queries joining tables across databases. They are migrating to Azure and need a fully managed relational database service with high availability, automated backups, and minimal management overhead. They do not need a separate SQL Server installation and want to avoid managing VMs. Which Azure deployment option should they choose?

A.Azure SQL Database single database

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machines

D.Azure SQL Database elastic pool

AnswerB

Azure SQL Managed Instance supports cross-database queries using linked servers, provides built-in high availability, automatic backups, and is fully managed. It offers the broadest SQL Server compatibility without requiring VM management.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near-100% compatibility with on-premises SQL Server, including support for cross-database queries and linked servers, while being a fully managed platform-as-a-service (PaaS) offering. It eliminates the need to manage VMs or a separate SQL Server installation, and it includes built-in high availability (99.99% SLA) and automated backups, meeting all stated requirements.

Exam trap

The trap here is that candidates often confuse Azure SQL Database elastic pool with Managed Instance, assuming elastic pools support cross-database queries, but elastic pools only manage resource allocation for single databases and do not provide the instance-level features needed for cross-database joins.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database single database does not support cross-database queries or linked servers; it is designed for isolated databases and requires elastic query or external tools for cross-database joins, which adds complexity. Option C is wrong because SQL Server on Azure Virtual Machines is an infrastructure-as-a-service (IaaS) option that requires managing VMs, patching, and SQL Server installation, contradicting the need for minimal management overhead and a fully managed service. Option D is wrong because Azure SQL Database elastic pool is a resource-sharing model for multiple single databases within the same logical server, but it inherits the same cross-database query limitations as single databases and does not enable native cross-database joins.

Full explanation →

766

MCQmedium

A social media application stores user profiles as JSON documents. Each profile has standard fields like userId, name, and email, but also optional fields such as education and work history. The application needs to query profiles by userId with low latency and also run SQL-like queries to find all profiles with a specific work history value. Which Azure Cosmos DB API should they choose?

A.SQL (Core) API

B.MongoDB API

C.Gremlin (Graph) API

D.Table API

AnswerA

The SQL API supports SQL-like queries on JSON documents, ideal for flexible schema and queries by userId and optional fields.

Why this answer

The SQL (Core) API is the correct choice because it natively supports querying JSON documents with SQL-like syntax, enabling both low-latency point reads by userId and complex queries on nested fields like work history. It provides automatic indexing of all JSON properties, which ensures efficient execution of queries across optional fields without requiring schema management.

Exam trap

The trap here is that candidates often choose the MongoDB API because they associate JSON documents with MongoDB, but the question explicitly requires SQL-like queries, which is a native feature of the Core API and not MongoDB's query syntax.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because while it supports JSON documents, its query language is based on MongoDB's query operators (e.g., $elemMatch) rather than standard SQL, and the requirement for SQL-like queries makes the Core API more appropriate. Option C (Gremlin (Graph) API) is wrong because it is designed for graph traversal queries on entities and relationships, not for document-level SQL queries on nested JSON fields. Option D (Table API) is wrong because it is optimized for key-value lookups with a flat schema and does not support querying nested JSON fields like work history without complex workarounds.

Full explanation →

767

Multi-Selectmedium

Which TWO Azure services can be used to perform data transformation in a data pipeline? (Choose two.)

Select 2 answers

A.Azure Blob Storage

B.Azure SQL Database

C.Azure Databricks

D.Azure Event Hubs

E.Azure Data Factory

AnswersC, E

Databricks provides Spark-based transformation.

Why this answer

Azure Databricks is correct because it provides an Apache Spark-based analytics platform that can perform complex data transformations, such as ETL (Extract, Transform, Load) operations, using notebooks and clusters. It allows you to write code in Python, Scala, or SQL to transform data at scale, making it a core compute service for data transformation in a pipeline.

Exam trap

The trap here is that candidates often confuse storage or ingestion services (like Blob Storage or Event Hubs) with compute services that actually execute transformation logic, leading them to select options that only move or store data.

Full explanation →

768

MCQmedium

A marketing team wants to analyze social media sentiment in near real-time. They will use Azure Event Hubs to capture tweets and need to aggregate sentiment scores over 5-minute windows. The aggregated results must be stored in Azure Blob Storage for later analysis. Which Azure service should they use to perform the stream processing?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure Synapse Analytics

AnswerA

Azure Stream Analytics is specifically designed for real-time stream processing and can easily ingest from Event Hubs, apply windowed aggregations, and output to Blob Storage.

Why this answer

Azure Stream Analytics is the correct choice because it is a real-time stream processing engine designed to ingest data from sources like Azure Event Hubs, apply temporal aggregations (e.g., 5-minute tumbling windows), and output results directly to Azure Blob Storage. It provides built-in support for windowed functions and exactly-once delivery semantics, making it ideal for near-real-time sentiment analysis without requiring custom code.

Exam trap

The trap here is that candidates often confuse Azure Data Factory or Azure Synapse Analytics as stream processing tools, but Data Factory is batch-only and Synapse is primarily a data warehouse, not a real-time stream processor.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a cloud-based ETL and data integration service for batch-oriented data movement and orchestration, not designed for real-time stream processing or windowed aggregations. Option C (Azure Databricks) is wrong because while it can process streaming data using Structured Streaming, it requires a Spark cluster to be provisioned and managed, and is overkill for simple windowed aggregations; it is not the simplest or most cost-effective choice for this specific near-real-time scenario. Option D (Azure Synapse Analytics) is wrong because it is a unified analytics platform focused on large-scale data warehousing and big data analytics, not a real-time stream processing service; it can query streaming data via linked services but does not natively perform windowed aggregations on live streams.

Full explanation →

769

MCQhard

Refer to the exhibit. You deploy an Azure Cosmos DB account using this ARM template. Users in East US report that reads are slower than writes. What is the most likely cause?

A.The consistency level is set to Session, which requires a round-trip to the write region for every read.

B.The East US region is configured as a read region, but the client is connecting to the write endpoint in West US.

C.The failover priority for East US is 1, which means it is not enabled for reads.

D.The database account offer type is Standard, which limits throughput.

AnswerB

If clients use the write endpoint, reads will be slower due to cross-region latency.

Why this answer

The exhibit shows that the write region (failoverPriority 0) is West US, and East US is a read region (failoverPriority 1). Users in East US are reading from the local region, but because the consistency level is Session, reads may not see the latest writes if the write is still propagating from West US. However, the question asks about slower reads than writes.

The most likely cause is that read requests are being served from a different region than the user's location due to Cosmos DB's routing policy. But since East US is a read region, reads should be local. The exhibit does not show multi-region writes.

The typical cause is that reads are not local; but in this case, East US is a read region, so reads should be local. Another possibility: the consistency level Session requires a session token, but that doesn't cause slowness. The exhibit does not specify multi-region writes.

However, a common mistake is that if the account has only one write region, reads from other regions might be served from the write region if not configured properly. But the exhibit shows two regions, so reads should be local. Perhaps the issue is that the client is not using the correct endpoint.

The most likely cause from the exhibit is that the account has only one write region (West US), and reads in East US are still going to West US due to client configuration. But the exhibit does not show client config. Another possibility: the default consistency level Session can cause slightly higher latency for reads that need to check the session token.

However, the best answer is that the read requests are being routed to the write region because the client is not configured to use the local endpoint. But since the question is diagnostic, we'll go with a common issue: the client is using the write endpoint for reads. However, the options need to reflect plausible causes.

Let's design options accordingly.

Full explanation →

770

MCQeasy

A company is migrating on-premises Hadoop HDFS data to Azure. They want to keep the same file system semantics for compatibility with existing analytics jobs. Which Azure storage solution should they use?

A.Azure Blob Storage

B.Azure SQL Database

C.Azure Data Lake Storage Gen2

D.Azure Cosmos DB

AnswerC

ADLS Gen2 supports a hierarchical namespace and is fully compatible with Hadoop analytics jobs.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) provides Hadoop-compatible file system semantics (hierarchical namespace) and is built on Blob Storage. Azure Blob Storage does not have a hierarchical namespace by default. Azure Cosmos DB and Azure SQL are not file systems.

Full explanation →

771

MCQhard

A financial services company needs to build a data pipeline that ingests daily transaction files from multiple sources. The pipeline must perform data quality checks, transform data using complex business logic, and load it into Azure Synapse Analytics. The transformations involve conditional branching (e.g., if a transaction amount exceeds a threshold, apply additional validation). The company wants to minimize coding effort and prefers a visual, configuration-based approach. Which Azure service should they use as the primary orchestration and transformation engine?

A.Azure Data Factory with Data Flows

B.Azure Databricks with notebooks

C.Azure Stream Analytics

D.Azure Logic Apps

AnswerA

Correct. ADF Data Flows allow visual, code-free transformations with conditional logic, and ADF handles orchestration.

Why this answer

Azure Data Factory with Data Flows is the correct choice because it provides a visual, configuration-based interface for both orchestration and transformation, including support for complex business logic like conditional branching (e.g., via conditional split transformations). It natively integrates with Azure Synapse Analytics for loading transformed data, minimizing coding effort compared to code-heavy alternatives.

Exam trap

The trap here is that candidates may confuse Azure Data Factory with Azure Databricks, assuming Databricks is required for complex transformations, but Data Flows provide the same Spark power with a visual interface, meeting the 'minimize coding' requirement.

How to eliminate wrong answers

Option B is wrong because Azure Databricks with notebooks requires writing code (Python, Scala, or SQL) for transformations, which contradicts the requirement to minimize coding effort and prefer a visual approach. Option C is wrong because Azure Stream Analytics is designed for real-time streaming data processing, not for batch ingestion of daily transaction files with complex conditional transformations. Option D is wrong because Azure Logic Apps is a workflow automation service for integrating applications and services, not a data transformation engine capable of handling complex business logic or loading into Azure Synapse Analytics.

Full explanation →

772

MCQmedium

A data engineer needs to design a solution for a healthcare organization that must store patient records for 7 years to comply with regulatory requirements. The data will be accessed infrequently after the first year. Which Azure storage tier should be used for data older than one year?

A.Transaction-optimized tier

B.Cool tier

C.Archive tier

D.Hot tier

AnswerB

Cool tier is a low-cost storage tier for data that is infrequently accessed and stored for at least 30 days, ideal for data older than one year.

Why this answer

The Cool tier is designed for data that is accessed infrequently but must be available immediately when needed, with a minimum storage duration of 30 days. Since patient records older than one year are accessed rarely but may still require low-latency retrieval for compliance audits, Cool tier balances cost and accessibility. Hot tier is for frequent access, Archive tier has a retrieval latency of hours (not suitable for occasional access), and Transaction-optimized is not a standard Azure Blob Storage tier.

Exam trap

Microsoft often tests the misconception that Archive tier is the cheapest and therefore always the best for old data, but the trap here is ignoring the retrieval latency and access requirements—Archive is only appropriate if data can tolerate hours of delay for rehydration.

How to eliminate wrong answers

Option A is wrong because Transaction-optimized tier is not a valid Azure Blob Storage access tier; Azure offers Hot, Cool, Cold, and Archive tiers, and 'Transaction-optimized' is a misleading term that does not exist in the service. Option C is wrong because Archive tier has a retrieval time of up to 15 hours (rehydration) and is intended for data that is rarely accessed and can tolerate significant delay, making it unsuitable for records that may need occasional access within a reasonable time. Option D is wrong because Hot tier is optimized for frequent access patterns and higher storage costs, which would be wasteful for data accessed only infrequently after the first year.

Full explanation →

773

MCQmedium

A company is migrating a large on-premises SQL Server database to Azure. They require high availability with automatic failover and read-scale for reporting workloads. Which Azure service should they use?

A.Azure SQL Database Hyperscale tier

B.Azure Database for PostgreSQL with geo-redundant backup

C.SQL Server on Azure Virtual Machines with Always On availability groups

D.Azure SQL Database Business Critical tier with failover groups

AnswerD

Business Critical provides built-in high availability and failover groups enable automatic failover with readable secondaries.

Why this answer

Option B is correct because Azure SQL Database with Business Critical tier and failover group provides automatic failover and read-scale. Option A is wrong because Hyperscale is for very large databases but not primarily for high availability with read-scale. Option C is wrong because SQL Server on Azure VMs requires manual configuration for high availability.

Option D is wrong because Azure Database for PostgreSQL is not a SQL Server migration target.

Full explanation →

774

MCQhard

A company ingests streaming data from thousands of devices into Azure Event Hubs. They need to transform and aggregate the data in real time before storing it in Azure Data Lake Storage Gen2. Which Azure service should they use between Event Hubs and ADLS Gen2?

A.Azure Functions

B.Azure Databricks

C.Azure Stream Analytics

D.Azure Data Factory

AnswerC

Stream Analytics is purpose-built for real-time stream processing with SQL-like queries.

Why this answer

Azure Stream Analytics is purpose-built for real-time data processing and analytics, allowing you to define SQL-like queries to transform and aggregate streaming data from Event Hubs before outputting it directly to Azure Data Lake Storage Gen2. It provides exactly-once delivery semantics and low-latency processing, making it the ideal service for this ingestion-to-storage pipeline.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Functions or Azure Databricks for real-time processing, but Stream Analytics is the only service that provides a fully managed, low-latency, SQL-based streaming pipeline without requiring custom code or cluster management.

How to eliminate wrong answers

Option A is wrong because Azure Functions is a serverless compute service for event-driven code execution, but it lacks native streaming aggregation capabilities and would require custom code to handle stateful operations like windowed aggregates, leading to increased complexity and potential data loss. Option B is wrong because Azure Databricks is a big data analytics platform that can process streaming data via Structured Streaming, but it is overkill for a simple transform-and-aggregate pipeline and introduces higher latency and operational overhead compared to a dedicated streaming service. Option D is wrong because Azure Data Factory is an ETL and orchestration service designed for batch data movement and transformation, not real-time streaming; it cannot process data as it arrives in Event Hubs with sub-second latency.

Full explanation →

775

MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool as its data warehouse. New data is loaded into the warehouse every few minutes. The company wants to visualize the data with near real-time updates in a dashboard that can be refreshed automatically. Which tool and connection mode should they use?

A.Power BI with DirectQuery mode

B.Azure Data Studio with visualizations

C.SQL Server Reporting Services (SSRS) with cached reports

D.Excel Power Pivot with imported data

AnswerA

Correct. DirectQuery sends queries directly to the database, reflecting updates as soon as data is available, enabling near real-time dashboards.

Why this answer

Power BI with DirectQuery mode is correct because it allows the dashboard to query the Azure Synapse dedicated SQL pool directly for each visual refresh, enabling near real-time updates without importing data. This mode avoids the latency of data import and supports automatic page refresh, which aligns with the requirement for data loaded every few minutes.

Exam trap

The trap here is that candidates often confuse DirectQuery with Import mode, assuming Import mode is always faster for dashboards, but Import mode cannot achieve near real-time updates without manual or scheduled refreshes, which fails the 'every few minutes' requirement.

How to eliminate wrong answers

Option B is wrong because Azure Data Studio is a database management and query tool, not a visualization or dashboard tool; it lacks native automatic refresh capabilities for near real-time dashboards. Option C is wrong because SQL Server Reporting Services (SSRS) with cached reports stores report snapshots, which introduces data staleness and cannot provide near real-time updates. Option D is wrong because Excel Power Pivot with imported data requires manual or scheduled data refresh, which cannot achieve sub-minute near real-time updates and is not designed for automatic dashboard refresh.

Full explanation →

776

MCQeasy

A company is evaluating Azure database services for two different workloads. Workload A processes high-volume, low-latency transactions such as order entry and payment processing, where each transaction updates a few rows. Workload B involves running complex aggregations on terabytes of historical sales data to generate monthly business intelligence reports. Which Azure service is best suited for each workload?

A.A. Workload A: Azure SQL Database; Workload B: Azure Cosmos DB

B.B. Workload A: Azure Cosmos DB; Workload B: Azure Synapse Analytics

C.C. Workload A: Azure Synapse Analytics; Workload B: Azure SQL Database

D.D. Workload A: Azure Cosmos DB; Workload B: Azure Cosmos DB

AnswerB

Cosmos DB provides low-latency transactions (OLTP) and Synapse Analytics is built for large-scale analytics (OLAP), matching the workloads correctly.

Why this answer

Workload A requires a low-latency, high-throughput transactional database capable of handling many small, row-level updates. Azure Cosmos DB is a NoSQL database designed for single-digit millisecond latency and horizontal scaling, making it ideal for order entry and payment processing. Workload B involves complex aggregations on terabytes of historical data, which is best handled by Azure Synapse Analytics, a distributed analytics service that uses massively parallel processing (MPP) to run large-scale queries efficiently.

Exam trap

The trap here is that candidates often confuse Azure SQL Database as the default for all transactional workloads, overlooking that Cosmos DB is specifically designed for ultra-low-latency, globally distributed transactions, and they may also assume Azure Synapse Analytics is only for data warehousing without recognizing its role in complex aggregations on historical data.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is optimized for relational transactions but does not provide the sub-10-millisecond latency and multi-region distribution that Cosmos DB offers for high-volume, low-latency workloads; also, Cosmos DB is not designed for complex aggregations on terabytes of historical data. Option C is wrong because Azure Synapse Analytics is a data warehouse for analytical workloads, not suitable for high-frequency, low-latency transactional updates, and Azure SQL Database lacks the MPP architecture needed for petabyte-scale aggregations. Option D is wrong because using Azure Cosmos DB for both workloads fails to address Workload B's need for complex aggregations on historical data, as Cosmos DB's query engine is optimized for point reads and simple queries, not large-scale analytical processing.

Full explanation →

777

Multi-Selecthard

Which THREE considerations are important when choosing between Azure SQL Database and Azure SQL Managed Instance?

Select 3 answers

A.Need for SQL Server Agent

B.Database size greater than 4 TB

C.Requirement for elastic pools

D.Need for automated backups

E.Lift-and-shift migration with minimal changes

AnswersA, B, E

SQL Managed Instance supports SQL Server Agent, while SQL Database uses elastic jobs.

Why this answer

Option A is correct because SQL Server Agent is fully supported in Azure SQL Managed Instance, enabling native job scheduling and automation, whereas Azure SQL Database does not include SQL Server Agent and requires alternative solutions like Elastic Jobs or Azure Automation. Option B is correct because Azure SQL Managed Instance supports databases up to 16 TB, while Azure SQL Database (single database) has a maximum size of 4 TB for the Hyperscale service tier, making Managed Instance necessary for databases exceeding 4 TB in non-Hyperscale tiers. Option E is correct because Azure SQL Managed Instance provides near-100% compatibility with on-premises SQL Server, allowing lift-and-shift migrations with minimal or no application changes, whereas Azure SQL Database often requires schema and code modifications due to differences in features like CLR, Service Broker, or cross-database queries.

Exam trap

The trap here is that candidates often assume automated backups are unique to one service, but both Azure SQL Database and Azure SQL Managed Instance include automated backups by default, so this is not a valid differentiator.

Full explanation →

778

MCQeasy

You are a business analyst at a manufacturing company. The company uses Azure SQL Database to store production data. You need to create a Power BI report that shows real-time machine efficiency. The report must refresh every 5 minutes to show current metrics. You have been granted read-only access to the database. The database is under heavy load from transactional applications, and you want to minimize additional impact. Which approach should you take to create the report?

A.Disable the Power BI Query Cache to reduce database load

B.Use Power BI Import mode with incremental refresh policy to load only new data every 5 minutes

C.Configure Power BI to use DirectQuery mode to ensure real-time data

D.Export the data to a CSV file and import it into Power BI daily

AnswerB

Import mode with incremental refresh reduces load by only querying changed data.

Why this answer

B is correct because Power BI Import mode with an incremental refresh policy allows you to load only new or changed data every 5 minutes, minimizing the load on the heavily used Azure SQL Database. This approach avoids repeated full table scans, reduces query impact, and still provides near-real-time metrics for the report.

Exam trap

The trap here is that candidates often confuse DirectQuery with real-time capability, not realizing that DirectQuery sends live queries to the source for every interaction, which would exacerbate database load rather than minimize it.

How to eliminate wrong answers

Option A is wrong because disabling the Power BI Query Cache would increase database load, not reduce it, as every report interaction would require a fresh query against the database. Option C is wrong because DirectQuery mode sends a query to the database for every visual interaction, which would add significant load to the already heavily stressed database and is not suitable for minimizing impact. Option D is wrong because exporting to a CSV file daily does not support the required 5-minute refresh interval and introduces manual steps, making it impractical for real-time machine efficiency monitoring.

Full explanation →

779

MCQmedium

A telecommunications company needs to analyze call detail records (CDRs) to detect fraud patterns and minimize revenue leakage. The data arrives as a continuous stream from network switches and must be queried within seconds of ingestion to flag suspicious activity. The analysts also need to run interactive ad-hoc queries over the last 90 days of CDR data using a Kusto query language. Which Azure service should they use as the primary data store and analytics engine?

A.Azure Data Explorer

B.Azure Synapse Analytics

C.Azure Stream Analytics

D.Azure Analysis Services

AnswerA

Correct. Azure Data Explorer is built for real-time analytics on streaming data, with sub-second query performance and native Kusto Query Language support for interactive exploration.

Why this answer

Azure Data Explorer is optimized for high-velocity telemetry data like CDRs, supporting ingestion of continuous streams with sub-second query latency. Its native Kusto Query Language (KQL) enables both real-time fraud detection and interactive ad-hoc queries over large time windows (e.g., 90 days) without pre-aggregation or indexing overhead.

Exam trap

The trap here is that candidates confuse Azure Stream Analytics (real-time processing) with Azure Data Explorer (real-time analytics), failing to recognize that Stream Analytics lacks a native query language for interactive ad-hoc exploration over historical data.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics is a distributed SQL-based data warehouse designed for large-scale batch analytics and data integration, not for real-time streaming ingestion with sub-second query latency on raw telemetry. Option C is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks (e.g., storage, databases) but does not provide a native Kusto query interface for interactive ad-hoc queries over historical data. Option D is wrong because Azure Analysis Services is an OLAP engine for pre-aggregated tabular models, not designed for direct ingestion of high-velocity streaming data or Kusto query language support.

Full explanation →

780

MCQmedium

Your company has a Power BI report that uses DirectQuery to Azure SQL Database. Users report that the report is slow when multiple users access it simultaneously. The database is underprovisioned. Which action should you take to improve performance without changing the report design?

A.Enable query caching in Power BI Premium.

B.Change the report to use Import mode instead of DirectQuery.

C.Scale up the Azure SQL Database to a higher service tier.

D.Add indexes to the tables used in the report.

AnswerC

More DTUs/vCores handle more concurrent queries.

Why this answer

Option C is correct because the root cause is an underprovisioned Azure SQL Database, which cannot handle the concurrent query load from multiple Power BI users using DirectQuery. Scaling up to a higher service tier (e.g., from S2 to S3 or a DTU-based tier) increases the database's DTUs (Database Transaction Units), directly improving throughput and reducing query latency without altering the report design.

Exam trap

The trap here is that candidates often confuse performance tuning at the database level (scaling up) with caching or data import strategies, overlooking the explicit constraint that the report design must remain unchanged and that the database is the bottleneck.

How to eliminate wrong answers

Option A is wrong because query caching in Power BI Premium caches results at the Power BI service layer, but with DirectQuery, the report still sends live queries to the database; caching does not address the database's inability to handle concurrent queries efficiently. Option B is wrong because changing to Import mode would require redesigning the report (e.g., setting up refresh schedules, handling data latency), which violates the constraint of not changing the report design. Option D is wrong because adding indexes can improve query performance for specific queries, but it does not resolve the fundamental issue of an underprovisioned database lacking sufficient compute and I/O resources to handle concurrent user load.

Full explanation →

781

Multi-Selectmedium

Which TWO Azure services can be used to perform real-time stream processing?

Select 2 answers

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Analysis Services

D.Azure Databricks Structured Streaming

E.Azure Synapse Pipelines

AnswersB, D

Azure Stream Analytics is a real-time stream processing engine.

Why this answer

Azure Stream Analytics is a fully managed service designed for real-time stream processing. It can ingest data from sources like Azure Event Hubs or IoT Hub, apply SQL-based queries to the streaming data, and output results to sinks such as Power BI or Azure SQL Database, all with sub-second latency.

Exam trap

The trap here is that candidates often confuse Azure Data Factory and Azure Synapse Pipelines with stream processing because they can handle data movement, but they are fundamentally batch-oriented orchestration tools, not real-time stream processors.

Full explanation →

782

MCQmedium

A company uses Azure Data Lake Storage Gen2 to store IoT sensor data. The data is partitioned by date and sensor ID. A data scientist needs to efficiently query only the last 7 days of data for a specific sensor. Which strategy minimizes the amount of data scanned?

A.Use a directory structure that enables partition elimination

B.Create a view that filters on date and sensor ID

C.Read all Parquet files and filter using a WHERE clause

D.Create an index on the date and sensor ID columns

AnswerA

By organizing data in a hierarchical directory structure (e.g., /sensorID=xxx/date=yyyy-mm-dd/), query engines can prune partitions and scan only relevant files.

Why this answer

Option A is correct because Azure Data Lake Storage Gen2 supports hierarchical directory structures that enable partition elimination at the storage layer. By organizing data under a path like `/sensorID=123/date=2025-03-20/`, a query engine (e.g., Azure Synapse Serverless SQL or Spark) can skip entire directories that do not match the filter, drastically reducing the amount of data scanned.

Exam trap

The trap here is that candidates confuse database indexing (Option D) with data lake partitioning, or assume that a WHERE clause alone (Option C) is sufficient to minimize data scanned, not realizing that partition elimination requires a physical directory structure.

How to eliminate wrong answers

Option B is wrong because a view is merely a saved query definition; it does not physically reorganize data or skip partitions, so it still scans all underlying files. Option C is wrong because reading all Parquet files and then applying a WHERE clause forces a full scan of every file, even though Parquet supports predicate pushdown—without partition elimination, the engine must still open and read metadata from all files. Option D is wrong because Azure Data Lake Storage Gen2 does not support traditional database indexes; indexing is a relational database concept and cannot be applied to files in a data lake.

Full explanation →

783

MCQmedium

A company develops an IoT device registry that stores device metadata as JSON documents. Each device has a unique DeviceID, and the attributes vary per device type (e.g., sensors, actuators). The application requires low-latency reads by DeviceID and needs global distribution to support devices worldwide. The team wants to use a fully managed NoSQL database in Azure. Which API should they choose for Azure Cosmos DB?

A.SQL API

B.MongoDB API

C.Cassandra API

D.Table API

AnswerA

Correct. The SQL API is the native API for JSON documents in Cosmos DB, offering rich querying and global distribution.

Why this answer

The SQL API (formerly DocumentDB API) is the native API for Azure Cosmos DB, providing full support for querying JSON documents with a SQL-like syntax. It offers the lowest latency reads by ID (point reads) and native global distribution, making it ideal for a device registry where each device has a unique DeviceID and variable attributes. The SQL API also supports indexing all properties automatically, which is critical for the varied device types.

Exam trap

The trap here is that candidates often choose the MongoDB API because they associate JSON documents with MongoDB, but the SQL API is the native Cosmos DB API that provides the best performance and feature integration for JSON workloads on Azure.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because while it supports JSON documents and global distribution, it introduces unnecessary protocol overhead and is designed for MongoDB ecosystem compatibility, not for optimal point reads by ID with automatic indexing of all attributes. Option C (Cassandra API) is wrong because it uses a wide-column store model with a CQL interface, which is not optimized for JSON document storage and requires defining a schema for partition keys and clustering columns, conflicting with the requirement for variable attributes per device type. Option D (Table API) is wrong because it is designed for key-value and tabular data with a flat schema, not for nested JSON documents with varying attributes, and it lacks the rich query capabilities needed for the device registry.

Full explanation →

784

MCQhard

A logistics company stores sensor data from delivery trucks in Azure Table Storage. Each sensor reading includes a TruckID, Timestamp, Location, and EngineTemperature. The most common query retrieves all readings for all trucks within a specific one-hour time window (e.g., between 10:00 and 11:00 on a given day). Currently, the table uses PartitionKey = TruckID and RowKey = Timestamp (ISO format). However, queries filtering by time range are slow and consume many transactions. Which design change will most improve the performance of these time-range queries?

A.Change PartitionKey to a date-based value (e.g., YYYY-MM-DD) and RowKey to a composite of TruckID and Timestamp.

B.Change RowKey to be a composite of TruckID and Timestamp while keeping PartitionKey as TruckID.

C.Use Azure Cosmos DB with a partition key on Timestamp instead of Azure Table Storage.

D.Enable indexing on the Timestamp column in Azure Table Storage.

AnswerA

This design groups all data for a given day in one partition, so a time-range query scans exactly one partition, drastically reducing transactions and cost.

Why this answer

Option A is correct because Azure Table Storage queries are most efficient when they target a specific PartitionKey and a range of RowKey values. By setting PartitionKey to a date-based value (e.g., YYYY-MM-DD), all readings for a given day are co-located in the same partition. Then, using a composite RowKey of TruckID and Timestamp allows the query to filter by time range within that partition using a single partition scan, drastically reducing the number of transactions and improving performance.

Exam trap

The trap here is that candidates often assume indexing on a column (like Timestamp) will speed up queries in Azure Table Storage, but Azure Table Storage does not support secondary indexes—only the PartitionKey and RowKey are indexed, so the only way to optimize time-range queries is to redesign the key schema to include the time dimension in the PartitionKey or RowKey.

How to eliminate wrong answers

Option B is wrong because keeping PartitionKey as TruckID scatters each truck's data across many partitions (one per truck), so a time-range query across all trucks would require a full table scan (querying every partition), which is slow and consumes many transactions. Option C is wrong because migrating to Azure Cosmos DB is not a design change to the existing Azure Table Storage schema; it introduces unnecessary cost and complexity, and the question asks for a design change to the current storage solution, not a migration. Option D is wrong because Azure Table Storage does not support secondary indexes on arbitrary columns; indexing is only available on PartitionKey and RowKey, so enabling indexing on Timestamp is not a valid operation in Azure Table Storage.

Full explanation →

785

MCQmedium

A data engineer needs to process streaming data from IoT devices and store the results in Azure Data Lake Storage for long-term analytics. The data must be processed in near real-time to detect anomalies and trigger alerts. Which Azure service should the engineer use for stream processing?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Analysis Services

D.Azure Data Lake Analytics

AnswerB

Azure Stream Analytics is the correct service for real-time stream processing of IoT data, with built-in support for windowing, aggregations, and anomaly detection.

Why this answer

Azure Stream Analytics is a serverless, real-time stream processing engine designed to handle high-velocity data from sources like IoT devices. It can ingest data from Azure Event Hubs or IoT Hub, apply SQL-based queries to detect anomalies in near real-time, and output results directly to Azure Data Lake Storage for long-term analytics. This makes it the correct choice for the described near-real-time anomaly detection and alerting requirement.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's ability to copy data from streaming sources (like Event Hubs) with actual stream processing, failing to recognize that Data Factory lacks the real-time query and windowing capabilities required for anomaly detection.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an orchestration and data integration service for batch and scheduled data movement, not a real-time stream processing engine; it cannot process streaming data with sub-second latency. Option C is wrong because Azure Analysis Services is an analytical engine for creating semantic models and performing business intelligence queries on pre-processed data, not for ingesting or processing raw streaming data. Option D is wrong because Azure Data Lake Analytics is a batch analytics service that uses U-SQL to process large datasets in Data Lake Storage, but it does not support real-time stream processing or event-driven anomaly detection.

Full explanation →

786

MCQmedium

A company stores IoT sensor data in Azure Blob Storage. Data scientists need to query the data using SQL without moving it to another store. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Analysis Services

C.Azure Data Lake Storage

D.Azure SQL Database

AnswerA

Serverless SQL pool can query data in Blob Storage using SQL without moving it.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without moving or copying the data. It uses a distributed query engine that reads files (Parquet, CSV, JSON) in place, making it ideal for ad-hoc analytics over IoT sensor data stored in Blob Storage.

Exam trap

The trap here is that candidates confuse Azure Data Lake Storage (a storage layer) with a query service, or assume Azure SQL Database can query external files directly, when in fact only Synapse Serverless SQL pool (or PolyBase in dedicated SQL pool) provides native SQL-on-file capabilities for Blob Storage.

How to eliminate wrong answers

Option B is wrong because Azure Analysis Services is an OLAP engine that requires data to be loaded into a tabular model, not a service for querying raw files in Blob Storage with SQL. Option C is wrong because Azure Data Lake Storage is a storage service (not a query service) that provides hierarchical namespace and POSIX-like access, but it does not natively support SQL querying without an additional compute layer like Synapse. Option D is wrong because Azure SQL Database is a fully managed relational database that requires data to be imported or ingested into tables, not a service for querying files in Blob Storage directly.

Full explanation →

787

MCQmedium

A company uses Azure SQL Database for a customer management system. The Customers table has columns: CustomerID (int, primary key), FullName (varchar(100)), Email (varchar(200)), SignUpDate (date), LastLoginDate (date). Queries frequently filter on LastLoginDate to find customers who have not logged in for over a year for a promotional campaign. The table has 10 million rows. Which type of index should they create to optimize these queries?

A.Clustered index on CustomerID

B.Non-clustered index on LastLoginDate

C.Non-clustered index on FullName

D.Columnstore index on SignUpDate and LastLoginDate

AnswerB

A non-clustered index on LastLoginDate enables the database to efficiently seek rows with a specific date range, significantly speeding up the query.

Why this answer

A non-clustered index on LastLoginDate allows the query to quickly locate rows where LastLoginDate is older than one year without scanning the entire 10-million-row table. Azure SQL Database uses B-tree structures for non-clustered indexes, enabling efficient range scans and key lookups for the filtered rows. This directly supports the promotional campaign query pattern.

Exam trap

The trap here is that candidates often choose a clustered index on the primary key by default, failing to recognize that the query predicate (LastLoginDate) is not the clustering key, so the index cannot be used to efficiently filter the data.

How to eliminate wrong answers

Option A is wrong because a clustered index on CustomerID organizes the table by CustomerID, which does not help filter on LastLoginDate; the query would still require a full table scan to evaluate the date condition. Option C is wrong because an index on FullName is irrelevant to filtering on LastLoginDate and would not improve query performance for this specific predicate. Option D is wrong because a columnstore index is optimized for large-scale analytical aggregations and data warehousing workloads, not for point lookups or narrow range queries on a single date column in an OLTP table.

Full explanation →

788

MCQmedium

A software company is migrating an on-premises SQL Server database to Azure SQL Database. The database currently uses SQL Server Agent jobs for regular maintenance tasks. The company wants to minimize code changes during migration. Which Azure SQL Database feature should they use to replace SQL Server Agent jobs?

A.Azure Functions

B.Azure Automation

C.Elastic Jobs

D.SQL Server Agent (available in Azure SQL Database)

AnswerC

Elastic Jobs allow scheduling T-SQL jobs across databases, similar to SQL Agent.

Why this answer

Option B is correct because Elastic Jobs (preview) in Azure SQL Database can replace SQL Server Agent jobs for scheduling tasks. Option A is wrong because Azure Automation can run scripts but is not a direct replacement. Option C is wrong because SQL Server Agent is not available in Azure SQL Database.

Option D is wrong because Azure Functions can run code but requires more development effort.

Full explanation →

789

MCQeasy

A company runs individual Azure SQL Databases for each of its departments. The databases experience varying usage patterns; sometimes one database is idle while another is heavily loaded. The company wants to pool resources to reduce cost while ensuring each database gets resources when needed. Which Azure feature should they use?

A.Azure SQL Database single database with provisioned DTUs

B.Azure SQL Database elastic pool

C.Azure SQL Managed Instance

D.SQL Server on Azure Virtual Machines

AnswerB

An elastic pool allows multiple databases to share a common pool of resources. This is cost-effective for databases with fluctuating usage, as idle databases free up resources for active ones.

Why this answer

Azure SQL Database elastic pools are designed to share resources (eDTUs or eVCores) across multiple databases with varying usage patterns. This allows idle databases to contribute their unused capacity to heavily loaded ones, reducing overall cost while ensuring each database gets resources when needed.

Exam trap

The trap here is that candidates might confuse elastic pools with single databases or managed instances, thinking that 'pooling' means using a single large instance rather than a shared resource model across multiple databases.

How to eliminate wrong answers

Option A is wrong because a single database with provisioned DTUs allocates fixed resources to one database, which cannot be shared across departments and would waste cost when idle. Option C is wrong because Azure SQL Managed Instance is a fully managed instance of SQL Server with fixed resources per instance, not designed for pooling resources across multiple databases with variable loads. Option D is wrong because SQL Server on Azure Virtual Machines requires manual management of resources and licensing, and does not offer built-in elastic pooling across databases.

Full explanation →

790

MCQmedium

An organization runs a mission-critical analytics workload on Azure Synapse Analytics. They need to ensure high availability and automatic failover in case of a regional outage. Which configuration should they implement?

A.Enable read-scale replicas

B.Configure automatic backups with point-in-time restore

C.Use active geo-replication for the dedicated SQL pool

D.Enable always-on availability groups

AnswerC

Active geo-replication provides automatic failover to a secondary region.

Why this answer

Option C is correct because active geo-replication for a dedicated SQL pool in Azure Synapse Analytics creates a readable secondary database in a paired Azure region, enabling automatic failover during a regional outage. This ensures high availability and business continuity for mission-critical analytics workloads by maintaining a synchronized replica that can be promoted to primary with minimal downtime.

Exam trap

The trap here is that candidates often confuse read-scale replicas (Option A) or always-on availability groups (Option D) with geo-replication, but Azure Synapse Analytics dedicated SQL pools do not support those features and rely specifically on active geo-replication for regional failover.

How to eliminate wrong answers

Option A is wrong because read-scale replicas are designed to offload read-only workloads from the primary database, not to provide automatic failover or high availability during a regional outage. Option B is wrong because automatic backups with point-in-time restore allow recovery from data corruption or accidental deletion, but they do not provide automatic failover or continuous availability during a regional outage; restoring from backup incurs significant downtime. Option D is wrong because always-on availability groups are a SQL Server feature for on-premises or IaaS deployments, not supported for Azure Synapse Analytics dedicated SQL pools, which use a different architecture based on Azure SQL Database's geo-replication.

Full explanation →

791

Multi-Selecthard

A company uses Azure Data Lake Storage Gen2 for a data lake. They need to ensure that only authorized users can access files and that access is audited. Which two Azure services should they combine? (Choose two options that together form the solution.)

Select 2 answers

A.Azure Policy

B.Azure Key Vault

C.Azure RBAC

D.Azure Monitor

E.Microsoft Entra ID

AnswersC, D

RBAC controls access to storage resources.

Why this answer

Azure RBAC (Role-Based Access Control) is correct because it provides fine-grained access management for Azure Data Lake Storage Gen2, allowing you to assign roles (e.g., Storage Blob Data Contributor) to users, groups, or service principals to control who can read, write, or delete files. Azure Monitor is correct because it can collect and analyze activity logs and diagnostic settings for the storage account, enabling auditing of access events such as successful and failed authentication attempts.

Exam trap

The trap here is that candidates often confuse Microsoft Entra ID (the identity provider) with the actual access control mechanism (RBAC) and auditing service (Monitor), thinking Entra ID alone handles both, but it only authenticates identities—RBAC authorizes them and Monitor audits the actions.

Full explanation →

792

MCQmedium

Your company uses Azure SQL Database and needs to ensure that transactions are durable even if the database instance fails. Which feature should you enable?

A.Active geo-replication

B.Zone-redundant storage

C.Transparent Data Encryption

D.Auto-failover groups

AnswerB

Replicates data across availability zones, ensuring durability.

Why this answer

Zone-redundant storage (ZRS) replicates your Azure SQL Database transaction logs and data files synchronously across three Azure availability zones within the same region. This ensures that even if an entire zone fails, committed transactions are preserved and the database remains available, providing durability at the storage layer without requiring a separate database replica.

Exam trap

The trap here is that candidates often confuse durability (ensuring committed data survives failures) with high availability or disaster recovery features like geo-replication or failover groups, which address availability rather than the storage-level persistence of transactions.

How to eliminate wrong answers

Option A is wrong because active geo-replication creates asynchronous replicas in a paired region for disaster recovery, but it does not guarantee durability of transactions within the primary region during a zone-level failure. Option C is wrong because Transparent Data Encryption (TDE) only encrypts data at rest and in transit, providing security but no durability or availability guarantees. Option D is wrong because auto-failover groups manage failover between primary and secondary databases, but they rely on the underlying storage durability; they do not themselves make transactions durable against a storage failure.

Full explanation →

793

MCQeasy

A retail company stores data about their products in different formats. Product ID and price are stored in a relational database table. Product descriptions are stored as plain text files. Product images are stored as JPEG files. Which of the following best categorizes these data types in order?

A.Structured, semi-structured, unstructured

B.Structured, unstructured, unstructured

C.Structured, semi-structured, structured

D.Semi-structured, structured, unstructured

AnswerB

The relational table is structured. Product descriptions as plain text are unstructured because they have no predefined format. Images are also unstructured binary data.

Why this answer

Product ID and price in a relational database table are structured because they follow a fixed schema with rows and columns. Product descriptions as plain text files have no predefined structure, making them unstructured. Product images as JPEG files are also unstructured because they consist of binary data without a schema.

Thus, the order is structured, unstructured, unstructured, which matches option B.

Exam trap

The trap here is confusing unstructured data (e.g., plain text files) with semi-structured data (e.g., JSON or XML), leading candidates to misclassify product descriptions as semi-structured when they lack any metadata or tags.

How to eliminate wrong answers

Option A is wrong because it incorrectly categorizes product descriptions as semi-structured; plain text files have no tags or metadata to impose partial organization, so they are unstructured. Option C is wrong because it claims product images are structured; JPEG files are binary blobs with no relational schema, making them unstructured. Option D is wrong because it starts with semi-structured for the relational database table; relational tables enforce a strict schema, so they are structured, not semi-structured.

Full explanation →

794

MCQmedium

A company stores historical sales data in Azure Blob Storage. The data is accessed about once per month and must be immediately available for read when needed. The company wants to minimize storage cost while meeting the access requirement. Which access tier should they use?

A.Hot tier

B.Cool tier

C.Cold tier

D.Archive tier

AnswerB

Correct. Cool tier provides low-latency access at a lower storage cost than Hot, suitable for data accessed monthly.

Why this answer

The Cool tier is optimal because the data is accessed only once per month, but must be immediately available for reads. Cool tier offers lower storage cost than Hot tier while maintaining low-latency access (no retrieval delay), meeting the requirement at minimal cost.

Exam trap

The trap here is that candidates may choose Hot tier because they think 'immediately available' requires the highest performance tier, overlooking that Cool and Cold tiers also provide instant access with lower storage costs.

How to eliminate wrong answers

Option A is wrong because Hot tier has the highest storage cost, which is unnecessary for data accessed only once per month. Option C is wrong because Cold tier, while cheaper for storage, is designed for data accessed less than once per quarter and may incur higher access costs for monthly reads. Option D is wrong because Archive tier requires rehydration (hours of delay) before data can be read, violating the 'immediately available' requirement.

Full explanation →

795

Matchingmedium

Match each Azure storage redundancy option to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Locally redundant storage within a single datacenter

Zone-redundant storage across availability zones

Geo-redundant storage with cross-region replication

Read-access geo-redundant storage

Geo-zone-redundant storage

Why these pairings

Azure offers multiple redundancy options for data durability.

Full explanation →

796

MCQhard

A company stores customer data in a relational table with fixed columns: CustomerID (integer), FirstName (string), LastName (string), Email (string). They also store product images as JPEG files, and customer feedback as JSON documents that may contain varying fields such as rating, comment, and optional metadata. Which of the following correctly orders these data types from most structured to least structured?

A.JSON documents, relational table, JPEG files

B.Relational table, JSON documents, JPEG files

C.JPEG files, JSON documents, relational table

D.Relational table, JPEG files, JSON documents

AnswerB

The relational table has a fixed schema (structured). JSON documents have a flexible schema (semi-structured). JPEG files have no schema (unstructured). This is the correct ordering.

Why this answer

The relational table is the most structured because it enforces a fixed schema with predefined columns and data types (e.g., CustomerID integer, FirstName string). JSON documents are semi-structured: they have a flexible schema where fields like rating and comment can vary per document, but they still provide key-value organization. JPEG files are unstructured binary data with no internal schema or queryable structure, making them the least structured.

Exam trap

The trap here is that candidates often confuse semi-structured data (JSON) with unstructured data (JPEG), mistakenly thinking JSON is unstructured because its fields can vary, when in fact it retains a key-value structure that makes it semi-structured.

How to eliminate wrong answers

Option A is wrong because it orders JSON documents as more structured than a relational table, but relational tables enforce a rigid schema with constraints (e.g., fixed columns, data types) while JSON documents allow varying fields and are semi-structured. Option C is wrong because it places JPEG files (unstructured binary) as more structured than JSON documents (semi-structured with key-value pairs), which reverses the correct order. Option D is wrong because it places JPEG files as more structured than JSON documents, but JPEG files lack any internal schema or metadata that can be queried, whereas JSON documents have a defined structure (e.g., key-value pairs) and are semi-structured.

Full explanation →

797

MCQmedium

You are designing a solution to store large binary files (videos) for a media company. The solution must support tiered storage to optimize costs based on access frequency. Which Azure storage option should you use?

A.Azure Cosmos DB

B.Azure Files

C.Azure Blob Storage

D.Azure Disk Storage

AnswerC

Azure Blob Storage supports access tiers for cost optimization.

Why this answer

Option C is correct because Azure Blob Storage supports access tiers (Hot, Cool, Archive) for cost optimization. Option A is wrong because Azure Files is for file shares, not tiered storage for videos. Option B is wrong because Azure Disk Storage is for VM disks, not object storage.

Option D is wrong because Azure Cosmos DB is a database, not for storing large binary files.

Full explanation →

798

MCQeasy

Your company needs to build an analytics solution that can handle both batch and streaming data from IoT devices. The solution must allow complex event processing and real-time dashboards. Which Azure service should you use as the primary data ingestion and processing layer?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Synapse Analytics

AnswerB

Stream Analytics is built for real-time analytics on streaming data.

Why this answer

Azure Stream Analytics is the correct choice because it is designed specifically for real-time data ingestion and processing, supporting both batch and streaming data from IoT devices. It enables complex event processing (CEP) through SQL-like queries and can output directly to real-time dashboards in Power BI, meeting the requirement for both batch and streaming analytics.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory or Azure Databricks, mistakenly thinking that batch and streaming can be handled by a single general-purpose ETL tool, but Stream Analytics is the only service natively optimized for real-time complex event processing and direct dashboard integration.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is primarily an ETL and orchestration service for batch data movement and transformation, not designed for real-time streaming or complex event processing. Option C is wrong because Azure Databricks is a big data analytics platform that can handle both batch and streaming, but it is not the primary service for real-time dashboards and complex event processing; it requires additional configuration and is more suited for advanced analytics and machine learning. Option D is wrong because Azure Synapse Analytics is a unified analytics service that integrates data warehousing and big data analytics, but its primary strength is in large-scale batch processing and data warehousing, not real-time streaming ingestion and complex event processing.

Full explanation →

799

MCQeasy

A small business wants to start using Azure for analytics. They have a few CSV files stored on-premises that they want to analyze. They have no budget for complex infrastructure and prefer a fully managed, serverless solution. They need to create interactive visualizations and share them with their team. The data does not change frequently, so they are okay with daily refreshes. Which of the following options should they choose? A) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Databricks to create a data processing pipeline, and then use Power BI to visualize the results. B) Upload the CSV files to Azure Blob Storage, use Azure Data Factory to load the data into Azure SQL Database, and then use Power BI to connect and visualize. C) Upload the CSV files to OneDrive for Business, use Power BI Desktop to import the data, and publish to Power BI Service with scheduled refresh. D) Upload the CSV files to Azure Data Lake Storage Gen2, use Azure Synapse Serverless SQL pool to query the data, and then use Power BI to connect. Which option is the simplest and most cost-effective?

A.Option B

B.Option D

C.Option C

D.Option A

AnswerC

Simplest, uses Power BI Desktop with OneDrive, no extra Azure services.

Why this answer

Option C is correct because it uses OneDrive for Business as a simple storage location, Power BI Desktop for importing CSV data, and Power BI Service for publishing and sharing interactive visualizations with scheduled daily refresh. This is fully managed, serverless, and requires no complex infrastructure, aligning perfectly with the small business's budget and simplicity requirements.

Exam trap

The trap here is that candidates often overcomplicate the solution by choosing Azure-specific storage and compute services (like Data Lake, Databricks, or Synapse) when a simpler, fully managed tool like Power BI with OneDrive is sufficient and more cost-effective for small-scale, static data analytics.

How to eliminate wrong answers

Option A is wrong because it involves Azure Databricks, which is a complex, cluster-based data processing platform that introduces significant cost and management overhead, far beyond the needs of a small business with simple CSV files and daily refreshes. Option B is wrong because it uses Azure Data Factory and Azure SQL Database, which are over-engineered for static CSV data; Data Factory adds pipeline complexity and SQL Database incurs ongoing costs, contradicting the 'no budget for complex infrastructure' requirement. Option D is wrong because Azure Synapse Serverless SQL pool, while serverless, is designed for large-scale analytics and requires creating external tables and managing permissions, adding unnecessary complexity for a few CSV files that could be handled more directly with Power BI.

Full explanation →

800

Multi-Selecthard

Which THREE factors should you consider when choosing between Azure Blob Storage and Azure Cosmos DB for a new application? (Choose three.)

Select 3 answers

A.Global distribution and multi-region writes

B.Data structure (unstructured vs. semi-structured)

C.Encryption at rest support

D.Scalability limits

E.Query capabilities (simple key-value vs. complex queries)

AnswersA, B, E

Cosmos DB offers global distribution with multi-region writes; Blob Storage is limited.

Why this answer

Option A (Data structure) is correct because Blob Storage stores unstructured blobs, while Cosmos DB stores structured documents. Option C (Query requirements) is correct because Cosmos DB supports complex queries, while Blob Storage does not. Option E (Global distribution) is correct because Cosmos DB offers multi-region writes, while Blob Storage does not.

Option B is wrong because both services are highly scalable. Option D is wrong because both support encryption at rest.

Full explanation →

801

MCQeasy

A retail company runs a nightly process that reads all sales transactions from the previous day, aggregates them by product category and store location, and writes the summary results into a data warehouse for reporting. Which type of data processing workload best describes this nightly process?

A.Online Transaction Processing (OLTP)

B.Batch processing

C.Stream processing

D.Data warehousing

AnswerB

Batch processing is the correct classification because the job processes a large volume of accumulated data at a scheduled time (nightly).

Why this answer

The nightly process reads all sales transactions from the previous day, aggregates them, and writes summary results into a data warehouse. This is a classic batch processing workload because data is collected over a period (the entire previous day), processed in a single offline job, and the output is stored for later reporting. Batch processing is ideal for high-volume, non-real-time transformations like nightly ETL (Extract, Transform, Load) jobs.

Exam trap

The trap here is that candidates confuse the destination (data warehousing) with the processing workload, or mistake a scheduled nightly aggregation for stream processing because they see 'data' and 'processing' without recognizing the batch window.

How to eliminate wrong answers

Option A is wrong because OLTP is designed for real-time, low-latency transaction processing (e.g., inserting individual sales records as they occur), not for aggregating large volumes of historical data in a nightly job. Option C is wrong because stream processing handles continuous, real-time data flows with sub-second latency (e.g., Apache Kafka or Azure Stream Analytics), whereas this process runs once per day on a fixed batch of data. Option D is wrong because data warehousing is the destination or storage system for the aggregated results, not the processing workload itself; the nightly process is a batch ETL job that feeds the data warehouse.

Full explanation →

802

MCQeasy

An organization uses Azure SQL Database and needs to maintain a copy of the database for read-only reporting without affecting the production workload. Which feature should they use?

A.Azure SQL Database read replica

B.Automated backups

C.Active geo-replication

D.Failover groups

AnswerC

Active geo-replication provides readable secondary replicas that can be used for reporting.

Why this answer

Active geo-replication (Option C) creates a readable secondary replica of an Azure SQL Database in a different Azure region. This secondary replica is continuously updated asynchronously from the primary and can be used for read-only query workloads, offloading reporting traffic without impacting the production database's performance or transaction throughput.

Exam trap

The trap here is that candidates confuse 'read replica' (which exists in Azure SQL Database Hyperscale and Azure SQL Managed Instance) with the standard Azure SQL Database feature, or they mistakenly think failover groups themselves provide the readable copy, when in fact it is Active geo-replication that creates the readable secondary.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database does not support read replicas in the same way as Azure SQL Database for Hyperscale or Azure SQL Managed Instance; the term 'read replica' is not a standard feature for a single Azure SQL Database (non-Hyperscale) — instead, Active geo-replication provides the read-only secondary. Option B is wrong because automated backups are point-in-time restore copies stored in blob storage, not live, readable replicas; they cannot serve ongoing read-only queries without first being restored, which would create a separate database. Option D is wrong because failover groups manage geo-replication and failover orchestration for a group of databases, but the read-only secondary is provided by the underlying Active geo-replication, not by the failover group itself; failover groups are a management layer, not the feature that creates the readable copy.

Full explanation →

803

MCQhard

A data engineering team is designing a modern data warehouse on Azure. They have raw data landing in Azure Data Lake Storage Gen2 (ADLS Gen2) as Parquet files. They need to perform transformations using Apache Spark, and then load the transformed data into Azure Synapse Analytics for high-performance analytical queries. The team wants to use a single orchestration service to schedule, monitor, and manage the entire pipeline. Which Azure service should they choose for orchestration?

A.Azure Data Factory

B.Azure Databricks

C.Azure Logic Apps

D.Azure Data Lake Analytics

AnswerA

Data Factory orchestrates data movement and transformation, supporting both Spark and Synapse workloads.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and orchestration service designed to schedule, monitor, and manage data pipelines at scale. It natively supports triggers (e.g., time-based, event-based) and can orchestrate Apache Spark transformations via Azure Databricks or HDInsight, then load the transformed data into Azure Synapse Analytics using built-in copy activities or pipelines. ADF provides a single pane of glass for end-to-end pipeline management, including dependency handling and error monitoring.

Exam trap

The trap here is that candidates may confuse Azure Databricks (a compute/transform service) with an orchestration tool, but the question explicitly asks for a service to 'schedule, monitor, and manage the entire pipeline,' which is the core function of Azure Data Factory, not Databricks.

How to eliminate wrong answers

Option B (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform for data transformation and machine learning, not a dedicated orchestration service; it lacks native scheduling and monitoring capabilities for multi-step pipelines across heterogeneous services. Option C (Azure Logic Apps) is wrong because it is designed for workflow automation and integration across SaaS applications using connectors, not for orchestrating big data ETL pipelines with Spark and Synapse Analytics; it does not support native Spark execution or large-scale data movement. Option D (Azure Data Lake Analytics) is wrong because it is a deprecated service for distributed analytics using U-SQL, not an orchestration tool; it cannot schedule or manage pipelines that involve ADLS Gen2, Spark, and Synapse Analytics.

Full explanation →

804

MCQmedium

A healthcare organization uses Azure SQL Database to store patient records. To comply with HIPAA regulations, they need to encrypt sensitive columns (e.g., Social Security numbers) at rest and control access to the encryption keys. Which feature should they use?

A.Dynamic Data Masking

B.Row-Level Security

C.Always Encrypted

D.Transparent Data Encryption (TDE)

AnswerC

Always Encrypted encrypts specific columns and allows client-side key management.

Why this answer

Option A is correct because Always Encrypted with secure enclaves protects sensitive columns at rest and allows client-side key management. Option B is wrong because Transparent Data Encryption encrypts the entire database but does not allow column-level key control. Option C is wrong because Dynamic Data Masking obfuscates data but does not encrypt it.

Option D is wrong because Row-Level Security restricts row access but does not encrypt columns.

Full explanation →

805

MCQeasy

A company is migrating a relational database to Azure SQL Database. They anticipate that the amount of stored data will grow significantly over time, but the compute requirements (CPU and memory) will remain relatively stable. Which purchasing model should they choose to allow independent scaling of storage and compute?

A.DTU-based purchasing model

B.vCore-based purchasing model

C.Serverless compute tier

D.Hyperscale service tier

AnswerB

vCore-based model separates compute and storage billing, allowing storage to scale independently of compute.

Why this answer

The vCore-based purchasing model separates compute and storage costs, allowing you to scale storage independently without changing compute resources. This matches the scenario where data grows but compute requirements remain stable, as you can increase storage capacity without upgrading CPU or memory.

Exam trap

The trap here is confusing purchasing models (DTU vs. vCore) with service tiers (Hyperscale) or compute options (Serverless), leading candidates to pick Hyperscale or Serverless when the question specifically asks for a purchasing model that allows independent scaling of storage and compute.

How to eliminate wrong answers

Option A is wrong because the DTU-based model bundles compute, storage, and I/O into a fixed package, so scaling storage also forces a change in compute resources. Option C is wrong because the Serverless compute tier is a compute scaling option within the vCore model, not a separate purchasing model; it auto-pauses compute but does not allow independent storage scaling. Option D is wrong because Hyperscale is a service tier (architecture) for large databases, not a purchasing model; it uses vCore pricing but is designed for high scalability and storage up to 100 TB, not for independent scaling of storage and compute in the general sense described.

Full explanation →

806

MCQhard

You run the above Kusto query in Azure Data Explorer. What does the query return?

A.The total number of tornado events across all states.

B.The top 5 states with the most tornado events.

C.The top 5 years with the most tornadoes.

D.The number of events by type.

AnswerB

Correct: count by state, order desc, top 5.

Why this answer

The Kusto query uses the `summarize` operator to count events per state, then `top 5 by count_` to return the five states with the highest event counts. The `project` operator ensures only the state and count columns are shown, making option B correct.

Exam trap

The trap here is that candidates may confuse the grouping column (State) with other columns like EventType or Year, leading them to select options that describe aggregations by those columns instead of by state.

How to eliminate wrong answers

Option A is wrong because the query groups by state and returns only the top 5 states, not a total across all states. Option C is wrong because the query groups by `State`, not by `Year`, so it cannot return top years. Option D is wrong because the query does not group by event type; it groups by state and counts all events regardless of type.

Full explanation →

807

MCQmedium

You are designing a solution to store customer preferences as key-value pairs. The data will be accessed frequently and must support high availability across multiple Azure regions. The solution should minimize management overhead. Which Azure service should you use?

A.Azure SQL Database

B.Azure Cache for Redis

C.Azure Cosmos DB

D.Azure Queue Storage

AnswerC

Cosmos DB is a globally distributed, multi-model database that supports key-value and provides high availability.

Why this answer

Azure Cosmos DB provides global distribution, high availability, and low-latency access for key-value data with minimal management. Option A is wrong because Azure Cache for Redis is a cache, not a durable store. Option C is wrong because Azure SQL Database is relational and overkill.

Option D is wrong because Azure Storage Queue is for messaging.

Full explanation →

808

Multi-Selectmedium

Which TWO Azure services are primarily used for data integration and orchestration?

Select 2 answers

A.Azure Logic Apps

B.Azure Synapse Analytics

C.Azure Stream Analytics

D.Azure Analysis Services

E.Azure Data Factory

AnswersA, E

Workflow automation and integration.

Why this answer

Azure Logic Apps is correct because it is a serverless workflow service that integrates apps, data, and services using connectors and triggers, making it ideal for data integration and orchestration. Azure Data Factory is correct because it is a cloud-based ETL and data integration service that orchestrates and automates data movement and transformation across various data stores.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics (a data warehouse) or Azure Stream Analytics (a real-time processing service) with data integration tools, because they involve data movement or processing, but they are not primarily designed for orchestration and integration.

Full explanation →

809

MCQhard

You are reviewing an ARM template for an Azure Storage account. The container named 'data' is created with public access set to 'None'. What is the primary benefit of this configuration?

A.It encrypts data at rest.

B.It restricts access to authorized users only.

C.It enables soft delete for the container.

D.It prevents accidental deletion of blobs.

AnswerB

Setting public access to 'None' disables anonymous access, requiring authentication.

Why this answer

Setting public access to 'None' on a container means that anonymous read requests are not allowed. The primary benefit is that only requests with proper authorization (e.g., using an account key, a shared access signature, or Azure AD credentials) can access the blobs within that container. This directly restricts access to authorized users only, which is the core security advantage.

Exam trap

The trap here is that candidates often confuse 'public access set to None' with broader security features like encryption or deletion protection, when in fact it only controls anonymous read access and does not affect data encryption, soft delete, or accidental deletion safeguards.

How to eliminate wrong answers

Option A is wrong because encryption at rest is enabled by default at the storage account level via Azure Storage Service Encryption (SSE), regardless of the container's public access setting. Option C is wrong because soft delete is a separate data protection feature that must be explicitly enabled on the storage account or container, and it is not a benefit of setting public access to 'None'. Option D is wrong because preventing accidental deletion of blobs is achieved through features like soft delete or immutable storage, not by disabling anonymous access.

Full explanation →

810

MCQmedium

A retail company runs an e-commerce platform on a single SQL Server database. The database experiences unpredictable spikes in read-only reporting queries that degrade transactional performance. They want to migrate to Azure SQL Database and isolate the reporting workload while ensuring it sees the most current data. Which feature should they enable?

A.Read Scale-out (using a readable secondary)

B.Elastic Pool

C.Geo-replication

D.SQL Server Agent

AnswerA

This feature creates a read-only replica that can serve reporting queries with minimal latency, isolating them from the primary's transactional workload.

Why this answer

Read Scale-out allows you to offload read-only reporting queries to a readable secondary replica, isolating them from the primary transactional workload. This ensures reporting queries see the most current data because the secondary replica is kept transactionally consistent with the primary via synchronous replication. It directly addresses the unpredictable spikes in read-only queries without degrading transactional performance.

Exam trap

The trap here is that candidates often confuse Geo-replication (which provides a readable secondary but with asynchronous replication and potential data lag) with Read Scale-out, which uses synchronous replication for current data, or mistakenly think Elastic Pools can isolate workloads within a single database.

How to eliminate wrong answers

Option B (Elastic Pool) is wrong because it is designed for managing and sharing resources among multiple databases with varying usage patterns, not for isolating read-only workloads from a single database. Option C (Geo-replication) is wrong because it provides asynchronous replication for disaster recovery and geographic distribution, not for real-time read-only workload isolation with current data visibility. Option D (SQL Server Agent) is wrong because it is a scheduling and automation service for jobs like backups and maintenance, not a feature for read-only workload isolation.

Full explanation →

811

MCQhard

A company uses Azure SQL Database with active geo-replication for disaster recovery. During a regional outage, the secondary database is promoted. After the primary region recovers, what is the best practice to re-establish geo-replication?

A.Delete the old primary database and create a new geo-replication

B.Add the old primary as a secondary to the new primary

C.Reverse the geo-replication direction automatically

D.Perform a planned failover to make the old primary the primary again

AnswerD

Best practice to restore original configuration.

Why this answer

Option D is correct because you should fail back to the original primary using planned failover, then reconfigure geo-replication. Option A is wrong because reversing geo-replication automatically is not supported. Option B is wrong because deleting the old primary loses data.

Option C is wrong because you cannot simply add the old primary as a secondary without resolution.

Full explanation →

812

MCQhard

Refer to the exhibit. A data engineer runs the PowerShell script shown. What is the purpose of this script?

A.List all blobs in the container

B.Copy blobs to another container

C.List blobs modified in the last 7 days

D.Delete old blobs from the container

AnswerC

Filters blobs by LastModified within 7 days.

Why this answer

The script uses `Get-AzStorageBlob` with the `-Prefix` parameter to filter blobs by name, then applies a `Where-Object` filter to select only blobs whose `LastModified` property is greater than or equal to 7 days ago. This effectively lists blobs modified in the last 7 days. The script does not perform any copy or delete operations, and it does not list all blobs without filtering.

Exam trap

The trap here is that candidates see `Get-AzStorageBlob` and assume it lists all blobs (option A), overlooking the `Where-Object` filter that restricts results to only recently modified blobs.

How to eliminate wrong answers

Option A is wrong because the script includes a `Where-Object` filter on `LastModified`, so it does not list all blobs — it only returns blobs modified within the last 7 days. Option B is wrong because the script contains no `Start-AzStorageBlobCopy` or any copy cmdlet; it only retrieves blob properties and filters them. Option D is wrong because the script does not call `Remove-AzStorageBlob` or any deletion cmdlet; it only reads and filters blob metadata without modifying storage.

Full explanation →

813

MCQeasy

Your company needs to store large amounts of data that will be accessed only a few times a year for compliance audits. The data must be retained for 7 years. Which Azure Blob Storage access tier should you choose?

A.Cool

B.Premium

C.Hot

D.Archive

AnswerD

Archive tier is the lowest-cost storage for data that is rarely accessed and has a long retention period.

Why this answer

Azure Blob Storage Archive tier is the most cost-effective for data that is rarely accessed and has a long retention period. Option A is wrong because Hot tier is for frequently accessed data. Option B is wrong because Cool tier is for infrequently accessed data but cheaper than Hot; however, Archive is even cheaper.

Option D is wrong because Premium tier is for high-performance scenarios.

Full explanation →

814

Multi-Selecteasy

Which TWO are benefits of using Azure SQL Database elastic pools?

Select 2 answers

A.Predictable pricing for a group of databases

B.Resource sharing across multiple databases

C.Unlimited storage per database

D.Isolated performance for each database

E.Support for databases over 1 TB each

AnswersA, B

Fixed pool cost.

Why this answer

Option B is correct because elastic pools allow sharing of resources across multiple databases. Option C is correct because elastic pools provide predictable pricing for a group of databases. Option A is wrong because elastic pools do not provide unlimited storage.

Option D is wrong because elastic pools are not for single large databases. Option E is wrong because elastic pools are for multiple databases, not a single database.

Full explanation →

815

MCQhard

Your company is designing a data solution for IoT sensor data that arrives in high volume and must be stored for long-term analytics. The data is append-only and rarely updated. You need to choose a storage solution that balances cost and query performance for historical analysis. Which Azure data store should you recommend?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure SQL Database

D.Azure Data Lake Storage Gen2

AnswerD

ADLS Gen2 provides cost-effective storage for large datasets and integrates with analytics services like Synapse and Databricks.

Why this answer

Azure Data Lake Storage Gen2 is the correct choice because it combines a hierarchical namespace with Azure Blob Storage, offering scalable, cost-effective storage for high-volume append-only data like IoT sensor logs. It supports both structured and unstructured data, integrates with analytics engines like Azure Synapse and Spark, and provides POSIX-compliant access control, making it ideal for long-term historical analysis at low cost.

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB's low-latency capabilities with suitability for high-volume historical analytics, overlooking its cost model and lack of native file-system semantics for append-only workloads.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database optimized for low-latency, transactional workloads with global distribution, not for cost-effective long-term storage of append-only IoT data; its per-request pricing and high throughput costs make it unsuitable for high-volume historical analytics. Option B is wrong because Azure Table Storage is a key-value store designed for simple, semi-structured data with limited query capabilities (only on partition and row keys), lacking the hierarchical namespace, file-level security, and native analytics integration needed for complex historical queries on IoT data. Option C is wrong because Azure SQL Database is a relational database with ACID transactions and indexing, which is over-provisioned and expensive for append-only IoT data that rarely updates; its per-core pricing and storage limits make it cost-prohibitive for high-volume, long-term storage compared to object storage.

Full explanation →

816

MCQmedium

You are a data engineer for a large e-commerce company. The company uses Azure Data Lake Storage Gen2 to store customer transaction data. They also use Azure Databricks for data transformation and Azure Synapse Serverless SQL pool for ad-hoc queries. Recently, the data lake has grown to 10 TB, and query performance in Synapse Serverless has degraded significantly. Users complain that queries that used to take seconds now take minutes. You need to improve query performance without moving data to a dedicated SQL pool. The data is stored in Parquet format, partitioned by date. You notice that the queries often filter on CustomerID and Date. Current queries scan all partitions even when only a few days are needed. What is the most effective solution to improve performance?

A.Create materialized views in the Serverless SQL database on the partitioned data

B.Convert all Parquet files to CSV and use row-level security to limit data access

C.Repartition the Parquet files by both date and CustomerID, and optimize file sizes to 1 GB each

D.Increase the service level of the Synapse workspace to improve query concurrency

AnswerC

Partition pruning will eliminate irrelevant partitions, reducing data scanned.

Why this answer

Option C is correct because repartitioning the Parquet files by both date and CustomerID enables partition pruning in Azure Synapse Serverless SQL pool. When queries filter on CustomerID and Date, the engine can skip irrelevant partitions entirely, drastically reducing the amount of data scanned. Optimizing file sizes to around 1 GB ensures efficient parallelism and avoids the overhead of many small files, which degrades performance in a serverless environment.

Exam trap

The trap here is that candidates may think materialized views (Option A) or scaling up (Option D) can fix performance issues caused by poor data partitioning, but they overlook that serverless SQL pools rely heavily on data layout and partition pruning for efficient query execution.

How to eliminate wrong answers

Option A is wrong because materialized views in Serverless SQL pool are pre-computed aggregations that can speed up certain queries, but they do not address the root cause of scanning all partitions; the underlying data layout remains unchanged, so queries that filter on CustomerID and Date would still scan unnecessary partitions unless the view itself is partitioned, which is not supported. Option B is wrong because converting Parquet to CSV would increase storage size and query cost (CSV is not columnar), and row-level security only controls access, not performance; it would actually worsen query performance due to lack of compression and predicate pushdown. Option D is wrong because increasing the service level (e.g., changing the Synapse workspace tier) improves concurrency and resource allocation but does not change the data layout or partition pruning; queries would still scan all partitions, so the performance gain is marginal and does not solve the fundamental issue.

Full explanation →

817

MCQeasy

A company stores JSON documents for a product catalog. Each document has a flexible schema because different product categories have different attributes. The catalog is read-heavy and requires low-latency lookups by product ID. The company expects to handle millions of products and needs to serve customers globally with low latency. Which Azure NoSQL data store should they choose?

A.Azure Table Storage

B.Azure Blob Storage

C.Azure Cosmos DB

D.Azure SQL Database

AnswerC

Azure Cosmos DB is a globally distributed NoSQL database that supports flexible schemas and document models via its SQL API. It offers low-latency reads and writes with guarantees of <10 ms for reads and can be replicated across Azure regions.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model NoSQL database that natively supports JSON documents with flexible schemas, provides single-digit-millisecond latency for read-heavy workloads via automatic indexing, and offers turnkey global distribution across Azure regions to serve customers worldwide with low latency.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key-value model with a document database, overlooking that Table Storage does not support flexible JSON schemas or global distribution with low-latency reads, while Cosmos DB is explicitly designed for these requirements.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a key-value store that does not natively support JSON documents with flexible schemas; it stores entities with a fixed set of properties and lacks the rich querying and indexing capabilities needed for product catalog lookups. Option B is wrong because Azure Blob Storage is an object storage service for unstructured binary or text data, not a NoSQL database; it cannot perform low-latency lookups by product ID without additional indexing or compute layers. Option D is wrong because Azure SQL Database is a relational database with a fixed schema, requiring predefined tables and columns, which contradicts the requirement for flexible JSON schemas across different product categories.

Full explanation →

818

MCQeasy

A hospital collects patient vital signs every minute using IoT sensors. Each reading contains a timestamp, patient ID, heart rate, blood pressure, and temperature. This data is ingested continuously for real-time monitoring and alerting. Which type of data workload does this scenario best represent?

A.A. Transactional workload

B.B. Analytical workload

C.C. Batch processing

D.D. Real-time streaming

AnswerD

Real-time streaming workloads handle continuous data flows that are processed as soon as they arrive, often with low latency requirements. The hospital's IoT sensors generate data every minute that must be acted on promptly, making this a clear example of a real-time streaming workload.

Why this answer

This scenario requires continuous ingestion of sensor data with immediate processing for real-time monitoring and alerting. Real-time streaming workloads, such as those handled by Azure Stream Analytics or Apache Kafka, are designed to process unbounded data streams with low latency, making option D correct.

Exam trap

The trap here is confusing 'real-time streaming' with 'analytical workload' because both involve data processing, but analytical workloads are designed for historical analysis and reporting, not for sub-second alerting on live data streams.

How to eliminate wrong answers

Option A is wrong because transactional workloads focus on ACID-compliant operations (e.g., OLTP) that handle discrete, small-scale read/write operations, not continuous high-velocity sensor streams. Option B is wrong because analytical workloads typically involve batch or interactive queries over historical data (e.g., using Azure Synapse or Power BI), not millisecond-level alerting on live data. Option C is wrong because batch processing processes data in large, scheduled chunks (e.g., nightly ETL jobs), which cannot meet the real-time alerting requirement of this scenario.

Full explanation →

819

MCQeasy

A database system ensures that a transaction either completes fully and all changes are applied, or it is completely rolled back and no partial changes are saved. Which property of ACID transactions does this describe?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity guarantees that all operations within a transaction are completed successfully or none are applied. This 'all or nothing' property prevents partial updates.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. If any part of the transaction fails, the entire transaction is rolled back, leaving the database in its original state. This property guarantees that no partial changes are saved, which directly matches the description in the question.

Exam trap

Microsoft often tests the distinction between atomicity and consistency by describing a scenario where a transaction either fully applies or fully rolls back, leading candidates to mistakenly choose consistency because they associate 'valid state' with 'complete execution'.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, cascades, triggers), but it does not address the 'all-or-nothing' execution of the transaction itself. Option C (Isolation) is wrong because isolation controls how transaction changes are visible to other concurrent transactions (e.g., via locking or snapshot isolation), not whether the transaction completes fully or rolls back. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure (e.g., via write-ahead logging), but it does not describe the rollback behavior on failure.

Full explanation →

820

MCQmedium

A retail company captures real-time sensor data from IoT devices to detect anomalies and predict equipment failures. The data must be processed immediately as it arrives. Which type of data processing workload best describes this scenario?

A.Batch processing

B.Streaming processing

C.Online transaction processing (OLTP)

D.Data warehousing

AnswerB

Streaming processing ingests and analyzes data in real time, enabling prompt anomaly detection and failure prediction from IoT sensor feeds.

Why this answer

B is correct because streaming processing is designed for continuous, real-time data ingestion and immediate analysis, which matches the requirement to process sensor data as it arrives. Technologies like Azure Stream Analytics or Apache Kafka enable low-latency processing of IoT data streams to detect anomalies and predict failures without batching.

Exam trap

Microsoft often tests the distinction between batch and streaming by describing a scenario with 'immediate' or 'real-time' requirements, and candidates mistakenly choose batch processing because they overlook the latency constraint.

How to eliminate wrong answers

Option A is wrong because batch processing processes data in large, scheduled chunks (e.g., hourly or daily), which introduces latency and cannot handle real-time sensor data that must be processed immediately. Option C is wrong because OLTP focuses on managing transactional operations (e.g., order entry, inventory updates) with ACID compliance, not on continuous, high-velocity stream analytics. Option D is wrong because data warehousing is optimized for storing and querying historical, structured data for reporting and BI, not for real-time ingestion and immediate anomaly detection.

Full explanation →

821

MCQhard

A retail chain needs to blend two data sources for a near real-time dashboard: daily batch files from store systems (CSV files on Azure Blob Storage updated once per day) and live web clickstream data from Azure Event Hubs. The dashboard must refresh every 5 minutes with combined data. Which combination of Azure services should be used to ingest and process both data types most efficiently?

A.A) Azure Data Factory + Azure Analysis Services

B.B) Azure Stream Analytics + Power BI

C.C) Azure Synapse Pipelines + Azure Stream Analytics

D.D) Azure Databricks + Azure Data Lake Storage

AnswerC

Synapse Pipelines can orchestrate batch ingestion from Blob Storage, and Stream Analytics can process streaming data from Event Hubs, both feeding into Synapse SQL for combined queries.

Why this answer

Option C is correct because Azure Synapse Pipelines can orchestrate the daily batch CSV files from Azure Blob Storage, while Azure Stream Analytics processes the live web clickstream data from Azure Event Hubs in near real-time. Together, they enable a combined data pipeline that refreshes every 5 minutes, meeting the dashboard's latency requirement efficiently.

Exam trap

The trap here is that candidates often assume Power BI alone can handle both batch and streaming ingestion, but it lacks native batch file ingestion from Blob Storage and requires a separate processing service like Stream Analytics for real-time data.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a batch-oriented ETL service that cannot handle live streaming data from Event Hubs, and Azure Analysis Services is a semantic modeling layer that does not ingest or process raw streaming data. Option B is wrong because while Azure Stream Analytics can process the clickstream data, Power BI alone cannot ingest and blend the daily batch CSV files from Blob Storage; it requires a separate ingestion service for batch data. Option D is wrong because Azure Databricks is a big data analytics platform that is overkill for this simple batch-plus-streaming scenario and lacks native integration for near real-time dashboard refresh without additional services, and Azure Data Lake Storage is just a storage layer, not a processing service.

Full explanation →

822

MCQhard

A healthcare organization must store patient records with strict compliance requirements. They need to classify data as public, internal, or confidential, and apply encryption and access policies accordingly. Which Microsoft Purview feature should they use?

A.Microsoft Purview Data Map

B.Azure Policy

C.Microsoft Defender for Cloud

D.Azure Information Protection

AnswerA

Purview Data Map allows scanning and classifying data assets across sources.

Why this answer

Microsoft Purview Data Map is the correct choice because it provides a unified data governance solution that enables automated data classification (public, internal, confidential) across hybrid and multi-cloud environments. It integrates with sensitivity labels and encryption policies to enforce access controls based on classification, meeting strict compliance requirements for patient records.

Exam trap

The trap here is that candidates often confuse Azure Information Protection (a legacy labeling tool) with Microsoft Purview Data Map, not realizing that Purview provides the unified data governance and automated classification capabilities required for compliance-driven data management.

How to eliminate wrong answers

Option B (Azure Policy) is wrong because it enforces organizational rules and compliance standards on Azure resources (e.g., tagging, location restrictions) but does not natively classify data or apply encryption/access policies at the data level. Option C (Microsoft Defender for Cloud) is wrong because it focuses on cloud security posture management, threat detection, and vulnerability assessment, not on data classification or granular access policies. Option D (Azure Information Protection) is wrong because it is a legacy labeling and classification solution that has been superseded by Microsoft Purview Information Protection; it lacks the unified data map and automated scanning capabilities of Purview Data Map.

Full explanation →

823

MCQhard

A global gaming company develops a multiplayer game. Player profile data (username, email, preferences) is stored as simple key-value pairs and must be accessible with single-digit millisecond latency from any region. Game session logs are stored as JSON documents with varying fields (session ID, player actions, timestamps) and must be queryable by player ID and timestamp range using SQL-like syntax. The company wants to use a single Azure database service for both workloads. Which combination of Azure Cosmos DB APIs should they choose?

A.Table API for profiles and SQL API for logs

B.SQL API for both profiles and logs

C.MongoDB API for profiles and Cassandra API for logs

D.Table API for both profiles and logs

AnswerA

The Table API provides key-value storage with single-digit millisecond latencies, ideal for player profiles. The SQL API supports JSON documents and full SQL query syntax, perfect for querying session logs by player ID and timestamp.

Why this answer

Option A is correct because the Table API provides a simple key-value store ideal for low-latency profile lookups, while the SQL API supports querying JSON documents with SQL-like syntax, enabling efficient queries on game session logs by player ID and timestamp range. This combination meets both workloads within a single Azure Cosmos DB account.

Exam trap

The trap here is that candidates assume a single API must serve both workloads, overlooking Azure Cosmos DB's ability to host multiple APIs in one account, and they may incorrectly choose the SQL API for both because it is the most versatile, ignoring the cost and simplicity benefits of the Table API for key-value data.

How to eliminate wrong answers

Option B is wrong because using the SQL API for both profiles and logs would work but is not the optimal choice; the Table API is more cost-effective and simpler for key-value profile data, and the question asks for the 'best' combination, not just a possible one. Option C is wrong because the MongoDB API is designed for document stores with MongoDB query syntax, not simple key-value pairs, and the Cassandra API is a wide-column store that does not natively support SQL-like queries on JSON documents. Option D is wrong because the Table API cannot query JSON documents with varying fields using SQL-like syntax, as it only supports simple key-value lookups by partition and row keys.

Full explanation →

824

MCQeasy

A social media application displays the number of posts each user has created. After a user submits a new post, the count must reflect the update across all servers within a few seconds. Which data consistency model best describes this requirement?

A.Strong consistency

B.Eventual consistency

C.Sequential consistency

D.Causal consistency

AnswerB

Eventual consistency allows updates to propagate asynchronously to replicas, guaranteeing that if no further updates occur, all replicas will return the same value after a short period. This matches the requirement of reflecting the update within a few seconds.

Why this answer

Eventual consistency is correct because the requirement allows a few seconds for the update to propagate across all servers, meaning the system does not guarantee immediate uniformity but will converge to the same count eventually. This is typical in distributed systems like social media applications where high availability and partition tolerance are prioritized over immediate consistency, often using techniques like asynchronous replication.

Exam trap

The trap here is that candidates confuse 'eventual consistency' with 'weak consistency' or assume that any delay means strong consistency is required, but the key is the explicit tolerance of a few seconds, which aligns with eventual consistency's convergence guarantee.

How to eliminate wrong answers

Option A is wrong because strong consistency would require all servers to reflect the new post count immediately upon write, which conflicts with the 'within a few seconds' tolerance and would impose performance penalties in a distributed system. Option C is wrong because sequential consistency ensures operations appear in a global order consistent with program order, which is stricter than needed and not typically used for simple count updates across servers. Option D is wrong because causal consistency preserves the order of causally related events, which is unnecessary for a simple counter update that has no causal dependencies with other operations.

Full explanation →

825

MCQhard

A company is migrating an on-premises SQL Server database to Azure SQL Managed Instance. The database has a large fact table that is partitioned by date (monthly partitions) to improve query performance and simplify data archiving. The company wants to maintain the same partitioning strategy in Azure to avoid rewriting queries. Which feature in Azure SQL Managed Instance should they use to achieve this?

A.Table partitioning with partition functions and schemes

B.Sharding across multiple Azure SQL Managed Instances

C.Index partitioning only

D.Federated tables

AnswerA

Correct. Azure SQL Managed Instance supports the same table partitioning as SQL Server, allowing you to define partitions on a table using a partition function and scheme.

Why this answer

Azure SQL Managed Instance supports table partitioning using partition functions and partition schemes, which is the same feature available in SQL Server. This allows you to define monthly partitions on the fact table using a date column, preserving the existing partitioning strategy and query logic without modification. The partition function maps rows to partitions based on the date boundary values, and the partition scheme assigns those partitions to filegroups.

Exam trap

The trap here is that candidates confuse table partitioning with sharding or index partitioning, assuming any form of data distribution will work, but only table partitioning with partition functions and schemes preserves the exact same structure and query semantics in Azure SQL Managed Instance.

How to eliminate wrong answers

Option B is wrong because sharding distributes data across multiple databases or instances, which would require rewriting queries and does not maintain the same partitioning strategy within a single database. Option C is wrong because index partitioning only applies to indexes, not to the table itself, and cannot achieve the goal of partitioning the fact table by date for query performance and archiving. Option D is wrong because federated tables are a legacy feature in SQL Server (deprecated) and are not supported in Azure SQL Managed Instance; they involve distributed queries across remote servers, not native table partitioning.

Full explanation →

Microsoft Azure Data Fundamentals DP-900 (DP-900) — Questions 751–825