Microsoft Azure Data Fundamentals DP-900 (DP-900) — Questions 901975

982 questions total · 14pages · All types, answers revealed

Page 12

Page 13 of 14

Page 14
901
MCQhard

Your company stores sensitive customer data in Azure SQL Database. You need to implement column-level encryption for the 'SSN' column using a customer-managed key stored in Azure Key Vault. Which feature should you use?

A.Azure Policy
B.Always Encrypted
C.Transparent Data Encryption (TDE)
D.Dynamic Data Masking
AnswerB

Always Encrypted provides column-level encryption with customer-managed keys.

Why this answer

Always Encrypted is the correct feature because it allows client-side encryption of sensitive columns, such as 'SSN', using a customer-managed key stored in Azure Key Vault. The encryption keys are never exposed to the database engine, ensuring that even database administrators cannot view the plaintext data. This meets the requirement for column-level encryption with customer-managed keys.

Exam trap

The trap here is that candidates often confuse Transparent Data Encryption (TDE) with column-level encryption, but TDE only protects data at rest and does not prevent database administrators or the cloud provider from reading the data in memory or during queries.

How to eliminate wrong answers

Option A is wrong because Azure Policy is a governance tool used to enforce organizational standards and compliance rules across Azure resources, not a data encryption feature for individual columns. Option C is wrong because Transparent Data Encryption (TDE) encrypts the entire database at rest (the storage layer), not at the column level, and it does not support customer-managed keys for column-specific encryption. Option D is wrong because Dynamic Data Masking obfuscates data at query time for unauthorized users but does not encrypt the underlying data; the masked values are still stored in plaintext and can be accessed by privileged users.

902
MCQmedium

A company runs a critical OLTP application on a single Azure SQL Database in the West US region. They need to ensure high availability with automatic failover to a secondary region in case of a regional outage. The solution must minimize data loss and allow the secondary database to be readable for reporting queries when the primary is healthy. Which feature should they enable?

A.Failover groups with automatic failover policy
B.Transparent Data Encryption (TDE)
C.Long-term backup retention (LTR)
D.Auto-tuning
AnswerA

Correct. Failover groups provide geo-replication with a readable secondary and can be configured for automatic failover, meeting all requirements.

Why this answer

Failover groups with an automatic failover policy enable Azure SQL Database to replicate data synchronously to a secondary region, ensuring zero data loss during a planned failover and minimal data loss during an unplanned failover. The secondary database is readable for reporting queries when the primary is healthy, meeting both the high availability and read-scale requirements. This feature is specifically designed for cross-region disaster recovery with automatic failover, unlike other options that address security, backup retention, or performance tuning.

Exam trap

The trap here is that candidates may confuse high-availability features within a single region (like zone-redundant configuration) with cross-region disaster recovery, or assume that backup retention or encryption can provide failover capabilities.

How to eliminate wrong answers

Option B is wrong because Transparent Data Encryption (TDE) provides encryption at rest for the database but does not offer any replication, failover, or read-scale capabilities. Option C is wrong because Long-term backup retention (LTR) allows you to keep full backups for up to 10 years for compliance, but it does not provide automatic failover or a readable secondary database for reporting. Option D is wrong because Auto-tuning uses machine learning to optimize query performance and index management, but it has no role in high availability or disaster recovery.

903
MCQmedium

A company uses Azure SQL Database for an HR system. The Employees table has a clustered index on EmployeeID. Queries frequently filter on DepartmentID and LastName and also retrieve the Salary column. The table contains over a million rows. Which index strategy will most improve query performance for these filters?

A.A: Create a nonclustered index on (DepartmentID, LastName) INCLUDE (Salary)
B.B: Create a nonclustered index on LastName only
C.C: Create a nonclustered index on DepartmentID and another nonclustered index on LastName
D.D: Change the clustered index to (DepartmentID, LastName)
AnswerA

This composite covering index includes all columns needed for the query, avoiding expensive key lookups into the clustered index.

Why this answer

Option A is correct because it creates a covering nonclustered index on the filter columns (DepartmentID, LastName) and includes the Salary column as an included column. This allows the query to be fully satisfied by scanning only the nonclustered index pages, avoiding key lookups to the clustered index. The order of columns in the index key matches the query filter pattern, maximizing seek efficiency.

Exam trap

The trap here is that candidates often think separate single-column indexes are sufficient for multi-column filters, but they overlook the need for a covering composite index to avoid expensive key lookups or index intersection operations.

How to eliminate wrong answers

Option B is wrong because an index on LastName only does not help with filtering on DepartmentID, forcing a residual predicate or full scan of the clustered index for DepartmentID lookups. Option C is wrong because separate indexes on DepartmentID and LastName would require SQL Server to either use one index and then perform key lookups for the other filter, or use both with an expensive index intersection operation, neither of which is as efficient as a single composite index. Option D is wrong because changing the clustered index to (DepartmentID, LastName) would reorder the entire table physically, which could degrade performance for the existing EmployeeID-based lookups and other queries that rely on the current clustered key order.

904
MCQmedium

A company uses Azure Synapse Analytics dedicated SQL pool for its data warehouse. Every night, they need to load 500 GB of new sales data from CSV files stored in Azure Data Lake Storage Gen2. The loading process must be automated, scheduled, and include error handling (e.g., skip corrupt rows and log them). Which Azure service should be used to orchestrate this load pipeline?

A.Azure Data Factory
B.Azure Stream Analytics
C.Azure HDInsight
D.Azure Logic Apps
AnswerA

Correct. Azure Data Factory can orchestrate the copy activity on a schedule, handle errors with custom logic, and scale to move 500 GB daily.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data orchestration service that supports scheduled execution, error handling (e.g., skipping corrupt rows via fault tolerance settings in the Copy activity), and native integration with Azure Data Lake Storage Gen2 and Azure Synapse dedicated SQL pool. ADF can automate the nightly 500 GB load using a trigger, and its mapping data flows or Copy activity can log errors to a separate file or table, meeting the requirement for automated, scheduled, and error-tolerant ingestion.

Exam trap

The trap here is that candidates often confuse Azure Data Factory with Azure Logic Apps because both support scheduling and automation, but Logic Apps lacks the native data movement capabilities and fault tolerance for large-scale batch ETL workloads like loading 500 GB into a dedicated SQL pool.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is designed for real-time stream processing of data from sources like Event Hubs or IoT Hub, not for scheduled batch loading of large CSV files from ADLS Gen2 into a data warehouse. Option C (Azure HDInsight) is wrong because it is a managed big data analytics platform for running Hadoop, Spark, or Hive jobs, but it lacks built-in scheduling and orchestration capabilities for nightly loads and requires custom coding for error handling, making it overly complex compared to ADF. Option D (Azure Logic Apps) is wrong because while it can automate workflows and handle scheduling, it is optimized for lightweight integration and API-based triggers, not for orchestrating large-scale data movement (500 GB) with native fault tolerance and direct connectivity to Synapse dedicated SQL pool.

905
MCQeasy

Refer to the exhibit. The JSON shows a configuration for which Azure service?

A.Azure Analysis Services
B.Azure Data Factory
C.Power BI
D.Azure Synapse Analytics
AnswerB

Data Factory defines linked services and datasets in JSON.

Why this answer

The JSON snippet defines a pipeline with a copy activity that moves data from a source (Azure Blob Storage) to a sink (Azure SQL Database). This is the core pattern of Azure Data Factory (ADF), which orchestrates and automates data movement and transformation. The structure with 'name', 'properties', 'activities', 'typeProperties', 'source', and 'sink' is specific to ADF pipeline definitions.

Exam trap

The trap here is that candidates confuse the JSON pipeline definition with Azure Synapse Analytics pipelines, which share the same underlying engine but are accessed via a different portal and have additional Synapse-specific features like Spark job definitions and SQL script activities.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a semantic model and analytics engine (using Tabular or Multidimensional models), not a data orchestration service; it does not use JSON pipeline definitions with copy activities. Option C is wrong because Power BI is a visualization and reporting tool that uses datasets and dashboards, not JSON-based pipeline definitions with source/sink configurations. Option D is wrong because Azure Synapse Analytics is a unified analytics platform that includes dedicated SQL pools, serverless SQL, and Spark, but its native pipeline definitions (Synapse Pipelines) are derived from ADF; the exhibit shows a generic ADF pipeline JSON, not a Synapse-specific artifact like a SQL script or Spark job.

906
MCQmedium

A company needs to store order data for an e-commerce platform. The system requires high concurrency, fast inserts, and the ability to enforce referential integrity between tables (e.g., Customers and Orders). Which Azure service should they use?

A.Azure SQL Database
B.Azure Cosmos DB
C.Azure Blob Storage
D.Azure Data Lake Storage Gen2
AnswerA

Correct. Azure SQL Database is a fully managed relational database that supports ACID transactions, foreign keys, and high concurrency, making it ideal for transactional e-commerce order data.

Why this answer

Azure SQL Database is a fully managed relational database service that supports high concurrency, fast inserts, and enforces referential integrity through foreign key constraints. It provides ACID transactions and row-level locking to handle concurrent writes efficiently, making it ideal for e-commerce order processing where data consistency between Customers and Orders tables is critical.

Exam trap

The trap here is that candidates confuse high concurrency and fast inserts with NoSQL solutions like Cosmos DB, overlooking the explicit requirement for referential integrity which only a relational database like Azure SQL Database can enforce.

How to eliminate wrong answers

Option B is wrong because Azure Cosmos DB is a NoSQL database that does not enforce referential integrity between tables (it uses flexible schemas and lacks foreign key constraints). Option C is wrong because Azure Blob Storage is an object storage service for unstructured data (e.g., images, backups) and cannot enforce relational integrity or support SQL joins. Option D is wrong because Azure Data Lake Storage Gen2 is a hierarchical file system for big data analytics, not a transactional database, and it lacks support for referential integrity and high-concurrency row-level inserts.

907
MCQmedium

A logistics company receives real-time GPS tracking data from its delivery fleet via Azure Event Hubs. The data is a continuous stream of location updates (vehicle ID, latitude, longitude, timestamp). Additionally, the company has daily static route plan files in CSV format stored in Azure Data Lake Storage Gen2. The operations team needs to combine the live GPS stream with the route plans to create a near real-time dashboard showing if delivery vehicles are on schedule. They also want to run historical queries on both the stream data and route plans using T-SQL, without moving the data to another store. Which Azure service should they use as the primary analytics platform?

A.Azure Synapse Analytics
B.Azure Stream Analytics
C.Azure Data Factory
D.Azure Databricks
AnswerA

Correct. Azure Synapse Analytics provides a unified platform for streaming ingestion (via pipelines), batch data in Data Lake, and T-SQL querying over both hot and cold data using serverless SQL pools. It supports near real-time dashboards and historical analysis.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest real-time streaming data from Azure Event Hubs and combine it with static data in Azure Data Lake Storage Gen2. It supports T-SQL queries directly against both the live stream (via Synapse Pipelines or SQL Serverless) and the CSV files in the lake, enabling near real-time dashboards and historical analysis without moving data to another store.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the primary analytics platform because it handles real-time streams, but they overlook the requirement for T-SQL-based historical queries on stored data, which only Azure Synapse Analytics supports natively without moving data.

How to eliminate wrong answers

Option B is wrong because Azure Stream Analytics is a real-time stream processing engine that can query live GPS data and route plans, but it does not support T-SQL-based historical queries on stored data; it is designed for continuous streaming jobs, not ad-hoc T-SQL analytics. Option C is wrong because Azure Data Factory is an orchestration and data movement service, not an analytics platform; it cannot run T-SQL queries directly on the data or provide a dashboard. Option D is wrong because Azure Databricks is a big data analytics platform based on Apache Spark, which does not natively support T-SQL queries; it uses Spark SQL or Python, not T-SQL, and would require additional configuration to enable T-SQL compatibility.

908
MCQmedium

A smart home company stores device telemetry in Azure Cosmos DB using the NoSQL API. Each document contains: deviceId (string), timestamp (datetime), temperature (float), humidity (float). The most common query retrieves all documents for a specific deviceId within a time range, ordered by timestamp descending. This query performs well. However, a new query that finds devices with temperature > 50 in the last hour (without specifying deviceId) is extremely slow and consumes many request units (RUs). What is the most likely cause?

A.The temperature field is not indexed by default, so the query forces a full scan of all documents.
B.The query does not specify the partition key (deviceId), causing a cross-partition query that scans every physical partition.
C.The time range filter on timestamp cannot be combined with the temperature filter efficiently.
D.The default indexing policy only indexes strings and numbers as range indexes, but temperature is stored as a number and is indexed.
AnswerB

Cosmos DB uses partition key (deviceId) to distribute data. When a query does not include the partition key, it must fan out to all partitions, which is slow and expensive in RUs.

Why this answer

In Azure Cosmos DB NoSQL API, the partition key (deviceId) determines data distribution across physical partitions. Queries that do not include the partition key in the filter become cross-partition queries, which must fan out to every physical partition, scanning all documents. This is extremely slow and consumes many RUs, especially in large containers.

The original query specifying deviceId performs well because it targets a single partition.

Exam trap

The trap here is that candidates assume the slowness is due to a missing index on temperature, but Azure Cosmos DB automatically indexes all fields by default, so the real issue is the cross-partition query caused by omitting the partition key.

How to eliminate wrong answers

Option A is wrong because all fields in Azure Cosmos DB are automatically indexed by default, including the temperature field, so a missing index is not the cause. Option C is wrong because Azure Cosmos DB can combine filters on timestamp and temperature efficiently using its indexing; the slowness is due to the missing partition key, not the combination of filters. Option D is wrong because the default indexing policy does index numbers as range indexes, so temperature is indeed indexed; this statement is factually incorrect.

909
MCQeasy

You are designing a solution to store relational data that requires support for graph relationships and JSON queries. Which Azure service should you choose?

A.Azure Cosmos DB
B.Azure Table Storage
C.Azure SQL Database
D.Azure Database for PostgreSQL
AnswerC

Azure SQL Database supports graph tables and JSON queries.

Why this answer

Azure SQL Database is the correct choice because it natively supports graph relationships through graph tables (NODE and EDGE tables) and JSON queries via built-in JSON functions like JSON_VALUE, JSON_QUERY, and OPENJSON. This makes it ideal for storing relational data that also needs to handle graph traversals and semi-structured JSON data without requiring a separate service.

Exam trap

The trap here is that candidates often assume graph and JSON support require a NoSQL database like Cosmos DB, but Azure SQL Database provides both features within a relational model, which is the key distinction tested in DP-900.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that, while supporting graph APIs (Gremlin) and JSON natively, is not designed for strict relational data with ACID transactions across multiple tables. Option B is wrong because Azure Table Storage is a key-value store that lacks support for relational schemas, graph relationships, and JSON query capabilities. Option D is wrong because Azure Database for PostgreSQL, while supporting JSONB and graph extensions like Apache AGE, is not the primary Azure service for relational data with built-in graph and JSON support; Azure SQL Database offers tighter integration with Azure ecosystem features like elastic pools and built-in graph tables.

910
MCQeasy

A data engineer needs to process streaming data from IoT devices in near real-time and store the results in Azure Cosmos DB. Which Azure service should they use for the stream processing?

A.Azure Synapse Analytics
B.Azure Databricks
C.Azure Stream Analytics
D.Azure Data Factory
AnswerC

Stream Analytics provides near real-time stream processing with native Cosmos DB sink.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed, real-time stream processing engine designed specifically for low-latency, near-real-time analytics on streaming data. It can ingest data from IoT devices via Event Hubs or IoT Hub, apply SQL-based transformations, and directly output the results to Azure Cosmos DB with millisecond latency, making it ideal for this scenario.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory or Azure Databricks, mistakenly thinking that any 'data processing' tool can handle real-time streaming, but only Stream Analytics is purpose-built for near-real-time, serverless stream processing with direct Cosmos DB integration.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is a unified analytics platform focused on large-scale batch processing and data warehousing, not real-time stream processing; it lacks native support for continuous streaming queries with sub-second latency. Option B is wrong because Azure Databricks is a big data and machine learning platform that can process streaming data via Structured Streaming, but it requires cluster management and is overkill for simple near-real-time IoT processing; it is not the simplest or most cost-effective choice for direct Cosmos DB output. Option D is wrong because Azure Data Factory is a cloud-based ETL and data integration service designed for batch-oriented data movement and orchestration, not for real-time stream processing; it cannot handle continuous, low-latency streaming workloads.

911
MCQmedium

A media company needs to store millions of high-resolution photos for a public website. Each photo can be up to 50 MB. The storage solution must support secure access via URLs. Which Azure service should they use?

A.Azure Table Storage
B.Azure Blob Storage
C.Azure Files
D.Azure SQL Database
AnswerB

Blob Storage is optimized for storing large amounts of unstructured binary data and supports direct URL access.

Why this answer

Azure Blob Storage is the correct choice because it is designed for storing massive amounts of unstructured data, such as high-resolution photos, and supports objects up to 4.7 TB per blob, easily accommodating 50 MB files. It provides secure access via URLs using shared access signatures (SAS) or public access levels, making it ideal for a public website serving media content.

Exam trap

The trap here is that candidates often confuse Azure Files with Blob Storage because both can store files, but Azure Files uses SMB protocol for network file shares, not HTTP/HTTPS URL-based access for public web serving.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store for structured, non-relational data, not for large binary files like photos. Option C is wrong because Azure Files provides fully managed file shares using SMB protocol, designed for shared file access across VMs or on-premises, not for serving public web content via URLs. Option D is wrong because Azure SQL Database is a relational database service for structured data with schemas, not for storing large binary objects like photos, and it lacks native URL-based access for public distribution.

912
MCQeasy

A company receives data from a point-of-sale system. Each row contains TransactionID, ProductID, Quantity, and Price. The data has a fixed schema and is stored in a table. How should this data be classified?

A.Structured data
B.Semi-structured data
C.Unstructured data
D.Transactional data
AnswerA

Correct. The data has a fixed schema (columns TransactionID, ProductID, Quantity, Price) and is stored in a table, which is the definition of structured data.

Why this answer

The data has a fixed schema with clearly defined columns (TransactionID, ProductID, Quantity, Price) and each row follows the same structure, which is the definition of structured data. In Azure, this would map directly to a table in Azure SQL Database or a fixed-schema table in Azure Synapse Analytics. The rigid schema and consistent data types make it ideal for relational storage and querying.

Exam trap

The trap here is that candidates confuse 'transactional data' (a workload pattern) with 'structured data' (a data classification), leading them to pick Option D because the data comes from a point-of-sale system, but the question explicitly asks about data structure, not data source or usage.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML, Parquet) does not enforce a fixed schema; fields can vary between rows, unlike this rigid table. Option C is wrong because unstructured data (e.g., images, videos, text files) has no predefined schema or organization, whereas this data has a strict columnar structure. Option D is wrong because 'transactional data' describes a workload type (OLTP) or data generated by transactions, not a classification of data structure; the question asks how the data should be classified by structure, not by its source or usage.

913
MCQeasy

A retail company stores customer data in three formats: a relational database table with fixed columns for CustomerID, Name, and Email; customer feedback as JSON documents with varying fields such as rating and comment; and product images as JPEG files. Which of the following correctly classifies these data types from most structured to least structured?

A.JSON documents, relational table, image files
B.Relational table, JSON documents, image files
C.Image files, JSON documents, relational table
D.Relational table, image files, JSON documents
AnswerB

Correct. Relational tables have a fixed schema (structured), JSON documents allow varying fields (semi-structured), and image files lack a predefined schema (unstructured).

Why this answer

Option B is correct because relational tables enforce a fixed schema with defined columns and data types, making them the most structured. JSON documents are semi-structured, allowing varying fields and flexible schemas, while image files are unstructured binary data with no inherent schema. This ordering from most to least structured aligns with the core data classification concept in the DP-900 exam.

Exam trap

The trap here is that candidates often confuse semi-structured JSON with unstructured data, or assume that any file format (like images) has inherent structure, leading them to misorder the classification from most to least structured.

How to eliminate wrong answers

Option A is wrong because it incorrectly places JSON documents (semi-structured) as more structured than relational tables (structured), reversing the correct order. Option C is wrong because it lists image files (unstructured) as the most structured, which is a fundamental misunderstanding of data classification. Option D is wrong because it places image files (unstructured) above JSON documents (semi-structured), failing to recognize that semi-structured data has more organization than unstructured binary files.

914
MCQhard

A company is migrating an on-premises SQL Server database to Azure. The database is 800 GB, uses SQL Server Agent jobs for scheduled tasks, and needs to link to another on-premises SQL Server instance via linked servers. The company wants a fully managed service with minimal application changes. Which Azure SQL service should they choose?

A.Azure SQL Database elastic pool
B.Azure SQL Database single database
C.Azure SQL Managed Instance
D.Azure Synapse Analytics dedicated SQL pool
AnswerC

Correct because it supports SQL Agent, linked servers, and is fully managed with high compatibility.

Why this answer

Azure SQL Managed Instance is correct because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs and linked servers, while being a fully managed platform-as-a-service (PaaS) offering. This allows the company to migrate the 800 GB database with minimal application changes, as it preserves the existing instance-level features without requiring a rearchitecture.

Exam trap

The trap here is that candidates often confuse Azure SQL Database (single or elastic pool) with Azure SQL Managed Instance, assuming all PaaS offerings support SQL Server Agent and linked servers, when in fact only Managed Instance provides these instance-scoped features.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database elastic pool is a multi-tenant resource-sharing model for single databases, which does not support SQL Server Agent jobs or linked servers, and would require significant application changes to remove these dependencies. Option B is wrong because Azure SQL Database single database also lacks support for SQL Server Agent jobs and linked servers, forcing the company to refactor scheduled tasks and cross-instance queries. Option D is wrong because Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) data warehouse designed for large-scale analytics, not for OLTP workloads, and it does not support SQL Server Agent jobs or linked servers, making it incompatible with the migration requirements.

915
Multi-Selecteasy

Which TWO Azure services are designed for big data batch processing?

Select 2 answers
A.Azure Databricks
B.Azure Data Explorer
C.Azure Stream Analytics
D.Azure Analysis Services
E.Azure HDInsight
AnswersA, E

Batch and streaming using Spark.

Why this answer

Azure Databricks is correct because it provides an Apache Spark-based analytics platform optimized for batch processing large datasets, enabling ETL, data transformation, and machine learning at scale. It uses distributed computing to process data in parallel across clusters, making it ideal for big data batch workloads.

Exam trap

The trap here is that candidates often confuse real-time analytics services (like Stream Analytics or Data Explorer) with batch processing services, or mistakenly think Analysis Services handles raw big data processing when it is actually a presentation layer for pre-aggregated data.

916
MCQeasy

You need to store JSON documents for a web application that requires low-latency reads and writes globally. The data has no fixed schema. Which Azure service should you use?

A.Azure Table Storage
B.Azure Blob Storage
C.Azure SQL Database
D.Azure Cosmos DB
AnswerD

Cosmos DB provides global distribution, low latency, and native JSON support.

Why this answer

Azure Cosmos DB is a globally distributed, multi-model database that natively supports JSON documents with low-latency reads and writes. Option A is wrong because Azure Blob Storage is not optimized for low-latency document queries. Option B is wrong because Azure Table Storage is a key-value store, not ideal for complex JSON queries.

Option D is wrong because Azure SQL Database is relational and requires a fixed schema.

917
MCQmedium

A company is designing a data analytics solution. They need to store large volumes of raw data in its native format and support schema-on-read for data science exploration. Which storage technology should they use?

A.Azure Data Lake Storage Gen2
B.Azure Blob Storage
C.Azure Cosmos DB
D.Azure SQL Database
AnswerA

ADLS Gen2 is a scalable data lake that supports schema-on-read and stores raw data.

Why this answer

Azure Data Lake Storage Gen2 (ADLS Gen2) is the correct choice because it combines a hierarchical namespace with Azure Blob Storage's scalable object storage, allowing raw data to be stored in its native format (e.g., CSV, JSON, Parquet) without transformation. It supports schema-on-read, meaning the schema is applied at query time (e.g., via Apache Spark or Azure Synapse SQL), which is ideal for data science exploration where the data structure may not be predefined.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage with ADLS Gen2 because both store objects, but Blob Storage lacks the hierarchical namespace and native schema-on-read support required for data science exploration, making it unsuitable for this specific analytics workload.

How to eliminate wrong answers

Option B (Azure Blob Storage) is wrong because while it can store raw data, it lacks a hierarchical namespace and native schema-on-read capabilities; it is optimized for unstructured object storage and requires additional services (like Azure Data Lake Analytics) to enable schema-on-read. Option C (Azure Cosmos DB) is wrong because it is a NoSQL database designed for low-latency transactional workloads with a fixed schema (or flexible schema via JSON), not for storing large volumes of raw data in native format for ad-hoc analytics. Option D (Azure SQL Database) is wrong because it is a relational database that enforces a rigid schema (schema-on-write), requiring data to be transformed and loaded before querying, which contradicts the need for schema-on-read and raw data storage.

918
MCQmedium

A company uses Azure SQL Database for an order management system. The Orders table has millions of rows with columns: OrderID (primary key, clustered), CustomerID, OrderDate, Status (e.g., 'Shipped', 'Pending'), TotalAmount. Queries frequently filter on OrderDate and Status, and sort results by OrderDate in descending order. They return several columns including TotalAmount. Which indexing strategy will most improve query performance?

A.Create a nonclustered index on (OrderDate DESC, Status) INCLUDE (CustomerID, TotalAmount)
B.Create a clustered index on (OrderDate, Status)
C.Create a nonclustered index on (Status) only
D.Create a columnstore index on the entire table
AnswerA

Correct. This index matches the filter columns in the correct order and includes the ORDER BY direction. Included columns cover additional columns needed, making it a covering index for many queries.

Why this answer

Option A is correct because it creates a covering nonclustered index that matches the query's filter and sort order exactly. The index on (OrderDate DESC, Status) allows SQL Server to seek directly on OrderDate and Status, and the descending order avoids a sort operation for the ORDER BY OrderDate DESC clause. Including CustomerID and TotalAmount as included columns makes this a covering index, so the query can be satisfied entirely from the index without key lookups to the clustered index, which is critical for performance on a table with millions of rows.

Exam trap

The trap here is that candidates often think a clustered index on the filter columns is always best, but they forget that the clustered index already exists on OrderID and that a covering nonclustered index with included columns is more efficient for specific query patterns without disrupting the existing table structure.

How to eliminate wrong answers

Option B is wrong because a clustered index defines the physical order of the table, and changing it to (OrderDate, Status) would require rebuilding the entire table, potentially causing fragmentation and performance degradation for other queries that rely on the OrderID primary key. Additionally, a clustered index on (OrderDate, Status) would not be as efficient for the specific filter and sort pattern because it does not include the descending order and would still require a sort operation. Option C is wrong because a nonclustered index on (Status) only would allow seeking on Status, but it would not help with the OrderDate filter or the ORDER BY OrderDate DESC clause, leading to residual predicate filtering and a sort operation, and it would not cover the query, requiring expensive key lookups for the other columns.

Option D is wrong because a columnstore index is designed for large-scale analytical queries and data warehousing workloads, not for high-frequency transactional queries that filter and sort on specific columns; it would introduce overhead for point lookups and small-range scans, and the query pattern described is OLTP, not analytical.

919
MCQmedium

You are deploying the above ARM template snippet for a storage account. What is the effect of setting 'isHnsEnabled' to true?

A.Enables Azure Data Lake Storage Gen2.
B.Enables Azure Blob Storage lifecycle management.
C.Enables geo-redundant storage (GRS).
D.Enables Azure Files share.
AnswerA

HNS is the feature that enables ADLS Gen2.

Why this answer

Setting 'isHnsEnabled' to true enables the Hierarchical Namespace (HNS) feature on the Azure Storage account, which is the core requirement for Azure Data Lake Storage Gen2. This allows the storage account to support a file system-like directory structure with POSIX-compliant access control lists, enabling analytics workloads to use both blob and file system semantics.

Exam trap

The trap here is that candidates confuse 'isHnsEnabled' with enabling a general 'data lake' feature, but it specifically enables the Hierarchical Namespace, which is the fundamental difference between Azure Blob Storage and Azure Data Lake Storage Gen2.

How to eliminate wrong answers

Option B is wrong because lifecycle management is a separate feature enabled via the 'LifecycleManagement' policy on a storage account, not by setting 'isHnsEnabled'. Option C is wrong because geo-redundant storage (GRS) is a replication setting configured via the 'sku.name' property (e.g., 'Standard_GRS'), not via 'isHnsEnabled'. Option D is wrong because Azure Files shares are enabled by creating a file share resource within a storage account, not by enabling the Hierarchical Namespace; in fact, enabling HNS on a storage account prevents the creation of Azure file shares in that account.

920
MCQeasy

A company stores customer records in a relational table with columns like CustomerID, Name, and Email. Product reviews are stored as JSON documents, and marketing images are stored as PNG files. Which of the following correctly orders these data types from most structured to least structured?

A.A. Product reviews, Customer records, Marketing images
B.B. Customer records, Product reviews, Marketing images
C.C. Marketing images, Customer records, Product reviews
D.D. Customer records, Marketing images, Product reviews
AnswerB

Customer records in a relational table are strictly structured (fixed schema), product reviews as JSON are semi-structured (schema-on-read), and marketing images are unstructured (binary files). This is the correct order from most to least structured.

Why this answer

Customer records in a relational table have a fixed schema with defined columns (e.g., CustomerID, Name, Email), making them the most structured. Product reviews stored as JSON documents are semi-structured because they have a flexible schema with key-value pairs but no fixed columns. Marketing images as PNG files are unstructured binary data with no inherent schema.

Option B correctly orders these from most to least structured.

Exam trap

The trap here is that candidates often confuse semi-structured JSON with unstructured data, or assume that any file format (like PNG) has inherent structure, leading them to misorder the data types by perceived complexity rather than schema rigidity.

How to eliminate wrong answers

Option A is wrong because it places product reviews (semi-structured JSON) before customer records (structured relational table), incorrectly suggesting JSON is more structured than a fixed-schema table. Option C is wrong because it lists marketing images (unstructured binary) as the most structured, which is the opposite of the correct order. Option D is wrong because it places marketing images (unstructured) before product reviews (semi-structured), failing to recognize that JSON documents have more structure than raw binary files.

921
MCQmedium

A social media platform stores user profiles as JSON documents where each profile can have different attributes (e.g., education, work history, interests). The platform also needs to traverse friend connections to recommend new connections using graph queries. The development team wants to use a single Azure Cosmos DB account for both workloads while minimizing complexity. Which combination of Azure Cosmos DB APIs should they choose?

A.SQL API and Gremlin API
B.MongoDB API and Table API
C.Cassandra API and SQL API
D.Gremlin API only
AnswerA

Correct. The SQL API handles JSON document storage and querying, while the Gremlin API enables graph traversal for friend recommendations. Both can be used in the same account.

Why this answer

The correct choice is A because the SQL API provides native JSON document storage and querying for user profiles, while the Gremlin API supports graph traversal for friend connections. Using both APIs in a single Azure Cosmos DB account minimizes complexity by allowing each workload to use the most appropriate data model without managing separate databases.

Exam trap

The trap here is that candidates may think a single API must serve all workloads, but Azure Cosmos DB supports multiple APIs per account, and the question specifically asks for a combination that minimizes complexity while meeting both document and graph requirements.

How to eliminate wrong answers

Option B is wrong because the MongoDB API is for document storage but uses MongoDB wire protocol, not native JSON, and the Table API is for key-value/wide-column data, neither of which supports graph queries. Option C is wrong because the Cassandra API is for wide-column data and the SQL API is for documents, but neither provides graph traversal capabilities needed for friend connections. Option D is wrong because the Gremlin API alone supports graph queries but cannot efficiently handle the flexible JSON document storage required for user profiles with varying attributes.

922
MCQmedium

A retail company uses Azure SQL Database to store a large fact table of sales transactions with millions of rows. They run complex aggregate queries (SUM, COUNT, AVG) across many rows for monthly reports. These queries take too long. Which index type should they add to the table to improve performance?

A.Clustered B-tree index
B.Nonclustered rowstore index
C.Clustered columnstore index
D.Nonclustered columnstore index
AnswerC

Correct. A clustered columnstore index is ideal for data warehousing and analytical workloads, significantly improving aggregate query performance.

Why this answer

Clustered columnstore indexes are optimized for large fact tables and analytical workloads because they store data in a columnar format, which significantly reduces the amount of data read from disk for aggregate queries like SUM, COUNT, and AVG. This index type also uses batch processing and compression to accelerate query performance on millions of rows, making it ideal for monthly reporting queries.

Exam trap

The trap here is that candidates often confuse nonclustered columnstore indexes with clustered columnstore indexes, assuming any columnstore index will suffice, but only the clustered version is designed for large fact tables with heavy aggregation workloads and avoids the overhead of maintaining a separate rowstore index.

How to eliminate wrong answers

Option A is wrong because a clustered B-tree index stores data in row-based pages and is optimized for point lookups and range scans, not for large-scale aggregations across millions of rows. Option B is wrong because a nonclustered rowstore index also uses a row-based structure and, while it can cover some queries, it does not provide the columnar compression and batch processing needed for efficient aggregate operations. Option D is wrong because a nonclustered columnstore index is a secondary index that requires the table to have a separate clustered index, adding overhead for writes and not being as efficient as a clustered columnstore index for full-table scans and aggregations.

923
MCQeasy

You need to store a collection of JSON documents that contain user profile data. The data is frequently queried by user ID and by email address. The solution must support indexing on multiple fields and provide low-latency queries. Which Azure service should you use?

A.Azure Table Storage
B.Azure Cache for Redis
C.Azure Cosmos DB
D.Azure Blob Storage
AnswerC

Supports indexing on multiple fields and low-latency queries.

Why this answer

Option C is correct because Azure Cosmos DB is a NoSQL database that supports indexing on multiple fields and provides low-latency queries on JSON documents. Option A is wrong because Azure Blob Storage does not support indexing on document fields. Option B is wrong because Azure Table Storage is a key-value store with limited indexing.

Option D is wrong because Azure Cache for Redis is an in-memory cache, not a durable indexed store.

924
MCQhard

A company uses Azure SQL Database with geo-replication for disaster recovery. During a regional outage, they manually failover to the secondary region. After the primary region is restored, they need to re-establish geo-replication with minimal downtime. What should they do?

A.Initiate a planned failover to switch back to the original region
B.Drop the secondary database and create a new one
C.Delete the geo-replication link and create a new one
D.Manually swap the roles of the primary and secondary
AnswerA

A planned failover will switch roles with no data loss and re-establish geo-replication.

Why this answer

After a manual failover to the secondary region, the original primary becomes a secondary database. To re-establish geo-replication with minimal downtime, you should initiate a planned failover (also called a graceful failover) to switch back to the original region. This operation reverses the roles without data loss and avoids the need to reseed the database, keeping downtime to seconds.

Exam trap

The trap here is that candidates confuse the initial failover (which may be forced) with the recovery process, assuming they must recreate the geo-replication link or drop the database, when in fact a planned failover cleanly reverses the roles with minimal downtime.

How to eliminate wrong answers

Option B is wrong because dropping the secondary database and creating a new one would require a full data reseed, causing significant downtime and data transfer. Option C is wrong because deleting the geo-replication link and creating a new one would also force a full reseed, which is unnecessary and introduces longer downtime. Option D is wrong because manually swapping roles is not a supported operation; Azure SQL Database uses the ALTER DATABASE ...

FAILOVER command to perform a controlled role swap, not a manual process.

925
MCQhard

A multinational corporation needs to store archival data for 10 years with the lowest possible storage cost, while still being able to retrieve it within 24 hours if needed. Which Azure storage tier should they use?

A.Archive Blob Storage
B.Cool Blob Storage
C.Premium Blob Storage
D.Hot Blob Storage
AnswerA

Lowest cost, retrieval within 15 hours, meets 24-hour requirement.

Why this answer

Archive Blob Storage is the correct choice because it is designed for long-term retention of data that is rarely accessed, offering the lowest storage cost among Azure blob tiers. The 10-year retention requirement and 24-hour retrieval window align perfectly with Archive's capabilities, as data can be rehydrated to a hot or cool tier within hours (typically up to 15 hours for standard priority rehydration).

Exam trap

The trap here is that candidates often confuse 'lowest storage cost' with 'lowest overall cost' and overlook the retrieval time constraint, mistakenly choosing Cool Blob Storage because it offers lower cost than Hot but still allows immediate access, ignoring that Archive is even cheaper and meets the 24-hour retrieval window.

How to eliminate wrong answers

Option B (Cool Blob Storage) is wrong because it is optimized for data accessed infrequently but with immediate retrieval needs, not for archival durations of 10 years, and its storage cost is higher than Archive. Option C (Premium Blob Storage) is wrong because it uses SSD-backed storage for low-latency, high-frequency access scenarios, making it the most expensive tier and unsuitable for archival data. Option D (Hot Blob Storage) is wrong because it is designed for data accessed frequently with millisecond latency, incurring the highest storage cost, which contradicts the requirement for lowest possible cost.

926
MCQeasy

A startup is building a mobile app that requires offline data synchronization. The app needs to store user-generated content locally on the device and sync with Azure when connectivity is available. Which Azure service should they use for the cloud backend?

A.Azure Cosmos DB
B.Azure Blob Storage
C.Azure SQL Database
D.Azure Table Storage
AnswerA

Supports offline data sync with the Azure Mobile Apps SDK.

Why this answer

Option B is correct because Azure Cosmos DB, combined with offline data sync (via Mobile Apps SDK), provides offline capabilities. Option A is wrong because Azure Blob Storage does not have built-in offline sync for mobile apps. Option C is wrong because Azure SQL Database is relational and not the best for offline sync scenarios.

Option D is wrong because Azure Table Storage lacks robust offline sync support.

927
MCQhard

A logistics company ingests real-time GPS data from delivery vehicles via Azure Event Hubs. The data includes vehicle ID, latitude, longitude, and timestamp. The company also has historical route plan data stored as CSV files in Azure Data Lake Storage Gen2. Data analysts need to combine the live stream with the historical data in near real-time to create a dashboard showing if vehicles are on schedule. They also need to run complex T-SQL queries on the combined dataset for ad-hoc reporting. Which Azure service should they use as the primary analytics platform?

A.A: Azure Stream Analytics
B.B: Azure Data Lake Analytics
C.C: Azure Synapse Analytics
D.D: Azure Analysis Services
AnswerC

Synapse Analytics provides a unified platform for real-time streaming ingestion, T-SQL querying over both streaming and historical data in Data Lake, and integration with Power BI for dashboards.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest real-time data from Azure Event Hubs via its built-in streaming capabilities (e.g., using Synapse Pipelines or Spark Structured Streaming) and combine it with historical data stored in Azure Data Lake Storage Gen2. It supports complex T-SQL queries through its dedicated SQL pool (formerly SQL Data Warehouse) for ad-hoc reporting, enabling near real-time dashboards and interactive analytics on the combined dataset.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the primary analytics platform because it handles real-time streaming, but they overlook the requirement for complex T-SQL queries and ad-hoc reporting, which Stream Analytics cannot natively support, making Azure Synapse Analytics the correct unified solution.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks like Azure SQL Database or Power BI, but it does not natively support complex T-SQL queries for ad-hoc reporting on combined historical and streaming data; it is not a primary analytics platform for running T-SQL queries. Option B is wrong because Azure Data Lake Analytics is a U-SQL-based analytics service that has been deprecated and does not support T-SQL queries; it is not suitable for running complex T-SQL queries or near real-time streaming ingestion. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and business intelligence, not a platform for ingesting real-time streams or running complex T-SQL queries on raw combined data; it requires pre-processed data and does not handle streaming ingestion directly.

928
MCQmedium

A company has an Azure SQL Database that supports a critical business application in the West US region. They want to ensure that if the primary region becomes unavailable, the database can automatically fail over to a secondary replica in the East US region with minimal data loss. The secondary replica must also be readable to offload some reporting queries when the primary is healthy. Which Azure SQL Database feature should they enable?

A.Active Geo-Replication
B.Long-term retention backups
C.Transparent Data Encryption (TDE)
D.Point-in-time restore
AnswerA

Correct. Active Geo-Replication maintains a readable secondary replica in another region and enables automatic failover for disaster recovery with minimal data loss.

Why this answer

Active Geo-Replication creates a readable secondary replica in a different Azure region (East US) that stays synchronized asynchronously with the primary database in West US. In the event of a regional outage, the secondary can be manually or automatically promoted to become the new primary, with minimal data loss (typically a few seconds of transactions). The secondary replica is also readable, allowing reporting queries to be offloaded from the primary while it is healthy.

Exam trap

The trap here is that candidates often confuse Active Geo-Replication with failover groups, but failover groups provide automatic failover and a single endpoint, whereas Active Geo-Replication requires manual or scripted failover and separate connection strings for the secondary.

How to eliminate wrong answers

Option B (Long-term retention backups) is wrong because it provides archival backups for compliance or recovery beyond the automated backup retention period, not real-time replication or automatic failover. Option C (Transparent Data Encryption) is wrong because it encrypts data at rest and in transit but does not provide any replication, failover, or read-scale capability. Option D (Point-in-time restore) is wrong because it allows restoring a database to a specific past time from automated backups, but it does not maintain a continuously synchronized secondary replica or support automatic failover.

929
MCQeasy

You are creating an Azure SQL Database and need to connect using Microsoft Entra ID authentication. Which user type must you create in the database to represent the authenticated Microsoft Entra ID identity?

A.SQL login with password
B.Contained database user mapped to a Microsoft Entra ID identity
C.External user from Microsoft Entra ID
D.Database user without login
AnswerB

Enables Microsoft Entra ID authentication.

Why this answer

Option C is correct because a contained database user mapped to a Microsoft Entra ID identity is required. Option A is wrong because SQL authentication uses a SQL login. Option B is wrong because a database user without login is for SQL authentication.

Option D is wrong because an external user is not a valid concept.

930
MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool to store a large fact table with billions of rows. The table is distributed using hash distribution on the SaleDate column. Queries that join this fact table with a small dimension table (Product) on ProductID are slow because the join requires shuffling data across distributions. Which design change would most improve the performance of these join queries?

A.Change the distribution of the fact table to round-robin.
B.Replicate the Product dimension table to all distributions.
C.Partition the fact table by SaleDate.
D.Create a nonclustered index on ProductID in the fact table.
AnswerB

Correct. Replicating small dimension tables ensures that each distribution has a local copy of the entire table, avoiding data movement during joins.

Why this answer

Replicating the Product dimension table to all distributions eliminates the need to shuffle data across distributions during the join. In Azure Synapse dedicated SQL pool, hash distribution distributes rows across 60 distributions based on the hash of the distribution column (SaleDate). When joining on ProductID, which is not the distribution column, data must be moved between distributions.

Replicating the small dimension table ensures each distribution has a local copy, allowing the join to be performed without data movement, significantly improving performance.

Exam trap

The trap here is that candidates often confuse partitioning with distribution, thinking partitioning on SaleDate will help the join on ProductID, but partitioning only segments data within a distribution and does not reduce cross-distribution data movement for joins on a different column.

How to eliminate wrong answers

Option A is wrong because changing the distribution to round-robin would distribute data evenly but without any hash alignment, causing even more data movement for all joins, not just this one. Option C is wrong because partitioning by SaleDate organizes data within each distribution but does not reduce data shuffling across distributions for joins on ProductID; partitioning is primarily for partition elimination and maintenance operations. Option D is wrong because a nonclustered index on ProductID within each distribution can speed up local lookups but does not address the cross-distribution data movement required when the join key does not match the distribution key.

931
MCQmedium

A company needs to build a centralized analytics platform that can query both structured data in a relational data warehouse and unstructured data in a data lake using a single SQL-based interface. They want to minimize data movement and use a serverless, on-demand compute model for ad-hoc queries. Which Azure service should they use?

A.A. Azure SQL Database
B.B. Azure Synapse Serverless SQL pool
C.C. Azure HDInsight
D.D. Azure Analysis Services
AnswerB

Azure Synapse Serverless SQL pool provides a serverless, on-demand query interface that can read data from various sources in the data lake using T-SQL, without data movement or provisioning.

Why this answer

Azure Synapse Serverless SQL pool is correct because it provides a SQL-based interface to query both structured data in a relational data warehouse and unstructured data in a data lake (e.g., Parquet, CSV, JSON) without moving data. It uses a serverless, on-demand compute model that charges per query, making it ideal for ad-hoc analytics with minimal data movement.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database or HDInsight, mistakenly thinking a traditional relational database or a managed cluster is needed for querying unstructured data, when the serverless SQL pool is specifically designed for this hybrid, on-demand scenario.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database service for OLTP workloads, not designed to query unstructured data in a data lake or provide a serverless on-demand model for ad-hoc analytics across heterogeneous sources. Option C is wrong because Azure HDInsight is a managed big data analytics service that uses Hadoop, Spark, or Hive, requiring cluster provisioning and management, not a serverless SQL-based interface for ad-hoc queries. Option D is wrong because Azure Analysis Services is an enterprise-grade analytics engine for semantic modeling and OLAP, not a serverless SQL query service for directly querying data lake files without data movement.

932
MCQmedium

You are designing a solution that requires storing large binary files (up to 5 TB each) that are updated frequently by multiple processes. Which Azure storage feature allows concurrent writes to the same file?

A.Azure Blob Storage with soft delete
B.Azure Files
C.Azure Disks
D.Azure NetApp Files
AnswerB

Azure Files provides fully managed file shares accessible via SMB, supporting concurrent access with file locking.

Why this answer

Azure Files supports SMB protocol with leasing and oplocks, allowing multiple clients to read/write the same file with proper coordination. Blob Storage does not support concurrent writes to the same blob without custom logic. Azure Disks are for VM disks, not shared access.

Azure NetApp Files provides shared file access but is more expensive and complex.

933
MCQeasy

A logistics company uses an online system to process incoming delivery requests one at a time, updating the database immediately upon each transaction. They also run a weekly job that analyzes thousands of delivery records to identify average delivery times and trends. Which set of terms correctly classifies these two workloads?

A.OLTP and OLAP
B.Batch processing and real-time processing
C.Relational and non-relational
D.Structured and semi-structured
AnswerA

Correct. OLTP handles day-to-day transactions, and OLAP handles analysis across many records.

Why this answer

The first workload processes individual delivery requests with immediate database updates, which is the definition of Online Transaction Processing (OLTP). The second workload runs a weekly job analyzing thousands of records for trends and averages, which is Online Analytical Processing (OLAP). These two terms correctly classify the transactional and analytical workloads described.

Exam trap

The trap here is that candidates confuse the processing mode (batch vs. real-time) with the workload classification (OLTP vs. OLAP), but the question specifically asks for the terms that classify the workloads, not describe their timing.

How to eliminate wrong answers

Option B is wrong because 'batch processing and real-time processing' describes processing modes, not workload classifications; the question asks for terms that classify the workloads, and while the weekly job is batch, the first workload is real-time, but the correct pair is OLTP/OLAP. Option C is wrong because 'relational and non-relational' refers to database types (e.g., SQL vs. NoSQL), not to the nature of the workloads (transactional vs. analytical).

Option D is wrong because 'structured and semi-structured' describes data formats (e.g., tables vs. JSON), not the operational characteristics of the workloads.

934
MCQeasy

A medical imaging company stores high-resolution MRI scans in Azure Blob Storage. The scans are accessed frequently for the first 6 months after being generated, then rarely after that, but must be available immediately when accessed for comparisons. The company wants to minimize storage costs. Which Azure Blob Storage access tier should they use for scans older than 6 months?

A.Hot access tier
B.Cool access tier
C.Archive access tier
D.Premium access tier
AnswerB

Cool tier is optimized for infrequently accessed data that still needs immediate availability. It offers lower storage costs than Hot tier while still allowing instant access.

Why this answer

The Cool access tier is ideal for data that is infrequently accessed but must be available immediately when needed. It offers lower storage costs than the Hot tier while maintaining low-latency retrieval, matching the requirement for scans older than 6 months that are rarely accessed but require instant availability.

Exam trap

The trap here is that candidates often confuse 'rarely accessed' with 'Archive tier,' forgetting that Archive requires hours of rehydration time, which fails the 'available immediately' constraint in the question.

How to eliminate wrong answers

Option A is wrong because the Hot access tier is optimized for frequent access and has higher storage costs, making it cost-inefficient for data accessed rarely after 6 months. Option C is wrong because the Archive access tier has the lowest storage cost but requires rehydration (which can take hours) before data can be accessed, violating the 'available immediately' requirement. Option D is wrong because the Premium access tier is designed for high-performance, low-latency workloads with transactional costs and is not cost-effective for rarely accessed data.

935
MCQmedium

A company plans to migrate an on-premises SQL Server database to Azure. The database uses SQL Server Agent jobs for scheduled maintenance and relies on linked servers to query data from another SQL Server instance. It also performs cross-database queries within the same instance. The company wants a fully managed PaaS service that requires minimal application changes and provides automated backups and patching. Which Azure SQL service should they choose?

A.A. Azure SQL Database (single database)
B.B. Azure SQL Managed Instance
C.C. SQL Server on Azure Virtual Machines
D.D. Azure SQL Database elastic pool
AnswerB

Azure SQL Managed Instance provides high compatibility with on-premises SQL Server, including Agent jobs, linked servers, and cross-database queries, while being fully managed.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near 100% compatibility with on-premises SQL Server, including support for SQL Server Agent jobs, linked servers, and cross-database queries within the same instance. It is a fully managed PaaS service that offers automated backups and patching, minimizing application changes during migration.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's limited feature set with the full SQL Server engine compatibility of Azure SQL Managed Instance, assuming all PaaS offerings support agent jobs and linked servers when only Managed Instance does.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database (single database) does not support SQL Server Agent jobs, linked servers, or cross-database queries within the same instance; it is designed for modern cloud-native applications and requires significant application changes. Option C is wrong because SQL Server on Azure Virtual Machines is an IaaS solution, not a fully managed PaaS service; it requires manual patching and backup management, contradicting the requirement for automated operations. Option D is wrong because Azure SQL Database elastic pool is a resource-sharing model for multiple single databases and inherits the same limitations as Azure SQL Database, lacking support for SQL Server Agent jobs, linked servers, and cross-database queries.

936
MCQmedium

A mobile app stores user preferences in Azure Cosmos DB using the NoSQL API. The app frequently reads a single user's profile by user ID (the partition key). The development team wants the fastest possible read performance globally and is willing to accept that reads might not reflect the latest write immediately. Which consistency level should they choose to minimize read latency?

A.Strong
B.Bounded staleness
C.Session
D.Eventual
AnswerD

Correct. Eventual consistency offers the lowest read latency and is optimal when eventual convergence is acceptable, as in profile viewing.

Why this answer

Eventual consistency offers the lowest read latency because it allows reads to return data from any replica without waiting for confirmation that the write has been fully replicated. Since the app can tolerate stale reads (i.e., not reflecting the latest write immediately), Eventual consistency eliminates the synchronization overhead required by stronger models, making it the fastest choice for global read performance.

Exam trap

The trap here is that candidates often confuse 'fastest read performance' with 'strongest consistency' and choose Strong or Bounded staleness, not realizing that the question explicitly allows stale reads, making Eventual the optimal choice for minimizing latency.

How to eliminate wrong answers

Option A is wrong because Strong consistency requires all replicas to acknowledge the write before the read is served, which introduces significant latency, especially across global regions. Option B is wrong because Bounded staleness still enforces a maximum lag (time or operations) before reads must reflect the latest write, adding coordination overhead that increases latency compared to Eventual. Option C is wrong because Session consistency guarantees monotonic reads and writes within a single client session, which requires session context tracking and can still introduce latency beyond the minimal possible with Eventual.

937
Multi-Selectmedium

Which THREE are benefits of using a data warehouse in Azure?

Select 3 answers
A.Optimizes query performance for analytical workloads
B.Centralizes data from multiple sources
C.Supports historical trend analysis
D.Stores unstructured data like videos
E.Enables real-time streaming analytics
AnswersA, B, C

Designed for fast complex queries.

Why this answer

A data warehouse in Azure (e.g., Azure Synapse Analytics) is optimized for analytical workloads through columnar storage and massively parallel processing (MPP), which significantly improves query performance on large datasets. This architecture is designed for read-heavy, aggregation-based queries typical of business intelligence and reporting, not for transactional or real-time operations.

Exam trap

The trap here is that candidates confuse the capabilities of a data warehouse with those of a data lake or real-time analytics service, assuming a data warehouse can handle any data type or latency requirement, when in fact it is purpose-built for structured, batch-oriented analytical workloads.

938
MCQeasy

You need to store event data from multiple sources in a schema-less format for later analysis. The data arrives as JSON and must be durable and highly available. Which Azure service should you use?

A.Azure Blob Storage
B.Azure SQL Database
C.Azure Event Hubs
D.Azure Data Factory
AnswerA

Blob Storage stores unstructured data durably.

Why this answer

Option A is correct because Azure Blob Storage provides durable, highly available storage for JSON blobs with schema-less format. Option B is wrong because Azure SQL Database requires a schema. Option C is wrong because Azure Event Hubs is for ingestion, not long-term storage.

Option D is wrong because Azure Data Factory is an orchestration service.

939
MCQeasy

A company needs to create a relational database in Azure that is compatible with existing SQL Server applications and provides built-in high availability without requiring configuration. Which service should they choose?

A.SQL Server on Azure Virtual Machines
B.Azure Database for MariaDB
C.Azure Cosmos DB
D.Azure SQL Database
AnswerD

PaaS with built-in high availability and SQL Server compatibility.

Why this answer

Azure SQL Database is a fully managed Platform-as-a-Service (PaaS) relational database that is built on the latest stable version of the Microsoft SQL Server engine, ensuring compatibility with existing SQL Server applications. It provides built-in high availability with a 99.99% SLA through automatic failover groups and zone-redundant configurations, requiring no manual setup or configuration from the user.

Exam trap

The trap here is that candidates often confuse IaaS (SQL Server on VMs) with PaaS (Azure SQL Database) and assume both require manual HA setup, or they mistakenly think MariaDB or Cosmos DB can be used as drop-in replacements for SQL Server applications.

How to eliminate wrong answers

Option A is wrong because SQL Server on Azure Virtual Machines is an Infrastructure-as-a-Service (IaaS) offering that requires manual configuration of SQL Server Always On Availability Groups or failover clustering to achieve high availability, not built-in. Option B is wrong because Azure Database for MariaDB is a fork of MySQL and is not compatible with SQL Server applications, which rely on T-SQL and SQL Server-specific features. Option C is wrong because Azure Cosmos DB is a NoSQL multi-model database service that does not support the relational model or T-SQL, making it incompatible with existing SQL Server applications.

940
MCQhard

Your company runs a global e-commerce platform that generates over 5 TB of clickstream data daily. The data is currently stored as raw CSV files in Azure Blob Storage. The data engineering team needs to transform this data into a star schema for business intelligence reporting. They want to use a serverless, code-first approach where they can write Python or SQL transformations. The transformed data should be stored in a format that optimizes query performance for Power BI. You also need to ensure that the solution can handle variable data volumes without manual scaling. Which Azure service should you use for the transformation?

A.Azure Stream Analytics
B.Azure Databricks
C.Azure Synapse Serverless SQL
D.Azure Data Factory
AnswerB

Serverless, code-first Spark environment supporting Python and SQL for large-scale transformations.

Why this answer

Azure Databricks is the correct choice because it provides a serverless, code-first environment where data engineers can write Python or SQL transformations using Apache Spark. It can handle variable data volumes without manual scaling, and it can output transformed data in optimized formats like Parquet, which significantly improves query performance for Power BI. This aligns perfectly with the requirement for a serverless, code-first approach and star schema transformation.

Exam trap

The trap here is that candidates often confuse Azure Data Factory as a transformation service, but it is actually an orchestration tool that requires a separate compute engine (like Databricks or Synapse) to perform the actual data transformations.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is designed for real-time stream processing, not batch transformations of large CSV files in Blob Storage, and it does not support writing Python transformations. Option C is wrong because Azure Synapse Serverless SQL is a SQL-only query engine that cannot execute Python transformations, and it is not a code-first transformation service. Option D is wrong because Azure Data Factory is primarily an orchestration and ETL/ELT pipeline service that uses visual pipelines or code snippets, but it is not designed for writing custom Python or SQL transformations on large datasets; it relies on compute engines like Databricks or Synapse for actual data processing.

941
MCQmedium

A healthcare organization needs to store medical imaging files (DICOM) that average 50 MB each, with some up to 500 MB. The images must be retained for 10 years for regulatory compliance. During the first year, images are accessed frequently by radiologists. After one year, access drops to once or twice a year. The organization also needs to support application-level encryption with customer-managed keys. The storage solution must be HIPAA eligible and provide high durability. You need to recommend a storage solution. Which Azure service and configuration should you choose?

A.Azure Blob Storage in the Hot tier with a lifecycle management policy to move blobs to Cool after 30 days and to Archive after 1 year, with customer-managed encryption keys.
B.Azure Cosmos DB with a large document size limit, storing images as base64-encoded strings.
C.Azure NetApp Files with a capacity pool and export policy, using snapshots for backup.
D.Azure Files with Azure File Sync to local servers, storing all images in the Premium tier for fast access.
AnswerA

Blob Storage supports large files, lifecycle management, and customer-managed keys. HIPAA eligible.

Why this answer

Option A is correct because Azure Blob Storage is HIPAA eligible, supports large blobs, and offers lifecycle management to move blobs from Hot to Cool to Archive tiers automatically. Customer-managed keys are supported. Option B is wrong because Azure Files has a file size limit of 4 TiB and is not optimized for DICOM images.

Option C is wrong because Azure NetApp Files is expensive and not necessary. Option D is wrong because Azure Cosmos DB is not designed for large binary files.

942
MCQmedium

A data analyst needs to create a real-time dashboard in Power BI that displays streaming data from Azure Event Hubs. The data must be refreshed every second. Which Power BI feature should they use?

A.Streaming dataset
B.Import mode with scheduled refresh
C.DirectQuery
D.Power BI Dataflows
AnswerA

Supports real-time data ingestion at sub-second intervals.

Why this answer

A is correct because Power BI's streaming dataset feature is specifically designed to handle real-time data ingestion and visualization with sub-second latency. It supports direct integration with Azure Event Hubs, allowing the dashboard to refresh every second without the need for scheduled refresh or query-based retrieval.

Exam trap

The trap here is that candidates often confuse DirectQuery with real-time capabilities, but DirectQuery is not designed for sub-second streaming updates and relies on query execution latency, whereas streaming datasets use a push-based model for true real-time refresh.

How to eliminate wrong answers

Option B is wrong because Import mode with scheduled refresh can only refresh data at intervals of 30 minutes or more (minimum 30 minutes for shared capacity, 1 minute for Premium), not every second, and it requires data to be stored and reloaded. Option C is wrong because DirectQuery sends queries to the source on each interaction, but it is not optimized for high-frequency streaming updates like every second; it is designed for interactive querying of large datasets, not real-time push-based streaming. Option D is wrong because Power BI Dataflows are used for data preparation and transformation in the cloud, not for real-time streaming ingestion or dashboard refresh at sub-minute intervals.

943
MCQmedium

A ride-sharing application needs to store real-time GPS location updates from drivers and passengers. The data is ingested as key-value pairs where the key is the user ID and the value is a timestamped location. The application requires low-latency reads and writes for millions of concurrent users, and the data model is simple with no need for complex queries or joins. Which Azure NoSQL database API should be used for this workload?

A.Azure Cosmos DB Table API
B.Azure Cosmos DB SQL (Core) API
C.Azure Cosmos DB for MongoDB API
D.Azure Cosmos DB for Apache Gremlin API
AnswerA

The Table API is designed for key-value storage with simple queries by partition key and row key, providing low-latency access at global scale. It is ideal for this type of high-throughput, simple data access pattern.

Why this answer

Azure Cosmos DB Table API is the correct choice because it provides a key-value store with low-latency reads and writes, ideal for high-throughput scenarios like real-time GPS updates. It supports a simple schema-less data model where each item is a key-value pair, and it offers single-millisecond latency at the 99th percentile for both reads and writes, meeting the requirement for millions of concurrent users without complex queries or joins.

Exam trap

The trap here is that candidates often choose the SQL (Core) API because it is the most versatile and well-known, but they overlook that the Table API is specifically optimized for simple key-value workloads with lower latency and cost, as it avoids the overhead of document parsing and indexing for complex queries.

How to eliminate wrong answers

Option B is wrong because Azure Cosmos DB SQL (Core) API is a document-oriented API that supports complex queries using SQL syntax, which is overkill for a simple key-value workload and introduces unnecessary overhead. Option C is wrong because Azure Cosmos DB for MongoDB API is designed for document data with MongoDB-compatible features like aggregation pipelines and secondary indexes, which are not needed for simple key-value pairs and may add latency. Option D is wrong because Azure Cosmos DB for Apache Gremlin API is a graph database API optimized for traversing relationships between entities, which is irrelevant for a simple key-value store with no need for joins or graph queries.

944
Multi-Selecteasy

Which TWO of the following are benefits of using a data lake architecture? (Choose two.)

Select 2 answers
A.ACID transactions for all operations
B.Optimized for high-frequency OLTP workloads
C.Ability to store raw data in its native format
D.Built-in data governance without additional tools
E.Support for structured, semi-structured, and unstructured data
AnswersC, E

Schema-on-read allows storing raw data.

Why this answer

Option C is correct because a data lake architecture is designed to store raw data in its native format without requiring schema-on-write transformations. This allows organizations to ingest data as-is from various sources, preserving the original structure and enabling schema-on-read flexibility for analytics.

Exam trap

The trap here is that candidates often confuse data lakes with data warehouses, assuming data lakes enforce ACID transactions and schema-on-write, or they overestimate built-in governance capabilities without realizing additional tools are required.

945
MCQeasy

A data analyst needs to create a report in Power BI that combines sales data from Azure SQL Database and inventory data from Azure Cosmos DB. The report should refresh daily. Which Power BI feature should be used to combine these data sources?

A.Quick Measures
B.Data Analysis Expressions (DAX)
C.Power BI Desktop
D.Power Query
AnswerD

Power Query is a data connection and transformation tool that allows connecting to multiple data sources and combining them into a single dataset.

Why this answer

Power Query allows connecting to multiple data sources and combining them through queries and merges. Option A is wrong because Quick Measures are for creating calculations within a single table. Option B is wrong because Power BI Desktop is the tool, not a feature for combining sources.

Option D is wrong because DAX is a formula language for creating calculations, not for data ingestion.

946
Multi-Selectmedium

Which THREE components are part of a typical modern data warehouse architecture on Azure? (Choose three.)

Select 3 answers
A.Azure Synapse Analytics
B.Power BI
C.Azure Stream Analytics
D.Azure Data Factory
E.Azure Data Lake Storage Gen2
AnswersA, D, E

The data warehouse engine.

Why this answer

Azure Synapse Analytics is a core component of a modern data warehouse architecture on Azure because it provides a unified analytics platform that combines big data and data warehousing capabilities. It enables T-SQL-based querying of both relational and non-relational data, integrating with Azure Data Lake Storage Gen2 for scalable storage and Azure Data Factory for orchestration.

Exam trap

The trap here is that candidates may confuse Power BI as a data warehouse component because it is commonly used with Azure Synapse, but it is a reporting/visualization layer, not part of the core storage, compute, or ingestion architecture.

947
MCQhard

A company ingests streaming data from IoT devices into Azure Event Hubs. They need to perform real-time analytics on the data, such as aggregating temperature readings over 5-minute windows and triggering alerts when thresholds are exceeded. They also want to store the processed data in a data warehouse for historical analysis. Which Azure service should they use for the real-time processing?

A.Azure Data Factory
B.Azure Stream Analytics
C.Azure Databricks
D.Azure Logic Apps
AnswerB

Azure Stream Analytics provides real-time stream processing with SQL-like queries, supports windowed aggregations, and can output to a data warehouse.

Why this answer

Azure Stream Analytics is purpose-built for real-time stream processing, allowing you to define SQL-like queries that aggregate data over tumbling or hopping windows (e.g., 5-minute windows) and trigger alerts based on thresholds. It integrates directly with Azure Event Hubs as a source and can output processed results to Azure Synapse Analytics or other data warehouses for historical storage, making it the correct choice for this real-time analytics workload.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Databricks, thinking that any Spark-based service is required for streaming, but Stream Analytics is the simpler, fully managed service specifically designed for real-time analytics on Azure Event Hubs without needing to manage clusters or write complex code.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service designed for batch data movement and transformation, not for real-time stream processing with sub-second latency. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform that can handle streaming data, but it requires more complex setup and is not the simplest or most direct service for simple windowed aggregations and alerting on IoT data from Event Hubs. Option D is wrong because Azure Logic Apps is a workflow automation service for integrating applications and services, not a stream processing engine capable of performing real-time analytics like time-windowed aggregations or threshold-based alerts.

948
MCQeasy

A startup wants to build a new web application with a relational database. They expect variable traffic and want to minimize costs by paying only for the compute resources they use. Which Azure SQL Database deployment option should they choose?

A.Provisioned compute tier
B.Elastic pool
C.SQL Server on Azure Virtual Machines
D.Serverless compute tier
AnswerD

Serverless automatically pauses and scales, billing per second of use.

Why this answer

Option C is correct because Azure SQL Database serverless automatically pauses during inactivity and bills per second for compute. Option A is wrong because provisioned computing charges for allocated resources regardless of usage. Option B is wrong because elastic pools are for multiple databases with pooled resources, not serverless.

Option D is wrong because SQL on VMs is IaaS, costs for VMs always on.

949
MCQhard

A gaming application requires a high-performance leaderboard that stores player scores and retrieves the top 10 scores quickly. The data does not require complex queries or a fixed schema. The leaderboard must support updates as new scores are submitted. Which Azure data store is most appropriate for this scenario?

A.Azure Cosmos DB with SQL API
B.Azure Table storage
C.Azure Cache for Redis
D.Azure Blob Storage
AnswerC

Correct. Redis provides sorted sets that allow efficient insertion and retrieval of top scores with low latency, making it ideal for leaderboards.

Why this answer

Azure Cache for Redis is the most appropriate choice because it provides an in-memory data structure store with native support for sorted sets (via the ZADD and ZRANGE commands), which are ideal for maintaining a real-time leaderboard. It can handle high-throughput score updates and retrieve the top 10 scores in O(log(N)) time, meeting the low-latency and performance requirements without needing a fixed schema.

Exam trap

The trap here is that candidates often choose Azure Cosmos DB (Option A) because they associate it with high performance and NoSQL, but they overlook that Azure Cache for Redis is purpose-built for in-memory, sub-millisecond operations like sorted sets, which are exactly what a leaderboard requires.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB with SQL API, while fast, is a fully managed NoSQL database that incurs higher latency and cost for simple leaderboard operations compared to an in-memory cache, and it requires provisioning throughput (RU/s) even for simple sorted set operations. Option B is wrong because Azure Table storage is a key-value store that does not support sorted sets or built-in ranking operations; retrieving the top 10 scores would require scanning all entities and sorting client-side, which is inefficient and slow. Option D is wrong because Azure Blob Storage is designed for unstructured large object storage (blobs) and does not support atomic score updates or real-time querying of individual scores; it would require downloading and rewriting entire files for each update, making it unsuitable for a high-performance leaderboard.

950
MCQmedium

A company stores large video files in Azure Blob Storage. The files are accessed frequently for the first 30 days after upload, then rarely for the next 180 days, and after that they are only needed for compliance but never accessed. The company wants to minimize storage costs while ensuring the files remain durable and accessible. Which strategy should they implement?

A.Store all files in the Cool access tier and apply lifecycle management to move files to the Archive tier after 30 days.
B.Store files initially in the Hot tier, then use lifecycle management to move files to Cool after 30 days and to Archive after 210 days.
C.Store files in the Archive tier from the beginning to maximize cost savings.
D.Store files in the Premium tier for fast access, then manually delete files after 30 days.
AnswerB

This strategy matches the access patterns: Hot for the frequent first 30 days, Cool for the rare next 180 days, and Archive for the never-accessed compliance period. Lifecycle management automates transitions, minimizing costs.

Why this answer

Option B is correct because it aligns the access patterns with the appropriate Azure Blob Storage access tiers: Hot for frequent initial access, Cool for reduced-cost infrequent access after 30 days, and Archive for the lowest-cost long-term retention after 210 days. Azure lifecycle management policies automate these transitions, ensuring durability and accessibility while minimizing costs.

Exam trap

The trap here is that candidates often assume the Cool tier is the cheapest option for long-term storage, overlooking the Archive tier's significantly lower cost for compliance data that is never accessed, and they may also forget that lifecycle management can automate multiple tier transitions over time.

How to eliminate wrong answers

Option A is wrong because storing files in the Cool tier from the start incurs higher early-access costs and a 30-day early deletion penalty, and moving to Archive after only 30 days ignores the 180-day period of rare access where Cool is more cost-effective than Archive. Option C is wrong because storing files in the Archive tier from the beginning makes them inaccessible for immediate frequent access (Archive requires rehydration, which can take hours) and violates the requirement for frequent access in the first 30 days. Option D is wrong because the Premium tier is designed for low-latency, high-transaction workloads (e.g., Azure Virtual Desktop) and is significantly more expensive than Hot or Cool; manually deleting files after 30 days loses the 180-day rare-access period and incurs unnecessary costs.

951
MCQmedium

A university's enrollment system stores data in a single table with columns: EnrollmentID, StudentID, StudentName, CourseID, CourseName, and Grade. Students can take multiple courses, and each course has multiple students. The team notices data redundancy: StudentName is repeated for each enrollment of the same student, and CourseName is repeated for each enrollment in the same course. They want to reduce redundancy while preserving the ability to query all enrollments with student and course details. What is the most appropriate design approach?

A.Keep the single table but use compression to reduce storage
B.Create a view that mirrors the single table but physically store data in separate normalized tables
C.Normalize the schema by creating separate Students, Courses, and Enrollments tables with foreign keys
D.Denormalize by adding more columns to the single table
AnswerC

Normalization partitions data into entities (Students, Courses) and the relationship (Enrollments). This removes duplication of StudentName and CourseName and enforces referential integrity.

Why this answer

Option C is correct because normalizing the schema into separate Students, Courses, and Enrollments tables eliminates data redundancy by storing each student's name and each course's name only once, while using foreign keys to maintain relationships. This preserves the ability to query all enrollments with student and course details via JOIN operations, which is the standard relational database design principle for reducing anomalies and storage overhead.

Exam trap

The trap here is that candidates confuse views with physical schema changes, thinking a view can magically eliminate redundancy without altering table structure, or they mistakenly believe compression is a substitute for proper normalization.

How to eliminate wrong answers

Option A is wrong because compression reduces storage size but does not eliminate logical data redundancy; repeated StudentName and CourseName values remain, leading to update anomalies and inconsistency risks. Option B is wrong because a view is a virtual table that does not physically store data; creating a view over a single table does not reduce redundancy, and physically storing data in separate normalized tables would require changing the underlying schema, not just adding a view. Option D is wrong because denormalization adds more columns, which increases redundancy and storage waste, contradicting the goal of reducing redundancy.

952
MCQhard

A data engineer loads raw log files into a storage system. The structure of the data is interpreted at the time of reading, allowing queries to apply schema on the fly without preprocessing. This approach is best described as:

A.Schema-on-write
B.Schema-on-read
C.Data warehouse
D.Data virtualization
AnswerB

Schema-on-read applies the data structure when the data is accessed, typical in data lake architectures.

Why this answer

Schema-on-read means the data is stored in its raw, unstructured form, and the schema is applied dynamically when the data is queried. This is exactly what happens when raw log files are loaded into a storage system like Azure Data Lake Storage and queried with tools like Azure Synapse Serverless SQL or Apache Spark, which infer the schema at query time without requiring preprocessing.

Exam trap

The trap here is confusing schema-on-read with data virtualization, as both involve querying data without moving it, but schema-on-read specifically refers to interpreting the structure at read time from raw files, not abstracting multiple sources.

How to eliminate wrong answers

Option A is wrong because schema-on-write requires defining and enforcing a schema before data is written, which contradicts the scenario of interpreting structure at read time. Option C is wrong because a data warehouse typically uses schema-on-write with a predefined, optimized schema for structured data, not raw log files with on-the-fly interpretation. Option D is wrong because data virtualization provides a unified view of data from multiple sources without moving it, but it does not specifically describe the schema-on-read approach where the structure is interpreted at query time from raw storage.

953
MCQmedium

Refer to the exhibit. An administrator creates a storage account with the Hot tier and then creates a container with the Cool tier. Data is uploaded to the container. Which access tier applies to the uploaded blobs by default?

A.Hot, because the storage account tier is Hot.
B.No tier; blobs are not charged until accessed.
C.Archive, because no tier is set.
D.Cool, because the container's default tier is Cool.
AnswerD

Blobs inherit the container's default access tier if not specified.

Why this answer

In Azure Blob Storage, when a container has a default access tier set, blobs uploaded without specifying a tier inherit the container's default tier. The exhibit shows the container's accessTier is Cool, so blobs will be Cool. Option A is wrong because the container's tier overrides the account tier for new blobs.

Option C is wrong because this is not a serverless scenario. Option D is wrong because the cool tier is set at the container level.

954
MCQeasy

Refer to the exhibit. A database administrator runs the KQL query shown to monitor failed logouts? Actually, the query filters for 'DATABASE_LOGOUT' events. What is the primary purpose of this query?

A.Identify users with the most failed login attempts
B.Determine which users are currently connected to the database
C.Identify users with the highest number of logout events per hour
D.Monitor the total number of database connections over time
AnswerC

The query summarizes logout events by user and hour, then shows top 10.

Why this answer

Option D is correct because the query counts logout events per user per hour. Option A is wrong because it counts logouts, not failed logins. Option B is wrong because it counts events, not active logins.

Option C is wrong because it's not about total connections but logout events.

955
MCQhard

A company stores customer data in Azure Table Storage. They need to query by a combination of partition key (customer region) and row key (customer ID). Which query pattern is most efficient?

A.Query using RowKey only
B.Scan all entities
C.Query using both PartitionKey and RowKey
D.Query using PartitionKey only
AnswerC

This is a point query that directly accesses the entity, the most efficient pattern in Table Storage.

Why this answer

Table Storage is optimized for point queries using both PartitionKey and RowKey. Using both keys allows direct access to the entity without scanning. Filtering only by RowKey across partitions results in a full table scan.

Using only PartitionKey retrieves all rows in that partition, which is less efficient. Scanning all entities is the worst.

956
MCQmedium

A data scientist needs to train a machine learning model using data stored in Azure Data Lake Storage. They want to use a collaborative notebook environment with built-in experiment tracking. Which Azure service should they use?

A.Azure Synapse Analytics
B.Azure Databricks
C.Azure Machine Learning
D.Azure Data Studio
AnswerC

Full ML lifecycle management including notebooks and tracking.

Why this answer

Azure Machine Learning provides a collaborative notebook environment (Jupyter notebooks) with built-in experiment tracking, model management, and automated ML capabilities. It is the correct choice for training machine learning models with data from Azure Data Lake Storage while tracking experiments.

Exam trap

Microsoft often tests the distinction between general analytics platforms (Synapse, Databricks) and dedicated ML services (Azure Machine Learning), where candidates mistakenly choose Databricks for its notebook interface without recognizing the specific requirement for built-in experiment tracking.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is an analytics service focused on big data and data warehousing, not a dedicated machine learning platform with built-in experiment tracking. Option B is wrong because Azure Databricks is a big data and AI platform based on Apache Spark, but it does not have native experiment tracking like Azure Machine Learning; it requires additional tools like MLflow for that purpose. Option D is wrong because Azure Data Studio is a database management and query tool for SQL Server and Azure SQL databases, not a collaborative notebook environment for machine learning with experiment tracking.

957
MCQmedium

A global social media platform allows users to like posts. The platform is designed to prioritize availability and partition tolerance over strong consistency across its globally distributed Azure Cosmos DB instance. When a user likes a post, the like count may not be immediately visible to all users, but it will eventually become consistent across all regions. Which consistency model does this application follow?

A.Strong consistency
B.Bounded staleness consistency
C.Session consistency
D.Eventual consistency
AnswerD

Eventual consistency is the weakest consistency level, prioritizing availability and low latency. It guarantees that if no new writes are made, all replicas will converge to the same state over time. This aligns with the platform's design goals.

Why this answer

Eventual consistency is the correct choice because the platform prioritizes availability and partition tolerance (AP from the CAP theorem) over strong consistency. In Azure Cosmos DB, eventual consistency guarantees that all replicas will converge to the same value over time without any ordering guarantees, which matches the scenario where like counts are not immediately visible but become consistent eventually.

Exam trap

The trap here is that candidates often confuse 'eventual consistency' with 'session consistency' because both involve delays, but session consistency is scoped to a single client session and provides stronger guarantees like monotonic reads, whereas eventual consistency has no such session-level guarantees and is the weakest model in Cosmos DB.

How to eliminate wrong answers

Option A is wrong because strong consistency guarantees linearizability—reads always return the most recent write—which would sacrifice availability and partition tolerance, contradicting the platform's design priorities. Option B is wrong because bounded staleness consistency allows a configurable lag (time or version count) but still provides a bounded guarantee, not the unbounded eventual convergence described. Option C is wrong because session consistency guarantees monotonic reads and writes within a single client session, but the scenario describes global, cross-region behavior without session scope.

958
Drag & Dropmedium

A data engineering team is designing an ELT (Extract, Load, Transform) pipeline using Azure Data Lake Storage Gen2. They will ingest raw sales data from multiple sources, store it in the data lake, transform it using Azure Databricks, and finally store the transformed data in a curated zone for analytics. Place the steps in the correct order for an ELT pipeline.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps
Order
1Step 1
2Step 2
3Step 3
4Step 4

Why this order

In an ELT pipeline, data is first extracted and loaded into the data lake, then transformed. This order follows the ELT paradigm: extract, load, transform.

959
MCQeasy

A company operates an online store where customers place orders and the system immediately updates inventory and records payments. This workload is best described as:

A.OLAP (Online Analytical Processing)
B.OLTP (Online Transaction Processing)
C.Batch processing
D.Data warehousing
AnswerB

Correct. The immediate processing of orders, inventory updates, and payments with ACID properties is a classic OLTP workload.

Why this answer

This workload is best described as OLTP because it involves real-time, high-frequency transactions that immediately update inventory and record payments. OLTP systems are designed for concurrent, atomic operations that maintain data integrity, which is exactly what an online store's order processing requires.

Exam trap

The trap here is that candidates confuse OLTP with batch processing because both involve data updates, but OLTP requires immediate, row-level transactions while batch processing defers updates to a scheduled window.

How to eliminate wrong answers

Option A is wrong because OLAP is used for complex analytical queries and aggregations over large historical datasets, not for real-time transactional updates. Option C is wrong because batch processing involves delayed, scheduled processing of data in bulk, whereas the scenario requires immediate updates. Option D is wrong because data warehousing is a repository for structured, historical data used for reporting and analysis, not for handling live transactional workloads.

960
MCQmedium

A media company stores video metadata in Azure Table Storage. Each video has a unique VideoID, and the application frequently queries for videos uploaded on a specific date. The current table uses PartitionKey = VideoID and RowKey = UploadDate. Queries filtering by UploadDate are slow and consume many transactions. Which design change will most optimize queries that retrieve all videos from a given date?

A.A. Use UploadDate as the RowKey only, but keep PartitionKey as VideoID.
B.B. Create a secondary index on UploadDate.
C.C. Change the PartitionKey to a date-based value (e.g., YYYY-MM-DD) and use VideoID as the RowKey.
D.D. Migrate the data to Azure Cosmos DB Table API for better indexing.
AnswerC

By using a date as the PartitionKey, all videos uploaded on the same date are stored in the same partition. A query filtering by date can then fetch all rows from that single partition using the PartitionKey, which is extremely fast and cost-efficient.

Why this answer

Option C is correct because Azure Table Storage queries are most efficient when the PartitionKey is used as the primary filter. By changing the PartitionKey to a date-based value (e.g., YYYY-MM-DD), queries for all videos uploaded on a specific date become partition scans, which are fast and consume minimal transactions. Using VideoID as the RowKey still allows unique identification of each video within that date partition.

Exam trap

The trap here is that candidates often assume secondary indexes (like in SQL databases) exist in Azure Table Storage, or they think changing RowKey alone is sufficient, failing to realize that PartitionKey is the only partition-level filter and must align with the query pattern.

How to eliminate wrong answers

Option A is wrong because keeping PartitionKey as VideoID and only using UploadDate as RowKey does not help; queries filtering by UploadDate would still require a full table scan since the PartitionKey is not used in the filter. Option B is wrong because Azure Table Storage does not support secondary indexes; it only provides a single index on (PartitionKey, RowKey). Option D is wrong because migrating to Azure Cosmos DB Table API would not inherently optimize the query; the same partition key design issue would persist, and the cost and complexity of migration are unnecessary when a simple schema redesign solves the problem.

961
MCQeasy

You are designing a data pipeline for a social media analytics platform. The pipeline needs to ingest posts from multiple sources (Twitter, Facebook) in real time, transform the data by adding sentiment scores, and store the results in a data store for later analysis. The transformation logic is simple and can be expressed as a SQL query. You want to minimize coding effort. Which Azure service should you use for the transformation step?

A.Azure Data Factory
B.Azure Databricks
C.Azure Functions
D.Azure Stream Analytics
AnswerD

SQL-based transformation for streaming data, low-code.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time data processing with SQL-like query language, allowing you to transform streaming data (e.g., from Twitter and Facebook) by adding sentiment scores using simple SQL expressions without writing custom code. It integrates natively with Azure Event Hubs or IoT Hub for ingestion and outputs to Azure SQL Database, Cosmos DB, or Blob Storage for analysis, minimizing coding effort.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (batch ETL) with real-time stream processing, or assume Azure Functions is simpler for SQL-like transformations, but Stream Analytics is the only service that combines real-time ingestion, SQL-based transformation, and minimal coding effort.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an orchestration and ETL service for batch data movement and transformation, not designed for real-time stream processing; it cannot handle sub-second latency or continuous SQL-based transformations on live streams. Option B is wrong because Azure Databricks is a big data analytics platform that requires writing Spark code (Python, Scala, or SQL) and managing clusters, which involves more coding effort than a simple SQL query on a stream. Option C is wrong because Azure Functions is a serverless compute service for event-driven code execution, but it requires writing custom code (e.g., C#, JavaScript) for each transformation, and it lacks native SQL-based stream processing capabilities, making it less efficient for simple SQL transformations on real-time data.

962
MCQhard

A manufacturing company collects sensor data from factory equipment as a continuous stream of events ingested into Azure Event Hubs. Additionally, the company receives daily inventory CSV files uploaded to Azure Data Lake Storage Gen2. The analytics team needs to build near real-time dashboards that combine streaming sensor data with batch inventory data, and also support historical reporting by querying data directly in the data lake using SQL without moving it. Which Azure service should they choose as the primary analytics platform?

A.Azure Synapse Analytics
B.Azure Stream Analytics
C.Azure Data Factory
D.Azure HDInsight with Spark
AnswerA

Correct. Azure Synapse Analytics unifies data ingestion, processing, and analytics, supporting both streaming (via Event Hubs integration) and batch (via PolyBase or serverless SQL pool to query data lake directly). It provides near real-time and historical analytics capabilities.

Why this answer

Azure Synapse Analytics is the correct choice because it provides a unified analytics platform that can ingest both real-time streaming data from Azure Event Hubs and batch data from Azure Data Lake Storage Gen2. Its SQL Serverless feature allows querying data directly in the data lake using T-SQL without moving it, enabling near real-time dashboards and historical reporting in a single service.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics as the primary platform for streaming data, overlooking that Synapse Analytics provides the unified query layer needed to combine streaming and batch data for both dashboards and historical reporting without additional services.

How to eliminate wrong answers

Option B is wrong because Azure Stream Analytics is a real-time stream processing engine that cannot directly query batch data in Azure Data Lake Storage Gen2 using SQL without moving it, nor does it support unified batch and streaming analytics for historical reporting. Option C is wrong because Azure Data Factory is an ETL and orchestration service for data movement and transformation, not an analytics platform for querying or building dashboards. Option D is wrong because Azure HDInsight with Spark requires data to be loaded into Spark DataFrames for querying, and it does not provide a serverless SQL endpoint to query data directly in the data lake without moving it, adding complexity for near real-time dashboards.

963
MCQeasy

Your company is migrating an on-premises SQL Server data warehouse to Azure. The solution must support both historical analytics and real-time reporting. Which Azure service should you recommend as the primary data store?

A.Azure Analysis Services
B.Azure Data Lake Storage Gen2
C.Azure SQL Database
D.Azure Synapse Analytics
AnswerD

Purpose-built cloud data warehouse with support for real-time analytics.

Why this answer

Azure Synapse Analytics is the correct choice because it is a cloud-native analytics service that unifies big data and data warehousing, supporting both historical analytics (via dedicated SQL pools for large-scale relational data warehousing) and real-time reporting (via serverless SQL pools or Apache Spark pools for streaming and interactive queries). It is designed to handle the migration of an on-premises SQL Server data warehouse while providing integrated capabilities for batch and real-time workloads.

Exam trap

The trap here is that candidates often confuse Azure SQL Database (an OLTP service) with a data warehouse solution, overlooking that Synapse Analytics is the dedicated Azure service for hybrid transactional/analytical processing (HTAP) and large-scale analytics workloads.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a semantic modeling and OLAP engine that provides curated data models for business intelligence, not a primary data store for raw historical and real-time data. Option B is wrong because Azure Data Lake Storage Gen2 is a scalable storage layer for big data analytics, but it lacks native SQL-based data warehousing and real-time query capabilities without additional compute services like Synapse or Databricks. Option C is wrong because Azure SQL Database is a transactional OLTP database optimized for online transaction processing, not designed for large-scale historical analytics or mixed workloads requiring both batch and real-time reporting.

964
MCQmedium

You are reviewing the Azure Data Factory mapping data flow configuration above. Which transformation is missing to ensure that only sales from the current year are loaded?

A.Derived column transformation
B.Aggregate transformation
C.Window transformation
D.Filter transformation
AnswerD

Filter can limit rows to current year.

Why this answer

The Filter transformation is used in mapping data flows to restrict rows based on a condition. To load only sales from the current year, you would apply a filter condition such as `year(SalesDate) == year(currentDate())`, which removes all rows not matching the current year. This is the correct transformation for row-level filtering.

Exam trap

The trap here is that candidates confuse column-level transformations (Derived column) with row-level filtering, assuming that extracting the year automatically filters data, whereas Filter is the only transformation that actually removes rows.

How to eliminate wrong answers

Option A is wrong because the Derived column transformation creates or modifies columns (e.g., extracting the year from a date), but it does not remove rows; it only adds or alters column values. Option B is wrong because the Aggregate transformation groups rows and computes summary statistics (e.g., sum, count), which would lose individual sales row details and is not designed for row filtering. Option C is wrong because the Window transformation performs calculations over a set of rows (e.g., running totals, ranking) without eliminating rows from the output.

965
Multi-Selectmedium

Which TWO of the following are valid relational database services in Azure?

Select 2 answers
A.Azure Data Lake Storage
B.Azure SQL Database
C.Azure Cosmos DB
D.Azure Cache for Redis
E.Azure Database for MySQL
AnswersB, E

Relational database service for SQL Server workloads.

Why this answer

Options A and D are correct. Azure SQL Database is a relational database as a service. Azure Database for MySQL is a relational database service.

Option B is wrong because Azure Cosmos DB is a NoSQL database. Option C is wrong because Azure Data Lake Storage is a storage service. Option E is wrong because Azure Cache for Redis is an in-memory cache.

966
Multi-Selecteasy

Which TWO Azure services are primarily used for batch processing of large data volumes?

Select 2 answers
A.Azure Databricks
B.Azure Data Lake Storage
C.Azure Synapse Pipelines
D.Azure Stream Analytics
E.Azure Logic Apps
AnswersA, C

Databricks is a unified analytics platform for batch and streaming.

Why this answer

Azure Databricks is correct because it provides an Apache Spark-based analytics platform that can process large volumes of data in batch mode using distributed computing. It allows you to run ETL jobs, transformations, and machine learning pipelines on data stored in Azure Data Lake Storage or other sources, making it ideal for batch processing.

Exam trap

The trap here is that candidates often confuse storage services (like Azure Data Lake Storage) with processing services, or they mistakenly think stream processing tools (like Stream Analytics) can handle batch workloads, when in fact batch processing requires tools designed for static, large-scale data transformations.

967
MCQmedium

A healthcare application stores patient medical records as JSON documents. Each document contains a variable set of fields depending on the patient's conditions. The application needs to query records by any field and support high write throughput. Which Azure data store is most appropriate?

A.Azure Blob Storage
B.Azure Synapse Analytics
C.Azure Cosmos DB with SQL API
D.Azure Table Storage
AnswerC

Correct because Cosmos DB supports schema-agnostic JSON documents, automatic indexing, and high throughput, ideal for this use case.

Why this answer

Azure Cosmos DB with SQL API is the most appropriate choice because it natively supports storing and querying JSON documents with variable schemas, enabling efficient queries on any field. Its multi-model architecture and configurable indexing policies allow high write throughput while maintaining low-latency queries, which is critical for healthcare applications with dynamic patient records.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key-value capabilities with JSON document support, but Table Storage does not allow querying on arbitrary fields within a JSON document—it only supports queries on the partition key and row key, making it unsuitable for variable-schema medical records.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is an object store for unstructured binary data (e.g., images, backups) and does not support native JSON querying or indexing on arbitrary fields. Option B is wrong because Azure Synapse Analytics is a distributed analytics service designed for large-scale data warehousing and batch processing, not for high-throughput transactional writes or real-time queries on individual JSON documents. Option D is wrong because Azure Table Storage is a NoSQL key-value store that does not support querying on arbitrary fields within JSON documents—it requires a fixed partition key and row key schema, and lacks native JSON document querying capabilities.

968
MCQmedium

A retail company stores historical sales data from multiple stores in Azure Data Lake Storage Gen2 as CSV files. They need to run complex SQL queries that join and aggregate data across multiple files to generate weekly sales reports. They want a serverless query service that can directly query the data in the lake without loading it into a separate database. Which Azure service should they use?

A.Azure SQL Database
B.Azure Synapse Serverless SQL pool
C.Azure Stream Analytics
D.Azure Data Factory
AnswerB

Azure Synapse Serverless SQL pool enables serverless querying of data stored in Azure Data Lake Storage (Parquet, CSV, etc.) without needing to load data into a separate store. It scales automatically and charges per query.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it provides a serverless, on-demand SQL query engine that can directly query CSV files stored in Azure Data Lake Storage Gen2 using T-SQL syntax. It supports complex joins and aggregations across multiple files without requiring data movement or loading into a separate database, making it ideal for ad-hoc reporting on data lakes.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming both can query data lakes directly, but Azure SQL Database requires data to be imported first, while the serverless SQL pool is purpose-built for on-demand querying of data lake files.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database that requires data to be loaded into its storage; it cannot directly query CSV files in a data lake without an ETL process. Option C is wrong because Azure Stream Analytics is designed for real-time stream processing (e.g., from Event Hubs or IoT Hub) and is not suited for batch SQL queries on historical CSV files in a data lake. Option D is wrong because Azure Data Factory is an orchestration and ETL/ELT service used to move and transform data, not a query engine that can run interactive SQL queries directly against files in the lake.

969
MCQmedium

A mobile game company stores player scores in Azure Cosmos DB. Each document contains the fields PlayerID (unique to the player), GameID, Score, and Timestamp. The most common query is: 'Retrieve all scores for a specific GameID, ordered by Score descending.' Which property should be chosen as the partition key to minimize Request Unit (RU) consumption?

A.PlayerID
B.GameID
C.Score
D.Timestamp
AnswerB

GameID is the filter in the most common query. Using it as the partition key keeps all scores for the same game on one partition, making the query single-partition and efficient.

Why this answer

GameID is the correct partition key because the most common query filters on GameID, and using it as the partition key ensures that all documents for a given GameID are stored in the same physical partition. This allows the query to target a single partition, minimizing cross-partition fan-out and reducing Request Unit (RU) consumption. A partition key that matches the query filter is essential for efficient, low-latency reads in Azure Cosmos DB.

Exam trap

The trap here is that candidates often choose PlayerID because it is unique and seems like a natural key, but they overlook that a partition key must align with the most common query filter to avoid cross-partition queries and high RU costs.

How to eliminate wrong answers

Option A (PlayerID) is wrong because PlayerID is unique per player, so each partition would contain only one document, causing every query to fan out across all partitions and consume high RUs. Option C (Score) is wrong because Score is a high-cardinality, frequently changing value, which would lead to hot partitions and inefficient range queries; it also does not align with the query filter on GameID. Option D (Timestamp) is wrong because Timestamp is a monotonically increasing value that would create hot partitions (all writes to the latest partition) and does not group data by GameID, forcing cross-partition queries.

970
Drag & Dropmedium

Drag and drop the steps to configure a geo-replication for Azure Cosmos DB in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps
Order

Why this order

Geo-replication is configured by selecting regions and enabling multi-region writes if needed, then saving to initiate replication.

971
MCQhard

An e-commerce application uses Azure SQL Database. During flash sales, the database experiences high CPU usage and query timeouts. The team needs a solution that can handle sudden increases in demand without downtime. Which scaling option should they choose?

A.Read scale-out
B.Hyperscale service tier
C.Elastic Pool
D.Geo-replication
AnswerB

Correct. Hyperscale can scale compute resources up to 100 vCores within seconds without downtime, ideal for handling sudden demand spikes.

Why this answer

The Hyperscale service tier is designed for high-performance, rapidly growing workloads that require instant scalability. It separates compute from storage, allowing compute nodes to be added or scaled up in seconds without downtime, making it ideal for handling sudden spikes in demand like flash sales.

Exam trap

The trap here is confusing 'scaling for demand spikes' with 'scaling for read-heavy workloads' or 'managing multiple databases,' leading candidates to incorrectly choose Read scale-out or Elastic Pool instead of the compute-scalable Hyperscale tier.

How to eliminate wrong answers

Option A is wrong because Read scale-out is a feature for offloading read-only queries to a replica, not for handling high CPU usage or write-heavy transactional spikes. Option C is wrong because Elastic Pools are designed for managing multiple databases with varying, predictable usage patterns, not for a single database experiencing sudden, extreme spikes. Option D is wrong because Geo-replication provides disaster recovery and read-scale capabilities, but does not directly address compute scaling or CPU bottlenecks during a demand surge.

972
MCQmedium

An e-commerce company uses Azure SQL Database for its product catalog. During promotional events, the database experiences unpredictable spikes in traffic. The company wants a solution that automatically adjusts compute resources based on demand without manual intervention. Which Azure SQL Database option should they use?

A.A) Read replicas
B.B) Active geo-replication
C.C) Serverless compute tier
D.D) Elastic pool
AnswerC

The serverless tier auto-scales compute and bills per second, perfect for unpredictable workloads.

Why this answer

The Serverless compute tier for Azure SQL Database automatically scales compute resources based on workload demand and pauses the database during idle periods, charging only for storage and used compute. This matches the requirement for handling unpredictable traffic spikes without manual intervention, as it provides instant scaling and cost efficiency for intermittent workloads.

Exam trap

The trap here is that candidates confuse the Serverless compute tier with elastic pools, assuming both handle scaling, but elastic pools scale resources across multiple databases, not automatically for a single database's unpredictable spikes.

How to eliminate wrong answers

Option A is wrong because read replicas are designed to offload read-only queries for reporting or analytics, not to automatically scale compute resources for write-heavy or unpredictable transactional spikes. Option B is wrong because active geo-replication provides disaster recovery and read-scale capabilities by maintaining synchronized replicas in different regions, but it does not dynamically adjust compute resources based on demand. Option D is wrong because elastic pools are used to share resources among multiple databases with predictable, aggregated usage patterns, not to automatically scale a single database's compute in response to unpredictable spikes.

973
MCQmedium

A data engineering team needs to build a batch processing pipeline that transforms large volumes of sales data stored in Azure Data Lake Storage Gen2. The transformations include aggregations and joins, and the output should be stored back in the data lake as Parquet files. The team wants a serverless compute option that automatically scales and charges per second. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool
B.Azure Databricks with auto-scaling clusters
C.Azure Data Factory with mapping data flows
D.Azure Stream Analytics
AnswerB

Azure Databricks offers auto-scaling clusters and serverless compute options that scale down to zero, charging per second for the resources consumed, ideal for batch transformations on data lakes.

Why this answer

Azure Databricks with auto-scaling clusters is the correct choice because it provides a serverless, Apache Spark-based compute platform that automatically scales resources based on workload demand and charges per second (via serverless or low-concurrency modes). It is ideal for batch processing large volumes of data in Azure Data Lake Storage Gen2, supporting complex transformations like aggregations and joins, and can write output directly as Parquet files.

Exam trap

The trap here is that candidates often confuse Azure Data Factory mapping data flows (which is also serverless and scales automatically) with a Spark-based batch processing service, but Data Factory charges per execution duration and lacks native Spark API support for complex joins and aggregations at the same performance level as Databricks.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool is a provisioned, non-serverless data warehouse that requires manual scaling and charges per hour, not per second, and is optimized for SQL-based analytics rather than Spark-based batch transformations. Option C is wrong because Azure Data Factory with mapping data flows is a code-free ETL service that scales to meet demand but charges per Data Flow Activity execution (based on cluster size and duration), not per second, and is less suited for complex Spark-native transformations like joins and aggregations at scale. Option D is wrong because Azure Stream Analytics is a real-time stream processing service, not designed for batch processing of large volumes of static data in Data Lake Storage Gen2, and it charges per streaming unit per hour.

974
MCQhard

A company stores IoT temperature readings in Azure Cosmos DB using the NoSQL API. Each document contains: DeviceID, Timestamp, Temperature, Location. Data is ingested at a rate of 10,000 documents per second from thousands of devices. The most common query is 'Get all readings for a specific DeviceID in the last hour.' Which partition key should be chosen to avoid hot partitions while still supporting the query efficiently?

A.DeviceID
B.Timestamp (e.g., per minute)
C.Location
D.A synthetic key combining DeviceID and Timestamp (e.g., DeviceID_yyyy-MM-dd-HH)
AnswerD

This distributes writes across partitions because the suffix changes each hour, preventing a single device from overloading one partition. For the query 'get readings for DeviceID in the last hour', the application can compute the exact partition key(s) for the relevant hour(s) and perform efficient point or limited cross-partition queries.

Why this answer

Option D is correct because a synthetic key combining DeviceID and Timestamp (e.g., DeviceID_yyyy-MM-dd-HH) ensures that data for a specific device is distributed across multiple physical partitions based on the hour, preventing a single partition from becoming a hot spot. This design still supports the most common query efficiently by allowing Cosmos DB to route the query to only the partitions containing the relevant hour's data, using the partition key in the WHERE clause.

Exam trap

The trap here is that candidates often choose DeviceID (Option A) because it seems natural for the query, but they overlook the hot partition problem caused by high-ingestion devices, failing to realize that partition key choice must balance query efficiency with write distribution.

How to eliminate wrong answers

Option A is wrong because using DeviceID alone as the partition key would cause a hot partition for any device that generates a high volume of data (e.g., thousands of readings per second), as all data for that device would be stored on a single physical partition, throttling performance. Option B is wrong because using Timestamp (e.g., per minute) would scatter data for a single device across many partitions, making the query 'Get all readings for a specific DeviceID in the last hour' require a cross-partition query (fan-out) that scans all partitions, which is inefficient and costly. Option C is wrong because Location is likely to have low cardinality (e.g., only a few distinct values), leading to uneven data distribution and hot partitions, and it does not directly support the query filtering by DeviceID and time range.

975
MCQhard

A manufacturing company collects sensor data from thousands of IoT devices. Each reading contains a device ID, timestamp, value, and device-specific measurement fields. The company needs to analyze the data in real time to detect anomalies and trigger alerts. They also need to store the same data for historical batch analysis to identify long-term trends. Which architecture pattern best describes this combination of data processing approaches?

A.Batch processing only
B.Stream processing only
C.Lambda architecture
D.Data lake
AnswerC

Lambda architecture combines batch and stream processing, allowing both real-time anomaly detection and historical batch analysis on the same data set.

Why this answer

The Lambda architecture is the correct pattern because it combines both stream processing for real-time anomaly detection and alerting, and batch processing for historical analysis of long-term trends. This architecture uses a speed layer for low-latency stream processing (e.g., Apache Kafka, Azure Stream Analytics) and a batch layer for comprehensive, accurate historical computations (e.g., Azure Data Lake, Apache Spark). The serving layer then merges results from both paths to provide a unified view.

Exam trap

The trap here is that candidates confuse a storage architecture (data lake) with a processing architecture pattern, or mistakenly think that either stream or batch processing alone can satisfy both real-time and historical requirements.

How to eliminate wrong answers

Option A is wrong because batch processing alone cannot handle real-time anomaly detection and alerting, as it processes data in large, scheduled intervals with high latency. Option B is wrong because stream processing alone is not designed for efficient historical batch analysis over long periods, as it focuses on low-latency, in-memory computations and typically does not retain full historical data for reprocessing. Option D is wrong because a data lake is a storage repository for raw data in its native format, not a processing architecture pattern that combines real-time and batch analytics.

Page 12

Page 13 of 14

Page 14