Microsoft Azure Data Fundamentals DP-900 (DP-900) — Questions 676750

982 questions total · 14pages · All types, answers revealed

Page 9

Page 10 of 14

Page 11
676
MCQhard

A data lake stores Parquet files in Azure Data Lake Storage Gen2, organized by date (e.g., /data/2023/01/15/). Analysts frequently run queries that filter on a specific date range. Which feature of Azure Data Lake Storage Gen2 directly enables efficient directory-level operations like renaming or moving entire date partitions without rewriting files?

A.Hierarchical namespace
B.Blob soft delete
C.Change feed
D.Immutable storage
AnswerA

Correct. The hierarchical namespace enables directory-level atomic operations, allowing efficient reorganization of partitions (e.g., moving a month's worth of data) without scanning or copying individual files.

Why this answer

The hierarchical namespace feature in Azure Data Lake Storage Gen2 enables true directory-level operations, such as renaming or moving entire partitions (e.g., /data/2023/01/15/), by treating directories as first-class objects. This allows atomic metadata operations without rewriting or copying the underlying Parquet files, which is essential for efficient partition management in data lake scenarios.

Exam trap

The trap here is that candidates often confuse the hierarchical namespace with general blob storage features like soft delete or change feed, mistakenly thinking those features provide directory-level management, when in fact only the hierarchical namespace enables atomic partition operations.

How to eliminate wrong answers

Option B is wrong because blob soft delete is a data protection feature that preserves deleted blobs for a retention period, not a mechanism for directory-level rename or move operations. Option C is wrong because the change feed provides a log of blob creation, modification, and deletion events for auditing or incremental processing, but it does not enable efficient directory-level operations. Option D is wrong because immutable storage (WORM policy) prevents blobs from being modified or deleted for a specified period, which would actually block the ability to rename or move partitions, not enable it.

677
MCQmedium

A company has multiple independent databases for different business units, each with low to moderate usage and varying workload patterns. They want to consolidate these databases into a single Azure SQL Database deployment option to share resources and reduce costs, while ensuring that databases do not starve each other of resources. Which Azure SQL Database deployment option should they choose?

A.Elastic pool
B.Single database (DTU model)
C.Managed Instance
D.Database per server (hyperscale)
AnswerA

Correct. Elastic pools provide a set of shared resources for multiple databases, optimizing cost and performance for databases with varying load patterns.

Why this answer

Elastic pools are designed for exactly this scenario: multiple databases with low to moderate usage and varying workload patterns. They allow databases to share a fixed pool of resources (eDTUs or vCores) while using built-in resource governance to prevent any single database from starving others, thus optimizing cost and performance.

Exam trap

The trap here is that candidates often choose Single database (DTU model) thinking it is the simplest option, but they overlook the cost and resource-sharing benefits of elastic pools for consolidating multiple low-usage databases with varying workloads.

How to eliminate wrong answers

Option B (Single database DTU model) is wrong because it allocates dedicated resources per database, which would be more expensive and wasteful for low-usage databases, and does not provide resource sharing or isolation across databases. Option C (Managed Instance) is wrong because it is a full SQL Server instance with dedicated resources, designed for lift-and-shift migrations, not for sharing resources across multiple independent databases with varying patterns. Option D (Database per server hyperscale) is wrong because hyperscale is a single-database tier for very large databases (up to 100 TB) with high throughput needs, not for consolidating multiple small databases, and it does not offer resource pooling across databases.

678
MCQeasy

A bank's online transaction processing system records every withdrawal and deposit in a database. The bank also runs a monthly report that summarizes total transactions per customer. Which statement correctly identifies these two workloads?

A.Both workloads are OLTP.
B.The transaction recording is OLTP, and the monthly report is OLAP.
C.The transaction recording is OLAP, and the monthly report is OLTP.
D.Both workloads are OLAP.
AnswerB

Correct. OLTP handles the real-time transaction recording, while OLAP analyzes the aggregated historical data for reporting.

Why this answer

The transaction recording system is an OLTP (Online Transaction Processing) workload because it handles individual, real-time transactions (withdrawals and deposits) with high concurrency and low latency. The monthly report summarizing total transactions per customer is an OLAP (Online Analytical Processing) workload because it aggregates historical data for reporting and analysis, typically using batch processing or columnar storage. Option B correctly pairs each workload with its appropriate processing type.

Exam trap

The trap here is that candidates confuse the purpose of the workload—thinking that any database operation is OLTP—and fail to recognize that analytical reporting, even if run on the same database, is an OLAP workload due to its aggregate nature and different performance requirements.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies both workloads as OLTP, ignoring that the monthly report involves aggregation and analysis, not real-time transaction processing. Option C is wrong because it reverses the roles, claiming transaction recording is OLAP (which is for analytical queries on large datasets) and the monthly report is OLTP (which is for transactional operations). Option D is wrong because it classifies both as OLAP, failing to recognize that the transaction recording system requires immediate, atomic writes characteristic of OLTP.

679
MCQeasy

A retail company stores product inventory data in a SQL database, customer reviews as JSON files, and product images as JPEG files. Which of the following accurately describes the types of data stored?

A.A. Only structured data is stored because the SQL database contains the primary records.
B.B. Only semi-structured and unstructured data is stored because JSON and images are not purely structured.
C.C. Only unstructured data is stored because images have no predefined schema.
D.D. Structured, semi-structured, and unstructured data are stored.
AnswerD

Correct. The SQL database contains structured data (rows and columns), JSON files contain semi-structured data (key-value pairs with some schema flexibility), and JPEG files contain unstructured data (no inherent structure). All three categories are represented.

Why this answer

The company stores product inventory data in a SQL database, which enforces a fixed schema (tables, rows, columns) and is therefore structured data. Customer reviews stored as JSON files are semi-structured because they have a flexible schema (key-value pairs) but no rigid table structure. Product images as JPEG files are unstructured because they lack any predefined schema or organization.

Option D correctly identifies that all three data types are present.

Exam trap

The trap here is that candidates often assume 'data type' is determined by the storage medium (e.g., SQL = structured only) rather than recognizing that a single system can store multiple data types, leading them to overlook the presence of semi-structured and unstructured data.

How to eliminate wrong answers

Option A is wrong because it incorrectly claims only structured data is stored, ignoring the JSON files (semi-structured) and JPEG images (unstructured). Option B is wrong because it omits the structured data from the SQL database, which is clearly structured. Option C is wrong because it states only unstructured data is stored, ignoring the structured SQL data and semi-structured JSON data.

680
MCQeasy

A company needs to store large amounts of unstructured data (videos and images) for a media streaming application. The data must be accessible via HTTP/HTTPS and support tiered storage for cost optimization. Which Azure storage solution should they choose?

A.Azure Disk Storage
B.Azure Blob Storage
C.Azure Table Storage
D.Azure Files
AnswerB

Azure Blob Storage supports unstructured data, HTTP/HTTPS access, and tiered storage.

Why this answer

Azure Blob Storage is designed for unstructured data, supports HTTP access, and offers hot, cool, and archive tiers. Option A (Azure Files) is for file shares; Option B (Azure Disk Storage) is for VM disks; Option D (Azure Table Storage) is for NoSQL key-value data.

681
MCQmedium

Your company runs a sales analytics dashboard on Power BI that refreshes every hour from Azure Synapse Analytics. During peak hours, the dashboard refresh fails with a 'timeout' error. Which action should you take FIRST to resolve the issue?

A.Scale up the Azure Synapse dedicated SQL pool to handle more concurrent queries.
B.Configure the dashboard to use DirectQuery instead of Import mode.
C.Implement incremental refresh in Power BI to refresh only changed data.
D.Export the data to CSV files and load into Power BI from Azure Blob Storage.
AnswerC

Incremental refresh reduces the data volume per refresh, lowering the load and timeout risk.

Why this answer

Option C is correct because implementing incremental refresh in Power BI reduces the amount of data loaded during each refresh cycle, which directly addresses timeout errors by limiting the refresh to only changed or new data rather than the entire dataset. This is the most efficient first step to reduce refresh duration without changing the underlying architecture or data source connection mode.

Exam trap

The trap here is that candidates often assume scaling the source (Option A) or changing the connection mode (Option B) is the immediate fix, but the DP-900 exam emphasizes that incremental refresh is the primary technique to optimize refresh performance for large datasets without altering the underlying infrastructure.

How to eliminate wrong answers

Option A is wrong because scaling up the Azure Synapse dedicated SQL pool increases compute resources for concurrent queries but does not address the root cause of a Power BI refresh timeout, which is typically due to the volume of data being transferred or the complexity of the refresh operation. Option B is wrong because switching to DirectQuery would eliminate the scheduled refresh entirely but would introduce query-time latency and potentially degrade dashboard performance during peak hours, as each visual would query the source directly. Option D is wrong because exporting data to CSV files and loading from Azure Blob Storage adds unnecessary complexity, introduces data staleness, and does not solve the timeout issue—it merely shifts the data movement bottleneck.

682
MCQmedium

An organization has a data lake that contains both structured and unstructured data. They need to catalog the data assets and enable data discovery for users. Which Azure service should they use?

A.Azure Data Factory
B.Microsoft Purview
C.Azure Synapse Analytics
D.Azure Data Lake Storage
AnswerB

Data governance and cataloging service.

Why this answer

Microsoft Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. It provides automated data discovery, sensitive data classification, and end-to-end data lineage, making it the correct choice for cataloging both structured and unstructured data assets in a data lake and enabling data discovery for users.

Exam trap

The trap here is that candidates often confuse Azure Data Factory’s data movement and transformation capabilities with data cataloging, or they assume Azure Synapse Analytics includes a built-in catalog, when in fact Microsoft Purview is the dedicated service for data discovery and governance.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a data integration and orchestration service used to create, schedule, and manage ETL/ELT pipelines; it does not provide a persistent catalog or data discovery capabilities. Option C is wrong because Azure Synapse Analytics is an analytics service that combines big data and data warehousing, offering querying and processing engines, but it lacks the dedicated data cataloging and governance features needed for asset discovery across a data lake. Option D is wrong because Azure Data Lake Storage is a scalable and secure data lake storage solution for big data analytics; it is the underlying storage layer and does not include cataloging or discovery functionality.

683
MCQmedium

A travel booking application stores booking data in Azure Cosmos DB using the NoSQL API. Each booking document contains: BookingID (unique), UserID, Destination, TravelDate, Price. The most common query is: 'Retrieve all bookings for a specific UserID, sorted by TravelDate descending.' To minimize Request Unit (RU) consumption, which property should be chosen as the partition key?

A.BookingID
B.UserID
C.Destination
D.TravelDate
AnswerB

UserID is the filter in the common query. With UserID as partition key, all bookings for a user reside in one partition, making queries efficient and reducing RU consumption.

Why this answer

UserID is the correct partition key because the most common query filters on UserID, and Cosmos DB routes queries to the exact physical partition(s) containing that UserID's data. This minimizes cross-partition fan-out, reducing RU consumption. A partition key should align with the primary query filter to enable efficient point-read or single-partition query execution.

Exam trap

The trap here is that candidates often pick a high-cardinality key like BookingID or a date-based key like TravelDate, thinking uniqueness or time-ordering helps, but they ignore that the partition key must match the most frequent query filter to avoid cross-partition queries and high RU costs.

How to eliminate wrong answers

Option A (BookingID) is wrong because it would scatter each booking across partitions, forcing every query to fan out to all partitions to find bookings for a specific UserID, increasing RU cost. Option C (Destination) is wrong because queries filter by UserID, not Destination; using Destination would still require a cross-partition query unless the filter also included Destination, and it would not collocate all bookings for a single user. Option D (TravelDate) is wrong because it would spread a single user's bookings across many partitions (one per date), again causing cross-partition queries and high RU consumption for the common query pattern.

684
MCQeasy

A company stores customer reviews for an e-commerce site. Each review contains a product ID, user ID, rating, and optional comments and images. The reviews are written once and rarely updated. The company needs to query reviews by product ID with low latency and also perform simple key-value lookups. They want a cost-effective, serverless solution that requires no scaling management. Which Azure data store should they choose?

A.Azure Cosmos DB SQL API
B.Azure Table Storage
C.Azure Blob Storage
D.Azure SQL Database
AnswerB

Azure Table Storage is a serverless, cost-effective key-value store ideal for storing and querying semi-structured data by a key (product ID). It meets the latency and budget requirements.

Why this answer

Azure Table Storage is a cost-effective, serverless NoSQL key-value store that supports simple key-value lookups and querying by partition key (e.g., ProductID) with low latency. It requires no scaling management, as it automatically scales based on demand, and is ideal for immutable, rarely-updated data like customer reviews. The pay-per-request pricing model makes it highly cost-effective for this workload.

Exam trap

The trap here is that candidates often choose Azure Cosmos DB for any NoSQL scenario, overlooking that Azure Table Storage is the simpler, more cost-effective serverless option for basic key-value workloads without global distribution or complex querying needs.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB SQL API is a globally distributed, multi-model database that is overkill and more expensive for simple key-value lookups on rarely-updated data, and it requires throughput provisioning (RU/s) rather than being truly serverless with no scaling management. Option C is wrong because Azure Blob Storage is designed for unstructured binary data (images, videos) and does not support efficient key-value lookups or querying by product ID with low latency; it lacks native indexing for such queries. Option D is wrong because Azure SQL Database is a relational database that requires schema management, scaling configuration, and is not serverless by default (unless using the serverless tier, which still incurs compute costs and is not optimized for simple key-value lookups).

685
MCQeasy

A social media platform stores user posts as JSON documents. Each document contains text content, image URLs, timestamps, and user tags. The structure is consistent for most fields, but users can add custom key-value pairs. How should this data be classified?

A.Structured data
B.Semi-structured data
C.Unstructured data
D.Relational data
AnswerB

Semi-structured data has some organizational properties (like tags or key-value pairs) but does not require a fixed schema. JSON documents with optional fields fit this category.

Why this answer

The data is semi-structured because it has a consistent schema for most fields (text, image URLs, timestamps, user tags) but allows custom key-value pairs, which introduces schema flexibility. JSON documents inherently support this mix of fixed and variable attributes, fitting the semi-structured data classification. This aligns with Azure Cosmos DB's handling of JSON items, where each document can have a different set of properties.

Exam trap

Microsoft often tests the misconception that any data with a consistent field is structured, but the presence of optional custom key-value pairs makes it semi-structured, not structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a rigid schema with fixed columns and data types (e.g., a SQL table), but JSON documents with optional custom fields violate that strict schema. Option C is wrong because unstructured data has no predefined structure or organization (e.g., raw text files, images, videos), whereas JSON documents have a defined format with keys and values. Option D is wrong because relational data specifically refers to data organized into tables with rows and columns linked by foreign keys, which JSON documents do not enforce.

686
MCQeasy

A company stores customer names and addresses in a relational table, product descriptions as JSON files, and product images as JPEG files. Which of the following correctly classifies these data types from most structured to least structured?

A.Structured (customer table), Semi-structured (JSON), Unstructured (JPEG)
B.Structured (customer table), Unstructured (JSON), Semi-structured (JPEG)
C.Semi-structured (customer table), Structured (JSON), Unstructured (JPEG)
D.Unstructured (customer table), Structured (JSON), Semi-structured (JPEG)
AnswerA

This is correct because a table with fixed columns is structured, JSON allows varying fields (semi-structured), and JPEG images have no internal structure (unstructured).

Why this answer

A is correct because structured data (customer table) has a fixed schema with rows and columns, semi-structured data (JSON) uses tags or key-value pairs without a rigid schema, and unstructured data (JPEG) has no predefined structure. The question tests the standard classification hierarchy from most to least structured.

Exam trap

The trap here is confusing semi-structured (JSON) with unstructured (JPEG) because both lack a rigid schema, but JSON has a logical structure (key-value pairs) while JPEG is raw binary data.

How to eliminate wrong answers

Option B is wrong because JSON is semi-structured (not unstructured) as it has a flexible schema with key-value pairs, and JPEG is unstructured (not semi-structured) as it lacks any schema or metadata hierarchy. Option C is wrong because a relational table is structured (not semi-structured) with a fixed schema, and JSON is semi-structured (not structured) as it does not enforce a rigid schema. Option D is wrong because a relational table is structured (not unstructured), JSON is semi-structured (not structured), and JPEG is unstructured (not semi-structured).

687
MCQeasy

You need to run a complex T-SQL query that joins tables from Azure SQL Database and Azure SQL Managed Instance in a single query. Which feature should you use?

A.Database Mail
B.PolyBase
C.Linked server
D.Elastic Query
AnswerD

Elastic Query enables querying multiple databases in Azure SQL Database and Managed Instance.

Why this answer

Elastic Query (Option D) is the correct feature because it enables running complex T-SQL queries that span multiple databases in Azure SQL Database and Azure SQL Managed Instance, allowing you to join tables across these services in a single query. It uses a logical database as a shard map manager to distribute queries across remote databases, supporting cross-instance and cross-database queries without data movement.

Exam trap

The trap here is that candidates often confuse PolyBase with Elastic Query because both involve querying external data, but PolyBase is for non-relational or file-based sources, while Elastic Query is specifically for cross-database queries within Azure SQL Database and Azure SQL Managed Instance.

How to eliminate wrong answers

Option A is wrong because Database Mail is a feature for sending email messages from SQL Server, not for querying across databases or instances. Option B is wrong because PolyBase is designed for querying external data sources like Hadoop or Azure Blob Storage using T-SQL, not for joining tables across Azure SQL Database and Azure SQL Managed Instance. Option C is wrong because Linked Server is a SQL Server feature for connecting to remote data sources, but it is not supported for cross-service queries between Azure SQL Database and Azure SQL Managed Instance; it works only within on-premises or SQL Server VMs, and Azure SQL Database does not support linked servers to external instances.

688
MCQmedium

A company uses Azure SQL Database to store customer order data. They need to automatically track changes to the 'OrderStatus' column in the 'Orders' table. They want to be able to query the current status and also easily retrieve historical status changes for a given order without writing custom triggers or history tables. Which feature should they enable?

A.Change Data Capture (CDC)
B.Temporal tables
C.Automatic tuning
D.Geo-replication
AnswerB

Temporal tables are a built-in feature that automatically keeps a full history of changes within the same table using period columns. You can query both the current state and historical states without custom logic.

Why this answer

Temporal tables (system-versioned) automatically track full row history, including changes to the 'OrderStatus' column, by maintaining a paired history table. This allows querying both current and historical states with simple T-SQL clauses like FOR SYSTEM_TIME, without custom triggers or manual history tables.

Exam trap

The trap here is confusing Change Data Capture (CDC) with temporal tables, as both involve change tracking, but CDC is for streaming changes to downstream systems, not for querying historical row states per key with point-in-time accuracy.

How to eliminate wrong answers

Option A is wrong because Change Data Capture (CDC) captures changes at the transaction log level for incremental data loading or replication, not for querying historical status per row with point-in-time accuracy. Option C is wrong because Automatic tuning optimizes query performance (e.g., index recommendations, plan forcing) and does not track historical data changes. Option D is wrong because Geo-replication provides disaster recovery and read-scale by replicating the database to a secondary region, with no capability to track or query historical row changes.

689
MCQmedium

You have applied the lifecycle management policy shown in the exhibit to an Azure Storage account. A blob named 'logs/error.log' was last modified 200 days ago. In which tier is the blob currently stored?

A.Hot tier
B.The blob has been deleted
C.Cool tier
D.Archive tier
AnswerD

The policy archives blobs after 90 days; at 200 days, the blob is in Archive.

Why this answer

The policy moves blobs to Cool after 30 days and to Archive after 90 days. Since the blob was modified 200 days ago, it has already been moved to Archive after 90 days. The delete action occurs after 365 days, so it has not been deleted yet.

Therefore, the blob is in the Archive tier.

690
MCQhard

A manufacturing company has a streaming data pipeline that ingests sensor data from factory equipment into Azure Event Hubs. The data must be prepared for reporting by cleaning invalid records, removing duplicates, and aggregating readings into 5-minute windows. The transformed data needs to be stored in a columnar format in a data lake to support efficient querying by data analysts using SQL. Which Azure service should perform the data transformation and loading?

A.Azure Data Factory
B.Azure Databricks
C.Azure Stream Analytics
D.Azure Synapse Pipelines
AnswerC

Azure Stream Analytics is a serverless real-time analytics service that can ingest data from Event Hubs, perform time-windowed aggregations, clean data, and output to Azure Data Lake Storage in the desired columnar format. It is the most straightforward and cost-effective choice for this streaming ETL scenario.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, directly consuming data from Azure Event Hubs, performing transformations like cleaning invalid records, removing duplicates, and aggregating over tumbling windows (e.g., 5-minute windows), and outputting the results in a columnar format (e.g., Parquet) to Azure Data Lake Storage. This aligns perfectly with the requirement for a low-latency, continuous transformation pipeline without needing additional orchestration or compute clusters.

Exam trap

The trap here is that candidates often confuse Azure Data Factory or Synapse Pipelines as suitable for streaming transformations because they see 'pipeline' or 'data movement' keywords, but these services are batch-oriented and cannot perform real-time windowed aggregations directly from Event Hubs.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is primarily an orchestration and data movement service for batch workloads, not a real-time stream processing engine; it cannot natively perform windowed aggregations on streaming data from Event Hubs. Option B is wrong because Azure Databricks is a powerful analytics platform that can handle streaming via Structured Streaming, but it requires provisioning and managing a Spark cluster, which is overkill for a simple transformation and loading task that Stream Analytics can handle more cost-effectively and with less operational overhead. Option D is wrong because Azure Synapse Pipelines (now part of Azure Synapse Analytics) is essentially Azure Data Factory's orchestration capabilities within Synapse, inheriting the same batch-oriented limitations and lacking native stream processing for Event Hubs.

691
MCQmedium

An administrator runs an Azure CLI command to update an Azure SQL Database's service objective. The output shows the above properties. The database is currently at S2 and the administrator wants to scale to S3. What is the issue?

A.The command did not specify the new service objective
B.The property names are incorrect
C.The maxSizeBytes value is invalid
D.The Azure CLI does not support scaling Azure SQL Database
AnswerA

The output shows the current state; the requested objective remains S2, so no change was made.

Why this answer

Option A is correct because the properties show the current service objective but the command likely did not change it. The requestedServiceObjectiveName is still S2, meaning the update was not applied. Option B is wrong because maxSizeBytes is within range for S2.

Option C is wrong because the property names are correct. Option D is wrong because the CLI command can update service objective.

692
MCQmedium

A retail company runs its legacy order management application on an on-premises SQL Server. They plan to migrate to Azure with minimal application changes and need high availability with automatic failover to a secondary Azure region. They also require full database-level isolation and the ability to use SQL Server Agent jobs. Which Azure deployment option should they choose?

A.Azure SQL Database single database
B.Azure SQL Database elastic pool
C.Azure SQL Managed Instance
D.SQL Server on Azure Virtual Machine
AnswerC

Correct. SQL Managed Instance offers high compatibility with SQL Server, supports SQL Server Agent, and enables auto-failover groups for disaster recovery.

Why this answer

Azure SQL Managed Instance (C) is correct because it provides near-100% compatibility with on-premises SQL Server, including full database-level isolation (a dedicated instance) and full support for SQL Server Agent jobs. It also supports auto-failover groups for high availability with automatic failover to a secondary Azure region, meeting the migration requirement with minimal application changes.

Exam trap

The trap here is that candidates often confuse Azure SQL Database (single or elastic pool) with SQL Managed Instance, overlooking that SQL Agent jobs and full instance-level isolation are exclusive to Managed Instance, while also mistakenly thinking that SQL Server on Azure VM is the only option for high availability with automatic failover.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database single database does not provide full database-level isolation (it runs in a shared logical server) and does not support SQL Server Agent jobs. Option B is wrong because Azure SQL Database elastic pool is a multi-tenant resource-sharing model that also lacks SQL Server Agent job support and full instance-level isolation. Option D is wrong because SQL Server on Azure Virtual Machine requires manual configuration of Always On Availability Groups for automatic failover to a secondary region, and it does not offer the same managed experience with minimal application changes as Azure SQL Managed Instance.

693
Drag & Dropmedium

Drag and drop the steps to configure a firewall rule for Azure SQL Database in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps
Order

Why this order

Firewall rules are set at the server level to allow client IP addresses to access the database.

694
MCQmedium

A manufacturing company collects sensor readings from thousands of IoT devices. Each reading consists of a device ID, a timestamp, and a numeric value. The data is stored as key-value pairs and must support low-latency reads and writes at a global scale. The company also needs to query the data by device ID and time range. Which Azure Cosmos DB API should they choose?

A.Core (SQL) API
B.MongoDB API
C.Table API
D.Gremlin API
AnswerC

The Table API is built for key-value workloads and stores data as items with a partition key and row key. It allows efficient point reads and range queries, making it ideal for IoT sensor data.

Why this answer

The Table API is the correct choice because it is designed for key-value workloads with a schema-less design, supporting low-latency reads and writes at global scale. It allows querying by partition key (device ID) and row key (timestamp) to efficiently retrieve data by device ID and time range, matching the IoT sensor data requirements.

Exam trap

The trap here is that candidates often choose the Core (SQL) API because they associate SQL with querying, but the Table API is specifically built for key-value and time-series workloads with composite key queries, which is the exact pattern described.

How to eliminate wrong answers

Option A is wrong because the Core (SQL) API is optimized for document-based queries with SQL-like syntax, not for simple key-value lookups with partition and row keys, and it adds unnecessary complexity for this use case. Option B is wrong because the MongoDB API is designed for document databases with BSON format and is not optimized for key-value pair storage with composite key queries by device ID and timestamp. Option D is wrong because the Gremlin API is for graph databases used to model relationships between entities, not for key-value or time-series data from IoT devices.

695
Multi-Selecteasy

Which TWO are advantages of using a NoSQL database like Azure Cosmos DB over a relational database like Azure SQL Database?

Select 2 answers
A.Support for complex joins
B.ACID transactions across multiple records
C.Flexible schema design
D.Horizontal scaling across multiple regions
E.Enforced referential integrity
AnswersC, D

NoSQL allows schema-less data models.

Why this answer

Option C is correct because NoSQL databases like Azure Cosmos DB are schema-agnostic, allowing each document or item to have a different structure without requiring migrations or predefined table schemas. This flexibility is ideal for rapidly evolving applications or when ingesting heterogeneous data, whereas Azure SQL Database enforces a rigid schema that must be defined and altered explicitly.

Exam trap

The trap here is that candidates confuse the ACID support in NoSQL databases (which is limited to single-document operations) with the full multi-record ACID transactions of relational databases, leading them to incorrectly select Option B.

696
MCQmedium

A company wants to analyze data from multiple Azure SQL Databases using a single query. Which Azure service should they use?

A.Azure SQL Database elastic query
B.Azure SQL Database failover groups
C.Azure Synapse Link
D.Azure Stream Analytics
AnswerA

Enables querying across multiple databases using external tables.

Why this answer

Option C is correct because Azure SQL Database elastic query allows querying across multiple databases from a single connection point. Option A is wrong because Azure Synapse Link is for near-real-time analytics on operational data, but not specifically for querying multiple databases with a single query. Option B is wrong because Azure SQL Database failover groups are for high availability, not cross-database queries.

Option D is wrong because Azure Stream Analytics is for real-time stream processing, not ad-hoc queries across databases.

697
MCQmedium

A company stores customer support chat transcripts as plain text files in Azure Blob Storage. The files are accessed frequently for the first 30 days, then infrequently for the next 2 years, and after that must be retained for 7 years for compliance but are rarely accessed. The company wants to minimize storage costs by automatically moving data through appropriate access tiers. Which Azure Blob Storage lifecycle management policy should they implement?

A.Move blobs from Hot to Cool after 30 days, then to Archive after 2 years
B.Store all data in Hot tier for the full retention period
C.Move blobs from Hot to Archive after 30 days and delete after 2 years
D.Store all data in Cool tier for the first 30 days, then move to Archive
AnswerA

This policy correctly matches the access pattern: Hot tier for frequent initial access, Cool for infrequent intermediate access (still retained for 2 years but accessed rarely), and Archive for long-term compliance retention where data is rarely accessed and retrieval latency is acceptable.

Why this answer

Option A is correct because the lifecycle management policy matches the access pattern: move blobs from Hot (frequent access for first 30 days) to Cool (infrequent access for next 2 years) after 30 days, then to Archive (rare access for 7-year compliance) after 2 years. This minimizes storage costs by using the cheapest tier for each phase while retaining data for the required 7-year compliance period.

Exam trap

The trap here is that candidates may overlook the rehydration latency of the Archive tier and incorrectly move data to Archive during a period of frequent access, or fail to account for the full compliance retention period when choosing deletion actions.

How to eliminate wrong answers

Option B is wrong because storing all data in the Hot tier for the full retention period incurs the highest storage cost, ignoring the infrequent and rare access phases. Option C is wrong because moving blobs directly to Archive after 30 days makes them inaccessible for frequent access (Archive has a 15-minute to 15-hour rehydration latency) and deleting after 2 years violates the 7-year compliance retention requirement. Option D is wrong because storing data in the Cool tier for the first 30 days does not match the frequent access pattern (Cool tier has higher access costs and lower availability than Hot) and fails to use the Hot tier for the initial high-access period.

698
Multi-Selecteasy

Which TWO Azure services can be used to store semi-structured data like JSON or Parquet files for analytics? (Choose two.)

Select 2 answers
A.Azure Synapse Analytics
B.Azure Blob Storage
C.Azure SQL Database
D.Azure Data Lake Storage Gen2
E.Azure Cosmos DB
AnswersB, D

Object storage, can store any file type.

Why this answer

Azure Blob Storage is a highly scalable, cost-effective object storage service that can store any type of unstructured data, including semi-structured formats like JSON and Parquet. It is commonly used as a data lake for analytics workloads, where raw or processed data is stored and later queried by services like Azure Synapse Analytics or Azure Databricks.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics (a query service) with a storage service, or think Azure Cosmos DB is suitable for storing large Parquet files for analytics, when it is actually a transactional NoSQL database with a different cost and performance profile.

699
MCQeasy

A logistics company uses IoT sensors on delivery trucks to transmit GPS location, speed, and engine diagnostics every 10 seconds. The data is ingested into Azure Event Hubs. The company needs to analyze the data in real time to identify speeding trucks and send alerts. The analysis requires joining the live sensor data with a reference table of truck details (e.g., driver name, route number) stored in Azure SQL Database. Which Azure service should they use for the real-time processing?

A.Azure Stream Analytics
B.Azure Synapse Analytics dedicated SQL pool
C.Azure Data Factory
D.Azure Databricks
AnswerA

Stream Analytics is designed for real-time data processing from Event Hubs and can join streams with reference data from SQL Database to enrich and detect patterns like speeding.

Why this answer

Azure Stream Analytics is the correct choice because it is a real-time event processing engine designed to handle streaming data from sources like Azure Event Hubs. It can perform temporal joins between the live IoT sensor stream and a static reference table (e.g., truck details from Azure SQL Database) to enrich the data and trigger alerts when speeding is detected, all with sub-second latency.

Exam trap

The trap here is that candidates often confuse batch-oriented services like Azure Synapse Analytics or Azure Data Factory with real-time processing, or they overcomplicate the solution by choosing Azure Databricks when a simpler, purpose-built service like Stream Analytics is sufficient for the join-and-alert pattern.

How to eliminate wrong answers

Option B is wrong because Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) data warehouse optimized for large-scale batch analytics and complex queries on historical data, not for real-time stream processing with sub-second latency. Option C is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service designed for scheduled, batch-oriented data movement and transformation, not for continuous, low-latency stream processing. Option D is wrong because Azure Databricks is a unified analytics platform that can process streaming data using Structured Streaming, but it is overkill for this simple join-and-alert scenario; it requires more complex setup, cluster management, and is not as straightforward as Stream Analytics for directly joining Event Hubs data with Azure SQL reference data.

700
MCQmedium

A company stores terabytes of historical log data in Azure Blob Storage. The data is rarely accessed but must be retained for 10 years for compliance. The company wants to minimize storage costs. Which storage tier should you use?

A.Cool tier
B.Archive tier
C.Hot tier
D.Premium tier
AnswerB

Archive tier is the lowest cost for long-term retention.

Why this answer

The Archive tier is the correct choice because it is designed for data that is rarely accessed and has a flexible retrieval latency (hours), making it ideal for long-term retention of historical logs. It offers the lowest storage cost among Azure Blob Storage tiers, which directly minimizes costs for data that must be kept for 10 years but is seldom read.

Exam trap

The trap here is that candidates often confuse the Archive tier's low storage cost with immediate accessibility, forgetting that retrieval latency and rehydration costs apply, but the question explicitly states 'rarely accessed' and 'minimize storage costs,' making Archive the clear choice.

How to eliminate wrong answers

Option A is wrong because the Cool tier is optimized for data accessed infrequently (e.g., every 30 days) but still incurs higher storage costs than Archive and has a minimum storage duration of 30 days, making it less cost-effective for 10-year retention. Option C is wrong because the Hot tier is designed for frequently accessed data with the highest storage cost, which would unnecessarily increase expenses for rarely accessed logs. Option D is wrong because the Premium tier uses SSD-backed storage for low-latency, high-transaction workloads and is the most expensive option, completely unsuitable for archival data.

701
MCQmedium

A company needs to ingest data from an on-premises SQL Server database into Azure SQL Database every hour. During the ingestion, they need to filter out rows where Status = 'Inactive' and convert a date column to a different format. They want a cloud-based, code-free solution that can schedule and orchestrate this task. Which Azure service should they use?

A.Azure Logic Apps
B.Azure Data Factory with Mapping Data Flows
C.Azure Functions
D.Azure SQL Database Change Data Capture
AnswerB

Azure Data Factory provides mapping data flows, a visual designer for building data transformations at scale. It integrates with on-premises data via self-hosted integration runtime, supports scheduling, and requires no code, making it the ideal choice.

Why this answer

Azure Data Factory with Mapping Data Flows is the correct choice because it provides a cloud-based, code-free ETL service that can ingest data from on-premises SQL Server into Azure SQL Database, apply transformations like filtering rows (Status = 'Inactive') and converting date formats, and schedule the task using triggers. Mapping Data Flows run on Spark clusters and allow visual data transformation without writing code, making it ideal for this orchestrated, scheduled ingestion.

Exam trap

The trap here is that candidates often confuse Azure Logic Apps with Azure Data Factory because both can schedule and orchestrate tasks, but Logic Apps lacks the native data transformation capabilities (like filtering and date conversion) required for ETL workloads, making Data Factory with Mapping Data Flows the correct choice for code-free data transformation.

How to eliminate wrong answers

Option A is wrong because Azure Logic Apps is a workflow automation service that can connect to on-premises SQL Server via the on-premises data gateway, but it lacks native data transformation capabilities for filtering rows and converting date formats within the data flow; it is designed for lightweight integration and orchestration, not for complex ETL transformations. Option C is wrong because Azure Functions is a serverless compute service that requires writing custom code (e.g., C#, Python) to perform the ingestion and transformation, which contradicts the requirement for a code-free solution. Option D is wrong because Azure SQL Database Change Data Capture (CDC) is a feature that tracks changes in a database for incremental data capture, but it does not provide scheduling, orchestration, or transformation capabilities; it is a data capture mechanism, not an ETL or orchestration service.

702
MCQeasy

A company needs to migrate a large on-premises SQL Server database to Azure. The database uses features like SQL Server Agent jobs, cross-database queries, and common language runtime (CLR) assemblies. Which Azure service supports these features?

A.SQL Server on Azure Virtual Machines
B.Azure SQL Database
C.Azure Database for PostgreSQL
D.Azure SQL Managed Instance
AnswerD

Managed Instance supports SQL Agent, cross-database queries, and CLR.

Why this answer

Option B is correct because Azure SQL Managed Instance supports SQL Agent, cross-database queries, and CLR. Option A (Azure SQL Database) does not support SQL Agent or cross-database queries. Option C (SQL Server on Azure VM) supports all features but is not a managed service.

Option D (Azure Database for PostgreSQL) is a different engine.

703
MCQmedium

Refer to the exhibit. You execute the above T-SQL statements in Azure Synapse Analytics. What is the purpose of this code?

A.To create an external table that can query Parquet files stored in Azure Data Lake Storage Gen2.
B.To create a view over the Parquet files.
C.To import data from Parquet files into a permanent table in Synapse.
D.To create a regular table in the Synapse database.
AnswerA

PolyBase external tables enable querying external data.

Why this answer

The T-SQL code creates an external data source pointing to Azure Data Lake Storage Gen2, an external file format for Parquet, and an external table that references the Parquet files. This allows querying the Parquet files directly without importing them into the database, which is the definition of an external table in Azure Synapse Analytics.

Exam trap

The trap here is that candidates confuse an external table (which reads files in place) with importing data into a permanent table or creating a view, because the syntax resembles regular table creation but includes external source and format clauses.

How to eliminate wrong answers

Option B is wrong because the code creates an external table, not a view; a view is a saved SELECT query that does not define a schema over external files. Option C is wrong because the code does not use CREATE TABLE AS SELECT (CTAS) or INSERT INTO to import data into a permanent table; it only creates an external table that reads the Parquet files on demand. Option D is wrong because the table is defined with an external data source and file format, making it an external table, not a regular (managed) table stored in the Synapse database.

704
Multi-Selectmedium

Which THREE of the following are benefits of using Azure Data Lake Storage Gen2?

Select 3 answers
A.Cost-effective for storing petabytes of data
B.POSIX-compliant access control lists
C.Built-in NoSQL document store
D.Integrated data transformation engine
E.Hierarchical namespace for directory-level operations
AnswersA, B, E

Optimized for large-scale data lakes.

Why this answer

Azure Data Lake Storage Gen2 is cost-effective for storing petabytes of data because it decouples compute from storage, allowing you to store massive amounts of data in Azure Blob Storage at low cost, while leveraging a hierarchical namespace for efficient data organization. This makes it ideal for big data analytics workloads where large-scale data retention is required without incurring high costs.

Exam trap

Microsoft often tests the misconception that Azure Data Lake Storage Gen2 includes built-in data processing capabilities, but it is purely a storage layer, while transformation engines are separate services.

705
MCQmedium

A retail company needs to analyze streaming clickstream data from their website to detect shopping cart abandonment in real-time. They want to use Azure Stream Analytics to output results that can be visualized on a live dashboard. Which output sink allows the fastest data visualization for a real-time dashboard in Power BI?

A.Azure Blob Storage
B.Azure Event Hubs
C.Power BI dataset
D.Azure SQL Database
AnswerC

Direct output to Power BI dataset enables live dashboards with minimal latency from Stream Analytics.

Why this answer

Power BI dataset is the correct output sink because Azure Stream Analytics can directly stream data into a Power BI dataset via the Power BI output adapter, enabling real-time dashboard updates with sub-second latency. This integration uses the Power BI REST API to push streaming data events, which Power BI then visualizes immediately without requiring intermediate storage or batch processing.

Exam trap

The trap here is that candidates often confuse Azure Event Hubs as a visualization output because it is a streaming service, but Event Hubs is an ingestion endpoint, not a visualization sink; the correct sink for real-time Power BI dashboards is the Power BI dataset output directly from Stream Analytics.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is a batch-oriented, file-based storage service that introduces latency due to write operations and lacks native real-time streaming visualization capabilities; data must be read and processed again before it can be displayed in Power BI. Option B is wrong because Azure Event Hubs is a message ingestion service, not a visualization sink; it can receive streaming data but requires a downstream consumer (like Stream Analytics or a custom application) to forward data to Power BI, adding an extra hop and latency. Option D is wrong because Azure SQL Database is a relational database optimized for transactional workloads and batch inserts; streaming data into SQL Database incurs write latency and row-level locking, and Power BI would need to poll or refresh the dataset, which is not real-time.

706
MCQmedium

A company uses Azure SQL Database for an inventory management system. The Inventory table has millions of rows. Queries frequently filter on WarehouseID and then sort by LastUpdatedDate. The table currently has a clustered index on InventoryID (primary key). Which action will most improve query performance for these frequent filters?

A.Create a non-clustered index on (WarehouseID, LastUpdatedDate) INCLUDE (Quantity)
B.Add a clustered index on WarehouseID
C.Create a non-clustered index on LastUpdatedDate only
D.Partition the table by InventoryID
AnswerA

This composite index covers the filter and sort conditions. Including Quantity as an included column makes the index covering for this query, avoiding expensive key lookups.

Why this answer

A non-clustered index on (WarehouseID, LastUpdatedDate) allows the database engine to efficiently locate rows matching a specific WarehouseID and return them already sorted by LastUpdatedDate without accessing the clustered index (or with minimal lookup). Including the Quantity column as a non-key included column avoids key lookups for that column, further improving performance. Changing the clustered index to WarehouseID could cause fragmentation and is not ideal for uniqueness.

A single-column index on LastUpdatedDate does not support the filter on WarehouseID. Partitioning by InventoryID does not help this query pattern.

707
Matchingmedium

Match each Azure SQL Database tier to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts
Matches

Low-cost for small workloads

Balanced performance and cost

High performance and low latency

Highly scalable for large databases

Auto-scaling compute based on demand

Why these pairings

Azure SQL Database offers various service tiers.

708
MCQhard

A logistics company tracks shipments. For each shipment, metadata (ID, weight, destination) is stored in a relational table. The route history is a sequence of events (timestamp, location, status) that is frequently appended but never updated or deleted. The application needs to quickly retrieve the latest status of a shipment and occasionally run analytical queries over the full route history. The company wants to minimize storage cost and use Azure services. Which Azure data store should they choose for the route history?

A.Azure Cosmos DB Core (SQL) API
B.Azure Table Storage
C.Azure Blob Storage with append blobs
D.Azure SQL Database with a JSON column
AnswerC

Correct. Append blobs are designed for log data, offer low cost, and can be queried using serverless SQL or Azure Data Lake Storage analytics tools for full historical analysis.

Why this answer

Azure Blob Storage with append blobs is the correct choice because route history is write-once, read-many (WORM) data that is frequently appended but never modified or deleted. Append blobs are optimized for sequential append operations, offering low-cost storage for large volumes of event data, and they support fast retrieval of the latest status by reading the last block. This minimizes storage cost while allowing occasional analytical queries over the full history via Azure Synapse or other analytics services.

Exam trap

The trap here is that candidates often choose Azure Cosmos DB or Azure SQL Database because they associate 'fast retrieval' with transactional databases, overlooking that append blobs provide both low-cost storage and efficient last-block retrieval for append-only event sequences.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB Core (SQL) API is a globally distributed, multi-model NoSQL database optimized for low-latency reads and writes with flexible schemas, but it is significantly more expensive than blob storage for append-only event data and does not provide the cost efficiency of append blobs for this workload. Option B is wrong because Azure Table Storage is a key-value store designed for structured, schema-less data with point queries, but it is not optimized for append-only sequences and incurs higher costs per operation for frequent appends compared to blob storage. Option D is wrong because Azure SQL Database with a JSON column is a relational database that supports JSON data, but it introduces unnecessary relational overhead, higher storage costs, and transactional constraints for append-only event data that never requires updates or deletes.

709
Multi-Selecteasy

Which TWO are benefits of using a NoSQL database like Azure Cosmos DB? (Choose two.)

Select 2 answers
A.Enforcing referential integrity
B.Support for complex joins
C.Horizontal scalability
D.Schema flexibility
E.Full ACID transactions across multiple documents
AnswersC, D

NoSQL databases are designed to scale out across many nodes.

Why this answer

Azure Cosmos DB is a NoSQL database designed for horizontal scalability, which means it can distribute data across multiple servers and regions to handle massive workloads and low-latency access. This is achieved through partitioning and replication, allowing you to scale throughput and storage independently by adding more physical partitions or nodes.

Exam trap

Microsoft often tests the misconception that NoSQL databases support full ACID transactions across multiple documents like relational databases, but in Cosmos DB, multi-document transactions are limited to the same logical partition and are not fully ACID across partitions.

710
Multi-Selecthard

Which THREE factors should you consider when choosing between Azure SQL Database and Azure Cosmos DB for a new application?

Select 3 answers
A.Need for multi-region writes
B.Schema flexibility requirements
C.Transaction consistency requirements
D.Support for JSON data
E.Support for T-SQL queries
AnswersA, B, C

Cosmos DB supports multi-master replication; SQL Database requires failover groups for multi-region.

Why this answer

Option A is correct because Azure Cosmos DB supports multi-region writes with its multi-master replication, enabling low-latency writes across multiple geographic regions, which is a key differentiator from Azure SQL Database that only supports a single write region. This is critical for globally distributed applications requiring high availability and local write performance.

Exam trap

The trap here is that candidates may assume JSON support or T-SQL compatibility are deciding factors, but both databases handle JSON (though differently) and T-SQL is exclusive to SQL Database, so the real differentiators are multi-region writes, schema flexibility, and transaction consistency guarantees (e.g., ACID in SQL Database vs. tunable consistency levels in Cosmos DB).

711
MCQmedium

A retail company uploads daily sales data from all stores to Azure Blob Storage at midnight. They then run a series of data transformations using Azure Data Factory on a scheduled trigger at 2:00 AM. This processing pattern is best described as:

A.Batch processing
B.Stream processing
C.Transactional processing
D.Interactive query
AnswerA

Correct. Data is collected over time and processed in bulk on a schedule, which is the definition of batch processing.

Why this answer

This pattern is batch processing because the sales data is collected in Azure Blob Storage over a period (daily) and then processed as a group at a scheduled time (2:00 AM) using Azure Data Factory. Batch processing is designed for high-volume, periodic data loads where latency is acceptable, and the transformation job runs on a complete dataset rather than individual records.

Exam trap

The trap here is that candidates confuse scheduled data movement with stream processing, but the key differentiator is the time delay and the processing of a complete dataset in one job rather than individual events as they occur.

How to eliminate wrong answers

Option B is wrong because stream processing handles data in real-time or near-real-time as it arrives (e.g., using Azure Stream Analytics or Event Hubs), not on a scheduled trigger with a 2-hour delay. Option C is wrong because transactional processing (OLTP) focuses on individual, atomic transactions with ACID guarantees (e.g., Azure SQL Database), not bulk transformations of daily files. Option D is wrong because interactive query implies ad-hoc, user-driven exploration (e.g., using Azure Synapse Serverless SQL or Azure Data Explorer), not a scheduled, automated transformation pipeline.

712
MCQmedium

An e-commerce company runs a product inventory database on Azure SQL Database. During a flash sale, write transactions are slow because many read queries are running simultaneously and consuming resources. The company wants to isolate read workloads without modifying application code or database schemas. Which Azure SQL Database feature should they implement?

A.Active geo-replication
B.Read scale-out
C.Auto-failover groups
D.Elastic pools
AnswerB

Read scale-out uses a built-in readable secondary replica in the same region to serve read-only queries, reducing contention on the primary. This can be enabled without code changes.

Why this answer

Read scale-out (B) is correct because it allows read-only queries to be offloaded to a read-only replica of the database, freeing up the primary replica for write transactions. This feature is built into Azure SQL Database at the Business Critical and Premium service tiers, and it requires no application code changes—just a connection string modification to use the `ApplicationIntent=ReadOnly` parameter. It directly addresses the performance bottleneck caused by concurrent read queries during the flash sale.

Exam trap

The trap here is that candidates confuse read scale-out with geo-replication or failover groups, assuming any replica can offload reads, but only read scale-out provides a local read-only replica without requiring application code changes or cross-region latency.

How to eliminate wrong answers

Option A is wrong because active geo-replication creates readable replicas in a different Azure region for disaster recovery, not for offloading read workloads from the same region, and it requires application code changes to redirect read queries. Option C is wrong because auto-failover groups manage automatic failover between primary and secondary databases for high availability, not for isolating read workloads; they also require modifying the connection string or application logic. Option D is wrong because elastic pools are used to manage and share resources among multiple databases, not to isolate read workloads within a single database, and they do not provide a read-only replica.

713
MCQmedium

A company stores backup files in Azure Blob Storage. The backups are taken daily and must be retained for 7 years. The backup files are rarely accessed after the first month. The company wants to minimize storage costs while ensuring that backups are available for retrieval within 5 hours when needed. Which storage tier should they use after the first month?

A.Archive tier
B.Cool tier
C.Hot tier
D.Premium tier
AnswerB

Cost-effective and retrieval time is within minutes to hours.

Why this answer

Option C is correct because the Cool tier has a retrieval time of minutes to hours (typically), and is cheaper than Hot. Archive tier is the cheapest but retrieval time can be up to 15 hours, which exceeds the 5-hour requirement. Option A is wrong because Hot is more expensive.

Option B is wrong because Archive retrieval time is too long. Option D is wrong because Premium is expensive and designed for low-latency access.

714
MCQhard

Your company runs a global e-commerce platform on Azure SQL Database. The platform experiences heavy read traffic on product catalog and inventory tables. You need to reduce read latency for users in different geographic regions while keeping write latency low. The solution must be cost-effective and require minimal application changes. Current architecture: a single Azure SQL Database in West US. You have budget for additional Azure resources. What should you implement?

A.Configure Active Geo-Replication to create readable secondaries in regions where users are located.
B.Migrate the database to Azure Cosmos DB with multi-region writes.
C.Use Azure Traffic Manager to route read requests to the primary database in West US.
D.Enable Read Scale-Out on the existing database and configure application to use the read-only endpoint.
AnswerA

Readable secondaries reduce read latency globally; writes go to primary.

Why this answer

Option B is correct because Active Geo-Replication creates readable secondaries in multiple regions, allowing local reads and single-region writes. Option A (Read Scale-Out) works only in the same region. Option C (Cosmos DB) would require application rewrite.

Option D (Traffic Manager) does not replicate data.

715
Matchingmedium

Match each Azure Cosmos DB API to its supported data model.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts
Matches

Document (JSON)

Document (BSON)

Column-family

Graph

Key-value

Why these pairings

Azure Cosmos DB supports multiple data models via APIs.

716
MCQmedium

A company runs an e-commerce platform on Azure SQL Database. The database handles many concurrent transactions (OLTP). The business team runs complex reporting queries on the same database during business hours, which slows down the transactional workload. The company wants to offload the reporting queries to a separate read-only copy of the database to avoid performance impact. Which Azure SQL Database feature should they enable?

A.Hyperscale service tier
B.Geo-replication
C.Read scale-out
D.Elastic query
AnswerC

Correct. Read scale-out, also called Active Geo-Replication with readable secondaries, allows offloading read-only workloads to a secondary replica in the same region, improving OLTP performance.

Why this answer

Read scale-out (C) is the correct feature because it allows Azure SQL Database to offload read-only workloads, such as complex reporting queries, to a separate read-only replica. This is achieved by using the `ApplicationIntent=ReadOnly` connection string parameter, which routes queries to a secondary replica, thereby preventing performance impact on the primary transactional (OLTP) workload. This feature is specifically designed for scenarios where you need to isolate reporting from high-concurrency OLTP operations without requiring a separate database copy.

Exam trap

The trap here is that candidates often confuse Geo-replication with read scale-out because both provide readable secondaries, but Geo-replication is for disaster recovery and requires a separate database in a different region, while read scale-out is for performance isolation within the same region and uses the existing high-availability replicas.

How to eliminate wrong answers

Option A is wrong because Hyperscale is a service tier that provides high scalability and fast backup/restore, but it does not inherently create a separate read-only replica for offloading reporting queries; it focuses on storage and compute elasticity, not read workload isolation. Option B is wrong because Geo-replication creates a readable secondary replica for disaster recovery and geographic redundancy, but it is not designed for offloading reporting queries during business hours—it requires manual failover and is primarily for availability, not performance isolation. Option D is wrong because Elastic query enables querying across multiple Azure SQL databases or external data sources (e.g., Azure SQL Database, Azure SQL Data Warehouse) using T-SQL, but it does not provide a read-only replica for offloading reporting from a single database.

717
MCQeasy

A company needs to run complex analytical queries that aggregate terabytes of sales data across multiple years. The queries are used for monthly business reports and are not latency-sensitive. The data is stored in Azure Data Lake Storage Gen2. The company wants a fully managed, petabyte-scale data warehouse solution that supports SQL queries and integrates with Power BI for reporting. Which Azure service should they use?

A.Azure Synapse Analytics
B.Azure Analysis Services
C.Azure Data Factory
D.Azure HDInsight
AnswerA

Azure Synapse Analytics provides a cloud-based data warehouse that can scale to petabytes. It uses dedicated SQL pools for high-performance analytical queries and has built-in integration with Power BI, Azure Data Lake Storage, and other Azure services.

Why this answer

Azure Synapse Analytics (formerly SQL Data Warehouse) is a fully managed, petabyte-scale analytics service that provides a dedicated SQL pool for running complex, high-performance T-SQL queries against massive datasets. It natively integrates with Azure Data Lake Storage Gen2 for reading data directly via PolyBase or external tables, and it offers built-in connectors to Power BI for reporting. This makes it the ideal choice for the described workload, which requires large-scale aggregation without low-latency demands.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services (an OLAP modeling tool) with a data warehouse, or assume HDInsight is suitable for SQL-based reporting, but Synapse is the only fully managed, petabyte-scale SQL data warehouse with native Power BI integration.

How to eliminate wrong answers

Option B is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and in-memory cubes, not a petabyte-scale data warehouse for raw SQL queries on terabytes of data. Option C is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service, not a data warehouse or query engine. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster for big data processing, but it is not a fully managed SQL-based data warehouse and does not provide the same native SQL query experience or direct Power BI integration as Synapse.

718
MCQmedium

A company runs an online booking system on Azure SQL Database. The system handles many concurrent transactions (OLTP). The business team runs complex reporting queries on the same database during business hours, which slows down the booking transactions. The company needs a solution to separate the analytical workload from the transactional workload without duplicating data manually. Which Azure SQL Database feature should they use?

A.Read Scale-out (readable secondary replica)
B.Active Geo-Replication
C.Elastic pools
D.Hyperscale service tier
AnswerA

Correct. Read Scale-out allows you to offload read-only queries to a secondary replica, reducing load on the primary.

Why this answer

Read Scale-out (readable secondary replica) is the correct choice because it allows the company to offload complex reporting queries to a read-only replica of the primary database, thereby isolating the analytical workload from the OLTP transactions. This feature is built into Azure SQL Database and does not require manual data duplication or ETL processes, directly addressing the requirement to separate workloads without manual effort.

Exam trap

The trap here is that candidates often confuse Active Geo-Replication (which also provides readable secondaries) with Read Scale-out, but Geo-Replication is regionally separated and intended for disaster recovery, not for local workload isolation within the same region.

How to eliminate wrong answers

Option B (Active Geo-Replication) is wrong because it is designed for disaster recovery and business continuity by maintaining readable secondary replicas in a different Azure region, not for offloading read-only analytical workloads within the same region during business hours. Option C (Elastic pools) is wrong because they are a resource management model for sharing resources among multiple databases, not a feature for separating analytical and transactional workloads on a single database. Option D (Hyperscale service tier) is wrong because, while it offers high scalability and fast backup/restore, it does not inherently provide a built-in mechanism to separate analytical queries from transactional ones without additional configuration like Read Scale-out.

719
Multi-Selectmedium

Which THREE Azure services can be used to move data from on-premises SQL Server to Azure?

Select 3 answers
A.Azure Database Migration Service
B.Azure Data Factory
C.Azure Analysis Services
D.Azure Synapse Serverless SQL pool
E.Azure Data Box
AnswersA, B, E

DMS is specifically designed for migrating databases to Azure with minimal downtime.

Why this answer

Azure Database Migration Service (DMS) is correct because it is specifically designed for migrating databases from on-premises SQL Server to Azure SQL Database or SQL Managed Instance with minimal downtime, using the Data Migration Assistant (DMA) for assessment and a self-hosted integration runtime for data movement.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services (a BI modeling tool) or Azure Synapse Serverless SQL pool (a query-only service) with data migration tools, when only services that actively move or copy data from on-premises to Azure are correct.

720
MCQmedium

A mobile app stores user preferences as JSON documents in Azure Cosmos DB. The document includes userId, theme, language, and notification settings. The most common query retrieves the document for a specific userId. To minimize cost and ensure even distribution, which property should be chosen as the partition key?

A.userId
B.theme
C.language
D.a concatenation of userId and language
AnswerA

userId has high cardinality and evenly distributes data across partitions, minimizing Request Unit (RU) consumption for point reads.

Why this answer

The userId property is the ideal partition key because it provides high cardinality (each user has a unique ID) and ensures even request distribution across physical partitions. Since the most common query retrieves a document by userId, using it as the partition key makes those queries point reads (single-partition queries), which are the most cost-efficient and fastest in Azure Cosmos DB.

Exam trap

The trap here is that candidates often choose a concatenated key (option D) thinking it adds uniqueness or query flexibility, but Azure Cosmos DB's partition key design favors a single high-cardinality attribute for even distribution and simple point reads.

How to eliminate wrong answers

Option B (theme) is wrong because theme has low cardinality (only a few possible values like 'light' or 'dark'), leading to hot partitions and uneven data distribution. Option C (language) is wrong because language also has low cardinality (e.g., 'en', 'fr', 'es'), causing similar skew and throttling under load. Option D (a concatenation of userId and language) is wrong because it adds unnecessary complexity without benefit—userId alone already provides unique document identification and even distribution, and concatenation would increase storage overhead and partition key size (up to 2 KB limit) without improving query performance.

721
Multi-Selecthard

Which THREE components are essential for building a modern data warehouse architecture on Azure?

Select 3 answers
A.Azure Analysis Services
B.Azure Cosmos DB
C.Azure Databricks
D.Azure Data Lake Storage
E.Azure Synapse Analytics
AnswersC, D, E

For data transformation and processing.

Why this answer

Azure Databricks is correct because it provides an Apache Spark-based analytics platform that enables data engineering, data science, and machine learning on large-scale data. In a modern data warehouse architecture, Azure Databricks is used for data transformation, preparation, and advanced analytics, often feeding into or complementing Azure Synapse Analytics for structured querying and reporting.

Exam trap

The trap here is that candidates may confuse Azure Analysis Services as a core component of the data warehouse architecture, when it is actually a downstream BI tool, or mistakenly think Azure Cosmos DB is suitable for analytical workloads, whereas it is optimized for transactional and real-time NoSQL scenarios.

722
Multi-Selectmedium

Which TWO are valid deployment options for Azure SQL?

Select 2 answers
A.Azure SQL Managed Instance
B.Azure SQL Database
C.Azure Cosmos DB
D.Azure Database for MariaDB
E.Azure Synapse Analytics Dedicated SQL Pool
AnswersA, B

Managed instance with full SQL Server compatibility.

Why this answer

Options A and D are correct. Azure SQL Database and Azure SQL Managed Instance are deployment options. Options B and C are wrong because they are other Azure services.

Option E is wrong because Azure Synapse Dedicated SQL Pool is a separate service for data warehousing.

723
MCQmedium

A retail company analyzes customer purchase patterns. Every night, they run a batch job that aggregates millions of transactions from the past day into summary tables for reporting. Which type of data processing workload best describes this nightly job?

A.Batch processing
B.Real-time processing
C.Streaming processing
D.Transactional processing
AnswerA

Correct because the job runs at a scheduled time to process a large volume of data in batches.

Why this answer

This nightly job processes a large volume of transactions accumulated over the past day in a single, scheduled run, which is the defining characteristic of batch processing. In Azure, this workload would typically be implemented using Azure Synapse Analytics or Azure Data Factory to orchestrate the aggregation of millions of rows into summary tables for reporting, without requiring immediate output.

Exam trap

The trap here is that candidates confuse 'batch processing' with 'transactional processing' because both involve databases, but batch processing is designed for high-volume, scheduled analytics (OLAP), not for real-time, row-by-row operations (OLTP).

How to eliminate wrong answers

Option B is wrong because real-time processing requires data to be ingested and analyzed with sub-second latency, not on a nightly schedule. Option C is wrong because streaming processing continuously processes data as it arrives from sources like IoT devices or clickstreams, not from a static batch of past-day transactions. Option D is wrong because transactional processing (OLTP) handles individual, low-latency CRUD operations on current data, not large-scale aggregations of historical data.

724
MCQeasy

A company stores customer data in a relational table with fixed columns: CustomerID (integer), FirstName (string), LastName (string), Email (string). They also store product images as JPEG files in Azure Blob Storage, and customer feedback as JSON documents where each document may contain fields such as rating, comment, and optional metadata. Which of the following correctly classifies these data types?

A.Relational table – structured, JPEG – unstructured, JSON – semi-structured
B.Relational table – structured, JPEG – semi-structured, JSON – unstructured
C.Relational table – semi-structured, JPEG – unstructured, JSON – structured
D.Relational table – unstructured, JPEG – structured, JSON – semi-structured
AnswerA

Correct. Relational tables enforce a fixed schema (structured), JSON allows flexible fields (semi-structured), and JPEG files are binary blobs (unstructured).

Why this answer

Option A is correct because a relational table with fixed columns and data types (CustomerID, FirstName, LastName, Email) stores structured data with a rigid schema. JPEG files in Azure Blob Storage are binary blobs with no internal structure that a database can interpret, making them unstructured. JSON documents with optional fields (like rating, comment, metadata) have a flexible schema that can vary per document, which is the definition of semi-structured data.

Exam trap

The trap here is that candidates often confuse 'semi-structured' with 'unstructured' because JSON looks like free-form text, but its key-value structure with optional fields makes it semi-structured, not unstructured.

How to eliminate wrong answers

Option B is wrong because JPEG files have no schema or metadata that can be parsed as key-value pairs, so they are unstructured, not semi-structured; JSON documents are semi-structured, not unstructured. Option C is wrong because a relational table with fixed columns enforces a strict schema, making it structured, not semi-structured; JSON documents are semi-structured, not structured. Option D is wrong because a relational table is not unstructured—it has a predefined schema; JPEG files are not structured—they lack a row/column format.

725
MCQmedium

A gaming company stores player game scores in Azure Cosmos DB. Each document contains PlayerID, GameID, Score, Timestamp. The most common query is: 'Get all scores for a specific game ordered by score descending'. Which partition key should be chosen to minimize Request Unit (RU) consumption?

A.PlayerID
B.GameID
C.Score
D.Timestamp
AnswerB

Partitioning by GameID collocates all scores for a game in one partition, so the query targeting a specific GameID is a single-partition query, consuming fewer RUs.

Why this answer

GameID is the correct partition key because the most common query filters on GameID, and Cosmos DB routes queries to the exact physical partition(s) containing that GameID. This avoids cross-partition fan-out, minimizing RU consumption. A partition key that matches the query filter ensures efficient index lookup and data retrieval.

Exam trap

The trap here is that candidates often pick PlayerID thinking it uniquely identifies each player, but they overlook that the query filters on GameID, making GameID the only partition key that avoids cross-partition queries and minimizes RU consumption.

How to eliminate wrong answers

Option A (PlayerID) is wrong because it would scatter scores for the same game across multiple partitions, forcing a cross-partition query that scans all partitions and increases RU cost. Option C (Score) is wrong because it is a high-cardinality, frequently updated value that can cause hot partitions and does not align with the query filter on GameID. Option D (Timestamp) is wrong because it would distribute data by time, not by game, so querying for a specific game would still require scanning all partitions.

726
MCQhard

Your company has a data lake in Azure Data Lake Storage Gen2 containing terabytes of parquet files. Data scientists need to explore and prepare this data using Python and SQL. They want to use a collaborative notebook environment that integrates with Git for version control. The solution should automatically scale compute resources based on workload demand and minimize management overhead. Which Azure service should you use?

A.Azure Databricks
B.Azure Machine Learning studio
C.Azure Data Studio
D.Azure Synapse Studio
AnswerA

Provides notebooks, Git integration, auto-scaling, and supports Python and SQL.

Why this answer

Azure Databricks is the correct choice because it provides a collaborative notebook environment that natively supports Python and SQL, integrates with Git for version control, and offers auto-scaling clusters that dynamically adjust compute resources based on workload demand. It is purpose-built for big data analytics and data preparation on data lakes, minimizing management overhead through its serverless and managed Spark infrastructure.

Exam trap

The trap here is that candidates often confuse Azure Synapse Studio with Databricks because both offer notebook experiences and Spark support, but Synapse Studio is optimized for enterprise data warehousing and ETL pipelines, not the ad-hoc, collaborative data exploration and auto-scaling flexibility that Databricks provides for data science teams.

How to eliminate wrong answers

Option B is wrong because Azure Machine Learning studio is primarily designed for building, training, and deploying machine learning models, not for ad-hoc data exploration and preparation using Python and SQL in a collaborative notebook environment with Git integration. Option C is wrong because Azure Data Studio is a desktop tool for querying SQL Server and Azure SQL databases, not a cloud-based collaborative notebook environment that auto-scales compute resources. Option D is wrong because Azure Synapse Studio is a unified analytics workspace that does support notebooks and Git, but it is more focused on enterprise data warehousing and large-scale analytics pipelines, and its auto-scaling capabilities are tied to dedicated SQL pools or serverless SQL endpoints, not the flexible, on-demand Spark clusters that Databricks provides for data exploration and preparation.

727
MCQmedium

A retail company uses Azure SQL Database for an order management system. The Orders table has columns: OrderID (primary key, clustered), CustomerID, OrderDate, TotalAmount. Queries frequently filter on CustomerID and OrderDate, and sort results by OrderDate in descending order. The queries also return the TotalAmount. Which indexing strategy will most improve query performance for these operations?

A.Maintain the existing clustered index on OrderID only.
B.Create a nonclustered index on (CustomerID, OrderDate DESC) INCLUDE (TotalAmount).
C.Create a nonclustered index on (OrderDate DESC) INCLUDE (CustomerID, TotalAmount).
D.Create a clustered columnstore index on the entire table.
AnswerB

This index is ordered by CustomerID then OrderDate descending, allowing efficient seeks for a specific CustomerID and range scans over OrderDate in descending order. Including TotalAmount covers the SELECT clause without needing to access the base table.

Why this answer

Option B is correct because it creates a covering nonclustered index that supports both the filter predicates (CustomerID and OrderDate) and the sort order (OrderDate DESC) while including TotalAmount as an included column to avoid key lookups. This index allows SQL Server to satisfy the query entirely from the index pages, minimizing I/O and improving performance.

Exam trap

Microsoft often tests the distinction between covering indexes and columnstore indexes, and the trap here is assuming a columnstore index is appropriate for transactional queries with filtering and sorting, when it is actually designed for large-scale analytics and data warehousing workloads.

How to eliminate wrong answers

Option A is wrong because the existing clustered index on OrderID does not support filtering on CustomerID or OrderDate, forcing a full clustered index scan. Option C is wrong because while it supports sorting on OrderDate, it does not include CustomerID as a leading key column, so filtering on CustomerID would require a scan or additional lookups. Option D is wrong because a clustered columnstore index is optimized for large-scale analytical workloads and batch processing, not for point lookups or range queries with sorting on a single table; it would degrade performance for the described transactional queries.

728
Multi-Selecteasy

Which TWO of the following are valid Azure data storage services for storing unstructured data?

Select 2 answers
A.Azure SQL Database
B.Azure Table Storage
C.Azure Blob Storage
D.Azure Data Lake Storage Gen2
E.Azure Cosmos DB
AnswersC, D

Azure Blob Storage stores unstructured data like text and binary data.

Why this answer

Azure Blob Storage is a fully managed, massively scalable object storage service designed for unstructured data such as text, binary data, images, videos, and backups. It supports REST-based access and can store any type of file or binary object without requiring a schema, making it a core service for unstructured data workloads.

Exam trap

The trap here is that candidates often confuse semi-structured data (e.g., Table Storage, Cosmos DB) with unstructured data, or incorrectly assume that any NoSQL service qualifies as unstructured storage, when in fact only object storage services like Blob Storage and Data Lake Storage Gen2 are designed for raw, schema-less binary data.

729
MCQhard

A company stores IoT sensor data in Azure Table Storage. The data is accessed frequently for the first 30 days, then rarely. You need to minimize storage costs while ensuring data is available for queries within 24 hours of a request. What should you implement?

A.Configure a lifecycle management policy on the Table Storage account to move data to Cool tier after 30 days.
B.Store all data in Azure SQL Database and use index maintenance to improve query performance.
C.Migrate the data to Azure Cosmos DB and use Time-to-Live (TTL) to expire old data.
D.Move data older than 30 days to Azure Blob Storage Cool tier and use an Azure Data Factory pipeline to copy data back to Table Storage when requested.
AnswerD

This optimizes cost by using Cool tier for rarely accessed data and maintains availability within 24 hours.

Why this answer

Azure Table Storage does not natively support lifecycle management policies like Blob Storage does. Moving the data to Azure Storage with lifecycle management (via Azure Data Factory or AzCopy) is the correct approach. Option A is wrong because Azure Table Storage does not have automatic tiering like Blob Storage.

Option B is wrong because Azure Cosmos DB is more expensive and not necessary. Option D is wrong because Azure SQL Database is a relational option not suited for this scenario.

730
MCQeasy

A data scientist needs to perform exploratory data analysis on a large dataset stored in Azure Data Lake Storage Gen2 using Python notebooks. The solution must minimize infrastructure management. Which Azure service should the data scientist use?

A.Azure Machine Learning compute instances.
B.Power BI with dataflows.
C.Azure HDInsight with Jupyter notebooks.
D.Azure Databricks with collaborative notebooks.
AnswerD

Databricks offers serverless Spark clusters and easy notebook collaboration.

Why this answer

Azure Databricks provides a fully managed, collaborative notebook environment optimized for big data analytics and machine learning. It integrates natively with Azure Data Lake Storage Gen2, allowing the data scientist to perform exploratory data analysis (EDA) using Python notebooks without managing any underlying infrastructure. This minimizes operational overhead while providing autoscaling clusters and built-in Spark capabilities.

Exam trap

The trap here is that candidates often confuse Azure Machine Learning compute instances (which are for ML model development) with a general-purpose data analytics environment, or they assume HDInsight's Jupyter notebooks are equally managed, overlooking the significant infrastructure management overhead and lack of serverless autoscaling.

How to eliminate wrong answers

Option A is wrong because Azure Machine Learning compute instances are designed for training and deploying ML models, not for general-purpose EDA on large datasets; they require manual scaling and lack the native Spark engine needed for efficient processing of data in Data Lake Storage Gen2. Option B is wrong because Power BI with dataflows is a business intelligence and visualization tool, not an interactive coding environment for Python-based EDA; it abstracts away code and cannot run arbitrary Python notebooks. Option C is wrong because Azure HDInsight with Jupyter notebooks requires manual provisioning and management of Hadoop/Spark clusters, contradicting the requirement to minimize infrastructure management; it also lacks the collaborative, serverless notebook experience of Databricks.

731
Multi-Selectmedium

Which TWO Azure services can be used to store non-relational data? (Choose two.)

Select 2 answers
A.Azure Cosmos DB
B.Azure SQL Database
C.Azure Synapse Analytics
D.Azure Database for MySQL
E.Azure Table Storage
AnswersA, E

A NoSQL database.

Why this answer

Option A (Azure Cosmos DB) is correct because it is a NoSQL database. Option D (Azure Table Storage) is correct because it is a key-value store (non-relational). Option B is wrong because Azure SQL Database is relational.

Option C is wrong because Azure Database for MySQL is relational. Option E is wrong because Azure Synapse Analytics is a relational analytics service.

732
MCQmedium

A manufacturing company stores IoT sensor data as JSON documents in Azure Cosmos DB. Each document has fields: deviceId (high cardinality, many unique values), timestamp, temperature, and humidity. The most frequent query is: 'Retrieve all readings for a specific deviceId from the last hour.' To minimize Request Unit (RU) consumption, which combination of partition key and indexing policy should be chosen?

A.Partition key: deviceId, Indexing: automatic on all properties
B.Partition key: timestamp, Indexing: automatic on all properties
C.Partition key: deviceId, Indexing: none
D.Partition key: temperature, Indexing: automatic on all properties
AnswerA

Correct. deviceId as partition key targets a single partition; automatic indexing ensures the timestamp filter uses an index, minimizing RUs.

Why this answer

Option A is correct because deviceId is the most frequently filtered attribute (in the WHERE clause), making it an ideal partition key that ensures queries are scoped to a single physical partition, minimizing cross-partition fan-out. Automatic indexing on all properties allows efficient filtering on timestamp within the partition, while the index on deviceId is not strictly needed since the partition key itself routes the query, but it does not harm RU consumption significantly. This combination balances query performance and RU cost for the described workload.

Exam trap

The trap here is that candidates often pick timestamp as the partition key because it seems logical for time-range queries, but they overlook that the most frequent query filters on deviceId, making deviceId the correct partition key to avoid cross-partition queries.

How to eliminate wrong answers

Option B is wrong because timestamp as a partition key would cause each query for a specific deviceId to scatter across all partitions (since the same deviceId's data spans many timestamps), resulting in high RU consumption due to cross-partition queries. Option C is wrong because setting indexing to 'none' would force full scans of all documents within the partition for the timestamp filter, dramatically increasing RU cost compared to using an index. Option D is wrong because temperature has low cardinality (few unique values) and is not used in the WHERE clause, leading to hot partitions and inefficient query routing.

733
MCQmedium

A company stores IoT sensor data in Azure Blob Storage. The data is written hourly and must be retained for 90 days. After 90 days, it must be automatically deleted. Which access tier should be used for cost optimization during the retention period?

A.Premium tier
B.Archive tier
C.Cool tier
D.Hot tier
AnswerC

Cool tier is designed for infrequently accessed data with a 30-day minimum retention, matching the 90-day retention without early deletion penalty.

Why this answer

The Cool tier is optimized for data that is infrequently accessed and stored for at least 30 days, with lower storage costs and higher access costs. Hot tier is for frequent access and would be more expensive. Archive tier has a 180-day minimum retention penalty.

Premium tier is for high transaction volumes and is not cost-effective for this scenario.

734
MCQmedium

A data engineer needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Python and Apache Spark. They want a serverless compute option that automatically scales and requires no cluster management. Which Azure service should they use?

A.Azure Synapse Analytics dedicated SQL pool
B.Azure Databricks with interactive clusters
C.Azure Synapse Analytics serverless Spark pool
D.Azure Data Factory with MapReduce
AnswerC

Correct. Serverless Spark pools in Azure Synapse Analytics are fully managed, automatically scale, and charge only for active compute. Ideal for ad-hoc transformations without cluster management.

Why this answer

Azure Synapse Analytics serverless Spark pool is correct because it provides a serverless Apache Spark compute environment that automatically scales based on workload demand and requires no cluster management. This aligns perfectly with the requirement to transform large datasets in Azure Data Lake Storage Gen2 using Python and Spark without provisioning or managing infrastructure.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'interactive clusters' in Azure Databricks, assuming that Databricks offers a serverless option (which it does not for interactive clusters), or they mistakenly think a dedicated SQL pool can run Spark transformations.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics dedicated SQL pool is a provisioned, always-on MPP (Massively Parallel Processing) SQL engine that requires manual scaling and management, not a serverless Spark compute option. Option B is wrong because Azure Databricks with interactive clusters requires manual cluster creation, configuration, and management, and interactive clusters are not serverless—they run continuously until terminated. Option D is wrong because Azure Data Factory with MapReduce is an orchestration and ETL service that uses external MapReduce jobs (e.g., on HDInsight), not a native serverless Spark compute option, and it does not provide direct Python/Spark execution without cluster management.

735
MCQeasy

A media company needs to store thousands of high-resolution videos. Each video is up to 10 GB in size and must be accessible via HTTP/HTTPS URLs for playback. The company does not require a file system hierarchy or SMB protocol support. Which Azure storage solution is most appropriate for this scenario?

A.Azure Blob Storage
B.Azure Files
C.Azure Queue Storage
D.Azure Table Storage
AnswerA

Correct. Blob Storage is optimized for storing large amounts of unstructured data like videos and provides HTTP/HTTPS access.

Why this answer

Azure Blob Storage is designed for storing massive amounts of unstructured data, such as high-resolution videos, and provides HTTP/HTTPS access via URLs. It supports objects up to 4.77 TiB (or larger with premium block blobs), easily accommodating 10 GB files, and offers no file system hierarchy or SMB protocol, matching the company's requirements exactly.

Exam trap

The trap here is that candidates may confuse Azure Files (which supports SMB) with general file storage, but the question explicitly rules out SMB and file hierarchy, making Blob Storage the correct choice for HTTP/HTTPS-accessible binary objects.

How to eliminate wrong answers

Option B is wrong because Azure Files provides SMB and NFS protocol support and a file system hierarchy, which the company explicitly does not require. Option C is wrong because Azure Queue Storage is a messaging service for asynchronous communication between application components, not for storing or serving video files. Option D is wrong because Azure Table Storage is a NoSQL key-value store for structured data, not designed for large binary objects like videos.

736
Multi-Selectmedium

Which TWO are valid ways to secure data in transit for an Azure SQL Database?

Select 2 answers
A.Use the 'Encrypt connection' setting in connection strings
B.Require TLS 1.2 for client connections
C.Enable Transparent Data Encryption (TDE)
D.Configure firewall rules to allow only specific IPs
E.Use Always Encrypted
AnswersA, B

This enforces TLS encryption for the connection.

Why this answer

Options A and B are correct. TLS ensures encryption between client and server. Azure SQL Database always enforces encryption (TLS) for connections.

Option C is wrong because transparent data encryption (TDE) encrypts data at rest, not in transit. Option D is wrong because always encrypted is for column-level encryption at rest and in transit. Option E is wrong because firewall rules control access, not encryption.

737
Drag & Dropmedium

Drag and drop the steps to create an Azure SQL Database in the correct order.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps
Order

Why this order

Creating an Azure SQL Database involves selecting the service, configuring the server and database settings, choosing the appropriate tier, and finally deploying.

738
MCQmedium

A company is migrating an on-premises SQL Server database to Azure. The database uses SQL Server Integration Services (SSIS) packages for daily ETL processes. The company wants to minimize administrative overhead for patching and backup management, but needs to retain full control over instance-level configurations and support for SSIS. Which Azure SQL service should they choose?

A.Azure SQL Database
B.Azure SQL Managed Instance
C.Azure Synapse Analytics
D.Azure SQL Server on Azure Virtual Machines
AnswerB

Azure SQL Managed Instance supports SSIS and provides instance-level control with automated patching and backups, minimizing overhead.

Why this answer

Azure SQL Managed Instance is correct because it provides near 100% compatibility with on-premises SQL Server, including full support for SQL Server Integration Services (SSIS) via Azure-SSIS Integration Runtime, while offloading patching and backup management to the platform. It also allows full control over instance-level configurations such as collation, CLR, and SQL Agent jobs, which are not available in Azure SQL Database.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's PaaS benefits with full SQL Server compatibility, overlooking that SSIS and instance-level configurations require Managed Instance, not the more restrictive Azure SQL Database.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a platform-as-a-service (PaaS) offering that does not support instance-level configurations (e.g., SQL Agent jobs, cross-database queries) and has limited SSIS support (only via elastic jobs, not native SSIS packages). Option C is wrong because Azure Synapse Analytics is a massively parallel processing (MPP) data warehouse optimized for large-scale analytics, not for transactional workloads or SSIS package execution; it lacks instance-level control and native SSIS support. Option D is wrong because Azure SQL Server on Azure Virtual Machines is an infrastructure-as-a-service (IaaS) option that requires the customer to manage patching, backups, and high availability, contradicting the requirement to minimize administrative overhead.

739
MCQmedium

A company uses Azure SQL Database to store order data. The Orders table has millions of rows with columns: OrderID (primary key, clustered), CustomerID, OrderDate, Status, TotalAmount. Queries frequently filter on OrderDate and Status, and sort results by OrderDate descending. Which indexing strategy will most improve query performance for these filters and sort?

A.Create a clustered index on OrderDate
B.Create a nonclustered index on (OrderDate DESC, Status) and include TotalAmount
C.Create a nonclustered index on Status alone
D.Create a clustered columnstore index on the table
AnswerB

This composite index covers both filter columns in the correct sort order and includes the TotalAmount column, making the query fully covered without needing to access the table. This yields the best performance for the described queries.

Why this answer

Option B creates a covering nonclustered index on (OrderDate DESC, Status) that directly supports the frequent filter on OrderDate and Status and the ORDER BY OrderDate DESC. Including TotalAmount as a non-key column makes the index covering, so all needed columns come from the index without key lookups, maximizing query performance.

Exam trap

The trap here is that candidates often think a clustered index on the filter column is always best, but they overlook that the existing clustered index on OrderID is needed for primary key enforcement and that a covering nonclustered index is the optimal way to support specific query patterns without disrupting the table's physical design.

How to eliminate wrong answers

Option A is wrong because changing the clustered index from OrderID to OrderDate would break the primary key constraint and could cause fragmentation and performance issues for other queries that rely on the clustered key. Option C is wrong because an index on Status alone does not help with the OrderDate filter or the ORDER BY OrderDate DESC, requiring a separate sort operation. Option D is wrong because a clustered columnstore index is optimized for large-scale analytical scans and aggregations, not for point lookups or ordered retrieval of specific rows, and would perform poorly for this filtered, sorted query.

740
MCQmedium

A retail company uses Azure SQL Database to store customer transactions. They need to analyze sales trends over time. Which Azure service should they use to build interactive dashboards and reports without moving data out of Azure?

A.Azure Analysis Services
B.Azure Synapse Analytics
C.Microsoft Purview
D.Power BI
AnswerD

Power BI connects directly to Azure SQL Database for interactive dashboards.

Why this answer

Power BI is the correct choice because it is a business analytics service that can connect directly to Azure SQL Database to build interactive dashboards and reports without requiring data movement. It supports DirectQuery mode, which queries the source database in real-time, enabling live analysis of sales trends while data remains in Azure.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics as a reporting tool, but it is primarily a data warehousing and analytics platform that requires data movement or transformation, whereas Power BI is the native Azure service for direct, no-movement interactive reporting.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an analytical engine that requires data to be loaded into its in-memory tabular model, which involves moving or processing data outside the source database. Option B is wrong because Azure Synapse Analytics is a big data and analytics platform that typically requires data to be ingested into its dedicated SQL pool or data lake, not suitable for direct, no-movement reporting on a transactional Azure SQL Database. Option C is wrong because Microsoft Purview is a data governance and catalog service, not a reporting or dashboard tool; it cannot build interactive visualizations.

741
MCQhard

A multinational corporation needs to deploy a relational database in Azure that supports global distribution and low-latency reads and writes across multiple regions. The data model is primarily key-value with occasional relational queries. Which Azure service should they choose?

A.Azure SQL Database with active geo-replication
B.Azure SQL Managed Instance
C.Azure Database for PostgreSQL – Hyperscale (Citus)
D.Azure Cosmos DB with PostgreSQL-compatible API
AnswerD

Supports global distribution, multi-region writes, and relational queries via the PostgreSQL API.

Why this answer

Option D is correct because Azure Cosmos DB offers multiple consistency levels and global distribution with multi-region writes, and its API for NoSQL can handle key-value data. However, it also provides a PostgreSQL-compatible API (Citus) for relational scenarios, making it the best fit for global distribution. Option A is wrong because Azure SQL Database's geo-replication supports only one writable region.

Option B is wrong because Azure SQL Managed Instance is region-bound. Option C is wrong because Azure Database for PostgreSQL with Hyperscale (Citus) is a good option but does not offer the same multi-region write capabilities as Cosmos DB, and the question emphasizes global distribution.

742
MCQmedium

A hospital stores patient vital signs data in Azure Cosmos DB. Each document contains PatientID, Timestamp, HeartRate, BloodPressure, and other measurements. The most common query retrieves all vital signs for a specific patient within a time range (e.g., last 24 hours). Which property should be chosen as the partition key to minimize Request Unit (RU) consumption and ensure even data distribution?

A.PatientID
B.Timestamp
C.HeartRate
D.BloodPressure
AnswerA

Correct. PatientID is the natural filter for the most common query, making it single-partition. It also has high cardinality, ensuring data is spread evenly across partitions.

Why this answer

PatientID is the ideal partition key because the most common query filters by PatientID and a time range. With PatientID as the partition key, Cosmos DB can route the query to a single physical partition containing all documents for that patient, minimizing cross-partition queries and reducing RU consumption. It also ensures even data distribution since each patient generates a similar volume of vital signs data, avoiding hot partitions.

Exam trap

Microsoft often tests the misconception that Timestamp is a good partition key for time-based queries, but candidates fail to realize that Timestamp causes hot partitions and does not distribute write load evenly.

How to eliminate wrong answers

Option B (Timestamp) is wrong because using Timestamp as the partition key would cause all writes for the same time window to land on a single partition, creating a hot partition and increasing RU costs due to throttling; it also makes range queries across patients inefficient. Option C (HeartRate) is wrong because HeartRate has low cardinality (e.g., 30–250 bpm), leading to a small number of logical partitions that cannot be evenly distributed across physical partitions, causing storage and throughput imbalances. Option D (BloodPressure) is wrong because BloodPressure values are also low cardinality and often repeated across patients, resulting in uneven data distribution and poor query performance when filtering by patient and time.

743
MCQeasy

A logistics company ingests GPS coordinates from delivery trucks in real-time to update a live tracking dashboard. They also run a nightly job to aggregate the day's deliveries into a report stored in Azure SQL Database. Which statement correctly describes the data processing types used for these two workloads?

A.GPS ingestion is stream processing; nightly aggregation is batch processing.
B.GPS ingestion is batch processing; nightly aggregation is stream processing.
C.Both workloads are examples of stream processing.
D.Both workloads are examples of batch processing.
AnswerA

Correct. Real-time GPS data is ingested as a continuous stream, while the nightly job processes accumulated data in a batch.

Why this answer

Option A is correct because the real-time ingestion of GPS coordinates from delivery trucks is a classic stream processing workload, where data is processed continuously as it arrives with low latency. The nightly aggregation of daily deliveries into a report stored in Azure SQL Database is a batch processing workload, where data is processed in bulk at scheduled intervals. Azure Stream Analytics is commonly used for the streaming ingestion, while Azure SQL Database or Azure Synapse Analytics can handle the batch aggregation.

Exam trap

The trap here is that candidates confuse the terms 'stream processing' and 'batch processing' by focusing on the data source (GPS is continuous) versus the processing schedule (nightly is periodic), rather than the fundamental processing paradigm of continuous vs. bulk data handling.

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: GPS ingestion is not batch processing (it requires real-time, low-latency processing), and nightly aggregation is not stream processing (it processes data in bulk at a scheduled time). Option C is wrong because the nightly aggregation is not stream processing; it processes data in a single batch job, not as a continuous stream. Option D is wrong because the GPS ingestion is not batch processing; it processes data in real-time as it arrives, not in batches.

744
MCQeasy

A company needs to store archived log files that are rarely accessed but must be retained for regulatory compliance. The logs are text-based and each file is about 10 MB. They want the lowest storage cost while ensuring the data is durable and can be read when needed. Which Azure Blob Storage access tier should they choose?

A.Hot
B.Cool
C.Cold
D.Archive
AnswerD

Archive tier has the lowest storage cost and is designed for data that is rarely accessed. Data can be read when needed, but retrieval takes hours, which is acceptable given the requirements.

Why this answer

The Archive tier is the correct choice because it offers the lowest storage cost for data that is rarely accessed and must be retained for long periods. Archived log files that are text-based and 10 MB each fit this profile perfectly, as the Archive tier is designed for data that can tolerate a retrieval latency of several hours (up to 15 hours for standard priority) while providing the same high durability (99.9999999999% or 11 nines) as other tiers. The data remains fully durable and can be read when needed by first rehydrating it to an online tier (Hot, Cool, or Cold) before access.

Exam trap

The trap here is that candidates often confuse 'Cold' with 'Archive' because both are low-cost tiers, but Cold is still an online tier with immediate access and higher cost, while Archive is the only offline tier designed for true archival storage with the lowest cost but significant retrieval latency.

How to eliminate wrong answers

Option A (Hot) is wrong because it is optimized for frequent access and has the highest storage cost, making it unsuitable for rarely accessed archived data. Option B (Cool) is wrong because it is designed for data accessed infrequently (about once a month) but still incurs higher storage costs than Archive, and it is not the lowest-cost option for long-term retention. Option C (Cold) is wrong because, while it is a lower-cost tier for infrequent access with a 30-day minimum storage period, it still costs more than Archive and is intended for data that may be accessed occasionally, not for rarely accessed archival data.

745
MCQmedium

A company has a suite of 20 e-commerce applications, each with its own SQL Server database. The databases vary in size from 5 GB to 100 GB and have unpredictable usage patterns with bursty peaks. The company wants to migrate to Azure SQL Database to benefit from built-in high availability and automatic backups. They need to minimize costs by only paying for the resources each database actually uses, and they want to avoid over-provisioning for peak loads. Which Azure SQL Database deployment option should they choose?

A.Azure SQL Database Elastic Pool
B.Azure SQL Database (single database) with Serverless compute tier
C.Azure SQL Managed Instance
D.Azure SQL Database Hyperscale
AnswerA

Correct. Elastic pools allow multiple databases to share a set of resources, providing cost-effective performance management for databases with varying and unpredictable usage.

Why this answer

Azure SQL Database Elastic Pool is the correct choice because it allows multiple databases to share a fixed pool of resources (DTUs or vCores), enabling cost efficiency by pooling and reallocating resources across databases with unpredictable, bursty usage patterns. This avoids over-provisioning for peak loads while still providing built-in high availability and automatic backups, as each database in the pool benefits from these features without needing individual resource reservations.

Exam trap

The trap here is that candidates confuse the Serverless compute tier (which auto-pauses for cost savings on a single database) with the Elastic Pool (which shares resources across multiple databases), leading them to choose Serverless for cost minimization without recognizing the need for resource pooling across 20 databases.

How to eliminate wrong answers

Option B is wrong because the Serverless compute tier is designed for a single database with intermittent usage, not for pooling resources across 20 databases; it would require separate provisioning per database, leading to higher costs and no resource sharing. Option C is wrong because Azure SQL Managed Instance is a full SQL Server instance intended for lift-and-shift migrations with instance-level features like SQL Agent, not for cost-optimized multi-database scenarios with bursty peaks. Option D is wrong because Hyperscale is built for very large databases (over 1 TB) with high throughput and rapid scaling, not for pooling multiple smaller databases (5-100 GB) to minimize costs.

746
MCQmedium

A retail company is designing a product catalog for its e-commerce website. Each product has a unique ProductID, a name, a price, and a variable number of attributes (e.g., size, color, weight) that differ across product categories. The application requires ability to read a product's details by ProductID with single-digit millisecond latency from any Azure region globally. The schema must be flexible to accommodate new attributes without schema changes. Which Azure data store should the company choose?

A.Azure Cosmos DB using the NoSQL API
B.Azure Table Storage
C.Azure SQL Database
D.Azure Blob Storage
AnswerA

Correct. Azure Cosmos DB NoSQL API offers flexible schemas, global distribution, and guaranteed low-latency reads under 10 ms for point reads.

Why this answer

Azure Cosmos DB with the NoSQL API is correct because it provides a fully managed, globally distributed NoSQL database that supports flexible schemas (allowing variable product attributes without schema changes) and guarantees single-digit millisecond read latency at any scale from any Azure region via its multi-region write and read replicas. The unique ProductID serves as a natural partition key, enabling efficient point reads with consistent low latency.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's flexible schema and global distribution with Cosmos DB's performance guarantees, overlooking the specific single-digit millisecond latency requirement that only Cosmos DB can consistently meet across all regions.

How to eliminate wrong answers

Option B (Azure Table Storage) is wrong because while it offers a flexible schema and global distribution, it does not guarantee single-digit millisecond latency for point reads across regions; its latency is typically higher and less consistent than Cosmos DB. Option C (Azure SQL Database) is wrong because it enforces a fixed relational schema, requiring schema changes (ALTER TABLE) to add new product attributes, and its global read latency is not optimized for single-digit millisecond reads from any region without complex geo-replication setups. Option D (Azure Blob Storage) is wrong because it is an object store for unstructured blobs, not a database; it lacks native query capabilities for individual product details by ID and cannot provide single-digit millisecond read latency for structured data access.

747
MCQeasy

A data scientist needs to analyze historical sales data to identify yearly trends. They run SQL queries that aggregate millions of rows. No new data is being added during analysis. Which type of data processing workload does this represent?

A.Online Transaction Processing (OLTP)
B.Online Analytical Processing (OLAP)
C.Batch processing
D.Stream processing
AnswerB

OLAP is used for complex queries and aggregations on historical data, which matches the scenario.

Why this answer

This workload is Online Analytical Processing (OLAP) because the data scientist is running complex SQL queries that aggregate millions of rows of historical sales data to identify yearly trends. OLAP is designed for read-intensive, analytical queries that summarize large volumes of static data, which matches the scenario where no new data is being added during analysis.

Exam trap

Microsoft often tests the distinction between OLTP and OLAP by presenting a scenario with 'SQL queries' and 'aggregation,' leading candidates to mistakenly think any SQL query implies OLTP, when in fact the analytical nature and static dataset clearly indicate OLAP.

How to eliminate wrong answers

Option A is wrong because Online Transaction Processing (OLTP) is optimized for high-volume, low-latency insert/update/delete operations (e.g., order entry), not for aggregating millions of rows for trend analysis. Option C is wrong because batch processing typically involves processing large volumes of data in scheduled, automated jobs (e.g., nightly ETL), whereas this scenario is an interactive analytical query run by a data scientist, not a scheduled batch job. Option D is wrong because stream processing handles continuous, real-time data flows (e.g., sensor data or clickstreams) with low latency, but the question explicitly states no new data is being added during analysis, making it a static dataset.

748
MCQmedium

You are reviewing a Data Factory mapping data flow definition. What is the primary purpose of this data flow?

A.Pivot the data by OrderID
B.Filter rows where OrderID is null
C.Remove duplicate OrderIDs by counting them
D.Merge two data sources
AnswerC

The aggregate counts OrderID, effectively identifying duplicates.

Why this answer

Option C is correct because the data flow reads CSV, performs an aggregate (count by OrderID) to remove duplicates, and outputs Parquet. Option A is wrong because it does not filter rows. Option B is wrong because it doesn't join.

Option D is wrong because it doesn't pivot.

749
MCQeasy

A small business needs a cost-effective relational database for a new web application. The workload is light and predictable. They want to minimize administrative overhead. Which Azure service should they choose?

A.Azure Database for PostgreSQL
B.SQL Server on Azure Virtual Machines
C.Azure SQL Database (provisioned DTU)
D.Azure SQL Database serverless
AnswerD

Serverless automatically scales and pauses, ideal for light workloads.

Why this answer

Option A is correct because Azure SQL Database serverless is designed for intermittent, unpredictable workloads and automatically pauses when idle, reducing cost. Option B is wrong because Azure SQL Database provisioned is for steady-state workloads. Option C is wrong because SQL Server on Azure VMs is IaaS and requires management.

Option D is wrong because Azure Database for PostgreSQL is fully managed but PostgreSQL-specific; Azure SQL Database serverless is more cost-effective for light relational workloads.

750
MCQeasy

Refer to the exhibit. A data engineer is reviewing an ARM template for a storage account. What does the property 'isHnsEnabled' set to true indicate?

A.Blob soft delete is enabled
B.The storage account supports Azure Data Lake Storage Gen2
C.Versioning is enabled for blobs
D.Geo-redundant storage is configured
AnswerB

HNS enables Data Lake Storage Gen2.

Why this answer

The 'isHnsEnabled' property, when set to true, enables the Hierarchical Namespace (HNS) on the storage account. HNS is the core feature that differentiates Azure Data Lake Storage Gen2 from a standard blob storage account, allowing for a true file system hierarchy with POSIX-like access control lists (ACLs). This enables the storage account to support Data Lake Storage Gen2 workloads.

Exam trap

The trap here is that candidates confuse 'isHnsEnabled' with other blob-level features like soft delete or versioning, because all three are often discussed in the context of data protection and management, but only HNS is specific to Data Lake Storage Gen2.

How to eliminate wrong answers

Option A is wrong because blob soft delete is controlled by the 'deleteRetentionPolicy' property on the blob service, not by 'isHnsEnabled'. Option C is wrong because blob versioning is enabled via the 'isVersioningEnabled' property on the blob service, not by 'isHnsEnabled'. Option D is wrong because geo-redundant storage (GRS) is a replication setting configured via the 'sku.name' property (e.g., 'Standard_GRS'), not by 'isHnsEnabled'.

Page 9

Page 10 of 14

Page 11