Microsoft Azure Data Fundamentals (DP-900): Questions 976-982

982 questions total · 14 pages · All types, answers revealed

Page 14 of 14

976
MCQ · hard

A financial services company runs large-scale analytical queries on a dedicated SQL pool in Azure Synapse Analytics. They notice that during peak hours, complex aggregations consume excessive resources, causing slower queries from other users. They need to ensure that critical management reports always get enough resources and complete within a guaranteed time, while other less important queries do not starve them. Which feature should they implement?

A. Result-set caching
B. Materialized views
C. Workload management
D. Columnstore index
Answer: C

Workload management uses workload groups and classifiers to allocate resources and prioritize critical queries, ensuring predictable performance.

Why this answer

Workload management in Azure Synapse Analytics allows you to classify, assign resources, and prioritize queries by creating workload groups and classifiers. By configuring a workload group for critical management reports with a higher importance and a guaranteed minimum resource percentage, you ensure those queries always get sufficient resources and complete within a guaranteed time, while less important queries are throttled and cannot starve the critical ones.

Exam trap

The trap here is that candidates often confuse performance optimization features (caching, materialized views, indexes) with resource governance, assuming that any performance improvement will also solve concurrency and starvation issues. Only workload management provides explicit prioritization and resource allocation.

How to eliminate wrong answers

Option A is wrong because result-set caching stores the results of queries to reduce latency for repeated executions, but it does not control resource allocation or prioritize queries during peak loads. Option B is wrong because materialized views pre-compute and store aggregated data to improve query performance, but they do not provide resource governance or guarantee completion times for specific workloads. Option D is wrong because columnstore indexes improve compression and query performance for analytical workloads, but they do not manage concurrency or resource allocation among different users or query classes.
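The workload group and classifier described above can be sketched as a small Python helper that renders the corresponding T-SQL for a dedicated SQL pool. This is an illustrative sketch only: the group name `wgMgmtReports`, the classifier name `wcMgmtReports`, the login `mgmt_report_user`, and the percentages are assumptions, not values from the question.

```python
def workload_group_sql(name, min_pct, cap_pct, request_min_pct):
    """Render a CREATE WORKLOAD GROUP statement.

    min_pct reserves resources for the group, cap_pct limits how much it
    can take, and request_min_pct is the minimum grant per request.
    """
    return (
        f"CREATE WORKLOAD GROUP {name}\n"
        f"WITH (MIN_PERCENTAGE_RESOURCE = {min_pct},\n"
        f"      CAP_PERCENTAGE_RESOURCE = {cap_pct},\n"
        f"      REQUEST_MIN_RESOURCE_GRANT_PERCENT = {request_min_pct});"
    )


def workload_classifier_sql(name, group, member, importance="HIGH"):
    """Render a CREATE WORKLOAD CLASSIFIER statement that routes requests
    from one login or user into a workload group with a given importance."""
    return (
        f"CREATE WORKLOAD CLASSIFIER {name}\n"
        f"WITH (WORKLOAD_GROUP = '{group}',\n"
        f"      MEMBERNAME = '{member}',\n"
        f"      IMPORTANCE = {importance});"
    )


# Hypothetical names and percentages, chosen only for illustration.
statements = [
    workload_group_sql("wgMgmtReports", min_pct=30, cap_pct=60, request_min_pct=5),
    workload_classifier_sql("wcMgmtReports", "wgMgmtReports", "mgmt_report_user"),
]

for statement in statements:
    print(statement)
    print()
```

The reserved minimum is what keeps other queries from starving the management reports, while the classifier's importance decides which queued request is scheduled first.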

977
MCQ · hard

A globally distributed online auction platform uses a replicated database system across multiple Azure regions. The system must continue accepting bids (writes) even if a network partition occurs between regions, because auctions cannot be interrupted. The business decides that during a partition, some users might see slightly outdated item prices (read inconsistency) but all bids must be recorded. According to the CAP theorem, which two properties is this system prioritizing?

A. Availability (A) and Partition Tolerance (P)
B. Consistency (C) and Partition Tolerance (P)
C. Consistency (C) and Availability (A)
D. Durability and Availability
Answer: A

The system must remain available to accept bids even when network partitions occur, so it ensures Partition Tolerance (P). It also prioritizes Availability (A) by allowing writes to continue in all regions. As a result, Consistency (C) is sacrificed, meaning different regions may return different data temporarily.

Why this answer

The system must continue accepting bids (writes) even during a network partition, which means it prioritizes Availability (A) — every request receives a response, even if it's not the most recent data. It also must function across multiple Azure regions that can become disconnected, which requires Partition Tolerance (P) — the system continues to operate despite network splits. The trade-off is that Consistency (C) is sacrificed, as users may see slightly outdated item prices during a partition.

This is a classic AP (Availability and Partition Tolerance) choice from the CAP theorem.

Exam trap

The trap here is that candidates often confuse the CAP theorem's 'Consistency' with ACID consistency or durability, or they mistakenly think 'Availability' means the system is always up, when in CAP it specifically means every request receives a non-error response even during a partition.
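The AP trade-off can be illustrated with a toy model of two regional replicas. This is a simplified sketch, not how any Azure service replicates data: the `Replica` class, the region names, and the bid values are all hypothetical.

```python
class Replica:
    """A toy regional replica that always accepts writes (availability)."""

    def __init__(self, region):
        self.region = region
        self.bids = set()  # set of (bidder, amount) pairs

    def accept_bid(self, bidder, amount):
        # The write succeeds locally even if other regions are unreachable.
        self.bids.add((bidder, amount))
        return True

    def highest_bid(self):
        return max((amount for _, amount in self.bids), default=0)

    def merge(self, other):
        # Reconciliation after the partition heals: union of recorded bids.
        self.bids |= other.bids


east = Replica("east")
west = Replica("west")

# During the partition, each region keeps accepting bids independently.
east.accept_bid("alice", 100)
west.accept_bid("bob", 120)

# Readers in the east see an outdated price: consistency is sacrificed.
stale_price_east = east.highest_bid()

# Once the partition heals, the replicas exchange bids and converge.
east.merge(west)
west.merge(east)
converged_price = east.highest_bid()

print(stale_price_east, converged_price)
```

No bid is lost, but until the merge a reader in one region can see a price that is lower than the true highest bid.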

978
MCQ · hard

A company uses Azure Cosmos DB with the MongoDB API for a customer profile service. The service handles 10,000 writes per second and 50,000 reads per second. The data is 1 KB per document. The company needs to reduce read latency for frequently accessed customers and minimize RU consumption. Currently, the service reads the entire document for every request. They decide to implement a materialized view pattern using Azure Cosmos DB change feed and a separate container. Which additional step should they take to optimize read performance and cost?

A. Create a materialized view container with a partition key optimized for the read queries.
B. Use stored procedures to aggregate data on read.
C. Increase the provisioned RU/s on the source container.
D. Enable Time-to-Live (TTL) on the source container to automatically expire old data.
Answer: A

Materialized views with optimized partition keys reduce RU consumption and latency for common queries.

Why this answer

Using a separate container with a partition key chosen for the read pattern (option A) lets the materialized view serve common queries efficiently, reducing RU cost and latency. Option C (increasing the provisioned RU/s) adds cost without addressing the root cause. Option D (enabling TTL) expires old data; it does not make reads of current data cheaper.

Option B (stored procedures) targets transactional logic, not read optimization; aggregating on every read would increase RU consumption.
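The materialized view pattern can be sketched in plain Python, with a list standing in for the change feed and a dictionary standing in for the view container. Everything here is hypothetical: the field names (`region`, `displayName`, `email`), the choice of `region` as the view's partition key, and the `apply_change_feed` helper are assumptions for illustration, and no Cosmos DB SDK calls are used.

```python
from collections import defaultdict


def project(doc):
    """Keep only the fields the read path needs, so the view stays small."""
    return {
        "id": doc["id"],
        "region": doc["region"],
        "displayName": doc["displayName"],
    }


def apply_change_feed(changes, view):
    """Upsert each changed source document into the view.

    The view is keyed first by its own partition key (region), then by id,
    so a read for one region touches a single logical partition.
    A real projection would also need to handle deletes and documents
    whose partition key value changes; this sketch does not.
    """
    for doc in changes:
        view[doc["region"]][doc["id"]] = project(doc)


# Simulated change feed: later entries are newer versions of a document.
changes = [
    {"id": "c1", "region": "emea", "displayName": "Ana", "email": "ana@example.com"},
    {"id": "c2", "region": "apac", "displayName": "Raj", "email": "raj@example.com"},
    {"id": "c1", "region": "emea", "displayName": "Ana M.", "email": "ana@example.com"},
]

view = defaultdict(dict)
apply_change_feed(changes, view)

print(view["emea"]["c1"])
```

Because the view stores a trimmed projection under a read-friendly key, the read path no longer fetches the full source document for every request.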

979
MCQ · easy

A logistics company stores shipping waybill data as JSON documents. Each document contains fields like 'shipmentId', 'destination', and 'items', but the number of items and the fields within each item can vary between shipments. Which category best describes this type of data?

A. Operational data
B. Semi-structured data
C. Unstructured data
D. Structured data
Answer: B

JSON documents with optional fields and variable structures are a classic example of semi-structured data, which has some organizational properties but no rigid schema.

Why this answer

JSON documents with varying fields and nested structures like 'items' that differ between shipments are a classic example of semi-structured data. Unlike structured data with a fixed schema, semi-structured data uses tags or markers (like JSON key-value pairs) to separate data elements, allowing for flexibility in the number and type of fields per record. Common semi-structured formats include JSON and XML.

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' because JSON appears flexible. JSON still organizes data with keys and values, unlike truly unstructured data such as audio or video files.

How to eliminate wrong answers

Option A is wrong because operational data refers to data used for day-to-day business operations (e.g., transaction logs, sensor readings), not a classification of data structure. Option C is wrong because unstructured data lacks a predefined data model or schema entirely (e.g., images, videos, plain text), whereas JSON has a defined structure with keys and values. Option D is wrong because structured data requires a rigid schema with fixed fields and data types (e.g., SQL tables), which does not apply to JSON documents where fields like 'items' can vary in count and structure.
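A short Python example makes the "same keys at the top, varying shape inside" idea concrete. The two waybill documents below are invented for illustration; only the field names 'shipmentId', 'destination', and 'items' come from the question.

```python
import json

# Two hypothetical waybills: same top-level keys, different item shapes.
waybills_json = """
[
  {"shipmentId": "S-1", "destination": "Oslo",
   "items": [{"sku": "A1", "qty": 2}]},
  {"shipmentId": "S-2", "destination": "Lima",
   "items": [{"sku": "B7", "qty": 1, "fragile": true},
             {"sku": "C3", "weightKg": 4.5}]}
]
"""

waybills = json.loads(waybills_json)

# Collect the distinct item-level field names seen in each shipment.
fields_by_shipment = {
    w["shipmentId"]: sorted({key for item in w["items"] for key in item})
    for w in waybills
}

# Count how many items each shipment carries.
item_counts = {w["shipmentId"]: len(w["items"]) for w in waybills}

print(fields_by_shipment)
print(item_counts)
```

Both documents parse without a predeclared schema, yet every value is still labeled by a key, which is what separates semi-structured data from both fixed-schema tables and unstructured files.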

980
MCQ · medium

You are the database administrator for a large financial institution migrating their core banking system to Azure. The system uses SQL Server with many stored procedures, triggers, and CLR assemblies. The database is 2 TB and growing. The migration must minimize application changes and support high availability with automatic failover. You need to select an Azure relational database service. What should you choose?

A. Azure SQL Database
B. Azure Database for PostgreSQL
C. SQL Server on Azure Virtual Machines
D. Azure SQL Managed Instance
Answer: D

Azure SQL Managed Instance offers high compatibility with SQL Server, supports CLR, and provides built-in high availability.

Why this answer

Option D is correct because Azure SQL Managed Instance provides near 100% compatibility with SQL Server, supports CLR, and offers high availability with automatic failover. Option A (Azure SQL Database) lacks support for CLR and cross-database queries. Option C (SQL Server on Azure Virtual Machines) requires manual high availability setup.

Option B (Azure Database for PostgreSQL) requires significant application changes.

981
Multi-Select · medium

Which TWO Azure services can be used to perform interactive ad-hoc analytics on large datasets using Apache Spark?

Select 2 answers
A. Azure Analysis Services
B. Azure HDInsight
C. Azure Databricks
D. Azure Synapse Analytics
E. Azure Data Lake Storage
Answers: C, D

Azure Databricks and Azure Synapse Analytics both provide managed Apache Spark with notebooks for interactive analytics.

Why this answer

Azure Databricks is correct because it provides a fully managed Apache Spark platform optimized for interactive ad-hoc analytics, allowing data engineers and data scientists to run Spark jobs in collaborative notebooks with auto-scaling clusters. It supports interactive queries, real-time dashboards, and machine learning workloads on large datasets using Spark's in-memory processing engine. Azure Synapse Analytics is correct because its Apache Spark pools let users run Spark code interactively from notebooks in the same workspace as their SQL pools and data lake.

Exam trap

The trap here is that candidates often confuse Azure HDInsight (which does support Spark) with a service optimized for interactive ad-hoc analytics. HDInsight is more suited for batch and scheduled workloads, not the collaborative, notebook-driven, auto-scaling environment that Databricks and Synapse Spark pools provide.

982
MCQ · medium

A data engineering team needs to build a batch ETL pipeline that transforms large volumes of clickstream data stored as CSV files in Azure Data Lake Storage Gen2. The transformations require running distributed Python and Scala code using Apache Spark. The transformed data will be loaded into a data warehouse for reporting. The team wants a serverless compute environment that automatically scales and charges per second. Which Azure service should they use to run the Spark transformations?

A. Azure Synapse Analytics (Spark pools)
B. Azure Data Factory
C. Azure Stream Analytics
D. Azure Analysis Services
Answer: A

Azure Synapse Analytics provides serverless Apache Spark pools that autoscale and run distributed Spark jobs on data in ADLS Gen2. It integrates tightly with the data lake and bills only for the time a pool is running.

Why this answer

Azure Synapse Analytics (Spark pools) is the correct choice because it provides a serverless Apache Spark compute environment that automatically scales and charges per second, perfectly matching the requirement for running distributed Python and Scala transformations on large volumes of clickstream data stored in Azure Data Lake Storage Gen2. The service integrates directly with the data lake and can load transformed results into a dedicated SQL pool for data warehouse reporting.

Exam trap

The trap here is that candidates often confuse Azure Data Factory's ability to orchestrate Spark jobs with actually running Spark code, leading them to select it instead of recognizing that Synapse Spark pools are the dedicated compute service for executing distributed Python/Scala transformations.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is an orchestration and data integration service, not a compute engine for running distributed Spark code; it can trigger Spark jobs but does not execute Python or Scala transformations itself. Option C (Azure Stream Analytics) is wrong because it is designed for real-time stream processing using SQL-like queries, not for batch ETL transformations with Spark code. Option D (Azure Analysis Services) is wrong because it is a semantic modeling and reporting layer for tabular data, not a compute environment for running Spark transformations.
