CCNA Core Data Concepts Questions — Page 4 of 4

226

MCQ, medium

A retail company captures real-time sensor data from IoT devices to detect anomalies and predict equipment failures. The data must be processed immediately as it arrives. Which type of data processing workload best describes this scenario?

A. Batch processing

B. Streaming processing

C. Online transaction processing (OLTP)

D. Data warehousing

Answer: B

Streaming processing ingests and analyzes data in real time, enabling prompt anomaly detection and failure prediction from IoT sensor feeds.

Why this answer

B is correct because streaming processing is designed for continuous, real-time data ingestion and immediate analysis, which matches the requirement to process sensor data as it arrives. Technologies like Azure Stream Analytics or Apache Kafka enable low-latency processing of IoT data streams to detect anomalies and predict failures without batching.

Exam trap

Microsoft often tests the distinction between batch and streaming by describing a scenario with 'immediate' or 'real-time' requirements, and candidates mistakenly choose batch processing because they overlook the latency constraint.

How to eliminate wrong answers

Option A is wrong because batch processing processes data in large, scheduled chunks (e.g., hourly or daily), which introduces latency and cannot handle real-time sensor data that must be processed immediately. Option C is wrong because OLTP focuses on managing transactional operations (e.g., order entry, inventory updates) with ACID compliance, not on continuous, high-velocity stream analytics. Option D is wrong because data warehousing is optimized for storing and querying historical, structured data for reporting and BI, not for real-time ingestion and immediate anomaly detection.
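
The streaming model described above can be sketched in a few lines of Python. This is only an illustration, not an Azure API: the `detect_anomalies` function, the window size, and the threshold are hypothetical, and a plain generator stands in for a service such as Azure Stream Analytics.

```python
from collections import deque
from statistics import mean
from typing import Iterable, Iterator, Tuple


def detect_anomalies(readings: Iterable[float],
                     window: int = 5,
                     threshold: float = 10.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) as soon as a reading deviates from the rolling mean.

    Each reading is handled the moment it arrives (streaming), rather than
    after the whole data set has been collected (batch).
    """
    recent = deque(maxlen=window)  # hypothetical sliding window of past readings
    for i, value in enumerate(readings):
        if len(recent) == window and abs(value - mean(recent)) > threshold:
            yield i, value
        recent.append(value)


# Simulated sensor feed; the spike at index 6 is flagged as it is read.
feed = [20.0, 20.5, 19.8, 20.2, 20.1, 20.3, 45.0, 20.4]
alerts = list(detect_anomalies(feed))
print(alerts)
```

A batch version would wait for the full list before computing anything; here the alert is produced while later readings are still outstanding.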

227

MCQ, hard

A healthcare organization must store patient records with strict compliance requirements. They need to classify data as public, internal, or confidential, and apply encryption and access policies accordingly. Which Microsoft Purview feature should they use?

A. Microsoft Purview Data Map

B. Azure Policy

C. Microsoft Defender for Cloud

D. Azure Information Protection

Answer: A

Purview Data Map allows scanning and classifying data assets across sources.

Why this answer

Microsoft Purview Data Map is the correct choice because it provides a unified data governance solution that enables automated data classification (public, internal, confidential) across hybrid and multi-cloud environments. It integrates with sensitivity labels and encryption policies to enforce access controls based on classification, meeting strict compliance requirements for patient records.

Exam trap

The trap here is that candidates often confuse Azure Information Protection (a legacy labeling tool) with Microsoft Purview Data Map, not realizing that Purview provides the unified data governance and automated classification capabilities required for compliance-driven data management.

How to eliminate wrong answers

Option B (Azure Policy) is wrong because it enforces organizational rules and compliance standards on Azure resources (e.g., tagging, location restrictions) but does not natively classify data or apply encryption/access policies at the data level. Option C (Microsoft Defender for Cloud) is wrong because it focuses on cloud security posture management, threat detection, and vulnerability assessment, not on data classification or granular access policies. Option D (Azure Information Protection) is wrong because it is a legacy labeling and classification solution that has been superseded by Microsoft Purview Information Protection; it lacks the unified data map and automated scanning capabilities of Purview Data Map.

228

MCQ, easy

A social media application displays the number of posts each user has created. After a user submits a new post, the count must reflect the update across all servers within a few seconds. Which data consistency model best describes this requirement?

A. Strong consistency

B. Eventual consistency

C. Sequential consistency

D. Causal consistency

Answer: B

Eventual consistency lets updates propagate asynchronously to replicas. If no further updates occur, all replicas eventually return the same value. The model itself sets no fixed time bound on convergence, but it fits a requirement that tolerates a delay of a few seconds before every server shows the new count.

Why this answer

Eventual consistency is correct because the requirement allows a few seconds for the update to propagate across all servers, meaning the system does not guarantee immediate uniformity but will converge to the same count eventually. This is typical in distributed systems like social media applications where high availability and partition tolerance are prioritized over immediate consistency, often using techniques like asynchronous replication.

Exam trap

The trap here is that candidates confuse 'eventual consistency' with 'weak consistency', or assume that any requirement for the count to update means strong consistency is needed. The key is the explicit tolerance of a few seconds, which aligns with eventual consistency's convergence guarantee.

How to eliminate wrong answers

Option A is wrong because strong consistency would require all servers to reflect the new post count immediately upon write, which conflicts with the 'within a few seconds' tolerance and would impose performance penalties in a distributed system. Option C is wrong because sequential consistency ensures operations appear in a global order consistent with program order, which is stricter than needed and not typically used for simple count updates across servers. Option D is wrong because causal consistency preserves the order of causally related events, which is unnecessary for a simple counter update that has no causal dependencies with other operations.
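
A toy simulation can make the convergence behaviour concrete. The sketch below is an assumption-laden model, not how any particular database replicates: `Replica`, `EventuallyConsistentStore`, and the explicit `propagate` step are hypothetical names, and the pending queue stands in for asynchronous replication.

```python
from typing import Dict, List, Tuple


class Replica:
    """A server holding its own copy of each user's post count."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def read(self, user: str) -> int:
        return self.counts.get(user, 0)


class EventuallyConsistentStore:
    """Hypothetical store: writes hit one replica, then propagate later."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(replica_count)]
        self.pending: List[Tuple[int, str, int]] = []  # (target replica, user, count)

    def add_post(self, user: str) -> None:
        primary = self.replicas[0]
        primary.counts[user] = primary.read(user) + 1
        # Queue the update for the other replicas instead of waiting for them.
        for idx in range(1, len(self.replicas)):
            self.pending.append((idx, user, primary.counts[user]))

    def propagate(self) -> None:
        """Deliver queued updates (stands in for asynchronous replication)."""
        for idx, user, count in self.pending:
            self.replicas[idx].counts[user] = count
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.add_post("alice")
before = [r.read("alice") for r in store.replicas]  # replicas disagree briefly
store.propagate()
after = [r.read("alice") for r in store.replicas]   # replicas have converged
print(before, after)
```

Under strong consistency, `add_post` would not return until every replica had applied the write, so the `before` snapshot could never show differing counts.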

229

MCQ, easy

A retail company processes customer orders throughout the day. Each order involves inserting a new record into a database table, updating inventory counts, and deleting temporary cart data. At the end of each week, the company runs a query that aggregates all orders by product category and region to generate a sales report. Which of the following best describes these two workloads?

A. Order processing is OLAP; weekly reporting is OLTP

B. Order processing is batch processing; weekly reporting is streaming processing

C. Order processing is OLTP; weekly reporting is OLAP

D. Both workloads are OLTP

Answer: C

Order processing is transactional (OLTP) and the weekly report is analytical (OLAP). This is a correct distinction between the two common data workload patterns.

Why this answer

Order processing involves frequent, small transactions (inserts, updates, deletes) that are typical of Online Transaction Processing (OLTP) workloads, which prioritize data integrity and low-latency writes. The weekly sales report aggregates large volumes of historical data by product category and region, which is characteristic of Online Analytical Processing (OLAP) workloads that support complex queries and data summarization. Option C correctly identifies these two distinct workload types.

Exam trap

The trap here is that candidates confuse the terms OLTP and OLAP, mistakenly thinking that any database operation is OLTP or that reporting is always OLTP, when in fact the key differentiator is the workload pattern—transactional vs. analytical.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: order processing is OLTP (not OLAP) due to its transactional nature, and weekly reporting is OLAP (not OLTP) because it involves heavy aggregation over historical data. Option B is wrong because order processing is not batch processing—it occurs in real-time as each order is placed, and weekly reporting is not streaming processing; it is a scheduled batch job that runs at fixed intervals. Option D is wrong because both workloads are not OLTP; the weekly reporting query performs large-scale aggregations that would degrade OLTP performance and is designed for OLAP systems.
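
The two workload shapes can be shown side by side with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table names, columns, and the `place_order` helper are hypothetical, and a single in-memory SQLite database stands in for what would normally be separate OLTP and OLAP systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (product TEXT PRIMARY KEY, category TEXT, stock INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount REAL);
    CREATE TABLE cart (customer TEXT, product TEXT);
    INSERT INTO inventory VALUES ('laptop', 'electronics', 10), ('desk', 'furniture', 5);
    INSERT INTO cart VALUES ('c1', 'laptop'), ('c2', 'desk'), ('c3', 'laptop');
""")


def place_order(customer: str, product: str, region: str, amount: float) -> None:
    """OLTP-style work: a small insert, update, and delete in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders (product, region, amount) VALUES (?, ?, ?)",
                     (product, region, amount))
        conn.execute("UPDATE inventory SET stock = stock - 1 WHERE product = ?", (product,))
        conn.execute("DELETE FROM cart WHERE customer = ? AND product = ?",
                     (customer, product))


place_order("c1", "laptop", "east", 1200.0)
place_order("c2", "desk", "west", 300.0)
place_order("c3", "laptop", "east", 1100.0)

# OLAP-style work: one read-only query that aggregates many rows.
report = conn.execute("""
    SELECT i.category, o.region, SUM(o.amount)
    FROM orders AS o JOIN inventory AS i ON i.product = o.product
    GROUP BY i.category, o.region
    ORDER BY i.category, o.region
""").fetchall()
print(report)
```

Each `place_order` call touches a handful of rows and must succeed or fail as a unit, while the report scans the whole `orders` table once and writes nothing.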

230

MCQ, easy

A data engineer needs to load data from an on-premises SQL Server database to Azure Synapse Analytics every hour with minimal latency. Which Azure service should they use?

A. Azure Databricks

B. Azure Data Factory

C. Azure SQL Database

D. Azure HDInsight

Answer: B

Azure Data Factory is a cloud-based ETL service that can run pipelines on an hourly schedule with low latency.

Why this answer

Azure Data Factory (ADF) is the correct choice because it provides a fully managed, code-free ETL service that can connect to on-premises SQL Server via self-hosted integration runtime, and load data into Azure Synapse Analytics with low latency using a scheduled trigger (e.g., every hour). ADF supports incremental data loading and parallel copy activities, minimizing latency while handling the required frequency.

Exam trap

The trap here is that candidates often confuse Azure Data Factory with Azure Databricks or HDInsight, assuming any big data or analytics service can handle scheduled data ingestion, but only ADF is purpose-built for orchestration and low-latency data movement from on-premises sources.

How to eliminate wrong answers

Option A is wrong because Azure Databricks is an Apache Spark-based analytics platform designed for big data processing and machine learning, not a dedicated data movement or orchestration service; it can schedule jobs, but reaching an on-premises SQL Server requires additional network setup, and it has no built-in counterpart to ADF's self-hosted integration runtime. Option C is wrong because Azure SQL Database is a relational database service, not a data integration or orchestration tool; it cannot directly load data from on-premises SQL Server into Synapse Analytics on a schedule. Option D is wrong because Azure HDInsight is a managed Hadoop/Spark cluster service for big data analytics, not a data movement or orchestration service; it requires custom scripting and manual scheduling to perform hourly loads, adding complexity and latency.

231

MCQ, easy

A company stores customer data in a SQL Server database table with columns: CustomerID (integer), Name (varchar), Email (varchar), SignupDate (date). All rows adhere to this schema. Which type of data does this represent?

A. Structured data

B. Unstructured data

C. Semi-structured data

D. Transactional data

Answer: A

Correct. The data is stored in a relational table with a fixed schema, which is the definition of structured data.

Why this answer

This data is structured because it conforms to a fixed schema with clearly defined columns (CustomerID, Name, Email, SignupDate) and data types (integer, varchar, date). In SQL Server, structured data is stored in tables with rows and columns, enabling efficient querying via T-SQL and indexing. The consistent adherence to the schema across all rows is the hallmark of structured data.

Exam trap

The trap here is that candidates confuse the content of the data (e.g., customer information) with its structure, or mistakenly think that any data in a database is automatically structured, ignoring the distinction between structured, semi-structured, and unstructured formats.

How to eliminate wrong answers

Option B is wrong because unstructured data has no predefined schema or organization (e.g., text files, images, videos), whereas this table has a rigid schema. Option C is wrong because semi-structured data (e.g., JSON, XML) allows schema flexibility and nested structures, but this table enforces fixed columns and data types. Option D is wrong because transactional data refers to records of business transactions (e.g., sales orders, payments), not the general classification of data format; this table could store transactional data, but the question asks about the type of data based on its structure.

232

MCQ, easy

A company ingests streaming data from social media feeds and needs to process and analyze the data in real time. Which Azure service should they use to capture the stream?

A. Azure Stream Analytics

B. Azure IoT Hub

C. Azure Event Hubs

D. Azure Data Lake Storage

Answer: C

Azure Event Hubs is a scalable event ingestion service for streaming data.

Why this answer

Azure Event Hubs is a fully managed, real-time data ingestion service designed to capture and process millions of events per second from sources like social media feeds. It provides a scalable, low-latency endpoint for streaming data, making it the correct choice for capturing the stream before further analysis.

Exam trap

The trap here is that candidates confuse Azure Stream Analytics (a processing service) with Event Hubs (an ingestion service), or assume IoT Hub is suitable for non-IoT streaming data due to its similar event ingestion capability.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a stream processing engine that analyzes data in motion, not a capture/ingestion service; it typically consumes from Event Hubs or IoT Hub. Option B is wrong because Azure IoT Hub is built for bidirectional communication with IoT devices and adds per-device identity and device management features that a social media feed does not need; Event Hubs is the general-purpose choice for high-throughput event ingestion. Option D is wrong because Azure Data Lake Storage is a hierarchical file store for batch and analytics workloads, not a real-time streaming capture service; it cannot ingest streaming data directly without an intermediary like Event Hubs or Stream Analytics.

233

MCQ, easy

A company stores customer data in a SQL Server table with fixed columns (CustomerID, Name, Email, SignupDate). The company also stores application logs as JSON documents and marketing images as JPEG files. Which data type describes the customer data?

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Relational data

Answer: A

Correct. A SQL table with defined columns and data types is the classic example of structured data.

Why this answer

Customer data stored in a SQL Server table with fixed columns (CustomerID, Name, Email, SignupDate) follows a rigid schema where each row has the same set of columns with defined data types. This conforms to the relational model, making it structured data. Structured data is organized into rows and columns with a fixed schema, enabling efficient querying via SQL.

Exam trap

The trap here is that candidates confuse 'relational data' (a storage model) with 'structured data' (a data type category), leading them to pick D instead of A, even though the question explicitly asks for the data type.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not enforce a fixed schema; it allows flexible key-value pairs or nested structures, which does not match the fixed-column SQL Server table. Option C is wrong because unstructured data (e.g., JPEG images, plain text files) lacks a predefined data model or organization, unlike the tabular customer data. Option D is wrong because 'relational data' is not a data type category in the DP-900 core data concepts; it describes a storage model (relational databases) that can hold structured data, but the question asks for the data type, not the storage model.

234

MCQ, easy

A retail company maintains a database of customer information including CustomerID, Name, Address, and Phone. Each record follows the same fixed schema. This type of data is best described as:

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Relational data

Answer: A

Structured data has a fixed schema, like a table with defined columns.

Why this answer

Structured data conforms to a fixed schema where each record has the same fields (CustomerID, Name, Address, Phone) and data types, making it ideal for relational database storage. This rigid, tabular format allows efficient querying using SQL and enforces consistency across all rows.

Exam trap

The trap here is that candidates confuse 'relational data' (a storage model) with 'structured data' (a data type), leading them to select Option D, but the DP-900 exam categorizes data by its structure, not by the database system used to store it.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not enforce a fixed schema; fields can vary between records, unlike the uniform schema described. Option C is wrong because unstructured data (e.g., images, videos, text files) has no predefined structure or schema, whereas customer records with fixed fields are clearly organized. Option D is wrong because 'relational data' is not a data type category in the DP-900 taxonomy; it refers to a database model that stores structured data, but the question asks for the data type itself, not the storage model.

235

Multi-Select, hard

Which THREE are valid Azure data storage services that support semi-structured data?

Select 3 answers

A. Azure Cosmos DB

B. Azure SQL Database

C. Azure Table Storage

D. Azure Blob Storage

E. Azure Data Lake Storage

Answers: A, C, D

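Azure Cosmos DB, Azure Table Storage, and Azure Blob Storage can all hold semi-structured data such as JSON documents.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database service that natively supports semi-structured data through its flexible schema model, allowing documents, key-value pairs, and graph data. It provides multiple APIs (SQL, MongoDB, Cassandra, Gremlin, and Table) to interact with semi-structured data, making it a valid choice for this question. Azure Table Storage is a NoSQL key/attribute store in which entities in the same table can carry different sets of properties, and Azure Blob Storage is object storage that accepts files of any format, including JSON and XML.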

Exam trap

The trap here is that candidates may incorrectly assume Azure SQL Database supports semi-structured data because it can store JSON in columns, but it still requires a predefined relational schema and does not natively handle flexible schemas like a true NoSQL service.

236

MCQ, easy

A data file contains records for customer orders. Each record has fields for OrderID, CustomerID, and OrderDate that are present in every record. However, some records include an optional 'DiscountCode' field, and others include an optional 'GiftMessage' field. The file is stored in JSON format. Which type of data does this file represent?

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Transactional data

Answer: B

Correct. Semi-structured data has some organizational properties (like tags or keys) but allows variations in the schema. JSON documents with optional fields fall into this category.

Why this answer

The JSON file contains records with a fixed set of fields (OrderID, CustomerID, OrderDate) that are always present, but also includes optional fields (DiscountCode, GiftMessage) that may appear in some records but not others. This mix of a consistent schema with flexible, self-describing fields is the hallmark of semi-structured data. JSON itself is a semi-structured format because it uses key-value pairs and allows nested or optional attributes without requiring a rigid schema.

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' because they see optional fields and think the data has no structure, but the presence of a consistent base schema (OrderID, CustomerID, OrderDate) clearly distinguishes it as semi-structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema (e.g., a relational table with predefined columns), but this JSON file allows optional fields that may be missing from some records, violating the strict schema requirement. Option C is wrong because unstructured data has no predefined structure or organization (e.g., raw text, images, audio), whereas this file has a consistent base schema with OrderID, CustomerID, and OrderDate in every record. Option D is wrong because transactional data refers to data that records events or transactions (like orders), but this is a classification of data content, not a classification of data structure; the question asks about the type of data based on its format, not its business use.
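
The mix of always-present and optional fields is easy to see by inspecting the keys of each record. The sketch below uses Python's standard `json` module; the three sample orders and their values are hypothetical, and only the field names come from the question.

```python
import json

# Hypothetical order file: three keys appear in every record, two are optional.
raw = """
[
  {"OrderID": 1, "CustomerID": 10, "OrderDate": "2024-01-05"},
  {"OrderID": 2, "CustomerID": 11, "OrderDate": "2024-01-06", "DiscountCode": "NEW10"},
  {"OrderID": 3, "CustomerID": 12, "OrderDate": "2024-01-07", "GiftMessage": "Happy birthday"}
]
"""
orders = json.loads(raw)

# Keys present in every record form the consistent base of the data.
common = set.intersection(*(set(o) for o in orders))
# Keys that appear only in some records are what make it semi-structured.
optional = set.union(*(set(o) for o in orders)) - common

print(sorted(common))
print(sorted(optional))
```

A relational table would need a nullable column for each optional field up front, whereas each JSON record simply describes the fields it carries.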

237

MCQ, medium

Your company stores customer data in Azure Blob Storage. To comply with data residency regulations, you must ensure data is replicated within the same Azure region. Which replication option should you choose?

A. Zone-redundant storage (ZRS)

B. Locally-redundant storage (LRS)

C. Geo-redundant storage (GRS)

D. Read-access geo-redundant storage (RA-GRS)

Answer: B

LRS replicates within a single region, keeping data resident.

Why this answer

Locally-redundant storage (LRS) replicates data three times within a single physical location in the same Azure region, ensuring data residency compliance by never copying data outside that region. This is the only option that guarantees all replicas stay within one region without any cross-region or cross-zone replication.

Exam trap

The trap here is that candidates confuse 'replication within the same region' with zone-redundant storage (ZRS), because ZRS also stays within the region. The question's emphasis on 'data residency' and 'same region' tests whether you know that LRS is the simplest and most restrictive option, keeping all copies in a single location. ZRS spreads copies across multiple availability zones, which some compliance definitions may treat as separate data centers.

How to eliminate wrong answers

Option A is not the intended answer because zone-redundant storage (ZRS) replicates data synchronously across three Azure availability zones. All three zones are in the same region, so ZRS does keep data in-region, but the question is keyed to LRS, which keeps every replica in a single location without zone-level distribution and is the lowest-cost option. Option C is wrong because geo-redundant storage (GRS) replicates data to a secondary region that is hundreds of miles away, violating data residency regulations that require data to stay within a single region. Option D is wrong because read-access geo-redundant storage (RA-GRS) also replicates data to a secondary region and additionally provides read access to that secondary copy, which still breaks the data residency constraint.

238

MCQ, easy

A ride-sharing company processes trip requests from customers. Each trip is recorded as a single transaction that updates the driver's status, calculates the fare, and logs the ride. At the end of each month, the company runs reports that aggregate millions of trips to determine average wait times and revenue per driver. Which pair of terms best describes these two distinct workloads?

A. OLTP and OLAP

B. Batch processing and stream processing

C. ETL and ELT

D. Relational and non-relational

Answer: A

Correct. OLTP handles the individual trip transactions, while OLAP handles the monthly reporting and aggregation.

Why this answer

The first workload (trip request processing) is a classic OLTP (Online Transaction Processing) system because each trip is a single, atomic transaction that updates driver status, calculates fare, and logs the ride in real time. The second workload (monthly aggregation reports) is OLAP (Online Analytical Processing) because it queries millions of historical trip records to compute averages and revenue summaries. These two patterns have fundamentally different data storage and query optimization requirements, making OLTP and OLAP the correct pair.

Exam trap

The trap here is that candidates confuse the processing method (batch/stream) with the workload type (OLTP/OLAP), but the question specifically asks for the pair that best describes the distinct workloads—transactional updates vs. analytical reporting—which is the classic OLTP vs. OLAP distinction.

How to eliminate wrong answers

Option B is wrong because batch processing and stream processing describe how data is processed over time, not whether a workload is transactional or analytical; the monthly reports do run as a batch job, but the trip requests are handled as individual transactions rather than as a batch or a stream. Option C is wrong because ETL and ELT are data integration processes (Extract, Transform, Load vs. Extract, Load, Transform) used to move data between systems, not the fundamental workload types themselves. Option D is wrong because relational and non-relational refer to database models (structured tables vs. flexible schemas), which are orthogonal to the transactional vs. analytical distinction; both workloads could be implemented using either model.

239

MCQ, easy

A company stores customer records in a relational database table with fixed columns (CustomerID, Name, Email). They also store product reviews as JSON documents that may contain varying fields such as Rating, Comment, and optional Tags. Additionally, they store product images as JPEG files. Which of the following correctly orders these data types from most structured to least structured?

A. JSON documents, relational table, image files

B. Relational table, JSON documents, image files

C. Image files, relational table, JSON documents

D. Relational table, image files, JSON documents

Answer: B

This is correct: relational tables have a fixed schema (most structured), JSON allows flexible fields (semi-structured), and image files are binary with no inherent structure (unstructured).

Why this answer

Relational tables enforce a fixed schema with predefined columns and data types, making them the most structured. JSON documents have a flexible schema where fields like Tags are optional, placing them in the middle. Image files are binary blobs with no inherent structure, making them the least structured. Option B correctly orders these from most structured (relational table) to least structured (image files).

Exam trap

The trap here is that candidates often confuse semi-structured JSON with unstructured data, or assume that all data with a format (like JPEG headers) is structured, but the key distinction is schema rigidity and queryability.

How to eliminate wrong answers

Option A is wrong because it places JSON documents ahead of relational tables, but JSON documents have a flexible schema (optional fields, varying types) while relational tables enforce a rigid schema with fixed columns and constraints, making tables more structured. Option C is wrong because it places image files first, but image files are unstructured binary data with no schema, while relational tables and JSON documents both have some level of structure. Option D is wrong because it places image files before JSON documents, but JSON documents have a defined structure (key-value pairs, nesting) whereas image files are completely unstructured binary blobs.

240

MCQ, hard

A data engineer needs to implement a solution that provides near real-time analytics on clickstream data. The data arrives as JSON events and must be queryable with sub-second latency using SQL-like queries. The solution should minimize operational overhead. Which Azure service should they use?

A. Azure Stream Analytics

B. Azure Analysis Services

C. Azure Synapse Analytics

D. Azure Data Explorer

Answer: D

ADX is designed for near real-time analytics on streaming data, supports SQL-like KQL, and delivers sub-second query performance.

Why this answer

Azure Data Explorer (ADX) is designed for interactive analytics on large volumes of streaming and historical data with sub-second query latency using Kusto Query Language (KQL), a SQL-like query language. It natively ingests JSON events, provides near real-time analytics, and keeps operational overhead low as a fully managed service.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a real-time processing engine) with Azure Data Explorer (an interactive analytics database), failing to recognize that the requirement for 'sub-second latency using SQL-like queries' on stored data points to a query engine, not a stream processor.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks (e.g., Power BI, Event Hubs) but does not natively support sub-second interactive SQL queries on stored data; it is designed for continuous queries, not ad-hoc analytics. Option B is wrong because Azure Analysis Services is an OLAP engine for semantic models and multidimensional cubes, not designed for raw clickstream JSON ingestion or sub-second query latency on streaming data. Option C is wrong because Azure Synapse Analytics is a big data analytics platform optimized for large-scale batch and interactive queries using dedicated SQL pools, but it incurs higher operational overhead and is not purpose-built for near real-time, sub-second latency on high-velocity streaming JSON events.

241

MCQ, easy

A retail chain collects sales data from all its stores at the end of each business day by exporting CSV files from each store's database. The data is then combined and analyzed to generate daily sales reports. Which type of data processing does this describe?

A. Batch processing

B. Real-time processing

C. Stream processing

D. Interactive query

Answer: A

Batch processing handles data in discrete, scheduled batches, which matches the daily collection and reporting cycle.

Why this answer

This describes batch processing because sales data is collected from each store at the end of the business day, exported as CSV files, and then combined and analyzed in a scheduled, non-continuous manner. Batch processing is ideal for large volumes of data that are processed at periodic intervals, such as daily sales reports, rather than requiring immediate action.

Exam trap

The trap here is that candidates confuse 'daily export' with 'real-time' because they think 'daily' implies frequent updates, but batch processing is defined by the scheduled, non-continuous nature of the data collection and processing, not the frequency.

How to eliminate wrong answers

Option B (Real-time processing) is wrong because real-time processing requires data to be processed immediately as it arrives, with sub-second latency, which does not match the end-of-day CSV export and batch analysis. Option C (Stream processing) is wrong because stream processing handles continuous, unbounded data flows (e.g., from IoT sensors or clickstreams) and processes each event incrementally, not by collecting files at a fixed time. Option D (Interactive query) is wrong because interactive query refers to ad-hoc, on-demand exploration of data (e.g., using SQL against a data warehouse), not a scheduled, automated daily report generation from exported files.
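
The end-of-day pattern in this scenario can be sketched with Python's standard `csv` module. Everything here is illustrative: the file names, column names, figures, and the `run_daily_batch` function are hypothetical, and in-memory strings stand in for the exported files.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical end-of-day exports, one CSV per store (in-memory for the sketch).
store_exports = {
    "store_a.csv": "product,amount\nlaptop,1200\ndesk,300\n",
    "store_b.csv": "product,amount\nlaptop,1100\nchair,150\n",
}


def run_daily_batch(exports: Iterable[str]) -> Dict[str, float]:
    """Process every collected file in one scheduled run and total sales per product."""
    totals: Dict[str, float] = defaultdict(float)
    for text in exports:
        for row in csv.DictReader(io.StringIO(text)):
            totals[row["product"]] += float(row["amount"])
    return dict(totals)


daily_report = run_daily_batch(store_exports.values())
print(daily_report)
```

Nothing is computed until all exports for the day are available, which is the defining trait of batch processing regardless of how often the job is scheduled.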

242

MCQ, easy

A company wants to run SQL queries on data stored in Azure Cosmos DB for NoSQL. Which API should they use?

A. Core (SQL) API

B. Gremlin API

C. Cassandra API

D. MongoDB API

Answer: A

The Core (SQL) API supports SQL-like queries over JSON documents stored in Cosmos DB.

Why this answer

The Core (SQL) API is the native API for Azure Cosmos DB for NoSQL, designed to query JSON documents using a SQL-like syntax. Since the requirement is to run SQL queries on data stored in Azure Cosmos DB for NoSQL, this API directly supports that need without requiring any protocol translation or schema mapping.

Exam trap

The trap here is that candidates often confuse 'SQL queries' with the Cassandra API because both use a SQL-like language, but Cassandra uses CQL, not standard SQL, and is designed for a different data model (wide-column vs. document).

How to eliminate wrong answers

Option B (Gremlin API) is wrong because it is used for graph data models and queries using the Apache TinkerPop graph traversal language, not for SQL queries on NoSQL documents. Option C (Cassandra API) is wrong because it implements the Apache Cassandra wire protocol for wide-column stores and uses CQL (Cassandra Query Language), not standard SQL. Option D (MongoDB API) is wrong because it provides compatibility with MongoDB's document model and query syntax (e.g., BSON, find(), aggregate()), not SQL.

243

Multi-Selectmedium

Which TWO of the following are correct descriptions of data processing workloads in Azure?

Select 2 answers

A.Streaming processing is used for interactive queries on historical data.

B.Streaming processing is used to process data at rest.

C.Streaming processing is used to process data in real time as it arrives.

D.Batch processing is used to process data in real time as it arrives.

E.Batch processing is used to process large volumes of data at scheduled intervals.

AnswersC, E

Streaming processes data continuously in real time.

Why this answer

Option C is correct because streaming processing in Azure (e.g., Azure Stream Analytics, Event Hubs, or Kafka on HDInsight) is designed to ingest, analyze, and act on data in near real time as it arrives, often with sub-second latency. Option E is correct because batch processing collects large volumes of data at rest and processes them at scheduled intervals, which is fundamentally different from acting on data in motion.

Exam trap

The trap here is that candidates confuse 'streaming' with 'interactive querying' or assume batch can handle real-time data, but Azure explicitly separates these workloads based on data state (in motion vs. at rest) and latency requirements.

244

MCQmedium

The exhibit shows a T-SQL query against an Azure SQL Database. What is the purpose of the HAVING clause in this query?

A.To sort the result set by TotalSales descending

B.To join two tables

C.To filter groups after aggregation

D.To filter rows before grouping

AnswerC

HAVING filters groups based on aggregate conditions.

Why this answer

The HAVING clause is used in T-SQL to filter groups after the GROUP BY clause has performed aggregation. In this query, it restricts the result set to only those product categories whose total sales (SUM(Amount)) exceed 1000, which is a condition on the aggregated value, not on individual rows.

Exam trap

The trap here is that candidates often confuse HAVING with WHERE, mistakenly thinking HAVING filters individual rows before grouping, when in fact WHERE performs that role and HAVING only applies after aggregation.

How to eliminate wrong answers

Option A is wrong because sorting the result set is done by the ORDER BY clause, not HAVING. Option B is wrong because joining tables is accomplished with JOIN clauses (e.g., INNER JOIN, LEFT JOIN), not HAVING. Option D is wrong because filtering rows before grouping is the role of the WHERE clause, which operates on individual rows before aggregation; HAVING filters after aggregation.
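
Since the exhibit itself is not reproduced here, the following is a minimal sketch of the kind of query the explanation describes. It runs on Python's built-in sqlite3 module rather than Azure SQL Database, and the Sales table, its rows, and the WHERE condition are assumptions made up for illustration; only SUM(Amount), TotalSales, and the 1000 threshold come from the explanation above.

```python
import sqlite3

# In-memory database standing in for Azure SQL Database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Category TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("Bikes", 800), ("Bikes", 400), ("Helmets", 300),
     ("Helmets", 200), ("Locks", 1500)],
)

rows = conn.execute(
    """
    SELECT Category, SUM(Amount) AS TotalSales
    FROM Sales
    WHERE Amount > 0              -- filters individual rows before grouping
    GROUP BY Category
    HAVING SUM(Amount) > 1000     -- filters whole groups after aggregation
    ORDER BY TotalSales DESC      -- sorting is a separate clause
    """
).fetchall()
print(rows)
```

With this sample data the Helmets group (total 500) is aggregated and then dropped by HAVING, whereas WHERE only ever sees individual rows and never the totals.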

245

MCQmedium

A healthcare application stores patient vital signs readings. Each reading is a JSON document with fields: PatientID, Timestamp, HeartRate, BloodPressure (systolic and diastolic). The application frequently queries for all readings of a specific patient within a time range, and the schema varies occasionally (e.g., new optional fields are added). How should this data be classified?

A.Structured

B.Semi-structured

C.Unstructured

D.Relational

AnswerB

Semi-structured data uses tags or markers (like JSON) to separate data elements and allows schema flexibility.

Why this answer

The data is semi-structured because it is stored as JSON documents, which have a flexible schema that can vary between records (e.g., new optional fields can be added). JSON documents are self-describing and do not require a fixed schema like relational tables, but they still have organizational properties (fields like PatientID, Timestamp) that distinguish them from unstructured data like plain text or images. The application's queries on specific fields (PatientID, Timestamp) further confirm the data has structure, but the schema flexibility rules out structured or relational classifications.

Exam trap

The trap here is that candidates confuse 'structured' with 'having fields'—they see PatientID and Timestamp and assume it must be structured, but the key differentiator is schema flexibility (optional fields, varying structure) which defines semi-structured data.

How to eliminate wrong answers

Option A is wrong because structured data requires a rigid, predefined schema (e.g., fixed columns and data types in a SQL table), but JSON documents allow schema variation and optional fields, which violates the strict schema constraint. Option C is wrong because unstructured data has no predefined data model or organization (e.g., raw text files, images, videos), whereas JSON documents have named fields and a hierarchical structure that can be parsed and queried. Option D is wrong because relational data is a subset of structured data that enforces relationships through foreign keys and normalization, but JSON documents in this scenario are not stored in relational tables and do not enforce referential integrity or a fixed schema.
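
To make the classification concrete, here is a small sketch in plain Python of what such readings might look like and how they can still be queried by field. The sample documents, the SpO2 field, and the readings_for helper are hypothetical and not taken from any real system.

```python
import json

# Hypothetical sample documents; the second one carries an extra optional field.
raw_documents = [
    '{"PatientID": "P1", "Timestamp": "2024-01-01T08:00:00", "HeartRate": 72,'
    ' "BloodPressure": {"systolic": 120, "diastolic": 80}}',
    '{"PatientID": "P1", "Timestamp": "2024-01-01T09:00:00", "HeartRate": 75,'
    ' "BloodPressure": {"systolic": 118, "diastolic": 79}, "SpO2": 98}',
    '{"PatientID": "P2", "Timestamp": "2024-01-01T08:30:00", "HeartRate": 64,'
    ' "BloodPressure": {"systolic": 110, "diastolic": 70}}',
]
readings = [json.loads(doc) for doc in raw_documents]


def readings_for(patient_id, start, end):
    """Return one patient's readings whose ISO 8601 timestamp is in [start, end]."""
    return [
        r for r in readings
        if r["PatientID"] == patient_id and start <= r["Timestamp"] <= end
    ]


p1 = readings_for("P1", "2024-01-01T00:00:00", "2024-01-01T23:59:59")
# Optional fields are read defensively because not every document has them.
spo2_values = [r.get("SpO2") for r in p1]
print(len(p1), spo2_values)
```

The named fields make the data queryable, while the optional field that appears in only one document is the schema flexibility that a fixed relational table would not allow without a schema change.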

246

MCQhard

Your organization has a data warehouse in Azure Synapse Analytics. You need to load data from Azure Blob Storage daily, transforming it using a data flow. Which Azure service should you use for the ETL process?

A.Azure Databricks

B.Azure Data Factory

C.Azure Logic Apps

D.Azure Synapse Pipelines

AnswerB

Offers mapping data flows for visual ETL without coding.

Why this answer

Azure Data Factory (ADF) is the correct choice because it provides native integration with Azure Synapse Analytics and Azure Blob Storage, and it includes a visual data flow designer for transforming data without writing code. ADF's mapping data flows execute at scale on Spark clusters, making it ideal for daily ETL workloads that require both ingestion and transformation.

Exam trap

The trap here is that candidates mistake Azure Synapse Pipelines (which is just ADF inside Synapse) for a separate service, but the correct Azure service name for the ETL tool is Azure Data Factory, not Synapse Pipelines.

How to eliminate wrong answers

Option A is wrong because Azure Databricks is a big data analytics platform that requires you to write code (Python, Scala, SQL) to build transformations, and it does not have a native, no-code data flow designer like ADF; it is overkill for a simple daily load with transformations. Option C is wrong because Azure Logic Apps is a workflow automation service designed for integrating SaaS applications and orchestrating business processes, not for performing data transformations at scale or loading data into a data warehouse. Option D is wrong because Azure Synapse Pipelines is actually built on top of Azure Data Factory and shares the same engine, but the standalone service name for the ETL tool is Azure Data Factory; Synapse Pipelines is a feature within Synapse, not a separate service, and the question asks for the Azure service, which is Azure Data Factory.

247

MCQmedium

A company stores customer data in a relational database. The database design includes a rule that every order must be associated with a valid customer ID that exists in the Customers table. This rule is an example of which data concept?

A.Referential integrity

B.Data normalization

C.Entity integrity

D.Data consistency

AnswerA

Referential integrity uses foreign keys to ensure values in one table match primary keys in another, exactly as described.

Why this answer

Referential integrity ensures that relationships between tables remain consistent. In a relational database, a foreign key constraint enforces that every order's customer ID must match an existing customer ID in the Customers table, preventing orphaned records. This rule directly implements referential integrity as defined by the SQL standard (e.g., via FOREIGN KEY constraints).

Exam trap

The trap here is that candidates often confuse referential integrity with entity integrity, mistakenly thinking that any rule involving a 'valid ID' is about primary keys, when in fact it is about foreign key relationships between tables.

How to eliminate wrong answers

Option B is wrong because data normalization is a design process to reduce data redundancy and avoid anomalies (e.g., 1NF, 2NF, 3NF), not a rule that enforces valid cross-table relationships. Option C is wrong because entity integrity ensures that the primary key of a table is unique and not null, which applies to the Customers table's customer ID column, not to the foreign key relationship from Orders to Customers. Option D is wrong because data consistency is a broader property of the database state (e.g., ensuring all constraints are satisfied), not a specific constraint type; referential integrity is one mechanism to achieve consistency, but the rule itself is a referential integrity constraint.
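
The rule in this question can be sketched with a FOREIGN KEY constraint. The example below uses Python's built-in sqlite3 module as a stand-in for the company's relational database; the column names beyond the customer ID and the sample rows are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
    )
    """
)
conn.execute("INSERT INTO Customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO Orders VALUES (100, 1)")  # valid: customer 1 exists

try:
    # Customer 999 does not exist, so this would create an orphaned order.
    conn.execute("INSERT INTO Orders VALUES (101, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)
```

The PRIMARY KEY on Customers is entity integrity; the REFERENCES clause on Orders is the referential integrity rule the question describes.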

248

MCQmedium

A company stores customer transaction data in Azure Blob Storage. The data is rarely accessed after 30 days, but must be retained for 7 years for compliance. Which access tier minimizes storage cost while meeting the retention requirement?

A.Hot tier

B.Cool tier

C.Premium tier

D.Archive tier

AnswerD

Archive tier is the lowest-cost option for data that is rarely accessed and requires long-term retention.

Why this answer

The Archive tier is the correct choice because it offers the lowest storage cost for data that is rarely accessed, which aligns with the scenario where data is accessed infrequently after 30 days but must be retained for 7 years. Azure Blob Storage's Archive tier is designed for long-term retention with a retrieval latency of several hours, making it cost-effective for compliance-driven data that does not require immediate access.

Exam trap

The trap here is that candidates may choose the Cool tier thinking it balances cost and access, but they overlook that the Archive tier is significantly cheaper for data that is accessed less than once a year, which is typical for 7-year compliance retention.

How to eliminate wrong answers

Option A is wrong because the Hot tier is optimized for frequent access and has the highest storage cost of the standard access tiers, which would be wasteful for data that is rarely accessed after 30 days. Option B is wrong because the Cool tier is designed for data accessed infrequently (e.g., every 30 days or more) but still has higher storage costs than Archive and is not the most cost-effective for 7-year retention with rare access. Option C is wrong because the Premium tier is for high-performance, low-latency access (e.g., via Azure Virtual Machines) and is the most expensive option overall, making it unsuitable for rarely accessed compliance data.

249

MCQhard

A financial services company stores account balances in Azure SQL Database (strong consistency) and transaction audit logs in Azure Cosmos DB (eventual consistency by default). A compliance requirement demands that when a transaction is rolled back in the SQL database, the corresponding audit log entries in Cosmos DB must also be removed within a short time frame. Which term best describes the difficulty of maintaining this constraint?

A.ACID compliance

B.Idempotency

C.Distributed transaction coordination

D.Schema flexibility

AnswerC

Correct. Managing atomicity across heterogeneous stores that lack native distributed transaction support requires custom compensation logic or a saga pattern.

Why this answer

Option C is correct because the scenario requires coordinating a rollback across two distinct data stores—Azure SQL Database (ACID-compliant, strong consistency) and Azure Cosmos DB (eventual consistency by default). This cross-system transactional consistency is a classic distributed transaction coordination problem, often addressed via patterns like the two-phase commit (2PC) or the saga pattern, but not natively supported between these two services without custom orchestration.

Exam trap

The trap here is that candidates confuse ACID compliance (which is a property of a single database) with the ability to maintain atomicity across multiple independent data stores, leading them to select Option A instead of recognizing the need for distributed transaction coordination.

How to eliminate wrong answers

Option A is wrong because ACID compliance applies to a single database system (like Azure SQL Database) and does not extend to coordinating transactions across heterogeneous data stores. Option B is wrong because idempotency ensures that repeated operations produce the same result, but it does not solve the problem of atomically removing audit logs across two systems when a rollback occurs. Option D is wrong because schema flexibility (e.g., schema-agnostic design in Cosmos DB) is unrelated to transactional consistency or cross-store coordination.
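
The compensation idea behind the saga pattern can be sketched as follows. This is an illustration under stated assumptions, not how Azure SQL Database or Cosmos DB are actually wired together: sqlite3 stands in for the SQL database, a plain dictionary stands in for the audit store, and the function names, table, and amounts are hypothetical.

```python
import sqlite3

# Stand-in for a separate document store holding audit entries (hypothetical).
audit_store = {}


def write_audit(tx_id, entry):
    audit_store.setdefault(tx_id, []).append(entry)


def compensate_audit(tx_id):
    """Compensating action: remove audit entries for a rolled-back transaction."""
    audit_store.pop(tx_id, None)


def withdraw(conn, tx_id, account_id, amount):
    write_audit(tx_id, {"account": account_id, "amount": amount})
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE AccountID = ?",
                (amount, account_id),
            )
            (balance,) = conn.execute(
                "SELECT Balance FROM Accounts WHERE AccountID = ?", (account_id,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        # The SQL side rolled back on its own; the other store needs explicit cleanup.
        compensate_audit(tx_id)
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Balance REAL)")
conn.execute("INSERT INTO Accounts VALUES (1, 100)")
conn.commit()

ok = withdraw(conn, "tx-1", 1, 30)
failed = withdraw(conn, "tx-2", 1, 500)
balance = conn.execute("SELECT Balance FROM Accounts WHERE AccountID = 1").fetchone()[0]
print(ok, failed, balance, sorted(audit_store))
```

The single-database rollback is automatic, but nothing undoes the audit write unless the application does it, which is exactly the coordination burden the question is pointing at.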

250

Multi-Selecteasy

A manufacturing company uses IoT sensors to monitor machine temperature. The data is analyzed immediately to trigger alerts if temperature exceeds a threshold. The same data is also stored and later analyzed to identify long-term trends. Which two terms best describe these data processing approaches?

Select 2 answers

A.Real-time processing for alerts, batch processing for trend analysis

B.Stream processing for alerts, transactional processing for trend analysis

C.Batch processing for alerts, real-time processing for trend analysis

D.OLAP for alerts, OLTP for trend analysis

AnswersA, B

Correct. Alerts require immediate action (real-time), while trend analysis typically uses accumulated data processed periodically (batch).

Why this answer

Option A is correct because the scenario describes two distinct processing requirements: immediate alerting on temperature thresholds requires real-time (or stream) processing to minimize latency, while long-term trend analysis can be performed on stored data using batch processing, which is efficient for large historical datasets. Real-time processing handles data as it arrives with low latency, and batch processes data in bulk at scheduled intervals.

Exam trap

The trap here is that candidates confuse 'real-time' with 'batch' based on the word 'immediately' but fail to recognize that trend analysis is inherently a batch workload, leading them to reverse the pairings in options C or D.
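
The two processing styles in this scenario can be contrasted in a few lines of plain Python. This is only a sketch: the threshold value, machine names, and readings are made up, and a real deployment would use a streaming service and a scheduled job rather than in-memory lists.

```python
from statistics import mean

THRESHOLD_C = 90.0  # hypothetical alert threshold


def stream_alerts(events):
    """Streaming style: inspect each reading as it arrives and react immediately."""
    for event in events:
        if event["temp_c"] > THRESHOLD_C:
            yield f"ALERT {event['machine']} at {event['temp_c']}"


def batch_trend(stored_events):
    """Batch style: process the accumulated readings together, e.g. on a schedule."""
    by_machine = {}
    for event in stored_events:
        by_machine.setdefault(event["machine"], []).append(event["temp_c"])
    return {machine: mean(temps) for machine, temps in by_machine.items()}


events = [
    {"machine": "M1", "temp_c": 70.0},
    {"machine": "M1", "temp_c": 95.5},
    {"machine": "M2", "temp_c": 60.0},
]
alerts = list(stream_alerts(events))
trend = batch_trend(events)
print(alerts, trend)
```

The alert path makes a decision per event, while the trend path only produces a result once the whole stored set has been read.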

251

MCQeasy

A retail company stores customer data in a relational database table with columns for CustomerID, Name, and Email. Product reviews are stored as JSON documents where each document contains review text and a rating. Product images are stored as binary files in Azure Blob Storage. Which of the following correctly categorizes these data types in order: relational table, JSON documents, binary images?

A.Structured, semi-structured, unstructured

B.Semi-structured, structured, unstructured

C.Unstructured, semi-structured, structured

D.Structured, unstructured, semi-structured

AnswerA

Correct. Relational tables are structured (fixed schema), JSON is semi-structured (flexible schema), and binary images are unstructured (no schema).

Why this answer

A is correct because relational tables enforce a fixed schema (columns with defined data types), making them structured data. JSON documents have a flexible schema (key-value pairs) but still contain metadata, classifying them as semi-structured. Binary image files in Azure Blob Storage have no inherent structure or schema, making them unstructured data.

This matches the order: structured, semi-structured, unstructured.

Exam trap

The trap here is that candidates often confuse semi-structured data (like JSON) with unstructured data because JSON appears 'flexible,' but it still has a defined key-value structure, whereas truly unstructured data (binary blobs) has no schema at all.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies the relational table as semi-structured and the JSON documents as structured; relational tables are strictly structured (fixed schema) while JSON documents are semi-structured (self-describing, schema-on-read). Option C is wrong because it reverses the entire order, labeling binary images as structured and the relational table as unstructured; binary images have no schema or metadata, making them unstructured, not structured. Option D is wrong because it places JSON documents as unstructured and binary images as semi-structured; JSON documents have a defined structure (key-value pairs) and are semi-structured, while binary images lack any structure and are unstructured.

252

Multi-Selectmedium

Which TWO of the following are characteristics of structured data? (Choose two.)

Select 2 answers

A.No predefined schema

B.Stored in rows and columns

C.Fixed schema

D.Key-value pairs

E.Schema-on-read

AnswersB, C

Structured data is typically tabular.

Why this answer

Structured data is organized in a tabular format with rows and columns, which is the defining characteristic of relational databases like SQL Server or Azure SQL Database. This structure enforces a fixed schema, meaning the data types and relationships are defined before data is entered, ensuring consistency and enabling efficient querying via SQL.

Exam trap

Microsoft often tests the distinction between 'fixed schema' (structured) and 'schema-on-read' (semi-structured), and candidates mistakenly associate key-value pairs with structured data instead of NoSQL.

253

MCQhard

Your company stores sensitive customer data in Azure SQL Database. You need to implement column-level encryption for the 'SSN' column using a customer-managed key stored in Azure Key Vault. Which feature should you use?

A.Azure Policy

B.Always Encrypted

C.Transparent Data Encryption (TDE)

D.Dynamic Data Masking

AnswerB

Always Encrypted provides column-level encryption with customer-managed keys.

Why this answer

Always Encrypted is the correct feature because it allows client-side encryption of sensitive columns, such as 'SSN', using a customer-managed key stored in Azure Key Vault. The encryption keys are never exposed to the database engine, ensuring that even database administrators cannot view the plaintext data. This meets the requirement for column-level encryption with customer-managed keys.

Exam trap

The trap here is that candidates often confuse Transparent Data Encryption (TDE) with column-level encryption, but TDE only protects data at rest and does not prevent database administrators or the cloud provider from reading the data in memory or during queries.

How to eliminate wrong answers

Option A is wrong because Azure Policy is a governance tool used to enforce organizational standards and compliance rules across Azure resources, not a data encryption feature for individual columns. Option C is wrong because Transparent Data Encryption (TDE) encrypts the entire database at rest (the storage layer), not at the column level, and it does not support customer-managed keys for column-specific encryption. Option D is wrong because Dynamic Data Masking obfuscates data at query time for unauthorized users but does not encrypt the underlying data; the real values are still stored unencrypted and can be read by privileged users.

254

MCQeasy

Refer to the exhibit. The JSON shows a configuration for which Azure service?

A.Azure Analysis Services

B.Azure Data Factory

C.Power BI

D.Azure Synapse Analytics

AnswerB

Data Factory defines linked services and datasets in JSON.

Why this answer

The JSON snippet defines a pipeline with a copy activity that moves data from a source (Azure Blob Storage) to a sink (Azure SQL Database). This is the core pattern of Azure Data Factory (ADF), which orchestrates and automates data movement and transformation. The structure with 'name', 'properties', 'activities', 'typeProperties', 'source', and 'sink' is specific to ADF pipeline definitions.

Exam trap

The trap here is that candidates confuse the JSON pipeline definition with Azure Synapse Analytics pipelines, which share the same underlying engine but are accessed via a different portal and have additional Synapse-specific features like Spark job definitions and SQL script activities.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a semantic model and analytics engine (using Tabular or Multidimensional models), not a data orchestration service; it does not use JSON pipeline definitions with copy activities. Option C is wrong because Power BI is a visualization and reporting tool that uses datasets and dashboards, not JSON-based pipeline definitions with source/sink configurations. Option D is wrong because Azure Synapse Analytics is a unified analytics platform that includes dedicated SQL pools, serverless SQL, and Spark, but its native pipeline definitions (Synapse Pipelines) are derived from ADF; the exhibit shows a generic ADF pipeline JSON, not a Synapse-specific artifact like a SQL script or Spark job.
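
The exhibit is not reproduced here, so the sketch below builds a pipeline definition of the general shape the explanation describes and prints it as JSON. The pipeline, activity, and dataset names are hypothetical, and the source and sink type strings are given from memory of the ADF Copy activity format, so treat them as an approximation rather than a verified definition.

```python
import json

# Approximate shape of an ADF pipeline with one Copy activity (names are hypothetical).
pipeline = {
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "BlobInputDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SqlOutputDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "BlobSource"},  # reads from Blob Storage
                    "sink": {"type": "SqlSink"},       # writes to Azure SQL Database
                },
            }
        ]
    },
}

definition = json.dumps(pipeline, indent=2)
activity = pipeline["properties"]["activities"][0]
print(definition)
```

The nesting of activities under properties, and of source and sink under typeProperties, is the pattern to recognize in the exam exhibit.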

255

MCQeasy

A data engineer needs to process streaming data from IoT devices in near real-time and store the results in Azure Cosmos DB. Which Azure service should they use for the stream processing?

A.Azure Synapse Analytics

B.Azure Databricks

C.Azure Stream Analytics

D.Azure Data Factory

AnswerC

Stream Analytics provides near real-time stream processing with native Cosmos DB sink.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed, real-time stream processing engine designed specifically for low-latency, near-real-time analytics on streaming data. It can ingest data from IoT devices via Event Hubs or IoT Hub, apply SQL-based transformations, and output the results directly to Azure Cosmos DB with low latency, making it ideal for this scenario.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory or Azure Databricks, mistakenly thinking that any 'data processing' tool can handle real-time streaming, but only Stream Analytics is purpose-built for near-real-time, serverless stream processing with direct Cosmos DB integration.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is a unified analytics platform focused on large-scale batch processing and data warehousing, not real-time stream processing; it lacks native support for continuous streaming queries with sub-second latency. Option B is wrong because Azure Databricks is a big data and machine learning platform that can process streaming data via Structured Streaming, but it requires cluster management and is overkill for simple near-real-time IoT processing; it is not the simplest or most cost-effective choice for direct Cosmos DB output. Option D is wrong because Azure Data Factory is a cloud-based ETL and data integration service designed for batch-oriented data movement and orchestration, not for real-time stream processing; it cannot handle continuous, low-latency streaming workloads.

256

MCQeasy

A company receives data from a point-of-sale system. Each row contains TransactionID, ProductID, Quantity, and Price. The data has a fixed schema and is stored in a table. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Transactional data

AnswerA

Correct. The data has a fixed schema (columns TransactionID, ProductID, Quantity, Price) and is stored in a table, which is the definition of structured data.

Why this answer

The data has a fixed schema with clearly defined columns (TransactionID, ProductID, Quantity, Price) and each row follows the same structure, which is the definition of structured data. In Azure, this would map directly to a table in Azure SQL Database or a fixed-schema table in Azure Synapse Analytics. The rigid schema and consistent data types make it ideal for relational storage and querying.

Exam trap

The trap here is that candidates confuse 'transactional data' (a workload pattern) with 'structured data' (a data classification), leading them to pick Option D because the data comes from a point-of-sale system, but the question explicitly asks about data structure, not data source or usage.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not enforce a fixed schema; fields can vary between records, unlike this rigid table. Option C is wrong because unstructured data (e.g., images, videos, text files) has no predefined schema or organization, whereas this data has a strict columnar structure. Option D is wrong because 'transactional data' describes a workload type (OLTP) or data generated by transactions, not a classification of data structure; the question asks how the data should be classified by structure, not by its source or usage.

257

MCQeasy

A retail company stores customer data in three formats: a relational database table with fixed columns for CustomerID, Name, and Email; customer feedback as JSON documents with varying fields such as rating and comment; and product images as JPEG files. Which of the following correctly classifies these data types from most structured to least structured?

A.JSON documents, relational table, image files

B.Relational table, JSON documents, image files

C.Image files, JSON documents, relational table

D.Relational table, image files, JSON documents

AnswerB

Correct. Relational tables have a fixed schema (structured), JSON documents allow varying fields (semi-structured), and image files lack a predefined schema (unstructured).

Why this answer

Option B is correct because relational tables enforce a fixed schema with defined columns and data types, making them the most structured. JSON documents are semi-structured, allowing varying fields and flexible schemas, while image files are unstructured binary data with no inherent schema. This ordering from most to least structured aligns with the core data classification concept in the DP-900 exam.

Exam trap

The trap here is that candidates often confuse semi-structured JSON with unstructured data, or assume that any file format (like images) has inherent structure, leading them to misorder the classification from most to least structured.

How to eliminate wrong answers

Option A is wrong because it incorrectly places JSON documents (semi-structured) as more structured than relational tables (structured), reversing the correct order. Option C is wrong because it lists image files (unstructured) as the most structured, which is a fundamental misunderstanding of data classification. Option D is wrong because it places image files (unstructured) above JSON documents (semi-structured), failing to recognize that semi-structured data has more organization than unstructured binary files.

258

MCQeasy

A company stores customer records in a relational table with columns like CustomerID, Name, and Email. Product reviews are stored as JSON documents, and marketing images are stored as PNG files. Which of the following correctly orders these data types from most structured to least structured?

A.Product reviews, Customer records, Marketing images

B.Customer records, Product reviews, Marketing images

C.Marketing images, Customer records, Product reviews

D.Customer records, Marketing images, Product reviews

AnswerB

Customer records in a relational table are strictly structured (fixed schema), product reviews as JSON are semi-structured (schema-on-read), and marketing images are unstructured (binary files). This is the correct order from most to least structured.

Why this answer

Customer records in a relational table have a fixed schema with defined columns (e.g., CustomerID, Name, Email), making them the most structured. Product reviews stored as JSON documents are semi-structured because they have a flexible schema with key-value pairs but no fixed columns. Marketing images as PNG files are unstructured binary data with no inherent schema.

Option B correctly orders these from most to least structured.

Exam trap

The trap here is that candidates often confuse semi-structured JSON with unstructured data, or assume that any file format (like PNG) has inherent structure, leading them to misorder the data types by perceived complexity rather than schema rigidity.

How to eliminate wrong answers

Option A is wrong because it places product reviews (semi-structured JSON) before customer records (structured relational table), incorrectly suggesting JSON is more structured than a fixed-schema table. Option C is wrong because it lists marketing images (unstructured binary) as the most structured, which is the opposite of the correct order. Option D is wrong because it places marketing images (unstructured) before product reviews (semi-structured), failing to recognize that JSON documents have more structure than raw binary files.

259

MCQhard

A multinational corporation needs to store archival data for 10 years with the lowest possible storage cost, while still being able to retrieve it within 24 hours if needed. Which Azure storage tier should they use?

A.Archive Blob Storage

B.Cool Blob Storage

C.Premium Blob Storage

D.Hot Blob Storage

AnswerA

Lowest cost, retrieval within 15 hours, meets 24-hour requirement.

Why this answer

Archive Blob Storage is the correct choice because it is designed for long-term retention of data that is rarely accessed, offering the lowest storage cost among Azure blob tiers. The 10-year retention requirement and 24-hour retrieval window align perfectly with Archive's capabilities, as data can be rehydrated to a hot or cool tier within hours (typically up to 15 hours for standard priority rehydration).

Exam trap

The trap here is that candidates often confuse 'lowest storage cost' with 'lowest overall cost' and overlook the retrieval time constraint, mistakenly choosing Cool Blob Storage because it offers lower cost than Hot but still allows immediate access, ignoring that Archive is even cheaper and meets the 24-hour retrieval window.

How to eliminate wrong answers

Option B (Cool Blob Storage) is wrong because it is optimized for data accessed infrequently but with immediate retrieval needs, not for archival durations of 10 years, and its storage cost is higher than Archive. Option C (Premium Blob Storage) is wrong because it uses SSD-backed storage for low-latency, high-frequency access scenarios, making it the most expensive option and unsuitable for archival data. Option D (Hot Blob Storage) is wrong because it is designed for data accessed frequently with millisecond latency and has the highest storage cost of the standard access tiers, which contradicts the requirement for lowest possible cost.

260

MCQeasy

A logistics company uses an online system to process incoming delivery requests one at a time, updating the database immediately upon each transaction. They also run a weekly job that analyzes thousands of delivery records to identify average delivery times and trends. Which set of terms correctly classifies these two workloads?

A.OLTP and OLAP

B.Batch processing and real-time processing

C.Relational and non-relational

D.Structured and semi-structured

AnswerA

Correct. OLTP handles day-to-day transactions, and OLAP handles analysis across many records.

Why this answer

The first workload processes individual delivery requests with immediate database updates, which is the definition of Online Transaction Processing (OLTP). The second workload runs a weekly job analyzing thousands of records for trends and averages, which is Online Analytical Processing (OLAP). These two terms correctly classify the transactional and analytical workloads described.

Exam trap

The trap here is that candidates confuse the processing mode (batch vs. real-time) with the workload classification (OLTP vs. OLAP), but the question specifically asks for the terms that classify the workloads, not describe their timing.

How to eliminate wrong answers

Option B is wrong because 'batch processing and real-time processing' describes processing modes, not workload classifications. The weekly job does run as a batch and the per-request updates are immediate, but the option lists the modes in the reverse order of the scenario, and the question asks for the terms that classify the workloads, which are OLTP and OLAP. Option C is wrong because 'relational and non-relational' refers to database types (e.g., SQL vs. NoSQL), not to the nature of the workloads (transactional vs. analytical).

Option D is wrong because 'structured and semi-structured' describes data formats (e.g., tables vs. JSON), not the operational characteristics of the workloads.


261

MCQhard

Your company runs a global e-commerce platform that generates over 5 TB of clickstream data daily. The data is currently stored as raw CSV files in Azure Blob Storage. The data engineering team needs to transform this data into a star schema for business intelligence reporting. They want to use a serverless, code-first approach where they can write Python or SQL transformations. The transformed data should be stored in a format that optimizes query performance for Power BI. You also need to ensure that the solution can handle variable data volumes without manual scaling. Which Azure service should you use for the transformation?

A.Azure Stream Analytics

B.Azure Databricks

C.Azure Synapse Serverless SQL

D.Azure Data Factory

AnswerB

Serverless, code-first Spark environment supporting Python and SQL for large-scale transformations.

Why this answer

Azure Databricks is the correct choice because it provides a serverless, code-first environment where data engineers can write Python or SQL transformations using Apache Spark. It can handle variable data volumes without manual scaling, and it can output transformed data in optimized formats like Parquet, which significantly improves query performance for Power BI. This aligns perfectly with the requirement for a serverless, code-first approach and star schema transformation.

Exam trap

The trap here is that candidates often confuse Azure Data Factory as a transformation service, but it is actually an orchestration tool that requires a separate compute engine (like Databricks or Synapse) to perform the actual data transformations.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is designed for real-time stream processing, not batch transformations of large CSV files in Blob Storage, and it does not support writing Python transformations. Option C is wrong because Azure Synapse Serverless SQL is a SQL-only query engine that cannot execute Python transformations, and it is not a code-first transformation service. Option D is wrong because Azure Data Factory is primarily an orchestration and ETL/ELT pipeline service that uses visual pipelines or code snippets, but it is not designed for writing custom Python or SQL transformations on large datasets; it relies on compute engines like Databricks or Synapse for actual data processing.


262

MCQhard

A data engineer loads raw log files into a storage system. The structure of the data is interpreted at the time of reading, allowing queries to apply schema on the fly without preprocessing. This approach is best described as:

A.Schema-on-write

B.Schema-on-read

C.Data warehouse

D.Data virtualization

AnswerB

Schema-on-read applies the data structure when the data is accessed, typical in data lake architectures.

Why this answer

Schema-on-read means the data is stored in its raw form, with no schema enforced at write time, and a schema is applied when the data is queried. This is what happens when raw log files are loaded into a storage system such as Azure Data Lake Storage and queried with tools such as Azure Synapse Serverless SQL or Apache Spark, which apply or infer the schema at query time without requiring preprocessing.
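A small Python sketch can make the idea concrete. The log format, the field names, and the `read_with_schema` helper are assumptions invented for this illustration; the point is only that the raw lines are stored untouched and the structure is imposed by the reader.

```python
from datetime import datetime

# Raw log lines stored exactly as they arrived; nothing was validated on write.
RAW_LOGS = [
    "2024-01-05T10:00:00 ERROR disk full",
    "2024-01-05T10:01:30 INFO backup started",
    "not a log line",
]


def read_with_schema(lines):
    """Apply a (timestamp, level, message) schema while reading."""
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) != 3:
            continue  # too few fields for this schema
        raw_ts, level, message = parts
        try:
            timestamp = datetime.fromisoformat(raw_ts)
        except ValueError:
            continue  # first field is not a timestamp, so skip the line
        yield {"timestamp": timestamp, "level": level, "message": message}


# The schema exists only in the reader; a different query could read the same
# raw lines with a different schema without rewriting the stored data.
records = list(read_with_schema(RAW_LOGS))
errors = [r["message"] for r in records if r["level"] == "ERROR"]
print(errors)
```

With schema-on-write, the malformed third line would instead be rejected (or cleaned) before it was ever stored.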

Exam trap

The trap here is confusing schema-on-read with data virtualization, as both involve querying data without moving it, but schema-on-read specifically refers to interpreting the structure at read time from raw files, not abstracting multiple sources.

How to eliminate wrong answers

Option A is wrong because schema-on-write requires defining and enforcing a schema before data is written, which contradicts the scenario of interpreting structure at read time. Option C is wrong because a data warehouse typically uses schema-on-write with a predefined, optimized schema for structured data, not raw log files with on-the-fly interpretation. Option D is wrong because data virtualization provides a unified view of data from multiple sources without moving it, but it does not specifically describe the schema-on-read approach where the structure is interpreted at query time from raw storage.


263

MCQeasy

A company operates an online store where customers place orders and the system immediately updates inventory and records payments. This workload is best described as:

A.OLAP (Online Analytical Processing)

B.OLTP (Online Transaction Processing)

C.Batch processing

D.Data warehousing

AnswerB

Correct. The immediate processing of orders, inventory updates, and payments with ACID properties is a classic OLTP workload.

Why this answer

This workload is best described as OLTP because it involves real-time, high-frequency transactions that immediately update inventory and record payments. OLTP systems are designed for concurrent, atomic operations that maintain data integrity, which is exactly what an online store's order processing requires.
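The atomic behavior described above can be sketched with Python's built-in sqlite3 module. The `inventory` and `payments` tables, the `place_order` function, and the sample values are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER CHECK (stock >= 0));
    CREATE TABLE payments (id INTEGER PRIMARY KEY, sku TEXT, amount REAL);
    INSERT INTO inventory VALUES ('widget', 1);
    """
)


def place_order(sku, amount):
    """Record the payment and decrement stock as one atomic unit."""
    try:
        with conn:  # commits if both statements succeed, rolls back otherwise
            conn.execute(
                "INSERT INTO payments (sku, amount) VALUES (?, ?)", (sku, amount)
            )
            conn.execute(
                "UPDATE inventory SET stock = stock - 1 WHERE sku = ?", (sku,)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock, so the payment row
        # inserted in the same transaction was rolled back as well.
        return False


first = place_order("widget", 9.99)   # one unit in stock, so this succeeds
second = place_order("widget", 9.99)  # out of stock, so the whole order is undone

stock = conn.execute("SELECT stock FROM inventory WHERE sku = 'widget'").fetchone()[0]
payments = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(first, second, stock, payments)
```

The failed second order leaves no orphaned payment behind, which is the data-integrity guarantee that distinguishes OLTP from deferred batch updates.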

Exam trap

The trap here is that candidates confuse OLTP with batch processing because both involve data updates, but OLTP requires immediate, row-level transactions while batch processing defers updates to a scheduled window.

How to eliminate wrong answers

Option A is wrong because OLAP is used for complex analytical queries and aggregations over large historical datasets, not for real-time transactional updates. Option C is wrong because batch processing involves delayed, scheduled processing of data in bulk, whereas the scenario requires immediate updates. Option D is wrong because data warehousing is a repository for structured, historical data used for reporting and analysis, not for handling live transactional workloads.


264

MCQeasy

You are designing a data pipeline for a social media analytics platform. The pipeline needs to ingest posts from multiple sources (Twitter, Facebook) in real time, transform the data by adding sentiment scores, and store the results in a data store for later analysis. The transformation logic is simple and can be expressed as a SQL query. You want to minimize coding effort. Which Azure service should you use for the transformation step?

A.Azure Data Factory

B.Azure Databricks

C.Azure Functions

D.Azure Stream Analytics

AnswerD

SQL-based transformation for streaming data, low-code.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time data processing with a SQL-like query language, so streaming data (e.g., posts from Twitter and Facebook) can be transformed, for example by adding sentiment scores, with a simple query rather than custom application code. It integrates natively with Azure Event Hubs or IoT Hub for ingestion and can output to Azure SQL Database, Cosmos DB, or Blob Storage for later analysis, minimizing coding effort.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (batch ETL) with real-time stream processing, or assume Azure Functions is simpler for SQL-like transformations. Among these options, Stream Analytics is the only service that combines real-time ingestion, SQL-based transformation, and minimal coding effort.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an orchestration and ETL service for batch data movement and transformation, not designed for real-time stream processing; it cannot handle sub-second latency or continuous SQL-based transformations on live streams. Option B is wrong because Azure Databricks is a big data analytics platform that requires writing Spark code (Python, Scala, or SQL) and managing clusters, which involves more coding effort than a simple SQL query on a stream. Option C is wrong because Azure Functions is a serverless compute service for event-driven code execution, but it requires writing custom code (e.g., C#, JavaScript) for each transformation, and it lacks native SQL-based stream processing capabilities, making it less efficient for simple SQL transformations on real-time data.


265

MCQhard

A manufacturing company collects sensor data from thousands of IoT devices. Each reading contains a device ID, timestamp, value, and device-specific measurement fields. The company needs to analyze the data in real time to detect anomalies and trigger alerts. They also need to store the same data for historical batch analysis to identify long-term trends. Which architecture pattern best describes this combination of data processing approaches?

A.Batch processing only

B.Stream processing only

C.Lambda architecture

D.Data lake

AnswerC

Lambda architecture combines batch and stream processing, allowing both real-time anomaly detection and historical batch analysis on the same data set.

Why this answer

The Lambda architecture is the correct pattern because it combines both stream processing for real-time anomaly detection and alerting, and batch processing for historical analysis of long-term trends. This architecture uses a speed layer for low-latency stream processing (e.g., Apache Kafka, Azure Stream Analytics) and a batch layer for comprehensive, accurate historical computations (e.g., Azure Data Lake, Apache Spark). The serving layer then merges results from both paths to provide a unified view.
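The two paths can be sketched in plain Python. This is a toy, single-process illustration under stated assumptions: the readings, the `THRESHOLD` value, and the function names are invented, and a list stands in for durable storage such as a data lake.

```python
from statistics import mean

# Hypothetical sensor readings arriving in order.
READINGS = [
    {"device": "pump-1", "ts": 1, "value": 20.0},
    {"device": "pump-1", "ts": 2, "value": 25.0},
    {"device": "pump-1", "ts": 3, "value": 96.0},
    {"device": "pump-2", "ts": 1, "value": 10.0},
]

THRESHOLD = 90.0       # assumed anomaly threshold
historical_store = []  # stand-in for durable raw storage
alerts = []


def speed_layer(reading):
    """Low-latency path: inspect each reading as it arrives."""
    if reading["value"] > THRESHOLD:
        alerts.append((reading["device"], reading["ts"]))


def ingest(reading):
    """Every reading goes to both paths: stored for batch, checked in real time."""
    historical_store.append(reading)
    speed_layer(reading)


def batch_layer(store):
    """High-latency path: recompute long-term averages over all stored data."""
    by_device = {}
    for reading in store:
        by_device.setdefault(reading["device"], []).append(reading["value"])
    return {device: mean(values) for device, values in by_device.items()}


for reading in READINGS:
    ingest(reading)

trends = batch_layer(historical_store)  # would run on a schedule in practice
print(alerts, trends)
```

A serving layer would then expose both `alerts` and `trends` to consumers; it is omitted here to keep the sketch short.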

Exam trap

The trap here is that candidates confuse a storage architecture (data lake) with a processing architecture pattern, or mistakenly think that either stream or batch processing alone can satisfy both real-time and historical requirements.

How to eliminate wrong answers

Option A is wrong because batch processing alone cannot handle real-time anomaly detection and alerting, as it processes data in large, scheduled intervals with high latency. Option B is wrong because stream processing alone is not designed for efficient historical batch analysis over long periods, as it focuses on low-latency, in-memory computations and typically does not retain full historical data for reprocessing. Option D is wrong because a data lake is a storage repository for raw data in its native format, not a processing architecture pattern that combines real-time and batch analytics.


266

MCQhard

A globally distributed online auction platform uses a replicated database system across multiple Azure regions. The system must continue accepting bids (writes) even if a network partition occurs between regions, because auctions cannot be interrupted. The business decides that during a partition, some users might see slightly outdated item prices (read inconsistency) but all bids must be recorded. According to the CAP theorem, which two properties is this system prioritizing?

A.Availability (A) and Partition Tolerance (P)

B.Consistency (C) and Partition Tolerance (P)

C.Consistency (C) and Availability (A)

D.Durability and Availability

AnswerA

The system keeps operating across regions when a network partition occurs, which is Partition Tolerance (P), and it keeps accepting bids in every region during the partition, which is Availability (A). As a result, Consistency (C) is sacrificed, meaning different regions may temporarily return different data.

Why this answer

The system must continue accepting bids (writes) even during a network partition, which means it prioritizes Availability (A) — every request receives a response, even if it's not the most recent data. It also must function across multiple Azure regions that can become disconnected, which requires Partition Tolerance (P) — the system continues to operate despite network splits. The trade-off is that Consistency (C) is sacrificed, as users may see slightly outdated item prices during a partition.

This is a classic AP (Availability and Partition Tolerance) choice from the CAP theorem.
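The trade-off can be sketched with two in-memory replicas. This is a deliberately simplified model: the `Replica` class, the `sync` function, and the merge-by-union rule are assumptions for illustration and do not describe how any particular Azure database resolves conflicts.

```python
class Replica:
    """One region's copy of the bid data."""

    def __init__(self, name):
        self.name = name
        self.bids = set()  # each bid is an (item, bidder, amount) tuple

    def accept_bid(self, item, bidder, amount):
        # Availability: the write is accepted locally even if the other
        # region cannot be reached.
        self.bids.add((item, bidder, amount))

    def highest_bid(self, item):
        amounts = [amount for (i, _, amount) in self.bids if i == item]
        return max(amounts, default=None)


def sync(a, b):
    """After the partition heals, both replicas converge on the union of bids."""
    merged = a.bids | b.bids
    a.bids = set(merged)
    b.bids = set(merged)


europe = Replica("europe")
asia = Replica("asia")

# During the partition each region keeps accepting bids on its own.
europe.accept_bid("vase", "alice", 100)
asia.accept_bid("vase", "bob", 120)

# A reader in Europe sees an outdated price: consistency is given up.
stale_price = europe.highest_bid("vase")

# Once connectivity returns, no bid has been lost and the views agree.
sync(europe, asia)
healed_price = europe.highest_bid("vase")
print(stale_price, healed_price)
```

A CP system would instead reject or block one region's bids during the partition, which is exactly what the business in this scenario ruled out.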

Exam trap

The trap here is that candidates often confuse the CAP theorem's 'Consistency' with ACID consistency or durability, or they mistakenly think 'Availability' means the system is always up, when in CAP it specifically means every request receives a non-error response even during a partition.


267

MCQeasy

A logistics company stores shipping waybill data as JSON documents. Each document contains fields like 'shipmentId', 'destination', and 'items', but the number of items and the fields within each item can vary between shipments. Which category best describes this type of data?

A.Operational data

B.Semi-structured data

C.Unstructured data

D.Structured data

AnswerB

JSON documents with optional fields and variable structures are a classic example of semi-structured data, which has some organizational properties but no rigid schema.

Why this answer

JSON documents with varying fields and nested structures like 'items' that differ between shipments are a classic example of semi-structured data. Unlike structured data with a fixed schema, semi-structured data uses tags or markers (like JSON key-value pairs) to separate data elements, allowing for flexibility in the number and type of fields per record. This aligns with the DP-900 definition of semi-structured data, which includes formats such as JSON and XML.
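The variability is easy to see when the documents are parsed. In the Python sketch below, the two sample waybills and the item fields (`sku`, `qty`, `fragile`, `weightKg`) are invented for illustration; only 'shipmentId', 'destination', and 'items' come from the scenario.

```python
import json

# Two hypothetical waybills: same top-level keys, different item shapes.
RAW = """
[
  {"shipmentId": "S1", "destination": "Oslo",
   "items": [{"sku": "A", "qty": 2}]},
  {"shipmentId": "S2", "destination": "Lima",
   "items": [{"sku": "B", "qty": 1, "fragile": true},
             {"sku": "C", "weightKg": 4.5}]}
]
"""

shipments = json.loads(RAW)


def item_fields(shipment):
    """Collect every field name used by any item in one shipment."""
    fields = set()
    for item in shipment["items"]:
        fields.update(item)  # iterating a dict yields its keys
    return fields


# Map each shipment to (number of items, sorted item field names).
summary = {
    s["shipmentId"]: (len(s["items"]), sorted(item_fields(s))) for s in shipments
}
print(summary)
```

Every document is self-describing through its keys, yet no single fixed set of columns fits all items, which is what places this data between structured and unstructured.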

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' because JSON appears flexible. JSON still organizes its data with key-value pairs, unlike truly unstructured data such as audio or video files.

How to eliminate wrong answers

Option A is wrong because operational data refers to data used for day-to-day business operations (e.g., transaction logs, sensor readings), not a classification of data structure. Option C is wrong because unstructured data lacks a predefined data model or schema entirely (e.g., images, videos, plain text), whereas JSON has a defined structure with keys and values. Option D is wrong because structured data requires a rigid schema with fixed fields and data types (e.g., SQL tables), which does not apply to JSON documents where fields like 'items' can vary in count and structure.
