DP-900 Describe core data concepts Questions, Page 2 of 4

MCQ (easy)

A company stores customer transaction data in Azure Blob Storage. They need to query the data using SQL-based tools without moving the data. Which Azure service should they use?

A. Azure SQL Database

B. Azure Analysis Services

C. Azure Cosmos DB

D. Azure Synapse Serverless SQL pool

Answer: D

Allows querying data in Blob Storage using T-SQL without moving it.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without moving or copying the data. It uses a pay-per-query model and supports reading common file formats like Parquet, CSV, and JSON, making it ideal for ad-hoc querying over data lakes.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming any 'SQL' service fits. Of the options listed, only Synapse Serverless SQL pool is built to query files in Blob Storage in place, without data movement.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database service that requires data to be imported and stored within its own storage engine, not queried in place from Blob Storage. Option B is wrong because Azure Analysis Services is a semantic modeling and analytics engine that requires data to be loaded into an in-memory tabular model, not queried directly from Blob Storage. Option C is wrong because Azure Cosmos DB is a NoSQL database service with its own storage and query APIs (SQL, MongoDB, Cassandra, etc.), and it cannot query external data in Blob Storage without first ingesting it.

MCQ (easy)

You are analyzing the results of a KQL query in Azure Data Explorer. What does this query return?

A. Total damage per event type

B. All states with flood events sorted by damage

C. Top 5 states with highest total property damage from floods

D. Top 5 flood events with highest damage

Answer: C

The query filters, sums, and returns top 5.

Why this answer

The KQL query uses the 'summarize' operator to aggregate total property damage by state, then applies 'top 5 by' to return the five states with the highest total property damage from flood events. The 'where' clause filters for flood events, and the 'project' operator selects only the state and damage columns, confirming that the result is the top 5 states by total property damage.
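The filter, aggregate, and rank steps described above can be sketched in Python. This is only an illustration of the logic, not the exhibit's query: the sample rows and the field names `State`, `EventType`, and `DamageProperty` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical storm events; field names and values are assumptions.
events = [
    {"State": "TEXAS", "EventType": "Flood", "DamageProperty": 500},
    {"State": "TEXAS", "EventType": "Flood", "DamageProperty": 250},
    {"State": "IOWA", "EventType": "Flood", "DamageProperty": 400},
    {"State": "OHIO", "EventType": "Hail", "DamageProperty": 900},
    {"State": "UTAH", "EventType": "Flood", "DamageProperty": 100},
    {"State": "MAINE", "EventType": "Flood", "DamageProperty": 50},
    {"State": "IDAHO", "EventType": "Flood", "DamageProperty": 75},
    {"State": "KANSAS", "EventType": "Flood", "DamageProperty": 20},
]


def top_flood_states(rows, n=5):
    """Mimic: where (flood) -> summarize sum by state -> top n by total."""
    totals = defaultdict(int)
    for row in rows:
        if row["EventType"] == "Flood":                     # 'where' filter
            totals[row["State"]] += row["DamageProperty"]   # 'summarize' by state
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]                                       # 'top n by' total damage


result = top_flood_states(events)
```

With this sample data, the hail event is filtered out, the two Texas floods collapse into one row, and the sixth-ranked state is dropped, which is why the result is one row per state rather than one row per event.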

Exam trap

The trap here is that candidates often confuse grouping by state versus grouping by event type, or they misinterpret 'top 5 by' as returning all rows sorted rather than only the top 5 rows.

How to eliminate wrong answers

Option A is wrong because the query groups by state, not by event type, so it returns damage per state, not per event type. Option B is wrong because the query uses 'top 5 by' to return only the highest damage states, not all states, and it sorts by damage descending, not alphabetically. Option D is wrong because the query groups by state, not by individual flood events, so it returns aggregated damage per state, not per event.

Multi-Select (hard)

Which THREE are valid use cases for Azure Cosmos DB?

Select 3 answers

A. Storing IoT telemetry data with low latency

B. Relational OLTP with complex joins

C. Session state management for web applications

D. Personalization and recommendation engines

E. Storing large files like images and videos

Answers: A, C, D

Cosmos DB provides low-latency reads and writes.

Why this answer

Azure Cosmos DB is a globally distributed, multi-model NoSQL database service designed for low-latency, high-throughput workloads. Storing IoT telemetry data (A) requires fast ingestion and real-time querying, which Cosmos DB supports with single-digit millisecond read/write latencies at the 99th percentile. Session state management (C) and personalization engines (D) fit for the same reason: both depend on fast key-based lookups of schema-flexible documents for a large number of concurrent users.

Exam trap

The trap here is that candidates confuse Azure Cosmos DB's multi-model support (e.g., Table API, Cassandra API) with relational database capabilities, leading them to incorrectly select Option B for OLTP with complex joins. Others assume Cosmos DB can hold large binary files the way Blob Storage does, missing the 2 MB item size limit, which rules out Option E.

MCQ (medium)

A company stores customer data in a relational database with columns like CustomerID, Name, and Email. They also store product images as JPEG files in Azure Blob Storage, and customer feedback as JSON documents that contain varying fields such as rating, comments, and optional metadata. Which of the following correctly orders these data types from most structured to least structured?

A. Relational data, images, JSON

B. Images, JSON, relational data

C. Relational data, JSON, images

D. JSON, relational data, images

Answer: C

Correct order: structured (relational), semi-structured (JSON), unstructured (images).

Why this answer

Relational data (CustomerID, Name, Email) is the most structured because it enforces a fixed schema with defined data types and constraints. JSON documents (customer feedback) are semi-structured: they have a flexible schema with optional fields like metadata, but still use key-value pairs. Images (JPEG files) are unstructured binary data with no inherent schema.

Option C correctly orders them from most structured (relational) to least structured (images).
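To make the three levels concrete, here is a minimal Python sketch using only the standard library. The table contents, feedback documents, and placeholder bytes are invented for illustration; an in-memory SQLite table stands in for the relational database.

```python
import json
import sqlite3

# Structured: the table enforces one fixed set of typed columns for every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    "CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Email TEXT NOT NULL)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Ada', 'ada@example.com')")
columns = [d[0] for d in conn.execute("SELECT * FROM Customer").description]

# Semi-structured: each document is self-describing, but the keys can differ.
feedback = [
    json.loads('{"rating": 5, "comments": "Great"}'),
    json.loads('{"rating": 3, "metadata": {"channel": "web"}}'),
]
key_sets = [sorted(doc) for doc in feedback]

# Unstructured: just bytes. These are placeholder bytes, not a real image;
# only the first three mimic the marker a JPEG file starts with.
image_bytes = bytes([0xFF, 0xD8, 0xFF]) + b"\x00" * 16
```

Every row in `Customer` has the same columns, the two feedback documents have different key sets, and `image_bytes` exposes no fields at all without a decoder.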

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' or assume images have more structure than JSON because they are stored in a named file, but the key distinction is schema rigidity: relational > JSON > binary blobs.

How to eliminate wrong answers

Option A is wrong because it places images (unstructured binary) before JSON (semi-structured), incorrectly suggesting that binary files have more structure than key-value documents. Option B is wrong because it reverses the entire order, claiming images are most structured and relational data is least structured, which contradicts the fundamental definition of structured vs. unstructured data. Option D is wrong because it ranks JSON as more structured than relational data, but relational databases enforce a rigid schema with primary keys and data types, making them more structured than JSON's flexible schema.

MCQ (hard)

A global social media application allows users to post updates and 'like' posts. The application is designed to prioritize availability and partition tolerance over strong consistency. As a result, when a user likes a post, the like count may not be immediately visible to all users, but it will eventually become consistent across all regions. Which consistency model does this application follow?

A. Strong consistency

B. Eventual consistency

C. Consistent prefix

D. Bounded staleness

Answer: B

Eventual consistency guarantees that if no new updates are made, all replicas will eventually return the same value. This matches the scenario where updates are not immediately visible but become consistent over time, supporting high availability and partition tolerance.

Why this answer

The application prioritizes availability and partition tolerance, which aligns with the eventual consistency model. In this model, updates (like a 'like' count) are propagated asynchronously across replicas, and while reads may return stale data temporarily, all replicas will converge to the same value over time. This is typical of NoSQL systems like Apache Cassandra or Amazon DynamoDB when configured with eventual consistency.
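A toy simulation can show what "stale now, converged later" looks like. This is a simplified sketch, not how any real database replicates: the `Replica` class, the region names, and the queue-based replication are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """A hypothetical regional replica holding one like counter."""

    def __init__(self, name):
        self.name = name
        self.likes = 0
        self.inbox = deque()  # replicated updates not yet applied

    def apply_pending(self):
        # Asynchronous replication catching up.
        while self.inbox:
            self.likes += self.inbox.popleft()


def like(origin, replicas):
    """Apply the like locally at once; queue it for every other region."""
    origin.likes += 1
    for replica in replicas:
        if replica is not origin:
            replica.inbox.append(1)


regions = [Replica("us"), Replica("eu"), Replica("asia")]
like(regions[0], regions)
like(regions[1], regions)

stale_reads = [r.likes for r in regions]      # regions disagree
for r in regions:
    r.apply_pending()
converged_reads = [r.likes for r in regions]  # all regions agree
```

Before replication catches up, the three regions report different counts. Once no new likes arrive and the queues drain, every region returns the same value, and nothing in the model bounds how long that takes.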

Exam trap

The trap here is that candidates often confuse 'eventual consistency' with 'bounded staleness' because both allow stale reads, but eventual consistency has no guaranteed time or version bound, whereas bounded staleness imposes a strict limit—a distinction Microsoft explicitly tests in DP-900.

How to eliminate wrong answers

Option A is wrong because strong consistency requires every read to return the most recent committed write, which costs availability and latency during a partition and contradicts the application's design priorities. Option C is wrong because consistent prefix adds an ordering guarantee on top of eventual convergence: reads may be stale, but they never see writes out of order. The scenario describes no ordering requirement, only that the like count eventually converges. Option D is wrong because bounded staleness guarantees that reads lag the latest write by at most a fixed number of versions or a fixed time interval, and the scenario gives no such bound.

MCQ (easy)

A company operates an online store that processes customer orders. When a customer places an order, the system must immediately reduce the inventory count for the purchased items and record the order details. At the end of each month, the company runs reports that aggregate sales data over the past month to analyze trends. Which type of data processing workload best describes the order placement activity?

A. Transactional processing

B. Analytical processing

C. Batch processing

D. Stream processing

Answer: A

Order placement involves immediate, real-time updates to inventory and order records, requiring transactional consistency and ACID properties. This is a classic example of an Online Transaction Processing (OLTP) workload.

Why this answer

Order placement requires immediate inventory reduction and order recording, which demands ACID (Atomicity, Consistency, Isolation, Durability) guarantees. This is a classic transactional processing workload, typically handled by OLTP (Online Transaction Processing) systems like SQL Server or Azure SQL Database, ensuring data integrity even under concurrent access.
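The atomicity requirement can be sketched with Python's built-in `sqlite3` module standing in for an OLTP database. The table names, columns, and the `place_order` helper are hypothetical; the point is only that the order row and the inventory update succeed or fail together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (
        ProductID INTEGER PRIMARY KEY,
        Quantity  INTEGER NOT NULL CHECK (Quantity >= 0)
    );
    CREATE TABLE Orders (
        OrderID   INTEGER PRIMARY KEY AUTOINCREMENT,
        ProductID INTEGER NOT NULL,
        Quantity  INTEGER NOT NULL
    );
    INSERT INTO Inventory VALUES (1, 5);
""")


def place_order(product_id, quantity):
    """Record the order and reduce stock as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO Orders (ProductID, Quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
            conn.execute(
                "UPDATE Inventory SET Quantity = Quantity - ? WHERE ProductID = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock; the INSERT is undone too.
        return False


first = place_order(1, 3)   # enough stock
second = place_order(1, 3)  # would drive stock below zero
stock = conn.execute("SELECT Quantity FROM Inventory WHERE ProductID = 1").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

The second call fails on the inventory update, and the rollback also removes the order row inserted a moment earlier, so the database never shows an order without the matching stock reduction.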

Exam trap

The trap here is confusing the immediate, atomic nature of order placement with batch or stream processing, when the key differentiator is the need for ACID compliance in a single, discrete operation.

How to eliminate wrong answers

Option B (Analytical processing) is wrong because it focuses on querying and aggregating historical data for reporting and trend analysis, not on real-time, atomic updates of operational data. Option C (Batch processing) is wrong because it processes data in scheduled, bulk intervals (e.g., nightly runs), whereas order placement must happen immediately upon customer action. Option D (Stream processing) is wrong because it handles continuous, unbounded data flows (e.g., sensor data or clickstreams) with low latency, but it does not inherently enforce transactional consistency for individual record updates like inventory deduction.

MCQ (easy)

A consulting firm collects client information in two forms: a spreadsheet with columns for Name, Address, and Phone Number, and audio recordings of client meetings. Which of the following statements correctly categorizes these data types?

A. Both the spreadsheet data and the audio recordings are examples of structured data.

B. The spreadsheet data is structured, and the audio recordings are semi-structured.

C. The spreadsheet data is structured, and the audio recordings are unstructured.

D. The spreadsheet data is semi-structured, and the audio recordings are unstructured.

Answer: C

Correct. The spreadsheet has a fixed schema (columns) making it structured; audio recordings have no defined schema, making them unstructured.

Why this answer

The spreadsheet data with columns for Name, Address, and Phone Number has a predefined schema (rows and columns), making it structured data. Audio recordings are binary files with no inherent schema or organization, fitting the definition of unstructured data. Option C correctly pairs these classifications.

Exam trap

The trap here is confusing semi-structured data (e.g., JSON, XML with tags) with unstructured data (e.g., audio, video, images), leading candidates to incorrectly classify audio recordings as semi-structured because they contain metadata, but the content itself is unstructured.

How to eliminate wrong answers

Option A is wrong because audio recordings are not structured; they lack a fixed schema and cannot be easily queried with SQL. Option B is wrong because audio recordings are unstructured, not semi-structured (semi-structured data has tags or markers, like JSON or XML). Option D is wrong because the spreadsheet data is structured, not semi-structured; it has a rigid schema with defined columns and data types.

MCQ (hard)

A healthcare organization stores patient records in Azure Blob Storage and must comply with data retention policies that require deleting records after 7 years. They also need to prevent any modification or deletion of records before the retention period ends. Which Azure feature should they use?

A. Immutable storage with time-based retention policy

B. Azure Backup for Blob Storage

C. Soft delete for Blob Storage

D. Azure Blob Storage lifecycle management

Answer: A

Immutable storage ensures blobs cannot be modified or deleted until the retention period ends.

Why this answer

Immutable storage with a time-based retention policy (WORM – Write Once, Read Many) ensures that blobs cannot be modified or deleted until the retention period expires. This directly meets the requirement to block modification and premature deletion for 7 years, as the policy locks the data for the specified duration. Removing the records once the period has expired is a separate step, for example a lifecycle management rule.
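The WORM behaviour can be modelled in a few lines of Python. This is a conceptual sketch only and does not use or mirror the Azure Storage API: the `ImmutableStore` class, its methods, and the blob name are hypothetical, and 7 years is approximated as 7 × 365 days.

```python
from datetime import date, timedelta


class ImmutableStore:
    """Toy write-once store with a time-based retention window."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self._blobs = {}  # name -> (data, created_on)

    def write(self, name, data, today):
        if name in self._blobs:
            raise PermissionError("overwrite blocked: blob is immutable")
        self._blobs[name] = (data, today)

    def delete(self, name, today):
        _, created_on = self._blobs[name]
        if today < created_on + self.retention:
            raise PermissionError("delete blocked: retention period still active")
        del self._blobs[name]


store = ImmutableStore(retention_days=7 * 365)
store.write("patient-001", b"record", today=date(2020, 1, 1))

try:
    store.delete("patient-001", today=date(2024, 1, 1))
    early_delete_blocked = False
except PermissionError:
    early_delete_blocked = True

# After the retention window has passed, deletion is allowed again.
store.delete("patient-001", today=date(2027, 1, 1))
```

Soft delete and lifecycle management have no equivalent of the `PermissionError` branches: they react after a delete or schedule one, but neither refuses the operation.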

Exam trap

The trap here is that candidates confuse soft delete (which only protects against accidental deletion) or lifecycle management (which automates tiering/expiry) with the strict WORM guarantee required for regulatory compliance, where no modification or deletion is allowed before the retention period ends.

How to eliminate wrong answers

Option B (Azure Backup for Blob Storage) is wrong because it provides point-in-time recovery and protection against accidental deletion, but it does not prevent intentional modification or deletion of the original blobs before the retention period ends. Option C (Soft delete for Blob Storage) is wrong because it only retains deleted blobs for a configurable period (e.g., 7 days) and allows recovery, but it does not block deletion or modification during the retention period. Option D (Azure Blob Storage lifecycle management) is wrong because it automates tiering or deletion based on age, but it cannot enforce a write-once, read-many lock to prevent modification or deletion before the retention period expires.

Multi-Select (medium)

A company is designing a data solution for a retail application. The solution must support real-time analytics on streaming sales data, and also provide historical reports for business intelligence. Which TWO data processing models should be combined to meet these requirements?

Select 2 answers

A. Distributed processing

B. Batch processing

C. Data lake storage

D. Transactional database

E. Stream processing

Answers: B, E

Batch processing handles periodic processing of large volumes of historical data, which suits business intelligence reports. Stream processing handles the real-time analytics on sales events as they arrive.

Why this answer

Batch processing (B) is correct because it is used to process large volumes of historical sales data at scheduled intervals, enabling the generation of comprehensive business intelligence reports. Stream processing (E) is correct because it handles real-time data ingestion and analytics on streaming sales data, allowing the application to react instantly to sales events. Combining these two models (often called a Lambda architecture) meets both the real-time and historical reporting requirements.
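The difference between the two models can be sketched over the same set of sales events. The event tuples and function names below are made up for illustration; a real solution would use dedicated stream and batch engines rather than plain Python.

```python
from collections import defaultdict

# Hypothetical sales events: (day, store, amount).
sales = [
    ("2024-05-01", "store-1", 20.0),
    ("2024-05-01", "store-2", 5.0),
    ("2024-05-02", "store-1", 10.0),
]


def stream_running_totals(events):
    """Stream model: emit an updated result as each event arrives."""
    total = 0.0
    for _day, _store, amount in events:
        total += amount
        yield total


def batch_daily_report(events):
    """Batch model: process the accumulated events in one scheduled run."""
    report = defaultdict(float)
    for day, _store, amount in events:
        report[day] += amount
    return dict(report)


running = list(stream_running_totals(sales))
report = batch_daily_report(sales)
```

The stream function produces a result per event, which serves the real-time requirement, while the batch function produces one aggregated report over the stored history, which serves the BI requirement.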

Exam trap

The trap here is that candidates confuse 'distributed processing' (a general architecture) with a specific processing model, or they mistakenly think a transactional database can handle real-time analytics on streaming data, when in fact it is optimized for single-row transactions, not continuous data streams.

Multi-Select (easy)

Which TWO of the following are characteristics of structured data?

Select 2 answers

A. Data uses tags or markers to separate elements

B. Data is organized in rows and columns

C. Data is stored in Azure Cosmos DB

D. Data conforms to a fixed schema

E. Data has no predefined schema
 
Answers: B, D

Structured data fits neatly into tables.

Why this answer

Structured data is defined by its organization into rows and columns, typically within a relational database or spreadsheet, where each column represents a specific attribute and each row a record. This tabular format enables efficient querying, sorting, and aggregation using SQL. Option B correctly identifies this core characteristic. Option D identifies the other: structured data conforms to a fixed schema, so every row has the same columns and data types.

Exam trap

The trap here is that candidates confuse the storage location (Azure Cosmos DB) with data structure type, forgetting that Cosmos DB is designed for semi-structured data, not structured data, and that 'tags or markers' (Option A) describe semi-structured formats like JSON or XML, not structured data.

MCQ (easy)

A retail company collects raw clickstream data from its website as JSON files. Data scientists need to run exploratory analytics on this raw data without a predefined schema. BI analysts also need to generate weekly sales reports from aggregated transactional data stored in a relational format. Which combination of data storage approaches best meets these needs?

A. Store raw data in Azure Blob Storage and aggregated data in Azure Cosmos DB

B. Store raw data in Azure Data Lake Storage and aggregated data in Azure SQL Database

C. Store raw data in Azure Table Storage and aggregated data in Azure Data Lake Storage

D. Store raw data in Azure SQL Database and aggregated data in Azure Blob Storage

Answer: B

Azure Data Lake Storage provides a scalable data lake for raw data with schema-on-read, while Azure SQL Database is a relational database ideal for structured transactional data and BI reports.

Why this answer

Azure Data Lake Storage (ADLS) is optimized for storing raw, schema-on-read data like JSON files, enabling data scientists to run exploratory analytics without a predefined schema. Azure SQL Database provides a relational structure with ACID compliance, ideal for BI analysts generating weekly sales reports from aggregated transactional data. This combination directly addresses both unstructured raw data and structured reporting needs.
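Schema-on-read means the files are stored as they arrive and each reader decides which fields to pull out at query time. A minimal Python sketch of that idea follows; the clickstream lines, field names, and the `read_with_schema` helper are hypothetical.

```python
import json

# Hypothetical raw clickstream lines, stored as-is with varying fields.
raw_lines = [
    '{"user": "u1", "page": "/home", "ts": "2024-05-01T10:00:00Z"}',
    '{"user": "u2", "page": "/cart", "referrer": "ad-42"}',
    '{"user": "u1", "page": "/checkout", "cart": {"items": 3}}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only the requested fields."""
    for line in lines:
        doc = json.loads(line)
        yield {field: doc.get(field) for field in fields}  # missing -> None


# Two readers apply two different schemas to the same raw data.
page_views = list(read_with_schema(raw_lines, ["user", "page"]))
referrers = [
    row["referrer"]
    for row in read_with_schema(raw_lines, ["referrer"])
    if row["referrer"] is not None
]
```

Nothing had to be declared before the data was written, which is what lets data scientists explore it freely. The weekly sales reports, by contrast, benefit from the fixed schema a relational table enforces at write time.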

Exam trap

Microsoft often tests the distinction between storage for raw, schema-less data (ADLS/Blob) and storage for structured, relational data (Azure SQL Database). The trap here is that candidates treat Azure Cosmos DB or Table Storage as suitable for raw data, overlooking that both are NoSQL stores built for operational, key-based access rather than low-cost storage and exploratory analytics over raw JSON files.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database designed for low-latency, globally distributed applications, not for cost-effective storage of raw JSON files for exploratory analytics, and it lacks the relational query capabilities needed for BI reports. Option C is wrong because Azure Table Storage is a NoSQL key-value store unsuitable for schema-on-read analytics on raw JSON, and Azure Data Lake Storage is not a relational database for aggregated transactional reporting. Option D is wrong because Azure SQL Database is a relational store that requires a predefined schema, making it inappropriate for raw, schema-less clickstream data, and Azure Blob Storage lacks the relational querying and aggregation features needed for weekly sales reports.

MCQ (medium)

A data engineer needs to load 500 GB of CSV files from an on-premises server into Azure Data Lake Storage Gen2 daily. The data must be transferred securely over the internet. Which Azure tool should they use?

A. Azure Data Factory

B. Azure PowerShell

C. Azure Import/Export service

D. AzCopy

Answer: D

AzCopy is optimized for copying data to Azure Storage over the network with high performance.

Why this answer

AzCopy is the correct tool because it is a command-line utility designed for high-performance, secure copying of data to and from Azure Blob Storage and Azure Data Lake Storage Gen2. It supports the required 500 GB daily transfer over the internet using HTTPS encryption, and can be scripted for automation without the overhead of a full orchestration service.

Exam trap

The trap here is that candidates often confuse Azure Data Factory as the default tool for any data movement, overlooking that AzCopy is the lightweight, purpose-built utility for direct, scriptable bulk transfers without orchestration overhead.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service, not a direct data transfer tool; it adds unnecessary complexity and cost for a simple bulk copy task, and is not optimized for single-shot, high-volume transfers like AzCopy. Option B is wrong because Azure PowerShell is a scripting environment for managing Azure resources, not a dedicated data transfer tool; it lacks the parallelization and resume capabilities needed for efficient 500 GB file transfers. Option C is wrong because Azure Import/Export service is designed for physical shipment of hard drives to Azure datacenters, not for transferring data over the internet; it is intended for very large datasets (terabytes to petabytes) where network transfer is impractical.

MCQ (medium)

Your organization uses Azure Cosmos DB for a real-time inventory application. The data includes a container with items that have a `category` property. The operations team frequently queries for all items in a specific category. To optimize query performance and minimize request unit (RU) consumption, you decide to implement a materialized view. Which Azure Cosmos DB feature should you use to achieve this?

A. Partition key design

B. Change feed

C. Composite indexes

D. Materialized views (preview)

Answer: D

Azure Cosmos DB materialized views allow you to define a view that is automatically updated and optimized for common queries.

Why this answer

Option D is correct because Azure Cosmos DB's materialized views (preview) feature maintains an automatically updated copy of data from a source container in a separate container that is organized for a specific query pattern, such as filtering by `category`. This reduces RU consumption by avoiding full scans or expensive cross-partition queries against the source container, because the view is already laid out for the target query.

Exam trap

The trap here is that candidates confuse the change feed (a reactive stream of changes) with materialized views (a persisted, queryable snapshot), or assume that composite indexes alone can achieve the same pre-computation benefits as a materialized view.

How to eliminate wrong answers

Option A is wrong because partition key design distributes data across physical partitions for scalability and write performance, but it does not create a pre-computed, denormalized copy of data optimized for a specific query pattern; a poorly chosen partition key can even increase RU costs for queries. Option B is wrong because the change feed is a mechanism to capture incremental changes (inserts, updates, deletes) to items in a container, enabling event-driven processing or replication, but it does not itself provide a query-optimized, persisted view of the data. Option C is wrong because composite indexes improve query performance by indexing multiple properties in a specific order, but they do not create a separate, pre-materialized dataset; they still require the query engine to scan indexed data at query time, which may not be as efficient as a materialized view for frequent aggregation or filtering.

MCQ (easy)

A hospital system stores patient medical records. Each record includes structured data like patient ID, name, date of birth, and also includes unstructured data like doctor's notes and X-ray images. Which type of data is the doctor's notes?

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Relational data

Answer: C

Unstructured data lacks a predefined data model or schema. Doctor's notes are free-form text, making them unstructured.

Why this answer

Doctor's notes are unstructured data because they consist of free-form text that does not follow a predefined data model or schema. Unlike structured data (e.g., patient ID, name) which fits neatly into rows and columns, doctor's notes lack a fixed format and cannot be easily queried using traditional relational database tools without additional processing.

Exam trap

The trap here is that candidates may confuse 'unstructured' with 'semi-structured' because doctor's notes might contain some implicit structure (e.g., date headers), but the key exam distinction is that unstructured data lacks a formal schema or metadata tags, unlike semi-structured data such as JSON or XML.

How to eliminate wrong answers

Option A is wrong because structured data requires a strict schema (e.g., tables with rows and columns), whereas doctor's notes are free-form text. Option B is wrong because semi-structured data (e.g., JSON, XML) has tags or markers to separate elements and enforce hierarchy, but doctor's notes have no such organizational metadata. Option D is wrong because relational data is a subset of structured data organized into tables with defined relationships, which does not apply to free-text notes.

MCQ (hard)

The exhibit shows an ARM template snippet for deploying an Azure storage account. What is the redundancy level of the storage account?

A. Read-access geo-redundant storage (RA-GRS)

B. Zone-redundant storage (ZRS)

C. Locally redundant storage (LRS)

D. Geo-redundant storage (GRS)

Answer: C

Standard_LRS indicates LRS.

Why this answer

The answer follows from the `sku.name` property in the template. A value of `Standard_LRS` selects locally redundant storage (LRS), which keeps three copies of the data within a single datacenter in the primary region. Each of the other options corresponds to a different value: `Standard_ZRS`, `Standard_GRS`, or `Standard_RAGRS`.

Exam trap

The trap here is that candidates read past the `sku.name` value and pick GRS or RA-GRS on the assumption that a storage account is geo-replicated by default. The redundancy level is whatever the SKU name states, and the suffix after `Standard_` (LRS, ZRS, GRS, RAGRS) names it directly.

How to eliminate wrong answers

Option A is wrong because Read-access geo-redundant storage (RA-GRS) requires `sku.name` to be `Standard_RAGRS`; the `supportsHttpsTrafficOnly` property controls HTTPS enforcement and has nothing to do with redundancy. Option B is wrong because Zone-redundant storage (ZRS) requires `sku.name` to be `Standard_ZRS` and a region that supports availability zones. Option D is wrong because Geo-redundant storage (GRS) requires `sku.name` to be `Standard_GRS`, which is not the value in the snippet.
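Reading the redundancy level from a template comes down to looking up the `sku.name` value. The sketch below shows that lookup in Python. The exhibit is not reproduced on this page, so the template fragment is a hypothetical stand-in (the account name, location, and API version are assumptions) built around the `Standard_LRS` value the answer summary cites.

```python
import json

# Standard-tier SKU names and the redundancy level each one selects.
REDUNDANCY_BY_SKU = {
    "Standard_LRS": "Locally redundant storage (LRS)",
    "Standard_ZRS": "Zone-redundant storage (ZRS)",
    "Standard_GRS": "Geo-redundant storage (GRS)",
    "Standard_RAGRS": "Read-access geo-redundant storage (RA-GRS)",
}

# Hypothetical resource fragment; not the exhibit from the question.
template_snippet = json.loads("""
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2023-01-01",
  "name": "examplestorage",
  "location": "eastus",
  "sku": { "name": "Standard_LRS" },
  "kind": "StorageV2"
}
""")

redundancy = REDUNDANCY_BY_SKU[template_snippet["sku"]["name"]]
```

Changing only the `sku.name` string to one of the other keys would change the answer, which is why that single property is the one to look for in the exhibit.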

MCQ (easy)

A company stores customer data in a relational table with columns CustomerID, FullName, and Email. They also store product descriptions as JSON documents with varying fields, and product images as JPEG files. Which of the following correctly classifies these data types from most structured to least structured?

A. Structured, semi-structured, unstructured

B. Unstructured, semi-structured, structured

C. Semi-structured, structured, unstructured

D. Structured, unstructured, semi-structured

Answer: A

Correct. Relational table data is fully structured, JSON is semi-structured, and JPEG images are unstructured.

Why this answer

The customer data in a relational table with fixed columns (CustomerID, FullName, Email) is structured because it has a rigid schema and defined data types. The JSON documents for product descriptions are semi-structured because they use key-value pairs with flexible fields but still have metadata (tags, keys) that provide organization. The JPEG product images are unstructured because they are binary blobs with no inherent schema or metadata that the database can query directly.

This ordering from most to least structured matches option A.

Exam trap

The trap here is that candidates often confuse semi-structured (JSON) with unstructured (JPEG) because both lack a fixed schema, but JSON has inherent key-value structure that databases can query, whereas JPEG is purely binary with no queryable structure.

How to eliminate wrong answers

Option B is wrong because it places unstructured (JPEG) as the most structured, which is incorrect — JPEG files have no schema and are the least structured. Option C is wrong because it places semi-structured (JSON) as the most structured, but relational tables with fixed schemas are more structured than JSON documents. Option D is wrong because it places unstructured (JPEG) in the middle, but JPEG files are the least structured of the three data types.

MCQ (easy)

Refer to the exhibit. You are designing a fact table for a data warehouse. The table will store sales transactions with daily granularity. Which column would be most appropriate as the distribution column in a hash-distributed table in Azure Synapse Analytics?

A. SalesAmount

B. CustomerKey

C. ProductKey

D. OrderDate

Answer: B

CustomerKey has high cardinality and is frequently used in joins, making it a good distribution column.

Why this answer

CustomerKey (B) is the most appropriate distribution column because it has high cardinality and is frequently used in joins with dimension tables, ensuring data is evenly distributed across distributions in Azure Synapse Analytics. A hash-distributed table requires a column with many unique values to avoid data skew, and CustomerKey is a natural key for sales transactions that meets this requirement.
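The effect of cardinality on skew can be simulated in Python. A dedicated SQL pool spreads a hash-distributed table over 60 distributions; everything else here is an assumption for illustration, including the SHA-256 stand-in for the engine's internal hash function and the deliberately extreme sample data (10,000 customers, only 3 order dates).

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # number of distributions in a dedicated SQL pool


def distribution_for(value):
    """Map a column value to a distribution (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


# Hypothetical fact rows: (CustomerKey, OrderDate).
rows = [
    (customer_key, f"2024-01-{1 + customer_key % 3:02d}")
    for customer_key in range(10_000)
]

by_customer = Counter(distribution_for(key) for key, _ in rows)
by_date = Counter(distribution_for(day) for _, day in rows)

used_by_customer = len(by_customer)  # distributions that received rows
used_by_date = len(by_date)
```

Hashing on the high-cardinality key spreads rows across all distributions, while hashing on the date piles every row onto at most three of them and leaves the rest idle. Real tables have more dates than this sample, but the same mechanism produces the hot spots described for OrderDate.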

Exam trap

Microsoft often tests the misconception that any column with high cardinality is suitable for hash distribution, but the trap here is that the column must also be frequently used in joins and evenly distribute data, not just have many unique values.

How to eliminate wrong answers

Option A (SalesAmount) is wrong because it is a measure column with continuous values that would cause data skew and poor query performance due to uneven distribution. Option C (ProductKey) is wrong because while it has high cardinality, it is less frequently used in join operations compared to CustomerKey, and using it may lead to suboptimal distribution for common sales analysis queries. Option D (OrderDate) is wrong because it has low cardinality (only 365 distinct values per year) and would cause severe data skew, as all transactions on the same date would hash to the same distribution, leading to hot spots and degraded performance.

MCQ (easy)

A company receives customer order data from its online store in a CSV file. Each line contains fields like OrderID, CustomerName, Product, Quantity, and OrderDate. This data is best described as:

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Transactional data

Answer: A

Correct because the CSV file has a fixed schema with consistent fields per record, making it structured.

Why this answer

A is correct because the CSV file contains data that conforms to a strict tabular schema with predefined columns (OrderID, CustomerName, Product, Quantity, OrderDate) and consistent data types per column. This rigid, row-and-column format with a fixed schema is the defining characteristic of structured data, which can be directly loaded into a relational database or Azure SQL Database without transformation.
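The fixed row-and-column shape is easy to see when the file is parsed. This short Python sketch uses the standard `csv` module on a made-up two-row sample with the column names from the question; the order values are invented.

```python
import csv
import io

# Hypothetical sample of the order file described in the question.
csv_text = """OrderID,CustomerName,Product,Quantity,OrderDate
1001,Ada Lovelace,Keyboard,2,2024-05-01
1002,Alan Turing,Monitor,1,2024-05-02
"""

reader = csv.DictReader(io.StringIO(csv_text))
orders = list(reader)

# Every record exposes exactly the header's fields, in the same order.
same_fields = all(list(row) == reader.fieldnames for row in orders)
quantities = [int(row["Quantity"]) for row in orders]
```

Because every record has the same five fields, each column maps directly onto a table column, which is what makes the data structured regardless of what business event it records.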

Exam trap

The trap here is that candidates confuse 'transactional data' (a workload type) with 'structured data' (a format classification), leading them to pick D because the data describes orders, even though the question explicitly asks about the data's format, not its business purpose.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) allows flexible schema variations, such as missing fields or nested structures, whereas this CSV file has the same fields in the same order on every line. Option C is wrong because unstructured data (e.g., free-form text, images, videos) has no predefined schema or organization, while CSV has a clear row/column structure. Option D is wrong because transactional data refers to a type of workload (OLTP) that records business transactions, not a data format classification; the CSV file itself is a structured data format regardless of whether it contains transactional records.
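
A short sketch of what "fixed schema with consistent fields per record" means in practice, using Python's standard csv module. The sample rows are hypothetical; only the column names come from the question.

```python
import csv
import io

# Hypothetical sample of the order file described in the question.
ORDERS_CSV = """OrderID,CustomerName,Product,Quantity,OrderDate
1001,Ana Silva,Keyboard,2,2024-03-01
1002,Ben Ortiz,Monitor,1,2024-03-01
1003,Chen Wei,Mouse,3,2024-03-02
"""

EXPECTED_COLUMNS = ["OrderID", "CustomerName", "Product", "Quantity", "OrderDate"]

reader = csv.DictReader(io.StringIO(ORDERS_CSV))
rows = list(reader)

# Every record exposes exactly the same fields, in the same order.
same_schema = all(list(row.keys()) == EXPECTED_COLUMNS for row in rows)

# Because the schema is fixed, a whole column can be cast to one type.
total_quantity = sum(int(row["Quantity"]) for row in rows)

print("columns:", reader.fieldnames)
print("all rows share the schema:", same_schema)
print("total quantity:", total_quantity)
```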

MCQeasy

A company wants to store JSON documents that need to be queried with high throughput and low latency globally. Which Azure data service is most appropriate?

A.Azure Table Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Blob Storage

AnswerB

Cosmos DB provides global distribution, low latency, and native JSON support.

Why this answer

Azure Cosmos DB is the most appropriate service because it is a globally distributed, multi-model database that natively supports JSON documents and provides guaranteed single-digit-millisecond latency at the 99th percentile, along with high throughput via configurable request units (RUs). Its turnkey global distribution enables low-latency reads and writes across multiple Azure regions, making it ideal for globally queried JSON workloads.

Exam trap

The trap here is that candidates confuse Azure Table Storage's key-value model with JSON document support, or assume Azure SQL Database's JSON functions make it suitable for globally distributed, high-throughput JSON workloads, missing Cosmos DB's core differentiator of turnkey global distribution and guaranteed low latency.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store that stores data in entity/partition structures, not native JSON documents, and it lacks global distribution with guaranteed low-latency SLAs. Option C is wrong because Azure SQL Database is a relational database that stores data in tables with a fixed schema, not as native JSON documents, and while it supports JSON functions, it is not designed for globally distributed, high-throughput JSON queries with multi-region write capabilities. Option D is wrong because Azure Blob Storage is an object storage service for unstructured binary data, not a queryable database; it cannot natively query JSON documents with low latency and high throughput.

Matchingmedium

Match each Azure data migration tool to its use case.

Concepts

Matches

Migrate databases to Azure with minimal downtime

Copy blobs or files to/from Azure Storage

Offline data transfer for large datasets

Ship physical disks to Azure datacenter

Orchestrate data movement and transformation

Why these pairings

Azure offers various tools for data migration scenarios.

MCQhard

You are reviewing an ARM template for an Azure SQL Database deployment. What is the maximum size of the database?

A.5 GB

B.50 GB

C.5 MB

D.500 GB

AnswerA

5,368,709,120 bytes = 5 GB.

Why this answer

The answer comes from the size value in the template, not from a tier limit. An ARM template expresses the maximum size of an Azure SQL Database in bytes (typically through the maxSizeBytes property), and 5,368,709,120 bytes divided by 1,024³ bytes per GB is exactly 5 GB.

Exam trap

The trap here is misreading the magnitude of a size given in bytes: candidates drop or add a factor of 1,024 and land on 5 MB or 500 GB, or they guess from the service tier instead of converting the value.

How to eliminate wrong answers

Option B (50 GB) is wrong because 50 GB would be 53,687,091,200 bytes, ten times the value in the answer. Option C (5 MB) is wrong because 5 MB is only 5,242,880 bytes. Option D (500 GB) is wrong because 500 GB would be 536,870,912,000 bytes, a hundred times the value in the answer.
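
The byte conversion behind the answer can be checked in a few lines. The constant is the value quoted in the answer; the variable names are illustrative only.

```python
MAX_SIZE_BYTES = 5_368_709_120  # value quoted in the answer

BYTES_PER_MB = 1024 ** 2
BYTES_PER_GB = 1024 ** 3

size_in_gb = MAX_SIZE_BYTES / BYTES_PER_GB
size_in_mb = MAX_SIZE_BYTES / BYTES_PER_MB

print(size_in_gb)  # 5.0
print(size_in_mb)  # 5120.0
```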

MCQhard

A financial analytics company has two distinct data processing workloads. The first workload ingests real-time stock trade data from a message queue, calculates moving averages every minute, and updates a dashboard for traders. The second workload receives daily CSV files containing end-of-day trade summaries, transforms them using Python scripts, and loads the results into a data warehouse for monthly reporting. Which statement correctly characterizes these workloads?

A.First workload: Stream processing, Second workload: Batch processing

B.First workload: Batch processing, Second workload: Stream processing

C.First workload: OLTP, Second workload: OLAP

D.First workload: Transactional processing, Second workload: Analytical processing

AnswerA

Real-time stock trade analysis with moving averages is a classic stream processing workload (low latency, continuous). End-of-day CSV file processing is batch processing (scheduled, bulk).

Why this answer

Option A is correct because the first workload processes real-time stock trade data from a message queue and calculates moving averages every minute, which is a classic stream processing pattern (continuous, low-latency data ingestion and computation). The second workload handles daily CSV files with end-of-day summaries, transforms them with Python scripts, and loads results into a data warehouse for monthly reporting, which is a classic batch processing pattern (scheduled, high-latency processing of bounded data sets).

Exam trap

The trap here is that candidates confuse 'real-time' with 'transactional processing' (OLTP) or 'analytical processing' (OLAP), when the correct distinction is between stream processing (continuous, low-latency) and batch processing (scheduled, high-latency).

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: the first workload is clearly stream processing (real-time, message queue), not batch processing, and the second workload is batch processing (daily files, scheduled transformation), not stream processing. Option C is wrong because OLTP (Online Transaction Processing) refers to systems that handle high-volume, low-latency transactions (e.g., order entry), not real-time analytics; the first workload is stream processing, not OLTP. Option D is wrong because 'transactional processing' is synonymous with OLTP, not stream processing, and 'analytical processing' is synonymous with OLAP, not batch processing; the first workload is stream processing, and the second is batch processing.
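
The contrast between the two workloads can be sketched as follows. This is a simplified illustration with hypothetical prices and a count-based window rather than a time-based one: the stream function emits a result per incoming event, while the batch function needs the complete data set before it produces anything.

```python
from collections import deque

def moving_averages(prices, window=3):
    """Stream style: emit an updated average as each new price arrives."""
    recent = deque(maxlen=window)
    for price in prices:
        recent.append(price)
        yield sum(recent) / len(recent)

def end_of_day_summary(prices):
    """Batch style: process the complete, bounded data set in one pass."""
    return {
        "count": len(prices),
        "high": max(prices),
        "low": min(prices),
        "average": sum(prices) / len(prices),
    }

trades = [100.0, 102.0, 101.0, 105.0]  # hypothetical trade prices

stream_output = list(moving_averages(trades))
batch_output = end_of_day_summary(trades)

print(stream_output)
print(batch_output)
```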

MCQmedium

Your organization uses Microsoft Fabric to build a data lakehouse. Data engineers need to transform data using Spark and store results in Delta Lake format. Which Fabric component should they use?

A.Dataflows Gen2

B.Pipelines

C.Notebooks

D.Semantic models

AnswerC

Notebooks support Spark and Delta Lake.

Why this answer

Notebooks in Microsoft Fabric provide an interactive environment for writing and executing Spark code, which is required for transforming data using Spark. The results can be directly written to Delta Lake format, making Notebooks the correct component for this task.

Exam trap

The trap here is that candidates may confuse Pipelines (which orchestrate activities) with the actual compute engine (Notebooks) that runs Spark transformations, leading them to select Pipelines as the component for executing Spark code.

How to eliminate wrong answers

Option A is wrong because Dataflows Gen2 are used for low-code data transformation using Power Query, not for running Spark code. Option B is wrong because Pipelines are used for orchestrating and scheduling data movement and transformation activities, but they do not execute Spark transformations themselves. Option D is wrong because Semantic models are used for defining business logic and measures for reporting in Power BI, not for data transformation or Spark execution.

Multi-Selecteasy

A retail company operates an online store. When a customer places an order, the system immediately updates inventory and payment records. Separately, the company's business analysts run weekly reports that aggregate sales data to identify trends. Which two terms correctly describe these workloads?

Select 2 answers

A.Batch processing and real-time processing

B.OLTP and OLAP

C.Structured and Unstructured data

D.Data ingestion and data transformation

AnswersA, B

Batch processing handles data in large scheduled batches, while real-time processing handles data as it arrives. The order updates happen in real time and the weekly reports run as batch jobs, but 'batch' and 'real-time' describe processing modes; OLTP and OLAP are the precise terms for the workload types.

Why this answer

The order processing system that immediately updates inventory and payment records is a classic Online Transaction Processing (OLTP) workload, which handles high volumes of small, real-time transactions with ACID guarantees. The weekly sales aggregation reports run by business analysts are an Online Analytical Processing (OLAP) workload, which involves complex queries over large historical datasets to support business intelligence. Option B correctly pairs these two distinct processing paradigms.

Exam trap

The trap here is that candidates confuse the processing mode (batch vs. real-time) with the workload type (OLTP vs. OLAP), failing to recognize that OLTP is inherently real-time and OLAP is typically batch-oriented, but the question specifically asks for the correct terms that describe the workloads themselves.
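
A minimal sketch of the two workload types, using an in-memory SQLite database as a stand-in for the store's systems. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (product TEXT PRIMARY KEY, stock INTEGER NOT NULL);
    CREATE TABLE payments (order_id INTEGER PRIMARY KEY, product TEXT,
                           amount REAL, week INTEGER);
    INSERT INTO inventory VALUES ('keyboard', 10), ('mouse', 20);
""")

def place_order(order_id, product, amount, week):
    """OLTP: one small transaction that updates inventory and payments together."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("UPDATE inventory SET stock = stock - 1 WHERE product = ?",
                     (product,))
        conn.execute("INSERT INTO payments VALUES (?, ?, ?, ?)",
                     (order_id, product, amount, week))

place_order(1, "keyboard", 49.0, 1)
place_order(2, "mouse", 19.0, 1)
place_order(3, "keyboard", 49.0, 2)

# OLAP: an aggregate query over the accumulated history.
weekly_sales = conn.execute(
    "SELECT week, SUM(amount) FROM payments GROUP BY week ORDER BY week"
).fetchall()
keyboard_stock = conn.execute(
    "SELECT stock FROM inventory WHERE product = 'keyboard'"
).fetchone()[0]

print(weekly_sales)
print(keyboard_stock)
```

In practice the analytical query would usually run against a separate warehouse so that reporting does not compete with order processing.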

100

MCQmedium

You need to design a data storage solution for a global e-commerce application that must support ACID transactions and require minimal latency for point lookups by a unique key. Which Azure data service should you use?

A.Azure Table Storage

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Cosmos DB

AnswerD

Cosmos DB provides low-latency point lookups and ACID transactions within a logical partition.

Why this answer

Azure Cosmos DB is the correct choice because it provides turnkey global distribution, supports ACID transactions within a logical partition through its transactional batch API, and offers single-digit millisecond latency for point reads by a unique key (the item id together with its partition key). This makes it a good fit for a global e-commerce application that needs both transactional guarantees and low-latency lookups.

Exam trap

The trap here is that candidates often assume Azure SQL Database is the only ACID-compliant option, overlooking Cosmos DB's transactional batch support and its superior global low-latency capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage limits transactions to entity group batches within a single partition and offers no global distribution or latency guarantee comparable to Cosmos DB. Option B is wrong because Azure SQL Database, while fully ACID-compliant, accepts writes in a single primary region; its geo-replicas are read-only, so users far from the primary see higher latency than with a globally distributed Cosmos DB account. Option C is wrong because Azure Blob Storage is an object store for unstructured data, does not support ACID transactions, and point lookups by unique key are not its primary access pattern (it uses HTTP-based REST operations with higher latency).

101

MCQmedium

A team is designing a data pipeline to process streaming sensor data from IoT devices. The data must be ingested, transformed in real time, and stored in a time-series database. Which combination of Azure services should they use?

A.Azure IoT Hub, Azure Data Lake Storage, and Azure Databricks

B.Azure IoT Hub, Azure Stream Analytics, and Azure Data Explorer

C.Azure Event Hubs, Azure Functions, and Azure SQL Database

D.Azure Event Hubs, Azure Synapse Pipelines, and Azure Cosmos DB

AnswerB

IoT Hub ingests device data, Stream Analytics performs real-time transformations, and Data Explorer is a time-series database for fast analytics.

Why this answer

Option B is correct because Azure IoT Hub ingests streaming sensor data from IoT devices, Azure Stream Analytics provides real-time transformation and analysis of the data streams, and Azure Data Explorer (ADX) is a fully managed time-series database optimized for high-velocity telemetry data. This combination directly addresses the requirement for ingestion, real-time transformation, and time-series storage.

Exam trap

The trap here is that candidates often confuse Azure Data Explorer with Azure Data Lake Storage or Azure SQL Database, assuming any storage service can handle time-series data, but ADX is purpose-built for high-ingestion-rate time-series analytics, with features like materialized views and data sharding.

How to eliminate wrong answers

Option A is wrong because Azure Data Lake Storage is a hierarchical file store for batch/analytics, not a time-series database, and Azure Databricks is primarily for batch and interactive analytics, not real-time stream processing with low-latency time-series storage. Option C is wrong because Azure SQL Database is a relational OLTP database not optimized for time-series workloads, and Azure Functions is event-driven compute, not a dedicated stream processing service for real-time transformations. Option D is wrong because Azure Synapse Pipelines is an orchestration tool for data movement and transformation, not real-time stream processing, and Azure Cosmos DB is a multi-model NoSQL database that lacks native time-series optimizations like automatic retention policies and downsampling.
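
The kind of real-time transformation a stream processor applies between ingestion and storage can be sketched as a tumbling-window average. This is plain Python over a hypothetical list of readings, not Stream Analytics query syntax; timestamps are seconds and device names are made up.

```python
from collections import defaultdict

def tumbling_window_averages(readings, window_seconds=60):
    """Average each device's readings per fixed, non-overlapping time window."""
    buckets = defaultdict(list)
    for timestamp, device_id, value in readings:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[(window_start, device_id)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}

# Hypothetical sensor readings: (timestamp in seconds, device id, temperature).
readings = [
    (0, "dev1", 20.0),
    (30, "dev1", 22.0),
    (61, "dev1", 30.0),
    (65, "dev2", 10.0),
]

result = tumbling_window_averages(readings)
print(result)
```

Each output record carries a window start time and a device id, which is the shape a time-series store is designed to ingest and query.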

102

MCQeasy

A retail company stores customer transaction data in a relational database. Each transaction is recorded with a fixed schema including TransactionID, CustomerID, ProductID, Quantity, and TotalAmount. Which type of data does this represent?

A.Unstructured data

B.Semi-structured data

C.Structured data

D.Binary data

AnswerC

Structured data conforms to a predefined schema, typically stored in rows and columns in a relational database. The fixed schema of TransactionID, CustomerID, etc., makes this structured data.

Why this answer

Option C is correct because the data conforms to a fixed schema with defined columns (TransactionID, CustomerID, ProductID, Quantity, TotalAmount) and data types, which is the defining characteristic of structured data. In a relational database, this schema enforces consistency and allows for efficient querying using SQL, making it a classic example of structured data.

Exam trap

The trap here is that candidates may confuse 'structured data' with 'semi-structured data' because both have some organization, but the key differentiator is the rigid, predefined schema enforced by the relational database versus the flexible, self-describing schema of semi-structured formats like JSON or XML.

How to eliminate wrong answers

Option A is wrong because unstructured data has no predefined schema or organization (e.g., text files, images, videos), whereas this data has a fixed schema. Option B is wrong because semi-structured data has some organizational properties but does not conform to a rigid schema (e.g., JSON, XML with flexible tags), while this data uses a strict relational schema. Option D is wrong because binary data refers to raw byte sequences (e.g., executable files, images), not tabular data with typed columns.
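
The difference between a rigid schema and a self-describing one can be shown with a small comparison. The sample values are hypothetical; only the column names come from the question.

```python
import json

# Structured: every row has the same five columns in the same order.
COLUMNS = ("TransactionID", "CustomerID", "ProductID", "Quantity", "TotalAmount")
rows = [
    (1, 501, 9001, 2, 39.98),
    (2, 502, 9002, 1, 15.00),
]

# Semi-structured: each document describes itself and may differ from the next.
documents = [
    json.loads('{"TransactionID": 1, "CustomerID": 501, '
               '"items": [{"ProductID": 9001, "Quantity": 2}]}'),
    json.loads('{"TransactionID": 2, "CustomerID": 502, "coupon": "SPRING"}'),
]

rows_share_schema = all(len(row) == len(COLUMNS) for row in rows)
document_key_sets = {frozenset(doc) for doc in documents}

print("rows share one schema:", rows_share_schema)
print("distinct document shapes:", len(document_key_sets))
```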

103

MCQhard

Your company, Contoso Ltd., operates a global e-commerce platform. The data engineering team ingests over 10 TB of raw clickstream data daily into Azure Data Lake Storage Gen2. The data is partitioned by date and hour. Business analysts need to query this data using Azure Synapse Serverless SQL to generate daily sales reports. However, the reports are taking over 30 minutes to run, and the team needs to improve query performance without moving data to a dedicated SQL pool. You are asked to recommend a solution. Which action should you take?

A.Convert the data from JSON to Parquet format and apply Snappy compression.

B.Use Azure Data Factory to copy the data into Azure SQL Database and create indexes.

C.Create a dedicated SQL pool and distribute the data across 60 distributions.

D.Create external tables using a partition elimination strategy and ensure the data is partitioned by date.

AnswerD

Partition elimination allows the serverless SQL engine to read only the partitions needed for the query, significantly reducing data scanned and improving performance.

Why this answer

Option D is correct because Azure Synapse Serverless SQL can use external tables with partition elimination to skip irrelevant partitions (e.g., date/hour folders) during query execution. This reduces the amount of data scanned, directly improving query performance without moving data. Partition elimination works by filtering on the partition column (e.g., date) in the WHERE clause, allowing the query engine to read only the necessary files.

Exam trap

The trap here is that candidates often assume converting file format (Parquet) alone is sufficient, but the question specifically targets reducing data scanned via partition elimination, which is a more direct optimization for partitioned data in serverless SQL.

How to eliminate wrong answers

Option A is wrong because while converting to Parquet with Snappy compression can improve performance, it does not address the root cause of scanning far more data than each report needs; partition elimination is more impactful for reducing data scanned. Option B is wrong because copying the data into Azure SQL Database moves it out of the data lake and introduces additional cost and latency, whereas the team wants to keep querying the data in place with serverless SQL. Option C is wrong because creating a dedicated SQL pool explicitly violates the requirement to not move data to a dedicated SQL pool; it also involves provisioning and managing separate compute resources.
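
The idea behind partition elimination can be sketched in plain Python: decide from the folder path alone which files are worth reading. The folder layout and file names are hypothetical, and a real query engine performs this pruning internally.

```python
# Hypothetical lake layout: one folder per date and hour, as in the scenario.
files = [
    f"clickstream/date=2024-05-{day:02d}/hour={hour:02d}/part-0.parquet"
    for day in (1, 2, 3)
    for hour in range(24)
]

def files_to_scan(all_files, wanted_date):
    """Keep only files whose folder path matches the filtered partition value."""
    marker = f"/date={wanted_date}/"
    return [path for path in all_files if marker in path]

selected = files_to_scan(files, "2024-05-02")

print(f"scanned {len(selected)} of {len(files)} files")
```

A daily report that filters on one date reads a third of the files in this sample; over months of data the fraction becomes far smaller.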

104

MCQhard

Your team uses Azure SQL Database and wants to implement row-level security (RLS) to restrict access to sales data by region. Which type of data workload characteristic does RLS primarily address?

A.Concurrency

B.Consistency

C.Security

D.Durability

AnswerC

RLS is a security feature that restricts data access based on user identity.

Why this answer

Row-level security (RLS) in Azure SQL Database restricts data access at the database engine level by applying a security predicate that filters rows based on user attributes, such as region. This directly addresses the security characteristic of a data workload by ensuring that users can only see data they are authorized to view, without requiring application-level changes.

Exam trap

The trap here is that candidates confuse security (access control) with concurrency (multi-user access) or consistency (data integrity), because RLS involves filtering rows during queries, which might superficially resemble managing concurrent access or ensuring data correctness.

How to eliminate wrong answers

Option A is wrong because concurrency refers to the ability of multiple users to access data simultaneously without conflicts, which is managed by locking and isolation levels, not by row-level filtering. Option B is wrong because consistency ensures that data remains accurate and valid across transactions (e.g., via ACID properties), whereas RLS does not enforce data integrity rules. Option D is wrong because durability guarantees that committed transactions persist even after a system failure, typically achieved through transaction logs and backups, not through access control predicates.
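
Conceptually, a security predicate is a filter that runs on every row before results are returned. The sketch below imitates that behavior in Python; it is not T-SQL, and the user-to-region mapping and data are hypothetical.

```python
SALES = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 45.0},
]

# Hypothetical mapping of users to the region they may see.
USER_REGION = {"alice": "East", "bob": "West"}

def security_predicate(row, user):
    """Return True only when the row belongs to the user's region."""
    return row["region"] == USER_REGION.get(user)

def select_sales(user):
    """Every query passes through the predicate, so callers cannot skip it."""
    return [row for row in SALES if security_predicate(row, user)]

alice_rows = select_sales("alice")
unknown_rows = select_sales("mallory")

print([row["order_id"] for row in alice_rows])
print(unknown_rows)
```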

105

MCQeasy

A company stores an employee database in a relational database. The Employees table includes columns: EmployeeID (integer), FirstName (text), LastName (text), HireDate (date), and a column called Photo which stores the employee's photo as a binary large object (BLOB). Which statement best describes the data types in this table?

A.All columns store structured data.

B.The Photo column stores unstructured data, while the other columns store structured data.

C.All columns store unstructured data.

D.The HireDate column stores semi-structured data.

AnswerB

Structured data is organized with a fixed schema; the integer, text, and date columns all have a fixed type and format. The Photo column contains binary image data with no inherent structure, making it unstructured data.

Why this answer

The Photo column stores a binary large object (BLOB), which is unstructured data because it does not have a predefined schema or format that can be easily queried or indexed by relational operations. In contrast, EmployeeID, FirstName, LastName, and HireDate are all structured data types (integer, text, date) that conform to a fixed schema and support direct querying, sorting, and indexing. This distinction is fundamental in Azure data services, where structured data is typically stored in Azure SQL Database or Azure Synapse, while unstructured BLOBs are better suited for Azure Blob Storage.

Exam trap

The trap here is that candidates may assume all columns in a relational database are structured, overlooking that BLOB columns store unstructured binary data, which is a key distinction tested in the DP-900 exam under core data concepts.

How to eliminate wrong answers

Option A is wrong because it claims all columns store structured data, but the Photo column as a BLOB is unstructured binary data without a fixed schema. Option C is wrong because it states all columns store unstructured data, but EmployeeID, FirstName, LastName, and HireDate have explicit data types (integer, text, date) that are structured and schema-bound. Option D is wrong because the HireDate column stores a date value, which is structured data, not semi-structured data (semi-structured data would be something like JSON or XML with flexible schema).
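
The distinction can be seen in a small SQLite sketch of the Employees table. The rows and the photo bytes are placeholders (not a real image): the typed columns can be filtered by value, while the BLOB comes back as opaque bytes the database does not interpret.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  TEXT,
        LastName   TEXT,
        HireDate   TEXT,
        Photo      BLOB
    )
""")

fake_photo = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"\x00" * 16  # placeholder bytes
conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
             (1, "Dana", "Reyes", "2021-06-14", fake_photo))
conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
             (2, "Omar", "Haddad", "2023-02-01", fake_photo))

# Structured columns can be filtered and sorted by value.
recent_hires = conn.execute(
    "SELECT FirstName FROM Employees WHERE HireDate >= '2022-01-01'"
).fetchall()

# The BLOB is returned as raw bytes with no queryable internal structure.
photo = conn.execute(
    "SELECT Photo FROM Employees WHERE EmployeeID = 1"
).fetchone()[0]

print(recent_hires)
print(type(photo).__name__, len(photo))
```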

106

Multi-Selectmedium

Which THREE are characteristics of structured data? (Choose three.)

Select 3 answers

A.Has a predefined schema

B.Consists of audio and video files

C.Uses JSON or XML format

D.Stored in relational databases

E.Organized in rows and columns

AnswersA, D, E

Schema is defined before data is stored.

Why this answer

Structured data has a predefined schema, meaning the data types, relationships, and constraints are defined before data is entered. This schema ensures consistency and enables efficient querying, which is why relational databases enforce a fixed schema through table definitions and constraints like primary keys and foreign keys.

Exam trap

The trap here is that candidates confuse semi-structured formats like JSON and XML with structured data, but structured data requires a rigid schema enforced by the database, not just a self-describing format.

107

MCQeasy

A company stores customer information in a SQL database table with columns: CustomerID, FirstName, LastName, Email, SignupDate. They also store product images as JPEG files in Azure Blob Storage. Which statement correctly describes the types of data involved?

A.Customer data is unstructured, product images are semi-structured.

B.Customer data is structured, product images are unstructured.

C.Both are structured.

D.Customer data is semi-structured, product images are unstructured.

AnswerB

Customer data is stored in a relational table with a fixed schema, making it structured. JPEG images have no inherent structure, making them unstructured.

Why this answer

Customer data stored in a SQL database table with defined columns (CustomerID, FirstName, LastName, Email, SignupDate) is structured because it adheres to a fixed schema with rows and columns. Product images stored as JPEG files in Azure Blob Storage are unstructured because they lack a predefined data model and are stored as binary large objects (BLOBs) without a schema. Option B correctly identifies this distinction.

Exam trap

The trap here is confusing 'unstructured' with 'semi-structured' — candidates often misclassify JPEG images as semi-structured because they have metadata (e.g., EXIF), but the data itself (pixel values) has no schema, making it unstructured, while semi-structured data like JSON has a self-describing structure.

How to eliminate wrong answers

Option A is wrong because customer data in a SQL table is structured, not unstructured, and product images are unstructured, not semi-structured. Option C is wrong because product images are unstructured, not structured; only the customer data is structured. Option D is wrong because customer data is structured, not semi-structured; semi-structured data (e.g., JSON, XML) has tags or markers but no rigid schema, whereas a SQL table has a fixed schema.

108

Multi-Selecteasy

Which TWO of the following Azure services are categorized as Platform as a Service (PaaS) for data storage?

Select 2 answers

A.Azure Cosmos DB

B.Azure Synapse Analytics dedicated SQL pool

C.Azure Data Lake Storage Gen2

D.Azure SQL Database

E.Azure Virtual Machines with SQL Server

AnswersA, D

Azure Cosmos DB is a fully managed NoSQL PaaS database.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database service that provides turnkey global distribution, elastic scaling, and multi-model support (document, key-value, graph, column-family). As a PaaS offering, it abstracts infrastructure management—such as hardware provisioning, patching, and replication—allowing developers to focus on data modeling and application logic.

Exam trap

The trap here is that candidates often confuse Azure Data Lake Storage Gen2 (a storage service for files and objects, not a managed database) with a managed database PaaS, or they mistakenly think Azure Synapse Analytics dedicated SQL pool is a primary data storage service rather than an analytics engine that typically queries data stored elsewhere.

109

Multi-Selectmedium

Which TWO Azure services can be used to perform real-time data ingestion and processing? (Choose two.)

Select 2 answers

A.Azure SQL Database

B.Azure Event Hubs

C.Azure Blob Storage

D.Azure Data Factory

E.Azure Stream Analytics

AnswersB, E

Ingests real-time data streams.

Why this answer

Azure Event Hubs is a fully managed, real-time data ingestion service that can ingest millions of events per second from any source, using AMQP, HTTPS, or Apache Kafka protocol. It is designed for high-throughput, low-latency event streaming, making it ideal for real-time data ingestion and processing pipelines.

Exam trap

The trap here is that candidates often confuse batch processing services like Azure Data Factory or storage services like Blob Storage with real-time ingestion, forgetting that real-time requires event-driven, low-latency ingestion and processing capabilities.

110

MCQhard

You are implementing a data pipeline that ingests millions of events per second from IoT devices. The pipeline must tolerate failures and guarantee exactly-once processing. Which Azure service should you use to ingest the events?

A.Azure IoT Hub

B.Azure Event Hubs

C.Azure Service Bus

D.Azure Queue Storage

AnswerB

Event Hubs can ingest millions of events per second and delivers events at least once; checkpointing combined with idempotent consumers yields effectively exactly-once processing.

Why this answer

Azure Event Hubs is the correct choice because it is a big data streaming platform and event ingestion service designed for high-throughput scenarios, capable of ingesting millions of events per second. Delivery is at least once, so consumers achieve effectively exactly-once processing by combining checkpointing and partition-based offset management with idempotent handling. Its built-in replication and availability zones provide fault tolerance.

Exam trap

The trap here is that candidates confuse Azure IoT Hub with Event Hubs because both handle IoT data, but IoT Hub is for device management and control, not for high-throughput event ingestion with exactly-once guarantees.

How to eliminate wrong answers

Option A is wrong because Azure IoT Hub is optimized for device management and bi-directional communication with IoT devices, not for high-throughput event ingestion at millions of events per second; it has lower throughput limits and is not designed for exactly-once processing at that scale. Option C is wrong because Azure Service Bus is a message broker for enterprise messaging with features like topics and queues, but it is not built for high-throughput event streaming and has lower throughput ceilings, making it unsuitable for millions of events per second. Option D is wrong because Azure Queue Storage is a simple message queue for decoupling application components with at-least-once delivery semantics and limited throughput, not supporting exactly-once processing or the high ingestion rates required.
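
How checkpointing and idempotent handling combine to give effectively exactly-once results can be sketched as follows. This is a simplified, single-partition simulation with hypothetical class and event names, not the Event Hubs SDK; a real consumer would persist the checkpoint and its state durably.

```python
class IdempotentConsumer:
    """Turns at-least-once delivery into effectively-once results."""

    def __init__(self):
        self.checkpoint = -1  # last offset known to be fully processed
        self.totals = {}      # device_id -> running total

    def handle(self, offset, device_id, value):
        if offset <= self.checkpoint:
            return False      # duplicate delivery: already applied, skip it
        self.totals[device_id] = self.totals.get(device_id, 0) + value
        self.checkpoint = offset  # record progress only after the update
        return True

# Hypothetical events: (offset, device id, value).
events = [(0, "sensor-a", 5), (1, "sensor-b", 7), (2, "sensor-a", 3)]

# Simulate a retry after a failure: offsets 1 and 2 are delivered twice.
delivered = events + events[1:]

consumer = IdempotentConsumer()
applied = [consumer.handle(*event) for event in delivered]

print(applied)
print(consumer.totals)
```

The redelivered events are recognized by their offsets and skipped, so the totals match what a single delivery would have produced.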

111

MCQmedium

Refer to the exhibit. A data engineer needs to query the orders.csv file using Azure Synapse Serverless SQL. What is the most efficient way to access this data?

A.Use PolyBase to create external table

B.Use OPENROWSET in Serverless SQL

C.Copy data to Azure SQL Database using ADF

D.Load data into a dedicated SQL pool

AnswerB

OPENROWSET can query files directly without loading.

Why this answer

Azure Synapse Serverless SQL is designed for on-demand querying of data stored in data lakes without provisioning dedicated resources or loading the data first. The OPENROWSET function with the BULK option allows direct querying of CSV files using T-SQL, making it the most efficient method for ad-hoc analysis of the orders.csv file without data movement or schema management.

Exam trap

The trap here is that candidates often confuse PolyBase (which is for dedicated SQL pools) with Serverless SQL's OPENROWSET, or assume that data must be moved to a database before querying, missing the serverless paradigm of query-in-place.

How to eliminate wrong answers

Option A is wrong because PolyBase is used to create external tables in dedicated SQL pools, not in Serverless SQL, and requires defining external data sources and file formats, adding unnecessary overhead for a simple query. Option C is wrong because copying data to Azure SQL Database using ADF involves data movement and additional costs, which is inefficient for a one-time or ad-hoc query. Option D is wrong because loading data into a dedicated SQL pool requires provisioning and managing a dedicated resource, which is overkill and costly for querying a single CSV file.

112

MCQeasy

A healthcare organization stores patient records in a relational database table with fixed columns for PatientID, Name, and DateOfBirth. Additionally, they store clinical notes as free-form text files for each patient visit. Which statement correctly classifies these data types?

A.Both patient records and clinical notes are examples of unstructured data.

B.Patient records are structured data, and clinical notes are unstructured data.

C.Both patient records and clinical notes are examples of structured data.

D.Patient records are unstructured data, and clinical notes are semi-structured data.

AnswerB

Patient records have defined columns (structured), while clinical notes are free-form text (unstructured).

Why this answer

Patient records stored in a relational database table with fixed columns (PatientID, Name, DateOfBirth) conform to a predefined schema, making them structured data. Clinical notes stored as free-form text files lack a fixed schema or organization, which classifies them as unstructured data. Option B correctly identifies this distinction.

Exam trap

The trap here is that candidates confuse 'free-form text' with semi-structured data (e.g., JSON or XML), but semi-structured data has tags or key-value pairs, whereas free-form text has no inherent structure at all.

How to eliminate wrong answers

Option A is wrong because patient records in a relational table with fixed columns are structured data, not unstructured. Option C is wrong because clinical notes as free-form text files have no predefined schema, so they are unstructured, not structured. Option D is wrong because patient records are structured (not unstructured), and clinical notes are unstructured (not semi-structured, as they lack tags or metadata that would make them semi-structured like JSON or XML).

Practice this question →

113

MCQeasy

A company processes sales transactions in real-time from a retail website. Each transaction is recorded as a row in a relational database. Additionally, the company stores weekly sales reports as PDF files. Which statement correctly describes these data types?

A.Transactions are unstructured, reports are semi-structured.

B.Transactions are structured, reports are unstructured.

C.Both are structured because they are files.

D.Both are unstructured because they are digital.

AnswerB

Correct. Transactions have a rigid schema (structured), and PDF files lack a predefined schema (unstructured).

Why this answer

Transactions are structured because they are stored as rows in a relational database, which imposes a fixed schema with defined columns and data types. Weekly sales reports as PDF files are unstructured because they lack a predefined data model and cannot be easily queried using SQL without additional processing. Option B correctly identifies this distinction.

Exam trap

The trap here is that candidates confuse 'file format' with 'data structure', assuming all files are structured, when in fact PDFs are unstructured binary files that lack the row/column schema of relational data.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: transactions are structured (not unstructured) and reports are unstructured (not semi-structured). Option C is wrong because not all files are structured; PDF files are binary blobs without a row/column schema, unlike relational database tables. Option D is wrong because being digital does not imply unstructured; structured data like relational tables is also digital but has a rigid schema.

Practice this question →

114

MCQmedium

You need to choose a data store for a mobile app that requires real-time synchronization of user preferences across devices. The data is small per user and key-value oriented. Which Azure service is most appropriate?

A.Azure Cosmos DB

B.Azure Cache for Redis

C.Azure Blob Storage

D.Azure SQL Database

AnswerA

Cosmos DB offers low latency, global distribution, and key-value API.

Why this answer

Azure Cosmos DB is the most appropriate choice because it provides global distribution, low-latency reads and writes, and automatic conflict resolution, which are essential for real-time synchronization of user preferences across devices. Its key-value API (e.g., Table API or Core SQL API with a simple partition key) efficiently handles small, per-user data with a key-value orientation, ensuring that changes made on one device are quickly reflected on others.

Exam trap

The trap here is that candidates often confuse Azure Cache for Redis as a primary data store for persistent, synchronized user preferences, overlooking its transient nature and lack of built-in conflict resolution for multi-device scenarios.

How to eliminate wrong answers

Option B (Azure Cache for Redis) is wrong because it is an in-memory cache designed for temporary, volatile data with limited persistence options; it does not provide built-in conflict resolution or durable, globally distributed synchronization for user preferences that must persist across sessions. Option C (Azure Blob Storage) is wrong because it is optimized for large, unstructured binary objects (e.g., images, videos) and lacks the low-latency, key-value access patterns and real-time sync capabilities needed for small, frequently updated user preferences. Option D (Azure SQL Database) is wrong because it is a relational database that requires a fixed schema and is not optimized for simple key-value workloads; its overhead and lack of native conflict resolution make it unsuitable for real-time synchronization of small, per-user key-value data.

Practice this question →

115

MCQmedium

In a banking application, a transaction transfers $100 from Account A to Account B. The system deducts $100 from Account A successfully, but due to a network error, the credit to Account B fails. The application rolls back the deduction from Account A, ensuring that neither account is affected. Which ACID property is being enforced?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures the transaction is all-or-nothing; if any part fails, the entire transaction is rolled back, as seen in this example.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this scenario, the deduction from Account A and the credit to Account B must both succeed or both fail entirely. When the credit to Account B fails, the system rolls back the deduction from Account A, preserving the all-or-nothing nature of the transaction.

This is the core behavior of atomicity in ACID-compliant database systems like Azure SQL Database or SQL Server.
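The rollback described above can be sketched with Python's built-in `sqlite3` module standing in for a production database. The table, the starting balances, and the `fail_on_credit` switch are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_on_credit=False):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_on_credit:
                # Simulated failure between the debit and the credit.
                raise ConnectionError("simulated network error before the credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ConnectionError:
        return False


ok = transfer(conn, "A", "B", 100, fail_on_credit=True)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'A': 500, 'B': 200}
```

The debit to Account A was executed, but because the credit never happened, the whole unit of work is undone and both balances are unchanged.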

Exam trap

The trap here is that candidates confuse the rollback action with consistency, because both involve maintaining a correct state, but atomicity specifically governs the all-or-nothing completion of the transaction itself, not the validity of the data rules.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, respecting all defined rules (e.g., constraints, triggers, cascades). While the rollback does maintain consistency, the specific action of rolling back a partial change is the hallmark of atomicity, not consistency. Option C (Isolation) is wrong because isolation controls how concurrent transactions are visible to each other (e.g., via locking or snapshot isolation levels), not the rollback of a failed transaction. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure; here, the transaction was not committed, so durability is not relevant.

Practice this question →

116

MCQeasy

A banking system processes a money transfer between two accounts. The system is designed so that after the transaction is committed, the results are permanently saved and survive any subsequent system failure, such as a power outage. Which ACID property ensures this behavior?

A.Durability

B.Atomicity

C.Consistency

D.Isolation

AnswerA

Correct. Durability guarantees that committed changes are saved permanently, surviving failures.

Why this answer

Durability ensures that once a transaction is committed, its changes are permanently stored and survive system failures, such as power outages or crashes. In this banking scenario, the money transfer results are written to non-volatile storage (e.g., disk) via a write-ahead log, guaranteeing that the committed state is recoverable even after a restart.
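A small sketch of the committed-versus-uncommitted distinction, using Python's `sqlite3` with a file on disk. Closing and reopening the connection stands in for a restart here; it is not a real power-failure test, and the table name is an assumption.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "bank.db")

# First "session": make a change and commit it.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE transfers (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO transfers (amount) VALUES (100)")
conn.commit()

# A second change that is never committed.
conn.execute("INSERT INTO transfers (amount) VALUES (999)")
conn.close()  # closing without commit discards the open transaction

# Second "session": reopen the file, as if after a restart.
conn = sqlite3.connect(db_path)
amounts = [row[0] for row in conn.execute("SELECT amount FROM transfers ORDER BY id")]
conn.close()
print(amounts)  # [100]
```

Only the committed transfer is still there after reopening, which is the guarantee durability describes.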

Exam trap

The trap here is that candidates often confuse durability with atomicity, thinking that 'surviving failures' means the transaction either completes fully or not at all, but atomicity handles the rollback of partial transactions, not the persistence of committed ones.

How to eliminate wrong answers

Option B (Atomicity) is wrong because atomicity ensures that a transaction is treated as an all-or-nothing unit, meaning either all operations complete or none do, but it does not guarantee that committed data survives failures. Option C (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving integrity constraints, but it does not address persistence after a commit. Option D (Isolation) is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not provide durability against system crashes.

Practice this question →

117

MCQeasy

A company stores customer names, addresses, and order history. They need to perform complex queries that join customer and order data. Which type of data store is most appropriate for this scenario?

A.Key-value store

B.Relational database

C.Document database

D.Graph database

AnswerB

Relational databases organize data into tables with defined schemas and support SQL queries including joins, making them ideal for this requirement.

Why this answer

A relational database (e.g., Azure SQL Database) is most appropriate because the scenario requires joining customer and order data via complex queries. Relational databases enforce a fixed schema with tables, primary keys, and foreign keys, enabling efficient JOIN operations using SQL. This structure ensures data integrity and supports ACID transactions, which are essential for accurate order history and customer records.
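To make the JOIN requirement concrete, here is a minimal sketch using Python's `sqlite3` as a stand-in relational engine. The table layouts and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada', '1 Main St'), (2, 'Grace', '2 Oak Ave');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join customers to their orders through the foreign key and aggregate.
rows = conn.execute("""
    SELECT c.name, COUNT(o.order_id) AS order_count, SUM(o.total) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('Ada', 2, 65.0), ('Grace', 1, 15.0)]
```

The foreign key on `orders.customer_id` is what lets a single query combine the two tables.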

Exam trap

The trap here is that candidates often choose a document database (Option C) because they associate 'complex queries' with JSON flexibility, but fail to recognize that 'joining' specifically requires relational database features like SQL JOINs and foreign keys, which document stores lack.

How to eliminate wrong answers

Option A is wrong because a key-value store (e.g., Azure Cosmos DB Table API) is optimized for simple lookups by a single key and does not support complex JOIN queries or relational integrity between entities. Option C is wrong because a document database (e.g., Azure Cosmos DB Core API) stores semi-structured JSON documents and, while it can embed related data, it lacks native JOIN capabilities and enforces no schema, making complex relational queries inefficient. Option D is wrong because a graph database (e.g., Azure Cosmos DB Gremlin API) is designed for traversing relationships between highly connected entities (e.g., social networks), not for tabular JOINs on structured customer and order data.

Practice this question →

118

MCQmedium

Refer to the exhibit. You are reviewing an ARM template for a new storage account. The storage account will store data that must be accessible from any Azure region and must be highly durable. Which change should you make to the template?

A.Set supportsHttpsTrafficOnly to false

B.Change the SKU name to Premium_LRS

C.Change the SKU name to Standard_GRS

D.Change the kind to BlobStorage

AnswerC

Geo-redundant storage replicates data to a secondary region, improving durability across regions.

Why this answer

Standard_GRS (geo-redundant storage) is the correct SKU because it replicates data synchronously three times within a single physical location in the primary region and then asynchronously to a secondary region hundreds of miles away, giving very high durability (16 nines over a given year). Of the options listed, it is the only one that keeps a copy of the data outside the primary region; note that reading directly from the secondary region requires the read-access variant (RA-GRS). LRS only replicates within a single datacenter, and Premium_LRS targets low-latency workloads rather than geo-redundancy.
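Since the exhibit is not reproduced here, the following is only a hypothetical storage account resource, built as a Python dictionary and printed as JSON, to show where the SKU name sits in an ARM template. The parameter name and `apiVersion` value are assumptions.

```python
import json

# Hypothetical resource; the exhibit's actual template is not shown here.
storage_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "apiVersion": "2023-01-01",
    "name": "[parameters('storageAccountName')]",
    "location": "[resourceGroup().location]",
    "kind": "StorageV2",
    "sku": {"name": "Standard_GRS"},  # replication is chosen by the SKU name
    "properties": {"supportsHttpsTrafficOnly": True},
}

template_fragment = json.dumps(storage_account, indent=2)
print(template_fragment)
```

Changing `kind` or `supportsHttpsTrafficOnly` leaves `sku.name` untouched, which is why those options do not affect replication.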

Exam trap

Microsoft often tests the misconception that changing the 'kind' (e.g., to BlobStorage) or disabling HTTPS affects durability or geo-accessibility, when in fact only the SKU name (replication strategy) controls these properties, and candidates confuse security settings with replication settings.

How to eliminate wrong answers

Option A is wrong because setting supportsHttpsTrafficOnly to false disables HTTPS enforcement, which is a security setting unrelated to durability or regional accessibility; it would expose data to insecure HTTP traffic. Option B is wrong because Premium_LRS uses SSD-based storage with local redundancy only, offering lower durability (11 nines vs. 16 nines for GRS) and no geo-replication, failing the 'accessible from any Azure region' requirement. Option D is wrong because changing the kind to BlobStorage restricts the account to blob-only storage (block blobs and append blobs), but the question does not specify blob-only data; moreover, the kind does not affect durability or geo-accessibility—that is determined by the SKU.

Practice this question →

119

MCQhard

A healthcare application stores patient medical history in a relational database. The system must ensure that after a transaction updates multiple records (e.g., diagnosis and medication), all changes are saved or none are saved. This property is best described as:

A.Atomicity

B.Consistency

C.Durability

D.Isolation

AnswerA

Atomicity ensures that a transaction is either fully completed or fully rolled back, matching the all-or-nothing requirement.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In the context of a relational database storing patient medical history, if a transaction updates both the diagnosis and medication records, atomicity guarantees that either both updates are committed or both are rolled back, preventing partial updates that could leave the data in an inconsistent state.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, mistakenly thinking that 'all-or-nothing' is about maintaining data rules, when in fact atomicity is specifically about the transaction's indivisibility at the write level.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, respecting all defined rules (e.g., constraints, triggers), but it does not directly enforce the all-or-nothing behavior of multiple record updates. Option C (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it does not control whether the transaction is fully applied or rolled back. Option D (Isolation) is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not mandate that all changes within a single transaction are saved or none are saved.

Practice this question →

120

Multi-Selecthard

Which THREE of the following are valid Azure data storage services? (Choose three.)

Select 3 answers

A.Azure Files

B.Azure Blob Storage

C.Azure Redis Cache

D.Azure Table Storage

E.Azure Service Bus

AnswersA, B, D

Yes, it's a fully managed file share.

Why this answer

Azure Files provides fully managed file shares in the cloud that can be accessed via the Server Message Block (SMB) protocol or the Network File System (NFS) protocol. It is a valid Azure data storage service because it stores data as files in a hierarchical structure, making it suitable for lift-and-shift scenarios for on-premises file servers.

Exam trap

The trap here is that candidates may confuse Azure Redis Cache and Azure Service Bus as data storage services because they store data temporarily, but the DP-900 exam defines 'data storage services' as those designed for persistent, structured or unstructured data storage, not transient messaging or caching.

Practice this question →

121

MCQeasy

A retail company stores product information in a relational database table with fixed columns: ProductID (integer), Name (string), Price (decimal). They also store customer reviews as JSON documents where each review may contain different fields such as rating, comment, and optional images. Additionally, they store product images as JPEG files in Azure Blob Storage. Which of the following correctly classifies these data types from most structured to least structured?

A.Structured (product info), Semi-structured (reviews), Unstructured (images)

B.Semi-structured (product info), Structured (reviews), Unstructured (images)

C.Unstructured (product info), Semi-structured (reviews), Structured (images)

D.Structured (product info), Unstructured (reviews), Semi-structured (images)

AnswerA

Product info in a relational table is structured. JSON reviews are semi-structured because they have a flexible schema. JPEG images are unstructured binary data.

Why this answer

Product info in a relational table with fixed columns (ProductID, Name, Price) is structured data. Customer reviews stored as JSON documents, which may have varying fields like rating, comment, and optional images, are semi-structured because they have a flexible schema. Product images stored as JPEG files in Azure Blob Storage are unstructured binary data.

This ordering from most to least structured matches option A.
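The three categories can be placed side by side in a short Python sketch. The sample product, review fields, and byte values are made up for illustration.

```python
import json
import sqlite3

# Structured: fixed columns with declared types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL)"
)
conn.execute("INSERT INTO products VALUES (1, 'Kettle', 29.99)")
product = conn.execute("SELECT ProductID, Name, Price FROM products").fetchone()

# Semi-structured: self-describing JSON whose fields vary per document.
reviews = [
    json.loads('{"rating": 5, "comment": "Boils fast"}'),
    json.loads('{"rating": 3, "images": ["kettle.jpg"]}'),
]
review_fields = [sorted(r.keys()) for r in reviews]

# Unstructured: opaque bytes; these are marker bytes a JPEG file typically starts with.
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0])

print(product)        # (1, 'Kettle', 29.99)
print(review_fields)  # [['comment', 'rating'], ['images', 'rating']]
print(len(image_bytes))  # 4
```

The table rejects nothing here, but every row must fit its three columns; the reviews carry their own field names; the image is just bytes with no fields to query.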

Exam trap

Microsoft often tests the distinction between semi-structured and unstructured data, where candidates mistakenly classify JSON as unstructured because it lacks a fixed schema, but JSON is semi-structured due to its inherent key-value structure and optional fields.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies product info as semi-structured (it has a fixed schema, making it structured) and reviews as structured (JSON with optional fields is semi-structured). Option C is wrong because it classifies product info as unstructured (it is structured) and images as structured (JPEG files are unstructured binary data). Option D is wrong because it classifies reviews as unstructured (JSON with a schema is semi-structured) and images as semi-structured (JPEG files have no schema, making them unstructured).

Practice this question →

122

MCQeasy

Refer to the exhibit. You are deploying an Azure Storage account. The JSON snippet represents a template parameter. What does the 'isHnsEnabled' property enable?

A.Blob versioning

B.Soft delete for blobs

C.Geo-redundant storage

D.Hierarchical namespace for the storage account

AnswerD

Enables Data Lake Storage Gen2 capabilities.

Why this answer

The 'isHnsEnabled' property enables the hierarchical namespace for the storage account, which is a core feature of Azure Data Lake Storage Gen2. When set to true, it allows the storage account to organize blobs into a directory hierarchy, enabling POSIX-like access control lists (ACLs) and file system semantics. This is essential for big data analytics workloads that require a file system structure rather than a flat blob storage model.
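The exhibit's JSON is not shown, so the fragment below is a hypothetical storage account resource, expressed as a Python dictionary and serialized to JSON, indicating where `isHnsEnabled` would typically appear. The account name parameter, SKU, and `apiVersion` are assumptions.

```python
import json

# Hypothetical resource definition for a Data Lake Storage Gen2 account.
adls_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "apiVersion": "2023-01-01",
    "name": "[parameters('storageAccountName')]",
    "location": "[resourceGroup().location]",
    "kind": "StorageV2",
    "sku": {"name": "Standard_LRS"},
    "properties": {
        "isHnsEnabled": True,  # turns on the hierarchical namespace
    },
}

adls_fragment = json.dumps(adls_account, indent=2)
print(adls_fragment)
```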

Exam trap

The trap here is that candidates often confuse 'isHnsEnabled' with blob-level features like versioning or soft delete, because all three are related to data management, but only the hierarchical namespace fundamentally changes the storage account's architecture to support file system semantics.

How to eliminate wrong answers

Option A is wrong because blob versioning is enabled via the 'isVersioningEnabled' property in the Blob service settings, not by 'isHnsEnabled'. Option B is wrong because soft delete for blobs is configured through the 'deleteRetentionPolicy' property in the Blob service, not through the hierarchical namespace flag. Option C is wrong because geo-redundant storage (GRS) is a replication option set via the 'sku.name' property (e.g., 'Standard_GRS'), not by enabling a hierarchical namespace.

Practice this question →

123

Multi-Selectmedium

Which TWO of the following are common characteristics of a NoSQL database?

Select 2 answers

A.Flexible schema

B.Normalized data storage

C.Strong ACID transaction support

D.Relational data model

E.Horizontal scaling

AnswersA, E

NoSQL databases allow schema flexibility, making them suitable for semi-structured or unstructured data.

Why this answer

Option A is correct because NoSQL databases, such as MongoDB or Cassandra, use a flexible schema that allows documents or records to have varying fields without requiring predefined table structures. This enables developers to iterate quickly and store semi-structured or unstructured data, such as JSON documents, without costly schema migrations.
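Both correct characteristics can be sketched together in Python: records with differing fields, spread across partitions by hashing a key. The hashing scheme below is a generic illustration and is not the algorithm used by any particular NoSQL product; `partition_for` is a hypothetical helper.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition using a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Flexible schema: records need not share the same set of fields.
records = [
    {"id": "user-1", "name": "Ada"},
    {"id": "user-2", "name": "Grace", "tags": ["admin"]},
    {"id": "user-3", "email": "lin@example.com"},
]

# Horizontal scaling: each record is routed to one of several partitions.
partitions = {i: [] for i in range(3)}
for record in records:
    partitions[partition_for(record["id"], 3)].append(record)

print({i: [r["id"] for r in items] for i, items in partitions.items()})
```

Adding capacity in this model means adding partitions (and the machines that host them) rather than making one server larger.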

Exam trap

The trap here is that candidates confuse 'flexible schema' with 'no schema at all' or mistakenly think NoSQL always supports strong ACID transactions, when in reality most NoSQL systems trade ACID for scalability and performance.

Practice this question →

124

MCQmedium

Your organization uses Azure SQL Database and needs to ensure that all customer data is encrypted at rest and in transit with minimal administrative overhead. Which solution should you recommend?

A.Use Microsoft Purview Information Protection to label and encrypt the data.

B.Enable Transparent Data Encryption (TDE) and enforce TLS 1.2 for connections.

C.Implement Dynamic Data Masking on the customer table.

D.Enable Always Encrypted for all sensitive columns and use client-side encryption.

AnswerB

TDE encrypts the database at rest automatically, and enforcing TLS ensures encryption in transit with minimal overhead.

Why this answer

Option B is correct because Transparent Data Encryption (TDE) encrypts Azure SQL Database data files at rest without requiring any application changes, and enforcing TLS 1.2 ensures all data in transit is encrypted using a strong, industry-standard protocol. This combination meets the requirement for encryption at rest and in transit with minimal administrative overhead, as TDE is managed by the platform and TLS enforcement is a simple server-level setting.

Exam trap

The trap here is that candidates often confuse Dynamic Data Masking (which only hides data in results) with encryption, or they overcomplicate the solution by choosing Always Encrypted, which requires client-side changes and key management, when the question explicitly asks for minimal administrative overhead.

How to eliminate wrong answers

Option A is wrong because Microsoft Purview Information Protection is a data classification and labeling service, not a native encryption mechanism for Azure SQL Database; it does not encrypt data at rest or in transit within the database engine. Option C is wrong because Dynamic Data Masking only obfuscates data in query results for unauthorized users, it does not encrypt data at rest or in transit. Option D is wrong because Always Encrypted requires client-side encryption and key management, which adds significant administrative overhead and application changes, contradicting the 'minimal administrative overhead' requirement.

Practice this question →

125

MCQeasy

A company stores employee records in a database. Each employee record contains an EmployeeID (unique), Name, Department, and HireDate. The EmployeeID is used to uniquely identify each employee. Which data concept does the EmployeeID represent?

A.Index

B.Foreign key

C.Primary key

D.Unique constraint

AnswerC

The primary key uniquely identifies each row and is a fundamental concept in relational databases.

Why this answer

The EmployeeID is used to uniquely identify each employee record, which is the defining characteristic of a primary key. In relational databases, a primary key enforces entity integrity by ensuring each row has a unique, non-null identifier. This aligns with the core data concept of a primary key as the unique identifier for a table.
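A brief sketch of primary-key enforcement, using Python's `sqlite3` as a stand-in relational engine; the sample employees are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Department TEXT NOT NULL,
        HireDate   TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO employees VALUES (1, 'Ada', 'Engineering', '2021-03-01')")

# A second row with the same EmployeeID violates the primary key.
try:
    conn.execute("INSERT INTO employees VALUES (1, 'Grace', 'Finance', '2022-07-15')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```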

Exam trap

The trap here is that candidates often confuse a unique constraint with a primary key because both enforce uniqueness, but the primary key uniquely identifies the row and cannot contain NULLs, while a unique constraint is a secondary uniqueness enforcement that can allow a single NULL value.

How to eliminate wrong answers

Option A is wrong because an index is a performance optimization structure that speeds up data retrieval, not a constraint that uniquely identifies rows. Option B is wrong because a foreign key is a column that references a primary key in another table to establish a relationship, not a unique identifier within its own table. Option D is wrong because a unique constraint ensures all values in a column are distinct but does not inherently designate the column as the table's primary identifier; a table can have multiple unique constraints but only one primary key.

Practice this question →

126

MCQhard

Refer to the exhibit. You are reviewing an ARM template for an Azure SQL Database deployment. The database must support a read-only workload that requires low latency. The current configuration uses General Purpose tier with 4 vCores. What is the most significant performance improvement you can make without changing the tier?

A.Increase maxSizeBytes to 1 TB

B.Set the edition to 'Serverless'

C.Enable read scale-out by adding 'readScale' property

D.Change requestedBackupStorageRedundancy to 'Local'

AnswerC

Read scale-out allows read-only queries to be routed to a secondary replica, improving performance for read workloads.

Why this answer

Enabling read scale-out by adding the 'readScale' property allows the database to use a read-only replica, offloading read workloads from the primary and providing low-latency reads. This is the most significant performance improvement within the General Purpose tier because it directly addresses the read-only workload requirement without changing the tier or incurring additional compute costs.

Exam trap

The trap here is that candidates often confuse scaling storage (maxSizeBytes) or changing backup redundancy with performance improvements, but the question specifically targets read latency for a read-only workload, which is directly addressed by read scale-out rather than storage or backup changes.

How to eliminate wrong answers

Option A is wrong because increasing maxSizeBytes to 1 TB only expands storage capacity, which does not improve read performance or latency for a read-only workload. Option B is wrong because setting the edition to 'Serverless' changes the tier (from provisioned to serverless compute), which violates the constraint of not changing the tier, and serverless is designed for intermittent workloads, not low-latency read performance. Option D is wrong because changing requestedBackupStorageRedundancy to 'Local' affects backup storage redundancy (e.g., LRS vs. GRS), not query performance or read latency.

Practice this question →

127

Multi-Selecteasy

Which TWO Azure data services are classified as NoSQL databases? (Choose two.)

Select 2 answers

A.Azure SQL Managed Instance

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure Database for PostgreSQL

E.Azure SQL Database

AnswersB, C

NoSQL database.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database service that supports multiple data models, including document, key-value, graph, and column-family, via APIs like SQL, MongoDB, Cassandra, Gremlin, and Table. It is explicitly designed as a NoSQL database with schema-agnostic, horizontally scalable storage.

Exam trap

The trap here is that candidates often confuse Azure Table Storage (a NoSQL key-value store) with Azure SQL Database or Managed Instance, assuming 'Table' implies a relational table, but it is actually a NoSQL service.

Practice this question →

128

MCQeasy

A retail company collects data from online transactions including order ID, customer details, product IDs, quantities, and timestamps. The data is stored in a relational database and used for order processing and inventory management. Which characteristic of this data makes it structured?

A.It is stored in rows and columns with a predefined schema.

B.It is stored as key-value pairs.

C.It is stored in JSON format with variable fields.

D.It is stored in unstructured text files.

AnswerA

Structured data is characterized by a rigid schema and tabular format, enabling relational database features like ACID transactions.

Why this answer

Option A is correct because structured data is defined by a fixed schema where each entity (e.g., orders) is stored in rows and columns with predefined data types (e.g., INT for order ID, VARCHAR for customer details). This relational model enforces consistency and enables efficient querying via SQL for order processing and inventory management.

Exam trap

The trap here is that candidates confuse 'structured' with any organized storage format (like JSON or key-value pairs), but the DP-900 exam specifically defines structured data as having a fixed schema with rows and columns in a relational database.

How to eliminate wrong answers

Option B is wrong because key-value pairs (e.g., in Redis or DynamoDB) are a NoSQL model that does not enforce a fixed schema or relational integrity, unlike the structured data described. Option C is wrong because JSON with variable fields is semi-structured data; it allows flexible schemas and nested structures, not the rigid rows-and-columns format of a relational database. Option D is wrong because unstructured text files (e.g., .txt or .log files) lack any predefined schema or organization, making them unsuitable for direct SQL-based order processing and inventory management.

Practice this question →

129

MCQmedium

You need to design a data storage solution for an e-commerce platform that requires ACID transactions for order processing and high availability across regions. Which Azure service meets these requirements?

A.Azure Database for MySQL with read replicas

B.Azure Synapse Analytics

C.Azure SQL Database with active geo-replication

D.Azure Cosmos DB with multiple write regions

AnswerC

Active geo-replication provides readable secondaries in other regions for HA.

Why this answer

Azure SQL Database with active geo-replication supports ACID transactions natively and maintains readable secondary databases in other regions that can take over if the primary region fails (failover is manual or application-initiated unless a failover group is configured). This meets the e-commerce platform's need for transactional consistency and regional resilience.

Exam trap

The trap here is that candidates often confuse 'high availability' with 'multi-region writes' and choose Cosmos DB, overlooking that ACID transactions require a relational database with strict consistency guarantees, not just eventual consistency or single-document atomicity.

How to eliminate wrong answers

Option A is wrong because Azure Database for MySQL with read replicas supports ACID transactions but read replicas are read-only and do not provide automatic failover for write workloads, thus failing high availability for order processing writes. Option B is wrong because Azure Synapse Analytics is a big data analytics service optimized for large-scale data warehousing and analytics, not for OLTP workloads requiring ACID transactions. Option D is wrong because Azure Cosmos DB with multiple write regions provides multi-region writes and high availability but does not support ACID transactions that span arbitrary documents and containers; its transactional guarantees are scoped to a single logical partition, and its default consistency level is session rather than strong, which falls short of the strict ACID guarantees needed for order processing.

Practice this question →

130

MCQeasy

A startup is building a mobile app that allows users to share short text updates. Each update includes a user ID, timestamp, and message text. The development team expects rapid growth and needs a storage solution that can scale horizontally, handle high write throughput, and provide low-latency reads globally. Which Azure data service is most appropriate?

A.Azure SQL Database with a single database.

B.Azure Cosmos DB with a multi-master configuration and partition on user ID.

C.Azure Blob Storage with append blobs.

D.Azure Table Storage with user ID as partition key and timestamp as row key.

AnswerB

Cosmos DB provides global distribution, multi-master writes, and single-digit millisecond latency, ideal for high-throughput NoSQL workloads.

Why this answer

Azure Cosmos DB with a multi-master configuration is the most appropriate choice because it provides global distribution with multiple write regions, enabling horizontal scaling and low-latency reads and writes worldwide. Partitioning on user ID spreads writes across many partitions (as long as activity is reasonably even across users) and keeps each user's updates together, which supports the app's high write throughput and efficient per-user queries.
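
To make the partitioning idea concrete, here is a minimal sketch of hash partitioning on a user ID. It is not Cosmos DB's actual hashing scheme; the `partition_for` helper, the partition count, and the `user-N` IDs are all hypothetical and exist only to show how a stable hash spreads keys across partitions.

```python
import hashlib
from collections import Counter


def partition_for(user_id: str, partition_count: int) -> int:
    """Map a partition key value to a partition index using a stable hash.

    Illustrative only: real services use their own hashing internally.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical workload: 10,000 users, one update each, 8 partitions.
PARTITIONS = 8
counts = Counter(partition_for(f"user-{i}", PARTITIONS) for i in range(10_000))

# Print how many updates landed on each partition.
for partition in sorted(counts):
    print(partition, counts[partition])
```

The same user ID always maps to the same partition, so a per-user query touches one partition. A single very active user would still concentrate load on one partition, which is why the choice of key matters.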

Exam trap

The trap here is that candidates often confuse Azure Table Storage's horizontal scaling with the global, multi-master capabilities of Cosmos DB, assuming Table Storage can provide low-latency writes worldwide when it lacks native multi-region write support and has higher latency for cross-region scenarios.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database with a single database is a relational database that scales vertically (up to a maximum size and DTU/vCore limit) and cannot natively handle global low-latency reads or multi-region writes without complex sharding or read replicas. Option C is wrong because Azure Blob Storage with append blobs is designed for unstructured data like logs or files, not for low-latency, high-throughput transactional updates with querying by user ID and timestamp. Option D is wrong because Azure Table Storage, while scalable, does not support multi-master writes or global low-latency reads natively; it is a key-value store with limited query capabilities, and its geo-redundant copies are replicated asynchronously to a secondary region that accepts no writes, so it cannot meet the app's need for low-latency writes globally.

131

MCQeasy

A company collects data from multiple sources: IoT sensor streams, social media feeds, and CSV files from legacy systems. They want to store all this data in its original format without any transformation, so that data scientists can later apply machine learning models or run ad-hoc queries. Which data storage pattern best describes this approach?

A.Data warehouse

B.Data lake

C.Relational database

D.Data mart

AnswerB

A data lake stores data in its native format without transformation, supporting diverse data types and ad-hoc exploration by data scientists.

Why this answer

A data lake is designed to store vast amounts of raw data in its native format (structured, semi-structured, or unstructured) without requiring upfront schema or transformation. This aligns perfectly with the scenario of ingesting IoT streams, social media feeds, and CSV files as-is, enabling data scientists to later apply machine learning or run ad-hoc queries directly against the raw data.

Exam trap

The trap here is that candidates often confuse a data lake with a data warehouse, assuming both are for analytics, but the key differentiator is that a data lake stores raw, unprocessed data while a data warehouse requires transformation and schema-on-write.

How to eliminate wrong answers

Option A is wrong because a data warehouse stores data that has been transformed, cleaned, and structured into a schema optimized for analytics and reporting, not raw, unprocessed data. Option C is wrong because a relational database enforces a rigid schema and ACID transactions, making it unsuitable for storing diverse raw formats like IoT streams and social media feeds without transformation. Option D is wrong because a data mart is a subset of a data warehouse focused on a specific business domain, requiring pre-processed and aggregated data, not raw, unaltered source data.

132

MCQmedium

A social media company stores user profiles as JSON documents where each profile may have different attributes (e.g., some profiles include 'education' while others include 'work history'). The company also stores user-generated posts in a relational database table with fixed columns (PostID, UserID, Content, Timestamp). Which of the following best describes the data types used for user profiles and user posts?

A.User profiles are structured data; posts are unstructured data.

B.User profiles are semi-structured data; posts are structured data.

C.Both are semi-structured data.

D.User profiles are unstructured data; posts are structured data.

AnswerB

Profiles are semi-structured (JSON with optional fields), posts are structured (fixed relational schema).

Why this answer

User profiles are stored as JSON documents with varying attributes, which is a classic example of semi-structured data because it has some organizational properties (key-value pairs) but does not enforce a fixed schema. User posts are stored in a relational database table with fixed columns (PostID, UserID, Content, Timestamp), which is structured data because it adheres to a rigid schema with defined data types and relationships.
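
The contrast can be sketched in a few lines of Python. This is an illustration under assumptions: the sample profiles, the SQLite table standing in for the relational database, and the extra `Mood` column are all made up for the example.

```python
import json
import sqlite3

# Semi-structured: each JSON profile carries its own keys, and they may differ.
profiles_json = """[
  {"user_id": 1, "name": "Ana", "education": ["BSc"]},
  {"user_id": 2, "name": "Ben", "work_history": ["Contoso"]}
]"""
profiles = json.loads(profiles_json)
profile_keys = [sorted(p.keys()) for p in profiles]

# Structured: the table fixes the columns for every row up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "PostID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL, "
    "Content TEXT NOT NULL, Timestamp TEXT NOT NULL)"
)
conn.execute("INSERT INTO posts VALUES (1, 1, 'Hello', '2024-01-01T09:00:00')")
post_columns = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]

# A row with an attribute outside the schema is rejected by the database.
try:
    conn.execute(
        "INSERT INTO posts (PostID, UserID, Content, Timestamp, Mood) "
        "VALUES (2, 2, 'Hi', '2024-01-01T10:00:00', 'happy')"
    )
    extra_column_accepted = True
except sqlite3.OperationalError:
    extra_column_accepted = False

print(profile_keys)
print(post_columns, extra_column_accepted)
```

Both profiles parse without complaint even though their keys differ, while the table refuses a row that does not fit its declared columns.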

Exam trap

The trap here is that candidates often confuse 'semi-structured' with 'unstructured' because JSON looks like free-form text, but JSON actually has a defined key-value structure, making it semi-structured, not unstructured.

How to eliminate wrong answers

Option A is wrong because user profiles are not structured data; they lack a fixed schema and can have varying attributes, which is the definition of semi-structured data. Option C is wrong because user posts are stored in a relational table with fixed columns, making them structured data, not semi-structured. Option D is wrong because user profiles are not unstructured data like free-form text or images; they are JSON documents with key-value pairs, which have a logical structure even if the schema is flexible.

133

MCQeasy

A retail company stores product inventory data in a fixed-schema table with columns for ProductID, ProductName, QuantityInStock, and ReorderLevel. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Streaming data

AnswerA

Correct - The data has a fixed schema organized in rows and columns, which is the definition of structured data.

Why this answer

This data is classified as structured data because it conforms to a fixed schema with clearly defined columns (ProductID, ProductName, QuantityInStock, ReorderLevel) and data types, stored in a relational table. Structured data is highly organized, easily queryable via SQL, and follows a rigid schema, which matches the description of the inventory table.

Exam trap

The trap here is that candidates may confuse structured data with semi-structured data because both involve some organization, but the key distinction is that structured data requires a rigid, predefined schema (like a fixed-schema table), while semi-structured data allows schema flexibility (e.g., JSON with optional fields).

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML, or CSV with flexible schemas) does not enforce a fixed schema or strict column definitions, whereas this table has a predefined schema. Option C is wrong because unstructured data (e.g., text files, images, or videos) lacks any predefined data model or organization, unlike the tabular inventory data. Option D is wrong because streaming data refers to continuous, real-time data flows (e.g., IoT sensor data or clickstreams), not static data stored in a table.

134

MCQhard

Match each ACID property with its correct description. Properties: - Atomicity - Consistency - Isolation - Durability Descriptions: 1. Transactions appear to execute one after the other, even if they are concurrent. 2. Once a transaction is committed, the changes are permanently saved and survive failures. 3. A transaction either completes fully or is rolled back entirely. 4. A transaction brings the database from one valid state to another, obeying all rules. Which option correctly maps each property to its description?

A.Atomicity → 3, Consistency → 4, Isolation → 1, Durability → 2

B.Atomicity → 4, Consistency → 3, Isolation → 2, Durability → 1

C.Atomicity → 2, Consistency → 1, Isolation → 3, Durability → 4

D.Atomicity → 1, Consistency → 2, Isolation → 4, Durability → 3

AnswerA

This is the correct mapping of ACID properties to their standard definitions.

Why this answer

Option A is correct because it accurately maps each ACID property to its definition. Atomicity ensures a transaction is all-or-nothing (3), Consistency guarantees the database moves from one valid state to another (4), Isolation makes concurrent transactions appear serial (1), and Durability ensures committed changes persist even after a failure (2). These are the standard definitions used in Azure SQL Database and other relational database systems.

Exam trap

The trap here is that candidates confuse the definitions of Consistency and Atomicity, often thinking Consistency means 'all-or-nothing' rather than 'valid state transitions,' or they swap Isolation with Durability by misremembering the 'permanent save' concept.

How to eliminate wrong answers

Option B is wrong because it swaps Atomicity and Consistency: Atomicity is about all-or-nothing execution, not bringing the database to a valid state (which is Consistency). Option C is wrong because it assigns Durability to 'transactions appear to execute one after the other' (Isolation) and Atomicity to 'changes are permanently saved' (Durability), completely inverting the properties. Option D is wrong because it maps Atomicity to 'transactions appear to execute one after the other' (Isolation) and Isolation to 'brings the database from one valid state to another' (Consistency), mixing up the core definitions.

135

MCQmedium

You design a data solution for an e-commerce platform. Transactional data must be stored with ACID compliance for order processing, while clickstream data from the website will be used for analytics. Which combination of Azure data services best meets these needs?

A.Azure Cosmos DB for transactions; Azure SQL Database for analytics

B.Azure SQL Database for transactions; Azure Synapse Analytics for analytics

C.Azure Blob Storage for transactions; Azure Data Lake Storage for analytics

D.Azure Database for MySQL for transactions; Azure Analysis Services for analytics

AnswerB

Azure SQL Database is ACID-compliant; Synapse Analytics is for big data analytics.

Why this answer

Azure SQL Database provides full ACID compliance for transactional workloads like order processing, ensuring data integrity. Azure Synapse Analytics is optimized for large-scale analytics on clickstream data, offering massively parallel processing (MPP) and integration with data lakes. This combination separates OLTP and OLAP workloads efficiently.

Exam trap

The trap here is that candidates often assume Azure Cosmos DB (Option A) is ACID-compliant because it supports multi-document transactions within a single partition, but it does not guarantee full ACID across partitions, making it unsuitable for strict order processing.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that offers configurable consistency levels (not full ACID across all operations) and is not ideal for strict ACID-compliant order processing; Azure SQL Database is transactional but not optimized for large-scale analytics like Synapse. Option C is wrong because Azure Blob Storage is an object store with no multi-object ACID transaction support and is unsuitable for order processing; Azure Data Lake Storage is for raw data storage, not interactive analytics. Option D is wrong because Azure Database for MySQL provides ACID compliance, but Azure Analysis Services is a semantic modeling layer (not a scalable analytics engine) and lacks the MPP capabilities needed for clickstream analytics.

136

MCQeasy

A healthcare company stores patient records in a relational database with fixed columns (PatientID, Name, DOB, BloodType). Medical images such as X-rays are stored as DICOM files. Clinical notes are stored as free-text documents. Which of the following correctly classifies these data types from most structured to least structured?

A.Patient records (structured), DICOM files (structured), Clinical notes (unstructured)

B.Patient records (structured), DICOM files (semi-structured), Clinical notes (unstructured)

C.Patient records (semi-structured), DICOM files (unstructured), Clinical notes (structured)

D.Patient records (unstructured), DICOM files (semi-structured), Clinical notes (structured)

AnswerB

Correct. Patient records are structured because they reside in a relational table with fixed columns. DICOM files have a standard format with metadata tags, making them semi-structured. Clinical notes as free text are unstructured.

Why this answer

Patient records in a fixed-column relational database are structured data because they conform to a rigid schema with defined data types. DICOM files are semi-structured because they contain a structured header with metadata tags (e.g., patient ID, study date) alongside an unstructured binary image payload. Clinical notes as free-text documents are unstructured because they lack a predefined schema or organization, making them difficult to query without natural language processing.

Exam trap

The trap here is that candidates often misclassify DICOM files as fully structured due to their standardized header, overlooking the unstructured binary image payload that makes them semi-structured.

How to eliminate wrong answers

Option A is wrong because DICOM files are not fully structured; they have a structured header but also contain unstructured binary image data, making them semi-structured. Option C is wrong because patient records in a relational database are structured, not semi-structured, and clinical notes are unstructured, not structured. Option D is wrong because patient records are structured, not unstructured, and clinical notes are unstructured, not structured; only the DICOM classification is right, and the two ends of the ordering are reversed.

137

MCQmedium

An e-commerce company runs a data pipeline that reads all orders from the previous hour, aggregates total sales per product category, and writes the results to a reporting database. The pipeline executes at the start of every hour. Which type of data processing workload does this pipeline represent?

A.Batch processing

B.Stream processing

C.Transactional processing

D.Interactive processing

AnswerA

The pipeline processes a batch of data (hourly orders) on a schedule, which is batch processing.

Why this answer

This pipeline reads all orders from the previous hour, aggregates total sales per product category, and writes results to a reporting database at the start of every hour. This is a classic batch processing workload because data is collected over a fixed time window (one hour) and processed as a single, scheduled job, not continuously. Batch processing is ideal for non-real-time, high-volume data transformations like hourly sales aggregation.
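
A minimal sketch of such an hourly batch job follows. The order records, field names, and the `aggregate_previous_hour` function are hypothetical; a real pipeline would read from the order store and write to the reporting database rather than use in-memory lists.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_previous_hour(orders, run_time):
    """Sum order amounts per category for the full hour before run_time."""
    window_end = run_time.replace(minute=0, second=0, microsecond=0)
    window_start = window_end - timedelta(hours=1)
    totals = defaultdict(float)
    for order in orders:
        # Only orders inside the closed-open window [start, end) are included.
        if window_start <= order["placed_at"] < window_end:
            totals[order["category"]] += order["amount"]
    return dict(totals)


# Hypothetical sample data; the last order belongs to the next batch.
orders = [
    {"placed_at": datetime(2024, 1, 1, 9, 5), "category": "books", "amount": 20.0},
    {"placed_at": datetime(2024, 1, 1, 9, 40), "category": "books", "amount": 5.0},
    {"placed_at": datetime(2024, 1, 1, 9, 59), "category": "toys", "amount": 12.5},
    {"placed_at": datetime(2024, 1, 1, 10, 1), "category": "toys", "amount": 99.0},
]

# The job runs at 10:00 and processes the 09:00-10:00 window in one pass.
hourly_totals = aggregate_previous_hour(orders, datetime(2024, 1, 1, 10, 0))
print(hourly_totals)
```

The defining trait is that nothing is processed until the window has closed: the 10:01 order waits for the 11:00 run.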

Exam trap

The trap here is that candidates confuse scheduled batch processing with stream processing because both can handle time-windowed aggregations, but batch processes data in discrete, scheduled chunks while stream processes data continuously as it arrives.

How to eliminate wrong answers

Option B is wrong because stream processing handles data in real-time or near-real-time as it arrives, not on a fixed hourly schedule. Option C is wrong because transactional processing (OLTP) focuses on individual, atomic transactions (e.g., placing an order) and does not involve aggregating data over a time window. Option D is wrong because interactive processing involves user-driven queries or operations that return results immediately, not scheduled batch jobs.

138

MCQeasy

A company wants to run complex analytics queries across petabytes of data stored in Azure Data Lake Storage. They need a serverless option that supports T-SQL. Which Azure service should they use?

A.Azure SQL Database serverless

B.Azure Analysis Services

C.Azure Databricks

D.Azure Synapse Serverless SQL pool

AnswerD

Serverless SQL pool provides T-SQL interface over data in Data Lake Storage, with pay-per-query pricing.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it provides a serverless, on-demand query service that allows you to run T-SQL queries directly against data stored in Azure Data Lake Storage (ADLS). It supports complex analytics over petabytes of data without provisioning any infrastructure, and it uses T-SQL as the query language, meeting all the stated requirements.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'Azure SQL Database serverless' (Option A) because of the name, but fail to recognize that Azure SQL Database serverless is a transactional database, not a data lake query engine, and does not support querying external storage like ADLS with T-SQL.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database serverless is a serverless compute tier for a relational database, but it is designed for transactional workloads and does not natively query data stored in Azure Data Lake Storage; it requires data to be loaded into the database first. Option B is wrong because Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data modeling and semantic layers, but it does not support direct T-SQL queries against ADLS; it uses DAX or MDX and requires data to be imported or queried via a gateway. Option C is wrong because Azure Databricks is an Apache Spark-based analytics platform that supports SQL queries via Spark SQL, but it does not use T-SQL; it uses Spark SQL syntax and requires a cluster to be running, even if auto-terminating, making it not a true serverless T-SQL option.

139

Multi-Selecteasy

A car manufacturing company has two data processing systems: one system processes real-time sensor data from assembly lines to immediately detect equipment failures, and another system processes historical production records to generate monthly efficiency reports. Which two types of data processing workloads best describe these systems?

Select 2 answers

A.Stream processing and batch processing

B.OLTP and OLAP

C.Online processing and offline processing

D.Transactional processing and analytical processing

AnswersA, D

Correct. Real-time sensor analysis is stream processing; historical reports are batch processing.

Why this answer

Stream processing handles real-time sensor data to detect equipment failures immediately, as it processes data continuously with low latency. Batch processing is ideal for historical production records to generate monthly efficiency reports, as it processes large volumes of data at scheduled intervals. These two workloads directly match the definitions of stream and batch processing in Azure data services like Azure Stream Analytics and Azure Synapse Analytics.

Exam trap

Microsoft often tests the distinction between stream/batch and OLTP/OLAP by making candidates confuse real-time transaction processing (OLTP) with real-time stream processing, but OLTP is for individual record updates, not continuous sensor data streams.

140

MCQmedium

You are designing a data pipeline that ingests sales transactions from an on-premises SQL Server database into Azure Synapse Analytics for reporting. The data must be processed incrementally every hour with minimal latency. Which Azure service should you use to orchestrate the pipeline?

A.Azure Logic Apps

B.Azure Databricks

C.Azure Functions

D.Azure Data Factory

AnswerD

Azure Data Factory is purpose-built for ETL and data orchestration, supporting incremental loads from on-premises.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data orchestration service designed specifically for building complex, schedule-driven pipelines. It natively supports incremental data loading from on-premises SQL Server via self-hosted integration runtime, and can trigger pipelines on an hourly schedule with minimal latency, making it ideal for this scenario.

Exam trap

The trap here is that candidates confuse orchestration services with compute or processing services, assuming Azure Databricks or Azure Functions can handle scheduling and data movement, when in fact Azure Data Factory is the dedicated PaaS orchestrator for such pipelines.

How to eliminate wrong answers

Option A is wrong because Azure Logic Apps is a workflow automation service for integrating apps and services, not designed for heavy data movement or complex ETL orchestration; it lacks native support for self-hosted integration runtime and incremental data loading from on-premises databases. Option B is wrong because Azure Databricks is an Apache Spark-based analytics platform for big data processing and machine learning, not a pipeline orchestration service; while it can process data, it requires additional tooling for scheduling and orchestration. Option C is wrong because Azure Functions is a serverless compute service for running event-driven code, not a data pipeline orchestrator; although a timer trigger can run code on a schedule, it has no built-in data movement connectors for on-premises SQL Server and no pipeline-level orchestration or monitoring for complex data movement.

141

MCQeasy

A manufacturing company collects sensor data from equipment on the factory floor. The data is generated continuously and must be processed immediately to detect anomalies and trigger alerts. Which type of data processing workload best describes this scenario?

A.Batch processing

B.Stream processing

C.Transactional processing

D.Analytical processing

AnswerB

Stream processing handles data in real time as it arrives, making it suitable for scenarios requiring immediate alerts and actions.

Why this answer

B is correct because the scenario requires continuous data ingestion and immediate processing to detect anomalies and trigger alerts, which is the defining characteristic of stream processing. Technologies like Azure Stream Analytics or Apache Kafka are designed to handle unbounded data streams with low-latency processing, unlike batch processing which operates on static datasets at scheduled intervals.
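
As a rough illustration of processing each event as it arrives, the sketch below uses a Python generator in place of a real stream. The sensor names, the temperature field, and the fixed threshold are assumptions for the example; a production system would use a streaming service rather than an in-process loop.

```python
from typing import Iterable, Iterator


def detect_anomalies(readings: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Yield an alert for each reading above the threshold, one event at a time."""
    for reading in readings:
        if reading["temperature"] > threshold:
            yield {"sensor": reading["sensor"], "temperature": reading["temperature"]}


def sensor_feed() -> Iterator[dict]:
    """Hypothetical stand-in for an unbounded stream of sensor events."""
    yield {"sensor": "press-1", "temperature": 70.2}
    yield {"sensor": "press-2", "temperature": 95.8}
    yield {"sensor": "press-1", "temperature": 71.0}
    yield {"sensor": "press-3", "temperature": 101.3}


alerts = []
for alert in detect_anomalies(sensor_feed(), threshold=90.0):
    # Each alert is raised as soon as its reading is seen, not at end of batch.
    alerts.append(alert)
    print("ALERT", alert)
```

Unlike a batch job, the loop never waits for a complete dataset: the alert for `press-2` is emitted before the later readings exist.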

Exam trap

The trap here is that candidates confuse 'stream processing' with 'batch processing' because both can involve large volumes of data, but the key differentiator is the requirement for immediate, continuous processing versus scheduled, deferred processing.

How to eliminate wrong answers

Option A is wrong because batch processing processes data in large, discrete chunks at scheduled times, which cannot meet the 'immediately' requirement for real-time anomaly detection. Option C is wrong because transactional processing focuses on ACID-compliant operations for individual transactions (e.g., order entry), not continuous sensor data streams. Option D is wrong because analytical processing typically involves historical data aggregation and reporting (e.g., OLAP cubes), not real-time event-driven alerting.

142

MCQmedium

A database system must ensure that when a transfer of funds between two accounts is processed, if the system crashes after debiting the first account but before crediting the second, the database automatically undoes the debit. This property is best described as:

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures that all operations in a transaction complete or none do; a crash triggers an automatic rollback, undoing the partial debit.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. If the system crashes after debiting one account but before crediting the other, the database's transaction log records the partial changes, and during recovery the database engine (SQL Server, for example, uses ARIES-style recovery) automatically rolls back the uncommitted transaction, undoing the debit to maintain atomicity.
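
The all-or-nothing behaviour can be sketched with Python's built-in `sqlite3` module. This is an illustration, not crash recovery itself: a raised exception stands in for the failure, and the `accounts` table, the `transfer` helper, and the `fail_midway` flag are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move funds in one transaction; any failure rolls back the whole transfer."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source)
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated failure after debit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
            )
    except RuntimeError:
        return False
    return True


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


failed = transfer(conn, 1, 2, 30, fail_midway=True)
after_failure = balances(conn)  # the debit has been undone
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
print(failed, after_failure, succeeded, after_success)
```

After the failed attempt the balances are unchanged, so no state with a debit but no credit is ever visible once the transaction ends.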

Exam trap

The trap here is that candidates confuse atomicity with consistency, thinking that maintaining a correct total balance (consistency) is what undoes the debit, but atomicity is the property that specifically handles the rollback of incomplete transactions after a crash.

How to eliminate wrong answers

Option B is wrong because consistency ensures that a transaction brings the database from one valid state to another, enforcing integrity constraints (e.g., total balance remains constant), but it does not inherently handle crash recovery or undo partial changes. Option C is wrong because isolation controls how concurrent transactions interact (e.g., via locking or snapshot isolation), preventing dirty reads or lost updates, but it does not address crash recovery or rollback of incomplete transactions. Option D is wrong because durability guarantees that once a transaction is committed, its changes persist even after a crash (e.g., via write-ahead logging), but it does not undo uncommitted changes; durability applies only to committed transactions.

143

MCQeasy

A marketing company collects data from social media feeds including text posts, images, and videos. The data arrives in various formats with no fixed structure or schema. This type of data is best described as:

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerC

Unstructured data has no predefined schema and includes free-form text, images, videos, etc. Social media feeds typically contain this type of data.

Why this answer

Unstructured data lacks a predefined data model or schema, making it ideal for storing text posts, images, and videos that arrive in varied formats. Unlike structured or semi-structured data, unstructured data cannot be easily organized into rows and columns or parsed with tags, which is why option C is correct for this scenario.

Exam trap

The trap here is that candidates confuse semi-structured data (e.g., JSON with tags) with unstructured data, but the key differentiator is the complete absence of any schema or metadata markers in the described social media feeds.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with rows and columns (e.g., a SQL table), which does not apply to free-form text, images, or videos. Option B is wrong because semi-structured data has some organizational properties like tags or key-value pairs (e.g., JSON, XML), but the data described has no fixed structure or schema at all. Option D is wrong because relational data is a subset of structured data stored in tables with defined relationships, which is not the case for heterogeneous social media feeds.

144

MCQeasy

A company collects data from three sources: Source A: Customer records from a relational database with fixed columns (CustomerID, Name, Address). Source B: Social media posts in JSON format with varying fields (e.g., some posts have 'likes', others have 'shares'). Source C: Handwritten notes saved as scanned images in TIFF format. Which statement correctly categorizes the data by structure?

A.Source A: Structured, Source B: Semi-structured, Source C: Unstructured

B.Source A: Structured, Source B: Unstructured, Source C: Semi-structured

C.Source A: Semi-structured, Source B: Structured, Source C: Unstructured

D.Source A: Semi-structured, Source B: Unstructured, Source C: Structured

AnswerA

This correctly identifies structured data (customer records with fixed columns), semi-structured data (JSON with variable fields), and unstructured data (images with no inherent structure).

Why this answer

Source A's relational database with fixed columns (CustomerID, Name, Address) enforces a strict schema, making it structured data. Source B's JSON format allows varying fields like 'likes' or 'shares' per record, which is the hallmark of semi-structured data (self-describing, schema-on-read). Source C's scanned TIFF images are binary blobs with no inherent internal structure for querying, classifying them as unstructured data.

This matches the standard DP-900 categorization: structured (fixed schema), semi-structured (flexible schema), unstructured (no schema).

Exam trap

Microsoft often tests the misconception that 'JSON is unstructured because it looks like text' or that 'scanned images are semi-structured because they have metadata,' but the DP-900 definition hinges on whether the data has a fixed schema (structured), flexible schema (semi-structured), or no schema (unstructured).

How to eliminate wrong answers

Option B is wrong because it misclassifies Source B (JSON with varying fields) as unstructured, but JSON is the classic example of semi-structured data due to its key-value pairs and flexible schema. Option C is wrong because it labels Source A (relational database with fixed columns) as semi-structured, but relational databases enforce a rigid schema (rows and columns) that defines structured data. Option D is wrong because it calls Source A semi-structured (should be structured) and Source C structured (should be unstructured), completely reversing the correct categorization.

145

MCQeasy

Your team is migrating a data warehouse to Azure Synapse Analytics. You need to ensure that the data model supports both historical trend analysis and current-day reporting with minimal storage redundancy. Which table design pattern should you use?

A.Single flat table containing all attributes

B.Wide table with repeated customer attributes per order

C.Highly normalized design with many tables

D.Star schema with dimension and fact tables

AnswerD

Star schema is the standard for data warehousing, enabling efficient queries and reducing storage redundancy.

Why this answer

The star schema is the correct choice because it separates business processes into fact tables (for measures like sales quantities) and dimension tables (for descriptive attributes like customer or date). This design directly supports both historical trend analysis (by joining facts with the date dimension) and current-day reporting (by filtering on the latest date) while limiting storage redundancy, because descriptive attributes are stored once in a dimension table instead of being repeated on every fact row. Azure Synapse Analytics is optimized for star schemas, leveraging columnstore indexes and distributed tables to accelerate such queries.
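
A small star schema can be sketched with SQLite standing in for the warehouse. The table names, columns, and sample rows below are hypothetical, and SQLite has none of Synapse's distribution or columnstore features; the point is only the fact and dimension shape and the two query patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes once.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
-- The fact table holds measures plus keys pointing at the dimensions.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Ana', 'West'), (2, 'Ben', 'East');
INSERT INTO dim_date VALUES (20230101, '2023-01-01', 2023), (20240101, '2024-01-01', 2024);
INSERT INTO fact_sales VALUES
    (1, 20230101, 2, 40.0), (1, 20240101, 1, 25.0), (2, 20240101, 3, 60.0);
""")

# Historical trend: join facts to the date dimension and group by year.
sales_by_year = conn.execute("""
    SELECT d.year, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year ORDER BY d.year
""").fetchall()

# Current-day style report: filter facts to the latest date, group by region.
latest_day_by_region = conn.execute("""
    SELECT c.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer c ON c.customer_key = f.customer_key
    WHERE f.date_key = (SELECT MAX(date_key) FROM fact_sales)
    GROUP BY c.region ORDER BY c.region
""").fetchall()

print(sales_by_year)
print(latest_day_by_region)
```

Each query needs only one join from the fact table to a dimension, and a customer's name and region are stored once no matter how many sales rows reference them.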

Exam trap

The trap here is that candidates often confuse 'normalization' (Option C) with data warehouse best practices, not realizing that star schemas intentionally denormalize dimensions to optimize for read-heavy analytical queries, while highly normalized designs are better suited for OLTP systems, not Azure Synapse Analytics.

How to eliminate wrong answers

Option A is wrong because a single flat table containing all attributes would cause massive data duplication and poor query performance, as every row repeats customer and product details for each order, leading to high storage costs and slow analytical scans. Option B is wrong because a wide table with repeated customer attributes per order introduces significant redundancy and update anomalies, making it inefficient for both historical analysis and current reporting, and it contradicts the goal of minimal storage redundancy. Option C is wrong because a highly normalized design with many tables (e.g., 3NF) requires complex joins across numerous tables, which degrades query performance in a data warehouse context and is not optimized for the analytical workloads that Synapse is designed for.


146

MCQeasy

Which classification of data describes information that has a fixed schema and is organized into rows and columns, such as data found in a relational database table?

A.Unstructured data

B.Semi-structured data

C.Structured data

D.Transformed data

AnswerC

Structured data conforms to a fixed schema, typically in tables with rows and columns. This is the standard format for relational database systems.

Why this answer

Structured data is defined by a fixed schema, where each data element adheres to a predefined data type and relationship, organized into rows and columns. This is the fundamental model of a relational database table, such as those in Azure SQL Database or SQL Server, where constraints like primary keys and foreign keys enforce the schema.
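To make the schema distinction concrete, here is a minimal sketch that contrasts a relational table, which rejects a row that violates its declared schema, with JSON documents, which may each carry a different set of keys. The table, columns, and sample values are hypothetical, and SQLite is used only as a convenient stand-in for a relational engine.

```python
import json
import sqlite3

# Structured: the schema is declared up front and enforced on every insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "email TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'ada@example.com')")


def insert_without_email(connection: sqlite3.Connection) -> bool:
    """Return True if a row missing the required email column is accepted."""
    try:
        connection.execute(
            "INSERT INTO customer (customer_id, name) VALUES (2, 'Grace')"
        )
        return True
    except sqlite3.IntegrityError:
        # The NOT NULL constraint rejects the incomplete row.
        return False


accepted = insert_without_email(conn)

# Semi-structured: each document describes itself, so keys can vary per record.
documents = [
    json.loads('{"id": 1, "name": "Ada"}'),
    json.loads('{"id": 2, "name": "Grace", "phones": ["555-0100"]}'),
]
key_sets = [sorted(doc) for doc in documents]

print(accepted)
print(key_sets)
```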

Exam trap

Microsoft often tests the distinction between structured and semi-structured data. Candidates mistakenly classify JSON or XML as structured because these formats have some organization, but the key differentiator is the rigid, predefined schema enforced by the database, not just the presence of tags or keys.

How to eliminate wrong answers

Option A is wrong because unstructured data has no predefined schema or organization, such as text files, images, or videos, and cannot be stored directly in rows and columns. Option B is wrong because semi-structured data has some organizational properties (like tags or key-value pairs) but does not enforce a rigid schema; examples include JSON or XML files, which are not strictly row-and-column. Option D is wrong because 'transformed data' is not a classification of data by structure; it refers to data that has been processed or altered from its original form, such as through ETL operations, and does not describe a schema-based organization.


147

MCQhard

A healthcare organization needs to store patient records that must be immutable and cannot be modified or deleted for 7 years due to regulatory compliance. Which Azure feature should they use?

A.Microsoft Purview

B.Azure Policy

C.Azure Blob Storage immutable storage

D.Microsoft Defender for Cloud

AnswerC

Provides WORM (write once, read many) capability for compliance.

Why this answer

Azure Blob Storage immutable storage is correct because it provides WORM (Write Once, Read Many) capabilities that prevent data from being modified or deleted for a specified retention period. This directly meets the regulatory requirement for patient records to remain immutable for 7 years: the policy is enforced at the storage level and, once a time-based retention policy is locked, it cannot be overridden by any user, including administrators.
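The WORM behavior can be pictured with a toy in-memory model. This is only a conceptual sketch: the `WormStore` class and its methods are hypothetical, they are not part of any Azure SDK, and the 7-year period is approximated as 7 × 365 days.

```python
from datetime import datetime, timedelta, timezone


class WormStore:
    """Toy model of write-once, read-many retention (not the Azure SDK)."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._blobs = {}  # name -> (data, retain_until)

    def write(self, name: str, data: bytes, now: datetime) -> None:
        # Write once: an existing blob can never be overwritten.
        if name in self._blobs:
            raise PermissionError(f"{name} already exists and cannot be modified")
        self._blobs[name] = (data, now + self.retention)

    def read(self, name: str) -> bytes:
        # Read many: reads are always allowed.
        return self._blobs[name][0]

    def delete(self, name: str, now: datetime) -> None:
        # Deletion is refused until the retention period has elapsed.
        _, retain_until = self._blobs[name]
        if now < retain_until:
            raise PermissionError(f"{name} is under retention until {retain_until}")
        del self._blobs[name]


store = WormStore(retention=timedelta(days=7 * 365))
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
store.write("patient-001.json", b'{"status": "admitted"}', now=created)

try:
    store.delete("patient-001.json", now=created + timedelta(days=365))
except PermissionError as error:
    print("blocked:", error)
```

In this model the retention check lives inside the store itself, so no caller can bypass it, which mirrors the idea of enforcement at the storage level rather than through access rules.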

Exam trap

The trap here is that candidates confuse Azure Policy (which enforces resource-level compliance rules) with data-level immutability, but Azure Policy cannot prevent data modification within a blob—only Azure Blob Storage immutable storage provides that guarantee.

How to eliminate wrong answers

Option A is wrong because Microsoft Purview is a data governance and catalog service for discovering and classifying data, not a storage-level immutability enforcement mechanism. Option B is wrong because Azure Policy enforces organizational rules and compliance across Azure resources (e.g., restricting resource locations), but it cannot prevent modification or deletion of data within a storage blob. Option D is wrong because Microsoft Defender for Cloud is a security posture management and threat protection service, not a data immutability feature.


148

MCQmedium

A retail company wants to run real-time analytics on streaming clickstream data from their website. Which Azure service should they use to ingest and process the data?

A.Azure Analysis Services

B.Azure Data Lake Storage

C.Azure SQL Database

D.Azure Stream Analytics

AnswerD

Real-time stream processing service that can ingest and analyze streaming data.

Why this answer

Azure Stream Analytics is a real-time analytics and event-processing engine designed to ingest, process, and analyze high-velocity streaming data, such as clickstream data from a website. It can directly consume data from Azure Event Hubs or IoT Hub and output results to sinks like Power BI, Azure SQL Database, or Azure Data Lake Storage, making it the correct choice for real-time analytics on streaming data.
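One common kind of computation in stream processing is counting events in fixed, non-overlapping time windows (often called tumbling windows). The sketch below shows the idea in plain Python; it is not Stream Analytics query syntax, and the event shape (a timestamp in seconds plus a page path) and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count clicks per page in fixed, non-overlapping time windows.

    Each event is (timestamp_in_seconds, page). The result is keyed by
    (window_start, page).
    """
    counts = Counter()
    for timestamp, page in events:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, page)] += 1
    return dict(counts)


clicks = [(1, "/home"), (4, "/cart"), (9, "/home"), (12, "/home"), (19, "/cart")]
for (window_start, page), count in sorted(tumbling_window_counts(clicks, 10).items()):
    print(window_start, page, count)
```

A real streaming engine applies this kind of aggregation continuously as events arrive and emits each window's result when the window closes; the batch-style loop here only shows how events map to windows.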

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure SQL Database or Azure Data Lake Storage, mistakenly thinking a traditional database or storage service can handle real-time streaming ingestion and processing, when in fact they lack the necessary low-latency, event-driven architecture.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an OLAP engine for creating semantic models and running ad-hoc analytical queries on pre-processed data, not for ingesting or processing real-time streaming data. Option B is wrong because Azure Data Lake Storage is a scalable and secure data lake for storing large volumes of raw or processed data, but it does not provide real-time stream ingestion or processing capabilities. Option C is wrong because Azure SQL Database is a relational database service for storing and querying structured data, not designed for high-throughput, low-latency stream ingestion or real-time event processing.


149

MCQeasy

A healthcare organization stores patient medical records in a relational database with columns such as PatientID, Name, and DateOfBirth. They also store radiology images as DICOM files in Azure Blob Storage. Which statement correctly classifies these data types?

A.Both patient records and radiology images are structured data.

B.Patient records are semi-structured, and radiology images are unstructured.

C.Patient records are structured, and radiology images are unstructured.

D.Patient records are unstructured, and radiology images are semi-structured.

AnswerC

Patient records have fixed columns and data types (structured), while DICOM files are binary objects that do not fit a row-and-column schema (unstructured).

Why this answer

Patient records in a relational database with fixed columns like PatientID, Name, and DateOfBirth adhere to a predefined schema, making them structured data. Radiology images stored as DICOM files in Azure Blob Storage are binary objects with no tabular format; although each file carries some embedded metadata, the image content cannot be queried as rows and columns, so they are classified as unstructured data. Option C correctly matches these classifications.

Exam trap

The trap here is treating 'semi-structured' as a safe middle-ground answer. Candidates sometimes misclassify relational database records as semi-structured simply because they have multiple columns, but the deciding factor is the rigid schema enforced by the relational model.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies radiology images as structured data; DICOM files in blob storage have no fixed schema or relational structure. Option B is wrong because patient records in a relational database are structured, not semi-structured; semi-structured data (e.g., JSON or XML) has tags or markers but no rigid schema. Option D is wrong because patient records are structured, not unstructured, and radiology images are unstructured, not semi-structured.


150

MCQmedium

A data analyst needs to combine sales data from Azure SQL Database and inventory data from Azure Cosmos DB into a single Power BI report. Which Power BI feature should they use?

A.Power Query

B.Power BI Desktop

C.DAX formulas

D.Dataflows

AnswerA

Power Query enables connecting to and merging data from multiple sources.

Why this answer

Power Query is the correct feature because it is the data connection and transformation engine in Power BI that allows you to connect to multiple data sources—such as Azure SQL Database and Azure Cosmos DB—and combine them into a single dataset for reporting. It provides a graphical interface to merge, append, and shape data from disparate sources before loading it into the data model, which is exactly what the analyst needs to do.
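Conceptually, the merge step described above is a join on a shared key. The sketch below mimics a left outer merge in plain Python; it is not Power Query's M language, and the row shapes, the `product_id` key, and the `merge_on_key` helper are hypothetical.

```python
from typing import Dict, List


def merge_on_key(
    left_rows: List[Dict], right_rows: List[Dict], key: str
) -> List[Dict]:
    """Left outer merge: keep every left row, add matching right-hand columns."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key], {})
        # Left-hand values win if both sides share a column name.
        merged.append({**match, **row})
    return merged


# Hypothetical rows standing in for the two sources in the question.
sales = [
    {"product_id": 1, "units_sold": 5},
    {"product_id": 2, "units_sold": 3},
]
inventory = [{"product_id": 1, "in_stock": 40}]

for row in merge_on_key(sales, inventory, "product_id"):
    print(row)
```

Rows without a match on the right-hand side are kept with only their own columns, which is the usual behavior of a left outer join.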

Exam trap

The trap here is that candidates often confuse the tool (Power BI Desktop) with the feature (Power Query), or they mistakenly think DAX is used for data integration, when in fact DAX operates only on data already in the model, not on source connections.

How to eliminate wrong answers

Option B (Power BI Desktop) is wrong because Power BI Desktop is the application that hosts Power Query, not the specific feature for combining data from multiple sources; it is the environment where the report is built, not the tool for data integration. Option C (DAX formulas) is wrong because DAX (Data Analysis Expressions) is used for creating calculated columns, measures, and custom aggregations within the data model after data is loaded, not for connecting to or combining data from different source systems. Option D (Dataflows) is wrong because dataflows are a cloud-based data preparation feature for reusing prepared data across workspaces and reports; the question asks how to combine two sources within a single report, and Power Query is the feature that does that directly.
