Knowledge + Practice

Azure Data Fundamentals Core Data Concepts Questions

75 of 267 questions · Page 1/4 · Core Data Concepts topic · Answers revealed

1

MCQ · easy

A retail company stores three types of customer data: (1) a table with columns for CustomerID, Name, and Email; (2) product reviews as JSON documents with varying fields such as rating and comment; (3) product demonstration videos stored in MP4 format. Which of the following correctly classifies these data types in order from first to third?

A. Structured, unstructured, semi-structured

B. Semi-structured, structured, unstructured

C. Structured, semi-structured, unstructured

D. Unstructured, semi-structured, structured

Answer: C

Correct. The table is structured (fixed schema), JSON documents are semi-structured (flexible schema), and videos are unstructured (no schema).

Why this answer

Option C is correct because the customer table with fixed columns (CustomerID, Name, Email) is structured data, product reviews as JSON documents with varying fields are semi-structured data (schema-on-read, flexible fields), and MP4 video files are unstructured data (no schema, binary format). This ordering matches the standard classification in Azure Data Fundamentals: structured → semi-structured → unstructured.

Exam trap

The trap here is that candidates often confuse semi-structured data (like JSON) with unstructured data, or assume all non-tabular data is unstructured, when in fact JSON's key-value pairs with varying fields make it semi-structured.

How to eliminate wrong answers

Option A is wrong because it incorrectly places unstructured before semi-structured; product reviews as JSON are semi-structured, not unstructured, and MP4 videos are unstructured, not semi-structured. Option B is wrong because it starts with semi-structured for the customer table, which is clearly structured with a fixed schema; it also misorders the remaining types. Option D is wrong because it begins with unstructured for the customer table, which is structured, and then places semi-structured before structured, reversing the correct order.

2

MCQ · hard

Refer to the exhibit. You create an Azure Policy to deny virtual machines that are not using approved SKUs. After assigning the policy to a subscription, a user tries to create a VM with SKU 'Standard_DS2_v2' and the creation is allowed. What is the most likely reason?

A. The policy is not assigned to the resource group where the VM is created.

B. The field type 'Microsoft.Compute/virtualMachines' is incorrect.

C. The alias 'Microsoft.Compute/virtualMachines/sku.name' is incorrect; the correct alias is 'Microsoft.Compute/virtualMachines/sku'.

D. The policy rule does not specify a deny effect.

Answer: C

The keyed answer rests on the alias in the exhibit's policy rule not resolving to the intended property, so the condition never matches and the deny effect never fires. The VM size itself is stored in the ARM property 'hardwareProfile.vmSize'. Treat the specific alias claim with caution, though: Azure's built-in 'Allowed virtual machine size SKUs' policy uses the alias 'Microsoft.Compute/virtualMachines/sku.name', so confirm any alias against the published alias list.

Why this answer

Option C is keyed as correct on the grounds that the alias in the policy rule is wrong. A policy condition must reference a published alias that maps to the ARM resource property it is meant to test. If the condition does not evaluate the intended property, the deny effect never triggers and any SKU can be created.

Exam trap

The trap here is that candidates assume the alias must include the property's child field (like 'name') because they think of the SKU as an object with sub-properties, but Azure Policy aliases for simple string values do not include child fields.

How to eliminate wrong answers

Option A is wrong because the policy is assigned to the subscription, which covers all resource groups within that subscription by default; the VM creation is allowed due to a policy rule issue, not a scope issue. Option B is wrong because 'Microsoft.Compute/virtualMachines' is the correct resource type for Azure virtual machines; the field type is not the cause of the policy not enforcing. Option D is wrong because the question states the policy is created to 'deny' VMs, and the exhibit would show a deny effect; if the effect were missing, the policy would not deny, but the core problem is the incorrect alias preventing the condition from matching.

3

MCQ · easy

A logistics company collects data from fleet sensors. Each sensor sends a JSON message containing the vehicle ID, timestamp, and a variable set of measurements such as engine temperature, tire pressure, and fuel level. The structure of the JSON message differs between sensor types and sometimes includes optional fields. How should this data be classified?

A. Structured data

B. Semi-structured data

C. Unstructured data

D. Relational data

Answer: B

Correct. JSON data with optional fields and varying structure is a classic example of semi-structured data, which uses tags or keys to organize data without a rigid schema.

Why this answer

The data is semi-structured because each message is self-describing JSON with named keys (such as vehicle ID and timestamp) but no fixed schema: some fields are optional, and the set of measurements differs by sensor type. This flexibility is a hallmark of semi-structured data, which does not require a rigid tabular schema like structured data but still contains tags or markers to separate data elements.

Exam trap

The trap here is that candidates see 'JSON' and assume it is structured data because JSON has keys and values, but they miss that the variable and optional fields make it semi-structured, not strictly structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with consistent fields and data types, typically stored in relational tables, whereas the JSON messages here have variable and optional fields. Option C is wrong because unstructured data has no predefined structure or schema, such as raw video or text files, but these JSON messages have a defined format with key-value pairs. Option D is wrong because relational data is a subset of structured data that is organized into tables with rows and columns and enforces relationships via foreign keys, which does not apply to the flexible JSON messages.
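
The distinction can be illustrated with a minimal Python sketch. It assumes two hypothetical sensor messages; the field names (`vehicleId`, `engineTempC`, `tirePressureKpa`, `fuelLevelPct`) and the sample values are invented for illustration and do not come from the question. Every message parses as JSON and shares the same two keys, yet the remaining keys vary, which is what keeps the data out of a fixed table schema.

```python
import json

# Hypothetical sensor messages: two shared keys plus a varying set of measurements.
raw_messages = [
    '{"vehicleId": "V1", "timestamp": "2024-01-01T08:00:00Z", "engineTempC": 91.5}',
    '{"vehicleId": "V2", "timestamp": "2024-01-01T08:00:05Z", "tirePressureKpa": 220, "fuelLevelPct": 64}',
]

REQUIRED_KEYS = {"vehicleId", "timestamp"}


def split_message(raw):
    """Separate the keys every message shares from the optional measurements."""
    message = json.loads(raw)
    common = {key: message[key] for key in REQUIRED_KEYS}
    optional = {key: value for key, value in message.items() if key not in REQUIRED_KEYS}
    return common, optional


parsed = [split_message(raw) for raw in raw_messages]
for common, optional in parsed:
    # The optional part differs from message to message.
    print(common["vehicleId"], sorted(optional))
```

A relational table would need a column for every possible measurement (mostly NULL), whereas a document store can keep each message as it arrives.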

4

MCQ · hard

A company uses Azure Data Lake Storage Gen2 for a data lake. They implement a folder structure with access control lists (ACLs). A new data scientist needs to read data from a specific folder but not write to it. Which ACL permission should be assigned?

A. Execute

B. Modify

C. Write

D. Read

Answer: A

Execute on a folder allows traversing; combined with Read on files allows reading data.

Why this answer

Execute (X) permission on a folder in Azure Data Lake Storage Gen2 is required to traverse the folder and reach its contents. Reading a file requires Execute on every folder in the path plus Read on the file itself; listing a folder's contents additionally requires Read on that folder. Without Execute, a user cannot reach files inside the folder, even if Read permission is granted. Since the data scientist only needs to read data (not write), assigning Execute on the folder and Read on the files allows traversal and read access without write capability.

Exam trap

The trap here is that candidates often assume Read permission on a folder is sufficient to read its contents, but without Execute permission, the folder cannot be traversed, making the data inaccessible.

How to eliminate wrong answers

Option B (Modify) is wrong because Modify is not one of the POSIX-style ACL permissions in Azure Data Lake Storage Gen2, which offers only Read, Write, and Execute; a modify-style permission would in any case imply write access, violating the requirement to prevent writes. Option C (Write) is wrong because Write permission allows creating and modifying files in the folder, which is explicitly not allowed. Option D (Read) is wrong because Read on a folder alone does not grant the ability to traverse the folder hierarchy; without Execute, the data scientist cannot reach the files within the folder, making Read ineffective for reading data.
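
The traversal rule can be sketched as a small Python check. This is a toy model under stated assumptions, not the service's actual evaluation logic: it covers a single user, ignores owners, groups, masks, and RBAC roles, and the function name `can_read_file` and the sample paths are hypothetical.

```python
def can_read_file(path, acl):
    """Return True if every ancestor folder grants 'x' and the file grants 'r'.

    `acl` maps each path to a permission string such as 'r-x'.
    Paths missing from `acl` are treated as granting nothing.
    """
    parts = path.strip("/").split("/")
    # Ancestors of /raw/sales/2024.csv are /, /raw and /raw/sales.
    folders = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    if any("x" not in acl.get(folder, "---") for folder in folders):
        return False
    return "r" in acl.get(path, "---")


# Execute on each folder plus Read on the file: the read succeeds.
acl_ok = {"/": "--x", "/raw": "--x", "/raw/sales": "--x", "/raw/sales/2024.csv": "r--"}

# Read (but no Execute) on the last folder: the file cannot be reached.
acl_no_traverse = dict(acl_ok, **{"/raw/sales": "r--"})

print(can_read_file("/raw/sales/2024.csv", acl_ok))
print(can_read_file("/raw/sales/2024.csv", acl_no_traverse))
```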

5

MCQ · easy

A company stores customer information in a SQL database with fixed columns (CustomerID, Name, Email). They also store scanned PDF contracts and product images in a file storage system. Which statement correctly describes the types of data mentioned?

A. Both the customer information and the files are structured data.

B. The customer information is semi-structured, and the files are unstructured.

C. The customer information is structured, and the files are unstructured.

D. Both the customer information and the files are unstructured.

Answer: C

Correct. Customer information in a SQL table with a fixed schema is structured data. PDFs and images lack a predefined schema, making them unstructured.

Why this answer

Customer information stored in fixed columns (CustomerID, Name, Email) follows a strict schema with defined data types and relationships, making it structured data. Scanned PDF contracts and product images are binary files with no inherent schema or organization, fitting the definition of unstructured data. Option C correctly pairs these classifications.

Exam trap

The trap here is that candidates confuse 'semi-structured' (e.g., JSON with flexible fields) with structured data (fixed schema), or assume all digital files are structured because they have metadata, ignoring the lack of a predefined schema in the content itself.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies the files as structured data; scanned PDFs and images are binary blobs without a predefined schema. Option B is wrong because it mislabels the customer information as semi-structured; fixed-column SQL tables with strict schemas are structured, not semi-structured (which would use flexible formats like JSON or XML). Option D is wrong because it incorrectly classifies the customer information as unstructured; the fixed-column SQL database enforces a rigid schema, making it structured data.

6

MCQ · hard

Your organization uses Azure Data Lake Storage Gen2 as a data lake. You need to enforce data retention policies automatically, such as deleting files older than 90 days. Which Azure feature should you use?

A. Azure Policy

B. Azure Blob Storage lifecycle management

C. Azure Data Factory

D. Azure RBAC

Answer: B

Lifecycle management policies automate deletion or tiering of blobs based on age.

Why this answer

Azure Blob Storage lifecycle management allows you to define rules that automatically delete or tier blobs based on age. Since Azure Data Lake Storage Gen2 is built on top of Azure Blob Storage, you can use lifecycle management policies to delete files older than 90 days by setting a 'Delete blob' action with a 'daysAfterModificationGreaterThan' filter of 90.

Exam trap

The trap here is that candidates may confuse Azure Policy (which enforces rules on resource configurations) with data lifecycle management (which manages data within storage), or think Azure Data Factory is needed for scheduled deletion, when Azure Blob Storage lifecycle management is the native, policy-driven solution.

How to eliminate wrong answers

Option A is wrong because Azure Policy is used to enforce organizational standards and compliance by evaluating resource configurations (e.g., requiring encryption), not to manage data retention or automate deletion of files based on age. Option C is wrong because Azure Data Factory is an ETL and data orchestration service that can move or transform data, but it is not designed for automated, policy-based lifecycle management like deleting old files; you would need custom pipelines and triggers to mimic this, which is less efficient and not the intended use. Option D is wrong because Azure RBAC controls access permissions to resources (who can read/write/delete), not automated data retention or deletion based on time.
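
As a rough illustration, a 90-day deletion rule might look like the policy document built below. This is a sketch that follows the lifecycle management policy layout as generally documented (a `rules` list, each with `definition.filters` and `definition.actions`); the rule name is hypothetical, and the exact schema should be checked against the current Azure Storage documentation before use.

```python
import json

# Sketch of a lifecycle management policy with a single delete rule.
lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "delete-after-90-days",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                # Apply the rule to block blobs only.
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {
                        # Delete once the blob has gone 90 days without modification.
                        "delete": {"daysAfterModificationGreaterThan": 90}
                    }
                },
            },
        }
    ]
}

policy_json = json.dumps(lifecycle_policy, indent=2)
print(policy_json)
```

The resulting JSON is what would be supplied to the storage account's lifecycle management settings; no scheduler or pipeline is involved.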

7

MCQ · easy

A retail company captures real-time clickstream data from its website. They need to store this data for immediate analysis using KQL. Which Azure service should they use?

A. Azure Stream Analytics

B. Azure Cosmos DB

C. Azure Data Explorer

D. Azure SQL Database

Answer: C

Azure Data Explorer is designed for real-time analytics on streaming data with KQL support.

Why this answer

Azure Data Explorer (ADX) is optimized for interactive analytics on large volumes of streaming and high-velocity data, supporting Kusto Query Language (KQL) for real-time queries. It ingests clickstream data with low latency and provides immediate analysis capabilities, making it the correct choice for this scenario.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics (a processing service) with Azure Data Explorer (a storage and query service), but the question specifically requires storing data for immediate KQL analysis, which Stream Analytics cannot do natively.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing engine that outputs to sinks like Azure Data Explorer or Power BI, but it does not natively support KQL for querying stored data. Option B is wrong because Azure Cosmos DB is a NoSQL database designed for transactional workloads with low-latency reads/writes, not for ad-hoc analytical queries using KQL. Option D is wrong because Azure SQL Database is a relational database optimized for OLTP and structured queries with T-SQL, not for high-velocity streaming data analysis with KQL.

8

MCQ · easy

A retail company processes historical sales data in a nightly batch job that loads aggregated reports into a data warehouse. Additionally, the company analyzes live customer interactions from their website to provide real-time product recommendations. Which pair of terms correctly describes these two data processing approaches?

A. OLTP and OLAP

B. Batch processing and streaming processing

C. Structured data and unstructured data

D. Relational and NoSQL

Answer: B

Batch processing handles data in fixed, scheduled intervals (nightly reports). Streaming processing handles data continuously as it arrives (real-time recommendations). This pair correctly describes the two approaches.

Why this answer

The nightly batch job that loads aggregated reports into a data warehouse is a classic example of batch processing, where data is processed in large, scheduled chunks. The real-time analysis of live customer interactions for product recommendations is streaming processing, which handles data continuously as it arrives. Option B correctly pairs these two distinct processing paradigms.

Exam trap

The trap here is that candidates confuse OLTP/OLAP (which describe transactional vs. analytical workloads) with processing methods (batch vs. streaming), leading them to incorrectly select Option A.

How to eliminate wrong answers

Option A is wrong because OLTP (Online Transaction Processing) is designed for high-volume, low-latency transactional operations (e.g., order entry), not for nightly batch reporting, and OLAP (Online Analytical Processing) is a storage/query architecture for analytics, not a processing approach. Option C is wrong because structured data (e.g., tables) and unstructured data (e.g., text) describe data formats, not processing methods like batch or streaming. Option D is wrong because relational and NoSQL refer to database types (schema-based vs. flexible schema), not to how data is processed over time.
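
The contrast between the two approaches can be sketched in a few lines of Python. The function names and sample amounts are hypothetical; the point is only that the batch function sees the whole dataset at once, while the streaming function emits an updated result after every event.

```python
def batch_total(sales):
    """Batch: the full dataset is available and is processed in one scheduled pass."""
    return sum(sales)


def streaming_totals(events):
    """Streaming: yield an updated running total as each event arrives."""
    running = 0.0
    for amount in events:
        running += amount
        yield running


nightly_sales = [20.0, 35.5, 10.0]  # hypothetical sample values

batch_result = batch_total(nightly_sales)
stream_results = list(streaming_totals(iter(nightly_sales)))

print(batch_result)
print(stream_results)
```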

9

MCQ · easy

A social media application allows users to post updates and like posts. After a user clicks the like button, the like count must update immediately and be exactly the same for all users viewing the post. Which data consistency model best fits this requirement?

A. Eventual consistency

B. Strong consistency

C. Session consistency

D. Bounded staleness consistency

Answer: B

Strong consistency guarantees that after a write, all reads return the most recent write. This is required for the like count to be immediately accurate for all viewers.

Why this answer

Strong consistency ensures that after a write operation (like clicking the like button) completes, any subsequent read operation returns the most recent write. This guarantees that all users viewing the post see the exact same, up-to-date like count immediately. This is required for the social media scenario where the like count must be identical for all viewers without any delay.

Exam trap

Microsoft often tests the misconception that 'eventual consistency' is acceptable for real-time updates, but the key differentiator here is the requirement for immediate and identical visibility for all users, which only strong consistency satisfies.

How to eliminate wrong answers

Option A is wrong because eventual consistency allows replicas to temporarily diverge, meaning some users might see an outdated like count for a period of time, which violates the requirement for immediate and identical updates. Option C is wrong because session consistency only guarantees monotonic reads and writes within a single user session; it does not ensure that all users across different sessions see the same updated count immediately. Option D is wrong because bounded staleness consistency permits a configurable time window or version lag before updates are visible to all readers, which would not meet the requirement for an instant, identical view for all users.
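
The difference between strong and eventual consistency can be illustrated with a toy simulation. This sketch is not how any real database replicates data; the class `ReplicatedCounter`, its methods, and the region names are all hypothetical.

```python
class ReplicatedCounter:
    """Toy like-counter copied across regions, for illustration only."""

    def __init__(self, regions, strong):
        self.replicas = {region: 0 for region in regions}
        self.strong = strong
        self.pending = []  # replication messages not yet applied

    def like(self, region):
        """Record a like at one region and propagate it to the others."""
        new_count = self.replicas[region] + 1
        self.replicas[region] = new_count
        for other in self.replicas:
            if other == region:
                continue
            if self.strong:
                # Strong: every replica is updated before the write returns.
                self.replicas[other] = new_count
            else:
                # Eventual: the update is queued and applied later.
                self.pending.append((other, new_count))

    def replicate(self):
        """Apply queued updates, as background replication would."""
        for region, count in self.pending:
            self.replicas[region] = max(self.replicas[region], count)
        self.pending.clear()

    def read(self, region):
        return self.replicas[region]


strong = ReplicatedCounter(["eu", "us"], strong=True)
strong.like("eu")
print(strong.read("us"))  # already reflects the like

eventual = ReplicatedCounter(["eu", "us"], strong=False)
eventual.like("eu")
print(eventual.read("us"))  # still stale until replication runs
eventual.replicate()
print(eventual.read("us"))
```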

10

MCQ · easy

A marketing team needs to analyze customer sentiment from social media posts in real time. The solution must ingest a stream of tweets, perform sentiment analysis using a pre-built AI model, and store the results in a dashboard for immediate visualization. The team has limited coding experience and prefers a low-code/no-code approach. Which combination of Azure services should you recommend?

A. Azure Event Hubs, Azure Functions, and Azure Cosmos DB

B. Azure IoT Hub, Azure Data Factory, and Power BI

C. Azure Event Hubs, Azure Stream Analytics, and Power BI

D. Azure Event Hubs, Azure HDInsight, and Power BI

Answer: C

Low-code real-time analytics with built-in sentiment analysis and Power BI dashboard.

Why this answer

Option C is correct because Azure Event Hubs ingests the real-time tweet stream, Azure Stream Analytics scores each tweet from a SQL-like query by calling a pre-built sentiment model exposed as a function (for example through its Azure Machine Learning integration), and Power BI provides the dashboard for immediate visualization. This combination meets the real-time, low-code requirement without custom application code.

Exam trap

The trap here is that candidates may choose Azure Functions (Option A) thinking it's serverless and low-code, but it actually requires writing code for sentiment analysis, whereas Azure Stream Analytics provides a true low-code/no-code solution with built-in ML capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Functions requires custom code to implement sentiment analysis, which violates the low-code/no-code preference, and Azure Cosmos DB is a NoSQL database not optimized for real-time dashboarding. Option B is wrong because Azure IoT Hub is designed for IoT device telemetry, not social media streams, and Azure Data Factory is a batch-oriented ETL service, not suitable for real-time stream processing. Option D is wrong because Azure HDInsight is a big data analytics service that requires coding (e.g., Spark, Hive) and is overkill for simple sentiment analysis, contradicting the low-code/no-code requirement.

11

MCQ · hard

An e-commerce company uses Azure Cosmos DB for its product catalog. They need to ensure that read requests are served from the nearest Azure region to reduce latency. Which feature should they use?

A. Azure Front Door

B. Cosmos DB multi-region writes

C. Azure Content Delivery Network

D. Azure Traffic Manager

Answer: B

Of the options listed, only B is a Cosmos DB capability; a multi-region account lets the SDK route reads to the nearest region.

Why this answer

Strictly speaking, local reads come from Cosmos DB's global distribution rather than from multi-region writes specifically: once the account is replicated to several Azure regions, the SDK can route each read request to the nearest region in its preferred-regions list. Multi-region writes extends the same replication so that every region also accepts writes. Either way, reads are served from a local replica without a separate global load-balancing service, which is why option B is the best fit among the choices.

Exam trap

The trap here is that candidates often confuse Azure Front Door or Traffic Manager as the solution for global read routing, but Cosmos DB's native multi-region read capability is the correct answer because it operates at the database SDK level with automatic region awareness and consistency support.

How to eliminate wrong answers

Option A is wrong because Azure Front Door is a global HTTP/HTTPS load balancer and application accelerator, not a database-level feature; it operates at the application layer and cannot directly serve Cosmos DB read requests from the nearest region without additional configuration. Option C is wrong because Azure Content Delivery Network (CDN) caches static content (e.g., images, videos) at edge nodes, not dynamic database queries; it cannot cache or serve Cosmos DB document reads. Option D is wrong because Azure Traffic Manager is a DNS-based traffic routing service that directs traffic at the domain level; it does not integrate with the Cosmos DB SDK to provide automatic, region-aware read routing.

12

Multi-Select · hard

A global e-commerce platform uses a distributed database for its shopping cart service. The platform must be highly available and continue to accept writes even if network partitions occur between data centers. The business accepts that during a partition, users might see slightly outdated inventory counts, but the service must remain operational. According to the CAP theorem, which two properties is this system prioritizing?

Select 2 answers

A. Consistency and Partition Tolerance

B. Availability and Partition Tolerance

C. Consistency and Availability

D. Durability and Partition Tolerance

Answer: B

The system is designed to remain available (accept writes) even during network partitions, sacrificing immediate consistency (stale data is acceptable). This is a classic 'AP' system under the CAP theorem. Each option already names a pair of properties, so only option B applies; option C (Consistency and Availability) contradicts the scenario, which gives up consistency during a partition.

Why this answer

The scenario describes a system that must remain operational and accept writes during network partitions, even if data becomes temporarily inconsistent (stale inventory counts). This prioritizes Availability (the service stays up and accepts writes) and Partition Tolerance (the system continues to function despite network splits). According to the CAP theorem, when a partition occurs, a distributed system must choose between Consistency and Availability; here, the business accepts eventual consistency, so Availability and Partition Tolerance are the chosen properties.

Exam trap

The trap here is that candidates often assume 'highly available' automatically means 'Consistency and Availability' (CA), forgetting that the CAP theorem states you cannot have all three during a partition, and the scenario explicitly allows stale data, which sacrifices Consistency for Availability.

13

MCQ · easy

A company needs to store JSON documents that are frequently updated by multiple services. The solution must support indexing and querying by any property. Which Azure data service should they use?

A. Azure Blob Storage

B. Azure SQL Database

C. Azure Cosmos DB

D. Azure Table Storage

Answer: C

Azure Cosmos DB is a globally distributed NoSQL database that stores JSON documents, automatically indexes all properties, and supports SQL-like queries.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database designed for JSON documents, offering native support for indexing every property automatically without requiring a predefined schema. Its multi-model API (including SQL API) allows querying by any property with low-latency reads and writes, making it ideal for services that frequently update JSON documents.

Exam trap

The trap here is that candidates confuse Azure Blob Storage's ability to store JSON files (as blobs) with the ability to query them by property, overlooking the lack of native indexing and querying capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage stores unstructured binary or text data as blobs, not queryable JSON documents, and lacks native indexing or querying by arbitrary properties. Option B is wrong because Azure SQL Database is a relational database that requires a fixed schema and does not natively store or index JSON documents without manual schema design and JSON functions. Option D is wrong because Azure Table Storage is a key-attribute store that indexes only the partition key and row key; filters on any other property fall back to scans, and the service is not optimized for JSON document storage.

14

MCQ · easy

A healthcare organization is planning a data analytics platform. They will ingest data from various sources: structured patient records from a relational database, semi-structured JSON logs from medical devices, and unstructured physician notes as plain text files. Which characteristic of big data describes the different formats of data being ingested?

A. Volume

B. Velocity

C. Variety

D. Veracity

Answer: C

Variety correctly describes the different data types (structured, semi-structured, unstructured) being ingested.

Why this answer

The question describes data in three distinct formats: structured (relational database), semi-structured (JSON logs), and unstructured (plain text). In big data terminology, 'Variety' specifically refers to the different types and formats of data being processed. This is a core concept in the 4 V's of big data, where Variety captures the heterogeneity of data sources and structures.

Exam trap

The trap here is that candidates often confuse 'Variety' with 'Volume' because they associate big data with large datasets, but the question explicitly asks about different formats, not size.

How to eliminate wrong answers

Option A (Volume) is wrong because Volume refers to the sheer quantity of data being generated, not the different formats. Option B (Velocity) is wrong because Velocity describes the speed at which data is generated and processed, such as real-time streaming from IoT devices. Option D (Veracity) is wrong because Veracity concerns the quality, accuracy, and trustworthiness of the data, not its format diversity.

15

Matching · medium

Match each Azure data service to its primary purpose.

Concepts

Matches

Relational database as a service

NoSQL multi-model database

Big data and analytics

Unstructured object storage

Scalable data lake for analytics

Why these pairings

These are core Azure data services with distinct use cases.

16

MCQ · medium

A company is designing a data solution for their e-commerce platform. They need to store product catalogs with varying attributes, support high-throughput read/write operations, and ensure low-latency access globally. Which Azure data store is most appropriate?

A. Azure Cosmos DB

B. Azure SQL Database

C. Azure Redis Cache

D. Azure Data Lake Storage

Answer: A

Cosmos DB is a NoSQL database with automatic indexing, multi-master replication, and low-latency global access.

Why this answer

Azure Cosmos DB is the most appropriate choice because it is a globally distributed, multi-model database service that supports schema-agnostic storage of product catalogs with varying attributes. It offers guaranteed single-digit-millisecond latency for reads and writes at any scale, and its turnkey global distribution enables low-latency access from multiple regions, meeting the e-commerce platform's high-throughput and global requirements.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's JSON support with native schema flexibility, overlooking the fact that Cosmos DB is purpose-built for globally distributed, schema-agnostic workloads with guaranteed latency SLAs.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a relational database with a fixed schema, which is not suitable for storing product catalogs with varying attributes without complex schema changes or using JSON columns that lack native indexing and global distribution capabilities. Option C is wrong because Azure Redis Cache (now named Azure Cache for Redis) is an in-memory data store primarily used for caching and session state, not for durable, persistent storage of product catalogs with high-throughput writes and global replication. Option D is wrong because Azure Data Lake Storage is designed for big data analytics and batch processing of large volumes of unstructured data, not for the low-latency, high-throughput transactional read/write operations required by an e-commerce product catalog.

17

MCQ · medium

A bank processes a fund transfer transaction. The system debits $100 from Account A and then credits $100 to Account B. If the system crashes after debiting Account A but before crediting Account B, the database automatically reverts the debit. Which ACID property ensures this behavior?

A. Atomicity

B. Consistency

C. Isolation

D. Durability

Answer: A

Correct. Atomicity guarantees that the transaction is all-or-nothing. The rollback of the debit upon crash is a direct result of atomicity enforcement.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. If any part of the transaction fails (e.g., a crash after debiting Account A but before crediting Account B), the entire transaction is rolled back, reverting any partial changes like the debit. This all-or-nothing behavior is the core of atomicity in database systems.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, thinking that 'keeping the database in a valid state' is what triggers the rollback, but it is actually atomicity that enforces the all-or-nothing rule for the transaction itself.

How to eliminate wrong answers

Option B is wrong because Consistency ensures that a transaction transforms the database from one valid state to another, enforcing integrity constraints (e.g., total balance remains constant), but it does not handle rollback of partial changes after a crash. Option C is wrong because Isolation ensures that concurrent transactions do not interfere with each other (e.g., via locking or MVCC), but it does not address crash recovery or rollback of incomplete transactions. Option D is wrong because Durability guarantees that once a transaction is committed, its changes persist even after a system failure (e.g., via write-ahead logging), but it does not revert uncommitted changes; that is the role of atomicity.
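
The all-or-nothing behavior can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch, not a banking implementation: the table layout, starting balances, and the `crash_midway` flag (which raises an exception to stand in for a crash) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, source, target, amount, crash_midway=False):
    """Debit and credit inside one transaction; any failure undoes both."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source)
            )
            if crash_midway:
                raise RuntimeError("simulated crash after the debit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
            )
    except RuntimeError:
        pass  # by this point the debit has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


transfer(conn, "A", "B", 100, crash_midway=True)
after_crash = balances(conn)      # unchanged: the partial debit was reverted

transfer(conn, "A", "B", 100)
after_success = balances(conn)    # both updates applied together

print(after_crash, after_success)
```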

18

Multi-Select · hard

Which TWO Azure services are primarily used for batch processing of large volumes of data? (Choose two.)

Select 2 answers

A. Azure Synapse Analytics

B. Azure SQL Database

C. Azure Stream Analytics

D. Azure Databricks

E. Azure Data Lake Storage

Answers: A, D

Synapse provides SQL and Spark engines for batch processing, and Databricks runs distributed Apache Spark batch jobs.

Why this answer

Azure Synapse Analytics is correct because it provides a cloud-based data warehousing and analytics service that uses massively parallel processing (MPP) to run complex queries and batch processing jobs over large datasets, often using PolyBase or T-SQL to transform and load data in bulk. Azure Databricks is correct because it is an Apache Spark-based analytics platform optimized for batch processing, allowing users to run distributed data processing jobs (e.g., ETL, data transformation) across large volumes of data using DataFrames and RDDs in a cluster environment.

Exam trap

The trap here is that candidates often confuse Azure Data Lake Storage (a storage service) with a processing service, or mistakenly think Azure SQL Database can handle large-scale batch processing due to its ability to run bulk insert operations, but it lacks the distributed compute and parallel architecture required for true batch processing at scale.

19

MCQeasy

A retail company operates an e-commerce website that processes customer orders (insert, update, delete) throughout the day. The same company also runs reports on sales trends at the end of each quarter. Which type of data processing workload does the order processing represent?

A.Batch processing

B.Transactional processing (OLTP)

C.Analytical processing (OLAP)

D.Stream processing

AnswerB

Correct. Order processing requires real-time handling of individual inserts, updates, and deletes, which is the definition of OLTP. OLTP systems are designed for high concurrency and low latency for transactional operations.

Why this answer

Order processing involves inserting, updating, and deleting individual customer orders in real time as they occur. This is the classic definition of an Online Transaction Processing (OLTP) workload, which is optimized for high-volume, low-latency transactions that maintain ACID (Atomicity, Consistency, Isolation, Durability) properties. The e-commerce website requires immediate data consistency for each order, which is the hallmark of transactional processing.

Exam trap

The trap here is that candidates confuse 'analytical processing' (OLAP) with 'transactional processing' (OLTP) because both involve databases, but OLAP is for read-heavy, aggregated queries on historical data, not for the write-heavy, individual row operations of order management.

How to eliminate wrong answers

Option A is wrong because batch processing handles large volumes of data in scheduled, non-real-time batches (e.g., end-of-day payroll runs), not the continuous, individual order operations described. Option C is wrong because analytical processing (OLAP) is designed for complex queries and aggregations over historical data (e.g., sales trend reports), not for the high-frequency inserts/updates/deletes of live orders. Option D is wrong because stream processing deals with continuous, unbounded data flows (e.g., real-time sensor data or clickstreams) using event-time windows, not the discrete, stateful transactions of an order system.

20

MCQeasy

A manufacturing company stores two types of data: (1) real-time sensor readings from production machines used to monitor current machine status, and (2) historical daily production summaries used by managers to identify trends over months. Which statement accurately describes these workloads?

A.Sensor readings are an OLAP workload; daily summaries are an OLTP workload.

B.Sensor readings are an OLTP workload; daily summaries are an OLAP workload.

C.Sensor readings are a NoSQL workload; daily summaries are a relational workload.

D.Sensor readings are a batch workload; daily summaries are a real-time workload.

AnswerB

Sensor readings are frequent transactions (OLTP), daily summaries are analytical (OLAP).

Why this answer

Option B is correct because real-time sensor readings involve frequent, small inserts and point lookups (typical of an OLTP workload), while historical daily summaries are aggregated data used for trend analysis over months (typical of an OLAP workload). OLTP systems handle high-volume transactional operations, whereas OLAP systems support complex queries and aggregations on large historical datasets.

Exam trap

The trap here is that candidates confuse OLTP with real-time and OLAP with batch, but OLTP can be real-time (e.g., sensor inserts) and OLAP can be batch (e.g., daily summaries), so the key distinction is transactional vs. analytical processing, not timing.

How to eliminate wrong answers

Option A is wrong because it reverses the definitions: sensor readings are an OLTP workload (not OLAP), and daily summaries are an OLAP workload (not OLTP). Option C is wrong because the workload type (OLTP vs. OLAP) is independent of the data model (NoSQL vs. relational); sensor readings could be stored in a relational or NoSQL database, and daily summaries could also be in either. Option D is wrong because sensor readings are a real-time (streaming) workload, not batch; daily summaries are a batch workload (processed from historical data), not real-time.

21

MCQeasy

A company wants to store JSON documents from IoT devices with low latency and high availability. Which Azure data store should they use?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure SQL Database

AnswerB

Azure Cosmos DB is a globally distributed NoSQL database that natively supports JSON documents.

Why this answer

Azure Cosmos DB is the correct choice because it is a fully managed NoSQL database designed for low-latency, high-availability workloads, with native support for JSON documents. It offers single-digit millisecond read/write latencies at the 99th percentile, global distribution with multi-region writes, and multiple consistency models, making it ideal for IoT scenarios that require fast, always-on access to semi-structured data.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage's ability to store JSON files with the need for a database that can natively query and index JSON documents, leading them to choose Blob Storage for its low cost rather than Cosmos DB for its low-latency querying capabilities.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is an object store for unstructured binary data (blobs) and does not provide native JSON document querying or indexing; it would require additional compute to parse and query JSON files. Option C is wrong because Azure Table Storage is a key-value store that does not natively support JSON documents; it stores entities as flat sets of properties addressed by a partition key and row key, and it lacks the rich querying and indexing capabilities of a document database. Option D is wrong because Azure SQL Database is a relational database that requires a predefined schema and is not optimized for storing and querying flexible JSON documents with the same low-latency, high-throughput characteristics as Cosmos DB.

22

MCQmedium

Your organization is migrating on-premises SQL Server databases to Azure. The databases include a mission-critical OLTP system that requires high availability with automatic failover and a reporting database that is used for read-only queries. You need to choose the appropriate Azure SQL deployment options for each workload. The OLTP system must have a recovery point objective (RPO) of less than 5 seconds and a recovery time objective (RTO) of less than 30 seconds. The reporting database should be cost-effective and can tolerate up to 5 minutes of data loss. What should you recommend?

A.Use SQL Server on Azure Virtual Machines with Always On Availability Groups for both workloads.

B.Use Azure SQL Database Hyperscale for OLTP and Azure SQL Database serverless for reporting.

C.Use Azure SQL Managed Instance with a failover group for OLTP, and use a read-only replica of the Managed Instance for reporting.

D.Use Azure SQL Database single database with active geo-replication for both workloads.

AnswerC

Failover group provides low RPO/RTO; read-only replica serves reporting.

Why this answer

Option C is correct because Azure SQL Managed Instance supports failover groups that provide automatic failover across regions with an RPO of less than 5 seconds and an RTO of less than 30 seconds, meeting the OLTP requirements. The read-only replica of the Managed Instance can be used for reporting queries without impacting the primary OLTP workload, and it is cost-effective as it does not require a separate database instance.

Exam trap

The trap here is that candidates often confuse the high availability features of Azure SQL Database single database (active geo-replication) with the stricter RPO/RTO guarantees of Managed Instance failover groups, or they assume that SQL Server on Azure VMs with Always On Availability Groups is the only option for such requirements, overlooking the managed service benefits.

How to eliminate wrong answers

Option A is wrong because SQL Server on Azure Virtual Machines with Always On Availability Groups requires manual configuration and management of the VMs and availability groups, and it does not provide the automatic failover with the specified RPO/RTO as a managed service; it also incurs higher operational overhead and cost for both workloads. Option B is wrong because Azure SQL Database Hyperscale is designed for large databases with high scalability and fast backup/restore, but it does not guarantee an RPO of less than 5 seconds and an RTO of less than 30 seconds for automatic failover; the serverless tier for reporting is cost-effective but does not provide a read-only replica for reporting without additional cost. Option D is wrong because Azure SQL Database single database with active geo-replication can provide failover but typically has an RPO of up to 5 seconds and an RTO of up to 1 hour, which does not meet the strict RTO of less than 30 seconds for the OLTP system; using it for both workloads would also be less cost-effective for the reporting database.

23

MCQeasy

A financial database system ensures that once a transaction is committed, the data changes are permanently stored and will survive any subsequent system failure, such as a power outage or crash. Which property of ACID transactions does this describe?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerD

Durability ensures that once a transaction is committed, its effects are permanent and survive system failures.

Why this answer

D is correct because durability guarantees that once a transaction is committed, the changes persist permanently, even in the event of a system failure like a power outage or crash. In SQL Server, this is implemented through write-ahead logging (WAL): the transaction's log records are written to disk before the commit is acknowledged, and checkpoints later flush the modified data pages, so committed changes can be recovered from the log after a crash.

Exam trap

The trap here is that candidates confuse durability with atomicity, thinking 'permanent storage' relates to the all-or-nothing nature of a transaction, but atomicity only guarantees that partial changes are rolled back, not that committed data survives crashes.

How to eliminate wrong answers

Option A is wrong because atomicity ensures that a transaction is treated as an all-or-nothing unit, not that committed data survives failures. Option B is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving integrity constraints, not permanent storage. Option C is wrong because isolation ensures that concurrent transactions do not interfere with each other, not that committed data is durable.
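
A rough way to see durability from application code is to commit, close the connection, and reopen the database file. The sketch below uses Python's `sqlite3` with a hypothetical `payments` table and a temporary file. Closing the connection is only a stand-in for a failure; it does not exercise a real power loss.

```python
import os
import sqlite3
import tempfile

# Hypothetical on-disk database in a fresh temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "ledger.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO payments (amount) VALUES (250)")
conn.commit()  # committed: this row must survive from here on

conn.execute("INSERT INTO payments (amount) VALUES (999)")  # never committed
conn.close()   # the connection goes away with a transaction still open

# A new connection sees only what was committed.
reopened = sqlite3.connect(db_path)
surviving = [row[0] for row in reopened.execute("SELECT amount FROM payments ORDER BY id")]
reopened.close()
```

Only the committed payment is read back. The uncommitted insert is discarded, which is atomicity at work, while the survival of the committed row is the durability guarantee the question describes.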

24

MCQeasy

A financial company needs to store transactional records where each record has a fixed set of attributes (TransactionID, Amount, Date, AccountID). The data must support complex queries and enforce referential integrity. Which type of data store is most appropriate?

A.Key-value store

B.Document database

C.Relational database

D.Graph database

AnswerC

Relational databases store data in tables with predefined schemas, support SQL for complex queries, and enforce referential integrity via foreign keys, making them ideal for transactional data.

Why this answer

A relational database (option C) is the most appropriate choice because transactional records with a fixed schema and the need for referential integrity (e.g., ensuring AccountID references a valid account) are best handled by a structured, ACID-compliant system like Azure SQL Database or SQL Server. Relational databases enforce constraints such as foreign keys and support complex queries using JOINs and aggregations, which are essential for financial reporting and auditing.

Exam trap

The trap here is that candidates often confuse 'fixed schema' with 'document databases,' assuming JSON documents can enforce structure, but document databases do not enforce schema or referential integrity at the database level, which is a key requirement for transactional records.

How to eliminate wrong answers

Option A is wrong because a key-value store (e.g., Azure Cosmos DB for Table) treats each record as an opaque blob indexed by a key, lacking built-in support for complex queries (e.g., filtering by Amount range) and referential integrity constraints. Option B is wrong because a document database (e.g., Azure Cosmos DB for NoSQL) stores semi-structured JSON documents, which do not enforce a fixed schema or foreign key relationships, making it unsuitable for strict referential integrity. Option D is wrong because a graph database (e.g., Azure Cosmos DB for Apache Gremlin) is optimized for traversing relationships between entities (e.g., social networks), not for enforcing referential integrity or performing SQL-style complex queries on tabular transactional data.

25

MCQeasy

A company stores product information such as product ID, name, price, and category in a relational database with rows and columns. This data is best described as:

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Transactional data

AnswerA

Structured data conforms to a rigid schema, typically stored in tables with rows and columns, which matches the product information described.

Why this answer

Structured data conforms to a predefined schema with rows and columns, making it easily searchable and queryable via SQL. The product information (ID, name, price, category) fits this model exactly, as each attribute has a fixed data type and is stored in a relational database table.

Exam trap

The trap here is confusing 'transactional data' (a workload type) with 'structured data' (a data format), leading candidates to pick D because product information is often used in transactions, but the question asks about the data's structure, not its purpose.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not require a fixed schema and often uses tags or key-value pairs, not rigid rows and columns. Option C is wrong because unstructured data (e.g., images, videos, text files) lacks a predefined data model or organization into rows and columns. Option D is wrong because transactional data refers to records of business transactions (e.g., sales orders, payments) and is a type of structured data, not a distinct category of data structure.

26

MCQmedium

A healthcare provider stores patient admission data in a relational database table with columns for PatientID, Name, and AdmissionDate. Progress notes are stored as free-text documents. Lab results are stored as XML files that contain varying fields depending on the test type. Which of the following correctly categorizes these three data types in order: relational table, progress notes, lab results?

A.Structured, Unstructured, Semi-structured

B.Structured, Semi-structured, Unstructured

C.Semi-structured, Unstructured, Structured

D.Unstructured, Structured, Semi-structured

AnswerA

The relational table has a fixed schema (structured). Free-text progress notes have no schema (unstructured). XML files have tags and can vary, making them semi-structured. This is correct.

Why this answer

The relational table with PatientID, Name, and AdmissionDate enforces a fixed schema with defined data types, making it structured data. Progress notes as free-text documents have no predefined structure or schema, classifying them as unstructured data. Lab results in XML files use tags to organize data but allow varying fields per test type, which is the hallmark of semi-structured data. Option A correctly maps these in order: structured, unstructured, semi-structured.

Exam trap

The trap here is that candidates confuse semi-structured data (like XML with varying fields) with unstructured data, or they misorder the three types by not recognizing that a relational table is always structured and free-text is always unstructured.

How to eliminate wrong answers

Option B is wrong because it incorrectly categorizes progress notes as semi-structured; free-text documents lack any schema or tags, so they are unstructured, not semi-structured. Option C is wrong because it places semi-structured first (lab results) and structured last (relational table), reversing the correct order; the relational table is structured, not semi-structured. Option D is wrong because it starts with unstructured for the relational table, which has a rigid schema, and places structured for lab results, which are semi-structured due to varying XML fields.
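
To make the "tags, but varying fields" idea concrete, here is a small Python sketch that parses two lab results with the standard `xml.etree.ElementTree` module. The XML layout and element names (`result`, `value`, `unit`, `group`, `rh`) are invented for illustration and are not taken from any real lab format.

```python
import xml.etree.ElementTree as ET

# Hypothetical lab results: both are tagged, but each test type carries different fields.
lab_xml = """
<results>
  <result test="glucose"><value>5.4</value><unit>mmol/L</unit></result>
  <result test="blood-type"><group>O</group><rh>positive</rh></result>
</results>
"""

root = ET.fromstring(lab_xml)

# Collect the child tags present for each test type.
fields_by_test = {
    result.get("test"): sorted(child.tag for child in result)
    for result in root
}
```

The two results share the `result` wrapper, yet expose different child fields. That shape is why XML of this kind is classed as semi-structured rather than structured or unstructured.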

27

MCQeasy

A company stores customer orders in a relational database that handles many small transactions (inserts, updates, deletes) throughout the day. Separately, they maintain a data warehouse that is used for complex aggregations and historical trend analysis. Which statement correctly describes these two workloads?

A.The first system is an OLTP workload; the second is an OLAP workload.

B.Both systems are OLTP workloads because they store customer orders.

C.The first system is an OLAP workload; the second is an OLTP workload.

D.Both systems are OLAP workloads because they both involve data storage.

AnswerA

OLTP systems handle many small, real-time transactions, while OLAP systems are used for complex analytical queries on aggregated data. This accurately describes the two workloads.

Why this answer

The first system handles many small, concurrent transactions (inserts, updates, deletes) typical of an Online Transaction Processing (OLTP) workload, optimized for ACID compliance and fast query response. The second system is an Online Analytical Processing (OLAP) workload, designed for complex aggregations and historical trend analysis using columnar storage and star schemas. This distinction is fundamental in data architecture, where OLTP systems prioritize write performance and OLAP systems prioritize read performance for large-scale analytics.

Exam trap

The trap here is that candidates confuse the terms OLTP and OLAP, often assuming any database that stores data is OLTP or that any system with 'warehouse' in the name is automatically OLTP, when in fact the workload pattern (many small transactions vs. complex aggregations) defines the category.

How to eliminate wrong answers

Option B is wrong because both systems are not OLTP; the data warehouse is specifically designed for analytical queries, not transactional processing. Option C is wrong because it reverses the definitions: the first system is OLTP (transactional), not OLAP (analytical). Option D is wrong because both systems are not OLAP; the relational database handling small transactions is an OLTP workload, and data storage alone does not define a workload type.
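
The difference in workload pattern can be sketched in a few lines of Python with `sqlite3`. The `orders` table and its values are hypothetical. Using one in-memory database for both patterns is purely for illustration, since in practice the analytical queries would usually run against a separate warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, quarter TEXT, total INTEGER)"
)

# OLTP-style work: many small transactions, each touching individual rows.
with conn:
    conn.execute("INSERT INTO orders VALUES (1, 'ana', 'Q1', 40)")
    conn.execute("INSERT INTO orders VALUES (2, 'ben', 'Q1', 25)")
    conn.execute("INSERT INTO orders VALUES (3, 'ana', 'Q2', 60)")
with conn:
    conn.execute("UPDATE orders SET total = 30 WHERE id = 2")  # amend one order
with conn:
    conn.execute("DELETE FROM orders WHERE id = 1")            # cancel one order

# OLAP-style work: a single read that scans and aggregates the history.
sales_by_quarter = conn.execute(
    "SELECT quarter, SUM(total) FROM orders GROUP BY quarter ORDER BY quarter"
).fetchall()
```

The writes each affect one row and commit immediately, while the final query reads across all rows to produce a summary. It is that pattern, not the storage engine, that separates the two categories.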

28

MCQmedium

A company stores customer orders. Each order has a unique order ID, customer ID, a list of items (each item contains product ID, quantity, and price), and an order date. They frequently query orders by customer ID and also need to filter by order date ranges. The data volume is high and schema flexibility is desired because items can vary in structure. Which type of data store is best suited for this scenario?

A.Relational database

B.Key-value store

C.Document database

D.Graph database

AnswerC

Correct. Document databases store data in nested documents (e.g., JSON), which matches the order-with-items structure, and support indexing on multiple fields for flexible queries.

Why this answer

A document database (e.g., Azure Cosmos DB for NoSQL) is ideal because it stores each order as a self-contained JSON document, allowing the items array to vary in structure per order (schema flexibility). It supports efficient queries by customer ID (using a partition key) and filtering by order date ranges (using indexing on the date field), while handling high data volumes with horizontal scaling.

Exam trap

The trap here is that candidates often choose a relational database (Option A) because they think 'orders' and 'items' imply a need for joins, but the requirement for schema flexibility and high-volume queries by customer ID and date range actually points to a document store, which can embed items directly and index the relevant fields.

How to eliminate wrong answers

Option A is wrong because a relational database enforces a fixed schema (e.g., separate normalized tables for orders and items), which conflicts with the requirement for schema flexibility when items can vary in structure. Option B is wrong because a key-value store (e.g., Azure Cosmos DB for Table) retrieves data only by a single key (e.g., order ID) and does not natively support filtering by non-key attributes like customer ID or order date ranges without scanning all records. Option D is wrong because a graph database (e.g., Azure Cosmos DB for Apache Gremlin) is optimized for traversing relationships between entities (e.g., customer-product networks), not for storing and querying semi-structured documents with flexible schemas and range filters.
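
As a rough picture of the document model, the Python sketch below keeps each order as one self-contained dictionary (the shape a JSON document would have) and filters by customer and date range. The field names and the `find_orders` helper are hypothetical. A real document database would answer this from indexes and a partition key rather than the linear scan used here.

```python
from datetime import date

# Each order is one document; the fields inside `items` vary between orders.
orders = [
    {"orderId": "o1", "customerId": "c1", "orderDate": "2024-01-15",
     "items": [{"productId": "p1", "quantity": 2, "price": 9.5}]},
    {"orderId": "o2", "customerId": "c1", "orderDate": "2024-03-02",
     "items": [{"productId": "p2", "quantity": 1, "price": 20.0, "giftWrap": True}]},
    {"orderId": "o3", "customerId": "c2", "orderDate": "2024-03-05",
     "items": [{"productId": "p1", "quantity": 1, "price": 9.5, "engraving": "JS"}]},
]

def find_orders(docs, customer_id, start, end):
    """Return the IDs of a customer's orders with start <= orderDate <= end."""
    return [
        doc["orderId"]
        for doc in docs
        if doc["customerId"] == customer_id
        and start <= date.fromisoformat(doc["orderDate"]) <= end
    ]

march_c1 = find_orders(orders, "c1", date(2024, 3, 1), date(2024, 3, 31))
```

Items are embedded directly in the order, so no join is needed to read an order back, and extra item fields such as `giftWrap` require no schema change.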

29

Multi-Selecthard

Which THREE factors should you consider when choosing between Azure SQL Database and Azure Cosmos DB for a new application? (Choose three.)

Select 3 answers

A.Schema flexibility

B.Global distribution needs

C.Cost per GB

D.Maximum data size

E.Consistency model requirements

AnswersA, B, E

Cosmos DB is schema-agnostic; SQL Database requires a defined schema.

Why this answer

Option A is correct because Azure SQL Database requires a fixed relational schema, whereas Azure Cosmos DB is schema-agnostic and supports flexible, document-based data models. This makes Cosmos DB ideal for applications with evolving or unstructured data, while SQL Database suits strictly relational workloads.

Exam trap

The trap here is that candidates often confuse cost or storage limits as key differentiators, but the DP-900 exam focuses on schema flexibility, global distribution, and consistency models as the core architectural trade-offs between these two services.

30

MCQeasy

A company collects customer feedback forms. Each form contains always-present fields like CustomerID and SubmissionDate, but also a free-text Comments field and optional fields like Rating or ProductCategory that vary between forms. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

Correct. Semi-structured data has a flexible schema, often with a mix of mandatory fields and optional or varying fields, as seen in this scenario.

Why this answer

The customer feedback forms contain a mix of structured fields (CustomerID, SubmissionDate) that follow a fixed schema and unstructured fields (free-text Comments) plus optional fields (Rating, ProductCategory) that may or may not be present. This combination of schema-optional and schema-fixed data within the same record is the hallmark of semi-structured data, which does not require a rigid schema like a relational table but still has some organizational properties (e.g., tags or key-value pairs). In Azure, this data is well-suited for storage in Azure Cosmos DB (using JSON documents) or Azure Blob Storage with metadata, rather than a strictly relational database.

Exam trap

Microsoft often tests the misconception that any data with some structure (like a form with fixed fields) must be 'structured,' but the presence of optional or free-text fields pushes it into the semi-structured category.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed, predefined schema where every record has the same fields and data types (like a SQL table), but the optional and free-text fields here break that rigidity. Option C is wrong because unstructured data has no schema at all (e.g., raw video files, plain text without metadata), whereas these forms have always-present fields like CustomerID and SubmissionDate that provide structure. Option D is wrong because relational data is a subset of structured data that enforces relationships through foreign keys and normalization, which does not apply to forms with varying optional fields.
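
The mix of mandatory and optional fields can be illustrated with a few JSON records in Python. The sample forms below are invented. Only the field names CustomerID, SubmissionDate, Comments, Rating, and ProductCategory come from the scenario.

```python
import json

# Hypothetical feedback forms, one JSON object per line.
raw_forms = [
    '{"CustomerID": 1, "SubmissionDate": "2024-05-01", "Comments": "Fast delivery", "Rating": 5}',
    '{"CustomerID": 2, "SubmissionDate": "2024-05-02", "Comments": "Box was damaged", "ProductCategory": "Toys"}',
    '{"CustomerID": 3, "SubmissionDate": "2024-05-03", "Comments": "OK"}',
]
forms = [json.loads(line) for line in raw_forms]

required = {"CustomerID", "SubmissionDate"}

# Every form carries the always-present fields...
all_have_required = all(required.issubset(form) for form in forms)

# ...but the full set of fields differs from form to form.
distinct_shapes = {tuple(sorted(form)) for form in forms}

# Optional fields must be read defensively; missing ones come back as None.
ratings = [form.get("Rating") for form in forms]
```

Three records produce three different field sets, which a fixed-column table could only hold by adding nullable columns for every optional field.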

31

MCQmedium

Your company stores IoT sensor data in Azure Blob Storage. Data analysts need to query the data using SQL without moving it. Which Azure service should you use?

A.Azure Stream Analytics

B.Azure Data Lake Storage

C.Azure Synapse Serverless SQL

D.Azure SQL Database

AnswerC

Allows querying data in Azure Blob Storage using T-SQL without moving it.

Why this answer

Azure Synapse Serverless SQL is the correct choice because it provides a SQL-based query engine that can directly query data stored in Azure Blob Storage using T-SQL, without requiring data movement or a dedicated data warehouse. It uses a pay-per-query model and supports reading various file formats like Parquet, CSV, and JSON, making it ideal for ad-hoc analytical queries on IoT sensor data.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL with Azure SQL Database, mistakenly thinking any 'SQL' service can query external storage, but Azure SQL Database requires data to be loaded into its own tables, while Serverless SQL queries data in place.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing service designed for analyzing data in motion (e.g., from IoT Hub or Event Hubs), not for querying static data already stored in Blob Storage using SQL. Option B is wrong because Azure Data Lake Storage is a storage service (built on Blob Storage) that provides a hierarchical namespace and POSIX-like access control; it does not include a built-in SQL query engine. Option D is wrong because Azure SQL Database is a fully managed relational database service that requires data to be imported and stored in its own tables, not for querying data directly in external Blob Storage without movement.

32

Multi-Selecteasy

Which TWO data storage types are classified as structured data in Azure? (Choose two.)

Select 2 answers

A.Azure Cosmos DB

B.Azure Data Lake Storage

C.Azure SQL Managed Instance

D.Azure SQL Database

E.Azure Blob Storage

AnswersC, D

Stores structured relational data with a fixed schema.

Why this answer

Azure SQL Managed Instance is a fully managed SQL Server database engine in Azure, which stores data in a relational schema with predefined tables, columns, and data types. This structured format enforces a rigid schema, making it a classic example of structured data storage in Azure.

Exam trap

The trap here is that candidates often mistake NoSQL databases like Azure Cosmos DB for structured data stores because they support indexing and querying, but structured data specifically requires a fixed relational schema enforced by the database engine, which Cosmos DB does not mandate.

33

MCQeasy

A company stores customer names and addresses in a fixed-format file where each record has the same fields in the same order. This type of data is best described as:

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Streaming data

AnswerA

Structured data follows a fixed schema with defined fields, matching the scenario's fixed-format file.

Why this answer

A fixed-format file where each record has the same fields in the same order is a classic example of structured data. Structured data conforms to a rigid schema, such as a table with defined columns and data types, making it easily searchable and processable by relational database systems like Azure SQL Database. The consistent field order and fixed format allow for direct parsing without interpretation.

Exam trap

The trap here is that candidates confuse 'fixed-format' with 'semi-structured' because both can be stored in files, but the key distinction is that fixed-format enforces a rigid schema with identical fields per record, whereas semi-structured allows schema flexibility.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) does not enforce a fixed schema; fields can vary between records and order may not be guaranteed. Option C is wrong because unstructured data (e.g., text files, images, videos) lacks a predefined data model or organization, unlike the fixed-format file described. Option D is wrong because streaming data refers to data that is continuously generated and processed in real-time (e.g., from IoT devices or event hubs), not to the storage format or schema of the data.
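
A fixed-format file can be parsed by position alone, which is what makes it structured. The Python sketch below assumes a hypothetical layout (name in 10 characters, city in 12, postcode in 5) and builds two sample lines to match it. None of these widths come from the question.

```python
# Hypothetical layout: (field name, start column, end column).
LAYOUT = [("name", 0, 10), ("city", 10, 22), ("postcode", 22, 27)]

# Build sample fixed-width lines so the padding is guaranteed to match the layout.
sample = [("Ana Lopez", "Madrid", "28001"), ("Ben Katz", "Rotterdam", "30110")]
lines = [f"{name:<10}{city:<12}{postcode:<5}" for name, city, postcode in sample]

def parse(line):
    """Slice one record by column position and strip the padding."""
    return {field: line[start:end].strip() for field, start, end in LAYOUT}

parsed = [parse(line) for line in lines]
```

No tags or key names appear in the data itself. The parser relies entirely on every record having the same fields in the same positions.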

34

Multi-Selecteasy

Which TWO of the following are characteristics of structured data?

Select 2 answers

A.It has a predefined schema

B.It is stored in Azure Blob Storage as objects

C.It can contain images and videos

D.It is often stored in relational databases

E.It uses tags to describe the data

AnswersA, D

Structured data conforms to a schema, such as tables with rows and columns.

Why this answer

Structured data adheres to a predefined schema, meaning its fields, data types, and relationships are defined in advance, typically enforced by a database management system. This schema ensures consistency and enables efficient querying using SQL. Relational databases are the primary storage system for structured data, organizing it into tables with rows and columns that follow the schema.

Exam trap

The trap here is that candidates confuse the storage location (Azure Blob Storage) or metadata mechanisms (tags) with the core definition of structured data, which is solely about having a predefined schema and typically being stored in relational databases.

35

Multi-Selecthard

Which THREE of the following are characteristics of a data lake compared to a data warehouse?

Select 3 answers

A.Data lakes store data in its native or raw format.

B.Data lakes store structured, semi-structured, and unstructured data.

C.Data lakes use schema-on-read rather than schema-on-write.

D.Data lakes guarantee ACID transactions across all data.

E.Data lakes store only structured data.

AnswersA, B, C

Data lakes store raw data in its original format.

Why this answer

Option A is correct because a data lake stores data in its native or raw format, meaning it does not require transformation or schema definition at the time of ingestion. This allows organizations to retain the original fidelity of the data, which is a fundamental distinction from a data warehouse that typically transforms and structures data before loading (ETL). In Azure, Azure Data Lake Storage (ADLS) Gen2 supports storing any file format (e.g., Parquet, CSV, JSON, binary) without preprocessing.

Exam trap

Microsoft often tests the misconception that data lakes are just 'dumping grounds' without any structure, but the trap here is confusing ACID guarantees (which are optional and engine-specific) as a universal characteristic of data lakes, or assuming data lakes only handle unstructured data when they actually support all data types.
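
Schema-on-read can be sketched in plain Python: raw payloads are stored untouched, and a schema is applied only when a consumer reads them. The dictionary standing in for the lake, the paths, and the `read_sales` and `read_clicks` helpers are all hypothetical. This is not how Azure Data Lake Storage is accessed in practice.

```python
import csv
import io
import json

# The "lake": raw payloads kept as they arrived, with no schema enforced on write.
lake = {
    "sales/2024-01.csv": "order_id,amount\n1,19.5\n2,5.0\n",
    "clicks/2024-01.json": (
        '{"user": "u1", "page": "/home"}\n'
        '{"user": "u2", "page": "/cart", "ref": "ad"}\n'
    ),
    "images/logo.bin": b"\x89PNG\r\n",  # unstructured bytes sit alongside the rest
}

def read_sales(raw):
    """Apply a schema at read time: order_id as int, amount as float."""
    return [
        {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(raw))
    ]

def read_clicks(raw):
    """Parse newline-delimited JSON; each record may have different fields."""
    return [json.loads(line) for line in raw.splitlines() if line]

sales = read_sales(lake["sales/2024-01.csv"])
clicks = read_clicks(lake["clicks/2024-01.json"])
total_sales = sum(row["amount"] for row in sales)
```

A data warehouse would instead validate and type the sales rows before storing them (schema-on-write), and would have no natural place for the image bytes.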

36

MCQmedium

A company updates a customer's address in a database. The update must ensure that all existing orders still reference a valid customer ID. The database checks the foreign key constraint and rejects the update if it would violate referential integrity. Which ACID property does this enforcement represent?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerB

Consistency guarantees that a transaction will not violate any database integrity constraints. By rejecting an update that would break referential integrity, the database enforces consistency.

Why this answer

Consistency ensures that any database transaction brings the database from one valid state to another, preserving all defined rules, including constraints like foreign keys. In this scenario, the foreign key constraint enforcement prevents an update that would leave orphaned order records, directly upholding the consistency property by rejecting the transaction if it violates referential integrity.

Exam trap

The trap here is that candidates often confuse consistency with atomicity, thinking that rejecting an invalid update is about 'all-or-nothing' behavior, when in fact consistency is specifically about maintaining data integrity rules and constraints.

How to eliminate wrong answers

Option A is wrong because atomicity ensures that a transaction is treated as a single, indivisible unit that either fully completes or fully rolls back, but it does not specifically enforce data rules like foreign key constraints. Option C is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not enforce referential integrity rules. Option D is wrong because durability guarantees that once a transaction is committed, its changes persist even in the event of a system failure, but it does not validate or enforce constraints during the transaction.
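
The constraint check described in this question can be reproduced with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the unnamed database in the scenario, and the table and column names are assumptions made for illustration. Changing a customer's address succeeds, while changing the key that existing orders reference is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, address TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, '1 Old Street')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.commit()

# Valid update: the address changes, the key the orders depend on does not.
conn.execute("UPDATE customers SET address = '2 New Street' WHERE customer_id = 1")
conn.commit()

# Invalid update: changing the key would orphan order 100, so it is rejected.
try:
    conn.execute("UPDATE customers SET customer_id = 2 WHERE customer_id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Count orders whose customer no longer exists; consistency keeps this at zero.
orphans = conn.execute(
    "SELECT COUNT(*) FROM orders o LEFT JOIN customers c"
    " ON o.customer_id = c.customer_id WHERE c.customer_id IS NULL"
).fetchone()[0]
```

The database moves only between valid states: the rejected statement leaves no orphaned orders behind.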

37

MCQmedium

A company collects temperature readings from IoT sensors every second. Each reading includes a timestamp, sensor ID, and temperature value. The data is used for real-time monitoring and historical trend analysis. Which type of data is this most likely classified as?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Streaming data

AnswerA

Correct. Each record has the same fixed attributes (timestamp, sensor ID, temperature) conforming to a rigid schema, typical of structured data.

Why this answer

The data consists of timestamp, sensor ID, and temperature value, each with a defined data type and relationship, fitting a tabular schema (rows and columns) typical of relational databases. This structured format enables efficient querying for real-time monitoring and historical trend analysis using SQL-based systems like Azure SQL Database or Azure Synapse Analytics.

Exam trap

The trap here is confusing the data's structure (structured vs. semi-structured) with its velocity (streaming vs. batch), leading candidates to incorrectly select 'Streaming data' because the data arrives in real time, even though the question explicitly asks about classification by type, not ingestion method.

How to eliminate wrong answers

Option B is wrong because semi-structured data (e.g., JSON, XML) has flexible schema with tags or key-value pairs, not the fixed, predefined columns of this IoT data. Option C is wrong because unstructured data (e.g., images, videos, text files) lacks a predefined data model or organization, unlike the clearly defined fields here. Option D is wrong because streaming data refers to the continuous flow of data (e.g., via Azure Stream Analytics or Event Hubs), not the classification of the data's structure; the question asks about data type, not ingestion method.

38

MCQhard

Refer to the exhibit. You are analyzing storm event data in Azure Data Explorer. The KQL query returns the top 5 event types by count in Texas. However, the results show event types with very low counts (e.g., 'Volcanic Ash' with 2 events). What is the most likely reason for this?

A.The query has a syntax error.

B.The 'take' operator limits the number of events, not the number of types.

C.The state filter is case-sensitive and the actual state value differs.

D.The 'summarize' operator cannot be used with 'where'.

AnswerC

KQL is case-sensitive; 'TEXAS' may not match 'Texas'.

Why this answer

Option C is correct because the comparison in the state filter is case-sensitive. The StormEvents sample table contains data from multiple states, and if the value in the filter (for example 'TEXAS') does not match the stored casing or spelling exactly (for example 'Texas' or 'tx'), the filter returns few or no rows, leading to unexpected results. Option A is wrong because the query is syntactically correct. Option B is wrong because the 'take' operator limits rows, not aggregates. Option D is wrong because 'summarize' can follow 'where'; summarizing by EventType works regardless of the state filter.
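
Since the exhibit is not reproduced here, the effect of a case-sensitive filter can be shown with a Python analogy rather than the original query. In KQL, `==` compares strings case-sensitively and `=~` compares them case-insensitively; the sketch below mimics both behaviours on a handful of hypothetical rows (the data and the `top_event_types` helper are assumptions for illustration).

```python
from collections import Counter

# Hypothetical rows standing in for a storm-events table.
events = [
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Flood"},
    {"State": "Texas", "EventType": "Volcanic Ash"},
]


def top_event_types(rows, state, case_sensitive=True, n=5):
    """Count event types for one state, like a where + summarize + top pipeline."""
    if case_sensitive:
        matches = [r for r in rows if r["State"] == state]
    else:
        matches = [r for r in rows if r["State"].casefold() == state.casefold()]
    return Counter(r["EventType"] for r in matches).most_common(n)


exact = top_event_types(events, "Texas")                          # behaves like ==
relaxed = top_event_types(events, "Texas", case_sensitive=False)  # behaves like =~
```

With the exact comparison only the single oddly cased row survives the filter, which is how a "top 5" result can end up dominated by rare event types.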

39

MCQeasy

A company wants to store historical sales data for long-term analysis. The data is accessed infrequently but must be retained for 7 years. Which Azure storage tier minimizes cost while meeting these requirements?

A.Archive storage tier

B.Premium storage tier

C.Cool storage tier

D.Hot storage tier

AnswerC

Cool tier is for infrequently accessed data, lower cost.

Why this answer

The Cool storage tier is designed for data that is accessed infrequently but must be retained for extended periods, offering lower storage costs than Hot tier while still providing low-latency access when needed. With a 30-day minimum storage duration and a cost structure optimized for infrequent reads, it balances cost and accessibility for 7-year retention of historical sales data.

Exam trap

The trap here is that candidates often choose the Archive tier because it has the lowest storage cost, forgetting that retrieval latency and higher access costs make it unsuitable for data that may need to be accessed even occasionally during the retention period.

How to eliminate wrong answers

Option A is wrong because the Archive storage tier is intended for data that is rarely accessed and can tolerate hours of retrieval latency; its rehydration delay and higher retrieval costs make it a poor fit for data that may need occasional access. Option B is wrong because Premium storage tier is optimized for high-performance, low-latency workloads (e.g., IaaS VMs or databases) and is significantly more expensive, making it unsuitable for long-term, infrequently accessed historical data. Option D is wrong because the Hot storage tier is designed for frequently accessed data with higher storage costs and no minimum retention period, leading to unnecessary expense for data that is accessed infrequently.

40

Multi-Selecthard

Which THREE data storage considerations are important when choosing between Azure SQL Database and Azure Cosmos DB? (Choose three.)

Select 3 answers

A.ACID transaction support

B.Global distribution capabilities

C.Schema flexibility

D.Maximum storage capacity

E.Built-in analytics features

AnswersA, B, C

SQL Database provides full ACID compliance.

Why this answer

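Option A is correct because Azure SQL Database provides full ACID (Atomicity, Consistency, Isolation, Durability) transaction support, ensuring reliable data operations with commit and rollback capabilities. This is critical for applications requiring strict data integrity, such as financial systems or inventory management. Option B is correct because Azure Cosmos DB is built for multi-region distribution, so the need for globally distributed reads and writes weighs heavily in the choice. Option C is correct because Cosmos DB stores schema-flexible items, whereas Azure SQL Database enforces a fixed relational schema.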

Exam trap

The trap here is that candidates often mistake 'maximum storage capacity' or 'built-in analytics' for key differentiators, when in fact the core decision hinges on ACID transactions, global distribution, and schema flexibility: the three factors that align directly with the fundamental differences between relational and NoSQL databases.

41

MCQhard

A financial services company is evaluating distributed NoSQL databases for a new application that must remain fully available even during network partitions. The application can tolerate stale reads for some types of queries. Which statement accurately describes the trade-off described by the CAP theorem in this context?

A.During a network partition, the system can maintain both consistency and availability.

B.When a network partition occurs, a distributed system must choose between providing consistency and providing availability.

C.Partition tolerance is an optional property and can be sacrificed to achieve both consistency and availability.

D.Availability guarantees that every read returns the most recent write.

AnswerB

This is the core trade-off of the CAP theorem: during a partition, you must sacrifice either consistency (to stay available) or availability (to remain consistent).

Why this answer

The CAP theorem states that during a network partition (P), a distributed system must choose between consistency (C) and availability (A). Since the application requires full availability even during partitions, it must sacrifice strong consistency in favor of eventual consistency, which tolerates stale reads. Option B correctly captures this fundamental trade-off.

Exam trap

The trap here is that candidates often confuse 'availability' with 'consistency' or assume that partition tolerance can be sacrificed, when in fact the CAP theorem requires that partition tolerance be a given in any distributed system, and the real choice is between consistency and availability during a partition.

How to eliminate wrong answers

Option A is wrong because during a network partition, it is impossible for a distributed system to maintain both consistency and availability simultaneously; the CAP theorem proves that only two of the three properties can be guaranteed at any time. Option C is wrong because partition tolerance is not optional in a distributed system that spans multiple nodes or data centers; network partitions are a reality that must be tolerated, so sacrificing P is not a valid choice for a system that must remain fully available. Option D is wrong because availability does not guarantee that every read returns the most recent write; that is a property of strong consistency, not availability.
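
The consistency-versus-availability choice can be made concrete with a toy model. This is a simplified sketch under stated assumptions, not how any real database is implemented: the `TwoNodeStore` class, its `mode` labels, and the single `partitioned` flag are all invented for illustration.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy two-replica store; `mode` is "AP" or "CP" (illustrative labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.primary = Replica()
        self.secondary = Replica()
        self.partitioned = False

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # CP: refuse the write rather than let the replicas diverge.
            raise RuntimeError("unavailable during partition")
        self.primary.value = value
        if not self.partitioned:
            # Replication only succeeds while the link between nodes is up.
            self.secondary.value = value

    def read_from_secondary(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # AP: always answer, even though the value may be stale.
        return self.secondary.value


ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")                      # accepted by the primary only
stale = ap.read_from_secondary()    # still the pre-partition value

cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
try:
    cp.read_from_secondary()
    cp_available = True
except RuntimeError:
    cp_available = False
```

The AP store matches the scenario in the question: it keeps answering during the partition and accepts that some reads are stale.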

42

MCQeasy

A company stores customer data in three formats: a relational table with fixed columns for CustomerID, Name, and Email; product reviews stored as JSON documents with varying fields such as rating and comment; and product demonstration videos in MP4 format. Which of the following correctly lists these data types from most structured to least structured?

A.Relational table, MP4 videos, JSON documents

B.JSON documents, relational table, MP4 videos

C.Relational table, JSON documents, MP4 videos

D.MP4 videos, JSON documents, relational table

AnswerC

Correct. This order correctly ranks structured (relational), semi-structured (JSON), and unstructured (MP4) data.

Why this answer

Option C is correct because data structuredness is determined by schema rigidity. A relational table has a fixed schema with predefined columns (CustomerID, Name, Email), making it the most structured. JSON documents have a flexible schema where fields like rating and comment can vary per document, placing them in the semi-structured category. MP4 videos are unstructured binary data with no inherent schema, making them the least structured.

Exam trap

Microsoft often tests the misconception that JSON is unstructured because it lacks a fixed schema, but JSON is actually semi-structured due to its self-describing key-value pairs, while binary formats like MP4 are truly unstructured.

How to eliminate wrong answers

Option A is wrong because it incorrectly places MP4 videos (unstructured binary data) as more structured than JSON documents (semi-structured with flexible schema). Option B is wrong because it ranks JSON documents as more structured than a relational table, but relational tables enforce a fixed schema with strict data types and constraints, making them the most structured. Option D is wrong because it orders from least to most structured, reversing the correct hierarchy; MP4 videos are the least structured, not the most.

43

MCQhard

Refer to the exhibit. You are analyzing a message from an IoT device captured in Azure Event Hubs. The message contains system properties indicating the device ID and authentication method. You need to route messages from device-01 to a separate storage container for compliance. Which property should you use in a Stream Analytics query to filter messages?

A.partitionId

B.consumerGroup

C.iothub-connection-device-id

D.deviceId

AnswerC

This system property contains the device ID and can be used in a WHERE clause to filter messages from device-01.

Why this answer

Option C is correct because the `iothub-connection-device-id` system property is automatically added by Azure IoT Hub to every device-to-cloud message. In a Stream Analytics query, you can filter on this property (e.g., `WHERE [iothub-connection-device-id] = 'device-01'`, with the hyphenated name delimited so that it is not parsed as subtraction) to isolate messages from a specific device for routing to a separate storage container for compliance.

Exam trap

Microsoft often tests the exact naming of Azure IoT Hub system properties, and the trap here is that candidates assume a simple `deviceId` property exists, but the actual property name includes the `iothub-connection-` prefix that IoT Hub uses for its connection-related system properties.

How to eliminate wrong answers

Option A is wrong because `partitionId` identifies the Event Hubs partition a message was written to, which matters for scaling and ordering, not the device that sent it; filtering by partition ID would not isolate messages from a specific device. Option B is wrong because `consumerGroup` is a logical group of consumers reading from an Event Hub or IoT Hub, used for load balancing and checkpointing, not a property on individual messages. Option D is wrong because `deviceId` is not a standard system property in Azure IoT Hub messages; the correct system property name is `iothub-connection-device-id` (with the full prefix), and using `deviceId` would result in a null or undefined value in the query.

44

MCQmedium

Refer to the exhibit. You are reviewing an Azure Resource Manager template for a Blob Storage container named 'sales'. The container has versioning enabled. A developer accidentally overwrites a blob. What is the simplest way to recover the previous version?

A.Access the previous version through the version list and restore it

B.Use blob soft delete to recover the blob

C.Restore from a backup using Azure Backup

D.Perform a point-in-time restore of the container

AnswerA

Versioning keeps all versions; you can promote a previous version to the current one.

Why this answer

Option A is correct because Azure Blob Storage versioning automatically maintains a history of blob versions. When a blob is overwritten, the previous version is preserved and can be accessed via the version list. The simplest recovery method is to promote the previous version to the current version, which restores the blob to its prior state without needing additional services or configurations.

Exam trap

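The trap here is that candidates reach for soft delete instead of versioning. With versioning enabled, an overwrite preserves the prior state as a previous version rather than as a soft-deleted object, so the version list is the direct recovery path; soft delete comes into play when a blob or version is deleted.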

How to eliminate wrong answers

Option B is wrong because, with versioning enabled, an overwrite creates a previous version rather than a soft-deleted object, so soft delete is not what holds the overwritten data; it protects blobs and versions that are deleted. Option C is wrong because Azure Backup is designed for broader disaster recovery scenarios (e.g., entire storage accounts or VMs) and is overkill for recovering a single blob version; it also requires additional cost and configuration. Option D is wrong because point-in-time restore rolls the block blobs in a container back to an earlier state, which is more complex and resource-intensive than simply accessing the version list, and it depends on additional features (soft delete, change feed, and versioning) being enabled on the storage account.

45

MCQmedium

A data engineering team at a logistics company handles two distinct data processing workloads. The first workload ingests GPS data from delivery trucks every 10 seconds and updates a dashboard showing real-time vehicle locations. The second workload processes monthly CSV files of completed deliveries to generate reports on delivery times and route efficiency. Which statement correctly identifies these workloads?

A.Both workloads are streaming workloads

B.GPS data processing is a batch workload; monthly report processing is a streaming workload

C.GPS data processing is a streaming workload; monthly report processing is a batch workload

D.Both workloads are batch workloads

AnswerC

Correct. Real-time data ingestion and dashboard updates represent a streaming workload. Scheduled processing of large files is a batch workload.

Why this answer

C is correct because GPS data ingested every 10 seconds is a continuous, near-real-time stream, making it a streaming workload. Monthly CSV file processing is a classic batch workload, as data is collected over a period and processed in a single, scheduled job. This distinction is fundamental in Azure data services: streaming workloads use services like Azure Stream Analytics or Event Hubs, while batch workloads use Azure Synapse Pipelines or Azure Data Factory.

Exam trap

The trap here is that candidates treat the fixed 10-second arrival interval as a schedule and label the GPS workload as batch. What defines a streaming workload is continuous, low-latency processing of events as they arrive, not the absence of a regular interval.

How to eliminate wrong answers

Option A is wrong because only the GPS workload is streaming; the monthly CSV processing is clearly a batch workload. Option B is wrong because it reverses the definitions: GPS data processing is streaming, not batch, and monthly report processing is batch, not streaming. Option D is wrong because only the monthly report processing is batch; the GPS data ingestion is a streaming workload due to its continuous, low-latency nature.
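
The difference between the two workloads can be sketched in a few lines of Python. This is an illustration only, using hypothetical event fields (`truck_id`, `lat`, `lon`) and an in-memory CSV in place of real Azure services: the streaming path updates state once per event, while the batch path processes a complete file in a single run.

```python
import csv
import io

# Streaming: state is updated as each event arrives.
latest_position = {}


def on_gps_event(event):
    """Handle one GPS reading and refresh the dashboard state immediately."""
    latest_position[event["truck_id"]] = (event["lat"], event["lon"])


for event in [
    {"truck_id": "T1", "lat": 51.50, "lon": -0.12},
    {"truck_id": "T2", "lat": 48.85, "lon": 2.35},
    {"truck_id": "T1", "lat": 51.51, "lon": -0.10},
]:
    on_gps_event(event)

# Batch: a complete file is processed in one scheduled run.
monthly_csv = io.StringIO(
    "delivery_id,route,minutes\n"
    "1,north,30\n"
    "2,north,50\n"
    "3,south,20\n"
)


def average_minutes_by_route(csv_file):
    """Aggregate the whole file at once and return average minutes per route."""
    totals = {}
    for row in csv.DictReader(csv_file):
        count, total = totals.get(row["route"], (0, 0))
        totals[row["route"]] = (count + 1, total + int(row["minutes"]))
    return {route: total / count for route, (count, total) in totals.items()}


report = average_minutes_by_route(monthly_csv)
```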

46

MCQeasy

You need to store semi-structured JSON documents from a web application in Azure. The data will be accessed by a key/value lookup. Which Azure data store should you use?

A.Azure Blob Storage

B.Azure Table Storage

C.Azure Cosmos DB

D.Azure SQL Database

AnswerC

NoSQL database that natively supports JSON documents and key-value lookups.

Why this answer

Azure Cosmos DB is the correct choice because it natively supports semi-structured JSON documents and provides key/value lookup through point reads, which fetch a single item by its ID and partition key. It offers single-digit millisecond latency for point reads, making it ideal for web application data that needs fast, scalable access by a unique key.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key/value capabilities with JSON document support, but Table Storage stores flat entities, not nested JSON, and lacks native indexing for document fields.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is designed for unstructured binary or text data (like images, videos, or logs), not for semi-structured JSON documents with key/value access patterns; it lacks native querying for individual document fields. Option B is wrong because Azure Table Storage is a key/value store for flat entities (sets of properties identified by a partition key and row key); it does not natively store or index nested JSON documents. Option D is wrong because Azure SQL Database is a relational database that requires a fixed schema and uses SQL for queries, making it a heavier and less efficient choice for simple key/value lookups on semi-structured JSON than Cosmos DB's native document model.

47

Drag & Dropmedium

Drag and drop the steps to create an Azure Data Lake Storage Gen2 account in the correct order.

Why this order

Creating an ADLS Gen2 account means creating a storage account (general-purpose v2 or premium block blob) with the hierarchical namespace feature enabled.

48

MCQhard

You are a data architect at a global retail company. The company has an Azure Data Lake Storage Gen2 account that stores petabytes of clickstream data. They need to provide near real-time analytics dashboards for regional managers. The data arrives in batches every 5 minutes. Currently, they use Azure Databricks to transform the data and load it into Azure Synapse Analytics, but the dashboards show data that is 30 minutes old. The business requires dashboards to reflect data within 10 minutes of ingestion. You propose a new solution. Which approach should you recommend?

A.Keep current pipeline but replace Synapse with Azure Analysis Services for faster query performance.

B.Use Azure Data Factory with tumbling window triggers every 5 minutes to load data from Data Lake to Synapse.

C.Ingest data into Azure Event Hubs, use Azure Stream Analytics to process and output to Power BI for real-time dashboards.

D.Increase the number of Databricks clusters and use Auto Loader to speed up transformations.

AnswerC

Stream Analytics provides low-latency streaming to Power BI.

Why this answer

Option C is correct because it uses Azure Event Hubs for low-latency ingestion and Azure Stream Analytics for real-time processing, enabling near real-time dashboards in Power BI with sub-minute latency. This architecture bypasses the batch-oriented pipeline that causes the current 30-minute delay, meeting the 10-minute requirement.

Exam trap

The trap here is that candidates may assume batch tools like Data Factory or Databricks can be tuned to meet near real-time SLAs, but they fundamentally operate on file-based or micro-batch paradigms that cannot match the sub-minute latency of a true streaming pipeline with Event Hubs and Stream Analytics.

How to eliminate wrong answers

Option A is wrong because replacing Synapse with Azure Analysis Services does not address the root cause of latency (the batch processing in Databricks), and Analysis Services is an OLAP engine that still requires data to be loaded, not a streaming solution. Option B is wrong because Azure Data Factory with tumbling window triggers is a batch-oriented orchestration tool that introduces inherent latency from window scheduling and data movement, failing to achieve sub-10-minute freshness. Option D is wrong because increasing Databricks clusters and using Auto Loader only accelerates the batch transformation step but does not eliminate the fundamental batch processing delay, and Auto Loader still ingests files as they land in storage rather than events as they are produced.

49

MCQhard

Your organization stores sensitive financial data in Azure SQL Database. You need to audit all SELECT operations on the 'Transactions' table and alert the security team when a user outside the finance department queries the table. Which feature should you use?

A.Microsoft Defender for SQL

B.Dynamic Data Masking

C.SQL Server Auditing

D.Transparent Data Encryption

AnswerC

Auditing logs database events; can be configured to capture SELECT operations and trigger alerts.

Why this answer

SQL Server Auditing is the correct choice because it tracks database events, including SELECT operations, and writes them to an audit log. You can configure an audit policy to capture all SELECT statements on the 'Transactions' table and then set up an alert (e.g., via Azure Monitor or Logic Apps) that triggers when a user from outside the finance department executes such a query. This directly meets the requirement to both audit and alert on specific user actions.

Exam trap

The trap here is that candidates often confuse auditing (logging who did what) with features that protect or hide data, such as TDE (encryption at rest) or Dynamic Data Masking (obfuscated query results), neither of which logs access or raises alerts.

How to eliminate wrong answers

Option A is wrong because Microsoft Defender for SQL provides vulnerability assessments, threat detection, and anomaly alerts, but it does not offer granular auditing of specific table-level SELECT operations or user-based alerting. Option B is wrong because Dynamic Data Masking obfuscates sensitive data in query results to unauthorized users, but it does not log or alert on who performed the query. Option D is wrong because Transparent Data Encryption (TDE) encrypts the database at rest and on backup media, but it provides no auditing or alerting capabilities for data access operations.

50

MCQeasy

A small business wants to use Azure to store and analyze customer feedback from surveys. The surveys are collected via a web app and stored as JSON files. The business needs to run SQL-based queries on the data and generate reports in Power BI. They have a limited budget and prefer a serverless option to minimize management overhead. Which Azure service should they use?

A.Azure Analysis Services

B.Azure Databricks

C.Azure Synapse Analytics serverless SQL pool

D.Azure SQL Database

AnswerC

Serverless, pay-per-query, can query JSON directly.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it allows you to query JSON files directly from Azure Data Lake Storage or Blob Storage using standard T-SQL, without provisioning any infrastructure. It is serverless (pay-per-query), supports SQL-based queries, and integrates seamlessly with Power BI for reporting, making it ideal for a small business with a limited budget and minimal management overhead.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'fully managed' and choose Azure SQL Database (which is managed but not billed per query) or Azure Databricks (which is Spark-based and built around compute clusters rather than SQL-native, pay-per-query access), missing that Azure Synapse serverless SQL pool is the only option that combines serverless billing, direct JSON querying, and SQL-based reporting for Power BI.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a fully managed analytical engine that requires provisioning and managing a model, and it is not designed for direct querying of raw JSON files; it is used for building tabular or multidimensional models from pre-processed data. Option B is wrong because Azure Databricks is a big data analytics platform based on Apache Spark, which is overkill for simple SQL queries on JSON files and incurs cluster management costs even in serverless mode; it is not optimized for ad-hoc SQL queries on semi-structured data. Option D is wrong because Azure SQL Database is a fully managed relational database that requires provisioning a database instance and schema, and it is not serverless in the sense of pay-per-query; it incurs ongoing costs even when idle and requires importing JSON data into tables before querying.

51

MCQmedium

You need to design a real-time dashboard that displays the number of orders placed in the last hour from an e-commerce application. The application writes orders to Azure Event Hubs. Which Azure service should you use to aggregate the data and serve the dashboard with minimal latency?

A.Azure Databricks Structured Streaming

B.Azure Stream Analytics with Power BI output

C.Azure Analysis Services

D.Azure Data Factory with tumbling window

AnswerB

Stream Analytics processes streaming data in real-time and integrates directly with Power BI.

Why this answer

Azure Stream Analytics is purpose-built for real-time data processing from sources like Event Hubs, and its native integration with Power BI enables direct output to a dashboard with sub-second latency. This combination provides the minimal-latency aggregation and serving required for a real-time orders dashboard without additional infrastructure.

Exam trap

The trap here is that candidates may confuse real-time processing with batch-oriented services like Azure Data Factory or assume that any big data platform (like Databricks) is automatically the best choice for low-latency dashboards, overlooking the purpose-built streaming-to-visualization pipeline of Stream Analytics and Power BI.

How to eliminate wrong answers

Option A is wrong because Azure Databricks Structured Streaming, while capable of real-time processing, introduces additional overhead for cluster management and is not optimized for direct dashboard serving with minimal latency compared to Stream Analytics. Option C is wrong because Azure Analysis Services is an OLAP engine for historical data analysis and cannot process real-time streaming data from Event Hubs. Option D is wrong because Azure Data Factory with tumbling window is designed for batch processing on a schedule, not for real-time streaming aggregation and low-latency dashboard updates.

52

MCQmedium

The exhibit shows a SQL query run against Azure SQL Database. What is the purpose of the HAVING clause in this query?

A.To filter rows before grouping

B.To sort the result set

C.To join two tables

D.To filter groups based on aggregate conditions

AnswerD

HAVING filters groups after GROUP BY using aggregate functions.

Why this answer

The HAVING clause in SQL is used to filter groups after the GROUP BY clause has been applied, based on aggregate conditions such as SUM, COUNT, or AVG. In this query against Azure SQL Database, HAVING restricts the result to only those groups that satisfy the specified aggregate condition, which cannot be done with a WHERE clause because WHERE filters individual rows before grouping.

Exam trap

The trap here is that candidates often confuse HAVING with WHERE, mistakenly thinking HAVING can filter individual rows before grouping, when in fact WHERE must be used for that purpose.

How to eliminate wrong answers

Option A is wrong because the WHERE clause, not HAVING, is used to filter rows before grouping; HAVING operates after grouping. Option B is wrong because sorting is performed by the ORDER BY clause, not HAVING. Option C is wrong because joining tables is done with JOIN (e.g., INNER JOIN, LEFT JOIN) in the FROM clause, not with HAVING.
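
Because the exhibit is not reproduced here, the following sketch uses a hypothetical `orders` table to show WHERE and HAVING working together. It runs against SQLite through Python's built-in `sqlite3` module as a stand-in for Azure SQL Database; the clause semantics shown are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "paid", 120.0),
        ("alice", "paid", 80.0),
        ("alice", "cancelled", 500.0),
        ("bob", "paid", 40.0),
        ("carol", "paid", 300.0),
    ],
)

rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'          -- row filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) >= 100      -- group filter, applied after aggregation
    ORDER BY customer
    """
).fetchall()
```

WHERE removes the cancelled row before any totals are computed, and HAVING then drops the one customer whose paid total is below 100.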

53

MCQeasy

A logistics company stores shipment tracking data. The shipment ID, destination, and weight are stored in a fixed-schema database table. The shipment's route history is stored as a JSON document where each document can have different fields depending on the route events recorded. Which classification of data best describes the route history data?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Analytical data

AnswerB

JSON documents with varying fields are a classic example of semi-structured data. They have a schema that can evolve and are self-describing.

Why this answer

The route history data is stored as JSON documents where each document can have different fields depending on the events recorded. This flexibility in schema—where fields vary per document—is the hallmark of semi-structured data. Unlike structured data with a fixed schema, semi-structured data uses tags or markers (like JSON key-value pairs) to organize the data, making it self-describing.

Exam trap

The trap here is that candidates confuse 'structured' with 'organized' and assume JSON is structured because it has keys, but the key differentiator is schema flexibility—structured data enforces a fixed schema, while semi-structured data allows varying fields per record.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with predefined columns and data types, such as the shipment ID, destination, and weight table mentioned in the question. Option C is wrong because unstructured data has no inherent structure or organization, such as raw video files or free-form text, whereas JSON documents have a defined key-value structure. Option D is wrong because analytical data is a classification of data usage (e.g., for reporting or BI), not a classification of data structure; the question asks about the structural classification of the route history data.
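
A short sketch can show what "different fields per document" looks like in practice. The route-history documents below are hypothetical samples invented for illustration. Every document is valid, self-describing JSON, yet no single fixed set of columns fits all of them.

```python
import json

# Hypothetical route-history documents; each records different event details.
route_history = [
    '{"shipment_id": "S1", "events": [{"type": "departed", "hub": "LHR"}]}',
    '{"shipment_id": "S2", "events": [{"type": "customs_hold", "reason": "inspection", "hours": 6}]}',
    '{"shipment_id": "S3", "events": [], "rerouted_via": "AMS"}',
]

documents = [json.loads(line) for line in route_history]

# Compare the top-level keys each document carries.
field_sets = [set(doc) for doc in documents]
common_fields = set.intersection(*field_sets)   # present in every document
all_fields = set.union(*field_sets)             # present in at least one
optional_fields = all_fields - common_fields    # vary from document to document
```

The keys make each document self-describing, which is what separates semi-structured data from unstructured data, while the presence of optional fields is what separates it from a fixed-schema table.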

54

MCQeasy

A company uses Azure Cosmos DB for a globally distributed application. They need to ensure low-latency reads and writes for users in multiple regions. Which consistency level provides the strongest guarantees without sacrificing availability?

A.Bounded staleness

B.Consistent prefix

C.Strong

D.Eventual

AnswerA

Bounded staleness offers strong consistency with a configurable lag and maintains write availability.

Why this answer

Bounded staleness provides the strongest consistency guarantee that still maintains availability during a partition. It ensures that reads are guaranteed to be within a configurable staleness window (either K versions or a time interval) from the latest write, which is stronger than consistent prefix or eventual consistency, while avoiding the availability trade-offs of strong consistency in a globally distributed Azure Cosmos DB account.

Exam trap

The trap here is that candidates often confuse 'strongest guarantees' with 'strong consistency,' not realizing that strong consistency sacrifices availability during a partition, whereas bounded staleness is the strongest level that still guarantees high availability in a globally distributed setup.

How to eliminate wrong answers

Option B (Consistent prefix) is wrong because it guarantees only that reads never see out-of-order writes, but it does not bound how far behind a read can be, so it is weaker than bounded staleness. Option C (Strong) is wrong because it offers linearizability but sacrifices write availability and latency during a regional outage or partition, since each write must be replicated and committed across regions before it is acknowledged. Option D (Eventual) is wrong because it offers no ordering or recency guarantees; reads may return stale data for an unbounded time, which makes it the weakest consistency level.

55

MCQmedium

A financial services company needs to store transaction records for 7 years to comply with regulatory requirements. The data is rarely accessed after the first month but must be available for audit within 24 hours. The storage solution must minimize cost while meeting compliance. Which Azure storage tier should you use for data older than one month?

A.Cool tier

B.Premium tier

C.Archive tier

D.Hot tier

AnswerA

Low storage cost with online, immediate access, which comfortably fits the 24-hour requirement.

Why this answer

The Cool tier is designed for data that is infrequently accessed but must be available quickly when needed, with a 30-day minimum storage duration and lower storage cost than Hot tier. Since the data is rarely accessed after the first month but must be retrievable within 24 hours for audits, Cool tier meets both the cost and availability requirements without the higher cost of Hot tier or the retrieval delay of Archive tier.

Exam trap

The trap here is that candidates often confuse 'rarely accessed' with 'archival' and choose Archive tier, overlooking that archived blobs are offline and must be rehydrated, which can take hours, before they can be read for an audit.

How to eliminate wrong answers

Option B (Premium tier) is wrong because it is optimized for low-latency, high-throughput workloads and is significantly more expensive than needed for rarely accessed audit data. Option C (Archive tier) is wrong because archived blobs are offline and must be rehydrated before they can be read, which can take up to 15 hours at standard priority and leaves little margin inside the 24-hour audit window; its 180-day minimum storage duration is not a problem for a 7-year retention period. Option D (Hot tier) is wrong because it is designed for frequently accessed data and has the highest storage cost of the standard tiers, making it cost-inefficient for data that is rarely accessed after the first month.

56

MCQeasy

A data analyst needs to create interactive dashboards and reports from data stored in Azure Synapse Analytics. Which tool should they use?

A.Microsoft Power BI

B.SQL Server Reporting Services (SSRS)

C.Microsoft Excel

D.Azure Data Studio

AnswerA

Power BI provides interactive dashboards and reports with native Synapse connectivity.

Why this answer

Microsoft Power BI is the correct tool because it is designed specifically for creating interactive dashboards and reports from a wide range of data sources, including Azure Synapse Analytics. Power BI connects directly to Synapse SQL pools or serverless SQL endpoints using DirectQuery or import mode, enabling real-time visualizations and cross-filtering. This aligns with the requirement for interactive analytics, which is Power BI's core strength.

Exam trap

The trap here is confusing a data query/management tool (Azure Data Studio) or a static reporting tool (SSRS) with a dedicated interactive visualization platform, leading candidates to overlook Power BI's native integration with Azure Synapse Analytics.

How to eliminate wrong answers

Option B (SQL Server Reporting Services) is wrong because SSRS is a paginated report server for static, print-ready reports, not for interactive dashboards with live cross-filtering. Option C (Microsoft Excel) is wrong because, while Excel can connect to Synapse and build PivotTables and charts, it is a spreadsheet application rather than a platform for building, publishing, and sharing interactive dashboards. Option D (Azure Data Studio) is wrong because it is a database management and query tool for writing T-SQL and notebooks, not a reporting or dashboarding platform.

57

MCQmedium

Your organization is migrating its on-premises SQL Server databases to Azure. The databases include a mix of operational (OLTP) and analytical (OLAP) workloads. For the OLTP databases, you need high availability and automated failover to a secondary region. For the OLAP databases, you need to support large-scale analytic queries with columnstore indexes and the ability to pause compute to save costs. Which Azure SQL deployment options should you choose for each workload type?

A.Azure SQL Managed Instance for both

B.SQL Server on Azure Virtual Machines for both

C.Azure SQL Database Hyperscale for OLTP; Azure SQL Database Serverless for OLAP

D.Azure SQL Database (geo-replication) for OLTP; Azure Synapse Analytics (dedicated SQL pool) for OLAP

AnswerD

Geo-replication provides failover; Synapse supports columnstore and pause.

Why this answer

Option D is correct because Azure SQL Database with active geo-replication keeps a readable secondary database in another region for the OLTP workload, and placing the databases in a failover group (which is built on geo-replication) adds automatic failover to that region. Azure Synapse Analytics (dedicated SQL pool) supports large-scale analytic queries with columnstore indexes and allows pausing compute to save costs, meeting the OLAP requirements.

Exam trap

The trap here is that candidates may confuse Azure SQL Database Serverless with Synapse Analytics for OLAP, overlooking that Serverless targets intermittent OLTP workloads on a single database rather than large-scale analytics.

How to eliminate wrong answers

Option A is wrong because Azure SQL Managed Instance, although it can fail over to a secondary region through failover groups, is not a massively parallel processing (MPP) data warehouse and is a poor fit for the large-scale OLAP workload that a dedicated SQL pool targets. Option B is wrong because SQL Server on Azure Virtual Machines requires you to configure and manage cross-region availability and failover yourself, and it offers no managed pause-and-resume for analytic compute. Option C is wrong mainly on the OLAP side: Azure SQL Database Serverless can auto-pause, but it is designed for intermittent OLTP usage, not for large-scale analytic queries over columnstore data the way a Synapse dedicated SQL pool is.

58

MCQmedium

A company has a database that processes millions of small credit card transactions per second for payment authorization. They also need to run complex reports that aggregate transaction data over months to detect fraud patterns. Which type of workload describes the payment authorization process?

A.OLTP (Online Transaction Processing)

B.OLAP (Online Analytical Processing)

C.HTAP (Hybrid Transactional/Analytical Processing)

D.ETL (Extract, Transform, Load)

AnswerA

OLTP systems handle high volumes of small, fast transactions, such as credit card authorization.

Why this answer

The payment authorization process involves high-volume, low-latency transactions that read, insert, and update individual records in real time. This is the classic definition of OLTP (Online Transaction Processing), which is optimized for ACID-compliant, row-based operations on current data. The scenario explicitly states 'millions of small credit card transactions per second,' which aligns with OLTP workloads like order entry or banking.

Exam trap

The trap here is that candidates see 'complex reports' and 'aggregate transaction data' in the same question and assume the entire workload is analytical, but the question explicitly asks only about the payment authorization process, which is purely transactional.

How to eliminate wrong answers

Option B (OLAP) is wrong because OLAP is designed for complex aggregations and historical analysis over large datasets, not for processing individual real-time transactions. Option C (HTAP) is wrong because HTAP combines OLTP and OLAP in a single system, but the question asks specifically about the payment authorization process, which is purely transactional, not analytical. Option D (ETL) is wrong because ETL is a data integration process used to move and transform data between systems, not a workload type for processing live transactions.

59

MCQeasy

A data analyst needs to visualize sales data from Azure SQL Database in real-time dashboards. Which tool should they use to create interactive reports?

A.Microsoft Power BI

B.Azure Data Studio

C.Azure Synapse Analytics

D.Microsoft Excel

AnswerA

Power BI is designed for interactive reporting and dashboards.

Why this answer

Microsoft Power BI is the correct tool because it is designed specifically for creating interactive, real-time dashboards and reports from various data sources, including Azure SQL Database. It supports live connections and DirectQuery to enable near-real-time visualization without requiring data movement.

Exam trap

The trap here is confusing database query tools (Azure Data Studio) or data storage/processing services (Azure Synapse Analytics) with dedicated visualization and reporting tools, leading candidates to overlook Power BI's specific role in real-time dashboard creation.

How to eliminate wrong answers

Option B is wrong because Azure Data Studio is a database management and query tool for SQL Server and Azure SQL, not a reporting or dashboarding tool. Option C is wrong because Azure Synapse Analytics is an enterprise analytics service for large-scale data warehousing and big data processing, not a tool for building interactive reports. Option D is wrong because Microsoft Excel is a spreadsheet application that can connect to databases but lacks native real-time dashboard capabilities and is not designed for interactive, live reporting.

60

MCQmedium

A logistics company tracks package deliveries. When a package is scanned at a distribution center, the system immediately updates the delivery status in a database so customers can see the live tracking information. At the end of each day, the company runs a job that aggregates all delivery status changes into a report for operational analysis. Which of the following best describes these two data processing workloads?

A.Both are batch processing workloads.

B.The status update is a real-time workload, and the daily report is a batch workload.

C.Both are real-time processing workloads.

D.The status update is a batch workload, and the daily report is a real-time workload.

AnswerB

Correct. The status update is processed instantly (real-time), while the daily job processes data in batches (batch).

Why this answer

Option B is correct because the immediate status update upon scanning is a real-time workload, as it processes data instantly for live customer visibility. The end-of-day aggregation job is a batch workload, as it processes accumulated data in a scheduled, non-real-time manner for operational reporting.

Exam trap

The trap here is confusing the speed of the underlying database update with the processing pattern, leading candidates to assume that any database write is batch, or that any scheduled job is real-time, when the key distinction is whether the processing is triggered by each event or runs on a schedule.

How to eliminate wrong answers

Option A is wrong because it incorrectly labels both workloads as batch, ignoring the immediate, low-latency nature of the status update. Option C is wrong because it incorrectly labels both workloads as real-time, ignoring the scheduled, non-continuous nature of the daily aggregation report. Option D is wrong because it reverses the definitions, treating the immediate update as batch and the daily report as real-time, which contradicts the fundamental latency and processing patterns of each workload.
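
The event-triggered versus scheduled distinction can be sketched in a few lines of Python. The function names `on_scan` and `nightly_report` and the in-memory structures are hypothetical stand-ins for a real database and job scheduler.

```python
from collections import Counter

status_by_package = {}   # "live" view that customers would query
change_log = []          # status changes accumulated for the end-of-day job

def on_scan(package_id, status):
    """Real-time path: runs once per scan event and updates state immediately."""
    status_by_package[package_id] = status
    change_log.append((package_id, status))

def nightly_report(log):
    """Batch path: runs on a schedule over everything accumulated so far."""
    return Counter(status for _, status in log)

# Each scan is processed the moment it arrives.
on_scan("P1", "in_transit")
on_scan("P2", "in_transit")
on_scan("P1", "delivered")

# The report is computed later, in one pass over the whole day's changes.
report = nightly_report(change_log)
print(status_by_package, dict(report))
```

What makes the first path real-time is that it is triggered by each event; what makes the second batch is that it runs once over an accumulated set, regardless of how fast either one executes.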

61

MCQmedium

You are designing a batch processing pipeline that runs nightly to transform CSV files from an FTP server into Parquet files in Azure Data Lake Storage. Which Azure service should you use to orchestrate the pipeline?

A.Azure Functions

B.Azure Data Factory

C.Azure Logic Apps

D.Azure Batch

AnswerB

ADF provides orchestration for batch data pipelines with transformations.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data integration service designed specifically for orchestrating and automating data pipelines. It supports scheduled triggers (e.g., nightly runs), native connectors for FTP and Azure Data Lake Storage, and built-in data transformation activities like Copy Data and Mapping Data Flows to convert CSV to Parquet. ADF's control flow and dependency management make it ideal for batch processing pipelines.

Exam trap

The trap here is that candidates confuse Azure Data Factory with Azure Logic Apps or Azure Functions, assuming any 'automation' or 'serverless' service can orchestrate a batch ETL pipeline, but only ADF provides the native data movement, transformation, and scheduling capabilities required for this specific scenario.

How to eliminate wrong answers

Option A is wrong because Azure Functions is a serverless compute service for event-driven, short-running code, not designed for orchestrating complex, scheduled batch pipelines with dependencies and data movement across heterogeneous sources. Option C is wrong because Azure Logic Apps is a low-code workflow automation service primarily for integrating SaaS applications and APIs, lacking native data transformation capabilities like CSV-to-Parquet conversion and optimized data movement for large-scale batch processing. Option D is wrong because Azure Batch is a job scheduling and compute management service for running large-scale parallel and high-performance computing (HPC) workloads, not a data orchestration tool with built-in connectors for FTP and Data Lake Storage.

62

MCQeasy

A company needs to store semi-structured data from IoT devices, including temperature readings and device status. The data will be queried by time range and device ID. Which Azure data service is most cost-effective for this use case?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Table Storage

AnswerD

Table Storage is a low-cost NoSQL store ideal for IoT telemetry.

Why this answer

Azure Table Storage is a NoSQL key-value store that is optimized for storing large amounts of semi-structured data without requiring a fixed schema. It supports efficient queries by partition key (device ID) and row key (timestamp), making it ideal for time-series IoT data at a lower cost than other Azure data services.

Exam trap

The trap here is that candidates often choose Azure Cosmos DB for its NoSQL capabilities, overlooking the fact that Table Storage provides the same key-value functionality at a fraction of the cost for simple IoT workloads.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is designed for unstructured binary or text data (e.g., images, logs, backups) and does not natively support indexed queries by device ID and time range without additional indexing or compute layers. Option B is wrong because Azure Cosmos DB, while capable of handling semi-structured data and time-series queries, is significantly more expensive than Table Storage for high-volume IoT data due to its provisioned throughput and multi-model capabilities. Option C is wrong because Azure SQL Database is a relational database that requires a fixed schema and is over-provisioned for simple key-value lookups, leading to higher cost and complexity for semi-structured IoT data.
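
To show why a partition key plus a sortable row key suits "by device and time range" queries, here is a minimal in-memory model in Python. It is not the Azure Table Storage SDK; the `table` dictionary, the `query` helper, and the sample readings are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Partition key (device ID) -> list of (row key, entity), sorted by row key.
# ISO 8601 UTC timestamps sort correctly as plain strings.
table = {
    "device-1": [
        ("2024-01-01T00:00:00Z", {"temp": 20.5, "status": "ok"}),
        ("2024-01-01T01:00:00Z", {"temp": 21.0}),  # entities may omit fields
        ("2024-01-01T02:00:00Z", {"temp": 35.2, "status": "alert"}),
    ],
    "device-2": [("2024-01-01T00:30:00Z", {"temp": 18.0})],
}

def query(device_id, start, end):
    """Return entities for one partition whose row key lies in [start, end]."""
    rows = table.get(device_id, [])
    keys = [row_key for row_key, _ in rows]
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return rows[lo:hi]

result = query("device-1", "2024-01-01T00:30:00Z", "2024-01-01T02:00:00Z")
for row_key, entity in result:
    print(row_key, entity)
```

The lookup touches only one partition and then a contiguous key range, which is the access pattern the explanation above describes; the entities themselves need no shared schema.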

63

MCQeasy

A company stores customer orders in a relational database. The database enforces rules that every order must have a unique order number and must be linked to an existing customer record. This enforcement of rules to ensure accuracy and consistency is an example of which data concept?

A.Data schema

B.Data integrity

C.Data redundancy

D.Data latency

AnswerB

Data integrity is maintained through constraints like primary keys and foreign keys, which enforce rules to keep data accurate and consistent.

Why this answer

Data integrity refers to the enforcement of rules that ensure the accuracy, consistency, and reliability of data throughout its lifecycle. In this scenario, the relational database enforces entity integrity (unique order numbers) and referential integrity (linking orders to existing customer records), which are core mechanisms for maintaining data correctness.

Exam trap

The trap here is that candidates often confuse 'data schema' (the structural definition) with 'data integrity' (the enforcement of rules), mistakenly thinking that simply having a schema guarantees data accuracy and consistency.

How to eliminate wrong answers

Option A is wrong because a data schema defines the structure of the database (tables, columns, relationships) but does not itself enforce rules like uniqueness or referential constraints; it is the blueprint, not the enforcement mechanism. Option C is wrong because data redundancy refers to the unnecessary duplication of data, which can lead to inconsistencies, not the enforcement of rules to ensure accuracy and consistency. Option D is wrong because data latency measures the delay between data creation and its availability for use, which is unrelated to rule enforcement for accuracy and consistency.
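
The two rules in the scenario map onto a primary key and a foreign key. The sketch below uses Python's built-in sqlite3 module with hypothetical `customers` and `orders` tables; it demonstrates the concept in SQLite rather than in any specific Azure database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_number INTEGER PRIMARY KEY,                              -- entity integrity
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id) -- referential integrity
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")

def try_insert(order_number, customer_id):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_number, customer_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

duplicate = try_insert(100, 1)    # reuses an existing order number
orphan = try_insert(101, 999)     # points at a customer that does not exist
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(duplicate, orphan, order_count, sep="\n")
```

Both invalid inserts are refused by the database itself, so the rules hold no matter which application writes the data.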

64

MCQeasy

A company stores customer order data in a relational database table with columns like OrderID, CustomerID, and OrderDate. They also store product images as JPEG files, and customer feedback as JSON documents with varying fields. Which of the following correctly orders these data types from most structured to least structured?

A.JSON documents, Relational table, JPEG files

B.Relational table, JSON documents, JPEG files

C.JPEG files, Relational table, JSON documents

D.Relational table, JPEG files, JSON documents

AnswerB

The relational table is structured, JSON is semi-structured, and JPEG is unstructured, so this is the correct descending order of structure.

Why this answer

Relational tables enforce a fixed schema with rows and columns, making them the most structured. JSON documents have a flexible schema with varying fields, placing them in the middle. JPEG files are binary blobs with no inherent structure for querying, making them the least structured.

Option B correctly orders these from most structured (relational table) to least structured (JPEG files).

Exam trap

The trap here is that candidates often confuse semi-structured data (JSON) with unstructured data (JPEG), incorrectly ranking JSON as less structured than binary files, or they forget that relational tables are the most structured due to their rigid schema enforcement.

How to eliminate wrong answers

Option A is wrong because it places JSON documents as more structured than relational tables, but JSON's flexible schema (allowing varying fields) is less structured than a fixed relational schema. Option C is wrong because it lists JPEG files as more structured than both relational tables and JSON documents, but JPEGs are unstructured binary data with no queryable schema. Option D is wrong because it places JPEG files as more structured than JSON documents, but JSON documents have a semi-structured format with key-value pairs and nesting, while JPEGs are entirely unstructured.

65

MCQmedium

A bank processes a fund transfer that involves deducting money from one account and crediting it to another. The system ensures that both operations succeed together or, if any part fails, the entire transaction is rolled back, leaving both accounts unchanged. Which ACID property does this scenario primarily guarantee?

A.Consistency

B.Isolation

C.Durability

D.Atomicity

AnswerD

Atomicity ensures that a transaction is an indivisible unit of work. If any part fails, the entire transaction is rolled back, leaving the data unchanged, perfectly matching the described scenario.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this fund transfer scenario, both the debit and credit operations must complete successfully, or the entire transaction is rolled back, leaving the accounts unchanged. This all-or-nothing behavior is the defining characteristic of atomicity in ACID transactions.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, mistakenly thinking that maintaining the total balance (consistency) is the same as the all-or-nothing execution of the transaction, but atomicity specifically focuses on the indivisibility of the transaction steps.

How to eliminate wrong answers

Option A is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving data integrity rules (e.g., total balance remains constant), but it does not guarantee the all-or-nothing execution of the individual operations. Option B is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not address the rollback of a failed multi-step operation. Option C is wrong because durability guarantees that once a transaction is committed, its changes persist even in the event of a system failure, but it does not apply to the rollback behavior described in the scenario.
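
The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are hypothetical; a CHECK constraint stands in for whatever failure might interrupt a real transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src as one transaction; undo both if either fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

failed = transfer("A", "B", 500)   # the debit would overdraw A, so the credit is undone too
after_failure = balances()
succeeded = transfer("A", "B", 30)
after_success = balances()
print(failed, after_failure, succeeded, after_success)
```

In the failed transfer the credit to B had already been applied inside the transaction, yet the rollback leaves both balances exactly as they were, which is the atomicity guarantee.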

66

MCQhard

A company uses Azure SQL Database and wants to implement row-level security so that sales managers can only see data for their own region. Which feature should they use?

A.Dynamic Data Masking

B.Row-level security (RLS)

C.Transparent Data Encryption (TDE)

D.Microsoft Purview

AnswerB

RLS restricts which rows users can see based on group membership or context.

Why this answer

Row-level security (RLS) is the correct feature because it allows you to control access to rows in a database table based on the characteristics of the user executing a query. In this scenario, RLS can be implemented using a security policy and a predicate function that filters rows based on the sales manager's region, ensuring they only see data for their own region.

Exam trap

The trap here is that candidates often confuse Dynamic Data Masking (which hides data in results) with Row-level security (which filters rows), leading them to choose option A when the requirement is about restricting row visibility, not masking column values.

How to eliminate wrong answers

Option A is wrong because Dynamic Data Masking obfuscates data in query results (e.g., hiding parts of a credit card number) but does not restrict which rows are visible; it masks column values rather than filtering rows. Option C is wrong because Transparent Data Encryption (TDE) encrypts the database, its backups, and its log files at rest (data in transit is protected separately by TLS) and provides no row-level filtering or access control based on user identity. Option D is wrong because Microsoft Purview is a data governance and cataloging service for discovering and managing data assets, not a database-level security feature for filtering rows in queries.
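
In Azure SQL Database the filter is defined in T-SQL as a predicate function bound to a table by a security policy. The Python below is only a conceptual model of that idea, with a hypothetical user-to-region mapping and sample rows; it is not how the feature is configured.

```python
# Hypothetical sales rows and user-to-region mapping, for illustration only.
sales = [
    {"order_id": 1, "region": "West", "amount": 120},
    {"order_id": 2, "region": "East", "amount": 80},
    {"order_id": 3, "region": "West", "amount": 45},
]
manager_region = {"wendy": "West", "eli": "East"}

def region_predicate(user, row):
    """Plays the role of the filter predicate: is this row visible to this user?"""
    return manager_region.get(user) == row["region"]

def select_sales(user):
    """Every query is filtered through the predicate, so callers cannot bypass it."""
    return [row for row in sales if region_predicate(user, row)]

west_ids = [row["order_id"] for row in select_sales("wendy")]
east_ids = [row["order_id"] for row in select_sales("eli")]
unknown_ids = [row["order_id"] for row in select_sales("guest")]
print(west_ids, east_ids, unknown_ids)
```

Each manager sees whole rows for their own region and nothing else, which is different from masking, where every row is returned but some column values are hidden.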

67

MCQmedium

A company needs to store relational data that requires frequent updates and supports complex joins. They also need to scale out reads by using read replicas. Which Azure service should they choose?

A.Azure Database for MySQL

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Table Storage

AnswerA

Azure Database for MySQL is relational and supports read replicas.

Why this answer

Azure Database for MySQL is a fully managed relational database service that supports frequent updates and complex joins via SQL. It also offers read replicas, which allow scaling out read-heavy workloads by asynchronously replicating data from the primary server to one or more read-only replicas within the same region or in another region.

Exam trap

The trap here is that candidates often confuse Azure SQL Database's geo-replication (which is for disaster recovery, not read scaling) with read replicas, or they assume Cosmos DB supports relational joins because of its SQL API, overlooking its fundamental NoSQL architecture.

How to eliminate wrong answers

Option B (Azure Cosmos DB) is wrong because it is a NoSQL multi-model database that does not support complex SQL joins across containers and uses a different consistency model; it is designed for globally distributed, schema-less data, not relational data with frequent updates and joins. Option C (Azure SQL Database) is treated as wrong here because, although it supports relational data and complex joins, its read scaling depends on the service tier (for example read scale-out, Hyperscale replicas, or readable geo-secondaries) rather than on a general-purpose "read replica" feature of the kind Azure Database for MySQL provides. Option D (Azure Table Storage) is wrong because it is a NoSQL key-value store that does not support relational schemas, complex joins, or read replicas; it is designed for semi-structured data at massive scale.

68

MCQmedium

You need to store semi-structured JSON data from a web application and query it using SQL-like syntax. The solution must support high throughput with low latency. Which Azure data store should you use?

A.Azure Blob Storage

B.Azure Cosmos DB

C.Azure SQL Database

D.Azure Table Storage

AnswerB

Cosmos DB natively supports JSON documents and SQL-like queries.

Why this answer

Azure Cosmos DB is the correct choice because it natively supports semi-structured JSON documents and offers SQL-like querying via its API for NoSQL (formerly the Core (SQL) API). It is designed for high throughput and low latency with guaranteed single-digit millisecond response times at the 99th percentile, making it ideal for web applications with demanding performance requirements.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage's ability to store JSON files with the ability to query them using SQL syntax, overlooking that Blob Storage lacks a native query engine for semi-structured data.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage stores unstructured binary or text data and does not support SQL-like querying of JSON content without additional services like Azure Data Lake or serverless SQL pools. Option C is wrong because Azure SQL Database is a relational database that requires a fixed schema and is not optimized for semi-structured JSON data with high throughput and low latency at Cosmos DB's scale. Option D is wrong because Azure Table Storage is a NoSQL key-value store that does not support SQL-like query syntax and is designed for simple, schema-less data with lower throughput and higher latency compared to Cosmos DB.

69

MCQmedium

A bank processes individual customer transactions in real-time to update account balances and also runs a nightly job that aggregates all daily transactions into summary reports for management. Which of the following best describes these two processing workloads?

A.OLTP for real-time transactions, OLAP for nightly reports

B.Batch processing for transactions, Stream processing for reports

C.OLAP for transactions, OLTP for reports

D.ETL for transactions, ELT for reports

AnswerA

Correct. OLTP is designed for high-volume transactional updates (real-time balance changes), while OLAP is designed for complex queries and aggregation (historical reports).

Why this answer

Option A is correct because real-time individual transaction processing is the hallmark of Online Transaction Processing (OLTP), which focuses on high-volume, low-latency inserts and updates to maintain current account balances. The nightly aggregation of daily transactions into summary reports is a classic Online Analytical Processing (OLAP) workload, which involves complex queries over large historical datasets for business intelligence. These two workloads have fundamentally different performance and design requirements, making OLTP and OLAP the appropriate classifications.

Exam trap

The trap here is that candidates confuse the terms 'batch' and 'stream' with OLTP and OLAP, or incorrectly assume that any nightly job is 'batch processing' and any real-time task is 'stream processing,' when the exam specifically tests the distinction between transactional and analytical workloads.

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: real-time transactions are stream/OLTP processing, not batch, and nightly summary reports are batch/OLAP processing, not stream. Option C is wrong because it swaps the roles: OLAP is designed for analytical queries on aggregated data, not for high-frequency transactional updates, and OLTP is not suited for large-scale summary report generation. Option D is wrong because ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are data integration patterns used to move data between systems, not classifications of processing workloads; they describe how data is prepared, not the nature of the workload itself.

70

Multi-Selectmedium

Which TWO Azure services are appropriate for real-time data ingestion from IoT devices?

Select 2 answers

A.Azure IoT Hub

B.Azure Data Factory

C.Azure SQL Database

D.Azure Blob Storage

E.Azure Event Hubs

AnswersA, E

Specifically built for IoT device connectivity and ingestion.

Why this answer

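Azure IoT Hub is designed specifically for bidirectional communication with IoT devices, supporting protocols like MQTT, AMQP, and HTTPS for real-time data ingestion. It provides per-device authentication, device management, and built-in message routing to downstream services, making it ideal for ingesting telemetry data from millions of devices in real time. Azure Event Hubs is a high-throughput event streaming service that can ingest millions of events per second, so it also fits real-time telemetry ingestion when per-device identity and management are not required.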

Exam trap

The trap here is that candidates often confuse Azure IoT Hub (which provides device identity and management) with Azure Event Hubs (which is a generic event ingestion service), or mistakenly think that Azure Data Factory or Blob Storage can handle real-time IoT ingestion, when they are designed for batch or storage workloads respectively.

71

MCQeasy

A database administrator is explaining to a colleague that a database transaction must ensure that either all operations within it succeed or none of them take effect. Which ACID property is being described?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures all-or-nothing execution of a transaction.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work: either all operations within it are committed successfully, or none are applied. This is the property that guarantees the 'all-or-nothing' behavior described in the question. In Azure SQL Database or SQL Server, atomicity is enforced through the transaction log and write-ahead logging (WAL): changes are recorded in the log before the modified data pages are written to disk, so an incomplete transaction can be rolled back.

Exam trap

The trap here is that candidates often confuse Atomicity with Consistency, because both involve 'correctness' — but Atomicity is about the transaction's execution as a whole, while Consistency is about the database's adherence to rules after the transaction completes.

How to eliminate wrong answers

Option B is wrong because Consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, triggers, cascades), but it does not guarantee the all-or-nothing outcome. Option C is wrong because Isolation controls how concurrent transactions are visible to each other (e.g., through locking or snapshot isolation), not whether a transaction's operations are applied as a unit. Option D is wrong because Durability guarantees that once a transaction is committed, its changes persist even after a system failure (e.g., via the transaction log being flushed to disk), not the atomic execution of the transaction's operations.
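
The all-or-nothing behavior can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Azure SQL Database, and the accounts table, the CHECK constraint, and the transfer helper are assumptions made up for this illustration. The credit is applied first and succeeds; the debit then violates the constraint, and the rollback undoes the credit as well.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint; the earlier credit was rolled back too.
        return False

def balances(conn):
    """Return {account_id: balance} for every row."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

overdraft_ok = transfer(conn, 1, 2, 500)   # debit would make account 1 negative
after_overdraft = balances(conn)           # neither account changed
valid_ok = transfer(conn, 1, 2, 30)        # both updates commit together
after_valid = balances(conn)
print(overdraft_ok, after_overdraft, valid_ok, after_valid)
```

After the failed transfer, account 2 does not keep the credit it briefly received, which is the difference between atomicity and simply rejecting one bad statement.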

72

MCQeasy

A research team needs to store thousands of PDF reports that vary in length and structure. The storage solution must allow flexible schema and support access from multiple programming languages via HTTP. Which data storage category best describes these reports?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Transactional data

AnswerC

Unstructured data has no predefined structure and is stored as files (e.g., PDFs, images). Azure Blob Storage is a common choice for such data.

Why this answer

C is correct because PDF reports with varying length and structure are binary files that do not conform to a predefined data model or schema, which is the definition of unstructured data. Object stores such as Azure Blob Storage or Amazon S3 are typical choices for such data and expose it through HTTP REST APIs that any programming language can call.

Exam trap

The trap here is that candidates confuse 'semi-structured' with 'unstructured' because PDFs can contain text and metadata, but the exam expects you to recognize that the file itself is a binary blob with no schema enforced by the storage system.

How to eliminate wrong answers

Option A is wrong because structured data requires a rigid schema (e.g., tables with rows and columns in a relational database), but PDFs have no fixed schema. Option B is wrong because semi-structured data (e.g., JSON, XML) has tags or key-value pairs that provide some organizational metadata, whereas PDFs are binary blobs without such inherent structure. Option D is wrong because transactional data refers to records of business transactions (e.g., sales orders) that are typically structured and require ACID compliance, not binary documents.

73

MCQmedium

A hospital collects patient data from multiple sources. Source A stores patient vitals as a continuous stream of readings from wearable devices. Source B stores historical medical records in a relational database with fixed columns (PatientID, Diagnosis, AdmissionDate). Source C stores doctor's notes as unstructured text files. Which statement correctly describes the structure of data from these sources?

A.Source A is semi-structured, Source B is structured, Source C is unstructured.

B.Source A is structured, Source B is structured, Source C is unstructured.

C.Source A is structured, Source B is unstructured, Source C is semi-structured.

D.Source A is semi-structured, Source B is semi-structured, Source C is unstructured.

AnswerB

Both Source A (vitals readings with a fixed schema) and Source B (relational database) are structured data. Source C (free-text doctor's notes) is unstructured.

Why this answer

Source A stores patient vitals as a continuous stream from wearable devices, which is structured data because it typically consists of time-stamped numeric readings with a fixed schema (e.g., timestamp, heart rate, blood pressure). Source B uses a relational database with fixed columns (PatientID, Diagnosis, AdmissionDate), which is classic structured data. Source C contains unstructured text files (doctor's notes) with no predefined schema.

Therefore, Option B correctly identifies all three sources.

Exam trap

The trap here is that candidates often confuse a continuous data stream (Source A) with semi-structured data. In DP-900, structure is determined by the schema of the records, not by how they arrive: a stream of fixed-format sensor readings (e.g., a timestamp plus numeric values) is structured even though it is delivered in real time.

How to eliminate wrong answers

Option A is wrong because it labels Source A as semi-structured, but a continuous stream of numeric vitals from wearable devices is structured (fixed schema of timestamp and numeric values), not semi-structured (which would require tags or markers like JSON/XML). Option C is wrong because it calls Source B unstructured, but a relational database with fixed columns is the definition of structured data, not unstructured. Option D is wrong because it labels Source A as semi-structured (should be structured) and Source B as semi-structured (should be structured), while correctly identifying Source C as unstructured.
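
The point that arrival mode does not change structure can be shown with a short sketch. The VitalsReading type, its field names, and the simulated values are assumptions for illustration only; the source describes the readings only as time-stamped numeric values.

```python
from dataclasses import dataclass, fields
from typing import Iterator

@dataclass(frozen=True)
class VitalsReading:
    """One wearable reading; every record has exactly these fields (hypothetical schema)."""
    timestamp: float
    heart_rate: int
    systolic_bp: int

def vitals_stream(n: int) -> Iterator[VitalsReading]:
    """Simulate a stream: readings arrive one at a time, each with the same schema."""
    for i in range(n):
        yield VitalsReading(timestamp=1_700_000_000.0 + i, heart_rate=70 + i, systolic_bp=120)

schema = [f.name for f in fields(VitalsReading)]
readings = list(vitals_stream(3))
same_schema = all(isinstance(r, VitalsReading) for r in readings)
print(schema, same_schema)
```

Each reading could be inserted into a table with those three columns without any per-record interpretation, which is why the stream counts as structured.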

74

MCQhard

A company is building a data lake and collects data from three sources: (1) a relational database exporting CSV files with fixed columns for customer records, (2) API responses stored as JSON files with varying fields for product reviews, and (3) scanned handwritten notes stored as TIFF images. Which statement correctly categorizes these data by structure type?

A.1: structured, 2: semi-structured, 3: unstructured

B.1: semi-structured, 2: structured, 3: unstructured

C.1: structured, 2: unstructured, 3: semi-structured

D.1: unstructured, 2: semi-structured, 3: structured

AnswerA

Correct. CSV with fixed columns is structured; JSON with varying fields is semi-structured; images are unstructured.

Why this answer

Option A is correct because CSV files from a relational database have a fixed schema (rows and columns), making them structured data. JSON files from API responses with varying fields are semi-structured, as they use tags/keys to organize data without a rigid schema. TIFF images of handwritten notes are unstructured, lacking a predefined data model or organization.

Exam trap

The trap here is confusing semi-structured data (like JSON with varying fields) with unstructured data, or assuming that any file format (like CSV) is always structured regardless of content consistency.

How to eliminate wrong answers

Option B is wrong because it incorrectly labels CSV files as semi-structured (they are structured with fixed columns) and API JSON responses as structured (they are semi-structured due to varying fields). Option C is wrong because it misclassifies API JSON responses as unstructured (they have key-value pairs, making them semi-structured) and TIFF images as semi-structured (they are unstructured binary data). Option D is wrong because it calls CSV files unstructured (they have a fixed schema) and TIFF images structured (they have no predefined data model).
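
The difference between the first two categories can be checked mechanically, as in the following sketch. The sample CSV export and JSON reviews are invented for illustration; only the column names CustomerID, Name, and Email and the idea of reviews with varying fields come from the earlier questions in this set.

```python
import csv
import io
import json

# Hypothetical sample data standing in for sources (1) and (2).
CSV_EXPORT = (
    "CustomerID,Name,Email\n"
    "1,Ada,ada@example.com\n"
    "2,Lin,lin@example.com\n"
)
JSON_REVIEWS = '[{"rating": 5, "comment": "Great"}, {"rating": 3, "verified": true}]'

# Structured: every CSV row carries the same set of columns.
rows = list(csv.DictReader(io.StringIO(CSV_EXPORT)))
csv_field_sets = {frozenset(row) for row in rows}

# Semi-structured: each JSON document is self-describing, and key sets may differ.
reviews = json.loads(JSON_REVIEWS)
json_field_sets = {frozenset(doc) for doc in reviews}

# Unstructured data such as a TIFF scan would just be opaque bytes here:
# there are no fields to enumerate without image analysis or OCR.
print(len(csv_field_sets), len(json_field_sets))
```

A single distinct field set for the CSV rows and more than one for the JSON documents matches the structured and semi-structured labels in Option A.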

75

MCQeasy

A company needs to store JSON documents that require flexible schema and low-latency access globally. Which Azure data service should they use?

A.Azure Table Storage

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Cosmos DB

AnswerD

Cosmos DB supports flexible schema and global distribution.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model database service that natively supports JSON documents with flexible schema. It offers turnkey global distribution, single-digit-millisecond latency at the 99th percentile, and multiple consistency models, making it ideal for low-latency access worldwide.

Exam trap

The trap here is that candidates often confuse Azure Blob Storage's ability to store JSON files as blobs with the need for a database that can query and index JSON documents with low-latency global access, leading them to incorrectly choose Blob Storage instead of Cosmos DB.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a NoSQL key-value store: its entities are schemaless, but each one is a flat set of properties rather than a nested JSON document, only the partition key and row key are indexed, and it lacks turnkey global distribution with low-latency guarantees. Option B is wrong because Azure SQL Database is a relational database built around a predefined schema; it can store and query JSON text with built-in functions, but it is not a flexible-schema document store, and it is not designed for globally distributed low-latency reads and writes. Option C is wrong because Azure Blob Storage is an object storage service for unstructured binary data; it can hold JSON files as blobs but does not provide native JSON document querying or indexing, or low-latency globally distributed database access.
