CCNA Core Data Concepts Questions — Page 3 of 4

151

MCQeasy

A data analyst needs to query a large dataset stored in Azure Blob Storage using serverless SQL pool in Azure Synapse Analytics. Which data format should they use to minimize storage costs while still supporting efficient querying?

A.CSV

B.JSON

C.Parquet

D.Avro

AnswerC

Parquet is a columnar format that offers high compression and efficient query performance, minimizing storage costs.

Why this answer

Parquet is a columnar storage format that compresses data efficiently and supports predicate pushdown, allowing serverless SQL pool in Azure Synapse to read only the necessary columns and rows. This minimizes storage costs while maintaining high query performance, unlike row-oriented formats such as CSV or JSON.
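
To make the columnar idea concrete, here is a minimal Python sketch. It is a toy model only: the sample rows and field names are invented for illustration, and real Parquet files add encoding, compression, and per-row-group statistics on top of this layout.

```python
# Toy comparison of row-oriented vs. columnar layout (not real Parquet).
rows = [
    {"order_id": 1, "region": "West", "amount": 120.0},
    {"order_id": 2, "region": "East", "amount": 80.0},
    {"order_id": 3, "region": "West", "amount": 45.5},
]

# Row-oriented: totalling one column still walks every field of every record.
fields_read_row = sum(len(r) for r in rows)
total_row = sum(r["amount"] for r in rows)

# Columnar: the values of each column are stored together,
# so a query on "amount" touches only that column.
columns = {name: [r[name] for r in rows] for name in rows[0]}
fields_read_col = len(columns["amount"])
total_col = sum(columns["amount"])

print(total_row, total_col)              # same total either way
print(fields_read_row, fields_read_col)  # fewer values touched in columnar form
```

Storing similar values next to each other is also what lets columnar formats compress well, which is the storage-cost half of the question.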

Exam trap

The trap here is that candidates often assume all compressed formats (like Avro) are equally efficient for analytics, but Azure Synapse serverless SQL pool is specifically optimized for columnar formats like Parquet, not row-oriented ones.

How to eliminate wrong answers

Option A is wrong because CSV is a row-oriented, plain-text format with no compression or schema, leading to larger storage footprint and slower queries due to full file scans. Option B is wrong because JSON is also row-oriented and self-describing, resulting in poor compression and inefficient querying as serverless SQL pool must parse the entire file. Option D is wrong because Avro, while compact and schema-based, is row-oriented and not optimized for analytical queries that benefit from columnar storage and predicate pushdown.

152

MCQmedium

Your team is building a real-time dashboard for monitoring website traffic. The data source is streaming click events from Azure Event Hubs. The dashboard must update within seconds. Which Azure service should you use to process the stream?

A.Azure Stream Analytics

B.Azure Synapse Pipelines

C.Azure Data Factory

D.Azure Databricks Structured Streaming

AnswerA

Stream Analytics is designed for real-time stream processing with sub-second latency and direct integration with Power BI.

Why this answer

Azure Stream Analytics is designed for real-time stream processing with low-latency output, making it ideal for processing click events from Event Hubs and updating a dashboard within seconds. It provides a SQL-like query language to define transformations and can output directly to Power BI or other visualization tools for near-instantaneous dashboard updates.
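
Stream Analytics queries typically group events with windowing functions such as tumbling windows. The Python sketch below only illustrates what a tumbling-window count computes; the event shape `(timestamp_seconds, page)` and the function name are assumptions made for this example, not part of any Azure API.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, page) tuples (hypothetical shape).
    Returns a dict mapping window start time to the number of events in it.
    """
    counts = Counter()
    for ts, _page in events:
        # Every timestamp belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

clicks = [(0, "/home"), (3, "/cart"), (4, "/home"), (11, "/checkout"), (12, "/home")]
per_window = tumbling_window_counts(clicks, window_seconds=5)
print(per_window)
```

A dashboard fed by such a query receives one small aggregate per window rather than every raw click, which is what keeps updates fast.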

Exam trap

Microsoft often tests the misconception that any data processing service can handle streaming, but the trap here is that Azure Data Factory and Synapse Pipelines are batch-oriented, while Databricks Structured Streaming, though capable, is not the simplest or most cost-effective choice for a quick, SQL-based real-time dashboard.

How to eliminate wrong answers

Option B (Azure Synapse Pipelines) is wrong because it is primarily an orchestration tool for data movement and transformation in batch scenarios, not for real-time stream processing with sub-second latency. Option C (Azure Data Factory) is wrong because it is a cloud-based ETL service for batch data integration and scheduling, lacking native support for continuous streaming inputs like Event Hubs. Option D (Azure Databricks Structured Streaming) is wrong because while it can process streams, it is a more complex, code-heavy solution (Spark-based) that is overkill for simple dashboard updates and does not offer the same turnkey, low-latency output to Power BI as Stream Analytics.

153

MCQhard

A healthcare organization must store patient health records for 7 years to meet regulatory requirements. After 7 years, data must be deleted immediately. They use Azure Blob Storage. Which policy should they implement?

A.Soft delete policy

B.Legal hold policy

C.Lifecycle management policy with deletion after 7 years

D.Time-based retention policy

AnswerD

Time-based retention retains data for the specified period and then allows deletion.

Why this answer

A time-based retention policy (a type of immutability policy) in Azure Blob Storage keeps blobs in a WORM (Write Once, Read Many) state for a specified interval, so they cannot be modified or deleted during that time. Once the interval expires, the blobs can be deleted (though still not overwritten), which lets the organization remove the records as soon as the 7 years are up. The policy does not delete anything by itself, but it is the only option listed that guarantees the records survive the full 7 years, which is why it is designed for compliance scenarios with a fixed retention duration.
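
The retention rule can be sketched as a simple date check. This is a toy model of the behavior, not the Azure Storage API; the function name is hypothetical and 7 years is approximated as 7 × 365 days.

```python
from datetime import datetime, timedelta, timezone

def can_delete(created_at, now, retention_days):
    """Toy WORM rule: deletion is allowed only after the retention interval."""
    return now >= created_at + timedelta(days=retention_days)

RETENTION_DAYS = 7 * 365  # assumption: 7 years approximated as 2555 days
created = datetime(2020, 1, 1, tzinfo=timezone.utc)

during = can_delete(created, datetime(2024, 1, 1, tzinfo=timezone.utc), RETENTION_DAYS)
after = can_delete(created, datetime(2027, 1, 1, tzinfo=timezone.utc), RETENTION_DAYS)
print(during, after)
```

Inside the interval every delete request is refused; after it, deletion is permitted but still has to be triggered by someone or something.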

Exam trap

The trap here is that candidates confuse lifecycle management (which automates deletion but does not prevent premature modification) with time-based retention (which enforces immutability during the retention period), leading them to choose lifecycle management despite its inability to guarantee data integrity before deletion.

How to eliminate wrong answers

Option A is wrong because a soft delete policy only protects against accidental deletion by retaining deleted blobs for a configurable period, but it does not enforce a minimum retention duration or guarantee immediate deletion after 7 years. Option B is wrong because a legal hold policy indefinitely prevents deletion or modification of blobs for legal or investigation purposes, with no automatic expiration, so it cannot enforce a fixed 7-year retention followed by deletion. Option C is wrong because a lifecycle management policy can delete blobs after a specified age, but it does not prevent modification or deletion during the retention period, meaning data could be altered or deleted before 7 years, violating compliance requirements.

154

MCQmedium

An e-commerce application processes customer orders. When an order is placed, the system must decrement the inventory count and process the payment. The application ensures that either both operations complete successfully or both are rolled back if any error occurs. Which database property does this guarantee?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures that a transaction is an all-or-nothing operation. If any part of the transaction fails, the entire transaction is rolled back, preventing partial updates.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work: either all operations within it (decrement inventory and process payment) complete successfully, or none are applied. If any part fails, the database rolls back all changes, maintaining the 'all-or-nothing' guarantee. This is the core property described in the scenario.
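
A minimal sketch of the all-or-nothing behavior, using Python's built-in `sqlite3` module rather than any Azure service. The table names, the `place_order` helper, and the forced failure (a NULL payment amount) are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE payments (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()

def place_order(conn, order_id, sku, amount):
    """Decrement stock and record the payment as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = ?", (sku,))
            conn.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = place_order(conn, 1, "widget", 9.99)
# The payment insert violates NOT NULL, so the stock decrement is undone too.
second_ok = place_order(conn, 2, "widget", None)

qty = conn.execute("SELECT qty FROM inventory WHERE sku = 'widget'").fetchone()[0]
payments = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(first_ok, second_ok, qty, payments)
```

The second order decrements the stock before it fails, yet the final quantity reflects only the first order: the partial update never becomes visible.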

Exam trap

The trap here is that candidates confuse atomicity with consistency, thinking that 'keeping data valid' is the same as 'all-or-nothing execution,' but atomicity is specifically about the transaction's indivisibility, not about data integrity rules.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving defined rules (e.g., constraints, triggers), but it does not guarantee the all-or-nothing rollback of multiple operations. Option C (Isolation) is wrong because isolation controls how concurrent transactions are visible to each other (e.g., preventing dirty reads), not whether a transaction's operations are treated as a single unit. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it does not address the rollback of partial failures within a transaction.

155

MCQeasy

A company stores three types of data: 1) Customer orders in a SQL table with fixed columns for OrderID, CustomerID, and OrderDate. 2) Product reviews in XML files where each file contains varying tags such as <rating> and <comment>. 3) Video files of product demonstrations. Which of the following correctly classifies these data types in order from first to third?

A.Structured, Semi-structured, Unstructured

B.Semi-structured, Unstructured, Structured

C.Unstructured, Structured, Semi-structured

D.Structured, Unstructured, Semi-structured

AnswerA

This correctly identifies the SQL table as structured, the XML files as semi-structured, and video files as unstructured.

Why this answer

Customer orders in a SQL table with fixed columns (OrderID, CustomerID, OrderDate) are structured data because they conform to a rigid schema. Product reviews in XML files with varying tags like <rating> and <comment> are semi-structured data because they have tags/metadata but no fixed schema. Video files of product demonstrations are unstructured data because they lack any predefined data model or organization.
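
The "tags but no fixed schema" property of the XML reviews can be seen in a short Python sketch. The three review documents are made up for this example, and the <verified> tag is a hypothetical extra field.

```python
import xml.etree.ElementTree as ET

# Hypothetical review files: each one carries a different set of tags.
reviews = [
    "<review><rating>4</rating><comment>Works well</comment></review>",
    "<review><rating>5</rating></review>",
    "<review><comment>Arrived late</comment><verified>true</verified></review>",
]

def tags_in(review_xml):
    """Return the set of child tag names found in one review document."""
    return {child.tag for child in ET.fromstring(review_xml)}

tag_sets = [tags_in(r) for r in reviews]
print(tag_sets)
```

Each document describes its own fields, so a parser can read all three even though no two share the same set of tags. A SQL table would reject rows that do not fit its columns, and a video file offers no fields to parse at all.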

Exam trap

Microsoft often tests the distinction between semi-structured and unstructured data by using XML/JSON as semi-structured examples, where candidates mistakenly classify them as unstructured due to the lack of a fixed schema, ignoring the presence of metadata tags.

How to eliminate wrong answers

Option B is wrong because it labels the SQL table as semi-structured, the XML files as unstructured, and the video files as structured, so none of the three classifications is correct. Option C is wrong because it labels the SQL table as unstructured, the XML files as structured, and the video files as semi-structured, which again misclassifies all three. Option D is wrong because it correctly identifies the SQL table as structured but then swaps the other two, labeling the XML files as unstructured and the video files as semi-structured.

156

MCQmedium

Your team is migrating on-premises SQL Server databases to Azure. They need to minimize application changes and support both OLTP and reporting workloads. Which Azure data service supports hybrid transactional and analytical processing (HTAP)?

A.Azure Analysis Services

B.Azure SQL Managed Instance

C.Azure SQL Database with Hyperscale

D.Azure Data Factory

AnswerC

Hyperscale supports large databases and can handle both transactional and analytical queries.

Why this answer

Azure SQL Database with Hyperscale supports hybrid transactional and analytical processing (HTAP) by using built-in columnstore indexes and near-instantaneous snapshot isolation. This allows the same database to handle high-volume OLTP transactions while simultaneously running complex analytical queries on up-to-date data, minimizing application changes because the existing SQL Server code and tools work without modification.

Exam trap

The trap here is that candidates often confuse Azure SQL Database with Hyperscale (which supports HTAP) with Azure SQL Managed Instance (which is a full SQL Server instance but lacks the built-in HTAP architecture), leading them to choose Option B because it sounds like a direct lift-and-shift option.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a dedicated analytical engine that requires data to be extracted, transformed, and loaded (ETL) from a source, making it unsuitable for direct OLTP workloads and failing to minimize application changes. Option B is wrong because Azure SQL Managed Instance is a fully managed SQL Server instance that supports OLTP and reporting, but it does not natively provide the HTAP architecture; it requires separate read replicas or external reporting solutions to avoid performance impact on transactional workloads. Option D is wrong because Azure Data Factory is a cloud-based ETL and data integration service, not a database platform, and cannot directly serve OLTP or analytical queries.

157

MCQeasy

A data analyst receives a dataset containing customer order details stored in a CSV file, a JSON file with product reviews, and a folder of JPEG images of products. Which of the following correctly categorizes these data types from most structured to least structured?

A.CSV → JPEG → JSON

B.JSON → CSV → JPEG

C.CSV → JSON → JPEG

D.JPEG → JSON → CSV

AnswerC

Correct. CSV is structured, JSON is semi-structured, and JPEG is unstructured, so this is the correct order from most to least structured.

Why this answer

CSV files are the most structured of the three: every record has the same delimited columns, so the data maps directly onto rows and columns. JSON files are semi-structured, using key-value pairs and nested objects that allow flexibility but lack a fixed schema. JPEG images are unstructured binary data with no inherent schema, so the correct order from most to least structured is CSV → JSON → JPEG, making option C correct.

Exam trap

Microsoft often tests the misconception that JSON is more structured than CSV because it uses named keys, but in reality, CSV's fixed schema makes it more structured than JSON's flexible, self-describing format.

How to eliminate wrong answers

Option A is wrong because it places JPEG (unstructured) before JSON (semi-structured), incorrectly suggesting that binary image data is more structured than key-value pairs. Option B is wrong because it orders JSON before CSV, but CSV's rigid tabular schema is more structured than JSON's flexible hierarchy. Option D is wrong because it lists JPEG first, which is the least structured, and JSON before CSV, reversing the correct order of structuredness.

158

MCQhard

A company's data engineering team uses Azure Data Factory to orchestrate a pipeline that ingests data from Azure Blob Storage, transforms it using Azure Databricks, and loads it into Azure Synapse Dedicated SQL Pool. The pipeline fails intermittently due to transient errors. Which pattern should they implement to improve reliability?

A.Replace Azure Databricks with Azure Functions

B.Increase the pipeline timeout to 24 hours

C.Split the pipeline into multiple smaller pipelines

D.Configure retry policy with exponential backoff on activities

AnswerD

Retry policies with backoff automatically retry failed activities due to transient errors.

Why this answer

Option D is correct because a retry policy with exponential backoff directly addresses transient errors (e.g., network blips, throttling): the failed activity is retried automatically after progressively longer delays. This pattern is designed specifically for intermittent failures, and Azure Data Factory activities expose built-in retry settings (retry count and retry interval), so it improves pipeline reliability without architectural changes.
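
The retry-with-backoff pattern itself is small enough to sketch in Python. This is a generic illustration, not Azure Data Factory configuration; `TransientError`, `run_with_retry`, and `flaky_copy` are hypothetical names, and the demo records the delays instead of actually sleeping.

```python
import time

class TransientError(Exception):
    """Stand-in for a throttling or network error (hypothetical)."""

def run_with_retry(activity, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call activity(); on TransientError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return activity()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))

# Demo activity: fails twice, then succeeds.
calls = {"n": 0}

def flaky_copy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("throttled")
    return "copied"

delays = []
result = run_with_retry(flaky_copy, sleep=delays.append)
print(result, delays)
```

Doubling the wait between attempts gives a throttled or briefly unavailable service time to recover instead of hitting it again immediately. Production implementations often add random jitter to the delay as well.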

Exam trap

The trap here is that candidates confuse increasing timeout (Option B) with retry logic, or think splitting pipelines (Option C) improves reliability against transient errors, when in fact only a retry policy with backoff directly mitigates intermittent failures in Azure Data Factory.

How to eliminate wrong answers

Option A is wrong because replacing Azure Databricks with Azure Functions would remove the distributed compute engine needed for complex transformations, and Azure Functions are not designed for long-running, data-intensive ETL workloads. Option B is wrong because increasing the pipeline timeout to 24 hours does not handle transient errors; it only allows the pipeline to run longer, but a single transient failure still causes the entire pipeline to fail. Option C is wrong because splitting the pipeline into multiple smaller pipelines does not inherently handle transient errors; it may reduce blast radius but does not provide automatic retry logic for intermittent failures.

159

MCQeasy

A retail company operates an online store. The store processes each customer's order immediately upon submission, updating inventory and payment records in real-time. Additionally, the company's business analysts run weekly reports that aggregate sales data over the past month to identify trends. Which of the following correctly describes the two workload types represented in this scenario?

A.The order processing is an OLTP workload; the weekly reporting is an OLAP workload.

B.The order processing is an OLAP workload; the weekly reporting is an OLTP workload.

C.Both workloads are OLTP workloads.

D.Both workloads are batch processing workloads.

AnswerA

Correct. Order processing involves many small, real-time transactions (OLTP). Weekly reporting aggregates large volumes of historical data (OLAP).

Why this answer

The order processing system handles individual transactions (inserts/updates) in real-time, which is the hallmark of an Online Transaction Processing (OLTP) workload. The weekly reporting aggregates large volumes of historical data for trend analysis, which is an Online Analytical Processing (OLAP) workload. OLTP is optimized for high-volume, low-latency writes, while OLAP is optimized for complex read-heavy queries over large datasets.
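
The two access patterns can be contrasted in a few lines with Python's built-in `sqlite3` module. The `sales` table and its rows are invented for this sketch, and a real deployment would normally run the analytical query on a separate analytical store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)

# OLTP-style: many small transactions, each touching a single order.
new_orders = [(1, "2024-05-01", 20.0), (2, "2024-05-01", 35.0), (3, "2024-05-08", 15.0)]
for order in new_orders:
    with conn:  # one commit per order
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", order)

# OLAP-style: one read-heavy query that aggregates across many rows.
daily_totals = conn.execute(
    "SELECT order_date, SUM(amount) FROM sales GROUP BY order_date ORDER BY order_date"
).fetchall()
print(daily_totals)
```

The inserts each touch one row and must finish quickly; the aggregate scans the whole table and returns a summary.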

Exam trap

The trap here is that candidates confuse the real-time nature of order processing with batch processing or mistakenly think that any reporting is OLTP, failing to recognize that OLAP is specifically designed for analytical queries over historical data.

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: order processing is not analytical (OLAP) and weekly reporting is not transactional (OLTP). Option C is wrong because the weekly reporting is not a transactional workload; it involves aggregation and analysis, not real-time inserts/updates. Option D is wrong because only the weekly reporting could be considered batch processing, but the order processing is real-time, not batch; moreover, the question asks for workload types (OLTP vs OLAP), not processing modes.

160

MCQhard

A retail company uses Azure SQL Database to store transactional data. They need to ensure that reporting queries do not impact the performance of the transactional workload. Which solution should you recommend?

A.Configure a read replica in Azure SQL Database

B.Increase the DTU or vCore limit of the database

C.Add indexes to the reporting tables

D.Partition the largest tables by date

AnswerA

A read replica handles reporting queries without impacting the primary transactional workload.

Why this answer

A read replica in Azure SQL Database allows reporting queries to be offloaded to a read-only copy of the database, isolating them from the primary transactional workload. This ensures that reporting activities do not consume resources (CPU, IO, memory) on the primary instance, preventing performance degradation for transactional operations.

Exam trap

The trap here is that candidates often confuse scaling up the database (Option B) with workload isolation, not realizing that scaling up only adds more resources but does not separate read and write operations, so reporting queries can still cause blocking or resource contention on the primary.

How to eliminate wrong answers

Option B is wrong because increasing DTU or vCore limits scales up the entire database, which does not isolate reporting queries from transactional workloads; both workloads still compete for the same resources. Option C is wrong because adding indexes to reporting tables can improve query performance but does not prevent reporting queries from impacting the transactional workload, as they still run on the same database engine. Option D is wrong because partitioning tables by date can improve query performance and manageability but does not provide workload isolation; reporting queries still execute on the same primary database and can contend with transactional operations.

161

MCQeasy

A retail company uses a point-of-sale (POS) system that records each sales transaction in a database. Each transaction involves reading the current inventory, updating the stock level, and recording the sale. The database must ensure that concurrent transactions do not interfere with each other, so that one transaction does not see partially updated data from another. Which property of a database transaction ensures this isolation?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerC

Correct. Isolation ensures that concurrent transactions do not see each other's intermediate states, preventing dirty reads and other anomalies.

Why this answer

Isolation ensures that concurrent transactions do not interfere with each other, so each transaction sees a consistent snapshot of the database as if it were the only transaction running. In the POS scenario, isolation prevents one transaction from reading partially updated inventory data from another transaction, which could lead to overselling or stock discrepancies. This property is typically implemented through locking mechanisms or multi-version concurrency control (MVCC).
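
A small sketch of isolation using two connections to the same SQLite file. SQLite is used only because it ships with Python; the `stock` table and the values are hypothetical, and other database engines implement isolation with different mechanisms and levels.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pos.db")

# isolation_level=None: transactions are controlled with explicit BEGIN/COMMIT.
writer = sqlite3.connect(path, isolation_level=None)
reader = sqlite3.connect(path, isolation_level=None)

writer.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
writer.execute("INSERT INTO stock VALUES ('widget', 10)")

# A sale starts on one connection but is not committed yet.
writer.execute("BEGIN")
writer.execute("UPDATE stock SET qty = qty - 1 WHERE sku = 'widget'")

# The other connection still sees the last committed value.
seen_during = reader.execute("SELECT qty FROM stock").fetchall()[0][0]

writer.execute("COMMIT")
seen_after = reader.execute("SELECT qty FROM stock").fetchall()[0][0]
print(seen_during, seen_after)

writer.close()
reader.close()
```

The second connection never observes the half-finished sale; it sees the old quantity until the commit, then the new one.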

Exam trap

The trap here is that candidates often confuse isolation with atomicity, thinking that 'not seeing partially updated data' is about the transaction being all-or-nothing, when in fact it is about preventing interference between concurrent transactions.

How to eliminate wrong answers

Option A is wrong because atomicity ensures that a transaction is treated as a single, indivisible unit that either fully completes or fully rolls back, but it does not control how concurrent transactions interact. Option B is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving integrity constraints, but it does not manage concurrent access. Option D is wrong because durability guarantees that once a transaction is committed, its changes persist even in the event of a system failure, but it has no role in isolating concurrent transactions.

162

MCQeasy

A company stores customer contact information in a table with columns for CustomerID, Name, Email, and Phone. They also store customer support chat transcripts as plain text files. Which of the following correctly classifies these data types?

A.Both are structured data

B.Customer contact information is structured; chat transcripts are semi-structured

C.Customer contact information is structured; chat transcripts are unstructured

D.Both are semi-structured

AnswerC

The table has a fixed schema (structured), while chat transcripts are free text (unstructured).

Why this answer

Customer contact information stored in a table with columns like CustomerID, Name, Email, and Phone is structured data because it has a fixed schema with rows and columns. Chat transcripts stored as plain text files have no predefined schema or organization, making them unstructured data. Therefore, option C correctly classifies the contact info as structured and the chat transcripts as unstructured.

Exam trap

The trap here is that candidates often confuse semi-structured data (like JSON or XML) with unstructured data (like plain text), incorrectly classifying chat transcripts as semi-structured because they contain some implicit structure (e.g., timestamps or user names) when in fact they lack a formal schema or metadata tags.

How to eliminate wrong answers

Option A is wrong because it claims both are structured, but chat transcripts as plain text files lack a fixed schema and are not organized into rows and columns. Option B is wrong because it classifies chat transcripts as semi-structured; semi-structured data (e.g., JSON, XML) has tags or markers to separate elements, whereas plain text files have no such structure. Option D is wrong because it claims both are semi-structured, but customer contact info in a relational table is strictly structured, and chat transcripts are unstructured.

163

MCQmedium

A marketing team needs to analyze customer purchase history data stored in Azure SQL Database. They want to create interactive dashboards with drill-down capabilities. Which Microsoft tool should they use?

A.Power BI

B.Azure Data Studio

C.Microsoft Excel

D.Azure Analysis Services

AnswerA

Power BI is designed for interactive dashboards with drill-down capabilities.

Why this answer

Power BI is the correct tool because it is designed specifically for creating interactive dashboards with drill-down capabilities using data from Azure SQL Database. It connects directly to Azure SQL Database via built-in connectors, allowing users to build visualizations that support hierarchical navigation and real-time filtering.

Exam trap

The trap here is that candidates confuse Azure Analysis Services as a visualization tool, when in fact it is a backend analytical engine that requires Power BI or another client for dashboard creation.

How to eliminate wrong answers

Option B is wrong because Azure Data Studio is a database management and query tool, not a dashboarding or visualization tool; it lacks native interactive dashboard and drill-down features. Option C is wrong because Microsoft Excel, although it can build charts and PivotTables, is a spreadsheet tool rather than a dashboarding service and is not optimized for sharing interactive, cloud-based dashboards. Option D is wrong because Azure Analysis Services is a data modeling and analytical engine that provides tabular models, but it is not a front-end visualization tool; it requires a separate client like Power BI to render interactive dashboards.

164

MCQeasy

The exhibit shows a KQL query in Azure Data Explorer. What is the output of this query?

A.Bottom 5 states by total property damage

B.Top 5 states by total property damage

C.All states with total property damage

D.All storm events after 2024-01-01

AnswerB

The query returns top 5 states with highest sum of DamageProperty.

Why this answer

The KQL query uses `summarize` to aggregate total property damage by state, then `top 5 by total_property_damage` to return the five states with the highest total damage. The `desc` argument (default) orders the results in descending order, making option B correct.
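
The exhibit is not reproduced here, so the following is only a Python analogue of the two operators the explanation names, `summarize` and `top`. The sample rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (State, DamageProperty)
events = [
    ("TEXAS", 500), ("OHIO", 50), ("TEXAS", 250), ("IOWA", 300),
    ("UTAH", 10), ("OHIO", 400), ("MAINE", 20), ("KANSAS", 100),
]

# Analogue of: summarize total_property_damage = sum(DamageProperty) by State
totals = defaultdict(int)
for state, damage in events:
    totals[state] += damage

# Analogue of: top 5 by total_property_damage  (descending is the default)
top5 = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top5)
```

Six states appear in the sample data, but only the five with the largest totals survive the `top` step.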

Exam trap

The trap here is that candidates may confuse `top` with `take` or `limit`, forgetting that `top` implicitly sorts in descending order unless `asc` is specified, leading them to think it returns the bottom values or all rows.

How to eliminate wrong answers

Option A is wrong because `top 5` returns the highest values, not the lowest; to get bottom 5, you would need `top 5 by total_property_damage asc`. Option C is wrong because `top 5` limits the output to exactly five rows, not all states. Option D is wrong because the query does not filter by date; it aggregates all storm events regardless of date.

165

MCQeasy

A company stores customer orders in a database. Each order has an OrderID (integer), CustomerName (text), OrderDate (date), and a JSON column for order details that contains varying fields such as discount codes or gift messages. Which statement best describes the data types in this table?

A.The table stores only structured data.

B.The table stores both structured and semi-structured data.

C.The table stores only unstructured data.

D.The table stores only semi-structured data.

AnswerB

Correct. The fixed columns (OrderID, CustomerName, OrderDate) are structured, while the JSON column is semi-structured due to its flexible schema.

Why this answer

The table includes structured columns (OrderID integer, CustomerName text, OrderDate date) and a JSON column for order details, which stores semi-structured data because JSON allows flexible schemas with varying fields like discount codes or gift messages. This combination of fixed-schema columns and a schema-less JSON column means the table holds both structured and semi-structured data, making option B correct.
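
A short Python sketch of such a table, with three fixed columns and one JSON text column. The rows and the keys inside the JSON (`discount_code`, `gift_message`, `gift_wrap`) are hypothetical examples.

```python
import json

# Hypothetical rows: (OrderID, CustomerName, OrderDate, order details as JSON text)
orders = [
    (1, "Ana", "2024-03-01", '{"discount_code": "SPRING10"}'),
    (2, "Ben", "2024-03-02", '{"gift_message": "Happy birthday!", "gift_wrap": true}'),
    (3, "Cy", "2024-03-03", "{}"),
]

for order_id, customer, order_date, details_json in orders:
    details = json.loads(details_json)
    # The fixed columns are always present; JSON keys vary from row to row,
    # so .get() is used to tolerate missing fields.
    print(order_id, customer, order_date, details.get("discount_code"))

detail_keys = [sorted(json.loads(row[3])) for row in orders]
print(detail_keys)
```

Every row has the same first three values with the same types, while the set of keys inside the JSON column differs per row.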

Exam trap

The trap here is that candidates often mistake JSON for unstructured data, but JSON is semi-structured because it has a logical structure (key-value pairs) even though the schema is flexible, leading them to incorrectly choose option C.

How to eliminate wrong answers

Option A is wrong because the JSON column contains semi-structured data, not purely structured data, as structured data requires a fixed schema with consistent fields. Option C is wrong because unstructured data (e.g., images, videos, raw text files) is not present; JSON is semi-structured, not unstructured. Option D is wrong because the table also includes structured columns (OrderID, CustomerName, OrderDate) with fixed data types, so it does not store only semi-structured data.

166

MCQeasy

A company collects customer feedback in three forms: a structured table with customer ID and rating (1-5), free-text comments, and audio recordings of phone calls. Which of the following correctly orders these data from least structured to most structured?

A.Audio recordings, free-text comments, structured table

B.Structured table, free-text comments, audio recordings

C.Free-text comments, structured table, audio recordings

D.Audio recordings, structured table, free-text comments

AnswerA

Correct: audio recordings are the least structured, free-text comments come next, and the table is fully structured.

Why this answer

Option A is correct because structure increases from audio recordings (unstructured binary data with no schema), through free-text comments (also unstructured, but at least made up of readable, searchable words), to the table (structured, with defined columns and data types). This ordering aligns with the DP-900 core data concept of data classification based on schema rigidity.

Exam trap

The trap here is misreading the direction of the ordering: the question asks for least structured to most structured, so the familiar "structured first" sequence in Option B is exactly reversed. Structure refers to schema rigidity, not to data size or complexity.

How to eliminate wrong answers

Option B is wrong because it lists the data from most structured to least structured, the reverse of what the question asks. Option C is wrong because it ranks audio recordings as the most structured item, even though raw audio has no schema at all. Option D is wrong because it ranks free-text comments above the structured table, but only the table has defined columns and data types.

167

MCQhard

The exhibit shows a Kusto Query Language (KQL) query run in Azure Data Explorer. What is the output of this query?

A.All storm events in Texas with property damage

B.The total property damage for all event types in Texas

C.The top 5 event types in Texas by total property damage

D.A list of the top 5 property damage amounts in Texas

AnswerC

The query summarizes by EventType and returns the top 5.

Why this answer

The query uses `summarize sum(PropertyDamage) by EventType` to aggregate total property damage per event type, then `top 5 by TotalPropertyDamage` to return the five event types with the highest totals. The `where State == 'TEXAS'` filter ensures only Texas storms are considered. This directly yields the top 5 event types in Texas by total property damage.

Exam trap

The trap here is that candidates confuse 'top 5 property damage amounts' (raw values) with 'top 5 event types by total property damage' (aggregated categories), or they think the query lists individual events rather than summarized groups.

How to eliminate wrong answers

Option A is wrong because the query does not list individual storm events; it aggregates damage by event type, so it cannot output 'all storm events'. Option B is wrong because the query groups by EventType and returns multiple rows (top 5), not a single total for all event types combined. Option D is wrong because the query outputs event types, not raw property damage amounts; the `top 5` operator returns the entire row (EventType and TotalPropertyDamage), not just the damage values.

168

MCQhard

A banking application processes a funds transfer transaction consisting of two steps: debit $100 from Account A and credit $100 to Account B. If the system crashes after debiting Account A but before crediting Account B, the database automatically reverts the debit, restoring Account A to its original balance. Which ACID property guarantees this behavior?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Correct. Atomicity guarantees the all-or-nothing nature of transactions. Since the debit is rolled back, the transaction is atomic.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this scenario, the debit and credit are part of one transaction; if the system crashes after the debit but before the credit, the database management system (DBMS) automatically rolls back the entire transaction, undoing the debit to restore Account A's original balance. This all-or-nothing behavior is the defining characteristic of atomicity.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, thinking that 'restoring the original balance' is about maintaining data rules, when in fact it is the rollback of an incomplete transaction that demonstrates atomicity.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, triggers), but it does not inherently handle crash recovery or rollback of partial changes. Option C (Isolation) is wrong because isolation governs how concurrent transactions are executed independently to prevent interference, not how a single transaction recovers from a crash. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure; it does not apply to uncommitted transactions that need to be rolled back.
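The all-or-nothing behavior can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming an in-memory database and a raised exception standing in for the crash; a real crash is recovered from the database's log or journal rather than from application code. The table, account names, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(conn, amount, fail_midway=False):
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
            if fail_midway:
                raise RuntimeError("simulated failure before the credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
    except RuntimeError:
        pass  # by this point the debit has been rolled back

transfer(conn, 100, fail_midway=True)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

After the failed transfer both balances are unchanged: the debit never becomes visible on its own.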

169

MCQhard

You are designing a data lake architecture for a large enterprise. You need to organize data into zones (raw, curated, and analytics) and enforce data lineage tracking. Which Azure service should you use to catalog and govern the data?

A.Azure Synapse Analytics

B.Azure Data Factory

C.Microsoft Purview

D.Azure Databricks

AnswerC

Purview provides unified data governance, cataloging, and lineage.

Why this answer

Microsoft Purview is the correct choice because it is a unified data governance service designed specifically for cataloging data assets, tracking lineage across hybrid and multi-cloud environments, and enforcing data policies. Unlike the other options, Purview provides out-of-the-box lineage scanning, a business glossary, and automated classification. It does not create the raw, curated, and analytics zones itself; it catalogs and governs the data held in them and records end-to-end lineage as data moves between zones.

Exam trap

The trap here is that candidates confuse data integration or analytics services (like Azure Data Factory or Synapse) with a dedicated governance and cataloging tool, assuming lineage tracking is a built-in feature of those services rather than a separate function provided by Microsoft Purview.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is an analytics service that combines data warehousing and big data processing, but it does not provide native data cataloging or lineage tracking capabilities beyond basic metadata; it relies on Purview for governance. Option B is wrong because Azure Data Factory is an ETL and data integration service that can capture lineage during pipeline runs, but it is not a dedicated catalog or governance tool; it lacks persistent cataloging, business glossary, and policy enforcement features. Option D is wrong because Azure Databricks is a unified analytics platform for data engineering and machine learning; any cataloging it offers is scoped to assets inside Databricks, so it is not the enterprise-wide catalog and lineage service the scenario calls for, and it integrates with Purview for that purpose.

170

MCQeasy

Your organization wants to run SQL queries on data stored in Azure Blob Storage without moving the data. Which Azure service supports this?

A.Azure SQL Database

B.Azure Analysis Services

C.Azure Synapse Serverless SQL pool

D.Azure Data Lake Storage Gen2

AnswerC

Serverless SQL pool can query Blob Storage directly using T-SQL.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without moving the data. It uses a distributed query engine that reads files in place, supporting formats like Parquet, CSV, and JSON, making it ideal for ad-hoc analytics on stored data.

Exam trap

The trap here is that candidates confuse Azure Data Lake Storage Gen2 (a storage service) with a query engine, or assume Azure SQL Database can query external blobs natively, when in fact only Synapse Serverless SQL pool provides serverless T-SQL querying over Blob Storage without data movement.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database built to query data held in its own storage; external files generally have to be imported or bulk-loaded first, so it is not the service for querying Blob Storage in place. Option B is wrong because Azure Analysis Services is an OLAP engine that requires data to be loaded into its in-memory tabular model from sources like SQL databases; it does not support direct querying of Blob Storage. Option D is wrong because Azure Data Lake Storage Gen2 is a storage service built on Blob Storage with a hierarchical namespace, but it is not a query engine; it stores data but does not provide SQL query capabilities itself.

171

MCQeasy

A company stores customer information in a table with columns CustomerID, Name, Address, and PhoneNumber. Every row has values for all these columns, and the data follows a fixed schema. Which type of data does this represent?

A.Unstructured data

B.Semi-structured data

C.Structured data

D.Streaming data

AnswerC

Correct. Structured data has a fixed schema with defined columns and data types, typical of relational database tables.

Why this answer

Structured data conforms to a fixed schema where each row has the same columns and data types. The table with CustomerID, Name, Address, and PhoneNumber, where every row contains values for all columns, perfectly fits this definition. This is typical of relational database tables (e.g., in Azure SQL Database) where the schema is enforced at the table level.

Exam trap

The trap here is that candidates may confuse 'semi-structured' with 'structured' because both have some organization, but the key distinction is that structured data enforces a fixed schema for all rows, while semi-structured data allows schema flexibility (e.g., missing attributes or varying data types).

How to eliminate wrong answers

Option A is wrong because unstructured data has no predefined schema or organization (e.g., text files, images, videos), whereas the table has a fixed schema with defined columns. Option B is wrong because semi-structured data has some organizational properties (like tags or key-value pairs) but does not enforce a rigid schema across all records (e.g., JSON or XML files), unlike the fixed schema described. Option D is wrong because streaming data refers to data that is continuously generated and processed in real time (e.g., from IoT devices or event hubs), not to the static storage format of a table.
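As a rough illustration of what "fixed schema" means in practice, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational database. The `NOT NULL` constraints are an assumption added here to mirror "every row has values for all these columns"; the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed schema: the table defines exactly four columns, all required.
conn.execute(
    """CREATE TABLE customers (
           CustomerID INTEGER PRIMARY KEY,
           Name TEXT NOT NULL,
           Address TEXT NOT NULL,
           PhoneNumber TEXT NOT NULL
       )"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '1 Main St', '555-0100')")

try:
    # A row without PhoneNumber does not fit the schema and is refused.
    conn.execute(
        "INSERT INTO customers (CustomerID, Name, Address) VALUES (2, 'Bob', '2 Oak Ave')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rejected, row_count)
```

The database, not the application, decides which rows are acceptable; that table-level enforcement is what separates structured data from the more flexible semi-structured formats.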

172

MCQeasy

A company stores employee records in a relational database table with columns EmployeeID, FirstName, LastName, Department. They also store employee handbooks as PDF files, and customer feedback as XML documents. Which of the following correctly classifies these data types?

A.Employee records: structured, Employee handbooks: semi-structured, Customer feedback: unstructured

B.Employee records: structured, Employee handbooks: unstructured, Customer feedback: semi-structured

C.Employee records: semi-structured, Employee handbooks: unstructured, Customer feedback: structured

D.Employee records: unstructured, Employee handbooks: semi-structured, Customer feedback: structured

AnswerB

Correct classification: employee records in a relational table are structured; PDF handbooks are unstructured; XML feedback is semi-structured due to its hierarchical tags.

Why this answer

Option B is correct because employee records in a relational database table have a fixed schema (columns and data types), making them structured data. Employee handbooks stored as PDF files have no internal schema and are binary blobs, classifying them as unstructured data. Customer feedback stored as XML documents have a flexible, self-describing schema with tags, making them semi-structured data.

Exam trap

The trap here is confusing semi-structured data (which has some organizational properties like tags in XML) with unstructured data (which has no inherent structure), leading candidates to misclassify PDFs as semi-structured or XML as structured.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies employee handbooks (PDF files) as semi-structured; PDFs are unstructured binary files with no inherent schema, not semi-structured like XML or JSON. Option C is wrong because it classifies employee records as semi-structured; relational database tables with a fixed schema are structured, not semi-structured. Option D is wrong on all three counts: it classifies employee records as unstructured, handbooks as semi-structured, and customer feedback as structured, whereas relational tables with defined columns and data types are structured, PDFs are unstructured, and XML documents are semi-structured.

173

MCQeasy

A company wants to provide self-service analytics to business users who need to create reports and dashboards from data in Azure Synapse Analytics. Which tool should you recommend?

A.Power BI

B.Microsoft Excel

C.Azure Synapse Studio

D.Azure Data Studio

AnswerA

Power BI is designed for business users to create interactive reports and dashboards.

Why this answer

Power BI is the correct tool because it is designed specifically for self-service analytics, enabling business users to create interactive reports and dashboards from data stored in Azure Synapse Analytics. Power BI connects directly to Synapse via its built-in connector, allowing users to build visualizations without writing code or relying on IT. This aligns with the requirement for business users to perform ad-hoc analysis and reporting.

Exam trap

The trap here is that candidates may confuse Azure Synapse Studio (a development tool) with a reporting tool, overlooking that Power BI is the designated Microsoft solution for self-service business intelligence and dashboards.

How to eliminate wrong answers

Option B is wrong because Microsoft Excel, while capable of data analysis and charting, is a spreadsheet tool rather than a platform for building and sharing interactive dashboards over large Azure Synapse Analytics datasets. Option C is wrong because Azure Synapse Studio is a development and management interface for data engineers and data scientists to build pipelines, write SQL scripts, and manage Spark jobs, not a self-service reporting tool for business users. Option D is wrong because Azure Data Studio is a lightweight database management tool for querying and developing with SQL Server and Azure SQL, focused on developers and DBAs, not on creating business reports and dashboards.

174

MCQhard

Refer to the exhibit. The JSON shows an Azure Policy definition. Which effect should be used to proactively prevent creation of storage accounts without encryption?

A.AuditIfNotExists

B.Deny

C.Disabled

D.Append

AnswerB

Deny prevents non-compliant resource creation.

Why this answer

The 'Deny' effect is correct because it proactively blocks the creation or update of a storage account that does not meet the encryption requirement, preventing non-compliant resources from being provisioned. This aligns with Azure Policy's ability to enforce compliance at resource creation time, rather than auditing or remediating after the fact.

Exam trap

The trap here is that candidates often confuse 'AuditIfNotExists' with a proactive block, not realizing it only logs non-compliance after the resource is created, whereas 'Deny' is the only effect that prevents creation entirely.

How to eliminate wrong answers

Option A (AuditIfNotExists) is wrong because it only logs a compliance warning when a storage account lacks encryption, but does not prevent its creation; it is a reactive audit effect. Option C (Disabled) is wrong because it turns off the policy entirely, allowing any storage account to be created without encryption. Option D (Append) is wrong because it adds additional fields to a resource during creation or update, but it cannot block a request; it is used to add tags or settings, not to deny non-compliant resources.
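The difference between blocking and merely recording can be pictured with a toy model. This is not Azure Policy code and calls no Azure API: the `evaluate` function, the `encryption_enabled` field, and the result keys are all hypothetical, and audit-style effects are collapsed into one branch for simplicity.

```python
def evaluate(request, effect):
    """Toy model of how an effect treats a create request (not an Azure API)."""
    compliant = request.get("encryption_enabled", False)
    if effect == "Disabled" or compliant:
        # Policy switched off, or nothing to object to.
        return {"created": True, "flagged": False}
    if effect == "Deny":
        # The request is rejected, so the resource never exists.
        return {"created": False, "flagged": True}
    # Audit-style effects let the request through and only record it.
    return {"created": True, "flagged": True}

request = {"name": "storage1", "encryption_enabled": False}
outcomes = {
    effect: evaluate(request, effect)
    for effect in ("Deny", "AuditIfNotExists", "Disabled")
}
print(outcomes)
```

Only the `Deny` branch leaves `created` as `False`, which is the sense in which it is the proactive choice.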

175

MCQmedium

You are designing a data solution for a retail company that needs to store transactional data (orders, payments) with strong consistency and support for complex joins. The data volume is moderate but expected to grow. Which Azure service should you choose?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure SQL Database

D.Azure Synapse Analytics

AnswerC

Azure SQL Database provides full relational capabilities with strong consistency.

Why this answer

Azure SQL Database is a fully managed relational database that provides ACID transactions with strong consistency and supports complex joins via T-SQL. It is ideal for transactional workloads like orders and payments where data integrity and relational queries are critical, and it scales elastically to accommodate growing data volumes.

Exam trap

The trap here is that candidates confuse 'scalability' with 'suitability for transactional workloads' and choose Cosmos DB for its global distribution, overlooking that strong consistency and complex joins are not its core strengths.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database that prioritizes horizontal scaling and low latency; its default consistency level is session rather than strong (strong consistency can be configured, at the cost of higher latency), and it does not natively support complex joins across multiple entities. Option B is wrong because Azure Table Storage is a key-value NoSQL store with no support for joins, foreign keys, or ACID transactions, making it unsuitable for relational transactional data. Option D is wrong because Azure Synapse Analytics is a big data analytics service designed for large-scale data warehousing and complex analytical queries, not for OLTP workloads requiring real-time transactional consistency and frequent small writes.
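To make "complex joins" concrete, here is a small sketch of a relational join over orders and payments. It uses Python's built-in `sqlite3` module as a stand-in, so the SQL dialect is SQLite rather than T-SQL; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        amount INTEGER NOT NULL
    );
    INSERT INTO orders VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO payments VALUES (10, 1, 40), (11, 1, 60), (12, 2, 25);
""")

# Join the two tables on order_id and total the payments per order.
paid_per_order = conn.execute("""
    SELECT o.order_id, o.customer, SUM(p.amount) AS total_paid
    FROM orders AS o
    JOIN payments AS p ON p.order_id = o.order_id
    GROUP BY o.order_id, o.customer
    ORDER BY o.order_id
""").fetchall()
print(paid_per_order)
```

A key-value or document store would typically require the application to fetch both sets of records and combine them itself, which is the practical cost the question is pointing at.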

176

Multi-Selecthard

Which THREE of the following are benefits of using a columnar storage format like Parquet for analytical workloads?

Select 3 answers

A.Enforced referential integrity constraints

B.Reduced I/O when querying a subset of columns

C.Optimized for frequent row updates

D.Better compression ratios due to similar data types in columns

E.Support for predicate pushdown to skip irrelevant data

AnswersB, D, E

Reading fewer columns reduces I/O and speeds up queries.

Why this answer

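Option B is correct because columnar storage formats like Parquet store data by column rather than by row. When a query only needs a subset of columns (e.g., SELECT SUM(sales) FROM table), the storage engine reads only the relevant column chunks from disk, dramatically reducing I/O compared to reading entire rows. This is a key performance advantage for analytical workloads that aggregate or filter on specific columns. Option D is correct because the values in a column share one data type and often repeat, so they compress better than rows of mixed types. Option E is correct because Parquet keeps per-column statistics (such as minimum and maximum values) that let a query engine skip blocks of data that cannot match a filter.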

Exam trap

The trap here is that candidates confuse the benefits of columnar storage (optimized for read-heavy, aggregate queries) with row-oriented storage benefits (optimized for frequent updates and transactional integrity), leading them to incorrectly select Option C.
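The three benefits can be pictured with plain Python lists. This is a toy model, not the Parquet file format or any Parquet library; the table contents, the two-chunk split, and the helper `run_length_encode` are assumptions made for illustration.

```python
# Hypothetical table, first as rows and then pivoted into columns.
rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 20},
    {"region": "east", "product": "c", "sales": 30},
    {"region": "west", "product": "a", "sales": 40},
]
columns = {key: [r[key] for r in rows] for key in rows[0]}

# Reduced I/O: summing sales from rows touches every field of every row,
# while the column layout touches only the "sales" values.
values_read_row_layout = sum(len(r) for r in rows)
values_read_column_layout = len(columns["sales"])
total_sales = sum(columns["sales"])

def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

# Compression: similar values sit next to each other within a column.
region_rle = run_length_encode(columns["region"])

# Predicate pushdown: a per-chunk maximum lets a reader skip whole chunks
# for a filter such as sales >= 25.
chunks = [columns["sales"][:2], columns["sales"][2:]]
chunks_scanned = [c for c in chunks if max(c) >= 25]

print(values_read_row_layout, values_read_column_layout, region_rle, chunks_scanned)
```

The same layout makes single-row updates awkward, since one logical row is spread across every column, which is why option C does not belong on the list.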

177

MCQeasy

A bank processes online fund transfers. Each transaction must ensure that either both the debit from the sender's account and the credit to the receiver's account occur, or if any part fails, the entire transaction is rolled back. Which ACID property does this guarantee?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity guarantees that all operations in a transaction are treated as a single unit. If any operation fails, the entire transaction is rolled back, leaving data unchanged. This directly applies to the bank scenario.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this fund transfer scenario, atomicity guarantees that both the debit and credit operations either complete successfully together or are fully rolled back if any part fails, preventing partial updates that could leave the system in an inconsistent state.

Exam trap

Microsoft often tests atomicity by describing a multi-step operation and asking which ACID property ensures the 'all-or-nothing' behavior. The trap here is that candidates confuse atomicity with consistency, thinking that consistency alone prevents partial updates, when in fact atomicity is the property that enforces the rollback of incomplete transactions.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, triggers), but it does not inherently guarantee the all-or-nothing behavior of the debit and credit pair. Option C (Isolation) is wrong because isolation controls how concurrent transactions are executed to prevent interference (e.g., dirty reads), not the atomic completion of a single transaction's operations. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it does not address the rollback of a failed transaction.

178

MCQeasy

A company maintains a database of customer orders that are updated frequently. They also store aggregated monthly sales reports that are generated once and then only read. Which statement correctly distinguishes these two types of data workloads?

A.Transactional data is optimized for write operations, and analytical data is optimized for read operations.

B.Transactional data must always be stored in non-relational databases, and analytical data in relational databases.

C.Analytical data always requires real-time processing, whereas transactional data is batch-processed.

D.Transactional data is read-only and analytical data is frequently updated.

AnswerA

This is correct. OLTP systems are designed for efficient writes, while OLAP systems are designed for complex reads.

Why this answer

Option A is correct because transactional workloads (like the frequently updated customer orders) are optimized for write-heavy operations, ensuring ACID compliance and data integrity, while analytical workloads (like the read-only monthly sales reports) are optimized for read-heavy operations, often using columnar storage or pre-aggregated data to speed up queries. This distinction aligns with the core difference between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) systems in Azure, such as Azure SQL Database for transactional data and Azure Synapse Analytics for analytical data.

Exam trap

The trap here is that candidates confuse the typical characteristics of OLTP and OLAP, mistakenly thinking analytical data requires real-time processing or that transactional data is read-only, when in fact the opposite is true for each.

How to eliminate wrong answers

Option B is wrong because transactional data can be stored in both relational databases (e.g., Azure SQL Database) and non-relational databases (e.g., Azure Cosmos DB), and analytical data is often stored in relational or specialized columnar stores (e.g., Azure Synapse), not exclusively in one type. Option C is wrong because analytical data typically uses batch processing (e.g., nightly ETL jobs) rather than real-time processing, while transactional data requires real-time or near-real-time processing for individual write operations. Option D is wrong because transactional data is frequently updated (write-heavy), not read-only, and analytical data is typically read-only or updated in bulk during refresh cycles, not frequently updated.

179

MCQmedium

A logistics company collects sensor data from delivery trucks. Each sensor sends a JSON message that includes a fixed set of core fields (truck ID, timestamp) but also includes optional fields such as temperature, humidity, and engine diagnostics depending on the sensor type. The JSON structure varies between messages. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

Correct. Semi-structured data does not have a rigid schema but has some organizational properties (tags, markers) to separate data elements. JSON with optional fields is a classic example.

Why this answer

The JSON messages contain a fixed set of core fields (truck ID, timestamp) but also include optional fields that vary per message, meaning the data has a flexible schema. This mixture of structured fields and variable attributes is the defining characteristic of semi-structured data, which does not require a rigid schema like a relational table but still has organizational properties (e.g., key-value pairs). In Azure, this type of data is commonly stored in services like Azure Cosmos DB or Azure Blob Storage with JSON format.

Exam trap

The trap here is that candidates often mistake any data with a consistent core set of fields as 'structured data', overlooking that the presence of optional, varying fields makes it semi-structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed, predefined schema (e.g., columns in a SQL table) with consistent fields across all records, but the JSON messages here have optional fields that vary. Option C is wrong because unstructured data has no predefined structure or schema (e.g., raw video files, plain text), whereas JSON has a defined key-value format. Option D is wrong because relational data specifically refers to data organized into tables with rows and columns linked by foreign keys, which is not the case for JSON messages with varying fields.
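A short Python sketch shows how the fixed core and the varying optional fields look when the messages are parsed. The sample messages and every field name other than the truck ID and timestamp concept are invented for illustration.

```python
import json

# Hypothetical sensor messages with a shared core and varying extras.
raw_messages = [
    '{"truck_id": "T1", "timestamp": "2024-01-01T08:00:00Z", "temperature": 4.5}',
    '{"truck_id": "T2", "timestamp": "2024-01-01T08:00:05Z", "humidity": 61, "engine": {"rpm": 1800}}',
    '{"truck_id": "T3", "timestamp": "2024-01-01T08:00:09Z"}',
]
messages = [json.loads(m) for m in raw_messages]

key_sets = [set(m) for m in messages]
# Fields present in every message form the fixed core.
core_fields = set.intersection(*key_sets)
# Fields present in only some messages are the optional part.
optional_fields = set.union(*key_sets) - core_fields

# Optional fields are read defensively because they may be absent.
temperatures = [m.get("temperature") for m in messages]

print(sorted(core_fields), sorted(optional_fields), temperatures)
```

Each message still describes itself through its keys, which is what keeps the data semi-structured rather than unstructured even though no two messages need to match.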

180

MCQhard

A global e-commerce platform uses a combination of relational and NoSQL databases. The order management system requires ACID transactions across multiple tables (Orders, OrderItems, Inventory). The product catalog uses a flexible schema to accommodate varying product attributes and is read-heavy. The session store requires low-latency key-value lookups with eventual consistency. Which of the following pairings of data stores best matches these requirements?

A.Order management: Azure Cosmos DB (NoSQL API) - Product catalog: Azure SQL Database - Session store: Azure Table Storage

B.Order management: Azure SQL Database - Product catalog: Azure Cosmos DB (NoSQL API) - Session store: Azure Cache for Redis

C.Order management: Azure Table Storage - Product catalog: Azure SQL Database - Session store: Azure Cosmos DB (NoSQL API)

D.Order management: Azure Cosmos DB (Table API) - Product catalog: Azure Cache for Redis - Session store: Azure SQL Database

AnswerB

Azure SQL Database provides strong ACID transactions for orders. Cosmos DB with NoSQL API offers flexible schema and low-latency reads for the product catalog. Azure Cache for Redis delivers sub-millisecond key-value lookups ideal for session state with eventual consistency.

Why this answer

Option B is correct because Azure SQL Database provides full ACID transaction support across multiple tables, making it ideal for order management. Azure Cosmos DB (NoSQL API) offers a flexible schema and high read throughput for the product catalog. Azure Cache for Redis delivers sub-millisecond key-value lookups with eventual consistency, perfect for session storage.

Exam trap

The trap here is that candidates often assume NoSQL databases like Cosmos DB can handle ACID transactions across multiple tables, but in reality, Cosmos DB scopes transactions to a single logical partition within one container (for example, through stored procedures or transactional batch), not across separate containers or tables.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB (NoSQL API) does not support ACID transactions across separate containers; its transactions are limited to a single logical partition. Option C is wrong because Azure Table Storage lacks ACID transaction support across multiple tables, and Azure SQL Database is not optimized for flexible-schema, read-heavy product catalogs. Option D is wrong because Azure Cosmos DB (Table API) also lacks multi-table ACID transactions, Azure Cache for Redis is not designed for persistent, flexible-schema catalog storage, and Azure SQL Database is not suitable for low-latency key-value session stores with eventual consistency.

181

MCQmedium

A data analyst needs to create interactive dashboards that display real-time data from Azure SQL Database. Which Microsoft tool should they use?

A.Microsoft Excel

B.Microsoft Copilot

C.Azure Data Studio

D.Power BI

AnswerD

Business analytics tool for real-time dashboards.

Why this answer

Power BI is the correct tool because it is designed specifically for creating interactive dashboards and reports, and it supports real-time data connectivity to Azure SQL Database through DirectQuery or streaming datasets. This allows the data analyst to visualize live data without manual refreshes, meeting the requirement for real-time dashboards.

Exam trap

The trap here is that candidates may confuse Azure Data Studio (a database management tool) with a visualization tool, or assume Microsoft Excel is sufficient for real-time dashboards, when Power BI is the only option that natively supports interactive, real-time visualizations with Azure SQL Database.

How to eliminate wrong answers

Option A is wrong because Microsoft Excel is a spreadsheet application that can connect to Azure SQL Database but lacks native support for real-time interactive dashboards; it requires manual data refresh or Power Query, and its visualization capabilities are limited compared to dedicated BI tools. Option B is wrong because Microsoft Copilot is an AI assistant integrated into various Microsoft products (like Power BI or Azure) to help generate content or code, but it is not a standalone tool for creating dashboards or connecting to live data sources. Option C is wrong because Azure Data Studio is a cross-platform database management and query tool for Azure SQL Database, primarily used for writing T-SQL queries, managing databases, and developing scripts; it does not provide dashboard or real-time visualization capabilities.

182

MCQeasy

A data engineer is classifying data types collected from three sources for a data lake. Source 1: Customer records from a SQL database exported as CSV files with fixed columns (CustomerID, Name, Address). Source 2: Product reviews obtained via API as JSON documents with varying fields (e.g., some reviews include 'rating' and 'verified_purchase', others include 'comment'). Source 3: Scanned handwritten order forms saved as TIFF images. Which statement correctly categorizes these data by structure?

A.Source 1: Structured; Source 2: Semi-structured; Source 3: Unstructured

B.Source 1: Structured; Source 2: Structured; Source 3: Unstructured

C.Source 1: Semi-structured; Source 2: Structured; Source 3: Unstructured

D.Source 1: Structured; Source 2: Unstructured; Source 3: Semi-structured

AnswerA

Correct. CSV files with fixed columns are structured. JSON with varying fields is semi-structured. TIFF images are unstructured.

Why this answer

Option A is correct because Source 1 (CSV from SQL) has a fixed schema with defined columns, making it structured data. Source 2 (JSON from API) allows varying fields per document, which is the hallmark of semi-structured data. Source 3 (TIFF images) contains no inherent schema or machine-readable structure, classifying it as unstructured data.

Exam trap

The trap here is that candidates confuse CSV files (which are structured when they have a fixed schema) with semi-structured data, or assume JSON is always structured because it has key-value pairs, ignoring that varying fields make it semi-structured.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies Source 2 (JSON with varying fields) as structured, ignoring that JSON documents with optional or varying fields do not enforce a rigid schema like a SQL table. Option C is wrong because it mislabels Source 1 (CSV with fixed columns) as semi-structured, whereas CSV with a consistent schema is structured, and it also mislabels Source 2 as structured instead of semi-structured. Option D is wrong because it classifies Source 2 (JSON) as unstructured, but JSON has key-value pairs and a defined format, making it semi-structured, and it mislabels Source 3 (TIFF images) as semi-structured, but images lack any inherent data structure.

183

MCQeasy

A company stores customer data in a SQL table with fixed columns (CustomerID, Name, Email, SignupDate). They also store product images as JPEG files and application logs as JSON documents. Which of the following correctly classifies each data type?

A.SQL table: structured, JPEG: unstructured, JSON: semi-structured

B.SQL table: structured, JPEG: semi-structured, JSON: unstructured

C.SQL table: semi-structured, JPEG: unstructured, JSON: structured

D.SQL table: unstructured, JPEG: structured, JSON: semi-structured

AnswerA

Correct. SQL tables have a fixed schema (structured), JPEG files have no inherent schema (unstructured), and JSON documents have a flexible schema (semi-structured).

Why this answer

Option A is correct because a SQL table with fixed columns enforces a rigid schema, making it structured data. JPEG files are binary blobs with no internal schema, classifying them as unstructured. JSON documents use key-value pairs with flexible schemas, which is the definition of semi-structured data.

Exam trap

The trap here is confusing semi-structured data (like JSON) with unstructured data (like images), or assuming that any file format with a standard (like JPEG) is semi-structured, when in fact JPEG is purely binary and unstructured.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies JPEG as semi-structured (JPEG is binary and lacks schema) and JSON as unstructured (JSON has a flexible schema, making it semi-structured). Option C is wrong because it classifies the SQL table as semi-structured (SQL tables with fixed columns are structured, not semi-structured) and JSON as structured (JSON is semi-structured, not rigidly structured). Option D is wrong because it classifies the SQL table as unstructured (SQL tables are highly structured) and JPEG as structured (JPEG files have no schema).

184

MCQeasy

A hospital stores patient records. Each record includes a PatientID (integer), Name (text), DateOfBirth (date), and MRI scan images (binary files). Which classification best describes the MRI scan images?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Streaming data

AnswerC

Unstructured data has no predefined data model. Binary files like images, videos, and audio files fall into this category.

Why this answer

MRI scan images are binary files that lack a predefined data model or schema, making them unstructured data. Unlike structured data (e.g., rows in a SQL table) or semi-structured data (e.g., JSON with tags), binary image files cannot be easily queried or organized using traditional relational database tools without additional processing.

Exam trap

Microsoft often tests the misconception that any data stored in a database (e.g., as a BLOB) is structured, but the classification depends on the data's internal format, not its storage location.

How to eliminate wrong answers

Option A is wrong because structured data requires a fixed schema with rows and columns, such as a PatientID integer in a relational table, which does not apply to binary image files. Option B is wrong because semi-structured data has organizational properties like tags or key-value pairs (e.g., JSON or XML), whereas MRI images are raw binary blobs without inherent metadata structure. Option D is wrong because streaming data refers to continuous data flows from sources like IoT sensors or log streams, not static binary files stored in a database.
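
The point that storage location does not change the classification can be seen in a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a relational database. The table layout, patient values, and placeholder bytes are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients ("
    "patient_id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT, mri_scan BLOB)"
)

# Placeholder bytes standing in for a real MRI image file.
fake_scan = bytes([0x00, 0x01, 0x02, 0x03]) * 4
conn.execute(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    (1, "Ada Example", "1980-05-17", fake_scan),
)

# Structured columns can be filtered and compared directly.
row = conn.execute(
    "SELECT name FROM patients WHERE date_of_birth < '1990-01-01'"
).fetchone()
print(row[0])

# The BLOB comes back as opaque bytes; SQL can report its size, not its content.
scan, size = conn.execute(
    "SELECT mri_scan, length(mri_scan) FROM patients WHERE patient_id = 1"
).fetchone()
print(type(scan).__name__, size)
```

The image sits in a table column, yet the database can only hand the bytes back; interpreting them still requires image-processing code outside SQL.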

185

MCQeasy

Your organization has a large dataset of customer transactions stored in Azure Blob Storage as CSV files. You need to run ad-hoc SQL queries on this data without loading it into a database. Which Azure service should you use?

A.Azure Data Factory

B.Azure SQL Database

C.Azure Synapse Serverless SQL pool

D.Azure Analysis Services

AnswerC

Serverless SQL pool queries data directly from Blob Storage using T-SQL, with no loading required.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from files in Azure Blob Storage using standard T-SQL syntax, without needing to load or move the data into a database. It uses a pay-per-query model and supports CSV, Parquet, and JSON formats, making it ideal for ad-hoc analytical queries over large datasets stored in data lakes.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a data movement/orchestration tool) with a query engine, or assume that because the query language is SQL, Azure SQL Database must be the answer, overlooking the requirement to query the files in place without loading them.

How to eliminate wrong answers

Option A is wrong because Azure Data Factory is an ETL and data orchestration service, not a SQL query engine; it cannot run ad-hoc SQL queries directly against files. Option B is wrong because Azure SQL Database requires data to be loaded into its relational storage before querying, which contradicts the requirement to query without loading. Option D is wrong because Azure Analysis Services is an OLAP engine for semantic models and multidimensional analysis, not designed for direct SQL queries over raw CSV files in Blob Storage.
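
The in-place querying described above is typically written with the T-SQL `OPENROWSET` function. The sketch below only composes such a query as a string; the helper name `build_openrowset_query`, the storage account, container, and file name are hypothetical, and the query would still need to be run against a serverless SQL endpoint with suitable access to the storage account.

```python
def build_openrowset_query(file_url, top=10):
    """Compose a T-SQL query that reads a CSV file in place (no load step)."""
    if "'" in file_url:
        raise ValueError("file_url must not contain single quotes")
    return (
        f"SELECT TOP {int(top)} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{file_url}',\n"
        "    FORMAT = 'CSV',\n"
        "    PARSER_VERSION = '2.0',\n"
        "    HEADER_ROW = TRUE\n"
        ") AS [result];"
    )


# Hypothetical storage account, container, and file name.
url = "https://examplestorage.blob.core.windows.net/sales/transactions.csv"
print(build_openrowset_query(url))
```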

186

MCQmedium

You need to choose a data storage solution for a global e-commerce platform that requires single-digit millisecond read and write latencies across multiple regions. The data is semi-structured and includes user profiles and product catalogs. Which Azure service should you use?

A.Azure Redis Cache

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure SQL Database

AnswerB

Azure Cosmos DB offers global distribution and low latency for semi-structured data.

Why this answer

Azure Cosmos DB is the correct choice because it is a globally distributed, multi-model database service that guarantees single-digit millisecond read and write latencies at the 99th percentile, regardless of the number of regions. It supports semi-structured data natively through its document (JSON) API, making it ideal for user profiles and product catalogs that require low-latency access across multiple geographic regions.

Exam trap

The trap here is that candidates often confuse Azure Redis Cache's in-memory speed with the need for persistent, globally distributed storage, overlooking that Redis Cache is not designed for durable, multi-region data storage with consistency guarantees.

How to eliminate wrong answers

Option A is wrong because Azure Redis Cache is an in-memory data store designed primarily for caching and session state, not for persistent, globally distributed storage of semi-structured data with multi-region write capabilities. Option C is wrong because Azure Table Storage is a NoSQL key-value store that accepts writes in a single primary region (geo-redundant replication adds at most a read-only secondary) and does not provide guaranteed single-digit millisecond latencies across multiple regions. Option D is wrong because Azure SQL Database is a relational database that requires a fixed schema, making it less suitable for semi-structured data, and its global replication options (e.g., failover groups) do not guarantee single-digit millisecond latencies for writes across multiple regions.

187

MCQeasy

You need to query data stored in Azure Cosmos DB for NoSQL using SQL-like syntax. Which feature should you use?

A.Use Azure SQL Database elastic query

B.Use Power BI DirectQuery

C.Use the SQL API built into Cosmos DB

D.Use Azure Synapse Analytics Serverless SQL pool

AnswerC

Cosmos DB's SQL API allows querying JSON documents with SQL-like syntax.

Why this answer

Azure Cosmos DB for NoSQL provides a native SQL API that allows you to query JSON documents using SQL-like syntax. Queries written in this syntax are executed by Cosmos DB's own query engine, so you can SELECT, filter, and project data directly from containers without any additional services or connectors.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics Serverless SQL pool (which can also query Cosmos DB) with the native Cosmos DB SQL API, but the question specifically asks for the feature built into Cosmos DB for NoSQL, not an external query service.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database elastic query is used to query data across multiple Azure SQL databases, not for querying Cosmos DB NoSQL data. Option B is wrong because Power BI DirectQuery is a connection mode for real-time analytics from Power BI, not a feature for directly querying Cosmos DB with SQL-like syntax. Option D is wrong because Azure Synapse Analytics Serverless SQL pool can query Cosmos DB via the Synapse Link feature, but it is not the built-in SQL API of Cosmos DB itself and requires additional configuration.
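
To give a feel for the SQL-like syntax, here is a minimal sketch that builds a parameterized query in the query-plus-parameters shape that Cosmos DB query requests use. The helper name `build_cosmos_query`, the container alias `c`, and the property names (`category`, `price`) are assumptions made for illustration; the sketch only constructs the request body and does not connect to any account.

```python
import json


def build_cosmos_query(category, min_price):
    """Build a parameterized query body for the Cosmos DB NoSQL query language."""
    return {
        "query": (
            "SELECT c.id, c.name, c.price "
            "FROM c "
            "WHERE c.category = @category AND c.price >= @minPrice"
        ),
        # Parameters keep user-supplied values out of the query text.
        "parameters": [
            {"name": "@category", "value": category},
            {"name": "@minPrice", "value": min_price},
        ],
    }


print(json.dumps(build_cosmos_query("bikes", 100), indent=2))
```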

188

MCQhard

Refer to the exhibit. You are configuring a custom role in Azure RBAC for a team that needs to read and list blobs in a storage account. The JSON snippet shows the permissions assigned. After assigning this role to a user, they report they cannot see the storage account in the Azure portal. What is the most likely cause?

A.The dataActions should be actions instead of dataActions.

B.The role does not include read permission on the storage account resource.

C.The role is not assigned at the subscription scope.

D.The user needs the Contributor role to view the storage account.

AnswerB

Missing Microsoft.Storage/storageAccounts/read action prevents seeing the account.

Why this answer

Option B is correct because the role includes only dataActions for blob read and lacks read access to the storage account resource itself (e.g., Microsoft.Storage/storageAccounts/read). To see the storage account in the portal, the user needs read permission on the resource. Option C is wrong because the role can be assigned at the storage account scope; subscription scope is not required.

Option D is wrong because viewing a resource in the portal does not require the Contributor role; read permission, such as that granted by the Reader role, suffices. Option A is wrong because dataActions is the correct property for blob-level (data plane) permissions.
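
The exhibit itself is not reproduced here, so the sketch below uses a hypothetical role definition of the same kind: data-plane permissions only, with no control-plane read on the account. The role name, the `<subscription-id>` placeholder, and the helper functions are assumptions, and the check deliberately ignores wildcard actions, which real Azure RBAC evaluation supports.

```python
# Hypothetical custom role: blob data access only, no read on the account resource.
blob_reader_role = {
    "Name": "Custom Blob Reader (example)",
    "IsCustom": True,
    "Description": "Read and list blobs.",
    "Actions": [],
    "NotActions": [],
    "DataActions": [
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"
    ],
    "NotDataActions": [],
    "AssignableScopes": ["/subscriptions/<subscription-id>"],
}

ACCOUNT_READ = "Microsoft.Storage/storageAccounts/read"


def can_view_account(role):
    """True if the control-plane Actions list includes storage account read.

    Simplified: exact string match only, wildcards are not evaluated.
    """
    return ACCOUNT_READ in role["Actions"]


def with_account_read(role):
    """Return a copy of the role with storage account read added to Actions."""
    fixed = dict(role)
    fixed["Actions"] = role["Actions"] + [ACCOUNT_READ]
    return fixed


print(can_view_account(blob_reader_role))                     # False
print(can_view_account(with_account_read(blob_reader_role)))  # True
```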

189

MCQeasy

Your company is implementing a data governance solution using Microsoft Purview. The data catalog must automatically scan and classify sensitive data in Azure SQL Database, Azure Synapse Analytics, and Amazon S3. The company uses Microsoft Entra ID for identity management. You need to ensure that the Purview managed identity can authenticate to these data sources. Which authentication method should you configure for the Amazon S3 connection?

A.AWS IAM authentication

B.SQL Authentication

C.Windows Authentication

D.Microsoft Entra ID authentication

AnswerA

Amazon S3 uses AWS IAM for authentication. The Purview managed identity must be granted access via an IAM role.

Why this answer

Amazon S3 is an external cloud storage service that does not support Microsoft Entra ID, SQL Authentication, or Windows Authentication. To authenticate Purview's managed identity to S3, you must configure AWS IAM authentication, which allows Purview to assume an IAM role with permissions to read the S3 bucket metadata and data for scanning and classification.

Exam trap

The trap here is that candidates may assume Microsoft Entra ID authentication works for all data sources because the question mentions Entra ID for identity management, but Amazon S3 is an AWS service that requires AWS IAM, not Microsoft's identity system.

How to eliminate wrong answers

Option B (SQL Authentication) is wrong because SQL Authentication is used for Azure SQL Database and Azure Synapse Analytics, not for Amazon S3, which is a non-relational object store. Option C (Windows Authentication) is wrong because Windows Authentication is only applicable to on-premises SQL Server or Azure services integrated with Active Directory, not to AWS S3. Option D (Microsoft Entra ID authentication) is wrong because Amazon S3 does not support Microsoft Entra ID; it uses AWS IAM for identity and access management.

190

MCQeasy

A bank's online transaction processing system records every withdrawal and deposit in a database. The bank also runs a monthly report that summarizes total transactions per customer. Which statement correctly identifies these two workloads?

A.Both workloads are OLTP.

B.The transaction recording is OLTP, and the monthly report is OLAP.

C.The transaction recording is OLAP, and the monthly report is OLTP.

D.Both workloads are OLAP.

AnswerB

Correct. OLTP handles the real-time transaction recording, while OLAP analyzes the aggregated historical data for reporting.

Why this answer

The transaction recording system is an OLTP (Online Transaction Processing) workload because it handles individual, real-time transactions (withdrawals and deposits) with high concurrency and low latency. The monthly report summarizing total transactions per customer is an OLAP (Online Analytical Processing) workload because it aggregates historical data for reporting and analysis, typically using batch processing or columnar storage. Option B correctly pairs each workload with its appropriate processing type.

Exam trap

The trap here is that candidates confuse the purpose of the workload—thinking that any database operation is OLTP—and fail to recognize that analytical reporting, even if run on the same database, is an OLAP workload due to its aggregate nature and different performance requirements.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies both workloads as OLTP, ignoring that the monthly report involves aggregation and analysis, not real-time transaction processing. Option C is wrong because it reverses the roles, claiming transaction recording is OLAP (which is for analytical queries on large datasets) and the monthly report is OLTP (which is for transactional operations). Option D is wrong because it classifies both as OLAP, failing to recognize that the transaction recording system requires immediate, atomic writes characteristic of OLTP.
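
The two workload shapes can be contrasted in a small sketch with Python's built-in `sqlite3` module. The table, customer names, and amounts are hypothetical, and a single in-memory database stands in for what would normally be separate transactional and analytical systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, customer TEXT, kind TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each committed as its own transaction.
for customer, kind, amount in [
    ("alice", "deposit", 200.0),
    ("bob", "deposit", 50.0),
    ("alice", "withdrawal", 80.0),
]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO transactions (customer, kind, amount) VALUES (?, ?, ?)",
            (customer, kind, amount),
        )

# OLAP-style work: one read-only query that aggregates over the history.
report = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) FROM transactions "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(report)  # [('alice', 2, 280.0), ('bob', 1, 50.0)]
```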

191

MCQeasy

A retail company stores product inventory data in a SQL database, customer reviews as JSON files, and product images as JPEG files. Which of the following accurately describes the types of data stored?

A.Only structured data is stored because the SQL database contains the primary records.

B.Only semi-structured and unstructured data is stored because JSON and images are not purely structured.

C.Only unstructured data is stored because images have no predefined schema.

D.Structured, semi-structured, and unstructured data are stored.

AnswerD

Correct. The SQL database contains structured data (rows and columns), JSON files contain semi-structured data (key-value pairs with some schema flexibility), and JPEG files contain unstructured data (no inherent structure). All three categories are represented.

Why this answer

The company stores product inventory data in a SQL database, which enforces a fixed schema (tables, rows, columns) and is therefore structured data. Customer reviews stored as JSON files are semi-structured because they have a flexible schema (key-value pairs) but no rigid table structure. Product images as JPEG files are unstructured because they lack any predefined schema or organization.

Option D correctly identifies that all three data types are present.

Exam trap

The trap here is that candidates often assume 'data type' is determined by the storage medium (e.g., SQL = structured only) rather than recognizing that a single system can store multiple data types, leading them to overlook the presence of semi-structured and unstructured data.

How to eliminate wrong answers

Option A is wrong because it incorrectly claims only structured data is stored, ignoring the JSON files (semi-structured) and JPEG images (unstructured). Option B is wrong because it omits the structured data from the SQL database, which is clearly structured. Option C is wrong because it states only unstructured data is stored, ignoring the structured SQL data and semi-structured JSON data.

192

MCQeasy

A social media platform stores user posts as JSON documents. Each document contains text content, image URLs, timestamps, and user tags. The structure is consistent for most fields, but users can add custom key-value pairs. How should this data be classified?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

Semi-structured data has some organizational properties (like tags or key-value pairs) but does not require a fixed schema. JSON documents with optional fields fit this category.

Why this answer

The data is semi-structured because it has a consistent schema for most fields (text, image URLs, timestamps, user tags) but allows custom key-value pairs, which introduces schema flexibility. JSON documents inherently support this mix of fixed and variable attributes, fitting the semi-structured data classification. This aligns with Azure Cosmos DB's handling of JSON items, where each document can have a different set of properties.

Exam trap

Microsoft often tests the misconception that any data with a consistent field is structured, but the presence of optional custom key-value pairs makes it semi-structured, not structured.

How to eliminate wrong answers

Option A is wrong because structured data requires a rigid schema with fixed columns and data types (e.g., a SQL table), but JSON documents with optional custom fields violate that strict schema. Option C is wrong because unstructured data has no predefined structure or organization (e.g., raw text files, images, videos), whereas JSON documents have a defined format with keys and values. Option D is wrong because relational data specifically refers to data organized into tables with rows and columns linked by foreign keys, which JSON documents do not enforce.
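
A minimal sketch of what "consistent for most fields, plus custom key-value pairs" looks like in practice. The two sample posts and their custom keys (`location`, `mood`) are hypothetical.

```python
import json

posts = [
    json.loads('{"text": "Hello", "image_url": "https://example.com/a.jpg", '
               '"timestamp": "2024-01-01T10:00:00Z", "tags": ["intro"]}'),
    json.loads('{"text": "Trip", "image_url": "https://example.com/b.jpg", '
               '"timestamp": "2024-01-02T08:30:00Z", "tags": ["travel"], '
               '"location": "Lisbon", "mood": "happy"}'),
]

# Keys present in every document behave like a shared core schema.
common_keys = set.intersection(*(set(p) for p in posts))
# Keys found in only some documents are the user-defined extras.
custom_keys = set.union(*(set(p) for p in posts)) - common_keys

print(sorted(common_keys))  # ['image_url', 'tags', 'text', 'timestamp']
print(sorted(custom_keys))  # ['location', 'mood']
```

Both documents parse without any schema change, which is exactly what a fixed-column table would not allow.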

193

MCQeasy

A company stores customer names and addresses in a relational table, product descriptions as JSON files, and product images as JPEG files. Which of the following correctly classifies these data types from most structured to least structured?

A.Structured (customer table), Semi-structured (JSON), Unstructured (JPEG)

B.Structured (customer table), Unstructured (JSON), Semi-structured (JPEG)

C.Semi-structured (customer table), Structured (JSON), Unstructured (JPEG)

D.Unstructured (customer table), Structured (JSON), Semi-structured (JPEG)

AnswerA

This is correct because a table with fixed columns is structured, JSON allows varying fields (semi-structured), and JPEG images have no internal structure (unstructured).

Why this answer

A is correct because structured data (customer table) has a fixed schema with rows and columns, semi-structured data (JSON) uses tags or key-value pairs without a rigid schema, and unstructured data (JPEG) has no predefined structure. The question tests the standard classification hierarchy from most to least structured.

Exam trap

The trap here is confusing semi-structured (JSON) with unstructured (JPEG) because both lack a rigid schema, but JSON has a logical structure (key-value pairs) while JPEG is raw binary data.

How to eliminate wrong answers

Option B is wrong because JSON is semi-structured (not unstructured) as it has a flexible schema with key-value pairs, and JPEG is unstructured (not semi-structured) as it lacks any schema or metadata hierarchy. Option C is wrong because a relational table is structured (not semi-structured) with a fixed schema, and JSON is semi-structured (not structured) as it does not enforce a rigid schema. Option D is wrong because a relational table is structured (not unstructured), JSON is semi-structured (not structured), and JPEG is unstructured (not semi-structured).

194

Multi-Selecteasy

Which TWO are advantages of using a NoSQL database like Azure Cosmos DB over a relational database like Azure SQL Database?

Select 2 answers

A.Support for complex joins

B.ACID transactions across multiple records

C.Flexible schema design

D.Horizontal scaling across multiple regions

E.Enforced referential integrity

AnswersC, D

NoSQL allows schema-less data models.

Why this answer

Option C is correct because NoSQL databases like Azure Cosmos DB are schema-agnostic, allowing each document or item to have a different structure without requiring migrations or predefined table schemas. This flexibility is ideal for rapidly evolving applications or when ingesting heterogeneous data, whereas Azure SQL Database enforces a rigid schema that must be defined and altered explicitly. Option D is also correct because Cosmos DB scales out horizontally by partitioning data and can replicate it across multiple Azure regions.

Exam trap

The trap here is that candidates confuse the ACID support in NoSQL databases (which is limited to single-document operations) with the full multi-record ACID transactions of relational databases, leading them to incorrectly select Option B.

195

Multi-Selectmedium

Which THREE of the following are benefits of using Azure Data Lake Storage Gen2?

Select 3 answers

A.Cost-effective for storing petabytes of data

B.POSIX-compliant access control lists

C.Built-in NoSQL document store

D.Integrated data transformation engine

E.Hierarchical namespace for directory-level operations

AnswersA, B, E

Optimized for large-scale data lakes.

Why this answer

Azure Data Lake Storage Gen2 is cost-effective for storing petabytes of data (Option A) because it is built on Azure Blob Storage and decouples storage from compute, so large volumes can be retained at object-storage prices. It also supports POSIX-style access control lists on directories and files (Option B), and its hierarchical namespace (Option E) makes directory-level operations such as renames and deletes efficient. Together these make it well suited to big data analytics workloads that require large-scale data retention without high cost.

Exam trap

Microsoft often tests the misconception that Azure Data Lake Storage Gen2 includes built-in data processing capabilities, but it is purely a storage layer, while transformation engines are separate services.
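
Why a hierarchical namespace helps with directory-level operations can be sketched with two toy models. Both functions and the file names are hypothetical simplifications: the flat model treats a directory as a name prefix, while the hierarchical model treats it as a real node. Neither reflects the actual storage implementation.

```python
def rename_dir_flat(blobs, old_prefix, new_prefix):
    """Flat namespace: a 'directory' is only a name prefix, so every blob moves."""
    renamed, operations = {}, 0
    for path, data in blobs.items():
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
            operations += 1  # one copy-and-delete per blob
        renamed[path] = data
    return renamed, operations


def rename_dir_hierarchical(tree, old_name, new_name):
    """Hierarchical namespace: the directory is a real node, renamed once."""
    tree[new_name] = tree.pop(old_name)
    return 1  # a single metadata operation


flat = {f"raw/2024/file{i}.parquet": b"" for i in range(1000)}
flat, flat_ops = rename_dir_flat(flat, "raw/", "curated/")

tree = {"raw": {"2024": {f"file{i}.parquet": b"" for i in range(1000)}}}
tree_ops = rename_dir_hierarchical(tree, "raw", "curated")

print(flat_ops, tree_ops)  # 1000 1
```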

196

Multi-Selecteasy

Which TWO are benefits of using a NoSQL database like Azure Cosmos DB? (Choose two.)

Select 2 answers

A.Enforcing referential integrity

B.Support for complex joins

C.Horizontal scalability

D.Schema flexibility

E.Full ACID transactions across multiple documents

AnswersC, D

NoSQL databases are designed to scale out across many nodes.

Why this answer

Azure Cosmos DB is a NoSQL database designed for horizontal scalability (Option C), which means it can distribute data across multiple servers and regions to handle massive workloads and low-latency access. This is achieved through partitioning and replication, allowing throughput and storage to grow by adding more physical partitions. Schema flexibility (Option D) is the second benefit: items in the same container do not need to share the same set of properties.

Exam trap

Microsoft often tests the misconception that NoSQL databases support full ACID transactions across multiple documents like relational databases, but in Cosmos DB, multi-document transactions are limited to the same logical partition and are not fully ACID across partitions.
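
The idea of scaling out through partitioning can be sketched as hashing a partition key to pick a partition. This is an illustrative scheme only: the function `assign_partition`, the key format, and the use of SHA-256 with a modulo are assumptions, not the algorithm Cosmos DB uses internally. A plain modulo also reshuffles most keys when the partition count changes, which production systems avoid.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key, partition_count):
    """Map a partition key to a partition by hashing it (illustrative scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


user_ids = [f"user-{i}" for i in range(10_000)]

# Raising the partition count spreads the same keys over more partitions.
for count in (4, 8):
    load = Counter(assign_partition(uid, count) for uid in user_ids)
    print(count, sorted(load.items()))
```

Because every item with the same key lands on the same partition, single-partition operations stay local, which is also why multi-document transactions are scoped to one logical partition.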

197

Multi-Selecthard

Which THREE factors should you consider when choosing between Azure SQL Database and Azure Cosmos DB for a new application?

Select 3 answers

A.Need for multi-region writes

B.Schema flexibility requirements

C.Transaction consistency requirements

D.Support for JSON data

E.Support for T-SQL queries

AnswersA, B, C

Cosmos DB supports multi-master replication; SQL Database requires failover groups for multi-region.

Why this answer

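Option A is correct because Azure Cosmos DB supports multi-region writes with its multi-master replication, enabling low-latency writes across multiple geographic regions, which is a key differentiator from Azure SQL Database that only supports a single write region. This is critical for globally distributed applications requiring high availability and local write performance. Options B and C are also correct: Cosmos DB allows items with varying properties while SQL Database enforces a defined schema, and SQL Database provides ACID transactions while Cosmos DB offers tunable consistency levels.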

Exam trap

The trap here is that candidates may assume JSON support or T-SQL compatibility are deciding factors, but both databases handle JSON (though differently) and T-SQL is exclusive to SQL Database, so the real differentiators are multi-region writes, schema flexibility, and transaction consistency guarantees (e.g., ACID in SQL Database vs. tunable consistency levels in Cosmos DB).

198

MCQmedium

A retail company uploads daily sales data from all stores to Azure Blob Storage at midnight. They then run a series of data transformations using Azure Data Factory on a scheduled trigger at 2:00 AM. This processing pattern is best described as:

A.Batch processing

B.Stream processing

C.Transactional processing

D.Interactive query

AnswerA

Correct. Data is collected over time and processed in bulk on a schedule, which is the definition of batch processing.

Why this answer

This pattern is batch processing because the sales data is collected in Azure Blob Storage over a period (daily) and then processed as a group at a scheduled time (2:00 AM) using Azure Data Factory. Batch processing is designed for high-volume, periodic data loads where latency is acceptable, and the transformation job runs on a complete dataset rather than individual records.

Exam trap

The trap here is that candidates confuse scheduled data movement with stream processing, but the key differentiator is the time delay and the processing of a complete dataset in one job rather than individual events as they occur.

How to eliminate wrong answers

Option B is wrong because stream processing handles data in real-time or near-real-time as it arrives (e.g., using Azure Stream Analytics or Event Hubs), not on a scheduled trigger with a 2-hour delay. Option C is wrong because transactional processing (OLTP) focuses on individual, atomic transactions with ACID guarantees (e.g., Azure SQL Database), not bulk transformations of daily files. Option D is wrong because interactive query implies ad-hoc, user-driven exploration (e.g., using Azure Synapse Serverless SQL or Azure Data Explorer), not a scheduled, automated transformation pipeline.

199

Matchingmedium

Match each Azure Cosmos DB API to its supported data model.

Concepts

Matches

Document (JSON)

Document (BSON)

Column-family

Graph

Key-value

Why these pairings

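Azure Cosmos DB supports multiple data models via APIs: the API for NoSQL stores JSON documents, the API for MongoDB stores BSON documents, the API for Apache Cassandra uses a column-family model, the API for Apache Gremlin uses a graph model, and the API for Table uses a key-value model.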

200

Multi-Selectmedium

Which THREE Azure services can be used to move data from on-premises SQL Server to Azure?

Select 3 answers

A.Azure Database Migration Service

B.Azure Data Factory

C.Azure Analysis Services

D.Azure Synapse Serverless SQL pool

E.Azure Data Box

AnswersA, B, E

DMS is specifically designed for migrating databases to Azure with minimal downtime.

Why this answer

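Azure Database Migration Service (DMS) is correct because it is specifically designed for migrating databases from on-premises SQL Server to Azure SQL Database or SQL Managed Instance with minimal downtime, using the Data Migration Assistant (DMA) for assessment and a self-hosted integration runtime for data movement. Azure Data Factory (Option B) can copy data from on-premises SQL Server through a self-hosted integration runtime, and Azure Data Box (Option E) transfers large volumes offline by shipping a physical storage device to an Azure datacenter.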

Exam trap

The trap here is that candidates may confuse Azure Analysis Services (a BI modeling tool) or Azure Synapse Serverless SQL pool (a query-only service) with data migration tools, when only services that actively move or copy data from on-premises to Azure are correct.

201

MCQeasy

A company stores customer data in a relational table with fixed columns: CustomerID (integer), FirstName (string), LastName (string), Email (string). They also store product images as JPEG files in Azure Blob Storage, and customer feedback as JSON documents where each document may contain fields such as rating, comment, and optional metadata. Which of the following correctly classifies these data types?

A.Relational table – structured, JPEG – unstructured, JSON – semi-structured

B.Relational table – structured, JPEG – semi-structured, JSON – unstructured

C.Relational table – semi-structured, JPEG – unstructured, JSON – structured

D.Relational table – unstructured, JPEG – structured, JSON – semi-structured

AnswerA

Correct. Relational tables enforce a fixed schema (structured), JSON allows flexible fields (semi-structured), and JPEG files are binary blobs (unstructured).

Why this answer

Option A is correct because a relational table with fixed columns and data types (CustomerID, FirstName, LastName, Email) stores structured data with a rigid schema. JPEG files in Azure Blob Storage are binary blobs with no internal structure that a database can interpret, making them unstructured. JSON documents with optional fields (like rating, comment, metadata) have a flexible schema that can vary per document, which is the definition of semi-structured data.

Exam trap

The trap here is that candidates often confuse 'semi-structured' with 'unstructured' because JSON looks like free-form text, but its key-value structure with optional fields makes it semi-structured, not unstructured.

How to eliminate wrong answers

Option B is wrong because JPEG files have no schema or metadata that can be parsed as key-value pairs, so they are unstructured, not semi-structured; JSON documents are semi-structured, not unstructured. Option C is wrong because a relational table with fixed columns enforces a strict schema, making it structured, not semi-structured; JSON documents are semi-structured, not structured. Option D is wrong because a relational table is not unstructured—it has a predefined schema; JPEG files are not structured—they lack a row/column format.

202

MCQhard

Your company has a data lake in Azure Data Lake Storage Gen2 containing terabytes of parquet files. Data scientists need to explore and prepare this data using Python and SQL. They want to use a collaborative notebook environment that integrates with Git for version control. The solution should automatically scale compute resources based on workload demand and minimize management overhead. Which Azure service should you use?

A.Azure Databricks

B.Azure Machine Learning studio

C.Azure Data Studio

D.Azure Synapse Studio

AnswerA

Provides notebooks, Git integration, auto-scaling, and supports Python and SQL.

Why this answer

Azure Databricks is the correct choice because it provides a collaborative notebook environment that natively supports Python and SQL, integrates with Git for version control, and offers auto-scaling clusters that dynamically adjust compute resources based on workload demand. It is purpose-built for big data analytics and data preparation on data lakes, minimizing management overhead through its serverless and managed Spark infrastructure.

Exam trap

The trap here is that candidates often confuse Azure Synapse Studio with Databricks because both offer notebook experiences and Spark support, but Synapse Studio is optimized for enterprise data warehousing and ETL pipelines, not the ad-hoc, collaborative data exploration and auto-scaling flexibility that Databricks provides for data science teams.

How to eliminate wrong answers

Option B is wrong because Azure Machine Learning studio is primarily designed for building, training, and deploying machine learning models, not for ad-hoc data exploration and preparation using Python and SQL in a collaborative notebook environment with Git integration. Option C is wrong because Azure Data Studio is a desktop tool for querying SQL Server and Azure SQL databases, not a cloud-based collaborative notebook environment that auto-scales compute resources. Option D is wrong because Azure Synapse Studio, although it supports notebooks and Git, is oriented toward enterprise data warehousing and large-scale analytics pipelines rather than the collaborative, data-science-focused exploration and preparation that the scenario describes.

203

Multi-Selecteasy

Which TWO of the following are valid Azure data storage services for storing unstructured data?

Select 2 answers

A.Azure SQL Database

B.Azure Table Storage

C.Azure Blob Storage

D.Azure Data Lake Storage Gen2

E.Azure Cosmos DB

AnswersC, D

Azure Blob Storage stores unstructured data like text and binary data.

Why this answer

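Azure Blob Storage is a fully managed, massively scalable object storage service designed for unstructured data such as text, binary data, images, videos, and backups. It supports REST-based access and can store any type of file or binary object without requiring a schema, making it a core service for unstructured data workloads. Azure Data Lake Storage Gen2 (Option D) is built on Blob Storage and adds a hierarchical namespace, so it stores the same kinds of unstructured files for analytics workloads.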

Exam trap

The trap here is that candidates often confuse semi-structured data (e.g., Table Storage, Cosmos DB) with unstructured data, or incorrectly assume that any NoSQL service qualifies as unstructured storage, when in fact only object storage services like Blob Storage and Data Lake Storage Gen2 are designed for raw, schema-less binary data.

204

Drag & Dropmedium

Drag and drop the steps to create an Azure SQL Database in the correct order.

Steps

Order

Why this order

Creating an Azure SQL Database involves selecting the service, configuring the server and database settings, choosing the appropriate tier, and finally deploying.

205

MCQmedium

A retail company uses Azure SQL Database to store customer transactions. They need to analyze sales trends over time. Which Azure service should they use to build interactive dashboards and reports without moving data out of Azure?

A.Azure Analysis Services

B.Azure Synapse Analytics

C.Microsoft Purview

D.Power BI

AnswerD

Power BI connects directly to Azure SQL Database for interactive dashboards.

Why this answer

Power BI is the correct choice because it is a business analytics service that can connect directly to Azure SQL Database to build interactive dashboards and reports without requiring data movement. It supports DirectQuery mode, which queries the source database in real-time, enabling live analysis of sales trends while data remains in Azure.

Exam trap

The trap here is that candidates may confuse Azure Synapse Analytics as a reporting tool, but it is primarily a data warehousing and analytics platform that requires data movement or transformation, whereas Power BI is the native Azure service for direct, no-movement interactive reporting.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an analytical engine that requires data to be loaded into its in-memory tabular model, which involves moving or processing data outside the source database. Option B is wrong because Azure Synapse Analytics is a big data and analytics platform that typically requires data to be ingested into its dedicated SQL pool or data lake, not suitable for direct, no-movement reporting on a transactional Azure SQL Database. Option C is wrong because Microsoft Purview is a data governance and catalog service, not a reporting or dashboard tool; it cannot build interactive visualizations.

206

MCQeasy

A logistics company ingests GPS coordinates from delivery trucks in real-time to update a live tracking dashboard. They also run a nightly job to aggregate the day's deliveries into a report stored in Azure SQL Database. Which statement correctly describes the data processing types used for these two workloads?

A.GPS ingestion is stream processing; nightly aggregation is batch processing.

B.GPS ingestion is batch processing; nightly aggregation is stream processing.

C.Both workloads are examples of stream processing.

D.Both workloads are examples of batch processing.

AnswerA

Correct. Real-time GPS data is ingested as a continuous stream, while the nightly job processes accumulated data in a batch.

Why this answer

Option A is correct because the real-time ingestion of GPS coordinates from delivery trucks is a classic stream processing workload, where data is processed continuously as it arrives with low latency. The nightly aggregation of daily deliveries into a report stored in Azure SQL Database is a batch processing workload, where data is processed in bulk at scheduled intervals. Azure Stream Analytics is commonly used for the streaming ingestion, while Azure SQL Database or Azure Synapse Analytics can handle the batch aggregation.
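The contrast between the two paradigms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event tuples, the truck IDs, and the function names `stream_process` and `batch_process` are hypothetical, and a production system would use a service such as Azure Stream Analytics rather than an in-memory loop.

```python
from collections import defaultdict

# Hypothetical GPS events: (truck_id, latitude, longitude, delivered_flag).
events = [
    ("truck-1", 47.61, -122.33, False),
    ("truck-2", 47.62, -122.35, True),
    ("truck-1", 47.63, -122.31, True),
]


def stream_process(event_source):
    """Stream style: handle each event as it arrives and emit the latest positions."""
    latest = {}
    for truck_id, lat, lon, _delivered in event_source:
        latest[truck_id] = (lat, lon)
        # Yield a snapshot that a live dashboard could render immediately.
        yield dict(latest)


def batch_process(stored_events):
    """Batch style: run once over the accumulated data and return delivery totals."""
    deliveries = defaultdict(int)
    for truck_id, _lat, _lon, delivered in stored_events:
        if delivered:
            deliveries[truck_id] += 1
    return dict(deliveries)


# One snapshot per incoming event (stream) versus one result per run (batch).
snapshots = list(stream_process(iter(events)))
nightly_report = batch_process(events)
```

The stream function produces output after every event, while the batch function produces a single result once all of the day's data is available.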

Exam trap

The trap here is that candidates classify a workload by a single surface detail, such as the data source (GPS is continuous) or the schedule (nightly is periodic), instead of by the processing paradigm: data handled continuously as it arrives (stream) versus data handled in bulk after it accumulates (batch).

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: GPS ingestion is not batch processing (it requires real-time, low-latency processing), and nightly aggregation is not stream processing (it processes data in bulk at a scheduled time). Option C is wrong because the nightly aggregation is not stream processing; it processes data in a single batch job, not as a continuous stream. Option D is wrong because the GPS ingestion is not batch processing; it processes data in real-time as it arrives, not in batches.

207

MCQeasy

A data scientist needs to analyze historical sales data to identify yearly trends. They run SQL queries that aggregate millions of rows. No new data is being added during analysis. Which type of data processing workload does this represent?

A.Online Transaction Processing (OLTP)

B.Online Analytical Processing (OLAP)

C.Batch processing

D.Stream processing

AnswerB

OLAP is used for complex queries and aggregations on historical data, which matches the scenario.

Why this answer

This workload is Online Analytical Processing (OLAP) because the data scientist is running complex SQL queries that aggregate millions of rows of historical sales data to identify yearly trends. OLAP is designed for read-intensive, analytical queries that summarize large volumes of static data, which matches the scenario where no new data is being added during analysis.
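As a rough illustration of the difference in query shape, the sketch below uses Python's built-in sqlite3 module as a stand-in for a real database. The `sales` table, its columns, and the sample figures are assumptions made for the example and do not come from the question.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2021, 100.0), (2021, 50.0), (2022, 75.0), (2022, 125.0), (2023, 300.0)],
)
conn.commit()

# OLTP-style operation: a short transaction that writes a single row.
conn.execute("INSERT INTO sales VALUES (?, ?)", (2023, 20.0))
conn.commit()

# OLAP-style operation: a read-only query that scans and aggregates every row.
yearly_totals = conn.execute(
    "SELECT sale_year, SUM(amount) FROM sales GROUP BY sale_year ORDER BY sale_year"
).fetchall()
```

The analytical query reads the whole table to produce a handful of summary rows, which is the pattern the scenario describes at a much larger scale.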

Exam trap

Microsoft often tests the distinction between OLTP and OLAP by presenting a scenario with 'SQL queries' and 'aggregation,' leading candidates to mistakenly think any SQL query implies OLTP, when in fact the analytical nature and static dataset clearly indicate OLAP.

How to eliminate wrong answers

Option A is wrong because Online Transaction Processing (OLTP) is optimized for high-volume, low-latency insert/update/delete operations (e.g., order entry), not for aggregating millions of rows for trend analysis. Option C is wrong because batch processing typically involves processing large volumes of data in scheduled, automated jobs (e.g., nightly ETL), whereas this scenario is an interactive analytical query run by a data scientist, not a scheduled batch job. Option D is wrong because stream processing handles continuous, real-time data flows (e.g., sensor data or clickstreams) with low latency, but the question explicitly states no new data is being added during analysis, making it a static dataset.

208

MCQmedium

You are reviewing a Data Factory mapping data flow definition. What is the primary purpose of this data flow?

A.Pivot the data by OrderID

B.Filter rows where OrderID is null

C.Remove duplicate OrderIDs by counting them

D.Merge two data sources

AnswerC

The aggregate counts OrderID, effectively identifying duplicates.

Why this answer

Option C is correct because the data flow reads CSV, performs an aggregate (count by OrderID) to identify and remove duplicates, and outputs Parquet. Option A is wrong because the data flow does not pivot the data. Option B is wrong because it does not filter rows.

Option D is wrong because it does not join or merge a second data source.
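The data flow definition itself is not reproduced here, so the following is only a Python analogue of the count-by-OrderID idea, not mapping data flow script. The sample rows and the `Amount` field are hypothetical.

```python
from collections import Counter

# Hypothetical input rows, standing in for the CSV source.
order_rows = [
    {"OrderID": 101, "Amount": 20.0},
    {"OrderID": 102, "Amount": 35.5},
    {"OrderID": 101, "Amount": 20.0},
    {"OrderID": 103, "Amount": 12.0},
    {"OrderID": 102, "Amount": 35.5},
]

# Aggregate step: count how many rows share each OrderID.
counts = Counter(row["OrderID"] for row in order_rows)

# Any OrderID with a count above one is a duplicate.
duplicate_ids = sorted(order_id for order_id, n in counts.items() if n > 1)

# Keep only the first row seen for each OrderID.
seen = set()
deduplicated = []
for row in order_rows:
    if row["OrderID"] not in seen:
        seen.add(row["OrderID"])
        deduplicated.append(row)
```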

209

MCQhard

You are designing a data solution for a healthcare application that requires ACID transactions for patient records and needs to run complex analytics queries. Which combination of Azure services should you recommend?

A.Azure Cosmos DB for transactions, Power BI for analytics

B.Azure Database for MySQL for transactions, Azure Analysis Services for analytics

C.Azure Blob Storage for transactions, Azure Machine Learning for analytics

D.Azure SQL Database for transactions, Azure Synapse Analytics for analytics

AnswerD

Azure SQL Database provides ACID transactions, and Azure Synapse Analytics can run complex analytics queries.

Why this answer

Azure SQL Database provides full ACID (Atomicity, Consistency, Isolation, Durability) transaction support, which is essential for healthcare patient records where data integrity is critical. Azure Synapse Analytics is a cloud-based analytics service that can run complex queries against large datasets, including those from Azure SQL Database, using its massively parallel processing (MPP) architecture. This combination allows transactional and analytical workloads to coexist without compromising performance or consistency.
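The atomicity part of ACID can be demonstrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Azure SQL Database, and the `patients` table and the `add_patients` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()


def add_patients(connection, names):
    """Insert all names in one transaction; return False if the batch was rolled back."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with connection:
            for name in names:
                connection.execute("INSERT INTO patients (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


first_batch_ok = add_patients(conn, ["Ada", "Grace"])
# The None value violates NOT NULL, so the whole second batch is rolled back,
# including the valid "Linus" row that was inserted before the failure.
second_batch_ok = add_patients(conn, ["Linus", None])
patient_count = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
```

After both calls only the two rows from the first batch remain, because the failed transaction left no partial changes behind.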

Exam trap

The trap here is that candidates often confuse 'analytics' with visualization tools like Power BI or OLAP cubes, failing to recognize that complex analytics queries require a dedicated MPP engine like Synapse, not just a reporting layer.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database whose ACID transactions are scoped to a single logical partition rather than spanning the whole database, and Power BI is a visualization tool, not an analytics engine for running complex queries at scale. Option B is wrong because Azure Analysis Services is an OLAP engine for pre-aggregated data, not designed for running complex ad-hoc analytics queries on raw transactional data; it requires a separate data warehouse or model. Option C is wrong because Azure Blob Storage is an object store with no transaction support (it lacks ACID properties), and Azure Machine Learning is for building predictive models, not for running complex analytics queries on transactional data.

210

MCQeasy

A retail company receives real-time data from IoT sensors in its warehouses. Each sensor sends a JSON payload containing a device ID, timestamp, and temperature reading. A data engineer needs to classify this data for storage planning. Which data type best describes the JSON payload?

A.Structured data

B.Semi-structured data

C.Unstructured data

D.Relational data

AnswerB

JSON is a classic example of semi-structured data. It uses key-value pairs and can have nested structures, but it does not enforce a rigid schema. This flexibility is ideal for IoT payloads where fields may vary over time.

Why this answer

The JSON payload is considered semi-structured data because it has organizational properties (key-value pairs, nested structure) that provide a schema, but it does not conform to a rigid tabular schema like a relational database. JSON allows flexible fields and varying data types, which is characteristic of semi-structured data.
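A short Python sketch shows what "organized but not rigid" looks like in practice. The field names (`deviceId`, `timestamp`, `temperature`, `battery`) and the values are assumptions for illustration; the question only says each payload carries a device ID, a timestamp, and a temperature reading.

```python
import json

# Two hypothetical sensor payloads; the second carries an extra nested field.
payloads = [
    '{"deviceId": "s-1", "timestamp": "2024-01-01T00:00:00Z", "temperature": 21.5}',
    '{"deviceId": "s-2", "timestamp": "2024-01-01T00:00:05Z", "temperature": 19.0,'
    ' "battery": {"level": 87}}',
]

readings = [json.loads(payload) for payload in payloads]

# Every reading is self-describing, but the set of keys differs between documents.
field_sets = [sorted(reading.keys()) for reading in readings]

# Optional fields have to be read defensively because no fixed schema guarantees them.
battery_levels = [reading.get("battery", {}).get("level") for reading in readings]
```

Both documents parse cleanly even though their fields differ, which a relational table with fixed columns would not allow without a schema change.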

Exam trap

The trap here is that candidates confuse 'structured' with 'has a format' — JSON has a clear structure, but it is not rigidly tabular, so it falls under semi-structured, not structured data.

How to eliminate wrong answers

Option A is wrong because structured data requires a strict, predefined schema (e.g., relational tables with fixed columns and data types), whereas JSON allows optional fields and nested objects. Option C is wrong because unstructured data lacks any predefined structure or metadata (e.g., raw text, images, video), while JSON has a clear key-value format. Option D is wrong because relational data specifically refers to data organized into tables with rows and columns, enforced by constraints and relationships, which JSON does not inherently provide.

211

MCQhard

Refer to the exhibit. You are analyzing a Kusto query in Azure Data Explorer. The query is intended to return the top 5 event types that caused the most property damage in Florida. However, the query returns an error. What is the most likely cause?

A.The where clause must specify a numeric value.

B.The summarize operator cannot use sum aggregation.

C.The table or column names are incorrect.

D.The top operator requires an order by clause.

AnswerC

The error is likely due to missing table or column; the query syntax is otherwise valid.

Why this answer

The query returns an error because the table or column names referenced in the query do not match the actual schema in Azure Data Explorer. In Kusto Query Language (KQL), if a table name like 'Events' or a column like 'PropertyDamage' does not exist in the database, the query will fail with a 'semantic error' indicating an unknown table or column. This is the most likely cause given that the query logic (where, summarize, top) is syntactically correct.

Exam trap

The trap here is that candidates may assume the error is due to a syntax or operator misuse (like top needing order by or sum being invalid), when in reality the error stems from a simple schema mismatch—a common oversight when reading queries without verifying the underlying data model.

How to eliminate wrong answers

Option A is wrong because the where clause in KQL can filter on string columns using equality or pattern matching (e.g., 'State == "Florida"'), not only numeric values. Option B is wrong because the summarize operator fully supports the sum() aggregation function for numeric columns, which is a standard and valid operation. Option D is wrong because the top operator in KQL does not require an explicit order by clause; it internally sorts by the specified column(s) in descending order and returns the top N rows.
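The exhibit query is not reproduced here, so the sketch below is only a Python analogue of a where / summarize / top pipeline, not KQL. The column names `State`, `EventType`, and `DamageProperty`, the sample rows, and the function name are all assumptions. It also shows how a misspelled column name fails even though the logic is sound, which mirrors the schema mismatch described above.

```python
from collections import defaultdict

# Hypothetical rows standing in for the table behind the exhibit.
storm_events = [
    {"State": "FLORIDA", "EventType": "Hurricane", "DamageProperty": 500},
    {"State": "FLORIDA", "EventType": "Flood", "DamageProperty": 200},
    {"State": "FLORIDA", "EventType": "Hurricane", "DamageProperty": 300},
    {"State": "TEXAS", "EventType": "Tornado", "DamageProperty": 900},
    {"State": "FLORIDA", "EventType": "Tornado", "DamageProperty": 50},
]


def top_damage_event_types(rows, state, n, damage_column="DamageProperty"):
    """Filter by state, sum damage per event type, and return the n largest totals."""
    totals = defaultdict(int)
    for row in rows:
        if row["State"] == state:  # analogue of a where clause on a string column
            totals[row["EventType"]] += row[damage_column]  # analogue of sum() by type
    # Analogue of top: sort descending by the total, then keep the first n.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


top_two = top_damage_event_types(storm_events, "FLORIDA", 2)

# A column name that does not exist in the data raises an error.
try:
    top_damage_event_types(storm_events, "FLORIDA", 5, damage_column="PropertyDamage")
    schema_error = None
except KeyError as exc:
    schema_error = exc.args[0]
```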

212

Multi-Selectmedium

Which TWO Azure services can be used to perform data transformation in a data pipeline? (Choose two.)

Select 2 answers

A.Azure Blob Storage

B.Azure SQL Database

C.Azure Databricks

D.Azure Event Hubs

E.Azure Data Factory

AnswersC, E

Databricks provides Spark-based transformation, and Data Factory provides mapping data flows.

Why this answer

Azure Databricks is correct because it provides an Apache Spark-based analytics platform that can perform complex data transformations, such as ETL (Extract, Transform, Load) operations, using notebooks and clusters. It allows you to write code in Python, Scala, or SQL to transform data at scale, making it a core compute service for data transformation in a pipeline. Azure Data Factory is correct because, in addition to orchestrating data movement, it offers mapping data flows that transform data at scale without hand-written code.

Exam trap

The trap here is that candidates often confuse storage or ingestion services (like Blob Storage or Event Hubs) with compute services that actually execute transformation logic, leading them to select options that only move or store data.

213

MCQeasy

A company is evaluating Azure database services for two different workloads. Workload A processes high-volume, low-latency transactions such as order entry and payment processing, where each transaction updates a few rows. Workload B involves running complex aggregations on terabytes of historical sales data to generate monthly business intelligence reports. Which Azure service is best suited for each workload?

A.Workload A: Azure SQL Database; Workload B: Azure Cosmos DB

B.Workload A: Azure Cosmos DB; Workload B: Azure Synapse Analytics

C.Workload A: Azure Synapse Analytics; Workload B: Azure SQL Database

D.Workload A: Azure Cosmos DB; Workload B: Azure Cosmos DB

AnswerB

Cosmos DB provides low-latency transactions (OLTP) and Synapse Analytics is built for large-scale analytics (OLAP), matching the workloads correctly.

Why this answer

Workload A requires a low-latency, high-throughput transactional database capable of handling many small, row-level updates. Azure Cosmos DB is a NoSQL database designed for single-digit millisecond latency and horizontal scaling, making it ideal for order entry and payment processing. Workload B involves complex aggregations on terabytes of historical data, which is best handled by Azure Synapse Analytics, a distributed analytics service that uses massively parallel processing (MPP) to run large-scale queries efficiently.

Exam trap

The trap here is that candidates often treat Azure SQL Database as the default for all transactional workloads, overlooking that Cosmos DB is specifically designed for ultra-low-latency, globally distributed transactions. They may also assume Azure Synapse Analytics is only for data warehousing without recognizing its role in complex aggregations on historical data.

How to eliminate wrong answers

Option A is wrong mainly because Azure Cosmos DB is not designed for complex aggregations on terabytes of historical data, so it cannot serve Workload B; in addition, Azure SQL Database, although optimized for relational transactions, does not offer the guaranteed single-digit-millisecond latency and multi-region writes that Cosmos DB provides. Option C is wrong because Azure Synapse Analytics is a data warehouse for analytical workloads, not suitable for high-frequency, low-latency transactional updates, and Azure SQL Database lacks the MPP architecture needed for terabyte-scale aggregations. Option D is wrong because using Azure Cosmos DB for both workloads fails to address Workload B's need for complex aggregations on historical data, as Cosmos DB's query engine is optimized for point reads and simple queries, not large-scale analytical processing.

214

Multi-Selectmedium

Which TWO Azure services can be used to perform real-time stream processing?

Select 2 answers

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Analysis Services

D.Azure Databricks Structured Streaming

E.Azure Synapse Pipelines

AnswersB, D

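Azure Stream Analytics and Azure Databricks Structured Streaming are both real-time stream processing engines.

Why this answer

Azure Stream Analytics is a fully managed service designed for real-time stream processing. It can ingest data from sources like Azure Event Hubs or IoT Hub, apply SQL-based queries to the streaming data, and output results to sinks such as Power BI or Azure SQL Database, all with sub-second latency. Azure Databricks Structured Streaming is correct because it uses the Apache Spark Structured Streaming engine to process data continuously as it arrives, for example from Azure Event Hubs.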

Exam trap

The trap here is that candidates often confuse Azure Data Factory and Azure Synapse Pipelines with stream processing because they can handle data movement, but they are fundamentally batch-oriented orchestration tools, not real-time stream processors.

215

MCQmedium

A company stores IoT sensor data in Azure Blob Storage. Data scientists need to query the data using SQL without moving it to another store. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Analysis Services

C.Azure Data Lake Storage

D.Azure SQL Database

AnswerA

Serverless SQL pool can query data in Blob Storage using SQL without moving it.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without moving or copying the data. It uses a distributed query engine that reads files (Parquet, CSV, JSON) in place, making it ideal for ad-hoc analytics over IoT sensor data stored in Blob Storage.

Exam trap

The trap here is that candidates confuse Azure Data Lake Storage (a storage layer) with a query service, or assume Azure SQL Database is the right tool for querying external files, when among these options only Synapse serverless SQL pool provides native SQL-on-file querying over Blob Storage (PolyBase in a dedicated SQL pool is the other Synapse route).

How to eliminate wrong answers

Option B is wrong because Azure Analysis Services is an OLAP engine that requires data to be loaded into a tabular model, not a service for querying raw files in Blob Storage with SQL. Option C is wrong because Azure Data Lake Storage is a storage service (not a query service) that provides hierarchical namespace and POSIX-like access, but it does not natively support SQL querying without an additional compute layer like Synapse. Option D is wrong because Azure SQL Database is a fully managed relational database that requires data to be imported or ingested into tables, not a service for querying files in Blob Storage directly.

216

Multi-Selecthard

A company uses Azure Data Lake Storage Gen2 for a data lake. They need to ensure that only authorized users can access files and that access is audited. Which two Azure services should they combine? (Choose two options that together form the solution.)

Select 2 answers

A.Azure Policy

B.Azure Key Vault

C.Azure RBAC

D.Azure Monitor

E.Microsoft Entra ID

AnswersC, D

RBAC controls access to storage resources, and Azure Monitor records access for auditing.

Why this answer

Azure RBAC (Role-Based Access Control) is correct because it provides fine-grained access management for Azure Data Lake Storage Gen2, allowing you to assign roles (e.g., Storage Blob Data Contributor) to users, groups, or service principals to control who can read, write, or delete files. Azure Monitor is correct because it can collect and analyze activity logs and diagnostic settings for the storage account, enabling auditing of access events such as successful and failed authentication attempts.

Exam trap

The trap here is that candidates often confuse Microsoft Entra ID (the identity provider) with the actual access control mechanism (RBAC) and auditing service (Monitor), thinking Entra ID alone handles both, but it only authenticates identities—RBAC authorizes them and Monitor audits the actions.

217

MCQmedium

Your company uses Azure SQL Database and needs to ensure that transactions are durable even if the database instance fails. Which feature should you enable?

A.Active geo-replication

B.Zone-redundant storage

C.Transparent Data Encryption

D.Auto-failover groups

AnswerB

Replicates data across availability zones, ensuring durability.

Why this answer

Zone-redundant storage (ZRS) replicates your Azure SQL Database transaction logs and data files synchronously across three Azure availability zones within the same region. This ensures that even if an entire zone fails, committed transactions are preserved and the database remains available, providing durability at the storage layer without requiring a separate database replica.

Exam trap

The trap here is that candidates often confuse durability (ensuring committed data survives failures) with high availability or disaster recovery features like geo-replication or failover groups, which address availability rather than the storage-level persistence of transactions.

How to eliminate wrong answers

Option A is wrong because active geo-replication creates asynchronous replicas in another region for disaster recovery, but it does not guarantee durability of transactions within the primary region during a zone-level failure. Option C is wrong because Transparent Data Encryption (TDE) only encrypts data at rest, providing security but no durability or availability guarantees. Option D is wrong because auto-failover groups manage failover between primary and secondary databases, but they rely on the underlying storage durability; they do not themselves make transactions durable against a storage failure.

218

MCQeasy

A retail company stores data about their products in different formats. Product ID and price are stored in a relational database table. Product descriptions are stored as plain text files. Product images are stored as JPEG files. Which of the following best categorizes these data types in order?

A.Structured, semi-structured, unstructured

B.Structured, unstructured, unstructured

C.Structured, semi-structured, structured

D.Semi-structured, structured, unstructured

AnswerB

The relational table is structured. Product descriptions as plain text are unstructured because they have no predefined format. Images are also unstructured binary data.

Why this answer

Product ID and price in a relational database table are structured because they follow a fixed schema with rows and columns. Product descriptions as plain text files have no predefined structure, making them unstructured. Product images as JPEG files are also unstructured because they consist of binary data without a schema.

Thus, the order is structured, unstructured, unstructured, which matches option B.

Exam trap

The trap here is confusing unstructured data (e.g., plain text files) with semi-structured data (e.g., JSON or XML), leading candidates to misclassify product descriptions as semi-structured when they lack any metadata or tags.

How to eliminate wrong answers

Option A is wrong because it incorrectly categorizes product descriptions as semi-structured; plain text files have no tags or metadata to impose partial organization, so they are unstructured. Option C is wrong because it claims product images are structured; JPEG files are binary blobs with no relational schema, making them unstructured. Option D is wrong because it starts with semi-structured for the relational database table; relational tables enforce a strict schema, so they are structured, not semi-structured.

219

MCQhard

A company stores customer data in a relational table with fixed columns: CustomerID (integer), FirstName (string), LastName (string), Email (string). They also store product images as JPEG files, and customer feedback as JSON documents that may contain varying fields such as rating, comment, and optional metadata. Which of the following correctly orders these data types from most structured to least structured?

A.JSON documents, relational table, JPEG files

B.Relational table, JSON documents, JPEG files

C.JPEG files, JSON documents, relational table

D.Relational table, JPEG files, JSON documents

AnswerB

The relational table has a fixed schema (structured). JSON documents have a flexible schema (semi-structured). JPEG files have no schema (unstructured). This is the correct ordering.

Why this answer

The relational table is the most structured because it enforces a fixed schema with predefined columns and data types (e.g., CustomerID integer, FirstName string). JSON documents are semi-structured: they have a flexible schema where fields like rating and comment can vary per document, but they still provide key-value organization. JPEG files are unstructured binary data with no internal schema or queryable structure, making them the least structured.

Exam trap

The trap here is that candidates often confuse semi-structured data (JSON) with unstructured data (JPEG), mistakenly thinking JSON is unstructured because its fields can vary, when in fact it retains a key-value structure that makes it semi-structured.

How to eliminate wrong answers

Option A is wrong because it orders JSON documents as more structured than a relational table, but relational tables enforce a rigid schema with constraints (e.g., fixed columns, data types) while JSON documents allow varying fields and are semi-structured. Option C is wrong because it places JPEG files (unstructured binary) as more structured than JSON documents (semi-structured with key-value pairs), which reverses the correct order. Option D is wrong because it places JPEG files as more structured than JSON documents, but JPEG files lack any internal schema or metadata that can be queried, whereas JSON documents have a defined structure (e.g., key-value pairs) and are semi-structured.

220

MCQeasy

An organization uses Azure SQL Database and needs to maintain a copy of the database for read-only reporting without affecting the production workload. Which feature should they use?

A.Azure SQL Database read replica

B.Automated backups

C.Active geo-replication

D.Failover groups

AnswerC

Active geo-replication provides readable secondary replicas that can be used for reporting.

Why this answer

Active geo-replication (Option C) creates a readable secondary replica of an Azure SQL Database in a different Azure region. This secondary replica is continuously updated asynchronously from the primary and can be used for read-only query workloads, offloading reporting traffic without impacting the production database's performance or transaction throughput.

Exam trap

The trap here is that candidates confuse 'read replica' (which exists in Azure SQL Database Hyperscale and Azure SQL Managed Instance) with the standard Azure SQL Database feature, or they mistakenly think failover groups themselves provide the readable copy, when in fact it is Active geo-replication that creates the readable secondary.

How to eliminate wrong answers

Option A is wrong because 'read replica' is not the name of a standard feature of a single (non-Hyperscale) Azure SQL Database in the way it is for the Hyperscale service tier or Azure SQL Managed Instance; active geo-replication is the feature that provides the read-only secondary. Option B is wrong because automated backups are point-in-time restore copies stored in blob storage, not live, readable replicas; they cannot serve ongoing read-only queries without first being restored, which would create a separate database. Option D is wrong because failover groups manage geo-replication and failover orchestration for a group of databases, but the read-only secondary is provided by the underlying active geo-replication, not by the failover group itself; failover groups are a management layer, not the feature that creates the readable copy.

221

Multi-Selectmedium

Which TWO Azure services are primarily used for data integration and orchestration?

Select 2 answers

A.Azure Logic Apps

B.Azure Synapse Analytics

C.Azure Stream Analytics

D.Azure Analysis Services

E.Azure Data Factory

AnswersA, E

Workflow automation and integration.

Why this answer

Azure Logic Apps is correct because it is a serverless workflow service that integrates apps, data, and services using connectors and triggers, making it ideal for data integration and orchestration. Azure Data Factory is correct because it is a cloud-based ETL and data integration service that orchestrates and automates data movement and transformation across various data stores.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics (a data warehouse) or Azure Stream Analytics (a real-time processing service) with data integration tools, because they involve data movement or processing, but they are not primarily designed for orchestration and integration.

222

MCQhard

You are reviewing an ARM template for an Azure Storage account. The container named 'data' is created with public access set to 'None'. What is the primary benefit of this configuration?

A.It encrypts data at rest.

B.It restricts access to authorized users only.

C.It enables soft delete for the container.

D.It prevents accidental deletion of blobs.

AnswerB

Setting public access to 'None' disables anonymous access, requiring authentication.

Why this answer

Setting public access to 'None' on a container means that anonymous read requests are not allowed. The primary benefit is that only requests with proper authorization (e.g., using an account key, a shared access signature, or Microsoft Entra ID credentials) can access the blobs within that container. This directly restricts access to authorized users only, which is the core security advantage.

Exam trap

The trap here is that candidates often confuse 'public access set to None' with broader security features like encryption or deletion protection, when in fact it only controls anonymous read access and does not affect data encryption, soft delete, or accidental deletion safeguards.

How to eliminate wrong answers

Option A is wrong because encryption at rest is enabled by default at the storage account level via Azure Storage Service Encryption (SSE), regardless of the container's public access setting. Option C is wrong because soft delete is a separate data protection feature that must be explicitly enabled on the storage account or container, and it is not a benefit of setting public access to 'None'. Option D is wrong because preventing accidental deletion of blobs is achieved through features like soft delete or immutable storage, not by disabling anonymous access.

223

MCQhard

Your company is designing a data solution for IoT sensor data that arrives in high volume and must be stored for long-term analytics. The data is append-only and rarely updated. You need to choose a storage solution that balances cost and query performance for historical analysis. Which Azure data store should you recommend?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure SQL Database

D.Azure Data Lake Storage Gen2

AnswerD

ADLS Gen2 provides cost-effective storage for large datasets and integrates with analytics services like Synapse and Databricks.

Why this answer

Azure Data Lake Storage Gen2 is the correct choice because it combines a hierarchical namespace with Azure Blob Storage, offering scalable, cost-effective storage for high-volume append-only data like IoT sensor logs. It supports both structured and unstructured data, integrates with analytics engines like Azure Synapse and Spark, and provides POSIX-like access control lists, making it ideal for long-term historical analysis at low cost.

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB's low-latency capabilities with suitability for high-volume historical analytics, overlooking its cost model and lack of native file-system semantics for append-only workloads.

How to eliminate wrong answers

Option A is wrong because Azure Cosmos DB is a NoSQL database optimized for low-latency, transactional workloads with global distribution, not for cost-effective long-term storage of append-only IoT data; its per-request pricing and high throughput costs make it unsuitable for high-volume historical analytics. Option B is wrong because Azure Table Storage is a key-value store designed for simple, semi-structured data with limited query capabilities (queries are efficient only on the partition and row keys), lacking the hierarchical namespace, file-level security, and native analytics integration needed for complex historical queries on IoT data. Option C is wrong because Azure SQL Database is a relational database whose ACID transactions and indexing are unnecessary overhead for append-only IoT data that is rarely updated; its per-core pricing and storage limits make it cost-prohibitive for high-volume, long-term storage compared to object storage.


224

MCQeasy

A hospital collects patient vital signs every minute using IoT sensors. Each reading contains a timestamp, patient ID, heart rate, blood pressure, and temperature. This data is ingested continuously for real-time monitoring and alerting. Which type of data workload does this scenario best represent?

A.Transactional workload

B.Analytical workload

C.Batch processing

D.Real-time streaming

AnswerD

Real-time streaming workloads handle continuous data flows that are processed as soon as they arrive, often with low latency requirements. The hospital's IoT sensors generate data every minute that must be acted on promptly, making this a clear example of a real-time streaming workload.

Why this answer

This scenario requires continuous ingestion of sensor data with immediate processing for real-time monitoring and alerting. Real-time streaming workloads, such as those handled by Azure Stream Analytics or Apache Kafka, are designed to process unbounded data streams with low latency, making option D correct.

Exam trap

The trap here is confusing 'real-time streaming' with 'analytical workload' because both involve data processing, but analytical workloads are designed for historical analysis and reporting, not for low-latency alerting on live data streams.

How to eliminate wrong answers

Option A is wrong because transactional workloads focus on ACID-compliant operations (e.g., OLTP) that handle discrete, small-scale reads and writes, not continuous high-velocity sensor streams. Option B is wrong because analytical workloads typically involve batch or interactive queries over historical data (e.g., using Azure Synapse or Power BI), not low-latency alerting on live data. Option C is wrong because batch processing handles data in large, scheduled chunks (e.g., nightly ETL jobs), which cannot meet the real-time alerting requirement of this scenario.
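The difference between streaming and batch handling can be sketched in a few lines. This is a minimal, in-memory illustration rather than a real ingestion pipeline: the Reading class, the HEART_RATE_LIMIT threshold, and the sample values are all hypothetical, and blood pressure and temperature are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Reading:
    timestamp: str
    patient_id: str
    heart_rate: int


HEART_RATE_LIMIT = 120  # hypothetical alert threshold, not a clinical value


def stream_alerts(readings: Iterable[Reading]) -> Iterator[str]:
    """Streaming style: check each reading as it arrives and alert at once."""
    for r in readings:
        if r.heart_rate > HEART_RATE_LIMIT:
            yield f"ALERT {r.patient_id} at {r.timestamp}: heart rate {r.heart_rate}"


def batch_report(readings: List[Reading]) -> Dict[str, float]:
    """Batch style: wait for the full set, then average per patient."""
    by_patient: Dict[str, List[int]] = {}
    for r in readings:
        by_patient.setdefault(r.patient_id, []).append(r.heart_rate)
    return {pid: sum(v) / len(v) for pid, v in by_patient.items()}


sample = [
    Reading("2024-01-01T08:00", "p1", 82),
    Reading("2024-01-01T08:00", "p2", 131),
    Reading("2024-01-01T08:01", "p1", 85),
    Reading("2024-01-01T08:01", "p2", 127),
]

alerts = list(stream_alerts(iter(sample)))  # produced one reading at a time
report = batch_report(sample)               # produced only after all data is in
```

The streaming function can raise an alert as soon as a single out-of-range reading arrives, while the batch function has nothing to report until the whole data set has been collected.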


225

MCQeasy

A database system ensures that a transaction either completes fully and all changes are applied, or it is completely rolled back and no partial changes are saved. Which property of ACID transactions does this describe?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity guarantees that all operations within a transaction are completed successfully or none are applied. This 'all or nothing' property prevents partial updates.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. If any part of the transaction fails, the entire transaction is rolled back, leaving the database in its original state. This property guarantees that no partial changes are saved, which directly matches the description in the question.
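A small sketch of this all-or-nothing behavior, using Python's built-in sqlite3 module with an in-memory database. The accounts table, the transfer function, and the balances are assumptions made for this example. The transfer deliberately applies the credit first so that a failing debit leaves a partial change for the rollback to undo.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if any statement raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

failed = transfer(conn, 1, 2, 500)  # debit would make account 1 negative
after_failure = dict(conn.execute("SELECT id, balance FROM accounts"))

succeeded = transfer(conn, 1, 2, 30)
after_success = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer both balances are unchanged, even though the credit to the second account had already been applied inside the transaction. After the successful transfer both updates are visible together.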

Exam trap

Microsoft often tests the distinction between atomicity and consistency by describing a scenario where a transaction either fully applies or fully rolls back, leading candidates to mistakenly choose consistency because they associate 'valid state' with 'complete execution'.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, cascades, triggers), but it does not address the 'all-or-nothing' execution of the transaction itself. Option C (Isolation) is wrong because isolation controls how transaction changes are visible to other concurrent transactions (e.g., via locking or snapshot isolation), not whether the transaction completes fully or rolls back. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure (e.g., via write-ahead logging), but it does not describe the rollback behavior on failure.
