Microsoft Azure Data Fundamentals DP-900 DP-900 Questions 526–600 | Page 8/14

526

MCQhard

A financial services company stores years of market trade data as Parquet files in Azure Data Lake Storage Gen2. The data volume is terabytes and growing rapidly. Data analysts need to run complex SQL queries that join multiple tables (e.g., trades, instruments, counterparties) and return results within seconds. The company also wants to integrate with Power BI for visualization and Azure Data Factory for orchestration of ETL pipelines. Which Azure service should they choose as the primary analytics platform?

A.Azure SQL Database

B.Azure Synapse Analytics (serverless SQL pool)

C.Azure HDInsight with Spark

D.Azure Analysis Services

AnswerB

Correct. Azure Synapse serverless SQL pool can query large volumes of Parquet files directly with T-SQL, provides MPP performance, integrates with Power BI and Data Factory, and is designed for this type of analytical workload.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it provides a distributed SQL query engine that can directly query Parquet files in Azure Data Lake Storage Gen2 using T-SQL, enabling complex joins across multiple tables with fast performance via automatic query optimization and pushdown computation. It integrates natively with Power BI for visualization and Azure Data Factory for ETL orchestration, making it the ideal primary analytics platform for large-scale, schema-on-read data lake scenarios.

Exam trap

The trap here is that candidates often confuse Azure Synapse Analytics serverless SQL pool with Azure SQL Database, assuming both are just 'SQL databases,' but the key differentiator is that serverless SQL pool is a distributed query service for data lakes, not a transactional database.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a relational OLTP database designed for transactional workloads, not for querying terabytes of Parquet files in a data lake, and it lacks native support for schema-on-read and distributed query processing over data lake storage. Option C is wrong because Azure HDInsight with Spark is a big data processing framework that requires significant cluster management, coding in Spark SQL or Scala, and does not provide the instant, serverless T-SQL query experience that analysts need for ad-hoc SQL queries with sub-second response times. Option D is wrong because Azure Analysis Services is a semantic modeling and OLAP engine that requires data to be pre-loaded into a tabular model, not a direct query engine for raw Parquet files, and it cannot perform the complex joins across multiple tables in the data lake without prior ETL.

Full explanation →

527

MCQmedium

A mobile game company stores player profiles and game state in Azure Cosmos DB. Each document contains playerId, level, score, inventory (an array of items), and lastLogin. The application requires fast point reads by playerId, queries to find all players within a specific score range, and global distribution with multi-region writes for low latency worldwide. They also want to use a familiar SQL-like query language. Which Azure Cosmos DB API should they choose?

A.Core (SQL) API

B.MongoDB API

C.Cassandra API

D.Gremlin API

AnswerA

Correct. The Core (SQL) API provides a SQL-like query language, supports point reads and range queries, and enables multi-region writes for global distribution.

Why this answer

The Core (SQL) API is the correct choice because it provides native support for SQL-like queries, enabling the required point reads by playerId and range queries on score. It also offers multi-region writes for global distribution with low latency, which aligns with the application's need for worldwide player access. The document model with arrays (inventory) is directly supported, making it ideal for storing player profiles and game state.

Exam trap

The trap here is that candidates often confuse the MongoDB API's use of a familiar query language (MongoDB's own) with SQL-like syntax, or assume that any NoSQL API can handle range queries equally, but the Core (SQL) API is the only one that provides native SQL-like querying with automatic indexing for such patterns.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because while it supports document storage and queries, it uses MongoDB's query language (based on BSON and MongoDB wire protocol) rather than a familiar SQL-like query language, and the requirement explicitly asks for SQL-like syntax. Option C (Cassandra API) is wrong because it uses CQL (Cassandra Query Language) which is not SQL-like in the traditional sense, and it is optimized for high-throughput writes with partition key-based queries, not for range queries on non-key attributes like score without careful modeling. Option D (Gremlin API) is wrong because it is designed for graph data models and traversals using the Gremlin query language, not for document storage or SQL-like queries, and it does not natively support the required point reads or range queries on player profiles.

Full explanation →

528

MCQhard

A healthcare organization needs to store patient records that must be immutable and cannot be modified or deleted for 7 years due to regulatory compliance. Which Azure feature should they use?

A.Microsoft Purview

B.Azure Policy

C.Azure Blob Storage immutable storage

D.Microsoft Defender for Cloud

AnswerC

Provides WORM (write once, read many) capability for compliance.

Why this answer

Azure Blob Storage immutable storage is correct because it provides WORM (Write Once, Read Many) capabilities that prevent data from being modified or deleted for a specified retention period. This directly meets the regulatory requirement for patient records to remain immutable for 7 years, as the policy is enforced at the storage level and cannot be overridden by any user, including administrators.

Exam trap

The trap here is that candidates confuse Azure Policy (which enforces resource-level compliance rules) with data-level immutability, but Azure Policy cannot prevent data modification within a blob—only Azure Blob Storage immutable storage provides that guarantee.

How to eliminate wrong answers

Option A is wrong because Microsoft Purview is a data governance and catalog service for discovering and classifying data, not a storage-level immutability enforcement mechanism. Option B is wrong because Azure Policy enforces organizational rules and compliance across Azure resources (e.g., restricting resource locations), but it cannot prevent modification or deletion of data within a storage blob. Option D is wrong because Microsoft Defender for Cloud is a security posture management and threat protection service, not a data immutability feature.

Full explanation →

529

Multi-Selecteasy

Which TWO Azure services can be used to store semi-structured data? (Choose two.)

Select 2 answers

A.Azure Cosmos DB

B.Azure Blob Storage

C.Azure SQL Database

D.Azure Synapse Analytics

E.Power BI

AnswersA, B

Azure Cosmos DB is a NoSQL database that natively supports semi-structured data like JSON.

Why this answer

Azure Cosmos DB is a NoSQL database service that natively supports semi-structured data through its flexible schema model. It allows storing JSON documents, key-value pairs, and graph data without requiring a fixed schema, making it ideal for semi-structured data like user profiles, IoT telemetry, or product catalogs.

Exam trap

The trap here is that candidates often confuse 'semi-structured data' with 'unstructured data' and incorrectly assume only NoSQL databases qualify, forgetting that Azure Blob Storage can store semi-structured files like JSON or XML as blobs, even though it is not a database.

Full explanation →

530

MCQmedium

A retail company wants to run real-time analytics on streaming clickstream data from their website. Which Azure service should they use to ingest and process the data?

A.Azure Analysis Services

B.Azure Data Lake Storage

C.Azure SQL Database

D.Azure Stream Analytics

AnswerD

Real-time stream processing service that can ingest and analyze streaming data.

Why this answer

Azure Stream Analytics is a real-time analytics and event-processing engine designed to ingest, process, and analyze high-velocity streaming data, such as clickstream data from a website. It can directly consume data from Azure Event Hubs or IoT Hub and output results to sinks like Power BI, Azure SQL Database, or Azure Data Lake Storage, making it the correct choice for real-time analytics on streaming data.

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure SQL Database or Azure Data Lake Storage, mistakenly thinking a traditional database or storage service can handle real-time streaming ingestion and processing, when in fact they lack the necessary low-latency, event-driven architecture.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an OLAP engine for creating semantic models and running ad-hoc analytical queries on pre-processed data, not for ingesting or processing real-time streaming data. Option B is wrong because Azure Data Lake Storage is a scalable and secure data lake for storing large volumes of raw or processed data, but it does not provide real-time stream ingestion or processing capabilities. Option C is wrong because Azure SQL Database is a relational database service for storing and querying structured data, not designed for high-throughput, low-latency stream ingestion or real-time event processing.

Full explanation →

531

MCQeasy

A healthcare organization stores patient medical records in a relational database with columns such as PatientID, Name, and DateOfBirth. They also store radiology images as DICOM files in Azure Blob Storage. Which statement correctly classifies these data types?

A.Both patient records and radiology images are structured data.

B.Patient records are semi-structured, and radiology images are unstructured.

C.Patient records are structured, and radiology images are unstructured.

D.Patient records are unstructured, and radiology images are semi-structured.

AnswerC

Patient records have fixed columns and data types (structured), while DICOM files are binary with no queryable schema (unstructured).

Why this answer

Patient records in a relational database with fixed columns like PatientID, Name, and DateOfBirth adhere to a predefined schema, making them structured data. Radiology images stored as DICOM files in Azure Blob Storage have no internal schema or tabular format and are therefore unstructured data. Option C correctly matches these classifications.

Exam trap

The trap here is conflating 'semi-structured' with 'structured' or 'unstructured'—candidates often misclassify relational database records as semi-structured because they have multiple columns, but the key is the rigid schema enforced by the relational model.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies radiology images as structured data; DICOM files in blob storage have no fixed schema or relational structure. Option B is wrong because patient records in a relational database are structured, not semi-structured; semi-structured data (e.g., JSON or XML) has tags or markers but no rigid schema. Option D is wrong because patient records are structured, not unstructured, and radiology images are unstructured, not semi-structured.

Full explanation →

532

MCQeasy

A company wants to provide self-service analytics to business users, allowing them to create reports and dashboards from data stored in Azure Data Lake Storage. Which tool should they use?

A.Azure Synapse Studio

B.Azure Machine Learning studio

C.Power BI

D.Azure Data Studio

AnswerC

Self-service BI tool for reports and dashboards.

Why this answer

Power BI is a self-service business analytics tool that can connect to Azure Data Lake Storage. Option B is wrong because Azure Data Studio is for database management. Option C is wrong because Azure Synapse Studio is for data engineers.

Option D is wrong because Azure Machine Learning studio is for ML.

Full explanation →

533

MCQmedium

A logistics company tracks shipment locations using GPS devices that send JSON data with fields: shipmentId, latitude, longitude, timestamp, speed. The data is stored in Azure Cosmos DB using the Core (SQL) API. The application needs to query all shipments that are currently within a specific geographic bounding box and have a speed greater than 0. Which query approach should they use to efficiently retrieve the data?

A.Use a BETWEEN clause on latitude and longitude and a WHERE clause for speed.

B.Use ST_WITHIN to specify the bounding box polygon and add a WHERE clause for speed.

C.Use ST_DISTANCE to measure distance from a center point and also filter on speed.

D.Use the IN operator to list all acceptable coordinate pairs and a speed filter.

AnswerB

ST_WITHIN efficiently uses the geospatial index to find points within the specified area, and combining it with a speed filter is the recommended pattern.

Why this answer

Option B is correct because Azure Cosmos DB's Core (SQL) API supports geospatial queries using the ST_WITHIN function, which efficiently checks if a point (latitude/longitude) lies inside a polygon (bounding box). Adding a WHERE clause for speed > 0 further filters the results, and Cosmos DB can leverage a composite index on the geospatial field and speed to optimize query performance.

Exam trap

The trap here is that candidates often assume simple range filters (BETWEEN) are sufficient for geospatial queries, overlooking that Cosmos DB requires dedicated spatial functions (ST_WITHIN, ST_DISTANCE) to utilize its spatial index and achieve efficient bounding box queries.

How to eliminate wrong answers

Option A is wrong because BETWEEN on latitude and longitude does not perform true geospatial bounding box filtering; it would require separate range checks and cannot use Cosmos DB's spatial index, leading to inefficient full scans. Option C is wrong because ST_DISTANCE calculates distance from a center point, which is not suitable for a bounding box query and would require additional logic to define the box, plus it cannot directly filter for points within a rectangular region. Option D is wrong because the IN operator is designed for exact matches on discrete values, not for range or geospatial queries, and listing all acceptable coordinate pairs is impractical and would not use spatial indexing.

Full explanation →

534

MCQmedium

A company uses Azure SQL Database for its e-commerce platform. The reporting team runs complex, long-running queries that join multiple tables and would degrade performance of the transactional workload if executed on the primary database. Which Azure SQL Database feature should the company enable to isolate the reporting queries while ensuring read-only access to the most current data?

A.Automatic tuning

B.Geo-replication

C.Read scale-out (read-only replicas)

D.Elastic query

AnswerC

Read scale-out provides a read-only replica in the same region that can be used for reporting without impacting the primary.

Why this answer

Read scale-out (read-only replicas) allows you to offload reporting queries to a read-only replica of the Azure SQL Database, ensuring that complex, long-running queries do not degrade the performance of the primary transactional workload. The replica provides access to the most current data because it uses snapshot isolation and is transactionally consistent with the primary database.

Exam trap

The trap here is that candidates often confuse Geo-replication (which also provides readable secondaries) with read scale-out, but Geo-replication is primarily for disaster recovery and involves asynchronous replication, whereas read scale-out is designed for read workload isolation within the same region with synchronous consistency.

How to eliminate wrong answers

Option A is wrong because Automatic tuning is a performance optimization feature that automatically adjusts index creation, query plan choices, and other settings based on workload patterns, but it does not isolate read-only workloads or provide a separate replica. Option B is wrong because Geo-replication creates readable secondary replicas in a different Azure region for disaster recovery and geographic distribution, but it is not designed for read-scale out within the same region and introduces latency for current data due to asynchronous replication. Option D is wrong because Elastic query enables cross-database queries across multiple Azure SQL databases, but it does not provide a dedicated read-only replica for offloading queries from a single database.

Full explanation →

535

Multi-Selecthard

Which TWO tools can be used to transform data in an Azure data pipeline?

Select 2 answers

A.Power BI Desktop

B.Microsoft Purview

C.Azure Databricks notebooks

D.Azure Storage Explorer

E.Azure Data Factory Data Flow

AnswersC, E

Enables code-based transformations using Spark.

Why this answer

Azure Databricks notebooks are correct because they provide an interactive, collaborative environment for data transformation using Apache Spark. You can write code in Python, Scala, SQL, or R to perform complex ETL operations, and the notebooks integrate directly with Azure Data Factory as a compute target in a pipeline.

Exam trap

The trap here is that candidates often confuse data transformation tools with data governance or storage management tools, leading them to select options like Microsoft Purview or Azure Storage Explorer, which serve entirely different purposes in the Azure analytics ecosystem.

Full explanation →

536

MCQhard

You are a data architect for a healthcare organization. The organization needs to build a real-time analytics solution to monitor patient vital signs from IoT devices. The data arrives at a rate of 10,000 events per second. Each event contains patient ID, timestamp, heart rate, blood pressure, and oxygen saturation. The solution must alert clinicians within 10 seconds when a patient's vital signs exceed predefined thresholds. Additionally, the solution must store the raw data for historical analysis and compliance. You plan to use Azure Event Hubs for ingestion. Which combination of services should you use to meet the requirements? Consider: processing low latency alerts, storing raw data in cost-effective storage, and enabling historical analytics. You also need to ensure that the solution can scale to handle future growth.

A.Use Azure Databricks with Structured Streaming, store data in Delta Lake, and use Power BI for real-time dashboards

B.Use Azure Data Factory to batch ingest events every minute, store in Azure Blob Storage, and use Azure Analysis Services for historical analytics

C.Use Azure Functions to process events, store data in Azure Cosmos DB, and use Power BI for historical analytics

D.Use Azure Stream Analytics for real-time processing and alerting, output data to Azure Data Lake Storage, and use Azure Synapse Serverless SQL for historical analytics

AnswerD

Stream Analytics provides low-latency stream processing, Data Lake Storage is cost-effective for raw data, and Synapse Serverless allows querying the data lake.

Why this answer

Option D is correct because Azure Stream Analytics provides low-latency (sub-second) stream processing and can trigger alerts within the 10-second requirement. Outputting raw data to Azure Data Lake Storage (ADLS) offers cost-effective storage for compliance, and Azure Synapse Serverless SQL enables on-demand historical analytics without provisioning dedicated compute, scaling automatically for future growth.

Exam trap

The trap here is that candidates often confuse real-time processing with batch or micro-batch tools (like Databricks or Data Factory) or choose a transactional database (Cosmos DB) for raw storage, overlooking the cost and latency trade-offs required for high-throughput IoT scenarios.

How to eliminate wrong answers

Option A is wrong because Azure Databricks with Structured Streaming introduces higher latency (typically seconds to minutes) and operational overhead, making it less suitable for sub-10-second alerting, and Delta Lake on Databricks is not as cost-effective for raw data storage as ADLS. Option B is wrong because Azure Data Factory batch ingestion every minute violates the 10-second alert requirement, and Azure Analysis Services is designed for OLAP on pre-aggregated data, not for direct historical analytics on raw event data. Option C is wrong because Azure Functions are stateless and not optimized for high-throughput (10,000 events/sec) real-time stream processing, and Azure Cosmos DB is a transactional database with higher cost per GB, making it less cost-effective for storing raw historical data compared to ADLS.

Full explanation →

537

MCQmedium

A SaaS provider hosts databases for hundreds of clients, each with light and sporadic usage. They notice that using separate single databases with provisioned DTUs is cost-inefficient. They need to consolidate the databases while optimizing cost and ensuring that a busy client does not monopolize resources. Which Azure SQL Database option should they use?

A.Single database with provisioned DTU

B.Elastic pool

C.Managed Instance

D.SQL Server on Azure VM

AnswerB

Elastic pools consolidate multiple databases into a shared resource pool. They are designed for SaaS applications with many databases having low average usage and unpredictable spikes, providing cost savings and performance isolation.

Why this answer

An Azure SQL Database elastic pool is designed to optimize cost for multiple databases with light and sporadic usage by sharing a fixed set of resources (eDTUs) across all databases in the pool. This prevents a busy client from monopolizing resources through per-database resource limits (min and max DTU), ensuring performance isolation while consolidating costs.

Exam trap

The trap here is that candidates often confuse elastic pools with single databases, thinking that provisioned DTUs are always cheaper for sporadic workloads, but they miss that elastic pools share resources across databases to eliminate over-provisioning waste.

How to eliminate wrong answers

Option A is wrong because a single database with provisioned DTUs allocates dedicated resources per database, which is cost-inefficient for hundreds of lightly used databases due to wasted capacity and higher per-database costs. Option C is wrong because Azure SQL Managed Instance is a fully managed instance with fixed resources, designed for lift-and-shift migrations, not for consolidating hundreds of small databases with sporadic usage; it lacks the elastic pooling feature for shared resource allocation. Option D is wrong because SQL Server on Azure VM requires manual management of OS and SQL Server, including resource governance, and incurs costs for the VM and licensing, making it more expensive and complex than an elastic pool for this scenario.

Full explanation →

538

MCQmedium

A company uses Azure Data Factory to run a pipeline that copies new orders from an on-premises SQL Server database to Azure Data Lake Storage every hour. After the data is in the data lake, an Azure Databricks notebook transforms it and loads it into Azure Synapse Analytics for reporting. Which type of data processing does the hourly copy operation represent?

A.Real-time streaming

B.Batch processing

C.Interactive query

D.Transactional processing

AnswerB

The hourly copy operation processes data in discrete, scheduled batches, which is the definition of batch processing.

Why this answer

The hourly copy operation from on-premises SQL Server to Azure Data Lake Storage is a classic batch processing pattern: data is collected over a fixed time interval (1 hour) and processed as a single unit. Azure Data Factory orchestrates this scheduled, non-continuous transfer, which aligns with batch processing's definition of handling data in discrete, periodic chunks rather than in real-time.

Exam trap

The trap here is that candidates confuse scheduled data movement (batch) with real-time streaming, especially when the pipeline runs frequently (e.g., every hour), but the key distinction is that batch processes data in discrete intervals, not continuously as it arrives.

How to eliminate wrong answers

Option A is wrong because real-time streaming requires continuous, low-latency ingestion (e.g., via Azure Event Hubs or Stream Analytics), not a scheduled hourly copy. Option C is wrong because interactive query refers to on-demand, ad-hoc exploration of data (e.g., using Azure Synapse Serverless SQL or Azure Data Explorer), not a scheduled, automated data movement. Option D is wrong because transactional processing (OLTP) involves individual, atomic transactions (e.g., INSERT/UPDATE in SQL Server), not bulk, periodic data movement between systems.

Full explanation →

539

MCQmedium

A company is migrating an on-premises SQL Server database to Azure. They want to ensure that database administrators (DBAs) can perform administrative tasks but cannot view sensitive customer data in query results. Which Azure SQL feature should they implement?

A.Dynamic Data Masking

B.Always Encrypted

C.Transparent Data Encryption

D.Row-Level Security

AnswerB

Always Encrypted encrypts data on the client side, so the database never sees plaintext. DBAs cannot access the encryption keys and therefore cannot view the sensitive data.

Why this answer

Always Encrypted ensures that sensitive data is encrypted at all times, including during query processing, and that the encryption keys are never revealed to the database engine. This allows DBAs to perform administrative tasks (e.g., backups, index maintenance) while being unable to view the plaintext data in query results, because the decryption occurs only on the client side.

Exam trap

The trap here is that candidates confuse Dynamic Data Masking with Always Encrypted, assuming masking prevents DBAs from seeing data, when in fact masking can be overridden by users with higher permissions, whereas Always Encrypted cryptographically prevents any server-side access to plaintext.

How to eliminate wrong answers

Option A (Dynamic Data Masking) is wrong because it only obfuscates data in query results for non-privileged users, but DBAs with elevated permissions (e.g., db_owner) can still view the unmasked data by running queries directly. Option C (Transparent Data Encryption) is wrong because it encrypts data at rest on disk but does not protect data in use or in query results; DBAs with access to the database can still read plaintext data. Option D (Row-Level Security) is wrong because it restricts which rows a user can see based on a predicate function, but it does not prevent DBAs (who typically have db_owner or higher privileges) from bypassing the policy or viewing all rows.

Full explanation →

540

Multi-Selectmedium

Which TWO Azure services can be used to host a relational database that is compatible with SQL Server?

Select 2 answers

A.Azure SQL Database

B.Azure Database for PostgreSQL

C.Azure Cosmos DB

D.Azure SQL Managed Instance

E.Azure Database for MySQL

AnswersA, D

Fully managed SQL Server database engine.

Why this answer

Options A and D are correct. Azure SQL Database and Azure SQL Managed Instance both provide SQL Server compatibility. Option B is wrong because Azure Cosmos DB is NoSQL.

Option C is wrong because Azure Database for MySQL is MySQL. Option E is wrong because Azure Database for PostgreSQL is PostgreSQL.

Full explanation →

541

MCQmedium

A data analyst needs to combine sales data from Azure SQL Database and inventory data from Azure Cosmos DB into a single Power BI report. Which Power BI feature should they use?

A.Power Query

B.Power BI Desktop

C.DAX formulas

D.Dataflows

AnswerA

Power Query enables connecting to and merging data from multiple sources.

Why this answer

Power Query is the correct feature because it is the data connection and transformation engine in Power BI that allows you to connect to multiple data sources—such as Azure SQL Database and Azure Cosmos DB—and combine them into a single dataset for reporting. It provides a graphical interface to merge, append, and shape data from disparate sources before loading it into the data model, which is exactly what the analyst needs to do.

Exam trap

The trap here is that candidates often confuse the tool (Power BI Desktop) with the feature (Power Query), or they mistakenly think DAX is used for data integration, when in fact DAX operates only on data already in the model, not on source connections.

How to eliminate wrong answers

Option B (Power BI Desktop) is wrong because Power BI Desktop is the application that hosts Power Query, not the specific feature for combining data from multiple sources; it is the environment where the report is built, not the tool for data integration. Option C (DAX formulas) is wrong because DAX (Data Analysis Expressions) is used for creating calculated columns, measures, and custom aggregations within the data model after data is loaded, not for connecting to or combining data from different source systems. Option D (Dataflows) is wrong because Dataflows are a cloud-based ETL tool for preparing and reusing data across workspaces, but they are not the direct feature used within a single Power BI Desktop report to combine live connections from Azure SQL Database and Azure Cosmos DB; Power Query is the immediate tool for that task.

Full explanation →

542

MCQmedium

A social media company stores user posts in Azure Cosmos DB. Posts are frequently queried by user ID and creation timestamp. To minimize Request Units (RU) per query, which property should be chosen as the partition key?

A.User ID

B.Timestamp

C.Post content

D.A composite key of user ID and timestamp

AnswerA

User ID is frequently used in queries and can distribute data evenly, making it an ideal partition key.

Why this answer

User ID is the correct partition key because it evenly distributes writes and reads across physical partitions, ensuring that queries filtering by user ID and timestamp are scoped to a single partition. This minimizes cross-partition queries, which consume more Request Units (RU) than single-partition queries. Azure Cosmos DB routes each query to the partition containing the matching partition key value, so choosing User ID keeps most queries efficient.

Exam trap

The trap here is that candidates often choose a composite key (Option D) thinking it improves query efficiency, but they overlook that Azure Cosmos DB requires the partition key to be a single property in the filter for single-partition queries, and a composite key would not be used as a single partition key unless explicitly defined as such in the container.

How to eliminate wrong answers

Option B (Timestamp) is wrong because using timestamp as the partition key would cause hot partitions—all posts created at the same time would land on the same physical partition, leading to throttling and uneven RU consumption. Option C (Post content) is wrong because post content is not a query filter and would result in unpredictable, non-uniform data distribution, causing cross-partition scans for every query. Option D (A composite key of user ID and timestamp) is wrong because while it might seem logical, it would force every query to include both values in the filter to target a single partition; queries filtering only by user ID would become cross-partition, increasing RU cost.

Full explanation →

543

Multi-Selecthard

Which THREE of the following are valid considerations when choosing between Azure Cosmos DB and Azure Table Storage?

Select 3 answers

A.Table Storage supports multi-region writes

B.Cosmos DB provides multiple consistency levels

C.Cosmos DB supports multi-region writes

D.Cosmos DB does not support JSON documents

E.Cosmos DB automatically indexes all properties

AnswersB, C, E

Cosmos DB offers five consistency models.

Why this answer

Options A, B, and C are correct. Cosmos DB offers multi-region writes (A), multiple consistency models (B), and automatic indexing (C). Option D is wrong because both support serverless.

Option E is wrong because both support JSON.

Full explanation →

544

MCQeasy

A data analyst needs to query a large dataset stored in Azure Blob Storage using serverless SQL pool in Azure Synapse Analytics. Which data format should they use to minimize storage costs while still supporting efficient querying?

A.CSV

B.JSON

C.Parquet

D.Avro

AnswerC

Parquet is a columnar format that offers high compression and efficient query performance, minimizing storage costs.

Why this answer

Parquet is a columnar storage format that compresses data efficiently and supports predicate pushdown, allowing serverless SQL pool in Azure Synapse to read only the necessary columns and rows. This minimizes storage costs while maintaining high query performance, unlike row-oriented formats such as CSV or JSON.

Exam trap

The trap here is that candidates often assume all compressed formats (like Avro) are equally efficient for analytics, but Azure Synapse serverless SQL pool is specifically optimized for columnar formats like Parquet, not row-oriented ones.

How to eliminate wrong answers

Option A is wrong because CSV is a row-oriented, plain-text format with no compression or schema, leading to larger storage footprint and slower queries due to full file scans. Option B is wrong because JSON is also row-oriented and self-describing, resulting in poor compression and inefficient querying as serverless SQL pool must parse the entire file. Option D is wrong because Avro, while compact and schema-based, is row-oriented and not optimized for analytical queries that benefit from columnar storage and predicate pushdown.

Full explanation →

545

MCQeasy

A retail company wants to store product catalog data in a non-relational format. The data includes product ID, name, description, price, and an array of tags. The data is frequently updated and must support low-latency reads and writes at global scale. Which Azure service should they use?

A.Azure Cosmos DB

B.Azure Table Storage

C.Azure Blob Storage

D.Azure Cache for Redis

AnswerA

Globally distributed NoSQL database with low-latency and JSON support.

Why this answer

Option B is correct because Azure Cosmos DB is a globally distributed NoSQL database that supports low-latency reads/writes and can store JSON documents with arrays. Option A is wrong because Azure Blob Storage is for unstructured blob data, not transactional updates. Option C is wrong because Azure Table Storage is a key-value store but does not support complex queries or global distribution as well as Cosmos DB.

Option D is wrong because Azure Cache for Redis is an in-memory cache, not a durable data store.

Full explanation →

546

MCQmedium

Your organization uses Azure SQL Database and needs to audit all database operations for compliance. The audit logs must be stored for at least five years and be easily searchable. What should you configure?

A.Enable auditing and store logs in Azure Blob Storage with a retention policy of five years.

B.Enable auditing and store logs in Azure Monitor Logs.

C.Use Azure Sentinel to collect and store audit logs.

D.Enable Transparent Data Encryption (TDE) to track changes.

AnswerA

Auditing with Blob Storage allows long-term retention and searchability.

Why this answer

Option D is correct because Azure SQL Database auditing can store logs in Azure Storage or Log Analytics. Long-term retention is best achieved by sending logs to Azure Storage with a retention policy. Option A (Azure Blob Storage) is the target, but the configuration is auditing.

Option B (Azure Monitor) can be used but may be costly for long-term retention. Option C (Azure Sentinel) is for SIEM, not primary audit storage.

Full explanation →

547

MCQmedium

Your team is building a real-time dashboard for monitoring website traffic. The data source is streaming click events from Azure Event Hubs. The dashboard must update within seconds. Which Azure service should you use to process the stream?

A.Azure Stream Analytics

B.Azure Synapse Pipelines

C.Azure Data Factory

D.Azure Databricks Structured Streaming

AnswerA

Stream Analytics is designed for real-time stream processing with sub-second latency and direct integration with Power BI.

Why this answer

Azure Stream Analytics is designed for real-time stream processing with low-latency output, making it ideal for processing click events from Event Hubs and updating a dashboard within seconds. It provides a SQL-like query language to define transformations and can output directly to Power BI or other visualization tools for near-instantaneous dashboard updates.

Exam trap

Microsoft often tests the misconception that any data processing service can handle streaming, but the trap here is that Azure Data Factory and Synapse Pipelines are batch-oriented, while Databricks Structured Streaming, though capable, is not the simplest or most cost-effective choice for a quick, SQL-based real-time dashboard.

How to eliminate wrong answers

Option B (Azure Synapse Pipelines) is wrong because it is primarily an orchestration tool for data movement and transformation in batch scenarios, not for real-time stream processing with sub-second latency. Option C (Azure Data Factory) is wrong because it is a cloud-based ETL service for batch data integration and scheduling, lacking native support for continuous streaming inputs like Event Hubs. Option D (Azure Databricks Structured Streaming) is wrong because while it can process streams, it is a more complex, code-heavy solution (Spark-based) that is overkill for simple dashboard updates and does not offer the same turnkey, low-latency output to Power BI as Stream Analytics.

Full explanation →

548

MCQmedium

A retail company runs analytical reporting queries on a large Sales table in Azure SQL Database. The table contains over 100 million rows and is updated daily with new transactions. The queries aggregate data by product and month, scanning millions of rows per query. The company wants to significantly reduce query execution time without changing the queries. Which indexing strategy should they implement?

A.Create a clustered columnstore index on the table.

B.Create a nonclustered index on the ProductID column.

C.Create a filtered index for the most recent month's data.

D.Create a clustered rowstore index (default) and rely on database compression.

AnswerA

A clustered columnstore index is designed for analytical workloads. It compresses data and allows efficient scanning of columns, making aggregation queries much faster.

Why this answer

A clustered columnstore index is ideal for large data warehousing and analytical workloads because it stores data column-wise, enabling high compression and batch-mode processing. For queries that aggregate millions of rows by product and month, columnstore indexes dramatically reduce I/O and CPU by scanning only the necessary columns and using segment elimination, which directly addresses the requirement to reduce query execution time without changing the queries.

Exam trap

The trap here is that candidates often choose a nonclustered index (B) thinking it will speed up all queries, but they overlook that analytical aggregations on millions of rows require columnstore's batch processing and column elimination, not row-based index seeks.

How to eliminate wrong answers

Option B is wrong because a nonclustered index on ProductID would only speed up point lookups or small range scans, not large aggregations scanning millions of rows; it would likely cause key lookups and still require scanning most of the table. Option C is wrong because a filtered index for the most recent month's data would only benefit queries restricted to that month, but the existing queries aggregate across all months and would not use the filtered index, leaving the full scan overhead unchanged. Option D is wrong because a clustered rowstore index with compression reduces storage size but does not change the fundamental row-based storage and scan pattern; queries still scan all rows and columns, so execution time remains high for large aggregations.

Full explanation →

549

MCQhard

A healthcare organization must store patient health records for 7 years to meet regulatory requirements. After 7 years, data must be deleted immediately. They use Azure Blob Storage. Which policy should they implement?

A.Soft delete policy

B.Legal hold policy

C.Lifecycle management policy with deletion after 7 years

D.Time-based retention policy

AnswerD

Time-based retention retains data for the specified period and then allows deletion.

Why this answer

A time-based retention policy (immutability policy) in Azure Blob Storage ensures that blobs are stored in a WORM (Write Once, Read Many) state for a specified period, preventing modification or deletion. After the retention period expires, the data can be deleted immediately, meeting the 7-year regulatory requirement. This policy is designed specifically for compliance scenarios where data must be preserved for a fixed duration and then removed.

Exam trap

The trap here is that candidates confuse lifecycle management (which automates deletion but does not prevent premature modification) with time-based retention (which enforces immutability during the retention period), leading them to choose lifecycle management despite its inability to guarantee data integrity before deletion.

How to eliminate wrong answers

Option A is wrong because a soft delete policy only protects against accidental deletion by retaining deleted blobs for a configurable period, but it does not enforce a minimum retention duration or guarantee immediate deletion after 7 years. Option B is wrong because a legal hold policy indefinitely prevents deletion or modification of blobs for legal or investigation purposes, with no automatic expiration, so it cannot enforce a fixed 7-year retention followed by deletion. Option C is wrong because a lifecycle management policy can delete blobs after a specified age, but it does not prevent modification or deletion during the retention period, meaning data could be altered or deleted before 7 years, violating compliance requirements.

Full explanation →

550

MCQmedium

A company has an Azure SQL Database that stores sensitive financial data. They need to ensure that database administrators (DBAs) cannot view the actual data but can still perform administrative tasks. Which feature should they implement?

A.Azure role-based access control (RBAC)

B.Dynamic data masking

C.Transparent data encryption (TDE)

D.Azure SQL Database auditing

AnswerB

Dynamic data masking can obfuscate sensitive data for specified users.

Why this answer

Option B is correct because dynamic data masking hides sensitive data from non-privileged users, including DBAs if they are not exempted. Option A is wrong because TDE encrypts at rest but does not hide data from DBAs. Option C is wrong because Azure RBAC controls access but doesn't mask data.

Option D is wrong because Azure SQL Database auditing logs actions but doesn't prevent viewing.

Full explanation →

551

MCQmedium

A smart building company stores IoT sensor data in Azure Cosmos DB using the NoSQL API. Each document contains fields: deviceId (partition key), timestamp, temperature, and humidity. The most common query is to retrieve all readings for a specific device within a time range, which runs efficiently. However, the analytics team occasionally runs a query to find all devices that reported a temperature above 50 degrees Celsius in the last hour, without specifying deviceId. This query is very slow and consumes a high number of request units (RUs). What is the most likely reason for the slow performance and high RU consumption?

A.The query does not use the partition key, causing a cross-partition scan.

B.The query is not using an index on the temperature field.

C.The time range filter is too large, causing a full table scan.

D.The document size is too large, increasing RU per read.

AnswerA

When a query does not include the partition key, Cosmos DB must execute the query across all partitions, which increases latency and RU consumption. This is the most likely cause.

Why this answer

The query does not include the partition key (deviceId) in the filter, so Azure Cosmos DB cannot route it to a single physical partition. Instead, it must fan out the query to every partition, scanning all documents across the container. This cross-partition query consumes significantly more RUs and takes longer because each partition must be queried sequentially or in parallel, and the results are merged server-side.

Exam trap

The trap here is that candidates may assume indexing is the culprit (Option B) because they think a missing index causes slow queries, but Azure Cosmos DB indexes all fields automatically, so the real issue is the missing partition key forcing a cross-partition scan.

How to eliminate wrong answers

Option B is wrong because Azure Cosmos DB automatically indexes all fields by default (unless the indexing policy is explicitly overridden), so the temperature field is already indexed; the slowness is not due to a missing index. Option C is wrong because the time range filter is only one hour, which is a narrow window; the performance issue is caused by the lack of partition key, not the size of the time range. Option D is wrong because document size affects RU cost per read, but the primary reason for the high RU consumption and slowness is the cross-partition scan, not the size of individual documents.

Full explanation →

552

MCQeasy

Your team needs to store JSON documents that require schema flexibility and global distribution. Which Azure data store should you choose?

A.Azure Table Storage

B.Azure SQL Database

C.Azure Blob Storage

D.Azure Cosmos DB

AnswerD

Azure Cosmos DB supports flexible schema and global distribution.

Why this answer

Option B is correct because Azure Cosmos DB is a globally distributed NoSQL database that natively supports JSON documents with flexible schema. Option A is wrong because Azure SQL Database is relational and enforces schema. Option C is wrong because Blob Storage stores unstructured data but does not provide a query interface for JSON documents.

Option D is wrong because Table Storage is for key-value data, not document storage.

Full explanation →

553

MCQeasy

A company stores user profile images in Azure Blob Storage. Each image is accessed via a URL that includes a Shared Access Signature (SAS) token generated using the storage account key. The company needs to immediately revoke access to all images for a specific user. Which action should they take?

A.Delete the individual SAS tokens associated with that user's images.

B.Change the storage account access keys.

C.Delete the container containing the user's images.

D.Regenerate the SAS token for each image.

AnswerB

All SAS tokens that were generated using the account key are invalidated when the key is changed or regenerated, instantly revoking access.

Why this answer

Option B is correct because changing the storage account access keys invalidates all SAS tokens that were generated using those keys, including any existing tokens. This immediately revokes access to all images for all users, including the specific user, without needing to manage individual tokens. SAS tokens are signed with the account key, so rotating the key renders all tokens generated with the old key invalid.

Exam trap

The trap here is that candidates think SAS tokens can be individually deleted or regenerated, but Azure does not maintain a token registry; the only way to invalidate all tokens derived from an account key is to rotate the key itself.

How to eliminate wrong answers

Option A is wrong because SAS tokens are not stored or managed individually by Azure; they are generated on-the-fly and embedded in URLs, so there is no central list of tokens to delete. Option C is wrong because deleting the entire container would remove all images for all users, which is excessive and not targeted to a specific user. Option D is wrong because regenerating the SAS token for each image would require knowing each token and would not revoke access to existing tokens that are already distributed; it also does not scale for immediate revocation.

Full explanation →

554

MCQmedium

An e-commerce application processes customer orders. When an order is placed, the system must decrement the inventory count and process the payment. The application ensures that either both operations complete successfully or both are rolled back if any error occurs. Which database property does this guarantee?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures that a transaction is an all-or-nothing operation. If any part of the transaction fails, the entire transaction is rolled back, preventing partial updates.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work: either all operations within it (decrement inventory and process payment) complete successfully, or none are applied. If any part fails, the database rolls back all changes, maintaining the 'all-or-nothing' guarantee. This is the core property described in the scenario.

Exam trap

The trap here is that candidates confuse atomicity with consistency, thinking that 'keeping data valid' is the same as 'all-or-nothing execution,' but atomicity is specifically about the transaction's indivisibility, not about data integrity rules.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving defined rules (e.g., constraints, triggers), but it does not guarantee the all-or-nothing rollback of multiple operations. Option C (Isolation) is wrong because isolation controls how concurrent transactions are visible to each other (e.g., preventing dirty reads), not whether a transaction's operations are treated as a single unit. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it does not address the rollback of partial failures within a transaction.

Full explanation →

555

MCQmedium

Your application uses Azure Table storage to store user preferences. You need to retrieve all preferences for a specific user quickly. Which key should you use as the partition key?

A.Region

B.Random GUID

C.UserID

D.Timestamp

AnswerC

UserID groups all preferences for a user in one partition.

Why this answer

Option A is correct because using UserID as partition key ensures all data for a user is stored in one partition, enabling fast point queries. Option B is wrong because Timestamp as partition key would scatter user data. Option C is wrong because Region may not group user data.

Option D is wrong because random GUID would scatter data.

Full explanation →

556

MCQhard

An organization stores sensitive customer data in Azure SQL Database. They need to encrypt the data at rest and ensure that only authorized applications can decrypt it. Which combination of features should they implement?

A.Row-Level Security (RLS) and Transparent Data Encryption (TDE)

B.Azure SQL Database Auditing and Transparent Data Encryption (TDE)

C.Transparent Data Encryption (TDE) and Always Encrypted

D.Dynamic Data Masking and Always Encrypted

AnswerC

TDE encrypts the database at rest; Always Encrypted allows client-side encryption and decryption by authorized applications.

Why this answer

Option C is correct because TDE encrypts the database at rest, and Always Encrypted ensures that only authorized applications with the column encryption key can decrypt sensitive columns. Option A is wrong because row-level security controls access but does not encrypt data. Option B is wrong because dynamic data masking obfuscates data but does not encrypt.

Option D is wrong because auditing logs access but does not encrypt.

Full explanation →

557

MCQeasy

A company stores three types of data: 1) Customer orders in a SQL table with fixed columns for OrderID, CustomerID, and OrderDate. 2) Product reviews in XML files where each file contains varying tags such as <rating> and <comment>. 3) Video files of product demonstrations. Which of the following correctly classifies these data types in order from first to third?

A.Structured, Semi-structured, Unstructured

B.Semi-structured, Unstructured, Structured

C.Unstructured, Structured, Semi-structured

D.Structured, Unstructured, Semi-structured

AnswerA

This correctly identifies the SQL table as structured, the XML files as semi-structured, and video files as unstructured.

Why this answer

Customer orders in a SQL table with fixed columns (OrderID, CustomerID, OrderDate) are structured data because they conform to a rigid schema. Product reviews in XML files with varying tags like <rating> and <comment> are semi-structured data because they have tags/metadata but no fixed schema. Video files of product demonstrations are unstructured data because they lack any predefined data model or organization.

Exam trap

Microsoft often tests the distinction between semi-structured and unstructured data by using XML/JSON as semi-structured examples, where candidates mistakenly classify them as unstructured due to the lack of a fixed schema, ignoring the presence of metadata tags.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies XML files as semi-structured (correct) but then misorders video files as unstructured (should be third) and SQL tables as structured (should be first), reversing the correct sequence. Option C is wrong because it starts with unstructured (video files) instead of structured (SQL tables), and places structured last, which is the opposite of the correct order. Option D is wrong because it correctly identifies SQL tables as structured first but then incorrectly swaps semi-structured (XML) and unstructured (video), placing video before XML.

Full explanation →

558

MCQeasy

A company wants to build a near-real-time analytics solution on Azure. IoT devices send telemetry data to Azure Event Hubs. The data must be processed and stored in Azure Cosmos DB for low-latency queries. Which Azure service should be used to process the streaming data?

A.Azure Logic Apps

B.Azure Functions

C.Azure Stream Analytics

D.Azure Data Factory

AnswerC

Azure Stream Analytics is a real-time analytics service that can process streaming data from Event Hubs and write results to Cosmos DB.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed stream processing engine designed specifically for real-time analytics on high-throughput data streams from sources like Azure Event Hubs. It can run SQL-like queries to filter, aggregate, and join streaming data, and output results directly to Azure Cosmos DB for low-latency queries, making it ideal for near-real-time IoT analytics.

Exam trap

The trap here is that candidates often confuse Azure Functions (a general-purpose event-driven compute service) with a dedicated stream processing engine, overlooking that Functions lacks native support for continuous streaming, windowing, and exactly-once semantics required for near-real-time analytics.

How to eliminate wrong answers

Option A is wrong because Azure Logic Apps is a workflow orchestration service for integrating apps and data, not a stream processing engine; it lacks the ability to handle high-throughput, continuous streaming data with sub-second latency. Option B is wrong because Azure Functions is an event-driven compute service that can process individual events, but it is not optimized for continuous stream processing across large volumes of data and does not provide built-in windowing, aggregation, or exactly-once semantics for streaming analytics. Option D is wrong because Azure Data Factory is a data integration and orchestration service for batch and scheduled data movement, not for real-time stream processing; it cannot process streaming data from Event Hubs in near-real-time.

Full explanation →

559

MCQmedium

Your team is migrating on-premises SQL Server databases to Azure. They need to minimize application changes and support both OLTP and reporting workloads. Which Azure data service supports hybrid transactional and analytical processing (HTAP)?

A.Azure Analysis Services

B.Azure SQL Managed Instance

C.Azure SQL Database with Hyperscale

D.Azure Data Factory

AnswerC

Hyperscale supports large databases and can handle both transactional and analytical queries.

Why this answer

Azure SQL Database with Hyperscale supports hybrid transactional and analytical processing (HTAP) by using built-in columnstore indexes and near-instantaneous snapshot isolation. This allows the same database to handle high-volume OLTP transactions while simultaneously running complex analytical queries on up-to-date data, minimizing application changes because the existing SQL Server code and tools work without modification.

Exam trap

The trap here is that candidates often confuse Azure SQL Database with Hyperscale (which supports HTAP) with Azure SQL Managed Instance (which is a full SQL Server instance but lacks the built-in HTAP architecture), leading them to choose Option B because it sounds like a direct lift-and-shift option.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a dedicated analytical engine that requires data to be extracted, transformed, and loaded (ETL) from a source, making it unsuitable for direct OLTP workloads and failing to minimize application changes. Option B is wrong because Azure SQL Managed Instance is a fully managed SQL Server instance that supports OLTP and reporting, but it does not natively provide the HTAP architecture; it requires separate read replicas or external reporting solutions to avoid performance impact on transactional workloads. Option D is wrong because Azure Data Factory is a cloud-based ETL and data integration service, not a database platform, and cannot directly serve OLTP or analytical queries.

Full explanation →

560

MCQmedium

A data analyst needs to run interactive SQL queries against petabytes of sales data stored in Parquet format in Azure Data Lake Storage Gen2. The analyst wants the fastest query performance for ad-hoc exploration without provisioning or managing any infrastructure. Which Azure service should they use?

A.A. Azure SQL Database

B.B. Azure Synapse Serverless SQL pool

C.C. Azure HDInsight

D.D. Azure Data Factory

AnswerB

Serverless SQL pool in Azure Synapse Analytics provides a distributed query engine that can query data directly in Azure Data Lake Storage (including Parquet) using T-SQL. It is serverless, scales automatically, and charges per query, making it ideal for interactive ad-hoc analytics.

Why this answer

Azure Synapse Serverless SQL pool is correct because it enables running interactive T-SQL queries directly against Parquet files in Azure Data Lake Storage Gen2 without provisioning any infrastructure. It uses a pay-per-query model and leverages a distributed query engine to deliver fast performance on petabytes of data, making it ideal for ad-hoc exploration.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database, assuming both are for querying data, but Azure SQL Database cannot directly query external files in Data Lake Storage without additional tools like PolyBase.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a provisioned, managed relational database service designed for transactional workloads, not for querying petabytes of data in Parquet format in Data Lake Storage. Option C is wrong because Azure HDInsight requires provisioning and managing a cluster (e.g., Spark or Hive) and is not serverless, contradicting the requirement of no infrastructure management. Option D is wrong because Azure Data Factory is an orchestration and data movement service, not an interactive query engine; it cannot run SQL queries directly against data in Data Lake Storage.

Full explanation →

561

MCQeasy

A startup has an application with unpredictable usage patterns on Azure SQL Database. They want to minimize cost by paying only for the compute they use and the database should automatically pause during idle periods. Which Azure SQL Database option should they choose?

A.Serverless

B.Provisioned (DTU or vCore)

C.Hyperscale

D.Business Critical

AnswerA

Correct. Serverless pricing charges for compute per second used and auto-pauses during inactivity, making it ideal for sporadic workloads.

Why this answer

Azure SQL Database Serverless is designed for applications with unpredictable usage patterns, as it automatically scales compute resources based on demand and pauses the database during idle periods to eliminate compute costs. This model charges per second for the compute used, making it the most cost-effective choice for workloads that have periods of inactivity.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'Hyperscale' or assume all Azure SQL tiers support auto-pause, but only the Serverless tier provides automatic compute pausing and per-second billing for idle periods.

How to eliminate wrong answers

Option B (Provisioned DTU or vCore) is wrong because it requires a fixed amount of compute resources to be allocated at all times, even when the database is idle, leading to continuous billing and no auto-pause capability. Option C (Hyperscale) is wrong because it is optimized for large databases with high scalability and fast recovery, not for cost savings through auto-pause; it uses a provisioned compute model with no idle pause feature. Option D (Business Critical) is wrong because it is a high-availability tier with multiple replicas and fast failover, designed for mission-critical workloads, and does not support auto-pause or pay-per-use compute billing.

Full explanation →

562

MCQeasy

A data analyst receives a dataset containing customer order details stored in a CSV file, a JSON file with product reviews, and a folder of JPEG images of products. Which of the following correctly categorizes these data types from most structured to least structured?

A.CSV → JPEG → JSON

B.JSON → CSV → JPEG

C.CSV → JSON → JPEG

D.JPEG → JSON → CSV

AnswerC

Correct. CSV is structured, JSON is semi-structured, and JPEG is unstructured, so this is the correct order from most to least structured.

Why this answer

CSV files are highly structured with rows and columns defined by a schema, making them the most structured. JSON files are semi-structured, using key-value pairs and nested objects that allow flexibility but lack a fixed schema. JPEG images are unstructured binary data with no inherent schema, so the correct order from most to least structured is CSV → JSON → JPEG, making option C correct.

Exam trap

Microsoft often tests the misconception that JSON is more structured than CSV because it uses named keys, but in reality, CSV's fixed schema makes it more structured than JSON's flexible, self-describing format.

How to eliminate wrong answers

Option A is wrong because it places JPEG (unstructured) before JSON (semi-structured), incorrectly suggesting that binary image data is more structured than key-value pairs. Option B is wrong because it orders JSON before CSV, but CSV's rigid tabular schema is more structured than JSON's flexible hierarchy. Option D is wrong because it lists JPEG first, which is the least structured, and JSON before CSV, reversing the correct order of structuredness.

Full explanation →

563

MCQhard

A company's data engineering team uses Azure Data Factory to orchestrate a pipeline that ingests data from Azure Blob Storage, transforms it using Azure Databricks, and loads it into Azure Synapse Dedicated SQL Pool. The pipeline fails intermittently due to transient errors. Which pattern should they implement to improve reliability?

A.Replace Azure Databricks with Azure Functions

B.Increase the pipeline timeout to 24 hours

C.Split the pipeline into multiple smaller pipelines

D.Configure retry policy with exponential backoff on activities

AnswerD

Retry policies with backoff automatically retry failed activities due to transient errors.

Why this answer

Option D is correct because configuring a retry policy with exponential backoff on the Azure Data Factory activities directly addresses transient errors (e.g., network blips, throttling) by automatically retrying the failed activity after increasing delays. This pattern is specifically designed for intermittent failures and is a built-in feature of Azure Data Factory, improving pipeline reliability without architectural changes.

Exam trap

The trap here is that candidates confuse increasing timeout (Option B) with retry logic, or think splitting pipelines (Option C) improves reliability against transient errors, when in fact only a retry policy with backoff directly mitigates intermittent failures in Azure Data Factory.

How to eliminate wrong answers

Option A is wrong because replacing Azure Databricks with Azure Functions would remove the distributed compute engine needed for complex transformations, and Azure Functions are not designed for long-running, data-intensive ETL workloads. Option B is wrong because increasing the pipeline timeout to 24 hours does not handle transient errors; it only allows the pipeline to run longer, but a single transient failure still causes the entire pipeline to fail. Option C is wrong because splitting the pipeline into multiple smaller pipelines does not inherently handle transient errors; it may reduce blast radius but does not provide automatic retry logic for intermittent failures.

Full explanation →

564

MCQeasy

A company plans to migrate an on-premises SQL Server database to Azure. The database uses SQL Server Agent to run scheduled jobs and performs cross-database queries within the same instance. The company wants a fully managed PaaS solution that requires minimal application changes. Which Azure SQL deployment option should they choose?

A.Azure SQL Database (single database)

B.Azure SQL Managed Instance

C.Azure SQL Database elastic pool

D.SQL Server on Azure Virtual Machines

AnswerB

Azure SQL Managed Instance offers high compatibility with on-premises SQL Server, including SQL Server Agent, cross-database queries, and linked servers. It is fully managed and requires minimal changes to existing applications, making it the best fit for this migration.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides full SQL Server Agent support for scheduled jobs and enables cross-database queries within the same instance, while being a fully managed PaaS service. It offers near 100% compatibility with on-premises SQL Server, minimizing application changes during migration.

Exam trap

The trap here is that candidates often choose Azure SQL Database (single database) or elastic pool because they are more commonly discussed as PaaS, but they overlook the specific requirements for SQL Server Agent and cross-database queries, which only Managed Instance supports.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database (single database) does not support SQL Server Agent for scheduled jobs and cannot perform cross-database queries within the same instance (it uses elastic query or external tables instead). Option C is wrong because Azure SQL Database elastic pool is a resource-sharing model for multiple single databases, inheriting the same limitations as single databases (no SQL Server Agent, no cross-database queries). Option D is wrong because SQL Server on Azure Virtual Machines is an IaaS solution, not a fully managed PaaS, requiring the customer to manage the VM and SQL Server, which contradicts the requirement for minimal management overhead.

Full explanation →

565

MCQmedium

A development team is designing a relational database for a hospital patient management system. They need to ensure that each patient's medical record number is unique and not null. Which database constraint should they use?

A.FOREIGN KEY

B.CHECK

C.UNIQUE

D.PRIMARY KEY

AnswerD

Correct. A PRIMARY KEY constraint enforces both uniqueness and non-null, making it ideal for the medical record number.

Why this answer

The PRIMARY KEY constraint enforces both uniqueness and non-nullability on the column(s) it is applied to. In a relational database, the medical record number is the natural candidate for the primary key of the Patient table because it uniquely identifies each patient and must always have a value. This directly meets the requirement that each patient's medical record number is unique and not null.

Exam trap

The trap here is that candidates often confuse UNIQUE with PRIMARY KEY, forgetting that UNIQUE allows NULL values (in most RDBMS implementations) and therefore does not satisfy the 'not null' requirement without an additional NOT NULL constraint.

How to eliminate wrong answers

Option A is wrong because a FOREIGN KEY constraint enforces referential integrity between two tables, not uniqueness or non-nullability on a single column. Option B is wrong because a CHECK constraint validates that column values meet a specified condition (e.g., age > 0) but does not enforce uniqueness or non-nullability by itself. Option C is wrong because a UNIQUE constraint ensures all values in a column are distinct but allows NULL values (unless combined with a NOT NULL constraint), so it does not guarantee the 'not null' requirement on its own.

Full explanation →

566

MCQeasy

A retail company operates an online store. The store processes each customer's order immediately upon submission, updating inventory and payment records in real-time. Additionally, the company's business analysts run weekly reports that aggregate sales data over the past month to identify trends. Which of the following correctly describes the two workload types represented in this scenario?

A.The order processing is an OLTP workload; the weekly reporting is an OLAP workload.

B.The order processing is an OLAP workload; the weekly reporting is an OLTP workload.

C.Both workloads are OLTP workloads.

D.Both workloads are batch processing workloads.

AnswerA

Correct. Order processing involves many small, real-time transactions (OLTP). Weekly reporting aggregates large volumes of historical data (OLAP).

Why this answer

The order processing system handles individual transactions (inserts/updates) in real-time, which is the hallmark of an Online Transaction Processing (OLTP) workload. The weekly reporting aggregates large volumes of historical data for trend analysis, which is an Online Analytical Processing (OLAP) workload. OLTP is optimized for high-volume, low-latency writes, while OLAP is optimized for complex read-heavy queries over large datasets.

Exam trap

The trap here is that candidates confuse the real-time nature of order processing with batch processing or mistakenly think that any reporting is OLTP, failing to recognize that OLAP is specifically designed for analytical queries over historical data.

How to eliminate wrong answers

Option B is wrong because it reverses the definitions: order processing is not analytical (OLAP) and weekly reporting is not transactional (OLTP). Option C is wrong because the weekly reporting is not a transactional workload; it involves aggregation and analysis, not real-time inserts/updates. Option D is wrong because only the weekly reporting could be considered batch processing, but the order processing is real-time, not batch; moreover, the question asks for workload types (OLTP vs OLAP), not processing modes.

Full explanation →

567

MCQhard

A retail company uses Azure SQL Database to store transactional data. They need to ensure that reporting queries do not impact the performance of the transactional workload. Which solution should you recommend?

A.Configure a read replica in Azure SQL Database

B.Increase the DTU or vCore limit of the database

C.Add indexes to the reporting tables

D.Partition the largest tables by date

AnswerA

A read replica handles reporting queries without impacting the primary transactional workload.

Why this answer

A read replica in Azure SQL Database allows reporting queries to be offloaded to a read-only copy of the database, isolating them from the primary transactional workload. This ensures that reporting activities do not consume resources (CPU, IO, memory) on the primary instance, preventing performance degradation for transactional operations.

Exam trap

The trap here is that candidates often confuse scaling up the database (Option B) with workload isolation, not realizing that scaling up only adds more resources but does not separate read and write operations, so reporting queries can still cause blocking or resource contention on the primary.

How to eliminate wrong answers

Option B is wrong because increasing DTU or vCore limits scales up the entire database, which does not isolate reporting queries from transactional workloads; both workloads still compete for the same resources. Option C is wrong because adding indexes to reporting tables can improve query performance but does not prevent reporting queries from impacting the transactional workload, as they still run on the same database engine. Option D is wrong because partitioning tables by date can improve query performance and manageability but does not provide workload isolation; reporting queries still execute on the same primary database and can contend with transactional operations.

Full explanation →

568

Multi-Selectmedium

Which TWO Azure services can be used to perform data transformation in an analytics pipeline? (Choose two.)

Select 2 answers

A.Azure Data Lake Storage Gen2

B.Azure Event Hubs

C.Azure Data Factory

D.Power BI

E.Azure Databricks

AnswersC, E

Supports data flows for transformation.

Why this answer

Azure Data Factory is a cloud-based ETL service that allows you to create data pipelines to transform data at scale using mapping data flows or by invoking external compute services like Azure Databricks. It supports code-free visual transformations as well as custom code via Azure HDInsight or Databricks, making it a core service for data transformation in analytics pipelines.

Exam trap

The trap here is that candidates confuse storage services (Data Lake Storage) or ingestion services (Event Hubs) with transformation services, or assume that visualization tools like Power BI can perform data transformation, when in fact they only consume pre-transformed data.

Full explanation →

569

MCQeasy

A retail company uses a point-of-sale (POS) system that records each sales transaction in a database. Each transaction involves reading the current inventory, updating the stock level, and recording the sale. The database must ensure that concurrent transactions do not interfere with each other, so that one transaction does not see partially updated data from another. Which property of a database transaction ensures this isolation?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerC

Correct. Isolation ensures that concurrent transactions do not see each other's intermediate states, preventing dirty reads and other anomalies.

Why this answer

Isolation ensures that concurrent transactions do not interfere with each other, so each transaction sees a consistent snapshot of the database as if it were the only transaction running. In the POS scenario, isolation prevents one transaction from reading partially updated inventory data from another transaction, which could lead to overselling or stock discrepancies. This property is typically implemented through locking mechanisms or multi-version concurrency control (MVCC).

Exam trap

The trap here is that candidates often confuse isolation with atomicity, thinking that 'not seeing partially updated data' is about the transaction being all-or-nothing, when in fact it is about preventing interference between concurrent transactions.

How to eliminate wrong answers

Option A is wrong because atomicity ensures that a transaction is treated as a single, indivisible unit that either fully completes or fully rolls back, but it does not control how concurrent transactions interact. Option B is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving integrity constraints, but it does not manage concurrent access. Option D is wrong because durability guarantees that once a transaction is committed, its changes persist even in the event of a system failure, but it has no role in isolating concurrent transactions.

Full explanation →

570

MCQeasy

A company stores customer contact information in a table with columns for CustomerID, Name, Email, and Phone. They also store customer support chat transcripts as plain text files. Which of the following correctly classifies these data types?

A.Both are structured data

B.Customer contact information is structured; chat transcripts are semi-structured

C.Customer contact information is structured; chat transcripts are unstructured

D.Both are semi-structured

AnswerC

The table has a fixed schema (structured), while chat transcripts are free text (unstructured).

Why this answer

Customer contact information stored in a table with columns like CustomerID, Name, Email, and Phone is structured data because it has a fixed schema with rows and columns. Chat transcripts stored as plain text files have no predefined schema or organization, making them unstructured data. Therefore, option C correctly classifies the contact info as structured and the chat transcripts as unstructured.

Exam trap

The trap here is that candidates often confuse semi-structured data (like JSON or XML) with unstructured data (like plain text), incorrectly classifying chat transcripts as semi-structured because they contain some implicit structure (e.g., timestamps or user names) when in fact they lack a formal schema or metadata tags.

How to eliminate wrong answers

Option A is wrong because it claims both are structured, but chat transcripts as plain text files lack a fixed schema and are not organized into rows and columns. Option B is wrong because it classifies chat transcripts as semi-structured; semi-structured data (e.g., JSON, XML) has tags or markers to separate elements, whereas plain text files have no such structure. Option D is wrong because it claims both are semi-structured, but customer contact info in a relational table is strictly structured, and chat transcripts are unstructured.

Full explanation →

571

MCQmedium

A retail company receives daily sales data as CSV files in Azure Data Lake Storage Gen2. They need to load this data into an Azure Synapse Analytics dedicated SQL pool every night. The process must be automated, scheduled, and include error handling for failed loads. Which Azure service should they use to orchestrate this pipeline?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Logic Apps

AnswerA

Azure Data Factory is designed for orchestrating data pipelines with scheduling, monitoring, and error handling. It can copy CSV files from Azure Data Lake Storage to Azure Synapse Analytics and handle failures gracefully.

Why this answer

Azure Data Factory (ADF) is the correct choice because it is a cloud-based ETL and data integration service designed specifically for orchestrating and automating data pipelines. It supports scheduled triggers, can copy CSV files from Azure Data Lake Storage Gen2 into an Azure Synapse dedicated SQL pool, and provides built-in error handling via retry policies, activity-level error outputs, and pipeline failure notifications.

Exam trap

The trap here is that candidates may confuse Azure Data Factory with Azure Logic Apps because both can schedule and trigger actions, but Logic Apps is designed for lightweight API integrations and lacks the native data movement capabilities and PolyBase support required for bulk loading into a dedicated SQL pool.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is a real-time stream processing service for analyzing data in motion (e.g., from IoT devices or event hubs), not for scheduled batch loading of CSV files. Option C (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform focused on big data processing and machine learning, not a native orchestration service; while it can load data, it lacks built-in scheduling and error-handling features for pipeline orchestration without additional tooling. Option D (Azure Logic Apps) is wrong because it is a low-code workflow automation service for integrating applications and services (e.g., email, Office 365), not designed for high-throughput data movement or complex ETL pipelines with dedicated SQL pool sinks.

Full explanation →

572

MCQhard

A financial services company stores transaction data in Azure Data Lake Storage Gen2 as Parquet files, partitioned by date. The data volume is 5 TB per day. The analytics team runs ad-hoc SQL queries to detect fraudulent patterns. Queries are highly selective (filtering on AccountID and date range). The team also needs to create external tables and views for use in Power BI. They want to pay only for the data processed by each query and avoid provisioning any compute resources. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure Databricks with interactive clusters

C.Azure Stream Analytics

D.Azure HDInsight with Spark

AnswerA

Synapse Serverless SQL pool allows querying data directly from ADLS Gen2 with T-SQL. It is serverless, charges per data scanned, and supports creating external tables and views for tools like Power BI.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows querying data directly from Azure Data Lake Storage Gen2 using T-SQL without provisioning any compute resources. It charges per terabyte of data processed, aligning with the requirement to pay only for data scanned by each query. It also supports creating external tables and views for Power BI, making it ideal for ad-hoc, selective queries on partitioned Parquet files.

Exam trap

The trap here is that candidates may confuse Azure Synapse Serverless SQL pool with Azure Synapse Dedicated SQL pool (which requires provisioning compute) or assume that any Spark-based service (like Databricks or HDInsight) is serverless, but only the serverless SQL pool offers true pay-per-query without compute provisioning.

How to eliminate wrong answers

Option B is wrong because Azure Databricks with interactive clusters requires provisioning and managing compute clusters (even if auto-terminating), incurring costs for running VMs regardless of query execution, and does not offer a true pay-per-query model. Option C is wrong because Azure Stream Analytics is designed for real-time stream processing (e.g., from Event Hubs or IoT Hub), not for ad-hoc SQL queries on stored Parquet files in Data Lake Storage. Option D is wrong because Azure HDInsight with Spark requires provisioning a persistent cluster (with associated compute costs) and is not a serverless, pay-per-query service; it also lacks the direct T-SQL external table creation for Power BI without additional setup.

Full explanation →

573

MCQmedium

A startup is deploying a new application on Azure SQL Database. They expect the database to start at 10 GB but grow to 500 GB over time. They want to be able to scale compute independently of storage and only pay for the compute resources they use. They also want to avoid over-provisioning and automatically pause during idle periods. Which purchasing model and service tier should they choose?

A.DTU-based model with Basic tier

B.vCore-based model with General Purpose serverless tier

C.vCore-based model with Business Critical provisioned tier

D.DTU-based model with Standard tier

AnswerB

vCore model separates compute and storage, offering independent scaling. The serverless tier auto-pauses during idle periods and scales automatically, aligning with the startup's need to only pay for used compute.

Why this answer

The vCore-based General Purpose serverless tier is correct because it allows independent scaling of compute and storage, automatically pauses the database during idle periods to eliminate compute costs, and supports growth from 10 GB to 500 GB without manual intervention. This model aligns with the startup's need to pay only for consumed compute resources and avoid over-provisioning.

Exam trap

The trap here is that candidates often confuse the DTU model's 'auto-pause' feature (which does not exist) with the vCore serverless tier's auto-pause, or assume the Basic tier's low cost and simplicity fit a growing database without checking its 2 GB storage limit.

How to eliminate wrong answers

Option A is wrong because the DTU-based Basic tier is limited to 2 GB of storage and lacks independent compute-storage scaling, making it unsuitable for growth to 500 GB. Option C is wrong because the vCore-based Business Critical provisioned tier does not support auto-pause and is designed for high-performance, always-on workloads, not cost-efficient idle management. Option D is wrong because the DTU-based Standard tier bundles compute and storage together, preventing independent scaling, and does not offer an auto-pause feature.

Full explanation →

574

MCQmedium

A marketing team needs to analyze customer purchase history data stored in Azure SQL Database. They want to create interactive dashboards with drill-down capabilities. Which Microsoft tool should they use?

A.Power BI

B.Azure Data Studio

C.Microsoft Excel

D.Azure Analysis Services

AnswerA

Power BI is designed for interactive dashboards with drill-down capabilities.

Why this answer

Power BI is the correct tool because it is designed specifically for creating interactive dashboards with drill-down capabilities using data from Azure SQL Database. It connects directly to Azure SQL Database via built-in connectors, allowing users to build visualizations that support hierarchical navigation and real-time filtering.

Exam trap

The trap here is that candidates confuse Azure Analysis Services as a visualization tool, when in fact it is a backend analytical engine that requires Power BI or another client for dashboard creation.

How to eliminate wrong answers

Option B is wrong because Azure Data Studio is a database management and query tool, not a dashboarding or visualization tool; it lacks native interactive dashboard and drill-down features. Option C is wrong because Microsoft Excel can create charts and pivot tables, but it does not provide native drill-down capabilities for interactive dashboards and is not optimized for real-time, cloud-based data exploration. Option D is wrong because Azure Analysis Services is a data modeling and analytical engine that provides OLAP cubes and tabular models, but it is not a front-end visualization tool; it requires a separate client like Power BI to render interactive dashboards.

Full explanation →

575

MCQeasy

The exhibit shows a KQL query in Azure Data Explorer. What is the output of this query?

A.Bottom 5 states by total property damage

B.Top 5 states by total property damage

C.All states with total property damage

D.All storm events after 2024-01-01

AnswerB

The query returns top 5 states with highest sum of DamageProperty.

Why this answer

The KQL query uses `summarize` to aggregate total property damage by state, then `top 5 by total_property_damage` to return the five states with the highest total damage. The `desc` argument (default) orders the results in descending order, making option B correct.

Exam trap

The trap here is that candidates may confuse `top` with `take` or `limit`, forgetting that `top` implicitly sorts in descending order unless `asc` is specified, leading them to think it returns the bottom values or all rows.

How to eliminate wrong answers

Option A is wrong because `top 5` returns the highest values, not the lowest; to get bottom 5, you would need `top 5 by total_property_damage asc`. Option C is wrong because `top 5` limits the output to exactly five rows, not all states. Option D is wrong because the query does not filter by date; it aggregates all storm events regardless of date.

Full explanation →

576

Multi-Selecthard

Which THREE are valid deployment options for Azure SQL Database?

Select 3 answers

A.Managed instance

B.Single database

C.SQL Server on Azure Virtual Machine

D.Elastic pool

E.SQL Data Warehouse

AnswersA, B, D

Managed instance is a deployment option near 100% compatibility.

Why this answer

Options B, C, and E are correct. Single database, elastic pool, and managed instance are all deployment options for Azure SQL Database. Option A is wrong because virtual machine is not a deployment option for Azure SQL Database; it's SQL Server on VM.

Option D is wrong because SQL Data Warehouse is a separate service (now Synapse).

Full explanation →

577

MCQmedium

A data analyst needs to run ad-hoc SQL queries on petabytes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent but must return results within seconds. The analyst wants to pay only for the amount of data processed and does not want to manage any compute infrastructure. Additionally, they need to create views to simplify future reporting in Power BI. Which Azure service should they use?

A.Azure Synapse Serverless SQL pool

B.Azure SQL Database

C.Azure Synapse Dedicated SQL pool

D.Azure HDInsight with Spark

AnswerA

Serverless SQL pool is designed for on-demand querying of data in a data lake, with pay-per-query pricing and support for T-SQL views, making it ideal for this scenario.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it allows querying petabytes of data in Azure Data Lake Storage Gen2 using standard T-SQL without provisioning any compute infrastructure. It charges only for the amount of data processed per query (pay-per-query model) and supports creating views for Power BI reporting, meeting all stated requirements.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'Dedicated SQL pool' (Option C) because both are part of Azure Synapse Analytics, but Dedicated SQL pool requires provisioning and pays for reserved compute, not data processed.

How to eliminate wrong answers

Option B is wrong because Azure SQL Database is a fully managed relational database with provisioned compute and storage, not designed for ad-hoc queries on external Parquet files in Data Lake Storage, and it charges for reserved resources rather than data processed. Option C is wrong because Azure Synapse Dedicated SQL pool requires provisioning and managing a fixed-size compute cluster, incurring costs even when idle, and does not support the pay-per-query model. Option D is wrong because Azure HDInsight with Spark requires managing a Spark cluster (provisioned compute) and is not a serverless, pay-per-query service; it also does not natively support creating T-SQL views for Power BI without additional configuration.

Full explanation →

578

MCQeasy

A startup is building a new application and expects rapid growth in data volume. They want a fully managed relational database service with built-in high availability and automatic backups. Which Azure service should they choose?

A.SQL Server on Azure Virtual Machines

B.Azure Database for MySQL

C.Azure SQL Database

D.Azure Cosmos DB

AnswerC

Azure SQL Database is a fully managed PaaS offering.

Why this answer

Azure SQL Database is a fully managed Platform-as-a-Service (PaaS) relational database that provides built-in high availability with a 99.99% SLA and automatic backups with point-in-time restore. It meets the startup's need for rapid growth by offering elastic scaling and minimal administrative overhead, unlike Infrastructure-as-a-Service (IaaS) options.

Exam trap

The trap here is that candidates often confuse 'fully managed' with 'IaaS' (Option A) or pick a specific database engine (Option B) without recognizing that Azure SQL Database is the default PaaS relational service that satisfies all stated requirements.

How to eliminate wrong answers

Option A is wrong because SQL Server on Azure Virtual Machines is an IaaS offering that requires manual configuration of high availability (e.g., Always On Availability Groups) and backups, contradicting the 'fully managed' requirement. Option B is wrong because Azure Database for MySQL is a fully managed relational database, but the question does not specify MySQL as the required engine; Azure SQL Database is the more general and correct choice for a relational database service with built-in high availability and automatic backups. Option D is wrong because Azure Cosmos DB is a NoSQL database service, not a relational database, and does not support SQL-based relational schemas or ACID transactions in the same way.

Full explanation →

579

MCQmedium

A company uses Azure SQL Database for a customer relationship management (CRM) system. The database is deployed in the East US region. The company has customers worldwide who access the CRM via a web application. Currently, all read and write traffic goes to the primary database in East US, causing high latency for users in Europe and Asia. The company needs to improve read performance for global users while maintaining strong consistency for writes. They also want to minimize cost. Which solution should you recommend?

A.Implement Azure Redis Cache to cache frequently accessed data

B.Configure active geo-replication with readable secondary replicas in West Europe and Southeast Asia

C.Use Azure SQL Database with geo-redundant storage and enable read-scale

D.Create read replicas in the same region and use Azure Traffic Manager to route read traffic

AnswerB

Readable secondaries in user regions reduce read latency, and writes still go to the primary for consistency.

Why this answer

Option B is correct because active geo-replication provides readable secondaries in other regions for read traffic, while writes go to the primary, maintaining consistency. Option A is wrong because read replicas in the same region do not help global users. Option C is wrong because Azure Traffic Manager does not improve database read latency.

Option D is wrong because Azure Redis Cache adds cost and complexity and does not solve the read latency for all queries.

Full explanation →

580

MCQhard

You are deploying a high-availability solution for an Azure SQL Database that must have a Recovery Point Objective (RPO) of less than 5 seconds. Which deployment option meets this requirement?

A.Azure SQL Database Hyperscale tier

B.Azure SQL Managed Instance General Purpose tier

C.Azure SQL Database Business Critical tier with zone redundancy

D.Azure SQL Database with active geo-replication

AnswerC

Synchronous replication with low RPO.

Why this answer

Option C is correct because Azure SQL Database Business Critical tier with zone redundancy provides synchronous replication achieving RPO of 0-5 seconds. Option A is wrong because active geo-replication has asynchronous replication. Option B is wrong because Azure SQL Database Hyperscale has asynchronous replication.

Option D is wrong because Azure SQL Managed Instance General Purpose has asynchronous replication.

Full explanation →

581

MCQeasy

A company stores customer orders in a database. Each order has an OrderID (integer), CustomerName (text), OrderDate (date), and a JSON column for order details that contains varying fields such as discount codes or gift messages. Which statement best describes the data types in this table?

A.The table stores only structured data.

B.The table stores both structured and semi-structured data.

C.The table stores only unstructured data.

D.The table stores only semi-structured data.

AnswerB

Correct. The fixed columns (OrderID, CustomerName, OrderDate) are structured, while the JSON column is semi-structured due to its flexible schema.

Why this answer

The table includes structured columns (OrderID integer, CustomerName text, OrderDate date) and a JSON column for order details, which stores semi-structured data because JSON allows flexible schemas with varying fields like discount codes or gift messages. This combination of fixed-schema columns and a schema-less JSON column means the table holds both structured and semi-structured data, making option B correct.

Exam trap

The trap here is that candidates often mistake JSON for unstructured data, but JSON is semi-structured because it has a logical structure (key-value pairs) even though the schema is flexible, leading them to incorrectly choose option C.

How to eliminate wrong answers

Option A is wrong because the JSON column contains semi-structured data, not purely structured data, as structured data requires a fixed schema with consistent fields. Option C is wrong because unstructured data (e.g., images, videos, raw text files) is not present; JSON is semi-structured, not unstructured. Option D is wrong because the table also includes structured columns (OrderID, CustomerName, OrderDate) with fixed data types, so it does not store only semi-structured data.

Full explanation →

582

MCQhard

A financial services company needs to run ad-hoc SQL queries on petabytes of data stored in Azure Data Lake Storage without provisioning a dedicated data warehouse. Which Azure service should they use?

A.Azure Synapse Analytics serverless SQL pool

B.Azure Analysis Services

C.Azure SQL Database

D.Azure Data Lake Storage

AnswerA

Serverless querying of data lake files.

Why this answer

Azure Synapse Analytics serverless SQL pool is the correct choice because it allows you to run ad-hoc SQL queries directly against data in Azure Data Lake Storage without provisioning any dedicated compute resources. It uses a pay-per-query model, automatically scaling compute to handle petabytes of data, making it ideal for intermittent, exploratory workloads.

Exam trap

The trap here is that candidates often confuse Azure Data Lake Storage (a storage service) with a query engine, or assume that a provisioned data warehouse like Azure SQL Database is required for any SQL workload, missing the serverless, on-demand nature of Synapse serverless SQL pool.

How to eliminate wrong answers

Option B is wrong because Azure Analysis Services is an OLAP engine for semantic models and pre-aggregated data, not designed for direct ad-hoc SQL queries on raw petabyte-scale data in Data Lake Storage. Option C is wrong because Azure SQL Database is a provisioned, transactional relational database with fixed storage limits, unsuitable for petabyte-scale data lake queries without prior data loading. Option D is wrong because Azure Data Lake Storage is a storage service, not a query engine; it provides the data layer but cannot execute SQL queries itself.

Full explanation →

583

MCQeasy

A manufacturing company stores IoT sensor data as blobs in Azure Blob Storage. Each blob is named with a device ID and a timestamp, and they need to quickly find all blobs for a specific device within a date range. Which Azure Blob Storage feature should they use to query blobs based on custom metadata?

A.Blob snapshots

B.Blob soft delete

C.Blob index tags

D.Blob lifecycle management

AnswerC

Blob index tags let you apply custom key-value pairs as metadata and then filter and query blobs based on those tags, enabling efficient retrieval of blobs by device ID and timestamp.

Why this answer

Blob index tags allow you to apply custom key-value metadata to blobs and then query them using a filtered query across containers or storage accounts. This enables efficient retrieval of blobs by device ID and timestamp without scanning all blob names or maintaining a separate index.

Exam trap

The trap here is that candidates confuse blob index tags with blob naming conventions or metadata stored in a separate database, thinking that blob name patterns alone are sufficient for efficient querying, but Azure Blob Storage does not natively support server-side filtering by name patterns.

How to eliminate wrong answers

Option A is wrong because blob snapshots are point-in-time read-only copies of a blob used for versioning or backup, not for querying blobs by metadata. Option B is wrong because blob soft delete protects against accidental deletion by retaining deleted blobs for a retention period, but it does not provide a query mechanism for custom metadata. Option D is wrong because blob lifecycle management automates tier transitions or deletion based on age or last modification time, not for querying blobs by custom metadata.

Full explanation →

584

MCQmedium

A data engineering team needs to build a real-time dashboard showing sales totals by region. Sales transactions are streamed from point-of-sale systems into Azure Event Hubs. The team wants to aggregate the data in near real-time (e.g., every minute) and store the results in Azure SQL Database for visualization in Power BI. Which Azure service should they use for the aggregation step?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Pipelines

D.Azure Logic Apps

AnswerA

Correct. Stream Analytics processes streaming data with SQL-like queries and can aggregate and output to SQL Database in real time.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, allowing you to define a query that aggregates sales data from Event Hubs over a one-minute tumbling window and output the results directly to Azure SQL Database. This meets the requirement for near real-time aggregation without needing to write custom code or manage infrastructure.

Exam trap

The trap here is that candidates often confuse Azure Data Factory or Synapse Pipelines (which are batch-oriented) with real-time processing, overlooking that Stream Analytics is the only service among the options purpose-built for continuous, low-latency stream aggregation.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a cloud-based ETL and data integration service for batch-oriented data movement and orchestration, not designed for real-time stream processing or sub-minute aggregations. Option C (Azure Synapse Pipelines) is wrong because it is essentially the same as Azure Data Factory within Synapse Analytics, focused on batch data integration and orchestration, lacking native real-time stream processing capabilities. Option D (Azure Logic Apps) is wrong because it is a workflow automation service for integrating applications and services using connectors, not built for high-throughput, low-latency stream aggregation or windowed computations on event streams.

Full explanation →

585

MCQeasy

A company collects customer feedback in three forms: a structured table with customer ID and rating (1-5), free-text comments, and audio recordings of phone calls. Which of the following correctly orders these data from least structured to most structured?

A.Audio recordings, free-text comments, structured table

B.Structured table, free-text comments, audio recordings

C.Free-text comments, structured table, audio recordings

D.Audio recordings, structured table, free-text comments

AnswerA

Correct: audio is unstructured, free-text is semi-structured, and the table is structured.

Why this answer

Option A is correct because data structure ranges from unstructured (audio recordings with no schema), through semi-structured (free-text comments with no fixed format), to structured (a table with defined columns and data types). This ordering aligns with the DP-900 core data concept of data classification based on schema rigidity.

Exam trap

The trap here is that candidates often confuse 'least structured' with 'most organized', incorrectly ranking the structured table as least structured due to its simplicity, rather than recognizing that structure refers to schema rigidity, not data size or complexity.

How to eliminate wrong answers

Option B is wrong because it reverses the order, placing the structured table as least structured, which contradicts the definition of structured data having a fixed schema. Option C is wrong because it places free-text comments (semi-structured) before the structured table, but free-text comments lack the rigid schema of a table, making them less structured than the table. Option D is wrong because it places the structured table in the middle, but audio recordings are unstructured and should be the least structured, not the table.

Full explanation →

586

Multi-Selecthard

Which THREE factors should you consider when choosing between Azure SQL Database and Azure SQL Managed Instance?

Select 3 answers

A.Support for Microsoft Entra ID authentication

B.Need for SQL Server Agent jobs

C.Need for cross-database queries

D.Automatic scaling of compute resources

E.Requirement for VNet integration

AnswersB, C, D

SQL Managed Instance supports SQL Agent.

Why this answer

Option A is correct because SQL Managed Instance supports instance-level features like SQL Agent jobs. Option C is correct because SQL Managed Instance supports cross-database queries. Option D is correct because SQL Database has auto-scaling options like serverless.

Option B is wrong because both support VNet integration. Option E is wrong because both support Microsoft Entra ID authentication.

Full explanation →

587

MCQmedium

A manufacturing company collects real-time temperature data from thousands of IoT sensors. They need to build an analytics solution that processes the streaming data, computes the average temperature per device every minute, and outputs the results to a Power BI dashboard for near real-time visualization. Which Azure service should they use for the real-time stream processing?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Databricks

D.Azure SQL Database

AnswerA

Correct. Azure Stream Analytics provides a serverless, real-time stream processing engine that can compute aggregations using SQL-like queries and output to Power BI.

Why this answer

Azure Stream Analytics is the correct choice because it is a fully managed, real-time stream processing engine designed specifically for scenarios like this: ingesting high-velocity data from IoT sensors, performing time-windowed aggregations (e.g., average temperature per device every minute), and outputting results directly to Power BI for near real-time dashboards. It natively supports SQL-like query language for defining windowed computations and has built-in connectors for both IoT Hub/Event Hubs (input) and Power BI (output), making it the most efficient and purpose-built service for this streaming analytics workload.

Exam trap

The trap here is that candidates often confuse Azure Data Factory (a batch ETL tool) with a real-time processing service, or assume that Azure Databricks is always the best choice for streaming because of its Spark foundation, overlooking the simpler, fully managed, and cost-effective alternative of Azure Stream Analytics for straightforward windowed aggregations.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is a cloud-based ETL and data orchestration service designed for batch data movement and transformation, not for real-time stream processing; it cannot compute sliding-window averages on streaming data in sub-minute latency. Option C (Azure Databricks) is wrong because while it can process streaming data via Structured Streaming, it is a general-purpose analytics platform that requires more complex setup, cluster management, and coding (Scala/Python/SQL) compared to the simpler, declarative SQL-based approach of Stream Analytics; it is overkill for a straightforward windowed aggregation and not the simplest or most cost-effective choice for this specific requirement. Option D (Azure SQL Database) is wrong because it is a relational database for storing and querying structured data, not a stream processing engine; it cannot ingest real-time streaming data or perform continuous time-windowed aggregations without additional services like Stream Analytics or a custom application layer.

Full explanation →

588

MCQmedium

A gaming company stores player profiles as JSON documents. Each profile can have different attributes; for example, some profiles include an 'achievements' field while others include a 'purchaseHistory' field. The application must retrieve profiles by player ID with single-digit-millisecond latency and also support SQL-like queries on any attribute. Which Azure data store should the company use?

A.A. Azure Table Storage

B.B. Azure Cosmos DB Core (SQL) API

C.C. Azure Blob Storage

D.D. Azure Database for PostgreSQL

AnswerB

Azure Cosmos DB with the Core (SQL) API natively stores JSON documents with varying schemas, provides low-latency point reads, and supports advanced SQL-like queries on any attribute.

Why this answer

Azure Cosmos DB Core (SQL) API is the correct choice because it natively stores JSON documents with flexible schemas, supports indexing on any attribute for SQL-like queries, and guarantees single-digit-millisecond latency for point reads by player ID. This meets the requirement for both fast key-based lookups and ad-hoc querying across varying profile attributes.

Exam trap

The trap here is that candidates may confuse Azure Table Storage's key-value capabilities with the need for flexible schema and SQL-like queries, overlooking that Table Storage does not support querying arbitrary attributes or guarantee single-digit-millisecond latency for such queries.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a key-value store that does not support SQL-like queries on arbitrary attributes; it only supports queries on partition key and row key, and its latency is not guaranteed to be single-digit-milliseconds for all operations. Option C is wrong because Azure Blob Storage is designed for unstructured binary or text data, not for querying individual JSON attributes with SQL-like syntax, and it lacks native indexing for ad-hoc queries. Option D is wrong because Azure Database for PostgreSQL is a relational database that requires a fixed schema, making it unsuitable for storing JSON documents with varying attributes without complex schema management and potential performance overhead.

Full explanation →

589

MCQhard

The exhibit shows a Kusto Query Language (KQL) query run in Azure Data Explorer. What is the output of this query?

A.All storm events in Texas with property damage

B.The total property damage for all event types in Texas

C.The top 5 event types in Texas by total property damage

D.A list of the top 5 property damage amounts in Texas

AnswerC

The query summarizes by EventType and returns the top 5.

Why this answer

The query uses `summarize sum(PropertyDamage) by EventType` to aggregate total property damage per event type, then `top 5 by TotalPropertyDamage` to return the five event types with the highest totals. The `where State == 'TEXAS'` filter ensures only Texas storms are considered. This directly yields the top 5 event types in Texas by total property damage.

Exam trap

The trap here is that candidates confuse 'top 5 property damage amounts' (raw values) with 'top 5 event types by total property damage' (aggregated categories), or they think the query lists individual events rather than summarized groups.

How to eliminate wrong answers

Option A is wrong because the query does not list individual storm events; it aggregates damage by event type, so it cannot output 'all storm events'. Option B is wrong because the query groups by EventType and returns multiple rows (top 5), not a single total for all event types combined. Option D is wrong because the query outputs event types, not raw property damage amounts; the `top 5` operator returns the entire row (EventType and TotalPropertyDamage), not just the damage values.

Full explanation →

590

MCQhard

A banking application processes a funds transfer transaction consisting of two steps: debit $100 from Account A and credit $100 to Account B. If the system crashes after debiting Account A but before crediting Account B, the database automatically reverts the debit, restoring Account A to its original balance. Which ACID property guarantees this behavior?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Correct. Atomicity guarantees the all-or-nothing nature of transactions. Since the debit is rolled back, the transaction is atomic.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this scenario, the debit and credit are part of one transaction; if the system crashes after the debit but before the credit, the database management system (DBMS) automatically rolls back the entire transaction, undoing the debit to restore Account A's original balance. This all-or-nothing behavior is the defining characteristic of atomicity.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, thinking that 'restoring the original balance' is about maintaining data rules, when in fact it is the rollback of an incomplete transaction that demonstrates atomicity.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving all defined rules (e.g., constraints, triggers), but it does not inherently handle crash recovery or rollback of partial changes. Option C (Isolation) is wrong because isolation governs how concurrent transactions are executed independently to prevent interference, not how a single transaction recovers from a crash. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure; it does not apply to uncommitted transactions that need to be rolled back.

Full explanation →

591

MCQhard

You are designing a data lake architecture for a large enterprise. You need to organize data into zones (raw, curated, and analytics) and enforce data lineage tracking. Which Azure service should you use to catalog and govern the data?

A.Azure Synapse Analytics

B.Azure Data Factory

C.Microsoft Purview

D.Azure Databricks

AnswerC

Purview provides unified data governance, cataloging, and lineage.

Why this answer

Microsoft Purview is the correct choice because it is a unified data governance service designed specifically for cataloging data assets, tracking lineage across hybrid and multi-cloud environments, and enforcing data policies. Unlike the other options, Purview provides out-of-the-box lineage scanning, a business glossary, and automated classification, making it the appropriate tool for organizing data into zones and ensuring end-to-end lineage in a data lake architecture.

Exam trap

The trap here is that candidates confuse data integration or analytics services (like Azure Data Factory or Synapse) with a dedicated governance and cataloging tool, assuming lineage tracking is a built-in feature of those services rather than a separate function provided by Microsoft Purview.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Analytics is an analytics service that combines data warehousing and big data processing, but it does not provide native data cataloging or lineage tracking capabilities beyond basic metadata; it relies on Purview for governance. Option B is wrong because Azure Data Factory is an ETL and data integration service that can capture lineage during pipeline runs, but it is not a dedicated catalog or governance tool; it lacks persistent cataloging, business glossary, and policy enforcement features. Option D is wrong because Azure Databricks is a unified analytics platform for data engineering and machine learning, but it does not include a built-in data catalog or lineage governance; it integrates with Purview for such purposes.

Full explanation →

592

Multi-Selectmedium

Which TWO of the following are true about Azure Database for PostgreSQL? (Select TWO.)

Select 2 answers

A.It only supports the open-source community edition of PostgreSQL.

B.It is a NoSQL database service.

C.It supports read replicas to offload read traffic.

D.It allows cross-database queries across multiple servers.

E.It provides automated backups with point-in-time restore.

AnswersC, E

Read replicas can be created in the same or different region.

Why this answer

Options B and D are correct. Azure Database for PostgreSQL is a managed service that provides automated backups and supports read replicas. Option A is incorrect because it is not a NoSQL database.

Option C is incorrect because it supports both the community and enterprise editions. Option E is incorrect because it does not support cross-database queries across different servers.

Full explanation →

593

MCQeasy

Your organization wants to run SQL queries on data stored in Azure Blob Storage without moving the data. Which Azure service supports this?

A.Azure SQL Database

B.Azure Analysis Services

C.Azure Synapse Serverless SQL pool

D.Azure Data Lake Storage Gen2

AnswerC

Serverless SQL pool can query Blob Storage directly using T-SQL.

Why this answer

Azure Synapse Serverless SQL pool allows you to query data directly from Azure Blob Storage using T-SQL without moving the data. It uses a distributed query engine that reads files in place, supporting formats like Parquet, CSV, and JSON, making it ideal for ad-hoc analytics on stored data.

Exam trap

The trap here is that candidates confuse Azure Data Lake Storage Gen2 (a storage service) with a query engine, or assume Azure SQL Database can query external blobs natively, when in fact only Synapse Serverless SQL pool provides serverless T-SQL querying over Blob Storage without data movement.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database is a fully managed relational database that requires data to be imported or loaded into its storage; it cannot query external Blob Storage directly without additional tools like PolyBase. Option B is wrong because Azure Analysis Services is an OLAP engine that requires data to be loaded into its in-memory tabular model from sources like SQL databases; it does not support direct querying of Blob Storage. Option D is wrong because Azure Data Lake Storage Gen2 is a storage service built on Blob Storage with a hierarchical namespace, but it is not a query engine; it stores data but does not provide SQL query capabilities itself.

Full explanation →

594

MCQeasy

A company stores customer information in a table with columns CustomerID, Name, Address, and PhoneNumber. Every row has values for all these columns, and the data follows a fixed schema. Which type of data does this represent?

A.Unstructured data

B.Semi-structured data

C.Structured data

D.Streaming data

AnswerC

Correct. Structured data has a fixed schema with defined columns and data types, typical of relational database tables.

Why this answer

Structured data conforms to a fixed schema where each row has the same columns and data types. The table with CustomerID, Name, Address, and PhoneNumber, where every row contains values for all columns, perfectly fits this definition. This is typical of relational database tables (e.g., in Azure SQL Database) where the schema is enforced at the table level.

Exam trap

The trap here is that candidates may confuse 'semi-structured' with 'structured' because both have some organization, but the key distinction is that structured data enforces a fixed schema for all rows, while semi-structured data allows schema flexibility (e.g., missing attributes or varying data types).

How to eliminate wrong answers

Option A is wrong because unstructured data has no predefined schema or organization (e.g., text files, images, videos), whereas the table has a fixed schema with defined columns. Option B is wrong because semi-structured data has some organizational properties (like tags or key-value pairs) but does not enforce a rigid schema across all records (e.g., JSON or XML files), unlike the fixed schema described. Option D is wrong because streaming data refers to data that is continuously generated and processed in real time (e.g., from IoT devices or event hubs), not to the static storage format of a table.

Full explanation →

595

MCQmedium

A gaming company stores player scores in Azure Cosmos DB using the NoSQL API. Each document contains: PlayerID (unique to player), GameID, Score, Timestamp. The most common query is: 'Retrieve all scores for a specific GameID, ordered by Score descending.' Which property should be chosen as the partition key to minimize Request Unit (RU) consumption?

A.PlayerID

B.GameID

C.Score

D.Timestamp

AnswerB

Correct. GameID groups all scores for a game together, so the query targets a single partition, minimizing RU.

Why this answer

GameID is the correct partition key because the most common query filters on GameID, and Cosmos DB routes each query to the physical partition(s) containing that GameID's data. Using GameID ensures the query touches only the relevant partition(s), minimizing RU consumption by avoiding cross-partition fan-out. A partition key that matches the filter predicate is essential for efficient, single-partition queries.

Exam trap

The trap here is that candidates often pick PlayerID because it's unique and seems like a natural key, but they fail to realize that a partition key must align with the most common query filter to avoid costly cross-partition queries, not just be unique.

How to eliminate wrong answers

Option A is wrong because PlayerID is unique per player, causing each query for a GameID to scatter across all partitions (since scores for the same GameID would be distributed across many PlayerID partitions), resulting in a cross-partition query that consumes more RUs. Option C is wrong because Score is a high-cardinality, frequently updated value that would cause hot partitions and inefficient queries, as filtering on GameID would still require scanning all partitions. Option D is wrong because Timestamp is monotonically increasing, leading to hot partitions on the latest timestamp and requiring cross-partition queries when filtering by GameID, which increases RU cost.

Full explanation →

596

MCQeasy

A company stores employee records in a relational database table with columns EmployeeID, FirstName, LastName, Department. They also store employee handbooks as PDF files, and customer feedback as XML documents. Which of the following correctly classifies these data types?

A.Employee records: structured, Employee handbooks: semi-structured, Customer feedback: unstructured

B.Employee records: structured, Employee handbooks: unstructured, Customer feedback: semi-structured

C.Employee records: semi-structured, Employee handbooks: unstructured, Customer feedback: structured

D.Employee records: unstructured, Employee handbooks: semi-structured, Customer feedback: structured

AnswerB

Correct classification: employee records in a relational table are structured; PDF handbooks are unstructured; XML feedback is semi-structured due to its hierarchical tags.

Why this answer

Option B is correct because employee records in a relational database table have a fixed schema (columns and data types), making them structured data. Employee handbooks stored as PDF files have no internal schema and are binary blobs, classifying them as unstructured data. Customer feedback stored as XML documents have a flexible, self-describing schema with tags, making them semi-structured data.

Exam trap

The trap here is confusing semi-structured data (which has some organizational properties like tags in XML) with unstructured data (which has no inherent structure), leading candidates to misclassify PDFs as semi-structured or XML as structured.

How to eliminate wrong answers

Option A is wrong because it incorrectly classifies employee handbooks (PDF files) as semi-structured; PDFs are unstructured binary files with no inherent schema, not semi-structured like XML or JSON. Option C is wrong because it classifies employee records as semi-structured; relational database tables with a fixed schema are structured, not semi-structured. Option D is wrong because it classifies employee records as unstructured; relational tables with defined columns and data types are structured, and customer feedback as structured; XML documents are semi-structured, not structured.

Full explanation →

597

MCQeasy

A small business wants to migrate their on-premises SQL Server database to Azure. They have limited budget and want to minimize ongoing management overhead. The database is less than 50 GB and is used by a single application with low concurrent users. The application requires compatibility with SQL Server features such as T-SQL, stored procedures, and functions. The business does not require high availability or disaster recovery. Which Azure relational database service should they choose?

A.SQL Server on Azure Virtual Machines

B.Azure SQL Database (serverless tier)

C.Azure SQL Managed Instance

D.Azure Database for MySQL

AnswerB

Serverless tier automatically pauses during inactivity, reducing cost, and requires minimal management.

Why this answer

Azure SQL Database serverless tier is the best choice because it provides a fully managed, PaaS relational database service that supports T-SQL, stored procedures, and functions, while automatically pausing during idle periods to reduce costs. With a database under 50 GB, low concurrency, and no HA/DR requirements, the serverless tier minimizes both management overhead and cost, as it charges only for compute used per second and storage consumed.

Exam trap

The trap here is that candidates often choose Azure SQL Managed Instance because of its full SQL Server compatibility, overlooking that the serverless tier of Azure SQL Database also supports T-SQL, stored procedures, and functions, and is far more cost-effective for small, low-usage workloads without HA/DR needs.

How to eliminate wrong answers

Option A is wrong because SQL Server on Azure Virtual Machines is an IaaS solution that requires ongoing management of the OS, SQL Server patches, and backups, increasing overhead and cost, which contradicts the business goal of minimizing management. Option C is wrong because Azure SQL Managed Instance is a PaaS service with near-100% SQL Server compatibility but includes built-in high availability and a higher base cost, making it overkill for a small, low-concurrency database that does not need HA/DR. Option D is wrong because Azure Database for MySQL does not support SQL Server-specific features like T-SQL, stored procedures, and functions; it uses a different SQL dialect, so the application would require significant code changes.

Full explanation →

598

Multi-Selectmedium

Which TWO of the following are benefits of using Azure SQL Database over SQL Server on Azure Virtual Machines?

Select 2 answers

A.Built-in high availability with automatic failover

B.Automated patching and updates

C.Lower cost because you can choose any number of vCores

D.Ability to install custom software on the database server

E.Full control over the operating system

AnswersA, B

Azure SQL Database provides built-in high availability without manual configuration.

Why this answer

Options B and D are correct. Azure SQL Database automates patching and provides built-in high availability. Option A is incorrect because Azure SQL Database does not provide full control over the OS.

Option C is incorrect because you cannot install custom software. Option E is incorrect because you cannot configure the exact number of vCores in some tiers, but that is not a benefit over VMs.

Full explanation →

599

Multi-Selectmedium

Which THREE components are required to implement a modern data warehouse architecture (medallion architecture) on Azure?

Select 3 answers

A.Azure Data Lake Storage Gen2

B.Azure Analysis Services

C.Azure Synapse Analytics

D.Azure Databricks

E.Azure DevOps

AnswersA, C, D

Serves as the storage foundation for raw and transformed data.

Why this answer

Azure Data Lake Storage Gen2 is required because it provides the scalable, hierarchical storage foundation for the medallion architecture's bronze, silver, and gold layers. It combines blob storage with a hierarchical namespace, enabling efficient data ingestion and transformation at petabyte scale.

Exam trap

The trap here is that candidates often confuse optional analytics services (like Azure Analysis Services or Azure DevOps) with the mandatory storage and compute components required for the medallion architecture.

Full explanation →

600

Multi-Selectmedium

Which TWO Azure services can be used to perform large-scale data transformation and processing in a serverless manner?

Select 2 answers

A.Azure Analysis Services

B.Azure Synapse Serverless SQL pool

C.Azure Data Factory

D.Azure Databricks

E.Azure SQL Database

AnswersB, C

Serverless SQL pool is serverless for querying data.

Why this answer

Azure Synapse Serverless SQL pool (Option B) is correct because it allows you to run T-SQL queries over data stored in Azure Data Lake or Blob Storage without provisioning any dedicated compute resources, paying only for the data processed. Azure Data Factory (Option C) is correct because it provides a serverless orchestration and data integration service that can execute data transformation activities (like Mapping Data Flows) at scale without managing underlying infrastructure.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'fully managed' or 'PaaS', leading them to select Azure Databricks or Azure SQL Database, which still require explicit compute provisioning or cluster management, unlike the truly serverless models of Synapse Serverless SQL pool and Data Factory.

Full explanation →

Microsoft Azure Data Fundamentals DP-900 (DP-900) — Questions 526–600