Microsoft Azure Data Fundamentals DP-900 Questions 376–450 | Page 6/14

376

MCQ (medium)

A gaming company stores player session data as JSON documents. Each document contains fields like sessionId, userId, startTime, and a varying set of optional fields such as deviceType or campaignId. The application needs to query sessions by userId and startTime range using SQL-like queries, and also by sessionId with low latency. Which Azure Cosmos DB API should the company choose?

A. SQL (Core) API

B. MongoDB API

C. Table API

D. Gremlin (Graph) API

Answer: A

The SQL API provides SQL query support over JSON documents and handles schema flexibility well.

Why this answer

The SQL (Core) API is the correct choice because it natively supports SQL-like queries over JSON documents, enabling efficient filtering by userId and startTime range. It also provides low-latency point reads by sessionId when a well-designed partition key (e.g., /userId) is used, and it offers automatic indexing of all JSON properties, including optional fields like deviceType or campaignId.

Exam trap

The trap here is that candidates may choose the MongoDB API because they assume 'SQL-like queries' require MongoDB's query language, but the Core API actually provides native SQL syntax and is the only Azure Cosmos DB API that supports SQL directly over JSON documents.

How to eliminate wrong answers

Option B (MongoDB API) is wrong because, although it stores JSON-like documents, it is queried with MongoDB's own query language rather than SQL, so it does not provide the SQL syntax the application requires; its indexing behavior also differs from the Core API's automatic indexing of all fields. Option C (Table API) is wrong because it is designed for key-value storage with a flat schema and does not support nested JSON documents or SQL-like queries on varying optional fields. Option D (Gremlin (Graph) API) is wrong because it is optimized for graph traversal queries on entities and relationships, not for document-based queries on JSON fields like userId or startTime.
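
The access pattern in this question can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Cosmos DB client code: the sample documents, the `QUERY` text, and the helper names `query_sessions` and `read_session` are all assumptions made for the example. The query string only shows the general shape a SQL (Core) API query for this scenario might take.

```python
# Hypothetical session documents; the optional fields vary from document to document.
sessions = [
    {"sessionId": "s1", "userId": "u1", "startTime": "2024-05-01T10:00:00Z", "deviceType": "mobile"},
    {"sessionId": "s2", "userId": "u1", "startTime": "2024-05-03T18:30:00Z", "campaignId": "spring"},
    {"sessionId": "s3", "userId": "u2", "startTime": "2024-05-02T09:15:00Z"},
]

# Illustrative shape of the SQL-like query (kept as text only, never sent anywhere).
QUERY = (
    "SELECT * FROM c WHERE c.userId = @userId "
    "AND c.startTime >= @start AND c.startTime < @end"
)

def query_sessions(docs, user_id, start, end):
    """In-memory equivalent of QUERY.

    ISO 8601 UTC timestamps in one fixed format sort correctly as plain strings.
    """
    return [d for d in docs if d["userId"] == user_id and start <= d["startTime"] < end]

def read_session(docs, session_id):
    """Lookup by sessionId, standing in for a low-latency read by document id."""
    return next((d for d in docs if d["sessionId"] == session_id), None)

matches = query_sessions(sessions, "u1", "2024-05-01T00:00:00Z", "2024-05-02T00:00:00Z")
print([d["sessionId"] for d in matches])
print(read_session(sessions, "s3"))
```

Note that the documents carry different optional fields, yet the same filter works on all of them, which is the schema flexibility the question relies on.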

377

MCQ (easy)

An organization needs to run complex queries on petabytes of data stored in Azure Data Lake Storage. They want to use serverless compute to avoid managing infrastructure. Which Azure service should they use?

A. Azure Analysis Services

B. Azure Synapse Serverless SQL pool

C. Azure HDInsight

D. Azure SQL Database

Answer: B

Provides serverless querying over data in Data Lake.

Why this answer

Azure Synapse Serverless SQL pool is the correct choice because it provides serverless compute that can run complex T-SQL queries directly against data stored in Azure Data Lake Storage without requiring any infrastructure management. It uses a pay-per-query billing model and can scale automatically to handle petabytes of data, making it ideal for ad-hoc analytics on large-scale data lakes.

Exam trap

The trap here is that candidates often confuse Azure Synapse Serverless SQL pool with Azure SQL Database or HDInsight, mistakenly thinking that any SQL-based service can handle serverless data lake queries, but only Synapse Serverless SQL pool provides true serverless compute with direct, on-demand querying of external data in Azure Data Lake Storage.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is a fully managed platform-as-a-service (PaaS) that provides semantic modeling and in-memory analytics, but it is not serverless and requires provisioning of a dedicated server instance; it also does not directly query Data Lake Storage without additional data import or gateway configuration. Option C is wrong because Azure HDInsight is a managed cluster service that requires provisioning and managing virtual machines (e.g., for Hadoop, Spark, or Hive), which contradicts the requirement for serverless compute to avoid infrastructure management. Option D is wrong because Azure SQL Database is a relational database service that requires provisioning a logical server and managing database resources (DTUs or vCores), and it is not designed for serverless querying of petabytes of data in Data Lake Storage; it stores data in its own managed storage, not directly on the data lake.

378

MCQ (medium)

A company is migrating a legacy on-premises database to Azure. They require the ability to run cross-database queries within the same logical server, full control over database collation settings, and want to minimize management overhead for infrastructure patching. The database size is under 1 TB and they do not need instance-level features like SQL Agent jobs or linked servers. Which Azure SQL offering should they choose?

A. Azure SQL Database

B. Azure SQL Managed Instance

C. SQL Server on Azure Virtual Machine

D. Azure Synapse SQL pool

Answer: A

Azure SQL Database is a PaaS service that handles patching, supports elastic query for cross-database queries, and allows collation settings on a per-database level. It does not include SQL Agent or linked servers, which are not required here.

Why this answer

Azure SQL Database is the correct choice because it supports cross-database queries within the same logical server via elastic queries, allows full control over database-level collation settings, and is a fully managed Platform-as-a-Service (PaaS) offering that handles infrastructure patching automatically. With a database size under 1 TB and no need for instance-level features like SQL Agent jobs or linked servers, Azure SQL Database meets all requirements while minimizing management overhead.

Exam trap

The trap here is that candidates often confuse Azure SQL Database with Azure SQL Managed Instance, assuming that cross-database queries require instance-level features like linked servers, but Azure SQL Database supports this via elastic queries without the need for instance-level management.

How to eliminate wrong answers

Option B (Azure SQL Managed Instance) is wrong because it provides instance-level features like SQL Agent jobs and linked servers, which are not needed, and it introduces more management overhead than Azure SQL Database. Option C (SQL Server on Azure Virtual Machine) is wrong because it requires the customer to manage patching and infrastructure, contradicting the requirement to minimize management overhead. Option D (Azure Synapse SQL pool) is wrong because it is designed for large-scale data warehousing and analytics, not for cross-database queries within a logical server, and it does not offer the same level of control over database collation settings.

379

MCQ (easy)

A company uses Azure SQL Database and needs to audit all data modifications (INSERT, UPDATE, DELETE) for compliance purposes. The audit logs must be stored for 7 years. Which feature should they enable?

A. Advanced Threat Protection

B. SQL Database auditing

C. Vulnerability assessment

D. Transparent Data Encryption (TDE)

Answer: B

Auditing logs data modifications and can be retained for years.

Why this answer

Option B is correct because SQL Database auditing logs database events to Azure Storage, Log Analytics, or Event Hubs, and the logs can be retained for long periods. Option A is wrong because Advanced Threat Protection alerts on suspicious activities; it does not log data modifications. Option C is wrong because vulnerability assessment scans for security issues. Option D is wrong because TDE encrypts data at rest and does not log modifications.

380

MCQ (medium)

In a banking application, a transaction transfers $100 from Account A to Account B. The system deducts $100 from Account A successfully, but due to a network error, the credit to Account B fails. The application rolls back the deduction from Account A, ensuring that neither account is affected. Which ACID property is being enforced?

A. Atomicity

B. Consistency

C. Isolation

D. Durability

Answer: A

Atomicity ensures the transaction is all-or-nothing; if any part fails, the entire transaction is rolled back, as seen in this example.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this scenario, the deduction from Account A and the credit to Account B must both succeed or both fail entirely. When the credit to Account B fails, the system rolls back the deduction from Account A, preserving the all-or-nothing nature of the transaction.

This is the core behavior of atomicity in ACID-compliant database systems like Azure SQL Database or SQL Server.

Exam trap

The trap here is that candidates confuse the rollback action with consistency, because both involve maintaining a correct state, but atomicity specifically governs the all-or-nothing completion of the transaction itself, not the validity of the data rules.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, respecting all defined rules (e.g., constraints, triggers, cascades). While the rollback does maintain consistency, the specific action of rolling back a partial change is the hallmark of atomicity, not consistency. Option C (Isolation) is wrong because isolation controls how concurrent transactions are visible to each other (e.g., via locking or snapshot isolation levels), not the rollback of a failed transaction.

Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure; here, the transaction was not committed, so durability is not relevant.
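
The all-or-nothing behavior in this scenario can be reproduced with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for any ACID-compliant engine; the `accounts` table, the starting balances, and the `fail_credit` flag that simulates the network error are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(conn, source, target, amount, fail_credit=False):
    """Move `amount` between accounts as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
            if fail_credit:
                # Hypothetical failure injected between the debit and the credit.
                raise ConnectionError("simulated network error before the credit")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
        return True
    except ConnectionError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer(conn, "A", "B", 100, fail_credit=True)
after_failure = balances(conn)   # the debit was rolled back, so nothing changed
succeeded = transfer(conn, "A", "B", 100)
after_success = balances(conn)   # both updates applied together
print(failed, after_failure, succeeded, after_success)
```

The debit from Account A does run in the failed attempt, but because it shares a transaction with the credit, the rollback undoes it.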

381

MCQ (medium)

A company stores user-submitted profile photos. Each photo is accessed frequently for the first month after upload, then accessed rarely for the next year. After one year, the photos are deleted. Which Azure Blob Storage access tier should be used for the first month to minimize cost while ensuring low-latency access?

A. Hot tier

B. Cool tier

C. Archive tier

D. Premium tier

Answer: A

Correct. The Hot tier is optimized for frequent access, offering low latency and the highest storage cost but no retrieval cost, which is cost-effective for the high-access initial month.

Why this answer

The Hot tier is correct because it provides low-latency access and is optimized for frequent read/write operations, which matches the requirement of frequent access during the first month. It minimizes cost compared to Premium (which is for high-throughput scenarios) while still offering the necessary performance for user-submitted profile photos.

Exam trap

The trap here is that candidates often choose Cool tier thinking it balances cost and access, but they overlook the frequent access pattern in the first month, which makes Hot tier cheaper due to lower access costs and no early deletion penalty.

How to eliminate wrong answers

Option B (Cool tier) is wrong because it is designed for infrequently accessed data with a 30-day minimum storage duration and higher access costs, which would be more expensive for frequent access in the first month. Option C (Archive tier) is wrong because it has a 180-day minimum storage duration and retrieval latency of hours, making it unsuitable for low-latency access. Option D (Premium tier) is wrong because it uses SSDs and is optimized for high transaction rates and low latency for block blobs, but it is significantly more expensive than Hot tier and unnecessary for this access pattern.

382

MCQ (hard)

A company uses Azure Databricks for data engineering. The team wants to implement a medallion architecture (bronze, silver, gold) to organize data quality layers. In which layer should data be stored in a format optimized for analytics and reporting?

A. Bronze layer

B. Gold layer

C. Silver layer

D. Lakehouse layer

Answer: B

Gold layer stores aggregated, business-level data that is ready for analytics and reporting.

Why this answer

The gold layer in a medallion architecture contains data that has been refined, aggregated, and validated for business-level analytics and reporting. This layer stores data in a format optimized for query performance, such as Delta Lake with partitioning and Z-ordering, enabling efficient consumption by tools like Power BI or Azure Synapse.

Exam trap

The trap here is that candidates confuse the gold layer with the silver layer, assuming that cleaned data (silver) is sufficient for reporting, but the gold layer is specifically designed for analytics with aggregations and business logic applied.

How to eliminate wrong answers

Option A is wrong because the bronze layer stores raw ingested data in its original format, optimized for data ingestion and replay, not for analytics or reporting. Option C is wrong because the silver layer contains cleaned and deduplicated data but is still optimized for data science and intermediate transformations, not for final reporting. Option D is wrong because 'Lakehouse layer' is not a defined layer in the medallion architecture; the lakehouse is an architectural pattern that encompasses all layers (bronze, silver, gold), not a specific data quality layer.
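
The three layers can be illustrated with a toy pipeline in plain Python. This is only a conceptual sketch: a real implementation on Azure Databricks would typically use Spark and Delta tables, and the sample rows and the functions `to_silver` and `to_gold` are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Bronze: raw events exactly as ingested (hypothetical data),
# including a duplicate row and a malformed amount.
bronze = [
    {"order_id": "1", "region": "EU ", "amount": "10.50"},
    {"order_id": "1", "region": "EU ", "amount": "10.50"},  # duplicate
    {"order_id": "2", "region": "us", "amount": "20.00"},
    {"order_id": "3", "region": "EU", "amount": "n/a"},     # malformed amount
    {"order_id": "4", "region": "US", "amount": "5.25"},
]

def to_silver(rows):
    """Clean, type, and deduplicate raw rows."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # drop duplicates by order_id
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().upper(),
            "amount": amount,
        })
    return cleaned

def to_gold(rows):
    """Aggregate cleaned rows into a business-level, report-ready summary."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

Only the gold output (revenue per region) has the shape a reporting tool would consume directly, which is why gold is the answer.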

383

MCQ (easy)

A banking system processes a money transfer between two accounts. The system is designed so that after the transaction is committed, the results are permanently saved and survive any subsequent system failure, such as a power outage. Which ACID property ensures this behavior?

A. Durability

B. Atomicity

C. Consistency

D. Isolation

Answer: A

Correct. Durability guarantees that committed changes are saved permanently, surviving failures.

Why this answer

Durability ensures that once a transaction is committed, its changes are permanently stored and survive system failures, such as power outages or crashes. In this banking scenario, the money transfer results are written to non-volatile storage (e.g., disk) via a write-ahead log, guaranteeing that the committed state is recoverable even after a restart.

Exam trap

The trap here is that candidates often confuse durability with atomicity, thinking that 'surviving failures' means the transaction either completes fully or not at all, but atomicity handles the rollback of partial transactions, not the persistence of committed ones.

How to eliminate wrong answers

Option B (Atomicity) is wrong because atomicity ensures that a transaction is treated as an all-or-nothing unit, meaning either all operations complete or none do, but it does not guarantee that committed data survives failures. Option C (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, preserving integrity constraints, but it does not address persistence after a commit. Option D (Isolation) is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not provide durability against system crashes.
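
A rough way to see durability in action is to commit to a file-backed database, drop the connection, and reopen it. The sketch below uses Python's built-in `sqlite3` module. Closing the connection is only a mild stand-in for a crash or power outage, and the `transfers` table and file name are assumptions made for the example.

```python
import os
import sqlite3
import tempfile

# File-backed database in a temporary directory (hypothetical path).
db_path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE transfers (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO transfers (amount) VALUES (100)")
conn.commit()                                                # committed: must survive
conn.execute("INSERT INTO transfers (amount) VALUES (999)")  # never committed
conn.close()                                                 # connection lost before the second commit

# A fresh connection plays the role of the system restarting.
conn = sqlite3.connect(db_path)
amounts = [row[0] for row in conn.execute("SELECT amount FROM transfers ORDER BY id")]
conn.close()
print(amounts)
```

The committed transfer is still there after reopening, while the uncommitted one is gone. Durability makes a promise only about work that reached a commit.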

384

MCQ (hard)

Refer to the exhibit. A developer is creating an ARM template for an Azure Synapse workspace. What is the purpose of the 'defaultDataLakeStorage' property?

A. Sets the location for pipeline execution history

B. Defines the primary storage account for the workspace

C. Specifies the storage account for Apache Spark logs

D. Configures the storage for SQL pool backups

Answer: B

Default storage for data and workspace files.

Why this answer

The 'defaultDataLakeStorage' property in an ARM template for Azure Synapse Analytics defines the primary Azure Data Lake Storage Gen2 account that the workspace uses as its default storage. This storage account is where the workspace stores its data, including the data lake files and the metadata for the SQL and Spark engines. It is essential for the workspace to function, as it provides the underlying storage for tables, pipelines, and other workspace assets.

Exam trap

The trap here is that candidates confuse the 'defaultDataLakeStorage' property with a configuration for specific features like Spark logs or backups, when in fact it is the foundational storage account that the entire workspace relies on for its primary data lake operations.

How to eliminate wrong answers

Option A is wrong because pipeline execution history is retained by the Synapse service itself, and can optionally be exported to a Log Analytics workspace through diagnostic settings; it is not configured through the defaultDataLakeStorage property. Option C is wrong because the property is not a Spark-specific logging setting; it identifies the primary data lake for the whole workspace. Option D is wrong because SQL pool backups are handled by the service's built-in snapshot and restore-point mechanism, and the 'defaultDataLakeStorage' property does not configure backup settings; it defines the primary storage account for the workspace's data.

385

MCQ (medium)

A database designer wants to reduce data redundancy and improve data integrity by splitting a large table into multiple related tables based on functional dependencies. This process is known as:

A. Denormalization

B. Normalization

C. Partitioning

D. Indexing

Answer: B

Normalization reduces redundancy by breaking tables into smaller, related tables based on dependencies.

Why this answer

Normalization is the process of organizing a relational database into multiple related tables to reduce data redundancy and improve data integrity. It works by using functional dependencies to decide how to split tables, removing the partial and transitive dependencies that cause insert, update, and delete anomalies. This is a core concept in relational database design, directly aligning with the scenario described in the question.

Exam trap

The trap here is that candidates often confuse normalization with partitioning, because both involve splitting tables, but partitioning is a physical storage optimization, not a logical design technique for reducing redundancy.

How to eliminate wrong answers

Option A is wrong because denormalization is the opposite process—it intentionally adds redundancy by merging tables to improve read performance, often at the cost of data integrity. Option C is wrong because partitioning splits a table horizontally or vertically for performance or manageability, but it does not inherently reduce redundancy or address functional dependencies. Option D is wrong because indexing creates data structures to speed up query performance on existing tables, but it does not restructure tables to eliminate redundancy or enforce integrity.
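
A small worked example may make the idea concrete. The sketch below uses Python's built-in `sqlite3` module to split a flat table, where the customer's email depends only on `customer_id`, into two related tables. All table names, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat table: the email is repeated on every order the customer places.
CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    customer_email TEXT,
    product TEXT
);
INSERT INTO orders_flat VALUES
    (1, 10, 'ana@example.com', 'Keyboard'),
    (2, 10, 'ana@example.com', 'Mouse'),
    (3, 11, 'raj@example.com', 'Monitor');

-- Normalized design: customer facts live in one place, orders reference them.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product TEXT NOT NULL
);
INSERT INTO customers SELECT DISTINCT customer_id, customer_email FROM orders_flat;
INSERT INTO orders SELECT order_id, customer_id, product FROM orders_flat;
""")

# One UPDATE now changes the email for every order belonging to that customer.
conn.execute("UPDATE customers SET email = 'ana@contoso.example' WHERE customer_id = 10")
rows = conn.execute(
    "SELECT o.order_id, c.email FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id ORDER BY o.order_id"
).fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(customer_count, rows)
```

In the flat table the same change would have to touch every order row for that customer, and missing one would leave the data inconsistent. That is the update anomaly normalization removes.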

386

MCQ (easy)

A company stores customer names, addresses, and order history. They need to perform complex queries that join customer and order data. Which type of data store is most appropriate for this scenario?

A. Key-value store

B. Relational database

C. Document database

D. Graph database

Answer: B

Relational databases organize data into tables with defined schemas and support SQL queries including joins, making them ideal for this requirement.

Why this answer

A relational database (e.g., Azure SQL Database) is most appropriate because the scenario requires joining customer and order data via complex queries. Relational databases enforce a fixed schema with tables, primary keys, and foreign keys, enabling efficient JOIN operations using SQL. This structure ensures data integrity and supports ACID transactions, which are essential for accurate order history and customer records.

Exam trap

The trap here is that candidates often choose a document database (Option C) because they associate 'complex queries' with JSON flexibility, but fail to recognize that 'joining' specifically requires relational database features like SQL JOINs and foreign keys, which document stores lack.

How to eliminate wrong answers

Option A is wrong because a key-value store (e.g., Azure Cosmos DB Table API) is optimized for simple lookups by a single key and does not support complex JOIN queries or relational integrity between entities. Option C is wrong because a document database (e.g., Azure Cosmos DB Core API) stores semi-structured JSON documents and, while it can embed related data, it lacks native JOIN capabilities and enforces no schema, making complex relational queries inefficient. Option D is wrong because a graph database (e.g., Azure Cosmos DB Gremlin API) is designed for traversing relationships between highly connected entities (e.g., social networks), not for tabular JOINs on structured customer and order data.

387

MCQ (hard)

A company stores user session data for a web application. Each session has a unique SessionID, UserID, start time, end time, and a variable set of attributes (e.g., pages visited, clicks, device type). The workload requires low-latency reads by SessionID and occasional queries by UserID and time range. Schema flexibility is critical because the attributes evolve over time. The team wants a fully managed NoSQL database that supports secondary indexing. Which Azure data store should they choose?

A. Azure Cosmos DB (NoSQL API)

B. Azure SQL Database

C. Azure Table Storage

D. Azure Blob Storage

Answer: A

Correct. Cosmos DB's NoSQL API natively supports JSON documents with flexible schema. It offers low-latency reads on the partition key and allows secondary indexes to support queries on other attributes like UserID.

Why this answer

Azure Cosmos DB (NoSQL API) is correct because it is a fully managed NoSQL database that offers low-latency reads by SessionID (using a partition key), supports secondary indexing for queries by UserID and time range, and provides schema flexibility for evolving session attributes. Its multi-model API and global distribution meet the workload requirements without manual indexing or schema management.

Exam trap

The trap here is that candidates may confuse Azure Table Storage (which is also NoSQL and schema-flexible) with Cosmos DB, but Table Storage lacks secondary indexing, making it unsuitable for queries by UserID and time range without expensive scans.

How to eliminate wrong answers

Option B (Azure SQL Database) is wrong because it is a relational database requiring a fixed schema, which contradicts the need for schema flexibility with evolving attributes. Option C (Azure Table Storage) is wrong because it does not support secondary indexing; queries by UserID and time range would require full table scans, failing the low-latency requirement. Option D (Azure Blob Storage) is wrong because it is an object store for unstructured data (e.g., files, images), not a database with query capabilities or indexing for session data.

388

MCQ (medium)

Refer to the exhibit. You are reviewing an ARM template for a new storage account. The storage account will store data that must be accessible from any Azure region and must be highly durable. Which change should you make to the template?

A. Set supportsHttpsTrafficOnly to false

B. Change the SKU name to Premium_LRS

C. Change the SKU name to Standard_GRS

D. Change the kind to BlobStorage

Answer: C

Geo-redundant storage replicates data to a secondary region, improving durability across regions.

Why this answer

Standard_GRS (geo-redundant storage) is the correct SKU because it replicates data synchronously three times within the primary region and then asynchronously to a secondary region hundreds of miles away, giving 16 nines of durability over a given year. That geo-replication is what satisfies the durability and cross-region requirements. If applications must also read from the secondary region during normal operation, the read-access variant (Standard_RAGRS) is needed. LRS, by contrast, replicates only within a single datacenter, and Premium_LRS targets low-latency workloads rather than geo-redundancy.

Exam trap

Microsoft often tests the misconception that changing the 'kind' (e.g., to BlobStorage) or disabling HTTPS affects durability or geo-accessibility, when in fact only the SKU name (replication strategy) controls these properties, and candidates confuse security settings with replication settings.

How to eliminate wrong answers

Option A is wrong because setting supportsHttpsTrafficOnly to false disables HTTPS enforcement, which is a security setting unrelated to durability or regional accessibility; it would expose data to insecure HTTP traffic. Option B is wrong because Premium_LRS uses SSD-based storage with local redundancy only, offering lower durability (11 nines vs. 16 nines for GRS) and no geo-replication, failing the 'accessible from any Azure region' requirement. Option D is wrong because changing the kind to BlobStorage restricts the account to blob-only storage (block blobs and append blobs), but the question does not specify blob-only data; moreover, the kind does not affect durability or geo-accessibility—that is determined by the SKU.
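
To get a feel for what "11 nines vs. 16 nines" means, the figures can be turned into rough arithmetic. The sketch below treats the durability figure as an independent per-object probability of surviving one year, which is a simplifying assumption made for this example. The object count is also hypothetical.

```python
from fractions import Fraction

def annual_loss_probability(nines):
    """Chance of losing one object in a year if durability has `nines` nines.

    Example: 11 nines means 99.999999999% durability, so loss = 1 / 10**11.
    """
    return Fraction(1, 10**nines)

objects = 10**10  # hypothetical: ten billion stored objects

expected_loss_lrs = annual_loss_probability(11) * objects  # locally redundant
expected_loss_grs = annual_loss_probability(16) * objects  # geo-redundant
improvement = expected_loss_lrs / expected_loss_grs

print(float(expected_loss_lrs), float(expected_loss_grs), int(improvement))
```

Under these assumptions, the locally redundant figure corresponds to about one lost object every ten years across ten billion objects, and the geo-redundant figure is 100,000 times smaller. Exact fractions are used so the tiny probabilities are not distorted by floating-point rounding.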

389

MCQ (hard)

A company operates a high-volume order processing system on Azure SQL Database. During peak hours, many concurrent transactions try to insert and update rows in the same table, causing contention on page latches. Indexing and query optimization are already tuned. Which feature should the company implement to reduce write contention while preserving ACID properties?

A. Read Scale-out

B. In-Memory OLTP

C. Elastic Database Query

D. Transparent Data Encryption (TDE)

Answer: B

In-Memory OLTP improves write performance by eliminating latch contention through memory-optimized tables and optimistic concurrency, ideal for high-concurrency transactional workloads.

Why this answer

In-Memory OLTP is correct because it uses memory-optimized tables and natively compiled stored procedures to reduce latch contention by eliminating the need for page latches entirely. Transactions operate directly on in-memory data structures, using optimistic multi-version concurrency control (MVCC) to detect conflicts without blocking, which preserves ACID properties while allowing high concurrency.

Exam trap

The trap here is that candidates confuse In-Memory OLTP with caching or read optimization, but the question specifically targets write contention and ACID preservation, which In-Memory OLTP uniquely addresses through latch-free design and optimistic concurrency.

How to eliminate wrong answers

Option A is wrong because Read Scale-out is designed to offload read-only workloads to a read-only replica, not to reduce write contention on the primary database. Option C is wrong because Elastic Database Query enables cross-database querying across shards or databases, but does not address intra-table latch contention or improve write performance. Option D is wrong because Transparent Data Encryption (TDE) performs real-time encryption/decryption of data at rest and has no effect on concurrency, locking, or latch contention.
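
The idea of optimistic concurrency mentioned above (detect conflicts at commit time instead of blocking writers up front) can be shown with a toy model. This is a conceptual sketch only and says nothing about how In-Memory OLTP is implemented internally. The `VersionedRow` class and `WriteConflict` exception are hypothetical names invented for the example.

```python
class WriteConflict(Exception):
    """Raised when a row changed after the transaction read it."""

class VersionedRow:
    """A single value guarded by a version counter instead of a lock."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # Readers never block; they just remember which version they saw.
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Validation happens at commit time: fail if someone else committed first.
        if read_version != self.version:
            raise WriteConflict("row was modified by another transaction")
        self.value = new_value
        self.version += 1

stock = VersionedRow(10)

# Two transactions read the same version concurrently.
value_a, version_a = stock.read()
value_b, version_b = stock.read()

stock.commit(value_a - 1, version_a)  # first writer wins

conflict_detected = False
try:
    stock.commit(value_b - 1, version_b)  # second writer holds a stale version
except WriteConflict:
    conflict_detected = True
    # Typical handling: re-read the current state and retry.
    fresh_value, fresh_version = stock.read()
    stock.commit(fresh_value - 1, fresh_version)

print(stock.value, stock.version, conflict_detected)
```

Neither writer ever waits on the other. The stale write is rejected and retried, so no update is lost and both decrements are applied.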

390

MCQ (hard)

A healthcare application stores patient medical history in a relational database. The system must ensure that after a transaction updates multiple records (e.g., diagnosis and medication), all changes are saved or none are saved. This property is best described as:

A. Atomicity

B. Consistency

C. Durability

D. Isolation

Answer: A

Atomicity ensures that a transaction is either fully completed or fully rolled back, matching the all-or-nothing requirement.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In the context of a relational database storing patient medical history, if a transaction updates both the diagnosis and medication records, atomicity guarantees that either both updates are committed or both are rolled back, preventing partial updates that could leave the data in an inconsistent state.

Exam trap

The trap here is that candidates often confuse atomicity with consistency, mistakenly thinking that 'all-or-nothing' is about maintaining data rules, when in fact atomicity is specifically about the transaction's indivisibility at the write level.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, respecting all defined rules (e.g., constraints, triggers), but it does not directly enforce the all-or-nothing behavior of multiple record updates. Option C (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it does not control whether the transaction is fully applied or rolled back. Option D (Isolation) is wrong because isolation ensures that concurrent transactions do not interfere with each other, preventing dirty reads or lost updates, but it does not mandate that all changes within a single transaction are saved or none are saved.

391

Multi-Select (medium)

Which TWO Azure services can be used to orchestrate and automate data pipelines? (Choose two.)

Select 2 answers

A. Azure SQL Database

B. Azure Synapse Pipelines

C. Power BI

D. Azure Databricks

E. Azure Data Factory

Answers: B, E

Azure Synapse Pipelines provides orchestration capabilities similar to those of Azure Data Factory, within a Synapse workspace.

Why this answer

Azure Data Factory (E) is a dedicated cloud-based ETL and data integration service that allows you to create, schedule, and orchestrate data pipelines at scale. Azure Synapse Pipelines (B) is built on the same engine as Azure Data Factory and provides native pipeline orchestration within the Synapse Analytics workspace, enabling you to move and transform data across various sources and sinks. Both services offer visual design tools, code-free transformations, and robust scheduling capabilities for automating data workflows.

Exam trap

The trap here is that candidates often confuse Azure Databricks (which can run data transformation code) with a pipeline orchestration service, but it lacks the native scheduling, dependency management, and visual pipeline designer that Azure Data Factory and Synapse Pipelines provide.

392

MCQ (hard)

A company uses Azure Synapse Analytics to run both interactive queries and large batch loads. The interactive queries must have consistent performance regardless of batch load activity. Which Synapse feature should the company use?

A. Workload management with workload isolation.

B. Result-set caching for frequently run queries.

C. Materialized views for aggregate data.

D. Data compression with columnstore indexes.

Answer: A

Workload isolation reserves compute resources for specific workloads, preventing contention.

Why this answer

Workload management with workload isolation in Azure Synapse Analytics allows you to reserve resources for specific workloads, such as interactive queries, ensuring they have consistent performance even when large batch loads are running. By creating a workload group with MIN_PERCENTAGE_RESOURCE set to a non-zero value, you guarantee that a minimum share of resources is always available to that group, preventing contention from other workloads.

Exam trap

The trap here is that candidates confuse performance optimization features like caching or materialized views with resource governance features, assuming they provide isolation when they only improve query speed without guaranteeing resource availability.

How to eliminate wrong answers

Option B is wrong because result-set caching improves performance for repeated queries by storing results in memory, but it does not isolate resources or guarantee consistent performance during concurrent batch loads. Option C is wrong because materialized views pre-compute and store aggregated data, reducing query execution time, but they do not provide resource isolation or protect interactive queries from batch load activity. Option D is wrong because data compression with columnstore indexes improves storage efficiency and query performance through data compression and columnar storage, but it does not manage resource allocation or prevent performance degradation from concurrent workloads.

393

MCQeasy

A company needs to migrate a large on-premises SQL Server database to Azure with minimal downtime. Which Azure service should they use for the migration?

A.Azure SQL Database

B.Azure Backup

C.Azure Data Factory

D.Azure Database Migration Service

AnswerD

Supports online migrations with minimal downtime.

Why this answer

Option D is correct because Azure Database Migration Service (DMS) is designed for online migrations with minimal downtime, supporting continuous data sync. Option A is wrong because Azure SQL Database is the target, not a migration tool. Option B is wrong because Azure Backup is for backup, not migration. Option C is wrong because Azure Data Factory is for data integration, not database migration with minimal downtime.

394

Multi-Selecthard

Which THREE of the following are valid Azure data storage services? (Choose three.)

Select 3 answers

A.Azure Files

B.Azure Blob Storage

C.Azure Redis Cache

D.Azure Table Storage

E.Azure Service Bus

AnswersA, B, D

Yes, it's a fully managed file share.

Why this answer

Azure Files provides fully managed file shares in the cloud that can be accessed via the Server Message Block (SMB) protocol or the Network File System (NFS) protocol. It is a valid Azure data storage service because it stores data as files in a hierarchical structure, making it suitable for lift-and-shift scenarios for on-premises file servers.

Exam trap

The trap here is that candidates may confuse Azure Redis Cache and Azure Service Bus as data storage services because they store data temporarily, but the DP-900 exam defines 'data storage services' as those designed for persistent, structured or unstructured data storage, not transient messaging or caching.

395

MCQeasy

A retail company stores product information in a relational database table with fixed columns: ProductID (integer), Name (string), Price (decimal). They also store customer reviews as JSON documents where each review may contain different fields such as rating, comment, and optional images. Additionally, they store product images as JPEG files in Azure Blob Storage. Which of the following correctly classifies these data types from most structured to least structured?

A.Structured (product info), Semi-structured (reviews), Unstructured (images)

B.Semi-structured (product info), Structured (reviews), Unstructured (images)

C.Unstructured (product info), Semi-structured (reviews), Structured (images)

D.Structured (product info), Unstructured (reviews), Semi-structured (images)

AnswerA

Product info in a relational table is structured. JSON reviews are semi-structured because they have a flexible schema. JPEG images are unstructured binary data.

Why this answer

Product info in a relational table with fixed columns (ProductID, Name, Price) is structured data. Customer reviews stored as JSON documents, which may have varying fields like rating, comment, and optional images, are semi-structured because they have a flexible schema. Product images stored as JPEG files in Azure Blob Storage are unstructured binary data.

This ordering from most to least structured matches option A.
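
As a rough illustration of the three categories, the sketch below represents each one in Python. The product rows, review documents, and byte string are made-up sample data, and the `describe` helper is hypothetical: it only labels these particular samples and is not a general classifier.

```python
import json

# Structured: every row has the same fixed columns (ProductID, Name, Price).
product_rows = [
    (1, "Kettle", 24.99),
    (2, "Toaster", 39.50),
]

# Semi-structured: JSON documents share a key-value shape, but fields vary.
reviews = [
    json.loads('{"rating": 5, "comment": "Works well"}'),
    json.loads('{"rating": 3, "images": ["front.jpg"]}'),
]

# Unstructured: raw bytes with no schema a data store can interpret.
# JPEG files begin with the bytes FF D8 FF; the rest is placeholder data.
image_bytes = b"\xff\xd8\xff" + b"\x00" * 16


def describe(value):
    """Label the sample values above by how much schema they carry."""
    if isinstance(value, tuple):
        return "structured"
    if isinstance(value, dict):
        return "semi-structured"
    return "unstructured"


labels = [describe(product_rows[0]), describe(reviews[0]), describe(image_bytes)]
print(labels)
```

Note that the two review documents carry different keys, which is exactly what a fixed-column table would not allow.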

Exam trap

Microsoft often tests the distinction between semi-structured and unstructured data, where candidates mistakenly classify JSON as unstructured because it lacks a fixed schema, but JSON is semi-structured due to its inherent key-value structure and optional fields.

How to eliminate wrong answers

Option B is wrong because it incorrectly classifies product info as semi-structured (it has a fixed schema, making it structured) and reviews as structured (JSON with optional fields is semi-structured). Option C is wrong because it classifies product info as unstructured (it is structured) and images as structured (JPEG files are unstructured binary data). Option D is wrong because it classifies reviews as unstructured (JSON with a schema is semi-structured) and images as semi-structured (JPEG files have no schema, making them unstructured).

396

MCQmedium

A company uses Azure SQL Database for a financial system. The Transactions table contains millions of rows. Queries frequently aggregate data for the current month, but also need to retain historical data for 7 years. The company wants to improve query performance for the monthly aggregations and simplify data archiving. Which design should they implement?

A.Create a clustered columnstore index on the entire table.

B.Partition the table by month and create aligned indexes.

C.Use Azure SQL Database elastic pool for the database.

D.Implement transparent data encryption.

AnswerB

Correct. Partitioning by month supports partition elimination for monthly queries and allows for easy partition switching to archive old data.

Why this answer

Partitioning the Transactions table by month allows SQL Server to perform partition elimination during queries that aggregate data for the current month, scanning only the relevant partition(s) instead of the entire table. Aligned indexes ensure that index structures follow the same partition scheme, maintaining efficiency for both queries and maintenance. This design also simplifies data archiving by enabling fast partition switching to move older months out of the table without costly delete operations.
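
The idea of partition elimination can be sketched in plain Python. This is a toy model, not how the database engine is implemented: the transactions are invented, and a dictionary keyed by (year, month) stands in for the partition scheme.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (transaction date, amount).
transactions = [
    (date(2024, 1, 15), 100.0),
    (date(2024, 2, 3), 250.0),
    (date(2024, 2, 20), 50.0),
    (date(2024, 3, 1), 75.0),
]

# "Partition" the rows by (year, month), mimicking a monthly partition scheme.
partitions = defaultdict(list)
for tx_date, amount in transactions:
    partitions[(tx_date.year, tx_date.month)].append(amount)


def monthly_total(year, month):
    """Aggregate one month by reading only that month's partition."""
    rows = partitions.get((year, month), [])
    return sum(rows), len(rows)


total, rows_scanned = monthly_total(2024, 2)
print(total, rows_scanned)  # 300.0 2

# Archiving an old month removes its partition as one unit, loosely analogous
# to partition switching, instead of deleting rows one by one.
archived = partitions.pop((2024, 1))
```

The February query touches two rows rather than all four, and archiving January is a single operation on one partition.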

Exam trap

The trap here is that candidates confuse performance features like columnstore indexes or elastic pools with the specific need for partition elimination and data archiving, overlooking that partitioning directly addresses both the query performance and data lifecycle requirements.

How to eliminate wrong answers

Option A is wrong because a clustered columnstore index is optimized for large-scale analytical workloads and data warehousing, not for transactional systems with frequent point lookups or updates; it would degrade performance for the financial system's mixed workload. Option C is wrong because an elastic pool is a resource management feature for scaling multiple databases, not a design choice to improve query performance or archiving for a single table. Option D is wrong because transparent data encryption (TDE) provides security at rest but has no impact on query performance or data archiving capabilities.

397

MCQhard

A financial services company uses Azure Synapse Analytics to process large volumes of transaction data. They have a dedicated SQL pool (formerly SQL DW) that ingests curated, aggregated data nightly from a data lake. Data analysts need to run ad-hoc, exploratory T-SQL queries on raw transaction data stored as Parquet files in Azure Data Lake Storage Gen2. These queries vary widely in complexity and frequency. The company wants to minimize costs for these ad-hoc queries while still using full T-SQL capabilities. Which approach should they recommend?

A.Use external tables in the dedicated SQL pool to query the data lake directly.

B.Create a serverless SQL pool endpoint to query the data lake directly.

C.Load the raw data into the dedicated SQL pool before querying.

D.Use Azure Data Explorer to query the data lake.

AnswerB

Serverless SQL pool is a pay-per-query service that auto-scales and charges only for the data processed. It supports full T-SQL and is ideal for ad-hoc, exploratory queries on data lake files.

Why this answer

Serverless SQL pool in Azure Synapse Analytics is designed for ad-hoc, on-demand querying of data lake files (like Parquet) without provisioning or paying for dedicated compute resources. It supports full T-SQL syntax and charges only for the data processed per query, making it cost-effective for exploratory workloads with variable complexity and frequency.

Exam trap

The trap here is that candidates often confuse external tables in a dedicated SQL pool with serverless SQL pool, assuming both are equally cost-effective, but they overlook that dedicated SQL pool incurs fixed compute costs regardless of usage, while serverless SQL pool is truly pay-per-query.

How to eliminate wrong answers

Option A is wrong because external tables in a dedicated SQL pool still require the pool to be running and incur compute costs even when idle, which is not cost-minimizing for sporadic ad-hoc queries. Option C is wrong because loading raw data into the dedicated SQL pool incurs ingestion costs and storage costs, and the pool must be active, defeating the goal of minimizing costs for exploratory queries. Option D is wrong because Azure Data Explorer (ADX) is optimized for time-series and log analytics, not for full T-SQL querying of Parquet files in a data lake, and it introduces additional service costs and complexity.

398

MCQeasy

A data analyst needs to create interactive reports from data stored in an Azure SQL Database. They want to use a self-service tool that requires minimal IT support. Which tool should they use?

A.Azure Synapse Studio

B.SQL Server Management Studio

C.Power BI Desktop

D.Azure Data Studio

AnswerC

Self-service BI tool for creating interactive reports.

Why this answer

Power BI Desktop is a self-service business intelligence tool designed for creating interactive reports and dashboards with minimal IT support. It connects directly to Azure SQL Database, allowing analysts to import or query data using DirectQuery, and provides drag-and-drop visualizations without requiring database administration skills.

Exam trap

The trap here is that candidates confuse Azure Synapse Studio or Azure Data Studio as reporting tools, but they are primarily for data engineering and development, not for self-service interactive report creation.

How to eliminate wrong answers

Option A is wrong because Azure Synapse Studio is a unified analytics platform for large-scale data warehousing and big data processing, requiring more IT setup and expertise than a self-service tool. Option B is wrong because SQL Server Management Studio (SSMS) is a database management tool for administering and querying SQL Server, not for creating interactive reports. Option D is wrong because Azure Data Studio is a cross-platform database tool focused on querying and development, lacking the rich visualization and report-authoring capabilities of Power BI Desktop.

399

Multi-Selecteasy

Which TWO factors should you consider when choosing between Azure SQL Database and SQL Server on Azure Virtual Machines?

Select 2 answers

A.Need for database auditing

B.Support for geo-replication

C.Compatibility with on-premises SQL Server features

D.Need for Transparent Data Encryption

E.Level of administrative control required

AnswersC, E

Managed instance offers high compatibility; virtual machine offers full compatibility.

Why this answer

Options C and E are correct. The level of administrative control is a key differentiator between a managed service and IaaS. Feature compatibility also matters because the managed PaaS offerings provide high but not complete compatibility with on-premises SQL Server, while SQL Server on a virtual machine is fully compatible. Option A is wrong because both support auditing. Option B is wrong because both support geo-replication (Azure SQL Database has active geo-replication; SQL Server on VM can use Always On AG). Option D is wrong because both services support TDE.

400

MCQeasy

Refer to the exhibit. You are deploying an Azure Storage account. The JSON snippet represents a template parameter. What does the 'isHnsEnabled' property enable?

A.Blob versioning

B.Soft delete for blobs

C.Geo-redundant storage

D.Hierarchical namespace for the storage account

AnswerD

Enables Data Lake Storage Gen2 capabilities.

Why this answer

The 'isHnsEnabled' property enables the hierarchical namespace for the storage account, which is a core feature of Azure Data Lake Storage Gen2. When set to true, it allows the storage account to organize blobs into a directory hierarchy, enabling POSIX-like access control lists (ACLs) and file system semantics. This is essential for big data analytics workloads that require a file system structure rather than a flat blob storage model.
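
For orientation, here is a sketch of where such a property might sit in a storage account resource definition. It is not the exhibit from the question, which is not reproduced on this page. The account name is a hypothetical placeholder and the fragment is incomplete, showing only the fields discussed in this question.

```python
import json

# Hypothetical, incomplete fragment of a storage account resource definition.
# The account name is a placeholder chosen for illustration.
storage_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "name": "examplestorageacct",
    "kind": "StorageV2",
    "sku": {"name": "Standard_LRS"},        # replication is chosen here, not via HNS
    "properties": {"isHnsEnabled": True},   # hierarchical namespace (Data Lake Gen2)
}

print(json.dumps(storage_account, indent=2))
```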

Exam trap

The trap here is that candidates often confuse 'isHnsEnabled' with blob-level features like versioning or soft delete, because all three are related to data management, but only the hierarchical namespace fundamentally changes the storage account's architecture to support file system semantics.

How to eliminate wrong answers

Option A is wrong because blob versioning is enabled via the 'isVersioningEnabled' property in the Blob service settings, not by 'isHnsEnabled'. Option B is wrong because soft delete for blobs is configured through the 'deleteRetentionPolicy' property in the Blob service, not through the hierarchical namespace flag. Option C is wrong because geo-redundant storage (GRS) is a replication option set via the 'sku.name' property (e.g., 'Standard_GRS'), not by enabling a hierarchical namespace.

401

MCQhard

A company uses Azure Synapse Analytics to run complex queries against large datasets stored in Parquet files in Azure Data Lake Storage Gen2. They notice that queries scanning entire partitions are slow due to high I/O overhead on the compute nodes. Investigation shows each daily partition contains thousands of small files (under 1 MB each). Which optimization should be implemented first to improve query performance?

A.Increase the number of compute nodes

B.Use columnstore indexes on external tables

C.Compact small files into larger ones before querying

D.Change the partition column to a different date granularity

AnswerC

Compacting small files into larger ones (e.g., 256 MB) reduces file open operations and I/O overhead, significantly improving scan performance in distributed query engines.

Why this answer

Option C is correct because the high I/O overhead is caused by the thousands of small files per partition. When Synapse compute nodes read many small files, the overhead of opening, reading metadata, and closing each file dominates, even though the total data volume is small. Compacting these small files into fewer, larger files (e.g., 128 MB or more) reduces the number of file operations, improves read throughput, and allows more efficient predicate pushdown and parallelism.
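
A minimal sketch of how a compaction job might plan its output, assuming a greedy grouping strategy and a 128 MB target. The function name `plan_compaction` and the file sizes are hypothetical. A real job would rewrite the Parquet files with a processing engine rather than just grouping sizes.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into batches of at most target_mb each."""
    batches, current, current_size = [], [], 0.0
    for size in file_sizes_mb:
        # Start a new batch once adding this file would exceed the target.
        if current and current_size + size > target_mb:
            batches.append(current)
            current, current_size = [], 0.0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical daily partition: 4,000 files of 0.5 MB each (2,000 MB in total).
small_files = [0.5] * 4000
batches = plan_compaction(small_files)
print(len(small_files), "files ->", len(batches), "files")
```

The total data volume is unchanged, but the number of files the engine must open, and therefore the per-file overhead, drops by orders of magnitude.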

Exam trap

The trap here is that candidates often confuse scaling out compute nodes (Option A) with solving a data layout problem, or mistakenly think columnstore indexes (Option B) apply to external tables, when in fact the issue is purely about file size and count in the storage layer.

How to eliminate wrong answers

Option A is wrong because increasing compute nodes adds more parallelism but does not address the root cause of excessive file open/close overhead; it may even worsen the problem by distributing the many small files across more nodes. Option B is wrong because columnstore indexes are not supported on external tables in Azure Synapse; they apply only to tables in a dedicated SQL pool, and the question describes queries against Parquet files in Data Lake Storage, not a SQL pool table. Option D is wrong because changing the partition column granularity (e.g., from daily to monthly) would create even larger partitions with more small files, exacerbating the I/O overhead, and does not solve the small-file problem.

402

MCQmedium

A manufacturing company ingests real-time sensor data from assembly line machines into Azure Event Hubs. The company needs to calculate a 5-minute rolling average of temperature readings for each machine and compare it against a static threshold value stored in a CSV file in Azure Blob Storage. If the average exceeds the threshold, an alert must be triggered. Which Azure service should be used for this real-time data processing?

A.Azure Stream Analytics

B.Azure Data Factory

C.Azure Synapse Analytics

D.Azure HDInsight

AnswerA

Correct. Stream Analytics can process streaming data with window functions and join with reference data from Blob Storage.

Why this answer

Azure Stream Analytics is the correct choice because it is designed for real-time stream processing, including windowed aggregations like a 5-minute rolling average. It can directly ingest data from Azure Event Hubs, compute the average with a windowing function (HoppingWindow or SlidingWindow for an overlapping rolling window, or TumblingWindow for fixed, non-overlapping intervals), and join static data (the threshold CSV) from Azure Blob Storage via a reference data input. If the computed average exceeds the threshold, Stream Analytics can output the alert to a sink like Azure Functions or a notification service.
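
The logic that such a job expresses can be sketched in plain Python. This is only a model of the computation, not Stream Analytics query syntax. The machine IDs, readings, and the `machineId,threshold` CSV layout are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical reference data, standing in for the threshold CSV in Blob Storage.
reference_csv = "machineId,threshold\nm1,80\nm2,75\n"
thresholds = {
    row["machineId"]: float(row["threshold"])
    for row in csv.DictReader(io.StringIO(reference_csv))
}

# Hypothetical events: (seconds since start, machine id, temperature).
events = [
    (0, "m1", 78.0), (60, "m1", 82.0), (120, "m2", 70.0),
    (240, "m1", 86.0), (300, "m2", 72.0), (420, "m1", 90.0),
]

WINDOW_SECONDS = 5 * 60


def rolling_alerts(events, now):
    """Average each machine's readings from the 5 minutes ending at `now`
    and return the machines whose average exceeds their threshold."""
    readings = defaultdict(list)
    for ts, machine, temp in events:
        if now - WINDOW_SECONDS < ts <= now:
            readings[machine].append(temp)
    alerts = {}
    for machine, temps in readings.items():
        average = sum(temps) / len(temps)
        if average > thresholds[machine]:
            alerts[machine] = average
    return alerts


alerts = rolling_alerts(events, now=420)
print(alerts)  # {'m1': 88.0}
```

At the 420-second mark only the readings newer than 120 seconds count, so machine m1 averages 88.0 against a threshold of 80 and raises an alert, while m2 stays below its threshold.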

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory or Synapse Analytics, mistakenly thinking that any data processing involving Blob Storage or SQL-like queries must use a batch-oriented service, when in fact Stream Analytics is the only option that natively supports real-time windowed aggregations and reference data joins from Blob Storage.

How to eliminate wrong answers

Option B (Azure Data Factory) is wrong because it is an orchestration and ETL service for batch data movement and transformation, not a real-time stream processing engine; it cannot perform continuous windowed aggregations on live Event Hubs data. Option C (Azure Synapse Analytics) is wrong because it is a unified analytics platform primarily for large-scale data warehousing and batch/query processing, not for real-time stream processing with sub-second latency requirements. Option D (Azure HDInsight) is wrong because it is a managed cluster service for big data frameworks like Apache Spark and Hadoop, which can handle streaming but requires manual cluster management and is overkill for a simple rolling average and threshold comparison; Azure Stream Analytics is a purpose-built, serverless alternative for this exact use case.

403

MCQmedium

You are designing a relational database for a multi-tenant SaaS application. Each tenant's data must be isolated for security and compliance. Which design approach best ensures data isolation while keeping cost manageable?

A.Use Azure Synapse Analytics with workload isolation

B.Use a separate database per tenant

C.Use a single database with a TenantID column and row-level security

D.Use a single database with separate schemas per tenant

AnswerB

Provides strong isolation.

Why this answer

Option B is correct because a separate database per tenant provides strong isolation. Option C is wrong because shared tables with a TenantID column depend on carefully maintained row-level security, so a policy mistake can expose another tenant's data. Option D is wrong because separate schemas per tenant still share one database and its resources, giving weaker isolation than separate databases. Option A is wrong because Azure Synapse Analytics is for data warehousing, not OLTP.

404

MCQmedium

A company stores log files in Azure Blob Storage. Each log file is a CSV file ranging from 100 MB to 1 GB. They need to query the logs using SQL queries and plan to use Azure Synapse Serverless SQL. The logs are stored in a container named 'logs' with a folder structure like 'year=2023/month=01/day=01/logfile.csv'. They want to optimize query performance and reduce data scanned. Which partitioning strategy should they implement?

A.Convert logs to Parquet format and store in a single folder.

B.Use the existing folder structure with year/month/day partitions.

C.Consolidate all logs into a single large CSV file.

D.Use random file names to distribute load.

AnswerB

Synapse Serverless SQL can use folder partitioning to prune partitions, reducing data scanned.

Why this answer

Partitioning by year, month, and day (option B) using the folder structure allows Synapse to perform partition elimination, reducing data scanned. Option A (Parquet format in a single folder) uses a columnar format that improves performance, but a single folder gives no folder partitions to eliminate. Option C (single large CSV file) is not partitioned. Option D (random file names) prevents partition elimination.
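
The effect of folder-based pruning can be sketched as follows. This is a plain-Python model of the idea rather than Synapse syntax. The blob paths are hypothetical but follow the folder layout given in the question.

```python
import re

# Hypothetical blob paths that follow the folder layout from the question.
paths = [
    "logs/year=2023/month=01/day=01/logfile.csv",
    "logs/year=2023/month=01/day=02/logfile.csv",
    "logs/year=2023/month=02/day=01/logfile.csv",
    "logs/year=2024/month=01/day=01/logfile.csv",
]

PARTITION_PATTERN = re.compile(r"year=(\d{4})/month=(\d{2})/day=(\d{2})/")


def prune(paths, year, month):
    """Keep only files whose folder names match the requested year and month."""
    selected = []
    for path in paths:
        match = PARTITION_PATTERN.search(path)
        if match and (int(match.group(1)), int(match.group(2))) == (year, month):
            selected.append(path)
    return selected


to_scan = prune(paths, 2023, 1)
print(len(to_scan), "of", len(paths), "files scanned")  # 2 of 4 files scanned
```

Because the filter is answered from the folder names alone, the files for other months are never opened.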

405

Multi-Selectmedium

Which TWO of the following are common characteristics of a NoSQL database?

Select 2 answers

A.Flexible schema

B.Normalized data storage

C.Strong ACID transaction support

D.Relational data model

E.Horizontal scaling

AnswersA, E

NoSQL databases allow schema flexibility, making them suitable for semi-structured or unstructured data.

Why this answer

Option A is correct because NoSQL databases, such as MongoDB or Cassandra, use a flexible schema that allows documents or records to have varying fields without requiring predefined table structures. This enables developers to iterate quickly and store semi-structured or unstructured data, such as JSON documents, without costly schema migrations.

Exam trap

The trap here is that candidates confuse 'flexible schema' with 'no schema at all' or mistakenly think NoSQL always supports strong ACID transactions, when in reality most NoSQL systems trade ACID for scalability and performance.

406

MCQhard

A financial institution runs complex analytical queries on trading data stored in Parquet files in Azure Data Lake Storage Gen2. The data is partitioned by date and contains billions of rows. Analysts frequently query within a specific date range, and the queries must return results in under 5 seconds. The current solution uses Azure Synapse Serverless SQL pool, but queries are slow because the serverless pool scans all partitions even when the WHERE clause filters on the date column. Which optimization should be implemented to improve query performance?

A.Switch to Azure Synapse dedicated SQL pool with proper table partitioning

B.Create a clustered columnstore index on the external table

C.Convert the Parquet files to CSV format

D.Use Azure Databricks with Delta Lake for querying

AnswerA

Correct. Dedicated SQL pools support partition elimination, allowing queries to skip scanning partitions that don't match the filter, dramatically improving performance.

Why this answer

Azure Synapse Serverless SQL pool does not automatically eliminate partitions of folder-partitioned Parquet files in Azure Data Lake Storage Gen2 based on a WHERE clause on the date column; unless the query filters explicitly on the folder path, it reads every file. By switching to an Azure Synapse dedicated SQL pool with proper table partitioning on the date column, the query engine can perform partition pruning, scanning only the relevant partitions for the specified date range, which drastically reduces I/O and improves query performance to meet the sub-5-second requirement.

Exam trap

The trap here is that candidates may assume serverless SQL pool automatically performs partition elimination on folder-partitioned data, but it does not; it scans all files unless explicit filepath() filtering is used, making dedicated SQL pool with table partitioning the correct choice for guaranteed partition pruning.

How to eliminate wrong answers

Option B is wrong because creating a clustered columnstore index on an external table is not supported in Azure Synapse Serverless SQL pool; external tables are read-only and cannot have indexes. Option C is wrong because converting Parquet files to CSV format would increase file size and degrade performance due to lack of compression and columnar storage benefits, making queries slower. Option D is wrong because while Azure Databricks with Delta Lake can provide performance optimizations, it introduces a separate platform to deploy and manage, making it a less direct solution than partitioned tables in a dedicated SQL pool within the existing Synapse environment.

407

MCQmedium

A social media application stores user profiles as JSON documents. Each user profile can have different attributes (e.g., some have 'education', others have 'work experience'). The application needs to query profiles by any attribute with low latency. Which Azure data store is most appropriate?

A.Azure Blob Storage

B.Azure Table Storage

C.Azure Cosmos DB (SQL API)

D.Azure SQL Database

AnswerC

Cosmos DB SQL API natively supports JSON documents with flexible schemas and provides indexing on all properties for fast queries.

Why this answer

Azure Cosmos DB with the SQL API is the correct choice because it natively supports schema-agnostic JSON documents, allowing each user profile to have varying attributes without requiring a fixed schema. Its indexing policies enable low-latency queries on any attribute, and it provides single-digit millisecond response times for point reads and queries, which is essential for a social media application.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key-value model with a document database, assuming it can query arbitrary attributes efficiently, but Table Storage requires a composite key and lacks secondary indexes for ad-hoc queries on non-key fields.

How to eliminate wrong answers

Option A is wrong because Azure Blob Storage is designed for unstructured binary or text data (like images or videos) and does not support querying individual attributes within JSON documents; it would require loading entire blobs and parsing them client-side. Option B is wrong because Azure Table Storage is a key-value store that requires a predefined partition key and row key, and it does not support querying on arbitrary attributes without scanning all entities, leading to higher latency. Option D is wrong because Azure SQL Database is a relational database that requires a fixed schema, so storing user profiles with varying attributes would necessitate complex schema designs (e.g., EAV pattern) or frequent ALTER TABLE operations, which adds overhead and reduces query performance.

408

MCQmedium

A startup is building a web application with a relational database backend. They expect variable traffic and want to minimize costs by scaling the database automatically based on demand. Which Azure service should they use?

A.Azure SQL Database serverless

B.Azure SQL Managed Instance

C.SQL Server on Azure Virtual Machines

D.Azure Database for MariaDB

AnswerA

Serverless compute auto-scales and pauses, reducing cost for variable workloads.

Why this answer

Option A is correct because Azure SQL Database serverless automatically scales compute resources and pauses during inactivity to save costs. Option B (Azure SQL Managed Instance) is not serverless. Option C (SQL Server on Azure Virtual Machines) requires manual scaling. Option D (Azure Database for MariaDB) does not have serverless compute.

409

MCQhard

A company uses Azure Synapse Analytics dedicated SQL pool to run large-scale analytics. The data engineering team notices that queries are slow due to excessive data movement between distributions. Which index type should be recommended to minimize data movement for fact tables that are frequently joined on a specific column?

A.Ordered clustered columnstore index

B.Clustered columnstore index

C.Round-robin distributed table

D.Hash-distributed table

AnswerD

Hash-distributed table distributes rows based on a hash of the distribution column, ensuring that rows with the same join key are on the same distribution, minimizing data movement.

Why this answer

Hash-distributed tables distribute rows across distributions based on a hash of a chosen column. When fact tables are frequently joined on that specific column, using the same distribution column ensures that matching rows from both tables reside on the same distribution, eliminating the need to shuffle data between distributions during the join. This minimizes data movement and significantly improves query performance in Azure Synapse dedicated SQL pools.
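
A small sketch of why a shared distribution column avoids data movement. CRC32 is a stand-in chosen for determinism and is not the hash function the engine uses. The table contents and the `customer_id` column are hypothetical.

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_for(key):
    """Map a join key to a distribution with a deterministic hash.
    CRC32 is a stand-in here; the engine's real hash function differs."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS


# Hypothetical fact tables that are both hash-distributed on customer_id.
sales = [{"customer_id": c, "amount": a} for c, a in [(101, 20), (202, 35), (101, 15)]]
returns = [{"customer_id": c, "amount": a} for c, a in [(101, 5), (202, 10)]]


def co_located(left, right):
    """True when every matching pair of rows shares a distribution,
    so the join needs no data movement between distributions."""
    return all(
        distribution_for(l["customer_id"]) == distribution_for(r["customer_id"])
        for l in left
        for r in right
        if l["customer_id"] == r["customer_id"]
    )


print(co_located(sales, returns))  # True
```

Because both tables apply the same hash to the same column, equal keys always land on the same distribution. A round-robin table offers no such guarantee, so rows must be shuffled before the join.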

Exam trap

The trap here is that candidates confuse index types (like columnstore) with table distribution strategies, assuming that a better index alone can solve data movement issues, when in fact distribution design is the primary mechanism to minimize cross-distribution data shuffling in Azure Synapse dedicated SQL pools.

How to eliminate wrong answers

Option A is wrong because an ordered clustered columnstore index improves compression and query performance for range scans and order-dependent queries, but it does not address data movement during joins. Option B is wrong because a clustered columnstore index provides high compression and fast scan performance for analytics, but it does not control how data is distributed across nodes, so it cannot reduce data movement for joins. Option C is wrong because round-robin distributed tables distribute data evenly without any hash key, causing all join operations to require full data movement across distributions, which is the opposite of minimizing data movement.

410

MCQmedium

Your organization uses Azure SQL Database and needs to ensure that all customer data is encrypted at rest and in transit with minimal administrative overhead. Which solution should you recommend?

A.Use Microsoft Purview Information Protection to label and encrypt the data.

B.Enable Transparent Data Encryption (TDE) and enforce TLS 1.2 for connections.

C.Implement Dynamic Data Masking on the customer table.

D.Enable Always Encrypted for all sensitive columns and use client-side encryption.

AnswerB

TDE encrypts the database at rest automatically, and enforcing TLS ensures encryption in transit with minimal overhead.

Why this answer

Option B is correct because Transparent Data Encryption (TDE) encrypts Azure SQL Database data files at rest without requiring any application changes, and enforcing TLS 1.2 ensures all data in transit is encrypted using a strong, industry-standard protocol. This combination meets the requirement for encryption at rest and in transit with minimal administrative overhead, as TDE is managed by the platform and TLS enforcement is a simple server-level setting.
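
TDE needs no client code, and the TLS requirement is enforced on the server. As a complementary illustration, a client can also refuse older protocol versions. The sketch below uses Python's standard `ssl` module and only configures a context; it does not connect to any database.

```python
import ssl

# Client-side TLS context that refuses protocol versions older than TLS 1.2.
# Certificate verification stays enabled, which is the default for this helper.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)  # TLSv1_2
```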

Exam trap

The trap here is that candidates often confuse Dynamic Data Masking (which only hides data in results) with encryption, or they overcomplicate the solution by choosing Always Encrypted, which requires client-side changes and key management, when the question explicitly asks for minimal administrative overhead.

How to eliminate wrong answers

Option A is wrong because Microsoft Purview Information Protection is a data classification and labeling service, not a native encryption mechanism for Azure SQL Database; it does not encrypt data at rest or in transit within the database engine. Option C is wrong because Dynamic Data Masking only obfuscates data in query results for unauthorized users, it does not encrypt data at rest or in transit. Option D is wrong because Always Encrypted requires client-side encryption and key management, which adds significant administrative overhead and application changes, contradicting the 'minimal administrative overhead' requirement.

411

MCQmedium

A social media application stores user posts as JSON documents. Each post contains fields like post_id, author, content, and timestamp. The application needs to query posts by author and date range using SQL-like queries. Additionally, the application requires the ability to traverse follower relationships as a graph to suggest new friends. The development team wants to use a single Azure Cosmos DB account to minimize management overhead. Which combination of Azure Cosmos DB APIs should they choose?

A.Use the SQL API only for both workloads.

B.Use the MongoDB API only for both workloads.

C.Use the Gremlin API only for both workloads.

D.Use the SQL API for the posts and the Gremlin API for the follower graph.

AnswerD

Correct. The SQL API handles document queries with SQL, and the Gremlin API handles graph traversals. Because each Azure Cosmos DB account is bound to one API at creation, this pairing uses one account per API.

Why this answer

Option D is correct because it pairs each workload with the API built for it: the SQL API for SQL-like queries on JSON documents and the Gremlin API for graph traversal of follower relationships. Azure Cosmos DB offers multiple API models, but each account is bound to a single API when it is created. The question asks for 'a single Azure Cosmos DB account,' yet no single API serves both workloads, so meeting both requirements takes two accounts, one per API.

Exam trap

The trap here is that candidates assume a single Azure Cosmos DB account can support multiple APIs simultaneously, but in reality each account is locked to one API at creation time, so you must use separate accounts for document and graph workloads.

How to eliminate wrong answers

Option A is wrong because the SQL API cannot perform graph traversal queries (e.g., follower relationships) natively; it lacks Gremlin's graph traversal capabilities. Option B is wrong because the MongoDB API also lacks native graph traversal support and is not designed for graph workloads. Option C is wrong because the Gremlin API is not optimized for SQL-like queries on JSON documents; it uses Gremlin graph traversal language, not SQL, and does not support document querying with filters like author and date range efficiently.

412

MCQeasy

A company stores employee records in a database. Each employee record contains an EmployeeID (unique), Name, Department, and HireDate. The EmployeeID is used to uniquely identify each employee. Which data concept does the EmployeeID represent?

A.Index

B.Foreign key

C.Primary key

D.Unique constraint

AnswerC

The primary key uniquely identifies each row and is a fundamental concept in relational databases.

Why this answer

The EmployeeID is used to uniquely identify each employee record, which is the defining characteristic of a primary key. In relational databases, a primary key enforces entity integrity by ensuring each row has a unique, non-null identifier. This aligns with the core data concept of a primary key as the unique identifier for a table.

Exam trap

The trap here is that candidates often confuse a unique constraint with a primary key because both enforce uniqueness, but the primary key uniquely identifies the row and cannot contain NULLs, while a unique constraint is a secondary uniqueness enforcement that can allow a single NULL value.

How to eliminate wrong answers

Option A is wrong because an index is a performance optimization structure that speeds up data retrieval, not a constraint that uniquely identifies rows. Option B is wrong because a foreign key is a column that references a primary key in another table to establish a relationship, not a unique identifier within its own table. Option D is wrong because a unique constraint ensures all values in a column are distinct but does not inherently designate the column as the table's primary identifier; a table can have multiple unique constraints but only one primary key.
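
The uniqueness rule described above can be sketched with Python's built-in sqlite3 module. This is a minimal local illustration, not Azure-specific; the table name and sample rows are assumptions made for the example, while the column names follow the question.

```python
import sqlite3

# In-memory database; column names mirror the fields in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,   -- unique identifier for each row
        Name       TEXT NOT NULL,
        Department TEXT,
        HireDate   TEXT
    )
    """
)
conn.execute("INSERT INTO Employee VALUES (1, 'Ana', 'Finance', '2021-03-01')")

# A second row with the same EmployeeID violates the primary key.
try:
    conn.execute("INSERT INTO Employee VALUES (1, 'Ben', 'Sales', '2022-07-15')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```

The database engine, not the application, rejects the duplicate, which is what "enforces entity integrity" means in practice.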

413

MCQhard

A data analyst needs to run ad-hoc SQL queries on large volumes of data stored as Parquet files in Azure Data Lake Storage Gen2. The queries are unpredictable, and the analyst wants to pay only for the compute resources consumed by each query. Which Azure Synapse Analytics compute model should be used?

A.Serverless SQL pool

B.Dedicated SQL pool

C.Apache Spark pool

D.Azure Data Explorer pool

AnswerA

Correct. Serverless SQL pool is an on-demand compute that charges per TB of data processed, making it ideal for unpredictable ad-hoc queries without provisioning or idle costs.

Why this answer

Serverless SQL pool is the correct choice because it allows running ad-hoc SQL queries directly on data in Azure Data Lake Storage Gen2 without provisioning any fixed compute resources. It uses a pay-per-query billing model, charging only for the amount of data processed by each query, which aligns perfectly with the unpredictable query patterns described.

Exam trap

The trap here is that candidates often confuse 'serverless' with 'dedicated' SQL pools, assuming that any SQL query requires a provisioned warehouse, when in fact Serverless SQL pool is purpose-built for ad-hoc, pay-per-query scenarios on data lakes.

How to eliminate wrong answers

Option B (Dedicated SQL pool) is wrong because it requires provisioning a fixed set of compute resources (DWUs) that are billed per hour regardless of usage, making it unsuitable for unpredictable, ad-hoc workloads where you want to pay only per query. Option C (Apache Spark pool) is wrong because it is designed for big data processing using Spark (Scala, Python, .NET) and not for running ad-hoc SQL queries directly on Parquet files; it also requires a running cluster that incurs costs even when idle. Option D (Azure Data Explorer pool) is wrong because it is optimized for interactive analytics on time-series and log data using Kusto Query Language (KQL), not for standard SQL queries on Parquet files in Data Lake Storage.

414

MCQhard

Refer to the exhibit. You are reviewing an ARM template for an Azure SQL Database deployment. The database must support a read-only workload that requires low latency. The current configuration uses General Purpose tier with 4 vCores. What is the most significant performance improvement you can make without changing the tier?

A.Increase maxSizeBytes to 1 TB

B.Set the edition to 'Serverless'

C.Enable read scale-out by adding 'readScale' property

D.Change requestedBackupStorageRedundancy to 'Local'

AnswerC

Read scale-out allows read-only queries to be routed to a secondary replica, improving performance for read workloads.

Why this answer

Enabling read scale-out by adding the 'readScale' property allows the database to use a read-only replica, offloading read workloads from the primary and providing low-latency reads. This is the most significant performance improvement within the General Purpose tier because it directly addresses the read-only workload requirement without changing the tier or incurring additional compute costs.

Exam trap

The trap here is that candidates often confuse scaling storage (maxSizeBytes) or changing backup redundancy with performance improvements, but the question specifically targets read latency for a read-only workload, which is directly addressed by read scale-out rather than storage or backup changes.

How to eliminate wrong answers

Option A is wrong because increasing maxSizeBytes to 1 TB only expands storage capacity, which does not improve read performance or latency for a read-only workload. Option B is wrong because setting the edition to 'Serverless' changes the tier (from provisioned to serverless compute), which violates the constraint of not changing the tier, and serverless is designed for intermittent workloads, not low-latency read performance. Option D is wrong because changing requestedBackupStorageRedundancy to 'Local' affects backup storage redundancy (e.g., LRS vs. GRS), not query performance or read latency.

415

MCQhard

Refer to the exhibit. You are storing product data in Azure Cosmos DB using the SQL API. The JSON shows a sample document. You need to query for all products in the 'Electronics' category with a price less than 200. Which query should you use?

A.SELECT * FROM c WHERE c.category = 'Electronics' OR c.price < 200

B.SELECT * FROM c WHERE c.category = "Electronics" AND c.price < 200

C.SELECT * FROM p WHERE p.category = 'Electronics' AND p.price < 200

D.SELECT * FROM c WHERE c.category = 'Electronics' AND c.price < 200

AnswerD

This is the correct SQL API syntax.

Why this answer

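Option D is correct because it filters on both conditions with AND: SELECT * FROM c WHERE c.category = 'Electronics' AND c.price < 200. Option A is wrong because its WHERE clause uses OR instead of AND, so it also returns Electronics products priced at 200 or more and cheaper products from other categories. Option B is treated as wrong for quoting the string literal with double quotes, and Option C for using the alias p in the FROM clause instead of c; both differ from the correct option only in notation, not in filter logic.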
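
The difference between combining the two conditions with AND and with OR can be sketched in plain Python. The exhibit is not reproduced here, so the three sample documents below are hypothetical; only the field names category and price come from the question.

```python
# Hypothetical sample documents standing in for the exhibit.
products = [
    {"id": "1", "category": "Electronics", "price": 150},
    {"id": "2", "category": "Electronics", "price": 450},
    {"id": "3", "category": "Toys", "price": 20},
]

def matches_and(doc):
    """Mirror of: c.category = 'Electronics' AND c.price < 200"""
    return doc["category"] == "Electronics" and doc["price"] < 200

def matches_or(doc):
    """Mirror of: c.category = 'Electronics' OR c.price < 200"""
    return doc["category"] == "Electronics" or doc["price"] < 200

and_ids = [d["id"] for d in products if matches_and(d)]
or_ids = [d["id"] for d in products if matches_or(d)]
print(and_ids)  # ['1']
print(or_ids)   # ['1', '2', '3']
```

With OR, the expensive Electronics item and the cheap Toys item both slip through, which is why only the AND form answers the requirement.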

416

MCQeasy

A company uses Azure Table Storage to store customer session data. Each session has a PartitionKey (CustomerId) and RowKey (SessionId). They need to retrieve all sessions for a specific customer quickly. Which query design will be most efficient?

A.Select all entities and filter client-side.

B.Create a secondary index on RowKey.

C.Query with RowKey equal to a specific SessionId.

D.Query with PartitionKey equal to the specific CustomerId.

AnswerD

Table Storage uses PartitionKey for fast partition-level queries.

Why this answer

Filtering on PartitionKey (option D) lets Table Storage go directly to the single partition that holds the customer's sessions, making it efficient. Option C (filtering by RowKey alone) must scan all partitions and would return only one session. Option A (selecting all entities and filtering client-side) scans the entire table. Option B (using a secondary index) is not available in standard Table Storage.
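
Why a PartitionKey filter is cheaper than reading everything can be sketched with a toy in-memory model. This is not the Table Storage SDK; the data layout, key values, and helper names are assumptions for illustration, and "entities examined" is only a stand-in for the work the service would do.

```python
from collections import defaultdict

# Toy model: entities grouped by PartitionKey, then keyed by RowKey.
table = defaultdict(dict)
for customer in ("cust-1", "cust-2", "cust-3"):
    for n in range(4):
        row_key = f"{customer}-session-{n}"
        table[customer][row_key] = {"PartitionKey": customer, "RowKey": row_key}

def query_by_partition(partition_key):
    """Touch only one partition; return (matches, entities examined)."""
    partition = table.get(partition_key, {})
    return list(partition.values()), len(partition)

def scan_and_filter(partition_key):
    """Read every entity, then filter; return (matches, entities examined)."""
    examined = 0
    matches = []
    for partition in table.values():
        for entity in partition.values():
            examined += 1
            if entity["PartitionKey"] == partition_key:
                matches.append(entity)
    return matches, examined

by_partition, examined_partition = query_by_partition("cust-2")
by_scan, examined_scan = scan_and_filter("cust-2")
print(len(by_partition), examined_partition)  # 4 4
print(len(by_scan), examined_scan)            # 4 12
```

Both approaches return the same four sessions, but the scan examines every entity in the table to find them.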

417

MCQhard

A SaaS company manages hundreds of customer databases, each representing a tenant. Each tenant database has its own predictable usage pattern, but the aggregate workload across all tenants is variable. The company wants to optimize costs by pooling compute resources across tenants while still ensuring that each tenant benefits from resource isolation under normal loads. Which Azure SQL Database deployment model should they choose?

A.Single database

B.Elastic pool

C.Managed Instance

D.SQL Server on Azure Virtual Machine

AnswerB

Elastic pools allow multiple databases to share a pool of resources, providing cost savings for multi-tenant SaaS applications while maintaining predictable performance per database.

Why this answer

Elastic pools are designed for SaaS multi-tenant scenarios where each tenant has a predictable, low average usage but the aggregate workload across tenants is variable. They allow pooling of compute resources (eDTUs or vCores) across multiple databases, providing resource isolation under normal loads via per-database min/max resource limits, while optimizing cost by sharing unused capacity among tenants.

Exam trap

The trap here is that candidates often confuse 'resource isolation' with 'dedicated resources' and choose Single Database, failing to recognize that Elastic Pools provide isolation via per-database resource limits while still pooling compute for cost efficiency.

How to eliminate wrong answers

Option A is wrong because a single database model allocates dedicated compute resources to each database, which would be cost-inefficient for hundreds of small, predictable tenant databases as it does not allow resource pooling. Option C is wrong because Managed Instance is a fully managed instance of SQL Server with fixed resources, designed for lift-and-shift migrations, not for pooling compute across many small tenant databases. Option D is wrong because SQL Server on Azure Virtual Machine requires manual management of the VM and SQL Server, and does not provide built-in resource pooling or isolation for multi-tenant databases.
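
The cost argument for pooling can be sketched with a little arithmetic. The usage figures below are hypothetical and in arbitrary units; they only illustrate that tenants peaking at different times need less shared capacity than the sum of their individual peaks.

```python
# Hypothetical per-hour compute demand (arbitrary units) for three tenants.
tenant_usage = {
    "tenant_a": [10, 80, 10, 10],
    "tenant_b": [10, 10, 70, 10],
    "tenant_c": [10, 10, 10, 90],
}

# Single databases: each one is sized for its own peak.
dedicated_capacity = sum(max(hours) for hours in tenant_usage.values())

# Pool: sized for the highest combined demand in any one hour.
combined_per_hour = [sum(hour) for hour in zip(*tenant_usage.values())]
pooled_capacity = max(combined_per_hour)

print(dedicated_capacity)  # 240
print(pooled_capacity)     # 110
```

In this sketch the pool needs less than half the capacity of three separately sized databases; the saving shrinks as tenant peaks start to coincide.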

418

Multi-Selectmedium

Which TWO Azure services can be used to perform interactive data analytics on large datasets without managing infrastructure? (Choose two.)

Select 2 answers

A.Azure Synapse Analytics Serverless SQL pool

B.Azure SQL Database

C.Azure Databricks

D.Azure Data Factory

E.Azure Data Lake Storage Gen2

AnswersA, C

Serverless SQL pool is a PaaS analytics service.

Why this answer

Azure Synapse Analytics Serverless SQL pool is correct because it enables interactive analytics on large datasets stored in Azure Data Lake Storage or other sources using T-SQL queries, without requiring any infrastructure management. It automatically scales compute resources based on query demand, allowing analysts to run ad-hoc queries on petabyte-scale data without provisioning or managing servers.

Exam trap

The trap here is that candidates confuse Azure Data Lake Storage Gen2 as an analytics service rather than a storage service, or mistake Azure Data Factory's orchestration capabilities for interactive querying, leading them to select options that do not provide direct interactive analytics.

419

MCQmedium

Refer to the exhibit. An analyst runs this Kusto Query Language (KQL) query in Azure Data Explorer. What is the primary purpose of this query?

A.Find the top 5 most common event types in Texas

B.Calculate total damage in Texas

C.Identify events with the highest damage

D.List all storm events in Texas

AnswerA

Correctly identifies the purpose.

Why this answer

The query uses the `summarize` operator with `count()` to count events per `EventType`, then `top 5 by count_` to return the five event types with the highest counts, filtered to only rows where `State == 'TEXAS'`. This directly finds the top 5 most common event types in Texas.

Exam trap

Microsoft often tests the distinction between counting occurrences (using `count()` with `summarize`) versus summing numeric values (using `sum()`), leading candidates to confuse 'most common' with 'highest damage'.

How to eliminate wrong answers

Option B is wrong because the query does not include any aggregation of damage amounts (e.g., `sum(Damage)` or `avg(Damage)`), so it cannot calculate total damage. Option C is wrong because the query counts events by type, not by damage amount; to identify events with the highest damage, you would need to sort or top by a damage column, not by `count_`. Option D is wrong because the query does not list individual storm events; it aggregates events into groups by `EventType` and returns only the top 5 counts, not a list of all events.
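
The three steps the explanation attributes to the query (filter, count per group, keep the top five) can be mirrored in Python. The exhibit itself is not reproduced here, so the rows below are hypothetical; the column names State and EventType are taken from the explanation.

```python
from collections import Counter

# Hypothetical rows standing in for the storm events table.
storm_events = [
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Flood"},
    {"State": "KANSAS", "EventType": "Tornado"},
    {"State": "TEXAS", "EventType": "Tornado"},
    {"State": "TEXAS", "EventType": "Hail"},
    {"State": "TEXAS", "EventType": "Flood"},
]

# Step 1, like `where State == 'TEXAS'`: keep only Texas rows.
texas = (row for row in storm_events if row["State"] == "TEXAS")
# Step 2, like `summarize count() by EventType`: count rows per event type.
counts = Counter(row["EventType"] for row in texas)
# Step 3, like `top 5 by count_`: keep the five largest counts.
top5 = counts.most_common(5)
print(top5)  # [('Hail', 3), ('Flood', 2), ('Tornado', 1)]
```

The result is a ranking of event types by frequency; no damage column is read at any point, which rules out options B and C.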

420

MCQmedium

A company has an Azure SQL Database and needs to run a weekly data aggregation job that takes several hours. They want to minimize cost and avoid impacting production workload. Which approach should they use?

A.Migrate the database to the Hyperscale service tier

B.Use Azure Elastic Jobs to run the aggregation during off-peak hours

C.Increase the DTU or vCore size of the database to handle the load

D.Create a read-only replica and run the aggregation on the replica

AnswerD

Read-only replica offloads read workloads without affecting the primary.

Why this answer

Creating a read-only replica allows the weekly aggregation job to run against a separate copy of the database, so its reads do not compete with the production workload on the primary. The replica is billed as its own compute for as long as it exists, so cost is contained by sizing it for the job and removing it when it is no longer needed.

Exam trap

The trap here is that candidates may confuse Azure Elastic Jobs as a workload isolation tool, when in fact it only schedules jobs on the same database and does not provide a separate compute resource.

How to eliminate wrong answers

Option A is wrong because migrating to the Hyperscale service tier is designed for large databases and high throughput, not for cost-effective batch processing; it increases cost and complexity without addressing the need to avoid impacting production. Option B is wrong because Azure Elastic Jobs is a scheduling service for running T-SQL scripts across multiple databases, but it does not isolate the workload from the production database; the aggregation would still run on the same primary database, impacting performance. Option C is wrong because increasing DTU or vCore size on the primary database would temporarily improve performance but would significantly increase cost and still risk impacting production workload during the aggregation run.

421

MCQhard

A manufacturing company connects thousands of IoT sensors on an assembly line, each sending telemetry data every second. The data volume is terabyte-scale per day. The company needs to analyze the sensor data in near real-time to detect anomalies (e.g., temperature spikes) and also allow data scientists to run interactive ad-hoc queries on the historical data to find patterns. They prefer using a query language similar to SQL. Which Azure service should they choose?

A.Azure Stream Analytics

B.Azure Data Explorer

C.Azure Synapse Analytics dedicated SQL pool

D.Azure Databricks with Structured Streaming

AnswerB

Azure Data Explorer is optimized for high-velocity time-series data, supports near real-time anomaly detection, and enables fast interactive queries on both streaming and historical data using a SQL-like language (KQL).

Why this answer

Azure Data Explorer (ADX) is designed for high-velocity telemetry data, ingesting terabytes per day from IoT sensors with sub-second latency. It supports Kusto Query Language (KQL), which is SQL-like and optimized for time-series analysis, anomaly detection, and interactive ad-hoc queries on both real-time and historical data. This makes it the ideal choice for the described scenario.

Exam trap

The trap here is that candidates often choose Azure Stream Analytics because it handles real-time streaming and uses SQL-like syntax, but they overlook the requirement for interactive ad-hoc queries on historical data, which Stream Analytics cannot efficiently support.

How to eliminate wrong answers

Option A is wrong because Azure Stream Analytics is a real-time stream processing service that uses a SQL-like language but is not optimized for interactive ad-hoc queries on terabyte-scale historical data; it lacks the columnar storage and indexing for fast ad-hoc exploration. Option C is wrong because Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) data warehouse designed for large-scale batch analytics and data warehousing, not for near real-time ingestion and query of high-velocity telemetry data with sub-second latency. Option D is wrong because Azure Databricks with Structured Streaming is a big data analytics platform using Apache Spark, which requires more complex setup and is not natively optimized for interactive ad-hoc SQL queries on time-series data at the same performance level as ADX; it also introduces overhead for simple anomaly detection tasks.

422

MCQeasy

A company runs an e-commerce application on Azure SQL Database. During seasonal promotions, traffic spikes significantly, but at other times traffic is low. They want to automatically adjust compute resources based on demand without manual intervention or provisioning. Which Azure SQL Database feature should they use?

A.Geo-replication

B.Elastic pools

C.Serverless compute

D.Hyperscale

AnswerC

Serverless compute automatically scales compute based on demand and pauses when idle, fitting the described requirement.

Why this answer

Serverless compute for Azure SQL Database automatically scales compute resources based on workload demand and pauses the database during inactive periods, charging only for storage and compute used per second. This matches the requirement for automatic adjustment without manual intervention or provisioning, especially for intermittent, unpredictable traffic spikes like seasonal promotions.

Exam trap

The trap here is that candidates confuse Elastic pools (which scale shared resources across multiple databases) with the single-database auto-scaling behavior of Serverless compute, or they assume Hyperscale's high scalability automatically includes dynamic compute scaling without manual intervention.

How to eliminate wrong answers

Option A is wrong because Geo-replication is a disaster recovery and business continuity feature that creates readable replicas in different Azure regions, not an auto-scaling mechanism for compute resources. Option B is wrong because Elastic pools are designed for managing and scaling multiple databases with shared resources in a predictable pattern, not for automatically adjusting compute of a single database based on demand spikes. Option D is wrong because Hyperscale is a service tier for very large databases (up to 100 TB) with fast scaling of storage and compute, but it requires manual scaling of compute replicas and does not provide the automatic pause/resume or per-second billing of serverless compute.

423

Multi-Selecteasy

Which TWO Azure data services are classified as NoSQL databases? (Choose two.)

Select 2 answers

A.Azure SQL Managed Instance

B.Azure Cosmos DB

C.Azure Table Storage

D.Azure Database for PostgreSQL

E.Azure SQL Database

AnswersB, C

NoSQL database.

Why this answer

Azure Cosmos DB is a fully managed NoSQL database service that supports multiple data models, including document, key-value, graph, and column-family, via APIs like SQL, MongoDB, Cassandra, Gremlin, and Table. It is explicitly designed as a NoSQL database with schema-agnostic, horizontally scalable storage.

Exam trap

The trap here is that candidates often confuse Azure Table Storage (a NoSQL key-value store) with Azure SQL Database or Managed Instance, assuming 'Table' implies a relational table, but it is actually a NoSQL service.

424

MCQhard

Your data engineering team is designing a data pipeline that ingests data from multiple sources into Azure Data Lake Storage Gen2. The data must be cataloged in Azure Purview for discoverability. Which approach ensures that the data lineage is automatically captured?

A.Use Azure Data Factory to copy data and manually register the datasets in Purview.

B.Use Azure Data Factory with Purview integration enabled to copy data.

C.Use Azure Databricks to write data and call Purview's Atlas API to update lineage.

D.Schedule Purview scans on the data lake after data ingestion.

AnswerB

Purview integration in ADF automatically captures lineage for copy activities.

Why this answer

Option B is correct because Azure Data Factory's native Purview integration automatically captures lineage metadata during data copy activities. When enabled, Data Factory pushes runtime lineage information (source, sink, transformation steps) directly to Purview without manual intervention, ensuring complete and accurate data provenance.

Exam trap

The trap here is that candidates often confuse data cataloging (scanning) with lineage capture, assuming that scanning the data lake after ingestion (Option D) will automatically show how data got there, but scanning only reveals schema and classification, not the data flow path.

How to eliminate wrong answers

Option A is wrong because manually registering datasets in Purview after copying data does not capture lineage automatically; it only adds static metadata without the runtime execution details that show data flow. Option C is wrong because while Azure Databricks can call Purview's Atlas API, this requires custom code and does not provide the automatic, out-of-the-box lineage capture that Data Factory's integration offers. Option D is wrong because scheduling Purview scans on the data lake after ingestion only catalogs the data at rest and captures schema/classification metadata, but it does not capture the lineage of how data moved from source to destination.

425

MCQmedium

A social media company stores user-generated posts as JSON documents. Each post contains fields such as postId, userId, timestamp, and content. The application needs to query posts by userId and timestamp ranges with low latency, and also perform SQL-like queries across all posts. The data volume is growing rapidly and must scale globally. Which Azure data store should the company use?

A.Azure Table Storage

B.Azure Cosmos DB SQL API

C.Azure Blob Storage

D.Azure Cache for Redis

AnswerB

Correct. The Cosmos DB SQL API natively stores JSON documents, supports indexing on any field, and allows rich SQL-like queries. It offers global distribution, low latency, and scalable throughput, making it ideal for this scenario.

Why this answer

Azure Cosmos DB SQL API is the correct choice because it provides native support for querying JSON documents with low-latency, including indexed queries on fields like userId and timestamp. Its global distribution capability ensures data can be replicated across multiple Azure regions for low-latency access worldwide, while its SQL API allows SQL-like queries across all posts, meeting both requirements.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's key-value model with document storage, mistakenly thinking its OData queries can handle complex JSON queries, but Table Storage cannot query nested JSON fields or perform SQL-like operations across all posts.

How to eliminate wrong answers

Option A is wrong because Azure Table Storage is a key-value store that does not support native JSON document queries or SQL-like querying; it only supports OData-based queries on partition and row keys, not flexible field-level queries on nested JSON. Option C is wrong because Azure Blob Storage is an object store for unstructured binary or text data, not designed for low-latency queries on individual JSON fields or SQL-like querying; it would require loading entire blobs and parsing them client-side. Option D is wrong because Azure Cache for Redis is an in-memory cache, not a durable data store; it lacks persistent storage and SQL-like query capabilities, and is intended for caching frequently accessed data, not for primary storage of growing datasets.

426

MCQeasy

A retail company collects data from online transactions including order ID, customer details, product IDs, quantities, and timestamps. The data is stored in a relational database and used for order processing and inventory management. Which characteristic of this data makes it structured?

A.It is stored in rows and columns with a predefined schema.

B.It is stored as key-value pairs.

C.It is stored in JSON format with variable fields.

D.It is stored in unstructured text files.

AnswerA

Structured data is characterized by a rigid schema and tabular format, enabling relational database features like ACID transactions.

Why this answer

Option A is correct because structured data is defined by a fixed schema where each entity (e.g., orders) is stored in rows and columns with predefined data types (e.g., INT for order ID, VARCHAR for customer details). This relational model enforces consistency and enables efficient querying via SQL for order processing and inventory management.

Exam trap

The trap here is that candidates confuse 'structured' with any organized storage format (like JSON or key-value pairs), but the DP-900 exam specifically defines structured data as having a fixed schema with rows and columns in a relational database.

How to eliminate wrong answers

Option B is wrong because key-value pairs (e.g., in Redis or DynamoDB) are a NoSQL model that does not enforce a fixed schema or relational integrity, unlike the structured data described. Option C is wrong because JSON with variable fields is semi-structured data; it allows flexible schemas and nested structures, not the rigid rows-and-columns format of a relational database. Option D is wrong because unstructured text files (e.g., .txt or .log files) lack any predefined schema or organization, making them unsuitable for direct SQL-based order processing and inventory management.

427

MCQmedium

You need to design a data storage solution for an e-commerce platform that requires ACID transactions for order processing and high availability across regions. Which Azure service meets these requirements?

A.Azure Database for MySQL with read replicas

B.Azure Synapse Analytics

C.Azure SQL Database with active geo-replication

D.Azure Cosmos DB with multiple write regions

AnswerC

Active geo-replication provides readable secondaries in other regions for HA.

Why this answer

Azure SQL Database with active geo-replication supports ACID transactions natively and maintains a readable secondary in another region that can be promoted on failover, providing high availability across regions. This meets the e-commerce platform's need for transactional consistency and regional resilience.

Exam trap

The trap here is that candidates often confuse 'high availability' with 'multi-region writes' and choose Cosmos DB, overlooking that ACID transactions require a relational database with strict consistency guarantees, not just eventual consistency or single-document atomicity.

How to eliminate wrong answers

Option A is wrong because Azure Database for MySQL with read replicas supports ACID transactions but read replicas are read-only and do not provide automatic failover for write workloads, thus failing high availability for order processing writes. Option B is wrong because Azure Synapse Analytics is a big data analytics service optimized for large-scale data warehousing and analytics, not for OLTP workloads requiring ACID transactions. Option D is wrong because Azure Cosmos DB with multiple write regions provides multi-region writes and high availability, but its transactions are scoped to a single logical partition and its default consistency level is session rather than strong, so it does not give the relational, cross-entity ACID guarantees needed for order processing.
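
What "ACID transactions for order processing" buys can be sketched locally with Python's sqlite3 module. This is an illustration of atomicity in a relational engine, not of Azure SQL Database itself; the table names, columns, and the place_order helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inventory (ProductID INTEGER PRIMARY KEY, "
    "Stock INTEGER CHECK (Stock >= 0))"
)
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, "
    "ProductID INTEGER, Quantity INTEGER)"
)
conn.execute("INSERT INTO Inventory VALUES (1, 5)")
conn.commit()

def place_order(order_id, product_id, quantity):
    """Insert the order and decrement stock as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO Orders VALUES (?, ?, ?)",
                (order_id, product_id, quantity),
            )
            conn.execute(
                "UPDATE Inventory SET Stock = Stock - ? WHERE ProductID = ?",
                (quantity, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = place_order(1, 1, 3)          # stock goes from 5 to 2
rejected = place_order(2, 1, 10)   # would make stock negative; rolled back
orders = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
stock = conn.execute("SELECT Stock FROM Inventory WHERE ProductID = 1").fetchone()[0]
print(ok, rejected, orders, stock)  # True False 1 2
```

The second order's row is inserted and then undone when the stock update fails, so the database never shows an order without its matching inventory change.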

428

MCQmedium

A company has a data warehouse in Azure Synapse Analytics dedicated SQL pool. They need to load new sales data every night from a CSV file stored in Azure Data Lake Storage Gen2. The load process must be automated, scheduled, and have error handling for failed loads. Which Azure service should they use to orchestrate this process?

A.Azure Data Factory

B.Azure Stream Analytics

C.Azure Databricks

D.Azure Logic Apps

AnswerA

Azure Data Factory provides pipeline orchestration with scheduling, triggers, and error handling. It can copy data from ADLS Gen2 to Synapse dedicated SQL pool using the 'Copy Data' activity or stored procedure activities.

Why this answer

Azure Data Factory is the correct choice because it is a cloud-based ETL service designed specifically for orchestrating and automating data movement and transformation at scale. It supports scheduled triggers, native connectors to Azure Data Lake Storage Gen2 and Azure Synapse Analytics, and built-in error handling via retry policies and failure activities, making it ideal for nightly CSV file loads.

Exam trap

The trap here is that candidates may confuse Azure Data Factory with Azure Logic Apps because both can schedule and automate tasks, but Logic Apps lacks native data warehouse connectors and high-throughput data movement capabilities required for enterprise ETL workloads.

How to eliminate wrong answers

Option B (Azure Stream Analytics) is wrong because it is a real-time stream processing service for analyzing data in motion, not a batch orchestration tool for scheduled file loads. Option C (Azure Databricks) is wrong because it is an Apache Spark-based analytics platform focused on big data processing and machine learning, not a native orchestration service for scheduled data movement with built-in error handling. Option D (Azure Logic Apps) is wrong because it is a low-code workflow automation service primarily for integrating SaaS applications and APIs, not designed for high-throughput data warehouse loading with enterprise-grade error handling and scheduling.

429

MCQmedium

Refer to the exhibit. You are designing an Azure Table storage schema for user settings. The JSON shows a sample entity. Which query will retrieve all settings for user123 efficiently?

A.Filter by RowKey eq 'settings'

B.Filter by PartitionKey eq 'user123' and RowKey eq 'settings'

C.Filter by PartitionKey eq 'user123'

D.Filter by PartitionKey eq 'user123' or RowKey eq 'settings'

AnswerB

This is a point query, the most efficient.

Why this answer

Option B is correct because specifying both PartitionKey and RowKey is a point query, which is the most efficient in Table storage. Option A is wrong because filtering by RowKey without PartitionKey performs a full table scan. Option C is wrong because filtering only by PartitionKey scans the entire partition. Option D is wrong because combining the two properties with OR cannot be resolved to a single partition and is inefficient.
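
The three access patterns named above can be summarized in a small helper. This is a hypothetical classifier written for this explanation, not part of any Azure SDK, and it models only filters that combine keys with AND; an OR filter behaves like the table scan case.

```python
def classify_query(partition_key=None, row_key=None):
    """Classify a Table storage lookup by which keys the filter pins down."""
    if partition_key is not None and row_key is not None:
        return "point query"      # exactly one entity is addressed
    if partition_key is not None:
        return "partition scan"   # one partition is read
    return "table scan"           # every partition is read

point = classify_query(partition_key="user123", row_key="settings")
partition = classify_query(partition_key="user123")
full = classify_query(row_key="settings")
print(point)      # point query
print(partition)  # partition scan
print(full)       # table scan
```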

430

Multi-Selecthard

Which THREE components are part of a typical modern data warehouse architecture on Azure?

Select 3 answers

A.Azure Data Factory

B.Azure Cosmos DB

C.Azure Analysis Services

D.Azure Synapse Analytics

E.Azure Data Lake Storage Gen2

AnswersA, D, E

Data Factory provides ETL/ELT capabilities to move and transform data.

Why this answer

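Azure Data Factory is a cloud-based ETL and data integration service that orchestrates and automates the movement and transformation of data. In a modern data warehouse architecture, it is used to ingest data from various sources, perform transformations, and load the data into the data warehouse or data lake. This makes it a core component for the 'ingest' and 'prepare' stages of the architecture. Azure Data Lake Storage Gen2 provides the scalable storage layer for raw and staged data, and Azure Synapse Analytics serves as the data warehouse and analytics engine where the prepared data is modeled and queried.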

Exam trap

The trap here is that candidates often confuse Azure Cosmos DB (a transactional NoSQL database) with an analytical store, or mistakenly think Azure Analysis Services is a required part of the data warehouse architecture when it is actually an optional semantic layer.

Full explanation →

431

MCQmedium

A company uses Azure SQL Database for a sales application. They need to replicate the database to a secondary region for disaster recovery. The secondary should be readable for reporting purposes and data should be synchronized within seconds. Which feature should they use?

A.Active Geo-Replication

B.Auto-failover groups

C.Point-in-time restore

D.Long-term retention

AnswerA

Active Geo-Replication provides a readable secondary replica that stays synchronized within seconds, enabling both DR and read-scale out.

Why this answer

Active Geo-Replication is the correct choice because it creates a readable secondary replica in a different Azure region, with data synchronized within seconds via asynchronous replication. This meets the requirement for both disaster recovery and read-only reporting access, as the secondary can be queried directly without impacting the primary database.

Exam trap

The trap here is that candidates often confuse Auto-failover groups with Active Geo-Replication, assuming the group feature provides faster synchronization, when in fact both use the same asynchronous replication and the key difference is that Auto-failover groups add automatic failover and endpoint management, not lower latency.

How to eliminate wrong answers

Option B (Auto-failover groups) is wrong because while it supports readable secondaries and automatic failover, it is designed for group-level failover of multiple databases and does not guarantee sub-second synchronization; it uses the same underlying geo-replication but adds orchestration, not faster sync. Option C (Point-in-time restore) is wrong because it restores a database to a past state from backups, not a continuously synchronized readable secondary for disaster recovery. Option D (Long-term retention) is wrong because it preserves backups for years for compliance, not for real-time replication or readable secondaries.

Full explanation →

432

MCQmedium

A company stores customer data in an Azure SQL Database. To comply with data residency requirements, they need to ensure that all customer data remains within a specific Azure region. Which feature should they use?

A.Use Azure Policy to restrict resource creation to allowed regions

B.Enable geo-replication

C.Configure dynamic data masking

D.Enable transparent data encryption (TDE)

AnswerA

Azure Policy can enforce that SQL Database and its replicas are only created in the required region.

Why this answer

Option A is correct because Azure Policy can restrict resource creation to an approved list of regions, so an Azure SQL Database server, or a secondary server for a replica, cannot be deployed anywhere else. Because the data lives where the database resource is deployed, restricting resource location keeps the customer data in the required region.

How to eliminate wrong answers

Option B (geo-replication) is wrong because it copies the database to a secondary in another region for disaster recovery, which works against the residency requirement. Option C (dynamic data masking) is wrong because it only obscures sensitive values in query results and has no effect on where data is stored. Option D (transparent data encryption) is wrong because it encrypts data at rest but does not control the region in which the data resides.

Full explanation →

433

MCQmedium

A company uses Azure SQL Database for an order management system. The Orders table has columns: OrderID (int, primary key), CustomerID (int), OrderDate (datetime), Status (varchar), and TotalAmount (decimal). The most frequent queries retrieve orders for a specific CustomerID within a date range and order the results by OrderDate. Additionally, single order lookups by OrderID must remain fast. Which indexing strategy best satisfies both requirements?

A.Create a clustered index on (CustomerID, OrderDate) and a non-clustered index on OrderID.

B.Create a clustered index on OrderID and a non-clustered index on (CustomerID, OrderDate).

C.Create a non-clustered index on each column used in WHERE clauses: OrderID, CustomerID, and OrderDate.

D.Create a clustered columnstore index on the entire table and no other indexes.

AnswerB

Correct. The clustered index on OrderID optimizes single row lookups. The non-clustered index on (CustomerID, OrderDate) supports range queries filtering on those columns. This combination provides optimal performance for both query patterns.

Why this answer

Option B is correct because a clustered index on OrderID ensures fast single-row lookups by the primary key, while a non-clustered index on (CustomerID, OrderDate) lets the engine seek to one customer's rows and read the date range already sorted by OrderDate. This design uses the clustered index for point lookups and the composite non-clustered index for the most frequent query pattern; the index is fully covering only if the other selected columns are added as included columns.

Exam trap

The trap here is that candidates often assume the clustered index must match the most frequent query pattern, but they overlook that a clustered index on a non-unique column like CustomerID can cause fragmentation and slower point lookups, whereas the correct design separates the point-lookup and range-query concerns into two distinct indexes.

How to eliminate wrong answers

Option A is wrong because creating a clustered index on (CustomerID, OrderDate) would make single-order lookups by OrderID slower, as the clustered index determines the physical order and a non-clustered index on OrderID would require a key lookup into the clustered index, adding overhead. Option C is wrong because creating separate non-clustered indexes on each column does not provide a single composite index for the range query; SQL Server would need to combine indexes via index intersection or perform key lookups, which is less efficient than a composite index on (CustomerID, OrderDate). Option D is wrong because a clustered columnstore index is optimized for large analytical scans and aggregations, not for point lookups or small range queries; it would degrade performance for the transactional workloads described.
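The option B layout can be tried out with a small sketch. It uses SQLite through Python's built-in `sqlite3` module as a stand-in for Azure SQL Database, which is an assumption made purely so the example runs anywhere; SQLite's `INTEGER PRIMARY KEY` only loosely mirrors a clustered index, and the index name and the `query_plan` helper are hypothetical.

```python
import sqlite3

# SQLite stands in for Azure SQL Database here (an assumption for runnability).
# INTEGER PRIMARY KEY makes OrderID the table's storage key, which loosely
# mirrors a clustered index on OrderID.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        CustomerID  INTEGER NOT NULL,
        OrderDate   TEXT    NOT NULL,
        Status      TEXT,
        TotalAmount REAL
    )
    """
)
# Composite secondary index, analogous to the non-clustered index in option B.
conn.execute(
    "CREATE INDEX IX_Orders_Customer_Date ON Orders (CustomerID, OrderDate)"
)


def query_plan(sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


range_plan = query_plan(
    "SELECT * FROM Orders WHERE CustomerID = ? "
    "AND OrderDate BETWEEN ? AND ? ORDER BY OrderDate",
    (42, "2024-01-01", "2024-01-31"),
)
point_plan = query_plan("SELECT * FROM Orders WHERE OrderID = ?", (1001,))
```

Printing `range_plan` and `point_plan` shows the range query searching the composite index and the single-order lookup going straight to the primary key, which is the division of labor the explanation describes.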

Full explanation →

434

MCQeasy

An organization wants to build a real-time dashboard that visualizes IoT sensor data as it arrives. Which Azure service should they use for processing the streaming data?

A.Azure Analysis Services

B.Azure Data Factory

C.Azure Databricks

D.Azure Stream Analytics

AnswerD

Stream Analytics is purpose-built for real-time stream processing and integration with Power BI.

Why this answer

Azure Stream Analytics is a real-time analytics service designed to process streaming data from sources like IoT devices. It can ingest data from Azure Event Hubs or IoT Hub, apply SQL-based queries to detect patterns or anomalies, and output results to a dashboard or storage with sub-second latency, making it ideal for real-time IoT dashboards.

Exam trap

Microsoft often tests the distinction between batch processing (Data Factory) and real-time stream processing (Stream Analytics), and candidates mistakenly choose Azure Databricks because they associate it with 'big data' without recognizing Stream Analytics as the simpler, purpose-built service for streaming IoT dashboards.

How to eliminate wrong answers

Option A is wrong because Azure Analysis Services is an OLAP engine for semantic modeling and reporting on historical data, not for real-time stream processing. Option B is wrong because Azure Data Factory is a cloud-based ETL and data orchestration service for batch data movement and transformation, not designed for low-latency streaming. Option C is wrong because Azure Databricks is a big data analytics platform that can handle streaming via Structured Streaming, but it is overkill for simple real-time dashboards and requires more complex setup compared to the purpose-built Stream Analytics service.
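Stream Analytics queries typically group events into time windows before aggregating them. The sketch below emulates a fixed, non-overlapping (tumbling) window in plain Python to show the idea; it runs over an in-memory list rather than a live stream, and the event tuples and the `tumbling_average` helper are hypothetical, not Stream Analytics itself.

```python
from collections import defaultdict

# Hypothetical sensor events: (epoch_seconds, device_id, temperature).
events = [
    (0, "dev1", 20.0),
    (4, "dev1", 22.0),
    (9, "dev2", 30.0),
    (12, "dev1", 26.0),
    (17, "dev2", 34.0),
]


def tumbling_average(events, window_seconds):
    """Average temperature per device per fixed, non-overlapping window."""
    buckets = defaultdict(list)
    for timestamp, device, temperature in events:
        # Every event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[(window_start, device)].append(temperature)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


averages = tumbling_average(events, window_seconds=10)
```

Each key in `averages` is a (window start, device) pair, which is the kind of per-window result a real-time dashboard would plot.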

Full explanation →

435

MCQhard

A company uses Azure SQL Database for a financial system. The Transactions table contains millions of rows with a TransactionDate column. Queries frequently aggregate sales totals for the current month, but historical data must be retained for 7 years. Currently, queries scan the entire table, causing performance issues. The company also wants to simplify archiving of old data. Which design should they implement?

A.Create a non-clustered index on the TransactionDate column.

B.Implement table partitioning by month on TransactionDate.

C.Create a materialized view for the current month's data.

D.Convert the table to use a clustered columnstore index.

AnswerB

Partitioning enables partition elimination for queries filtering on TransactionDate, reducing scan size. Old partitions can be switched out for easy archiving without impacting the live table.

Why this answer

Table partitioning by month on TransactionDate allows Azure SQL Database to efficiently manage and query large tables by splitting data into manageable segments. Queries that filter on TransactionDate for the current month will only scan the relevant partition(s), eliminating full table scans. Additionally, partitioning simplifies archiving by enabling swift partition switching to move old data to archive tables without complex ETL processes.

Exam trap

The trap here is that candidates often choose a non-clustered index (Option A) thinking it will speed up range queries, but they overlook that partitioning is specifically designed for both performance on large tables and simplified data lifecycle management, which the question explicitly requires.

How to eliminate wrong answers

Option A is wrong because a non-clustered index on TransactionDate would still require key lookups for non-indexed columns and does nothing to simplify archiving. Option C is wrong because a materialized view for the current month's data addresses only the current-month query; it does not help retain or query 7 years of historical data and does not simplify archiving of old data. Option D is wrong because a clustered columnstore index is optimized for analytical workloads on large tables but does not by itself divide the data by month or provide a mechanism for archiving old data.
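The two benefits of monthly partitioning, reading only the relevant partition and moving old data out as a unit, can be illustrated with a toy Python model. This is a conceptual sketch only: the `partitions` dictionary and the helper functions are hypothetical and do not represent how SQL Server stores or switches partitions.

```python
from collections import defaultdict
from datetime import date

# Toy model: each month of TransactionDate is its own partition.
# The structure and names are illustrative assumptions, not SQL Server internals.
partitions = defaultdict(list)


def insert(transaction_date, amount):
    """Route a row to the partition for its month."""
    partitions[(transaction_date.year, transaction_date.month)].append(
        (transaction_date, amount)
    )


def monthly_total(year, month):
    """Partition elimination: only one month's rows are read."""
    rows = partitions.get((year, month), [])
    return sum(amount for _, amount in rows), len(rows)


def archive_month(year, month, archive):
    """Move a whole partition out in one step, similar in spirit to a partition switch."""
    archive[(year, month)] = partitions.pop((year, month), [])


insert(date(2017, 3, 5), 100.0)
insert(date(2024, 5, 1), 40.0)
insert(date(2024, 5, 20), 60.0)
insert(date(2024, 4, 30), 25.0)

total, rows_read = monthly_total(2024, 5)
archive = {}
archive_month(2017, 3, archive)
```

The monthly total reads two rows out of four, and archiving March 2017 removes that partition without touching any other month.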

Full explanation →

436

MCQhard

A multinational corporation is deploying a global application using Azure SQL Database. They need to ensure that users in different geographic regions experience low latency reads. The application can tolerate slightly stale data for reads, but writes must be strongly consistent and must occur in a single primary region. Which feature should they implement?

A.Azure Cosmos DB with multi-master

B.Active geo-replication

C.Failover groups

D.Read scale-out

AnswerB

Active geo-replication provides readable secondaries in different regions for low-latency reads.

Why this answer

Option B is correct because active geo-replication allows creating readable secondaries in other regions for low-latency reads, while writes go to the primary. Option A is wrong because Cosmos DB with multi-master is a NoSQL service rather than a relational one, and it accepts writes in multiple regions instead of a single primary region. Option C is wrong because failover groups provide automatic failover but not necessarily low-latency reads from multiple regions.

Option D is wrong because read scale-out uses a replica in the same region as the primary, not replicas in other regions.

Full explanation →

437

MCQmedium

A social networking application uses Azure Cosmos DB to store user posts. When a user publishes a new post, they immediately refresh their feed and expect to see their own post. However, the application can tolerate temporary staleness for posts from other users (e.g., a few seconds delay). Which Azure Cosmos DB consistency level should the application use for read operations that display the feed?

A.Strong

B.Bounded staleness

C.Session

D.Eventual

AnswerC

Session consistency uses a session token to ensure that within the same client session, reads reflect the writes made by that client. This satisfies the requirement that the user sees their own post immediately, while other reads may see slightly stale data.

Why this answer

Session consistency is the correct choice because it guarantees that the user who writes a post will read their own write within the same session, while allowing other users to see slightly stale data. This matches the requirement: the author immediately sees their new post, but the application can tolerate a few seconds of staleness for other users' posts. Session consistency uses a session token to ensure monotonic reads and writes for the same client, making it ideal for per-user feed scenarios.

Exam trap

The trap here is that candidates confuse 'session' with 'eventual' because both allow staleness, but session guarantees per-user write-read consistency, which eventual does not, and they overlook that bounded staleness applies globally, not per-user.

How to eliminate wrong answers

Option A is wrong because Strong consistency would force all replicas to agree on the latest write before any read, causing high latency and reduced availability, which is unnecessary when only the author needs immediate consistency. Option B is wrong because Bounded staleness allows a configurable lag (time or operations) but applies globally to all reads, not per-user, so it would either be too strict for other users or too loose for the author's own post. Option D is wrong because Eventual consistency provides no guarantee that the author's own post will be immediately visible after a write, which violates the requirement that the author sees their post upon refresh.
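The read-your-own-writes guarantee can be pictured with a small Python model. It is a deliberately simplified sketch, not the Azure Cosmos DB SDK: the `Region` and `Client` classes, the integer sequence number used as a session token, and the fallback to the write region are all assumptions made for illustration.

```python
class Region:
    """A replica holding posts plus the last write sequence number it has applied."""

    def __init__(self):
        self.posts = []
        self.lsn = 0

    def apply(self, post, lsn):
        self.posts.append(post)
        self.lsn = lsn


class Client:
    """Carries a session token: the highest sequence number it has written."""

    def __init__(self, write_region, read_region):
        self.write_region = write_region
        self.read_region = read_region
        self.session_token = 0

    def publish(self, post):
        new_lsn = self.write_region.lsn + 1
        self.write_region.apply(post, new_lsn)
        self.session_token = new_lsn

    def read_feed(self):
        # Read locally only if the replica has caught up to this session's writes;
        # otherwise fall back to a region that has.
        if self.read_region.lsn >= self.session_token:
            return list(self.read_region.posts)
        return list(self.write_region.posts)


primary, lagging_replica = Region(), Region()
author = Client(primary, lagging_replica)
other_user = Client(primary, lagging_replica)

author.publish("hello world")        # replication has not happened yet
author_feed = author.read_feed()     # the author sees their own post
other_feed = other_user.read_feed()  # another session may still see stale data
```

The author's session token forces a read that includes the new post, while the other user's session, which has written nothing, is content with the lagging replica.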

Full explanation →

438

MCQeasy

A startup is building a mobile app that allows users to share short text updates. Each update includes a user ID, timestamp, and message text. The development team expects rapid growth and needs a storage solution that can scale horizontally, handle high write throughput, and provide low-latency reads globally. Which Azure data service is most appropriate?

A.Azure SQL Database with a single database.

B.Azure Cosmos DB with a multi-master configuration and partition on user ID.

C.Azure Blob Storage with append blobs.

D.Azure Table Storage with user ID as partition key and timestamp as row key.

AnswerB

Cosmos DB provides global distribution, multi-master writes, and single-digit millisecond latency, ideal for high-throughput NoSQL workloads.

Why this answer

Azure Cosmos DB with a multi-master configuration is the most appropriate choice because it provides global distribution with multiple write regions, enabling horizontal scaling and low-latency reads and writes worldwide. Partitioning on user ID ensures even data distribution and efficient query performance for the app's high write throughput requirements.

Exam trap

The trap here is that candidates often confuse Azure Table Storage's horizontal scaling with the global, multi-master capabilities of Cosmos DB, assuming Table Storage can provide low-latency writes worldwide when it lacks native multi-region write support and has higher latency for cross-region scenarios.

How to eliminate wrong answers

Option A is wrong because Azure SQL Database with a single database is a relational database that scales vertically (up to a maximum size and DTU/vCore limit) and cannot natively handle global low-latency reads or multi-region writes without complex sharding or read replicas. Option C is wrong because Azure Blob Storage with append blobs is designed for unstructured data like logs or files, not for low-latency, high-throughput transactional updates with querying by user ID and timestamp. Option D is wrong because Azure Table Storage, while scalable, does not support multi-master writes or global low-latency reads natively; it is a key-value store with limited query capabilities and a single write region, which may not meet the app's need for low-latency writes globally.

Full explanation →

439

MCQmedium

A social media company stores user posts as JSON documents in Azure Cosmos DB. Each post may have a different number of fields and nested objects. Which type of data model does this represent?

A.Key-value

B.Column-family

C.Document

D.Graph

AnswerC

Correct. Document databases like Azure Cosmos DB store each record as a self-contained document (JSON) with a flexible schema, allowing varying fields and nesting.

Why this answer

The scenario describes user posts stored as JSON documents with varying fields and nested objects. Azure Cosmos DB's Document data model (using the SQL API or MongoDB API) is designed for semi-structured, schema-agnostic data where each document can have a different structure, making it the correct choice.

Exam trap

The trap here is that candidates may confuse the document model with key-value because both handle unstructured data, but key-value stores lack the ability to query on nested fields or perform rich queries like those supported by Cosmos DB's SQL API.

How to eliminate wrong answers

Option A is wrong because a key-value data model stores data as simple key-value pairs without support for nested objects or querying on fields within the value. Option B is wrong because a column-family data model organizes data into rows and column families, requiring a predefined schema for columns, not flexible JSON documents. Option D is wrong because a graph data model is optimized for relationships between entities using nodes and edges, not for storing semi-structured documents with varying fields.

Full explanation →

440

MCQmedium

A banking application processes fund transfers. When a transfer is executed, the system must either successfully debit one account and credit the other, or if any step fails, the entire operation must be rolled back so no partial changes remain. Which ACID property directly enforces this behavior?

A.Atomicity

B.Consistency

C.Isolation

D.Durability

AnswerA

Atomicity ensures the entire transaction is completed or fully rolled back, preventing partial updates.

Why this answer

Atomicity ensures that a transaction is treated as a single, indivisible unit of work. In this banking scenario, the debit and credit operations are part of one transaction; if either step fails, the entire transaction is rolled back, leaving no partial changes. This is the core property that enforces the 'all-or-nothing' behavior described.

Exam trap

The trap here is that candidates confuse Consistency with Atomicity, thinking that 'keeping data consistent' means the same as 'all-or-nothing rollback,' but Consistency only enforces rules like constraints and triggers, not the indivisible execution of a multi-step operation.

How to eliminate wrong answers

Option B (Consistency) is wrong because consistency ensures that a transaction brings the database from one valid state to another, enforcing integrity constraints (e.g., account balances must never go negative), but it does not guarantee the all-or-nothing rollback of the entire operation. Option C (Isolation) is wrong because isolation controls how concurrent transactions are executed to prevent interference (e.g., dirty reads), but it does not enforce the atomic rollback of a failed multi-step transfer. Option D (Durability) is wrong because durability guarantees that once a transaction is committed, its changes persist even after a system failure, but it has no role in rolling back a failed transaction.
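The all-or-nothing behavior is easy to observe in a short sketch. It uses SQLite through Python's built-in `sqlite3` module as a stand-in for the bank's database, which is an assumption made so the example is runnable; the `accounts` table, the sample balances, and the `transfer` helper are hypothetical.

```python
import sqlite3

# SQLite stands in for the bank's database (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(source, target, amount):
    """Debit and credit in one transaction; any failure rolls back both steps."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
            credited = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            if credited.rowcount != 1:
                # The debit above has already run; raising undoes it.
                raise LookupError(f"unknown account: {target}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False


def balances():
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

Calling `transfer("alice", "carol", 10)` debits alice, fails to find carol, and rolls the debit back, so `balances()` is unchanged. The `CHECK (balance >= 0)` constraint is a consistency rule, while undoing the partial transfer is atomicity.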

Full explanation →

441

MCQeasy

A company collects data from multiple sources: IoT sensor streams, social media feeds, and CSV files from legacy systems. They want to store all this data in its original format without any transformation, so that data scientists can later apply machine learning models or run ad-hoc queries. Which data storage pattern best describes this approach?

A.Data warehouse

B.Data lake

C.Relational database

D.Data mart

AnswerB

A data lake stores data in its native format without transformation, supporting diverse data types and ad-hoc exploration by data scientists.

Why this answer

A data lake is designed to store vast amounts of raw data in its native format (structured, semi-structured, or unstructured) without requiring upfront schema or transformation. This aligns perfectly with the scenario of ingesting IoT streams, social media feeds, and CSV files as-is, enabling data scientists to later apply machine learning or run ad-hoc queries directly against the raw data.

Exam trap

The trap here is that candidates often confuse a data lake with a data warehouse, assuming both are for analytics, but the key differentiator is that a data lake stores raw, unprocessed data while a data warehouse requires transformation and schema-on-write.

How to eliminate wrong answers

Option A is wrong because a data warehouse stores data that has been transformed, cleaned, and structured into a schema optimized for analytics and reporting, not raw, unprocessed data. Option C is wrong because a relational database enforces a rigid schema and ACID transactions, making it unsuitable for storing diverse raw formats like IoT streams and social media feeds without transformation. Option D is wrong because a data mart is a subset of a data warehouse focused on a specific business domain, requiring pre-processed and aggregated data, not raw, unaltered source data.

Full explanation →

442

MCQmedium

A company stores customer orders in an Azure SQL Database. They need to ensure that the database can automatically scale to handle peak loads without manual intervention. Which Azure feature should they use?

A.Purchase reserved capacity

B.Add a read replica

C.Enable the serverless compute tier

D.Configure an elastic pool

AnswerC

Serverless compute automatically scales compute and pauses during inactivity.

Why this answer

Option C is correct because Azure SQL Database's serverless compute tier automatically scales compute resources based on workload demand, pausing during inactivity to save costs. Option A is wrong because reserved capacity is a pricing model, not a scaling feature. Option B is wrong because read replicas are for read scalability, not compute scaling.

Option D is wrong because elastic pools are for managing multiple databases with shared resources, not automatic scaling of a single database.

Full explanation →

443

MCQmedium

A financial services company is building a real-time fraud detection system. Transactions are streamed from multiple sources into Azure Event Hubs. The system must run a trained machine learning model (scored in near real-time) to flag suspicious transactions. The model is a Python pickle file that needs to be deployed as a web service with low latency (under 100 ms per prediction). The data engineering team wants to use a serverless compute option to run the scoring logic, and the solution must integrate with Azure Stream Analytics for alerting. Which Azure service should you use to deploy the model?

A.Azure Functions

B.Azure Machine Learning managed online endpoint

C.Azure Kubernetes Service (AKS)

D.Azure Databricks

AnswerB

Managed endpoints are serverless and provide low-latency inference APIs, easily integrated with Azure Stream Analytics.

Why this answer

Azure Machine Learning managed online endpoints are the correct choice because they are designed for deploying trained models (including Python pickle files) as low-latency web services (under 100 ms per prediction) with serverless compute. They natively integrate with Azure Stream Analytics for alerting, allowing real-time scoring of streaming transactions from Event Hubs without managing infrastructure.

Exam trap

The trap here is that candidates often choose Azure Functions because it is serverless and familiar, but they overlook the strict latency requirement (under 100 ms) and the need for native integration with Azure Stream Analytics, which Azure Machine Learning managed online endpoints satisfy directly.

How to eliminate wrong answers

Option A is wrong because Azure Functions, while serverless, has a cold-start latency that often exceeds 100 ms and is not optimized for hosting machine learning models (especially pickle files) with sub-100 ms inference requirements; it also lacks native integration with Azure Stream Analytics for alerting. Option C is wrong because Azure Kubernetes Service (AKS) is not serverless (it requires cluster management and scaling configuration) and introduces additional latency and complexity for a simple scoring endpoint, making it unsuitable for the stated serverless requirement. Option D is wrong because Azure Databricks is a big data analytics platform designed for batch and interactive processing, not for deploying low-latency web services; it would introduce significant overhead and latency for real-time scoring and does not natively integrate with Azure Stream Analytics for alerting.

Full explanation →

444

MCQhard

A global e-commerce company uses Azure SQL Database for its order management system. They need to ensure high availability with the ability to fail over to an Azure region in a different continent in case of a regional outage. They also want to use the secondary database for read-intensive reporting without affecting the primary's performance. Which Azure SQL Database feature should they enable?

A.Active geo-replication

B.Long-term backup retention

C.Automatic tuning

D.Connection pooling

AnswerA

Active geo-replication creates a readable secondary database in a different Azure region. It allows failover and offloads read-heavy workloads to the secondary. The secondary is readable and can be used for reporting.

Why this answer

Active geo-replication is the correct choice because it creates readable secondary replicas of an Azure SQL Database in a different Azure region (including a different continent). It supports manual failover to the secondary region during an outage, and the secondary can be used for read-only query workloads like reporting without impacting the primary database's performance.

Exam trap

The trap here is that candidates may confuse 'geo-replication' with 'failover groups' or assume that any backup feature (like long-term retention) can serve as a high-availability solution, but only active geo-replication provides a readable secondary in a different continent for both failover and read-scale.

How to eliminate wrong answers

Option B (Long-term backup retention) is wrong because it only preserves database backups for extended periods (up to 10 years) for compliance or recovery, not for real-time failover or read-scale. Option C (Automatic tuning) is wrong because it optimizes query performance through index and plan recommendations, not for high availability or geo-failover. Option D (Connection pooling) is wrong because it manages client-side database connections to reduce latency and resource usage, but does not provide any regional redundancy or read-scale capability.

Full explanation →

445

MCQmedium

Your company is migrating a legacy on-premises SQL Server database to Azure. The database is used by a mission-critical application that requires high availability with automatic failover to a secondary region. The database size is 2 TB and is expected to grow 20% annually. The application uses stored procedures, functions, and SQL Server Agent jobs. You need to select an Azure relational database service that meets the high availability requirements, supports the existing SQL Server features, and minimizes migration effort. What should you choose?

A.Azure SQL Managed Instance with failover group

B.Azure Database for SQL Server (Hyperscale tier)

C.Azure SQL Database (single database) with active geo-replication

D.Azure SQL Database elastic pool with geo-replication

AnswerA

SQL Managed Instance supports SQL Server Agent, stored procedures, and provides automatic failover to a secondary region.

Why this answer

Option A is correct because Azure SQL Managed Instance with a failover group provides high availability with automatic failover to a secondary region, supports SQL Server Agent, and offers near-100% compatibility with SQL Server. Option B is wrong because Azure Database for SQL Server is not a recognized service. Option C is wrong because Azure SQL Database does not support SQL Server Agent.

Option D is wrong because elastic pools share resources across multiple databases and, being Azure SQL Database, also lack SQL Server Agent.

Full explanation →

446

MCQeasy

An e-commerce application uses Azure SQL Database and stores user session data in a table called Sessions. The table contains millions of rows and queries often filter by UserID and LastActivityTime. The development team wants to improve query performance for these filters. What should they implement?

A.Create a clustered index on the SessionID column

B.Create a view that filters the data

C.Create a nonclustered index on UserID and LastActivityTime

D.Partition the table by month

AnswerC

Nonclustered index on these columns directly supports the filter queries.

Why this answer

Option C is correct because a nonclustered index on (UserID, LastActivityTime) speeds up filters on both columns. Option A is wrong because a clustered index on SessionID orders the data by a column that these filters do not use. Option B is wrong because views are virtual and do not inherently improve performance without indexes.

Option D is wrong because partitioning can help manage large tables but does not directly optimize these filter queries.

Full explanation →

447

MCQmedium

A social media company stores user profiles as JSON documents where each profile may have different attributes (e.g., some profiles include 'education' while others include 'work history'). The company also stores user-generated posts in a relational database table with fixed columns (PostID, UserID, Content, Timestamp). Which of the following best describes the data types used for user profiles and user posts?

A.User profiles are structured data; posts are unstructured data.

B.User profiles are semi-structured data; posts are structured data.

C.Both are semi-structured data.

D.User profiles are unstructured data; posts are structured data.

AnswerB

Profiles are semi-structured (JSON with optional fields), posts are structured (fixed relational schema).

Why this answer

User profiles are stored as JSON documents with varying attributes, which is a classic example of semi-structured data because it has some organizational properties (key-value pairs) but does not enforce a fixed schema. User posts are stored in a relational database table with fixed columns (PostID, UserID, Content, Timestamp), which is structured data because it adheres to a rigid schema with defined data types and relationships.

Exam trap

The trap here is that candidates often confuse 'semi-structured' with 'unstructured' because JSON looks like free-form text, but JSON actually has a defined key-value structure, making it semi-structured, not unstructured.

How to eliminate wrong answers

Option A is wrong because user profiles are not structured data; they lack a fixed schema and can have varying attributes, which is the definition of semi-structured data. Option C is wrong because user posts are stored in a relational table with fixed columns, making them structured data, not semi-structured. Option D is wrong because user profiles are not unstructured data like free-form text or images; they are JSON documents with key-value pairs, which have a logical structure even if the schema is flexible.

448

MCQhard

A company is migrating a 3-TB on-premises SQL Server database to Azure. The database heavily uses cross-database queries with three-part names (e.g., db.schema.table) and relies on SQL Server Agent for scheduled maintenance jobs. They want a fully managed PaaS service with automatic backups and patching, while minimizing application code changes. Which Azure SQL service should they choose?

A.Azure SQL Managed Instance

B.Azure SQL Database (single database)

C.Azure SQL Database (elastic pool)

D.Azure Synapse Analytics dedicated SQL pool

AnswerA

Azure SQL Managed Instance supports three-part cross-database queries and SQL Server Agent, offers high compatibility, and is fully managed.

Why this answer

Azure SQL Managed Instance is the correct choice because it provides near-100% compatibility with on-premises SQL Server, including support for cross-database queries using three-part names (db.schema.table) and SQL Server Agent for scheduled maintenance jobs. As a fully managed PaaS service, it offers automatic backups, patching, and high availability while minimizing application code changes. Azure SQL Database, by contrast, supports neither three-part-name cross-database queries nor SQL Server Agent.

Exam trap

The trap here is that candidates often choose Azure SQL Database (single or elastic pool) because it is the most well-known PaaS option, overlooking that it lacks critical on-premises features like cross-database three-part name queries and SQL Server Agent, which are essential for minimizing code changes in this migration scenario.

How to eliminate wrong answers

Option B (Azure SQL Database single database) is wrong because it does not support cross-database queries with three-part names (requires elastic query or external tables) and lacks SQL Server Agent for scheduled jobs. Option C (Azure SQL Database elastic pool) is wrong because it inherits the same limitations as single databases—no cross-database three-part name queries and no SQL Server Agent—and is primarily for resource pooling, not compatibility. Option D (Azure Synapse Analytics dedicated SQL pool) is wrong because it is a massively parallel processing (MPP) data warehouse service, not a general-purpose OLTP database; it does not support cross-database queries with three-part names or SQL Server Agent, and would require significant application rewrites.

449

MCQhard

A company uses Azure Synapse Analytics for its data warehouse. They notice that query performance is degrading over time as data grows. Which action would most likely improve performance without requiring additional compute resources?

A.Partition large tables based on date or other high-cardinality columns

B.Migrate to a star schema on a separate Azure SQL Database

C.Increase the Synapse SQL pool service level

D.Remove columnstore indexes from large tables

AnswerA

Partitioning reduces data scanned per query, improving performance.

Why this answer

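Partitioning large tables on a column that queries commonly filter on, such as a date, enables partition elimination, where queries scan only the relevant partitions instead of the entire table. This reduces I/O and improves performance without requiring additional compute resources, because it optimizes data access patterns within the existing Synapse SQL pool. Partitions should stay coarse-grained (for example, monthly), since a dedicated SQL pool already spreads each table across 60 distributions and too many small partitions weakens columnstore compression.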
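Partition elimination can be illustrated with a toy model. The sketch below is not Synapse code: it groups made-up (date, amount) rows into monthly partitions in a Python dictionary, so that a query for one month reads only that month's partition. The function name `total_for_month` and all of the data are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (sale_date, amount), 28 rows for each month of 2024.
rows = [
    (date(2024, month, day), 10.0 * month)
    for month in range(1, 13)
    for day in range(1, 29)
]

# Group rows into monthly partitions keyed by (year, month).
partitions = defaultdict(list)
for sale_date, amount in rows:
    partitions[(sale_date.year, sale_date.month)].append((sale_date, amount))


def total_for_month(year, month):
    """Sum amounts for one month, reading only the matching partition."""
    scanned = partitions.get((year, month), [])
    return sum(amount for _, amount in scanned), len(scanned)


total, rows_scanned = total_for_month(2024, 3)
print(f"scanned {rows_scanned} of {len(rows)} rows, total = {total}")
```

The query touches only the rows in the March partition rather than the whole table, which is the I/O saving that partition elimination provides.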

Exam trap

The trap here is that candidates may confuse partitioning with indexing or scaling, and incorrectly assume that removing indexes or migrating to a different service is a valid optimization without considering the 'no additional compute resources' constraint.

How to eliminate wrong answers

Option B is wrong because migrating to a star schema on a separate Azure SQL Database would require additional compute resources (a new database) and does not address the performance degradation within the existing Synapse Analytics environment. Option C is wrong because increasing the Synapse SQL pool service level directly adds compute resources (DWUs), which contradicts the requirement of not requiring additional compute resources. Option D is wrong because removing columnstore indexes from large tables would severely degrade query performance, as columnstore indexes are essential for compression and efficient analytical queries in Synapse; this action would worsen, not improve, performance.

450

MCQmedium

Your company is developing a new analytics solution to track customer sentiment from social media feeds. The data arrives as a continuous stream of JSON messages. The solution must process the data in near real-time, enrich it with customer profile data stored in Azure Cosmos DB, and then store the results in a data lake for historical analysis. The team wants to use a low-code approach for the data processing logic. You are considering the following architectures:

A) Use Azure Event Hubs to ingest the stream, Azure Stream Analytics to process and enrich the data using Cosmos DB as a reference data source, and output to Azure Data Lake Storage Gen2.

B) Use Azure IoT Hub to ingest the stream, Azure Databricks to process the data, and write to Azure Blob Storage.

C) Use Azure Event Hubs to ingest the stream, Azure Functions to process each message, query Cosmos DB for enrichment, and write to Azure Data Lake Storage Gen2.

D) Use Azure Event Hubs to ingest the stream, Azure Data Factory to execute a mapping data flow for enrichment, and write to Azure Data Lake Storage Gen2.

Which architecture best meets the requirements of near real-time processing, enrichment, and low-code?

A.Option A

B.Option C

C.Option D

D.Option B

AnswerA

Stream Analytics offers low-code, near real-time, and supports reference data enrichment.

Why this answer

Option A is correct because Azure Stream Analytics provides a low-code, SQL-based approach for near real-time processing, and it can natively enrich streaming data by using Azure Cosmos DB as a reference data source via a JOIN operation. The output is directly written to Azure Data Lake Storage Gen2, meeting all requirements without custom code.
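The enrichment step amounts to looking up each streaming event in a slowly changing reference set. Stream Analytics expresses this as a SQL JOIN; the Python sketch below only illustrates the lookup logic and is not Stream Analytics syntax. The field names (`customerId`, `segment`, `region`), the sample messages, and the `enrich` function are all hypothetical.

```python
import json

# Hypothetical reference data, standing in for customer profiles.
customer_profiles = {
    "c1": {"segment": "premium", "region": "EU"},
    "c2": {"segment": "standard", "region": "US"},
}

# Hypothetical incoming JSON messages from the stream.
incoming = [
    '{"customerId": "c1", "text": "Love it", "sentiment": 0.9}',
    '{"customerId": "c2", "text": "Too slow", "sentiment": -0.4}',
    '{"customerId": "c9", "text": "Okay", "sentiment": 0.1}',
]


def enrich(messages, reference):
    """Yield each event merged with its profile, if one exists.

    Events for unknown customers pass through without profile fields,
    which mirrors the behaviour of a left outer join.
    """
    for raw in messages:
        event = json.loads(raw)
        profile = reference.get(event["customerId"], {})
        yield {**event, **profile}


enriched = list(enrich(incoming, customer_profiles))
for record in enriched:
    print(json.dumps(record))
```

Each event is processed as it arrives and leaves with the profile fields attached, ready to be written to the data lake.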

Exam trap

The trap here is that candidates often confuse Azure Stream Analytics with Azure Data Factory, assuming both can handle streaming, but Data Factory is batch-only and cannot process a continuous Event Hubs stream in near real-time.

How to eliminate wrong answers

The letters below refer to the architectures described in the question, not to the answer choices. Architecture B (answer choice D) is wrong because Azure IoT Hub is designed for device-to-cloud telemetry, not social media feeds, and Azure Databricks requires coding (Python/Scala) and is not a low-code solution. Architecture C (answer choice B) is wrong because Azure Functions requires writing custom code for each message, which violates the low-code requirement, and it does not natively support reference data enrichment from Cosmos DB in a streaming context. Architecture D (answer choice C) is wrong because Azure Data Factory mapping data flows are designed for batch processing, not near real-time streaming, and they cannot ingest a continuous stream from Event Hubs directly.
