Microsoft · Free Practice Questions · Last reviewed May 2026

DP-900 Exam Questions and Answers

24 realistic exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right — and why the others are wrong.

60 exam questions
60 min time limit
Pass at 700 / 1000
4 exam domains

Domain 1: Describe core data concepts

All Describe core data concepts questions

A company stores customer names, addresses, and order history. They need to perform complex queries that join customer and order data. Which type of data store is most appropriate for this scenario?

A

Key-value store

B

Relational database

Relational databases organize data into tables with defined schemas and support SQL queries including joins, making them ideal for this requirement.

C

Document database

D

Graph database

Why: Relational databases manage structured data with defined relationships using tables and support complex queries with joins. Key-value stores are optimized for simple lookups, document databases handle semi-structured data, and graph databases excel at traversing relationships but are less efficient for typical tabular joins.
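The tabular join this question tests can be sketched in a few lines with Python's built-in sqlite3 module. The table names and rows below are illustrative, not part of the scenario:

```python
import sqlite3

# In-memory database with a hypothetical customers/orders schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Lin', 'Oslo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A single declarative query joins customer and order data and aggregates it —
# the workload a key-value or document store handles poorly.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.total) AS spend
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 2, 65.0), ('Lin', 1, 15.0)]
```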

A retail company captures real-time sensor data from IoT devices to detect anomalies and predict equipment failures. The data must be processed immediately as it arrives. Which type of data processing workload best describes this scenario?

A

Batch processing

B

Stream processing

Stream processing ingests and analyzes data in real time, enabling prompt anomaly detection and failure prediction from IoT sensor feeds.

C

Online transaction processing (OLTP)

D

Data warehousing

Why: Streaming data processing handles data in real time as it is generated, making it suitable for low-latency analytics and immediate actions. Batch processing works on accumulated data over time, OLTP focuses on transactional operations, and data warehousing is for aggregating historical data for reporting.
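A toy contrast, with an invented threshold and device names: a streaming consumer reacts to each event the moment it arrives, rather than waiting for a complete dataset the way a batch job would.

```python
THRESHOLD = 100  # illustrative anomaly threshold, e.g. vibration units

def process_stream(readings, threshold=THRESHOLD):
    """Yield an alert immediately when an anomalous reading arrives."""
    for device, value in readings:
        if value > threshold:
            yield (device, value)

# Batch processing would first accumulate this whole list; streaming
# emits each alert as its event is processed.
events = [("pump-1", 42), ("pump-2", 137), ("pump-1", 55), ("pump-2", 141)]
alerts = list(process_stream(events))
print(alerts)  # [('pump-2', 137), ('pump-2', 141)]
```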

Which classification of data describes information that has a fixed schema and is organized into rows and columns, such as data found in a relational database table?

A

Unstructured data

B

Semi-structured data

C

Structured data

Structured data conforms to a fixed schema, typically in tables with rows and columns. This is the standard format for relational database systems.

D

Transformed data

Why: Structured data adheres to a predefined schema, often stored in rows and columns. This is typical of relational databases and spreadsheets. Unstructured data has no schema, semi-structured data has flexible schema (e.g., JSON), and 'transformed data' is not a standard data classification.

A logistics company stores shipping waybill data as JSON documents. Each document contains fields like 'shipmentId', 'destination', and 'items', but the number of items and the fields within each item can vary between shipments. Which category best describes this type of data?

A

Operational data

B

Semi-structured data

JSON documents with optional fields and variable structures are a classic example of semi-structured data, which has some organizational properties but no rigid schema.

C

Unstructured data

D

Structured data

Why: The data is stored as JSON documents with varying fields, which means it does not adhere to a fixed schema. This is characteristic of semi-structured data. Structured data would have a fixed schema, and unstructured data (like images or video) has no structure at all. Operational data is a different classification related to usage, not format.
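The waybill scenario can be sketched with Python's json module. The shipment IDs and fields below are invented for illustration; the point is that each document self-describes its structure and optional fields are simply absent:

```python
import json

# Two waybills with the same top-level keys but different item shapes —
# valid semi-structured data that a rigid relational schema would reject.
docs = [
    '{"shipmentId": "S1", "destination": "Oslo",'
    ' "items": [{"sku": "A", "qty": 2}]}',
    '{"shipmentId": "S2", "destination": "Lima",'
    ' "items": [{"sku": "B", "qty": 1, "fragile": true}, {"sku": "C", "qty": 5}]}',
]
shipments = [json.loads(d) for d in docs]

# Query across variable structures: .get() tolerates missing optional fields
fragile = [i["sku"] for s in shipments for i in s["items"] if i.get("fragile")]
print(fragile)  # ['B']
```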

A consulting firm collects client information in two forms: a spreadsheet with columns for Name, Address, and Phone Number, and audio recordings of client meetings. Which of the following statements correctly categorizes these data types?

A

Both the spreadsheet data and the audio recordings are examples of structured data.

B

The spreadsheet data is structured, and the audio recordings are semi-structured.

C

The spreadsheet data is structured, and the audio recordings are unstructured.

Correct. The spreadsheet has a fixed schema (columns) making it structured; audio recordings have no defined schema, making them unstructured.

D

The spreadsheet data is semi-structured, and the audio recordings are unstructured.

Why: Data can be classified as structured, semi-structured, or unstructured. Structured data has a fixed schema and is organized into rows and columns, like a spreadsheet. Unstructured data has no predefined schema and includes media files like audio recordings. Semi-structured data has some organizational properties (e.g., tags or markers) but not a rigid schema, such as JSON or XML. In this scenario, the spreadsheet is structured and the audio recordings are unstructured.

A company operates an online store that processes customer orders. When a customer places an order, the system must immediately reduce the inventory count for the purchased items and record the order details. At the end of each month, the company runs reports that aggregate sales data over the past month to analyze trends. Which type of data processing workload best describes the order placement activity?

A

Transactional processing

Order placement involves immediate, real-time updates to inventory and order records, requiring transactional consistency and ACID properties. This is a classic example of an Online Transaction Processing (OLTP) workload.

B

Analytical processing

C

Batch processing

D

Stream processing

Why: Order placement requires immediate updates to inventory and order records, which is characteristic of a transactional workload. Transactional workloads ensure ACID properties (Atomicity, Consistency, Isolation, Durability) to maintain data integrity during real-time operations. Analytical workloads, such as the monthly sales reports, involve aggregating historical data for analysis and are typically batch-oriented.
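The atomicity described above can be sketched with sqlite3 transactions. This is a generic OLTP illustration, not Azure-specific, and the schema is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
conn.execute("INSERT INTO inventory VALUES ('widget', 3)")

def place_order(conn, item, qty):
    """Decrement stock atomically: the update commits or rolls back as a unit."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE item = ?", (qty, item)
            )
    except sqlite3.IntegrityError:
        return False  # CHECK constraint rejected negative stock; change rolled back
    return True

print(place_order(conn, "widget", 2))  # True  (3 -> 1)
print(place_order(conn, "widget", 5))  # False (would go negative; rolled back)
print(conn.execute("SELECT qty FROM inventory").fetchone()[0])  # 1
```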

Want more Describe core data concepts practice?

Practice this domain

Domain 2: Identify considerations for relational data on Azure

All Identify considerations for relational data on Azure questions

A company is migrating an on-premises SQL Server database to Azure. They want to ensure that database administrators (DBAs) can perform administrative tasks but cannot view sensitive customer data in query results. Which Azure SQL feature should they implement?

A

Dynamic Data Masking

B

Always Encrypted

Always Encrypted encrypts data on the client side, so the database never sees plaintext. DBAs cannot access the encryption keys and therefore cannot view the sensitive data.

C

Transparent Data Encryption

D

Row-Level Security

Why: Always Encrypted enables encryption of sensitive data at the column level and ensures that only authorized applications (with access to the encryption keys) can see plaintext data. DBAs without the keys cannot decrypt the data, even though they can manage the database. Dynamic Data Masking only obscures data from certain users, but DBAs with elevated permissions can still view the unmasked values.

A software-as-a-service (SaaS) provider hosts a multi-tenant application with a separate database for each tenant. They anticipate scaling to thousands of tenants and want to minimize cost while allowing tenants to share resources flexibly. Which Azure SQL offering is most suitable?

A

Azure SQL Database elastic pool

Elastic pools provide a cost-effective way to manage and scale multiple databases with fluctuating resource needs, ideal for multi-tenant SaaS scenarios.

B

Azure SQL Database (single database)

C

Azure SQL Managed Instance

D

SQL Server on Azure Virtual Machine

Why: Azure SQL Database elastic pools allow multiple single databases to share a fixed set of resources, optimizing cost for workloads with many databases that have varying usage patterns. Individual single databases are more expensive per unit, and Managed Instance or SQL Server on VMs are designed for larger, more resource-intensive workloads rather than thousands of small databases.

A company runs an e-commerce application backed by an on-premises SQL Server database. They plan to migrate to Azure SQL Database and require automatic failover across two Azure regions for disaster recovery. The application must continue to connect using the same connection string after a failover, with no code changes. Which feature should they implement?

A

Active Geo-Replication

B

Elastic pools

C

Failover groups

Failover groups enable automatic asynchronous replication and automatic failover across regions. The application connects to a listener endpoint that remains unchanged after failover, requiring no code changes.

D

SQL Server on Azure Virtual Machine with Always On Availability Groups

Why: Azure SQL Database failover groups provide automatic, asynchronous replication across regions and support a single listener endpoint. In the event of a regional outage, the group automatically fails over to the secondary region, and the application continues to connect using the same writer endpoint URL. Active Geo-Replication requires manual or custom failover logic. Elastic pools do not provide cross-region replication, and SQL Server on Azure VMs requires manual configuration of Availability Groups and a listener.

A company is migrating a legacy on-premises database to Azure. They require the ability to run cross-database queries within the same logical server, full control over database collation settings, and want to minimize management overhead for infrastructure patching. The database size is under 1 TB and they do not need instance-level features like SQL Agent jobs or linked servers. Which Azure SQL offering should they choose?

A

Azure SQL Database

Azure SQL Database is a PaaS service that handles patching, supports elastic query for cross-database queries, and allows collation settings on a per-database level. It does not include SQL Agent or linked servers, which are not required here.

B

Azure SQL Managed Instance

C

SQL Server on Azure Virtual Machine

D

Azure Synapse SQL pool

Why: Azure SQL Database is a fully managed PaaS service that supports elastic query for cross-database queries and provides database-level collation control. It eliminates infrastructure patching overhead but does not include instance-scoped features like SQL Agent (unless using elastic jobs) or linked servers. SQL Managed Instance provides those features but with more management overhead. SQL Server on Azure VM gives full control but requires patching. Azure Synapse SQL is for analytical workloads.

A company is migrating an on-premises SQL Server database to Azure. The database uses SQL Server Integration Services (SSIS) packages for daily ETL processes. The company wants to minimize administrative overhead for patching and backup management, but needs to retain full control over instance-level configurations and support for SSIS. Which Azure SQL service should they choose?

A

Azure SQL Database

B

Azure SQL Managed Instance

Azure SQL Managed Instance supports SSIS and provides instance-level control with automated patching and backups, minimizing overhead.

C

Azure Synapse Analytics

D

SQL Server on Azure Virtual Machines

Why: Azure SQL Managed Instance provides near-100% compatibility with on-premises SQL Server, including support for SSIS, SQL Agent, and cross-database queries, while offloading patching and backups to Azure. Azure SQL Database does not support SSIS. Azure Synapse Analytics is for analytical workloads, not transactional ones. SQL Server on Azure Virtual Machines gives full control but requires manual management of patching and backups, increasing overhead.

A startup is developing a web application that requires a relational database with PostgreSQL compatibility. They want a fully managed service that automatically handles backups, patching, and provides high availability with a 99.99% SLA. Which Azure service should they choose?

A

Azure Database for PostgreSQL

Azure Database for PostgreSQL (Flexible Server) is a fully managed PostgreSQL service with automatic backups, patching, and zone-redundant high availability offering a 99.99% SLA. It is the ideal choice for a PostgreSQL-compatible relational database.

B

Azure SQL Database

C

Azure Database for MySQL

D

Azure Cosmos DB for PostgreSQL

Why: Azure Database for PostgreSQL is a fully managed relational database service that offers zone-redundant high availability, automatic backups, and patching, with a 99.99% SLA when deployed with zone redundancy. Azure SQL Database runs the SQL Server engine, not PostgreSQL. Azure Database for MySQL is for MySQL, not PostgreSQL. Azure Cosmos DB for PostgreSQL (formerly Hyperscale (Citus)) is a distributed PostgreSQL-compatible service designed for horizontal scale-out, not the simple fully managed single-server experience this scenario calls for.

Want more Identify considerations for relational data on Azure practice?

Practice this domain

Domain 3: Describe considerations for working with non-relational data on Azure

All Describe considerations for working with non-relational data on Azure questions

A social media application stores user profile data as JSON documents. Each user's document has a different structure, with fields that vary based on user activity. The application needs to query these documents efficiently using SQL-like syntax and support high write throughput. Which Azure data store is most appropriate for this workload?

A

Azure SQL Database

B

Azure Blob Storage

C

Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model NoSQL database that supports JSON documents natively. It allows flexible schemas, SQL-like querying, and high throughput, making it ideal for this scenario.

D

Azure Table Storage

Why: Azure Cosmos DB is a NoSQL database service that natively supports JSON documents, flexible schemas, and SQL-like queries. It is ideal for applications that require high throughput, low latency, and the ability to handle variable data structures. Azure SQL Database requires a fixed schema, Azure Blob Storage does not support querying JSON content natively, and Azure Table Storage is a key-value store that is less suitable for nested JSON structures.

A ride-sharing application needs to store real-time GPS location updates from drivers and passengers. The data is ingested as key-value pairs where the key is the user ID and the value is a timestamped location. The application requires low-latency reads and writes for millions of concurrent users, and the data model is simple with no need for complex queries or joins. Which Azure NoSQL database API should be used for this workload?

A

Azure Cosmos DB Table API

The Table API is designed for key-value storage with simple queries by partition key and row key, providing low-latency access at global scale. It is ideal for this type of high-throughput, simple data access pattern.

B

Azure Cosmos DB SQL (Core) API

C

Azure Cosmos DB for MongoDB API

D

Azure Cosmos DB for Apache Gremlin API

Why: Azure Cosmos DB offers multiple APIs. The Table API (or the older Azure Table Storage service) is optimized for key-value scenarios with simple lookups by key, offering low latency at scale. The SQL API supports rich document queries, which is more than simple key-value operations require. The MongoDB API is for document data, and the Gremlin API is for graph data. The key-value pattern with simple lookups makes the Table API the most appropriate choice.

A global social media platform stores user profile images (JPEG) and activity logs in JSON format. The logs have varying structures based on the type of activity. The application requires low-latency reads of images from any region and the ability to query logs using SQL-like syntax. Which Azure data storage solution should they use for each data type?

A

Azure Table Storage for images and Azure Cosmos DB (Table API) for logs

B

Azure Blob Storage with a CDN for images and Azure Cosmos DB (SQL API) for logs

Blob Storage efficiently stores unstructured images, and CDN ensures low-latency global access. Cosmos DB SQL API provides SQL-like queries for the varying JSON logs.

C

Azure Files for images and Azure SQL Database for logs

D

Azure Disk Storage for images and Azure Cosmos DB (MongoDB API) for logs

Why: Azure Blob Storage is ideal for storing unstructured data like images, and integrating with Azure CDN provides low-latency global access. Azure Cosmos DB with the SQL API offers SQL-like querying of JSON documents with a flexible schema, perfect for varying activity logs. Table Storage is not optimized for images or SQL-like queries. Azure Files and Disk Storage are not suitable for global image distribution. Cosmos DB's MongoDB API could also store the logs, but the SQL API is the natural fit for SQL-like queries.

A retail company stores product catalog data as JSON documents. Each product has a different set of attributes depending on its category (e.g., electronics have 'voltage', clothing has 'size'). The application needs to query products by category and price range efficiently. Which Azure data store is most appropriate for this workload?

A

Azure Cosmos DB

Correct. Cosmos DB is a NoSQL database that supports schema-flexible JSON documents and provides fast queries on any attribute, ideal for product catalogs with varying attributes.

B

Azure SQL Database

C

Azure Blob Storage

D

Azure Table Storage

Why: Azure Cosmos DB is a fully managed NoSQL database service that supports flexible schema and native JSON support. It allows efficient queries on varied attributes and provides low-latency access, making it ideal for product catalogs with heterogeneous item structures.
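The category-and-price query can be sketched in plain Python over illustrative documents. The Cosmos SQL text in the comment shows the equivalent query shape, included only for orientation:

```python
# Heterogeneous catalog documents: attributes vary by category,
# yet every attribute remains queryable without a fixed schema.
catalog = [
    {"id": "p1", "category": "electronics", "price": 89.0, "voltage": "230V"},
    {"id": "p2", "category": "clothing", "price": 25.0, "size": "M"},
    {"id": "p3", "category": "electronics", "price": 349.0, "voltage": "110V"},
]

def by_category_and_price(docs, category, lo, hi):
    """Filter the way a Cosmos SQL query such as
    SELECT * FROM c WHERE c.category = @cat AND c.price BETWEEN @lo AND @hi
    would, returning matching document ids."""
    return [d["id"] for d in docs
            if d["category"] == category and lo <= d["price"] <= hi]

print(by_category_and_price(catalog, "electronics", 50, 200))  # ['p1']
```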

A media company stores large video files and associated metadata (title, duration, tags) as JSON documents. The application requires low-latency streaming of videos to users worldwide and the ability to quickly query metadata by tag. Which combination of Azure services should the company use?

A

Azure Blob Storage for videos and Azure Cosmos DB for metadata

Correct. Blob Storage handles large video files efficiently, while Cosmos DB provides fast, indexed querying on flexible JSON metadata.

B

Azure Blob Storage for both videos and metadata

C

Azure Cosmos DB for videos and Azure Table Storage for metadata

D

Azure Files for videos and Azure SQL Database for metadata

Why: Azure Blob Storage is optimized for storing large binary objects like video files and can be integrated with Azure CDN for global streaming. Azure Cosmos DB is a NoSQL database that supports low-latency queries on JSON metadata, including indexing on tags.

A global gaming company develops a multiplayer game. Player profile data (username, email, preferences) is stored as simple key-value pairs and must be accessible with single-digit millisecond latency from any region. Game session logs are stored as JSON documents with varying fields (session ID, player actions, timestamps) and must be queryable by player ID and timestamp range using SQL-like syntax. The company wants to use a single Azure database service for both workloads. Which combination of Azure Cosmos DB APIs should they choose?

A

Table API for profiles and SQL API for logs

The Table API provides key-value storage with single-digit millisecond latencies, ideal for player profiles. The SQL API supports JSON documents and full SQL query syntax, perfect for querying session logs by player ID and timestamp.

B

SQL API for both profiles and logs

C

MongoDB API for profiles and Cassandra API for logs

D

Table API for both profiles and logs

Why: Azure Cosmos DB supports multiple APIs for different data models. For key-value pairs with low-latency global access, the Table API is ideal. For JSON documents with SQL querying capability, the SQL (Core) API is best. They can use two Azure Cosmos DB accounts, one with Table API for profiles and one with SQL API for logs. The other options either mismatch capabilities or introduce unnecessary complexity.

Want more Describe considerations for working with non-relational data on Azure practice?

Practice this domain

Domain 4: Describe an analytics workload on Azure

All Describe an analytics workload on Azure questions

A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?

A

Azure Data Lake Storage Gen2

B

Azure SQL Database

C

Azure Synapse Analytics dedicated SQL pool

Azure Synapse Analytics dedicated SQL pool uses MPP and columnar storage to execute complex queries over huge datasets efficiently. It is purpose-built for large-scale data warehousing and analytical workloads.

D

Azure Cosmos DB

Why: Azure Synapse Analytics (formerly SQL Data Warehouse) provides a massively parallel processing (MPP) engine and columnar storage optimized for petabyte-scale analytical queries. It is designed for high-performance aggregations and complex time-series analysis. Azure Data Lake Storage Gen2 is a storage layer but requires a separate compute engine like Synapse or Databricks to run queries. Azure SQL Database lacks the scale and parallelism for petabyte workloads, and Azure Cosmos DB is built for operational, not analytical, workloads.

A manufacturing company has a streaming data pipeline that ingests sensor data from factory equipment into Azure Event Hubs. The data must be prepared for reporting by cleaning invalid records, removing duplicates, and aggregating readings into 5-minute windows. The transformed data needs to be stored in a columnar format in a data lake to support efficient querying by data analysts using SQL. Which Azure service should perform the data transformation and loading?

A

Azure Data Factory

B

Azure Databricks

C

Azure Stream Analytics

Azure Stream Analytics is a serverless real-time analytics service that can ingest data from Event Hubs, perform time-windowed aggregations, clean data, and output to Azure Data Lake Storage in the desired columnar format. It is the most straightforward and cost-effective choice for this streaming ETL scenario.

D

Azure Synapse Pipelines

Why: Azure Stream Analytics is a real-time event processing engine that can consume from Event Hubs, apply transformations (like filtering, aggregation, and windowing), and output to multiple sinks including Azure Data Lake Storage. Azure Data Factory is for orchestration and batch ETL, not real-time streaming. Azure Databricks could be used, but Stream Analytics is purpose-built for streaming transformations with minimal code. Azure Synapse Pipelines is similar to Data Factory and is not optimized for real-time processing.
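The 5-minute windowed aggregation described above can be sketched in Python with invented timestamps and values. Stream Analytics would express the same thing declaratively with a tumbling-window function; this just shows the bucketing arithmetic:

```python
from collections import defaultdict

WINDOW = 300  # 5 minutes, in seconds

def tumbling_avg(readings, window=WINDOW):
    """Average (epoch_seconds, value) readings per non-overlapping 5-minute window."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % window].append(value)  # floor timestamp to window start
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

readings = [(0, 10.0), (120, 14.0), (299, 12.0), (300, 50.0), (420, 70.0)]
print(tumbling_avg(readings))  # {0: 12.0, 300: 60.0}
```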

A data analytics team stores sales transaction data in Parquet files in Azure Data Lake Storage Gen2. They want to run complex analytical queries that join this data with dimension tables stored in Azure Synapse Analytics dedicated SQL pool. The team prefers not to move or copy the data from the data lake. Which feature should they use to query the data lake data directly?

A

Azure Data Factory pipelines

B

PolyBase external tables

PolyBase enables Synapse to create external tables that query data in the data lake without moving it.

C

Azure Stream Analytics

D

Azure Databricks notebooks

Why: PolyBase in Azure Synapse Analytics allows creating external tables that reference data stored in Azure Data Lake Storage Gen2 (or Blob Storage) and querying it with T-SQL. This enables querying the data in place without moving it. Azure Data Factory is for orchestrating data movement, not for direct querying. Azure Stream Analytics is for real-time streaming. Azure Databricks is a separate Spark-based platform; it can read the data lake, but it is not a feature of the Synapse dedicated SQL pool for in-place T-SQL querying.

A healthcare analytics company receives continuous streams of patient monitoring data from IoT devices. The data must be processed in near real-time to detect critical events (e.g., abnormal heart rate). Processed data is then stored in a columnar format for historical analysis and reporting by data analysts using SQL. Which combination of Azure services should they use for ingestion, processing, and storage?

A

Azure Event Hubs, Azure Stream Analytics, Azure Synapse Analytics

Event Hubs ingests data in real-time. Stream Analytics processes the stream to detect events and transform data. Synapse Analytics provides a columnar data warehouse for historical analysis. This combination fits the requirements exactly.

B

Azure IoT Hub, Azure Data Factory, Azure SQL Data Warehouse

C

Azure Event Hubs, Azure Stream Analytics, Azure Cosmos DB

D

Azure Blob Storage, Azure Databricks, Azure Table Storage

Why: Azure Event Hubs is a scalable event ingestion service capable of handling millions of events per second from IoT devices. Azure Stream Analytics provides serverless real-time stream processing with SQL-like language to detect events and transform data. Azure Synapse Analytics (dedicated SQL pool) is a columnar data warehouse optimized for large-scale analytical queries. This combination meets all requirements. IoT Hub is for device management and bi-directional communication, not just ingestion. Data Factory is a batch ETL orchestrator, not real-time. Cosmos DB is not columnar. Blob Storage is not a columnar data warehouse. Databricks can do processing but is more complex and not serverless. Table Storage is not suitable for analytical queries.

A retail chain collects daily sales data from hundreds of stores. The data is stored as CSV files in Azure Data Lake Storage Gen2. The analytics team needs to run complex SQL queries that join sales data with product dimensions and aggregate results across petabytes of data. Queries must return results within seconds. Which Azure service is best suited for this analytical workload?

A

Azure Synapse Analytics

Correct. Synapse Analytics provides a SQL-based engine optimized for large-scale analytical queries and can directly query data in Data Lake Storage with PolyBase or CETAS.

B

Azure SQL Database

C

Azure Analysis Services

D

Azure HDInsight

Why: Azure Synapse Analytics is a limitless analytics service that combines data warehousing and big data analytics. It can query data directly from Data Lake Storage using Synapse SQL, providing fast performance for complex queries over large datasets.

A financial analytics company has petabytes of transaction data stored as Parquet files in Azure Data Lake Storage Gen2. Data analysts need to run complex SQL queries that join multiple tables and return results within seconds. The company wants to query the data directly without moving it to another store. Which Azure service should they use?

A

Azure SQL Database

B

Azure Synapse Serverless SQL pool

Serverless SQL pool can directly query Parquet files in the data lake using standard T-SQL and scales automatically for large datasets.

C

Azure HDInsight

D

Azure Databricks

Why: Azure Synapse Serverless SQL pool allows querying data lake files directly using T-SQL, with a distributed query engine optimized for large-scale analytics. Azure SQL Database is not designed for petabyte-scale, Azure HDInsight requires cluster management and is less SQL-centric, and Azure Databricks typically uses Spark SQL or Python, not direct T-SQL.

Want more Describe an analytics workload on Azure practice?

Practice this domain

Frequently asked questions

How many questions are on the DP-900 exam?

The DP-900 exam has up to 60 questions and must be completed in 60 minutes. The passing score is 700/1000.

What types of questions appear on the DP-900 exam?

The DP-900 exam uses multiple-choice, multiple-select, drag-and-drop, and exhibit-based questions. Exhibit questions show artifacts such as sample data, query results, or architecture diagrams and ask you to interpret them — exactly the format Courseiva uses.

How are DP-900 questions organised by domain?

The exam covers 4 domains: Describe core data concepts, Identify considerations for relational data on Azure, Describe considerations for working with non-relational data on Azure, Describe an analytics workload on Azure. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual DP-900 exam questions?

No. These are original exam-style practice questions written against the official Microsoft DP-900 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Ready to practice all 60 DP-900 questions?

Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.