How to use DP-900 flashcards effectively
Flashcards work through active recall — the process of retrieving information from memory rather than passively re-reading it. Research consistently shows that active recall produces stronger, longer-lasting memory than re-reading study guides. For DP-900 preparation, this means flashcards are one of the highest-return study tools available.
Attempt recall first
Read the DP-900 question on each card, pause, and attempt to formulate the answer in your own words before revealing. This retrieval attempt — even if wrong — dramatically strengthens memory compared to immediately reading the answer.
Review wrong cards again
When you get a card wrong, note it and add it back to your review pile. Spaced repetition — seeing difficult cards more frequently — is the mechanism that makes flashcard study far more efficient than linear reading.
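One simple way to put "wrong cards come back sooner" into practice is a Leitner-style box system. The sketch below is only an illustration: the `Card` class, the three-box layout, and the 1/3/7-day intervals are assumptions made for this example, not a description of how any particular flashcard platform schedules reviews.

```python
from dataclasses import dataclass

# Hypothetical review intervals (in days) for boxes 0, 1 and 2.
INTERVALS = [1, 3, 7]


@dataclass
class Card:
    question: str
    box: int = 0      # every card starts in the most frequently reviewed box
    due_day: int = 0  # day number on which the card is next shown


def review(card: Card, correct: bool, today: int) -> None:
    """Promote a card answered correctly; send a missed card back to box 0."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + INTERVALS[card.box]


def due_cards(cards: list[Card], today: int) -> list[Card]:
    """Return the cards scheduled for review on or before the given day."""
    return [c for c in cards if c.due_day <= today]


cards = [
    Card("Which ACID property covers all-or-nothing transactions?"),
    Card("Which Blob Storage tier has the lowest storage cost?"),
]
review(cards[0], correct=True, today=0)   # promoted to box 1, due on day 3
review(cards[1], correct=False, today=0)  # stays in box 0, due on day 1

# On day 1 only the missed card comes back.
print([c.question for c in due_cards(cards, today=1)])
```

The card you missed reappears the next day, while the card you answered correctly waits three days, which is the frequency difference described above.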
Study by domain
Group your DP-900 flashcard sessions by domain for the first 3–4 weeks. Master one domain before moving to the next. In the final week, shuffle all cards together to test cross-domain recall — which is what the real DP-900 exam requires.
Short sessions beat marathon reviews
Daily sessions of 20–30 cards produce better retention than a single 200-card marathon session. Five short sessions per week over 4 weeks adds up to 400–600 total card reviews, which is a solid base for DP-900.
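The review total follows directly from the session plan. The helper below is a hypothetical name used only to make the arithmetic explicit for the lower and upper ends of the 20–30 card range.

```python
def total_reviews(cards_per_session: int, sessions_per_week: int, weeks: int) -> int:
    """Total card reviews for a fixed weekly study schedule."""
    return cards_per_session * sessions_per_week * weeks


low = total_reviews(20, 5, 4)   # 20 cards x 5 sessions x 4 weeks
high = total_reviews(30, 5, 4)  # 30 cards x 5 sessions x 4 weeks
print(low, high)
```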
DP-900 flashcard preview
Sample cards from the DP-900 flashcard bank. Read the question, think of the answer, then read the explanation below.
A company stores customer names, addresses, and order history. They need to perform complex queries that join customer and order data. Which type of data store is most appropriate for this scenario?
Relational database
Relational databases manage structured data with defined relationships using tables and support complex queries with joins. Key-value stores are optimized for simple lookups, document databases handle semi-structured data, and graph databases excel at traversing relationships but are less efficient for typical tabular joins.
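To see what "complex queries that join customer and order data" looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational database. The table names, columns, and sample rows are made up for illustration; an Azure relational service would use its own connection details and T-SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ada', '1 Main St'), (2, 'Grace', '2 Oak Ave');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join the two tables on their shared key and aggregate per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
print(rows)
```

The join works because both tables share a defined key relationship, which is exactly what key-value and document stores do not model directly.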
A company is migrating an on-premises SQL Server database to Azure. They want to ensure that database administrators (DBAs) can perform administrative tasks but cannot view sensitive customer data in query results. Which Azure SQL feature should they implement?
Always Encrypted
Always Encrypted enables encryption of sensitive data at the column level and ensures that only authorized applications (with access to the encryption keys) can see plaintext data. DBAs without the keys cannot decrypt the data, even though they can manage the database. Dynamic Data Masking only obscures data from certain users, but DBAs with elevated permissions can still view the unmasked values.
A social media application stores user profile data as JSON documents. Each user's document has a different structure, with fields that vary based on user activity. The application needs to query these documents efficiently using SQL-like syntax and support high write throughput. Which Azure data store is most appropriate for this workload?
Azure Cosmos DB
Azure Cosmos DB is a NoSQL database service that natively supports JSON documents, flexible schemas, and SQL-like queries. It is ideal for applications that require high throughput, low latency, and the ability to handle variable data structures. Azure SQL Database requires a fixed schema, Azure Blob Storage does not support querying JSON content natively, and Azure Table Storage is a key-value store that is less suitable for nested JSON structures.
A manufacturer collects sensor data from thousands of IoT devices every second. The data is ingested into Azure Event Hubs and then needs to be stored for historical analysis. The analytics team will run complex aggregations and time-series queries over petabytes of data, expecting fast results even with large scans. Which Azure service should be used as the analytical data store?
Azure Synapse Analytics dedicated SQL pool
Azure Synapse Analytics (formerly SQL Data Warehouse) provides a massively parallel processing (MPP) engine and columnar storage optimized for petabyte-scale analytical queries. It is designed for high-performance aggregations and complex time-series analysis. Azure Data Lake Storage Gen2 is a storage layer but requires a separate compute engine like Synapse or Databricks to run queries. Azure SQL Database lacks the scale and parallelism for petabyte workloads, and Azure Cosmos DB is built for operational, not analytical, workloads.
A data engineer needs to process streaming data from IoT devices and store the results in Azure Data Lake Storage for long-term analytics. The data must be processed in near real-time to detect anomalies and trigger alerts. Which Azure service should the engineer use for stream processing?
Azure Stream Analytics
Azure Stream Analytics is a real-time analytics service designed to process high volumes of streaming data from sources like IoT devices. It allows you to run SQL-like queries on data streams and output results to storage, alerts, or other services. Azure Data Factory is an orchestration service for batch data movement and transformation, not real-time processing. Azure Analysis Services provides OLAP models for semantic layer analysis, not stream processing. Azure Data Lake Analytics is a retired batch analytics service; its functionality has been superseded by Azure Synapse Analytics and Azure Databricks.
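Stream processing engines typically group incoming events into time windows and evaluate each window as it closes. The sketch below imitates that idea in plain Python on a small in-memory list; it is not Stream Analytics code, and the function names, the 10-second window, and the threshold of 50 are assumptions chosen for the example.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows
    and return the average value per window, keyed by window start time."""
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def anomalies(window_avgs, threshold):
    """Return the start times of windows whose average exceeds the threshold."""
    return [start for start, avg in window_avgs.items() if avg > threshold]


# (seconds since start, temperature reading) from a hypothetical sensor
events = [(0, 20.0), (4, 22.0), (11, 21.0), (15, 23.0), (21, 80.0), (27, 90.0)]

averages = tumbling_window_avg(events, window_seconds=10)
print(averages)
print(anomalies(averages, threshold=50.0))
```

A real streaming service evaluates windows continuously as events arrive rather than over a finished list, but the windowing logic is the same in spirit.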
A data engineer needs to query data stored in CSV files in Azure Data Lake Storage Gen2 using T-SQL in Azure Synapse Analytics, without loading the data into the database. Which feature should they use?
External tables
External tables in Azure Synapse Analytics (formerly SQL Data Warehouse) allow querying data stored in Azure Blob Storage or Azure Data Lake Storage without moving the data into the database. They define a schema over the files and use PolyBase as the underlying engine to read and process the data. Materialized views and indexed views store precomputed results locally, requiring data to be loaded first. Stored procedures are for procedural logic, not for querying external files.
A data engineer needs to process raw clickstream data from multiple websites that is stored in Azure Blob Storage as JSON files. The processing must run automatically every hour, transform the data into a structured format for reporting, and handle schema changes in the source data without manual intervention. Which Azure service should be used?
Azure Data Factory with a Mapping Data Flow
Azure Data Factory with Mapping Data Flow is designed for scheduled data transformation and can handle schema drift (changes in the source schema) automatically. Stream Analytics is for real-time streaming, SQL Database requires manual schema changes, and Logic Apps is for workflow automation, not large-scale data transformation. Therefore, Data Factory is the best choice.
A data engineer is designing a data lake architecture in Azure. They plan to first ingest raw data from various sources into a landing zone in Azure Data Lake Storage Gen2. Then they will clean, validate, and deduplicate that data in a second zone. Finally, they will create aggregated, business-ready datasets in a third zone for analysts. This layered approach is known as which architecture?
Medallion architecture
The Medallion architecture (also called a multi-hop architecture) organizes data into bronze (raw), silver (cleaned and validated), and gold (aggregated and ready-for-analysis) layers. This pattern is widely used in lakehouse implementations to improve data quality and simplify downstream analytics. Star schema and snowflake schema are dimensional modeling techniques for data warehouses, not data lake zone structures. Lambda architecture separates batch and streaming paths, but does not define multiple quality zones.
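The bronze, silver, and gold layers can be pictured as three successive transformations. The sketch below uses plain Python lists in place of data lake folders; the sample records and the `to_silver` and `to_gold` helpers are hypothetical and exist only to show what each layer is responsible for.

```python
# Bronze: raw records exactly as ingested, including bad and duplicate rows.
bronze = [
    {"order_id": "1", "region": "west", "amount": "10.5"},
    {"order_id": "1", "region": "west", "amount": "10.5"},  # duplicate
    {"order_id": "2", "region": "east", "amount": "oops"},  # invalid amount
    {"order_id": "3", "region": "east", "amount": "4.5"},
    {"order_id": "4", "region": "west", "amount": "2.0"},
]


def to_silver(rows):
    """Validate types and drop duplicates, keeping the first copy of each order."""
    seen, clean = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        if row["order_id"] in seen:
            continue  # reject duplicates
        seen.add(row["order_id"])
        clean.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return clean


def to_gold(rows):
    """Aggregate cleaned rows into a business-ready total per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)
```

In a real lakehouse each layer would be persisted separately so that analysts read only from gold while the raw bronze data stays available for reprocessing.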
A data engineer needs to transform large datasets stored in Azure Data Lake Storage Gen2 using Python and Apache Spark. They want a serverless compute option that automatically scales and requires no cluster management. Which Azure service should they use?
Azure Synapse Analytics serverless Spark pool
Azure Synapse Analytics provides serverless Apache Spark pools that automatically scale based on workload. You only pay for the compute used during job execution, and there is no need to manage clusters.
A company collects customer feedback forms. Each form contains always-present fields like CustomerID and SubmissionDate, but also a free-text Comments field and optional fields like Rating or ProductCategory that vary between forms. How should this data be classified?
Semi-structured data
The data has a consistent basic structure but allows variability in optional fields, which is characteristic of semi-structured data. Semi-structured data does not require a rigid schema like structured data, but it has some organizational properties (e.g., tags or markers) that separate it from unstructured data like raw text or images.
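A quick way to recognise semi-structured data is to check which fields every record shares and which ones come and go. The sketch below does this for a few made-up feedback forms stored as JSON; the field names follow the question above, while the values and dates are invented sample data.

```python
import json

raw = """[
  {"CustomerID": 1, "SubmissionDate": "2024-01-05", "Comments": "Great", "Rating": 5},
  {"CustomerID": 2, "SubmissionDate": "2024-01-06", "Comments": "Late delivery",
   "ProductCategory": "Books"},
  {"CustomerID": 3, "SubmissionDate": "2024-01-07", "Comments": ""}
]"""
forms = json.loads(raw)

REQUIRED = {"CustomerID", "SubmissionDate"}

# Every form carries the required fields...
all_have_required = all(REQUIRED.issubset(form) for form in forms)

# ...but the remaining fields differ from one form to the next.
all_fields = set().union(*(form.keys() for form in forms))
optional_fields = sorted(all_fields - REQUIRED - {"Comments"})

# Missing optional fields are simply absent, so read them with a default.
ratings = [form.get("Rating") for form in forms]

print(all_have_required, optional_fields, ratings)
```

A fixed-schema table would need a nullable column for every optional field up front, whereas the JSON records describe their own shape.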
A company archives legal documents that must be kept for 10 years. Access to these documents is extremely rare (maybe once a year). They want to minimize storage costs. Which Azure Blob Storage access tier is most cost-effective for this data?
Archive tier
Azure Blob Storage offers access tiers: Hot (high cost, low latency), Cool (lower cost for infrequent access), Cold (even lower cost for less frequent access), and Archive (lowest cost for rarely accessed data with retrieval latency in hours). For data that is accessed only once a year, Archive tier provides the lowest storage cost, though data retrieval requires rehydration which can take up to 15 hours. Cool and Cold tiers are more expensive for long-term archival.
A company collects temperature readings from IoT sensors every second. Each reading includes a timestamp, sensor ID, and temperature value. The data is used for real-time monitoring and historical trend analysis. Which type of data is this most likely classified as?
Structured data
The data has a consistent schema (timestamp, sensor ID, temperature) and is stored in a structured format, such as a table. Structured data fits into rows and columns with a predefined data model. Semi-structured data (like JSON) allows flexibility in fields, while unstructured data (like images or text) has no fixed schema. Streaming data refers to the way data is processed, not its structure.
A bank processes online fund transfers. Each transaction must ensure that either both the debit from the sender's account and the credit to the receiver's account occur, or if any part fails, the entire transaction is rolled back. Which ACID property does this guarantee?
Atomicity
Atomicity ensures that all operations within a transaction are completed successfully; otherwise, the transaction is aborted and no changes are applied. This is critical for financial transactions where partial updates would lead to data inconsistency. Consistency ensures data integrity rules are preserved, Isolation prevents interference between concurrent transactions, and Durability ensures committed changes persist.
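Atomicity is easiest to understand by watching a failed transfer leave no trace. The sketch below uses Python's built-in sqlite3 module as a stand-in for any ACID-compliant database; the `accounts` table, the `transfer` helper, and the balances are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("sender", 100), ("receiver", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Apply the credit and the debit together, or neither of them."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # This debit violates the CHECK constraint if funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "sender", "receiver", 30)       # both updates are committed
failed = transfer(conn, "sender", "receiver", 500)  # credit is rolled back with the debit
print(ok, failed, balances(conn))
```

In the failed transfer the credit has already been applied when the debit is rejected, yet the receiver's balance is unchanged afterwards because the whole transaction is rolled back.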
A company is migrating an on-premises SQL Server database to Azure SQL Managed Instance. The database has a large fact table that is partitioned by date (monthly partitions) to improve query performance and simplify data archiving. The company wants to maintain the same partitioning strategy in Azure to avoid rewriting queries. Which feature in Azure SQL Managed Instance should they use to achieve this?
Table partitioning with partition functions and schemes
Azure SQL Managed Instance supports table partitioning, which is the same feature available in SQL Server. You can partition tables and indexes using a partition function and scheme. This allows the existing partitioning code to work without modification. Sharding distributes data across multiple databases (scale-out), which would require application changes. Partitioned indexes are part of the same table partitioning feature rather than a separate option. Federated tables are not a feature of Azure SQL Managed Instance.
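A partition function is essentially a sorted list of boundary values that maps each row to a partition number. The sketch below imitates a monthly function where each boundary date belongs to the partition on its right; the boundary dates and the `partition_number` helper are hypothetical and are not taken from any real database.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly boundaries. Two boundaries produce three partitions.
BOUNDARIES = [date(2024, 2, 1), date(2024, 3, 1)]


def partition_number(value: date) -> int:
    """Return the 1-based partition for a date, with each boundary value
    assigned to the partition on its right-hand side."""
    return bisect_right(BOUNDARIES, value) + 1


print(partition_number(date(2024, 1, 15)))  # before the first boundary
print(partition_number(date(2024, 2, 1)))   # exactly on a boundary
print(partition_number(date(2024, 3, 20)))  # after the last boundary
```

Because queries that filter on the date column can skip whole partitions, and old months can be archived a partition at a time, keeping the same boundaries after migration preserves both benefits.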
A company is building a data lake and collects data from three sources: (1) a relational database exporting CSV files with fixed columns for customer records, (2) API responses stored as JSON files with varying fields for product reviews, and (3) scanned handwritten notes stored as TIFF images. Which statement correctly categorizes these data by structure type?
1: structured, 2: semi-structured, 3: unstructured
Structured data has a rigid schema and is typically stored in rows and columns (CSV from a relational DB is structured). Semi-structured data has some organizational properties (like tags or key-value pairs) but does not enforce a strict schema; JSON with varying fields is semi-structured. Unstructured data has no predefined structure; images (TIFF) are unstructured. The correct categorization is: (1) structured, (2) semi-structured, (3) unstructured.
A company has multiple independent databases for different business units, each with low to moderate usage and varying workload patterns. They want to consolidate these databases into a single Azure SQL Database deployment option to share resources and reduce costs, while ensuring that databases do not starve each other of resources. Which Azure SQL Database deployment option should they choose?
Elastic pool
Azure SQL Database elastic pools allow multiple databases to share a fixed pool of resources (eDTUs or vCores). This is cost-effective when databases have low average usage but occasional spikes, because they can burst within the pool. Per-database minimum and maximum resource settings keep any one database from starving the others. Individual databases would each pay for their own reserved resources, which is more expensive. SQL Managed Instance targets lift-and-shift migrations that need near-full SQL Server compatibility. Hyperscale is for very large databases and fast scaling.
A car manufacturing company has two data processing systems: one system processes real-time sensor data from assembly lines to immediately detect equipment failures, and another system processes historical production records to generate monthly efficiency reports. Which two types of data processing workloads best describe these systems?
Stream processing and batch processing
Real-time sensor data processing for immediate detection is an example of stream processing because data is processed continuously as it arrives. Historical production report generation is batch processing because it processes a large volume of data at scheduled intervals. OLTP and OLAP refer to database transaction types, not data processing patterns.
A company has an Azure SQL Database that supports a critical business application in the West US region. They want to ensure that if the primary region becomes unavailable, the database can automatically fail over to a secondary replica in the East US region with minimal data loss. The secondary replica must also be readable to offload some reporting queries when the primary is healthy. Which Azure SQL Database feature should they enable?
Active Geo-Replication
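Active geo-replication creates a readable secondary replica in a different region, which can serve reporting queries while the primary is healthy. On its own it requires a manually or programmatically initiated failover; automatic failover comes from placing the database in a failover group, which is built on active geo-replication and offers a configurable grace period to limit data loss. Long-term backups are for retention, not replication. TDE is for encryption at rest. Point-in-time restore is for recovery within the same region.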
A company maintains a database of customer orders that are updated frequently. They also store aggregated monthly sales reports that are generated once and then only read. Which statement correctly distinguishes these two types of data workloads?
Transactional data is optimized for write operations, and analytical data is optimized for read operations.
Transactional workloads (OLTP) are optimized for high-volume write operations and point queries, ensuring data integrity for day-to-day operations. Analytical workloads (OLAP) are optimized for complex read queries and aggregations over large datasets. The statement above correctly captures this distinction.
A company needs to store order data for an e-commerce platform. The system requires high concurrency, fast inserts, and the ability to enforce referential integrity between tables (e.g., Customers and Orders). Which Azure service should they use?
Azure SQL Database
Relational databases like Azure SQL Database provide ACID transactions, referential integrity constraints (like foreign keys), and support high concurrency workloads. Azure Cosmos DB is a NoSQL database that does not enforce referential integrity. Blob Storage and Data Lake Storage are object storage solutions for unstructured files.
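Referential integrity means the database itself refuses an order that points at a customer who does not exist. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in for a relational service; the table definitions and sample IDs are assumptions, and the PRAGMA line is specific to SQLite, which leaves foreign key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
    )
""")

conn.execute("INSERT INTO Customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO Orders VALUES (100, 1)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO Orders VALUES (101, 999)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)
```

The orphaned order never reaches the table, so the application does not have to police the relationship itself.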
A company receives real-time clickstream data from its website via Azure Event Hubs. They need to detect fraudulent clicks within seconds and also produce daily aggregate reports of visitor statistics for historical analysis. Which combination of Azure services should they use for the real-time detection and the daily aggregation, respectively?
Azure Stream Analytics for real-time detection; Azure Data Factory for daily aggregation
For real-time processing on streaming data, Azure Stream Analytics is a natural choice as it can query data in motion with low latency. For batch processing such as daily aggregation, Azure Data Factory is a pipeline orchestration service that can schedule and run transformations (e.g., using Databricks or SQL) to produce aggregated results. Azure Databricks can handle both, but for simpler real-time scenarios, Stream Analytics is more straightforward and cost-effective for real-time detection. Azure Synapse Analytics targets large-scale analytics rather than real-time stream processing. Azure Functions can process events but is not optimal for high-throughput streaming or scheduled batch jobs.
A company's application uses Microsoft SQL Server with multiple databases that need to run complex queries joining tables across databases. They are migrating to Azure and need a fully managed relational database service with high availability, automated backups, and minimal management overhead. They do not need a separate SQL Server installation and want to avoid managing VMs. Which Azure deployment option should they choose?
Azure SQL Managed Instance
Azure SQL Managed Instance provides near 100% compatibility with SQL Server, including cross-database queries between databases on the same instance, with linked servers available for reaching other instances. It is a fully managed service with built-in high availability, automated backups, and minimal management overhead. Azure SQL Database single database does not support cross-database joins in T-SQL except through limited elastic query features. SQL Server on Azure VMs requires managing VMs and high availability manually. Azure SQL Database elastic pool inherits the same cross-database limitation as single database. Thus, Managed Instance is the best fit for this scenario.
DP-900 flashcards by domain
The DP-900 flashcard bank covers all 4 official blueprint domains published by Microsoft. Cards are distributed proportionally, so domains with higher exam weight have more cards.
Domain Coverage
Describe core data concepts
Identify considerations for relational data on Azure
Describe considerations for working with non-relational data on Azure
Describe an analytics workload on Azure
Flashcards vs practice tests: which is better for DP-900?
Both flashcards and practice questions are evidence-based study tools. The difference is in what they train:
Flashcards — concept retention
Best for memorising definitions, acronyms, protocol behaviours, command syntax, and conceptual distinctions. Use flashcards to build the foundational vocabulary that DP-900 questions assume you know.
Best in: weeks 1–3
Practice tests — application
Best for applying concepts to realistic scenarios, eliminating distractors, and building exam stamina. DP-900 questions test scenario reasoning, not just recall, so practice tests are essential.
Best in: weeks 3–6
The most effective DP-900 study plan combines both: use flashcards for the first 2–3 weeks to build conceptual foundations, then shift to practice tests and mock exams in the final 2–3 weeks to apply and benchmark that knowledge. Most candidates who pass on their first attempt use both tools.
DP-900 flashcards — frequently asked questions
Are the DP-900 flashcards free?
Yes — all DP-900 flashcards on Courseiva are completely free, no account required. Every card includes the question, correct answer, and a full explanation. Create a free account to track which cards you have studied and get spaced repetition recommendations.
How many DP-900 flashcards are on Courseiva?
Courseiva has 500+ original DP-900 flashcards across all 4 exam blueprint domains. New cards are added regularly as the question bank grows. All cards are written by certified engineers against the official Microsoft exam objectives.
How are Courseiva flashcards different from Anki or Quizlet?
Courseiva flashcards are purpose-built for IT certification exams. Unlike generic flashcard platforms where content quality varies, every Courseiva card is mapped to the official DP-900 exam blueprint, written by engineers who hold the certification, and includes a full explanation of the correct answer and why the distractors are wrong. This explanation quality is what separates genuine learning from rote memorisation.
Can I use DP-900 flashcards offline?
Courseiva is a web platform, so an internet connection is required. For offline study, we recommend creating a free Courseiva account, using the platform in your browser, and relying on your device's offline capabilities if your browser supports offline web apps.
Track your DP-900 flashcard progress
Save your results, see which domains need more work, and get spaced repetition recommendations — all free.