DP-900 is Microsoft's foundational data certification. It covers core data concepts, Azure data services, and the difference between relational and non-relational data, without requiring hands-on configuration or SQL expertise. It is the entry point for the data engineering, analytics, and database administration paths on Azure, and a useful credential for anyone who works with data in a business context and wants to understand what the data team actually does.
DP-900 covers fundamental data literacy concepts: data formats, data workloads, job roles, and ETL.

Data formats: structured data has a fixed schema of rows and columns (relational databases); semi-structured data has a flexible schema (JSON, XML, key-value pairs); unstructured data has no predefined schema (images, videos, documents).

Data workloads: OLTP (Online Transaction Processing) handles many concurrent small reads and writes and is optimised for low-latency transactions (Azure SQL Database, Cosmos DB). OLAP (Online Analytical Processing) runs large, complex queries over historical data and is optimised for aggregation (Azure Synapse Analytics, Azure Analysis Services).

Data roles: a Database Administrator manages and maintains databases (performance, availability, security); a Data Engineer builds data pipelines and storage infrastructure (ETL, data lake design); a Data Analyst queries and visualises data to support business decisions (Power BI, SQL queries); a Data Scientist builds predictive models (ML, statistics).

ETL (Extract, Transform, Load): data is extracted from a source, transformed, and loaded into a target. Azure Data Factory orchestrates ETL pipelines.
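The ETL steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the JSON records, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real target such as Azure SQL Database. It also shows semi-structured data (a record with a missing field) being flattened into a structured table.

```python
import json
import sqlite3

# Hypothetical semi-structured source: JSON records with a flexible schema
# (the second record has no "email" field).
RAW_ORDERS = """
[
  {"id": 1, "customer": {"name": "Ada", "email": "ada@example.com"}, "amount": "19.99"},
  {"id": 2, "customer": {"name": "Grace"}, "amount": "5.00"}
]
"""


def extract(raw):
    """Extract: parse the semi-structured JSON source into Python dicts."""
    return json.loads(raw)


def transform(records):
    """Transform: flatten nested fields into fixed columns and fix types."""
    rows = []
    for rec in records:
        customer = rec.get("customer", {})
        rows.append((
            rec["id"],
            customer.get("name"),
            customer.get("email"),               # None when the field is absent
            round(float(rec["amount"]) * 100),   # store money as integer cents
        ))
    return rows


def load(conn, rows):
    """Load: write the structured rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, customer_name TEXT, "
        "customer_email TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw=RAW_ORDERS):
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real target database
    load(conn, transform(extract(raw)))
    return conn
```

Calling `run_pipeline()` returns a connection whose `orders` table holds one row per source record, with a NULL email for the record that lacked one. An orchestrator such as Azure Data Factory schedules and monitors steps like these instead of running them in a single script.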
Relational databases: Azure SQL Database (managed SQL Server engine delivered as PaaS, with elastic scale and built-in high availability), Azure SQL Managed Instance (near-100% SQL Server compatibility, suited to complex migrations), and Azure Database for PostgreSQL and Azure Database for MySQL (managed open-source relational databases).

Relational concepts: normalisation reduces redundancy (1NF, 2NF, 3NF); primary and foreign keys enforce referential integrity; ACID transactions (Atomicity, Consistency, Isolation, Durability) guarantee data integrity.

Non-relational (NoSQL) databases: Azure Cosmos DB (globally distributed, with multiple APIs: NoSQL for documents, MongoDB, Cassandra, Gremlin, and Table), Azure Cache for Redis (in-memory key-value store), and Azure Table Storage (simple NoSQL key-value store). NoSQL trade-offs: flexible schema, horizontal scale, and high availability, often at the cost of weaker consistency guarantees (for example, eventual consistency in distributed scenarios).

Object storage: Azure Blob Storage holds unstructured data (images, videos, documents, backups). Azure Data Lake Storage Gen2 adds a hierarchical namespace on top of Blob Storage for big data analytics workloads.
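The relational concepts above (foreign keys and ACID transactions) can be seen in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a managed relational service; the `customers` and `accounts` tables and the `transfer` helper are hypothetical names chosen for illustration. The transfer credits one account and then debits another inside a single transaction, so a failed debit also undoes the credit (atomicity).

```python
import sqlite3


def make_db():
    """Create a tiny schema with a primary key, a foreign key, and a CHECK."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE accounts (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            balance     INTEGER NOT NULL CHECK (balance >= 0)
        );
        INSERT INTO customers VALUES (1, 'Ada');
        INSERT INTO accounts VALUES (10, 1, 100), (20, 1, 50);
    """)
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

With this schema, a transfer larger than the source balance returns `False` and leaves both balances unchanged, and inserting an account for a customer id that does not exist raises `sqlite3.IntegrityError` because the foreign key is enforced.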
Azure analytics services for DP-900: Azure Synapse Analytics (unified analytics platform with SQL, Spark, pipelines, and Power BI integration in one workspace), Azure Databricks (Apache Spark-based analytics and ML, with collaborative notebooks and MLflow for ML lifecycle management), and Power BI (business intelligence and data visualisation: datasets, reports, and dashboards published to the Power BI Service).

Power BI components: Power BI Desktop (report authoring tool), Power BI Service (cloud publishing and sharing), Power BI Mobile (viewing on mobile devices).

Report vs dashboard: a report is a multi-page interactive document; a dashboard is a single page of tiles pinned from reports that gives a high-level view.

Real-time analytics: Event Hubs ingests streaming data, Stream Analytics processes data in flight with SQL-like queries, and Power BI real-time streaming datasets display live data.

Batch analytics: Azure Data Factory orchestrates data movement, Synapse Analytics queries the data, and Power BI reports the results.
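To make the real-time analytics idea above more concrete, here is a rough Python sketch of a tumbling-window average, one common kind of aggregate that stream processors compute over in-flight data. It is not Stream Analytics query syntax; the event tuple layout `(timestamp_seconds, device_id, value)` and the function name are assumptions made for the example, and it works on a finite list, whereas a real engine emits results continuously as each window closes.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average `value` per (window_start, device_id).

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples
    (a hypothetical layout). Windows are fixed-size and non-overlapping, so
    each event belongs to exactly one window.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, device, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        acc = sums[(window_start, device)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Example input: temperature readings from two hypothetical devices.
READINGS = [
    (0, "a", 10.0),
    (5, "a", 20.0),
    (12, "a", 30.0),
    (3, "b", 4.0),
]
```

With 10-second windows, `tumbling_window_avg(READINGS, 10)` groups the first two readings from device `a` into the window starting at 0 and the third into the window starting at 10.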
Misconception: NoSQL databases are always faster than relational databases.
Reality: NoSQL databases trade certain consistency guarantees for horizontal scalability. For many transactional workloads, a well-designed relational database with proper indexing can outperform a NoSQL alternative.

Misconception: Data warehouses and databases serve the same purpose.
Reality: Transactional databases (OLTP) are optimised for many concurrent small transactions. Data warehouses (OLAP) are optimised for complex analytical queries across large historical datasets. The storage and indexing strategies differ fundamentally.
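The difference between the two workloads shows up in the shape of the queries. The sketch below uses SQLite and a hypothetical `sales` table purely for illustration: the OLTP-style functions touch a single row each, while the OLAP-style function scans and aggregates the whole history. A real data warehouse would use different storage (typically columnar) to make the second kind of query fast at scale.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory table with a few hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
        [
            ("north", 2022, 100),
            ("north", 2023, 150),
            ("south", 2022, 80),
            ("south", 2023, 120),
            ("south", 2023, 40),
        ],
    )
    conn.commit()
    return conn


def record_sale(conn, region, year, amount):
    """OLTP-style write: one short transaction that inserts a single row."""
    with conn:
        cur = conn.execute(
            "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
            (region, year, amount),
        )
    return cur.lastrowid


def get_sale(conn, sale_id):
    """OLTP-style read: fetch one row by primary key."""
    return conn.execute(
        "SELECT region, year, amount FROM sales WHERE id = ?", (sale_id,)
    ).fetchone()


def revenue_by_region_year(conn):
    """OLAP-style read: aggregate over every row in the table."""
    return conn.execute(
        "SELECT region, year, SUM(amount) FROM sales "
        "GROUP BY region, year ORDER BY region, year"
    ).fetchall()
```

`record_sale` and `get_sale` stay cheap as the table grows because they rely on the primary key, whereas `revenue_by_region_year` has to read every row, which is the access pattern analytical stores are designed around.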