DP-900Chapter 59 of 101Objective 3.1

Data Warehouse Concepts: Facts and Dimensions

This chapter covers the core concepts of data warehouses, specifically facts and dimensions, which are fundamental to understanding analytical data stores in Azure. For the DP-900 exam, approximately 15-20% of questions touch on analytics and data warehouse concepts, making this a critical topic. You will learn the definitions, schemas (star and snowflake), and how to identify facts vs. dimensions in exam scenarios. Mastering these concepts will help you answer questions about Azure Synapse Analytics, Azure Analysis Services, and Power BI data modeling.

25 min read
Intermediate
Updated May 31, 2026

Warehouse Storage: Shelves and Items

Imagine a physical warehouse where products are stored. The warehouse has shelves (dimensions) that define the structure: each shelf has a location, a section, and a height. Products (facts) are placed on these shelves. Each product has a SKU, quantity, and price. The shelves provide the context (where, when, what product category), while the products represent the measurable events (sales, inventory levels). The warehouse manager can quickly find total sales by product category by looking at the shelves and summing the product values. Just as a data warehouse uses dimension tables to describe business entities and fact tables to store measurements, the physical warehouse uses shelves to organize and products to quantify. Without shelves, products would be a pile; without products, shelves are empty. The key is that facts are numeric and additive, while dimensions are descriptive and provide the 'who, what, where, when' context.

How It Actually Works

What is a Data Warehouse?

A data warehouse is a centralized repository that stores integrated data from multiple sources for reporting and analysis. Unlike transactional databases (OLTP) optimized for inserts/updates, data warehouses are optimized for read-heavy analytical queries (OLAP). They use a schema-on-write approach where data is pre-processed, cleaned, and structured before loading. In Azure, the primary data warehouse service is Azure Synapse Analytics (formerly SQL Data Warehouse), which uses a massively parallel processing (MPP) architecture.

Facts and Dimensions: The Building Blocks

In a data warehouse, data is organized into fact tables and dimension tables. This is known as a dimensional model, popularized by Ralph Kimball. The fact table contains quantitative data (measures) that can be aggregated, such as sales amount, quantity sold, or profit. Dimension tables contain descriptive attributes (context) that provide meaning to the facts, such as customer name, product category, or date.

Fact Tables: - Store numeric, additive measures (e.g., sales_amount, quantity) - Contain foreign keys to dimension tables - Are often large (millions or billions of rows) - Can have different granularities: transaction-level (each row is a sale), periodic snapshot (daily balance), or accumulating snapshot (order lifecycle) - Example columns: SaleID, DateKey, CustomerKey, ProductKey, SalesAmount, Quantity

Dimension Tables: - Store descriptive attributes (text or discrete values) - Are typically smaller (thousands to millions of rows) - Provide context for facts (e.g., customer name, product name, date) - Often have a surrogate key (integer) as primary key, not the business key - Can be slowly changing (SCD types 1, 2, 3) to track historical changes - Example columns: CustomerKey, CustomerName, CustomerCity, CustomerSegment

Star Schema vs. Snowflake Schema

The exam tests your ability to distinguish between star and snowflake schemas.

Star Schema: - A single fact table surrounded by dimension tables (like a star) - Dimension tables are denormalized (all attributes in one table, e.g., Product table includes category, subcategory, brand) - Advantages: simpler queries, fewer joins, better performance - Disadvantages: data redundancy, larger dimension tables - Example: FactSales joins directly to DimProduct (which contains CategoryName)

Snowflake Schema: - Dimension tables are normalized (split into multiple related tables) - For example, DimProduct references DimCategory, which references DimDepartment - Advantages: less redundancy, easier maintenance - Disadvantages: more joins, slower queries, complex queries - Example: FactSales joins to DimProduct, which joins to DimCategory

Exam tip: Azure Synapse Analytics and Power BI typically use star schemas for performance. Snowflake schemas are less common but may appear in data warehouse design questions.

Grain of a Fact Table

The grain defines what a single row in the fact table represents. It is the most important design decision. Common grains:

Transaction grain: one row per transaction (e.g., each sale)

Periodic snapshot grain: one row per period (e.g., daily inventory level)

Accumulating snapshot grain: one row per entity lifecycle (e.g., order from placement to delivery)

Example: If you have a fact table with DateKey, CustomerKey, and SalesAmount, the grain is one row per customer per day (if there is only one sale per customer per day) or per transaction (if multiple rows per customer per day).

Additive, Semi-Additive, and Non-Additive Measures

Additive: can be summed across all dimensions (e.g., SalesAmount)

Semi-additive: can be summed across some dimensions but not all (e.g., Balance can be summed across customers but not across time)

Non-additive: cannot be summed (e.g., Profit Margin is a ratio; average is appropriate)

Exam questions may ask which measures are additive. Always look for numeric, ratio-based measures as non-additive.

Slowly Changing Dimensions (SCD)

Dimensions can change over time. SCD types describe how to handle changes:

Type 0: Retain original (no change)

Type 1: Overwrite (no history)

Type 2: Add new row with versioning (full history)

Type 3: Add new column (limited history)

Example: Customer moves to a new city. Type 1 updates the city; Type 2 adds a new row with a new surrogate key and effective dates; Type 3 adds a PreviousCity column.

Role-Playing Dimensions

A dimension table can be used multiple times in a fact table with different roles. For example, a Date dimension can be used for OrderDate, ShipDate, and DeliveryDate. In the fact table, each role has a foreign key (e.g., OrderDateKey, ShipDateKey).

Degenerate Dimensions

A degenerate dimension is a dimension attribute stored directly in the fact table because it has no other attributes. Common for transaction numbers (e.g., InvoiceNumber). It is a key but belongs to no dimension table.

Factless Fact Tables

A fact table with no numeric measures, only foreign keys. Used to record events or coverages (e.g., student attendance: DateKey, StudentKey, CourseKey — no measure, just presence).

Conformed Dimensions

Dimensions that are shared across multiple fact tables or data marts. For example, a Date dimension used in Sales and Inventory fact tables ensures consistency. This is a key concept in Kimball's bus architecture.

Azure Synapse Analytics Implementation

In Azure Synapse, tables are created using CREATE TABLE statements. Fact tables are often distributed using hash distribution on a dimension key (e.g., CustomerKey) to collocate joins. Dimension tables can be replicated (small) or hash-distributed. Common distribution methods:

Hash: distribute rows across distributions based on a key

Round-robin: evenly distribute without key affinity (for staging)

Replicate: copy entire table to all distributions (for small dimension tables)

Example of creating a fact table:

CREATE TABLE dbo.FactSales (
    SaleID INT NOT NULL,
    DateKey INT NOT NULL,
    CustomerKey INT NOT NULL,
    ProductKey INT NOT NULL,
    SalesAmount MONEY NOT NULL,
    Quantity INT NOT NULL
)
WITH (
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);

Example of creating a dimension table:

CREATE TABLE dbo.DimCustomer (
    CustomerKey INT NOT NULL,
    CustomerID INT NOT NULL,
    CustomerName NVARCHAR(100),
    City NVARCHAR(50),
    Segment NVARCHAR(20)
)
WITH (
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (CustomerKey)
);

Indexing and Performance

Fact tables often use clustered columnstore indexes for compression and query performance. Dimension tables use clustered indexes on the surrogate key. In Synapse, columnstore indexes are the default for analytics workloads.

Relationships in Power BI

In Power BI, data models use star schemas. Relationships are created between fact and dimension tables (one-to-many). The fact table is on the many side, dimension on the one side. Cross-filter direction is usually single (from dimension to fact).

Exam Relevance

DP-900 expects you to:

Define fact and dimension tables

Differentiate star vs. snowflake schemas

Identify additive vs. non-additive measures

Understand SCD basics

Recognize role-playing and degenerate dimensions

Know how these concepts apply to Azure Synapse Analytics and Power BI

Common Trap Questions

Trap: 'A fact table contains descriptive attributes.' Reality: Fact tables contain measures and foreign keys; descriptive attributes are in dimension tables.

Trap: 'Snowflake schema is always better for performance.' Reality: Star schema is typically faster due to fewer joins.

Trap: 'All measures are additive.' Reality: Ratios and percentages are non-additive.

Summary

Facts and dimensions are the heart of data warehousing. Facts store measurements; dimensions store context. Star schemas are preferred for analytics. Azure Synapse uses MPP with hash distribution and columnstore indexes. Understanding these concepts is essential for DP-900 and real-world data engineering.

Walk-Through

1

Identify Business Process

Determine the business process you want to analyze, such as sales, inventory, or orders. This defines the subject of the fact table. For example, a sales process results in a fact table recording each sale transaction. The grain is determined at this step: each row represents one line item on a sales receipt.

2

Define the Grain

Declare exactly what a single fact table row represents. For a sales process, the grain could be 'one row per product per customer per day' or 'one row per transaction line item.' The grain must be atomic (lowest level) to allow maximum flexibility. Choosing the wrong grain leads to data loss or inability to answer certain queries.

3

Identify Dimensions

List the dimensions that provide context for the fact. For sales, typical dimensions are Date, Customer, Product, Store, and Promotion. Each dimension becomes a table with a surrogate key and descriptive attributes. Dimensions should be conformed if shared across multiple fact tables.

4

Identify Facts (Measures)

Determine the numeric measures that will be stored in the fact table. For sales, measures include SalesAmount, Quantity, Discount, and Profit. Ensure measures are additive when possible. If a measure is a ratio (e.g., profit margin), store the numerator and denominator separately to allow aggregation.

5

Design and Build Tables

Create the dimension and fact tables in the data warehouse (e.g., Azure Synapse). Use surrogate keys for dimensions. For the fact table, include foreign keys referencing dimension primary keys. Choose appropriate distribution and indexing. For Synapse, use hash distribution on a dimension key and columnstore index for fact tables.

What This Looks Like on the Job

In a retail company, the data warehouse is used to analyze sales performance across stores, products, and time. The fact table FactSales contains billions of rows, each representing a line item from a point-of-sale transaction. Dimensions include DimStore (store location, region), DimProduct (product name, category, brand), DimDate (date, month, quarter, year), and DimCustomer (customer demographics). The star schema allows the sales team to query total sales by region and product category with simple joins. The fact table is distributed by hash on StoreKey to collocate sales data with the store dimension, reducing data movement during joins. Clustered columnstore indexes compress the data and speed up aggregation queries. Common issues include data skew if a few stores have disproportionately large sales, causing uneven distribution. To mitigate, a different distribution key or round-robin distribution may be used. Another scenario is a financial services company tracking account balances. The fact table FactAccountBalance uses a periodic snapshot grain (daily balance per account). Dimensions include DimAccount, DimDate, and DimBranch. Balance is semi-additive: it can be summed across accounts but not across time (averaging is used for time). Misconfiguration occurs when the grain is not clearly defined, leading to duplicate rows or incorrect aggregations. In production, ETL processes load data daily, and the fact table is partitioned by date to allow partition switching for efficient loading and archiving. Azure Synapse supports partition elimination, improving query performance. When dimensions change (e.g., customer address), Type 2 SCD is used to track history, adding new rows with effective dates. This ensures historical sales data remains accurate. Performance tuning involves monitoring query execution plans, updating statistics, and rebuilding indexes periodically.

How DP-900 Actually Tests This

DP-900 objective 3.1 covers 'Describe data warehouse concepts' including facts and dimensions. The exam tests your ability to: (1) Define fact and dimension tables, (2) Differentiate star vs. snowflake schemas, (3) Identify additive vs. non-additive measures, (4) Recognize slowly changing dimensions (SCD) types, (5) Understand role-playing and degenerate dimensions. Common wrong answers: Candidates often confuse fact tables with dimension tables, thinking fact tables contain descriptive attributes. They may also believe snowflake schemas are always better for performance (star is typically faster). Another trap: assuming all measures are additive (ratios are not). The exam may present a table and ask whether it is a fact or dimension. Key terms: grain, surrogate key, conformed dimension, columnstore index, hash distribution. Numbers: Fact tables can have millions of rows; dimension tables thousands. In Synapse, default distribution is round-robin, but hash is common for fact tables. The exam likes to test edge cases: a factless fact table (e.g., attendance) has no numeric measures. A degenerate dimension (e.g., invoice number stored in fact) has no dimension table. To eliminate wrong answers, apply the definition: if the table contains numeric, additive measures and foreign keys, it's a fact; if it contains descriptive text and a surrogate key, it's a dimension. For schema questions, count the number of joins: star has one join per dimension; snowflake has multiple. For measures, ask: can I sum this across all dimensions? If yes, additive; if only across some, semi-additive; if no, non-additive.

Key Takeaways

Fact tables store numeric, additive measures and foreign keys to dimension tables.

Dimension tables store descriptive attributes and provide context for facts.

Star schema has denormalized dimensions; snowflake schema has normalized dimensions.

Additive measures can be summed across all dimensions; semi-additive cannot be summed across time; non-additive cannot be summed at all.

Slowly changing dimensions (SCD) Type 2 tracks full history by adding new rows.

Role-playing dimensions use the same dimension table for multiple roles (e.g., OrderDate, ShipDate).

Degenerate dimensions are attributes in the fact table with no separate dimension table (e.g., invoice number).

Factless fact tables contain only foreign keys, no measures.

In Azure Synapse, fact tables often use hash distribution on a dimension key and clustered columnstore index.

Conformed dimensions are shared across multiple fact tables for consistency.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Star Schema

Single fact table surrounded by dimension tables.

Dimension tables are denormalized (all attributes in one table).

Fewer joins, faster query performance.

Data redundancy (e.g., category name repeated in product table).

Easier for business users to understand.

Snowflake Schema

Dimension tables are normalized into multiple related tables.

Less data redundancy, easier maintenance.

More joins, slower query performance.

Complex queries require multiple joins.

Saves storage space.

Watch Out for These

Mistake

Fact tables contain only numeric data.

Correct

Fact tables contain numeric measures and foreign keys (which are integers). They can also contain degenerate dimensions (e.g., transaction ID as text or numeric). So they contain both numeric and non-numeric foreign keys.

Mistake

Snowflake schemas are always better for performance than star schemas.

Correct

Star schemas are typically faster because they require fewer joins. Snowflake schemas normalize dimensions, increasing join complexity and potentially slowing queries. Snowflake is used to reduce storage redundancy, not for performance.

Mistake

All measures in a fact table are additive.

Correct

Only additive measures can be summed across all dimensions. Ratios (e.g., profit margin) are non-additive; they must be calculated from additive components. Semi-additive measures (e.g., account balance) can be summed across some dimensions but not time.

Mistake

A fact table always has at least one numeric measure.

Correct

Factless fact tables have no numeric measures; they only contain foreign keys to record events or coverages (e.g., student attendance). They are still fact tables because they capture a business event.

Mistake

Dimension tables are always small.

Correct

While dimension tables are typically smaller than fact tables, they can be large (millions of rows) in cases like customer dimension with full history (SCD Type 2). However, they are still smaller than fact tables, which can have billions of rows.

Do You Actually Know This?

Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.

Frequently Asked Questions

What is the difference between a fact table and a dimension table?

A fact table contains quantitative data (measures) that can be aggregated, such as sales amount or quantity. It also contains foreign keys linking to dimension tables. A dimension table contains descriptive attributes (e.g., product name, customer city) that provide context for the facts. Fact tables are typically large and have many rows; dimension tables are smaller. In a star schema, the fact table is at the center with dimensions surrounding it.

What is the grain of a fact table?

The grain defines what a single row in the fact table represents. For example, in a sales fact table, the grain could be 'one row per transaction line item' or 'one row per product per day.' Choosing the correct grain is critical because it determines the level of detail available for analysis. The grain should be as atomic (lowest level) as possible to support maximum flexibility.

What is a slowly changing dimension (SCD)?

SCD refers to how dimension tables handle changes to attribute values over time. Type 1 overwrites the old value (no history). Type 2 adds a new row with a new surrogate key and effective dates (full history). Type 3 adds a new column to store the previous value (limited history). Type 2 is most common for tracking historical changes.

What is a degenerate dimension?

A degenerate dimension is a dimension attribute that is stored directly in the fact table because it has no other attributes to form a separate dimension table. Common examples are transaction numbers, invoice numbers, or order numbers. They are keys that provide context but do not have additional descriptive columns.

What is a conformed dimension?

A conformed dimension is a dimension that is shared across multiple fact tables or data marts within an enterprise. For example, a Date dimension used in both Sales and Inventory fact tables ensures consistent time attributes. Conformed dimensions are a key concept in Kimball's bus architecture for data warehousing.

How do facts and dimensions relate to Azure Synapse Analytics?

In Azure Synapse, fact tables are typically created with hash distribution on a dimension key to collocate data for joins, and they use clustered columnstore indexes for compression and performance. Dimension tables can be replicated (small) or hash-distributed. The star schema is recommended for optimal query performance in Synapse.

What is a factless fact table?

A factless fact table contains only foreign keys from dimension tables and no numeric measures. It is used to record events or coverages, such as student attendance (DateKey, StudentKey, CourseKey) or product promotion coverage. Despite having no measures, it is still a fact table because it captures a business event.

Terms Worth Knowing

Ready to put this to the test?

You've just covered Data Warehouse Concepts: Facts and Dimensions — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.

Done with this chapter?