DP-900 · Chapter 58 of 101 · Objective 1.4

Star Schema vs Snowflake Schema

This chapter covers the star schema and the snowflake schema, two fundamental dimensional modeling techniques in data warehousing. For the DP-900 exam, understanding their differences and use cases is critical because questions on this topic appear frequently (approximately 5-10% of the exam). You need to know how each schema structures fact and dimension tables, what each one gains and gives up, and when to use each. The sections below cover the mechanics, the components, and the real-world scenarios that exam questions tend to draw on.

25 min read
Intermediate
Updated May 31, 2026

Warehouse Inventory vs. Department Store Layout

Imagine a large retail company with a central warehouse and multiple department stores. Each store records its sales transactions in a log: every row references a product ID, a store ID, and a date, and records the quantity sold. That log is the fact table. In a star schema, the fact table is surrounded by one dimension table per entity (product, store, date), and each dimension is denormalized: the product table holds the product's name, category, supplier, and price together, even though a category name then repeats for every product in that category.

A snowflake schema normalizes the dimensions further. The product dimension is split into product, category, and supplier tables, and the store dimension is split into store and region tables. To build a sales report by category or region, you must join through multiple levels (snowflake joins).

The star schema is simpler and usually faster to query because all attributes of a dimension live in one table, which keeps the number of joins low. The snowflake schema saves storage by eliminating redundancy but requires more joins, which slows queries. In data warehousing, the star schema is preferred for OLAP cubes and reporting tools because it aligns with how business users think: one fact table with descriptive dimensions. The snowflake schema is used when storage is at a premium or when dimensions are highly shared across multiple fact tables.

How It Actually Works

What Are Star and Snowflake Schemas?

Star schema and snowflake schema are two approaches to organizing data in a data warehouse for analytical querying. Both use a central fact table that contains quantitative measures (e.g., sales amount, quantity) and foreign keys referencing dimension tables that provide context (e.g., time, product, customer). The difference lies in how dimension tables are structured.

Star Schema: Denormalized Dimensions

In a star schema, each dimension is denormalized into a single table. For example, a 'Product' dimension table would include product ID, product name, category, subcategory, brand, and supplier all in one table. This eliminates the need for joins within the dimension itself. The fact table sits in the center, and dimension tables radiate out like the points of a star.

Key characteristics:
- Simple structure: easy to understand and query.
- Fewer joins: queries typically join the fact table with one or more dimension tables directly.
- Fast query performance: because dimension tables are flat, queries require fewer joins.
- More storage: denormalization leads to redundancy (e.g., category name repeated for every product in the same category).
- Ideal for OLAP cubes and reporting tools like Power BI, which prefer star schemas for optimal performance.

Snowflake Schema: Normalized Dimensions

In a snowflake schema, dimension tables are normalized into multiple related tables. Using the same example, the 'Product' dimension might be split into 'Product', 'Category', and 'Supplier' tables. The 'Product' table contains a foreign key to 'Category', which in turn may have a foreign key to 'Department'. This creates a snowflake-like pattern as dimensions branch out.

Key characteristics:
- Normalized structure: reduces data redundancy.
- More joins: queries require joining multiple tables to retrieve all dimension attributes.
- Slower query performance: more joins increase query complexity and execution time.
- Less storage: normalization saves space by eliminating duplicate values.
- Preferred when storage is expensive or when dimensions are highly shared across multiple fact tables.
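To make the storage trade-off concrete, here is a minimal Python sketch. The product rows are hypothetical, and `snowflake_product` is an illustrative helper, not part of any library. It takes a flat, star-style product dimension, normalizes the category out into its own table, and counts how many times a category name is stored in each design.

```python
# Hypothetical rows for a star-schema DimProduct: the category is stored on every product row.
star_dim_product = [
    {"ProductKey": 1, "ProductName": "Road Bike", "Category": "Bikes"},
    {"ProductKey": 2, "ProductName": "Mountain Bike", "Category": "Bikes"},
    {"ProductKey": 3, "ProductName": "Touring Bike", "Category": "Bikes"},
    {"ProductKey": 4, "ProductName": "Helmet", "Category": "Accessories"},
    {"ProductKey": 5, "ProductName": "Water Bottle", "Category": "Accessories"},
]


def snowflake_product(rows):
    """Normalize a flat product dimension into Product and Category tables."""
    category_keys = {}  # CategoryName -> CategoryKey, assigned in first-seen order
    dim_category = []
    dim_product = []
    for row in rows:
        name = row["Category"]
        if name not in category_keys:
            category_keys[name] = len(category_keys) + 1
            dim_category.append({"CategoryKey": category_keys[name], "CategoryName": name})
        # The product row keeps only a foreign key to the category, not its name.
        dim_product.append({
            "ProductKey": row["ProductKey"],
            "ProductName": row["ProductName"],
            "CategoryKey": category_keys[name],
        })
    return dim_product, dim_category


dim_product, dim_category = snowflake_product(star_dim_product)

star_copies = len(star_dim_product)   # one category string per product row
snowflake_copies = len(dim_category)  # one category string per distinct category
print(f"Category names stored: star={star_copies}, snowflake={snowflake_copies}")
```

The snowflake version stores each category name once, but reading a product's category now requires a lookup through `CategoryKey`, which is exactly the extra join described above.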

How They Work Internally: Query Execution

Consider a query: "Total sales by product category for 2024."

Star schema:
- Fact table: Sales (ProductID, DateID, Amount)
- Dimension tables: Product (ProductID, ProductName, Category), DimDate (DateID, Year)
- Query: SELECT p.Category, SUM(s.Amount) FROM Sales AS s JOIN Product AS p ON s.ProductID = p.ProductID JOIN DimDate AS d ON s.DateID = d.DateID WHERE d.Year = 2024 GROUP BY p.Category;
- Joins: two, each directly from Sales to a dimension (Sales->Product and Sales->DimDate).
- Execution: the database optimizer scans Sales, joins it to Product and DimDate (for example, via indexes on ProductID and DateID), filters on Year, and aggregates by Category.

Snowflake schema:
- Fact table: Sales (ProductID, DateID, Amount)
- Dimension tables: Product (ProductID, ProductName, CategoryID), Category (CategoryID, CategoryName), DimDate (DateID, Year)
- Query: SELECT c.CategoryName, SUM(s.Amount) FROM Sales AS s JOIN Product AS p ON s.ProductID = p.ProductID JOIN Category AS c ON p.CategoryID = c.CategoryID JOIN DimDate AS d ON s.DateID = d.DateID WHERE d.Year = 2024 GROUP BY c.CategoryName;
- Joins: three (Sales->Product, Product->Category, and Sales->DimDate). The extra Product->Category join exists only because the category was moved into its own table.
- Execution: more join operations, potentially requiring more memory and CPU. If the join keys are not indexed, performance degrades significantly.
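The two queries can be checked side by side with a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server or Azure Synapse. It assumes a small `DimDate (DateID, Year)` table so the `Year = 2024` filter has a column to act on, and all table contents are made up for illustration. Both designs return the same totals; the snowflake query simply needs one more join to reach the category name.

```python
import sqlite3

# Hypothetical sample data shared by both designs.
PRODUCTS = [(1, "Road Bike", "Bikes"), (2, "Helmet", "Accessories"), (3, "Water Bottle", "Accessories")]
DATES = [(20231231, 2023), (20240115, 2024), (20240620, 2024)]
SALES = [(1, 20240115, 1200.0), (2, 20240115, 50.0), (3, 20240620, 10.0), (1, 20231231, 999.0)]


def build(snowflake):
    """Create an in-memory warehouse with either a star or a snowflake product dimension."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DimDate (DateID INTEGER PRIMARY KEY, Year INTEGER)")
    conn.execute("CREATE TABLE Sales (ProductID INTEGER, DateID INTEGER, Amount REAL)")
    if snowflake:
        # Category attributes move to their own table; Product keeps only CategoryID.
        categories = sorted({c for _, _, c in PRODUCTS})
        cat_ids = {name: i + 1 for i, name in enumerate(categories)}
        conn.execute("CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
        conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT, CategoryID INTEGER)")
        conn.executemany("INSERT INTO Category VALUES (?, ?)", [(i, n) for n, i in cat_ids.items()])
        conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                         [(pid, name, cat_ids[cat]) for pid, name, cat in PRODUCTS])
    else:
        # The category name is stored directly on each Product row.
        conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT)")
        conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO DimDate VALUES (?, ?)", DATES)
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", SALES)
    return conn


STAR_QUERY = """
SELECT p.Category, SUM(s.Amount)
FROM Sales AS s
JOIN Product AS p ON s.ProductID = p.ProductID
JOIN DimDate AS d ON s.DateID = d.DateID
WHERE d.Year = 2024
GROUP BY p.Category
ORDER BY p.Category
"""

SNOWFLAKE_QUERY = """
SELECT c.CategoryName, SUM(s.Amount)
FROM Sales AS s
JOIN Product AS p ON s.ProductID = p.ProductID
JOIN Category AS c ON p.CategoryID = c.CategoryID
JOIN DimDate AS d ON s.DateID = d.DateID
WHERE d.Year = 2024
GROUP BY c.CategoryName
ORDER BY c.CategoryName
"""

star_rows = build(snowflake=False).execute(STAR_QUERY).fetchall()
snowflake_rows = build(snowflake=True).execute(SNOWFLAKE_QUERY).fetchall()
print(star_rows)  # [('Accessories', 60.0), ('Bikes', 1200.0)]
print(snowflake_rows == star_rows)  # True
```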

Key Components and Values

Fact table: Contains measures (numeric, typically additive) and foreign keys to dimensions. The grain (level of detail) is crucial: for example, one row per sales transaction.

Dimension table: Contains descriptive attributes (text, categorical). In star schema, it is denormalized; in snowflake, normalized.

Surrogate keys: Typically an integer primary key of a dimension table, generated in the warehouse and independent of source system keys. The fact table stores it as a foreign key to reference the dimension.

Degenerate dimensions: Dimension attributes stored directly in the fact table with no corresponding dimension table (e.g., order number).

Conformed dimensions: Dimensions shared across multiple fact tables, ensuring consistency.
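Surrogate keys and degenerate dimensions are easiest to see during a load. The minimal Python sketch below is illustrative only: the source rows, the `SKU` natural key, and the `product_key_for` helper are hypothetical. It assigns an integer surrogate `ProductKey` the first time each source SKU appears, writes that key into the fact rows, and keeps `OrderNumber` on the fact row as a degenerate dimension with no table of its own.

```python
# Hypothetical source-system sales rows; products are identified by a natural key (SKU).
source_sales = [
    {"OrderNumber": "SO-1001", "SKU": "BK-R01", "Amount": 1200.0},
    {"OrderNumber": "SO-1001", "SKU": "AC-H02", "Amount": 50.0},
    {"OrderNumber": "SO-1002", "SKU": "BK-R01", "Amount": 1150.0},
]

surrogate_keys = {}  # natural key (SKU) -> surrogate ProductKey
dim_product = []     # rows of the product dimension


def product_key_for(sku):
    """Return the surrogate key for a SKU, assigning the next integer the first time it is seen."""
    if sku not in surrogate_keys:
        surrogate_keys[sku] = len(surrogate_keys) + 1
        # The source key is kept as an attribute so rows can be traced back to the source system.
        dim_product.append({"ProductKey": surrogate_keys[sku], "SourceSKU": sku})
    return surrogate_keys[sku]


fact_sales = [
    {
        "ProductKey": product_key_for(row["SKU"]),  # foreign key to the dimension
        "OrderNumber": row["OrderNumber"],          # degenerate dimension: no DimOrder table
        "SalesAmount": row["Amount"],               # additive measure
    }
    for row in source_sales
]
print(fact_sales)
```

Because the fact rows hold only the integer key, a change to how the source system formats its SKUs would affect the dimension's `SourceSKU` attribute, not the fact table.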

Configuration and Verification

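In SQL Server or Azure SQL Database, you create these schemas using CREATE TABLE statements. The same design applies in Azure Synapse Analytics dedicated SQL pools, but there FOREIGN KEY constraints are not supported and a PRIMARY KEY must be declared NONCLUSTERED NOT ENFORCED, so the relationships shown below exist only by convention. For star schema: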

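-- Star schema: one denormalized table per dimension
CREATE TABLE DimProduct (
    ProductKey INT PRIMARY KEY,          -- surrogate key
    ProductName NVARCHAR(100),
    Category NVARCHAR(50)                -- category name stored on every product row
);

-- Fact table: foreign keys to dimensions plus numeric measures
CREATE TABLE FactSales (
    OrderID INT,                                       -- degenerate dimension (no order table)
    ProductKey INT REFERENCES DimProduct(ProductKey),  -- one direct join to the dimension
    SalesAmount DECIMAL(10,2)                          -- additive measure
);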

For snowflake schema:

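-- Snowflake schema: an alternative design that reuses the DimProduct and FactSales names,
-- so create it in a separate database or drop the star-schema tables first
CREATE TABLE DimCategory (
    CategoryKey INT PRIMARY KEY,
    CategoryName NVARCHAR(50)            -- each category name stored once
);

CREATE TABLE DimProduct (
    ProductKey INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    CategoryKey INT REFERENCES DimCategory(CategoryKey)  -- dimension-to-dimension link (the snowflake branch)
);

CREATE TABLE FactSales (
    OrderID INT,
    ProductKey INT REFERENCES DimProduct(ProductKey),    -- the fact table still references only DimProduct
    SalesAmount DECIMAL(10,2)
);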

To verify that the tables were created, query the INFORMATION_SCHEMA views:

SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE';
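SQLite has no INFORMATION_SCHEMA, but the same check can be sketched with Python's `sqlite3` module using SQLite's `sqlite_master` table and `PRAGMA foreign_key_list`. The snowflake DDL from this section runs unchanged in SQLite, which accepts type names such as NVARCHAR(50) and DECIMAL(10,2). The final step, flagging a dimension that references another dimension as a snowflake, relies on the `Dim` naming prefix used here; it is a heuristic for illustration, not a built-in feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The snowflake-schema DDL from this section, executed as-is.
conn.executescript("""
CREATE TABLE DimCategory (
    CategoryKey INT PRIMARY KEY,
    CategoryName NVARCHAR(50)
);

CREATE TABLE DimProduct (
    ProductKey INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    CategoryKey INT REFERENCES DimCategory(CategoryKey)
);

CREATE TABLE FactSales (
    OrderID INT,
    ProductKey INT REFERENCES DimProduct(ProductKey),
    SalesAmount DECIMAL(10,2)
);
""")

# SQLite's closest equivalent of INFORMATION_SCHEMA.TABLES.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA foreign_key_list returns (id, seq, table, from, to, on_update, on_delete, match) per FK column.
relationships = []
for table in tables:
    for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
        relationships.append((table, fk[3], fk[2], fk[4]))  # (child, column, parent, parent column)

# Heuristic: a dimension that references another dimension means the model is snowflaked.
is_snowflake = any(child.startswith("Dim") and parent.startswith("Dim")
                   for child, _, parent, _ in relationships)
print(tables)
print(relationships)
print("snowflake" if is_snowflake else "star")
```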

Interaction with Related Technologies

Power BI: Star schema is the recommended model because Power BI's VertiPaq engine compresses and queries denormalized dimensions efficiently. A snowflake design can perform worse because every extra dimension table adds another relationship that filters must travel through.

Azure Analysis Services (AAS): Tabular models prefer star schema. Snowflake may require creating views that denormalize dimensions.

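Azure Synapse Analytics: Dedicated SQL pools benefit from star schema for faster query performance. A common pattern is to hash-distribute large fact tables and replicate small dimension tables to every compute node, so that fact-to-dimension joins avoid data movement.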

Trade-offs Summary

| Aspect | Star | Snowflake |
|--------|------|-----------|
| Storage | Higher (redundancy) | Lower (normalized) |
| Query performance | Faster (fewer joins) | Slower (more joins) |
| Maintenance | Simpler (one table per dimension) | More complex (multiple tables) |
| Data integrity | Potential update anomalies | Better (referential integrity) |
| ETL complexity | Easier to load | Harder to load (multiple tables) |
| Reporting tools | Preferred | Less preferred |

Exam Relevance

DP-900 tests your ability to identify which schema is used in a given scenario. Common questions: "Which schema is best for fast query performance?" (star). "Which schema reduces data redundancy?" (snowflake). "Which schema has more dimension tables?" (snowflake). Also, you may be asked to identify the fact table vs. dimension table in a diagram.

Walk-Through

1. Identify Business Process

Determine the business process to model, such as sales, inventory, or orders. This defines the fact table's grain (e.g., one row per line item). For DP-900, you need to recognize that the fact table captures measurable events, while dimensions provide context. The grain must be atomic (most detailed level) to support flexible analysis.
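A short Python sketch with hypothetical data shows why an atomic grain keeps analysis flexible: line-item rows can always be rolled up to a coarser grain such as daily totals, but once only daily totals are stored, product-level questions can no longer be answered.

```python
from collections import defaultdict

# Hypothetical facts at the atomic grain: one row per order line.
# (OrderDate, OrderNumber, Product, SalesAmount)
line_items = [
    ("2024-03-01", "SO-1", "Road Bike", 1200.0),
    ("2024-03-01", "SO-1", "Helmet", 50.0),
    ("2024-03-01", "SO-2", "Helmet", 50.0),
    ("2024-03-02", "SO-3", "Water Bottle", 10.0),
]

# Rolling up to a coarser grain (one row per day) is always possible...
daily_totals = defaultdict(float)
for order_date, _order, _product, amount in line_items:
    daily_totals[order_date] += amount

# ...and the atomic rows can still answer product-level questions.
helmet_sales = sum(amount for _, _, product, amount in line_items if product == "Helmet")

# A table holding only daily_totals has no Product column, so helmet_sales
# could not be recomputed from it: the detail is lost at the coarser grain.
print(dict(daily_totals))  # {'2024-03-01': 1300.0, '2024-03-02': 10.0}
print(helmet_sales)        # 100.0
```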

2. Define Dimensions

Identify the dimensions that describe the facts: time, product, customer, store, etc. For star schema, each dimension is denormalized into one table. For snowflake, normalize by splitting into sub-dimensions. Example: Time dimension can be split into Date, Month, Quarter, Year tables in snowflake. The exam tests whether you know that snowflake schema has more tables but less redundancy.
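To illustrate the time-dimension example, the Python sketch below builds a flat, star-style date dimension and then normalizes it into Date, Month, Quarter, and Year tables. The column names and key formats (YYYYMMDD for `DateKey`, YYYYMM for `MonthKey`, YYYYQ for `QuarterKey`) are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def star_dim_date(start, days):
    """Flat DimDate: every hierarchy level is repeated on each day's row."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "DateKey": d.year * 10000 + d.month * 100 + d.day,  # YYYYMMDD
            "Date": d.isoformat(),
            "Month": d.month,
            "Quarter": (d.month - 1) // 3 + 1,
            "Year": d.year,
        })
    return rows


def snowflake_dim_date(flat_rows):
    """Split the flat dimension into Date -> Month -> Quarter -> Year tables."""
    dim_date, dim_month, dim_quarter, dim_year = [], {}, {}, {}
    for r in flat_rows:
        month_key = r["Year"] * 100 + r["Month"]     # YYYYMM
        quarter_key = r["Year"] * 10 + r["Quarter"]  # YYYYQ
        dim_date.append({"DateKey": r["DateKey"], "Date": r["Date"], "MonthKey": month_key})
        dim_month[month_key] = {"MonthKey": month_key, "Month": r["Month"], "QuarterKey": quarter_key}
        dim_quarter[quarter_key] = {"QuarterKey": quarter_key, "Quarter": r["Quarter"], "Year": r["Year"]}
        dim_year[r["Year"]] = {"Year": r["Year"]}
    return dim_date, list(dim_month.values()), list(dim_quarter.values()), list(dim_year.values())


flat = star_dim_date(date(2024, 3, 30), 5)  # 2024-03-30 through 2024-04-03
snowflaked = snowflake_dim_date(flat)
print("star: 1 table with", len(flat), "rows")
print("snowflake:", len(snowflaked), "tables with", [len(t) for t in snowflaked], "rows")
```

The snowflake version stores each month and quarter once, but answering "sales by quarter" now walks three dimension tables instead of reading one column.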

3. Design Fact Table

Create the fact table with foreign keys referencing dimension tables and additive measures. The fact table's primary key is typically a composite of the foreign keys (plus any degenerate dimensions, such as an order line number) that identify one row at the declared grain, or a single surrogate key. Measures should be numeric and additive (e.g., sales amount, quantity). In star schema, the fact table is the center; in snowflake, it remains the center but joins to normalized dimensions.
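Two properties from this step can be checked mechanically: the key columns chosen for the fact table must uniquely identify a row at the declared grain, and an additive measure must give the same grand total no matter which dimension you group by. A minimal Python sketch with hypothetical FactSales rows and illustrative helper names:

```python
from collections import defaultdict

# Hypothetical FactSales rows at the grain "one row per order line".
fact_sales = [
    {"DateKey": 20240301, "ProductKey": 1, "StoreKey": 10, "OrderNumber": "SO-1", "LineNumber": 1, "SalesAmount": 1200.0},
    {"DateKey": 20240301, "ProductKey": 2, "StoreKey": 10, "OrderNumber": "SO-1", "LineNumber": 2, "SalesAmount": 100.0},
    {"DateKey": 20240301, "ProductKey": 2, "StoreKey": 10, "OrderNumber": "SO-2", "LineNumber": 1, "SalesAmount": 50.0},
    {"DateKey": 20240302, "ProductKey": 3, "StoreKey": 20, "OrderNumber": "SO-3", "LineNumber": 1, "SalesAmount": 30.0},
]


def is_unique_key(rows, columns):
    """True if the given columns identify every row exactly once."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(keys) == len(set(keys))


def total_by(rows, dimension, measure):
    """Sum an additive measure grouped by one dimension key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# The foreign keys alone are not unique: two orders bought product 2 at store 10 on the same day.
print(is_unique_key(fact_sales, ["DateKey", "ProductKey", "StoreKey"]))  # False
# Adding the degenerate dimensions that define the grain makes the key unique.
print(is_unique_key(fact_sales, ["DateKey", "ProductKey", "StoreKey", "OrderNumber", "LineNumber"]))  # True

# An additive measure rolls up to the same grand total along any dimension.
by_store = total_by(fact_sales, "StoreKey", "SalesAmount")
by_product = total_by(fact_sales, "ProductKey", "SalesAmount")
print(sum(by_store.values()), sum(by_product.values()))  # 1380.0 1380.0
```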

4. Choose Schema Type

Decide between star and snowflake based on requirements. Star is chosen for query performance and simplicity; snowflake for storage efficiency and data integrity. In the exam, if the question emphasizes 'fast query response time' or 'simplicity for business users,' pick star. If it mentions 'reducing storage' or 'avoiding data redundancy,' pick snowflake.

5. Implement and Optimize

Create tables using SQL DDL. For star, create wide dimension tables. For snowflake, create normalized dimension tables with foreign key relationships. Optimize by indexing foreign keys and partitioning fact tables by date. In Azure Synapse, use hash distribution on fact table keys. The exam may ask about performance implications: star schema typically requires fewer joins, so it's faster.
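The indexing part of this step can be tried locally. The Python sketch below uses SQLite as a stand-in: it indexes the fact table's foreign keys and asks the planner, via `EXPLAIN QUERY PLAN`, whether a filter on `ProductKey` uses the index. The table and index names are hypothetical. Partitioning and hash distribution are Azure Synapse features with no SQLite equivalent, so they are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT);
CREATE TABLE FactSales (DateKey INTEGER, ProductKey INTEGER, SalesAmount REAL);
-- Index each foreign key on the fact table so filters and joins can seek instead of scan.
CREATE INDEX ix_factsales_productkey ON FactSales (ProductKey);
CREATE INDEX ix_factsales_datekey ON FactSales (DateKey);
""")

# Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(SalesAmount) FROM FactSales WHERE ProductKey = ?",
    (1,),
).fetchall()
plan_text = " | ".join(str(row[-1]) for row in plan)
print(plan_text)  # the detail should name ix_factsales_productkey
```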

What This Looks Like on the Job

Enterprise Scenario 1: Retail Sales Analytics

A large retail chain uses a star schema to analyze daily sales across 500 stores. The fact table FactSales contains one row per transaction line item, with measures like SalesAmount, Quantity, and Discount. Dimensions include DimDate, DimStore, DimProduct, and DimCustomer, each denormalized. For example, DimProduct includes ProductName, Category, SubCategory, Brand, and Supplier. This design allows analysts to run queries like 'total sales by brand and region for last quarter' with just three joins, each directly from FactSales (to DimProduct for brand, DimStore for region, and DimDate for the quarter). The database is hosted on Azure Synapse with a clustered columnstore index on the fact table, achieving sub-second response times for aggregated queries. Misconfiguration risks: if the join keys are not properly indexed, queries can become slow; and if the fact table grain is too coarse (e.g., daily totals instead of line items), drilling down to individual transactions is impossible.

Enterprise Scenario 2: Insurance Claims Processing

An insurance company uses a snowflake schema to manage claims data. The fact table FactClaims includes measures like ClaimAmount, SettlementAmount, and ProcessingTime. Dimensions are normalized: DimPolicy references DimPolicyType and DimCoverageLevel; DimCustomer references DimAddress and DimDemographics. This reduces storage because many policies share the same policy type and coverage level. However, a query that needs both policy type and customer demographics requires four joins: FactClaims -> DimPolicy -> DimPolicyType and FactClaims -> DimCustomer -> DimDemographics. This can slow ad-hoc queries. To mitigate this, the company creates indexed views that denormalize the most common joins. Exam tip: if a question mentions that storage space is limited and dimensions have many shared attributes, the snowflake schema is the correct choice.
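The mitigation in this scenario, a view that flattens snowflaked dimensions, can be sketched with Python's `sqlite3`. Assumptions: the table names follow the scenario, the data is made up, and a plain SQLite view stands in for a SQL Server indexed view (which would additionally materialize and index the joined rows). Report queries then join the fact table to one flattened view instead of walking two levels of dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPolicyType (PolicyTypeKey INTEGER PRIMARY KEY, PolicyTypeName TEXT);
CREATE TABLE DimCoverageLevel (CoverageLevelKey INTEGER PRIMARY KEY, CoverageLevelName TEXT);
CREATE TABLE DimPolicy (
    PolicyKey INTEGER PRIMARY KEY,
    PolicyNumber TEXT,
    PolicyTypeKey INTEGER REFERENCES DimPolicyType(PolicyTypeKey),
    CoverageLevelKey INTEGER REFERENCES DimCoverageLevel(CoverageLevelKey)
);
CREATE TABLE FactClaims (PolicyKey INTEGER REFERENCES DimPolicy(PolicyKey), ClaimAmount REAL);

-- Flatten the snowflaked policy dimension back into one star-style row per policy.
CREATE VIEW vDimPolicyFlat AS
SELECT p.PolicyKey, p.PolicyNumber, t.PolicyTypeName, c.CoverageLevelName
FROM DimPolicy AS p
JOIN DimPolicyType AS t ON p.PolicyTypeKey = t.PolicyTypeKey
JOIN DimCoverageLevel AS c ON p.CoverageLevelKey = c.CoverageLevelKey;

INSERT INTO DimPolicyType VALUES (1, 'Auto'), (2, 'Home');
INSERT INTO DimCoverageLevel VALUES (1, 'Basic'), (2, 'Premium');
INSERT INTO DimPolicy VALUES (100, 'P-100', 1, 1), (101, 'P-101', 1, 2), (102, 'P-102', 2, 2);
INSERT INTO FactClaims VALUES (100, 500.0), (101, 1500.0), (102, 2500.0), (101, 250.0);
""")

# The report query sees a single policy "dimension" and needs only one join.
rows = conn.execute("""
SELECT v.PolicyTypeName, SUM(f.ClaimAmount)
FROM FactClaims AS f
JOIN vDimPolicyFlat AS v ON f.PolicyKey = v.PolicyKey
GROUP BY v.PolicyTypeName
ORDER BY v.PolicyTypeName
""").fetchall()
print(rows)  # [('Auto', 2250.0), ('Home', 2500.0)]
```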

Enterprise Scenario 3: E-commerce Order Fulfillment

An e-commerce platform uses a star schema for order analytics. The fact table FactOrders has measures OrderTotal, ShippingCost, and Tax. Dimensions: DimDate, DimCustomer, DimProduct, DimShippingMethod. The DimProduct table includes ProductName, Category, Supplier, and WarehouseLocation, all in one table. This enables the business intelligence team to build Power BI reports that load quickly, in line with Power BI's guidance to use star schemas for better compression and query performance. A common mistake is to model snowflaked dimensions in Power BI, which leads to slower report rendering. In production, Azure Data Factory runs a nightly ETL process that truncates and reloads the dimension tables (giving type 1 slowly changing dimension behavior: changed attributes are overwritten and no history is kept) and incrementally loads the fact table.
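The nightly pattern described here, a full reload of the dimension plus an incremental append to the fact table, can be sketched with Python's `sqlite3`. This is an illustration under assumptions, not Azure Data Factory code: the table layouts, function names, and data are hypothetical, and the incremental load uses a simple high-watermark on `OrderDate`, which would miss late-arriving rows for a date that was already loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT);
CREATE TABLE FactOrders (OrderDate TEXT, ProductKey INTEGER, OrderTotal REAL);
""")


def reload_dimension(conn, rows):
    """Truncate and reload: only current attribute values survive (type 1 behavior)."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM DimProduct")  # SQLite has no TRUNCATE; an unfiltered DELETE empties the table
        conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?)", rows)


def load_facts_incremental(conn, source_rows):
    """Append only fact rows dated after the latest OrderDate already loaded."""
    (watermark,) = conn.execute("SELECT COALESCE(MAX(OrderDate), '') FROM FactOrders").fetchone()
    new_rows = [r for r in source_rows if r[0] > watermark]  # ISO dates compare correctly as strings
    with conn:
        conn.executemany("INSERT INTO FactOrders VALUES (?, ?, ?)", new_rows)
    return len(new_rows)


# Night 1
reload_dimension(conn, [(1, "Road Bike", "Bikes")])
load_facts_incremental(conn, [("2024-03-01", 1, 1200.0)])

# Night 2: the source renamed the category and has one new order.
reload_dimension(conn, [(1, "Road Bike", "Road Bikes")])
loaded = load_facts_incremental(conn, [("2024-03-01", 1, 1200.0), ("2024-03-02", 1, 1150.0)])

print(loaded)  # 1
print(conn.execute("SELECT Category FROM DimProduct WHERE ProductKey = 1").fetchone())  # ('Road Bikes',)
print(conn.execute("SELECT COUNT(*) FROM FactOrders").fetchone())  # (2,)
```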

How DP-900 Actually Tests This

DP-900 Exam Focus on Star vs Snowflake Schema

Objective Code: The topic falls under Core Data Concepts (15-20% of exam) and specifically objective 1.4: 'Describe how to structure data for analytical workloads.' You must be able to differentiate between star and snowflake schemas, understand their advantages and disadvantages, and choose the appropriate schema for a given scenario.

Common Wrong Answers and Why Candidates Choose Them:

1. 'Snowflake schema is faster for queries.' Candidates confuse normalization with performance. Actually, star schema is faster because it requires fewer joins. Snowflake adds joins, slowing down queries.

2. 'Star schema uses less storage.' Candidates think denormalization means less storage, but denormalization actually increases storage due to redundancy. Snowflake uses less storage because it normalizes.

3. 'Snowflake schema is simpler to maintain.' Candidates assume normalized tables are easier to update, but star schema has fewer tables, making ETL simpler. Snowflake requires managing multiple dimension tables and their relationships.

4. 'Both schemas have the same number of tables.' Candidates may not realize that snowflake has more tables due to normalization. Star schema has one table per dimension; snowflake splits dimensions into multiple tables.

Specific Numbers and Terms on the Exam:
- The exam uses terms like 'fact table', 'dimension table', 'denormalized', 'normalized', 'grain', and 'additive measures'.
- Know that a fact table contains foreign keys and measures; dimension tables contain descriptive attributes.
- The star schema is the central design of dimensional modeling, the approach popularized by Ralph Kimball (often called the Kimball methodology). The snowflake schema is a variation of it.
- The exam may show a diagram and ask which schema is represented: a star has a central fact table with directly connected dimension tables; a snowflake has dimension tables that branch into sub-dimensions.

Edge Cases and Exceptions:
- A schema that is partially normalized (some dimensions star, some snowflake) is still considered a snowflake schema if any dimension is normalized.
- In some contexts, snowflake schema is used when dimensions have hierarchical relationships (e.g., product -> category -> department) and the hierarchy is deep.
- The exam might ask about 'conformed dimensions': dimensions that are shared across multiple fact tables and are consistent. Both schemas can use conformed dimensions.

How to Eliminate Wrong Answers:
- If the question mentions 'fast query performance', 'simplicity', or 'business user friendly', eliminate snowflake and choose star.
- If the question mentions 'reducing data redundancy', 'storage savings', or 'normalization', eliminate star and choose snowflake.
- If the question shows a diagram with a central table and several tables directly connected, it's star. If there are tables connected to other tables before connecting to the central table, it's snowflake.
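The elimination rules above are mechanical enough to express as code. This Python sketch is a study aid: the cue patterns and function names are hypothetical and cover only the wording and diagram cues listed in those rules.

```python
import re
from typing import Iterable, Optional, Tuple

# Wording cues from the elimination rules; \b keeps "normaliz" from matching inside "denormalized".
STAR_CUES = [r"\bfast", r"\bquery performance", r"\bsimpl", r"\bbusiness[- ]user"]
SNOWFLAKE_CUES = [r"\bredundan", r"\bstorage", r"\bnormaliz"]


def schema_from_wording(question: str) -> Optional[str]:
    """Return 'star' or 'snowflake' when only one family of cues appears, else None."""
    text = question.lower()
    star = any(re.search(p, text) for p in STAR_CUES)
    snowflake = any(re.search(p, text) for p in SNOWFLAKE_CUES)
    if star != snowflake:
        return "star" if star else "snowflake"
    return None  # no cue, or conflicting cues


def schema_from_diagram(edges: Iterable[Tuple[str, str]], fact_table: str) -> str:
    """Star if every relationship touches the fact table; snowflake if any links two other tables."""
    return "snowflake" if any(fact_table not in edge for edge in edges) else "star"


print(schema_from_wording("Which schema is best for fast query performance?"))  # star
print(schema_from_wording("Which schema reduces data redundancy?"))             # snowflake
print(schema_from_diagram([("FactSales", "DimProduct"), ("DimProduct", "DimCategory")], "FactSales"))  # snowflake
```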

Key Takeaways

Star schema: one fact table surrounded by denormalized dimension tables; typically the fastest to query.

Snowflake schema: normalized dimension tables; reduces storage but slows queries.

Fact tables contain measures and foreign keys; dimension tables contain descriptive attributes.

Star schema is the recommended design for OLAP and business intelligence tools.

Snowflake schema is used when storage efficiency or data integrity is prioritized.

The exam tests the ability to choose the right schema based on performance vs. storage trade-offs.

Know that star schema has fewer tables than snowflake schema for the same business process.

Both schemas typically use surrogate keys to link fact and dimension tables.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Star Schema

Denormalized dimension tables (single table per dimension).

Faster query performance due to fewer joins.

Higher storage usage due to data redundancy.

Simpler ETL processes (load one table per dimension).

Preferred by most reporting tools (Power BI, Tableau).

Snowflake Schema

Normalized dimension tables (multiple tables per dimension).

Slower query performance due to more joins.

Lower storage usage due to reduced redundancy.

More complex ETL processes (load multiple related tables).

Less preferred by reporting tools; may require view denormalization.

Watch Out for These

Mistake

Snowflake schema always provides better performance than star schema.

Correct

Star schema typically provides faster query performance because it requires fewer joins. Snowflake schema increases join complexity, which can degrade performance, especially with large datasets.

Mistake

Star schema is always the best choice for any data warehouse.

Correct

Star schema is preferred for most analytical workloads due to simplicity and performance, but snowflake schema may be better when storage is limited and dimensions have many shared attributes, or when data integrity is critical.

Mistake

Snowflake schema eliminates all data redundancy.

Correct

Snowflake schema reduces redundancy but does not eliminate it entirely. Foreign key columns in fact tables still repeat, and some dimension attributes may still be duplicated across sub-dimensions.

Mistake

A star schema cannot have normalized dimensions.

Correct

By definition, a star schema has only denormalized dimensions, so a design with even one normalized dimension is classified as a snowflake schema. Some designs mix both approaches; strictly speaking, such a design is still a snowflake schema because at least one dimension is normalized.

Mistake

Fact tables can only contain numeric measures.

Correct

Fact tables primarily contain numeric additive measures, but they can also contain degenerate dimensions (e.g., order number) and foreign keys. However, the main purpose is to store measurable events.


Frequently Asked Questions

What is the main difference between star and snowflake schema?

The main difference is how dimension tables are structured. In star schema, dimensions are denormalized into single tables, while in snowflake schema, dimensions are normalized into multiple related tables. This affects query performance (star is faster) and storage (snowflake uses less space).

Which schema is better for Power BI?

Star schema is strongly recommended for Power BI because it optimizes compression and query performance. Power BI's VertiPaq engine works best with denormalized dimension tables. Snowflake schema can cause slower report loading and more complex DAX queries.

Can a data warehouse use both star and snowflake schemas?

Yes, a data warehouse can mix the two: some dimensions may be denormalized (star) while others are normalized (snowflake). Strictly, a design with any normalized dimension is a snowflake schema, although in practice people often describe it by its predominant pattern. Many warehouses use star schema for simplicity.

How do I identify a snowflake schema in a diagram?

Look for dimension tables that connect to the fact table only through another dimension table. For example, a Category table connected to a Product table, which in turn connects to the fact table. In a star schema, every dimension table connects directly to the fact table.

Does snowflake schema always have more tables than star schema?

Yes, because dimensions are split into multiple tables. For the same business process, snowflake schema will have additional tables for sub-dimensions, such as separate tables for category, supplier, etc., whereas star schema combines them into one table per dimension.

What is a fact table's grain?

Grain refers to the level of detail stored in a fact table. For example, 'one row per sales transaction line item' is a grain. Defining the grain is critical because it determines the granularity of analysis. Both star and snowflake schemas use the same fact table grain.

Which schema is easier to maintain?

Star schema is generally easier to maintain because there are fewer tables to manage. ETL processes are simpler as you load one dimension table per dimension. Snowflake schema requires managing multiple tables with foreign key relationships, making maintenance more complex.
