Describe an analytics workload on Azure · Hard · Multiple choice · Objective-mapped

Quick Answer

The answer is to hash-distribute the fact table on ProductID and partition it on Date. Hash distribution on ProductID places all rows for the same product in the same distribution, so the join with the product dimension table can run without shuffling fact rows across distributions (this assumes the dimension is replicated or distributed on the same key). Partitioning on Date then enables partition elimination: the query engine skips entire partitions that fall outside the date-range filter, which sharply reduces the amount of data scanned. On the DP-900 exam, this scenario tests your understanding of how distribution and partitioning work together to address specific query patterns. A common trap is choosing round-robin distribution, which spreads rows evenly without regard to their values and forces expensive data movement during joins. Remember the memory tip: “Hash for joins, partition for filters”. Distribution optimizes joins, while partitioning optimizes range scans.

DP-900 Describe an analytics workload on Azure Practice Question

This DP-900 practice question tests your understanding of the objective "Describe an analytics workload on Azure". Match the stated requirement to the specific service or configuration option; many options are valid in isolation but not for this scenario. A key principle to apply: hash distribution co-locates rows with the same distribution key on the same compute node. Once you have made your selection, read the full explanation to reinforce the concept and understand why each distractor is designed to mislead on exam day.

A company uses Azure Synapse Analytics dedicated SQL pool for large-scale data warehousing. They have a fact table with billions of rows and frequently run queries that filter by a date range and join with a product dimension table. Which table distribution and partitioning strategy will minimize data movement and improve query performance?

Clue words in this question

Noticing these words before you look at the options changes how you read each choice.

  • Clue: "minimum / minimize"

    Why it matters: The question asks for the design that moves the least data between distributions. Eliminate options that force a shuffle during the join or a full scan for the date filter, even if they would technically work.

Correct answer & explanation

Hash-distribute on ProductID with partitioning on Date

Hash-distributing the fact table on ProductID ensures that rows for the same product are co-located on the same distribution, minimizing data movement when joining with the product dimension table. Partitioning on Date allows partition elimination for date-range filters, reducing the amount of data scanned. This combination directly addresses the query pattern of date-range filtering and product joins.
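The co-location described above can be illustrated with a small simulation. This is only a sketch: the CRC32 hash stands in for Synapse's internal hash function (which is not public), and the sample rows are hypothetical. What it shows is the property that matters, namely that a deterministic hash of ProductID sends every row for a product to one distribution.

```python
import zlib

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_for(key):
    """Map a distribution-key value to a distribution (illustrative hash, not Synapse's)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_DISTRIBUTIONS


# Hypothetical fact rows: (product_id, order_date, amount)
fact_rows = [
    (101, "2024-01-05", 20.0),
    (202, "2024-01-09", 35.5),
    (101, "2024-02-11", 12.0),
    (303, "2024-03-02", 8.25),
    (202, "2024-03-15", 40.0),
]

# Group fact rows by the distribution their ProductID hashes to.
placement = {}
for row in fact_rows:
    placement.setdefault(distribution_for(row[0]), []).append(row)

# For each product, collect the set of distributions that hold its rows.
distributions_per_product = {
    pid: {distribution_for(r[0]) for r in fact_rows if r[0] == pid}
    for pid in {r[0] for r in fact_rows}
}

for pid, dists in sorted(distributions_per_product.items()):
    print(f"ProductID {pid} lives in {len(dists)} distribution(s)")
```

Each product reports exactly one distribution, so a join on ProductID against a dimension that is replicated or hashed on the same key can be resolved inside each distribution.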

Key principle: Hash distribution co-locates rows with the same distribution key on the same compute node.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • Round-robin distribution with no partitioning

    Why it's wrong here

    Round-robin distributes data uniformly but does not co-locate related rows, so joins require data movement across distributions. No partitioning means all data must be scanned for date filters.

  • Hash-distribute on ProductID with partitioning on Date

    Why this is correct

    Hash-distribution on ProductID co-locates rows with the same ProductID, enabling efficient joins with the product dimension. Partitioning on the Date column enables partition elimination for date range queries, reducing the amount of data scanned.

    Clue confirmation

    The clue word "minimum / minimize" in the question points toward this answer.

    Related concept

    Hash distribution co-locates rows with the same distribution key on the same compute node.

  • Replicate the fact table on all distributions and partition on ProductID

    Why it's wrong here

    Replicating a large fact table is impractical due to storage overhead and data duplication. Replication is recommended only for small dimension tables. Partitioning on ProductID does not directly help with date range filtering.

  • Hash-distribute on Date with partitioning on ProductID

    Why it's wrong here

    Hash-distributing on Date would spread rows of the same product across distributions, causing data movement during joins. Partitioning on ProductID would not effectively prune data for date range queries, as partitions would be large and contain many dates.
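To make the contrast between the round-robin and hash options concrete, the sketch below counts how many fact rows are not already on the distribution where their product's dimension row would live. It is a simplified model under stated assumptions: the dimension is assumed to be hash-distributed on ProductID, CRC32 stands in for the real hash function, and the data set (200 products with 5 rows each) is made up.

```python
import zlib

NUM_DISTRIBUTIONS = 60


def hash_distribution(key):
    """Illustrative stand-in for the engine's hash on the distribution column."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_DISTRIBUTIONS


def rows_to_move(fact_placement):
    """Count fact rows that are not on the distribution holding their product's dimension row."""
    return sum(
        1
        for product_id, dist in fact_placement
        if dist != hash_distribution(product_id)
    )


# Hypothetical fact table: 200 products x 5 rows each = 1000 rows.
product_ids = [pid for pid in range(1, 201) for _ in range(5)]

# Round-robin: rows are dealt out in arrival order, ignoring ProductID.
round_robin = [(pid, i % NUM_DISTRIBUTIONS) for i, pid in enumerate(product_ids)]

# Hash: each row goes to the distribution chosen by its ProductID.
hashed = [(pid, hash_distribution(pid)) for pid in product_ids]

moved_round_robin = rows_to_move(round_robin)
moved_hashed = rows_to_move(hashed)

print(f"round-robin rows needing movement: {moved_round_robin} of {len(product_ids)}")
print(f"hash rows needing movement:        {moved_hashed} of {len(product_ids)}")
```

With hash placement no rows need to move, while round-robin leaves the large majority of rows on the wrong distribution for the join.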

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is that candidates often confuse the roles of distribution and partitioning. They assume that partitioning on the join key (ProductID) will improve join performance. In fact, hash distribution on the join key is what co-locates data for joins, while partitioning on the filter column (Date) is what enables partition elimination.

Detailed technical explanation

How to think about this question

In Azure Synapse dedicated SQL pool, hash distribution uses a hash function to assign each row to one of 60 distributions; co-locating join keys on the same distribution avoids expensive data movement. Partitioning on Date with range boundaries (e.g., monthly) allows the query optimizer to prune partitions at the storage level, reducing I/O. A real-world scenario: a sales fact table with 5 billion rows, holding one year of evenly spread data and partitioned monthly, is filtered on OrderDate BETWEEN '2024-01-01' AND '2024-01-31'. The query reads only the January partition and skips about 11/12 of the data.
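The 11/12 figure can be checked with a short calculation. The sketch below models twelve monthly partitions for 2024 and keeps only those that overlap the filter range; the helper names are hypothetical, and the model assumes exactly one year of data spread evenly across months.

```python
from datetime import date


def month_partitions(year):
    """Return (start, end_exclusive) bounds for twelve monthly partitions."""
    starts = [date(year, m, 1) for m in range(1, 13)] + [date(year + 1, 1, 1)]
    return list(zip(starts[:-1], starts[1:]))


def partitions_to_scan(partitions, range_start, range_end):
    """Keep only the partitions that overlap the inclusive filter range."""
    return [
        (lo, hi)
        for lo, hi in partitions
        if lo <= range_end and hi > range_start
    ]


partitions = month_partitions(2024)

# Filter from the example: OrderDate BETWEEN '2024-01-01' AND '2024-01-31'
needed = partitions_to_scan(partitions, date(2024, 1, 1), date(2024, 1, 31))
skipped_fraction = 1 - len(needed) / len(partitions)

print(f"partitions scanned: {len(needed)} of {len(partitions)}")
print(f"fraction skipped:   {skipped_fraction:.3f}")
```

Widening the filter to a quarter would keep three partitions and skip 9/12, which is why the benefit depends on how narrow the date range is relative to the partition grain.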

Key Concepts to Remember

  • Hash distribution co-locates rows with the same distribution key on the same compute node.
  • Partitioning divides a table into smaller, manageable segments based on a specified column.
  • Partition elimination reduces data scanned by skipping irrelevant partitions for filtered queries.
  • Efficient joins in Synapse require co-location of join keys to minimize data movement.

Exam Day Tips

  • Watch for words such as best, first, most likely and least administrative effort.
  • Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Hash distribution co-locates rows with the same distribution key on the same compute node.

Real-world example

How this comes up in practice

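A retailer keeps its order lines in a sales fact table with billions of rows in a dedicated SQL pool. Monthly reports filter on an order-date range and join to the product dimension. With the fact table hash-distributed on ProductID and partitioned by month, each report reads only the partitions for the requested months and, provided the product dimension is replicated or distributed on the same key, joins without moving fact rows between distributions. Questions like this test whether you can match distribution and partitioning choices to a query pattern.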

What to study next

Got this wrong? Here's your next step.

Review how hash distribution co-locates rows with the same distribution key on the same compute node, then practise related DP-900 questions on the same topic to reinforce the concept.

FAQ

Questions learners often ask

What does this DP-900 question test?

This question tests the objective "Describe an analytics workload on Azure", specifically the principle that hash distribution co-locates rows with the same distribution key on the same compute node.

What is the correct answer to this question?

The correct answer is: Hash-distribute on ProductID with partitioning on Date. Hash-distributing the fact table on ProductID ensures that rows for the same product are co-located on the same distribution, minimizing data movement when joining with the product dimension table. Partitioning on Date allows partition elimination for date-range filters, reducing the amount of data scanned. This combination directly addresses the query pattern of date-range filtering and product joins.

What should I do if I get this DP-900 question wrong?

Review how hash distribution co-locates rows with the same distribution key on the same compute node, then practise related DP-900 questions on the same topic to reinforce the concept.

Are there clue words in this question I should notice?

Yes. Watch for "minimum / minimize". The question asks for the design that moves the least data between distributions. Eliminate options that force a shuffle during the join or a full scan for the date filter, even if they would technically work.

What is the key concept behind this question?

Hash distribution co-locates rows with the same distribution key on the same compute node.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Same concept, more angles

3 more ways this is tested on DP-900

These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.

Variation 1. A company uses Azure Synapse Analytics for their data warehouse. They notice that queries against the fact table are slow. The fact table is hash-distributed on OrderID. Most queries filter by CustomerID. What should they do to improve performance?

medium
  • A. Change to round-robin distribution
  • B. Change the distribution column to CustomerID
  • C. Use rowstore instead of columnstore
  • D. Replicate the fact table to all compute nodes

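Why B: The fact table is hash-distributed on OrderID, so rows for a single customer are scattered across distributions, and work that joins or groups on CustomerID must shuffle data between them. Changing the distribution column to CustomerID places all rows for the same customer on the same distribution, which removes that shuffle. Keep in mind that Microsoft's guidance favors distribution columns used in joins and aggregations over columns used only in WHERE filters, because a narrow filter on the distribution column can leave most distributions idle. B is the best of the listed options rather than a universal rule.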

Variation 2. A logistics company uses Azure Synapse Analytics dedicated SQL pool to analyze billions of shipment records. The table 'Shipments' is 10 TB and hash-distributed on 'ShipmentID'. Analysts frequently run queries that filter on 'WarehouseID' and aggregate by 'Region'. These queries are slow because they cause data movement (shuffle) across distributions. Which table design change will most improve query performance for these analytical workloads?

hard
  • A. Change distribution to replicated table
  • B. Change distribution to round-robin
  • C. Create a columnstore index
  • D. Change distribution to hash on 'WarehouseID'

Why D: Hash-distributing the 'Shipments' table on 'WarehouseID' places all rows for a given warehouse on the same distribution. Queries that filter on 'WarehouseID' and aggregate by 'Region' can then do most of their work locally on each distribution, which reduces the data movement (shuffle) that the current 'ShipmentID' distribution causes.

Variation 3. A company uses Azure Synapse Analytics dedicated SQL pool to run large-scale analytics. The data engineering team notices that queries are slow due to excessive data movement between distributions. Which table design should be recommended to minimize data movement for fact tables that are frequently joined on a specific column?

hard
  • A. Ordered clustered columnstore index
  • B. Clustered columnstore index
  • C. Round-robin distributed table
  • D. Hash-distributed table

Why D: Hash-distributed tables distribute rows across distributions based on a hash of a chosen column. When fact tables are frequently joined on that specific column, using the same distribution column ensures that matching rows from both tables reside on the same distribution, eliminating the need to shuffle data between distributions during the join. This minimizes data movement and significantly improves query performance in Azure Synapse dedicated SQL pools.

Last reviewed: Jun 11, 2026

This DP-900 practice question is part of Courseiva's free Microsoft certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the DP-900 exam.