DP-900 · Chapter 48 of 101 · Objective 2.4

Cosmos DB Request Units (RU/s)

This chapter covers Azure Cosmos DB Request Units (RU/s), the fundamental throughput metric for all Cosmos DB operations. Understanding RU/s is critical for the DP-900 exam, as approximately 10-15% of questions involve RU/s provisioning, consumption, or rate-limiting. You will learn what RU/s measures, how to choose between provisioned and serverless modes, how to calculate costs, and how to troubleshoot 429 errors. Master this topic to optimize both performance and cost in Cosmos DB solutions.

25 min read
Intermediate
Updated May 31, 2026

Cosmos DB RU/s as a Toll Road System

Imagine a toll road connecting two cities. Each vehicle (database request) must pay a toll (Request Units) to pass. The road has a fixed capacity of 1,000 vehicles per hour (provisioned RU/s). If fewer than 1,000 vehicles arrive in an hour, traffic flows smoothly. If 1,200 arrive, 200 must wait in a queue, where they may time out or be rejected (rate-limited). You can increase capacity by staffing more toll booths (scaling RU/s), but you pay for every staffed booth whether or not vehicles use it. The toll per vehicle depends on its size and route: a small car (simple point read) costs 1 RU, while a heavy truck (complex query) costs 10 RUs, so a sudden surge of trucks exhausts capacity faster. The system also has a consumption mode (serverless) where you pay per vehicle, but the maximum throughput is capped. This analogy mirrors how Cosmos DB meters every operation in Request Units and enforces throughput limits through rate-limiting (HTTP 429).

How It Actually Works

What are Request Units (RUs)?

Request Units (RUs) are a normalized measure of the system resources required to perform a database operation on Azure Cosmos DB. A single RU abstracts CPU, memory, and IOPS. Every operation, whether a point read, a query, a write, or a stored procedure execution, consumes a certain number of RUs. You provision throughput in RUs per second (RU/s). Cosmos DB guarantees that the provisioned RU/s will be available for your operations, subject to consistency models and partitioning.

Why RU/s Exists

Cosmos DB is a globally distributed, multi-model database. To provide predictable performance and SLAs (99.99% availability for single-region writes, 99.999% for multi-region writes), it needs a unified way to measure and allocate resources across different APIs (SQL, MongoDB, Cassandra, Gremlin, Table) and consistency levels. RU/s abstracts away the underlying hardware and allows you to think purely in terms of throughput. It also enables a consumption-based pricing model where you pay only for what you provision or consume.

How RU Consumption is Calculated

Each operation's RU cost depends on:

Operation type: Writes cost more than reads. A 1 KB point read (single item by ID and partition key) costs 1 RU. A write of the same item costs about 5 RUs.

Item size: Larger items consume more RUs. For point reads, a 2 KB item costs roughly 2 RUs. For much larger items the cost keeps growing with size, but less than proportionally (Microsoft's published estimates list about 10 RUs for a 100 KB point read).

Query complexity: Indexed queries that scan fewer partitions cost fewer RUs. Aggregations, joins, and ORDER BY increase RU cost.

Consistency level: Reads at Strong and Bounded staleness cost about twice as many RUs as reads at the relaxed levels (Session, Consistent prefix, Eventual). Session is the default.

Indexing policy: Queries that can use the index cost fewer RUs than full scans. Customizing indexing (e.g., excluding unused paths) reduces the RU cost of writes.
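To make the factors above concrete, here is a minimal sketch of a throughput estimate for a workload of point reads and point writes. It uses this chapter's figures (1 RU per 1 KB read, 5 RUs per 1 KB write at Session consistency) and treats cost as proportional to item size, which is a simplification that only holds approximately for small items. The function and constant names are illustrative, not part of any Azure SDK.

```python
# Chapter figures for a 1 KB item at Session consistency.
READ_RU_PER_KB = 1.0
WRITE_RU_PER_KB = 5.0


def point_op_cost(item_kb: float, ru_per_kb: float) -> float:
    """Approximate RU cost of one point operation (assumes cost is proportional to size)."""
    return ru_per_kb * item_kb


def required_ru_per_second(reads_per_s: float, writes_per_s: float, item_kb: float) -> float:
    """Sum the per-second RU demand of point reads and point writes."""
    read_demand = reads_per_s * point_op_cost(item_kb, READ_RU_PER_KB)
    write_demand = writes_per_s * point_op_cost(item_kb, WRITE_RU_PER_KB)
    return read_demand + write_demand


if __name__ == "__main__":
    # 100 reads/s and 10 writes/s of 1 KB items
    print(required_ru_per_second(100, 10, 1.0))
```

For real sizing, the measured request charge of your own operations is a better input than these rule-of-thumb constants.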

Provisioned Throughput vs. Serverless

You can configure throughput in two modes:

Provisioned throughput: You set a specific number of RU/s (e.g., 400 RU/s). Cosmos DB reserves that capacity. You pay per hour for the provisioned RU/s, regardless of actual usage. This mode supports autoscale (auto-scaling from 10% to 100% of max RU/s) and manual scaling. Suitable for predictable workloads.

Serverless: You pay only for the RUs consumed by your operations, measured per hour. There is no provisioning. However, serverless has a maximum throughput limit (currently 5,000 RU/s per container) and is not available for all APIs or consistency levels. Suitable for intermittent or low-traffic workloads.
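The trade-off between the two modes comes down to paying for reserved capacity versus paying for consumed RUs. The sketch below compares one hour of cost under each mode. Both price constants are hypothetical placeholders chosen only to make the arithmetic visible; they are not Azure prices, and the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# HYPOTHETICAL placeholder prices, not real Azure prices.
PRICE_PER_100_RU_S_HOUR = 0.01   # provisioned: per 100 RU/s reserved, per hour
PRICE_PER_MILLION_RU = 0.30      # serverless: per one million RUs consumed


def provisioned_hourly_cost(provisioned_ru_s: float) -> float:
    """Provisioned mode bills the reserved RU/s for the hour, used or not."""
    return provisioned_ru_s / 100 * PRICE_PER_100_RU_S_HOUR


def serverless_hourly_cost(avg_consumed_ru_s: float) -> float:
    """Serverless bills only the RUs actually consumed during the hour."""
    consumed_ru = avg_consumed_ru_s * SECONDS_PER_HOUR
    return consumed_ru / 1_000_000 * PRICE_PER_MILLION_RU


def cheaper_mode(provisioned_ru_s: float, avg_consumed_ru_s: float) -> str:
    """Name the mode with the lower hourly cost under the placeholder prices."""
    if serverless_hourly_cost(avg_consumed_ru_s) < provisioned_hourly_cost(provisioned_ru_s):
        return "serverless"
    return "provisioned"


if __name__ == "__main__":
    print(cheaper_mode(400, 10))    # mostly idle container
    print(cheaper_mode(400, 300))   # steadily busy container
```

With these placeholders, the mostly idle container favors serverless and the steadily busy one favors provisioned throughput. Where the crossover falls depends entirely on actual prices.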

Default Values and Limits

Minimum provisioned RU/s per container or database: 400 RU/s.

Minimum autoscale max RU/s: 1,000 RU/s (scales down to 100 RU/s at 10%).

Maximum RU/s per physical partition: 10,000 RU/s.

Maximum storage per physical partition: 50 GB.

For serverless: maximum 5,000 RU/s per container.

Each 1 KB point read at Session consistency: 1 RU.

Each 1 KB point write at Session consistency: 5 RUs.

Stored procedure execution: 5 RUs per invocation + operation costs.
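The autoscale figures in the list above follow from one rule: the floor is 10% of the configured maximum. A minimal sketch of that rule, with an illustrative function name:

```python
from typing import Tuple


def autoscale_range(max_ru_s: int) -> Tuple[int, int]:
    """Return (floor, ceiling) RU/s for an autoscale max; the floor is 10% of the max."""
    if max_ru_s <= 0:
        raise ValueError("max_ru_s must be positive")
    return max_ru_s // 10, max_ru_s


if __name__ == "__main__":
    print(autoscale_range(4000))
    print(autoscale_range(20000))
```

A max of 4,000 RU/s gives a range of 400 to 4,000 RU/s, and a max of 20,000 RU/s gives 2,000 to 20,000 RU/s.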

Rate-Limiting (HTTP 429)

If you exceed your provisioned RU/s in a given second, Cosmos DB returns HTTP 429 (Too Many Requests) with an `x-ms-retry-after-ms` header indicating how long to wait. The SDKs retry rate-limited requests automatically (the .NET SDK, for example, retries up to 9 times by default) and wait for the interval given in that header before each retry. The retry count and maximum wait time are configurable through the client options. Rate-limiting can occur if:

You exceed the RU/s limit on a single partition (especially if partition key causes hot spots).

You have uneven data distribution (skewed partition key).

You exceed the total RU/s for the container or database.
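The retry behavior described above can be sketched as a small loop that honors the server-suggested wait. This is an illustration of the pattern, not SDK code: `RateLimitedError` is a hypothetical stand-in for a 429 response, and `call_with_retry` is an illustrative helper. In practice the SDKs already do this for you.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response carrying x-ms-retry-after-ms."""

    def __init__(self, retry_after_ms: int):
        super().__init__(f"429 Too Many Requests, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retry(
    operation: Callable[[], T],
    max_retries: int = 9,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(); on a 429, wait the suggested interval and try again."""
    retries = 0
    while True:
        try:
            return operation()
        except RateLimitedError as err:
            if retries >= max_retries:
                raise  # retry budget exhausted: surface the 429 to the caller
            retries += 1
            sleep(err.retry_after_ms / 1000.0)
```

Passing `sleep` as a parameter keeps the helper testable without real delays. If 429s still surface after the retry budget is spent, the fix is capacity or partitioning, not more retries.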

Partitioning and RU/s

RU/s is allocated evenly across physical partitions. Each physical partition gets a share of the total RU/s. For example, with 10,000 RU/s and 4 physical partitions, each partition gets 2,500 RU/s. If one partition receives more requests than its share, requests may be rate-limited even if total RU/s is not exceeded. This is why choosing a good partition key is critical: it distributes both data and throughput evenly.
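The even split described above, and why a hot partition throttles while the container as a whole has headroom, can be sketched as follows. The function names and the sample demand figures are illustrative.

```python
from typing import Dict, List


def per_partition_budget(total_ru_s: float, physical_partitions: int) -> float:
    """Each physical partition receives an equal share of the provisioned RU/s."""
    if physical_partitions <= 0:
        raise ValueError("physical_partitions must be positive")
    return total_ru_s / physical_partitions


def hot_partitions(total_ru_s: float, demand_by_partition: Dict[str, float]) -> List[str]:
    """Return the partitions whose demand exceeds their equal share."""
    budget = per_partition_budget(total_ru_s, len(demand_by_partition))
    return sorted(name for name, demand in demand_by_partition.items() if demand > budget)


if __name__ == "__main__":
    demand = {"p0": 4000, "p1": 1000, "p2": 1000, "p3": 1000}  # 7,000 RU/s in total
    print(per_partition_budget(10_000, 4))
    print(hot_partitions(10_000, demand))
```

Total demand here is 7,000 RU/s against 10,000 RU/s provisioned, yet `p0` exceeds its 2,500 RU/s share and would be rate-limited.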

Consistency Levels and RU Cost

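RU costs differ by consistency level. For a 1 KB point read, Strong and Bounded staleness cost roughly twice as much as the three relaxed levels, which are all served from a single replica:

Eventual: ~1 RU

Consistent prefix: ~1 RU

Session: 1 RU (default)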

Bounded staleness: ~2 RUs

Strong: ~2 RUs

Multi-region writes (with Strong consistency) are not supported. For writes, RU cost is the same across all consistency levels because writes are always replicated to a quorum.

Estimating RU Costs

You can estimate RU costs using:

Cosmos DB Capacity Planner: A web-based tool to estimate RU/s based on data size, operations, and consistency.

`x-ms-request-charge` header: Every response includes the actual RU cost for that operation.

Metrics in Azure Portal: Monitor RU consumption per partition, total RU/s, and 429 count.
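As a sketch of working with the `x-ms-request-charge` header, the helpers below read the charge from a response's headers and total it across several responses (for example, the pages of a query). The header dictionaries are hand-written stand-ins; how you obtain real headers depends on the SDK or HTTP client you use, and the function names are illustrative.

```python
from typing import Iterable, Mapping

REQUEST_CHARGE_HEADER = "x-ms-request-charge"


def request_charge(headers: Mapping[str, str]) -> float:
    """Return the RU charge reported in one response's headers (0.0 if absent)."""
    for name, value in headers.items():
        if name.lower() == REQUEST_CHARGE_HEADER:  # header names are case-insensitive
            return float(value)
    return 0.0


def total_charge(responses: Iterable[Mapping[str, str]]) -> float:
    """Add up the charge across several responses."""
    return sum(request_charge(headers) for headers in responses)


if __name__ == "__main__":
    pages = [{"x-ms-request-charge": "2.83"}, {"x-ms-request-charge": "2.5"}]
    print(total_charge(pages))
```

Logging this value per operation type is a cheap way to find the queries that dominate RU consumption.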

Configuring RU/s

Using Azure CLI:

az cosmosdb sql container throughput update \
  --resource-group myResourceGroup \
  --account-name myAccount \
  --database-name myDatabase \
  --name myContainer \
  --throughput 400

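For autoscale (on a container that already uses autoscale throughput; a manual-throughput container is first converted with `az cosmosdb sql container throughput migrate --throughput-type autoscale`):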

az cosmosdb sql container throughput update \
  --resource-group myResourceGroup \
  --account-name myAccount \
  --database-name myDatabase \
  --name myContainer \
  --max-throughput 4000

Using PowerShell:

Update-AzCosmosDBSqlContainerThroughput `
  -ResourceGroupName myResourceGroup `
  -AccountName myAccount `
  -DatabaseName myDatabase `
  -Name myContainer `
  -Throughput 400

Monitoring and Alerts

In Azure Portal, under Metrics, you can create charts for:

Total Request Units (per minute)

Max Consumed RU/s per partition

429 (Throttled Requests)

Normalized RU Consumption (percentage of provisioned RU/s per partition)

Set alerts when normalized RU consumption exceeds 90% or 429 count spikes.

Best Practices

Use autoscale for variable workloads to avoid over-provisioning.

Choose a partition key with high cardinality and even request distribution.

For read-heavy workloads, consider enabling indexing only on necessary paths.

Use the Capacity Planner before deployment.

Monitor x-ms-request-charge in application code to understand RU costs.

When migrating with Azure Data Factory or the Cosmos DB Spark connector, estimate the RU/s the bulk load will need and provision for it before starting.

Interaction with Other Azure Services

Azure Functions: Cosmos DB trigger uses change feed, which consumes RUs. Ensure sufficient RU/s to avoid 429 from trigger processing.

Azure Synapse Link: Analytical store queries do not consume RU/s from transactional store.

Azure Data Factory: Copy activities consume RUs from the source Cosmos DB container.

Azure Stream Analytics: Output to Cosmos DB consumes write RUs.

Exam-Specific Details

The exam expects you to know that 1 RU = 1 point read of 1 KB at Session consistency.

You must know the minimum provisioned throughput: 400 RU/s.

Understand that autoscale scales between 10% and 100% of the max RU/s.

Know that serverless has a max of 5,000 RU/s per container.

Recognize that Strong consistency doubles RU cost for reads compared to Session.

Be able to identify scenarios where 429 errors occur and how to resolve (increase RU/s, optimize queries, fix partition key skew).

Walk-Through

1

Identify Operation Type and Size

Determine what kind of operation you are performing: point read, point write, query, stored procedure, or multi-document transaction. Measure the document size in KB. For example, a point read of a 1 KB document at Session consistency costs 1 RU. A point write of the same document costs 5 RUs. A complex query scanning 1000 documents might cost 100 RUs. This step is crucial because the RU cost is the foundation for provisioning and optimization.

2

Choose Consistency Level

Select the appropriate consistency level for your application. The default is Session, at which a 1 KB point read costs 1 RU. Eventual and Consistent prefix reads cost about the same as Session reads because all three are served from a single replica. Strong and Bounded staleness roughly double the read cost (about 2 RUs). Consistency level affects read RU only; write RU is constant. The exam tests that you know the relative RU costs across consistency levels.

3

Provision Throughput in RU/s

Decide between provisioned and serverless modes. For provisioned, set the RU/s value (minimum 400). Use autoscale if workload varies. For serverless, no provisioning is needed but max is 5,000 RU/s per container. Use the Azure Portal, CLI, or SDK to configure. For example, `az cosmosdb sql container throughput update --throughput 1000`. The system allocates RU/s evenly across physical partitions.

4

Monitor RU Consumption and 429 Errors

After deployment, monitor the `Total Request Units` and `Max Consumed RU/s per partition` metrics. Watch for HTTP 429 responses. The response includes `x-ms-retry-after-ms`. SDKs retry automatically. If 429s persist, you need to either increase RU/s, optimize queries (e.g., reduce document size, use index), or repartition data to avoid hot partitions. The exam asks how to diagnose and resolve throttling.

5

Optimize RU Costs

Reduce RU consumption by: 1) Using weaker consistency when acceptable. 2) Excluding unused paths from indexing. 3) Limiting query results with `TOP` or `LIMIT` clauses. 4) Using point reads instead of queries when possible. 5) Enabling TTL to delete old data. 6) Using change feed with batch processing. The exam expects you to know optimization strategies to lower RU costs.

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Product Catalog

A large online retailer uses Cosmos DB (SQL API) to store a product catalog of 10 million items, each about 2 KB. During peak hours they see 20,000 point reads and 100 writes per second. At Session consistency each read costs ~2 RUs (items are 2 KB), so reads consume 40,000 RU/s. Each 2 KB write costs ~10 RUs, adding 1,000 RU/s, for a total of about 41,000 RU/s. They provision 45,000 RU/s manually. During Black Friday, however, traffic spikes to 30,000 reads/s (about 60,000 RU/s) and they get 429 errors. Solution: enable autoscale with a max of 90,000 RU/s. Autoscale raises throughput as needed, up to 90,000 RU/s, during the spike and scales back down toward 9,000 RU/s (10% of the max) when traffic subsides, saving costs. They also implement client-side retry with exponential backoff. Misconfiguration: initially they chose category as the partition key, which caused hot partitions for popular categories. They changed to productId (high cardinality) to distribute load evenly.

Enterprise Scenario 2: IoT Telemetry Ingestion

A manufacturing company ingests sensor data from 100,000 devices, each sending a 1 KB document every five minutes. That is about 333 writes per second. Each write costs 5 RUs, totaling ~1,667 RU/s. They also run analytical queries on the last hour of data. They use serverless mode because traffic is low and stays under the 5,000 RU/s per-container limit. They need to keep data for 30 days; with serverless, storage is billed separately. They partition by deviceId and use TTL to auto-delete data older than 30 days. Monitoring shows consumption averaging 2,000 RU/s including queries, well under the serverless limit. A common mistake: they initially used provisioned throughput at 10,000 RU/s, paying for idle capacity. Switching to serverless reduced costs by 60%.

Enterprise Scenario 3: Multi-Region Social Media App

A social media platform with users worldwide uses Cosmos DB with multi-region writes. They have 100 million users, each reading their timeline (point read of 5 KB) several times a day. They provision 1,000,000 RU/s distributed across three regions. They use Session consistency. They monitor normalized RU consumption per partition and see one partition hitting 100% while others are at 20%. This indicates a hot partition due to a bad partition key (e.g., region). They migrate to a composite partition key (userId + region) to spread load. They also use the Cosmos DB Capacity Planner to estimate RU/s for new features. When misconfigured, 429 errors caused user-facing latency, leading to churn. Proper monitoring and alerting on 429 count and normalized RU consumption prevented issues.

How DP-900 Actually Tests This

What DP-900 Tests on RU/s (Objective 2.4)

The exam focuses on:

Defining Request Units and what they measure (CPU, memory, IOPS).

Understanding that 1 RU = 1 point read of 1 KB at Session consistency.

Knowing the minimum provisioned throughput: 400 RU/s.

Comparing provisioned vs. serverless: provisioned guarantees throughput, serverless has max 5,000 RU/s.

Identifying causes of 429 errors: exceeding RU/s, hot partitions, insufficient provisioned throughput.

Recognizing that autoscale ranges from 10% to 100% of max RU/s.

Understanding that RU cost varies by consistency level (weaker = cheaper).

Common Wrong Answers and Why Candidates Choose Them

1.

'RU/s is the same as IOPS.' Many candidates confuse RU/s with IOPS because both measure throughput. But RU/s is a composite metric including CPU and memory. The exam tests that RU/s is not equivalent to IOPS.

2.

'Serverless has no throughput limit.' Candidates assume serverless is unlimited because it's consumption-based. Actually, serverless caps at 5,000 RU/s per container. The exam expects you to know this limit.

3.

'Provisioned throughput is always cheaper than serverless.' This is false for low-traffic workloads. Serverless can be cheaper if average RU/s is low. The exam tests cost trade-offs.

4.

'All operations cost 1 RU.' Candidates forget that writes, queries, and stored procedures cost more. The exam gives scenarios where a write costs 5 RUs and asks to calculate total RU/s.

Specific Numbers and Terms on the Exam

400 RU/s (minimum provisioned)

5,000 RU/s (serverless max)

1 RU for 1 KB point read (Session)

5 RUs for 1 KB point write

Autoscale range: 10% to 100% of max

HTTP 429 error code

x-ms-request-charge header

x-ms-retry-after-ms header

Edge Cases and Exceptions

Strong consistency: Not supported with multi-region writes. If asked, you cannot have both.

Stored procedures: Cost 5 RUs per invocation plus operation RUs. This is a common trick question.

Change feed: Reading change feed consumes RUs (1 RU per read of 1 KB).

Partition merge: Not supported; you cannot reduce RU/s below what is needed for existing partitions.

How to Eliminate Wrong Answers

If a question asks about cost optimization, look for options that mention reducing document size, using weaker consistency, or excluding indexing paths.

If a question involves 429 errors, eliminate answers that suggest changing consistency to Strong (which increases RU cost) or adding more indexes (which increases write RU).

If comparing provisioned vs. serverless, remember that serverless is best for intermittent or low-throughput workloads; provisioned is for predictable high throughput.

Always calculate RU/s by summing all operations per second, not per minute or per hour.
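The per-second rule in the last tip is easy to get wrong when a scenario quotes rates per minute or per hour. A minimal sketch of the conversion, with illustrative function names and the chapter's 5 RUs per 1 KB write:

```python
def ops_per_second(operations: float, window_seconds: float) -> float:
    """Convert an operation count over a time window into a per-second rate."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return operations / window_seconds


def ru_per_second(operations: float, window_seconds: float, ru_per_operation: float) -> float:
    """RU/s demand for a count of operations spread evenly over a time window."""
    return ops_per_second(operations, window_seconds) * ru_per_operation


if __name__ == "__main__":
    # 100,000 one-kilobyte writes per minute at 5 RUs each
    print(round(ru_per_second(100_000, 60, 5)))
```

Multiplying 100,000 by 5 without dividing by 60 would overstate the requirement sixtyfold; the per-second figure is about 8,333 RU/s.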

Key Takeaways

1 RU = 1 point read of 1 KB at Session consistency.

Minimum provisioned throughput: 400 RU/s per container or database.

Serverless max throughput: 5,000 RU/s per container.

Autoscale ranges from 10% to 100% of the max RU/s set.

HTTP 429 errors indicate rate-limiting due to exceeded RU/s or hot partitions.

Write operations cost 5 RUs per 1 KB (Session consistency).

Weaker consistency levels reduce read RU cost.

Use `x-ms-request-charge` header to see actual RU consumption per operation.

Partition key choice directly affects RU distribution and throttling.

The Capacity Planner tool helps estimate required RU/s before deployment.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Provisioned Throughput

You set a specific RU/s value (minimum 400).

You pay per hour for the provisioned RU/s, regardless of usage.

Supports autoscale (scales from 10% to 100% of max).

No upper limit on RU/s per container (scalable).

Best for predictable, high-throughput workloads.

Serverless

No provisioning; pay per consumed RU.

Maximum 5,000 RU/s per container.

No autoscale; throughput is limited by serverless cap.

No minimum RU/s; cost is consumption-based.

Best for intermittent or low-traffic workloads.

Manual Scaling

You manually change RU/s as needed.

Cost is based on the set RU/s, even if unused.

Requires monitoring and manual intervention.

No scaling delays; immediate change.

Suitable for steady, predictable workloads.

Autoscale

Scales automatically between 10% and 100% of max RU/s.

You are billed each hour for the highest RU/s the system scaled to in that hour, not for the configured max (unless the workload actually reached it).

No manual intervention needed; reacts to traffic.

Scaling takes effect within minutes.

Suitable for variable or unpredictable workloads.

Watch Out for These

Mistake

RU/s is the same as the number of requests per second.

Correct

RU/s measures resource consumption, not request count. A single complex query can consume 100 RUs, the same as 100 simple 1 KB point reads, so two workloads with very different request counts can consume identical RU/s.

Mistake

Serverless has no throughput limits.

Correct

Serverless is capped at 5,000 RU/s per container. If your workload exceeds that, you must use provisioned throughput.

Mistake

Provisioned throughput is always more expensive than serverless.

Correct

Provisioned throughput can be cheaper for steady, high-throughput workloads because serverless has a higher per-RU price. Serverless is cheaper for low or intermittent usage.

Mistake

Autoscale always provisions the max RU/s.

Correct

Autoscale scales between 10% and 100% of the max RU/s based on demand. It does not always run at max.

Mistake

You can set RU/s below 400 for a container.

Correct

The minimum manually provisioned throughput per container is 400 RU/s. You cannot set it lower, even if you have no traffic.


Frequently Asked Questions

What is the minimum RU/s for a Cosmos DB container?

The minimum for manually provisioned throughput on a container is 400 RU/s. For autoscale, you set a max RU/s (minimum 1,000), and the system scales down to 10% of that (100 RU/s) when idle. Serverless has no minimum but caps at 5,000 RU/s.

How do I know how many RUs my query consumed?

Check the `x-ms-request-charge` header in the response. For example, in the .NET SDK, you can access `FeedResponse.RequestCharge`. The Azure Portal also shows RU consumption in the Metrics blade under 'Total Request Units'.

What causes HTTP 429 errors in Cosmos DB?

HTTP 429 (Too Many Requests) occurs when the request rate exceeds the provisioned RU/s for the container or a specific partition. Possible causes: insufficient provisioned throughput, hot partition (one partition receiving more traffic than its share), or exceeding the serverless limit. Solutions: increase RU/s, enable autoscale, optimize queries, or repartition data.

Can I change from provisioned to serverless?

Not in place. The capacity mode (provisioned or serverless) is chosen at the account level, so moving a provisioned workload to serverless means creating a serverless account and migrating the data into it. Plan the migration using Azure Data Factory or the bulk executor.

Is autoscale cheaper than manual scaling?

Autoscale can be cheaper for variable workloads because each hour is billed at the highest RU/s the system actually scaled to, which may be well below a fixed value provisioned for peak load. For steady workloads, manual throughput is often cheaper because autoscale RU/s are billed at a higher rate than manually provisioned RU/s.

Does consistency level affect write RU cost?

No. Write RU cost is the same regardless of consistency level. Consistency level only affects read RU costs. Writes always require a quorum commit, consuming the same resources.

What is the RU cost of a stored procedure?

Each stored procedure invocation costs 5 RUs, plus the RU cost of the operations performed inside the procedure. For example, if the procedure reads 10 documents (1 RU each) and writes 1 document (5 RUs), total = 5 + 10 + 5 = 20 RUs.
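The arithmetic in that answer can be written as a small helper. The 5 RU invocation charge and the per-operation costs are this chapter's figures; the function name is illustrative.

```python
SPROC_INVOCATION_RU = 5  # per-invocation charge as stated in this chapter


def stored_procedure_cost(reads: int, writes: int, read_ru: float = 1, write_ru: float = 5) -> float:
    """Invocation charge plus the cost of the reads and writes performed inside."""
    return SPROC_INVOCATION_RU + reads * read_ru + writes * write_ru


if __name__ == "__main__":
    # 10 reads of 1 KB documents and 1 write, as in the example above
    print(stored_procedure_cost(reads=10, writes=1))
```

This reproduces the total of 20 RUs from the worked example.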
