This chapter covers DynamoDB capacity modes—on-demand and provisioned—a core topic for the SAA-C03 exam. Understanding when to use each mode and how read/write capacity units (RCUs/WCUs) work is critical for designing cost-effective, high-performance applications. Approximately 10-15% of exam questions touch DynamoDB, with capacity mode decisions appearing in at least 2-3 questions. This chapter will give you the precise mechanisms, defaults, and exam traps you need to answer those questions correctly.
Think of DynamoDB as a toll road with multiple lanes. There are two ways to pay: on-demand (like a toll booth that charges per car) or provisioned (like buying a monthly pass for a specific number of trips). With on-demand, you pay per request—no planning needed, but the cost per request is higher. With provisioned, you commit to a certain number of reads and writes per second (RCUs and WCUs) and pay a flat rate for that capacity, much like a monthly pass that gives you a set number of lane uses. If you exceed your provisioned capacity, you get throttled—your car is forced to wait or take a detour—unless you enable auto-scaling, which automatically adjusts your pass level based on traffic. The toll road analogy extends to burst capacity: if you don't use all your passes in a given second, you can accumulate tokens (like unused passes) to handle short spikes, up to 5 minutes of accumulated capacity. The key difference: on-demand is like a pay-per-car booth with no traffic limits but higher per-car cost; provisioned is like a prepaid pass that gives you predictable cost and performance, but you must plan your capacity or risk throttling.
What is DynamoDB Capacity?
DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond latency at any scale. To achieve this, it allocates resources based on the capacity you configure. Capacity is measured in Read Capacity Units (RCUs) and Write Capacity Units (WCUs). An RCU represents one strongly consistent read per second for an item up to 4 KB in size. A WCU represents one write per second for an item up to 1 KB. For items larger than these thresholds, you consume multiple units proportionally. For example, reading a 6 KB item strongly consumes 2 RCUs (6/4 rounded up). Eventually consistent reads consume half the RCUs (0.5 per 4 KB), so a 6 KB eventually consistent read consumes 1 RCU.
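The rounding rules above can be sketched in a few lines of Python. The function names here are illustrative only and are not part of any AWS SDK; the sketch assumes sizes are given in bytes and that 1 KB means 1024 bytes.

```python
import math

def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """RCUs consumed by reading a single item (hypothetical helper)."""
    # Reads are metered in 4 KB blocks, rounded up, with a minimum of one block.
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return blocks * (1.0 if strongly_consistent else 0.5)

def write_capacity_units(item_size_bytes):
    """WCUs consumed by writing a single item (hypothetical helper)."""
    # Writes are metered in 1 KB blocks, rounded up, with a minimum of one block.
    return max(1, math.ceil(item_size_bytes / 1024))

print(read_capacity_units(6 * 1024))                             # 2.0
print(read_capacity_units(6 * 1024, strongly_consistent=False))  # 1.0
print(write_capacity_units(2560))                                # 3
```

The first two calls reproduce the 6 KB example from the text; the third shows a 2.5 KB write rounding up to 3 WCUs.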
On-Demand Capacity Mode
On-demand mode is a pay-per-request pricing model where DynamoDB automatically scales to handle your workload. You don't specify read/write capacity; you pay only for the reads and writes your application actually performs, billed as read request units and write request units that follow the same 4 KB and 1 KB sizing rules as RCUs and WCUs. This mode is ideal for unpredictable traffic, new applications, or workloads with significant variability. DynamoDB can absorb sudden spikes up to the per-table throughput quota (default 40,000 RCU and 40,000 WCU per table, which can be raised through a quota increase request). The cost per request is higher than the equivalent fully utilized provisioned capacity, roughly 2.5x as a rule of thumb, although exact prices vary by Region and change over time. The AWS Free Tier allowance of 25 RCUs and 25 WCUs applies to provisioned capacity, not to on-demand requests. The key exam point: on-demand is NOT unlimited; the per-table limit is a soft limit, and sustained usage beyond the default requires a service quota increase.
Provisioned Capacity Mode
In provisioned mode, you explicitly set the number of RCUs and WCUs your table should support. DynamoDB reserves the necessary infrastructure to guarantee that capacity. You pay for the provisioned capacity regardless of actual usage. This mode is cost-effective for predictable workloads. You can also enable Auto Scaling, which adjusts capacity based on CloudWatch metrics (ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits) using a target utilization (default 70%). Auto Scaling uses a cooldown period (default 5 minutes) between scaling actions to avoid thrashing. Without Auto Scaling, if you exceed your provisioned capacity, you get ProvisionedThroughputExceededException errors (HTTP 400). The SDK automatically retries with exponential backoff, but sustained throttling degrades performance.
Burst Capacity
DynamoDB provides a burst pool for provisioned tables. For every second in which you don't use your full provisioned capacity, DynamoDB banks the unused capacity, retaining up to 5 minutes (300 seconds) of it. You can spend this banked capacity to absorb short spikes above your provisioned rate. For example, if you provision 1000 WCU but use only 800 for 300 seconds, you accumulate 200 * 300 = 60,000 capacity units, enough to sustain 1200 WCU for the next 300 seconds or 1600 WCU for 100 seconds. The pool is capped at 5 minutes of provisioned capacity, so for 1000 WCU the maximum is 1000 * 300 = 300,000 units: a table that has been idle for 5 minutes could sustain 2000 WCU for 5 minutes, after which it returns to 1000 WCU. The pool can also be spent faster than that, in a shorter and steeper burst. Once the burst pool is empty, requests above the provisioned rate are throttled. The exam tests that burst capacity is limited to 5 minutes of unused provisioned capacity; treat it as a best-effort buffer, not a substitute for provisioning enough capacity.
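As a rough illustration of the burst pool arithmetic, here is a simplified token-bucket model. It is a sketch under stated assumptions, not DynamoDB's actual implementation: it treats the table as a single bucket, lets the whole pool be spent in any one second, and ignores per-partition limits. The function name and parameters are hypothetical.

```python
def simulate_burst(provisioned, demand_per_second, retention_seconds=300):
    """Return the capacity throttled in each second of a demand trace.

    Simplified model: unused provisioned capacity is banked each second,
    capped at `retention_seconds` worth of provisioned capacity.
    """
    pool_cap = provisioned * retention_seconds
    pool = 0
    throttled = []
    for demand in demand_per_second:
        available = provisioned + pool
        used = min(demand, available)
        # Using less than provisioned refills the pool; using more drains it.
        pool = min(pool_cap, pool + provisioned - used)
        throttled.append(demand - used)
    return throttled

# 300 s at 800 WCU banks 60,000 units, which then covers 300 s at 1200 WCU.
trace = [800] * 300 + [1200] * 400
throttled = simulate_burst(1000, trace)
print(sum(throttled[:600]))  # 0
print(throttled[600])        # 200
```

In this model the first 300 seconds of the spike are served entirely from the pool, and throttling of the extra 200 WCU begins only once the pool is empty.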
Adaptive Capacity
DynamoDB also has adaptive capacity, which automatically splits partitions and rebalances throughput when a hot partition is detected. This is enabled by default for all tables. It helps distribute load more evenly, but it does not eliminate the need for proper partition key design. Adaptive capacity can increase the total throughput of a table by spreading load across more partitions, but it cannot exceed the table's provisioned capacity (or on-demand limits). The exam often tests that adaptive capacity helps with uneven access patterns but is not a substitute for good key design.
How to Choose Between On-Demand and Provisioned
The SAA-C03 exam expects you to know the following decision criteria:
Use on-demand for: unpredictable traffic, new applications, light or infrequent workloads, or when you don't want to manage capacity.
Use provisioned for: predictable traffic, cost optimization for steady workloads, or when you need reserved capacity for guaranteed throughput.
Use provisioned with Auto Scaling for: workloads with moderate variability that can tolerate scaling delays (cooldown period).
Also consider DynamoDB Accelerator (DAX) for read-heavy workloads to reduce read capacity consumption (DAX caches reads, reducing RCU usage).
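The cost side of the criteria above comes down to a break-even utilization: the fraction of provisioned capacity you must actually use before provisioned becomes cheaper than on-demand. The sketch below assumes each request consumes exactly one capacity unit, and the two prices are placeholder values chosen to match this chapter's 2.5x rule of thumb, not current AWS prices.

```python
def break_even_utilization(on_demand_price_per_million, provisioned_price_per_unit_hour):
    """Average utilization above which provisioned capacity is cheaper."""
    # One capacity unit used every second for an hour serves 3600 requests.
    on_demand_cost_for_full_hour = 3600 * on_demand_price_per_million / 1_000_000
    return provisioned_price_per_unit_hour / on_demand_cost_for_full_hour

# Placeholder prices (assumptions for illustration only).
ON_DEMAND_PER_MILLION = 1.25
PROVISIONED_PER_UNIT_HOUR = 0.0018

threshold = break_even_utilization(ON_DEMAND_PER_MILLION, PROVISIONED_PER_UNIT_HOUR)
print(f"{threshold:.0%}")  # 40%
```

With these placeholder numbers, a table that averages more than about 40% utilization of its provisioned capacity is cheaper in provisioned mode; a table that sits mostly idle is cheaper on-demand.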
Reserved Capacity
For provisioned mode, you can purchase reserved capacity (1-year or 3-year terms) for additional cost savings (up to 50% discount). This is not a capacity mode but a pricing option. The exam may test that reserved capacity is only available for provisioned mode, not on-demand.
Throttling and Error Handling
When a table is throttled, DynamoDB returns a ProvisionedThroughputExceededException. The AWS SDKs automatically retry with exponential backoff (starting with 50 ms delay, doubling each time up to 20 seconds). However, consistent throttling can cause latency spikes. To avoid throttling, you can:
Increase provisioned capacity
Switch to on-demand
Use Auto Scaling
Optimize partition key design
Use DAX for reads
Use S3 for large objects (DynamoDB items are limited to 400 KB, so store larger objects in S3 with a pointer in DynamoDB)
Interacting with Other Services
DynamoDB integrates with AWS Lambda for triggers (streams), with Kinesis for real-time streaming, and with CloudWatch for monitoring. The exam often presents scenarios where you need to choose a capacity mode based on cost and performance trade-offs, often in combination with these services.
Create a DynamoDB Table
When you create a table via the AWS Console, CLI, or SDK, you select a capacity mode: On-Demand or Provisioned. For provisioned, you specify initial RCU and WCU values. The number of partitions a new table starts with depends on the throughput it needs. As data and traffic grow, DynamoDB automatically splits partitions (each partition can hold up to 10 GB and support up to 3,000 RCU and 1,000 WCU). The partition key determines how data is distributed. Choosing a good partition key is critical to avoid hot partitions.
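To make the choice concrete, the sketch below builds the request parameters for DynamoDB's CreateTable operation in each mode. BillingMode, ProvisionedThroughput, and the other keys are real CreateTable parameters; the helper function, table name, and key name are hypothetical. The block only constructs the dictionary, so it runs without AWS credentials.

```python
def table_request(name, partition_key, mode, rcu=None, wcu=None):
    """Build CreateTable parameters for the chosen capacity mode."""
    request = {
        "TableName": name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
    }
    if mode == "on-demand":
        # On-demand tables take no throughput settings.
        request["BillingMode"] = "PAY_PER_REQUEST"
    elif mode == "provisioned":
        if rcu is None or wcu is None:
            raise ValueError("provisioned mode needs both rcu and wcu")
        request["BillingMode"] = "PROVISIONED"
        request["ProvisionedThroughput"] = {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        }
    else:
        raise ValueError(f"unknown capacity mode: {mode}")
    return request

params = table_request("Products", "product_id", "provisioned", rcu=100, wcu=50)
print(params["BillingMode"])  # PROVISIONED
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("dynamodb").create_table(**params)`.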
Read/Write Operations Consume Capacity
Each GetItem, Query, Scan, PutItem, UpdateItem, or DeleteItem consumes RCU/WCU. For reads, the size of the item(s) read determines RCU consumption: 1 RCU per 4 KB for strongly consistent reads, 0.5 RCU per 4 KB for eventually consistent reads. For queries and scans, DynamoDB uses the total size of all items read, before any filter expression is applied, so a Scan is charged for everything it examines rather than only what it returns. Writes consume 1 WCU per 1 KB. Transactions (TransactGetItems, TransactWriteItems) consume double capacity (2 RCU per 4 KB, 2 WCU per 1 KB). The exam often tests these exact numbers.
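One consequence of these rules is that the same items can cost different amounts depending on the operation. A Query sums the sizes of the items it reads and then rounds up to the next 4 KB, while BatchGetItem rounds each item up individually, as if it were a separate GetItem. The helper names below are illustrative, and the sketch assumes strongly consistent reads unless told otherwise.

```python
import math

def query_rcus(item_sizes_bytes, strongly_consistent=True):
    """RCUs for a Query: sum the item sizes first, then round up to 4 KB."""
    blocks = max(1, math.ceil(sum(item_sizes_bytes) / 4096))
    return blocks * (1.0 if strongly_consistent else 0.5)

def batch_get_rcus(item_sizes_bytes, strongly_consistent=True):
    """RCUs for BatchGetItem: round each item up to 4 KB, then add."""
    blocks = sum(max(1, math.ceil(size / 4096)) for size in item_sizes_bytes)
    return blocks * (1.0 if strongly_consistent else 0.5)

ten_small_items = [1536] * 10  # ten items of 1.5 KB each
print(query_rcus(ten_small_items))      # 4.0
print(batch_get_rcus(ten_small_items))  # 10.0
```

For many small items under one partition key, a Query is therefore considerably cheaper than fetching the same items one by one.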
Monitor CloudWatch Metrics
CloudWatch metrics like ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ProvisionedReadCapacityUnits, ProvisionedWriteCapacityUnits, and ThrottledRequests are key. For on-demand, ConsumedReadCapacityUnits reflects actual usage; there is no provisioned metric. Auto Scaling uses ConsumedCapacity/ProvisionedCapacity ratio (target utilization default 70%). ThrottledRequests > 0 indicates you are exceeding capacity. The exam may ask you to interpret a graph showing throttling and choose a solution.
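When reading these metrics, keep in mind that the Sum statistic of ConsumedReadCapacityUnits covers the whole period, so it has to be divided by the period length before comparing it with the per-second provisioned value. A minimal sketch of that conversion, with a hypothetical function name and sample numbers:

```python
def utilization(consumed_sum, period_seconds, provisioned_units):
    """Fraction of provisioned capacity used during one CloudWatch period."""
    # Convert the period total into an average per-second rate.
    average_per_second = consumed_sum / period_seconds
    return average_per_second / provisioned_units

# 42,000 RCUs consumed in a 60-second period against 1,000 provisioned RCUs.
print(f"{utilization(42_000, 60, 1_000):.0%}")  # 70%
```

A result that stays above the Auto Scaling target utilization for several periods is the signal that triggers a scale-out.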
Auto Scaling Adjusts Capacity
If Auto Scaling is enabled, DynamoDB creates an AWS Application Auto Scaling target with a scaling policy. It uses CloudWatch alarms to trigger scaling. Cooldown periods are set separately for scale-in and scale-out; 5 minutes for each is the figure to remember for the exam. Scaling out (increasing capacity) is triggered sooner than scaling in (decreasing capacity), because scaling in waits for a longer stretch of sustained low utilization. The target utilization is configurable. Auto Scaling cannot scale down below the minimum capacity you set (default is 5 for both reads and writes). The exam tests that Auto Scaling is reactive, not proactive, and may not handle sudden spikes.
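The effect of a target utilization can be approximated with simple arithmetic: the policy aims for a provisioned value at which current consumption equals the target percentage, clamped to the configured minimum and maximum. This is a sketch of that idea, not the Application Auto Scaling algorithm itself; the function name, the maximum of 40,000, and the use of integer percentages are assumptions made for the example.

```python
def target_capacity(consumed_per_second, target_percent=70, minimum=5, maximum=40_000):
    """Provisioned capacity that would put consumption at the target utilization."""
    # Ceiling division on integers avoids floating-point rounding surprises.
    desired = (consumed_per_second * 100 + target_percent - 1) // target_percent
    return max(minimum, min(maximum, desired))

print(target_capacity(700))     # 1000
print(target_capacity(1400))    # 2000
print(target_capacity(2))       # 5
print(target_capacity(50_000))  # 40000
```

The last two calls show the clamps at work: a nearly idle table stays at the minimum, and a very busy one cannot be scaled past the configured maximum.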
Handle Throttling with Exponential Backoff
When a request is throttled, the SDK retries automatically. The default retry strategy uses exponential backoff; the figures to remember are an initial delay of 50 ms, doubling each time up to a 20-second maximum, with up to 10 retries, although exact values vary by SDK and configuration. If all retries fail, the error is thrown to the application. For mission-critical applications, you should implement your own retry logic with jitter or use on-demand mode. The exam may ask about handling throttling in a serverless application using Lambda.
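A custom retry loop with jitter might look like the following sketch. It uses the figures from the text (50 ms base, 20 s cap, 10 retries) as defaults and applies "full jitter", where each delay is a random value between zero and the exponential ceiling. ThrottledError is a hypothetical stand-in for the throttling error an SDK would raise, and the `sleep` parameter exists only so the example can run without actually waiting.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""

def call_with_retries(operation, retries=10, base=0.05, cap=20.0, sleep=time.sleep):
    """Call `operation`, retrying throttled attempts with full-jitter backoff."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == retries:
                raise  # retries exhausted: surface the error to the caller
            ceiling = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, ceiling))

# Demo: an operation that is throttled three times before succeeding.
attempts = {"count": 0}

def flaky_write():
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise ThrottledError()
    return "ok"

delays = []
print(call_with_retries(flaky_write, sleep=delays.append))  # ok
print(len(delays))                                          # 3
```

Randomizing the delay keeps many throttled clients, such as concurrent Lambda invocations, from retrying in lockstep and hitting the table at the same instant.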
Scenario 1: E-commerce Product Catalog
An e-commerce company uses DynamoDB to store product inventory. Traffic is predictable: high during business hours, low at night, with occasional spikes during flash sales. Initially, they used provisioned capacity with Auto Scaling set to 70% target utilization. During a flash sale, traffic spiked 10x in seconds. Auto Scaling could not react fast enough (cooldown period of 5 minutes), causing throttling and lost sales. They switched to on-demand mode for the product table. Cost increased by 2.5x for normal traffic, but the ability to handle spikes without throttling was worth it. They also implemented DAX for read-heavy product pages, reducing RCU consumption by 80%. The lesson: for workloads with unpredictable spikes, on-demand is safer, but you can combine with DAX to mitigate cost.
Scenario 2: IoT Sensor Data Ingestion
A smart city project ingests sensor data from thousands of devices at a steady rate of 10,000 writes per second. The data is time-series, so they use a partition key of device_id combined with a sort key of timestamp. They chose provisioned capacity with 10,000 WCU (steady state). They also purchased reserved capacity for 1-year term, saving 30% compared to on-demand. They enabled Auto Scaling to handle occasional device firmware updates that double the write rate. The system runs smoothly. The key: predictable workloads with steady throughput benefit from provisioned + reserved capacity for maximum cost savings.
Scenario 3: Gaming Leaderboard
A mobile game uses DynamoDB for a real-time leaderboard. Traffic is highly variable: low during weekdays, high on weekends, and extreme spikes during tournaments. The game uses a global secondary index (GSI) for leaderboard queries. Initially, they used provisioned mode but faced throttling on the GSI, whose capacity is provisioned separately from the base table and had been set too low. They switched to on-demand to eliminate throttling. However, on-demand cost was high due to heavy writes during tournaments. They optimized by using DynamoDB Streams to aggregate scores in a separate analytics table with provisioned capacity. This hybrid approach reduced costs while maintaining performance. The exam often tests that GSIs consume separate capacity from the base table, and throttling on a GSI can occur even if the base table has sufficient capacity.
What SAA-C03 Tests on DynamoDB Capacity
The exam guide covers this topic in two places: Domain 3 (Design High-Performing Architectures), Task Statement 3.3 on high-performing database solutions, and Domain 4 (Design Cost-Optimized Architectures), Task Statement 4.3 on cost-optimized database solutions. Both test your ability to choose between on-demand and provisioned capacity based on workload characteristics. You will see scenario-based questions where you must recommend the most cost-effective or performant solution.
Common Wrong Answers and Why Candidates Choose Them
Choosing provisioned for unpredictable workloads – Candidates think provisioned is cheaper, but they forget that over-provisioning to handle spikes is wasteful. The correct answer is on-demand for unpredictable workloads.
Choosing on-demand for steady workloads – Candidates think on-demand is simpler, but they ignore the cost premium. The correct answer is provisioned with reserved capacity for cost savings.
Thinking Auto Scaling eliminates throttling instantly – Candidates assume Auto Scaling reacts immediately, but they forget the 5-minute cooldown. The correct answer for sudden spikes is on-demand.
Confusing RCU and WCU sizes – Candidates mix up 4 KB for reads and 1 KB for writes. The exam tests these exact numbers.
Specific Numbers and Terms
RCU: 1 strongly consistent read per second for up to 4 KB; 0.5 RCU for eventually consistent.
WCU: 1 write per second for up to 1 KB.
Transaction reads: 2 RCU per 4 KB; transaction writes: 2 WCU per 1 KB.
Burst capacity: up to 5 minutes of provisioned capacity.
Auto Scaling target utilization: default 70%.
Auto Scaling cooldown: 5 minutes.
On-demand default limit: 40,000 RCU and 40,000 WCU per table (soft limit).
Reserved capacity: 1-year or 3-year term, up to 50% discount.
Edge Cases and Exceptions
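GSI throttling: In provisioned mode, each GSI has its own read and write capacity, set separately from the base table (local secondary indexes, by contrast, share the table's capacity). Throttling on a GSI can occur even if the base table is not throttled, and a GSI with too little write capacity can cause writes to the base table to be throttled. The exam loves this.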
Adaptive capacity: This helps with hot partitions but does not increase total table capacity beyond provisioned or on-demand limits.
Large items: The maximum item size is 400 KB. Store larger objects in S3 with a pointer in DynamoDB; keeping items small also avoids excessive capacity consumption.
Conditional writes: These still consume WCU even if the condition fails.
Empty responses: A Query or Scan that returns no items still consumes RCU for the read.
How to Eliminate Wrong Answers
Identify workload pattern: predictable vs. unpredictable, steady vs. spiky.
If cost is a concern and workload is predictable, eliminate on-demand.
If performance under spikes is critical and cost is secondary, eliminate provisioned.
If Auto Scaling is mentioned, check for cooldown delays—if the spike is sudden, Auto Scaling won't help.
Always check for reserved capacity options—only available for provisioned.
RCU: 1 strongly consistent read of up to 4 KB per second; eventually consistent reads consume 0.5 RCU.
WCU: 1 write of up to 1 KB per second; larger items consume proportional WCU.
Transaction operations consume double capacity: 2 RCU per 4 KB, 2 WCU per 1 KB.
Burst capacity for provisioned tables is limited to 5 minutes of unused capacity.
Auto Scaling default target utilization is 70% with a 5-minute cooldown.
On-demand has a default throughput limit of 40,000 RCU and 40,000 WCU per table (soft limit).
GSIs on provisioned tables have their own provisioned capacity, separate from the table's; an under-provisioned GSI can cause throttling even when the table has capacity to spare.
Reserved capacity is only for provisioned mode, offering up to 50% discount for 1- or 3-year terms.
These come up on the exam all the time. Here's how to tell them apart.
On-Demand Capacity
Pay per request (RCU/WCU consumed)
No capacity planning needed
Handles unpredictable spikes automatically
Higher cost per unit (approx 2.5x)
Soft limit of 40,000 RCU/WCU per table
Provisioned Capacity
Pay for provisioned capacity regardless of usage
Requires capacity planning or Auto Scaling
Burst capacity limited to 5 minutes of unused capacity
Lower cost per unit, especially with reserved capacity
Can be throttled if exceeded
Mistake
On-demand mode has no throughput limits.
Correct
On-demand has a soft limit of 40,000 RCU and 40,000 WCU per table by default. You can request a limit increase via a support ticket, but it's not unlimited.
Mistake
Provisioned capacity guarantees exactly that throughput at all times.
Correct
Provisioned capacity guarantees throughput under normal conditions, but burst capacity is limited to 5 minutes of unused capacity. If you exceed the burst pool, you get throttled.
Mistake
Auto Scaling handles sudden traffic spikes instantly.
Correct
Auto Scaling has a cooldown period of 5 minutes (default) and reacts to CloudWatch metrics, which have a 1-minute granularity. Sudden spikes can cause throttling before scaling kicks in.
Mistake
A GSI shares the base table's capacity, so sizing the table is enough.
Correct
On a provisioned table, each GSI has its own read and write capacity, set separately from the base table; only local secondary indexes share the table's capacity. If a GSI has too little write capacity, writes to the base table can be throttled even when the table itself has capacity to spare.
Mistake
Reserved capacity is available for on-demand mode.
Correct
Reserved capacity is only available for provisioned mode. On-demand is pay-per-request with no upfront commitments.
On-demand is a pay-per-request model where you don't specify capacity; DynamoDB automatically scales to handle traffic. Provisioned requires you to set read/write capacity units (RCU/WCU) and you pay for that capacity regardless of usage. On-demand is simpler and handles spikes, but is more expensive per request. Provisioned is cheaper for predictable workloads but can throttle if exceeded. The exam tests choosing based on workload predictability and cost sensitivity.
Burst capacity lets you bank unused provisioned capacity, up to 5 minutes (300 seconds) of it. For example, if you provision 1000 WCU but use only 800 for 5 minutes, you accumulate 200 * 300 = 60,000 capacity units, enough to sustain 1200 WCU for the next 5 minutes. The pool is capped at 5 minutes of provisioned capacity, so a 1000 WCU table that has been idle for 5 minutes holds 300,000 units and could sustain 2000 WCU for 5 minutes. After the burst pool is exhausted, you return to your provisioned capacity and any excess requests are throttled.
Yes, you can switch between capacity modes without downtime, although DynamoDB limits how often a table can be switched within a 24-hour period. Switching from provisioned to on-demand requires that the table has no ongoing Auto Scaling operations. Switching from on-demand to provisioned requires you to specify initial RCU and WCU values. You may experience throttling if the new provisioned capacity is lower than current usage. The exam may test that you can change modes dynamically.
Throttling occurs when you exceed provisioned throughput or on-demand limits. DynamoDB returns a ProvisionedThroughputExceededException (HTTP 400). The AWS SDK retries automatically with exponential backoff (starting 50 ms, doubling up to 20 seconds, up to 10 retries). If retries fail, the error is returned to the application. To avoid throttling, increase capacity, switch to on-demand, or use Auto Scaling.
For a new application with unknown traffic patterns, start with on-demand to avoid throttling. Once traffic becomes predictable, consider switching to provisioned with Auto Scaling to reduce costs. The exam often recommends on-demand for new or spiky workloads and provisioned for steady, predictable workloads. Also consider reserved capacity for provisioned mode if you have long-term steady usage.
No, Auto Scaling is reactive and has a cooldown period (default 5 minutes). It cannot prevent throttling from sudden spikes that exceed the current provisioned capacity and burst pool. For workloads with sudden, unpredictable spikes, on-demand is a better choice. Auto Scaling is best for workloads with gradual changes.
GSIs consume their own read/write capacity. On a provisioned table, you set each GSI's RCUs and WCUs separately from the base table's, and you can apply Auto Scaling to each index. Throttling on a GSI can therefore occur independently of the base table. For on-demand tables, GSIs also use on-demand. The exam tests that GSI capacity must be considered separately.