DVA-C02 | Chapter 89 of 101 | Objective 4.2

AWS Service Quotas and Throttling Handling

This chapter covers AWS Service Quotas and throttling handling, a critical topic for the DVA-C02 exam under Domain 4: Troubleshooting (Objective 4.2). Understanding how AWS enforces limits on API requests and how to design applications to handle throttling is essential for building resilient, scalable systems. Approximately 5-8% of exam questions touch on quotas, throttling, and retry strategies. You will learn the mechanisms behind service quotas, the token bucket algorithm used for throttling, how to monitor and request increases, and best practices for handling throttling errors in your code.

25 min read
Intermediate
Updated May 31, 2026

Highway Toll Booths with Limited Lanes

Imagine a busy highway with a toll plaza at a junction. The highway is an AWS Region, the cars are API requests, and each toll lane stands for one API operation with its own limit on how quickly cars can pass. Every lane has a bucket of tokens, and a car needs one token to go through. The bucket starts full (burst capacity), so a sudden rush of cars gets through at once. After that, tokens return at a steady rate (the steady-state limit), and cars can pass only as fast as tokens refill. A car that arrives when the bucket is empty is not parked in a waiting area; it is turned away and must come back later (a throttling error), ideally after a short, randomized wait so that the rejected cars do not all return at the same moment. If you keep sending cars faster than the refill rate, you will keep getting turned away. To raise the ceiling, you can request a quota increase, which is like asking the highway authority to widen the plaza: it requires justification and is not instantaneous. Some services (e.g., DynamoDB with adaptive capacity or auto scaling) adjust capacity automatically in response to traffic, but only within preset limits. The analogy mirrors AWS's throttling mechanism: quotas prevent abuse, burst capacity absorbs short spikes, and steady-state rates govern long-term throughput.

How It Actually Works

What Are AWS Service Quotas and Why Do They Exist?

AWS Service Quotas (formerly known as Service Limits) are the maximum number of resources or API requests that an AWS account can use per region for each service. They exist to protect the AWS infrastructure from abuse or unintentional runaway usage, ensuring fair resource distribution among all customers. Quotas are divided into two types: resource-based quotas (e.g., maximum number of EC2 instances per region) and rate-based quotas (e.g., maximum API requests per second for a service like DynamoDB or AWS Lambda). For the DVA-C02 exam, the focus is primarily on rate-based quotas and how to handle throttling when requests exceed these limits.

How Throttling Works Internally: The Token Bucket Algorithm

Many AWS services (for example, API Gateway and the EC2 API) use a token bucket algorithm to enforce rate-based quotas. Each bucket has a capacity (the burst limit) and a refill rate (the steady-state limit). The bucket is initially full. When a request arrives, it consumes one token. If tokens remain, the request proceeds; if the bucket is empty, the request is throttled, and the caller receives an error such as ThrottlingException (HTTP 400) or RequestLimitExceeded (HTTP 503). Tokens are added back to the bucket at the refill rate, up to the bucket capacity. This allows short bursts of traffic up to the burst limit, but sustained traffic must stay within the steady-state rate.

For example, a DynamoDB table provisioned with 100 read capacity units (RCU) refills its bucket at 100 tokens per second. DynamoDB retains up to 5 minutes (300 seconds) of unused capacity, so after a quiet period the bucket can hold up to 30,000 RCU. A spike can then consume well above 100 RCU per second until that reserve is gone. After that, reads beyond the provisioned 100 RCU per second are throttled until tokens refill.
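The token bucket described above can be sketched in a few lines. This is a minimal illustration, not AWS's implementation: the `TokenBucket` class, its field names, and the tiny capacity and refill values are assumptions chosen so the behaviour is easy to trace by hand. The clock is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    """Illustrative token bucket; names and numbers are hypothetical."""
    capacity: float                 # burst limit: most tokens the bucket can hold
    refill_rate: float              # steady-state limit: tokens added per second
    tokens: Optional[float] = None
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        if self.tokens is None:
            self.tokens = self.capacity  # the bucket starts full

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last_refill
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a real service would return a throttling error here


bucket = TokenBucket(capacity=5, refill_rate=2)
burst = [bucket.allow(now=0.0) for _ in range(7)]  # 7 requests at the same instant
later = [bucket.allow(now=1.0) for _ in range(3)]  # 3 more requests one second later
print(burst)  # [True, True, True, True, True, False, False]
print(later)  # [True, True, False]
```

The first five requests drain the full bucket, the next two are rejected, and one second later only two tokens have been refilled.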

Key Components, Defaults, and Timers

Service Quotas: Accessible via AWS Service Quotas console, AWS CLI (aws service-quotas), or API. Each service has a set of quotas per region. For example, the default Lambda concurrent executions quota is 1,000 per region.

Throttling Errors: The most common error codes are ThrottlingException (many AWS APIs), ProvisionedThroughputExceededException (DynamoDB), TooManyRequestsException (API Gateway, Lambda), and RequestLimitExceeded (EC2 API). The HTTP status code depends on the service: typically 400 for ThrottlingException and ProvisionedThroughputExceededException, 429 (Too Many Requests) for API Gateway and Lambda, and 503 for some services such as the EC2 API.

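Retry Logic: AWS SDKs implement automatic retries with exponential backoff and jitter. They support three retry modes: legacy, standard, and adaptive. Standard and adaptive default to 3 total attempts (the initial call plus 2 retries), configurable through max_attempts; adaptive additionally applies client-side rate limiting. Legacy defaults vary by SDK (for example, the Ruby SDK defaults to 3 retries). Backoff starts from a small base delay that grows exponentially with each attempt, plus random jitter, up to a maximum of 20 seconds in standard mode.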

Burst vs. Steady-State: The burst limit is the bucket size, and the steady-state limit is the refill rate. For DynamoDB provisioned tables, burst capacity is up to 5 minutes (300 seconds) of unused provisioned capacity. For API Gateway, the default account-level limit per Region is a steady-state rate of 10,000 requests per second with a burst of 5,000 requests.

Queuing vs. Rejection: Most AWS APIs do not queue excess requests. API Gateway, for example, rejects requests over the limit immediately with 429 Too Many Requests, so any buffering has to be built into your design (for example, with an SQS queue).
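Because the error code and HTTP status differ by service, custom code usually needs one place that decides whether a failure is a throttle. The helper below is a hypothetical sketch: `is_throttling_error` and `THROTTLING_ERROR_CODES` are names invented here, and the set contains only the codes named in this section, so it is not exhaustive.

```python
# Error codes named in this section; real services define others as well.
THROTTLING_ERROR_CODES = {
    "ThrottlingException",
    "ProvisionedThroughputExceededException",
    "TooManyRequestsException",
    "RequestLimitExceeded",
}


def is_throttling_error(error_code: str, http_status: int) -> bool:
    """Return True when a failed call looks like throttling and is worth retrying."""
    if error_code in THROTTLING_ERROR_CODES:
        return True
    # 429 always means "Too Many Requests". A bare 400 or 503 is ambiguous,
    # so those are classified by error code only.
    return http_status == 429


print(is_throttling_error("ThrottlingException", 400))   # True
print(is_throttling_error("ValidationException", 400))   # False
print(is_throttling_error("UnknownError", 429))          # True
```

Keying on the error code first avoids retrying validation errors that share the 400 status with ThrottlingException.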

Configuration and Verification Commands

To view quotas:

aws service-quotas list-service-quotas --service-code lambda --region us-east-1

To request a quota increase:

aws service-quotas request-service-quota-increase --service-code lambda --quota-code L-B99A9384 --desired-value 1500

To monitor throttling events:

CloudWatch metrics: ThrottledRequests, ReadThrottleEvents, and WriteThrottleEvents (DynamoDB); Throttles (Lambda); 4XXError (API Gateway, which counts 429 responses together with other client errors).

CloudTrail logs: Look for ThrottlingException events.

Interaction with Related Technologies

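Lambda: Behavior under throttling depends on the invocation type. Synchronous invocations fail immediately with a 429 TooManyRequestsException. Asynchronous invocations are queued by Lambda and retried for up to 6 hours; events that still fail can be sent to a dead-letter queue or an on-failure destination. With event source mappings, throttled SQS messages return to the queue and are retried (and can move to a DLQ through the queue's redrive policy), while stream sources such as DynamoDB Streams retry the batch until it succeeds or the records expire.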

API Gateway: Throttling applies at the account level per Region and can be tightened per stage and per method. You can also configure usage plans with rate limits and burst limits for individual API keys.

DynamoDB: Throttling occurs at the table or index level. Use DynamoDB Auto Scaling or on-demand mode to handle traffic spikes.

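SQS: Standard queues support a nearly unlimited number of API calls per second. FIFO queues are limited to 300 API calls per second per action (3,000 messages per second with batching) unless high-throughput mode is enabled. Use batch operations to reduce calls.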

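EC2 API: Throttling applies per account per Region, using token buckets shared by categories of actions. Non-mutating calls such as Describe* are a common source of RequestLimitExceeded errors when they are polled in tight loops. Use pagination, filters, and caching to reduce calls.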

Best Practices for Handling Throttling

1. Implement Exponential Backoff and Jitter: Use the AWS SDK's built-in retry mechanism. For custom code, implement exponential backoff with a base delay of 100 ms, doubling each time, up to a maximum delay of 20 seconds, and add random jitter.

2. Use Caching: Cache responses for read-heavy APIs (e.g., using ElastiCache or API Gateway caching).

3. Monitor and Alert: Set CloudWatch alarms on throttling metrics. Use AWS Trusted Advisor to check quota usage.

4. Request Quota Increases: Proactively request increases via the Service Quotas console, the AWS CLI, or a support case. Only quotas marked as adjustable (Lambda concurrent executions is one) can be raised, and approval is not always immediate.

5. Design for Degradation: Use circuit breakers, fallbacks, and graceful degradation. For example, if DynamoDB throttles, fall back to a cache (such as DAX or ElastiCache) or serve slightly stale data.

6. Let Capacity Scale Automatically: For DynamoDB, use on-demand mode or auto scaling to adjust capacity automatically.
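The "design for degradation" practice can be illustrated with a small circuit breaker that stops calling a throttled dependency for a cool-down period and serves a fallback instead. This is a simplified sketch under stated assumptions: `CircuitBreaker`, `ThrottledError`, the thresholds, and the injectable clock are all hypothetical names and values, and a production breaker would normally add a half-open probing state.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a service's throttling error."""


class CircuitBreaker:
    """Skip the primary call for `reset_after` seconds after repeated throttles."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()          # open: do not touch the primary
            self.opened_at = None          # cool-down over: close and start afresh
            self.failures = 0
        try:
            result = primary()
        except ThrottledError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                  # a success resets the failure count
        return result


def throttled_read():
    raise ThrottledError("simulated throttle")


breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0)
for _ in range(3):
    print(breaker.call(throttled_read, lambda: "cached value"))  # cached value
```

In the loop, the third call never reaches `throttled_read` because the breaker opened after the second failure.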

Common Exam Scenarios

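Scenario: A developer calls an API with 10 requests per second, but the quota is 5 requests per second. The SDK retries automatically with exponential backoff. The developer notices that some requests still fail after the SDK exhausts its retries. Solution: Increase the retry count or implement a custom retry policy with a longer maximum delay. If the overload is sustained rather than a short spike, retries alone will not help; reduce the call rate (batching, caching) or request a quota increase.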

Scenario: A Lambda function receives enough traffic to need 1,000 concurrent executions, but its reserved concurrency is 500. Invocations beyond the 500 concurrent executions are throttled. Solution: Increase the reserved concurrency or use SQS to buffer requests.

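Scenario: An API Gateway endpoint has a burst limit of 5,000 requests and a rate limit of 10,000 requests per second. A client sends 8,000 requests at the same instant. The first 5,000 drain the bucket and are served; the remaining 3,000 are throttled with 429 responses. Spread evenly across the second, the same 8,000 requests would all be served. Solution: Smooth the traffic with client-side throttling, or request a higher limit.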
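The Lambda scenario turns on concurrency, which depends on how long each invocation runs as well as on the request rate. A rough estimate, assuming the usual rule of thumb that concurrency is about requests per second multiplied by average duration in seconds, is sketched below; `required_concurrency` is a hypothetical helper name.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate concurrent executions needed for a steady request rate."""
    return math.ceil(requests_per_second * avg_duration_seconds)


# 1,000 requests per second at 0.5 s each fits within a reserved concurrency of 500.
print(required_concurrency(1000, 0.5))  # 500
# The same rate at 1 s each needs 1,000, so half the invocations would be throttled.
print(required_concurrency(1000, 1.0))  # 1000
```

Shortening the function's duration is therefore another way to relieve concurrency throttling, alongside raising the limit or buffering with SQS.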

Walk-Through

1. Identify Service Quotas

First, determine the relevant service quotas for the AWS services you are using. Use the AWS Service Quotas console or AWS CLI command `aws service-quotas list-service-quotas --service-code <service> --region <region>` to view current limits. For example, for Lambda, the default concurrent executions quota is 1,000 per region. Note that quotas are per region and per account. Also check the 'Applied quota value' to see the current limit. If you have previously requested an increase, the applied value may be higher than the default.
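The CLI command returns JSON, so the adjustable quotas can be picked out with a short script. The sketch below parses a hand-written sample rather than calling AWS: the field names (`Quotas`, `QuotaCode`, `QuotaName`, `Value`, `Adjustable`) follow the Service Quotas response shape, but the second entry and its `L-EXAMPLE` code are made up for illustration.

```python
import json

# Hand-written sample in the shape of `aws service-quotas list-service-quotas` output.
# The second entry is fictional and exists only to show the filter working.
SAMPLE_OUTPUT = """
{
  "Quotas": [
    {"ServiceCode": "lambda", "QuotaName": "Concurrent executions",
     "QuotaCode": "L-B99A9384", "Value": 1000.0, "Adjustable": true},
    {"ServiceCode": "lambda", "QuotaName": "Example fixed quota",
     "QuotaCode": "L-EXAMPLE", "Value": 10.0, "Adjustable": false}
  ]
}
"""


def adjustable_quotas(cli_json: str) -> dict:
    """Map quota code to (name, current value) for quotas that can be increased."""
    quotas = json.loads(cli_json)["Quotas"]
    return {
        q["QuotaCode"]: (q["QuotaName"], q["Value"])
        for q in quotas
        if q.get("Adjustable")
    }


print(adjustable_quotas(SAMPLE_OUTPUT))
# {'L-B99A9384': ('Concurrent executions', 1000.0)}
```

In practice the input would come from redirecting the CLI output to a file or piping it into the script.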

2. Monitor Throttling Events

Set up CloudWatch metrics to track throttling events. For example, DynamoDB emits a `ThrottledRequests` metric per table. Lambda emits `Throttles` metric. API Gateway emits `4XXError` for throttled requests. Create CloudWatch alarms to notify when throttling exceeds a threshold (e.g., more than 10 throttled requests per minute). Also enable detailed CloudTrail logging to capture `ThrottlingException` events, which include the caller identity and request parameters.
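An alarm like the one described can be expressed as a set of parameters. The sketch below only builds the parameter dictionary for a Lambda `Throttles` alarm, so it runs without AWS credentials; `throttle_alarm_params` and the alarm name are hypothetical, and the threshold of 10 per minute is the example value from the text. With boto3 installed, the dictionary could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`.

```python
def throttle_alarm_params(function_name: str, threshold: int = 10) -> dict:
    """Build CloudWatch alarm parameters for Lambda throttles on one function."""
    return {
        "AlarmName": f"{function_name}-throttles",   # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                  # evaluate one-minute totals
        "EvaluationPeriods": 1,
        "Threshold": threshold,        # alarm when more than `threshold` throttles
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttles
    }


params = throttle_alarm_params("checkout-handler")
print(params["AlarmName"], params["Threshold"])  # checkout-handler-throttles 10
```

An `AlarmActions` entry pointing at an SNS topic would be added to send notifications; it is left out here because the topic ARN is account-specific.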

3. Implement Retry Logic

In your application code, use the AWS SDK's built-in retry mechanism. The SDK automatically retries on throttling errors with exponential backoff and jitter. Default attempt counts depend on the SDK and retry mode: standard and adaptive modes default to 3 total attempts, while legacy defaults vary by SDK. For better resilience under sustained throttling, raise `max_attempts` and consider `adaptive` retry mode, which adds a client-side rate limiter. Example configuration for the AWS SDK for Ruby (v3): `Aws.config.update(retry_mode: 'adaptive', max_attempts: 8)`. If you are not using the SDK, implement custom retry logic with exponential backoff: delay = min(initial_delay * 2^attempt, max_delay) + random_jitter.
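For code that does not go through an SDK, the backoff formula above can be sketched as follows. This is an illustration under assumptions, not the SDK's algorithm: `ThrottledError`, `backoff_delay`, and `call_with_retries` are hypothetical names, the jitter is drawn uniformly from zero up to the initial delay, and the 100 ms and 20 second values are the ones quoted in this chapter.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a service's throttling error."""


def backoff_delay(attempt, initial_delay=0.1, max_delay=20.0, rng=random.random):
    """delay = min(initial_delay * 2^attempt, max_delay) + random_jitter."""
    capped = min(initial_delay * 2 ** attempt, max_delay)
    return capped + rng() * initial_delay  # jitter in [0, initial_delay)


def call_with_retries(operation, max_attempts=4, sleep=time.sleep):
    """Call `operation`, retrying throttles until `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error to the caller
            sleep(backoff_delay(attempt))


attempts = {"count": 0}


def flaky_operation():
    # Simulated dependency: throttles twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ThrottledError("simulated throttle")
    return "ok"


print(call_with_retries(flaky_operation))  # ok
```

The `sleep` and `rng` parameters exist so the delays can be replaced in tests; the final `raise` is what forces the application to handle the error once retries run out.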

4. Request Quota Increase

If throttling is frequent despite retries, request a quota increase via the Service Quotas console or AWS CLI. Use `aws service-quotas request-service-quota-increase --service-code <service> --quota-code <code> --desired-value <new_limit>`. For example, to increase Lambda concurrency to 1500, use quota code `L-B99A9384`. Only quotas marked as adjustable can be raised this way. Some requests are approved automatically; others are routed to AWS Support for review. Plan ahead, because increases are not instantaneous.

5. Optimize Application Design

Redesign your application to reduce API calls. Use batch operations (e.g., DynamoDB BatchGetItem, SQS SendMessageBatch) to combine multiple requests into one. Implement caching for read-heavy workloads using ElastiCache or API Gateway caching. Use pagination for list operations to avoid large responses. For asynchronous processing, use SQS or SNS to decouple components and buffer requests. For DynamoDB, consider on-demand capacity mode, which absorbs most traffic spikes without capacity planning.
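Batching starts with splitting the work into groups that fit the API's per-call limit. The sketch below uses a hypothetical `chunked` helper and made-up product keys; the limits of 100 keys per BatchGetItem call and 10 messages per SendMessageBatch call are the documented maximums for those operations.

```python
from typing import Iterator, List, Sequence

BATCH_GET_ITEM_MAX_KEYS = 100   # DynamoDB BatchGetItem: up to 100 items per call
SEND_MESSAGE_BATCH_MAX = 10     # SQS SendMessageBatch: up to 10 messages per call


def chunked(items: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 250 hypothetical keys become 3 BatchGetItem calls instead of 250 GetItem calls.
keys = [{"pk": f"product#{i}"} for i in range(250)]
batches = list(chunked(keys, BATCH_GET_ITEM_MAX_KEYS))
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
```

A real BatchGetItem response can include UnprocessedKeys when part of the batch is throttled, so the caller still needs to retry those keys with backoff.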

What This Looks Like on the Job

Enterprise Scenario 1: E-Commerce Platform with DynamoDB Throttling

A large e-commerce platform uses DynamoDB as its primary database for product catalog and order management. During Black Friday sales, traffic spikes to 10x normal levels, causing ProvisionedThroughputExceededException errors. The initial solution was to over-provision read and write capacity, leading to high costs. After analysis, the team implemented DynamoDB Auto Scaling with a minimum of 5,000 RCU, a maximum of 50,000 RCU, and a target utilization of 70%. They also added an SQS queue to buffer write requests. During the next sale, auto scaling handled the spike, but some requests were still throttled because auto scaling takes several minutes to react to a sudden increase. They then switched to on-demand mode for the product catalog table, which removed the throttling they had been seeing but increased cost during periods of steady traffic. To optimize, they used a hybrid approach: on-demand for the catalog table (spiky traffic) and provisioned with auto scaling for the orders table (more predictable).

Enterprise Scenario 2: API Gateway Throttling for a Mobile Backend

A mobile app backend uses API Gateway with Lambda integrations. The app has 1 million users, and during peak hours, API calls reach 20,000 requests per second. The default API Gateway quota is 10,000 rps with a burst of 5,000. To handle the load, the team requested a quota increase to 25,000 rps via a support case. They also implemented a usage plan with rate limits per API key to prevent a single client from overwhelming the system. They added client-side throttling using the AWS SDK's retry mechanism with exponential backoff. Additionally, they enabled API Gateway caching for frequently accessed endpoints (e.g., user profiles) to reduce Lambda invocations. When the quota increase was pending, they used a CloudFront distribution in front of API Gateway to cache responses and absorb some traffic.

What Goes Wrong When Misconfigured

Common misconfigurations include: not setting up CloudWatch alarms for throttling, relying solely on default retries without considering maximum retry time (causing timeouts in user-facing applications), requesting quota increases too late (e.g., during a traffic spike), and not using batch operations when possible. For example, a developer might call DynamoDB GetItem in a loop for 100 items instead of using BatchGetItem, leading to 100 API calls instead of 1. Batching reduces the number of API calls and round trips, although each item still consumes read capacity. Another mistake is retrying without exponential backoff and jitter, which causes a thundering herd problem where all retries happen simultaneously and worsen throttling. Finally, ignoring service-specific quotas like Lambda concurrency can lead to throttling of every function in the account once the Regional concurrency limit is reached.

How DVA-C02 Actually Tests This

Exactly What DVA-C02 Tests on This Topic

The DVA-C02 exam tests your ability to identify throttling issues and apply appropriate solutions. Key objectives under Domain 4.2 include: interpreting CloudWatch metrics for throttling, configuring retry logic in SDKs, requesting quota increases, and designing applications to handle throttling gracefully. The exam does not ask you to memorize exact default quota values (e.g., Lambda concurrency = 1000) but expects you to know that quotas exist and how to work around them.

Common Wrong Answers and Why Candidates Choose Them

1. 'Increase the instance size' – Candidates might think that scaling up EC2 instances will solve API throttling. However, throttling is typically at the API level, not compute. The correct answer is to implement retry logic or request a quota increase.

2. 'Use a larger read capacity for DynamoDB' – While increasing capacity helps, the exam often tests the concept of burst capacity and that throttling occurs when the token bucket is empty. A wrong answer might suggest 'disable throttling' which is not possible.

3. 'Use synchronous invocations for Lambda' – When Lambda is throttled, synchronous invocations fail immediately. The correct approach is to use asynchronous invocations with an SQS DLQ to capture failures.

4. 'Switch to on-demand mode for all tables' – On-demand is expensive for steady traffic. The exam expects you to evaluate cost vs. performance.

Specific Numbers and Terms That Appear on the Exam

Default Lambda concurrency: 1,000 per region.

DynamoDB burst capacity: up to 5 minutes (300 seconds) of unused provisioned capacity.

API Gateway default burst: 5,000 requests; default rate: 10,000 rps.

SDK retry modes: legacy, standard, and adaptive. Standard and adaptive default to 3 total attempts; legacy defaults vary by SDK.

Exponential backoff: base delay 100 ms, max delay 20 seconds.

Edge Cases and Exceptions

Lambda reserved concurrency: If set to 0, all invocations are throttled.

API Gateway usage plans: Set throttling and quota limits per API key. These apply in addition to the account-level and stage-level limits; they do not raise them.

DynamoDB on-demand: Throttling is rare but still possible if traffic exceeds the per-table throughput quota (40,000 read request units by default) or more than doubles the table's previous peak within 30 minutes.

S3 PUT/COPY/POST requests: Throttled at 3,500 requests per second per prefix.

How to Eliminate Wrong Answers

Always look for keywords: If the question mentions 'burst', think token bucket. If 'retry', think exponential backoff. If 'throttling errors', think SDK retry configuration or quota increase. Eliminate answers that suggest changing compute resources (EC2, Lambda memory) unless the question explicitly ties to compute limits. Eliminate answers that suggest disabling throttling (not possible). Focus on application-level changes (retry, caching, async processing) and quota management.

Key Takeaways

AWS Service Quotas are per-account, per-region limits on resources and API request rates.

Throttling uses a token bucket algorithm: burst capacity allows short spikes, steady-state rate governs long-term throughput.

Default Lambda concurrency is 1,000 per region; the default API Gateway burst is 5,000 requests, and the default rate is 10,000 rps.

AWS SDKs provide automatic retry with exponential backoff; tune max_attempts and consider adaptive mode, which adds client-side rate limiting.

To handle throttling, implement retry logic, use caching, batch operations, and request quota increases proactively.

Monitor throttling via CloudWatch metrics (e.g., ThrottledRequests, 4XXError) and set alarms.

DynamoDB on-demand mode eliminates throttling for most cases but has per-table throughput quotas and a higher cost for steady traffic.

Reserved concurrency for Lambda can cause throttling if set too low; use it to guarantee capacity for critical functions.

API Gateway usage plans allow per-client rate limiting to prevent abuse.

Always design for degradation: use circuit breakers, fallbacks, and async processing with DLQs.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

DynamoDB Provisioned Capacity

You specify read and write capacity units (RCU/WCU).

Throttling occurs if you exceed provisioned capacity.

Burst capacity provides up to 5 minutes of unused capacity.

Lower cost for predictable, steady traffic.

Requires auto scaling or manual adjustments for traffic spikes.

DynamoDB On-Demand Capacity

Capacity scales automatically based on traffic.

No throttling unless you exceed the per-table throughput quota (40,000 RCU by default) or more than double your previous peak within 30 minutes.

No burst capacity concept; pay per request.

Higher cost for steady traffic; ideal for spiky or unpredictable workloads.

No need to manage capacity; AWS handles scaling.

Legacy Retry Mode

Maximum retries vary by SDK (for example, 3 in the Ruby SDK).

Exponential backoff with base delay (100 ms).

No client-side rate limiting.

May cause thundering herd on retries.

Default mode in older SDK versions.

Adaptive Retry Mode

Defaults to 3 total attempts; raise it with max_attempts.

Exponential backoff with jitter and client-side rate limiting.

Uses a token bucket algorithm to smooth out request rate.

Better for high-throughput applications.

Worth considering when a single client sends sustained, high-volume traffic to one service.

Watch Out for These

Mistake

Throttling errors are always a result of insufficient compute capacity.

Correct

Throttling is typically caused by exceeding API rate limits, not compute capacity. For example, DynamoDB throttles when you exceed provisioned throughput, even if the underlying hardware has spare capacity.

Mistake

Once you request a quota increase, it takes effect immediately.

Correct

Quota increases are not instantaneous. Some are automatically approved within minutes, but others require a support case and can take days. Always plan ahead.

Mistake

The AWS SDK retries indefinitely on throttling errors.

Correct

The SDK caps the number of attempts (3 total by default in standard and adaptive modes; legacy defaults vary by SDK). After exhausting retries, the error is returned to the application. You must handle the error in your code.

Mistake

Burst capacity allows unlimited spikes as long as you have unused capacity.

Correct

Burst capacity is capped. For DynamoDB, the burst bucket holds at most 5 minutes (300 seconds) of unused provisioned capacity. Once the bucket is empty, throttling begins.

Mistake

Switching to DynamoDB on-demand mode eliminates all throttling.

Correct

On-demand mode can still throttle: each table has a throughput quota (40,000 read request units by default, adjustable), and traffic that exceeds double the table's previous peak within 30 minutes may be throttled. Also, on-demand is more expensive for sustained traffic.


Frequently Asked Questions

What is the difference between a service quota and a throttle limit?

A service quota is the maximum number of resources or API requests allowed per account per region. A throttle limit is the rate at which requests are allowed per second, enforced by the token bucket algorithm. When you exceed the throttle limit, you get throttled. The quota is the ceiling; the throttle limit is the rate.

How can I increase my AWS service quota?

Use the AWS Service Quotas console, AWS CLI (`aws service-quotas request-service-quota-increase`), or open a support case. Some increase requests are approved automatically, while others require manual review by AWS. Plan ahead as increases are not immediate.

What HTTP status code does AWS return when throttling occurs?

It varies by service. Common codes: 400 (ThrottlingException), 429 (TooManyRequestsException for API Gateway and Lambda), 503 (RequestLimitExceeded for EC2 API). Always check the service documentation.

Does the AWS SDK automatically retry throttled requests?

Yes, the AWS SDK automatically retries on throttling errors using exponential backoff and jitter. The number of attempts depends on the SDK and retry mode: standard and adaptive modes default to 3 total attempts, and legacy defaults vary by SDK. You can raise the limit with max_attempts. If all retries fail, the error is returned to your application.

What is the burst capacity in DynamoDB?

DynamoDB burst capacity lets a provisioned table consume read/write capacity that went unused during the previous 5 minutes (300 seconds). It behaves like a token bucket that refills at your provisioned rate. Once the bucket is empty, throttling occurs.
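The 5-minute window translates into simple arithmetic. The sketch below is a simplified model, assuming a completely full burst bucket and constant demand; `max_burst_units` and `seconds_of_burst` are hypothetical helpers, and real tables may see less burst because DynamoDB can also use it for background work.

```python
BURST_WINDOW_SECONDS = 300  # up to 5 minutes of unused provisioned capacity


def max_burst_units(provisioned_per_second: int) -> int:
    """Largest burst reserve a table can accumulate, in capacity units."""
    return provisioned_per_second * BURST_WINDOW_SECONDS


def seconds_of_burst(provisioned_per_second: int, demand_per_second: int) -> float:
    """How long a full reserve covers demand above the provisioned rate."""
    if demand_per_second <= provisioned_per_second:
        return float("inf")  # demand fits within provisioned capacity
    excess = demand_per_second - provisioned_per_second
    return max_burst_units(provisioned_per_second) / excess


print(max_burst_units(100))        # 30000
print(seconds_of_burst(100, 400))  # 100.0
```

With 100 RCU provisioned, a spike to 400 RCU per second draws 300 units per second from a 30,000-unit reserve, so throttling would begin after roughly 100 seconds in this model.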

How can I monitor throttling in real-time?

Use CloudWatch metrics. For DynamoDB, monitor `ThrottledRequests` per table. For Lambda, monitor `Throttles`. For API Gateway, monitor `4XXError` or `5XXError`. Set CloudWatch alarms to notify you when throttling exceeds a threshold.

What is the best practice for handling throttling in a mobile app?

Implement client-side retry with exponential backoff and jitter. Use the AWS SDK's adaptive retry mode. Cache responses locally to reduce API calls. Consider using a usage plan in API Gateway to enforce per-client rate limits. Also, implement a circuit breaker to stop sending requests when throttling is frequent.
