This chapter covers Lambda concurrency, throttling, and reserved capacity—critical concepts for the DVA-C02 exam. Understanding how AWS Lambda scales, how concurrency limits work, and how to manage throttling is essential for building resilient serverless applications. Approximately 10-15% of exam questions touch on Lambda configuration and scaling, making this a high-yield topic. You will learn the mechanics of concurrency, burst limits, reserved and provisioned concurrency, and how to avoid common pitfalls.
Imagine a busy highway leading into a city where each car represents a single invocation of a Lambda function. The highway has multiple toll booths, each representing a unit of concurrency. By default, the highway has 1,000 toll booths (the account-level concurrency limit). Each toll booth can process one car at a time, and each car takes a few seconds to pass through. If all 1,000 booths are occupied, new cars are turned away (throttled) rather than admitted; whether a turned-away car comes back later depends on how it was sent. Reserved concurrency is like setting aside a specific number of booths exclusively for a certain type of car (e.g., delivery trucks). Even if those booths sit empty, regular cars cannot use them, and delivery trucks can never occupy more than their reserved booths. Provisioned concurrency is like having pre-staffed booths ready at all times, even when no cars are present, so the first cars do not wait. Without provisioned concurrency, the first few cars might experience a cold start as the booth opens. The burst concurrency limit is the maximum number of booths that can open at once in a sudden surge, like a police escort allowing a fixed number of cars (for example, 500 in some regions) to enter together. When all booths are occupied, a synchronous request is throttled and returns a 429 error, similar to a 'lane closed' sign.
What is Lambda Concurrency?
Lambda concurrency refers to the number of function invocations that are being processed simultaneously at any given time. Each invocation runs in its own execution environment (a container). AWS Lambda scales by creating new execution environments as needed, up to the account-level concurrency limit. The default account-level concurrency limit is 1,000 concurrent executions per AWS region. This is a soft limit that can be increased by requesting a quota increase via the Service Quotas console.
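A common way to estimate how much concurrency a workload needs is to multiply the request rate by the average invocation duration. The sketch below illustrates that arithmetic; the function names and the sample numbers are illustrative assumptions, not values from this chapter.

```python
import math


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate steady-state concurrent executions as rate x duration, rounded up."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def fits_account_limit(required: int, account_limit: int = 1000) -> bool:
    """Check the estimate against the account-level limit (default 1,000)."""
    return required <= account_limit


# Hypothetical workload: 200 requests/second, each taking 0.5 seconds on average.
needed = estimate_concurrency(requests_per_second=200, avg_duration_seconds=0.5)
print(needed, fits_account_limit(needed))
```

Halving the average duration halves the concurrency needed for the same request rate, which is why optimizing function code is a practical way to stay under the limit.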
Why Concurrency Matters
Concurrency limits protect your account from runaway code and prevent downstream resources (like databases) from being overwhelmed. If your function attempts to exceed the concurrency limit, new invocations are throttled. Throttled invocations behave differently depending on the invocation type:
- Synchronous invocations (e.g., API Gateway, ALB): Lambda returns a 429 error (TooManyRequestsException) to the caller, which is responsible for any retry.
- Asynchronous invocations (e.g., S3, SNS, EventBridge): Lambda returns the event to its internal queue and retries for up to 6 hours, with the retry interval growing exponentially from 1 second to a maximum of 5 minutes.
- Event source mappings (e.g., DynamoDB Streams, SQS, Kinesis): Retry based on the event source's retry policy; the Lambda service throttles the poller, not the individual records.
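Because a throttled synchronous invocation fails immediately, the caller decides whether to retry. The following is a minimal sketch of client-side retry with exponential backoff and jitter. `TooManyRequests` is a hypothetical stand-in for the 429 error, and `invoke` is any callable you supply; neither is an AWS SDK API.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for the 429 TooManyRequestsException seen by a caller."""


def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call invoke(); on throttling, wait with exponential backoff plus jitter and retry."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # "Full jitter": pick a random delay up to the capped exponential bound.
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. The jitter spreads retries out so that many throttled clients do not all retry at the same instant.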
How Lambda Scales: Burst vs. Reserved vs. Provisioned
#### Burst Concurrency
When your function is invoked for the first time, or after a period of inactivity, Lambda needs to create new execution environments. The speed at which new environments can be created is limited by a burst concurrency limit, which varies by region:
- 3000 per minute for the largest regions (e.g., US East, US West, EU West).
- 1000 per minute for other regions.
- 500 per minute for smaller regions (e.g., ap-northeast-3, me-south-1).

This means that even if your account limit is 10,000, you cannot instantly scale to 10,000 concurrent executions. In the first minute, you can only create up to the burst limit (e.g., 3000) new environments. After that, you can add an additional 500 per minute (or 1000 per minute in some regions) until you reach the account limit. With a burst of 3000 and 500 per minute afterward, reaching 10,000 takes about 14 more minutes. These regional figures describe Lambda's original scaling model; since late 2023, AWS documents a per-function scaling rate of 1,000 additional concurrent executions every 10 seconds, so check the current Lambda scaling documentation for the values that apply today.
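The scaling arithmetic described above can be sketched as a simple model: an initial burst, then a fixed number of additional environments per minute, capped at the account limit. The function names are illustrative, and the defaults (3000 burst, 500 per minute) are the figures quoted in this section, not values read from AWS.

```python
import math


def max_concurrency_after(minutes: int, account_limit: int,
                          burst_limit: int = 3000, per_minute: int = 500) -> int:
    """Highest concurrency reachable after the given number of minutes of sustained load."""
    return min(account_limit, burst_limit + per_minute * minutes)


def minutes_to_reach(target: int, burst_limit: int = 3000, per_minute: int = 500) -> int:
    """Minutes of steady-state scaling needed after the burst to reach the target."""
    if target <= burst_limit:
        return 0
    return math.ceil((target - burst_limit) / per_minute)


for minute in (0, 4, 14):
    print(minute, max_concurrency_after(minute, account_limit=10_000))
print(minutes_to_reach(10_000))
```

Under these assumptions, a spike larger than the burst limit is throttled at first and absorbed only gradually, which is the reason to design for ramp-up rather than instant scale.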
#### Reserved Concurrency
Reserved concurrency guarantees a set number of concurrent executions for a specific function. It also caps the function's maximum concurrency at that value. For example, if you set reserved concurrency to 100 for function A, function A can use up to 100 concurrent executions, and no other function can use that capacity. This prevents other functions from consuming all available concurrency and starving function A. Reserved concurrency is subtracted from the account-level limit, and Lambda always keeps at least 100 concurrent executions unreserved. With the default limit of 1,000, you can therefore reserve at most 900 across all functions, which leaves 100 for functions without reserved concurrency (unless you have a higher account limit).
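To see how reservations shrink the shared pool, the sketch below subtracts each function's reserved concurrency from the account limit. It also applies the rule AWS documents that at least 100 concurrent executions must stay unreserved. The function names in the example dictionary are hypothetical.

```python
MIN_UNRESERVED = 100  # Lambda keeps at least this much concurrency unreserved


def unreserved_pool(account_limit: int, reservations: dict) -> int:
    """Return the concurrency left for functions without reserved concurrency."""
    remaining = account_limit - sum(reservations.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"reservations leave {remaining} unreserved; at least {MIN_UNRESERVED} is required"
        )
    return remaining


# Hypothetical functions and reservations against the default 1,000 limit.
print(unreserved_pool(1000, {"order-processor": 800, "auth": 50}))
```

Every function without a reservation competes for the returned pool, so a large reservation for one function directly reduces headroom for all the others.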
#### Provisioned Concurrency
Provisioned concurrency pre-initializes a specified number of execution environments, so they are ready to handle requests immediately with no cold start. This is useful for latency-sensitive applications. Provisioned concurrency counts against your account's concurrency limit and is billed even when not in use (since the environments are kept warm). You can configure provisioned concurrency to scale with Application Auto Scaling based on a schedule or utilization metric.
How Invocations Are Routed
When an invocation request arrives, Lambda checks if there is an available execution environment that is already warm (i.e., has handled a previous invocation with the same function version and is not busy). If a warm environment is available, the request is routed to it immediately. If no warm environment is available, Lambda creates a new one (cold start). If the function has provisioned concurrency, the pre-warmed environments are used first. If the function has reserved concurrency, the invocation is only allowed if the reserved concurrency limit has not been reached. If the account-level limit would be exceeded, the invocation is throttled.
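The routing logic above can be pictured as a small decision function. This is a simplified model for study purposes, not Lambda's actual implementation: it ignores provisioned concurrency and treats the account limit as a single counter. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionState:
    in_flight: int           # invocations currently running for this function
    idle_warm: int           # warm environments that are not busy
    reserved: Optional[int]  # reserved concurrency, or None if not set


def route(fn: FunctionState, account_in_flight: int, account_limit: int = 1000) -> str:
    """Return 'throttle', 'warm', or 'cold_start' for one incoming invocation."""
    if fn.reserved is not None:
        # A function with reserved concurrency is capped at its own value.
        if fn.in_flight >= fn.reserved:
            return "throttle"
    elif account_in_flight >= account_limit:
        # Functions without a reservation are bound by the shared account limit.
        return "throttle"
    # Capacity is available: reuse a warm environment if one is idle.
    return "warm" if fn.idle_warm > 0 else "cold_start"
```

For example, `route(FunctionState(in_flight=100, idle_warm=3, reserved=100), account_in_flight=100)` returns `"throttle"` even though warm environments exist, because the function has reached its reserved cap.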
Interaction with Other AWS Services
API Gateway: When used as a trigger, API Gateway invokes Lambda synchronously. Throttled invocations return a 429 error to the client. You can configure API Gateway throttling separately (e.g., 10,000 requests per second) to protect Lambda.
SQS: Lambda polls the SQS queue using long polling. If Lambda is throttled, the poller backs off and retries. Messages remain in the queue. The visibility timeout should be set appropriately to avoid duplicate processing.
DynamoDB Streams: Lambda reads from stream shards. Throttling causes the shard iterator to age; the function can fall behind. Reserved concurrency helps ensure processing capacity.
Application Auto Scaling: You can associate a target tracking scaling policy with a provisioned concurrency configuration. For example, set the target to 70% utilization, and Auto Scaling will adjust provisioned concurrency to maintain that average.
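As a rough intuition for target tracking, the desired capacity is the current capacity scaled by the ratio of observed utilization to the target. The sketch below is only that proportional estimate, under the assumption of a 70% target. The real service acts through CloudWatch alarms and cooldown periods, so its behavior will differ in detail.

```python
import math


def desired_provisioned(current: int, utilization: float, target: float = 0.7,
                        min_capacity: int = 1, max_capacity: int = 100) -> int:
    """Estimate the provisioned concurrency that would bring utilization back to target."""
    raw = math.ceil(current * utilization / target)
    # Clamp to the registered scalable target's min/max capacity.
    return max(min_capacity, min(max_capacity, raw))


# 50 provisioned environments running at 90% utilization against a 70% target.
print(desired_provisioned(current=50, utilization=0.9))
```

A lower target leaves more warm headroom for spikes at a higher idle cost; a higher target does the opposite.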
Configuration and Commands
You can configure reserved and provisioned concurrency via the AWS CLI, SDK, or CloudFormation.
Set reserved concurrency using AWS CLI:
aws lambda put-function-concurrency --function-name my-function --reserved-concurrent-executions 100
Remove reserved concurrency:
aws lambda delete-function-concurrency --function-name my-function
Configure provisioned concurrency:
aws lambda put-provisioned-concurrency-config --function-name my-function --qualifier prod --provisioned-concurrent-executions 50
View provisioned concurrency status:
aws lambda get-provisioned-concurrency-config --function-name my-function --qualifier prod
View a function's reserved concurrency setting (the list-functions output does not include it, so query each function individually):
aws lambda get-function-concurrency --function-name my-function
Monitoring and Alarms
Use CloudWatch metrics to monitor concurrency:
- ConcurrentExecutions: Number of function instances processing invocations.
- Throttles: Number of invocations that were throttled.
- ProvisionedConcurrencyUtilization: Percentage of provisioned concurrency used.
Set CloudWatch alarms on Throttles > 0 to detect throttling events. Also monitor ConcurrentExecutions to ensure you are not approaching your account limit.
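One way to express the "Throttles > 0" alarm is as a set of parameters for CloudWatch's PutMetricAlarm API. The sketch below only builds the parameter dictionary, so it runs without AWS credentials. The alarm name, the 60-second period, and the missing-data setting are assumptions chosen for illustration.

```python
def throttle_alarm_params(function_name: str) -> dict:
    """Build PutMetricAlarm parameters that fire when a function records any throttle."""
    return {
        "AlarmName": f"{function_name}-throttles",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                 # evaluate one-minute sums
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttles
    }


params = throttle_alarm_params("my-function")
print(params["MetricName"], params["ComparisonOperator"], params["Threshold"])
# With boto3 installed and credentials configured, these could be passed as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

In practice you would likely add an alarm action (for example, an SNS topic) so that someone is notified when the alarm fires.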
Best Practices
Set reserved concurrency for critical functions to guarantee capacity.
Use provisioned concurrency for latency-sensitive functions, especially those with high initialization latency (e.g., large libraries, loading models).
Monitor Throttles and increase account limits or optimize function code to reduce execution duration.
For asynchronous invocations, configure a dead letter queue (DLQ) or on-failure destination to capture throttled events.
Design for gradual scaling with the burst limit in mind; sudden spikes may be throttled initially.
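The DLQ and on-failure destination advice for asynchronous invocations maps to Lambda's PutFunctionEventInvokeConfig API. As a hedged sketch, the helper below builds and validates those parameters without calling AWS. The queue ARN in the example is a made-up placeholder.

```python
def async_failure_config(function_name: str, on_failure_arn: str,
                         max_retries: int = 2, max_event_age_seconds: int = 21600) -> dict:
    """Build PutFunctionEventInvokeConfig parameters with an on-failure destination."""
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_seconds,  # 21600 s = 6 hours
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }


# Placeholder ARN for illustration only.
config = async_failure_config("my-function", "arn:aws:sqs:us-east-1:123456789012:failed-events")
print(config["MaximumEventAgeInSeconds"])
# With boto3: boto3.client("lambda").put_function_event_invoke_config(**config)
```

Lowering the maximum event age makes stale events fail over to the destination sooner instead of being retried for the full six hours.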
Invocation Arrives at Lambda Service
A request triggers a Lambda function invocation (synchronous, asynchronous, or via event source mapping). The Lambda service receives the request and checks the function's configuration, including reserved concurrency and provisioned concurrency settings. It also verifies the account-level concurrency limit.
Check for Available Warm Environment
Lambda looks for an existing execution environment that is idle and matches the function version and alias. If found, the request is routed to that environment. If not, Lambda proceeds to create a new environment (cold start). If provisioned concurrency is configured, pre-warmed environments are used, avoiding cold start.
Concurrency Limit Evaluation
Before creating a new environment, Lambda checks whether the function's reserved concurrency (if set) has been reached. If the function has no reserved concurrency, it checks the account-level concurrency limit. If either limit would be exceeded, the invocation is throttled. For synchronous invocations, a 429 error is returned. For asynchronous invocations, the event is retried.
Environment Creation (Cold Start)
If a new environment is needed, Lambda allocates resources: downloads the function code, starts a new container, configures the runtime, and runs initialization code outside the handler. This adds latency (cold start). The duration depends on function package size, runtime, and initialization code. Provisioned concurrency eliminates this step.
Execution and Completion
The function handler executes. During execution, the environment is considered busy. After completion, the environment remains warm for a period to handle subsequent invocations. AWS does not publish a fixed duration; 5-15 minutes is a commonly cited range. If idle for longer, the environment is reclaimed. The concurrency count decreases when execution completes.
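The lifecycle above (reuse a warm environment if one is idle, otherwise cold start, and reclaim environments that stay idle too long) can be simulated in a few lines. This is a toy model with a single fixed duration and idle timeout, both of which are assumptions; real environments are managed by Lambda in ways this does not capture.

```python
def count_cold_starts(arrival_times, duration, idle_timeout):
    """Count cold starts when each environment serves one invocation at a time."""
    free_at = []  # for each live environment, the time it finishes its current work
    cold_starts = 0
    for t in sorted(arrival_times):
        # Reclaim environments that have been idle longer than the timeout.
        free_at = [f for f in free_at if f > t or t - f <= idle_timeout]
        idle = [f for f in free_at if f <= t]
        if idle:
            free_at.remove(max(idle))  # reuse the most recently used idle environment
        else:
            cold_starts += 1           # nothing idle: a new environment is created
        free_at.append(t + duration)
    return cold_starts


# Three simultaneous requests each need their own environment.
print(count_cold_starts([0, 0, 0], duration=1, idle_timeout=600))
# Three spaced-out requests can share one environment.
print(count_cold_starts([0, 2, 4], duration=1, idle_timeout=600))
```

The contrast between the two calls shows why cold starts track concurrency spikes rather than total request count.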
Scenario 1: E-commerce Order Processing
A large e-commerce platform uses Lambda to process orders from an SQS queue. During Black Friday, traffic spikes to 2000 orders per second. The account concurrency limit is 1000. Without reserved concurrency, other functions (e.g., image resizing, analytics) can consume available concurrency, starving the order processor. The team sets reserved concurrency of 800 for the order processing function, ensuring it gets capacity. They also set provisioned concurrency of 200 to handle the initial burst and avoid cold starts. They configure Application Auto Scaling to increase provisioned concurrency based on queue depth. They monitor Throttles and ConcurrentExecutions. Misconfiguration: if reserved concurrency is set too low (e.g., 300), orders are throttled and remain in the queue, causing delays. If set too high (e.g., 900, the most a 1000 limit allows because Lambda keeps 100 unreserved), other functions may be starved.
Scenario 2: Real-Time Data Streaming
A financial services company processes real-time stock trade data from Kinesis Data Streams using Lambda. By default, each shard is processed by one Lambda invocation at a time. The stream has 10 shards, and the Lambda function has a high initialization time (5 seconds) due to loading ML models. To avoid latency, they configure provisioned concurrency of 10 (one per shard). This ensures each shard has a warm environment ready. Without provisioned concurrency, cold starts cause delays, and processing falls behind. They also set reserved concurrency of 10 to prevent other functions from using that capacity. Misconfiguration: setting provisioned concurrency lower than the number of shards (e.g., 5) means some shards experience cold starts, leading to uneven processing.
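The sizing rule in this scenario is simple arithmetic: a stream needs one concurrent invocation per shard, multiplied by the event source mapping's parallelization factor if that setting is raised above its default of 1. A minimal sketch, with illustrative function names:

```python
def stream_concurrency(shards: int, parallelization_factor: int = 1) -> int:
    """Concurrent invocations a stream can drive: shards x parallelization factor."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return shards * parallelization_factor


def uncovered_invocations(shards: int, provisioned: int, parallelization_factor: int = 1) -> int:
    """Concurrent invocations that would not find a pre-initialized environment."""
    return max(0, stream_concurrency(shards, parallelization_factor) - provisioned)


print(stream_concurrency(10))                     # the scenario's 10 shards
print(uncovered_invocations(10, provisioned=5))   # the misconfiguration case
```

If the stream is later resharded or the parallelization factor is raised, both provisioned and reserved concurrency need to be revisited with the same calculation.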
Scenario 3: Microservices with Shared Pool
A startup runs multiple microservices as separate Lambda functions behind API Gateway. They have a single AWS account and region. One function (user authentication) is critical and must always respond quickly. They set reserved concurrency of 50 for that function. Other functions share the remaining 950 concurrent executions. During a marketing campaign, a reporting function experiences a spike, consuming 800 concurrent executions. The authentication function is unaffected because its reserved capacity is guaranteed. However, other functions may be throttled. They monitor Throttles and increase the account limit to 2000 to provide headroom. Misconfiguration: not setting reserved concurrency on critical functions can cause them to be throttled when other functions spike.
Exactly What DVA-C02 Tests
The DVA-C02 exam tests your understanding of Lambda concurrency, throttling, and reserved/provisioned concurrency under Objective 1.1 (Develop and maintain applications on AWS). Expect 2-3 questions on these topics. Key areas:
Difference between reserved and provisioned concurrency.
Behavior of throttled invocations based on invocation type (sync vs async vs event source mapping).
Default account concurrency limit (1000) and burst limits (3000/1000/500 per minute depending on region).
How reserved concurrency guarantees capacity but also caps the function.
Provisioned concurrency eliminates cold starts and can be auto-scaled.
Common Wrong Answers
"Reserved concurrency prevents other functions from using that capacity but does not limit the function's own concurrency." Reality: Reserved concurrency both guarantees and caps the function's concurrency. If you set reserved concurrency to 100, the function cannot exceed 100 concurrent executions.
"Provisioned concurrency is free when not in use." Reality: You are billed for provisioned concurrency even when no invocations occur, because environments are kept warm.
"Throttled asynchronous invocations are lost." Reality: They are automatically retried for up to 6 hours with exponential backoff. They are not lost unless the retry policy expires.
"Increasing the account concurrency limit also increases the burst limit." Reality: The burst limit is fixed per region and cannot be increased. Raising the account limit raises the ceiling that scaling can eventually reach; it does not change the burst limit or the steady-state scaling rate (500 per minute).
Specific Numbers and Terms
Account concurrency limit: 1000 (default).
Burst concurrency: 3000 per minute (major regions), 1000 per minute (other), 500 per minute (smaller regions).
Steady-state scaling: 500 concurrent executions per minute (or 1000 in some regions) after burst.
Provisioned concurrency: billed per second when configured, even if idle.
Reserved concurrency: subtracted from account limit.
Throttle error code: 429 TooManyRequestsException (for sync).
Edge Cases
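Lambda keeps at least 100 concurrent executions unreserved, so reserved concurrency across all functions cannot sum to the full account limit. As reservations approach that ceiling, functions without reserved concurrency share only the small remaining pool and are throttled quickly.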
Provisioned concurrency counts against the account limit and reserved concurrency. It cannot exceed the function's reserved concurrency: on a function with reserved concurrency 100, a request for 200 provisioned concurrency is rejected.
For event source mappings (e.g., DynamoDB Streams), throttling causes the poller to back off; the function may fall behind. Reserved concurrency helps ensure processing capacity.
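These edge cases lend themselves to a pre-deployment sanity check. The sketch below validates one function's settings against two rules AWS documents: provisioned concurrency cannot exceed reserved concurrency, and neither can exceed the account limit. The function name and message wording are illustrative.

```python
from typing import List, Optional


def check_function_concurrency(reserved: Optional[int], provisioned: int,
                               account_limit: int = 1000) -> List[str]:
    """Return a list of problems with a function's concurrency settings (empty if none)."""
    problems = []
    if reserved is not None and reserved > account_limit:
        problems.append("reserved concurrency exceeds the account limit")
    if provisioned > account_limit:
        problems.append("provisioned concurrency exceeds the account limit")
    if reserved is not None and provisioned > reserved:
        problems.append("provisioned concurrency exceeds reserved concurrency")
    if reserved == 0:
        problems.append("reserved concurrency of 0 throttles every invocation")
    return problems


print(check_function_concurrency(reserved=100, provisioned=200))
print(check_function_concurrency(reserved=100, provisioned=50))
```

Note the last rule: setting reserved concurrency to 0 is a valid configuration that deliberately stops a function from running, so the check flags it rather than treating it as an error in all cases.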
How to Eliminate Wrong Answers
If a question mentions "guaranteed capacity" but also says "unlimited concurrency for that function", it's wrong because reserved concurrency caps.
If a question says "no additional cost" for provisioned concurrency when idle, it's wrong.
For throttling behavior, identify the invocation type: sync returns error immediately; async retries; event source mappings retry based on source policy.
Remember that burst limits are regional and cannot be changed.
Default account concurrency limit is 1000 per region (soft limit).
Burst concurrency: 3000/min (major regions), 1000/min (others), 500/min (smaller).
Steady-state scaling: 500 concurrent executions per minute after burst.
Reserved concurrency guarantees and caps a function's concurrency.
Provisioned concurrency eliminates cold starts but incurs cost when idle.
Synchronous throttling returns 429 error; asynchronous retries for up to 6 hours.
Event source mappings (SQS, DynamoDB Streams) handle throttling via poller backoff.
Monitor Throttles and ConcurrentExecutions CloudWatch metrics.
These come up on the exam all the time. Here's how to tell them apart.
Reserved Concurrency
Guarantees and caps a function's concurrent executions.
Does NOT eliminate cold starts.
Subtracted from account-level concurrency limit.
No additional cost beyond normal execution.
Set per function; it applies across all versions and aliases (provisioned concurrency, by contrast, is set on a version or alias).
Provisioned Concurrency
Pre-warms a specified number of environments to eliminate cold starts.
Billed even when idle.
Counts against account limit and reserved concurrency.
Can be auto-scaled using Application Auto Scaling.
Useful for latency-sensitive applications.
Mistake
Reserved concurrency only guarantees capacity but does not limit the function.
Correct
Reserved concurrency both guarantees and caps the function's maximum concurrency. If you set 100, the function cannot use more than 100 concurrent executions.
Mistake
Provisioned concurrency is free when not in use.
Correct
You are billed for provisioned concurrency even when no invocations occur, because the environments are kept warm and ready.
Mistake
Throttled asynchronous invocations are lost forever.
Correct
They are automatically retried for up to 6 hours with exponential backoff. They are only lost if all retries fail and no DLQ or on-failure destination is configured.
Mistake
The burst concurrency limit can be increased by requesting a quota increase.
Correct
The burst limit is a fixed regional value that cannot be increased. You can request a higher account-level concurrency limit, which raises the ceiling that scaling can reach but not how quickly you get there.
Mistake
Setting reserved concurrency on all functions to the account limit ensures each function gets at least that amount.
Correct
Lambda keeps at least 100 concurrent executions unreserved, so reservations cannot sum to the full account limit. As reservations approach that ceiling, unreserved functions share only the small remaining pool and are throttled quickly. Also, each function is capped at its reserved value.
When a Lambda function is throttled, the behavior depends on the invocation type. For synchronous invocations (e.g., API Gateway, ALB), Lambda returns a 429 TooManyRequestsException error. For asynchronous invocations (e.g., S3, SNS), Lambda automatically retries the event for up to 6 hours, with the retry interval growing exponentially from 1 second to a maximum of 5 minutes. For event source mappings (e.g., DynamoDB Streams, SQS), the poller backs off and retries; the event remains in the source until successfully processed or the retention period expires. You can configure a dead letter queue (DLQ) or on-failure destination to capture failed events after retries.
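To get a feel for what "up to 6 hours with exponential backoff" means in practice, the sketch below generates a retry timeline. It assumes intervals that double from 1 second up to a 5-minute cap, which are the values AWS documents for throttled asynchronous events. The parameters are exposed so other values can be tried, and the model ignores jitter and queueing delays.

```python
def retry_schedule(first_interval: float = 1.0, max_interval: float = 300.0,
                   max_age: float = 6 * 3600) -> list:
    """Return elapsed seconds at each retry, doubling the wait up to max_interval."""
    elapsed = 0.0
    interval = first_interval
    times = []
    while elapsed + interval <= max_age:
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, max_interval)
    return times


schedule = retry_schedule()
print(schedule[:5])   # the first few retries come quickly
print(len(schedule))  # total retries that fit inside the maximum event age
```

Under this model most retries happen at the capped interval, so an event throttled for hours is retried roughly every five minutes until it succeeds or ages out.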
Reserved concurrency guarantees a set number of concurrent executions for a function and also caps the function's maximum concurrency. It does not eliminate cold starts. Provisioned concurrency pre-initializes a specified number of execution environments, so they are ready to handle requests with no cold start. Provisioned concurrency is billed even when idle, while reserved concurrency has no additional cost beyond normal execution. Both count against the account-level concurrency limit. You can use both together: reserved concurrency to guarantee capacity and provisioned concurrency to eliminate cold starts.
The burst concurrency limit is a fixed regional value that cannot be increased. It is 3000 per minute for major regions (e.g., us-east-1, eu-west-1), 1000 per minute for other regions, and 500 per minute for smaller regions. However, you can increase the account-level concurrency limit, which raises the ceiling that scaling can reach without changing the rate at which you get there. For example, if your account limit is 10,000, after the initial burst of 3000, you can add 500 per minute until you reach 10,000.
Concurrency refers to the number of function invocations being processed simultaneously at any given time. Throttling occurs when a new invocation request is rejected because the concurrency limit has been reached. The limit can be the account-level limit (default 1000) or a function-level reserved concurrency limit. Throttling is a mechanism to protect your account and downstream resources from being overwhelmed. You can monitor throttling using the Throttles CloudWatch metric and set alarms to detect when it happens.
You can use Application Auto Scaling to automatically adjust provisioned concurrency based on a target utilization metric. First, create a provisioned concurrency configuration for your function. Then, register a scalable target using the AWS CLI or SDK. For example: `aws application-autoscaling register-scalable-target --service-namespace lambda --resource-id function:my-function:prod --scalable-dimension lambda:function:ProvisionedConcurrency --min-capacity 10 --max-capacity 100`. Then create a scaling policy with a target tracking configuration, e.g., target 70% utilization of provisioned concurrency. Auto Scaling will add or remove provisioned concurrency to maintain that average.
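The scaling policy step can be sketched as the parameters for Application Auto Scaling's PutScalingPolicy API, using the predefined Lambda utilization metric. The helper only builds the dictionary, so it runs without credentials. The policy name is a hypothetical choice, and the resource ID mirrors the `my-function:prod` example above.

```python
def provisioned_concurrency_policy(function_name: str, alias: str, target: float = 0.7) -> dict:
    """Build PutScalingPolicy parameters for target tracking on provisioned concurrency."""
    return {
        "PolicyName": f"{function_name}-{alias}-pc-target-tracking",  # hypothetical name
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,  # 0.7 corresponds to 70% utilization
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }


policy = provisioned_concurrency_policy("my-function", "prod")
print(policy["ResourceId"])
# With boto3: boto3.client("application-autoscaling").put_scaling_policy(**policy)
```

The resource ID and scalable dimension must match the values used when registering the scalable target, otherwise the policy has nothing to attach to.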
The default account-level concurrency limit is 1,000 concurrent executions per region. This is a soft limit that can be increased by submitting a quota increase request through the Service Quotas console. The limit applies to all functions in that region, except for functions with reserved concurrency which are capped individually. If you need more than 1,000 concurrent executions, you can request an increase to up to tens of thousands, depending on your use case and AWS approval.
Provisioned concurrency counts against the account-level concurrency limit. For example, if your account limit is 1,000 and you configure 200 provisioned concurrency for a function, that leaves 800 for other concurrent executions (including other provisioned concurrency). Additionally, if you set reserved concurrency on the same function, the provisioned concurrency cannot exceed the reserved concurrency value. So if you set reserved concurrency to 100, you cannot set provisioned concurrency to more than 100.