This chapter dives deep into Lambda concurrency controls—Reserved and Provisioned Concurrency—which are critical for building resilient, high-performance serverless applications on AWS. As a core topic under the Resilient Architectures domain (Objective 2.1) of the SAA-C03 exam, understanding these concepts is essential for designing systems that can handle traffic spikes without degradation. Expect 3-5% of exam questions to directly test your knowledge of concurrency limits, bursting behavior, and the trade-offs between Reserved and Provisioned Concurrency.
Imagine a concert venue with a capacity of 1,000 people. Normally, anyone can buy a ticket and enter, but if too many show up at once, the venue fills and the doors close (throttling). The organizer can set aside a block of 200 seats exclusively for VIP ticket holders. These seats are always available to VIPs, even if the rest of the venue is packed. This is Reserved Concurrency: you guarantee a certain number of concurrent executions for your critical function, isolated from the general pool. The block is also a ceiling: if more than 200 VIPs arrive, the extras cannot take regular seats and are turned away. Reserving seats does nothing about the wait at the door, though. Each guest still has to be checked in and shown to a seat, which takes time (a cold start). To remove that wait, the organizer can station ushers at a set number of seats before the doors open, so guests arriving for those seats sit down immediately. This is Provisioned Concurrency: Lambda keeps a set number of execution environments initialized and ready to serve, so requests that land on them skip the cold start. Reserved Concurrency gives your function a guaranteed slice of the concurrency pie and stops other functions from starving it; Provisioned Concurrency keeps part of that capacity warm. Both mechanisms matter for controlling performance and cost in serverless architectures.
What Is Lambda Concurrency and Why Does It Matter?
AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources. Each invocation of your function runs in an isolated execution environment. Concurrency is the number of in-flight invocations at any given time, that is, the number of executions currently processing requests. By default, Lambda can scale to thousands of concurrent executions, but each account has a safety limit per Region (default 1,000 concurrent executions) to protect against runaway code or accidental infinite loops.
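A common way to estimate how much concurrency a workload needs is to multiply the request rate by the average duration of each invocation. The sketch below illustrates that arithmetic only; the function name and the sample figures are hypothetical, and real traffic is burstier than an average suggests.

```python
import math


def estimated_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate in-flight invocations as arrival rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)


# Hypothetical workloads, chosen only to illustrate the arithmetic.
print(estimated_concurrency(100, 0.5))    # 100 requests/s, 500 ms each
print(estimated_concurrency(2000, 0.25))  # 2,000 requests/s, 250 ms each
```

Comparing the estimate with the regional limit shows early whether a quota increase or a reservation is worth considering.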
For a production workload, you need fine-grained control over concurrency to:
Ensure critical functions always have capacity to handle traffic spikes.
Prevent a single misbehaving function from consuming all available concurrency and starving other functions.
Eliminate cold starts for latency-sensitive functions.
AWS Lambda provides two concurrency controls: Reserved Concurrency and Provisioned Concurrency. They serve different purposes but can be used together.
Reserved Concurrency: Guaranteed Capacity
Reserved Concurrency guarantees that your function always has a specific amount of concurrency available to it. This is a hard cap and a reservation. When you set Reserved Concurrency on a function, you are both reserving that capacity for the function and limiting it to that maximum. No other function can use the reserved concurrency, and your function cannot exceed the reserved amount.
How it works internally:
- When you set Reserved Concurrency to, say, 100, Lambda deducts 100 from the regional concurrency pool. The function is guaranteed to have 100 concurrent executions available at all times.
- If the function needs more than 100 concurrent executions, the excess invocations are throttled (HTTP 429 TooManyRequestsException).
- Other functions in the same account and region cannot use the reserved 100 concurrency; they compete for the remaining pool.
- Reserved Concurrency does not pre-warm environments. Cold starts still occur when new execution environments are created.
Key values and defaults:
- Regional concurrency limit: 1,000 by default (can be increased via Service Quotas request).
- Reserved Concurrency can be any integer from 0 up to what remains reservable in the regional pool.
- Setting Reserved Concurrency to 0 effectively disables the function (all invocations are throttled).
- Lambda always keeps at least 100 concurrent executions unreserved, so the most you can reserve is the regional limit, minus the sum of all other Reserved Concurrency settings, minus 100.
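The pool arithmetic above can be sketched in a few lines. This is an illustrative model, not an AWS API: the function names are hypothetical, and the constant reflects the 100 concurrent executions that Lambda keeps unreserved in every account.

```python
MIN_UNRESERVED = 100  # Lambda keeps at least this much concurrency unreserved


def unreserved_pool(regional_limit: int, reservations: list[int]) -> int:
    """Concurrency left for functions that have no reservation."""
    return regional_limit - sum(reservations)


def max_reservable(regional_limit: int, other_reservations: list[int]) -> int:
    """Largest Reserved Concurrency value one more function could be given."""
    remaining = regional_limit - sum(other_reservations) - MIN_UNRESERVED
    return max(0, remaining)


print(unreserved_pool(1000, [100]))      # one function reserves 100
print(max_reservable(1000, [200, 300]))  # two reservations already exist
```

With the default limit of 1,000 and a single reservation of 100, the other functions share 900, which matches the walkthrough later in this chapter.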
Configuration: You can set Reserved Concurrency via:
AWS Management Console: Function > Configuration > Concurrency > Edit > Reserve concurrency.
AWS CLI: aws lambda put-function-concurrency --function-name my-function --reserved-concurrent-executions 100
AWS CloudFormation / CDK: ReservedConcurrentExecutions property.
Verification:
- AWS CLI: aws lambda get-function-concurrency --function-name my-function returns ReservedConcurrentExecutions.
- CloudWatch metrics: ConcurrentExecutions metric shows actual concurrency. You can set alarms when it approaches the reserved limit.
Provisioned Concurrency: Pre-Warmed Capacity
Provisioned Concurrency is designed to eliminate cold starts. It initializes a specified number of execution environments ahead of time, so they are ready to process requests immediately. When a request arrives, it is served by one of the pre-warmed environments, resulting in consistent low latency.
How it works internally:
- When you set Provisioned Concurrency to, say, 50, Lambda creates and maintains 50 execution environments that are fully initialized (including any code outside the handler, such as database connections or SDK clients).
- These environments are kept warm and ready. Each environment handles one request at a time.
- If the actual concurrency exceeds the provisioned level, additional environments are created on demand (with cold starts) up to the function's reserved or unreserved concurrency limit.
- Provisioned Concurrency is billed per second for the provisioned environments, even if they are idle.
- You can adjust Provisioned Concurrency automatically using Application Auto Scaling, with either a target tracking policy or scheduled scaling.
Key values and defaults:
- Provisioned Concurrency can be set independently of Reserved Concurrency.
- You can set Provisioned Concurrency to any value up to the function's reserved concurrency (if set); without a reservation, it is bounded by the account's available unreserved concurrency.
- Provisioned Concurrency is applied per function version or alias. You cannot set it on $LATEST.
- With target tracking, Application Auto Scaling adjusts Provisioned Concurrency to hold utilization near a target (e.g., 70% of the provisioned environments in use).
Configuration:
- AWS Console: Function > Versions/Aliases > Select version/alias > Actions > Configure Provisioned Concurrency.
- AWS CLI: aws lambda put-provisioned-concurrency-config --function-name my-function --qualifier PROD --provisioned-concurrent-executions 50
- CloudFormation: ProvisionedConcurrencyConfig property on AWS::Lambda::Version or AWS::Lambda::Alias.
Verification:
- AWS CLI: aws lambda get-provisioned-concurrency-config --function-name my-function --qualifier PROD returns Allocated and Available provisioned concurrency.
- CloudWatch metrics: ProvisionedConcurrentExecutions and ProvisionedConcurrencyUtilization.
Interaction Between Reserved and Provisioned Concurrency
You can use both controls together. For example:
Set Reserved Concurrency to 200 to guarantee capacity.
Set Provisioned Concurrency to 150 on the PROD alias to keep 150 environments warm.
The function can still burst up to 200 (the reserved limit) but the first 150 requests are served without cold starts.
You cannot set Provisioned Concurrency higher than Reserved Concurrency. The function's effective limit is the Reserved Concurrency, so Lambda rejects a provisioned value above it.
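The interaction can be modeled as a simple split of concurrent demand into three buckets. The sketch below is a simplification with hypothetical names: it assumes every request beyond the provisioned amount needs a new environment, whereas in practice on-demand environments are reused once they are warm.

```python
from dataclasses import dataclass


@dataclass
class Split:
    warm: int       # served by pre-initialized environments
    on_demand: int  # served by environments created on demand (may cold start)
    throttled: int  # rejected because the function's limit was reached


def split_demand(demand: int, provisioned: int, limit: int) -> Split:
    """Split concurrent demand given provisioned and reserved settings."""
    if provisioned > limit:
        raise ValueError("provisioned concurrency cannot exceed the function limit")
    admitted = min(demand, limit)
    warm = min(admitted, provisioned)
    return Split(warm=warm, on_demand=admitted - warm, throttled=demand - admitted)


# Reserved = 200 and Provisioned = 150, as in the example above.
print(split_demand(demand=180, provisioned=150, limit=200))
print(split_demand(demand=250, provisioned=150, limit=200))
```

The second call shows both effects at once: 150 requests land on warm environments, 50 run on demand, and 50 are throttled at the reserved limit.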
Burst Concurrency
When a function has no Reserved or Provisioned Concurrency, it can still scale rapidly, but not instantly: Lambda limits how fast concurrency can grow. Under the original scaling model, a function could burst immediately to between 500 and 3,000 concurrent executions, depending on the Region, and then add 500 more per minute. Since late 2023, AWS documents a per-function scaling rate of 1,000 additional concurrent executions every 10 seconds. Either way, this headroom comes from the shared regional pool and is not guaranteed. For predictable scaling, use Reserved and Provisioned Concurrency.
Throttling Behavior
When a function is throttled:
Synchronous invocations: Return HTTP 429 TooManyRequestsException.
Asynchronous invocations: Lambda returns the event to its internal queue and retries for up to 6 hours with exponential backoff (starting at 1 second and doubling up to a maximum of 5 minutes). Events that are still unprocessed when they expire are sent to the dead-letter queue (DLQ), if one is configured.
Event source mappings (e.g., DynamoDB Streams, SQS): Throttled invocations are retried with backoff; the event source can fall behind.
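To get a feel for the asynchronous retry figures above, the schedule can be generated locally. This is a rough model with hypothetical names, not Lambda's actual scheduler: it assumes each delay doubles exactly, caps at 5 minutes, and stops once the 6-hour window would be exceeded.

```python
def retry_delays(first: int = 1, cap: int = 300, window: int = 6 * 3600) -> list[int]:
    """Return successive retry delays in seconds until the window is used up."""
    delays = []
    elapsed = 0
    delay = first
    while elapsed + delay <= window:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * 2, cap)  # double, but never beyond the cap
    return delays


schedule = retry_delays()
print(schedule[:10])  # the first delays, showing where the cap is reached
print(len(schedule), "retries over", sum(schedule), "seconds")
```

The takeaway for design questions is that a throttled asynchronous event is not lost immediately; it has hours of retries before it expires.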
Cost Implications
Reserved Concurrency itself has no additional cost; you pay only for the invocations and duration.
Provisioned Concurrency incurs charges for the time the environments are provisioned (per second) even if idle, plus the execution duration when they handle requests.
Over-provisioning Provisioned Concurrency can lead to wasted cost. Under-provisioning leads to cold starts. Use auto-scaling to optimize.
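The idle cost of Provisioned Concurrency scales with the number of environments, the memory size, and the time they stay provisioned. The sketch below shows that arithmetic; the rate is a deliberately fake placeholder, not an AWS price, so look up the current rate for your Region before drawing conclusions.

```python
PLACEHOLDER_RATE = 0.000004  # hypothetical price per GB-second, NOT an AWS price


def provisioned_gb_seconds(environments: int, memory_mb: int, hours: float) -> float:
    """GB-seconds of provisioned capacity held for the given period."""
    return environments * (memory_mb / 1024) * hours * 3600


def idle_cost(environments: int, memory_mb: int, hours: float,
              rate_per_gb_second: float = PLACEHOLDER_RATE) -> float:
    """Charge for keeping environments provisioned, before any invocations."""
    return provisioned_gb_seconds(environments, memory_mb, hours) * rate_per_gb_second


# Compare keeping 50 versus 10 environments of 1,024 MB warm for one day.
for count in (50, 10):
    print(count, provisioned_gb_seconds(count, 1024, 24), round(idle_cost(count, 1024, 24), 2))
```

Because the charge is linear in the number of environments, trimming idle provisioned capacity overnight reduces the bill proportionally.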
Exam-Relevant Details
The default regional concurrency limit is 1,000 (soft limit).
Reserved Concurrency acts as both a reservation and a hard limit.
Provisioned Concurrency eliminates cold starts for the provisioned amount.
Provisioned Concurrency must be set on a version or alias, not $LATEST.
You can scale Provisioned Concurrency with Application Auto Scaling using target tracking (utilization) or scheduled scaling.
If a function has both Reserved and Provisioned Concurrency, the provisioned amount cannot exceed the reserved amount.
Setting Reserved Concurrency to 0 disables the function.
Burst concurrency is a separate concept; it is not guaranteed and varies by region.
Set Reserved Concurrency on a Function
Identify the mission-critical Lambda function that must always have capacity. Navigate to the function in the AWS Console, go to Configuration > Concurrency, and click 'Reserve concurrency'. Enter the desired number, e.g., 100. This immediately deducts 100 from the regional concurrency pool. The function now has a guaranteed 100 concurrent execution slots. Any attempt to invoke beyond 100 will be throttled. Other functions in the account can only use the remaining 900 of the regional limit. Use the CLI command `aws lambda put-function-concurrency` to set this programmatically. Verify with `aws lambda get-function-concurrency`.
Configure Provisioned Concurrency on an Alias
Create a function alias (e.g., 'PROD') pointing to a specific version. In the Console, select the alias under Versions/Aliases, then choose 'Add Provisioned Concurrency'. Enter the number of environments to pre-warm, e.g., 50. Lambda immediately begins initializing 50 execution environments. This process can take a few minutes. Once ready, the ProvisionedConcurrencyConfig shows 'Allocated' and 'Available' counts as 50. Requests to the alias will be served by these warm environments, eliminating cold starts for the first 50 concurrent requests. Use CLI: `aws lambda put-provisioned-concurrency-config` with `--qualifier PROD`.
Set Up Auto Scaling for Provisioned Concurrency
To automatically adjust Provisioned Concurrency based on traffic, use Application Auto Scaling. First, register the alias as a scalable target: `aws application-autoscaling register-scalable-target --service-namespace lambda --resource-id function:my-function:PROD --scalable-dimension lambda:function:ProvisionedConcurrency --min-capacity 10 --max-capacity 100`. Then create a scaling policy with target tracking: `aws application-autoscaling put-scaling-policy --policy-name my-scaling-policy --service-namespace lambda --resource-id function:my-function:PROD --scalable-dimension lambda:function:ProvisionedConcurrency --policy-type TargetTrackingScaling --target-tracking-scaling-policy-configuration TargetValue=0.7,PredefinedMetricSpecification={PredefinedMetricType=LambdaProvisionedConcurrencyUtilization}`. This targets 70% utilization of provisioned concurrency. The metric is a ratio between 0 and 1, so 70% is written as 0.7.
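The effect of a utilization target can be approximated with simple arithmetic: provision enough environments that the ones in use make up the target share, then clamp to the registered minimum and maximum. The sketch below is a simplified model with hypothetical names; the real service reacts to CloudWatch alarms over time rather than computing a single value. The target is passed as a whole-number percentage to keep the arithmetic in integers.

```python
def desired_provisioned(in_use: int, target_percent: int,
                        min_capacity: int, max_capacity: int) -> int:
    """Provisioned Concurrency that puts `in_use` at roughly the target utilization."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be between 1 and 100")
    # Ceiling division without floating point: ceil(in_use * 100 / target_percent).
    wanted = (in_use * 100 + target_percent - 1) // target_percent
    return max(min_capacity, min(max_capacity, wanted))


# Bounds of 10 and 100 mirror the register-scalable-target example above.
for busy in (2, 35, 63, 80):
    print(busy, "in use ->", desired_provisioned(busy, 70, 10, 100), "provisioned")
```

The first and last cases show the clamps at work: very low traffic still keeps the minimum warm, and heavy traffic cannot push the setting past the maximum.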
Monitor Concurrency Metrics in CloudWatch
Key CloudWatch metrics for concurrency: `ConcurrentExecutions` (per function or aggregate), `ProvisionedConcurrentExecutions` (number of provisioned environments currently processing requests), `ProvisionedConcurrencyUtilization` (ratio of provisioned concurrency in use). Set CloudWatch alarms: e.g., if `ConcurrentExecutions` exceeds 80% of Reserved Concurrency for 1 minute, trigger an SNS notification. For Provisioned Concurrency, if `ProvisionedConcurrencyUtilization` exceeds 90%, consider increasing the provisioned amount or lowering the auto-scaling target. Also monitor the `Throttles` metric to catch when requests are being rejected.
Test Throttling and Scaling Behavior
Simulate high load using a load testing tool (e.g., Artillery, Serverless Artillery). Invoke the function with increasing concurrency. Observe that when concurrency reaches the Reserved Concurrency limit, subsequent synchronous invocations receive HTTP 429. Asynchronous invocations are queued and retried. For Provisioned Concurrency, note that the first N requests (where N is the provisioned amount) have low latency; beyond that, latency increases due to cold starts. Verify auto-scaling adjusts provisioned concurrency up as utilization increases. Check CloudWatch logs and metrics to validate behavior. This step is crucial for exam scenarios where you must predict how a function behaves under load.
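Before running a real load test, it can help to predict the outcome with a toy model of the hard cap. The class below is a local illustration only; its name and methods are hypothetical and it does not call Lambda. It admits invocations until the limit is reached and answers with 429 after that.

```python
class ConcurrencyLimiter:
    """Toy model of a function's concurrency limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0
        self.throttles = 0

    def start(self) -> int:
        """Try to begin an invocation; return an HTTP-like status code."""
        if self.in_flight >= self.limit:
            self.throttles += 1
            return 429  # throttled, like TooManyRequestsException
        self.in_flight += 1
        return 200

    def finish(self) -> None:
        """Mark one in-flight invocation as complete, freeing a slot."""
        if self.in_flight == 0:
            raise RuntimeError("no invocation in flight")
        self.in_flight -= 1


limiter = ConcurrencyLimiter(limit=3)
print([limiter.start() for _ in range(5)])  # five simultaneous requests, limit of 3
limiter.finish()                            # one invocation completes
print(limiter.start())                      # the freed slot admits a new request
```

Setting the limit to 0 in this model rejects every request, which mirrors how a Reserved Concurrency of 0 disables a function.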
Scenario 1: E-commerce Checkout Service
An online retailer uses Lambda to process checkout requests. The checkout function must handle flash sales with sudden traffic spikes. Without concurrency controls, the function might be throttled if other functions (e.g., image processing) consume all regional concurrency. The team sets Reserved Concurrency to 500 to guarantee capacity for checkout. They also set Provisioned Concurrency to 300 on the PROD alias to eliminate cold starts for the first 300 concurrent users. During a flash sale, traffic peaks at 450 concurrent requests. The first 300 are served without cold starts; the remaining 150 experience a slight delay as new environments are initialized. The function stays within its 500 limit, and no requests are throttled. Auto-scaling adjusts Provisioned Concurrency based on utilization, reducing cost during off-peak hours. Misconfiguration: if Reserved Concurrency is set too low (e.g., 200), the function would throttle at 200, causing failed checkouts and lost revenue.
Scenario 2: Real-Time Data Processing Pipeline
A financial services company ingests stock market data via Kinesis Data Streams and processes it with a Lambda function. The function must maintain low latency to avoid data backlogs. The team sets Provisioned Concurrency to 100 on the function alias to keep environments warm. They also set Reserved Concurrency to 200, enough to cover the peak provisioned level, to prevent other functions from starving this pipeline. Because the data volume is predictable, they use scheduled scaling to increase Provisioned Concurrency to 200 during market hours and reduce it to 20 overnight. This optimizes cost while ensuring performance. A common mistake: forgetting that Provisioned Concurrency must be set on a version or alias, not $LATEST. Lambda does not accept a Provisioned Concurrency configuration on $LATEST, so traffic sent to $LATEST never benefits from warm environments and sees unexpected cold starts.
Scenario 3: Multi-Function Microservices
A SaaS platform runs dozens of microservices as Lambda functions. One function, responsible for user authentication, is critical and must never be throttled. The team sets Reserved Concurrency to 50 for auth, leaving 950 for other functions. They also set Provisioned Concurrency to 30 on the auth function's PROD alias to ensure fast logins. However, a misbehaving image-resizing function starts consuming 800 concurrent executions due to a bug, leaving only 150 for the remaining functions. The auth function is unaffected because its 50 are reserved. Without Reserved Concurrency, the auth function would have been starved. This illustrates how Reserved Concurrency provides isolation. The team also monitors ConcurrentExecutions per function and sets alarms to detect anomalous consumption.
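The isolation effect can be shown with a small allocation model. Everything here is hypothetical: the function names, the demand figures (a runaway function asking for 1,000 rather than the 800 in the scenario, to make the starvation visible), and the first-come order in which unreserved functions draw from the shared pool.

```python
def grant(demands: dict[str, int], reservations: dict[str, int],
          regional_limit: int = 1000) -> dict[str, int]:
    """Grant concurrency per function; unreserved functions share what is left."""
    pool = regional_limit - sum(reservations.values())
    granted = {}
    for name, demand in demands.items():  # dict order stands in for arrival order
        if name in reservations:
            granted[name] = min(demand, reservations[name])
        else:
            share = min(demand, pool)
            granted[name] = share
            pool -= share
    return granted


demands = {"image-resize": 1000, "auth": 40, "reports": 300}
print(grant(demands, reservations={"auth": 50}))  # auth is protected
print(grant(demands, reservations={}))            # auth is starved
```

In the first call the runaway function can only exhaust the shared pool, while in the second it consumes the entire regional limit and leaves nothing for authentication.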
What Goes Wrong When Misconfigured
Over-provisioning Provisioned Concurrency: Wastes money on idle environments. Example: setting Provisioned Concurrency to 1000 for a function that rarely exceeds 10 concurrent executions results in high costs.
Under-provisioning Reserved Concurrency: Critical functions get throttled during spikes. Example: setting Reserved Concurrency to 10 for a checkout function that sees 100 concurrent users means roughly 90% of those concurrent requests are throttled.
Setting Reserved Concurrency too high without accounting for other functions: Can leave insufficient concurrency for other functions, causing them to throttle.
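Forgetting to set Provisioned Concurrency on an alias: Lambda does not support Provisioned Concurrency on $LATEST, so invocations of $LATEST never use warm environments. Always configure it on a published version or alias and invoke that qualifier.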
Not using auto-scaling for variable traffic: Fixed Provisioned Concurrency leads to either over-provisioning or under-provisioning. Use Application Auto Scaling with target tracking based on utilization (e.g., 70%).
What SAA-C03 Tests on This Topic
This topic falls under Domain 2 (Resilient Architectures), Objective 2.1, where you must choose when to use Lambda concurrency controls. Specifically, you need to know:
How to guarantee capacity for critical functions (Reserved Concurrency).
How to eliminate cold starts (Provisioned Concurrency).
The difference between the two and when to use each.
The default regional concurrency limit (1,000).
How throttling works for synchronous vs. asynchronous invocations.
How to auto-scale Provisioned Concurrency.
Common Wrong Answers and Why Candidates Choose Them
"Provisioned Concurrency guarantees capacity." This is false. Provisioned Concurrency pre-warms environments but does not reserve capacity. If the function has no Reserved Concurrency, other functions can consume the regional pool and cause throttling even if Provisioned Concurrency is set. The correct answer: Reserved Concurrency guarantees capacity; Provisioned Concurrency reduces cold starts.
"Reserved Concurrency eliminates cold starts." False. Reserved Concurrency only reserves capacity; it does not pre-warm environments. Cold starts still occur when new execution environments are created. Provisioned Concurrency eliminates cold starts.
"You can set Provisioned Concurrency on $LATEST." False. Provisioned Concurrency must be set on a specific version or alias. $LATEST is mutable and cannot have provisioned concurrency.
"Setting Reserved Concurrency to 0 removes the limit." False. Setting Reserved Concurrency to 0 disables the function; all invocations are throttled. To remove a limit, set Reserved Concurrency to None (unreserved).
Specific Numbers, Values, and Terms That Appear Verbatim
Default regional concurrency limit: 1,000.
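Burst concurrency: an initial burst of 500 to 3,000 concurrent executions (varies by Region), then 500 more per minute; newer AWS documentation describes 1,000 more every 10 seconds per function.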
Provisioned Concurrency is billed per second, even if idle.
Reserved Concurrency is not billed separately.
The CLI command for Reserved Concurrency: put-function-concurrency.
The CLI command for Provisioned Concurrency: put-provisioned-concurrency-config.
Application Auto Scaling scalable dimension: lambda:function:ProvisionedConcurrency.
Predefined metric: LambdaProvisionedConcurrencyUtilization.
Edge Cases and Exceptions
If a function has both Reserved and Provisioned Concurrency, the provisioned amount cannot exceed the reserved amount. If you try, Lambda rejects the request.
If you set Reserved Concurrency on a function that also has event source mappings (e.g., DynamoDB Streams), throttling can cause the event source to fall behind. The exam may ask how to handle this: increase reserved concurrency or use Provisioned Concurrency to reduce processing time.
For asynchronous invocations, throttled events are automatically retried for up to 6 hours. This is important for design scenarios where you need to avoid data loss.
Provisioned Concurrency can be configured per alias, allowing different environments (dev, prod) to have different settings.
How to Eliminate Wrong Answers Using the Underlying Mechanism
When you see an answer choice that says "eliminates cold starts," check if it refers to Provisioned Concurrency. If it says "guarantees capacity," check if it refers to Reserved Concurrency. If it mentions setting on $LATEST, it's wrong. If it says Reserved Concurrency is billed, it's wrong. Use the mechanism: Reserved Concurrency is a reservation in the regional pool; Provisioned Concurrency is a pool of warm environments. They are orthogonal but complementary.
Reserved Concurrency guarantees capacity and acts as a hard limit; Provisioned Concurrency eliminates cold starts but does not guarantee capacity.
Default regional concurrency limit is 1,000 concurrent executions (soft limit).
Provisioned Concurrency must be configured on a function version or alias, not $LATEST.
Setting Reserved Concurrency to 0 disables the function; all invocations are throttled.
Use Application Auto Scaling with target tracking on LambdaProvisionedConcurrencyUtilization to automatically adjust Provisioned Concurrency.
Throttled synchronous invocations return HTTP 429; asynchronous invocations are retried for up to 6 hours.
Reserved and Provisioned Concurrency can be used together; provisioned amount cannot exceed reserved amount.
Burst scaling (historically an initial 500 to 3,000 concurrent executions depending on Region, then 500 more per minute) is separate and not guaranteed; use Reserved Concurrency for predictable capacity.
Provisioned Concurrency is billed per second for provisioned environments, plus standard invocation costs.
Monitor CloudWatch metrics: ConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization, and Throttles.
These come up on the exam all the time. Here's how to tell them apart.
Reserved Concurrency
Guarantees a specific number of concurrent executions for the function.
Acts as a hard cap on concurrency; function cannot exceed this limit.
Does not pre-warm environments; cold starts still occur.
No additional cost beyond standard invocation and duration charges.
Set per function, not per version/alias.
Provisioned Concurrency
Pre-warms a specified number of execution environments to eliminate cold starts.
Does not by itself guarantee capacity beyond the provisioned amount; additional requests can still be throttled if the regional limit is reached.
Reduces latency for the provisioned amount; beyond that, cold starts may occur.
Incurs charges for provisioned environments per second, even when idle.
Set per version or alias; cannot be set on $LATEST.
Mistake
Provisioned Concurrency guarantees a function will never be throttled.
Correct
Provisioned Concurrency pre-warms environments, but only for the provisioned amount. Traffic above that level runs on demand and draws on the function's reserved limit or the shared regional pool, so it can still be throttled when that capacity is exhausted. To protect the function from other functions' consumption, you must also set Reserved Concurrency, sized for peak demand.
Mistake
Reserved Concurrency eliminates cold starts.
Correct
Reserved Concurrency only guarantees that a certain number of concurrent executions are available for the function. It does not pre-initialize environments. Cold starts still occur when new environments are created. Provisioned Concurrency is needed to eliminate cold starts.
Mistake
You can set Provisioned Concurrency on the $LATEST version of a function.
Correct
Provisioned Concurrency can only be configured on a published version or an alias. $LATEST is mutable and cannot have provisioned concurrency. Attempting to set it on $LATEST fails.
Mistake
Setting Reserved Concurrency to 0 removes the concurrency limit.
Correct
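Setting Reserved Concurrency to 0 disables the function entirely. All invocations are immediately throttled. To remove a reserved concurrency limit, delete the setting so the function returns to the unreserved pool: run `aws lambda delete-function-concurrency` or choose the unreserved account concurrency option in the console.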
Mistake
Provisioned Concurrency is free; you only pay for invocations.
Correct
Provisioned Concurrency incurs charges for the time environments are provisioned (per second), even if they are idle. Additionally, you pay for the duration when requests are processed. It is not free.
Reserved Concurrency guarantees that a function always has a specific number of concurrent execution slots available, protecting it from being throttled by other functions. It acts as a hard cap. Provisioned Concurrency pre-initializes a set number of execution environments so they are ready to serve requests instantly, eliminating cold starts. Provisioned Concurrency does not reserve capacity; the function can still be throttled if the regional limit is reached. You can use both together: Reserved Concurrency for capacity guarantee and Provisioned Concurrency for low latency.
No. Provisioned Concurrency can only be configured on a specific function version (e.g., version 1, 2) or an alias (e.g., PROD). The $LATEST version is mutable and cannot have provisioned concurrency. If you need provisioned concurrency for a function that is frequently updated, use an alias that points to the desired version and update the alias when you publish a new version.
The function will be throttled. For synchronous invocations, the caller receives an HTTP 429 TooManyRequestsException error. For asynchronous invocations, the event is automatically retried for up to 6 hours with exponential backoff. If a dead-letter queue (DLQ) is configured, failed events after retries are sent to the DLQ. For event source mappings (e.g., DynamoDB Streams), throttling causes the event source to retry with backoff, potentially falling behind.
Use Application Auto Scaling. First, register the function alias as a scalable target with a min and max capacity. Then create a scaling policy with target tracking using the predefined metric LambdaProvisionedConcurrencyUtilization. Set a target value (e.g., 70%). AWS will automatically adjust the provisioned concurrency to maintain that utilization. You can also use scheduled scaling for predictable traffic patterns.
No. Reserved Concurrency itself does not incur any additional charges. You only pay for the actual invocations and duration of your Lambda function. However, Provisioned Concurrency does incur charges for the time environments are provisioned, even if idle.
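The default regional concurrency limit is 1,000 concurrent executions per account in each Region. This is a soft limit that can be increased through a Service Quotas request. The burst limit (how quickly Lambda can scale) is separate and has historically varied by Region: an initial burst of 500 to 3,000 concurrent executions, then 500 more per minute. Newer AWS documentation describes a per-function rate of 1,000 additional concurrent executions every 10 seconds.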
No. If you have set Reserved Concurrency on a function, the Provisioned Concurrency cannot exceed the reserved amount. If you attempt to set it higher, Lambda rejects the request. This ensures that the function's total concurrency (provisioned + on-demand) does not exceed the reserved limit.