SAA-C03 · Chapter 125 of 189 · Objective 3.3

Lambda Performance: Memory, Cold Starts, and Init Duration

Lambda performance (memory allocation, cold starts, and init duration) falls under the High Performance exam objective for SAA-C03. These concepts underpin cost-efficient, responsive serverless designs, and they typically surface in exam scenarios involving latency-sensitive workloads or cost optimization. This chapter covers the mechanisms, default values, and configuration options that the exam tests.

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

Restaurant Kitchen with Prep Chefs

How does a restaurant kitchen prepare dishes on demand? The kitchen has a limited number of burners (CPU) and counter space (memory). A chef (Lambda function) can use more burners to cook faster, but only up to the maximum available. When a new order arrives, if the kitchen is idle (no previous orders), the chef must first fetch ingredients from the pantry (download code from S3) and preheat the oven (initialize runtime) before cooking — this is the cold start. If orders keep coming, the chef stays at the station, ready to cook immediately (warm start). The restaurant manager can allocate more burners (increase memory) to speed up cooking, but that also costs more. The chef's prep time (init duration) is the time to get ready, which happens only once per idle period. If the kitchen is left idle for too long (e.g., 5 minutes), the chef goes on a break and the next order suffers a cold start. The restaurant can also reserve a chef (provisioned concurrency) to keep the kitchen always warm, paying for idle time.

How It Actually Works

What is Lambda Memory and How Does It Affect Performance?

AWS Lambda allocates CPU in proportion to the memory you configure. The memory setting ranges from 128 MB to 10,240 MB (10 GB) in 1 MB increments, and CPU power scales linearly with it: doubling memory roughly doubles the available compute. A function with 1,769 MB gets the equivalent of one full vCPU, and the allocation tops out at 6 vCPUs at 10,240 MB. AWS does not publish the exact allocation at every setting, but it follows a linear model. Increasing memory therefore provides more compute power as well as more RAM, which often reduces execution time. This is a key exam point: memory is the primary performance lever.
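
The linear rule above can be turned into a rough estimator. This is a minimal sketch, not an AWS formula: it assumes one vCPU per 1,769 MB and a ceiling of 6 vCPUs, and the function name `estimated_vcpus` is made up for illustration.

```python
# Rough model of Lambda's memory-to-CPU relationship.
# Assumptions: about one vCPU at 1,769 MB, linear scaling,
# and a ceiling of 6 vCPUs at the 10,240 MB maximum.
MIN_MEMORY_MB = 128
MAX_MEMORY_MB = 10_240
MB_PER_VCPU = 1_769
MAX_VCPUS = 6


def estimated_vcpus(memory_mb: int) -> float:
    """Return an approximate vCPU share for a given memory setting."""
    if not MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB:
        raise ValueError(f"memory must be {MIN_MEMORY_MB}-{MAX_MEMORY_MB} MB")
    return min(memory_mb / MB_PER_VCPU, MAX_VCPUS)


if __name__ == "__main__":
    for mb in (128, 512, 1_769, 3_538, 10_240):
        print(f"{mb:>6} MB -> ~{estimated_vcpus(mb):.2f} vCPU")
```

Treat the output as an order-of-magnitude guide; real speedups depend on whether the code can actually use the extra CPU.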

Cold Starts: The Init Phase

A cold start occurs when a Lambda function is invoked after being idle for a period, or when a new execution environment is created. The cold start consists of two phases: Init and Invoke. The Init phase includes:
- Extension init: Starts external extensions (e.g., monitoring agents).
- Runtime init: Loads the language runtime (e.g., Python 3.9, Node.js 18).
- Function init: Runs the static code outside the handler (e.g., imports, global variables).

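Lambda marks the start of the Init phase with an INIT_START line in CloudWatch Logs and reports the time spent as Init Duration on the REPORT line of the cold-start invocation. For on-demand functions the Init phase is limited to 10 seconds. If it does not complete in that time, Lambda retries Init as part of the first invocation, this time under the function's configured timeout, so a slow Init can cause that invocation to time out.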

After Init, the function code is executed. Cold starts add the Init Duration to the overall latency. Warm starts have no Init Duration because the execution environment is reused.
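
The split between function init and the handler is easiest to see in code. The sketch below is illustrative only: `_expensive_setup` and `SHARED_RESOURCE` are hypothetical stand-ins for whatever a real function would load or connect to at module level.

```python
import time

# Function init: module-level code runs once per execution environment,
# during the Init phase of a cold start.
INIT_COUNT = 0


def _expensive_setup() -> dict:
    """Stand-in for loading config or opening a connection (hypothetical)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


SHARED_RESOURCE = _expensive_setup()


def handler(event, context):
    """Invoke phase: runs on every invocation and reuses SHARED_RESOURCE."""
    return {"init_count": INIT_COUNT, "echo": event}


if __name__ == "__main__":
    # Two calls in one process mimic two warm invocations of one environment.
    print(handler({"n": 1}, None))
    print(handler({"n": 2}, None))
```

Both calls report the same `init_count`, which mirrors how warm invocations skip the Init phase and reuse whatever the module-level code created.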

How Lambda Reuses Execution Environments

Lambda keeps an execution environment alive for a period of time after a function finishes, expecting another invocation. The idle timeout is not publicly documented but is typically around 5-15 minutes. During this time, subsequent invocations reuse the same environment, avoiding cold starts. If no new invocation arrives within the idle timeout, Lambda terminates the environment. The next invocation triggers a cold start.

Provisioned Concurrency: Eliminating Cold Starts

Provisioned Concurrency (PC) pre-initializes a number of execution environments, keeping them warm and ready to serve invocations instantly. This eliminates cold starts for those environments. You set a number of provisioned environments, and Lambda initializes them ahead of time. PC incurs costs even when not in use.

Reserved Concurrency vs Provisioned Concurrency

Reserved Concurrency (RC) sets aside part of the account's concurrency for one function, so other functions cannot consume that capacity. It also caps the function at that level: invocations beyond the reserved amount are throttled. RC does not prevent cold starts; it only reserves capacity. PC is used specifically to avoid cold starts. The exam often tests the difference: Reserved Concurrency for capacity guarantees, Provisioned Concurrency for latency.

Impact of Memory on Cost

Lambda pricing is based on memory and execution time (in GB-seconds), plus a small per-request charge. Increasing memory raises the per-second cost but often reduces execution time. The compute cost is calculated as: memory (in GB) * duration (in seconds) * price per GB-second. In US East (N. Virginia), the price per GB-second is $0.0000166667 for x86 and $0.0000133334 for Arm (Graviton). Because CPU scales with memory, there is often an optimal memory setting that minimizes cost for a given workload. The exam may ask you to choose between increasing memory to reduce time versus keeping memory low to save cost.
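
The cost formula can be checked with a few lines of arithmetic. The sketch below uses the GB-second prices quoted above and ignores request fees; the durations in `MEASURED` are invented for illustration, not benchmark results.

```python
# Prices quoted in the text (USD per GB-second); request fees are excluded.
PRICE_PER_GB_SECOND = {"x86": 0.0000166667, "arm": 0.0000133334}

# Hypothetical measurements: memory setting (MB) -> average duration (s).
MEASURED = {512: 8.0, 1024: 3.0, 2048: 2.0}


def gb_seconds(memory_mb: int, duration_s: float) -> float:
    """Billed compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * duration_s


def invocation_cost(memory_mb: int, duration_s: float, arch: str = "x86") -> float:
    """Compute cost of one invocation: GB * seconds * price per GB-second."""
    return gb_seconds(memory_mb, duration_s) * PRICE_PER_GB_SECOND[arch]


def cheapest(measurements: dict, arch: str = "x86") -> int:
    """Return the memory setting with the lowest cost per invocation."""
    return min(measurements,
               key=lambda mb: invocation_cost(mb, measurements[mb], arch))


if __name__ == "__main__":
    for mb, secs in MEASURED.items():
        print(f"{mb} MB x {secs} s = {gb_seconds(mb, secs)} GB-s, "
              f"${invocation_cost(mb, secs):.10f}")
    print("cheapest setting:", cheapest(MEASURED), "MB")
```

With these made-up numbers the middle setting wins: going from 512 MB to 1,024 MB cuts duration by more than half, while going to 2,048 MB doubles memory for a smaller gain.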

Lambda SnapStart: Reducing Cold Starts for Java

SnapStart is a feature for Java functions that reduces cold start latency by taking a snapshot of the execution environment after the Init phase, then restoring it on subsequent cold starts. This can cut Init Duration from seconds to milliseconds. SnapStart is enabled at the function version level and requires the function code to be deterministic (no unique identifiers generated at init).

Configuring Memory and Concurrency

Memory can be set in the Lambda console, CLI, or Infrastructure as Code (e.g., CloudFormation). The command to update memory:

aws lambda update-function-configuration --function-name my-function --memory-size 1024

Reserved Concurrency is set similarly:

aws lambda put-function-concurrency --function-name my-function --reserved-concurrent-executions 100

Provisioned Concurrency is set per alias or version:

aws lambda put-provisioned-concurrency-config --function-name my-function --qualifier prod --provisioned-concurrent-executions 50
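
The same three settings can be applied from Python. This is a sketch under assumptions: it expects a boto3 Lambda client to be passed in, the wrapper name `configure_function` is hypothetical, and it does no error handling or waiting between calls.

```python
def configure_function(client, name: str, memory_mb: int,
                       reserved: int, alias: str, provisioned: int) -> None:
    """Apply the three settings shown above through a boto3 Lambda client."""
    # Memory (and therefore CPU) for the function.
    client.update_function_configuration(FunctionName=name, MemorySize=memory_mb)
    # Reserved Concurrency is set on the function as a whole.
    client.put_function_concurrency(
        FunctionName=name, ReservedConcurrentExecutions=reserved)
    # Provisioned Concurrency is set on an alias or version via Qualifier.
    client.put_provisioned_concurrency_config(
        FunctionName=name, Qualifier=alias,
        ProvisionedConcurrentExecutions=provisioned)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   configure_function(boto3.client("lambda"), "my-function",
#                      1024, 100, "prod", 50)
```

In practice a configuration update takes a moment to apply, so a real script would likely wait (boto3 offers a `function_updated` waiter) before making further changes.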

Monitoring Lambda Performance

CloudWatch Logs mark each cold start. The Init phase begins with an INIT_START line in the log stream, such as the one below, and the REPORT line for that invocation includes an Init Duration field:

INIT_START Runtime Version: python:3.9.v8	Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:...

CloudWatch metrics and log fields to know:
- Duration: Execution time in milliseconds.
- Init Duration: Time spent in the Init phase. It appears on the REPORT log line for cold starts only (and as init_duration in Lambda Insights) rather than as a standard Lambda metric.
- ConcurrentExecutions: Number of function instances running concurrently.
- ProvisionedConcurrencySpilloverInvocations: Invocations that used standard concurrency because PC was exhausted.

The exam expects you to interpret these metrics to identify cold start issues.
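
One way to spot cold starts is to check whether an invocation's REPORT log line carries an Init Duration field. The sketch below parses such a line; the sample line follows the usual REPORT layout, but its request ID and numbers are made up, and `summarize` is a hypothetical helper name.

```python
import re

# Illustrative REPORT line; the request ID and values are invented.
SAMPLE = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 12.34 ms\tBilled Duration: 13 ms\tMemory Size: 512 MB\t"
    "Max Memory Used: 80 MB\tInit Duration: 245.67 ms"
)

_INIT = re.compile(r"Init Duration: ([\d.]+) ms")
# Lookbehinds skip the "Billed Duration" and "Init Duration" fields.
_DURATION = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")


def summarize(report_line: str) -> dict:
    """Extract duration and init duration (if any) from a REPORT line."""
    init = _INIT.search(report_line)
    duration = _DURATION.search(report_line)
    return {
        "cold_start": init is not None,
        "init_ms": float(init.group(1)) if init else None,
        "duration_ms": float(duration.group(1)) if duration else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A line without Init Duration comes back with `cold_start` set to False, which corresponds to a warm invocation.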

Interacting with Other Services

Lambda is often fronted by API Gateway, which has its own integration timeout (29 seconds for REST APIs, 30 seconds for HTTP APIs). If cold start plus execution time exceeds the timeout, the client gets a 504 error. Similarly, Application Load Balancer has a 60-second idle timeout. For event sources like S3 or SNS, cold starts add latency but typically do not cause failures because those services have retry mechanisms.

Performance Tuning Strategies

Increase memory: Reduces execution time, may reduce cost if time reduction outweighs memory increase.

Use Provisioned Concurrency: For latency-sensitive applications, especially those with predictable traffic.

Optimize code: Minimize imports, use connection reuse (e.g., database connections outside handler).

Choose appropriate runtime: Python and Node.js have faster cold starts than Java or .NET.

Enable SnapStart for Java: Restores a pre-initialized snapshot instead of rerunning Init, typically bringing cold start latency below one second.

Use Lambda@Edge or CloudFront Functions: For edge computing with even lower latency.

The exam often presents scenarios where you must decide between increasing memory or using Provisioned Concurrency to meet latency requirements.

Walk-Through

Function Invocation Request Arrives

An event source (e.g., API Gateway, S3, SNS) triggers a Lambda invocation. The request is sent to the Lambda service, which checks if an idle execution environment is available for the function. If yes, the environment is reused (warm start). If not, a new environment must be created (cold start). The service also checks if Provisioned Concurrency environments are available; if so, those are used first.

Cold Start: Init Phase Begins

If no warm environment exists, Lambda allocates a new execution environment (a microVM). The Init phase starts: first, any Lambda extensions are initialized (e.g., Datadog, New Relic). Then the runtime is loaded (e.g., Python interpreter). Finally, the function's static code outside the handler is executed (e.g., importing libraries, establishing database connections). The total Init Duration is measured and logged. This phase is limited to 10 seconds for on-demand functions.

Cold Start: Invoke Phase Executes

After Init completes, the handler function is invoked with the event payload. The execution time is measured as Duration. Any logs generated are sent to CloudWatch. The function returns a response. If the function throws an error, Lambda may retry depending on the event source. After the invocation completes, the environment is kept idle for a period (typically 5-15 minutes) before being terminated.

Warm Start: Environment Reuse

If a subsequent invocation arrives before the idle timeout, Lambda reuses the same execution environment. The Init phase is skipped entirely. The handler is invoked directly. This results in much lower latency (often single-digit milliseconds). The environment may process multiple invocations sequentially, but never concurrently — each environment handles one invocation at a time.

Idle Timeout and Environment Termination

After the function completes, Lambda starts an idle timer. The exact timeout is not documented but is approximately 5-15 minutes. If no new invocation arrives within that period, Lambda terminates the environment (reclaims resources). The next invocation will then experience a cold start. For functions with Provisioned Concurrency, the environments are kept indefinitely (as long as PC is configured).
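
The routing logic in this walk-through can be sketched as a small simulation. This is a simplified model, not Lambda's actual scheduler: the 10-minute idle timeout is an assumed value (the real one is undocumented), Init time is ignored, and the class names are hypothetical.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT_S = 10 * 60  # assumed value; the real timeout is undocumented


@dataclass
class Environment:
    busy_until: float  # time at which the current invocation finishes


@dataclass
class FunctionPool:
    idle_timeout_s: float = IDLE_TIMEOUT_S
    environments: list = field(default_factory=list)

    def invoke(self, now: float, duration_s: float) -> str:
        """Route one invocation arriving at `now`; return 'cold' or 'warm'."""
        # Reclaim environments that have sat idle longer than the timeout.
        self.environments = [
            env for env in self.environments
            if now - env.busy_until <= self.idle_timeout_s
        ]
        # Reuse an idle environment; each handles one invocation at a time.
        for env in self.environments:
            if env.busy_until <= now:
                env.busy_until = now + duration_s
                return "warm"
        # No idle environment: create one, which would run the Init phase.
        self.environments.append(Environment(busy_until=now + duration_s))
        return "cold"


if __name__ == "__main__":
    pool = FunctionPool()
    for t in (0, 0.5, 5, 607):
        print(f"t={t:>5}s -> {pool.invoke(t, duration_s=1)}")
```

The second request is cold because the only environment is still busy (a scale-up), the third reuses an idle environment, and the last arrives after every environment has been reclaimed.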

What This Looks Like on the Job

Scenario 1: Real-Time API Backend

A financial services company builds a real-time stock price API using API Gateway and Lambda. The function queries a database and returns the latest price. They set memory to 512 MB, but cold starts cause latency spikes up to 3 seconds, exceeding the 1-second SLA. They enable Provisioned Concurrency with 20 environments, reducing p99 latency to under 200 ms. They also increase memory to 1 GB to shorten execution time. The cost increases but meets SLA.

Scenario 2: Image Processing Pipeline

A media company uses Lambda to resize images uploaded to S3. The function uses a large library (Pillow) for image processing, causing cold starts of 5-6 seconds. They optimize by using Lambda Layers to pre-package the library, reducing Init Duration to 2 seconds. They also increase memory from 128 MB to 1,024 MB, which cuts execution time from 10 seconds to 2 seconds. Billed compute per invocation rises from 1.25 GB-seconds (0.125 GB × 10 s) to 2.0 GB-seconds (1 GB × 2 s), so they accept a modestly higher cost in exchange for a fivefold speedup.

Scenario 3: IoT Data Ingestion

An IoT company sends sensor data through Amazon Data Firehose (formerly Kinesis Data Firehose), which invokes Lambda for transformation. The function runs every few minutes, so cold starts are frequent. They try using Provisioned Concurrency but find it too expensive because the function is idle most of the time. Instead, they schedule an Amazon EventBridge (formerly CloudWatch Events) rule to invoke the function every 5 minutes (a 'warm-up' ping) to keep an environment alive. This reduces cold start occurrences but adds a small cost for the warm-up invocations, and it keeps only one environment warm, so concurrent bursts can still hit cold starts.

Common Misconfigurations

Setting memory too low: causes long execution times and higher cost due to time-based pricing.

Not using Provisioned Concurrency for latency-sensitive apps: leads to unpredictable latency.

Over-provisioning Provisioned Concurrency: wastes money on idle capacity.

Not optimizing code: large dependencies increase Init Duration.

Using Java without SnapStart: cold starts can exceed 10 seconds.

How SAA-C03 Actually Tests This

What SAA-C03 Tests

Objective 3.3: Design for high performance. Questions focus on choosing between memory increase and Provisioned Concurrency to meet performance requirements.

Objective 2.3: Design cost-optimized architectures. Questions ask about the cost implications of memory and Provisioned Concurrency.

Objective 1.1: Design for high availability. Cold start impact on availability in event-driven architectures.

Common Wrong Answers

"Reserved Concurrency eliminates cold starts" – Wrong. Reserved Concurrency only guarantees capacity; cold starts still occur. Provisioned Concurrency eliminates cold starts.

"Increase memory to always reduce cost" – Wrong. Increasing memory increases per-second cost; if execution time doesn't decrease proportionally, cost may increase.

"Cold starts only happen on first invocation" – Wrong. Cold starts happen after idle periods (5-15 minutes) and when scaling up.

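"SnapStart works with any runtime" – Wrong. SnapStart supports only certain managed runtimes: Java 11 and later (the case the exam focuses on), plus Python 3.12+ and .NET 8+.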

Numbers and Terms to Memorize

Memory range: 128 MB to 10,240 MB (10 GB).

Init phase limit: 10 seconds for on-demand functions.

Provisioned Concurrency: per-alias or version.

Idle timeout: ~5-15 minutes (not documented exactly).

Price per GB-second: $0.0000166667 (x86), $0.0000133334 (Graviton).

SnapStart: reduces Init Duration from seconds to milliseconds for Java.

Edge Cases

Throttling: If concurrent executions exceed the account limit (1,000 by default) or reserved concurrency, invocations are throttled (return 429).

Provisioned Concurrency Spillover: If PC environments are all busy, additional invocations use standard concurrency (cold starts possible).

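VPC Lambda: Functions in a VPC once paid a heavy cold start penalty because an ENI was created at invocation time. Lambda now creates shared Hyperplane ENIs when the function is created or its VPC settings change, so VPC attachment adds little cold start latency.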
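
The throttling and spillover cases above can be expressed as a small decision function. This is a deliberately simplified model: it looks at a single function, ignores concurrency used by other functions in the account, and the name `route_invocation` is hypothetical.

```python
from typing import Optional

DEFAULT_ACCOUNT_LIMIT = 1000  # default concurrency limit cited above


def route_invocation(in_flight: int, provisioned: int,
                     reserved: Optional[int] = None,
                     account_limit: int = DEFAULT_ACCOUNT_LIMIT) -> str:
    """Classify a new invocation given current in-flight executions."""
    # Reserved Concurrency, when set, is the function's ceiling.
    limit = reserved if reserved is not None else account_limit
    if in_flight >= limit:
        return "throttled"    # caller sees a 429
    if in_flight < provisioned:
        return "provisioned"  # served by a pre-initialized environment
    return "on-demand"        # spillover; a cold start is possible


if __name__ == "__main__":
    for busy in (10, 50, 99, 100):
        print(busy, "->", route_invocation(busy, provisioned=50, reserved=100))
```

With 50 provisioned environments and a reserved limit of 100, the 51st concurrent request spills to on-demand capacity and the 101st is throttled.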

Eliminating Wrong Answers

If a question mentions "cold start latency" and offers Reserved Concurrency as a solution, eliminate it immediately.

If a question asks for "lowest cost" and the function runs infrequently, choose lower memory (e.g., 128 MB) even if execution time is longer.

If a question involves a latency-sensitive API that cannot tolerate cold starts, Provisioned Concurrency is the answer; for traffic that varies, pair it with Application Auto Scaling.

Key Takeaways

Memory range: 128 MB to 10,240 MB; CPU scales linearly with memory.

Cold starts add Init Duration to latency (the Init phase is limited to 10 seconds for on-demand functions).

Provisioned Concurrency eliminates cold starts; Reserved Concurrency does not.

SnapStart reduces cold starts for supported runtimes only: Java 11+ (the case the exam emphasizes), plus Python 3.12+ and .NET 8+.

Idle timeout is approximately 5-15 minutes (not documented).

Cost = memory (GB) * duration (seconds) * price per GB-second.

Increasing memory reduces cost only if execution time falls by a larger factor than memory rises.

VPC Lambda no longer adds significant cold start time: ENIs are created when the function is configured, not on each cold start.

CloudWatch metrics: Duration, ConcurrentExecutions, ProvisionedConcurrencySpilloverInvocations. Init Duration is reported on the REPORT log line of each cold start.

API Gateway integration timeout: 29 seconds (REST), 30 seconds (HTTP).

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Increase Memory

Increases CPU proportionally, reducing execution time.

Affects cost: may increase or decrease total cost.

Does not eliminate cold starts.

Simple to configure (single memory parameter).

Suitable for CPU-bound tasks.

Use Provisioned Concurrency

Eliminates cold starts by pre-initializing environments.

Incurs cost even when environments are idle.

Does not affect execution time (only init time).

Requires configuring number of environments per alias/version.

Suitable for latency-sensitive applications with predictable traffic.

Watch Out for These

Mistake

Cold starts only happen when a function is first created.

Correct

Cold starts happen whenever a new execution environment is created, which occurs after idle periods (5-15 minutes) and during scale-ups (when concurrent invocations exceed the number of warm environments).

Mistake

Increasing memory always increases cost.

Correct

Increasing memory may reduce execution time enough that total GB-seconds decrease, lowering cost. The relationship is workload-dependent.

Mistake

Reserved Concurrency eliminates cold starts.

Correct

Reserved Concurrency only reserves capacity; it does not pre-initialize environments. Cold starts still occur. Provisioned Concurrency eliminates cold starts.

Mistake

SnapStart works with all Lambda runtimes.

Correct

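SnapStart is available only for specific managed runtimes: Java 11 and later (the case the exam focuses on), and more recently Python 3.12+ and .NET 8+. It does not apply to Node.js, Ruby, custom runtimes, or container images.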

Mistake

Provisioned Concurrency guarantees no cold starts for all invocations.

Correct

Provisioned Concurrency eliminates cold starts only for the number of environments configured. If all provisioned environments are busy, additional invocations use standard concurrency and may experience cold starts.

Frequently Asked Questions

What is the maximum memory for a Lambda function?

The maximum memory is 10,240 MB (10 GB). You can configure from 128 MB to 10,240 MB in 1 MB increments. CPU scales proportionally with memory, so higher memory gives more compute power.

What is the difference between Reserved Concurrency and Provisioned Concurrency?

Reserved Concurrency guarantees that a function can scale to a certain number of concurrent executions, preventing other functions from using that capacity. It does not prevent cold starts. Provisioned Concurrency pre-initializes a set number of execution environments, keeping them warm to eliminate cold starts. Use Reserved Concurrency for capacity guarantees, Provisioned Concurrency for latency.

How long does Lambda keep an execution environment idle?

The idle timeout is not publicly documented, but it is typically between 5 and 15 minutes. If no new invocation arrives within that period, the environment is terminated, and the next invocation experiences a cold start.

Does increasing memory always reduce execution time?

Not always. It usually helps because CPU scales with memory, but if the function is not CPU-bound (e.g., waiting on network I/O), the benefit may be minimal. The exam expects you to know that increasing memory can reduce execution time but may increase cost.

What is Lambda SnapStart and when should I use it?

SnapStart reduces cold start latency by taking a snapshot of the execution environment after init and restoring it on subsequent cold starts. It launched for Java, which is how the exam frames it, and now also supports Python 3.12+ and .NET 8+. It can reduce startup time from seconds to well under a second. Use it for functions on a supported runtime where cold starts are a problem.

How can I monitor Lambda cold starts?

Each cold start writes an `INIT_START` line to CloudWatch Logs, and the `REPORT` line for that invocation includes an `Init Duration` field. Init Duration is not a standard CloudWatch metric, so use CloudWatch Logs Insights to search for `REPORT` lines that contain `Init Duration`, or for `INIT_START` events.

What happens if Lambda Init Duration exceeds 10 seconds?

If the Init phase does not finish within 10 seconds, Lambda retries it as part of the first invocation, under the function's configured timeout, so a slow Init can make that invocation time out. This is a common issue with large Java functions. Using SnapStart or optimizing init code can help stay within the limit.
