DVA-C02 · Chapter 91 of 101 · Objective 4.2

Lambda Cold Start vs Warm Start Optimisation

This chapter covers Lambda cold start vs warm start optimization, a critical topic for the DVA-C02 exam under Domain 4: Troubleshooting, Objective 4.2 (Optimize performance). Cold starts directly impact latency, cost, and user experience. Expect 3–5 exam questions on this topic, often testing your ability to identify cold start causes, mitigation strategies, and trade-offs. Mastering this material will help you design serverless applications that meet performance requirements while controlling costs.

25 min read
Intermediate
Updated May 31, 2026

Lambda Cold Start as Idle Restaurant Kitchen

Imagine a restaurant kitchen that fires up only when orders come in. While orders keep arriving, the kitchen is staffed with chefs at their stations, ingredients are prepped, and the ovens are preheated. This is a warm start: an order can be cooked immediately. If the kitchen sits idle for long enough (say, 15 minutes), the chefs leave, the ingredients go back into storage, and the ovens cool down. When the next order arrives (a cold start), the manager must call in chefs, wait for them to arrive, retrieve ingredients from the walk-in cooler, and preheat the ovens. This setup takes extra time before any cooking can begin. Unlike a real restaurant owner, you cannot set how long the kitchen stays staffed while idle; the Lambda service decides.

In AWS Lambda, the analogy maps as follows: the chefs are the runtime environment (e.g., a Node.js process), the ingredients are the function code and dependencies loaded into memory, and the preheated ovens are the Execution Context (including any TCP connections or SDK clients initialized outside the handler). AWS Lambda keeps the Execution Context warm for a period after the last invocation, commonly observed at roughly 5–15 minutes, although AWS neither documents nor guarantees a figure. If no new request arrives within that window, the context is destroyed, and the next invocation must go through the full cold start cycle: downloading the code, initializing the runtime, running initialization code, and then executing the handler. The manager calling in a chef is analogous to the Lambda service assigning a new Execution Context to a specific function version. Just as a restaurant can keep a skeleton crew on during slow hours, developers can use Provisioned Concurrency to keep a specified number of contexts always warm, eliminating cold starts for predictable traffic.

How It Actually Works

What is a Cold Start and Why Does It Exist?

A cold start occurs when a Lambda function is invoked after a period of inactivity, requiring AWS Lambda to create a new Execution Context from scratch. This involves:

Downloading the function code (from S3 or ECR if using container images).

Starting a new runtime environment (e.g., a Node.js process, Python interpreter, or JVM).

Running the function's initialization code (code outside the handler).

Finally, executing the handler with the event.

The cold start latency typically ranges from 200 ms to 2 seconds, but can exceed 10 seconds for large Java functions or container images. In contrast, a warm start reuses an existing Execution Context, bypassing the initialization steps, and completes in tens of milliseconds.

Cold starts exist because AWS Lambda allocates resources on demand to maximize utilization and minimize cost. Idle Execution Contexts are recycled after a period of inactivity (commonly observed at 5–15 minutes). The exact timeout is neither documented nor guaranteed: observed values vary, and the service can recycle a context earlier for its own reasons.

How Cold Starts Work Internally

When you invoke a Lambda function, the service performs these steps:

1. Request Routing: The Invoke API receives the request and routes it to the appropriate Lambda worker (a sandboxed environment on a fleet of EC2 instances).

2. Context Lookup: The worker checks if a warm Execution Context exists for the specific function version and alias. If yes, the context is reused (warm start). If not, a cold start begins.

3. Code Download: The Lambda service fetches the deployment package from S3 (or pulls the container image from ECR). For large packages, this adds latency.

4. Runtime Initialization: A new runtime process is started. For Java, this includes JVM startup and class loading. For Python, importing modules. For Node.js, loading modules and initializing the event loop.

5. Init Code Execution: Any code outside the handler function runs. This includes creating database connections, initializing SDK clients, loading configuration files, or populating caches.

6. Handler Invocation: The handler function is called with the event and context objects. The response is returned.

During a warm start, steps 3–5 are skipped. The existing Execution Context already has the code loaded and init code already run. The handler is invoked directly.
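The split between init code (step 5) and the handler (step 6) can be sketched in a few lines of Python. This is a minimal illustration, not Lambda's internals: `_expensive_setup`, `SHARED_RESOURCE`, and `INIT_COUNT` are hypothetical names, and importing the module stands in for the init phase of a cold start.

```python
import time

# Module scope: executed once per execution environment, during the init phase.
# By the time the handler is called, this code has already run.
INIT_COUNT = 0


def _expensive_setup():
    """Hypothetical stand-in for creating SDK clients or DB connections."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


SHARED_RESOURCE = _expensive_setup()  # runs on a cold start only
_is_cold_start = True                 # flipped after the first invocation


def handler(event, context):
    """Runs on every invocation, cold or warm, and reuses SHARED_RESOURCE."""
    global _is_cold_start
    was_cold = _is_cold_start
    _is_cold_start = False
    return {"cold_start": was_cold, "init_count": INIT_COUNT}
```

Calling `handler` twice in the same process returns `cold_start: True` the first time and `False` afterwards, while `init_count` stays at 1, which mirrors how a warm start skips steps 3–5.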

Key Components, Values, Defaults, and Timers

Execution Context: A sandboxed environment that includes the runtime, function code, and any initialized resources. It is isolated per function version and alias.

Idle Timeout: The duration an Execution Context remains idle before being destroyed. Not configurable or documented, but observed to be around 5–15 minutes. Activity (invocations) resets the timer.

Provisioned Concurrency: A feature that keeps a specified number of Execution Contexts initialized and ready to serve requests. It eliminates cold starts for those contexts. You pay for the provisioned concurrency even when not in use.

Reserved Concurrency: Limits the maximum number of concurrent executions for a function. It does not prevent cold starts but can help control scaling.

Function Memory: More memory (up to 10,240 MB) proportionally allocates more CPU, reducing initialization time. For CPU-bound init code, higher memory speeds up cold starts.

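Runtime: Among managed runtimes, Python and Node.js generally have faster cold starts than the JVM and .NET runtimes, which pay for virtual machine startup and class loading. Natively compiled binaries (Go, Rust) on the provided.al2 custom runtime also start quickly. Container images can add overhead because the image must be fetched first.

VPC Configuration: Functions attached to a VPC use Hyperplane Elastic Network Interfaces (ENIs) that Lambda creates when the function is created or its VPC settings change, not on each cold start. Before this networking change in 2019, ENI attachment during initialization could add 5–10 seconds to a cold start; today the VPC penalty on cold starts is small.

Package Size: Larger deployment packages take longer to download and unpack. The hard limits are 50 MB zipped for direct upload and 250 MB unzipped including layers. As a rule of thumb, keep packages to a few megabytes or less (this guide uses 3 MB as its target).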

Configuration and Verification Commands

You can monitor cold starts using CloudWatch Logs and Metrics:

CloudWatch Logs: Each invocation logs a REPORT line containing Duration (handler execution time), Billed Duration, and memory figures. On a cold start the line also includes Init Duration (time spent in initialization), so the presence of Init Duration identifies a cold start.

CloudWatch Metric: ColdStart is a custom metric you can emit from your code. AWS does not provide a built-in metric.

X-Ray: Enable active tracing to see the initialization phase in the trace timeline.
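One way to emit the custom ColdStart metric mentioned above is to print a CloudWatch Embedded Metric Format (EMF) record to standard output, which CloudWatch Logs can turn into a metric. The sketch below uses only the standard library; the namespace `MyApp/Lambda` and the helper name `build_cold_start_metric` are assumptions for illustration.

```python
import json
import time

_cold_start = True  # module scope: True only until the first invocation finishes


def build_cold_start_metric(function_name, cold, namespace="MyApp/Lambda"):
    """Build an EMF record (as a dict) carrying a ColdStart count of 1 or 0."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,            # hypothetical namespace
                "Dimensions": [["FunctionName"]],
                "Metrics": [{"Name": "ColdStart", "Unit": "Count"}],
            }],
        },
        "FunctionName": function_name,
        "ColdStart": 1 if cold else 0,
    }


def handler(event, context):
    global _cold_start
    # context.function_name is provided by Lambda; fall back when run locally.
    name = getattr(context, "function_name", "unknown")
    print(json.dumps(build_cold_start_metric(name, _cold_start)))
    _cold_start = False
    return {"ok": True}
```

Summing the resulting metric over a period gives the number of cold starts, and dividing by the built-in Invocations metric gives a cold start rate.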

Example CloudWatch Log line:

REPORT RequestId: 1234...	Duration: 150.12 ms	Billed Duration: 300 ms	Memory Size: 128 MB	Max Memory Used: 64 MB	Init Duration: 400.50 ms
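If you export logs and want to check them offline, a REPORT line in the plain-text log format can be parsed with a short regular expression. This is a sketch under that assumption (functions configured for JSON logging produce a different shape); the helper names are illustrative.

```python
import re

# Matches "<Field Name>: <number> ms" or "<Field Name>: <number> MB".
_FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?): ([\d.]+) (ms|MB)")


def parse_report_line(line):
    """Return the numeric fields of a REPORT line, or None for other lines."""
    if not line.startswith("REPORT"):
        return None
    return {key: float(value) for key, value, _unit in _FIELD.findall(line)}


def is_cold_start(line):
    """A REPORT line describes a cold start when Init Duration is present."""
    fields = parse_report_line(line)
    return bool(fields) and "Init Duration" in fields


def cold_start_rate(lines):
    """Fraction of REPORT lines that were cold starts (0.0 if there are none)."""
    reports = [l for l in lines if l.startswith("REPORT")]
    if not reports:
        return 0.0
    return sum(is_cold_start(l) for l in reports) / len(reports)
```

Applied to the example line above, `parse_report_line` yields an `Init Duration` of 400.5, so `is_cold_start` returns `True`.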

To view cold start frequency, you can parse logs with CloudWatch Logs Insights:

filter @type = "REPORT"
| stats count(*) as invocations, count(@initDuration) as cold_starts

How Cold Starts Interact with Related Technologies

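API Gateway: REST and HTTP APIs add their own latency (typically 10–50 ms). Cold start latency adds on top. Switching API type (for example to a WebSocket API) does not remove cold starts; for latency-sensitive APIs, use Provisioned Concurrency.

Application Load Balancer (ALB): ALB target groups can invoke Lambda functions. The ALB idle timeout (default 60 seconds) applies to client connections and has no bearing on how long Lambda keeps a context warm; sporadic traffic behind an ALB leads to cold starts just as it does with any other trigger.

Step Functions: Both Standard and Express Workflows can invoke Lambda. Standard Workflows are billed per state transition and Express Workflows by request count and duration; in either case a cold start adds latency to the task that hits it.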

DynamoDB Streams / SQS / S3 Events: These event sources batch records and invoke Lambda. The batch window can aggregate invocations, reducing cold start frequency.

Lambda@Edge: Runs at CloudFront edge locations. Cold starts are more frequent due to lower traffic per location. Provisioned Concurrency is not supported for Lambda@Edge.

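Container Images: Functions packaged as container images (up to 10 GB) can have longer cold starts because the image must be fetched from ECR the first time. Lambda caches image layers itself, so the practical levers are a smaller image and an AWS-provided base image. ECR lifecycle policies only clean up old images; they do not affect cold starts.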

Mitigation Strategies

1. Provisioned Concurrency: For latency-sensitive functions, allocate a minimum number of warm contexts. Combine with Application Auto Scaling to adjust based on demand.

2. Keep Functions Warm with Scheduled Events: Use Amazon EventBridge (formerly CloudWatch Events) to invoke the function every 5 minutes. However, this is unreliable, keeps at most one context warm, and wastes resources.

3. Optimize Init Code: Move heavy initialization outside the handler but only if it's reusable across invocations. For Java, use SnapStart (see item 4).

4. Use SnapStart: AWS Lambda SnapStart takes a snapshot of the initialized Execution Context and restores it on cold start, reducing init time from seconds to milliseconds. It launched for Java 11 and later and now also supports Python 3.12 and later and .NET 8 and later.

5. Reduce Package Size: Exclude unnecessary files and unused dependencies. Use layers to share dependencies, but remember that layers still count toward the unzipped size limit, so trim them too.

6. Increase Memory: More memory reduces both init duration and handler duration (for CPU-bound tasks).

7. Avoid VPC if Possible: If your function doesn't need VPC resources, don't attach a VPC. ENIs are now created at configuration time instead of on each cold start, so the latency penalty is small, but a VPC-attached function needs VPC endpoints or a NAT gateway to reach AWS services.

8. Use Lambda Response Streaming: For functions that return large payloads, streaming can reduce time-to-first-byte but doesn't affect cold start.
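For item 3, one complementary pattern is lazy initialization: a heavy dependency that only one code path needs is created on first use and cached, so cold starts on other paths do not pay for it. The sketch below assumes a hypothetical `export` action and a stand-in `get_pdf_engine`; `INIT_CALLS` exists only to make the behaviour observable.

```python
import functools

INIT_CALLS = {"config": 0, "pdf_engine": 0}  # counters for illustration only


@functools.lru_cache(maxsize=None)
def get_config():
    """Cheap, needed on every path; built once per execution environment."""
    INIT_CALLS["config"] += 1
    return {"table": "orders"}  # hypothetical setting


@functools.lru_cache(maxsize=None)
def get_pdf_engine():
    """Stand-in for a heavy dependency used by a single code path."""
    INIT_CALLS["pdf_engine"] += 1
    return object()


def handler(event, context):
    config = get_config()
    if event.get("action") == "export":
        get_pdf_engine()  # initialization cost is paid only when needed
        return {"exported": True, "table": config["table"]}
    return {"exported": False, "table": config["table"]}
```

The trade-off is that the first request on the heavy path absorbs the initialization time. With Provisioned Concurrency, eager initialization at module scope is usually the better choice, because init runs before any request arrives.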

Trade-offs

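Provisioned Concurrency costs money even when idle. As an estimate (us-east-1, x86; check the current pricing page): $0.0000041667 per GB-second for every second the concurrency is configured, plus $0.0000097222 per GB-second while the function runs, versus $0.0000166667 per GB-second of run time on demand.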

Keeping functions warm with scheduled events still incurs invocation costs and may not prevent all cold starts due to context recycling.

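SnapStart has limitations: it supports only certain runtimes (Java 11+, Python 3.12+, .NET 8+), works only on published versions, cannot be combined with Provisioned Concurrency or Amazon EFS, does not support ephemeral storage above 512 MB, and can break code that assumes state generated during init is unique.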

Increasing memory increases cost per invocation but may reduce duration, potentially lowering overall cost.
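The first and last trade-offs can be made concrete with a little arithmetic. The sketch below is illustrative only: the rates are assumed us-east-1 x86 list prices ($0.0000166667 per GB-second on demand; for provisioned concurrency, $0.0000041667 per configured GB-second plus $0.0000097222 per executed GB-second), the durations are hypothetical, and per-request charges are omitted because they are the same in both models.

```python
# Assumed us-east-1 x86 list prices in USD per GB-second; verify before use.
ON_DEMAND_RATE = 0.0000166667    # on-demand duration
PC_CAPACITY_RATE = 0.0000041667  # provisioned concurrency, while configured
PC_DURATION_RATE = 0.0000097222  # provisioned concurrency, while executing


def on_demand_cost(memory_mb, busy_seconds):
    """Duration cost of running on demand for busy_seconds."""
    return (memory_mb / 1024) * busy_seconds * ON_DEMAND_RATE


def provisioned_cost(memory_mb, configured_seconds, busy_seconds):
    """Cost of one provisioned environment kept for configured_seconds."""
    gb = memory_mb / 1024
    return gb * (configured_seconds * PC_CAPACITY_RATE
                 + busy_seconds * PC_DURATION_RATE)


def break_even_utilization():
    """Busy fraction above which provisioned concurrency becomes cheaper."""
    return PC_CAPACITY_RATE / (ON_DEMAND_RATE - PC_DURATION_RATE)


# Memory trade-off with hypothetical measurements: 4x the memory,
# but the invocation finishes in 180 ms instead of 800 ms.
small = on_demand_cost(128, 0.800)
large = on_demand_cost(512, 0.180)
```

Under these assumed rates the break-even point is about 60% utilization: an environment that is busy less often than that costs more when provisioned than on demand. In the memory example, the 512 MB configuration is slightly cheaper per invocation (0.09 GB-seconds against 0.10) despite the higher per-millisecond price.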

Walk-Through

1. Lambda receives invocation request

The Invoke API call arrives at the Lambda service endpoint. The service authenticates the request, checks permissions (IAM and resource-based policies), and validates the event payload. It then routes the request to a specific Lambda worker based on function version and alias. The worker maintains a cache of Execution Contexts indexed by function ARN. If the function has Provisioned Concurrency, the worker preferentially uses those contexts. This step typically takes less than 10 ms.

2. Worker checks for warm context

The worker examines its local cache for an Execution Context that matches the function version, alias, and qualifier. A warm context exists if the same function was invoked recently (within the idle timeout window). The worker also checks if the context is currently in use (busy). If a warm context is available and not busy, the request is assigned to it, skipping initialization. If no warm context exists, the worker initiates a cold start by requesting a new sandbox environment from the Lambda control plane.

3. Code download and sandbox creation

For a cold start, the Lambda control plane allocates a new sandbox (a lightweight VM or container) on a worker. It downloads the function code from S3 (or pulls the container image from ECR). The download time depends on package size and network bandwidth. For a 10 MB package, this takes about 100–200 ms. For large container images (1 GB+), it can take several seconds. The sandbox is isolated using cgroups and namespaces. After download, the code is unpacked into /var/task, with layers under /opt. The /tmp directory is separate ephemeral storage, configurable from 512 MB to 10,240 MB independently of the memory setting.

4. Runtime and init code execution

The runtime process starts (e.g., node, python3, java). For Node.js, this involves loading the event loop and module system. For Java, the JVM starts, loads classes, and runs static initializers. The Lambda runtime then executes the code outside the handler (global scope). This is where you typically create SDK clients, database connections, or load configuration. AWS recommends initializing these once and reusing them across invocations. The init duration is logged as `Init Duration` in CloudWatch. This step is the most variable, ranging from 50 ms (Python) to several seconds (Java).

5. Handler invocation and response

After initialization, the runtime calls the handler function with the event and context objects. For synchronous invocations (e.g., API Gateway), the caller waits and the response is returned directly. For asynchronous invocations, Lambda queues the event, returns 202 to the caller immediately, and runs the handler afterwards. The handler's execution time is logged as `Duration`. After the handler returns, the Execution Context remains warm for future invocations, subject to the idle timeout. If the function throws an unhandled error, the context may be destroyed.
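The lookup logic in steps 1–2 can be modelled with a toy pool of environments. This is a simplified simulation for intuition, not Lambda's actual scheduler: the class names are hypothetical and the 600-second idle timeout is an assumption, since the real value is undocumented.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    busy_until: float = 0.0  # time at which the current invocation finishes
    last_used: float = 0.0   # time the environment last finished work


@dataclass
class EnvironmentPool:
    idle_timeout: float = 600.0  # assumed; Lambda does not document this
    environments: list = field(default_factory=list)

    def invoke(self, now, duration):
        """Serve one invocation arriving at `now`; return 'cold' or 'warm'."""
        # Recycle environments that have been idle longer than the timeout.
        self.environments = [
            env for env in self.environments
            if now - env.last_used <= self.idle_timeout
        ]
        # Reuse an environment only if it is not busy with another request.
        env = next((e for e in self.environments if e.busy_until <= now), None)
        kind = "warm"
        if env is None:
            env = Environment()
            self.environments.append(env)
            kind = "cold"
        env.busy_until = now + duration
        env.last_used = env.busy_until
        return kind
```

For example, with one-second invocations arriving at t = 0, 0.5, 5, and 1000, the pool reports cold, cold, warm, cold: the second request overlaps the first, so it needs its own environment, and the last one arrives after every environment has timed out.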

What This Looks Like on the Job

Scenario 1: Real-time Chat Application

A company builds a real-time chat service using API Gateway WebSocket API and Lambda. Users expect messages to be delivered in under 200 ms. Without optimization, cold starts cause delays of 1–2 seconds, leading to poor user experience. The team configures Provisioned Concurrency to maintain 50 warm contexts, matching the expected peak concurrency. They also evaluate SnapStart for the Java runtime, which would cut cold start init time from 2 seconds to about 200 ms, but SnapStart cannot be combined with Provisioned Concurrency on the same function version, so they reserve it for less latency-critical functions. The function reaches its database through RDS Proxy, which pools connections so that new contexts do not each open their own. During traffic spikes, Application Auto Scaling increases provisioned concurrency based on the ProvisionedConcurrencyUtilization metric. The solution keeps p95 latency under 150 ms. Misconfiguration example: Setting provisioned concurrency too low causes cold starts during peak hours; setting it too high increases costs unnecessarily.

Scenario 2: Data Processing Pipeline

A financial services firm processes stock trade records using Lambda triggered by SQS. The function transforms data and writes to DynamoDB. Cold starts are acceptable because the pipeline is batch-oriented, but they increase overall processing time. The team uses a Lambda function with 3,008 MB memory to speed up initialization. They move shared dependencies into a Lambda layer (common library), which brings the function's own deployment package down to 1 MB. They keep the function outside a VPC, because SQS and DynamoDB are both reachable through their public AWS endpoints. The function is invoked in batches of 10 messages (batch window of 60 seconds). With these optimizations, cold starts occur only once per scaling event, and the average processing time per batch is under 5 seconds. Misconfiguration: Attaching the function to a VPC without a NAT gateway or VPC endpoints for SQS and DynamoDB leaves it unable to reach those services, so invocations time out.

Scenario 3: Serverless Web API

A startup deploys a REST API using API Gateway and Lambda. Traffic is unpredictable: sometimes 100 requests per second, sometimes idle for hours. Cold starts are frequent after idle periods. The team first tries two approaches: (1) They use a scheduled CloudWatch Event to invoke the function every 5 minutes with a dummy event to keep it warm. However, this keeps only a single context warm and does nothing for concurrent requests. (2) They set Reserved Concurrency to 5 to limit scaling, but this causes throttling during spikes and does not reduce cold starts. They eventually switch to Provisioned Concurrency with 10 warm contexts and Application Auto Scaling based on request count. They also use a language with fast cold starts (Node.js, 128 MB). The API now responds in under 100 ms for warm requests and under 300 ms for cold requests. Misconfiguration: Relying solely on scheduled events leads to cold starts whenever a context is recycled between pings or more than one request arrives at the same time.
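The figure of 10 warm contexts in this scenario follows from a standard sizing rule: concurrency is roughly requests per second multiplied by average duration in seconds. A minimal sketch, with `required_concurrency` as an illustrative helper name:

```python
import math


def required_concurrency(requests_per_second, avg_duration_ms):
    """Estimate concurrent executions as arrival rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


# Scenario 3 at peak: 100 requests per second, about 100 ms per warm request.
peak = required_concurrency(100, 100)
```

At 100 requests per second and 100 ms per request the estimate is 10. If cold requests at 300 ms dominated instead, the same traffic would need about 30, which is one reason to measure real durations before choosing a provisioned value.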

How DVA-C02 Actually Tests This

DVA-C02 Exam Focus: Lambda Cold Start vs Warm Start Optimization (Objective 4.2)

The exam tests your understanding of cold start causes, mitigation strategies, and trade-offs. You will be asked to choose the best optimization for a given scenario. Key areas:

Provisioned Concurrency: Know that it eliminates cold starts but incurs cost even when idle. Exam questions often ask when to use it vs. reserved concurrency.

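SnapStart: Launched for Java 11 and newer and since extended to Python 3.12+ and .NET 8+; exam questions usually pair it with Java. It reduces cold start init time by restoring from a snapshot. Not compatible with all libraries (e.g., those that generate unique state at startup), and it cannot be combined with Provisioned Concurrency.

VPC Impact: Older material cites an additional 5–10 second cold start for VPC functions due to ENI attachment. Since 2019, Lambda creates the ENI when the function is created or its VPC settings change, so that per-invocation penalty is largely gone. Know that VPC endpoints (or a NAT gateway) give a VPC function access to AWS services, and that RDS Proxy pools database connections.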

Init Duration vs Duration: Understand that Init Duration appears only in cold starts. The exam may ask you to identify a cold start from logs.

Memory and CPU: More memory reduces init duration because it allocates more CPU proportionally. The exam might ask about the relationship.

Common Wrong Answers

1. "Use Reserved Concurrency to prevent cold starts." - Wrong because reserved concurrency only limits concurrency, it does not keep contexts warm. Candidates confuse it with provisioned concurrency.

2. "Cold starts only happen when the function is idle for more than 15 minutes." - Wrong because the idle timeout is not fixed; it can be shorter depending on load. Also, cold starts happen on first invocation after deployment or when scaling up under load, even if the function has been active.

3. "Increase timeout to reduce cold starts." - Wrong because timeout is the maximum execution time, not idle time. Increasing timeout does not keep contexts warm.

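4. "Use a custom runtime to speed up cold starts." - Wrong as a general rule because choosing a custom runtime (provided.al2) does not by itself shorten initialization; what matters is what runs inside it. Natively compiled Go or Rust binaries start quickly, while a custom runtime that wraps a heavy interpreter or virtual machine does not. Among managed runtimes, Python and Node.js typically have the fastest cold starts.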

Specific Numbers and Terms

Init Duration: Measured in milliseconds, logged only for cold starts.

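Provisioned Concurrency: You pay $0.0000041667 per GB-second of configured capacity (us-east-1, x86), plus a reduced duration rate while the function runs.

SnapStart: Reduces init time from several seconds to typically well under one second.

VPC Cold Start: Historically added 5–10 seconds; largely removed since 2019 because ENIs are created at configuration time.

Package Size: Smaller is faster; this guide uses under 3 MB as a rule of thumb. Hard limits are 50 MB zipped (direct upload) and 250 MB unzipped.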

Memory Range: 128 MB to 10,240 MB.

Edge Cases

Function with multiple versions/aliases: Each version/alias has its own Execution Context pool. Cold starts occur per version.

Concurrent invocations: If all warm contexts are busy, new invocations trigger cold starts even if the function has been recently used.

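Container images: Cold starts can take 10+ seconds when a large image is fetched for the first time. Lambda caches image layers itself and ECR has no setting that pre-warms them, so keep images small and build on AWS-provided base images.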

Lambda@Edge: No provisioned concurrency. Cold starts are frequent at edge locations.

How to Eliminate Wrong Answers

If the question mentions "eliminate cold starts" and uses the word "reserved," it's likely wrong. Look for "provisioned."

If the question involves Java and latency, consider SnapStart.

If the function is VPC-attached and cannot reach an AWS service, the answer likely involves VPC endpoints or a NAT gateway; if it exhausts database connections, look for RDS Proxy.

If the question asks about cost savings, remember that provisioned concurrency costs more than on-demand when idle.

Key Takeaways

Cold starts occur when a new Execution Context is created, adding 200 ms to 10+ seconds of latency.

Provisioned Concurrency is the primary method to eliminate cold starts; it keeps contexts warm but costs money.

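SnapStart (Java 11+, and more recently Python 3.12+ and .NET 8+) reduces cold start init time by restoring from a snapshot.

VPC-attached functions no longer pay a 5–10 second ENI penalty on each cold start; since 2019 the ENI is created when the function is configured.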

Init Duration is logged in CloudWatch only for cold starts; it measures initialization time.

More memory reduces both init and execution duration due to proportional CPU allocation.

Keep deployment packages small (this guide's rule of thumb is under 3 MB) to minimize code download time.

Use Lambda layers for shared dependencies to reduce package size.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Provisioned Concurrency

Eliminates cold starts for provisioned contexts.

Costs money even when contexts are idle.

Can be auto-scaled with Application Auto Scaling.

Supports all runtimes.

Requires careful capacity planning.

Scheduled Warm-Up Events

Does not guarantee warm contexts due to idle timeout variability.

Incurs invocation costs and may waste resources.

Adds complexity to manage scheduled events.

May cause cold starts if the schedule is not frequent enough.

Simple to implement with CloudWatch Events.

Watch Out for These

Mistake

Cold starts only happen when the function hasn't been invoked for 15 minutes.

Correct

Cold starts also happen on the first invocation after a deployment, when scaling up due to increased concurrency (even if the function was active), and when the Execution Context is recycled for reasons other than idle timeout (e.g., worker failures, updates). The 15-minute figure is an observed average, not a guarantee.

Mistake

Reserved Concurrency prevents cold starts.

Correct

Reserved Concurrency only sets a limit on the maximum number of concurrent executions. It does not keep Execution Contexts warm. To prevent cold starts, you need Provisioned Concurrency.

Mistake

Increasing the function timeout reduces cold starts.

Correct

Timeout is the maximum execution duration for the handler. It has no effect on how long the Execution Context is kept idle. The idle timeout is managed by the Lambda service and is not configurable.

Mistake

Lambda always reuses Execution Contexts for the same function version.

Correct

Lambda reuses contexts only if they are still idle and not busy. If all contexts are busy, new invocations will trigger cold starts. Also, contexts can be recycled at any time due to underlying infrastructure changes.

Mistake

SnapStart eliminates cold starts entirely.

Correct

SnapStart reduces the initialization time from seconds to milliseconds by restoring from a snapshot, but it does not eliminate the cold start altogether. There is still a small overhead for restoring the snapshot before the handler runs. SnapStart supports Java 11 and later, Python 3.12 and later, and .NET 8 and later, works only on published versions, and has compatibility limitations.


Frequently Asked Questions

What is the difference between cold start and warm start in Lambda?

A cold start occurs when Lambda creates a new Execution Context, requiring code download, runtime initialization, and init code execution. A warm start reuses an existing context, skipping those steps. Cold starts add latency (200 ms to several seconds), while warm starts add essentially no startup overhead beyond the handler's own run time. You can identify a cold start in CloudWatch logs by the presence of 'Init Duration' in the REPORT line.

How long does Lambda keep an Execution Context warm?

AWS does not document the exact idle timeout, but it is generally observed to be between 5 and 15 minutes. The timeout may vary based on memory, runtime, and overall service load. Activity (invocations) resets the timer. There is no way to configure this timeout.

Does Provisioned Concurrency guarantee no cold starts?

Only up to the number of contexts you provision. Requests served by a provisioned context do not see a cold start. If all provisioned contexts are busy, additional invocations spill over to on-demand contexts, which can cold start. To handle bursts, you can use Application Auto Scaling to adjust the provisioned concurrency count based on utilization. Provisioned Concurrency does not apply to Lambda@Edge.

What is Lambda SnapStart and how does it reduce cold starts?

SnapStart takes a snapshot of the initialized Execution Context (after init code runs) and restores it on cold start. It launched for Java 11 and later and now also supports Python 3.12 and later and .NET 8 and later. It reduces initialization time from seconds to typically well under one second. However, it is not compatible with all libraries (e.g., those that generate unique state at startup) and cannot be combined with Provisioned Concurrency. You enable it on the function, and it takes effect on published versions.

How can I monitor cold starts in Lambda?

You can monitor cold starts by parsing CloudWatch Logs for 'Init Duration' > 0. There is no built-in CloudWatch metric for cold starts, but you can emit a custom metric from your code. AWS X-Ray also shows the initialization phase in the trace timeline. Use CloudWatch Logs Insights to query cold start frequency.

Does increasing Lambda memory reduce cold start time?

Often, yes. Lambda allocates CPU in proportion to memory, so CPU-bound work such as runtime startup, class loading, and init code runs faster with more memory. Network-bound steps such as code download benefit much less. The relationship is not linear, so measure Init Duration at several memory settings (for example 128 MB, 256 MB, and 512 MB) instead of assuming a fixed saving. The price per millisecond rises in proportion to memory.

What is the impact of VPC on Lambda cold starts?

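When a Lambda function is attached to a VPC, it uses an Elastic Network Interface (ENI) in the VPC subnet. Before 2019 the ENI was attached during initialization, adding 5–10 seconds to cold starts. Lambda now creates the ENI when the function is created or its VPC settings change, so the cold start penalty is small. VPC endpoints (e.g., for DynamoDB and S3) or a NAT gateway are still needed for a VPC function to reach AWS services, and RDS Proxy helps by pooling database connections.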
