SAA-C03 | Chapter 166 of 189 | Objective 4.1

Spot Instance Interruption Notices and Handling

This chapter covers the critical concept of Spot Instance Interruption Notices and how to handle them effectively in AWS. Understanding this topic is essential for the SAA-C03 exam because it directly tests your ability to design cost-optimized, resilient architectures using Spot Instances. Approximately 5-10% of exam questions involve Spot Instances or related cost optimization strategies, and interruption handling is a key sub-topic. You will learn the mechanics of the 2-minute notice, how to detect it, and best practices for building fault-tolerant workloads.

25 min read
Intermediate
Updated May 31, 2026

Eviction Notice for a Leased Apartment

Imagine you rent an apartment in a building where the landlord can give you a 2-minute warning before you must vacate. You receive a notice saying, "You have 2 minutes to leave." During those 2 minutes, you can grab your essentials and move to another apartment in the same building. But if you have a lot of furniture, you might not finish in time. The landlord uses this system because they occasionally need your apartment for a higher-paying tenant or building maintenance. As a tenant, you knew this risk when you signed a low-rent, flexible lease. To handle this, you might keep a packed bag ready and have a backup apartment prearranged. In AWS, a Spot Instance is like that apartment: you get a 2-minute warning (the interruption notice) before the instance is terminated or stopped. You should design your application to checkpoint progress and gracefully handle the interruption, perhaps by using a Spot Fleet with multiple instance pools so that if one instance is reclaimed, another can take over quickly.

How It Actually Works

What is a Spot Instance Interruption Notice?

A Spot Instance runs on spare EC2 capacity that AWS offers at a discount (up to 90% off On-Demand), with the caveat that AWS can reclaim it when it needs the capacity back. The interruption notice is a warning issued 2 minutes before the instance is terminated or stopped. If the interruption behavior is hibernate, the notice is still issued, but hibernation begins immediately rather than after 2 minutes. The notice is delivered through two mechanisms: the instance metadata service and an Amazon EventBridge event. The exam expects you to know the notice duration (2 minutes), the metadata endpoint, and the EventBridge event structure.

Why Does AWS Interrupt Spot Instances?

AWS interrupts Spot Instances primarily for capacity reasons: when demand for On-Demand Instances or Reserved Instances increases, AWS reclaims Spot capacity. An interruption can also occur when the Spot price rises above the maximum price set on your request, or when a constraint in your request (such as a launch group or Availability Zone group) can no longer be met. Under the current pricing model, Spot prices change gradually and the maximum price defaults to the On-Demand price, so price-driven interruptions are uncommon and capacity reclamation is the usual cause. The exam may ask about scenarios where interruptions are more likely, such as periods of high demand for a particular instance type in a particular Availability Zone.

The 2-Minute Warning Mechanism

When AWS decides to reclaim a Spot Instance, it sends a 2-minute warning. This warning is accessible via the instance's metadata at:

http://169.254.169.254/latest/meta-data/spot/termination-time

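This endpoint returns the scheduled termination time as a plain-text UTC timestamp (for example, 2025-05-15T12:00:00Z). Until a notice is issued, the path does not exist and the request returns HTTP 404. AWS keeps termination-time for backward compatibility and recommends the companion path spot/instance-action, which returns a JSON document containing both the action and the time (for example, {"action": "terminate", "time": "2025-05-15T12:00:00Z"}). The instance can poll either path, and the value appears approximately 2 minutes before the interruption. Additionally, an EventBridge event with the source aws.ec2 and detail type EC2 Spot Instance Interruption Warning is emitted. Its detail section contains the instance ID (instance-id) and the action (instance-action: terminate, stop, or hibernate).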
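As a rough sketch of reading the notice from inside the instance, the Python below uses IMDSv2 (a session token first, then the metadata request) against the spot/instance-action path, which returns a small JSON document. The function names and the timeout are illustrative assumptions rather than part of any AWS SDK, and fetch_instance_action only works when run on an EC2 instance.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Tuple

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Tuple[str, datetime]:
    """Parse the JSON returned by spot/instance-action into (action, time)."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when.replace(tzinfo=timezone.utc)


def seconds_remaining(when: datetime, now: datetime) -> float:
    """Seconds left before the scheduled interruption (never negative)."""
    return max(0.0, (when - now).total_seconds())


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw notice body, or None when no interruption is scheduled.

    The 1-second timeout and 60-second token lifetime are assumptions.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no notice has been issued
            return None
        raise
```

Keeping the parsing separate from the network call makes the time arithmetic easy to check off-instance with a sample document.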

Key Components and Defaults

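Metadata endpoints: http://169.254.169.254/latest/meta-data/spot/instance-action (JSON with action and time, recommended) and http://169.254.169.254/latest/meta-data/spot/termination-time (timestamp only, kept for backward compatibility)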

Notice duration: 2 minutes (120 seconds)

EventBridge event: EC2 Spot Instance Interruption Warning

Instance actions: Terminate (default), Stop, or Hibernate (if enabled)

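Spot price: Rarely the cause of an interruption under the current pricing model, because the maximum price defaults to the On-Demand price. Older exam questions may still assume the bidding model, where a low maximum price made interruption more likely.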

How to Detect the Interruption Notice

There are two primary methods:

1. Poll the metadata endpoint: Your application polls the spot/instance-action or spot/termination-time path every few seconds. An HTTP 404 response means no interruption is scheduled; a successful response means the instance is about to be interrupted.

2. Listen for EventBridge events: Create an Amazon EventBridge (formerly CloudWatch Events) rule that matches the interruption warning and triggers a target, such as a Lambda function, to perform cleanup or checkpointing.

The AWS Health Dashboard is not a delivery channel for Spot interruption notices, so do not rely on it for this purpose.
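To make the EventBridge route concrete, here is a minimal sketch of the event pattern a rule would use (written as a Python dict) and a Lambda-style handler that extracts the affected instance. The field names follow AWS's documented sample event; matches_pattern is a simplified local stand-in for EventBridge's own matching, and the handler's return shape is purely illustrative.

```python
# Event pattern for an EventBridge rule, shown as a Python dict.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def matches_pattern(event: dict, pattern: dict = SPOT_INTERRUPTION_PATTERN) -> bool:
    """Simplified local check: each pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def lambda_handler(event: dict, context: object) -> dict:
    """Hypothetical Lambda target: report which instance is affected and how."""
    if not matches_pattern(event):
        return {"handled": False}
    detail = event["detail"]
    return {
        "handled": True,
        "instance_id": detail["instance-id"],
        "action": detail["instance-action"],
        "notice_time": event["time"],
    }
```

In a real deployment the handler would go on to start cleanup, for example by detaching the instance from a load balancer or signalling the workload to checkpoint.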

Handling the Interruption

Your application must be designed to handle the 2-minute notice gracefully. Best practices include:

Checkpointing: Save application state to durable storage (e.g., Amazon S3, EBS snapshots) upon receiving the notice.

Graceful shutdown: Catch the signal and begin a controlled shutdown process.

Use Spot Fleet or EC2 Fleet: Distribute instances across multiple instance types and Availability Zones to reduce the impact of a single interruption.

Use a capacity-optimized allocation strategy: This reduces the chance of interruption by selecting pools with the most available capacity.

Enable hibernation: If the workload supports it, enable hibernation for Spot Instances so that the instance can be resumed later with the same state. Note that hibernation is only supported for certain instance types and operating systems.
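The checkpointing practice can be sketched as a worker that records its position after every item and resumes from that position on a replacement instance. In this illustration a local JSON file stands in for durable storage such as S3, and the interrupted flag is assumed to be set by whatever detects the notice; all names are hypothetical.

```python
import json
import os
import tempfile
import threading


def load_checkpoint(path: str) -> int:
    """Return the index of the next item to process (0 if no checkpoint)."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        return json.load(fh)["next_index"]


def save_checkpoint(path: str, next_index: int) -> None:
    """Write to a temp file, then rename, so a kill mid-write cannot corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump({"next_index": next_index}, fh)
    os.replace(tmp, path)


def run_job(items: list, path: str, interrupted: threading.Event, handle) -> bool:
    """Process items from the last checkpoint; return True when all are done."""
    for index in range(load_checkpoint(path), len(items)):
        if interrupted.is_set():
            return False  # everything before `index` is already recorded
        handle(items[index])
        save_checkpoint(path, index + 1)
    return True
```

Because progress is saved after each item rather than only when the notice arrives, an interruption that comes without warning costs at most the item in flight.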

Interaction with Related Technologies

Auto Scaling groups: When a Spot Instance is interrupted, the Auto Scaling group can automatically launch a replacement instance (On-Demand or Spot) if configured with a mixed instances policy.

AWS Batch: Batch jobs running on Spot Instances can be retried on interruption.

Amazon EMR: EMR clusters use Spot Instances for task nodes and can handle interruptions gracefully by marking the node as lost and redistributing work.

Amazon ECS/EKS: Containerized workloads can use Spot Instances as capacity providers; when an instance is interrupted, ECS/EKS reschedules containers to other instances.

Configuring Spot Instances for Interruption Handling

When launching a Spot Instance, you can specify the interruption behavior:

Terminate: The default. The instance is terminated 2 minutes after the notice.

Stop: The instance is stopped and its EBS volumes are preserved. Amazon EC2 restarts it when capacity becomes available again. Requires a persistent Spot request.

Hibernate: The instance is hibernated, preserving in-memory state to the EBS root volume. Supported only for certain instance types, and also requires a persistent Spot request.

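You can set this via the AWS CLI. The stop behavior requires a persistent request, and the launch specification file name below is a placeholder for your own file:

aws ec2 request-spot-instances --type persistent --instance-interruption-behavior stop --launch-specification file://specification.json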

Or via the console when launching an instance.
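For SDK users, the same setting is carried in the InstanceMarketOptions parameter of the EC2 RunInstances call. The helper below is a hypothetical sketch that only builds that parameter; it encodes the rule that stop and hibernate need a persistent request, and it does not call AWS.

```python
VALID_BEHAVIORS = {"terminate", "stop", "hibernate"}


def spot_market_options(behavior: str = "terminate") -> dict:
    """Build the InstanceMarketOptions argument for EC2 RunInstances."""
    if behavior not in VALID_BEHAVIORS:
        raise ValueError(f"unknown interruption behavior: {behavior}")
    # Stop and hibernate only apply to persistent Spot requests.
    request_type = "one-time" if behavior == "terminate" else "persistent"
    return {
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": request_type,
            "InstanceInterruptionBehavior": behavior,
        },
    }
```

With boto3 installed, the result would be passed as run_instances(..., InstanceMarketOptions=spot_market_options("stop")) alongside the usual image, instance type, and count arguments.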

Exam-Relevant Details

The exam will test your knowledge of the 2-minute notice, the metadata endpoint, and the EventBridge event. You should know that you cannot extend the 2-minute window. Also, note that the interruption notice is a best-effort signal; in rare cases, it may not be delivered. Therefore, your architecture should also be resilient to abrupt termination. The exam may present a scenario where a workload is running on Spot Instances and you need to minimize disruption. The correct answer often involves using a Spot Fleet with multiple instance types and Availability Zones, or using a mixed instances policy in an Auto Scaling group.

Common Pitfalls

Assuming Spot Instances are always the right choice: They are cheaper but can be interrupted. The exam expects you to balance cost and resilience.

Ignoring the 2-minute window: Some candidates think the notice is 5 minutes or 1 minute. The correct value is 2 minutes.

Using only one instance type: This increases the risk of simultaneous interruptions. Always diversify.

Not using the metadata endpoint: Relying solely on EventBridge may introduce latency; polling metadata is more immediate.

Conclusion

Spot Instance Interruption Notices are a fundamental part of using Spot Instances cost-effectively. The 2-minute warning gives you just enough time to save state and shut down gracefully. For the SAA-C03 exam, remember the exact duration, the metadata endpoint, and the EventBridge event. Understand how to design fault-tolerant systems using Spot Fleet, Auto Scaling groups, and checkpointing. Mastering these concepts will help you answer cost optimization questions correctly.

Walk-Through

1

AWS decides to reclaim capacity

AWS monitors capacity utilization across its data centers. When demand for On-Demand or Reserved Instances increases in a particular Availability Zone, AWS may decide to reclaim Spot Instances. The decision is based on internal capacity management algorithms. The exact trigger is not documented, but it typically occurs during peak usage periods. At this point, AWS selects one or more Spot Instances to interrupt.

2

AWS sends the 2-minute interruption notice

Approximately 2 minutes before the actual interruption, AWS populates the instance metadata paths `spot/instance-action` and `spot/termination-time` (for example, `http://169.254.169.254/latest/meta-data/spot/termination-time`) with the scheduled time. At about the same time, an EventBridge event with the source `aws.ec2` and detail type `EC2 Spot Instance Interruption Warning` is published. Its `detail` section carries `instance-id` and `instance-action` (terminate/stop/hibernate), and the event envelope carries `time`. The metadata service listens on a link-local address that is reachable only from within the instance.

3

Application detects the notice

A well-designed application polls the metadata endpoint every few seconds (e.g., every 5 seconds). While no interruption is scheduled, the request returns HTTP 404; once it returns a value, the application knows it has at most 2 minutes until interruption. Alternatively, a Lambda function can be triggered by the EventBridge event to send a signal to the instance, but this adds latency. Polling is more direct and is the better fit for time-sensitive operations.
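The polling step might look like the loop below. It is a sketch under stated assumptions: fetch is a hypothetical callable that returns the notice body or None (for example, a wrapper around the metadata request), the 5-second interval mirrors the figure above, and sleep and max_polls are injectable only to make the loop easy to exercise off-instance.

```python
import time
from typing import Callable, Optional


def wait_for_notice(
    fetch: Callable[[], Optional[str]],
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Optional[str]:
    """Poll until fetch() returns a notice body; None if max_polls runs out."""
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            body = fetch()
        except OSError:
            body = None  # treat a transient metadata error as "no notice yet"
        if body is not None:
            return body
        polls += 1
        sleep(interval)
    return None
```

A shorter interval leaves more of the 2-minute window for cleanup, at the cost of more requests to the metadata service.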

4

Application performs graceful shutdown

Upon receiving the notice, the application should initiate a checkpointing process. This typically involves saving intermediate state to a durable store like Amazon S3 or an EBS snapshot. For example, a video transcoding job might save the last processed frame. The application should then flush buffers, close connections, and begin a controlled shutdown. If the instance is set to stop or hibernate, the operating system's shutdown sequence is initiated.
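One way to keep shutdown work inside the window is to give each cleanup step a time estimate and skip any step that no longer fits. The sketch below illustrates that idea; the step names, estimates, and the 10-second safety margin are assumptions for illustration, not AWS recommendations.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], float]  # (name, action, estimated seconds)


def graceful_shutdown(
    steps: List[Step],
    budget_seconds: float = 120.0,
    safety_margin: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Run steps in order, skipping any whose estimate no longer fits.

    Returns the names of the steps that were actually run.
    """
    deadline = clock() + budget_seconds - safety_margin
    completed = []
    for name, action, estimate in steps:
        if clock() + estimate > deadline:
            continue  # not enough time left; a cheaper later step may still fit
        action()
        completed.append(name)
    return completed
```

Ordering the list so that the checkpoint comes first means the most valuable step is attempted while the full budget is still available.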

5

Instance is interrupted

About 2 minutes after the notice (or sooner if the notice was delivered late), AWS performs the configured interruption action: terminate, stop, or hibernate. If terminate, the instance is destroyed; the root EBS volume is deleted by default, and any other attached volume is deleted only if its `DeleteOnTermination` flag is true. If stop, the instance enters the stopped state and EBS volumes remain attached. If hibernate, memory is saved to the root volume and the instance resumes when capacity returns. Instance charges end once the instance is terminated or stopped, but stopped and hibernated instances still incur charges for EBS storage.

6

Replacement instance is launched

If the Spot Instance was part of an Auto Scaling group or Spot Fleet, the group automatically launches a replacement to restore its desired capacity, using its launch template. An Auto Scaling group with a mixed instances policy keeps the On-Demand share you configured (a base capacity plus a percentage) and tries other Spot pools for the Spot share; it does not silently convert Spot capacity to On-Demand when Spot is unavailable. A Spot Fleet with a maintain request likewise launches replacements from other pools. The new instance will have a different ID and may be in a different Availability Zone.
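A mixed instances policy is easier to picture as data. The hypothetical helper below builds the MixedInstancesPolicy structure accepted by the Auto Scaling CreateAutoScalingGroup API; the default of one On-Demand base instance and the capacity-optimized strategy are example choices, and the launch template name is whatever you have defined.

```python
def mixed_instances_policy(
    launch_template_name: str,
    instance_types: list,
    on_demand_base: int = 1,
    on_demand_percentage_above_base: int = 0,
) -> dict:
    """Build a MixedInstancesPolicy for Auto Scaling CreateAutoScalingGroup."""
    if len(set(instance_types)) < 2:
        raise ValueError("list at least two instance types to diversify")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_percentage_above_base,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }
```

The On-Demand base is what keeps a minimum of the workload running even if every Spot pool in the overrides list is reclaimed at once.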

What This Looks Like on the Job

Enterprise Scenario 1: Big Data Processing with Amazon EMR

A large e-commerce company uses Amazon EMR to run nightly data processing jobs that analyze customer behavior and generate recommendations. To reduce costs, they run their EMR clusters primarily on Spot Instances for task nodes, with On-Demand instances for the master node. They have experienced interruptions during high-traffic periods like Black Friday, causing job failures and delays. To handle this, they implemented a strategy where each task in the job is designed to be idempotent and checkpoint intermediate results to Amazon S3. They also use the EMR feature that automatically marks a node as lost when it receives a Spot interruption notice, and redistributes its tasks to other nodes. The cluster uses instance fleets with a mix of instance types (e.g., m5.xlarge, r5.xlarge) and lists subnets in several Availability Zones, so that EMR can pick a zone with good capacity at launch (all nodes of a given EMR cluster run in a single zone). During a recent interruption event, only 10% of task nodes were reclaimed, and the job completed only 15 minutes later than usual. The cost savings compared to using all On-Demand instances are approximately 70%.

Enterprise Scenario 2: Containerized Microservices on Amazon EKS

A financial services startup runs its microservices on Amazon EKS, using Fargate together with Spot-backed managed node groups. They have a critical service that processes real-time stock trades. To ensure high availability, they deploy the service across multiple Availability Zones and give the Spot node groups several instance types, so that capacity is drawn from many pools. They also configured a Kubernetes PriorityClass so that the trade processing pods have higher priority than batch analytics pods. When a Spot node receives an interruption notice, the node is cordoned and drained (EKS managed node groups do this automatically; self-managed nodes typically run the AWS Node Termination Handler), and the Kubernetes scheduler reschedules the pods onto other nodes. If the remaining nodes are full, the scheduler preempts lower-priority batch pods to make room, so the trade processing pods are the first to be placed again. The architecture tolerates up to 5% of nodes being interrupted at any given time without impact on the critical service.

Common Misconfigurations

Not using multiple instance types: A common mistake is to use only one instance type (e.g., c5.large) in a Spot Fleet. If that particular pool experiences high demand, all instances may be interrupted simultaneously. The correct approach is to diversify across at least 3-4 instance types.

Ignoring the interruption notice: Some developers do not implement any interrupt handling, assuming the instance will run indefinitely. This leads to data loss and job failures when interruptions occur.

Setting too low a maximum price: Under the old bidding model, a low bid increased the likelihood of interruption. Today the maximum price defaults to the On-Demand price, and leaving it at that default is the usual recommendation. The exam may still reference bid price in older-style questions. In production, pair the default maximum price with a capacity-optimized (or price-capacity-optimized) allocation strategy.

Using only one Availability Zone: If all Spot Instances are in a single AZ, an AZ-level capacity shortage can take down the entire fleet. Always spread across multiple AZs.
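The value of diversification is simple arithmetic: a Spot capacity pool is one instance type in one Availability Zone, so every added type or zone multiplies the number of pools. The sketch below counts pools and estimates the share of a fleet lost if a single pool is reclaimed, under the simplifying assumption that instances are spread evenly across pools (real allocation strategies do not guarantee this).

```python
from itertools import product


def spot_pools(instance_types: list, availability_zones: list) -> list:
    """List the capacity pools: one per (instance type, Availability Zone)."""
    return list(product(sorted(set(instance_types)), sorted(set(availability_zones))))


def worst_case_loss(instance_types: list, availability_zones: list) -> float:
    """Fraction of the fleet lost if ONE pool is reclaimed, assuming an even spread."""
    return 1 / len(spot_pools(instance_types, availability_zones))
```

With one type in one zone the whole fleet sits in a single pool; with four types across three zones, the same event touches roughly one twelfth of it under this assumption.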

How SAA-C03 Actually Tests This

What SAA-C03 Tests on Spot Instance Interruption Notices

The SAA-C03 exam covers this topic under Objective 4.1 (Cost Optimized). Specifically, you need to know:

The duration of the interruption notice: 2 minutes.

The metadata endpoint: http://169.254.169.254/latest/meta-data/spot/termination-time.

The EventBridge event: EC2 Spot Instance Interruption Warning.

The three possible interruption behaviors: terminate, stop, hibernate.

How to design fault-tolerant architectures using Spot Fleet or Auto Scaling groups with mixed instances.

Common Wrong Answers and Why Candidates Choose Them

1.

Wrong: "The interruption notice is 5 minutes." Candidates confuse this with other AWS defaults that really are 5 minutes (300 seconds), such as the Auto Scaling default cooldown or the load balancer deregistration delay. The correct value is 2 minutes. The exam often tests this exact number.

2.

Wrong: "Use a single instance type in a Spot Fleet to simplify management." This is tempting because it seems easier, but it increases risk. The correct answer is to use multiple instance types and Availability Zones.

3.

Wrong: "Set a higher bid price to prevent interruptions." Under the current pricing model, most interruptions are caused by capacity reclamation, and no maximum price protects against that. The exam may include older questions where bid price matters, but the current best practice is to leave the maximum price at its default and use a capacity-optimized allocation strategy.

4.

Wrong: "The interruption notice is sent via SNS." AWS does not send a direct SNS notification for Spot interruptions. The correct mechanisms are metadata and EventBridge, although you can add an EventBridge rule that forwards the event to an SNS topic yourself.

Specific Numbers and Terms to Memorize

2 minutes: The notice duration.

169.254.169.254: The link-local address for instance metadata.

spot/termination-time: The metadata path.

EC2 Spot Instance Interruption Warning: The EventBridge detail type.

terminate, stop, hibernate: The three interruption behaviors.

Edge Cases and Exceptions

Hibernation limitations: Hibernation is only supported for instances with a root volume that is an EBS volume, and for specific instance families (e.g., C5, M5, R5). Not all instance types support it.

Interruption without notice: In rare cases, the 2-minute notice may not be delivered due to network issues or AWS internal failures. Your architecture should be resilient to abrupt terminations.

Spot Fleet with a target capacity: If a Spot Fleet request is of type maintain, it launches replacement instances automatically. However, if no Spot capacity is available, it does not substitute On-Demand instances; On-Demand capacity is launched only for the portion of the target capacity that you explicitly assign to On-Demand.

How to Eliminate Wrong Answers

If the question mentions "minimize disruption" and the answer suggests using a single instance type, eliminate it.

If the question asks about the notice duration and an option says "5 minutes", eliminate it.

If the answer suggests using SNS to receive the notice, eliminate it.

If the answer suggests that setting a higher bid price guarantees no interruption, eliminate it (unless the question explicitly mentions the old pricing model).

Key Takeaways

Spot Instance interruption notice duration is exactly 2 minutes.

The notice is available at the metadata endpoint: http://169.254.169.254/latest/meta-data/spot/termination-time.

EventBridge event detail type is 'EC2 Spot Instance Interruption Warning'.

Three interruption behaviors: terminate (default), stop, hibernate.

Use multiple instance types and Availability Zones to reduce interruption impact.

Design applications to checkpoint state upon receiving the notice.

Auto Scaling groups and Spot Fleet can automatically replace interrupted instances.

Hibernation is only supported for certain instance types and OSes.

The 2-minute notice is a best-effort signal; architectures must handle abrupt terminations.

Capacity-optimized allocation strategy minimizes interruption frequency.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Spot Instances

Up to 90% cheaper than On-Demand.

Can be interrupted with a 2-minute notice.

Best for fault-tolerant, flexible workloads.

No upfront commitment required.

Ideal for batch processing, big data, CI/CD.

On-Demand Instances

Fixed pricing, no interruption risk.

Pay per second with no discount.

Suitable for critical, stateful applications.

No risk of termination.

Used for production databases, web servers.

Terminate (default)

Instance is deleted after 2 minutes.

Root EBS volume is deleted by default.

No recovery of instance state.

Billing stops immediately.

Common for stateless workloads.

Stop

Instance is stopped, not deleted.

EBS volumes remain attached.

Can be restarted later.

Billing stops for instance, but EBS charges continue.

Useful for stateful workloads with persistent storage.

Polling metadata endpoint

Direct and immediate.

Requires application-level polling.

Works even if EventBridge is down.

Available from within the instance.

No additional AWS service costs.

EventBridge event

Event-driven, no polling needed.

Can trigger Lambda or other actions.

Slight latency (seconds).

Requires EventBridge configuration.

May incur costs for events.

Watch Out for These

Mistake

The interruption notice gives you 5 minutes to react.

Correct

The notice is exactly 2 minutes. The 5-minute figure is a common confusion with other AWS defaults of 300 seconds, such as the Auto Scaling default cooldown or the load balancer deregistration delay.

Mistake

You can avoid interruptions by setting a very high bid price.

Correct

Interruptions are driven mainly by capacity needs, not price. The maximum price only caps what you are willing to pay; it defaults to the On-Demand price, and raising it does not protect against capacity reclamation. Even with a high maximum price, your instances can still be interrupted.

Mistake

The interruption notice is sent via email or SNS by default.

Correct

No, the notice is available via the instance metadata endpoint and EventBridge. AWS does not email you or publish to SNS on its own; to receive such notifications, create an EventBridge rule that forwards the event to a target such as an SNS topic.

Mistake

Spot Instances are only suitable for stateless workloads.

Correct

While stateless workloads are easier to handle, stateful workloads can also use Spot Instances by checkpointing state to durable storage upon receiving the notice. Hibernation also helps preserve state.

Mistake

You can extend the 2-minute window by modifying the metadata.

Correct

The 2-minute window is fixed and cannot be changed. Instance metadata is read-only from inside the instance, so there is nothing you can modify there to delay the interruption.

Frequently Asked Questions

How long is the Spot Instance interruption notice?

The notice is 2 minutes. You must design your application to complete any checkpointing or graceful shutdown within that window. Poll the metadata endpoint or listen for the EventBridge event to detect the notice.

Where can I find the Spot Instance interruption notice?

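You can find it at the instance metadata endpoint: http://169.254.169.254/latest/meta-data/spot/termination-time. It returns the scheduled termination time as a timestamp, or HTTP 404 if no notice has been issued. The companion path spot/instance-action returns a JSON object with the action and the time. Additionally, an EventBridge event with detail type 'EC2 Spot Instance Interruption Warning' is emitted.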

Can I prevent a Spot Instance from being interrupted?

No, you cannot prevent interruptions. AWS reclaims capacity when needed. However, you can reduce the likelihood by using a capacity-optimized allocation strategy and diversifying instance types and Availability Zones.

What are the different interruption behaviors?

You can configure the instance to be terminated (default), stopped, or hibernated. Terminate deletes the instance. Stop preserves EBS volumes. Hibernate saves in-memory state to the root volume. Hibernation is only supported for certain instance types.

How should I design my application to handle Spot interruptions?

Your application should poll the metadata endpoint for the termination notice. Upon receiving it, checkpoint state to S3 or an EBS snapshot, then gracefully shut down. Use Auto Scaling groups or Spot Fleet to launch replacement instances.

Is the interruption notice guaranteed?

No, the 2-minute notice is delivered on a best-effort basis. In rare cases, your instance may be terminated without notice. Therefore, your architecture should also be resilient to abrupt failures.

Can I use Spot Instances for stateful workloads?

Yes, but you need to handle state carefully. Use checkpointing to durable storage, or enable hibernation to preserve in-memory state. Also, consider using EBS volumes with the 'stop' behavior to persist data.
