This chapter covers EC2 Spot Instances and interruption management, a key topic for the Cost and Performance Optimization domain of the SOA-C02 exam. You will learn how Spot Instances work, how to request them, how to handle interruptions, and how to design fault-tolerant workloads that use Spot capacity. That domain carries roughly 12% of the exam weight, and Spot questions typically cover pricing, interruption behaviors, and best practices. Understanding this topic helps both with reducing AWS costs and with the exam.
Imagine an airline that sells the same seat at different prices. Full-price tickets (On-Demand Instances) can be bought anytime, cannot be cancelled by the airline, and cost the most. Discounted standby tickets (Spot Instances) are sold at a deep discount but come with a catch: if the airline needs the seat for a full-fare passenger, you can be bumped with just a 2-minute warning. You cannot rely on checked luggage (data persistence is limited), so you split your trip into many short legs (fault-tolerant workloads). You can also set a maximum price you are willing to pay; if the standby price rises above it, you lose your seat. The standby price drifts with supply and demand. When you are bumped, you get two minutes to gather your carry-on (save state) before being escorted off.
This mirrors how EC2 Spot Instances work: you use spare EC2 capacity at a discount, but AWS can reclaim the instance with a 2-minute interruption notice when it needs the capacity back. You can set a maximum price (the default is the On-Demand price), and your instance runs as long as the Spot price stays at or below that maximum and capacity is available. Older material calls the maximum price a "bid", but since the 2017 pricing change AWS sets the Spot price itself and there is no auction. When interrupted, you receive a notification via instance metadata and can take actions such as checkpointing or migrating the workload.
What Are Spot Instances?
EC2 Spot Instances are spare compute capacity in the AWS cloud offered at a significant discount, up to 90% off On-Demand prices. AWS sets a Spot price for each capacity pool, and you can optionally specify the maximum price you are willing to pay per instance-hour (often still called the bid). Your instance runs as long as the current Spot price is at or below your maximum and capacity is available. When the Spot price exceeds your maximum or AWS needs the capacity back, AWS terminates, stops, or hibernates your instance after a 2-minute warning.
Spot Instances are ideal for fault-tolerant, flexible, and stateless workloads: big data analytics, containerized applications, CI/CD pipelines, web servers (with auto-scaling), image and media rendering, high-performance computing (HPC), and test/development environments. They are NOT suitable for stateful, time-sensitive, or single-instance workloads like databases, long-running batch jobs that cannot checkpoint, or applications that require constant availability.
How Spot Instances Work Internally
When you launch a Spot Instance, you do not get dedicated hardware. Instead, AWS manages pools of spare EC2 capacity, one per instance type and Availability Zone. The Spot price for each pool is set by Amazon EC2 and follows longer-term trends in supply and demand for that pool. Under the current pricing model, prices adjust gradually rather than swinging from hour to hour, so in practice they are relatively stable.
Your request can be in one of three forms: a one-time request (for a specific number of instances), a persistent request (maintains the desired count, re-launching instances if they are terminated), or a Spot Fleet request (a collection of instance types and pools). For the SOA-C02 exam, focus on persistent requests and Spot Fleets.
When your instance is marked for interruption, AWS sends a notification via two mechanisms:
- Instance Metadata: The spot/instance-action endpoint in the instance metadata (http://169.254.169.254/latest/meta-data/spot/instance-action) returns a JSON document with the pending action (stop, terminate, or hibernate) and the approximate time it will occur. The endpoint returns HTTP 404 until an interruption is scheduled.
- CloudWatch Events (now Amazon EventBridge): An EC2 Spot Instance Interruption Warning event is emitted approximately 2 minutes before the action.
You can also use the EC2 instance rebalance recommendation, a newer signal that an instance is at elevated risk of interruption. It can arrive earlier than the 2-minute interruption notice, though that is not guaranteed, and it gives you a chance to proactively rebalance your workload.
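The metadata check can be sketched in Python as follows. This is a minimal sketch, assuming the `spot/instance-action` metadata path and an IMDSv2 session token; the function names are illustrative, and the two network functions only work when run on an EC2 instance.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds=21600):
    """Request an IMDSv2 session token (only reachable from an EC2 instance)."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def parse_instance_action(body):
    """Parse the JSON served at spot/instance-action into (action, time)."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when.replace(tzinfo=timezone.utc)


def check_interruption(token):
    """Return (action, time) if an interruption is scheduled, else None."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # nothing scheduled yet
            return None
        raise
```

A worker would typically call `check_interruption` every few seconds and start its shutdown routine as soon as it returns a value.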
Bidding and Pricing
Spot price: The current price per instance-hour for a specific instance type in an Availability Zone. It changes gradually.
Maximum price (bid): The maximum you are willing to pay. If the Spot price exceeds it, your instance is interrupted. If you do not set one, the default is the On-Demand price. Setting a higher maximum does not give you priority, and you pay the current Spot price, not your maximum.
Persistent request: If your instance is terminated due to price or capacity, the request automatically relaunches a new instance as long as the request is still active and the maximum price is above the current Spot price.
Spot Fleet: A collection of Spot Instances (and optionally On-Demand) that optimizes for cost or capacity. You define allocation strategies: lowestPrice, diversified, capacityOptimized, or capacityOptimizedPrioritized.
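The pricing rule above can be written as a small function. This is a simplified sketch that looks at price only and ignores capacity; the function name and return shape are made up for illustration.

```python
def spot_outcome(spot_price, max_price=None, on_demand_price=None):
    """Model the price rule: run while spot_price <= max_price, pay spot_price.

    When no maximum is given, the On-Demand price is used as the default.
    Capacity reclamation is deliberately not modeled here.
    """
    if max_price is None:
        if on_demand_price is None:
            raise ValueError("need max_price or on_demand_price")
        max_price = on_demand_price
    if spot_price > max_price:
        return {"runs": False, "hourly_charge": None}
    # You are charged the current Spot price, never your maximum.
    return {"runs": True, "hourly_charge": spot_price}


print(spot_outcome(0.05, max_price=0.10))
print(spot_outcome(0.12, on_demand_price=0.10))
```

The first call models a maximum of $0.10 with a Spot price of $0.05: the instance runs and is charged $0.05 per hour.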
Interruption Behaviors
When AWS needs to reclaim capacity, the interruption behavior can be:
- Terminate: The instance is terminated. This is the default. The root EBS volume is deleted unless DeleteOnTermination is set to false.
- Stop: The instance is stopped rather than terminated. This is available only for EBS-backed instances launched from a persistent request. EC2 starts the instance again when capacity is available at or below your maximum price.
- Hibernate: The instance is hibernated. RAM contents are saved to the EBS root volume, and on start the instance resumes with the same state. This requires an EBS-backed instance with hibernation enabled at launch.
You set the interruption behavior when you request the Spot Instance. Stop and hibernate require a persistent request; use them to preserve state and shorten recovery.
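The prerequisites for each behavior can be checked before a request is submitted. The sketch below is a hypothetical helper, not an AWS API; it assumes that stop and hibernate need a persistent request and an EBS-backed instance, and that hibernate additionally needs hibernation enabled at launch.

```python
def interruption_behavior_problems(behavior, request_type, ebs_backed,
                                   hibernation_enabled=False):
    """Return a list of reasons the combination would not work (empty if fine)."""
    if behavior not in ("terminate", "stop", "hibernate"):
        return [f"unknown behavior: {behavior!r}"]
    problems = []
    if behavior in ("stop", "hibernate"):
        if request_type != "persistent":
            problems.append(f"{behavior} requires a persistent request")
        if not ebs_backed:
            problems.append(f"{behavior} requires an EBS-backed instance")
    if behavior == "hibernate" and not hibernation_enabled:
        problems.append("hibernate requires hibernation enabled at launch")
    return problems


print(interruption_behavior_problems("stop", "one-time", ebs_backed=True))
```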
Spot Instance Request States
A Spot Instance request goes through these states:
- open: The request is active and waiting for fulfillment.
- active: The request is fulfilled and the instance is running.
- closed: The instance was interrupted or terminated and the request is not persistent.
- cancelled: You canceled the request, or it expired.
- disabled: You stopped the Spot Instance yourself; the request stays disabled until you start the instance again.
- failed: The request has one or more invalid parameters.
For persistent requests, if the instance is interrupted, the request goes back to open and AWS tries to relaunch.
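The main transitions can be captured in a tiny state machine. This is a simplified sketch covering only fulfillment, interruption, and cancellation; the event names are invented for the example.

```python
def next_state(state, event, persistent):
    """Return the request state after an event (simplified model)."""
    if event == "cancel":
        return "cancelled"
    if state == "open" and event == "fulfill":
        return "active"
    if state == "active" and event == "interrupt":
        # Persistent requests reopen so EC2 can try again; one-time requests end.
        return "open" if persistent else "closed"
    raise ValueError(f"no transition from {state!r} on {event!r}")


state = "open"
for event in ("fulfill", "interrupt", "fulfill"):
    state = next_state(state, event, persistent=True)
    print(event, "->", state)
```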
Spot Fleet Allocation Strategies
lowestPrice: Launches instances from the pool with the lowest Spot price. This can lead to all instances in one pool, increasing interruption risk if that pool is reclaimed.
diversified: Distributes instances across multiple pools (different instance types and AZs). Reduces the impact of a single pool being reclaimed.
capacityOptimized: Launches instances from the pools with the most available capacity for the number of instances requested, which lowers the chance of near-term interruption. Useful for workloads where interruptions are costly.
capacityOptimizedPrioritized: Similar to capacityOptimized but you can set priorities for instance types.
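To see how the strategies differ, here is a toy simulation. The pools, prices, and spare-capacity figures are invented (AWS does not expose spare capacity numbers), and the real placement logic is more involved; the sketch only shows where each strategy tends to put instances.

```python
from collections import Counter

# Hypothetical pools: one per instance type and Availability Zone.
POOLS = [
    {"pool": "m5.large/us-east-1a", "price": 0.035, "spare": 40},
    {"pool": "m5.large/us-east-1b", "price": 0.037, "spare": 400},
    {"pool": "c5.large/us-east-1a", "price": 0.031, "spare": 15},
    {"pool": "r5.large/us-east-1c", "price": 0.042, "spare": 250},
]


def allocate(pools, count, strategy):
    """Return a Counter mapping pool name to number of instances placed."""
    if strategy == "lowestPrice":
        best = min(pools, key=lambda p: p["price"])
        return Counter({best["pool"]: count})
    if strategy == "capacityOptimized":
        best = max(pools, key=lambda p: p["spare"])
        return Counter({best["pool"]: count})
    if strategy == "diversified":
        placement = Counter()
        for i in range(count):  # round-robin across all pools
            placement[pools[i % len(pools)]["pool"]] += 1
        return placement
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("lowestPrice", "diversified", "capacityOptimized"):
    print(name, dict(allocate(POOLS, 8, name)))
```

With these made-up numbers, lowestPrice puts all eight instances in the cheapest pool, which is also the pool with the least spare capacity, while diversified places two in each pool.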
Interaction with Auto Scaling Groups
You can use Spot Instances with Auto Scaling Groups (ASGs). The ASG can launch Spot Instances by specifying a mixed instances policy or by using a launch template with Spot options. When Spot Instances are interrupted, the ASG can automatically replace them if the desired capacity is not met. This is a common pattern for cost-optimized, fault-tolerant architectures.
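A mixed instances policy is passed to Auto Scaling as a nested structure. The sketch below builds that structure as a plain Python dict using the field names of the Auto Scaling MixedInstancesPolicy; the helper name and the launch template name "build-agents" are placeholders, and no AWS call is made.

```python
import json


def mixed_instances_policy(template_name, instance_types,
                           on_demand_base, on_demand_pct_above_base,
                           spot_strategy="capacity-optimized"):
    """Build a MixedInstancesPolicy dict for an Auto Scaling group."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": spot_strategy,
        },
    }


policy = mixed_instances_policy("build-agents", ["m5.large", "c5.large"], 2, 30)
print(json.dumps(policy, indent=2))
```

The resulting dict could be supplied as the `MixedInstancesPolicy` argument when creating the group, for example through an SDK.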
Best Practices for Interruption Management
Use the rebalance recommendation as an early warning (it can arrive before the 2-minute interruption notice) and proactively move workloads.
Design for fault tolerance: Use multiple instance types and AZs, implement checkpointing, and use distributed systems (e.g., Spark, Hadoop).
Set interruption behavior to stop or hibernate for workloads that can pause and resume.
Use CloudWatch Events to automate responses: For example, when an interruption warning is received, save state to S3, drain connections, and deregister from a load balancer.
Combine with On-Demand or Reserved Instances for baseline capacity, and use Spot for burstable or flexible workloads.
Monitoring and Notifications
CloudWatch Metrics: Spot Fleet publishes metrics in the AWS/EC2Spot namespace, such as FulfilledCapacity, PendingCapacity, and TargetCapacity.
CloudWatch Events: EC2 Spot Instance Interruption Warning — includes instance ID, action (stop/terminate/hibernate), and time.
AWS Health Dashboard: Personal Health Events for Spot capacity reclamation.
Instance Metadata: As mentioned, poll the spot/instance-action endpoint.
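An event rule for the interruption warning needs a pattern that selects it. The sketch below shows such a pattern together with a deliberately simplified matcher that only handles top-level string fields (real event pattern matching supports much more); the sample event uses a made-up instance ID.

```python
# Pattern selecting the Spot interruption warning emitted by EC2.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {
        "instance-id": "i-1234567890abcdef0",  # hypothetical ID
        "instance-action": "terminate",
    },
}

print(matches(SPOT_INTERRUPTION_PATTERN, sample_event))
```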
Commands and Configuration
To launch a Spot Instance via AWS CLI:
aws ec2 request-spot-instances \
--spot-price "0.05" \
--instance-count 1 \
--type "persistent" \
--launch-specification file://spec.json
Example spec.json:
{
  "ImageId": "ami-0abcdef1234567890",
  "InstanceType": "t3.medium",
  "KeyName": "my-key",
  "SecurityGroupIds": ["sg-12345678"],
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": 20,
        "DeleteOnTermination": true
      }
    }
  ]
}
To check Spot price history:
aws ec2 describe-spot-price-history \
--instance-types t3.medium \
--product-description "Linux/UNIX" \
--start-time 2025-01-01T00:00:00Z
Key Values and Defaults
Default maximum price: The On-Demand price for the instance type.
Interruption notice time: 2 minutes before the action (stop/terminate/hibernate).
Spot price changes: Set by Amazon EC2 and adjusted gradually based on longer-term supply and demand trends.
Persistent request: By default, if you specify type=persistent, the request remains open until you cancel it or it expires (if you set a valid-until).
Spot Fleet: Can include up to 50 instance types per fleet.
Edge Cases and Exam Traps
Spot Instances are not guaranteed: Even if you set a high bid, AWS can still reclaim capacity if needed. The bid only protects you from price increases, not from capacity reclamation.
Persistent requests do not automatically restart stopped instances: If you stop a Spot Instance (not due to interruption), the request does not relaunch it. Only interruption or termination due to price/capacity triggers relaunch.
Spot price is quoted per instance-hour: Usage is billed per second with a 60-second minimum for Linux instances (per-second billing was introduced in 2017), so a partial hour costs only the seconds used. You are charged the Spot price in effect while the instance runs, never more than your maximum price.
Spot Instances can be terminated by you: If you terminate a Spot Instance that belongs to a persistent request, the request returns to open and launches a replacement while it is still active. To shut everything down, cancel the request first, then terminate the instance.
You cannot convert an existing On-Demand instance to Spot: You must launch a new Spot Instance.
Summary of Exam-Relevant Points
Spot Instances are for fault-tolerant workloads only.
Interruption notice is 2 minutes via metadata and CloudWatch Events.
Bidding: set max price; you pay the current Spot price (not your bid).
Persistent requests automatically relaunch interrupted instances.
Spot Fleet allocation strategies: lowestPrice, diversified, capacityOptimized, capacityOptimizedPrioritized.
Interruption behavior: terminate (default), stop, hibernate.
Use rebalance recommendation for early warning.
Combine with ASGs for automatic replacement.
Do NOT use for stateful or critical single-instance workloads.
Create a Spot Instance Request
You start by creating a Spot Instance request via the AWS Management Console, CLI, or SDK. You specify the AMI, instance type, network settings, key pair, and optionally a maximum price (bid). If you omit the bid, AWS uses the On-Demand price as the max. You also choose the request type: one-time or persistent. For persistent, AWS will automatically re-launch instances if they are interrupted due to price or capacity. You also set the interruption behavior (terminate, stop, hibernate). The request enters the 'open' state.
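The request described in this step can be assembled as a parameter dict. This is a sketch, assuming the parameter names of the EC2 RequestSpotInstances API; the helper name and all resource IDs are placeholders, and the function only builds the dict without calling AWS.

```python
def build_spot_request(image_id, instance_type, subnet_id, security_group_ids,
                       request_type="one-time", max_price=None,
                       interruption_behavior="terminate", count=1):
    """Build keyword arguments for a Spot Instance request."""
    params = {
        "InstanceCount": count,
        "Type": request_type,  # "one-time" or "persistent"
        "InstanceInterruptionBehavior": interruption_behavior,
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "SubnetId": subnet_id,
            "SecurityGroupIds": list(security_group_ids),
        },
    }
    if max_price is not None:
        # Omitting SpotPrice leaves the default maximum (the On-Demand price).
        params["SpotPrice"] = f"{max_price:.4f}"
    return params


params = build_spot_request(
    "ami-0abcdef1234567890", "t3.medium", "subnet-12345678", ["sg-12345678"],
    request_type="persistent", max_price=0.05, interruption_behavior="stop",
)
print(params)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2").request_spot_instances(**params)
```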
AWS Evaluates Capacity and Price
AWS checks the Spot capacity pool for the specified instance type and Availability Zone. It compares your maximum price to the current Spot price. If your max price is >= Spot price and capacity is available, the request is fulfilled. The Spot price is set by Amazon EC2 based on longer-term supply and demand for the pool. You pay the current Spot price per instance-hour, not your maximum. The instance launches and the request moves to 'active' state.
Instance Runs and Monitors Interruption
The instance runs normally. You can use it for any workload, but you should design for interruptions. The instance can be monitored via CloudWatch. The Spot price fluctuates. AWS may reclaim the capacity when it is needed elsewhere, or interrupt the instance if the Spot price rises above your maximum price. When a reclamation is imminent, AWS sends a 2-minute interruption notice. The notice can be received via instance metadata (http://169.254.169.254/latest/meta-data/spot/instance-action) or CloudWatch Events.
Receive Interruption Warning and Act
Approximately 2 minutes before the interruption, an 'EC2 Spot Instance Interruption Warning' event is emitted. Your application should have logic to handle this: save state to S3, gracefully shut down, deregister from ELB, etc. The rebalance recommendation may give you an earlier signal, although it is not guaranteed to arrive before the warning. The instance metadata shows the action (stop, terminate, hibernate) and a time. You can automate responses using Lambda functions triggered by CloudWatch Events.
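A Lambda function handling the warning might start like the sketch below. It reads the `instance-id` and `instance-action` fields from the event detail and returns a plan; the step names are hypothetical placeholders for your own automation, and saving state to S3 only on terminate is an illustrative design choice.

```python
import json


def plan_response(event):
    """Decide which cleanup steps to run for an interruption warning."""
    detail = event["detail"]
    action = detail["instance-action"]
    steps = ["deregister_from_load_balancer", "drain_connections"]
    if action == "terminate":
        # Local volumes may be deleted, so push state somewhere durable.
        steps.append("save_state_to_s3")
    return {"instance_id": detail["instance-id"], "action": action, "steps": steps}


def lambda_handler(event, context):
    """Entry point: log the plan; real code would execute each step."""
    plan = plan_response(event)
    print(json.dumps(plan))
    return plan


lambda_handler(
    {"detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"}},
    None,
)
```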
Interruption Action Executes
After the 2-minute warning, AWS performs the interruption action: terminate, stop, or hibernate. If terminate, the instance is terminated and the root EBS volume is deleted (unless configured otherwise). If stop, the instance is stopped and EBS volumes persist. If hibernate, RAM is saved to the root EBS volume and the instance is stopped. For persistent requests, the request goes back to 'open' state; once the Spot price is at or below your max price and capacity is available, AWS launches a replacement (after terminate) or starts the same instance again (after stop or hibernate).
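The outcomes of the three actions can be summarized in code. This is a small lookup sketch with an invented function name; it restates the rules in this step and is not an AWS API.

```python
def interruption_outcome(behavior, delete_on_termination=True):
    """Describe what survives each interruption behavior."""
    if behavior == "terminate":
        return {
            "instance": "terminated",
            "root_volume_kept": not delete_on_termination,
            "ram_kept": False,
        }
    if behavior == "stop":
        return {"instance": "stopped", "root_volume_kept": True, "ram_kept": False}
    if behavior == "hibernate":
        # RAM is written to the root EBS volume before the instance stops.
        return {"instance": "stopped", "root_volume_kept": True, "ram_kept": True}
    raise ValueError(f"unknown behavior: {behavior}")


for b in ("terminate", "stop", "hibernate"):
    print(b, interruption_outcome(b))
```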
Enterprise Scenario 1: Big Data Analytics Pipeline
A large e-commerce company runs daily Spark jobs to process terabytes of clickstream data. The workload is fault-tolerant: if a Spark executor node fails, the driver redistributes tasks. They use a Spot Fleet with a diversified allocation strategy across 5 instance types (r5.large, r5.xlarge, m5.large, m5.xlarge, c5.2xlarge) in three AZs. They set a max price at 80% of On-Demand. The Spot Fleet launches 20 Spot Instances for the compute layer. They also have 2 On-Demand instances as driver nodes. They use CloudWatch Events to trigger a Lambda that logs interruption events and updates a dashboard. During a capacity reclamation event, only 2-3 instances are interrupted. The Spark job automatically retries failed tasks. The company saves 65% on compute costs compared to using all On-Demand. Misconfiguration: initially they used a single instance type and AZ, causing 50% of instances to be interrupted simultaneously, leading to job failures. They switched to diversified strategy.
Enterprise Scenario 2: CI/CD Build Fleet
A SaaS company uses Jenkins for continuous integration. Build agents are stateless: they pull code, run tests, and publish artifacts. They use an Auto Scaling Group with a mixed instances policy: 30% On-Demand for baseline, 70% Spot for burst. Spot Instances in the ASG are terminated on interruption, so nothing important may live only on the agent. When a Spot Instance receives an interruption warning, an EventBridge rule triggers a Lambda function that takes the agent offline in Jenkins and has it upload its build cache to S3; after the instance is terminated, the ASG launches a replacement. This setup reduces build costs by 70%. Common mistake: keeping workspace state only on the instance and losing it on interruption, causing full rebuilds.
Enterprise Scenario 3: Media Rendering Farm
A visual effects studio uses Deadline render farm software. They submit thousands of render tasks to a queue. Workers are Spot Instances with a 'hibernate' interruption behavior. When a worker gets an interruption warning, the renderer checkpoints the frame, and the instance hibernates. Later, when capacity returns and the Spot price is at or below their maximum, EC2 resumes the hibernated instance and rendering continues from the checkpoint. Hibernation requires an EBS root volume large enough to hold RAM. They use a 'capacityOptimized' Spot Fleet to launch instances from the pools with the most available capacity. They found that using 'lowestPrice' caused frequent interruptions because the cheapest pool was often reclaimed. They switched to 'capacityOptimized' and reduced interruptions by 40%.
The SOA-C02 exam tests Spot Instances in Domain 6 (Cost and Performance Optimization), under the objective 'Implement cost optimization strategies.' Expect a few questions on Spot Instances, maximum price, interruption handling, and Spot Fleet. Key points:
Interruption notice time: The exam explicitly tests that the warning is 2 minutes. Wrong answer: 5 minutes or 1 minute. Remember: 2 minutes.
Bidding and pricing: You pay the current Spot price, not your bid. Common wrong answer: 'You pay your bid price.' The exam will show a scenario where a user sets a bid of $0.10, the Spot price is $0.05, and asks what they pay. Correct: $0.05.
Persistent requests: If a Spot Instance is interrupted, a persistent request automatically re-launches. Wrong answer: 'The request must be manually re-submitted.' Also, if you terminate the instance yourself while the request is still active, the persistent request launches a replacement; to prevent that, cancel the request first. This is a trap.
Spot Fleet allocation strategies: The exam asks which strategy to use for minimizing interruptions. 'capacityOptimized' launches from the pools with the most available capacity and is the strategy designed to lower interruption risk; 'diversified' spreads instances across pools so that one reclaimed pool affects only part of the fleet. Wrong: 'lowestPrice' (concentrates risk in the cheapest pool).
Interruption behavior: You can set to 'stop' or 'hibernate' to preserve data. The exam may ask which behavior allows the instance to resume with the same state. Answer: 'hibernate' (RAM saved). 'stop' preserves EBS but not RAM.
Rebalance recommendation: This is a newer signal that an instance is at elevated risk of interruption. It can arrive earlier than the 2-minute interruption warning and is separate from it. The exam may test that both exist.
Spot Instance request states: Know that 'open' means waiting, 'active' means running, 'closed' means done. The exam might ask what state a persistent request goes to after interruption: 'open'.
Use cases: The exam will list workloads and ask which is suitable for Spot. Correct: batch processing, data analytics, CI/CD. Wrong: databases (RDS), stateful web servers, critical production services.
Edge case: If you set a bid lower than the current Spot price, the request remains 'open' until the price drops. The exam may ask why an instance is not launching.
Terminology: 'Spot price' vs 'bid price'. The exam uses 'maximum price' or 'bid price' interchangeably. Know that the default max price is the On-Demand price.
To eliminate wrong answers: focus on the mechanism. If a scenario involves a stateful application, eliminate Spot. If it mentions constant availability, eliminate Spot. If it asks about cost savings, Spot is likely the answer. Always check if the workload is fault-tolerant.
Spot Instances offer discounts of up to 90% but can be interrupted with a 2-minute warning.
You pay the current Spot price, not your bid price.
Persistent requests automatically re-launch instances interrupted due to price/capacity.
Spot Fleet allocation strategies: lowestPrice (cost), diversified (resilience), capacityOptimized (lower interruption risk).
Interruption behavior can be terminate, stop, or hibernate.
Use rebalance recommendation for early warning.
Combine Spot with Auto Scaling Groups for automatic replacement.
Not suitable for stateful or critical single-instance workloads.
These come up on the exam all the time. Here's how to tell them apart.
Spot Instances
Up to 90% discount
Can be interrupted with 2-minute notice
Best for fault-tolerant workloads
Bidding and market pricing
No guarantee of capacity availability
On-Demand Instances
Full price
No interruption (unless you terminate)
Suitable for any workload
Fixed pricing
Capacity usually available within account limits (guaranteed only with a Capacity Reservation)
Mistake
You pay your bid price for Spot Instances.
Correct
You pay the current Spot price, which is usually lower than your bid. The bid is only a maximum; you are charged the market price.
Mistake
Spot Instances are guaranteed to run until you terminate them.
Correct
AWS can reclaim Spot Instances with a 2-minute notice if capacity is needed. They are not guaranteed; you must design for interruptions.
Mistake
Terminating a Spot Instance launched by a persistent request also ends the request.
Correct
A persistent request relaunches after interruptions and also after you terminate the instance yourself, as long as the request is still active. Cancel the request first, then terminate the instance. If you stop the instance yourself, the request does not relaunch it.
Mistake
Setting a higher bid gives you priority over other users.
Correct
AWS sets one Spot price per pool, and all users in that pool pay it regardless of their maximum price. A higher maximum only protects you from being interrupted if the price rises; it does not give priority.
Mistake
Spot Instances can be used for any workload.
Correct
Spot Instances are only suitable for fault-tolerant, stateless, or flexible workloads. Stateful, time-sensitive, or critical workloads should use On-Demand or Reserved Instances.
A Spot Instance request is a single request for a specific number of instances of a single instance type. A Spot Fleet is a collection of Spot Instances (and optionally On-Demand) across multiple instance types and Availability Zones, managed as a group. Spot Fleet offers allocation strategies and can optimize for cost or capacity. For the exam, Spot Fleet is more flexible and recommended for large-scale, fault-tolerant workloads.
Design your application to be fault-tolerant. Use the 2-minute interruption warning to save state to S3, drain connections, and deregister from load balancers. Automate responses using CloudWatch Events and Lambda. Set interruption behavior to 'stop' or 'hibernate' to preserve data. Use multiple instance types and AZs to reduce the impact of a single pool being reclaimed.
No. You cannot convert an On-Demand instance to a Spot Instance. You must launch a new Spot Instance. However, you can create an AMI from your On-Demand instance and use that AMI to launch a Spot Instance.
By default, the root EBS volume is deleted on termination. If you set DeleteOnTermination to false, the volume persists but is detached. For stop or hibernate, the EBS volume persists. To preserve data, use stop or hibernate, or store data on separate EBS volumes with DeleteOnTermination set to false.
It is a signal that a Spot Instance is at elevated risk of interruption. It is separate from the interruption warning, can arrive before it (though not always), and allows you to proactively move workloads before the interruption occurs.
Yes. Spot Instances support VPC, subnets, security groups, and all networking features. You launch them just like On-Demand instances, specifying the VPC and subnet.
If you do not specify a maximum price, AWS uses the On-Demand price as the maximum. This means you will never pay more than the On-Demand price, but you can still be interrupted whenever AWS needs the capacity back.