The answer is to configure target tracking scaling on ALB RequestCountPerTarget for the Auto Scaling group. Request latency rises sharply while CPU utilization stays low, so the instances are limited by request handling rather than compute, and a CPU-based policy never sees the overload. Scaling on the average number of requests each target receives addresses the root cause, overloaded instances, and launches new EC2 instances when per-instance demand spikes. On the SAA-C03 exam, this scenario tests your ability to choose the right scaling metric for the workload type; a common trap is defaulting to CPU utilization when the real bottleneck is request volume. Remember the key insight: if latency climbs but CPU stays low, look for a bottleneck outside compute. A useful memory tip is "Requests per Target, not CPU per server": when the load shows up as request volume, scale on the request metric.
SAA-C03 Design High-Performing Architectures Practice Question
This SAA-C03 practice question tests your understanding of the Design High-Performing Architectures domain. The scenario asks you to isolate a root cause, so eliminate options that address a different problem before choosing. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.
Exhibit
CloudWatch metrics for the Auto Scaling group (5-minute period):
- CPUUtilization: 28% average
- NetworkIn: 190 MB/min average, no saturation
- GroupDesiredCapacity: 4
- ALBRequestCountPerTarget: 4,800 during peaks
- TargetResponseTime p95: 2.7 seconds during peaks
ALB access log sample:
2026-04-28T09:02:11Z app/prod-alb 203.0.113.10:443 10.0.1.21:8080 0.000 2.698 0.000 200 200 1843 1920 "GET https://app.example.com/search?q=aws HTTP/1.1"
A web application runs on an Amazon EC2 Auto Scaling group behind an Application Load Balancer. During traffic surges, the average CPU utilization stays below 35%, but request latency increases sharply and the ALB access logs show far more requests per target than expected. Based on the exhibit, which change is the best way to improve scaling behavior?
Clue words in this question
Noticing these words before you look at the options changes how you read each choice.
Clue: "best"
Why it matters: Signals that multiple options may be partially correct. Choose the option that most directly solves the exact problem described, not the one that sounds most complete.
A
Lower the CPU target tracking threshold so the Auto Scaling group launches more instances sooner.
Why wrong: CPU is already low, so using CPU as the scaling signal will not match the bottleneck. The application is saturating on request handling before CPU becomes a useful indicator.
B
Replace the Application Load Balancer with a Network Load Balancer to reduce request latency.
Why wrong: A Network Load Balancer does not solve application-layer capacity pressure on the targets. It also does not provide a better scaling signal for HTTP request volume.
C
Configure target tracking scaling on ALB RequestCountPerTarget for the Auto Scaling group.
RequestCountPerTarget directly reflects how many requests each instance is serving, which matches the symptom in the exhibit. It scales the fleet based on actual per-target demand instead of CPU, so the group can add capacity before queueing and latency grow.
D
Increase the ALB idle timeout so requests can wait longer before timing out.
Why wrong: A longer idle timeout only masks slow responses and can prolong connection occupancy. It does not add capacity or correct the scaling signal that is driving the latency spike.
Correct answer & explanation
✓
Configure target tracking scaling on ALB RequestCountPerTarget for the Auto Scaling group.
Option C is correct because request latency increases sharply and the ALB logs show far more requests per target than expected, which means the Auto Scaling group is not scaling on the actual load per instance. With target tracking on ALB RequestCountPerTarget, the group keeps the average number of requests per target near a target value by launching instances when the metric rises above it, directly addressing the high request volume per instance. Scaling is then driven by the actual workload rather than by CPU utilization, which stays low because the bottleneck is request handling, not compute.
Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
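To make the scaling logic concrete, the arithmetic below estimates how many instances a request-count target would call for, using the exhibit's figures (4 instances, 4,800 requests per target at peak). The target value of 1,000 requests per target is a hypothetical number chosen for illustration, and the proportional formula is a simplified sketch, not the exact algorithm the service runs.

```python
import math


def required_instances(current_instances: int,
                       requests_per_target: float,
                       target_value: float) -> int:
    """Rough proportional estimate of the fleet size needed to bring
    the per-target request count down to target_value."""
    # Total load is what the whole fleet currently absorbs.
    total_requests = current_instances * requests_per_target
    # Spread that load so each instance handles at most target_value.
    return math.ceil(total_requests / target_value)


# Exhibit values: 4 instances, 4,800 requests per target at peak.
# 1,000 is an assumed target value, not taken from the question.
print(required_instances(4, 4800, 1000))  # prints 20
```

In practice the result would also be bounded by the group's minimum and maximum size, and instance warm-up delays how quickly new capacity takes traffic.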
Answer analysis
Option-by-option breakdown
For each option: why learners choose it and why it is or isn't the right answer here.
✗
Lower the CPU target tracking threshold so the Auto Scaling group launches more instances sooner.
Why it's wrong here
CPU is already low, so using CPU as the scaling signal will not match the bottleneck. The application is saturating on request handling before CPU becomes a useful indicator.
✗
Replace the Application Load Balancer with a Network Load Balancer to reduce request latency.
Why it's wrong here
A Network Load Balancer does not solve application-layer capacity pressure on the targets. It also does not provide a better scaling signal for HTTP request volume.
✓
Configure target tracking scaling on ALB RequestCountPerTarget for the Auto Scaling group.
Why this is correct
RequestCountPerTarget directly reflects how many requests each instance is serving, which matches the symptom in the exhibit. It scales the fleet based on actual per-target demand instead of CPU, so the group can add capacity before queueing and latency grow.
Clue confirmation
The clue word "best" in the question points toward this answer.
Related concept
Read the scenario before looking for a memorised answer.
✗
Increase the ALB idle timeout so requests can wait longer before timing out.
Why it's wrong here
A longer idle timeout only masks slow responses and can prolong connection occupancy. It does not add capacity or correct the scaling signal that is driving the latency spike.
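The access log line in the exhibit supports this elimination: its three timing fields show where the 2.7 seconds is spent. The sketch below parses that sample line, assuming the fields appear in the order used by ALB access logs with the leading type field and trailing fields omitted, as in the exhibit. The field list and helper names are illustrative.

```python
import shlex

# Assumed field order for the abbreviated sample line in the exhibit.
# Real ALB access logs start with a "type" field and carry more fields.
FIELDS = [
    "time", "elb", "client", "target",
    "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request",
]

TIMING_FIELDS = [
    "request_processing_time",
    "target_processing_time",
    "response_processing_time",
]


def parse_sample_line(line: str) -> dict:
    """Split the log line on whitespace, keeping the quoted request whole."""
    return dict(zip(FIELDS, shlex.split(line)))


def slowest_stage(entry: dict) -> str:
    """Return the name of the timing field with the largest value."""
    return max(TIMING_FIELDS, key=lambda name: float(entry[name]))


SAMPLE = ('2026-04-28T09:02:11Z app/prod-alb 203.0.113.10:443 10.0.1.21:8080 '
          '0.000 2.698 0.000 200 200 1843 1920 '
          '"GET https://app.example.com/search?q=aws HTTP/1.1"')

entry = parse_sample_line(SAMPLE)
print(slowest_stage(entry))  # prints target_processing_time
```

Nearly all of the elapsed time is spent waiting on the target, not inside the load balancer, which is why swapping the load balancer type or lengthening its timeout does not address the problem.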
Common exam traps
Common exam trap: answer the scenario, not the keyword
The trap here is assuming that CPU utilization is the universal scaling metric. The question explicitly states that CPU stays low while latency spikes, so the bottleneck is request throughput, not compute, and RequestCountPerTarget is the correct metric to scale on.
Detailed technical explanation
How to think about this question
Under the hood, target tracking scaling with RequestCountPerTarget uses the ALB's CloudWatch metric `RequestCountPerTarget`, which is calculated as the sum of requests divided by the number of healthy targets over a period. This metric is ideal for applications where CPU utilization is not the primary bottleneck, such as I/O-intensive or memory-bound workloads. In a real-world scenario, an e-commerce site during a flash sale might see a surge in requests that overwhelm the web server's connection pool, causing latency spikes even though CPU remains low; scaling on request count per target directly mitigates this by distributing the load across more instances.
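As a minimal sketch of what such a policy looks like, the function below builds the arguments for the EC2 Auto Scaling `put_scaling_policy` call with the predefined metric type `ALBRequestCountPerTarget`. The group name, policy name, target value, and the load balancer and target group identifiers in the resource label are all hypothetical placeholders. The boto3 call itself is left as a comment so the example runs without AWS credentials.

```python
def build_request_count_policy(asg_name: str,
                               resource_label: str,
                               target_value: float) -> dict:
    """Build keyword arguments for a target tracking policy that scales
    on ALB request count per target."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "alb-requests-per-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Identifies which ALB target group the metric comes from.
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(target_value),
        },
    }


# Hypothetical identifiers; the label format is
# app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>.
policy = build_request_count_policy(
    asg_name="prod-web-asg",
    resource_label="app/prod-alb/0123456789abcdef/targetgroup/prod-tg/fedcba9876543210",
    target_value=1000,
)
print(policy["TargetTrackingConfiguration"]["TargetValue"])

# With credentials configured, the policy could be applied with:
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```

Choosing the target value is the real design decision: it should sit below the request count at which a single instance starts to show rising latency, which usually has to be found by load testing.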
Key Concepts to Remember
Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.
Exam Day Tips
→ Watch for words such as best, first, most likely and least administrative effort.
→ Review why wrong options are wrong, not only why the correct option is correct.
Key takeaway
Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Real-world example
How this comes up in practice
A team runs a search API on four EC2 instances behind an Application Load Balancer. During a flash sale, p95 response time climbs to several seconds while average CPU sits near 28%, so a CPU-based policy never scales out. Switching the Auto Scaling group to target tracking on request count per target adds instances as per-instance demand grows. Questions like this test whether you can match the scaling metric to the actual bottleneck.
What to study next
Got this wrong? Here's your next step.
Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.
Design High-Performing Architectures: this question tests choosing a scaling metric that matches the bottleneck. Read the scenario before looking for a memorised answer.
These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.
Variation 1. A company runs a stateless web API on Amazon EC2 behind an Application Load Balancer. The team notices that during business hours, the ALB starts queueing requests and the average request latency rises. They want to scale out quickly and reliably based on demand, not CPU alone. Which Auto Scaling approach best matches this requirement?
Difficulty: easy
A. Use a fixed-size Auto Scaling group and increase capacity manually once per hour.
✓ B. Use target tracking scaling based on ALB request count per target.
C. Scale based only on EC2 instance memory utilization, regardless of load.
D. Use step scaling with a single threshold on average network-in bytes.
Why B: Target tracking scaling based on ALB request count per target directly aligns with the requirement to scale out based on demand (request queuing and latency) rather than CPU alone. This policy automatically adjusts the Auto Scaling group size to maintain a target value for the average number of requests per instance, which is a more reliable indicator of load for a stateless web API than CPU utilization.