- A
Keep scaling on CPU% to avoid over-scaling
Why wrong: CPU-based scaling can lag or fail when the bottleneck is not CPU saturation (for example, thread/connection limits, queueing, downstream dependency slowness, or ALB target response time). Your symptom already shows CPU is not the limiting factor.
- B
Scale on a request-driven metric such as ALB RequestCount per target (or target-group request rate)
Why correct: A request-driven metric correlates directly with incoming workload pressure. Scaling on request rate helps ensure enough capacity is added before request queues build up, which can reduce p95 response time even when CPU remains low.
- C
Disable scaling and manually increase capacity during business hours
Why wrong: Manual capacity changes eliminate elasticity and can still miss sudden spikes outside business hours. This increases the risk of prolonged high p95 latency during unpredictable traffic surges.
- D
Scale only when network packet drops fall below a threshold
Why wrong: Packet-drop metrics are not a reliable proxy for application-level queuing/backlog that drives p95 latency. They are also often noisy and can be unrelated to CPU or request handling at the application tier.
Quick Answer
The correct choice is to scale on a request-driven metric such as ALB RequestCount per target. This directly addresses the latency spike because the p95 response time is rising while CPU utilization stays below 40%, which signals that the bottleneck is request queueing or connection overhead rather than compute power. By scaling on RequestCountPerTarget, you launch new instances precisely when individual targets are overwhelmed by incoming requests, reducing queueing delays and improving response times. On the SAA-C03 exam, this scenario tests your understanding that CPU is not always the right scaling metric—look for clues like “p95 latency increases but CPU is low” to spot the trap of relying on average CPU. A common memory tip is “Latency up, CPU low? Scale on request count, not CPU.”
SAA-C03 Design High-Performing Architectures Practice Question
This SAA-C03 practice question tests your understanding of the Design High-Performing Architectures domain. Match the stated requirement to the specific cloud service, access model, or configuration option; many options are valid in isolation but not for this scenario. After answering, compare your reasoning against the explanation and wrong-answer breakdown to see why each distractor is designed to mislead on exam day.
Your web application runs on EC2 instances behind an Application Load Balancer (ALB). During traffic spikes, p95 response time increases, but average CPU utilization remains below 40%. The current Auto Scaling policy scales based on average CPU%. What should you change to improve performance during spikes?
Correct answer & explanation
Scale on a request-driven metric such as ALB RequestCount per target (or target-group request rate)
The p95 response time is increasing during traffic spikes while CPU utilization remains low, indicating that the bottleneck is not compute capacity but rather request handling or connection overhead. By scaling on ALB RequestCountPerTarget, you directly target the metric causing latency—each target's request load—rather than an indirect metric like CPU. This ensures that new instances are launched precisely when individual targets are overwhelmed by requests, reducing queueing delays and improving response times.
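As a rough sketch of what this change could look like, the Python below builds the arguments for a target tracking policy on the predefined `ALBRequestCountPerTarget` metric. The dictionary shape follows boto3's `put_scaling_policy` call for EC2 Auto Scaling. The group name, policy name, resource label, and the target of 1000 requests per target are placeholder assumptions, not values from the scenario.

```python
def request_count_policy(asg_name, resource_label, target_per_target):
    """Build keyword arguments for an EC2 Auto Scaling target tracking policy.

    resource_label identifies the ALB and target group, in the form
    app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>.
    """
    if target_per_target <= 0:
        raise ValueError("target_per_target must be positive")
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "scale-on-request-count",  # hypothetical policy name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(target_per_target),
        },
    }


policy = request_count_policy(
    "web-asg",  # hypothetical Auto Scaling group name
    "app/my-alb/0123456789abcdef/targetgroup/my-targets/fedcba9876543210",  # placeholder label
    1000,  # assumed target; tune from load tests
)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```

The target value is the part that needs real measurement: it should sit below the request rate at which a single instance starts to queue, which only a load test of the actual application can establish.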
Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Common exam traps
Common exam trap: answer the scenario, not the keyword
The trap here is that candidates assume high latency always means high CPU, but AWS tests the understanding that p95 latency can spike due to request queueing even when CPU is idle, making request-based scaling the correct choice over CPU-based scaling.
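To make the "low CPU, high latency" pattern concrete, here is a simplified model of one instance, assuming a fixed pool of worker threads and requests that spend most of their time waiting on I/O. The worker count, service time, CPU time per request, and vCPU count are illustrative assumptions only. The throughput ceiling comes from Little's law (concurrency = arrival rate × time in system).

```python
def instance_snapshot(arrival_rps, workers, service_time_s, cpu_time_s, vcpus):
    """Estimate CPU utilization and worker saturation for one instance.

    Simplified model: each request holds one worker for service_time_s
    (mostly waiting on I/O) but uses only cpu_time_s of CPU.
    """
    max_rps = workers / service_time_s  # throughput ceiling from Little's law
    cpu_util = min(1.0, arrival_rps * cpu_time_s / vcpus)
    worker_util = arrival_rps / max_rps
    return {
        "max_rps": max_rps,
        "cpu_util": cpu_util,
        "worker_util": worker_util,
        "queue_growing": arrival_rps > max_rps,
    }


# Hypothetical instance: 50 workers, 0.5 s per request, 5 ms of CPU per request, 2 vCPUs.
normal = instance_snapshot(arrival_rps=80, workers=50, service_time_s=0.5, cpu_time_s=0.005, vcpus=2)
spike = instance_snapshot(arrival_rps=120, workers=50, service_time_s=0.5, cpu_time_s=0.005, vcpus=2)

for name, snap in (("normal", normal), ("spike", spike)):
    print(name, f"cpu={snap['cpu_util']:.0%}", f"workers={snap['worker_util']:.0%}",
          "queue growing" if snap["queue_growing"] else "keeping up")
```

Under these assumed numbers the spike pushes CPU to only about 30%, so a CPU-based policy with a higher threshold would not fire, yet arrivals exceed the 100 requests per second the workers can serve and the queue grows. That is the same shape as the scenario in the question.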
Detailed technical explanation
How to think about this question
RequestCountPerTarget reports the average number of requests that each target in a target group receives over the metric period. A high value does not by itself say where a target spends its time. In this scenario, low CPU combined with rising p95 latency suggests that requests are waiting on something other than compute, such as worker threads, connection limits, or I/O. Scaling out on this metric adds targets, so the ALB spreads the same traffic across more instances, which reduces per-instance queue depth and lowers tail latency. In practice this matters most for I/O-heavy applications, where CPU stays low while request throughput saturates.
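The sizing arithmetic can be sketched as follows. This is a simplified proportional model, not the service's actual algorithm: real target tracking also involves alarm evaluation periods, instance warm-up, and conservative scale-in, none of which are modelled here. The request volumes, target value, and group limits are assumed for illustration.

```python
import math


def desired_capacity(total_requests_per_min, target_per_target, min_size, max_size):
    """Instances needed to keep requests per target at or below the target value."""
    if target_per_target <= 0:
        raise ValueError("target_per_target must be positive")
    needed = math.ceil(total_requests_per_min / target_per_target)
    # Clamp to the Auto Scaling group's configured bounds.
    return max(min_size, min(max_size, needed))


# Hypothetical group: target of 1000 requests per target per minute, 2 to 20 instances.
print(desired_capacity(4000, 1000, min_size=2, max_size=20))   # steady state
print(desired_capacity(9000, 1000, min_size=2, max_size=20))   # traffic spike
print(desired_capacity(50000, 1000, min_size=2, max_size=20))  # capped by max_size
```

With four instances and a spike to 9000 requests per minute, each target would see 2250 requests, more than double the assumed target, so the model asks for nine instances regardless of what CPU reports.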
Key Concepts to Remember
- Read the scenario before looking for a memorised answer.
- Find the constraint that changes the correct option.
- Eliminate answers that are true in general but not in this case.
Exam Day Tips
- Watch for words such as best, first, most likely and least administrative effort.
- Review why wrong options are wrong, not only why the correct option is correct.
Real-world example
How this comes up in practice
An e-commerce site experiences heavy traffic on Black Friday and near-zero traffic during off-peak weeks. Rather than provisioning permanent large VMs, the team uses auto-scaling groups that add capacity automatically under load and reduce it overnight. Questions like this test whether you understand elasticity, availability zones, and cloud compute scaling patterns.
About these practice questions
Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.
Last reviewed: Jun 11, 2026
This SAA-C03 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the SAA-C03 exam.