Question 261 of 1,040

Domain: Design High-Performing Architectures · Difficulty: easy · Type: multiple choice

Quick Answer

The correct choice is to scale on a request-driven metric such as ALB RequestCount per target. This directly addresses the latency spike because the p95 response time is rising while CPU utilization stays below 40%, which signals that the bottleneck is request queueing or connection overhead rather than compute power. By scaling on RequestCountPerTarget, you launch new instances precisely when individual targets are overwhelmed by incoming requests, reducing queueing delays and improving response times. On the SAA-C03 exam, this scenario tests your understanding that CPU is not always the right scaling metric—look for clues like “p95 latency increases but CPU is low” to spot the trap of relying on average CPU. A common memory tip is “Latency up, CPU low? Scale on request count, not CPU.”

SAA-C03 Design High-Performing Architectures Practice Question

This SAA-C03 practice question tests your understanding of the Design High-Performing Architectures domain. Match the stated requirement to the specific cloud service, access model, or configuration option; many options are valid in isolation but not for this scenario. After answering, read the explanation and wrong-answer breakdown below to reinforce the concept and see why each distractor is designed to mislead on exam day.

Your web application runs on EC2 instances behind an Application Load Balancer (ALB). During traffic spikes, p95 response time increases, but average CPU utilization remains below 40%. The current Auto Scaling policy scales based on average CPU%. What should you change to improve performance during spikes?

A. Keep scaling on CPU% to avoid over-scaling
Why wrong: CPU-based scaling can lag or fail when the bottleneck is not CPU saturation (for example, thread/connection limits, queueing, or downstream dependency slowness). The symptom already shows that CPU is not the limiting factor.

B. Scale on a request-driven metric such as ALB RequestCount per target (or target-group request rate)
Why correct: A request-driven metric correlates directly with incoming workload pressure. Scaling on request rate helps ensure enough capacity is added before request queues build up, which can reduce p95 response time even when CPU remains low.

C. Disable scaling and manually increase capacity during business hours
Why wrong: Manual capacity changes eliminate elasticity and can still miss sudden spikes outside business hours. This increases the risk of prolonged high p95 latency during unpredictable traffic surges.

D. Scale only when network packet drops fall below a threshold
Why wrong: Packet-drop metrics are not a reliable proxy for the application-level queuing and backlog that drive p95 latency. They are also often noisy and can be unrelated to CPU or request handling at the application tier.

Correct answer & explanation

✓ Scale on a request-driven metric such as ALB RequestCount per target (or target-group request rate)

The p95 response time is increasing during traffic spikes while CPU utilization remains low, indicating that the bottleneck is not compute capacity but rather request handling or connection overhead. Scaling on ALB RequestCountPerTarget tracks each target's request load directly, rather than relying on an indirect signal like CPU. New instances are therefore launched when individual targets are receiving more requests than they can handle, which reduces queueing delays and improves response times.
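
As a rough illustration of what such a policy looks like, the sketch below builds the arguments for an EC2 Auto Scaling target tracking policy that uses the predefined `ALBRequestCountPerTarget` metric. The group name, policy name, ARN suffixes, and the target of 1,000 requests per target are all hypothetical placeholders, not values from the question; the function only assembles a dictionary and makes no AWS calls.

```python
def build_request_count_policy(asg_name, alb_arn_suffix, target_group_arn_suffix,
                               target_requests_per_target):
    """Assemble arguments for a target tracking policy on ALBRequestCountPerTarget.

    All names passed in are illustrative assumptions. The resource label joins
    the final portion of the load balancer ARN and the final portion of the
    target group ARN with a slash.
    """
    if target_requests_per_target <= 0:
        raise ValueError("target_requests_per_target must be positive")

    resource_label = f"{alb_arn_suffix}/{target_group_arn_suffix}"
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "scale-on-requests-per-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(target_requests_per_target),
        },
    }


if __name__ == "__main__":
    policy = build_request_count_policy(
        asg_name="web-asg",                                   # hypothetical
        alb_arn_suffix="app/my-alb/778d41231b141a0f",         # hypothetical
        target_group_arn_suffix="targetgroup/my-targets/943f017f100becff",
        target_requests_per_target=1000,                      # assumed target
    )
    print(policy)
```

With credentials configured, a dictionary of this shape could be passed to boto3's `put_scaling_policy` on an `autoscaling` client as keyword arguments. A sensible target value depends on how many requests one instance can serve at acceptable latency, which has to be measured for the actual application.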

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is assuming that high latency always means high CPU. The exam tests whether you recognize that p95 latency can spike because of request queueing even when CPU utilization is low, which makes request-based scaling the correct choice over CPU-based scaling.
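
One way to see how requests can queue while CPU stays low is Little's Law: the average number of in-flight requests equals the arrival rate multiplied by the time each request spends in the system. The sketch below applies it to a single instance with a fixed number of worker slots. The slot count, request rates, and latency are made-up numbers for illustration, and the model ignores burstiness, so treat it as a back-of-the-envelope check rather than a capacity model.

```python
def required_concurrency(requests_per_second, avg_latency_seconds):
    """Little's Law: average in-flight requests = arrival rate x time in system."""
    return requests_per_second * avg_latency_seconds


def is_queueing(requests_per_second, avg_latency_seconds, worker_slots):
    """Return True when average demand exceeds the instance's worker slots.

    worker_slots is a hypothetical limit such as a thread pool or
    connection pool size. Requests that mostly wait on I/O hold a slot
    without using much CPU, so this can be True while CPU stays low.
    """
    return required_concurrency(requests_per_second, avg_latency_seconds) > worker_slots


if __name__ == "__main__":
    # Assumed figures: 50 worker threads, 200 ms per request (mostly I/O wait).
    for rps in (200, 400):
        print(rps, required_concurrency(rps, 0.2), is_queueing(rps, 0.2, 50))
```

In this toy setup, doubling the request rate pushes demand past the 50 available slots, so new requests wait for a free worker and tail latency climbs, even though each request spends most of its time waiting rather than computing.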

Detailed technical explanation

How to think about this question

Under the hood, ALB RequestCountPerTarget reports the average number of requests that each target in the target group received during the measurement period. A sustained high value means each instance is receiving more requests than it can serve promptly, which shows up as queueing and higher tail latency even when the work is not CPU-bound (for example, requests that mostly wait on I/O or on a limited pool of worker threads or connections). Scaling the Auto Scaling group on this metric adds targets, so the ALB spreads the same traffic across more instances, reducing per-instance queue depth and lowering tail latency. This matters most for I/O-heavy applications where CPU stays low while request throughput saturates.
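
The arithmetic behind scaling on requests per target can be sketched as follows. This is a simplified proportional model, not AWS's exact algorithm: real target tracking works through CloudWatch alarms and is affected by warm-up and cooldown behaviour. The function name, the default group limits, and the traffic figures are assumptions for illustration.

```python
import math


def instances_needed(total_requests_per_minute, target_per_instance,
                     min_size=1, max_size=20):
    """Estimate the instance count that keeps per-target load at or below the target.

    Simplified model: divide total traffic by the per-instance target,
    round up, then clamp to the group's (hypothetical) min and max size.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    raw = math.ceil(total_requests_per_minute / target_per_instance)
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # Assumed target: 1,000 requests per minute per instance.
    for traffic in (500, 12_000, 12_001, 100_000):
        print(traffic, instances_needed(traffic, 1000))
```

The point of the model is that capacity follows traffic directly: the instance count rises as soon as request volume does, without waiting for CPU to cross a threshold that an I/O-bound workload may never reach.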

Key Concepts to Remember

Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.

Exam Day Tips

Watch for words such as best, first, most likely and least administrative effort.
Review why wrong options are wrong, not only why the correct option is correct.

Real-world example

How this comes up in practice

An e-commerce site experiences heavy traffic on Black Friday and near-zero traffic during off-peak weeks. Rather than provisioning permanent large VMs, the team uses auto-scaling groups that add capacity automatically under load and reduce it overnight. Questions like this test whether you understand elasticity, availability zones, and cloud compute scaling patterns.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Last reviewed: Jun 11, 2026

This SAA-C03 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the SAA-C03 exam.