Courseiva

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

Objective 2.0 · 26% of exam

Design Resilient Architectures

SAA-C03 Practice Questions

Use this page to practise high availability and resilience questions. The SAA-C03 exam tests your ability to match an architecture pattern to an RTO/RPO requirement — know the cost and recovery time of each pattern.

What this objective tests

SAA-C03 Design Resilient Architectures — Key Topics

High availability and resilience questions test multi-AZ vs multi-Region patterns, Auto Scaling, load balancing and the right service for a given recovery time objective.

  • Multi-AZ vs multi-Region deployment trade-offs.
  • Auto Scaling policies and when to scale horizontally vs vertically.
  • Elastic Load Balancing: ALB, NLB, CLB and their use cases.
  • RTO and RPO targets matched to the correct AWS architecture.

Common exam traps

Where candidates lose marks on Design Resilient Architectures

  • ⚠ Multi-AZ protects against AZ failure; multi-Region protects against Region failure.
  • ⚠ Auto Scaling does not guarantee zero downtime without a load balancer.
  • ⚠ ALB operates at Layer 7; NLB operates at Layer 4.
  • ⚠ Pilot light is cheaper than warm standby but has a longer recovery time.
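
The cost versus recovery-time trade-off in the last trap extends to the other common disaster recovery patterns. The sketch below is one way to picture it in Python, assuming ordinal ranks (1 = lowest) rather than real prices or timings. The ranks, the `DrStrategy` class and the `cheapest_within` helper are illustrative assumptions, not AWS figures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DrStrategy:
    name: str
    cost_rank: int           # 1 = cheapest to keep running (assumed ordinal rank)
    recovery_time_rank: int  # 1 = fastest recovery (assumed ordinal rank)

# Relative ordering only; these are not measured costs or recovery times.
STRATEGIES = [
    DrStrategy("backup and restore", 1, 4),
    DrStrategy("pilot light", 2, 3),
    DrStrategy("warm standby", 3, 2),
    DrStrategy("multi-site active/active", 4, 1),
]

def cheapest_within(max_recovery_time_rank: int) -> DrStrategy:
    """Return the cheapest strategy whose recovery rank is fast enough."""
    candidates = [s for s in STRATEGIES
                  if s.recovery_time_rank <= max_recovery_time_rank]
    return min(candidates, key=lambda s: s.cost_rank)

print(cheapest_within(3).name)  # pilot light
print(cheapest_within(2).name)  # warm standby
```

Tightening the recovery requirement by one rank moves the answer from pilot light to warm standby, which is the pattern most RTO/RPO questions are probing.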

SAA-C03 Design Resilient Architectures — Practice Questions

30 questions from this objective · 26% of your SAA-C03 exam

Question 2 · medium · multiple choice

An order-processing service consumes messages from an Amazon SQS Standard queue using a custom worker. During traffic spikes, the worker occasionally times out after performing some work but before acknowledging the message, so SQS redelivers it and it may be processed again.

You also observe that a small set of “poison” messages always fail validation.

What change most directly improves resilience by (1) preventing poison messages from retrying indefinitely and (2) avoiding duplicate side effects caused by legitimate retries?
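
To make the two failure modes in this scenario concrete, here is a small in-memory simulation of a worker with no safeguards. It is a toy model, not the SQS API: the message fields (`id`, `valid`, `ack_fails`) and the `run_naive_worker` helper are hypothetical, and `max_receives` exists only to stop the simulation.

```python
from collections import Counter

def run_naive_worker(messages, max_receives=3):
    """Simulate at-least-once delivery to a worker with no safeguards.

    Each message is a dict with hypothetical keys:
      id, valid (passes validation), ack_fails (acknowledgements that get lost).
    max_receives only bounds the simulation; the toy queue never gives up itself.
    """
    side_effects = Counter()  # times each message's work was applied
    receives = Counter()      # times each message was delivered
    queue = list(messages)
    while queue:
        msg = queue.pop(0)
        if receives[msg["id"]] >= max_receives:
            continue  # stop simulating this message
        receives[msg["id"]] += 1
        if not msg["valid"]:
            queue.append(msg)         # poison message: fails and is redelivered
            continue
        side_effects[msg["id"]] += 1  # work happens before the acknowledgement
        if msg["ack_fails"] > 0:
            # Worker timed out before acknowledging, so the message comes back.
            queue.append({**msg, "ack_fails": msg["ack_fails"] - 1})
    return side_effects, receives

effects, receives = run_naive_worker([
    {"id": "order-1", "valid": True, "ack_fails": 1},
    {"id": "order-2", "valid": False, "ack_fails": 0},
])
print(effects["order-1"])   # 2 (the side effect was applied twice)
print(receives["order-2"])  # 3 (retried until the simulation cap)
```

The legitimate message has its side effect applied twice, and the poison message is retried until the artificial cap stops it. Those are the two problems the question asks you to address.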

Question 3 · medium · multiple choice

Based on the exhibit, the application sees several minutes of connection errors during an Aurora failover. What is the best change to reduce failover impact?

Exhibit

Application configuration
  JDBC URL: jdbc:postgresql://mydb-instance-1.abcdefghijkl.us-east-1.rds.amazonaws.com:5432/app
Aurora event log
  11:15:02 Failover initiated
  11:15:04 Writer moved to a different instance
  11:18:20 Application still reporting connection refused errors
Notes from the team
  The application uses a connection pool and does not re-resolve the endpoint quickly.
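
When reading an exhibit like this, it helps to check which kind of endpoint the JDBC URL points at. The sketch below classifies a host by the naming pattern Aurora uses for its endpoints (`cluster-`, `cluster-ro-` and `cluster-custom-` prefixes on the second DNS label). The function name is hypothetical, and the check is a string heuristic rather than an AWS API call.

```python
from urllib.parse import urlparse

def aurora_endpoint_kind(jdbc_url: str) -> str:
    """Classify the host in a JDBC URL by Aurora's endpoint naming pattern."""
    # Drop the "jdbc:" prefix so urlparse sees an ordinary scheme://host URL.
    url = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    second = labels[1] if len(labels) > 1 else ""
    if second.startswith("cluster-ro-"):
        return "cluster reader endpoint"
    if second.startswith("cluster-custom-"):
        return "custom endpoint"
    if second.startswith("cluster-"):
        return "cluster (writer) endpoint"
    return "instance endpoint"

exhibit_url = ("jdbc:postgresql://mydb-instance-1.abcdefghijkl"
               ".us-east-1.rds.amazonaws.com:5432/app")
print(aurora_endpoint_kind(exhibit_url))  # instance endpoint
```

The exhibit's URL names a single DB instance, so it keeps pointing at that instance regardless of which one is currently the writer.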

Question 4 · medium · multiple choice

A payments service receives payment orders by consuming messages from an Amazon SQS Standard queue. The downstream processor occasionally exceeds its processing timeout. As a result, some messages reappear in the queue and may be processed more than once.

The team wants to prevent duplicate side effects (for example, double-charging) and also ensure poison messages do not repeatedly consume processing capacity.

What approach best satisfies both goals?

Question 5 · medium · multiple choice

A company runs an application behind an Application Load Balancer (ALB). An Auto Scaling group (ASG) is configured with desired capacity 2, but it is attached only to subnets in a single Availability Zone. The ALB is healthy because it is configured across multiple Availability Zones.

When the Availability Zone that contains the ASG subnets experiences an outage, what change most directly improves resilience and allows capacity to be restored automatically?

Question 6 · hard · multiple choice

Based on the exhibit, DNS still sends traffic to the primary Region even though Route 53 health checks show the primary endpoint is unhealthy. What is the best change to make failover work as intended?

Exhibit

Route 53 record sets for app.example.com:
- Record 1: Type A, RoutingPolicy=Simple, AliasTarget=alb-use1.amazonaws.com
- Record 2: Type A, RoutingPolicy=Simple, AliasTarget=alb-usw2.amazonaws.com

Health check status:
hc-primary: FAILED
hc-secondary: HEALTHY

Resolver test:
$ dig +short app.example.com
alb-use1.amazonaws.com

Ops note:
The intent is to send all traffic to us-east-1 normally and fail over to us-west-2 only when the primary is unhealthy.

Question 7 · medium · multiple choice

Based on the exhibit, the web application must remain available even if one Availability Zone fails. What is the best change to improve resilience with the least redesign?

Exhibit

Application Load Balancer
  Subnets: subnet-a1 (us-east-1a), subnet-b1 (us-east-1b)
Auto Scaling group
  VPCZoneIdentifier: subnet-a1 (us-east-1a)
  DesiredCapacity: 2
  MinSize: 2
  MaxSize: 4
CloudWatch
  HealthyHostCount: 2
  HTTPCode_Target_5XX_Count: 0
Troubleshooting note
  A planned test that disabled us-east-1a caused the application to become unreachable.
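
The exhibit can be reduced to simple set arithmetic on Availability Zones. A minimal sketch, using only the AZ names shown above; the helper name is hypothetical:

```python
def azs_left_for_instances(asg_azs, failed_az):
    """AZs where the Auto Scaling group can still run or launch instances."""
    return set(asg_azs) - {failed_az}

alb_azs = {"us-east-1a", "us-east-1b"}  # from the ALB subnets in the exhibit
asg_azs = {"us-east-1a"}                # from VPCZoneIdentifier in the exhibit

remaining = azs_left_for_instances(asg_azs, "us-east-1a")
print(sorted(remaining))                 # []
print(sorted(alb_azs - {"us-east-1a"}))  # ['us-east-1b']
```

The load balancer still has a node in us-east-1b, but the Auto Scaling group has nowhere left to place instances, which matches the troubleshooting note.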

Question 8 · medium · multiple choice

An Auto Scaling group behind an Application Load Balancer frequently replaces new EC2 instances. The application needs ~6 minutes to warm up after instance launch. However, the ALB target group health checks start immediately and mark the targets unhealthy until the application is ready. Because the targets become unhealthy early, the Auto Scaling group then terminates the instances and launches replacements, creating a repeated unhealthy/termination loop.

What configuration change will most directly improve recovery by preventing premature ASG termination while the application is warming up?
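
The termination loop comes down to a timing comparison. The sketch below is a deliberately rough model, assuming the target fails every check while it warms up and that the Auto Scaling group ignores unhealthy reports until a grace period has elapsed. The interval, threshold and grace values are assumed for illustration and are not taken from the question.

```python
def terminated_before_ready(warmup_s, grace_s, check_interval_s, unhealthy_threshold):
    """Rough model: does the ASG act on an unhealthy report before warm-up ends?

    Assumes the target fails every check while warming up, so it is first
    reported unhealthy after check_interval_s * unhealthy_threshold seconds,
    and that the ASG ignores unhealthy reports until grace_s has elapsed.
    """
    first_unhealthy_s = check_interval_s * unhealthy_threshold
    asg_acts_at_s = max(first_unhealthy_s, grace_s)
    return asg_acts_at_s < warmup_s

WARMUP_S = 6 * 60  # "~6 minutes" from the scenario

# Interval, threshold and grace values are assumed, not from the question.
print(terminated_before_ready(WARMUP_S, grace_s=0,
                              check_interval_s=30, unhealthy_threshold=2))  # True
print(terminated_before_ready(WARMUP_S, grace_s=420,
                              check_interval_s=30, unhealthy_threshold=2))  # False
```

In this model the instance survives only when the point at which the group starts acting on health reports falls after the warm-up time.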

Question 9 · medium · multiple choice

A company runs an internet-facing API in two AWS Regions. Route 53 currently uses simple routing to a primary Application Load Balancer (ALB) DNS name. When the primary Region experiences an outage, customers wait a long time because the DNS entry is not changed automatically.

The team wants automatic failover: if the primary Region ALB health check fails for a sustained period, Route 53 should route users to the secondary Region ALB.

Which Route 53 approach best meets this requirement?
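
The behaviour the team is asking for can be written down as a small decision function. This is a toy model of the requirement only, not the Route 53 API: the function name, the endpoint names and the threshold of three consecutive failures are all assumptions.

```python
def choose_target(primary_checks, failure_threshold, primary, secondary):
    """Toy failover decision: leave the primary only after sustained failures.

    primary_checks is a list of booleans, oldest first (True = check passed).
    """
    recent = primary_checks[-failure_threshold:]
    sustained_failure = len(recent) == failure_threshold and not any(recent)
    return secondary if sustained_failure else primary

PRIMARY = "alb-primary.example"      # hypothetical endpoint names
SECONDARY = "alb-secondary.example"

print(choose_target([True, False, True], 3, PRIMARY, SECONDARY))
# alb-primary.example
print(choose_target([True, False, False, False], 3, PRIMARY, SECONDARY))
# alb-secondary.example
```

A single failed check does not move traffic; only a run of consecutive failures does, which is what "for a sustained period" implies.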

Question 10 · medium · multiple choice

A team accidentally updates critical rows in an Amazon RDS for PostgreSQL database. Automated backups are enabled. They need to recover the data to the exact state as of 90 minutes ago.

They also cannot risk interrupting the current production database instance while investigators validate the restored data.

Which recovery strategy best meets these constraints?

Question 11 · easy · multiple choice

Based on the exhibit, the database must continue serving if the current Availability Zone fails. What should you change?

Exhibit

Amazon RDS for PostgreSQL
DB instance identifier: orders-db
Multi-AZ: false
Automated backups: enabled
Availability Zone: us-east-1b
Publicly accessible: no

Question 12 · hard · multiple choice

Based on the exhibit, the application tier is not replacing unhealthy instances even though the Auto Scaling group spans two Availability Zones. What change most directly improves automatic recovery when the application process fails?

Network Topology
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names orders-asg
{
  "AutoScalingGroups": [
    {
      "AutoScalingGroupName": "orders-asg",
      "DesiredCapacity": 4,
      "MinSize": 4,
      "MaxSize": 8,
      "AvailabilityZones": ["us-east-1a", "us-east-1b"],
      "HealthCheckType": "EC2",
      "HealthCheckGracePeriod": 300,
      "TargetGroupARNs": ["arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/orders-tg/abcd1234"]
    }
  ]
}

$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/orders-tg/abcd1234
TARGETS
i-01e2a3b4: healthy
i-02e3b4c5: healthy
i-03f4c5d6: unhealthy
i-04a5d6e7: unhealthy

Application health endpoint:
2026-04-27T13:05:22Z GET /health -> 500
EC2 status checks: passing

Question 13 · hard · multiple choice

Based on the exhibit, the team must restore an Amazon RDS for PostgreSQL database to the exact state just before a bad delete happened. What is the best recovery approach?

Exhibit

RDS backup status:
- Automated backups: Enabled
- Backup retention period: 14 days
- Latest automated snapshot: 2026-04-27 09:00 UTC
- Latest restorable time: 2026-04-27 15:14 UTC

Incident timeline:
- 2026-04-27 15:11 UTC: deployment script accidentally deleted critical rows
- 2026-04-27 15:12 UTC: application detected missing data
- Required restore point: 2026-04-27 15:10 UTC

Operations note:
The business wants to recover to a new database first, verify data, and then cut over the application.
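
Before choosing an answer, it is worth checking the exhibit's timestamps against each other. The sketch below uses only the times shown above; the variable names and the `utc` helper are illustrative.

```python
from datetime import datetime, timezone

def utc(text):
    """Parse an exhibit timestamp such as '2026-04-27 15:10' as UTC."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)

latest_snapshot = utc("2026-04-27 09:00")
latest_restorable = utc("2026-04-27 15:14")
bad_delete = utc("2026-04-27 15:11")
required_point = utc("2026-04-27 15:10")

before_incident = required_point < bad_delete
within_restorable_window = required_point <= latest_restorable
minutes_after_snapshot = (required_point - latest_snapshot).total_seconds() / 60

print(before_incident, within_restorable_window, minutes_after_snapshot)
# True True 370.0
```

The required restore point is earlier than the delete and no later than the latest restorable time, while the most recent automated snapshot is 370 minutes older than the point the team needs.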

Question 14 · medium · multiple choice

Based on the exhibit, the company wants DNS traffic to fail over automatically from the primary Region to a secondary Region when the primary endpoint is unhealthy. Which Route 53 change is best?

Exhibit

Route 53 record set
  Name: app.example.com
  Type: A (Alias)
  Routing policy: Simple
  Alias target: alb-primary-123.us-east-1.elb.amazonaws.com
  TTL: 60 seconds
Health check
  ID: hc-44
  Status: Inactive
Secondary environment
  ALB target exists in us-west-2: alb-secondary-456.us-west-2.elb.amazonaws.com
Operational note
  A Region outage should shift users to the secondary ALB without manual DNS changes.

Question 15 · hard · multiple choice

Based on the exhibit, downstream payment timeouts cause EventBridge deliveries to back up and some events are retried until they age out. What change best improves resilience and preserves events during downstream outages?

Exhibit

Amazon EventBridge rule:
- source: orders.checkout
- target: Lambda function process-orders
- retry policy: default

CloudWatch metrics:
- Invocations: 120/min
- Throttles: 87/min
- ApproximateAgeOfOldestEvent: 900 seconds

Lambda log excerpt:
2026-04-27T18:22:41Z payment API timeout
2026-04-27T18:22:44Z retry attempt 3 failed
2026-04-27T18:22:48Z processing orderId=90118 paused

Business requirement:
No events should be lost during a temporary payment API outage, and the system must absorb bursts instead of failing immediately.

Question 16 · medium · multiple choice

A SaaS platform plans to run in two AWS Regions for lower latency. The team wants active-active writes (both Regions accept updates) to avoid failover downtime. However, the business requires strong consistency for order status transitions (for example, only one transition from “Paid” to “Shipped” may be allowed).

Which architectural choice best meets the consistency requirement?

Question 17 · easy · multiple choice

Based on the exhibit, the web tier becomes unavailable if us-west-2a has an outage. What is the best change to improve resilience with the least redesign?

Exhibit

Auto Scaling group: web-asg
Attached subnets: subnet-1111 (us-west-2a)
Load balancer subnets: subnet-1111 (us-west-2a)
Desired capacity: 2
Health check type: ELB

Question 18 · hard · multiple choice

Based on the exhibit, the database is manually promoted during an Availability Zone failure and the application outage lasts longer than the target. What change best improves resilience with the least operational intervention?

Exhibit

Current topology:
app -> Amazon RDS for PostgreSQL primary db-a in us-east-1a
app -> Amazon RDS read replica db-b in us-east-1b

Incident report:
10:14 UTC - Primary AZ impaired
10:15 UTC - Application returns database connection errors
10:18 UTC - DBA manually promotes db-b
10:22 UTC - Application reconnects
Observed replication lag before failure: 40 seconds
Target:
- Automatic failover within 2 minutes
- No manual promotion during an AZ outage
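
The incident report can be turned into durations and compared with the target. A minimal sketch using only the times in the exhibit; the variable names are illustrative, and the comment on replication lag assumes the replica had not yet received the most recent writes.

```python
from datetime import datetime

def t(hhmm):
    """Parse an HH:MM timestamp from the incident report."""
    return datetime.strptime(hhmm, "%H:%M")

az_impaired = t("10:14")
errors_start = t("10:15")
promoted = t("10:18")
reconnected = t("10:22")

outage_min = (reconnected - errors_start).total_seconds() / 60       # user-visible errors
time_to_promote_min = (promoted - az_impaired).total_seconds() / 60  # waiting on a person
TARGET_FAILOVER_MIN = 2
REPLICATION_LAG_S = 40  # writes from this window may be missing on db-b

print(outage_min, time_to_promote_min, outage_min > TARGET_FAILOVER_MIN)
# 7.0 4.0 True
```

Four of the minutes were spent before anyone promoted the replica, and the observed lag means some recent writes may not have reached db-b at all.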

Question 19 · medium · multiple choice

An application writes to an Amazon Aurora DB cluster. After a planned Aurora failover, the application experiences several minutes of connection errors.

The logs show the application continues connecting to the specific DB instance endpoint that was the primary before the failover.

What change most directly improves resilience during Aurora failovers?

Question 20 · medium · multiple choice

A service processes customer payments from a message queue. Because the queue provides at-least-once delivery, the same payment message can be delivered more than once if the consumer times out before committing its state. Currently, the service sometimes charges the customer twice.

Which design change most directly prevents duplicate charges while still allowing safe retries?

Question 21 · medium · multiple choice

Your web application is deployed in two AWS Regions (Region A and Region B). You want Route 53 to automatically fail over DNS traffic from Region A to Region B when Region A is unhealthy.

The failover decision must be based on health checks that verify whether the application in Region A is reachable.

Which Route 53 routing configuration best meets these requirements?

Question 22 · medium · multiple choice

Based on the exhibit, the payment worker sometimes processes the same SQS Standard message more than once after a timeout. What change best prevents duplicate charges while keeping the queue architecture?

Exhibit

Amazon SQS queue
  QueueName: payments-standard
  QueueType: Standard
  VisibilityTimeout: 120 seconds
Worker logs
  14:01:10 Received messageId=msg-4412 orderId=4412
  14:03:09 Charged customer card successfully
  14:03:10 Lambda timed out before DeleteMessage completed
  14:03:35 Same message received again and charged a second time
Business requirement
  A payment must never be charged twice even if a message is delivered again.

Question 23 · hard · multiple choice

Based on the exhibit, duplicate payment charges occasionally occur when the worker times out after the charge is submitted but before the message is deleted. What change best prevents duplicate charges while keeping retry behavior?

Exhibit

Amazon SQS worker log:
2026-04-27T14:02:11Z Received messageId=82f3a9 paymentId=78341 receiveCount=1
2026-04-27T14:02:57Z Charged card successfully for paymentId=78341
2026-04-27T14:03:05Z Timeout occurred before DeleteMessage
2026-04-27T14:03:12Z Received messageId=82f3a9 paymentId=78341 receiveCount=2
2026-04-27T14:03:43Z Duplicate charge blocked manually

Queue configuration:
VisibilityTimeout=30 seconds
RedrivePolicy=not configured
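
The worker log and the queue configuration can be lined up with a little date arithmetic. The sketch below uses only values from the exhibit; the variable names and the `ts` helper are illustrative.

```python
from datetime import datetime, timedelta

def ts(text):
    """Parse a log timestamp such as '2026-04-27T14:02:11Z'."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")

received = ts("2026-04-27T14:02:11Z")
charged = ts("2026-04-27T14:02:57Z")
second_receive = ts("2026-04-27T14:03:12Z")
visibility_timeout = timedelta(seconds=30)

visible_again = received + visibility_timeout

print(visible_again.time())                  # 14:02:41
print((charged - received).total_seconds())  # 46.0
print(charged > visible_again)               # True
print(second_receive >= visible_again)       # True
```

The charge completed 46 seconds after the first receive, which is after the 30-second visibility timeout had already expired, so the message was eligible for redelivery before the worker had finished with it.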

Question 24 · medium · multiple choice

A production team accidentally deletes critical rows in an Amazon RDS for PostgreSQL database. The deletion occurred about 6 hours ago. The team wants to recover to a specific point in time with minimal disruption.

Assuming automated backups are enabled, which approach provides the best resilience outcome?

Question 25 · medium · multiple choice

A web application uses pooled JDBC connections to an Amazon Aurora cluster using the writer endpoint. During an Aurora planned failover, monitoring shows a short spike in failed requests. The Aurora cluster writer endpoint remains the same, but many existing pooled connections briefly fail. The application retries aggressively and overloads the new writer during the transition.

Which design change will most improve application resilience during Aurora failovers without requiring application redeployment?

Question 26 · medium · multiple choice

Based on the exhibit, an administrator accidentally deleted data from Amazon RDS for PostgreSQL about 90 minutes ago. Which recovery approach best restores the database to the exact required point in time?

Exhibit

Amazon RDS settings
  DBInstanceIdentifier: prod-orders
  Engine: postgres
  MultiAZ: false
  BackupRetentionPeriod: 7 days
  Latest automated backup: 2026-04-28T08:00:00Z
Incident log
  2026-04-28T10:42:17Z: DELETE FROM orders WHERE order_date < '2026-01-01';
  2026-04-28T10:43:05Z: Mistake discovered
Recovery objective
  Restore to 2026-04-28T10:35:00Z, the last safe point before the deletion.

Question 27 · hard · multiple choice

Based on the exhibit, the current disaster recovery design misses the RTO target even though the database replica is current. Which deployment model best meets the requirements with the least always-on cost?

Exhibit

Disaster recovery test results:
- Requirement: RTO <= 15 minutes, RPO <= 5 minutes
- Primary Region: full application stack running 24/7
- Secondary Region:
  - RDS cross-Region replica current within 2 minutes
  - AMIs copied to secondary Region
  - Auto Scaling group desired=0, min=0, max=6
  - No load balancer or application instances running until failover

Measured failover drill:
- Start application stack in secondary Region: 12 minutes
- Promote database replica: 4 minutes
- Update DNS and propagate: 2 minutes
- Total recovery time: 18 minutes
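
The drill results are a simple sum compared with the RTO. A minimal sketch using the exhibit's figures, assuming the steps run one after another (which the stated total implies); the variable names are illustrative.

```python
RTO_TARGET_MIN = 15

# Steps from the failover drill, assumed to run sequentially.
drill_steps_min = {
    "start application stack": 12,
    "promote database replica": 4,
    "update DNS and propagate": 2,
}

total_min = sum(drill_steps_min.values())
overrun_min = total_min - RTO_TARGET_MIN
slowest_step = max(drill_steps_min, key=drill_steps_min.get)

print(total_min, overrun_min, slowest_step)
# 18 3 start application stack
```

The drill misses the target by 3 minutes, and a single step accounts for two thirds of the total, so that is the step to examine when comparing deployment models.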

Question 28 · medium · multiple choice

A payments platform requires disaster recovery across Regions. Requirements: RPO of 15 minutes and RTO of about 1 hour. The business cannot afford full duplicate capacity in both Regions all the time, but the team wants automated readiness so failover is mostly operationally guided rather than a slow rebuild. Which DR strategy is the best fit?

Question 29 · easy · multiple choice

Based on the exhibit, a web application must stay available if one Availability Zone fails. What is the best change to improve resilience?

Exhibit

Auto Scaling group configuration:
- Desired capacity: 4
- VPC subnets: subnet-0a11 (us-east-1a) only
- Health check type: ELB

Application Load Balancer configuration:
- Enabled subnets: subnet-0a11 (us-east-1a), subnet-0b22 (us-east-1b)

Incident note:
- A planned test stopped all instances in us-east-1a and the application became unavailable.

Question 30 · medium · matching

Match the disaster recovery strategy to the recovery posture it best fits for a Regional outage.

Matches

Lowest cost option where the environment is rebuilt from backups and hours of downtime are acceptable.

Keep only the critical core running in the secondary Region, then scale out after failover.

Run a scaled-down but functional environment in another Region for faster cutover.

Serve production traffic from more than one Region at the same time for the fastest recovery.

Question 31 · medium · multiple choice

A global application experiences frequent writes and must survive a full Regional outage with near-zero data loss. The product team also requires that users can continue to write during the incident using the closest Region. Which approach is most aligned with these requirements?

All SAA-C03 Objectives

  • 1. Design Secure Architectures · 30%
  • 2. Design Resilient Architectures · 26%
  • 3. Design High-Performing Architectures · 24%
  • 4. Design Cost-Optimized Architectures · 20%