Courseiva
Knowledge + Practice

Free IT certification practice questions with explained answers for CCNA, CompTIA, AWS, Azure, Google Cloud, and more.

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

SAA-C03 Design Resilient Architectures • Complete Question Bank

SAA-C03 Design Resilient Architectures — All Questions With Answers

Complete SAA-C03 Design Resilient Architectures question bank: all 264 questions with answers and detailed explanations.

264 questions · Free · No signup
Question 1 (medium, multiple choice)

An order-processing service consumes messages from an Amazon SQS Standard queue using a custom worker. During traffic spikes, the worker occasionally times out after performing some work but before acknowledging the message, so SQS redelivers it and it may be processed again.

You also observe that a small set of “poison” messages always fail validation.

What change most directly improves resilience by (1) preventing poison messages from retrying indefinitely and (2) avoiding duplicate side effects caused by legitimate retries?
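
The redelivery and poison-message behavior in this scenario can be simulated without AWS. The following is a minimal sketch, assuming an in-memory deque stands in for the queue. The names `process_queue`, `MAX_RECEIVES`, and `times_out_once` are hypothetical and are not part of the SQS API. It shows a consumer that records processed IDs so a redelivery does not repeat the side effect, and that sets a message aside after a bounded number of receives.

```python
from collections import deque

MAX_RECEIVES = 3  # hypothetical cap on delivery attempts per message


def process_queue(messages, is_valid, times_out_once):
    """Simulate at-least-once delivery with an idempotent consumer.

    messages: iterable of message IDs
    is_valid: function(id) -> bool; False models a poison message
    times_out_once: IDs whose first attempt does the work but times out
        before the acknowledgement, so the message is delivered again
    """
    queue = deque((m, 0) for m in messages)
    processed = set()        # idempotency record (hypothetical store)
    side_effects = []        # each entry models one real-world effect
    set_aside = []           # messages that exceeded MAX_RECEIVES
    pending_timeouts = set(times_out_once)

    while queue:
        msg, receives = queue.popleft()
        receives += 1
        if not is_valid(msg):
            if receives >= MAX_RECEIVES:
                set_aside.append(msg)          # stop retrying
            else:
                queue.append((msg, receives))  # retry later
            continue
        if msg not in processed:
            side_effects.append(msg)           # perform the work once
            processed.add(msg)
        if msg in pending_timeouts:
            pending_timeouts.discard(msg)
            queue.append((msg, receives))      # no ack, so redelivered
    return side_effects, set_aside


effects, aside = process_queue(
    ["a", "b", "bad"], lambda m: m != "bad", times_out_once={"b"}
)
print(effects, aside)
```

In this model, message "b" is delivered twice but produces one side effect, and "bad" leaves the main flow after three receives.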

Question 2 (medium, multiple choice)

Based on the exhibit, the application sees several minutes of connection errors during an Aurora failover. What is the best change to reduce failover impact?

Exhibit

Application configuration
  JDBC URL: jdbc:postgresql://mydb-instance-1.abcdefghijkl.us-east-1.rds.amazonaws.com:5432/app
Aurora event log
  11:15:02 Failover initiated
  11:15:04 Writer moved to a different instance
  11:18:20 Application still reporting connection refused errors
Notes from the team
  The application uses a connection pool and does not re-resolve the endpoint quickly.
Question 3 (medium, multiple choice)

A payments service receives payment orders by consuming messages from an Amazon SQS Standard queue. The downstream processor occasionally exceeds its processing timeout. As a result, some messages reappear in the queue and may be processed more than once.

The team wants to prevent duplicate side effects (for example, double-charging) and also ensure poison messages do not repeatedly consume processing capacity.

What approach best satisfies both goals?

Question 4 (medium, multiple choice)

A company runs an application behind an Application Load Balancer (ALB). An Auto Scaling group (ASG) is configured with desired capacity 2, but it is attached only to subnets in a single Availability Zone. The ALB is healthy because it is configured across multiple Availability Zones.

When the Availability Zone that contains the ASG subnets experiences an outage, what change most directly improves resilience and allows capacity to be restored automatically?

Question 5 (hard, multiple choice)

Based on the exhibit, DNS still sends traffic to the primary Region even though Route 53 health checks show the primary endpoint is unhealthy. What is the best change to make failover work as intended?

Exhibit

Route 53 record sets for app.example.com:
- Record 1: Type A, RoutingPolicy=Simple, AliasTarget=alb-use1.amazonaws.com
- Record 2: Type A, RoutingPolicy=Simple, AliasTarget=alb-usw2.amazonaws.com

Health check status:
hc-primary: FAILED
hc-secondary: HEALTHY

Resolver test:
$ dig +short app.example.com
alb-use1.amazonaws.com

Ops note:
The intent is to send all traffic to us-east-1 normally and fail over to us-west-2 only when the primary is unhealthy.
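
The intent in the Ops note can be written as a small decision function. This is only a sketch of the desired behavior, not of how Route 53 is configured. The function name `intended_target` is hypothetical, and the both-unhealthy branch is an assumption made for illustration.

```python
def intended_target(primary_healthy: bool, secondary_healthy: bool) -> str:
    """Return the Region the Ops note wants DNS to answer with."""
    if primary_healthy:
        return "us-east-1"   # normal operation: all traffic to primary
    if secondary_healthy:
        return "us-west-2"   # primary unhealthy: fail over
    # Assumption: with nothing healthy, fall back to the primary.
    return "us-east-1"


# Health check status from the exhibit: hc-primary FAILED, hc-secondary HEALTHY.
print(intended_target(primary_healthy=False, secondary_healthy=True))
```

With the exhibit's health check status the intended answer is us-west-2, while the resolver test still returns the us-east-1 load balancer.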
Question 6 (medium, multiple choice)

Based on the exhibit, the web application must remain available even if one Availability Zone fails. What is the best change to improve resilience with the least redesign?

Exhibit

Application Load Balancer
  Subnets: subnet-a1 (us-east-1a), subnet-b1 (us-east-1b)
Auto Scaling group
  VPCZoneIdentifier: subnet-a1 (us-east-1a)
  DesiredCapacity: 2
  MinSize: 2
  MaxSize: 4
CloudWatch
  HealthyHostCount: 2
  HTTPCode_Target_5XX_Count: 0
Troubleshooting note
  A planned test that disabled us-east-1a caused the application to become unreachable.
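
The troubleshooting note follows from simple counting. Below is a minimal sketch, assuming instances are tallied per Availability Zone. The function name and the second, two-AZ layout are hypothetical and appear only for comparison.

```python
def healthy_after_az_failure(instances_by_az: dict, failed_az: str) -> int:
    """Count the instances that remain when one AZ is unavailable."""
    return sum(n for az, n in instances_by_az.items() if az != failed_az)


# Layout from the exhibit: both instances launch in subnet-a1 (us-east-1a).
exhibit_layout = {"us-east-1a": 2}
print(healthy_after_az_failure(exhibit_layout, "us-east-1a"))

# Hypothetical layout for comparison: the same capacity spread over two AZs.
spread_layout = {"us-east-1a": 1, "us-east-1b": 1}
print(healthy_after_az_failure(spread_layout, "us-east-1a"))
```

The exhibit layout leaves zero instances behind a load balancer that is itself still reachable in us-east-1b.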
Question 7 (medium, multiple choice)

An Auto Scaling group behind an Application Load Balancer frequently replaces new EC2 instances. The application needs ~6 minutes to warm up after instance launch. However, the ALB target group health checks start immediately and mark the targets unhealthy until the application is ready. Because the targets become unhealthy early, the Auto Scaling group then terminates the instances and launches replacements, creating a repeated unhealthy/termination loop.

What configuration change will most directly improve recovery by preventing premature ASG termination while the application is warming up?
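
The termination loop is a timing problem, which a few lines can make concrete. This is a simplified sketch, assuming failed health checks are ignored for some number of seconds after launch and acted on afterwards. The function name and the candidate values are hypothetical.

```python
def terminated_during_warmup(warmup_seconds: int, grace_seconds: int) -> bool:
    """True if health-based replacement can begin before the app is ready.

    Simplified model: health check failures are ignored for grace_seconds
    after launch and acted on afterwards.
    """
    return grace_seconds < warmup_seconds


WARMUP = 6 * 60  # the ~6 minutes of warm-up from the scenario

for grace in (0, 300, 420):  # hypothetical settings, in seconds
    print(grace, terminated_during_warmup(WARMUP, grace))
```

Any window shorter than the warm-up time reproduces the loop in this model; a window at least as long as the warm-up does not.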

Question 8 (medium, multiple choice)

A company runs an internet-facing API in two AWS Regions. Route 53 currently uses simple routing to a primary Application Load Balancer (ALB) DNS name. When the primary Region experiences an outage, customers wait a long time because the DNS entry is not changed automatically.

The team wants automatic failover: if the primary Region ALB health check fails for a sustained period, Route 53 should route users to the secondary Region ALB.

Which Route 53 approach best meets this requirement?

Question 9 (medium, multiple choice)

A team accidentally updates critical rows in an Amazon RDS for PostgreSQL database. Automated backups are enabled. They need to recover the data to the exact state as of 90 minutes ago.

They also cannot risk interrupting the current production database instance while investigators validate the restored data.

Which recovery strategy best meets these constraints?

Question 10 (easy, multiple choice)

Based on the exhibit, the database must continue serving if the current Availability Zone fails. What should you change?

Exhibit

Amazon RDS for PostgreSQL
DB instance identifier: orders-db
Multi-AZ: false
Automated backups: enabled
Availability Zone: us-east-1b
Publicly accessible: no
Question 11 (hard, multiple choice)

Based on the exhibit, the application tier is not replacing unhealthy instances even though the Auto Scaling group spans two Availability Zones. What change most directly improves automatic recovery when the application process fails?

Network Topology
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names orders-asg
{
  "AutoScalingGroups": [
    {
      "AutoScalingGroupName": "orders-asg",
      "DesiredCapacity": 4,
      "MinSize": 4,
      "MaxSize": 8,
      "AvailabilityZones": ["us-east-1a", "us-east-1b"],
      "HealthCheckType": "EC2",
      "HealthCheckGracePeriod": 300,
      "TargetGroupARNs": ["arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/orders-tg/abcd1234"]
    }
  ]
}

$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/orders-tg/abcd1234
TARGETS
i-01e2a3b4: healthy
i-02e3b4c5: healthy
i-03f4c5d6: unhealthy
i-04a5d6e7: unhealthy

Application health endpoint:
2026-04-27T13:05:22Z GET /health -> 500

EC2 status checks: passing
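
The mismatch in this exhibit (targets reported unhealthy, yet nothing replaced) can be reasoned about with a simplified model of how the group's health check type decides which signals count. The function below is a hypothetical sketch of that decision, not an AWS API.

```python
def asg_considers_unhealthy(health_check_type: str,
                            ec2_status_ok: bool,
                            target_healthy: bool) -> bool:
    """Simplified model of the group's replacement decision.

    "EC2": only instance status checks are considered.
    "ELB": load balancer target health is considered as well.
    """
    if not ec2_status_ok:
        return True
    if health_check_type == "ELB":
        return not target_healthy
    return False


# Instance i-03f4c5d6 from the exhibit: status checks pass, target unhealthy.
print(asg_considers_unhealthy("EC2", ec2_status_ok=True, target_healthy=False))
print(asg_considers_unhealthy("ELB", ec2_status_ok=True, target_healthy=False))
```

In this model, the exhibit's setting treats an instance whose /health endpoint returns 500 as healthy so long as its EC2 status checks pass.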
Question 12 (hard, multiple choice)

Based on the exhibit, the team must restore an Amazon RDS for PostgreSQL database to the exact state just before a bad delete happened. What is the best recovery approach?

Exhibit

RDS backup status:
- Automated backups: Enabled
- Backup retention period: 14 days
- Latest automated snapshot: 2026-04-27 09:00 UTC
- Latest restorable time: 2026-04-27 15:14 UTC

Incident timeline:
- 2026-04-27 15:11 UTC: deployment script accidentally deleted critical rows
- 2026-04-27 15:12 UTC: application detected missing data
- Required restore point: 2026-04-27 15:10 UTC

Operations note:
The business wants to recover to a new database first, verify data, and then cut over the application.
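
The timestamps in this exhibit can be checked mechanically. The sketch below assumes the latest restorable time also serves as "now" when computing the retention window. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(text: str) -> datetime:
    """Parse timestamps written like '2026-04-27 15:10 UTC'."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M UTC")
    return parsed.replace(tzinfo=timezone.utc)


def restore_point_is_usable(restore_point, latest_restorable, incident,
                            retention_days):
    """True if the point lies inside the window and before the incident."""
    earliest = latest_restorable - timedelta(days=retention_days)
    in_window = earliest <= restore_point <= latest_restorable
    return in_window and restore_point < incident


snapshot = parse_utc("2026-04-27 09:00 UTC")
latest_restorable = parse_utc("2026-04-27 15:14 UTC")
incident = parse_utc("2026-04-27 15:11 UTC")
required = parse_utc("2026-04-27 15:10 UTC")

print(restore_point_is_usable(required, latest_restorable, incident, 14))
# Changes that the 09:00 snapshot alone would not contain:
print(required - snapshot)
```

The required restore point sits inside the restorable window and one minute before the bad delete, while the latest automated snapshot is 6 hours 10 minutes older than that point.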
Question 13 (medium, multiple choice)

Based on the exhibit, the company wants DNS traffic to fail over automatically from the primary Region to a secondary Region when the primary endpoint is unhealthy. Which Route 53 change is best?

Exhibit

Route 53 record set
  Name: app.example.com
  Type: A (Alias)
  Routing policy: Simple
  Alias target: alb-primary-123.us-east-1.elb.amazonaws.com
  TTL: 60 seconds
Health check
  ID: hc-44
  Status: Inactive
Secondary environment
  ALB target exists in us-west-2: alb-secondary-456.us-west-2.elb.amazonaws.com
Operational note
  A Region outage should shift users to the secondary ALB without manual DNS changes.
Question 14 (hard, multiple choice)

Based on the exhibit, downstream payment timeouts cause EventBridge deliveries to back up and some events are retried until they age out. What change best improves resilience and preserves events during downstream outages?

Exhibit

Amazon EventBridge rule:
- source: orders.checkout
- target: Lambda function process-orders
- retry policy: default

CloudWatch metrics:
- Invocations: 120/min
- Throttles: 87/min
- ApproximateAgeOfOldestEvent: 900 seconds

Lambda log excerpt:
2026-04-27T18:22:41Z payment API timeout
2026-04-27T18:22:44Z retry attempt 3 failed
2026-04-27T18:22:48Z processing orderId=90118 paused

Business requirement:
No events should be lost during a temporary payment API outage, and the system must absorb bursts instead of failing immediately.
Question 15 (medium, multiple choice)

A SaaS platform plans to run in two AWS Regions for lower latency. The team wants to enable active-active writes (both regions accept updates) to avoid failover downtime. However, the business requires strong consistency for order status transitions (for example, only one transition from “Paid” to “Shipped” must be allowed).

Which architectural choice best meets the consistency requirement?
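
The "only one transition" rule amounts to a conditional update against a single authoritative copy of the order. The following is a minimal sketch, assuming an in-memory dictionary stands in for that copy. The class `OrderStore` and its methods are hypothetical.

```python
class OrderStore:
    """A single authoritative copy of order status (in-memory stand-in)."""

    def __init__(self):
        self._status = {}

    def put(self, order_id: str, status: str) -> None:
        self._status[order_id] = status

    def transition(self, order_id: str, expected: str, new: str) -> bool:
        """Apply the change only if the current status matches `expected`."""
        if self._status.get(order_id) != expected:
            return False          # someone else already moved the order
        self._status[order_id] = new
        return True


store = OrderStore()
store.put("order-1", "Paid")

# Two writers attempt the same transition; only the first can succeed.
first = store.transition("order-1", "Paid", "Shipped")
second = store.transition("order-1", "Paid", "Shipped")
print(first, second)
```

The check works here because both writers see the same copy. If each Region accepted the write against its own copy, both conditional checks could pass before the copies converged.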

Question 16 (easy, multiple choice)

Based on the exhibit, the web tier becomes unavailable if us-west-2a has an outage. What is the best change to improve resilience with the least redesign?

Exhibit

Auto Scaling group: web-asg
Attached subnets: subnet-1111 (us-west-2a)
Load balancer subnets: subnet-1111 (us-west-2a)
Desired capacity: 2
Health check type: ELB
Question 17 (hard, multiple choice)

Based on the exhibit, the database is manually promoted during an Availability Zone failure and the application outage lasts longer than the target. What change best improves resilience with the least operational intervention?

Exhibit

Current topology:
app -> Amazon RDS for PostgreSQL primary db-a in us-east-1a
app -> Amazon RDS read replica db-b in us-east-1b

Incident report:
10:14 UTC - Primary AZ impaired
10:15 UTC - Application returns database connection errors
10:18 UTC - DBA manually promotes db-b
10:22 UTC - Application reconnects
Observed replication lag before failure: 40 seconds
Target:
- Automatic failover within 2 minutes
- No manual promotion during an AZ outage
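
The gap between the incident report and the target can be computed directly from the exhibit's timestamps. This is a small sketch. The helper `at` and the dictionary keys are hypothetical names.

```python
from datetime import datetime, timedelta


def at(hhmm: str) -> datetime:
    """Parse an 'HH:MM' time from the incident report."""
    return datetime.strptime(hhmm, "%H:%M")


timeline = {
    "impaired": at("10:14"),
    "errors": at("10:15"),
    "promoted": at("10:18"),
    "reconnected": at("10:22"),
}
TARGET = timedelta(minutes=2)            # automatic failover target
REPLICATION_LAG = timedelta(seconds=40)  # observed before the failure

time_to_promotion = timeline["promoted"] - timeline["impaired"]
time_to_reconnect = timeline["reconnected"] - timeline["impaired"]

print(time_to_promotion, time_to_reconnect, time_to_reconnect > TARGET)
```

Promotion took 4 minutes and the application reconnected after 8 minutes, against a 2 minute target. The 40 second replication lag is also a window of writes that the replica may not have received before the primary was impaired.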
Question 18 (medium, multiple choice)

An application writes to an Amazon Aurora DB cluster. After a planned Aurora failover, the application experiences several minutes of connection errors.

The logs show the application continues connecting to the specific DB instance endpoint that was the primary before the failover.

What change most directly improves resilience during Aurora failovers?

Question 19 (medium, multiple choice)

A service processes customer payments from a message queue. Because the queue provides at-least-once delivery, the same payment message can be delivered more than once if the consumer times out before committing its state. Currently, the service sometimes charges the customer twice.

Which design change most directly prevents duplicate charges while still allowing safe retries?

Question 20 (medium, multiple choice)

Your web application is deployed in two AWS Regions (Region A and Region B). You want Route 53 to automatically fail over DNS traffic from Region A to Region B when Region A is unhealthy.

The failover decision must be based on health checks that verify whether the application in Region A is reachable.

Which Route 53 routing configuration best meets these requirements?

Question 21 (medium, multiple choice)

Based on the exhibit, the payment worker sometimes processes the same SQS Standard message more than once after a timeout. What change best prevents duplicate charges while keeping the queue architecture?

Exhibit

Amazon SQS queue
  QueueName: payments-standard
  QueueType: Standard
  VisibilityTimeout: 120 seconds
Worker logs
  14:01:10 Received messageId=msg-4412 orderId=4412
  14:03:09 Charged customer card successfully
  14:03:10 Lambda timed out before DeleteMessage completed
  14:03:35 Same message received again and charged a second time
Business requirement
  A payment must never be charged twice even if a message is delivered again.
Question 22 (hard, multiple choice)

Based on the exhibit, duplicate payment charges occasionally occur when the worker times out after the charge is submitted but before the message is deleted. What change best prevents duplicate charges while keeping retry behavior?

Exhibit

Amazon SQS worker log:
2026-04-27T14:02:11Z Received messageId=82f3a9 paymentId=78341 receiveCount=1
2026-04-27T14:02:57Z Charged card successfully for paymentId=78341
2026-04-27T14:03:05Z Timeout occurred before DeleteMessage
2026-04-27T14:03:12Z Received messageId=82f3a9 paymentId=78341 receiveCount=2
2026-04-27T14:03:43Z Duplicate charge blocked manually

Queue configuration:
VisibilityTimeout=30 seconds
RedrivePolicy=not configured
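
The log timestamps and the visibility timeout can be compared to see when the message became eligible for redelivery. This is a minimal sketch, assuming the timeout starts counting when the message is received. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, LOG_FORMAT)


def visible_again_at(received: str, visibility_timeout_s: int) -> datetime:
    """Time at which an undeleted message can be received again."""
    return parse(received) + timedelta(seconds=visibility_timeout_s)


received = "2026-04-27T14:02:11Z"
charged = "2026-04-27T14:02:57Z"

reappears = visible_again_at(received, 30)
processing_seconds = (parse(charged) - parse(received)).total_seconds()

print(reappears.time(), processing_seconds, parse(charged) > reappears)
```

The charge completed 46 seconds after the receive, while the 30 second timeout had already made the message eligible for redelivery at 14:02:41, before the worker reached DeleteMessage.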
Question 23 (medium, multiple choice)

A production team accidentally deletes critical rows in an Amazon RDS for PostgreSQL database. The deletion occurred about 6 hours ago. The team wants to recover to a specific point in time with minimal disruption.

Assuming automated backups are enabled, which approach provides the best resilience outcome?

Question 24 (medium, multiple choice)

A web application uses pooled JDBC connections to an Amazon Aurora cluster using the writer endpoint. During an Aurora planned failover, monitoring shows a short spike in failed requests. The Aurora cluster writer endpoint remains the same, but many existing pooled connections briefly fail. The application retries aggressively and overloads the new writer during the transition.

Which design change will most improve application resilience during Aurora failovers without requiring application redeployment?

Question 25 (medium, multiple choice)

Based on the exhibit, an administrator accidentally deleted data from Amazon RDS for PostgreSQL about 90 minutes ago. Which recovery approach best restores the database to the exact required point in time?

Exhibit

Amazon RDS settings
  DBInstanceIdentifier: prod-orders
  Engine: postgres
  MultiAZ: false
  BackupRetentionPeriod: 7 days
  Latest automated backup: 2026-04-28T08:00:00Z
Incident log
  2026-04-28T10:42:17Z: DELETE FROM orders WHERE order_date < '2026-01-01';
  2026-04-28T10:43:05Z: Mistake discovered
Recovery objective
  Restore to 2026-04-28T10:35:00Z, the last safe point before the deletion.
Question 26 (hard, multiple choice)

Based on the exhibit, the current disaster recovery design misses the RTO target even though the database replica is current. Which deployment model best meets the requirements with the least always-on cost?

Exhibit

Disaster recovery test results:
- Requirement: RTO <= 15 minutes, RPO <= 5 minutes
- Primary Region: full application stack running 24/7
- Secondary Region:
  - RDS cross-Region replica current within 2 minutes
  - AMIs copied to secondary Region
  - Auto Scaling group desired=0, min=0, max=6
  - No load balancer or application instances running until failover

Measured failover drill:
- Start application stack in secondary Region: 12 minutes
- Promote database replica: 4 minutes
- Update DNS and propagate: 2 minutes
- Total recovery time: 18 minutes
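
The drill figures add up as follows. The sketch assumes the three steps ran one after another, which matches the reported total of 18 minutes. The variable names are hypothetical.

```python
RTO_MINUTES = 15

# Measured durations from the failover drill, in minutes.
steps = {
    "start application stack": 12,
    "promote database replica": 4,
    "update DNS and propagate": 2,
}

total = sum(steps.values())
overrun = total - RTO_MINUTES
longest_step = max(steps, key=steps.get)

print(total, overrun, longest_step)
```

The total is 18 minutes, 3 minutes over the RTO, and starting the application stack accounts for 12 of those minutes.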
Question 27 (medium, multiple choice)

A payments platform requires disaster recovery across Regions. Requirements: RPO of 15 minutes and RTO of about 1 hour. The business cannot afford full duplicate capacity in both Regions all the time, but the team wants automated readiness so failover is mostly operationally guided rather than a slow rebuild. Which DR strategy is the best fit?

Question 28 (easy, multiple choice)

Based on the exhibit, a web application must stay available if one Availability Zone fails. What is the best change to improve resilience?

Exhibit

Auto Scaling group configuration:
- Desired capacity: 4
- VPC subnets: subnet-0a11 (us-east-1a) only
- Health check type: ELB

Application Load Balancer configuration:
- Enabled subnets: subnet-0a11 (us-east-1a), subnet-0b22 (us-east-1b)

Incident note:
- A planned test stopped all instances in us-east-1a and the application became unavailable.
Question 29 (medium, matching)

Match the disaster recovery strategy to the recovery posture it best fits for a Regional outage.

Matches

Lowest cost option where the environment is rebuilt from backups and hours of downtime are acceptable.

Keep only the critical core running in the secondary Region, then scale out after failover.

Run a scaled-down but functional environment in another Region for faster cutover.

Serve production traffic from more than one Region at the same time for the fastest recovery.

Question 30 (medium, multiple choice)

A global application experiences frequent writes and must survive a full Regional outage with near-zero data loss. The product team also requires that users can continue to write during the incident using the closest Region. Which approach is most aligned with these requirements?

Question 31 (medium, matching)

A team wants a web application to keep serving traffic if one Availability Zone fails. Match each architecture element to the resilience behavior it provides.

Matches

Stop sending requests to unhealthy targets and keep only healthy instances in rotation.

Launch replacement instances in healthy AZs when capacity is lost.

Maintain a synchronous standby in another AZ and fail over automatically.

Allow instances to be replaced without losing user sessions that are stored elsewhere.

Question 32 (easy, multiple choice)

Based on the exhibit, the database must fail over automatically if the primary Availability Zone goes down. Which solution should the architect choose?

Exhibit

Amazon RDS configuration:
- Engine: MySQL
- Deployment: Single-AZ
- Backup retention: 7 days
- Application connection string: db-prod.cluster-abcdefghijkl.us-east-1.rds.amazonaws.com

Operations note:
- During maintenance, the database endpoint became reachable again only after a manual restore from a snapshot.
Question 33 (medium, multiple choice)

A company uses Amazon RDS for a PostgreSQL database powering a customer-facing application. The application’s availability depends on fast database failover with minimal manual intervention. The RDS instance currently runs as a single-AZ deployment in one DB subnet group. Which change most directly meets the goal?

Question 34 (medium, multiple choice)

A stateless web API runs on EC2 instances behind an Application Load Balancer (ALB). The Auto Scaling group (ASG) currently uses subnets from only one Availability Zone, even though the ALB spans two Availability Zones. During maintenance of that single AZ, the ALB remains up but clients see timeouts because there are no healthy targets. Which change most directly improves resilience against an AZ failure?

Question 35 (medium, multiple choice)

A caching layer uses Amazon ElastiCache for Redis in front of a stateless web service. The service must continue to read cached responses during maintenance events and should automatically fail over to another node if one AZ becomes impaired. Which design change best satisfies this requirement?

Question 36 (medium, multiple choice)

A company runs a stateful analytics workload on EC2 instances that use EBS volumes. The data must be restorable in another Region after a major outage, with frequent point-in-time recovery. Which approach provides the most suitable replication mechanism for the EBS-backed data?

Question 37 (medium, multiple choice)

An order processing workflow uses Amazon SQS as the decoupling layer between a producer and a consumer Lambda function. The consumer intermittently fails due to a downstream dependency. The team has observed that certain “poison” messages keep being retried repeatedly and prevent other messages from being processed efficiently. Which SQS configuration most directly addresses this issue?

Question 38 (medium, multiple choice)

A media company stores original uploads in an S3 bucket. They must recover from accidental overwrites/deletes and also recover quickly from a full Region outage. The required RPO is about 1 hour. Which configuration best meets these requirements?

Question 39 (medium, multiple choice)

An ECS service runs on EC2 instances and is fronted by an ALB. The ALB spans two Availability Zones, and the ECS service desired count is 2 tasks. The underlying EC2 capacity comes from an Auto Scaling group (ASG) with a minimum size of 1, and in practice the ASG uses only one subnet. What is the most effective change to ensure the service keeps running when the instances in a single AZ are lost?

Question 40 (medium, multiple choice)

Your order-processing system uses EventBridge rules to send events to a Lambda function that updates order status. Over the last week, some events have failed with a transient database timeout. Lambda retries them intermittently, but the events are eventually lost and no alert is raised. You want at-least-once processing, bounded retries, and a way to inspect unprocessable events for later reprocessing.

Which architecture change best meets these requirements?

Question 41 (medium, multi-select)

A retail API runs on Amazon EC2 instances behind an Application Load Balancer and stores orders in an Amazon RDS for PostgreSQL database. A test that stopped one Availability Zone caused the API to return errors because all application servers were in the same AZ and the database was single-AZ. Which two changes should the architect make to continue serving traffic during a single-AZ failure? Select two.

Question 42 (easy, multiple choice)

An engineering team deploys a stateless web API on EC2 using an Auto Scaling group and an Application Load Balancer (ALB). During a recent test, they noticed that when one Availability Zone was unavailable, traffic failed until new instances were manually launched. Which change most directly improves automatic failover for the compute layer within a single Region?

Question 43 (medium, multi-select)

A customer portal must recover from a regional outage within a few hours. The business wants lower ongoing cost than a fully active second Region and does not want to rebuild everything from scratch during the outage. Which two DR patterns best fit that goal? Select two.

Question 44 (medium, multiple choice)

Your media processing pipeline writes original uploads to an S3 bucket and later generates derivative files. An operator accidentally deletes a subset of original uploads in production. You need to (1) restore the deleted objects with minimal data loss and (2) protect against both regional disasters and future operator mistakes. The company requires recovery even if objects are deleted and later overwritten.

What is the most effective change to meet these requirements?
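
The "deleted and later overwritten" requirement is easier to reason about with a toy model of an object store that keeps every version. This is a simplified sketch and not the S3 API. The class `VersionedBucket`, its methods, and the key are hypothetical.

```python
DELETE_MARKER = object()  # stands in for a delete marker


class VersionedBucket:
    """Toy object store that never discards earlier versions."""

    def __init__(self):
        self._versions = {}

    def put(self, key, data):
        self._versions.setdefault(key, []).append(data)

    def delete(self, key):
        # A delete hides the object but keeps its history.
        self._versions.setdefault(key, []).append(DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is DELETE_MARKER:
            return None
        return versions[-1]

    def history(self, key):
        return [v for v in self._versions.get(key, []) if v is not DELETE_MARKER]


bucket = VersionedBucket()
bucket.put("uploads/clip.mov", "original")
bucket.delete("uploads/clip.mov")                  # operator mistake
bucket.put("uploads/clip.mov", "re-uploaded")      # later overwrite

print(bucket.get("uploads/clip.mov"), bucket.history("uploads/clip.mov"))
```

After the delete and the overwrite, the current object is the re-upload, and the original is still present in the history. A store that kept only the latest object would have lost it.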

Question 45 (easy, multiple choice)

A company runs its customer-facing web app on EC2 behind an Application Load Balancer. The database is Amazon RDS for PostgreSQL. The requirement is that if a single Availability Zone fails, the database must automatically fail over within the same AWS Region with minimal application changes. Which database setup best meets this requirement?

Question 46 (medium, multi-select)

A media company stores daily financial exports in Amazon S3. The files must be protected against accidental overwrite or deletion, and the business also wants a second copy in another Region for recovery after a regional outage. Which two actions should the architect take? Select two.

Question 47 (medium, multi-select)

A serverless order-ingestion API writes directly to a database. During traffic spikes, the database occasionally throttles, Lambda retries create duplicate order records, and some requests time out. Which two changes best improve buffering and safe retry behavior? Select two.

Question 48 (easy, multiple choice)

An order system receives events and uses a Lambda function to write each order into a database. During traffic spikes, the database sometimes throttles, and Lambda retries lead to occasional message loss in the event flow. The team wants buffering, automatic retries, and a way to isolate messages that repeatedly fail so they can be inspected later. What design change best meets this need?

Question 49 (medium, multiple choice)

A SaaS platform serves an API using two regional deployments: us-east-1 (primary) and us-west-2 (secondary). Each region has its own ALB. The business requires automated DNS-based failover when the primary region becomes unhealthy, and they do not want manual DNS changes during incidents.

Which Route 53 configuration is the best match?

Question 50 (medium, multiple choice)

A fintech startup uses AWS to run a web API and a PostgreSQL database. They must meet an RPO of 15 minutes and an RTO of 2 hours for a Region-wide disaster. Budget allows running a small, always-on set of infrastructure in a secondary Region, but not full production capacity. The team wants a DR approach that is regularly testable without large manual effort.

Which disaster recovery strategy is the best fit?

Question 51 (medium, multi-select)

A SaaS application is deployed in us-east-1 and us-west-2 behind separate ALBs. The business wants DNS to send new clients to the primary Region when it is healthy and automatically fail over to the secondary Region when the primary endpoint is unhealthy. Which two Route 53 settings are required? Select two.

Question 52 (medium, multiple choice)

Your ecommerce app runs behind an Application Load Balancer (ALB) and uses an RDS database for orders. During an AZ impairment in us-east-1, customers report that checkout takes several minutes to recover. The current design places EC2 instances only in private subnets of AZ-a, while the ALB spans multiple subnets. The RDS DB instance is Multi-AZ. Management wants automatic recovery within the same Region.

Which change best addresses the issue with minimal operational overhead?

Question 53 (easy, multiple choice)

A team uses an S3 bucket to store important customer-generated exports. They need protection against accidental overwrites and also want copies of the data in another AWS Region for disaster recovery. Which S3 configuration best satisfies both requirements?

Question 54 (medium, multiple choice)

A company runs a customer portal on an Amazon Aurora PostgreSQL cluster. The application currently connects directly to the writer instance endpoint and keeps long-lived connections open. During a maintenance failover, writes fail until clients are restarted. The team wants the application to reconnect to the correct Aurora endpoint automatically and reduce user-visible write interruptions.

Which change is most likely to achieve this?

Question 55 (easy, multiple choice)

A company runs the same public API in two regions (Region A and Region B), each fronted by an ALB. They want Route 53 to automatically route clients to the Region B API when Region A becomes unhealthy, with minimal configuration effort. Which Route 53 approach should they use?

Question 56 (easy, multiple choice)

A retail platform needs disaster recovery across AWS Regions. The business requirement is: RTO up to 6 hours, RPO up to 1 hour, and they want the ability to start serving quickly during a Region outage but do not want to run full production capacity continuously. Which DR strategy best fits these requirements?

Question 57 • easy • multiple choice

A team uses an S3 bucket to store important customer-generated exports. They need protection against accidental overwrites and also want copies of the data in another AWS Region for disaster recovery. Which S3 configuration best satisfies both requirements?

Question 58 • medium • multiple choice

A fintech startup uses AWS to run a web API and a PostgreSQL database. They must meet an RPO of 15 minutes and an RTO of 2 hours for a Region-wide disaster. Budget allows running a small, always-on set of infrastructure in a secondary Region, but not full production capacity. The team wants a DR approach that is regularly testable without large manual effort.

Which disaster recovery strategy is the best fit?

Question 59 • medium • multi-select

A media company stores daily financial exports in Amazon S3. The files must be protected against accidental overwrite or deletion, and the business also wants a second copy in another Region for recovery after a regional outage. Which two actions should the architect take? Select two.

Question 60 • medium • multiple choice

A SaaS platform serves an API using two regional deployments: us-east-1 (primary) and us-west-2 (secondary). Each region has its own ALB. The business requires automated DNS-based failover when the primary region becomes unhealthy, and they do not want manual DNS changes during incidents.

Which Route 53 configuration is the best match?

Question 61 • medium • multiple choice

A company runs a customer portal on an Amazon Aurora PostgreSQL cluster. The application currently connects directly to the writer instance endpoint and keeps long-lived connections open. During a maintenance failover, writes fail until clients are restarted. The team wants the application to reconnect to the correct Aurora endpoint automatically and reduce user-visible write interruptions.

Which change is most likely to achieve this?

Question 62 • medium • multi-select

A customer portal must recover from a regional outage within a few hours. The business wants lower ongoing cost than a fully active second Region and does not want to rebuild everything from scratch during the outage. Which two DR patterns best fit that goal? Select two.

Question 63 • easy • multiple choice

An engineering team deploys a stateless web API on EC2 using an Auto Scaling group and an Application Load Balancer (ALB). During a recent test, they noticed that when one Availability Zone was unavailable, traffic failed until new instances were manually launched. Which change most directly improves automatic failover for the compute layer within a single Region?

Question 64 • medium • multi-select

A SaaS application is deployed in us-east-1 and us-west-2 behind separate ALBs. The business wants DNS to send new clients to the primary Region when it is healthy and automatically fail over to the secondary Region when the primary endpoint is unhealthy. Which two Route 53 settings are required? Select two.

Question 65 • easy • multiple choice

A retail platform needs disaster recovery across AWS Regions. The business requirement is: RTO up to 6 hours, RPO up to 1 hour, and they want the ability to start serving quickly during a Region outage but do not want to run full production capacity continuously. Which DR strategy best fits these requirements?

Question 66 • medium • multiple choice

Your order-processing system uses EventBridge rules to send events to a Lambda function that updates order status. Over the last week, some events have failed with a transient database timeout. Lambda retries them intermittently, but the events are then lost and no alert is raised after the failures. You want at-least-once processing, bounded retries, and a way to inspect unprocessable events for later reprocessing.

Which architecture change best meets these requirements?

Question 67 • easy • multiple choice

A company runs the same public API in two regions (Region A and Region B), each fronted by an ALB. They want Route 53 to automatically route clients to the Region B API when Region A becomes unhealthy, with minimal configuration effort. Which Route 53 approach should they use?

Question 68 • easy • multiple choice

An order system receives events and uses a Lambda function to write each order into a database. During traffic spikes, the database sometimes throttles, and Lambda retries lead to occasional message loss in the event flow. The team wants buffering, automatic retries, and a way to isolate messages that repeatedly fail so they can be inspected later. What design change best meets this need?

Question 69 • easy • multiple choice

A company runs its customer-facing web app on EC2 behind an Application Load Balancer. The database is Amazon RDS for PostgreSQL. The requirement is that if a single Availability Zone fails, the database must automatically fail over within the same AWS Region with minimal application changes. Which database setup best meets this requirement?

Question 70 • medium • multi-select

A retail API runs on Amazon EC2 instances behind an Application Load Balancer and stores orders in an Amazon RDS for PostgreSQL database. A test that stopped one Availability Zone caused the API to return errors because all application servers were in the same AZ and the database was single-AZ. Which two changes should the architect make to continue serving traffic during a single-AZ failure? Select two.

Question 71 • medium • multi-select

A serverless order-ingestion API writes directly to a database. During traffic spikes, the database occasionally throttles, Lambda retries create duplicate order records, and some requests time out. Which two changes best improve buffering and safe retry behavior? Select two.

Question 72 • medium • multiple choice

Your ecommerce app runs behind an Application Load Balancer (ALB) and uses an RDS database for orders. During an AZ impairment in us-east-1, customers report that checkout takes several minutes to recover. The current design places EC2 instances only in private subnets of AZ-a, while the ALB spans multiple subnets. The RDS DB instance is Multi-AZ. Management wants automatic recovery within the same Region.

Which change best addresses the issue with minimal operational overhead?

Question 73 • medium • multiple choice

Your media processing pipeline writes original uploads to an S3 bucket and later generates derivative files. An operator accidentally deletes a subset of original uploads in production. You need to (1) restore the deleted objects with minimal data loss and (2) protect against both regional disasters and future operator mistakes. The company requires recovery even if objects are deleted and later overwritten.

What is the most effective change to meet these requirements?

Question 74 • easy • multiple choice

Based on the exhibit, a web application must stay available if one Availability Zone fails. What is the best change to improve resilience?

Exhibit

Auto Scaling group configuration:
- Desired capacity: 4
- VPC subnets: subnet-0a11 (us-east-1a) only
- Health check type: ELB

Application Load Balancer configuration:
- Enabled subnets: subnet-0a11 (us-east-1a), subnet-0b22 (us-east-1b)

Incident note:
- A planned test stopped all instances in us-east-1a and the application became unavailable.
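
The exhibit's subnet lists can be checked mechanically. The following is a minimal sketch, assuming the subnet-to-AZ mappings shown in the exhibit; the dictionary layout and the function name `survives_single_az_loss` are hypothetical and are not part of any AWS API. It treats the design as resilient only if, after any one AZ is removed, at least one AZ remains that both the load balancer and the Auto Scaling group use.

```python
# Subnet -> AZ mappings copied from the exhibit (the layout is an assumption).
asg_subnets = {"subnet-0a11": "us-east-1a"}
alb_subnets = {"subnet-0a11": "us-east-1a", "subnet-0b22": "us-east-1b"}


def survives_single_az_loss(asg, alb):
    """Return True if losing any single AZ still leaves an AZ shared by both tiers."""
    asg_azs = set(asg.values())
    alb_azs = set(alb.values())
    for lost in asg_azs | alb_azs:
        remaining_shared = (asg_azs - {lost}) & (alb_azs - {lost})
        if not remaining_shared:
            return False
    return True


# Configuration as shown in the exhibit: instances can only launch in us-east-1a.
print(survives_single_az_loss(asg_subnets, alb_subnets))  # False

# Hypothetical change: the ASG also uses the second subnet the ALB already has.
wider_asg = {**asg_subnets, "subnet-0b22": "us-east-1b"}
print(survives_single_az_loss(wider_asg, alb_subnets))  # True
```

The check is deliberately coarse: it looks only at AZ coverage and ignores capacity, so it says nothing about whether the surviving AZ can carry the full load.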

Question 75 • medium • matching

Match the disaster recovery strategy to the recovery posture it best fits for a Regional outage.

Concepts
Matches

Lowest cost option where the environment is rebuilt from backups and hours of downtime are acceptable.

Keep only the critical core running in the secondary Region, then scale out after failover.

Run a scaled-down but functional environment in another Region for faster cutover.

Serve production traffic from more than one Region at the same time for the fastest recovery.

Question 76 • medium • multiple choice

A global application experiences frequent writes and must survive a full Regional outage with near-zero data loss. The product team also requires that users can continue to write during the incident using the closest Region. Which approach is most aligned with these requirements?

Question 77 • easy • multiple choice

Based on the exhibit, the database must fail over automatically if the primary Availability Zone goes down. Which solution should the architect choose?

Exhibit

Amazon RDS configuration:
- Engine: MySQL
- Deployment: Single-AZ
- Backup retention: 7 days
- Application connection string: db-prod.cluster-abcdefghijkl.us-east-1.rds.amazonaws.com

Operations note:
- During maintenance, the database endpoint became reachable again only after a manual restore from snapshot.

Question 78 • medium • multiple choice

A payments platform requires disaster recovery across Regions. Requirements: RPO of 15 minutes and RTO of about 1 hour. The business cannot afford full duplicate capacity in both Regions all the time, but the team wants automated readiness so failover is mostly operationally guided rather than a slow rebuild. Which DR strategy is the best fit?

Question 79 • medium • multiple choice

A company uses Amazon RDS for a PostgreSQL database powering a customer-facing application. The application’s availability depends on fast database failover with minimal manual intervention. The RDS instance currently runs as a single-AZ deployment in one DB subnet group. Which change most directly meets the goal?

Question 80 • medium • multiple choice

An ECS service runs on EC2 instances and is fronted by an ALB. The ALB spans two Availability Zones, and the ECS service desired count is 2 tasks. The underlying EC2 capacity comes from an Auto Scaling group (ASG) with a minimum size of 1, and in practice the ASG launches instances in only one subnet. What is the most effective change to ensure that the service continues running when the instances in a single AZ are lost?

Question 81 • medium • multiple choice

An order processing workflow uses Amazon SQS as the decoupling layer between a producer and a consumer Lambda function. The consumer intermittently fails due to a downstream dependency. The team has observed that certain “poison” messages keep being retried repeatedly and prevent other messages from being processed efficiently. Which SQS configuration most directly addresses this issue?

Question 82 • medium • multiple choice

A company runs a stateful analytics workload on EC2 instances that use EBS volumes. The data must be restorable in another Region after a major outage, with frequent point-in-time recovery. Which approach provides the most suitable replication mechanism for the EBS-backed data?

Question 83 • medium • multiple choice

A caching layer uses Amazon ElastiCache for Redis in front of a stateless web service. The service must continue to read cached responses during maintenance events and should automatically fail over to another node if one AZ becomes impaired. Which design change best satisfies this requirement?

Question 84 • medium • multiple choice

A media company stores original uploads in an S3 bucket. They must recover from accidental overwrites/deletes and also recover quickly from a full Region outage. The required RPO is about 1 hour. Which configuration best meets these requirements?

Question 85 • medium • matching

A team wants a web application to keep serving traffic if one Availability Zone fails. Match each architecture element to the resilience behavior it provides.

Concepts
Matches

Stop sending requests to unhealthy targets and keep only healthy instances in rotation.

Launch replacement instances in healthy AZs when capacity is lost.

Maintain a synchronous standby in another AZ and fail over automatically.

Allow instances to be replaced without losing user sessions that are stored elsewhere.

Question 86 • medium • multiple choice

A stateless web API runs on EC2 instances behind an Application Load Balancer (ALB). The Auto Scaling group (ASG) currently uses subnets from only one Availability Zone, even though the ALB spans two Availability Zones. During maintenance of that single AZ, the ALB remains up but clients see timeouts because there are no healthy targets. Which change most directly improves resilience against an AZ failure?

Question 87 • medium • multiple choice

A web app runs on an EC2 Auto Scaling group behind an Application Load Balancer (ALB). The ALB is configured with health checks and the ASG spans three subnets in three Availability Zones. During an AZ outage, monitoring shows the number of healthy instances drops sharply and never returns to the original capacity until the ASG is manually adjusted. What change most directly improves resilience so capacity returns automatically during an AZ failure?

Question 88 • medium • multiple choice

Your public API is hosted in two regions. You want Route 53 to automatically send traffic to the secondary region when the primary region’s endpoint fails. The primary API health check is returning failure codes, but clients still reach the primary region for several minutes. Which Route 53 configuration most directly addresses this behavior?

Question 89 • medium • multiple choice

An orders service publishes payment instructions to an Amazon SQS queue. After occasional processing timeouts, the downstream consumer sometimes processes the same instruction twice, resulting in duplicate payment attempts. The team currently uses an SQS Standard queue with a visibility timeout of 2 minutes and relies on the consumer to finish before the timeout expires. What approach best improves resilience against duplicate processing?

Question 90 • medium • multiple choice

A developer accidentally deletes important rows in an RDS database. The mistake is discovered 45 minutes later. The database has automated backups enabled with a retention period of 7 days. What is the best way to restore the database to a point just before the deletion?

Question 91 • medium • multiple choice

A web application runs on an Auto Scaling group (ASG) behind an Application Load Balancer (ALB). The ASG is currently attached to subnets in only two Availability Zones (AZs). During a planned maintenance window, one AZ becomes unavailable for about 25 minutes. Monitoring shows that targets in the remaining AZ stay healthy, and the ALB target group health checks report a normal status. However, users still experience intermittent connection failures and slower responses during the AZ outage. What change will most directly improve resilience against an AZ loss while keeping the same ALB-based design?

Question 92 • medium • multiple choice

An application uses an Amazon Aurora DB cluster. The cluster performs an automatic failover from the writer instance to a standby instance. After failover completes, reads succeed, but all new writes fail with errors indicating the application is connecting to the old writer endpoint. Which change best fixes the resiliency issue after failover?

Question 93 • medium • multiple choice

A company hosts a public API using two AWS regions behind a single custom domain. Route 53 is configured with latency-based routing and health checks. During a regional outage, application metrics confirm the primary API is unhealthy, but clients still resolve to the primary region for most requests. Which DNS configuration change will most directly ensure automatic failover to the secondary region when the primary fails?

Question 94 • medium • multiple choice

An orders service publishes payment instructions to an Amazon SQS queue. The downstream consumer sometimes times out while processing a message. After the message becomes visible again, the consumer may process the same instruction more than once and occasionally creates duplicate orders. The team needs a resiliency-focused design that prevents duplicates from creating double-charges, even if the same message is processed multiple times. What is the best architectural change?

Question 95 • medium • multiple choice

A web application runs on an Auto Scaling group (ASG) behind an Application Load Balancer (ALB). After a new release, instances begin failing ALB health checks with errors like 502 while the application is still starting up. CloudWatch shows that the ASG replaces the instances before they finish initializing, so traffic never reaches healthy targets. Which change most directly prevents premature replacement during startup so traffic can resume as soon as the instances are actually healthy?

Question 96 • medium • multiple choice

A company uses an Amazon Aurora DB cluster in a Multi-AZ configuration. During a planned failover of the writer instance, the database endpoints in the application are updated incorrectly. After failover, reads work but writes fail with connection errors and timeouts for several minutes. The team currently uses the instance endpoint for the writer. What should they change to improve write resilience during failovers?

Question 97 • medium • multiple choice

A public API is deployed in two AWS Regions: us-east-1 (primary) and us-west-2 (secondary). The team wants Route 53 to automatically route users to the secondary region if the primary API becomes unhealthy. They will use Route 53 health checks that monitor the API’s /status endpoint over HTTPS. Which Route 53 configuration most directly implements this failover behavior?
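
The scenario describes a primary/secondary pair of records whose selection depends on a health check. The following is a simplified sketch of that decision logic, not Route 53's actual implementation: the record layout, the `resolve` function, the health check ID, and the hostnames are all hypothetical, and the model ignores cases such as both endpoints being unhealthy.

```python
# Hypothetical record set: one PRIMARY and one SECONDARY entry for the same name.
records = [
    {"role": "PRIMARY", "target": "api-use1.example.com", "health_check": "hc-primary"},
    {"role": "SECONDARY", "target": "api-usw2.example.com", "health_check": None},
]


def resolve(records, health_status):
    """Return the primary target while its health check passes, else the secondary."""
    primary = next(r for r in records if r["role"] == "PRIMARY")
    secondary = next(r for r in records if r["role"] == "SECONDARY")
    # A missing status is treated as unhealthy in this sketch (an assumption).
    if health_status.get(primary["health_check"], False):
        return primary["target"]
    return secondary["target"]


# The /status probe on the primary succeeds: clients get the primary endpoint.
print(resolve(records, {"hc-primary": True}))   # api-use1.example.com

# The /status probe on the primary fails: clients get the secondary endpoint.
print(resolve(records, {"hc-primary": False}))  # api-usw2.example.com
```

In practice, how quickly clients see the switch also depends on the health check interval, the failure threshold, and the record TTL, none of which this sketch models.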

Question 98 • medium • multiple choice

An orders service publishes payment instructions to an Amazon SQS Standard queue. A downstream consumer sometimes times out and retries the work, causing the consumer to process the same instruction more than once. Operationally, the team must ensure that duplicate processing does not create duplicate charges. The queue type cannot be changed. What is the most resilient application-side approach?
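
Because a Standard queue delivers at least once, one way to reason about this scenario is an idempotent consumer that records each instruction ID before applying its side effect. The sketch below is illustrative only: `handle`, the `instruction_id` field, and the in-memory `seen` set are assumptions. A real system would need a durable store with an atomic conditional write in place of the set, so that two concurrent workers cannot both pass the check.

```python
def handle(message, ledger, seen):
    """Apply a payment instruction at most once per instruction_id."""
    key = message["instruction_id"]
    if key in seen:
        # The same instruction was delivered again: skip the side effect.
        return "duplicate-skipped"
    seen.add(key)
    ledger.append((key, message["amount"]))
    return "charged"


ledger, seen = [], set()
msg = {"instruction_id": "pay-001", "amount": 25}

print(handle(msg, ledger, seen))  # charged
print(handle(msg, ledger, seen))  # duplicate-skipped
print(len(ledger))                # 1
```

The second delivery still happens; the design only ensures that it has no additional effect on the ledger.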

Question 99 • medium • multiple choice

A service consumes messages from an SQS queue. Recently, a new message format started failing validation in the consumer. The consumer catches the exception but cannot successfully process those messages without code changes. The team wants failed messages to be isolated for later investigation instead of being retried indefinitely. What should they configure?

Question 100 • medium • multiple choice

A web application runs on an Auto Scaling group (ASG) behind an Application Load Balancer (ALB). The ASG uses the ALB target group health checks (the ELB health check type) to decide when instances are healthy. During a deployment, the ASG performs instance replacement. Shortly after the deployment starts, while new instances are still bootstrapping, CloudWatch shows that the ALB target group briefly has zero healthy targets, and users intermittently receive 502 responses. Which ASG deployment configuration best reduces the chance of a period with zero healthy ALB targets while still keeping failover behavior resilient?

Question 101 • medium • multiple choice

You host a public API using Amazon API Gateway in two AWS Regions: us-east-1 (primary) and us-west-2 (secondary). You want Route 53 to send client traffic to the secondary region only when the primary API is unhealthy. Which Route 53 setup best meets this requirement?

Question 102 • medium • multiple choice

An orders service publishes payment instructions to an Amazon SQS Standard queue. A downstream consumer sometimes times out or crashes after it has partially completed processing, causing the same instruction to be processed more than once. You must keep the design resilient without attempting to guarantee exactly-once processing. Which approach best handles duplicates safely?

Question 103 • medium • multiple choice

A Multi-AZ Amazon RDS database experiences incorrect writes at 10:15 UTC due to a buggy release. The team detects the problem at 10:25 UTC. They want to restore the data to a known-good point just before 10:15 UTC and validate the recovered data, without taking the current production instance offline during the recovery process. What is the most appropriate AWS action?

Question 104 • medium • multiple choice

An events service publishes critical notifications using Amazon SNS. Three independent downstream systems (A, B, and C) subscribe to the topic. Downstream system B sometimes fails to process certain messages (for example, it times out or returns an error while handling the message), and you want: 1) failures in B to be isolated so A and C keep processing unaffected, and 2) messages that B cannot successfully process after retries to be sent to a DLQ for B. Which design best meets these requirements?

Question 105 • medium • multiple choice

A web application runs on an Amazon EC2 Auto Scaling group behind an Application Load Balancer (ALB). After each deployment, new instances take about 2 minutes to download artifacts and become ready to accept requests on the target port. In the last deployment, the ALB started marking targets unhealthy before the app was ready, and the Auto Scaling group then replaced those instances repeatedly, causing a prolonged outage. Which change best improves resilience during instance start-up without reducing actual availability once the application is healthy?

Question 106 • medium • multiple choice

A company uses Amazon RDS with automated backups enabled (retention period: 7 days). At 10:30 UTC, a bad release corrupts specific rows in a production table. The team detects the issue at 11:10 UTC. They need to revert the database to its state at a point between 10:00 and 10:30 UTC (before the corruption), recover quickly, and minimize risk to the currently running workload. What is the best option?

Question 107 • medium • multiple choice

An internal-facing application is available in two AWS Regions (Region 1 and Region 2). Each Region has its own Application Load Balancer (ALB) and target group. The company uses an Amazon Route 53 private hosted zone to route clients to Region 1 by default, but it must automatically fail over to Region 2 when Region 1’s ALB is unhealthy. Which Route 53 design best meets this requirement?

Question 108 • medium • multiple choice

An internal worker consumes messages from an Amazon SQS Standard queue. Recently, some messages fail validation in the worker (for example, missing required fields), causing the worker to crash before it can successfully process those messages. Those messages keep getting retried repeatedly, slowing down processing of valid messages. The team wants a resilient mechanism to quarantine bad messages after a limited number of receive attempts. What should they implement?
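
The phrase "a limited number of receive attempts" can be made concrete with a small in-memory simulation. This is a sketch of the idea only and does not call SQS: `drain`, `is_valid`, the `order_id` field, and the value 3 are all hypothetical. Each message carries a receive counter; once the counter reaches the limit, the message is set aside in a separate list instead of being retried again.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # illustrative limit, chosen arbitrarily for this sketch


def is_valid(body):
    """Stand-in validation: a message needs an order_id field."""
    return "order_id" in body


def drain(bodies, max_receive_count=MAX_RECEIVE_COUNT):
    """Process messages, quarantining any that fail max_receive_count times."""
    queue = deque({"body": b, "receives": 0} for b in bodies)
    processed, dead_letter = [], []
    while queue:
        msg = queue.popleft()
        if msg["receives"] >= max_receive_count:
            dead_letter.append(msg["body"])  # quarantine for later inspection
            continue
        msg["receives"] += 1
        if is_valid(msg["body"]):
            processed.append(msg["body"])
        else:
            queue.append(msg)  # simulates the message becoming visible again
    return processed, dead_letter


ok, quarantined = drain([{"order_id": 1}, {"note": "missing id"}, {"order_id": 2}])
print(len(ok), len(quarantined))  # 2 1
```

Valid messages are processed on their first receive, while the malformed one is retried a bounded number of times and then isolated, so it stops competing with healthy traffic.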

Question 109 • medium • multiple choice

An orders service publishes payment instructions to an Amazon SQS Standard queue. The downstream processor sometimes times out after it has already applied the payment, but before it can delete the message from the queue. As a result, the same payment instruction can be processed more than once. The team wants the strongest way to prevent duplicate side effects while keeping the system decoupled. What should they implement?

Question 110 • easy • multiple choice

A web application runs on an Amazon EC2 Auto Scaling group (ASG) behind an Application Load Balancer (ALB). The ALB is configured to use at least two Availability Zones (AZs), but the ASG currently uses subnets in only one AZ. If that AZ becomes unavailable, the application stops serving requests. Which change most directly improves resilience to an AZ outage?

Question 111 • easy • multiple choice

Your company hosts an internal API in two AWS Regions. You want Amazon Route 53 to automatically send traffic to the secondary Region if the primary Region’s endpoint becomes unhealthy. Which Route 53 configuration best meets this requirement?

Question 112 • easy • multiple choice

An internal worker consumes messages from an Amazon SQS queue. Occasionally, a message fails validation in the worker (for example, missing required fields). Reprocessing the same bad message repeatedly wastes processing time and delays healthy messages. What is the best AWS approach to handle these poison messages without blocking the rest of the queue?

Question 113 • easy • multiple choice

A team needs a relational database solution that can automatically fail over to a standby instance if the primary database becomes unavailable. They want the standby to be located in a different Availability Zone. Which RDS/Aurora configuration best satisfies this requirement?

Question 114 • easy • multiple choice

A production Amazon RDS database has automated backups enabled with sufficient retention. At 10:30 UTC, a release corrupts specific rows. The issue is detected at 10:45 UTC. The team wants to restore the database state to before the corruption with minimal complexity. What should they do?

Question 115 • easy • multiple choice

An orders service consumes payment instructions from an Amazon SQS queue. Sometimes the consumer times out after applying the payment but before deleting the SQS message. As a result, the same payment instruction is processed again. Which design change most directly prevents duplicate side effects caused by message retries?

Question 116 • medium • multiple choice

A production Amazon RDS database has automated backups enabled. At 10:00 UTC, an application deploy accidentally overwrote a subset of rows due to a faulty migration. The issue is detected at 10:45 UTC. The team confirms that the required retention window is still available. Which approach offers the most resilient and least disruptive way to recover the affected data close to the time of the event?

Question 117 • medium • multiple choice

An orders system sends payment instructions to an Amazon SQS queue. The consumer sometimes times out after it has already created the payment record but before it deletes the SQS message. As a result, the same instruction can be processed more than once. Which design best ensures the consumer remains resilient and does not create duplicate payments when the same instruction is delivered multiple times?

Question 118 • medium • multiple choice

A company runs an Amazon Aurora DB cluster with a Multi-AZ deployment. The application is configured with a hard-coded endpoint that points to the current writer DB instance (an instance-specific endpoint), rather than the Aurora cluster writer endpoint. During an unexpected AZ failure, Aurora promotes the standby to become the new writer. However, the application continues to fail to connect until an operator updates the hard-coded endpoint. What change most directly improves resiliency so the application automatically reconnects after failover?

Question 119 • medium • multiple choice

An event-driven order processing service consumes messages from an Amazon SQS Standard queue. After a deployment, about 1% of messages start failing validation because a required field is missing. The consumer catches the exception and returns control, so the messages are retried. However, those poison messages keep reappearing and repeatedly consuming processing time for hours, delaying handling of valid messages. What is the most resilient way to handle the poison messages while keeping the system available?

Question 120 • medium • multiple choice

A company hosts an internal API behind an Application Load Balancer (ALB) in two AWS Regions. They want Amazon Route 53 to automatically fail over to the secondary Region when the primary Region’s ALB is unhealthy. Health checks for the primary ALB are already configured, but the DNS record currently uses a latency-based routing policy. Which Route 53 configuration most directly provides automatic failover based on health status?

Question 121 • medium • multiple choice

A web application runs on an EC2 Auto Scaling group (ASG) behind an Application Load Balancer (ALB). The ASG spans three Availability Zones. After a deployment, new instances frequently fail the ALB target group health checks with HTTP 5xx responses and are quickly terminated by the ASG. What change most improves resiliency during deployments with minimal downtime by preventing premature removal of instances that are still starting?

Question 122 • medium • multiple choice

A fintech company has a two-Region DR requirement: RPO must be within 15 minutes and RTO must be under 2 hours. To control cost, they do not want to run full production infrastructure in the secondary Region continuously. They plan to continuously replicate the database and keep the application infrastructure in the secondary Region prepared, but at reduced capacity. Which DR strategy best matches this requirement and accurately describes their plan?

Question 123 • easy • multi-select

A web application runs on an Auto Scaling group behind an Application Load Balancer. The business wants the service to keep running if one Availability Zone goes down. Which two changes should you make? Select two.

Question 124easymulti select
Read the full Design Resilient Architectures explanation →

A production Amazon RDS database must continue serving the application if the primary DB instance fails. The application should reconnect automatically without hard-coding a new IP address. Which two actions should you take? Select two.

Question 125easymulti select
Review the full routing breakdown →

A company hosts an internal API in two AWS Regions. Traffic must automatically switch to the secondary Region when the primary Region's endpoint is unhealthy. Which two Route 53 settings are required? Select two.

Question 126easymulti select
Read the full Design Resilient Architectures explanation →

A service processes messages from an Amazon SQS queue. Sometimes the worker finishes the business logic but does not delete the message before the visibility timeout expires, so the message is delivered again. Which two changes improve resilience and reduce the impact of duplicate processing? Select two.

Question 127easymulti select
Read the full Design Resilient Architectures explanation →

A developer accidentally corrupts part of a production Amazon RDS database, and the issue is discovered 45 minutes later. The team needs to restore the database to the state immediately before the change. Which two actions should be part of the recovery plan? Select two.

Question 128easymulti select
Read the full Design Resilient Architectures explanation →

A batch processing job can be interrupted and restarted from checkpoints. The business wants to lower compute cost while still keeping the workload resilient to interruptions. Which two choices are best? Select two.

Question 129easymultiple choice
Read the full Design Resilient Architectures explanation →

A production application uses an Amazon RDS Multi-AZ DB instance. During an unplanned failover, the database endpoint remains the same. What change should the application team make to handle the failover reliably?

Question 130easymultiple choice
Read the full Design Resilient Architectures explanation →

Your web tier runs on an EC2 Auto Scaling group behind an Application Load Balancer (ALB). You currently deploy both the ALB and the Auto Scaling group in only two Availability Zones (AZs). One AZ fails. What is the best configuration change to improve resilience?

Question 131easymultiple choice
Review the full routing breakdown →

An internal API is hosted in two AWS Regions behind Route 53. Under normal conditions, clients should use the primary region. If the primary endpoint becomes unhealthy, traffic must automatically switch to the secondary region. Which Route 53 setup best meets this requirement?

Question 132easymultiple choice
Read the full Design Resilient Architectures explanation →

An order-processing system publishes an event whenever a payment succeeds. Three downstream services (inventory, shipping, and analytics) must react independently. Analytics sometimes has high latency, but order processing must not be blocked. What is the best AWS approach to decouple these consumers?

Question 133easymultiple choice
Read the full Design Resilient Architectures explanation →

A consumer application reads from an Amazon SQS queue. Some messages have an invalid format and always fail processing. They are retried repeatedly and consume consumer capacity. What is the best way to prevent these "poison pill" messages from blocking normal processing?

Question 134easymultiple choice
Read the full Design Resilient Architectures explanation →

An event consumer sometimes processes the same SQS message more than once due to timeouts and retries. The consumer must ensure the payment is not charged twice. What design choice best addresses this requirement?

Question 135easymultiple choice
Read the full Design Resilient Architectures explanation →

A company needs an Amazon RDS database that automatically fails over to a standby when the primary DB instance becomes unavailable. Which approach best meets the requirement with minimal operational effort?

Question 136easymultiple choice
Read the full Design Resilient Architectures explanation →

An internal service is hosted behind an Application Load Balancer (ALB) with targets spread across two Availability Zones. If the targets in one Availability Zone become unhealthy, the service must continue serving traffic from the healthy AZ. What change most directly improves resilience at the load-balancing layer?

Question 137easymultiple choice
Read the full Design Resilient Architectures explanation →

A worker consumes messages from an Amazon SQS queue. Some messages consistently fail validation and are retried over and over, consuming worker capacity. What is the most appropriate AWS mechanism to handle these poison messages while keeping the queue usable?

Question 138easymultiple choice
Read the full Design Resilient Architectures explanation →

A production Amazon RDS database has automated backups enabled. At 10:45 UTC, an issue is discovered. The team needs to restore the database to its state as of 10:30 UTC. Which capability should they use?

Question 139easymultiple choice
Read the full Design Resilient Architectures explanation →

A system processes events from Amazon SQS and sometimes sees duplicate messages due to retries. The business requirement is that each payment must be charged at most once. What design choice best addresses this resiliency requirement?

Question 140easymultiple choice
Read the full Design Resilient Architectures explanation →

A company wants a disaster recovery setup for a web application. They need relatively quick recovery, but they can't afford running full production in the secondary location at all times. Which option best matches this requirement?

Question 141mediummulti select
Read the full Design Resilient Architectures explanation →

A fintech company needs a disaster recovery design for a web application in two Regions. The business requires an RPO of 15 minutes and an RTO under 2 hours, but it cannot afford to keep a full production stack running in both Regions all the time. Which two DR strategies best fit the requirement? Select two.

Question 142mediummulti select
Read the full Design Resilient Architectures explanation →

A transactional application uses Amazon RDS for MySQL in a single Availability Zone. The team wants the database to fail over automatically if the primary DB instance becomes unavailable, and they want the application to recover with minimal code changes. Which two actions should they take? Select two.

Question 143mediummulti select
Read the full Design Resilient Architectures explanation →

An order-processing worker consumes messages from Amazon SQS. Occasionally, the worker times out after successfully creating a payment record but before deleting the message, which causes duplicate charges during retries. Some messages also fail validation repeatedly because required fields are missing. Which two changes should the team make? Select two.

Question 144mediummulti select
Read the full Design Resilient Architectures explanation →

A production Amazon RDS database already has automated backups enabled. At 10:45 UTC, the team discovers that a faulty migration corrupted rows in a table at 10:30 UTC. The business wants the database restored to exactly the state it had at 10:30 UTC with minimal risk. Which two actions should the team take? Select two.

Question 145easymultiple choice
Read the full Design Resilient Architectures explanation →

A worker service consumes messages from an Amazon SQS queue. Some messages are malformed and always fail validation. The worker retries, but it keeps reprocessing the same bad messages and consumes processing capacity that should be used for valid work. What is the best solution to prevent “poison messages” from blocking progress?

Question 146easymultiple choice
Read the full Design Resilient Architectures explanation →

A production Amazon RDS database has automated backups enabled. An application mistakenly updates a table and the issue is discovered one hour later. The team needs to restore the database to the exact state it had just before the mistaken update. Which approach best meets the requirement?

Question 147easymultiple choice
Read the full Design Resilient Architectures explanation →

A company wants a disaster recovery setup for a web application. They want to keep costs low but still recover within a couple of hours after a regional disruption. They are willing to run only minimal infrastructure in the secondary location and scale it up during the outage. Which DR approach best matches this requirement?

Question 148easymultiple choice
Read the full Design Resilient Architectures explanation →

A company hosts a web application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer (ALB). The ALB and the Auto Scaling group are currently deployed in only one Availability Zone (AZ). The business wants the application to keep running if that AZ has an outage. What is the best change?

Question 149easymultiple choice
Read the full Design Resilient Architectures explanation →

A team runs an Amazon RDS for MySQL database in a single Availability Zone. They want automatic failover with minimal downtime if the primary database instance becomes unavailable. Automated backups are already enabled. Which configuration change best meets the requirement?

Question 150easymultiple choice
Review the full routing breakdown →

An organization hosts the same public API in two AWS Regions. Normal traffic should go to the primary Region. If the primary endpoint becomes unhealthy, Route 53 should automatically route users to the secondary Region. What is the best Route 53 configuration approach?

Question 151easymultiple choice
Read the full Design Resilient Architectures explanation →

An orders service currently sends HTTP requests directly to two downstream services (inventory and shipping). During peak load, inventory slows down, causing the orders service to slow as well. The team wants the orders service to remain responsive even when a downstream service is temporarily slow or restarted. Which design change best achieves this resiliency goal?

Question 152mediummulti select
Read the full Design Resilient Architectures explanation →

A payment worker consumes messages from an Amazon SQS queue. Sometimes the worker finishes the payment creation, but a timeout prevents message deletion and the same payment request is delivered again. Which two design changes best reduce the risk of duplicate charges and keep bad messages from looping forever? Select two.

Question 153mediummulti select
Read the full Design Resilient Architectures explanation →

An application uses an Amazon RDS Multi-AZ DB instance. During a failover test, connections fail until the application is restarted, even though the database comes back online. Which two changes should the team make to improve resilience during failover? Select two.

Question 154mediummulti select
Review the full routing breakdown →

An internal API is deployed in two AWS Regions behind separate Application Load Balancers. The company wants clients to use the primary Region when it is healthy and automatically switch to the secondary Region if the primary health check fails. Which two Route 53 record configurations are required? Select two.

Question 155mediummulti select
Read the full Design Resilient Architectures explanation →

A production Amazon Aurora MySQL database is corrupted by a bad migration at 10:30 UTC, and the problem is discovered at 10:45 UTC. The team wants to recover to the state just before the migration with minimal manual effort. Which two actions should they take? Select two.

Question 156mediummulti select
Read the full Design Resilient Architectures explanation →

An order service must notify inventory, shipping, and analytics independently when payment succeeds. The shipping service may be slow, but the order service should keep accepting new orders even if one consumer is unavailable. Which two changes best improve resilience? Select two.

Question 157mediummultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, the team wants to stop poison messages from consuming worker capacity and also prevent duplicate side effects if the same message is delivered more than once. Which design change best meets the requirement?

Exhibit

SQS queue attributes:
  VisibilityTimeout = 30 seconds
  RedrivePolicy = not configured

CloudWatch Logs:
  14:02:11 worker-a received messageId=7b2c8f4a
  14:02:43 worker-a started payment write for order 9912
  14:03:04 worker-a message visible again before delete
  14:03:11 worker-b received messageId=7b2c8f4a
  14:03:18 worker-b repeated payment write for order 9912

Application note:
  Average handler duration is 42-55 seconds during peak load
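
The timing in the exhibit can be checked with a little arithmetic. The sketch below is illustrative only: it compares the 30-second visibility timeout with the 42-55 second handler duration from the exhibit, then shows a toy guard that skips a repeated write. The names `can_redeliver` and `PaymentStore` and the in-memory set are hypothetical stand-ins, not an AWS API.

```python
def can_redeliver(visibility_timeout_s: int, handler_duration_s: int) -> bool:
    """A message can reappear if the handler outlives the visibility timeout."""
    return handler_duration_s > visibility_timeout_s


class PaymentStore:
    """Toy in-memory store; a real system would need a durable, atomic check."""

    def __init__(self) -> None:
        self._seen = set()
        self.writes = []

    def write_once(self, message_id: str, order_id: int) -> bool:
        """Record the payment only the first time this message ID is seen."""
        if message_id in self._seen:
            return False  # duplicate delivery, skip the side effect
        self._seen.add(message_id)
        self.writes.append(order_id)
        return True


# Values taken from the exhibit: 30 s timeout, 42 s shortest peak handler run.
print(can_redeliver(30, 42))

store = PaymentStore()
store.write_once("7b2c8f4a", 9912)  # first delivery (worker-a)
store.write_once("7b2c8f4a", 9912)  # redelivered copy (worker-b)
print(store.writes)
```

The guard only removes the duplicate side effect; it does not by itself stop a message that always fails from being received again.
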

Question 158mediummultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, a faulty deployment corrupted production data at 10:30 UTC and the issue was discovered at 10:55 UTC. The team needs to recover the database to the last good state before the corruption. Which action should they take?

Exhibit

RDS backup configuration:
  Automated backups: enabled
  Backup retention: 14 days
  Latest manual snapshot: 2026-04-18 02:00 UTC

Operations log:
  10:30 UTC - schema migration started
  10:36 UTC - application errors began
  10:55 UTC - corrupted rows discovered

Requirement:
  Restore to a point before the bad migration without losing the entire day of changes
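
To make the trade-off in the exhibit concrete, here is a rough sketch of the date arithmetic. It assumes the operations log refers to 2026-04-18, the same day as the manual snapshot; the exhibit does not state the date, so treat that as an assumption. The helper `in_retention_window` and the 10:29 target time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def in_retention_window(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if the target time is no older than the backup retention period."""
    return now - timedelta(days=retention_days) <= target <= now


# Times from the exhibit; the calendar date of the log entries is ASSUMED.
snapshot = datetime(2026, 4, 18, 2, 0, tzinfo=UTC)
migration_start = datetime(2026, 4, 18, 10, 30, tzinfo=UTC)
discovered = datetime(2026, 4, 18, 10, 55, tzinfo=UTC)

# Hypothetical restore target one minute before the migration began.
target = migration_start - timedelta(minutes=1)

# Changes that would be lost by going back to the manual snapshot.
lost_with_snapshot = migration_start - snapshot
print(lost_with_snapshot)
print(in_retention_window(target, discovered, retention_days=14))
```

Under these assumptions, going back to the 02:00 snapshot gives up several hours of changes, while a target just before 10:30 sits well inside the 14-day retention period.
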

Question 159mediummultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, the application team wants the database to keep the same connection endpoint during failover and to reconnect automatically after the primary instance becomes unavailable. Which change best meets the requirement?

Exhibit

application.properties:
  spring.datasource.url=jdbc:mysql://10.0.12.55:3306/orders
  spring.datasource.username=appuser
  spring.datasource.password=****

RDS event log:
  2026-04-12T03:14:22Z db-1 - Failover started
  2026-04-12T03:15:01Z db-1 - Primary unavailable
  2026-04-12T03:16:10Z app-server-2 - SQLRecoverableException: Communications link failure

Current deployment:
  Amazon RDS for MySQL, Multi-AZ enabled
  Application instances in two AZs
  Connection string uses an IP address that was entered manually
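
The reconnect behavior described in the question can be sketched as a retry loop with exponential backoff. This is a minimal illustration, not the application's actual code: `connect_with_retry`, `flaky_connect`, and the DNS name `orders-db.example.internal` are all hypothetical, and the fake connection simply fails twice to mimic the failover window.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")


ENDPOINT = "orders-db.example.internal"  # hypothetical DNS name, not from the exhibit
calls = {"n": 0}


def flaky_connect() -> str:
    """Stand-in for a driver call: fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("Communications link failure")
    return f"connected to {ENDPOINT}"


delays = []  # record the delays instead of really sleeping
result = connect_with_retry(flaky_connect, sleep=delays.append)
print(result, delays)
```

Retrying only helps if each new attempt resolves a name that follows the database; a manually entered IP address, as in the exhibit, would keep pointing at the old instance.
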

Question 160mediummultiple choice
Review the full routing breakdown →

Based on the exhibit, which Route 53 configuration should be used so traffic automatically fails over to the secondary Region only when the primary Region becomes unhealthy?

Exhibit

DNS design notes:
  Primary Region: us-east-1
  Primary ALB: alb-prod-east-1.example.internal
  Secondary Region: us-west-2
  Secondary ALB: alb-prod-west-2.example.internal

Health check results:
  /health on us-east-1 returns HTTP 503
  /health on us-west-2 returns HTTP 200

Requirement:
  Clients should use the primary endpoint during normal operations and switch automatically only on primary failure

Question 161mediummultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, the business needs Regional disaster recovery with an RTO of 45 minutes and an RPO of 15 minutes. The solution should keep cost lower than running two fully active production environments. Which DR strategy is the best fit?

Exhibit

Business requirements:
  RTO: 45 minutes
  RPO: 15 minutes
  Budget: lower than a fully duplicated production stack

Current state:
  One production Region hosts the live application
  Daily backups are stored in a separate Region
  The application tier can be recreated from automation scripts

Question 162mediummultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, the application should continue serving requests if one Availability Zone fails. Which change best improves resilience with the least operational complexity?

Exhibit

Current deployment:
  Application Load Balancer subnets: subnet-a1 (AZ-a), subnet-a2 (AZ-a)
  Auto Scaling group subnets: subnet-a1 (AZ-a) only
  Desired capacity: 4 instances
  Minimum capacity: 4 instances

Incident report:
  2026-04-18T09:21Z AZ-a experienced a power issue
  2026-04-18T09:22Z all targets became unhealthy
  2026-04-18T09:25Z service returned HTTP 503 to users
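
A quick way to read the exhibit is to count how many distinct Availability Zones each layer covers. The sketch below does only that; the helper names and the extra subnet `subnet-b1` in `AZ-b` are hypothetical and do not appear in the exhibit.

```python
def distinct_azs(subnet_to_az: dict, subnets: list) -> set:
    """Return the set of Availability Zones covered by the given subnets."""
    return {subnet_to_az[s] for s in subnets}


def spans_multiple_azs(azs: set) -> bool:
    """Necessary (not sufficient) condition for surviving the loss of one AZ."""
    return len(azs) >= 2


# Subnets from the exhibit, plus one hypothetical subnet in a second AZ.
subnet_to_az = {"subnet-a1": "AZ-a", "subnet-a2": "AZ-a", "subnet-b1": "AZ-b"}

alb_now = distinct_azs(subnet_to_az, ["subnet-a1", "subnet-a2"])
asg_now = distinct_azs(subnet_to_az, ["subnet-a1"])
asg_proposed = distinct_azs(subnet_to_az, ["subnet-a1", "subnet-b1"])

print(spans_multiple_azs(alb_now), spans_multiple_azs(asg_now))
print(spans_multiple_azs(asg_proposed))
```

Two ALB subnets in the same zone still count as one zone, which is why the current layout fails the check at both layers.
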

Question 163easymultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, some SQS messages fail validation repeatedly and continue consuming worker time. What change best prevents the bad messages from being retried forever?

Exhibit

Worker log excerpt:
2026-04-28T09:02:11Z messageId=7f31 receiveCount=1 status=ValidationError
2026-04-28T09:03:14Z messageId=7f31 receiveCount=2 status=ValidationError
2026-04-28T09:04:17Z messageId=7f31 receiveCount=3 status=ValidationError
Queue metric: ApproximateNumberOfMessagesNotVisible keeps increasing
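
The retry loop in the log can be modeled with a small in-memory simulation of a receive-count limit. This is a simplified sketch, not SQS itself: `drain` is a hypothetical helper, the threshold of 3 is an assumed value, and the ARN is a placeholder. The `deadLetterTargetArn` and `maxReceiveCount` keys are the fields of the SQS `RedrivePolicy` queue attribute.

```python
import json
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumed threshold for illustration


def drain(message_ids, is_valid, max_receive_count):
    """Process a queue, setting aside messages that keep failing."""
    queue = deque((mid, 0) for mid in message_ids)
    processed, dead_lettered = [], []
    while queue:
        mid, receive_count = queue.popleft()
        receive_count += 1  # this counts as one more receive
        if is_valid(mid):
            processed.append(mid)
        elif receive_count >= max_receive_count:
            # Simplification: set aside right after the Nth failed receive.
            dead_lettered.append(mid)
        else:
            queue.append((mid, receive_count))  # becomes visible again
    return processed, dead_lettered


processed, dead_lettered = drain(
    ["a1", "7f31", "b2"], lambda mid: mid != "7f31", MAX_RECEIVE_COUNT
)
print(processed, dead_lettered)

# Shape of a RedrivePolicy attribute value; the ARN below is a placeholder.
redrive_policy = json.dumps({
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
    "maxReceiveCount": str(MAX_RECEIVE_COUNT),
})
print(redrive_policy)
```

Without a limit, message `7f31` would cycle through the `else` branch forever, which matches the rising receive count in the log.
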

Question 164easymultiple choice
Read the full Design Resilient Architectures explanation →

Based on the exhibit, the web team wants the application to continue serving traffic if one Availability Zone fails. Which change best meets the requirement with the least operational overhead?

Exhibit

ALB target group: 2 healthy targets in us-east-1a only
Auto Scaling group subnets: subnet-0a1b2c3d (us-east-1a)
Desired capacity: 2
Unused subnet available: subnet-9f8e7d6c (us-east-1b)
Health checks: passing
Recent incident note: "If us-east-1a is unavailable, both app instances are lost."

Question 165mediummultiple choice
Read the full Design Resilient Architectures explanation →

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable?

Question 166mediummultiple choice
Read the full Design Resilient Architectures explanation →

A ticket booking system runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include?

Question 167hardmulti select
Read the full Design Resilient Architectures explanation →

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required?

Question 168mediummultiple choice
Read the full NAT/PAT explanation →

A patient portal receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers?

Question 169hardmultiple choice
Read the full Design Resilient Architectures explanation →

A claims workflow uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure?

Question 170mediummultiple choice
Read the full Design Resilient Architectures explanation →

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured?

Question 171mediummultiple choice
Read the full Design Resilient Architectures explanation →

A content publishing system uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured?

Question 172hardmultiple choice
Read the full Design Resilient Architectures explanation →

A warehouse integration service must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used?

Question 173hardmulti select
Read the full Design Resilient Architectures explanation →

A payments API requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable?

Question 174mediummultiple choice
Read the full Design Resilient Architectures explanation →

A ticket booking system uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered?

Question 175easymultiple choice
Read the full Design Resilient Architectures explanation →

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most?

Question 176hardmultiple choice
Read the full NAT/PAT explanation →

A patient portal must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable?

Question 177mediummultiple choice
Read the full Design Resilient Architectures explanation →

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable?

Question 178mediummultiple choice
Read the full Design Resilient Architectures explanation →

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include?

Question 179hardmulti select
Read the full Design Resilient Architectures explanation →

A regional web application for a content publishing system must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required?

Question 180mediummultiple choice
Read the full Design Resilient Architectures explanation →

A warehouse integration service receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers?

Question 181hardmultiple choice
Read the full Design Resilient Architectures explanation →

A payments API uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure?

Question 182mediummultiple choice
Read the full Design Resilient Architectures explanation →

A ticket booking system stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured?

Question 183mediummultiple choice
Read the full Design Resilient Architectures explanation →

An inventory service uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured?

Question 184hardmultiple choice
Read the full NAT/PAT explanation →

A patient portal must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used?

Question 185hardmulti select
Read the full Design Resilient Architectures explanation →

A claims workflow requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable?

Question 186mediummultiple choice
Read the full Design Resilient Architectures explanation →

A trading dashboard uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered?

Question 187easymultiple choice
Read the full Design Resilient Architectures explanation →

A content publishing system exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most?

Question 188hardmultiple choice
Read the full Design Resilient Architectures explanation →

A warehouse integration service must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable?

Question 189mediummultiple choice
Read the full Design Resilient Architectures explanation →

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The design must avoid adding custom operational scripts.

Question 190mediummultiple choice
Read the full Design Resilient Architectures explanation →

A ticket booking system runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The design must avoid adding custom operational scripts.

Question 191hardmulti select
Read the full Design Resilient Architectures explanation →

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The design must avoid adding custom operational scripts.

Question 192mediummultiple choice
Read the full NAT/PAT explanation →

A patient portal receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers? The design must avoid adding custom operational scripts.

Question 193hardmultiple choice
Read the full Design Resilient Architectures explanation →

A claims workflow uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The design must avoid adding custom operational scripts.

Question 194mediummultiple choice
Read the full Design Resilient Architectures explanation →

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The design must avoid adding custom operational scripts.

Question 195mediummultiple choice
Read the full Design Resilient Architectures explanation →

A content publishing system uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The design must avoid adding custom operational scripts.

Question 196hardmultiple choice
Read the full Design Resilient Architectures explanation →

A warehouse integration service must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The design must avoid adding custom operational scripts.

Question 197hardmulti select
Read the full Design Resilient Architectures explanation →

A payments API requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The design must avoid adding custom operational scripts.

Question 198mediummultiple choice
Read the full Design Resilient Architectures explanation →

A ticket booking system uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The design must avoid adding custom operational scripts.

Question 199easymultiple choice
Read the full Design Resilient Architectures explanation →

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The design must avoid adding custom operational scripts.

Question 200hardmultiple choice
Read the full NAT/PAT explanation →

A patient portal must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The design must avoid adding custom operational scripts.

Question 201mediummultiple choice
Read the full Design Resilient Architectures explanation →

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The design must avoid adding custom operational scripts.

Question 202mediummultiple choice
Read the full Design Resilient Architectures explanation →

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The design must avoid adding custom operational scripts.

Question 203hardmulti select
Read the full Design Resilient Architectures explanation →

A regional web application for a content publishing system must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The design must avoid adding custom operational scripts.

Question 204mediummultiple choice
Read the full Design Resilient Architectures explanation →

A warehouse integration service receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers? The design must avoid adding custom operational scripts.

Question 205hardmultiple choice
Read the full Design Resilient Architectures explanation →

A payments API uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The design must avoid adding custom operational scripts.

Question 206mediummultiple choice
Read the full Design Resilient Architectures explanation →

A ticket booking system stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The design must avoid adding custom operational scripts.

Question 207mediummultiple choice
Read the full Design Resilient Architectures explanation →

An inventory service uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The design must avoid adding custom operational scripts.

Question 208hardmultiple choice
Read the full NAT/PAT explanation →

A patient portal must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The design must avoid adding custom operational scripts.

Question 209hardmulti select
Read the full Design Resilient Architectures explanation →

A claims workflow requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The design must avoid adding custom operational scripts.

Question 210mediummultiple choice
Read the full Design Resilient Architectures explanation →

A trading dashboard uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The design must avoid adding custom operational scripts.

Question 211easymultiple choice
Read the full Design Resilient Architectures explanation →

A content publishing system exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The design must avoid adding custom operational scripts.

Question 212hardmultiple choice
Read the full Design Resilient Architectures explanation →

A warehouse integration service must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The design must avoid adding custom operational scripts.

Question 213 • medium • multiple choice

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The architecture review board prefers a managed AWS-native control.

Question 214 • medium • multiple choice

A ticket booking system runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The architecture review board prefers a managed AWS-native control.
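
The capacity arithmetic behind tolerating the loss of one Availability Zone can be sketched as follows. This is an illustration, not the answer key: the function name `per_az_capacity` and the sample numbers are hypothetical, and the model assumes instances are spread evenly across AZs.

```python
import math


def per_az_capacity(required_instances: int, az_count: int) -> int:
    """Instances to run in each AZ so that losing any single AZ
    still leaves at least `required_instances` running.

    Assumes an even spread across AZs (a simplifying assumption).
    """
    if az_count < 2:
        raise ValueError("need at least two AZs to tolerate an AZ failure")
    # After one AZ fails, the remaining (az_count - 1) AZs carry the full load.
    return math.ceil(required_instances / (az_count - 1))


# Hypothetical example: 6 instances are needed to serve peak load.
two_az = per_az_capacity(6, 2)    # each AZ must hold the full load
three_az = per_az_capacity(6, 3)  # the two surviving AZs share the load
print(two_az * 2, three_az * 3)   # total instances provisioned in each layout
```

Under these assumptions, spreading across more AZs lowers the total capacity that has to be provisioned to survive a single AZ failure.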

Question 215 • hard • multi select

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The architecture review board prefers a managed AWS-native control.

Question 216 • medium • multiple choice

A patient portal receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers? The architecture review board prefers a managed AWS-native control.

Question 217 • hard • multiple choice

A claims workflow uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The architecture review board prefers a managed AWS-native control.
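
Amazon SQS supports a redrive policy whose `maxReceiveCount` setting moves a message to a dead-letter queue once it has been received that many times without being deleted. The sketch below is a toy in-memory model of that behaviour, not an AWS API call: the `Message` class, the `drain` function, and the `"poison"` payload are all hypothetical, and the answer options for this question are not shown here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0


def drain(queue, dead_letter, handler, max_receive_count):
    """Process messages until the source queue is empty.

    A message whose handler keeps failing is moved to `dead_letter`
    after it has been received `max_receive_count` times.
    """
    processed = []
    while queue:
        msg = queue.popleft()
        msg.receive_count += 1
        try:
            handler(msg.body)
            processed.append(msg.body)   # success: the message is deleted
        except Exception:
            if msg.receive_count >= max_receive_count:
                dead_letter.append(msg)  # retries exhausted: isolate it
            else:
                queue.append(msg)        # made available again for a retry
    return processed


def handler(body):
    if body == "poison":
        raise ValueError("cannot parse message")


source = deque([Message("a"), Message("poison"), Message("b")])
dlq = []
done = drain(source, dlq, handler, max_receive_count=3)
print(done, [(m.body, m.receive_count) for m in dlq])
```

In this model the failing message stops consuming retries after three attempts and is kept aside for inspection, while the healthy messages are processed normally.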

Question 218 • medium • multiple choice

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The architecture review board prefers a managed AWS-native control.

Question 219 • medium • multiple choice

A content publishing system uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The architecture review board prefers a managed AWS-native control.

Question 220 • hard • multiple choice

A warehouse integration service must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The architecture review board prefers a managed AWS-native control.

Question 221 • hard • multi select

A payments API requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The architecture review board prefers a managed AWS-native control.

Question 222 • medium • multiple choice

A ticket booking system uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The architecture review board prefers a managed AWS-native control.

Question 223 • easy • multiple choice

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The architecture review board prefers a managed AWS-native control.

Question 224 • hard • multiple choice

A patient portal must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The architecture review board prefers a managed AWS-native control.

Question 225 • medium • multiple choice

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The architecture review board prefers a managed AWS-native control.

Question 226 • medium • multiple choice

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The architecture review board prefers a managed AWS-native control.

Question 227 • hard • multi select

A regional web application for a content publishing system must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The architecture review board prefers a managed AWS-native control.

Question 228 • medium • multiple choice

A warehouse integration service receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers? The architecture review board prefers a managed AWS-native control.

Question 229 • hard • multiple choice

A payments API uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The architecture review board prefers a managed AWS-native control.

Question 230 • medium • multiple choice

A ticket booking system stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The architecture review board prefers a managed AWS-native control.

Question 231 • medium • multiple choice

An inventory service uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The architecture review board prefers a managed AWS-native control.

Question 232 • hard • multiple choice

A patient portal must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The architecture review board prefers a managed AWS-native control.

Question 233 • hard • multi select

A claims workflow requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The architecture review board prefers a managed AWS-native control.

Question 234 • medium • multiple choice

A trading dashboard uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The architecture review board prefers a managed AWS-native control.

Question 235 • easy • multiple choice

A content publishing system exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The architecture review board prefers a managed AWS-native control.

Question 236 • hard • multiple choice

A warehouse integration service must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The architecture review board prefers a managed AWS-native control.

Question 237 • medium • multiple choice

A payments API uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The team wants the control to be enforceable during normal operations.

Question 238 • medium • multiple choice

A ticket booking system runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The team wants the control to be enforceable during normal operations.

Question 239 • hard • multi select

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The team wants the control to be enforceable during normal operations.

Question 240 • medium • multiple choice

A patient portal receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers? The team wants the control to be enforceable during normal operations.

Question 241 • hard • multiple choice

A claims workflow uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The team wants the control to be enforceable during normal operations.

Question 242 • medium • multiple choice

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The team wants the control to be enforceable during normal operations.

Question 243 • medium • multiple choice

A content publishing system uses Lambda functions that call an unreliable third-party API. Failed events must be retained for later investigation after retries are exhausted. What should be configured? The team wants the control to be enforceable during normal operations.

Question 244 • hard • multiple choice

A warehouse integration service must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The team wants the control to be enforceable during normal operations.

Question 245 • hard • multi select

A payments API requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The team wants the control to be enforceable during normal operations.

Question 246 • medium • multiple choice

A ticket booking system uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The team wants the control to be enforceable during normal operations.

Question 247 • easy • multiple choice

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The team wants the control to be enforceable during normal operations.

Question 248 • hard • multiple choice

A patient portal must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The team wants the control to be enforceable during normal operations.

Question 249 • medium • multiple choice

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The team wants the control to be enforceable during normal operations.

Question 250 • medium • multiple choice

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The team wants the control to be enforceable during normal operations.

Question 251 • hard • multiple choice

A company runs a production MySQL database on Amazon RDS in us-east-1. A read replica exists in us-west-2 for disaster recovery. The primary region experiences a complete outage. Which of the following describes the correct procedure to restore database service using the cross-region read replica?

Question 252 • medium • multiple choice

A company uses Amazon SQS and AWS Lambda to process orders. Lambda typically completes in 4 minutes, but complex orders can take up to 12 minutes. The team reports that some orders are being processed more than once. Which is the MOST likely cause and the recommended fix?
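
The timing in this scenario can be explored with a small model of the SQS visibility timeout: if a consumer is still working when the timeout expires, the message becomes visible again and can be delivered to another consumer. The question does not state the queue's visibility timeout, so the 300-second value below is an assumption, and the `deliveries_before_completion` helper is hypothetical.

```python
import math


def deliveries_before_completion(processing_seconds: int,
                                 visibility_timeout_seconds: int) -> int:
    """How many times a message can be delivered before the first
    consumer finishes, assuming it is picked up again as soon as it
    becomes visible and each receive restarts the timeout.
    """
    if processing_seconds <= 0 or visibility_timeout_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(processing_seconds / visibility_timeout_seconds)


ASSUMED_VISIBILITY_TIMEOUT = 300  # seconds; not given in the question

typical = deliveries_before_completion(4 * 60, ASSUMED_VISIBILITY_TIMEOUT)
complex_order = deliveries_before_completion(12 * 60, ASSUMED_VISIBILITY_TIMEOUT)
print(typical, complex_order)
```

Under that assumed timeout, a 4-minute order is delivered once while a 12-minute order can be delivered several times. AWS's Lambda documentation recommends setting the source queue's visibility timeout to at least six times the function timeout, which keeps long-running invocations inside a single delivery.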

Question 253 • medium • multiple choice

A company hosts a web application on EC2 instances behind an Application Load Balancer (ALB) in us-east-1. A static failover site is hosted in an S3 bucket with static website hosting enabled. The company needs automatic DNS failover to the S3 bucket if the primary ALB becomes unhealthy. Which Route 53 configuration achieves this?

Question 254 • medium • multi select

A company is designing a highly available web application on AWS. The application runs on Amazon EC2 instances behind an Application Load Balancer (ALB) and uses an Amazon RDS Multi-AZ DB instance. Which three design choices would improve the application's resilience against an AWS Availability Zone failure? (Choose three.)

Question 255 • medium • multi select

A company is migrating a legacy monolithic application to AWS and wants to improve its resilience by decoupling components. The application currently writes directly to a shared file system and uses synchronous HTTP calls between modules. Which three AWS services should the company use to achieve a more resilient, decoupled architecture? (Choose three.)

Question 256 • medium • multi select

A company runs a production database on Amazon RDS for MySQL with Multi-AZ enabled. The database experiences a sudden increase in read traffic due to a marketing campaign. Which three strategies would help ensure the database remains resilient under heavy read traffic? (Choose three.)

Question 257 • medium • multi select

A company is designing a multi-tier web application on AWS that must be resilient to the failure of an entire AWS Region. The application uses Amazon Route 53, an Application Load Balancer, EC2 instances, and Amazon RDS. Which three design choices support a multi-Region resilient architecture? (Choose three.)

Question 258 • medium • multi select

A company is deploying a stateless web application on Amazon ECS with Fargate. The application must be resilient to individual task failures and Availability Zone failures. Which three steps should the company take to achieve this resilience? (Choose three.)

Question 259 • medium • multi select

A company is designing a disaster recovery plan for a critical application hosted on AWS. The application runs on EC2 instances with data stored in Amazon EBS volumes and Amazon S3. The recovery time objective (RTO) is 15 minutes, and the recovery point objective (RPO) is 1 hour. Which three strategies would help meet these objectives? (Choose three.)
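
The two objectives in this scenario can be checked with simple arithmetic: the worst-case data loss is the time since the last recoverable copy (compared against the RPO), and the time to restore service is compared against the RTO. The sketch below uses the 15-minute RTO and 1-hour RPO from the question; the `meets_objectives` helper and the candidate backup intervals and restore durations are hypothetical.

```python
from datetime import timedelta

RTO = timedelta(minutes=15)  # from the question
RPO = timedelta(hours=1)     # from the question


def meets_objectives(backup_interval: timedelta,
                     restore_duration: timedelta) -> dict:
    """Compare a candidate strategy against the RPO and RTO.

    Worst-case data loss is taken to be one full backup interval.
    """
    return {
        "rpo_met": backup_interval <= RPO,
        "rto_met": restore_duration <= RTO,
    }


# Hypothetical candidates: (backup interval, time to restore service)
hourly_warm = meets_objectives(timedelta(hours=1), timedelta(minutes=10))
four_hourly = meets_objectives(timedelta(hours=4), timedelta(minutes=10))
slow_restore = meets_objectives(timedelta(minutes=30), timedelta(minutes=45))
print(hourly_warm, four_hourly, slow_restore)
```

A strategy has to satisfy both checks at once: frequent backups do not help if the restore itself takes longer than the RTO.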

Question 260 • medium • multi select

A company is designing a multi-Region disaster recovery (DR) strategy for a stateless web application running on Amazon EC2 instances behind an Application Load Balancer (ALB). The application uses an Amazon RDS for MySQL database as its data store. The architecture must provide rapid failover with the lowest possible Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Which of the following design choices will help achieve these objectives? (Choose four.)

Question 261 • medium • multi select

A solutions architect is designing a highly available and resilient architecture for a critical internal application that processes financial transactions. The application runs on Amazon EC2 instances inside an Auto Scaling group. The database layer uses an Amazon Aurora MySQL cluster. The company requires that if an entire AWS Availability Zone (AZ) fails, the application must remain operational with minimal impact and automatically recover without manual intervention. Which combination of architectural decisions will meet these requirements? (Choose four.)

Question 262 • medium • drag order

Order the steps for setting up a VPC with public and private subnets using a NAT gateway.

Question 263 • medium • drag order

Order the steps to restore an Amazon RDS DB instance from a snapshot.

Question 264 • medium • drag order

Order the steps to create a static website using Amazon S3 and CloudFront.
