Knowledge + Practice

CCNA Reliability and Business Continuity Questions



1

MCQmedium

A company runs an Amazon RDS for MySQL DB instance in us-east-1. The SysOps administrator needs to implement a disaster recovery solution that can recover from a regional outage with a Recovery Point Objective (RPO) of less than 1 second and a Recovery Time Objective (RTO) of less than 1 minute. Which solution should the administrator use?

A.Multi-AZ deployment

B.Cross-region read replica

C.Aurora Global Database

D.Automated snapshot copy to another region

AnswerC

Provides sub-second RPO and minute-level RTO across regions.

Why this answer

Aurora Global Database is the correct choice because it provides a fully managed cross-region replication solution with a typical RPO of less than 1 second and an RTO of less than 1 minute during a regional failover. It uses a primary cluster in one region and up to five secondary clusters in other regions, with asynchronous replication that is optimized for low latency, meeting the stringent RPO/RTO requirements.

Exam trap

The trap here is that candidates often confuse Multi-AZ deployments (which are for high availability within a region) with cross-region disaster recovery, and they underestimate the replication lag and failover time of standard cross-region read replicas versus the optimized architecture of Aurora Global Database.

How to eliminate wrong answers

Option A is wrong because Multi-AZ deployment provides high availability within a single region, not cross-region disaster recovery, and its failover RTO is typically 1-2 minutes, exceeding the required 1 minute. Option B is wrong because a cross-region read replica for RDS MySQL uses asynchronous replication with a typical RPO of seconds to minutes, not less than 1 second, and promoting a read replica to a primary instance can take several minutes, failing the RTO requirement. Option D is wrong because a copied snapshot only holds data as of the time the snapshot was taken, so the RPO is bounded by the snapshot schedule (automated snapshots are taken once a day) rather than by seconds, and restoring from a snapshot can take tens of minutes, both far exceeding the required RPO and RTO.


2

MCQhard

A company has a production DynamoDB table with on-demand capacity. They need to ensure business continuity with a Recovery Point Objective (RPO) of 5 minutes and a Recovery Time Objective (RTO) of 1 hour in case of a regional outage. What is the MOST cost-effective solution?

A.Use AWS Backup to schedule daily backups and restore in another region.

B.Enable DynamoDB global tables for the table.

C.Enable point-in-time recovery (PITR) on the table.

D.Configure cross-region read replicas for the table.

AnswerB

Global tables replicate data across multiple AWS Regions asynchronously, typically within seconds, achieving low RPO. Failover can be automated via Route 53 health checks to meet RTO.

Why this answer

Option B is correct because DynamoDB global tables replicate writes to other Regions asynchronously, typically within seconds, which meets the 5-minute RPO, and failover to the replica Region can be automated well within 1 hour. Option A is wrong because daily backups can lose up to a day of data, far more than the 5-minute RPO allows, and a restore into another Region may not finish within the 1-hour RTO.

Option C is wrong because point-in-time recovery only protects against accidental writes and deletes, not regional outages. Option D is wrong because cross-region read replicas are not available for DynamoDB.


3

MCQmedium

A company runs a critical web application on EC2 instances behind an Application Load Balancer across three Availability Zones. The application stores session data in an RDS MySQL database. To improve reliability, the company wants to ensure that a single Availability Zone failure does not impact the application's availability. Which combination of actions should the SysOps administrator take?

A.Configure the ALB to use only healthy instances and enable detailed CloudWatch metrics.

B.Increase the EC2 instance size to handle more traffic in a single AZ.

C.Increase the Auto Scaling group's desired capacity to a larger number.

D.Deploy RDS in Multi-AZ configuration with automatic failover, and enable cross-zone load balancing on the ALB.

AnswerD

Multi-AZ RDS provides database failover, cross-zone ALB distributes traffic across AZs.

Why this answer

Option D is correct because deploying RDS Multi-AZ with automatic failover ensures that database operations continue during an AZ outage. Enabling cross-zone load balancing ensures the ALB can route traffic to healthy instances in other AZs. Option A is wrong because health checks and detailed metrics alone do not protect the database tier from an AZ failure.

Option B is wrong because increasing instance size does not provide AZ redundancy. Option C is wrong because increasing the Auto Scaling group's desired capacity adds instances but does nothing for the single-AZ database.


4

MCQmedium

A company runs a stateful web application on a single Amazon EC2 instance. The application stores session state in memory and writes critical data to an Amazon EBS volume. The SysOps administrator needs to implement a highly available architecture that can tolerate an Availability Zone (AZ) failure. The administrator plans to use an Auto Scaling group and an Application Load Balancer (ALB). Which combination of steps is required to make the application highly available while preserving session and data durability across AZ failures?

A.Create an AMI of the current instance, configure an Auto Scaling group with a launch template that uses the AMI, and attach the existing EBS volume to new instances.

B.Create a multi-AZ Auto Scaling group and use sticky sessions (session affinity) on the ALB to tie users to specific instances.

C.Use an Auto Scaling group across multiple AZs, migrate session storage to Amazon ElastiCache (multi-AZ), and migrate application data from EBS to Amazon EFS (file system mounted across AZs).

D.Use an Auto Scaling group in a single AZ and use a Multi-AZ RDS instance for data storage.

AnswerC

ElastiCache provides a shared, cross-AZ in-memory session store. EFS provides a shared, cross-AZ file system. The Auto Scaling group launches instances in multiple AZs, and the ALB distributes traffic. This architecture survives an AZ failure.

Why this answer

Option C is correct because it addresses both session state and data durability across AZ failures. Migrating session storage to ElastiCache (multi-AZ) ensures session data survives instance failure, and migrating application data from EBS to EFS provides a shared, multi-AZ file system that persists independently of any single EC2 instance. This combination allows the Auto Scaling group to launch new instances in any AZ and immediately access both session and application data.

Exam trap

The trap here is that candidates often assume sticky sessions (session affinity) alone are sufficient for high availability, but they fail to realize that sticky sessions do not replicate session state across instances, so an instance failure still loses the session data.

How to eliminate wrong answers

Option A is wrong because attaching the existing EBS volume to new instances is not possible across AZs (EBS volumes are AZ-scoped) and does not provide a shared, durable data layer; it also fails to address session state persistence. Option B is wrong because sticky sessions alone do not preserve session data if the instance fails; they only route traffic to the same instance, and if that instance goes down, the session is lost. Option D is wrong because using a single AZ for the Auto Scaling group cannot tolerate an AZ failure, and while Multi-AZ RDS handles database durability, it does not address the application's in-memory session state or EBS-stored data.


5

MCQeasy

A company runs a web application on EC2 instances in an Auto Scaling group. The application is behind an Application Load Balancer. The company wants to ensure that the application can handle a sudden spike in traffic without downtime. What should the SysOps administrator do?

A.Use a scheduled scaling policy to add instances during business hours.

B.Configure a target tracking scaling policy based on average CPU utilization.

C.Reduce the number of Availability Zones to lower latency.

D.Manually increase the desired capacity of the Auto Scaling group when traffic increases.

AnswerB

Target tracking automatically adjusts capacity to maintain a target metric.

Why this answer

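Option B is correct because a target tracking policy is a dynamic scaling policy: it adjusts capacity automatically as demand changes. Option A is wrong because scheduled scaling suits predictable patterns, not sudden spikes. Option D is wrong because manual scaling is not automated and reacts too slowly.

Option C is wrong because reducing the number of Availability Zones reduces availability.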


6

Multi-Selecteasy

A company wants to ensure its SysOps Administrator can recover an EBS volume from a snapshot. Which TWO steps are required? (Choose TWO.)

Select 2 answers

A.Configure a KMS key for the snapshot.

B.Attach the volume to an EC2 instance.

C.Specify the Availability Zone when creating the volume.

D.Create a snapshot of the volume.

E.Enable encryption on the snapshot.

AnswersC, D

Required to create the volume in the correct AZ.

Why this answer

Option D is correct because you must have a snapshot before you can create a volume from it. Option C is correct because when creating a volume from a snapshot, you must specify the Availability Zone. Option E is wrong because encryption is not required to restore a volume from a snapshot.

Option B is wrong because attaching the volume to an instance happens after the volume has been recovered; it is not part of creating the volume from the snapshot. Option A is wrong because KMS keys are optional and only relevant when encryption is used.


7

Multi-Selecthard

A company runs a stateless web application on EC2 instances behind an Application Load Balancer. The SysOps Administrator needs to ensure the application can withstand the loss of an entire Availability Zone. Which THREE steps should be taken? (Choose THREE.)

Select 3 answers

A.Enable cross-zone load balancing on the ALB.

B.Configure the Auto Scaling group to launch instances in at least two Availability Zones.

C.Ensure the ALB is configured to route traffic to all enabled AZs.

D.Configure the Auto Scaling group to use a dynamic scaling policy based on CPU utilization.

E.Use an Elastic IP address for each EC2 instance.

AnswersB, C, D

Ensures instances are distributed across AZs.

Why this answer

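Option B is correct because deploying across multiple AZs ensures that if one AZ fails, the application still runs. Option C is correct because the ALB can then distribute traffic to healthy instances in any enabled AZ. Option D is correct because a dynamic scaling policy lets the Auto Scaling group add capacity in the remaining AZs to absorb the load after an AZ is lost.

Option A is wrong because cross-zone load balancing is already enabled by default on an Application Load Balancer, so it is not an additional step. Option E is wrong because Elastic IP addresses do not add AZ redundancy, and instances behind an ALB do not need them.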


8

Multi-Selectmedium

A company is designing a disaster recovery strategy for a production RDS for MySQL database. The database is currently single-AZ. The recovery point objective (RPO) is 1 hour, and the recovery time objective (RTO) is 15 minutes. Which steps should the SysOps administrator take to meet these requirements? (Choose THREE.)

Select 3 answers

A.Disable automated backups to reduce performance impact.

B.Take manual DB snapshots every hour.

C.Enable automated backups with a retention period of at least 1 day.

D.Modify the DB instance to be Multi-AZ.

E.Create a read replica in a different Availability Zone.

AnswersB, C, D

Manual snapshots provide additional recovery points and can be restored quickly.

Why this answer

Options B, C, and D are correct. Enabling automated backups provides point-in-time recovery with an RPO of about 5 minutes (within the 1-hour requirement). Multi-AZ provides automatic failover with an RTO of typically 1 to 2 minutes (within the 15-minute requirement).

Taking regular manual snapshots complements automated backups for long-term retention. Option E is wrong because read replicas are for read scaling and do not fail over automatically. Option A is wrong because disabling automated backups removes point-in-time recovery and increases the RPO.


9

Multi-Selecthard

A company runs a web application on EC2 instances in an Auto Scaling group. The application uses an Amazon RDS Multi-AZ DB instance. The SysOps administrator notices that during a recent failover test, the application became unresponsive for several minutes. The administrator wants to improve the application's resilience during failover. Which three actions should the administrator take? (Choose THREE.)

Select 3 answers

A.Configure the Application Load Balancer health checks to have a low threshold (e.g., 2 consecutive failures) and a short interval (e.g., 5 seconds).

B.Implement retry logic in the application to handle transient database connection failures.

C.Change the RDS DB instance to use asynchronous replication instead of synchronous replication.

D.Increase the EC2 instance size to handle more connections during failover.

E.Configure an Amazon RDS Proxy in front of the RDS database to pool and share database connections.

AnswersA, B, E

Fast health checks allow the ALB to quickly take failing instances out of service, reducing user impact.

Why this answer

Options A, B, and E are correct. Using an RDS Proxy (E) pools connections and reduces the time the application is affected by a failover. Retry logic (B) allows the application to reconnect after failover.

Health checks with low thresholds (A) help the load balancer quickly detect and route away from unhealthy instances. Option C is wrong because synchronous replication is already used in Multi-AZ; asynchronous replication would reduce durability. Option D is wrong because increasing instance size does not reduce failover time.
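
The retry logic mentioned above could look roughly like the following sketch. It is not tied to any particular database driver: `TransientDBError` and `call_with_retry` are hypothetical names introduced here, and the attempt count and delays are illustrative assumptions, not recommended values.

```python
import random
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver error raised while the database fails over."""


def call_with_retry(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Run operation(), retrying on TransientDBError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDBError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads reconnects so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

In a real application the `operation` callable would open a connection and run the query, and the `except` clause would catch whichever exception types the chosen driver raises for dropped or refused connections.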


10

MCQhard

A company runs a web application on EC2 instances in a private subnet. The application needs to connect to an RDS database in a different VPC. The VPCs are peered. The SysOps Administrator is troubleshooting connectivity issues. The RDS security group allows inbound traffic from the EC2 security group, but connections still fail. What could be the issue?

A.The RDS instance does not have public DNS resolution enabled.

B.The network ACL for the private subnet is blocking inbound traffic.

C.The route tables in each VPC do not have routes to the peered VPC CIDR.

D.The security group outbound rules on the EC2 instance are blocking traffic.

AnswerC

Without proper routes, traffic cannot flow across the VPC peering connection.

Why this answer

Option C is correct because creating a VPC peering connection does not add routes automatically; each VPC's route table needs a route for the peer VPC's CIDR that targets the peering connection. Option D is wrong because security groups allow all outbound traffic by default and are stateful, so return traffic is allowed automatically. Option B is wrong because default network ACLs allow all traffic, and nothing in the scenario suggests a custom ACL; the symptoms point to a routing issue.

Option A is wrong because public DNS resolution is not needed for private connectivity over the peering connection.
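
To illustrate the routing point, here is a small sketch that models a route table as a list of CIDR-to-target entries and picks the longest matching prefix for a destination. The CIDR ranges and the target names (`nat-example`, `pcx-example`) are hypothetical; this is a simplified model, not a call into any AWS API.

```python
import ipaddress


def route_target(route_table, destination_ip):
    """Return the target of the most specific route covering destination_ip, or None."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [r for r in route_table
               if dest in ipaddress.ip_network(r["cidr"])]
    if not matches:
        return None
    # Longest prefix wins, as in VPC route tables.
    best = max(matches, key=lambda r: ipaddress.ip_network(r["cidr"]).prefixlen)
    return best["target"]


# Private subnet in VPC A (10.0.0.0/16); the database lives in VPC B (10.1.0.0/16).
without_peering_route = [
    {"cidr": "10.0.0.0/16", "target": "local"},
    {"cidr": "0.0.0.0/0", "target": "nat-example"},
]
with_peering_route = without_peering_route + [
    {"cidr": "10.1.0.0/16", "target": "pcx-example"},
]

print(route_target(without_peering_route, "10.1.0.5"))  # falls through to the NAT route
print(route_target(with_peering_route, "10.1.0.5"))     # uses the peering connection
```

The same check has to pass in the other direction as well: the route table for the database subnet in VPC B needs a route back to VPC A's CIDR.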


11

MCQhard

A company runs a critical e-commerce application on Amazon ECS with Fargate launch type, fronted by an Application Load Balancer. The application uses an Amazon ElastiCache for Redis cluster for session state and an Amazon RDS for MySQL Multi-AZ database for persistent data. Recently, during a deployment of a new service version, the application became unresponsive for 15 minutes. The SysOps administrator discovered that the deployment updated the task definition with a new environment variable that pointed to an incorrect ElastiCache endpoint. The ECS service was configured with a rolling update, minimum healthy percent of 50%, and maximum percent of 200%. After the deployment, all tasks failed health checks due to a connection timeout to the wrong Redis endpoint. What is the MOST effective way to prevent this issue in future deployments?

A.Configure a CloudWatch alarm that triggers an automatic rollback if the error rate exceeds 10%.

B.Update the ECS service to use a canary deployment by updating one task at a time.

C.Implement a blue/green deployment strategy using AWS CodeDeploy and test the new task definition before shifting traffic.

D.Enable ECS deployment circuit breaker and set the rollback configuration to automatically roll back failed deployments.

AnswerC

Allows pre-production validation.

Why this answer

Option C is correct. Using AWS CodeDeploy with a blue/green deployment strategy allows testing the new task definition with a new target group before switching traffic. If health checks fail, traffic is not shifted.

Option A is wrong because CloudWatch alarms react after errors occur and do not prevent the faulty deployment. Option B is wrong because updating one task first still risks the entire deployment if the variable is wrong. Option D is wrong because the circuit breaker only rolls back after tasks have failed; it does not prevent the impact.


12

MCQhard

A company runs a critical application on an EC2 instance that stores data on an EBS volume. The SysOps administrator needs to implement a backup strategy that provides the ability to restore the volume to a specific point in time within the last 24 hours, with a recovery time objective (RTO) of less than 15 minutes. Which solution meets these requirements?

A.Configure a RAID 1 mirror of the EBS volume across two Availability Zones.

B.Enable automated backups on the EC2 instance.

C.Use AWS Backup to create backup plans for the EBS volume.

D.Schedule EBS snapshots every hour and keep them for 24 hours.

AnswerD

EBS snapshots provide point-in-time recovery and can be restored to a new volume quickly, meeting the RTO.

Why this answer

Option D is correct because EBS snapshots are point-in-time backups that can be used to create a new volume and attach it to an instance within minutes. Option B is wrong because EC2 instances have no automated backup setting for EBS volumes. Option C is wrong because AWS Backup supports EBS snapshots but adds no advantage for RTO; snapshots still need to be restored.

Option A is wrong because RAID 1 mirrors data but does not provide point-in-time recovery.
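
As a rough illustration of what an hourly schedule with 24-hour retention gives you, the sketch below lists the snapshot times that would still be retained and picks the newest one at or before a requested restore time. The function names and the assumption that snapshots are taken exactly on the hour are hypothetical simplifications.

```python
from datetime import datetime, timedelta


def hourly_snapshot_times(now, retention_hours=24):
    """Times of on-the-hour snapshots that are still retained, newest first."""
    latest = now.replace(minute=0, second=0, microsecond=0)
    return [latest - timedelta(hours=h) for h in range(retention_hours)]


def pick_restore_snapshot(snapshot_times, target_time):
    """Newest snapshot taken at or before target_time, or None if none is retained."""
    candidates = [t for t in snapshot_times if t <= target_time]
    return max(candidates) if candidates else None


now = datetime(2024, 1, 2, 10, 30)
snapshots = hourly_snapshot_times(now)
target = datetime(2024, 1, 2, 8, 45)
chosen = pick_restore_snapshot(snapshots, target)
# Writes made between the chosen snapshot and the target time are not recoverable.
print(chosen, target - chosen)
```

Under these assumptions the restore point is never more than one hour older than the requested time, which is the effective granularity of this schedule.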


13

MCQmedium

A company runs a critical production database on Amazon RDS for MySQL with Multi-AZ deployment. The SysOps administrator needs to be automatically notified when a failover event occurs, and also capture the exact time and reason for the failover for compliance purposes. Which AWS service or feature should be used to capture the failover event details with the least operational overhead?

A.Create an Amazon CloudWatch Events rule that matches the 'RDS DB Instance Event' for 'failover' and sends the event to an Amazon SNS topic for notification and logging.

B.Enable detailed monitoring on the RDS instance and stream the logs to Amazon CloudWatch Logs where a metric filter can detect failover patterns.

C.Configure AWS CloudTrail to log all RDS API calls and analyze the logs for the 'Failover' event type.

D.Use AWS Config to create a config rule that evaluates whether the 'DBInstanceStatus' changes to 'failover' and then trigger a remediation action.

AnswerA

This correctly uses CloudWatch Events / EventBridge to capture RDS events including failovers with detailed information such as time and cause. It requires minimal setup and is designed for this purpose.

Why this answer

Amazon CloudWatch Events (now part of Amazon EventBridge) can match RDS DB Instance events, including 'failover', and route them to an SNS topic for notification and to CloudWatch Logs for logging. This approach requires no custom scripting or polling, providing the least operational overhead while capturing the exact time and reason for the failover directly from the RDS event stream.
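
A rule of this kind is driven by an event pattern. The sketch below writes such a pattern as a Python dictionary and checks it against a sample event with a deliberately simplified matcher. The field names follow the usual shape of RDS events as delivered to EventBridge, but treat them, the sample event, and the `matches` helper as assumptions to verify against the current documentation; the real service supports many more matching operators.

```python
FAILOVER_PATTERN = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Instance Event"],
    "detail": {"EventCategories": ["failover"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present; a list means 'any of'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # An array in the event matches if any of its elements is allowed.
            actual_values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in actual_values):
                return False
    return True


# Hypothetical event; identifiers and message text are made up for illustration.
SAMPLE_EVENT = {
    "source": "aws.rds",
    "detail-type": "RDS DB Instance Event",
    "time": "2024-01-02T10:30:00Z",
    "detail": {
        "EventCategories": ["failover"],
        "SourceIdentifier": "example-db",
        "Message": "Multi-AZ instance failover started.",
    },
}

print(matches(FAILOVER_PATTERN, SAMPLE_EVENT))
```

The event's `time` field and the `Message` in `detail` are what would satisfy the compliance requirement for the time and reason of the failover.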

Exam trap

The trap here is that candidates confuse CloudTrail (which logs API calls) with RDS events (which log internal service events), leading them to choose CloudTrail even though automatic failovers are not API-driven and thus not recorded by CloudTrail.

How to eliminate wrong answers

Option B is wrong because detailed monitoring on RDS provides enhanced metrics (e.g., CPU, memory) but does not generate failover events or detect failover patterns; metric filters on CloudWatch Logs would require RDS to log failover details to CloudWatch Logs, which RDS does not do by default. Option C is wrong because AWS CloudTrail logs API calls (for example, a RebootDBInstance call with forced failover), not internal failover events triggered by AWS; an automatic Multi-AZ failover is not an API call, so CloudTrail will not capture it. Option D is wrong because AWS Config evaluates resource configuration changes (e.g., DBInstanceStatus) but does not natively detect a 'failover' status change; the DBInstanceStatus transitions through multiple states (e.g., 'creating', 'available', 'resetting-master-credentials') and 'failover' is not a valid status, so Config rules would require custom logic and still not capture the exact reason for the failover.


14

MCQeasy

A company uses Amazon S3 to store critical data. The SysOps administrator needs to protect against accidental deletion of objects. Which combination of actions should the administrator take? (Choose the best answer.)

A.Apply a bucket policy that denies s3:DeleteObject for all principals.

B.Set a lifecycle policy to expire objects after 30 days.

C.Enable S3 Versioning and MFA Delete on the bucket.

D.Configure cross-Region replication to a different bucket.

AnswerC

Versioning preserves overwrites and deletes; MFA Delete adds extra protection.

Why this answer

Option C is correct: S3 Versioning preserves deleted and overwritten objects, and MFA Delete requires additional authentication for permanent deletions. Option A is incorrect because denying s3:DeleteObject for all principals blocks legitimate deletions as well and does nothing to protect against accidental overwrites. Option D is incorrect because replication does not protect against deletion; it copies objects to another bucket.

Option B is incorrect because lifecycle policies can delete objects automatically, increasing risk.


15

MCQmedium

A company runs a file-sharing application on AWS. Users upload files to an S3 bucket, which triggers a Lambda function to process the files and store metadata in a DynamoDB table. Recently, users have reported that some uploaded files are never processed. The SysOps Administrator checks the CloudWatch logs and finds no errors from the Lambda function. The S3 bucket is configured to send events to the Lambda function. The DynamoDB table has sufficient write capacity. The administrator suspects that the event notifications are being lost. Which action should the SysOps Administrator take to ensure that every file upload triggers a Lambda function and that the function processes the file successfully?

A.Configure an SQS queue as the event destination for the S3 bucket, and have the Lambda function process messages from the queue.

B.Use DynamoDB Streams to capture file metadata changes instead of Lambda invocation.

C.Increase the Lambda function's reserved concurrency to handle more invocations.

D.Increase the write capacity of the DynamoDB table to avoid throttling.

AnswerA

SQS provides reliable message delivery and retries.

Why this answer

Option A is correct because using SQS as the event destination provides a durable queue in which messages stay until they are processed successfully. The Lambda function can poll the queue and process messages reliably. Option C is wrong because adding more Lambda concurrency does not solve event loss.

Option B is wrong because DynamoDB Streams are not triggered by S3 events. Option D is wrong because increasing DynamoDB capacity does not address event notification issues.
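
A minimal sketch of a Lambda handler for this arrangement is shown below. It assumes each SQS message body carries a standard S3 event notification and that partial batch responses are enabled on the event source mapping, so only failed messages return to the queue. `process_file` is a hypothetical placeholder for the real processing and metadata write.

```python
import json
from urllib.parse import unquote_plus


def process_file(bucket, key):
    """Hypothetical placeholder for the file processing and DynamoDB write."""
    if not bucket or not key:
        raise ValueError("missing bucket or key")


def handler(event, context):
    """Process S3 notifications delivered through SQS; report failed messages."""
    failures = []
    for message in event.get("Records", []):
        try:
            body = json.loads(message["body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                # Object keys arrive URL-encoded in S3 event notifications.
                key = unquote_plus(record["s3"]["object"]["key"])
                process_file(bucket, key)
        except Exception:
            # Leave this message on the queue so it is retried later.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}
```

Pairing the queue with a dead-letter queue would keep messages that fail repeatedly available for inspection instead of retrying them indefinitely.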


16

MCQmedium

A company has a production Amazon RDS for MySQL DB instance in a single Availability Zone. The SysOps administrator needs to improve database availability to ensure automatic failover in the event of a database failure or an Availability Zone outage. Which configuration should the administrator enable?

A.Enable Multi-AZ deployment

B.Create a read replica in another Availability Zone

C.Enable automated backups

D.Change the DB instance to a larger instance class

AnswerA

Multi-AZ provides a standby instance in another AZ with automatic failover, meeting the high availability requirement.

Why this answer

Enabling a Multi-AZ deployment for Amazon RDS for MySQL automatically provisions and maintains a synchronous standby replica in a different Availability Zone. In the event of a database failure or an AZ outage, Amazon RDS automatically fails over to the standby replica, typically within 60–120 seconds, without requiring manual intervention. This configuration meets the requirement for automatic failover and improved availability.

Exam trap

The trap here is that candidates often confuse a read replica with a Multi-AZ standby, mistakenly believing that a read replica can provide automatic failover, but read replicas require manual promotion and do not maintain synchronous replication.

How to eliminate wrong answers

Option B is wrong because a read replica is an asynchronous copy used for offloading read traffic, not for automatic failover; while it can be promoted to a standalone instance, this requires manual action and does not provide automatic failover. Option C is wrong because automated backups only enable point-in-time recovery and do not provide any failover capability or high availability. Option D is wrong because changing the DB instance to a larger instance class improves performance and scalability but does not provide redundancy or automatic failover across Availability Zones.


17

MCQeasy

A company is designing a disaster recovery plan for its on-premises database. They need to replicate the database to AWS with low latency. Which AWS service should they use?

A.Amazon S3 with Cross-Region Replication.

B.AWS Storage Gateway with volume gateway.

C.AWS Database Migration Service (DMS) with ongoing replication.

D.AWS Direct Connect to establish a dedicated network connection.

AnswerC

DMS can replicate databases continuously to RDS.

Why this answer

Option C is correct because AWS Database Migration Service (DMS) supports continuous replication from on-premises databases to Amazon RDS. Option A is wrong because S3 is object storage, not a database replication service. Option D is wrong because Direct Connect is a network connection, not a replication service.

Option B is wrong because Storage Gateway is for file and volume storage, not database replication.


18

MCQeasy

A company uses Amazon Route 53 for DNS resolution. The company wants to ensure that if a web server becomes unhealthy, traffic is automatically routed to a healthy server in another Availability Zone. Which routing policy should be used?

A.Latency routing policy

B.Weighted routing policy

C.Geolocation routing policy

D.Failover routing policy

AnswerD

Routes to primary unless unhealthy, then to secondary.

Why this answer

Option D is correct because the failover routing policy directs traffic to a primary resource and, if it is unhealthy, to a secondary resource. Option B is wrong because weighted routing distributes traffic proportionally, not based on a primary/secondary relationship. Option A is wrong because latency routing selects the endpoint with the lowest latency rather than failing over from a primary.

Option C is wrong because geolocation routes based on the user's geographic location.


19

MCQmedium

A SysOps administrator needs to implement a backup strategy for an Amazon RDS for PostgreSQL database. The database is 500 GB and experiences heavy write traffic. Which solution provides the most cost-effective backup with the least impact on database performance?

A.Enable automated backups with a retention period of 7 days.

B.Create a Multi-AZ deployment and use the standby for backups.

C.Use AWS Database Migration Service to continuously replicate data to an S3 bucket.

D.Take manual DB snapshots daily during off-peak hours.

AnswerA

Automated backups have minimal performance impact and provide point-in-time recovery.

Why this answer

Option A is correct because automated backups are enabled by default with minimal performance impact and include transaction logs for point-in-time recovery. Option D is wrong because manual snapshots can cause a brief I/O suspension on a Single-AZ instance and are less automated. Option B is wrong because a Multi-AZ standby provides high availability, not backups, and it adds the cost of a second instance.

Option C is wrong because replicating to S3 via AWS DMS incurs additional cost and complexity.


20

Multi-Selectmedium

A company is designing a disaster recovery plan for its critical applications. The plan must minimize data loss and recovery time. Which TWO measures should the SysOps administrator implement?

Select 2 answers

A.Perform regular backups to Amazon S3.

B.Set a recovery time objective (RTO) of 24 hours.

C.Use manual procedures to restore from backups.

D.Run all workloads in a single AWS region.

E.Replicate data to another AWS region.

AnswersA, E

Backups provide data recovery.

Why this answer

Options A and E are correct. Regular backups to S3 provide data durability, and multi-region replication ensures availability in another region. Option D is wrong because a single region does not protect against region failure.

Option C is wrong because manual restore procedures are slow. Option B is wrong because setting a 24-hour RTO is a target, not a measure, and it does not minimize recovery time.


21

MCQhard

A company has a production AWS account with multiple VPCs connected via a transit gateway. The security team requires that all cross-VPC traffic be inspected by a centralized network firewall appliance. The firewall is deployed in a dedicated inspection VPC. The SysOps administrator must ensure that traffic from VPC A to VPC B is routed through the inspection VPC. Which configuration achieves this?

A.Create a VPC peering connection between VPC A and VPC B, and use route tables to send traffic through the inspection VPC.

B.Configure the transit gateway route tables to propagate a blackhole route for VPC A and VPC B CIDRs to the inspection VPC attachment, then have the firewall forward traffic.

C.Use security groups in the inspection VPC to filter traffic between VPC A and VPC B.

D.Attach a NAT gateway in the inspection VPC and configure route tables in VPC A and B to point to the NAT gateway.

AnswerB

TGW route tables can force traffic through inspection VPC.

Why this answer

Option B is correct because transit gateway route tables control where traffic from each attachment is sent; directing traffic for the VPC A and VPC B CIDRs to the inspection VPC attachment forces it through the firewall, which then forwards it. Option D is wrong because NAT gateways do not inspect traffic. Option A is wrong because VPC peering does not route through a central inspection point.

Option C is wrong because security groups are stateful instance-level filters and are not designed for centralized inspection.


22

MCQmedium

A company is running a web application on EC2 instances behind an Application Load Balancer. They want to ensure that if an entire Availability Zone fails, the application remains available. Which configuration should they implement?

A.Configure the Auto Scaling group to launch instances in multiple Availability Zones.

B.Use an RDS Multi-AZ deployment for the application.

C.Use a larger EC2 instance type.

D.Enable detailed monitoring on the EC2 instances.

AnswerA

Distributing instances across AZs ensures availability if one AZ fails.

Why this answer

Option A is correct because deploying EC2 instances across multiple Availability Zones (AZs) ensures that if one AZ fails, traffic can be routed to healthy instances in other AZs. Option B is wrong because an RDS Multi-AZ deployment protects the database tier, not the EC2 instances. Option C is wrong because a larger instance type improves performance but not availability.

Option D is wrong because detailed monitoring only increases metric granularity; it adds no redundancy.
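
Spreading instances across AZs only helps if the surviving AZs can carry the load. As a back-of-the-envelope sketch (the function name and the even-distribution assumption are introduced here for illustration), the per-AZ count needed to keep a required number of instances after losing one AZ can be estimated like this:

```python
import math


def instances_per_az(required, az_count):
    """Instances to run in each AZ so that losing one AZ still leaves `required`."""
    if az_count < 2:
        raise ValueError("at least two Availability Zones are needed")
    # The remaining (az_count - 1) zones must hold the full required capacity.
    return math.ceil(required / (az_count - 1))


for zones in (2, 3):
    per_az = instances_per_az(6, zones)
    print(zones, "AZs:", per_az, "per AZ,", per_az * zones, "in total")
```

With more AZs the same resilience needs less spare capacity, because the load lost with one zone is shared among more survivors.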

23

Multi-Selectmedium

A SysOps administrator is designing a highly available architecture for a web application using an Application Load Balancer (ALB) with EC2 instances in an Auto Scaling group. Which TWO configurations are required to ensure high availability? (Choose TWO.)

Select 2 answers

A.Launch all EC2 instances in a single Availability Zone to reduce latency

B.Use t2.micro instances to reduce cost

C.Configure the ALB with health checks for the target group

D.Disable health checks to reduce load on the ALB

E.Configure the Auto Scaling group to launch instances in at least two Availability Zones

AnswersC, E

Health checks ensure traffic is only sent to healthy instances.

Why this answer

Correct answers are C and E. Spreading instances across at least two Availability Zones ensures that a zone failure does not take down the application. Configuring the ALB with health checks ensures that unhealthy instances are removed from rotation.

Option A is wrong because placing all instances in the same AZ creates a single point of failure. Option B is wrong because using t2.micro instances is a cost decision, not a high availability consideration. Option D is wrong because disabling health checks prevents the ALB from detecting failures.

24

MCQeasy

A SysOps administrator needs to ensure that an EC2 instance automatically recovers from an underlying hardware failure. Which action should be taken?

A.Launch a second instance in a different Availability Zone.

B.Assign an Elastic IP address to the instance.

C.Create a CloudWatch alarm on the StatusCheckFailed metric and configure the recovery action.

D.Place the instance in an Auto Scaling group with a min size of 1.

AnswerC

CloudWatch alarm can initiate instance recovery.

Why this answer

Option C is correct because a CloudWatch alarm on the instance status check, configured with the recover action, automatically moves the instance to new underlying hardware while keeping its instance ID, private IP addresses, and Elastic IP addresses. The recover action is tied to the system status check (StatusCheckFailed_System). Option D is wrong because an Auto Scaling group maintains instance counts by launching replacements; it does not recover the specific instance. Option B is wrong because assigning an Elastic IP address does not recover the instance.

Option A is wrong because a second instance does not automatically recover the original.
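
As a rough sketch of the alarm described above, the parameters could be assembled as follows. The keys mirror the arguments of the CloudWatch PutMetricAlarm API; the helper name `build_recovery_alarm`, the instance ID, the Region, and the period and evaluation settings are illustrative assumptions, not values from the question.

```python
def build_recovery_alarm(instance_id: str, region: str) -> dict:
    """Return PutMetricAlarm-style parameters for EC2 auto recovery (hypothetical helper)."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        # System status check: reports problems with the underlying host.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,              # assumed: evaluate once per minute
        "EvaluationPeriods": 2,    # assumed: two failing minutes in a row
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 recover action for the given Region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


params = build_recovery_alarm("i-0123456789abcdef0", "us-east-1")
print(params["AlarmActions"][0])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as keyword arguments to the CloudWatch client's `put_metric_alarm` call.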

25

MCQmedium

A company runs a web application on EC2 instances behind an Application Load Balancer. The database is an RDS MySQL instance with Multi-AZ enabled. The application experiences intermittent 5xx errors that correlate with database failover events. What is the MOST likely cause and solution?

A.Configure the application to use the RDS reader endpoint.

B.Use a read replica to offload read traffic and reduce failover impact.

C.Increase the database connection pool size to handle retries.

D.Enable DNS caching with a low TTL in the application and use the RDS instance endpoint with a retry mechanism.

AnswerD

Low TTL and retries ensure the application reconnects to the new primary quickly after failover.

Why this answer

Option D is correct because during a database failover, the DNS record for the RDS endpoint is updated to point to the new primary, and an application that caches the old DNS resolution keeps trying to connect to the old primary. Honoring a low TTL forces the application to re-resolve DNS quickly, and a retry mechanism lets it reconnect once the new primary is available. Option A is wrong because a reader endpoint serves read traffic and does not address write failures during failover.

Option B is wrong because the issue is DNS caching, not missing read replicas; Multi-AZ already provides a standby. Option C is wrong because increasing connection pool size does not handle DNS changes.
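
The reconnect behavior can be sketched as a retry loop that looks up the hostname again on every attempt instead of reusing a stored IP address. This is a minimal illustration, not a database driver: `connect_with_retry`, the injected `connect` and `resolve` callables, and the backoff values are all assumptions made for the example.

```python
import socket
import time


def connect_with_retry(host, port, connect, resolve=None, attempts=5, delay=1.0):
    """Re-resolve the hostname on every attempt so a failover DNS change is picked up."""
    if resolve is None:
        # Default resolver: first address returned by the OS for host:port.
        resolve = lambda h: socket.getaddrinfo(h, port)[0][4][0]
    last_error = None
    for attempt in range(attempts):
        ip = resolve(host)  # fresh lookup each time, never a stored IP
        try:
            return connect(ip, port)
        except OSError as exc:
            last_error = exc
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise ConnectionError(f"could not reach {host}") from last_error
```

The operating system or language runtime may still cache lookups below this layer, so the effective TTL depends on the environment as well as on the code.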

26

Multi-Selecteasy

Which TWO actions should a SysOps administrator take to ensure high availability of a web application running on EC2 instances? (Choose two.)

Select 2 answers

A.Enable termination protection on all EC2 instances.

B.Launch all EC2 instances in a single Availability Zone.

C.Use a larger instance type for all EC2 instances.

D.Configure an Auto Scaling group with a health check to replace unhealthy instances.

E.Deploy EC2 instances across multiple Availability Zones.

AnswersD, E

Auto Scaling automatically replaces unhealthy instances.

Why this answer

Options D and E are correct. Deploying across multiple Availability Zones ensures that an AZ failure does not take down the entire application. Using an Auto Scaling group with a health check automatically replaces unhealthy instances.

Option A is incorrect because termination protection does not automatically replace instances; it only prevents accidental termination. Option B is incorrect because a single AZ is a single point of failure. Option C is incorrect because a larger instance type does not provide high availability.

27

MCQhard

A company uses Amazon MQ (RabbitMQ) for messaging between microservices. The SysOps administrator needs to ensure the message broker is highly available with automatic failover and no data loss. Which deployment mode should be used?

A.Single-instance broker

B.Active/standby broker

C.Cluster deployment

D.Multi-AZ broker with read replicas

AnswerC

A cluster deployment runs RabbitMQ broker nodes across multiple Availability Zones with automatic failover, meeting the high availability requirement.

Why this answer

Amazon MQ for RabbitMQ offers two deployment modes: single-instance broker and cluster deployment. A cluster deployment is a logical grouping of three RabbitMQ broker nodes spread across multiple Availability Zones behind a single endpoint. If one node fails, the remaining nodes keep serving clients, and queues that are replicated across nodes (for example, quorum queues holding persistent messages) keep their messages. This meets the high availability and data durability requirements specified in the question.

Exam trap

The trap here is that candidates confuse the active/standby mode, which belongs to Amazon MQ for ActiveMQ, with the deployment modes available for RabbitMQ, or they incorrectly apply RDS Multi-AZ concepts to Amazon MQ.

How to eliminate wrong answers

Option A is wrong because a single-instance broker has no redundancy or automatic failover; if it fails, the broker is unavailable until it recovers. Option B is wrong because active/standby is a deployment mode of Amazon MQ for ActiveMQ, not of Amazon MQ for RabbitMQ. Option D is wrong because Amazon MQ does not support Multi-AZ brokers with read replicas; that concept applies to Amazon RDS, not to message brokers.

28

MCQmedium

A company runs a critical application on Amazon EC2 instances with data stored on Amazon EBS volumes. The SysOps administrator needs to implement a backup strategy that supports point-in-time recovery with a Recovery Point Objective (RPO) of 1 hour and a Recovery Time Objective (RTO) of 4 hours. Which solution meets these requirements with the least operational overhead?

A.Use AWS Backup to schedule hourly EBS snapshots and restore to a new volume when needed.

B.Use Amazon Data Lifecycle Manager (DLM) to take hourly snapshots and create an AWS CloudFormation template to launch a new instance from the snapshot.

C.Use custom scripts to copy snapshots to an Amazon S3 bucket and restore from there.

D.Use Amazon S3 Lifecycle policies to transition data to Amazon S3 Glacier.

AnswerA

AWS Backup provides a centralized way to define backup policies (including hourly schedules) and automates retention. Restoring from a snapshot is straightforward and can be done within the RTO.

Why this answer

AWS Backup provides a fully managed, policy-based backup service that can schedule EBS snapshots hourly, meeting the 1-hour RPO. Restoring from an AWS Backup snapshot to a new EBS volume and attaching it to an EC2 instance can be completed within the 4-hour RTO, with minimal operational overhead as it eliminates the need for custom scripts or lifecycle management.

Exam trap

The trap here is that candidates may choose DLM (Option B) because it can schedule snapshots, but they overlook the operational overhead of manually creating a CloudFormation template for recovery, whereas AWS Backup provides a fully managed restore workflow that meets the least operational overhead requirement.

How to eliminate wrong answers

Option B is wrong because Amazon Data Lifecycle Manager (DLM) can schedule hourly snapshots, but requiring a CloudFormation template to launch a new instance from the snapshot adds unnecessary operational overhead and complexity, whereas AWS Backup can directly restore the volume and instance. Option C is wrong because using custom scripts to copy snapshots to S3 introduces additional complexity, potential for errors, and does not leverage native AWS backup services, increasing operational overhead. Option D is wrong because Amazon S3 Lifecycle policies are designed for object lifecycle management in S3, not for EBS snapshots or point-in-time recovery of EC2 instances, and S3 Glacier is for archival, not rapid recovery with a 4-hour RTO.
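
An hourly backup policy of the kind described above might be expressed as the following plan document. The keys follow the AWS Backup CreateBackupPlan request shape; the helper name `hourly_backup_plan`, the plan and rule names, the vault name, and the retention period are illustrative assumptions.

```python
def hourly_backup_plan(vault_name: str, retention_days: int) -> dict:
    """Return a CreateBackupPlan-style document with one hourly rule (hypothetical helper)."""
    return {
        "BackupPlanName": "hourly-ebs",  # assumed name
        "Rules": [
            {
                "RuleName": "hourly",
                "TargetBackupVaultName": vault_name,
                # AWS cron format: minute hour day-of-month month day-of-week year.
                # Runs at minute 0 of every hour (UTC).
                "ScheduleExpression": "cron(0 * * * ? *)",
                "Lifecycle": {"DeleteAfterDays": retention_days},
            }
        ],
    }


plan = hourly_backup_plan("Default", 7)
print(plan["Rules"][0]["ScheduleExpression"])
```

An hourly schedule bounds the gap between recovery points at roughly one hour, which is what the 1-hour RPO requires; the actual start time of each job can drift within the rule's start window.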

29

Multi-Selectmedium

A company runs a critical application on Amazon EC2 instances in an Auto Scaling group. The application stores data on an Amazon EBS volume. The SysOps administrator needs to implement a backup strategy that ensures data can be recovered in the event of an AZ failure. Which TWO actions should be taken? (Choose TWO.)

Select 2 answers

A.Increase the EBS volume size to maximize I/O performance.

B.Configure automated snapshots using Amazon Data Lifecycle Manager.

C.Create a lifecycle policy to automatically take snapshots of the EBS volume and copy them to another region.

D.Enable EBS encryption using AWS KMS.

E.Enable EBS Multi-Attach to allow the volume to be attached to instances in another AZ.

AnswersB, C

DLM automates snapshot creation and retention.

Why this answer

Options B and C are correct. Option B: Automating snapshots with Amazon Data Lifecycle Manager ensures regular backups, and because snapshots are stored at the Region level they remain available if the volume's AZ fails. Option C: Copying snapshots to another Region adds protection beyond the original AZ.

Option A is wrong because increasing volume size does not provide backup. Option D is wrong because EBS encryption protects data at rest but not against failure. Option E is wrong because Multi-Attach works only within a single AZ and does not provide backup.

30

MCQmedium

A company runs a stateless web application on Amazon EC2 instances in an Auto Scaling group with a minimum of 2 and maximum of 10 instances. The instances are behind an Application Load Balancer (ALB). The SysOps administrator needs to ensure that the application can survive the failure of an entire AWS Availability Zone (AZ) in the region. Which configuration is necessary?

A.Configure the Auto Scaling group with subnets in at least two Availability Zones and ensure the ALB has subnets in the same AZs.

B.Increase the Auto Scaling group minimum to 10 instances to absorb the failure.

C.Use larger instance types to handle the load of a failed AZ.

D.Use multiple Application Load Balancers in different AZs.

AnswerA

This distributes instances across multiple AZs, so if one AZ fails, the other AZ continues serving traffic.

Why this answer

Option A is correct because deploying the Auto Scaling group across multiple Availability Zones (AZs) and ensuring the ALB has subnets in the same AZs allows the application to continue serving traffic even if one entire AZ fails. The ALB can route requests to healthy instances in the remaining AZs, and the Auto Scaling group will replace failed instances in other AZs as needed, maintaining the minimum instance count. This architecture is a fundamental pattern for high availability in AWS.

Exam trap

The trap here is that candidates often think increasing instance count or size alone provides high availability, but without multi-AZ distribution, a single AZ failure can still cause total application downtime.

How to eliminate wrong answers

Option B is wrong because simply increasing the minimum to 10 instances does not provide AZ resilience; all instances could still be in a single AZ, and a failure of that AZ would take down all 10 instances. Option C is wrong because using larger instance types only increases compute capacity per instance, but does not distribute instances across AZs; a single AZ failure would still eliminate all instances if they are all in that AZ. Option D is wrong because using multiple ALBs in different AZs is unnecessary and adds complexity; a single ALB can already distribute traffic across multiple AZs, and multiple ALBs would require additional DNS routing logic (e.g., Route 53) and do not inherently improve AZ failure survival.
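
The difference between adding instances and distributing them can be shown with a small simulation. It is a toy model only: the instance IDs, AZ names, and the helper `surviving_instances` are made up for the example.

```python
def surviving_instances(placement: dict[str, str], failed_az: str) -> list[str]:
    """Return the instance IDs that are still running after one AZ fails."""
    return [iid for iid, az in placement.items() if az != failed_az]


# Ten instances in one AZ versus two instances spread over two AZs.
single_az = {f"i-{n}": "us-east-1a" for n in range(10)}
multi_az = {"i-a": "us-east-1a", "i-b": "us-east-1b"}

print(surviving_instances(single_az, "us-east-1a"))  # []
print(surviving_instances(multi_az, "us-east-1a"))   # ['i-b']
```

Ten instances in a single AZ leave nothing running when that AZ fails, while two instances in two AZs leave one to serve traffic until Auto Scaling launches replacements.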

31

MCQhard

A company is running a stateful web application on a single EC2 instance in a public subnet. The instance stores user sessions locally. The company wants to improve availability without rewriting the application. Which design should they use?

A.Create a second EC2 instance in a different AZ and use Route 53 with health checks.

B.Use an Auto Scaling group across multiple AZs but keep sessions on instance.

C.Deploy an Application Load Balancer across multiple AZs, move session storage to ElastiCache, and use an Auto Scaling group.

D.Use an Application Load Balancer with sticky sessions and an Auto Scaling group in a single AZ.

AnswerC

ElastiCache externalizes session state; ALB and Auto Scaling provide high availability.

Why this answer

Option C is correct because deploying an Application Load Balancer across multiple AZs, storing sessions in ElastiCache, and running the instances in an Auto Scaling group makes the instances stateless and highly available without rewriting the application. Option A is wrong because Route 53 health checks alone do not handle session state. Option D is wrong because an Application Load Balancer with sticky sessions still ties sessions to instances, and a single AZ remains a single point of failure.

Option B is wrong because an Auto Scaling group without session externalization will lose sessions whenever an instance is replaced.

32

MCQmedium

A team of developers is deploying a new microservice that uses Amazon DynamoDB as its data store. The SysOps administrator must ensure that the application can handle a sudden spike in read traffic without throttling. Which DynamoDB feature can be used to automatically handle increases in read capacity?

A.DynamoDB Global Tables

B.DynamoDB Time to Live (TTL)

C.DynamoDB Auto Scaling

D.DynamoDB Accelerator (DAX)

AnswerC

Auto scaling adjusts capacity automatically.

Why this answer

Option C is correct because DynamoDB auto scaling adjusts provisioned read capacity based on traffic. Option D is wrong because DynamoDB Accelerator (DAX) is a caching layer, not capacity scaling. Option A is wrong because global tables are for multi-region replication.

Option B is wrong because TTL is for item expiration.
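
The capacity adjustment can be approximated with a short calculation. This is a simplified model of target tracking, assuming the service aims to keep consumed capacity at a target percentage of provisioned capacity; the function name `desired_read_capacity` and all numbers are illustrative.

```python
import math


def desired_read_capacity(consumed_rcu: int, target_percent: int,
                          min_rcu: int, max_rcu: int) -> int:
    """Provisioned RCUs needed so that consumption sits at the target utilization."""
    wanted = math.ceil(consumed_rcu * 100 / target_percent)
    # Auto scaling never goes outside the configured minimum and maximum.
    return max(min_rcu, min(max_rcu, wanted))


print(desired_read_capacity(700, 70, 100, 4000))  # 1000
```

The real service reacts to CloudWatch alarms rather than recomputing instantly, so a very sudden spike can still be throttled briefly before capacity catches up.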

33

MCQhard

A company uses S3 to store critical data. They need to ensure that data can be recovered in the event of accidental deletion or overwriting by users. Which combination of actions should they take?

A.Enable S3 Cross-Region Replication and S3 Transfer Acceleration.

B.Enable S3 Versioning and S3 Transfer Acceleration.

C.Enable S3 Versioning and MFA Delete.

D.Enable S3 Server Access Logging and S3 Object Lock.

AnswerC

Versioning retains previous versions; MFA Delete prevents unauthorized deletions.

Why this answer

Option C is correct because enabling S3 Versioning allows recovery of previous versions, and enabling MFA Delete adds an extra layer of protection. Option A is wrong because S3 Cross-Region Replication does not prevent deletion. Option B is wrong because S3 Transfer Acceleration is for speed.

Option D is wrong because logging does not prevent deletion.
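
How versioning makes deletes and overwrites recoverable can be illustrated with a toy in-memory model. It is not the S3 API: the class `VersionedBucket` and its methods are invented for this sketch, and a delete is modeled as a marker placed on top of the existing versions.

```python
class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the real S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of bodies; None stands for a delete marker

    def put(self, key, body):
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A delete only adds a marker; older versions are kept.
        self._versions.setdefault(key, []).append(None)

    def get(self, key):
        versions = self._versions.get(key, [])
        return versions[-1] if versions else None

    def restore_previous(self, key):
        """Drop the newest entry (overwrite or delete marker) to expose the prior version."""
        self._versions[key].pop()
        return self.get(key)


bucket = VersionedBucket()
bucket.put("report.csv", "v1")
bucket.put("report.csv", "v2")   # accidental overwrite
bucket.delete("report.csv")      # accidental delete
print(bucket.get("report.csv"))               # None
print(bucket.restore_previous("report.csv"))  # v2
print(bucket.restore_previous("report.csv"))  # v1
```

MFA Delete then protects the step this model leaves open: permanently removing a specific version requires an additional authentication factor.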

34

MCQmedium

A SysOps administrator is designing a disaster recovery plan for a web application that runs on EC2 instances with data stored in an RDS MySQL database. The application requires a Recovery Point Objective (RPO) of 5 minutes and a Recovery Time Objective (RTO) of 1 hour. Which solution meets these requirements most cost-effectively?

A.Use a single RDS instance with daily snapshots and EC2 instance store.

B.Use EC2 Auto Scaling across Regions with an RDS standby instance.

C.Use RDS Multi-AZ with read replicas in another Region.

D.Use RDS Multi-AZ with automated backups and EC2 AMI backups.

AnswerD

Multi-AZ provides quick failover; AMI backups allow fast instance recovery.

Why this answer

Option D is correct: RDS Multi-AZ with automated backups provides synchronous standby replication and point-in-time recovery, meeting the RPO of 5 minutes. AMI backups of the EC2 instances can be used to launch replacements quickly, meeting the RTO of 1 hour. Option C is incorrect because read replicas are not used for failover by default; promoting a read replica takes time, and a replica in another Region adds cost.

Option B is incorrect because standby EC2 capacity in another Region incurs higher costs and complexity. Option A is incorrect because a single RDS instance is a single point of failure, and daily snapshots cannot meet a 5-minute RPO.

35

Multi-Selectmedium

A company is designing a disaster recovery strategy for its primary AWS region. The application runs on EC2 instances with an RDS database. The RPO is 15 minutes and RTO is 2 hours. Which TWO actions should the SysOps Administrator take to meet these requirements? (Choose TWO.)

Select 2 answers

A.Create AMIs of the EC2 instances and share them with the DR region.

B.Configure RDS cross-region replication to a second region.

C.Take manual RDS snapshots every 15 minutes and copy them to the DR region.

D.Use AWS Backup to copy backups to a second region daily.

E.Store CloudFormation templates in S3 with cross-region replication.

AnswersB, E

Provides near real-time replication, meeting RPO.

Why this answer

Option B is correct because RDS cross-region replication can achieve an RPO of seconds to minutes. Option E is correct because using CloudFormation with a template stored in S3 and replicated across regions allows quick provisioning of resources in the DR region, meeting the RTO. Option A is wrong because AMIs are region-specific; they must be copied to the DR region, not shared.

Option C is wrong because manual snapshots cannot reliably achieve a 15-minute RPO. Option D is wrong because a daily copy does not meet the 15-minute RPO.

36

Multi-Selecthard

A company runs a stateless web application on EC2 instances behind an Application Load Balancer. The application is deployed in an Auto Scaling group with a minimum of 2 and maximum of 10 instances. During a traffic spike, the Auto Scaling group launches new instances, but the new instances are immediately marked as unhealthy by the ALB and terminated. What could be the cause? (Choose TWO.)

Select 2 answers

A.The health check path is misconfigured.

B.The Auto Scaling group does not have sufficient capacity in the target AZ.

C.The instances do not have the required IAM role to register with the ALB.

D.The security group for the instances does not allow inbound traffic from the ALB.

E.The instances are launched with a larger instance type than expected.

AnswersA, D

A misconfigured health check path causes the ALB to consider instances unhealthy.

Why this answer

Options A and D are correct. If the health check path is incorrect, the ALB marks instances as unhealthy. If the security group does not allow traffic from the ALB, the health check fails.

Option B is wrong because insufficient capacity would prevent launch, not cause health check failures. Option C is wrong because instances do not need an IAM role to register with the ALB, so a missing role does not cause health check failures. Option E is wrong because a larger instance would not cause health check failures.

37

MCQmedium

A company runs a critical web application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer (ALB). The application uses session stickiness (sticky sessions) to maintain user sessions. The SysOps administrator notices that when instances are replaced during a scale-in or failure event, users lose their session data. The administrator needs to preserve session data across instance failures without losing stickiness benefits. What should the administrator do?

A.Disable sticky sessions on the ALB and configure the application to store session data in an external session store like Amazon ElastiCache for Redis.

B.Increase the stickiness duration to a very high value so that sessions are not lost during brief interruptions.

C.Change the Auto Scaling group to use a larger instance type to handle more sessions per instance, reducing the likelihood of session loss.

D.Configure the Auto Scaling group to use a larger minimum size and a lower maximum, so instances are less likely to be terminated.

AnswerA

This decouples session state from the EC2 instance. If an instance fails, any other instance can retrieve the session from ElastiCache, preserving the session for the user.

Why this answer

Option A is correct because storing session data externally in a service like Amazon ElastiCache for Redis decouples session state from the EC2 instance lifecycle. With the state in a shared cache, any instance can serve any request, so the ALB no longer needs sticky sessions to keep a user's session intact. If an instance fails or is replaced, another instance retrieves the session data from the shared Redis cache, preserving the user's session without interruption.

Exam trap

The trap here is that candidates may think sticky sessions alone preserve session data, but they only preserve routing affinity, not the session state itself, which must be stored externally to survive instance failures.

How to eliminate wrong answers

Option B is wrong because increasing the stickiness duration does not preserve session data when an instance is terminated or fails; it only controls how long the ALB remembers the routing cookie, but the session data stored locally on the instance is still lost. Option C is wrong because using a larger instance type does not solve the fundamental problem of session data being stored locally; it only reduces the frequency of scale-in events but does not protect against instance failures or replacements. Option D is wrong because adjusting the Auto Scaling group's minimum and maximum sizes does not prevent session loss during scale-in or failure events; it only changes the number of instances running, but any instance that is terminated or replaced will still lose its locally stored session data.
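
The effect of externalizing session state can be shown with a toy model in which a plain dictionary stands in for the shared Redis cache. The class `AppInstance`, its methods, and the session values are assumptions made for the illustration; a real application would use a Redis client instead.

```python
class AppInstance:
    """Toy web instance that keeps sessions both locally and in a shared store."""

    def __init__(self, shared_store: dict):
        self.local_sessions = {}   # lost when the instance is terminated
        self.shared_store = shared_store

    def login(self, session_id: str, user: str) -> None:
        self.local_sessions[session_id] = user
        self.shared_store[session_id] = user  # survives instance replacement

    def lookup(self, session_id: str):
        return self.shared_store.get(session_id)


shared = {}  # stands in for ElastiCache for Redis
first = AppInstance(shared)
first.login("s1", "alice")
del first  # the instance is terminated during scale-in

replacement = AppInstance(shared)
print(replacement.local_sessions.get("s1"))  # None
print(replacement.lookup("s1"))              # alice
```

The replacement instance has no local copy of the session but still finds it in the shared store, which is the behavior the external session store provides.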

38

MCQeasy

A company uses Amazon Route 53 for DNS. They want to ensure that if the primary web server fails, traffic is automatically routed to a secondary server in another region. Which routing policy should be used?

A.Simple routing policy

B.Failover routing policy

C.Latency routing policy

D.Weighted routing policy

AnswerB

Failover routing automatically routes to secondary when primary is unhealthy.

Why this answer

Correct answer is B. Failover routing policy allows you to configure active-passive failover. Option A is wrong because simple routing does not support health checks.

Option C is wrong because latency routing routes based on latency, not failover. Option D is wrong because weighted routing distributes traffic based on weights, not failover.
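
An active-passive pair of records might be described as follows. The field names follow the Route 53 ResourceRecordSet structure; the helper `failover_records`, the domain name, the documentation-range IP addresses, the TTL, and the health check ID are placeholders chosen for the example.

```python
def failover_records(name: str, primary_ip: str, secondary_ip: str,
                     health_check_id: str) -> list[dict]:
    """Return a PRIMARY and a SECONDARY failover record set (hypothetical helper)."""

    def record(role: str, ip: str) -> dict:
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": role.lower(),
            "Failover": role,          # "PRIMARY" or "SECONDARY"
            "TTL": 60,                 # assumed: short TTL so clients switch quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if role == "PRIMARY":
            # The health check decides when traffic moves to the secondary.
            rrset["HealthCheckId"] = health_check_id
        return rrset

    return [record("PRIMARY", primary_ip), record("SECONDARY", secondary_ip)]


records = failover_records("www.example.com.", "192.0.2.10", "198.51.100.10",
                           "hc-placeholder-id")
print([r["Failover"] for r in records])  # ['PRIMARY', 'SECONDARY']
```

While the primary's health check passes, Route 53 answers with the primary record; when it fails, queries are answered with the secondary record.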

39

MCQmedium

A company runs a critical production database on Amazon RDS for MySQL with a Multi-AZ deployment. The database experiences a primary instance failure. The SysOps administrator needs to understand exactly how the failover process worked and why the application experienced a longer-than-expected downtime. Which AWS service or feature should the administrator use to review detailed events and actions during the failover?

A.AWS Personal Health Dashboard

B.Amazon RDS Performance Insights

C.Amazon CloudWatch Logs

D.AWS CloudTrail

AnswerA

The Personal Health Dashboard shows relevant events and notifications specific to the customer's RDS Multi-AZ failover, including timing and causes.

Why this answer

AWS Personal Health Dashboard provides a personalized view of the health of AWS services and resources, including detailed event logs for RDS Multi-AZ failovers. It surfaces the exact sequence of actions (e.g., DNS record update, failover initiation, completion) and any underlying AWS infrastructure issues that caused the extended downtime, such as degraded hardware or network latency. This is the correct tool because it gives the administrator a chronological, AWS-side account of the failover process, which is not available through other services.

Exam trap

The trap here is that candidates often confuse AWS CloudTrail (which records API calls) with the ability to view internal service events, but CloudTrail does not capture automatic failover processes or infrastructure health events that are only available through AWS Personal Health Dashboard.

How to eliminate wrong answers

Option B is wrong because Amazon RDS Performance Insights focuses on database performance metrics (e.g., CPU, memory, SQL query load) and does not log failover events or infrastructure-level actions. Option C is wrong because Amazon CloudWatch Logs can capture RDS log files (e.g., error logs, slow query logs) but does not inherently record the failover process steps or AWS-side infrastructure events; it would require custom agent configuration to capture such data. Option D is wrong because AWS CloudTrail records API calls made to the RDS service (e.g., ModifyDBInstance) but does not capture internal failover events or DNS propagation details that occur automatically during a Multi-AZ failover.

40

MCQhard

A company has a production RDS for PostgreSQL instance. They need to recover from a logical corruption that occurred 2 hours ago. Which recovery method will minimize data loss?

A.Restore from the latest automated snapshot taken 1 hour ago.

B.Use pg_dump to export the database and restore it.

C.Fail over to the read replica in another AZ.

D.Perform a point-in-time recovery to a time just before the corruption occurred.

AnswerD

PITR uses automated backups and transaction logs to restore the database to a chosen point in time just before the corruption.

Why this answer

Option D is correct because point-in-time recovery can restore to any time within the backup retention period, allowing recovery to just before the corruption. Option A is wrong because a snapshot from 1 hour ago was taken after the corruption and would include it. Option C is wrong because a read replica contains the corruption if it was replicated.

Option B is wrong because pg_dump exports the database in its current, already corrupted state.
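
The timing argument can be checked with a few lines of date arithmetic. The timestamps, the one-second safety margin, and the helper `pitr_target` are assumptions made for the illustration.

```python
from datetime import datetime, timedelta, timezone


def pitr_target(corruption_time: datetime,
                margin: timedelta = timedelta(seconds=1)) -> datetime:
    """Pick a restore time just before the corruption occurred."""
    return corruption_time - margin


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # assumed current time
corruption = now - timedelta(hours=2)                    # 10:00, from the question
snapshot = now - timedelta(hours=1)                      # 11:00, latest snapshot

# A snapshot is only clean if it was taken before the corruption.
snapshot_is_clean = snapshot < corruption

print(snapshot_is_clean)        # False
print(pitr_target(corruption))  # 2024-01-01 09:59:59+00:00
```

In RDS, a point-in-time restore creates a new DB instance at the chosen time; with boto3 the corresponding call is `restore_db_instance_to_point_in_time`, which accepts a `RestoreTime` argument.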

41

MCQhard

A company has a production RDS for PostgreSQL instance with Multi-AZ enabled. During a recent failover test, the application experienced a 5-minute downtime. The company requires that failover be completed within 2 minutes. Which action should be taken to meet this requirement?

A.Migrate the database to Amazon Aurora with Multi-AZ.

B.Enable automated backups with a short retention period.

C.Increase the DB instance class to a larger size.

D.Configure an RDS Proxy in front of the database.

AnswerD

RDS Proxy pools connections and reduces failover time.

Why this answer

Option D is correct because RDS Proxy reduces failover time by pooling database connections and maintaining them during failover. Option C is wrong because increasing instance size does not directly reduce failover time. Option B is wrong because enabling automated backups does not affect failover speed.

Option A is wrong because switching to Aurora may help but is a larger architectural change; RDS Proxy is a simpler solution for RDS.

42

MCQmedium

A company runs a stateful web application on a single Amazon EC2 instance. The SysOps administrator needs to implement a high availability architecture that can tolerate an Availability Zone (AZ) failure. The application stores session state in memory and also writes critical data to an Amazon EBS volume. The administrator wants to use an Auto Scaling group and an Application Load Balancer (ALB). Which combination of steps is required to make the application highly available?

A.Create an Auto Scaling group that spans at least two Availability Zones, attach the existing EBS volume to the new instances, and use an ALB to distribute traffic.

B.Migrate session state to Amazon ElastiCache for Redis, store critical data in Amazon EFS, create an Auto Scaling group across multiple AZs, and place it behind an ALB.

C.Place the EC2 instance in an Auto Scaling group with a minimum and maximum of 1 in the same AZ, and attach an Elastic IP to the instance.

D.Use an ALB with the existing single instance as the target, and enable cross-zone load balancing.

AnswerB

Externalizing session state and data to shared services (ElastiCache, EFS) allows any instance to take over. An Auto Scaling group across multiple AZs and an ALB provide fault tolerance for AZ failures.

Why this answer

Option B is correct because it addresses both the stateless requirement for horizontal scaling and the persistence of critical data across AZ failures. Migrating session state to ElastiCache for Redis removes the dependency on local instance memory, allowing any instance to handle any request. Storing critical data on Amazon EFS provides a shared, NFS-based file system that is accessible from all instances across multiple AZs, unlike EBS which is tied to a single AZ.

Combining these with a multi-AZ Auto Scaling group and an ALB ensures the application can survive an entire AZ outage.

Exam trap

The trap here is that candidates assume EBS volumes can be shared across instances or AZs, or that a single-instance setup with an ALB provides high availability, when in fact EBS is a single-AZ resource and the ALB requires multiple healthy targets to tolerate failures.

How to eliminate wrong answers

Option A is wrong because EBS volumes are AZ-scoped and cannot be attached to instances in a different AZ; attaching the existing EBS volume to new instances in another AZ is impossible without snapshotting and recreating, which defeats high availability. Option C is wrong because keeping a single instance in one AZ with an Elastic IP does not provide fault tolerance for an AZ failure; the Auto Scaling group with min/max of 1 cannot replace the instance in a different AZ automatically, and the Elastic IP does not reroute traffic to a healthy instance. Option D is wrong because using an ALB with a single instance as the target and enabling cross-zone load balancing does not add redundancy; if the instance or its AZ fails, the ALB has no other targets to route traffic to, so the application becomes unavailable.

43

MCQhard

A company uses a Multi-AZ RDS for MySQL instance for its production database. During a maintenance window, the primary instance fails and a failover occurs. However, the application experiences a 5-minute downtime. The application uses a DNS CNAME record pointing to the RDS endpoint. What is the MOST likely cause of the downtime?

A.The application was using a cached DNS resolution for the RDS endpoint.

B.The application was not configured to retry connections after a failover.

C.The RDS endpoint changed after failover and the application did not update.

D.The failover process took longer than expected due to a large transaction log.

AnswerA

DNS caching can cause continued use of old IP.

Why this answer

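Option A is correct because DNS caching at the client or resolver level can cause the application to continue using the old IP address until the TTL expires. Option D is wrong because Multi-AZ failover typically completes within a minute or two, so the failover itself does not explain a 5-minute outage. Option C is wrong because the RDS endpoint name does not change after failover; its DNS record is updated automatically to point to the new primary.

Option B is wrong because retries alone do not help while the application keeps resolving the endpoint to the old, cached IP address.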

44

MCQhard

A company runs a critical web application on Amazon EC2 instances that are part of an Auto Scaling group. The application receives unpredictable traffic spikes. The SysOps administrator needs to ensure that when a scale-out event occurs, new instances are ready to serve traffic quickly to minimize latency spikes. Currently, the instance launch and configuration process (including software installs and cache warming) takes about 5 minutes. The administrator wants to reduce the time it takes for new instances to start serving traffic. Which combination of Auto Scaling features should be used?

A.Use a launch template that includes a pre-warmed Amazon Machine Image (AMI) with all software pre-installed, and configure the Auto Scaling group to use a larger instance type to reduce initialization time.

B.Implement an Auto Scaling warm pool with a minimum number of pre-initialized instances in a 'Stopped' state. Configure the scaling policy to move instances from the warm pool to the Auto Scaling group when needed.

C.Use scheduled scaling to predictively launch instances before the traffic spikes based on historical patterns.

D.Configure lifecycle hooks to add a wait time during instance launch so that the instance is fully configured before it is placed behind the load balancer.

AnswerB

A warm pool maintains instances that have been fully launched and configured but are stopped or in a standby state. When scale-out occurs, instances from the warm pool are started or moved into service quickly, drastically reducing the time to handle traffic.

Why this answer

Option B is correct because an Auto Scaling warm pool maintains a pool of pre-initialized instances in a 'Stopped' state that are fully configured (software installed, cache warmed) and ready to serve traffic. When a scale-out event occurs, instances from the warm pool are moved to the Auto Scaling group and transitioned to 'Running' state, bypassing the 5-minute launch and configuration delay, thereby minimizing latency spikes.

Exam trap

The trap here is that candidates often confuse warm pools with lifecycle hooks or pre-warmed AMIs, assuming that reducing software install time alone is sufficient, when the real bottleneck is the entire instance initialization process that warm pools bypass.

How to eliminate wrong answers

Option A is wrong because using a pre-warmed AMI reduces software installation time but does not eliminate the instance launch and initialization overhead (e.g., kernel boot, network setup, cache warming), and using a larger instance type does not inherently reduce initialization time—it may even increase it due to more hardware resources to initialize. Option C is wrong because scheduled scaling relies on predictable traffic patterns and cannot handle unpredictable traffic spikes; it would either over-provision or under-provision for unexpected demand. Option D is wrong because lifecycle hooks add a wait time during instance launch, which would increase the time before the instance is ready to serve traffic, contradicting the goal of reducing latency spikes.

Practice this question →

45

MCQeasy

A company runs a critical application on an EC2 instance backed by Amazon EBS. To protect against data loss, the company wants to create a backup strategy that allows for point-in-time recovery. Which solution should be used?

A.Configure an S3 Lifecycle policy to move data to Glacier.

B.Create an Amazon Machine Image (AMI) of the instance.

C.Use Amazon EFS to store data.

D.Create automated EBS snapshots.

AnswerD

Correct: EBS snapshots provide point-in-time backups of volumes.

Why this answer

Option D is correct because automated Amazon EBS snapshots provide point-in-time backups of EBS volumes, stored in S3. Option A is wrong because S3 Lifecycle policies manage objects in S3 and do not back up EBS volumes. Option B is wrong because an AMI includes snapshots but is not the primary backup mechanism for data.

Option C is wrong because EFS is a file system, not a backup service.

Practice this question →

46

MCQmedium

A SysOps administrator receives an alert that an EC2 instance in an Auto Scaling group is unhealthy. The instance fails the EC2 status check. What is the BEST course of action to restore availability automatically?

A.Use AWS Systems Manager to replace the underlying host.

B.Manually reboot the instance from the EC2 console.

C.Create a CloudWatch alarm that triggers an SNS notification to the administrator.

D.Configure the Auto Scaling group to use EC2 status checks for health checks and set the health check grace period appropriately.

AnswerD

ASG can automatically terminate and replace unhealthy instances based on EC2 status checks.

Why this answer

Option D is correct because an Auto Scaling group can automatically terminate and replace unhealthy instances based on EC2 status checks. Option B is wrong because manually rebooting is not automatic and does not scale. Option C is wrong because an SNS notification only alerts the administrator; the ASG health check is the direct mechanism for replacement.

Option A is wrong because replacing the underlying host is not necessary and not automatic.

Practice this question →

47

MCQhard

A company runs a critical database workload on an Amazon RDS for MySQL DB instance with Multi-AZ deployment in the us-east-1 region. The SysOps administrator must design a disaster recovery strategy that can recover from a complete regional outage. The Recovery Time Objective (RTO) is 2 hours and the Recovery Point Objective (RPO) is 1 hour. Which solution meets these requirements at the lowest cost?

A.Create manual snapshots of the DB instance every hour and copy them to another AWS Region.

B.Enable automated backups with a retention period of 35 days and restore to a different Region when needed.

C.Create a cross-Region read replica in another Region and promote it to a standalone DB instance during a disaster.

D.Use AWS Database Migration Service (DMS) to continuously replicate data to a DB instance in another Region.

AnswerC

A cross-Region read replica provides continuous asynchronous replication with low lag (typically seconds). In a disaster, promoting the replica to a primary instance takes only minutes, meeting the RTO and RPO requirements with minimal cost.

Why this answer

Option C is correct because a cross-Region read replica continuously replicates data from the primary RDS MySQL instance to another Region with minimal lag, typically achieving an RPO of seconds to minutes, well within the 1-hour requirement. Promoting the replica to a standalone instance during a disaster can be done in minutes, meeting the 2-hour RTO. This approach is the lowest cost among the viable options because it relies on native MySQL replication and does not require the separate replication instance that DMS needs.

Exam trap

The trap here is that candidates often choose Option B (automated backups) because they assume backups can be restored cross-Region, but automated backups are Region-specific and do not support cross-Region restore without additional snapshot copy configuration, which is not mentioned in the option.

How to eliminate wrong answers

Option A is wrong because manual snapshots taken every hour would incur significant storage costs for storing and copying snapshots across Regions, and the copy process can take longer than 1 hour, potentially exceeding the RPO. Option B is wrong because automated backups with a 35-day retention period are stored only in the source Region and cannot be restored to a different Region; cross-Region snapshot copy must be explicitly configured and is not part of automated backups. Option D is wrong because AWS DMS incurs additional costs for a replication instance and data transfer, making it more expensive than a cross-Region read replica, and it adds operational complexity for continuous replication that is unnecessary when native MySQL replication can achieve the same RPO/RTO.
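
The elimination logic above (filter by RPO and RTO, then pick the cheapest survivor) can be sketched as follows. Every number here is an illustrative placeholder rather than an AWS figure: the RPO/RTO minutes loosely follow the wording of the explanation, `relative_cost` is a made-up ranking that only encodes "DMS costs more than a read replica", and `float("inf")` stands for "not recoverable in another Region as the option is described".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    name: str
    rpo_minutes: float
    rto_minutes: float
    relative_cost: int  # hypothetical ranking, lower is cheaper


def cheapest_meeting(options, rpo_target_minutes, rto_target_minutes):
    """Return the cheapest option that satisfies both targets, or None."""
    viable = [
        o for o in options
        if o.rpo_minutes <= rpo_target_minutes and o.rto_minutes <= rto_target_minutes
    ]
    return min(viable, key=lambda o: o.relative_cost) if viable else None


# Placeholder values for illustration only.
options = [
    DrOption("hourly manual snapshot copy", rpo_minutes=90, rto_minutes=60, relative_cost=1),
    DrOption("automated backups (source Region only)", float("inf"), float("inf"), 1),
    DrOption("cross-Region read replica", rpo_minutes=1, rto_minutes=15, relative_cost=2),
    DrOption("DMS continuous replication", rpo_minutes=1, rto_minutes=15, relative_cost=3),
]

chosen = cheapest_meeting(options, rpo_target_minutes=60, rto_target_minutes=120)
```

With these assumed inputs, the snapshot and backup options fail the RPO filter, leaving the read replica as the cheaper of the two remaining candidates.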

Practice this question →

48

MCQeasy

A company has a fleet of EC2 instances that need to be patched monthly. The SysOps administrator must ensure that the patching process does not affect the availability of the application. Which strategy should the administrator use?

A.Patch one instance at a time manually by stopping and starting.

B.Use an Auto Scaling group with a rolling update strategy.

C.Use AWS Systems Manager Patch Manager to patch all instances at once.

D.Stop all instances, apply patches, then start them.

AnswerB

A rolling update replaces instances incrementally, maintaining availability.

Why this answer

Option B is correct because an Auto Scaling group with a rolling update replaces instances incrementally, so the application remains available. Option A is wrong because patching one instance at a time manually is not automated and still risks downtime if the instance is needed. Option C is wrong because Patch Manager can patch instances, but patching all of them at once without a rolling strategy may cause downtime.

Option D is wrong because stopping all instances at once causes downtime.

Practice this question →

49

MCQeasy

An application uploads files to an S3 bucket. The SysOps administrator needs to ensure that the files are automatically replicated to another bucket in a different AWS Region for disaster recovery. Which action should be taken?

A.Enable Cross-Region Replication on the source bucket.

B.Use S3 Transfer Acceleration for faster uploads.

C.Enable versioning on the source bucket.

D.Configure a lifecycle policy to transition objects to Glacier.

AnswerA

CRR automatically replicates objects to a specified destination bucket in a different region.

Why this answer

Option A is correct because S3 Cross-Region Replication (CRR) automatically replicates objects to a destination bucket in another region. Option C is wrong because versioning alone does not replicate data, although CRR requires it. Option D is wrong because lifecycle policies only manage storage tiers, not replication.

Option B is wrong because S3 Transfer Acceleration speeds up uploads but does not replicate.

Practice this question →

50

Multi-Selecthard

Which THREE measures help protect an S3 bucket from accidental data loss? (Choose 3)

Select 3 answers

A.Enable MFA Delete on the bucket.

B.Create a lifecycle policy to transition objects to S3 Glacier.

C.Enable server-side encryption on the bucket.

D.Configure cross-region replication to a destination bucket.

E.Enable versioning on the bucket.

AnswersA, D, E

MFA Delete requires an additional authentication factor to permanently delete object versions, reducing accidental deletion risk.

Why this answer

Options A, D, and E are correct. Versioning preserves multiple versions, MFA Delete adds protection against permanent deletion, and cross-region replication provides a copy in another region. Option C is wrong because encryption does not prevent data loss.

Option B is wrong because lifecycle policies change storage class or expire objects; they do not protect against deletion and can increase the risk.

Practice this question →

51

MCQhard

A company runs a critical application on EC2 instances in an Auto Scaling group. The application stores state information locally on the instance. The SysOps administrator needs to ensure that if an instance fails, the state is not lost. What should the administrator do?

A.Move the state data to an external data store such as ElastiCache or RDS.

B.Attach an EBS volume and set the 'DeleteOnTermination' flag to false.

C.Use instance store volumes for the state data.

D.Use Amazon SQS to store the state data.

AnswerA

Makes the application stateless and resilient.

Why this answer

Option A is correct because offloading state to ElastiCache or RDS makes the application stateless and resilient to instance failure. Option C is wrong because instance store is ephemeral and data is lost on failure. Option B is wrong because EBS volumes can be preserved but take time to reattach, and the state may be outdated.

Option D is wrong because SQS is for message queues, not general state storage.

Practice this question →

52

Multi-Selecteasy

A company is designing a backup strategy for its Amazon S3 buckets. Which TWO methods can be used to protect against accidental deletion or overwriting of objects? (Choose two.)

Select 2 answers

A.Enable S3 Versioning on the bucket.

B.Enable MFA Delete on the bucket.

C.Configure S3 Cross-Region Replication.

D.Enable S3 Object Lock in compliance mode.

E.Enable default encryption (SSE-S3) on the bucket.

AnswersA, D

Retains all versions, enabling recovery.

Why this answer

Options A and D are correct. S3 Versioning retains all versions, allowing recovery from accidental deletion or overwrite. S3 Object Lock prevents objects from being deleted or overwritten for a specified period.

Option B is wrong because MFA Delete requires multi-factor authentication for delete operations but does not prevent overwrites. Option C is wrong because cross-region replication does not prevent deletion in the source bucket. Option E is wrong because default encryption does not protect against deletion.
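
To make the versioning behaviour concrete, here is a toy in-memory model of a versioned bucket. It is a sketch of the idea only, not the S3 API: the `VersionedBucket` class, its method names, and the object key are hypothetical. It shows that an overwrite adds a new version and a simple delete adds a delete marker, so earlier versions remain recoverable.

```python
class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the S3 API)."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, body):
        # An overwrite appends a new version instead of replacing the old one.
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A simple delete adds a delete marker; older versions are kept.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self.DELETE_MARKER:
            return None  # the object appears deleted to a normal read
        return versions[-1]

    def previous_versions(self, key):
        return [v for v in self._versions.get(key, []) if v is not self.DELETE_MARKER]


bucket = VersionedBucket()
bucket.put("report.csv", "v1")
bucket.put("report.csv", "v2")   # accidental overwrite
bucket.delete("report.csv")      # accidental delete

current = bucket.get("report.csv")
recoverable = bucket.previous_versions("report.csv")
```

After the overwrite and delete, a normal read finds nothing, yet both earlier versions are still available for recovery.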

Practice this question →

53

MCQeasy

A company wants to ensure that its EC2 instances automatically recover from an instance failure. Which feature should be used?

A.Create a CloudWatch alarm that sends an email when the instance status check fails.

B.Configure an Auto Scaling group with a launch configuration.

C.Attach the instance to an Elastic Load Balancer.

D.Enable EC2 Auto Recovery on the instance.

AnswerD

Correct: EC2 Auto Recovery automatically recovers an instance if it becomes impaired.

Why this answer

Option D is correct because EC2 Auto Recovery automatically recovers an instance if it becomes impaired due to an underlying hardware failure. Option B is wrong because an Auto Scaling group replaces a failed instance with a new one rather than recovering the existing instance. Option C is wrong because Elastic Load Balancing distributes traffic; it does not recover instances.

Option A is wrong because an alarm that only sends an email notifies someone but does not recover the instance by itself.

Practice this question →

54

MCQhard

A company uses AWS Backup to back up its Amazon EFS file system daily. The backup retention policy is set to 30 days. Recently, a user accidentally deleted a critical directory. The company wants to restore the directory as it existed 2 days ago. What is the MOST cost-effective and quickest way to achieve this?

A.Use the EFS console to recover the directory from the .Trash folder.

B.Enable EFS replication to another region and then fail back.

C.Use AWS Backup to restore the entire file system to an on-premises server, then copy the directory back.

D.Restore the backup from 2 days ago to a new EFS file system, then copy the directory to the original file system.

AnswerD

This allows selective directory recovery from the backup.

Why this answer

Option D is correct because AWS Backup creates recovery points that can be restored to a new EFS file system, after which the specific directory can be copied back. Option A is wrong because EFS does not have a native trash feature. Option B is wrong because EFS replication is not a backup replacement and does not provide point-in-time recovery.

Option C is wrong because restoring to an on-premises server is unnecessary and slower.

Practice this question →

55

Multi-Selectmedium

A SysOps administrator is troubleshooting an issue where an Application Load Balancer (ALB) is returning 503 errors to clients. The target group has healthy EC2 instances. Which THREE possible causes should the administrator investigate? (Choose three.)

Select 3 answers

A.The load balancer is not attached to a subnet.

B.The security group for the load balancer is blocking traffic.

C.The load balancer has reached its capacity limit.

D.The target group has no registered targets.

E.The target group health check is misconfigured.

AnswersB, C, D

Can cause 503 if inbound/outbound rules are wrong.

Why this answer

Options B, C, and D are correct. An ALB returns 503 if no targets are registered in the target group, if the load balancer is at capacity, or if a misconfigured security group is blocking traffic. Option E is wrong because a misconfigured health check would show unhealthy instances, but the targets are reported healthy.

Option A is wrong because a missing subnet would prevent the ALB from being created or functioning, but the ALB is already working.

Practice this question →

56

MCQeasy

A company wants to create a disaster recovery (DR) strategy for its RDS for PostgreSQL database. The primary database is in us-east-1. The company needs a recovery point objective (RPO) of less than 5 minutes and a recovery time objective (RTO) of less than 1 hour. Which solution meets these requirements?

A.Enable Multi-AZ in us-east-1 and create a standby in a different Availability Zone.

B.Create a cross-region Read Replica in us-west-2 and promote it during a disaster.

C.Use AWS Database Migration Service (DMS) to continuously replicate to an EC2 instance.

D.Take daily automated snapshots and copy them to us-west-2.

AnswerB

Low RPO and RTO.

Why this answer

Option B is correct because a cross-region Read Replica can be promoted to a primary in another region, achieving low RPO (seconds) and low RTO (minutes). Option D is wrong because daily snapshots have a much higher RPO and RTO. Option C is wrong because DMS is intended primarily for migration, not DR.

Option A is wrong because single-region Multi-AZ does not protect against region failure.

Practice this question →

57

MCQmedium

A SysOps administrator is tuning the health check of an Auto Scaling group. The group uses an ALB. The application takes up to 2 minutes to start. The health check settings are: HealthCheckGracePeriod=300, HealthCheckType=EC2. The administrator notices that instances are often marked unhealthy and terminated shortly after launch. What should the administrator change?

A.Change the HealthCheckType to EC2 (already EC2, but change to ELB).

B.Increase the HealthCheckGracePeriod to 600 seconds.

C.Change the HealthCheckType to ELB.

D.Decrease the HealthCheckGracePeriod to 120 seconds.

AnswerA

ELB health checks can be configured to check the application endpoint after it is ready.

Why this answer

Option A is the keyed answer: the HealthCheckGracePeriod is 300 seconds (5 minutes), which is sufficient for the 2-minute startup. However, HealthCheckType=EC2 means the ASG uses EC2 status checks, not ALB health checks. The EC2 status checks may fail if the instance is not fully initialized.

Changing to ELB health checks allows the ASG to use the ALB health check, which can be configured to wait for the application to respond. Option B is incorrect because the grace period is already long enough, and increasing it doesn't fix the health check type mismatch. Option C proposes the same switch to ELB as option A; the answer key lists A.

Option D is incorrect because decreasing the grace period would make the problem worse.

Practice this question →

58

MCQhard

A company runs a critical application on a fleet of EC2 instances in an Auto Scaling group behind an Application Load Balancer (ALB). The application uses an Amazon RDS MySQL Multi-AZ DB instance for persistent storage. The SysOps administrator recently configured a lifecycle hook on the Auto Scaling group to perform a custom action before instance termination. During a recent scale-in event, the administrator noticed that some requests were still being routed to the terminating instance, causing errors. The ALB's deregistration delay is set to 300 seconds. The lifecycle hook has a default timeout of 3600 seconds. The administrator wants to ensure that the instance completes its custom action and that all in-flight requests are drained before the instance is terminated. The custom action typically takes 120 seconds. What should the administrator do to resolve this issue?

A.Change the lifecycle hook to use the 'Replace and terminate' type instead of 'Terminate'.

B.Disable the lifecycle hook and rely solely on the ALB's deregistration delay to drain connections.

C.Reduce the lifecycle hook timeout to 120 seconds. Keep the deregistration delay at 300 seconds.

D.Increase the deregistration delay to 600 seconds. Keep the lifecycle hook timeout at 3600 seconds.

AnswerC

This allows the custom action to complete within the hook timeout, then the ALB completes draining before termination.

Why this answer

Option C is correct because reducing the lifecycle hook timeout to match the custom action duration (120 seconds) and keeping the deregistration delay higher (300 seconds) ensures that the instance completes its custom action and the ALB finishes draining requests before the instance terminates. Option D is wrong because increasing the deregistration delay to 600 seconds would delay termination unnecessarily and might not align with the lifecycle hook. Option B is wrong because disabling the lifecycle hook removes the custom action.

Option A is wrong because changing to a 'Replace and terminate' lifecycle hook type would terminate the old instance before launching a new one, which could cause capacity issues.

Practice this question →

59

Multi-Selecthard

A company wants to implement a disaster recovery solution for its on-premises database using AWS. The solution must have an RPO of less than 1 hour and an RTO of less than 4 hours. Which THREE steps should the SysOps administrator take? (Choose THREE.)

Select 3 answers

A.Set up a cross-Region read replica for the RDS instance.

B.Launch an EC2 instance with the database software and configure replication.

C.Use AWS Database Migration Service (DMS) to replicate data to an RDS instance.

D.Use AWS DataSync to sync the database files to Amazon S3.

E.Configure the RDS instance with Multi-AZ.

AnswersA, C, E

Allows failover to another Region.

Why this answer

Options A, C, and E are correct. AWS DMS can replicate data continuously to RDS, meeting the RPO. RDS with Multi-AZ provides automatic failover.

An RDS read replica in another Region provides cross-region failover. Option D is wrong because S3 is not suitable for live database replication. Option B is wrong because EC2 with self-managed replication is more complex and may not meet the RTO.

Practice this question →

60

MCQhard

A company runs a critical database on an RDS for PostgreSQL instance in a single Availability Zone. The database experiences high write latency. The SysOps Administrator needs to improve the database's reliability and performance without downtime. Which solution meets these requirements?

A.Modify the RDS instance to be Multi-AZ with a standby in another Availability Zone.

B.Create a Multi-AZ deployment in the same Availability Zone.

C.Increase the allocated storage for the RDS instance.

D.Create a read replica in another Availability Zone and redirect read traffic.

AnswerA

Provides high availability and failover, and can be done without downtime.

Why this answer

Option A is correct because enabling Multi-AZ provides high availability with automatic failover, and the instance can be modified without downtime. Option B is wrong because a Multi-AZ deployment places the standby in a different Availability Zone; keeping both in the same AZ would not improve reliability. Option D is wrong because creating a read replica does not improve write performance.

Option C is wrong because increasing storage size might help performance but does not improve reliability.

Practice this question →

61

MCQhard

A company runs a web application on AWS that uses an Application Load Balancer (ALB) across multiple Availability Zones. The application is deployed on EC2 instances in an Auto Scaling group behind the ALB. The RDS database is Multi-AZ with synchronous replication. Recently, the operations team noticed that during a planned failover test of the primary RDS instance, the application experienced a 30-second timeout and returned 503 errors to users. The RDS failover completed successfully, but the application did not recover until the Auto Scaling group replaced all instances. The application health check endpoint on the EC2 instances checks database connectivity. The ALB health check is configured to check the health check endpoint every 10 seconds with a threshold of 2 consecutive failures. The application uses a connection pool with a timeout of 5 seconds. What is the MOST likely cause of the 503 errors and the need to replace instances?

A.The EC2 instances are caching the RDS DNS name, causing them to connect to the old primary endpoint after failover. The application health check fails, leading to instance replacement.

B.The ALB health check threshold of 2 consecutive failures is too low, causing premature instance replacement.

C.The application connection pool timeout of 5 seconds is too short for RDS failover.

D.The Multi-AZ RDS configuration is insufficient; the company should implement a warm standby database in another Region.

AnswerA

DNS caching on EC2 instances causes them to attempt connections to the old primary endpoint until the TTL expires. This results in health check failures and instance replacement.

Why this answer

Option A is correct. The RDS DNS record is updated during failover, but DNS caching on the EC2 instances causes them to continue connecting to the old primary until the cached entry expires. The application's health check fails because it cannot connect to the database, causing the ALB to mark the instances unhealthy.

The Auto Scaling group then terminates and replaces the instances, and the new instances start with a fresh DNS cache. Option B is incorrect because the health check threshold only determines how quickly instances are marked unhealthy; the root cause is DNS caching, not the health check configuration. Option D is incorrect because a warm standby in another Region would not address the DNS caching issue.

Option C is incorrect because the connection pool timeout of 5 seconds is not the primary issue; the delay in picking up the DNS change is.

Practice this question →

62

MCQmedium

A SysOps administrator is reviewing the reliability of a production system that uses Amazon DynamoDB as its primary data store. The table has on-demand capacity and a single partition key. The application experiences occasional throttling errors during peak hours. Which action would most effectively improve reliability?

A.Switch to provisioned capacity and set high read/write units.

B.Enable auto-scaling and increase the maximum capacity.

C.Review and optimize the partition key design to avoid hot partitions.

D.Enable DynamoDB Accelerator (DAX) to reduce read latency.

AnswerC

Even with on-demand, hot partitions can cause throttling; optimizing the key distributes load.

Why this answer

Option C is correct: on-demand capacity automatically scales to handle traffic spikes, but if the partition key design causes hot partitions, throttling can still occur. The most effective solution is to optimize the partition key to distribute the workload evenly. Option B is incorrect because raising auto-scaling limits does not address hot partitions.

Option A is incorrect because setting high read/write capacity in provisioned mode is a manual process and may not solve hot partition issues. Option D is incorrect because DAX is a caching layer that reduces read load but does not help with write throttling or hot partitions.
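
A quick way to see why key design matters is to measure how concentrated requests are on a single key. The sketch below uses write sharding (appending a suffix to the key), which is one common approach and an assumption here, since the question only says to optimize the key. The tenant names, request counts, and the choice of 10 shards are hypothetical.

```python
from collections import Counter


def hottest_share(keys):
    """Fraction of all requests that land on the single busiest key."""
    counts = Counter(keys)
    return max(counts.values()) / len(keys)


# Hypothetical traffic: one tenant dominates, so one key takes most requests.
skewed = ["tenant-42"] * 90 + ["tenant-7"] * 10

# Write sharding: spread each logical key over 10 suffixed keys.
sharded = [f"{key}#{i % 10}" for i, key in enumerate(skewed)]

skewed_share = hottest_share(skewed)
sharded_share = hottest_share(sharded)
```

In this toy data the busiest key drops from 90% of requests to 9% after sharding, at the cost of reads having to query every suffix for a logical key.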

Practice this question →

63

MCQeasy

A company runs a stateless web application on EC2 instances in an Auto Scaling group. The application is deployed in us-east-1 with three Availability Zones. The SysOps administrator wants to ensure that the application remains available even if an entire Availability Zone becomes unavailable. The Auto Scaling group is configured with a minimum of 3, maximum of 9, and desired capacity of 3. The instances are distributed evenly across the three AZs. What additional configuration is required to ensure the application can survive an AZ failure?

A.Increase the desired capacity to 6 to ensure enough capacity if one AZ fails.

B.Ensure the load balancer is cross-zone load balancing enabled and the Auto Scaling group has a sufficient maximum size to handle the load of a failed AZ.

C.Configure the Auto Scaling group to launch instances in only two AZs to reduce costs.

D.Place the Auto Scaling group in a single AZ to simplify management.

AnswerB

Cross-zone balancing distributes traffic across healthy instances in all AZs.

Why this answer

Option B is correct. The Auto Scaling group already spans three AZs, so to survive an AZ failure the load balancer should have cross-zone load balancing enabled and the group needs a maximum size large enough to absorb the load of the failed AZ. With these in place, traffic is distributed across the healthy instances in the remaining AZs while Auto Scaling launches replacements.

Option A is wrong because increasing the desired capacity alone does not ensure traffic is spread across the healthy instances in the remaining AZs. Option C is wrong because launching in only two AZs reduces fault tolerance. Option D is wrong because a single AZ leaves no capacity at all if that AZ fails.
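
The capacity arithmetic behind this scenario can be checked with a few lines of code. This is a simplified sketch under stated assumptions: the AZ names are hypothetical, and `rebalance` is an illustrative helper that spreads the desired capacity as evenly as possible, not the actual Auto Scaling placement algorithm.

```python
def surviving_instances(instances_per_az: dict, failed_az: str) -> int:
    """Instances still running after one AZ is lost."""
    return sum(count for az, count in instances_per_az.items() if az != failed_az)


def rebalance(desired: int, healthy_azs: list) -> dict:
    """Spread the desired capacity as evenly as possible over the healthy AZs."""
    base, extra = divmod(desired, len(healthy_azs))
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(healthy_azs)}


# Desired capacity of 3 spread evenly over three AZs (names are hypothetical).
fleet = {"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1}

remaining = surviving_instances(fleet, failed_az="us-east-1a")
after_replacement = rebalance(3, ["us-east-1b", "us-east-1c"])
```

Losing one AZ leaves two of the three instances serving traffic until a replacement launches in a healthy AZ, which is why spare headroom and even traffic distribution both matter.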

Practice this question →

64

Matchingmedium

Match each AWS storage service to its description.

Concepts

Matches

Object storage for any data

Block storage for EC2 instances

File storage for Linux instances

Managed file system for Windows or Lustre

Low-cost archival storage

Why these pairings

These are the primary AWS storage services.

Practice this question →

65

MCQmedium

An application running on EC2 instances in an Auto Scaling group uses an SQS queue for decoupling. The application experiences increased latency when the queue has a high number of messages. The SysOps Administrator needs to maintain responsiveness. Which solution is the most cost-effective?

A.Increase the desired capacity of the Auto Scaling group.

B.Configure a CloudWatch alarm on the queue depth to trigger Auto Scaling policies.

C.Use a larger instance type for the EC2 instances.

D.Increase the visibility timeout of the SQS queue.

AnswerB

Cost-effectively scales consumers based on demand.

Why this answer

Option B is correct because using CloudWatch alarms on the queue depth to trigger Auto Scaling policies automatically scales out when needed and scales in when not, balancing cost and responsiveness. Option A is wrong because permanently raising the desired capacity may be cost-inefficient. Option C is wrong because a larger instance type does not change the number of consumers.

Option D is wrong because the visibility timeout only controls how long a received message stays hidden from other consumers; increasing it does not add processing capacity.
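
One way to reason about queue-depth scaling is to size the consumer fleet from the backlog each instance can tolerate. The function below is a minimal sketch of that calculation; the name `desired_consumers`, the acceptable backlog of 100 messages per instance, and the group limits of 2 and 10 are assumptions for illustration, not values from the question.

```python
import math


def desired_consumers(queue_depth: int, acceptable_backlog_per_instance: int,
                      min_size: int, max_size: int) -> int:
    """Instances needed to keep per-instance backlog acceptable, clamped to group limits."""
    needed = math.ceil(queue_depth / acceptable_backlog_per_instance)
    return max(min_size, min(max_size, needed))


# Hypothetical values: each instance can tolerate a backlog of 100 messages.
quiet = desired_consumers(0, 100, min_size=2, max_size=10)
busy = desired_consumers(250, 100, min_size=2, max_size=10)
spike = desired_consumers(1500, 100, min_size=2, max_size=10)
```

The fleet stays at the minimum when the queue is empty, grows with the backlog, and is capped by the group's maximum size during a large spike.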

Practice this question →

66

MCQmedium

A company runs a web application on EC2 instances in an Auto Scaling group across two Availability Zones. The application uses an Application Load Balancer. The SysOps administrator receives an alert that the application is returning 503 errors. The administrator checks the CloudWatch metrics and sees that the ALB's RequestCount is normal, but the HealthyHostCount is zero. The EC2 instances are in running state and pass the EC2 status checks. What is the MOST likely cause and what should the administrator do to resolve the issue?

A.The ALB health check configuration is incorrect; verify the health check path and port.

B.The instances are out of memory; increase the instance size.

C.The Auto Scaling group's scaling policy is not launching new instances; update the policy.

D.The security group on the instances is blocking traffic from the ALB; update the security group.

AnswerA

Health checks fail, causing HealthyHostCount to be zero.

Why this answer

Option A is correct. The ALB health checks are failing. The most likely cause is that the health check path is wrong or the application is not responding on the configured port.

The administrator should verify the health check settings. Option B is wrong because the instances are running and passing EC2 status checks. Option D is wrong because security group rules would affect connectivity, but the instances are running.

Option C is wrong because scaling policies do not affect health checks.

Practice this question →

67

MCQmedium

A company runs a critical web application on Amazon EC2 instances in an Auto Scaling group across three Availability Zones in us-east-1. The application stores data in an Amazon RDS for MySQL DB instance with Multi-AZ deployment. The SysOps administrator needs to design a disaster recovery strategy that can recover from a complete regional outage. The Recovery Time Objective (RTO) is 2 hours and the Recovery Point Objective (RPO) is 1 hour. Which solution should the administrator implement?

A.Create a read replica of the RDS instance in a second region. Configure an Amazon CloudFront distribution with the ALB as origin. Use Route53 failover routing policy to route traffic to the CloudFront distribution.

B.Take daily manual snapshots of the RDS instance and copy them to a second region. Store the AWS CloudFormation template for the infrastructure in an S3 bucket with cross-region replication. In the event of a disaster, manually deploy the stack and restore the snapshot.

C.Configure cross-region automated backups for the RDS instance with a backup window. Deploy an identical infrastructure stack in a second region using AWS CloudFormation StackSets. Create an Amazon Route53 DNS failover record set with health checks to automatically fail over to the second region.

D.Use AWS Database Migration Service (DMS) to continuously replicate data to a second region. Use an Application Load Balancer in the primary region and a Network Load Balancer in the secondary region. Create a Route53 weighted routing policy to distribute traffic.

AnswerC

Cross-region automated backups replicate snapshots and transaction logs to the second region, which can meet the RPO. StackSets ensure consistent infrastructure deployment. Route 53 failover automatically redirects traffic, meeting the RTO.

Why this answer

Option C is correct because it meets both the RTO of 2 hours and RPO of 1 hour. Cross-region automated backups for RDS provide an RPO of 1 hour or less by continuously backing up transaction logs to a secondary region. Deploying an identical infrastructure stack via CloudFormation StackSets ensures rapid provisioning in the secondary region, and Route53 DNS failover with health checks automates traffic redirection within the RTO window.

Exam trap

The trap here is that candidates often overlook how backup frequency limits RPO: daily manual snapshots cannot meet a 1-hour RPO. A cross-region read replica can be promoted to a standalone instance, so the read replica option fails for a different reason: it provides no application infrastructure in the second region.

How to eliminate wrong answers

Option A is wrong because, although a cross-region read replica can be promoted to a standalone DB instance, the option deploys no application infrastructure in the second region, and a CloudFront distribution with the primary-region ALB as its origin does not provide regional failover. Option B is wrong because daily snapshots leave up to 24 hours of data at risk, far more than the 1-hour RPO, and manually deploying the CloudFormation stack and restoring the snapshot during a disaster puts the 2-hour RTO at risk. Option D is wrong because, although AWS DMS continuous replication can meet the RPO, a weighted routing policy distributes traffic across both regions rather than failing over automatically, and pairing an ALB in one region with a Network Load Balancer (which does not support path-based routing) in the other leaves the two stacks inconsistent.
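
The RPO reasoning above comes down to simple arithmetic: the worst-case data loss is roughly the gap between consecutive recovery points. The following Python sketch makes that comparison explicit. The function name `meets_rpo` and the 5-minute log interval are illustrative assumptions, not figures from the question.

```python
from datetime import timedelta


def meets_rpo(recovery_point_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is the gap between consecutive recovery points."""
    return recovery_point_interval <= rpo


rpo = timedelta(hours=1)

# Daily snapshots: up to 24 hours of changes can be lost.
daily_snapshots = timedelta(hours=24)

# Replicated transaction logs: the 5-minute interval is an assumed value
# chosen for illustration only.
replicated_logs = timedelta(minutes=5)

daily_ok = meets_rpo(daily_snapshots, rpo)   # False
logs_ok = meets_rpo(replicated_logs, rpo)    # True
```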

68

MCQmedium

A company's critical application uses an EBS-backed EC2 instance. They want to back up the instance daily with a retention policy of 30 days. What is the MOST efficient way to achieve this?

A.Use Amazon Data Lifecycle Manager (DLM) to schedule EBS snapshots with a 30-day retention.

B.Use AWS Backup to schedule EBS snapshots and set the retention policy.

C.Schedule a Lambda function to create an EBS snapshot daily and delete snapshots older than 30 days.

D.Create an AMI daily using AWS Backup and set the retention to 30 days.

AnswerA

DLM automates the creation and deletion of EBS snapshots based on a schedule, meeting the requirement efficiently.

Why this answer

Option A is correct because Amazon Data Lifecycle Manager can automate EBS snapshots with retention rules. Option D is wrong because AMI backups include additional metadata and are heavier than the volume snapshots the requirement calls for. Option B is wrong because AWS Backup can handle this, but DLM is more lightweight for EBS-only backups.

Option C is wrong because creating and deleting snapshots with a custom Lambda function is less reliable and more complex.
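
As a rough sketch of what a daily schedule with 30-day retention could look like, the Python function below builds a dictionary in the shape of the `PolicyDetails` parameter accepted by the DLM `create_lifecycle_policy` API. The function name, the schedule name, the tag values, and the 03:00 start time are assumptions for illustration. The sketch makes no AWS call, and a real policy would also need an execution role, a description, and a state.

```python
def build_dlm_policy_details(tag_key: str, tag_value: str,
                             retain_days: int = 30) -> dict:
    """Build a PolicyDetails-shaped dict for daily EBS volume snapshots."""
    return {
        "ResourceTypes": ["VOLUME"],
        # Volumes carrying this tag are the ones the policy would target.
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "daily-snapshots",  # hypothetical schedule name
                # One snapshot every 24 hours, starting at 03:00 UTC.
                "CreateRule": {
                    "Interval": 24,
                    "IntervalUnit": "HOURS",
                    "Times": ["03:00"],
                },
                # Age-based retention: snapshots older than this are deleted.
                "RetainRule": {
                    "Interval": retain_days,
                    "IntervalUnit": "DAYS",
                },
            }
        ],
    }


details = build_dlm_policy_details("Backup", "daily")
```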

69

MCQmedium

A company runs a production RDS for PostgreSQL instance with Multi-AZ enabled. The database experiences a failover due to an AZ outage. After the failover, the application experiences high latency on write operations. What is the most likely cause?

A.The application is now reading from the standby instance, which has higher read latency.

B.Synchronous replication to the standby instance in the other AZ is causing additional latency.

C.The failover switched to a read replica in a different AZ.

D.The failover switched to asynchronous replication mode.

AnswerB

Multi-AZ uses synchronous replication, so every write must be committed on both the primary and standby, which adds latency.

Why this answer

Option B is correct because synchronous replication in Multi-AZ requires acknowledgment from the standby before the transaction is committed, increasing write latency. Option A is incorrect because the standby in a Multi-AZ DB instance deployment does not serve read traffic. Option D is incorrect because Multi-AZ replication stays synchronous after a failover; it does not switch to asynchronous mode.

Option C is incorrect because the Multi-AZ feature does not use a read replica; it uses a standby in a different AZ.

70

MCQeasy

A company wants to back up its on-premises file server to AWS. The backup must be encrypted in transit and at rest. Which AWS service should the company use to meet these requirements?

A.AWS Storage Gateway (File Gateway) backed by Amazon S3.

B.Amazon EBS volumes attached to an EC2 instance acting as a file server.

C.AWS CloudFormation to replicate the file server configuration.

D.Amazon S3 with server-side encryption and a custom script to upload files.

AnswerA

Managed service that handles encryption and transfer.

Why this answer

Option A is correct because AWS Storage Gateway's File Gateway can back up on-premises files to S3 with encryption in transit (using TLS) and at rest (using S3 server-side encryption). Option D is wrong because S3 alone does not provide a backup agent for on-premises servers, so the company would have to build and maintain the upload script itself. Option B is wrong because EBS volumes are for EC2 instances, not on-premises servers.

Option C is wrong because CloudFormation is for infrastructure as code, not data backup.

71

Multi-Selectmedium

Which TWO steps should a SysOps administrator take to ensure that an RDS for MySQL instance can withstand an Availability Zone failure? (Choose 2)

Select 2 answers

A.Enable Multi-AZ deployment.

B.Create a read replica in a different AZ.

C.Enable automated backups with a short retention period.

D.Enable deletion protection on the DB instance.

E.Enable provisioned IOPS for the DB instance.

AnswersA, C

Multi-AZ automatically provisions a standby instance in a different AZ and handles failover automatically.

Why this answer

Options A and C are correct. Multi-AZ creates a standby in another AZ and fails over to it automatically, and automated backups with point-in-time recovery (PITR) allow the database to be restored to a point in time. Option B is wrong because read replicas are for read scaling, not automatic failover.

Option E is wrong because provisioned IOPS improve performance, not availability. Option D is wrong because deletion protection prevents accidental deletion and does nothing for an AZ failure.

72

MCQmedium

A company uses AWS Backup to back up its Amazon EFS file systems. The SysOps administrator needs to ensure that backups are retained for 7 years to meet compliance requirements. What should the administrator do?

A.Create a backup plan with a lifecycle policy that retains backups for 7 years.

B.Manually delete backups older than 7 years every month.

C.Increase the backup frequency to daily.

D.Configure cross-region backup to copy backups to another region.

AnswerA

Lifecycle policy allows setting retention duration.

Why this answer

Option A is correct because AWS Backup lifecycle policies allow you to transition backups to cold storage after a specified period and define retention rules up to 100 years. Option C is wrong because increasing backup frequency does not affect retention duration. Option D is wrong because cross-region backup does not extend retention.

Option B is wrong because manual deletion is not automated and does not enforce compliance.
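
A 7-year retention rule can be sketched as a backup plan document. The Python function below builds a dictionary in the shape of the `BackupPlan` parameter of the AWS Backup `create_backup_plan` API, without calling AWS. The function name, plan name, rule name, and cron schedule are assumptions for illustration. The 365-day year ignores leap days, so a real compliance policy may need to round up.

```python
from typing import Optional

DAYS_PER_YEAR = 365  # assumption: leap days ignored; round up if compliance requires


def build_backup_plan(vault_name: str, retention_years: int = 7,
                      cold_after_days: Optional[int] = None) -> dict:
    """Build a BackupPlan-shaped dict with a retention lifecycle."""
    delete_after = retention_years * DAYS_PER_YEAR
    lifecycle = {"DeleteAfterDays": delete_after}
    if cold_after_days is not None:
        # AWS Backup requires backups to stay in cold storage for at
        # least 90 days before they are deleted.
        if delete_after - cold_after_days < 90:
            raise ValueError("retention must exceed cold transition by 90+ days")
        lifecycle["MoveToColdStorageAfterDays"] = cold_after_days
    return {
        "BackupPlanName": "efs-compliance-plan",      # hypothetical name
        "Rules": [
            {
                "RuleName": "daily-long-retention",   # hypothetical name
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 ? * * *)",  # daily at 05:00 UTC
                "Lifecycle": lifecycle,
            }
        ],
    }


plan = build_backup_plan("compliance-vault", cold_after_days=90)
```

Here `plan["Rules"][0]["Lifecycle"]` carries both the cold-storage transition and the deletion age, so retention is enforced by the plan rather than by manual cleanup.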

73

MCQeasy

A company stores critical data in an S3 bucket. To ensure data durability and availability, the company wants to automatically replicate objects to a bucket in a different AWS Region. Which S3 feature should be used?

A.Enable S3 Standard storage class on the bucket.

B.Use S3 One Zone-IA storage class.

C.Configure S3 Cross-Region Replication.

D.Enable S3 Versioning on the bucket.

AnswerC

S3 CRR automatically replicates objects to a destination bucket in another Region.

Why this answer

Option C is correct because S3 Cross-Region Replication (CRR) automatically replicates objects to a bucket in another Region. Option A is wrong because S3 Standard provides high durability but not automatic cross-region replication. Option B is wrong because S3 One Zone-IA stores data in a single Availability Zone and does not replicate to another Region.

Option D is wrong because S3 Versioning is needed for CRR but does not itself replicate objects.
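
For reference, a replication rule can be sketched as a plain configuration document. The Python function below builds a dictionary in the shape of the `ReplicationConfiguration` parameter of the S3 `put_bucket_replication` API, without calling AWS. The function name, rule ID, and the ARNs in the usage line are placeholders. The sketch assumes versioning is already enabled on both buckets, which replication requires.

```python
def build_replication_config(role_arn: str, destination_bucket_arn: str) -> dict:
    """Build a ReplicationConfiguration-shaped dict that replicates all objects."""
    return {
        # IAM role that S3 assumes to replicate objects on the owner's behalf.
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-all-objects",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix matches every object in the source bucket.
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Destination bucket in the other Region.
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


config = build_replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",  # placeholder
    "arn:aws:s3:::example-destination-bucket",                  # placeholder
)
```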

74

Multi-Selecthard

Which TWO steps should a SysOps administrator take to ensure data durability for an Amazon S3 bucket that stores critical documents? (Choose two.)

Select 2 answers

A.Enable default encryption with SSE-S3.

B.Enable S3 Versioning.

C.Use S3 Transfer Acceleration.

D.Enable MFA Delete.

E.Configure cross-region replication (CRR).

AnswersB, E

Versioning preserves all versions, preventing permanent deletion.

Why this answer

Options B and E are correct. S3 Versioning protects against accidental overwrites and deletions. Cross-region replication ensures data survives a regional disaster.

Option A is incorrect because SSE-S3 provides encryption at rest, not durability. Option D is incorrect because MFA Delete adds security but does not directly improve durability. Option C is incorrect because Transfer Acceleration improves upload speed, not durability.

75

MCQmedium

A company has a production RDS for MySQL database. The SysOps administrator receives an alert that the database instance is running out of storage. The company requires high availability and minimal downtime during any modifications. What should the administrator do?

A.Add a read replica and use it for read traffic to reduce load on the primary.

B.Modify the RDS instance to increase the allocated storage. Since the instance is Multi-AZ, the modification will be applied with minimal downtime.

C.Create a CloudWatch alarm to notify when storage is low, then manually clean up old data.

D.Create a new RDS instance with larger storage and migrate the data using AWS Database Migration Service.

AnswerB

RDS allows storage scaling with minimal downtime, especially for Multi-AZ instances.

Why this answer

Option B is correct. Allocated storage on a Multi-AZ RDS instance can be increased with minimal downtime; the change can be applied immediately or deferred to the next maintenance window. Option C is wrong because an alarm plus manual cleanup does not address the storage issue permanently. Option D is wrong because creating a new instance and migrating the data with DMS requires more effort and risks downtime.

Option A is wrong because read replicas do not increase the storage capacity of the primary.
