Knowledge + Practice

CCNA Resilient Cloud Questions

75 of 259 questions · Page 3/4 · Resilient Cloud topic · Answers revealed


151

MCQeasy

A DevOps team uses the above CloudFormation template to create an S3 bucket. What does the bucket policy accomplish?

A.It denies all S3 operations on the bucket unless the request uses HTTPS.

B.It denies all read access to the bucket for anonymous users.

C.It prevents anyone from deleting objects in the bucket.

D.It allows only HTTPS requests to the bucket and denies all HTTP requests.

AnswerA

The condition denies if SecureTransport is false.

Why this answer

Option A is correct. The policy denies all S3 actions on the bucket objects if the request is not sent over HTTPS (`aws:SecureTransport` is false). This enforces encryption in transit.

Option B is wrong because the policy denies all actions, not just reads. Option C is wrong because the policy denies all actions, not just deletes, and only for requests that do not use HTTPS. Option D is wrong because the statement is a Deny: it blocks HTTP requests but does not by itself allow HTTPS requests.
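The CloudFormation template itself is not reproduced on this page. As a rough illustration only, a deny-unless-HTTPS bucket policy is commonly written along the following lines. The bucket name `example-bucket` and the `Sid` value are hypothetical, and the actual template in the question may differ.

```python
import json


def build_https_only_policy(bucket_name: str) -> dict:
    """Build a bucket policy document that denies any request not sent over HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = build_https_only_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Because the only statement is a Deny, access over HTTPS still has to be granted elsewhere (for example by an IAM policy).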


152

MCQeasy

A company wants to automate the recovery of an Amazon RDS DB instance in a different region if the primary region becomes unavailable. Which service should they use?

A.RDS Multi-AZ deployment.

B.RDS cross-region automated backups.

C.RDS read replicas.

D.AWS CloudFormation custom resource.

AnswerB

Cross-region backups allow restoring in another region.

Why this answer

Option B is correct because RDS cross-region automated backups can be restored to a different region. Option A is incorrect because RDS Multi-AZ only provides failover within the same region. Option C is incorrect because read replicas can be promoted but require manual intervention.

Option D is incorrect because a CloudFormation custom resource is a provisioning mechanism, not an RDS feature for cross-Region recovery.


153

MCQhard

A company has a multi-region application with an RDS for MySQL database in us-east-1. They want to minimize downtime if the primary region fails. They set up a cross-region read replica in us-west-2. What additional step is needed for automated failover?

A.Create a second read replica in the secondary region

B.Use a custom automation to monitor the primary and promote the replica

C.Configure automatic backup retention on the replica

D.Enable Multi-AZ on the read replica

AnswerB

Automated failover requires custom logic or use of Route 53 health checks with Lambda.

Why this answer

Option B is correct: a read replica can be promoted to a standalone primary, but RDS does not promote a cross-Region replica automatically, so automated failover requires custom monitoring and promotion logic. Option A is wrong because a second read replica still cannot be promoted automatically without custom automation.

Option C is wrong because backup retention on the replica does not trigger promotion. Option D is wrong because RDS does not support cross-Region Multi-AZ; enabling Multi-AZ on the replica only protects it within the secondary Region.


154

MCQeasy

A company is designing a disaster recovery strategy for its primary RDS for PostgreSQL database in us-east-1. The RTO is 15 minutes and RPO is 1 minute. Which solution meets these requirements?

A.Create a cross-Region read replica in the secondary Region and promote it during failover.

B.Use AWS Backup to copy automated backups to the secondary Region every hour.

C.Deploy a Multi-AZ RDS instance and failover to the standby in the same region.

D.Take manual snapshots of the database every 5 minutes and copy them to the secondary Region.

AnswerA

Cross-Region read replicas can be promoted quickly, achieving low RTO/RPO.

Why this answer

Option A is correct because a cross-Region read replica can be promoted to a primary instance quickly, meeting both RTO and RPO. Option D is wrong because manual snapshots have a higher RPO. Option C is wrong because Multi-AZ is only within a Region.

Option B is wrong because hourly backup copies cannot meet a 1-minute RPO, and backups alone do not enable cross-Region failover.


155

MCQeasy

A company runs a production web application on EC2 instances behind an Application Load Balancer. The application experiences intermittent high latency. The operations team needs to identify the root cause without affecting live traffic. Which approach is the MOST efficient?

A.Deploy a separate test environment with identical configuration and run load tests

B.Enable EC2 detailed monitoring and SSH into each instance to run top and iostat

C.Enable detailed CloudWatch metrics on the ALB and analyze ALB access logs

D.Run tcpdump on all EC2 instances and analyze packet captures

AnswerC

ALB metrics and logs provide request-level latency without impacting production.

Why this answer

Option C is correct because detailed CloudWatch metrics and access logs from the ALB allow analyzing request latency patterns without impacting production traffic. Option B is wrong because SSH access may be restricted and ad hoc commands such as top and iostat cannot provide historical data. Option A is wrong because it requires creating a separate test environment.

Option D is wrong because tcpdump generates large volumes of data and can impact performance.


156

Multi-Selectmedium

A company runs a containerized application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) and stores data in Amazon Aurora Serverless v2. The application experiences intermittent timeouts during periods of rapid scaling. The DevOps engineer notices that the Aurora database's ACU utilization spikes to 100% during these events. What should the engineer do to improve resilience? (Choose THREE.)

Select 3 answers

A.Increase the Fargate task memory and CPU limits.

B.Implement exponential backoff and jitter in application retry logic.

C.Add Amazon ElastiCache in front of the database for caching.

D.Configure Aurora Auto Scaling with a higher maximum ACU limit.

E.Use Amazon RDS Proxy to pool database connections.

AnswersB, D, E

Prevents overwhelming the database with retries during recovery.

Why this answer

Option D is correct because a higher maximum ACU limit allows the database to handle bursts. Option E is correct because RDS Proxy reduces connection overhead and helps manage scaling. Option B is correct because exponential backoff and jitter in application retries prevent a thundering herd.

Option A (increasing Fargate task memory and CPU) does not address the database bottleneck. Option C (using ElastiCache) could offload reads but not writes; the issue is write-heavy scaling.
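To make the backoff-and-jitter idea concrete, here is a minimal sketch of the "full jitter" variant, where each retry waits a random time between zero and an exponentially growing ceiling. The function name, the `base` and `cap` defaults, and the injectable `rng` parameter are illustrative choices, not values from the question.

```python
import random


def backoff_delays(max_retries, base=0.1, cap=5.0, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    The ceiling doubles on every attempt (base, 2*base, 4*base, ...) up to `cap`,
    and the actual delay is drawn uniformly from [0, ceiling] so that many
    clients retrying at once do not all hit the database at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


# Seeded generator only so that the example is repeatable.
delays = backoff_delays(5, rng=random.Random(42))
print([round(d, 3) for d in delays])
```

In an application, each delay would be passed to a sleep call before the next connection or query attempt.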


157

MCQeasy

A company wants to automatically recover an Amazon RDS DB instance if the underlying hardware fails. Which feature should the DevOps engineer enable?

A.Multi-AZ deployment.

B.Deletion protection.

C.Read replicas in a different Region.

D.Automated backups with a retention period of 35 days.

AnswerA

Multi-AZ automatically fails over to standby on hardware failure.

Why this answer

Option A is correct because Multi-AZ deployment provides automatic failover to a standby in a different AZ. Option D is wrong because automated backups are for point-in-time recovery, not hardware failure. Option C is wrong because read replicas are for read scaling, not automatic failover.

Option B is wrong because deletion protection prevents accidental deletion, not recovery.


158

MCQeasy

A DevOps engineer is designing a highly available web application using Amazon Route 53. The application is deployed in two AWS Regions. The engineer wants to route traffic to the nearest healthy endpoint. Which routing policy should be used?

A.Failover routing

B.Weighted routing

C.Geolocation routing

D.Latency routing

AnswerD

Routes to lowest-latency endpoint.

Why this answer

Option D is correct because latency-based routing routes to the Region with the lowest latency. Option C is wrong because geolocation routes based on geographic location, not latency. Option A is wrong because failover is for active-passive setups.

Option B is wrong because weighted routing distributes traffic by weight.


159

Multi-Selectmedium

A company runs a multi-tier web application on AWS. The application consists of an Application Load Balancer, EC2 instances in an Auto Scaling group, and an Amazon RDS Multi-AZ DB instance. The application experiences intermittent failures when the RDS primary instance fails over to the standby. The engineer needs to ensure that the application handles failover gracefully without manual intervention.

Select 2 answers

A.Modify the application to use DNS caching with a TTL of 300 seconds to avoid stale DNS records.

B.Configure the Application Load Balancer to perform health checks on the RDS instance.

C.Use Amazon RDS Proxy to pool and reuse database connections, which reduces connection churn during failover.

D.Enable Multi-AZ on the Auto Scaling group to ensure EC2 instances are in multiple AZs.

E.Configure the application to use the RDS instance endpoint (not the cluster endpoint) and implement retry logic for database connections.

AnswersC, E

RDS Proxy helps maintain connections during failover and reduces load on the database.

Why this answer

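Option C is correct because Amazon RDS Proxy pools connections and routes them to the current primary, which reduces connection churn during failover. Option E is correct because the RDS instance endpoint automatically points to the current primary after a Multi-AZ failover, so a retry mechanism lets the application reconnect without manual intervention. Option A is wrong because caching DNS for 300 seconds keeps stale records longer. Option B is wrong because an ALB health-checks its own targets, not an RDS instance. Option D is wrong because spreading EC2 instances across AZs does not change how the application reconnects to the database.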


160

MCQhard

A DevOps engineer applies this S3 bucket policy to an S3 bucket. What is the effect of this policy?

A.All objects uploaded must be encrypted with SSE-C.

B.All uploads to the bucket are blocked.

C.All objects uploaded must use server-side encryption with Amazon S3 managed keys (SSE-S3).

D.All objects uploaded must be encrypted with SSE-KMS.

AnswerC

The policy allows only PutObject with s3:x-amz-server-side-encryption set to AES256.

Why this answer

The S3 bucket policy in question denies uploads unless the `x-amz-server-side-encryption` header is set to `AES256`, which is the value for SSE-S3 (Amazon S3 managed keys). This ensures that all objects uploaded to the bucket must be encrypted using server-side encryption with S3-managed keys (SSE-S3). Option C correctly identifies this requirement.

Exam trap

The trap here is that candidates confuse the `x-amz-server-side-encryption` header values: `AES256` is specific to SSE-S3, not SSE-C or SSE-KMS, leading to incorrect selections of A or D.

How to eliminate wrong answers

Option A is wrong because SSE-C requires the `x-amz-server-side-encryption-customer-algorithm` header, not `x-amz-server-side-encryption: AES256`. Option B is wrong because the policy does not block all uploads; it only denies uploads that do not meet the encryption requirement, so uploads with the correct encryption header are allowed. Option D is wrong because SSE-KMS requires the `x-amz-server-side-encryption` header set to `aws:kms`, not `AES256`.
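The header distinctions above can be summarized in a small helper. This is only a sketch for study purposes: `classify_sse` is a hypothetical function that inspects a dictionary of request headers, and it is not part of any AWS SDK.

```python
def classify_sse(headers: dict) -> str:
    """Classify which server-side encryption mode a PutObject request asks for."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # SSE-C is signalled by its own customer-algorithm header.
    if "x-amz-server-side-encryption-customer-algorithm" in normalized:
        return "SSE-C"

    value = normalized.get("x-amz-server-side-encryption")
    if value is None:
        return "none requested"
    if value == "AES256":
        return "SSE-S3"
    if value == "aws:kms":
        return "SSE-KMS"
    return f"other ({value})"


print(classify_sse({"x-amz-server-side-encryption": "AES256"}))   # SSE-S3
print(classify_sse({"x-amz-server-side-encryption": "aws:kms"}))  # SSE-KMS
print(classify_sse({}))                                           # none requested
```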


161

MCQeasy

A company is using Amazon S3 to store critical data. The company requires that all versions of objects be retained, including deleted objects, to meet compliance requirements. Which S3 feature should be enabled?

A.S3 Versioning

B.S3 Object Lock

C.S3 Lifecycle policies

D.MFA Delete

AnswerA

Versioning retains all versions of objects, including those deleted.

Why this answer

Option A is correct because S3 Versioning keeps all versions of objects, including deleted ones (marked with delete markers). Option D is wrong because MFA Delete adds an extra layer of security but does not retain all versions. Option B is wrong because S3 Object Lock prevents deletion but requires versioning to be enabled; versioning itself is the feature that retains all versions.

Option C is wrong because lifecycle policies transition or expire objects; they do not retain all versions.


162

MCQmedium

A company has deployed a multi-tier application on AWS. The web tier uses an Auto Scaling group of EC2 instances behind an Application Load Balancer. The application tier uses another Auto Scaling group of EC2 instances that process messages from an Amazon SQS queue. The database tier uses Amazon RDS Multi-AZ. Recently, the application experienced a complete outage when the SQS queue became overwhelmed with messages due to a sudden spike in traffic. The application tier could not process messages fast enough, causing the queue to grow indefinitely and eventually exceed the visibility timeout, leading to message loss and degraded performance. The DevOps engineer needs to improve the resilience of the architecture to handle traffic spikes without losing messages. Which solution should be implemented?

A.Limit the maximum message size and set a queue size limit to prevent overflow

B.Replace the standard SQS queue with a FIFO SQS queue to ensure exactly-once processing

C.Increase the visibility timeout in the SQS queue to allow more time for processing

D.Configure a dead-letter queue for unprocessed messages and implement Auto Scaling based on SQS queue depth

AnswerD

DLQ captures failed messages; scaling handles spikes.

Why this answer

Option D is correct because a dead-letter queue captures messages that cannot be processed after a number of attempts, preventing loss. Combined with Auto Scaling based on queue depth, it ensures the application tier scales to handle spikes. Option C is wrong because increasing the visibility timeout only delays re-processing; it does not prevent message loss.

Option A is wrong because limiting queue size causes rejection of messages. Option B is wrong because SQS FIFO does not improve resilience against spikes and may reduce throughput.
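Scaling on queue depth is often framed as a "backlog per worker" calculation: decide how many queued messages one instance can absorb within an acceptable latency, then size the fleet accordingly. The sketch below shows that arithmetic only. The function name, the processing time, the latency target, and the fleet bounds are all assumed values for illustration.

```python
import math


def desired_workers(queue_depth, avg_processing_seconds, acceptable_latency_seconds,
                    min_workers=1, max_workers=20):
    """Estimate how many workers are needed to drain the queue within the latency target."""
    # Messages a single worker can clear within the acceptable latency.
    backlog_per_worker = acceptable_latency_seconds / avg_processing_seconds
    needed = math.ceil(queue_depth / backlog_per_worker)
    # Clamp to the Auto Scaling group's assumed bounds.
    return max(min_workers, min(max_workers, needed))


# 450 queued messages, 2 s per message, 60 s acceptable latency -> 30 messages per worker.
print(desired_workers(450, 2, 60))  # 15
```

In practice this ratio would be published as a custom CloudWatch metric and used as a scaling target, rather than computed ad hoc.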


163

MCQhard

A company runs a production application on Amazon ECS with Fargate, fronted by an Application Load Balancer (ALB). The application experiences periodic latency spikes and occasional 502 errors. The ECS service is configured with a desired count of 2 tasks, and the ALB health check is set to /health with a 30-second interval and 2 consecutive failures threshold. The team uses CloudWatch Container Insights and has noticed that CPU and memory utilization of tasks remain below 50%. However, the ALB TargetGroup's HealthyHostCount metric occasionally drops to 0 for a few minutes before recovering. The deployment strategy is rolling update with a minimum healthy percent of 50% and maximum percent of 200%. The team recently updated the task definition to increase memory and CPU, but the issue persists. What is the MOST likely cause of the problem?

A.The ECS service role lacks permissions to register targets with the ALB.

B.The ALB health check is too aggressive, causing tasks to be marked unhealthy during brief initialization or deployment.

C.The ALB target group is configured with only one Availability Zone, causing loss of all targets when that AZ fails.

D.The task's CPU or memory limits are set too low, causing the container to be throttled.

AnswerB

Health check timing causes temporary loss of healthy targets during rolling updates.

Why this answer

Option B is correct because the health check interval (30 seconds) and failure threshold (2) mean it takes up to 60 seconds to mark a task unhealthy. During deployments, the rolling update may temporarily have only 1 healthy task (minimum 50% of 2 = 1), and if that task becomes unhealthy, HealthyHostCount drops to 0. Option D is wrong because CPU and memory utilization stay below 50%, so resource limits are not the issue.

Option C is wrong because a target group with 2 tasks and 2 AZs is fine; the problem is not AZ-specific. Option A is wrong because the ECS service role does not affect health checks, and missing registration permissions would not produce intermittent drops.
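The numbers in this explanation can be checked with a few lines of arithmetic. This is a simplified sketch: it ignores the health check timeout, and it assumes (as far as I understand the ECS documentation) that the minimum healthy count is rounded up and the maximum count is rounded down. The function names are hypothetical.

```python
import math


def seconds_to_unhealthy(interval_seconds, unhealthy_threshold):
    """Approximate time for a failing target to be marked unhealthy."""
    return interval_seconds * unhealthy_threshold


def min_healthy_tasks(desired_count, minimum_healthy_percent):
    """Lowest number of running tasks allowed during a rolling update (rounded up)."""
    return math.ceil(desired_count * minimum_healthy_percent / 100)


def max_running_tasks(desired_count, maximum_percent):
    """Highest number of tasks allowed during a rolling update (rounded down)."""
    return math.floor(desired_count * maximum_percent / 100)


print(seconds_to_unhealthy(30, 2))  # 60
print(min_healthy_tasks(2, 50))     # 1
print(max_running_tasks(2, 200))    # 4
```

With a floor of only one healthy task during a deployment, a single failed health check sequence is enough to take HealthyHostCount to 0.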


164

MCQeasy

A company is designing a disaster recovery strategy for its on-premises database to AWS using AWS Elastic Disaster Recovery (AWS DRS). The recovery time objective (RTO) is 15 minutes, and the recovery point objective (RPO) is 1 minute. Which configuration should they use?

A.Use AWS CloudEndure Disaster Recovery (now AWS DRS) with periodic replication every hour.

B.Use AWS Backup to take snapshots every 5 minutes and restore in the target region.

C.Use RDS cross-region automated backups with a 5-minute backup window.

D.Configure AWS DRS to continuously replicate data to a staging area in AWS, and launch instances in the target region on failover.

AnswerD

Continuous replication achieves sub-minute RPO and fast RTO.

Why this answer

Option D is correct because continuous replication with a staging area provides very low RPO. Option B is wrong because point-in-time restore from snapshots has higher RPO. Option A is wrong because hourly periodic replication cannot meet a 1-minute RPO; CloudEndure (now AWS DRS) uses continuous replication.

Option C is wrong because scheduled backups cannot achieve a 1-minute RPO.


165

MCQhard

A financial services company runs a critical application on Amazon ECS with Fargate launch type. The application has strict availability requirements and must survive an Availability Zone failure. The ECS service is configured with a desired count of 4 tasks, spread across two Availability Zones using a spread strategy. The service is fronted by an Application Load Balancer. During a recent AZ outage, one AZ became completely unavailable, but the application continued to serve traffic. However, after the AZ recovered, the ECS service did not automatically place new tasks in the recovered AZ to restore the desired count. The service remains with only 2 tasks in the remaining AZ. What is the most likely cause and solution?

A.The ECS service does not automatically replace tasks in a recovered AZ because the spread strategy is static. Manually update the service (e.g., force new deployment) to trigger rebalancing.

B.The Application Load Balancer is not health-checking the recovered AZ. Configure cross-zone load balancing.

C.The ECS service uses a spread strategy that does not automatically rebalance after an AZ recovers. Update the service to use a 'rebalance' strategy.

D.The ECS service is configured with a minimum healthy percent of 50%, which prevents replacement. Lower the minimum healthy percent to 0%.

AnswerA

Force new deployment or modify desired count to trigger rebalancing across AZs.

Why this answer

The ECS service uses a spread strategy, which maintains balance across AZs. When an AZ is unhealthy, ECS does not place tasks there. After recovery, the service may not automatically rebalance because the spread strategy is not proactive; it only ensures placement of new tasks.

To force rebalancing, one can update the service (e.g., change desired count, then change back).


166

Multi-Selecthard

A company is designing a disaster recovery plan for an Amazon S3 data lake. The data lake stores sensitive data that must be replicated to a secondary Region with an RPO of 15 minutes. Which THREE actions should the company take? (Choose THREE.)

Select 3 answers

A.Enable S3 Versioning on both the source and destination buckets.

B.Enable S3 Replication Time Control (RTC) for the replication rule.

C.Configure cross-Region replication (CRR) from the source bucket to the destination bucket.

D.Configure S3 Event Notifications to trigger a Lambda function that copies objects to the secondary Region.

E.Enable S3 Transfer Acceleration on the source bucket.

AnswersA, B, C

Versioning is required for CRR.

Why this answer

Option A is correct because enabling S3 Versioning on both the source and destination buckets is a prerequisite for S3 Replication. Without versioning, replication cannot track object versions, which is essential for meeting the 15-minute RPO with consistency guarantees.

Exam trap

The trap here is that candidates may confuse S3 Event Notifications with Lambda as a viable replication method, overlooking that native CRR with RTC provides guaranteed RPO and versioning consistency without custom code overhead.
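For reference, a replication rule that combines CRR with Replication Time Control might be described roughly as follows. This sketch only builds the configuration dictionary in the shape that boto3's `put_bucket_replication` accepts for `ReplicationConfiguration`; it makes no AWS calls. The role ARN, bucket ARN, and rule ID are placeholders, and versioning must already be enabled on both buckets.

```python
def build_replication_config(role_arn, destination_bucket_arn):
    """Build a replication configuration with a 15-minute Replication Time Control."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "crr-with-rtc",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: apply to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": destination_bucket_arn,
                    # RTC: replicate within 15 minutes.
                    "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                    # RTC requires replication metrics to be enabled as well.
                    "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
                },
            }
        ],
    }


config = build_replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",  # placeholder
    "arn:aws:s3:::example-destination-bucket",                  # placeholder
)
print(config["Rules"][0]["Destination"]["ReplicationTime"])
```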


167

MCQmedium

A company runs a web application on EC2 instances behind an Application Load Balancer. The application uses an Aurora MySQL database. Recently, the database experienced a failover, and the application started throwing connection errors. The DevOps engineer needs to make the application resilient to database failovers with minimal code changes. What should they do?

A.Configure the application to use the Aurora cluster endpoint for database connections

B.Configure the application to use the Aurora reader endpoint for all queries

C.Create a cross-Region read replica and configure the application to retry on failure

D.Use Amazon RDS Proxy with IAM authentication to handle connection pooling

AnswerA

Cluster endpoint always points to the current writer.

Why this answer

Option A is correct because the Aurora cluster endpoint is a read-write endpoint that automatically points to the new writer after a failover, so the application can reconnect without code changes. Option B is wrong because the reader endpoint is read-only. Option C is wrong because a read replica does not fail over automatically.

Option D is wrong because RDS Proxy does not change the endpoint behavior.


168

MCQhard

A company runs a critical application on EC2 instances behind an Application Load Balancer. The application uses an Amazon RDS for PostgreSQL Multi-AZ DB instance. During a recent failover test, the application experienced a 5-minute downtime. The RDS failover completed within 30 seconds. What is the most likely cause of the prolonged downtime?

A.The application caches DNS resolutions, causing it to connect to the old writer endpoint

B.The RDS Multi-AZ failover took longer than expected due to a large transaction log

C.The Application Load Balancer health checks marked all instances as unhealthy during the failover

D.The application was using read replicas for writes, which failed during failover

AnswerA

Stale DNS causes connectivity issues.

Why this answer

The most likely cause is that the application caches DNS resolutions, causing it to continue connecting to the old writer endpoint after failover. When an RDS Multi-AZ failover occurs, the DNS record for the writer endpoint is updated to point to the new primary instance, but the application's cached DNS entry still points to the old IP address. Since the old primary is now a standby and no longer accepts connections, the application experiences downtime until the DNS cache expires (typically 5–60 seconds) or the application refreshes the DNS resolution.

The 5-minute downtime suggests the application uses a long DNS TTL or a custom caching layer that delays reconnection.

Exam trap

The trap here is that candidates assume the 5-minute downtime must be caused by the database failover itself, but the question explicitly states the failover completed in 30 seconds, so the real issue is application-side DNS caching or stale connection handling.

How to eliminate wrong answers

Option B is wrong because the scenario explicitly states the RDS failover completed within 30 seconds, so a large transaction log did not cause the prolonged downtime. Option C is wrong because the Application Load Balancer health checks are independent of RDS failover; even if the database is briefly unavailable, the ALB does not mark instances unhealthy unless the application itself fails health checks due to database connectivity issues. Option D is wrong because read replicas are not used for writes in a standard RDS Multi-AZ setup; writes always go to the primary instance, and read replicas are read-only, so this scenario does not apply.
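The effect of application-side DNS caching can be illustrated with a toy cache. Everything here is hypothetical: `TtlDnsCache` is not a real library class, the hostname and IP addresses are made up, and the clock is simulated so the example runs instantly.

```python
import time


class TtlDnsCache:
    """Cache hostname lookups for a fixed number of seconds."""

    def __init__(self, resolver, ttl_seconds, clock=time.monotonic):
        self._resolver = resolver  # callable: hostname -> address
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # hostname -> (address, time cached)

    def resolve(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]       # still fresh: may be stale after a failover
        address = self._resolver(hostname)
        self._entries[hostname] = (address, now)
        return address


def lookup_after_failover(ttl_seconds):
    """Resolve once, simulate a failover, then resolve again 30 seconds later."""
    host = "db.example.internal"
    records = {host: "10.0.1.10"}  # address of the old primary
    clock = {"now": 0.0}
    cache = TtlDnsCache(records.__getitem__, ttl_seconds, clock=lambda: clock["now"])
    cache.resolve(host)
    records[host] = "10.0.2.20"    # failover: DNS now points at the new primary
    clock["now"] = 30.0
    return cache.resolve(host)


print(lookup_after_failover(300))  # 10.0.1.10 (stale entry still cached)
print(lookup_after_failover(5))    # 10.0.2.20 (cache expired, new primary found)
```

A long cache lifetime keeps the application pointed at the old address well after the 30-second failover has finished, which matches the prolonged downtime in the scenario.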


169

Multi-Selecthard

A company runs a web application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Application Load Balancer (ALB) and stores session data in an ElastiCache for Redis cluster with cluster mode enabled. During a recent deployment, a new version of the application caused a memory leak in the Redis cluster, leading to out-of-memory errors and evictions. The DevOps team wants to prevent future deployments from affecting the Redis cluster's health. What should the team do? (Choose TWO.)

Select 2 answers

A.Implement a blue/green deployment strategy using a separate Redis cluster for the new version.

B.Increase the ElastiCache node type to a larger instance size.

C.Disable cluster mode on the Redis cluster to reduce overhead.

D.Take a manual snapshot of the Redis cluster before each deployment.

E.Configure Amazon CloudWatch alarms on Redis memory usage and evictions to trigger an automatic rollback.

AnswersA, E

Isolates the new version's impact; if memory leak occurs, only the green cluster is affected.

Why this answer

Option A is correct because using a blue/green deployment with a separate Redis cluster for the new version isolates the risk. Option E is correct because enabling CloudWatch alarms on Redis memory usage and evictions can trigger automatic rollback or alerting. Option B (increasing instance size) treats the symptom, not the cause.

Option D (snapshot before deployment) is good for backup but doesn't prevent impact. Option C (disabling cluster mode) reduces scalability.


170

Multi-Selectmedium

A company has a critical application running on Amazon EC2 instances in an Auto Scaling group. The application writes logs to an Amazon EFS file system. The DevOps team needs to ensure that log data is durable and available even if an Availability Zone fails. The EFS file system is currently in one AZ. What should the team do? (Choose TWO.)

Select 2 answers

A.Increase the EFS throughput mode to Provisioned.

B.Enable AWS Backup for the EFS file system with daily backups.

C.Copy the log files to Amazon S3 using a cron job.

D.Recreate the EFS file system as a Regional (Standard) file system.

E.Configure the EC2 instances to mount the EFS file system from multiple Availability Zones.

AnswersB, D

Backups provide additional durability and recovery options.

Why this answer

Option D is correct because EFS One Zone is not resilient to AZ failure. The team should recreate the file system as EFS Standard (Regional), which stores data across multiple AZs. Option B is correct because enabling backups (e.g., AWS Backup) provides additional durability and point-in-time recovery.

Option A (increasing throughput) does not add durability. Option C (using S3) changes the architecture significantly. Option E (mounting from multiple AZs) is possible with EFS Standard but not with One Zone.


171

MCQmedium

A company runs a critical e-commerce application on Amazon EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. The application must be resilient to an Availability Zone (AZ) failure. What is the MOST resilient configuration?

A.Configure the Auto Scaling group to launch instances in a single AZ with a larger instance type.

B.Deploy a single large EC2 instance in one AZ and use an Elastic IP for failover.

C.Use a Network Load Balancer instead of an ALB and deploy instances in two AZs.

D.Configure the Auto Scaling group to span at least three AZs and set the ALB to route traffic to all AZs.

AnswerD

Multi-AZ deployment ensures resilience.

Why this answer

Distributing instances across three AZs ensures that if one AZ fails, the remaining AZs can handle the load. This provides high availability and resilience.


172

MCQeasy

A company wants to design a disaster recovery solution for its primary AWS Region. The solution should have a Recovery Point Objective (RPO) of a few seconds and a Recovery Time Objective (RTO) of a few minutes. Which strategy meets these requirements?

A.Pilot light

B.Backup and restore

C.Warm standby

D.Multi-Region active-active

AnswerD

Active-active with synchronous replication achieves low RPO and RTO.

Why this answer

A multi-Region active-active setup with synchronous replication provides near-zero RPO and minimal RTO.


173

MCQhard

An AWS Lambda function that processes sensitive data writes objects to an S3 bucket. The security team requires that all objects be encrypted at rest using SSE-S3. The Lambda execution role uses the above IAM policy. Despite the policy, some objects are uploaded without server-side encryption. What is the most likely cause?

A.The Lambda function does not include the x-amz-server-side-encryption header in the PutObject request.

B.The bucket has a default encryption policy that overrides the IAM policy.

C.The Lambda function is using a KMS key instead of SSE-S3.

D.The Lambda function is specifying a different encryption algorithm, such as aws:kms.

AnswerA

If the header is absent, the condition in the Deny statement does not evaluate (missing key not equals false), so the Deny does not apply. The Allow statement allows the action without encryption.

Why this answer

Option A is correct because the IAM policy's encryption check depends on the `x-amz-server-side-encryption` header and the condition key `s3:x-amz-server-side-encryption`. As the answer summary notes, when the Lambda function omits this header from its PutObject call, the Deny statement's condition is not matched, so the Deny does not apply and the Allow statement permits the upload. The result is that objects reach the bucket without the function ever requesting SSE-S3, which is why the policy is not being enforced as intended.

Exam trap

The trap here is that candidates assume a bucket's default encryption setting can override an IAM policy's condition, but in reality, IAM policies are evaluated first, and a missing encryption header causes a denial unless the bucket's default encryption is configured to apply encryption automatically—but even then, the object would be encrypted, not unencrypted.

How to eliminate wrong answers

Option B is wrong because a bucket's default encryption policy does not override an IAM policy; IAM policies are evaluated first, and if the IAM policy denies the request (due to missing encryption header), the upload is blocked regardless of bucket default settings. Option C is wrong because using a KMS key would set the encryption algorithm to `aws:kms`, which does not match the `AES256` value required by the IAM policy, causing the request to be denied—not allowed. Option D is wrong because specifying a different encryption algorithm like `aws:kms` would also fail the IAM condition check for `s3:x-amz-server-side-encryption` set to `AES256`, resulting in a denied request, not an unencrypted upload.


174

Multi-Selecthard

A company is running a critical application on Amazon RDS for PostgreSQL with Multi-AZ deployment. The application performs frequent writes. During a recent failover test, the team observed that the application experienced a 30-second write outage. To minimize downtime during automatic failovers, which configuration change should the DevOps engineer implement? (Choose TWO.)

Select 2 answers

A.Enable Performance Insights to monitor database load.

B.Configure Amazon RDS Proxy in front of the RDS instance.

C.Use synchronous replication to the standby instance.

D.Increase the DB instance class to a larger size.

E.Set the DNS TTL for the RDS endpoint to 1 second.

AnswersB, E

RDS Proxy maintains connection pools and handles failover transparently, reducing application downtime.

Why this answer

Option B is correct because Amazon RDS Proxy reduces failover time by pooling and reusing connections, so the application can resume quickly after failover. Option E is correct because Multi-AZ with automatic failover is already in place, and a short DNS TTL on the RDS endpoint lets clients reconnect to the new primary sooner. Option D (increasing DB instance size) does not reduce failover time.

Option A (enabling Performance Insights) is for monitoring, not failover. Option C (using synchronous replication) is the default for Multi-AZ and does not reduce failover time.

Practice this question →

175

MCQeasy

A company runs a critical batch processing job on Amazon ECS using Fargate. The job must complete within 2 hours. If the job fails, it must be retried automatically up to 3 times. Which solution meets these requirements?

A.Use AWS Batch with a retry strategy set to 3 attempts

B.Use AWS Step Functions with a task that invokes the ECS task, and configure a retry policy in the state machine

C.Use an Amazon ECS service with a desired count of 1 and enable automatic task replacement

D.Use AWS Lambda with a dead-letter queue and reprocess events

AnswerA

AWS Batch natively supports retry and is designed for batch jobs.

Why this answer

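Option A is correct because AWS Batch provides managed retry logic and job scheduling. Option B is wrong because Step Functions would require building and maintaining a state machine with its own retry configuration, while AWS Batch handles retries natively. Option C is wrong because an ECS service is meant for long-running tasks and keeps replacing stopped tasks rather than retrying a job a fixed number of times.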

Option D is wrong because Lambda has a 15-minute execution limit.
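As a rough illustration of the retry setting described above, the sketch below builds the parameters for a Batch job definition as a plain dictionary. The job name and image are hypothetical, the container properties are trimmed to the bare minimum, and the `retryStrategy` and `timeout` field names are assumed to follow the AWS Batch job definition request shape. In that shape `attempts` counts total attempts, and the timeout applies to each attempt rather than to the job as a whole.

```python
def build_job_definition(name: str, image: str, attempts: int = 3,
                         timeout_seconds: int = 2 * 60 * 60) -> dict:
    """Build illustrative parameters for an AWS Batch job definition."""
    # AWS Batch documents a range of 1 to 10 attempts per job.
    if not 1 <= attempts <= 10:
        raise ValueError("attempts must be between 1 and 10")
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        # Trimmed: a real Fargate definition also needs resource
        # requirements and an execution role.
        "containerProperties": {"image": image},
        "retryStrategy": {"attempts": attempts},
        "timeout": {"attemptDurationSeconds": timeout_seconds},
    }


# Hypothetical job name and image, for illustration only.
job_def = build_job_definition("nightly-batch", "example.com/batch-job:latest")
```

The resulting dictionary could be passed to whatever client or template registers the job definition; nothing here calls AWS.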

Practice this question →

176

MCQhard

A company uses DynamoDB global tables with two regions. They notice that writes in one region are not replicating to the other region after a brief network partition. Which configuration will ensure replication resumes automatically?

A.Use DynamoDB Streams with a Lambda function to manually replicate writes.

B.No action needed; DynamoDB automatically resumes replication when connectivity is restored.

C.Manually fail over the table to the other region.

D.Delete the replica table and recreate it.

AnswerB

DynamoDB global tables handle temporary partitions and resume replication automatically.

Why this answer

Option B is correct because DynamoDB global tables automatically resume replication after a partition is resolved. Option A is incorrect because building custom replication with DynamoDB Streams and Lambda duplicates what global tables already do. Option C is incorrect because no failover is needed; replication is handled automatically.

Option D is incorrect because deleting and recreating the replica is unnecessary for a temporary partition and risks data loss.

Practice this question →

177

MCQeasy

A company is running a production database on Amazon RDS for PostgreSQL with Multi-AZ deployment. The database experiences a failover due to an AZ outage. What happens to the existing database connections during the failover?

A.Existing connections are automatically redirected to the standby without interruption.

B.The RDS endpoint IP address changes, and the application must update its configuration.

C.Existing connections are dropped, and applications must reconnect to the new primary using the same endpoint.

D.The primary DB instance is promoted to standby and connections remain active.

AnswerC

RDS updates DNS to point to the new primary; reconnection required.

Why this answer

Option C is correct. During a Multi-AZ failover, RDS automatically updates the DNS record to point to the standby, but existing connections to the primary are dropped and must be re-established. Option A is wrong because connections are not preserved.

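Option B is wrong because the endpoint name stays the same; RDS updates the DNS record behind it, so the application does not need a configuration change. Option D is wrong because the standby is promoted to primary, not the reverse, and existing connections do not remain active.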
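Because dropped connections must be re-established against the same endpoint, applications usually wrap connection attempts in a retry loop. The following is a minimal sketch of that idea with exponential backoff; `connect_with_retry`, the fake driver, and the hostname are hypothetical stand-ins rather than part of any real database library.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical stand-in for a database driver: fails twice, then succeeds,
# roughly like a client reconnecting while a failover completes.
state = {"calls": 0}


def fake_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("connection dropped during failover")
    return "connected to mydb.example.internal"


delays = []  # record the backoff delays instead of actually sleeping
result = connect_with_retry(fake_connect, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.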

Practice this question →

178

MCQeasy

A company has an Amazon S3 bucket that stores critical data. The company wants to protect the data from accidental deletion and ensure that even the root user cannot delete the bucket. Which S3 feature should the company enable?

A.S3 bucket policy that denies s3:DeleteBucket for all principals.

B.S3 bucket versioning with MFA delete.

C.S3 Object Lock with compliance retention mode.

D.S3 bucket versioning with lifecycle policy to expire noncurrent versions.

AnswerC

Compliance mode prevents any user, including root, from deleting objects.

Why this answer

Option C is correct because S3 Object Lock with compliance retention mode prevents any user, including the root user, from deleting or overwriting objects until the retention period expires. Compliance mode applies a strict, immutable lock that cannot be removed or shortened by any user, including the AWS account root user, ensuring the data is protected against accidental or malicious deletion.

Exam trap

The trap here is that candidates often confuse S3 Object Lock with versioning or MFA delete, thinking that versioning alone prevents deletion, but versioning only protects against overwrites and allows recovery via delete markers, not against intentional deletion by the root user.

How to eliminate wrong answers

Option A is wrong because an S3 bucket policy that denies s3:DeleteBucket for all principals can prevent bucket deletion, but it does not protect objects within the bucket from being deleted, and the root user can remove the policy and then delete the bucket. Option B is wrong because S3 bucket versioning with MFA delete protects object versions from deletion only when MFA is enforced, but it does not prevent the root user from deleting the bucket itself, and MFA delete can be disabled by the root user. Option D is wrong because S3 bucket versioning with a lifecycle policy to expire noncurrent versions is designed to manage storage costs by automatically deleting old versions, not to prevent deletion; it actually enables deletion of objects, which is the opposite of the requirement.

Practice this question →

179

MCQmedium

Your company runs a multi-tier web application on AWS. The web tier consists of EC2 instances behind an Application Load Balancer (ALB) in an Auto Scaling group across three Availability Zones. The application tier runs on a separate Auto Scaling group of EC2 instances that process requests from the web tier. The database tier uses an Amazon RDS for PostgreSQL Multi-AZ deployment. All application servers write logs to Amazon CloudWatch Logs. Recently, the operations team reported that during peak hours, the web tier experiences intermittent 503 errors. The ALB access logs show that the errors occur when the target group's healthy host count drops to zero momentarily. The Auto Scaling group's minimum and desired capacity is 6, with a maximum of 12. The scaling policy is based on average CPU utilization, with a target of 60%. The health check grace period is 300 seconds. The application health check endpoint returns a 200 status when healthy. The DevOps engineer suspects that the scaling policy is too slow to react to traffic spikes. The engineer wants to implement a more proactive scaling approach. Which solution should the engineer implement?

A.Implement a predictive scaling policy combined with dynamic scaling to proactively adjust capacity based on forecasted traffic.

B.Implement a scheduled scaling policy that increases capacity 30 minutes before the expected peak.

C.Increase the health check grace period to 600 seconds to give new instances more time to become healthy.

D.Switch to a step scaling policy with a lower cooldown period and a greater scaling adjustment.

AnswerA

Predictive scaling uses historical data to forecast demand and proactively adjust capacity, preventing the healthy host count from dropping to zero.

Why this answer

Option A is correct because using a predictive scaling policy combined with dynamic scaling provides a proactive approach that anticipates traffic patterns and adjusts capacity in advance. Option C (increasing the health check grace period) does not help with scaling speed. Option D (step scaling with a lower cooldown) is reactive and may still cause dips.

Option B (scheduled scaling) works only if traffic patterns are predictable and doesn't handle spikes well.

Practice this question →

180

Multi-Selectmedium

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application is expected to have unpredictable traffic patterns. The DevOps team needs to ensure that the application can handle sudden spikes in traffic without throttling. Which TWO actions should the team take? (Choose TWO.)

Select 2 answers

A.Use DynamoDB on-demand capacity mode for the table.

B.Configure Lambda provisioned concurrency to keep a set number of execution environments warm.

C.Configure DynamoDB auto scaling with a minimum capacity of 10 read/write capacity units.

D.Increase the Lambda function timeout to the maximum (15 minutes).

E.Set API Gateway throttling limits to a high value to prevent throttling.

AnswersA, B

On-demand instantly scales to handle spikes.

Why this answer

Option A is correct because DynamoDB on-demand capacity mode scales automatically to handle unpredictable traffic spikes without capacity planning. This mode charges per request and can immediately accommodate bursts of up to double the table's previous peak, making it a good fit for serverless applications with variable workloads. Option B is correct because provisioned concurrency keeps a set number of Lambda execution environments initialized, so a sudden burst of requests does not wait on cold starts.

Exam trap

The trap here is that candidates often confuse DynamoDB auto scaling with on-demand capacity, thinking auto scaling can handle sudden spikes as effectively as on-demand, but auto scaling has a lag time and can still throttle during rapid bursts.

Practice this question →

181

MCQmedium

A company runs a web application on AWS that uses Amazon SQS to decouple the frontend from the backend processing. The application experiences sudden spikes in traffic, causing the SQS queue to accumulate a large number of messages. The backend workers are unable to process messages fast enough, leading to increased latency. What solution can the company implement to improve the resilience and scalability of the backend?

A.Reduce the receive message wait time (long polling) to poll the queue more frequently.

B.Increase the visibility timeout of the SQS queue to allow more time for processing.

C.Use an SQS FIFO queue instead of a standard queue to ensure ordered processing.

D.Configure an Auto Scaling group for the backend workers with a scaling policy based on the SQS queue depth.

AnswerD

Auto Scaling based on queue length dynamically adjusts the number of workers to handle spikes.

Why this answer

Option D is correct because configuring an Auto Scaling group for the backend workers with a scaling policy based on the SQS queue depth (ApproximateNumberOfMessagesVisible) directly addresses the sudden traffic spikes. This approach dynamically adds more worker instances when the queue depth increases, improving processing throughput and reducing latency. It ensures the backend scales in response to demand, enhancing both resilience and scalability.

Exam trap

The trap here is that candidates often confuse operational fixes (like adjusting polling or visibility timeout) with architectural scalability solutions, failing to recognize that only dynamic scaling of compute resources can handle unpredictable traffic spikes.

How to eliminate wrong answers

Option A is wrong because reducing the receive message wait time (long polling) to poll more frequently would increase the number of empty responses and API calls, potentially throttling the workers without improving processing capacity; long polling (wait time up to 20 seconds) is actually more efficient for reducing latency and empty receives. Option B is wrong because increasing the visibility timeout only gives workers more time to process a single message, but does not address the root cause of insufficient worker capacity; it can even cause message processing delays if workers fail and messages become visible again after the timeout. Option C is wrong because using an SQS FIFO queue ensures exactly-once processing and message ordering, but does not improve throughput or scalability; FIFO queues have a lower throughput limit (300 transactions per second without batching) compared to standard queues, which would worsen the backlog during spikes.
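One common way to reason about queue-based scaling is backlog per worker: how many messages each worker can hold while still meeting the latency target. The sketch below works through that arithmetic; the function name, the limits, and the sample numbers are illustrative assumptions, not values from the question.

```python
import math


def desired_workers(visible_messages: int, seconds_per_message: float,
                    acceptable_latency_seconds: float,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Estimate the worker count needed to drain the queue within the latency target."""
    # Messages one worker can absorb without exceeding the latency target.
    backlog_per_worker = acceptable_latency_seconds / seconds_per_message
    needed = math.ceil(visible_messages / backlog_per_worker)
    # Clamp to the Auto Scaling group's minimum and maximum size.
    return max(min_workers, min(max_workers, needed))


# 450 visible messages, 2 s per message, 60 s acceptable latency:
# each worker can carry a backlog of 30 messages, so 15 workers are needed.
workers = desired_workers(450, 2, 60)
```

In practice the visible message count would come from the queue's `ApproximateNumberOfMessagesVisible` metric, and the result would feed a scaling policy rather than being applied directly.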

Practice this question →

182

Multi-Selectmedium

Which TWO AWS services can be used to distribute incoming traffic across multiple AWS resources in different Availability Zones within a single region?

Select 2 answers

A.AWS Global Accelerator

B.Amazon Route 53

C.Amazon CloudFront

D.AWS Direct Connect

E.Application Load Balancer

AnswersA, E

Global Accelerator directs traffic to endpoints in multiple AZs.

Why this answer

AWS Global Accelerator uses the AWS global network and Anycast static IP addresses to route incoming traffic to the optimal endpoint, such as an Application Load Balancer or EC2 instance, across multiple Availability Zones within a single region. It improves performance and reliability by directing traffic to the healthiest endpoint and automatically rerouting in case of failure, making it a valid service for distributing traffic across AZs.

Exam trap

The trap here is that candidates often think only Elastic Load Balancers (like ALB) can distribute traffic across AZs, but AWS Global Accelerator also performs this function at the network layer, and the question asks for TWO services, so both ALB and Global Accelerator are correct.

Practice this question →

183

MCQhard

A company uses AWS Lambda functions to process events from an Amazon SQS queue. The Lambda function occasionally fails due to a transient downstream service error. The DevOps team wants to ensure that failed messages are not lost and can be retried later. The team also wants to reduce the number of invocations on the downstream service. Which configuration should the team use?

A.Configure a dead-letter queue (DLQ) on the SQS queue and set the Lambda function's reserved concurrency to 1.

B.Configure an Amazon SNS topic as a Lambda destination for failure events and subscribe the SQS queue to it.

C.Configure a dead-letter queue (DLQ) on the Lambda function and set the function's maximum retry attempts to 2.

D.Configure the Lambda function to write failed messages to an Amazon DynamoDB table and set up a scheduled Lambda to retry.

AnswerA

The SQS DLQ captures failed messages; reserved concurrency caps how many invocations reach the downstream service at once.

Why this answer

Option A is correct because configuring a dead-letter queue (DLQ) on the SQS queue ensures that messages that exhaust their retries (due to Lambda failures) are preserved for later reprocessing, preventing data loss. Setting the Lambda function's reserved concurrency to 1 throttles the function to a single concurrent invocation, which naturally reduces the rate of downstream service calls and allows the SQS queue's visibility timeout and redrive policy to manage retry timing, thereby reducing pressure on the downstream service.

Exam trap

The trap here is that candidates often confuse a Lambda function's DLQ (which captures invocation records) with an SQS queue's DLQ (which captures the original messages), and they overlook that reserved concurrency is a direct way to throttle invocation rate, not just a capacity planning tool.

How to eliminate wrong answers

Option B is wrong because using an SNS topic as a Lambda destination for failure events and subscribing the SQS queue to it would create an asynchronous loop where failed events are re-sent to the same SQS queue, potentially causing infinite retries without a controlled retry mechanism or throttling to protect the downstream service. Option C is wrong because a dead-letter queue configured on the Lambda function applies to asynchronous invocations and does not capture messages that the function polls from SQS; the function-level maximum retry attempts setting likewise does not govern SQS retries, and neither setting limits calls to the downstream service. Option D is wrong because writing failed messages to DynamoDB and using a scheduled Lambda to retry adds unnecessary complexity and latency, and does not inherently reduce downstream service invocations; it also bypasses SQS's built-in retry and DLQ mechanisms, which are simpler and more reliable for transient failures.
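To make the queue-level DLQ concrete, the sketch below builds the `RedrivePolicy` attribute that attaches a dead-letter queue to a source queue. The ARN is a made-up example and the receive count of 5 is an arbitrary choice; the attribute is a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`.

```python
import json


def build_redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the SQS queue attribute that routes repeatedly failed messages to a DLQ."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many receives without deletion, SQS moves the message.
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy serialized as a JSON string.
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical DLQ ARN, for illustration only.
attributes = build_redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
```

These attributes would be applied to the source queue; the reserved concurrency limit is a separate setting on the function itself.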

Practice this question →

184

MCQhard

A company uses Amazon DynamoDB with global tables for a multi-region active-active application. They notice that occasionally, concurrent updates to the same item in different regions cause data inconsistency. How can they resolve this?

A.Disable global tables and use a single region

B.Use DynamoDB read replicas instead of global tables

C.Use conditional writes and design the application to handle conflicts

D.Use DynamoDB Accelerator (DAX) to cache writes

AnswerC

Conditional writes prevent overwrites and allow conflict resolution.

Why this answer

Option C is correct because DynamoDB global tables replicate asynchronously between regions and reconcile concurrent updates to the same item with last-writer-wins, so one region's update can silently replace another's. Conditional writes let the application apply an update only when a stated condition holds (e.g., a version number or timestamp check) and handle a rejected write with its own conflict resolution logic. This approach aligns with the recommended practice for handling concurrent writes in an active-active global table setup.

Exam trap

The trap here is that candidates often assume DynamoDB global tables automatically resolve all write conflicts, but the exam tests the understanding that without conditional writes or custom conflict resolution, concurrent updates can cause data inconsistency due to eventual consistency.

How to eliminate wrong answers

Option A is wrong because disabling global tables and using a single region eliminates multi-region active-active capability, which is a core requirement of the scenario; it avoids the conflict only by sacrificing the availability and latency benefits. Option B is wrong because read replicas are a read-scaling mechanism and are not how DynamoDB provides multi-region access; they do not address write conflicts or provide write consistency across regions. Option D is wrong because DynamoDB Accelerator (DAX) is an in-memory cache for read-heavy workloads that reduces read latency, but it does not manage write conflicts or provide cross-region write consistency; caching writes does not resolve the fundamental issue of concurrent updates to the same item in different regions.
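The version-check idea behind conditional writes can be sketched with an in-memory stand-in. `VersionedTable` below is hypothetical and is not the DynamoDB API; with DynamoDB the same check would be expressed as a condition expression on a version attribute. One caveat worth keeping in mind: a condition is evaluated in the region that receives the write, so the application still needs a plan for updates that race across regions.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored version differs from the version the writer read."""


class VersionedTable:
    """In-memory stand-in for a table that supports version-checked writes."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Items that do not exist yet are treated as version 0.
        return dict(self._items.get(key, {"version": 0}))

    def put_if_version(self, key, attributes, expected_version):
        current = self._items.get(key, {"version": 0})
        if current["version"] != expected_version:
            raise ConditionalCheckFailed(f"{key}: expected v{expected_version}")
        self._items[key] = {**attributes, "version": expected_version + 1}


table = VersionedTable()
# Two writers read the same item at version 0.
seen_by_a = table.get("cart-1")["version"]
seen_by_b = table.get("cart-1")["version"]

table.put_if_version("cart-1", {"items": ["book"]}, seen_by_a)  # succeeds
try:
    table.put_if_version("cart-1", {"items": ["pen"]}, seen_by_b)  # stale version
    conflict_detected = False
except ConditionalCheckFailed:
    conflict_detected = True
    # Resolve the conflict: re-read, merge, and retry with the fresh version.
    latest = table.get("cart-1")
    merged = {"items": latest["items"] + ["pen"]}
    table.put_if_version("cart-1", merged, latest["version"])
```

The second writer's update is rejected instead of silently overwriting the first, and the merge step is where application-specific conflict handling would go.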

Practice this question →

185

MCQeasy

A company wants to ensure its RDS Multi-AZ deployment automatically fails over to a standby instance in a different Availability Zone. Which additional step is required?

A.Create an Amazon Route 53 health check to update the DNS record.

B.No additional step; RDS Multi-AZ handles automatic failover.

C.Configure a read replica in another AZ.

D.Deploy the standby instance in a different VPC.

AnswerB

RDS Multi-AZ automatically fails over to the standby instance.

Why this answer

Option B is correct because RDS Multi-AZ automatically fails over to the standby when the primary becomes unhealthy. Option C is incorrect because a read replica is for read scaling, not automatic failover. Option A is incorrect because RDS handles DNS changes automatically, so no Route 53 health check is needed.

Option D is incorrect because Multi-AZ does not require a different VPC.

Practice this question →

186

MCQmedium

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application uses an Amazon RDS for MySQL Multi-AZ DB instance for data storage. During an AWS infrastructure event, the primary Availability Zone (AZ) becomes unavailable, and the application experiences downtime. The RDS Multi-AZ failover completes automatically, but the application takes several minutes to reconnect. Which combination of actions would MOST reduce the recovery time for the application during such an event?

A.Place all EC2 instances in a single AZ and use an Amazon Route 53 health check to reroute traffic to a standby environment in another AZ.

B.Use an RDS proxy (Amazon RDS Proxy) to pool and share database connections, and ensure the application uses the RDS cluster endpoint.

C.Deploy the RDS instance as a Single-AZ instance in the same AZ as the primary EC2 instances, and use read replicas for failover.

D.Configure an Application Load Balancer in front of the RDS instance to distribute connections across AZs.

AnswerB

RDS Proxy reduces connection disruption during failover by maintaining connections, and the cluster endpoint points to the current primary.

Why this answer

Option B is correct because Amazon RDS Proxy maintains a warm connection pool to the database, so when the RDS Multi-AZ failover occurs, the proxy automatically reconnects to the new primary DB instance without requiring the application to re-establish connections. This eliminates the connection storm and the several-minute delay caused by the application's connection retry logic. By using the RDS cluster endpoint (which points to the proxy), the application benefits from seamless failover and reduced latency during the DNS propagation of the new primary.

Exam trap

The trap here is that candidates often assume that simply using an RDS Multi-AZ deployment is sufficient for application availability, but they overlook the critical bottleneck of application-side connection re-establishment and DNS propagation delays, which RDS Proxy directly addresses.

How to eliminate wrong answers

Option A is wrong because placing all EC2 instances in a single AZ creates a single point of failure; if that AZ becomes unavailable, the entire application goes down, and Route 53 health checks cannot reroute traffic fast enough to avoid downtime. Option C is wrong because a Single-AZ RDS instance in the same AZ as the primary EC2 instances would fail completely if that AZ becomes unavailable, and read replicas cannot be promoted for write traffic quickly enough to meet recovery time objectives. Option D is wrong because an Application Load Balancer operates at Layer 7 (HTTP/HTTPS) and cannot be placed in front of an RDS instance, which uses the MySQL protocol (TCP/3306); ALB does not support database connection pooling or failover for non-HTTP traffic.

Practice this question →

187

MCQmedium

A company is deploying a stateful application on Amazon EKS. The application requires persistent storage that can be reattached to a new pod if the original pod fails. The cluster spans multiple Availability Zones. Which storage solution provides the BEST resilience and meets these requirements?

A.Amazon S3 bucket with a mountpoint.

B.Amazon EBS with gp3 volume type.

C.EC2 instance store volumes.

D.Amazon EFS file system.

AnswerD

EFS is regional and can be mounted from any AZ, providing resilience.

Why this answer

Amazon EFS provides a fully managed, regional NFS file system that can be mounted concurrently by multiple pods across different Availability Zones. It is designed for high availability and durability, automatically replicating data across multiple AZs, and supports automatic reattachment to a new pod if the original pod fails, making it the best choice for stateful applications requiring resilient, shared persistent storage on Amazon EKS.

Exam trap

The trap here is that candidates often assume EBS is the default persistent storage for Kubernetes because of its common use with single-node stateful workloads, but they overlook the multi-AZ requirement that makes EBS unsuitable due to its zonal scope, while EFS's regional nature provides the necessary cross-AZ resilience.

How to eliminate wrong answers

Option A is wrong because Amazon S3 is an object storage service, not a file system; using a mountpoint (e.g., s3fs) introduces POSIX compatibility issues, performance overhead, and does not provide the native file locking or consistent read-after-write semantics required for a stateful application's persistent storage. Option B is wrong because Amazon EBS volumes are tied to a single Availability Zone and cannot be reattached to a pod in a different AZ; if the original pod fails and a replacement pod is scheduled in another AZ, the EBS volume cannot be mounted, breaking resilience across the multi-AZ cluster. Option C is wrong because EC2 instance store volumes are ephemeral and data is lost if the instance stops, terminates, or fails; they do not provide persistent storage that survives pod or node failures.

Practice this question →

188

MCQmedium

A company is designing a disaster recovery strategy for a critical application. They need a Recovery Time Objective (RTO) of 15 minutes and a Recovery Point Objective (RPO) of 1 minute. Which AWS database service configuration meets these requirements?

A.RDS MySQL with Multi-AZ and cross-region read replica

B.DynamoDB global tables

C.Aurora Global Database

D.RDS PostgreSQL with cross-region read replica

AnswerC

Typical RTO < 1 min, RPO sub-second.

Why this answer

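Aurora Global Database provides cross-region replication with a typical RPO of about one second and can promote a secondary region well within the 15-minute RTO. Option A is wrong because Multi-AZ protects only within a single region, and the cross-region read replica relies on asynchronous replication and manual promotion. Option B is wrong because DynamoDB global tables have sub-second RPO but may not meet RTO.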

Option D is wrong because read replicas have higher RPO.

Practice this question →

189

MCQeasy

A company is designing a highly available architecture for a web application. The application runs on Amazon EC2 instances in an Auto Scaling group across three Availability Zones. The instances are behind an Application Load Balancer (ALB). Which additional step should the team take to ensure that traffic is evenly distributed across all healthy instances in all Availability Zones?

A.Use Amazon Route 53 weighted routing policy to distribute traffic to each AZ.

B.Configure health checks on the target group to mark instances as unhealthy if they are in an AZ with fewer instances.

C.Enable cross-zone load balancing on the ALB.

D.Configure the ALB to use least outstanding requests routing algorithm.

AnswerC

This ensures even distribution across all instances in all AZs.

Why this answer

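Option C is correct. With cross-zone load balancing, each load balancer node distributes requests across the registered targets in all enabled Availability Zones; without it, each node sends traffic only to targets in its own zone, so instances in a zone with fewer targets receive a larger share. On Application Load Balancers this setting is on by default, so the step amounts to confirming it has not been turned off for the target group. Option A is wrong because a Route 53 weighted routing policy is not needed; the ALB already spreads traffic across AZs.

Option B is wrong because health checks report whether an instance is healthy, not how many instances its AZ has. Option D is wrong because the least outstanding requests algorithm changes how a target is chosen but does not control distribution across AZs.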
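The effect of cross-zone load balancing is easiest to see with numbers. The sketch below is a simplified model, assuming each zone's load balancer node receives an equal share of incoming traffic; the function name and the instance counts are illustrative.

```python
def per_instance_share(instances_per_az: dict, cross_zone: bool) -> dict:
    """Return the fraction of total traffic each instance in a zone receives."""
    zones = [az for az, count in instances_per_az.items() if count > 0]
    total_instances = sum(instances_per_az[az] for az in zones)
    shares = {}
    for az in zones:
        if cross_zone:
            # Every instance is an equal candidate regardless of zone.
            shares[az] = 1 / total_instances
        else:
            # Each zone gets an equal slice, split among its own instances.
            shares[az] = (1 / len(zones)) / instances_per_az[az]
    return shares


fleet = {"az-a": 2, "az-b": 8}
without_cross_zone = per_instance_share(fleet, cross_zone=False)
with_cross_zone = per_instance_share(fleet, cross_zone=True)
```

With 2 instances in one zone and 8 in another, each instance in the smaller zone carries 25% of the traffic without cross-zone balancing, against 6.25% for each instance in the larger zone. With cross-zone balancing every instance carries 10%.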

Practice this question →

190

MCQhard

A company runs a critical microservices application on Amazon EKS with multiple services. They use an ingress controller (ALB Ingress Controller) to route traffic to services. They notice that when a pod fails, new requests are still sent to the failed pod for a few seconds, causing errors. The health check interval is set to 5 seconds. They want to minimize the time during which failed pods receive traffic. They also need to ensure that during rolling updates, traffic is not sent to pods that are terminating. Which solution should they implement?

A.Define liveness probes for all pods with a low failure threshold.

B.Use pod anti-affinity to spread pods across nodes.

C.Decrease the health check interval on the ALB target group to 2 seconds.

D.Configure readiness probes on the pods and set the ALB ingress controller to use the readiness probe endpoint.

AnswerD

Readiness probes ensure that only ready pods receive traffic; the ingress controller will stop sending traffic to pods that fail the probe.

Why this answer

Option D is correct because readiness probes detect when a pod is not ready to serve traffic and remove it from the service endpoints quickly. The ALB ingress controller respects readiness probes. Option A is incorrect because liveness probes restart pods but do not remove them from service immediately.

Option B is incorrect because pod anti-affinity affects scheduling, not traffic routing. Option C is incorrect because shortening the ALB health check interval does not stop traffic from reaching pods that are terminating during a rolling update.

Practice this question →

191

Multi-Selectmedium

Which TWO strategies can be used to improve the resilience of an application running on Amazon ECS with Fargate? (Select TWO.)

Select 2 answers

A.Use a single subnet for all tasks to simplify networking.

B.Configure the ECS service to place tasks in multiple Availability Zones.

C.Increase the task memory reservation to handle peak load.

D.Implement a circuit breaker pattern for downstream dependencies.

E.Use scheduled scaling to adjust task count based on historical patterns.

AnswersB, D

Spreads tasks across AZs for fault tolerance.

Why this answer

Option B (multi-AZ task placement) and Option D (circuit breaker pattern) are correct. Multi-AZ placement spreads tasks across Availability Zones to tolerate AZ failures. The circuit breaker pattern prevents cascading failures.

Option C is wrong because increasing memory does not improve resilience. Option A is wrong because using a single subnet reduces resilience. Option E is wrong because scheduled scaling does not handle unexpected spikes.
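For readers unfamiliar with the circuit breaker pattern, here is a minimal sketch of one. It is an application-level illustration with hypothetical names and thresholds, not an ECS feature or a specific library's API: after a set number of consecutive failures the breaker fails fast, then allows a trial call once a reset timeout has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back or degrade gracefully instead of waiting on a dependency that is already failing.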

Practice this question →

192

Multi-Selectmedium

A company runs a mission-critical database on Amazon RDS for MySQL. They need to ensure that if the primary DB instance fails, the database remains available with minimal downtime. Which TWO configurations should they implement? (Choose TWO.)

Select 2 answers

A.Enable automated backups with point-in-time recovery.

B.Create a read replica in the same region.

C.Enable deletion protection on the DB instance.

D.Enable Multi-AZ deployment.

E.Configure cross-region replication.

AnswersA, D

Automated backups allow restoring to a specific point, reducing data loss.

Why this answer

Options A and D are correct. Multi-AZ provides automatic failover to a standby in a different AZ. Automated backups with point-in-time recovery allow restoring to a specific time.

Option B is incorrect because read replicas do not provide automatic failover. Option E is incorrect because cross-region replication is for regional DR, not immediate failover. Option C is incorrect because deletion protection prevents accidental deletion but does not help with failure.

Practice this question →

193

Multi-Selecthard

A company is designing a disaster recovery plan for a critical application that uses Amazon RDS for MySQL with Multi-AZ. The RPO must be less than 1 minute and RTO less than 15 minutes. The primary Region is us-east-1. Which THREE steps should the company take to meet these requirements?

Select 3 answers

A.Take manual snapshots of the RDS instance every 30 seconds and copy them to the secondary Region

B.Enable Multi-AZ in the primary Region

C.Create a cross-Region read replica of the RDS instance in the secondary Region

D.Create an AWS Lambda function to promote the read replica to primary in the secondary Region during a disaster

E.Enable automated backups with cross-Region copy enabled to the secondary Region

AnswersC, D, E

Provides low-lag replication and fast promotion.

Why this answer

C, D, and E are correct. A cross-Region read replica provides near-real-time replication (low RPO) and can be promoted quickly (low RTO). A Lambda function that promotes the replica automates the failover step, so the RTO does not depend on manual action. Automated backups copied to another Region give a second recovery path, although restoring from them alone may exceed 15 minutes.

Option A is wrong because manual snapshots cannot achieve an RPO under 1 minute. Option B is wrong because Multi-AZ protects within a Region only.

Practice this question →

194

Multi-Selecthard

A company is migrating a monolithic application to a microservices architecture on Amazon EKS. The application uses a relational database. The team wants to ensure that database connections are managed efficiently and that the database can withstand a sudden spike in connections from multiple microservices. Which solutions should the DevOps engineer implement? (Choose THREE.)

Select 3 answers

A.Use direct database connections from each microservice pod.

B.Use Amazon ElastiCache for Redis to cache database query results.

C.Deploy Amazon RDS Proxy in front of the database.

D.Configure the database to have a higher max_connections and enable Auto Scaling.

E.Implement a connection pool sidecar container (e.g., PgBouncer) in each EKS pod.

AnswersC, D, E

RDS Proxy manages connection pooling and reduces database load.

Why this answer

Option C is correct because RDS Proxy pools and shares database connections, reducing the number of connections to the database. Option D is correct because configuring the database with appropriate max_connections and using Auto Scaling (e.g., Aurora Auto Scaling) helps handle spikes. Option E is correct because using a connection pool sidecar (e.g., PgBouncer) in the EKS pods adds connection pooling at the application level.

Option A (direct connections) would increase load. Option B (ElastiCache) is for caching, not connection management.
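
To make the pooling idea concrete, here is a minimal sketch of a bounded connection pool. It is not how RDS Proxy or PgBouncer is implemented; `FakeConnection` and `ConnectionPool` are hypothetical names used only to show why many callers can share a small, fixed number of database connections.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    opened = 0  # counts how many connections were ever opened

    def __init__(self):
        FakeConnection.opened += 1

    def execute(self, sql):
        return f"ran: {sql}"


class ConnectionPool:
    """Keeps exactly `size` connections and reuses them across callers."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks (up to `timeout`) when every connection is in use,
        # instead of opening a new one against the database.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


pool = ConnectionPool(size=2)
results = []
for i in range(10):  # ten "requests" share two connections
    with pool.connection() as conn:
        results.append(conn.execute(f"SELECT {i}"))
```

In this sketch the database only ever sees two connections, no matter how many requests arrive; callers wait briefly for a free connection rather than opening new ones during a spike.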

Practice this question →

195

MCQmedium

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. During a recent failure of one AZ, the application experienced downtime because the Auto Scaling group did not launch new instances quickly enough. What should a DevOps engineer do to improve resilience?

A.Configure the Auto Scaling group to span multiple AZs and enable health checks to replace unhealthy instances.

B.Use a larger AMI to reduce boot times.

C.Increase the instance size of the EC2 instances to handle more traffic.

D.Configure the Auto Scaling group to launch instances in a single AZ with a larger instance count.

AnswerA

Multiple AZs provide high availability and health checks ensure quick replacement.

Why this answer

Option A is correct because distributing instances across multiple AZs ensures that failure of one AZ does not affect the entire application, and using a multi-AZ deployment with proper Auto Scaling group configuration is a best practice for high availability. Option C is wrong because increasing instance size does not mitigate AZ failure. Option D is wrong because a single AZ still presents a single point of failure.

Option B is wrong because using a larger AMI does not help with resilience.

Practice this question →

196

MCQeasy

A company runs a web application on EC2 instances behind an ALB. To improve resilience, they want to automatically replace failed instances and maintain a minimum number of instances. Which AWS service should be used?

A.Amazon EC2 Auto Scaling

B.AWS CloudFormation

C.AWS Elastic Beanstalk

D.AWS Systems Manager

AnswerA

Auto Scaling automatically replaces unhealthy instances and maintains a minimum count.

Why this answer

Auto Scaling is designed to automatically replace failed instances and maintain a desired capacity, improving resilience.

Practice this question →

197

Multi-Selectmedium

A company runs a stateful web application on EC2 instances behind an ALB. The application stores session data in memory. The company wants to make the application stateless to improve resilience. Which TWO changes should the company make?

Select 2 answers

A.Increase the instance memory to store more sessions

B.Disable sticky sessions on the ALB

C.Enable sticky sessions (session affinity) on the ALB

D.Store session data in Amazon ElastiCache for Redis

E.Use an NLB instead of an ALB

AnswersB, D

Without stickiness, any instance can serve any request if state is external.

Why this answer

B and D are correct. Storing session data externally (ElastiCache) makes instances stateless. Sticky sessions are not needed once state is external, and because the question asks for statelessness, removing stickiness is correct.

Option C is wrong because stickiness contradicts statelessness. Option A is wrong because in-memory storage is the problem. Option E is wrong because it doesn't address session state.
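
The effect of externalizing session state can be sketched in a few lines. The example below is illustrative only: `ExternalSessionStore` is an in-memory stand-in for a shared store such as ElastiCache for Redis, and `AppInstance` is a hypothetical web server instance that keeps no session data of its own.

```python
class ExternalSessionStore:
    """In-memory stand-in for a shared store such as ElastiCache for Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppInstance:
    """A web server instance that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {"cart": cart})
        return cart


store = ExternalSessionStore()
instance_a = AppInstance("a", store)
instance_b = AppInstance("b", store)

instance_a.add_to_cart("sess-1", "book")
# A different instance serves the next request for the same session.
cart = instance_b.add_to_cart("sess-1", "pen")
```

Because both instances read and write the same store, the load balancer can send each request to any instance, and losing one instance does not lose the session.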

Practice this question →

198

MCQhard

A company's application on Amazon ECS experiences intermittent failures when the task attempts to access an S3 bucket. The task role has the correct S3 permissions. What is the most likely cause?

A.The task is using the wrong IAM role

B.The S3 bucket is in a different region

C.The S3 bucket has public access blocked

D.The S3 bucket policy explicitly denies access from the task's VPC

AnswerD

Bucket policies can override IAM permissions and cause failures.

Why this answer

If the S3 bucket policy denies access from the task's VPC or source, it can cause intermittent failures.

Practice this question →

199

MCQeasy

A company wants to protect its S3 bucket data from accidental deletion or overwrite. Which feature should be enabled?

A.Enable cross-region replication

B.Apply a bucket policy that denies DeleteObject

C.Enable S3 Versioning

D.Enable MFA Delete

AnswerC

Preserves previous versions.

Why this answer

S3 Versioning keeps all versions, allowing recovery from deletes and overwrites. Option D is wrong because MFA Delete is an additional protection but not the primary one. Option B is wrong because bucket policies control access.

Option A is wrong because replication is for cross-region copies.

Practice this question →

200

MCQmedium

A company uses AWS CodeDeploy for blue/green deployments to an Auto Scaling group. The deployment fails because the new instances do not pass health checks. The DevOps engineer discovers that the health check URL returns a 503 error. What is the MOST likely cause?

A.The target group health check path is '/health' but the application does not serve that endpoint

B.The CodeDeploy agent on the new instances is not running

C.The security group for the ALB does not allow inbound traffic on port 80

D.The Auto Scaling group health check type is set to EC2 instead of ELB

AnswerA

A 503 indicates the application is running but not handling the request correctly.

Why this answer

The health check URL returning a 503 error indicates that the application is not responding to the health check endpoint. Since the target group health check path is configured as '/health' but the application does not serve that endpoint, the ALB considers the instances unhealthy, causing CodeDeploy to fail the deployment. This is the most direct cause because the health check is failing at the application layer, not due to infrastructure issues.

Exam trap

The trap here is that candidates may confuse a 503 error with a network-level failure (like a security group blocking traffic) rather than recognizing it as an application-layer response indicating the health check endpoint is missing or misconfigured.

How to eliminate wrong answers

Option B is wrong because if the CodeDeploy agent were not running, the deployment would likely fail earlier (e.g., during the Install event) or the agent would not report success, but the health check failure (503) specifically indicates the application is running but not responding correctly. Option C is wrong because if the security group for the ALB did not allow inbound traffic on port 80, the health check would likely time out or return a connection refused error, not a 503 (Service Unavailable) which is an HTTP response from the application. Option D is wrong because the Auto Scaling group health check type (EC2 vs ELB) affects how ASG replaces unhealthy instances, but it does not directly cause the health check URL to return a 503; the 503 error is a symptom of the application not serving the correct endpoint.
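
As a rough illustration of the fix, the application needs to serve whatever path the target group health check requests. The sketch below is a minimal WSGI app that answers `/health` with 200; the path comes from the question, while `app` and `call` are hypothetical names. Any other path gets a non-2xx response here, which a health check expecting 200 would count as a failure.

```python
def app(environ, start_response):
    """Minimal WSGI app that answers the load balancer's health check."""
    path = environ.get("PATH_INFO", "/")
    if path == "/health":
        body = b"ok"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    body = b"not found"
    start_response("404 Not Found", [("Content-Type", "text/plain"),
                                     ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Invoke the app directly, without a server, and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


health_status, health_body = call("/health")
```

The same `app` object could be served locally with the standard library's `wsgiref.simple_server.make_server` to check the endpoint before a deployment.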

Practice this question →

201

Matchingmedium

Match each AWS CloudFormation concept to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Stack, Template, Change set, StackSets, Drift detection

Matches

Stack: Collection of AWS resources managed as a single unit

Template: JSON or YAML document describing AWS resources

Change set: Preview of changes before applying to a stack

StackSets: Enables stack creation across multiple accounts and regions

Drift detection: Identifies differences between stack and actual resource configurations

Why these pairings

These are key CloudFormation constructs and operations.

Practice this question →

202

MCQeasy

A company uses Amazon Route 53 to route traffic to an Application Load Balancer. They want to improve availability by routing traffic to multiple ALBs in different AWS Regions. Which routing policy should they use?

A.Latency-based routing policy

B.Weighted routing policy

C.Geolocation routing policy

D.Simple routing policy

AnswerA

Routes to lowest latency region, and supports health checks.

Why this answer

Latency-based routing policy is correct because it directs traffic to the AWS Region that provides the lowest latency for the end user, improving availability and performance by distributing requests across multiple Application Load Balancers in different regions. This policy uses latency measurements between the user and each region to select the optimal endpoint. When health checks are associated with the latency records, traffic is automatically routed to the next lowest-latency healthy region if one region becomes unavailable.

Exam trap

The trap here is that candidates often confuse latency-based routing with geolocation routing, mistakenly thinking that geographic proximity equals low latency, but latency-based routing uses actual network measurements rather than fixed geographic boundaries.

How to eliminate wrong answers

Option B (Weighted routing policy) is wrong because it distributes traffic based on assigned weights (e.g., 80% to one region, 20% to another) and does not consider real-time latency or availability; it is designed for load balancing or testing, not for optimizing user-perceived performance across regions. Option C (Geolocation routing policy) is wrong because it routes traffic based on the geographic location of the user (e.g., country or continent), not on actual network latency or regional health; it can cause traffic to be sent to a distant region if the user's location is mapped there, even if that region is degraded. Option D (Simple routing policy) is wrong because it only supports a single record with multiple values (e.g., multiple IPs) and returns all values in a random order without any health checking or latency awareness, making it unsuitable for active-active multi-region failover.

Practice this question →

203

Multi-Selecthard

A company uses DynamoDB global tables for a multi-region application. They notice that write conflicts are occurring. Which TWO strategies can reduce write conflicts?

Select 2 answers

A.Reduce read capacity units to limit concurrent reads

B.Enable DynamoDB Streams with last writer wins

C.Use conditional writes in the application code

D.Increase write capacity units on the table

E.Implement application-level conflict resolution

AnswersC, E

Prevents overwriting if condition fails.

Why this answer

Using conditional writes ensures no overwrites unless the condition is met, reducing conflicts. Application-level conflict resolution can handle conflicts when they occur. Option D is wrong because increasing WCU does not reduce conflicts.

Option B is wrong because last writer wins is the default and may cause data loss. Option A is wrong because reducing read capacity is irrelevant.
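
A conditional write is usually expressed as an optimistic-locking check on a version attribute. The sketch below only builds the arguments for such an update; it assumes a hypothetical table with partition key `id` and a numeric `version` attribute, and `build_versioned_update` is an illustrative helper, not part of any SDK. With boto3 the result could be passed as `table.update_item(**kwargs)`, and DynamoDB would reject the write with a `ConditionalCheckFailedException` if the stored version had changed.

```python
def build_versioned_update(item_id, new_status, expected_version):
    """Build keyword arguments for a DynamoDB UpdateItem call that only
    succeeds if the stored version still equals `expected_version`."""
    return {
        "Key": {"id": item_id},  # hypothetical partition key
        "UpdateExpression": "SET #s = :status, #v = :next",
        "ConditionExpression": "#v = :expected",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#s": "status", "#v": "version"},
        "ExpressionAttributeValues": {
            ":status": new_status,
            ":next": expected_version + 1,
            ":expected": expected_version,
        },
    }


kwargs = build_versioned_update("order-42", "SHIPPED", expected_version=3)
```

Note that the condition is evaluated in the Region that receives the write, so in a global table this narrows the window for conflicting updates rather than removing it; that is why application-level conflict resolution is listed alongside it.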

Practice this question →

204

MCQmedium

A company uses AWS Lambda to process messages from an SQS queue. They need to ensure that if the Lambda function fails, the message is not lost and can be processed again. Which configuration is required?

A.Set the visibility timeout to less than the Lambda function timeout.

B.Enable SQS redrive policy to retry messages.

C.Configure a dead-letter queue (DLQ) on the SQS queue.

D.Set the Lambda event source mapping to not delete messages from the queue on failure.

AnswerD

This ensures the message remains in the queue for retry.

Why this answer

Option D is correct because the Lambda event source mapping can be configured to delete messages only after successful processing; if the function fails, the message remains in the queue. Option C is incorrect because a DLQ is for messages that repeatedly fail, not for retry. Option A is incorrect because the visibility timeout should be set longer than the function timeout to avoid other consumers picking up the message.

Option B is incorrect because SQS itself does not retry; the Lambda service handles retries based on the event source mapping.
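
One way this behavior shows up in handler code is the partial batch response. The sketch below assumes the event source mapping has `ReportBatchItemFailures` enabled; `process` is a hypothetical piece of business logic. The handler returns the IDs of the messages that failed so that only those stay in the queue for another attempt.

```python
import json


def process(body):
    """Hypothetical business logic; fails on messages marked as bad."""
    if body.get("bad"):
        raise ValueError("cannot process message")


def handler(event, context):
    """SQS-triggered Lambda handler that reports only the failed messages."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Reported messages stay in the queue and become visible again
            # after the visibility timeout; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"order": 1})},
        {"messageId": "m-2", "body": json.dumps({"bad": True})},
    ]
}
response = handler(event, None)
```

Without the partial batch response, letting the exception propagate has a similar effect at a coarser level: the whole batch returns to the queue and is retried.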

Practice this question →

205

MCQmedium

A company is running a stateful web application on Amazon EC2 instances. The application stores session data locally on the instance. The company wants to make the application stateless and improve resilience. The DevOps team decides to use Amazon ElastiCache for Redis to store session data. What additional step should the team take to ensure that the session data is highly available?

A.Deploy ElastiCache for Redis in a Multi-AZ configuration with automatic failover.

B.Use a single ElastiCache for Redis node and rely on the application to reconnect if the node fails.

C.Enable ElastiCache for Redis cluster mode with automatic failover across multiple Availability Zones.

D.Configure ElastiCache for Redis to back up session data to Amazon S3 every 5 minutes.

AnswerC

Cluster mode with automatic failover ensures HA.

Why this answer

Option C is correct. Enabling ElastiCache for Redis cluster mode with automatic failover across multiple Availability Zones ensures that if the primary node fails, a read replica is promoted, providing high availability. Option A is wrong because Multi-AZ for ElastiCache is for replication groups, not cluster mode.

Option D is wrong because backing up to S3 does not provide automatic failover. Option B is wrong because a single node has no replica to promote, so recovery is not automatic.

Practice this question →

206

MCQeasy

A company uses AWS CloudFormation to deploy infrastructure. During a recent deployment, the stack failed to create an Amazon RDS DB instance because of a parameter validation error. The DevOps engineer fixed the parameter and wants to resume the stack creation without recreating the resources that were already successfully created. The stack template is parameterized and uses nested stacks. What is the MOST efficient way to resume the stack creation?

A.Use the CloudFormation stack update operation with the corrected parameter.

B.Manually create the RDS instance with the corrected parameter and update the stack to import it.

C.Delete the entire stack and redeploy with the corrected parameter.

D.Use the 'ContinueUpdateRollback' feature to rollback the failed stack and then redeploy.

AnswerA

Update will only modify the failed resource, preserving already created resources.

Why this answer

CloudFormation stack updates can be used to fix the issue. By updating the stack with the corrected parameters, CloudFormation will only modify the failed resource and not recreate already created resources.

Practice this question →

207

MCQeasy

A company uses Amazon DynamoDB as the database for a mobile application. The application requires single-digit millisecond read and write latency and must be resilient to the failure of an entire AWS Region. Which DynamoDB feature should the company use?

A.DynamoDB point-in-time recovery (PITR)

B.DynamoDB global tables

C.DynamoDB Accelerator (DAX)

D.DynamoDB on-demand capacity mode

AnswerB

Global tables replicate data across Regions, providing low latency and resilience to Region failures.

Why this answer

DynamoDB global tables provide multi-Region, multi-active replication, ensuring the application can withstand an entire AWS Region failure while maintaining single-digit millisecond read and write latency in each Region. This is achieved through DynamoDB Streams and a last-writer-wins conflict resolution mechanism, making it the correct choice for cross-Region resilience.

Exam trap

The trap here is that candidates often confuse high-availability features like DAX (caching) or PITR (backup) with true disaster recovery and multi-Region resilience, failing to recognize that only global tables replicate data across Regions for active-active failover.

How to eliminate wrong answers

Option A is wrong because point-in-time recovery (PITR) protects against accidental writes or deletions by enabling restoration to any point within the last 35 days, but it does not provide cross-Region resilience or continuous availability during a Region outage. Option C is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that improves read latency but operates within a single Region and does not replicate data across Regions, offering no protection against a full Region failure. Option D is wrong because on-demand capacity mode handles traffic spikes automatically but is a scaling feature within a single Region, not a disaster recovery or multi-Region replication solution.

Practice this question →

208

MCQeasy

A company uses Amazon Route 53 for DNS. They want to ensure that if their primary website endpoint fails, traffic is automatically routed to a secondary endpoint in a different Region. Which routing policy should be used?

A.Latency routing

B.Simple routing

C.Failover routing

D.Weighted routing

AnswerC

Failover routing performs automatic failover based on health checks.

Why this answer

Failover routing policy allows you to configure an active-passive failover setup.
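
An active-passive pair is expressed as two record sets with the same name, one marked PRIMARY and one SECONDARY. The sketch below only builds the change batch; every name, DNS value, hosted zone ID, and health check ID in it is a placeholder, and `failover_record` is a hypothetical helper. With boto3 the result could be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call.

```python
def failover_record(name, role, alb_dns, alb_zone_id, health_check_id=None):
    """Build one alias record set for a Route 53 failover pair."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,
            "DNSName": alb_dns,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# All identifiers below are placeholders for illustration.
change_batch = {
    "Changes": [
        failover_record("www.example.com.", "PRIMARY",
                        "primary-alb.example.com.", "ZPRIMARYEXAMPLE",
                        health_check_id="hc-primary-example"),
        failover_record("www.example.com.", "SECONDARY",
                        "secondary-alb.example.com.", "ZSECONDARYEXAMPLE"),
    ]
}
roles = [c["ResourceRecordSet"]["Failover"] for c in change_batch["Changes"]]
```

Route 53 answers with the primary record while it is considered healthy and switches to the secondary when it is not.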

Practice this question →

209

MCQhard

Refer to the exhibit. A DevOps engineer runs the describe-target-health command and receives the output shown. The ALB target group has two instances. One instance is healthy, and the other is unhealthy with a 502 error. What is the most likely cause of the 502 error?

A.The security group for the instance does not allow inbound traffic on port 80 from the ALB.

B.The application running on the instance is not responding correctly or has crashed.

C.The instance's route table does not have a route to the internet gateway.

D.The health check path is configured to return a 404 status code.

AnswerB

A 502 Bad Gateway indicates the target closed the connection or the application is faulty.

Why this answer

Option B is correct because a 502 Bad Gateway error from an ALB indicates that the target (EC2 instance) has closed the connection or the application is not responding properly. Common causes include the web server or application crashing, or the target not being able to handle the request. Option A is wrong because security group rules would cause connection timeouts (504) or refused connections, not 502.

Option C is wrong because a missing route table entry would cause network unreachability, not a 502. Option D is wrong because the health check path returning a 404 would result in a 404, not a 502.

Practice this question →

210

MCQeasy

A company's DevOps team is designing a disaster recovery plan for a critical application. The application runs on EC2 instances with an RDS MySQL database. The Recovery Time Objective (RTO) is 15 minutes, and the Recovery Point Objective (RPO) is 1 hour. Which approach BEST meets these requirements?

A.Use backup and restore with daily snapshots stored in S3 and cross-Region replication.

B.Use a multi-Region application with Route 53 latency-based routing and RDS read replicas in the DR Region.

C.Use a warm standby strategy with a scaled-down copy of the production environment in the DR Region, and replicate data using RDS Multi-AZ with synchronous replication.

D.Use a pilot light strategy with EC2 instances stopped and RDS snapshots copied to the DR Region.

AnswerC

Warm standby allows quick failover; synchronous replication meets RPO of 1 hour.

Why this answer

Option C is correct because a warm standby strategy with a scaled-down copy of the production environment in the DR Region, combined with RDS Multi-AZ using synchronous replication, meets the RTO of 15 minutes and RPO of 1 hour. Multi-AZ synchronous replication ensures zero data loss (RPO of seconds) and automatic failover within minutes, while the warm standby environment can be scaled up quickly to handle production traffic, satisfying the RTO.

Exam trap

The trap here is that candidates often confuse Multi-AZ (which is for high availability within a Region) with cross-Region disaster recovery, but the question's RTO/RPO requirements are met by combining Multi-AZ synchronous replication (for near-zero RPO) with a warm standby environment (for fast RTO), not by using asynchronous read replicas or snapshot-based approaches.

How to eliminate wrong answers

Option A is wrong because daily snapshots with cross-Region replication result in an RPO of up to 24 hours, far exceeding the 1-hour requirement, and the restore process takes longer than 15 minutes. Option B is wrong because Route 53 latency-based routing is for active-active traffic distribution, not disaster recovery failover, and RDS read replicas are asynchronous, leading to potential data loss and RPO that can exceed 1 hour during a failure. Option D is wrong because a pilot light strategy with stopped EC2 instances and RDS snapshots copied to the DR Region requires provisioning and restoring from snapshots, which typically takes longer than 15 minutes to become fully operational, and the RPO is limited by snapshot frequency.

Practice this question →

211

MCQeasy

Refer to the exhibit. A DevOps engineer applies the IAM policy shown to an S3 bucket to enforce server-side encryption. However, users report that some uploads succeed without encryption. What is the most likely reason?

A.The policy uses StringEquals instead of StringNotEquals.

B.The policy only allows the action but does not deny actions that do not meet the condition.

C.The resource ARN is incorrect; it should be the bucket ARN.

D.The action should be s3:PutEncryptedObject instead of s3:PutObject.

AnswerB

Without an explicit Deny, other policies may allow uploads without encryption.

Why this answer

Option B is correct because the IAM policy only allows the s3:PutObject action when the encryption condition is met, but it does not include an explicit Deny statement to block uploads that do not satisfy the condition. In IAM, an Allow statement with a condition does not automatically deny requests that fail the condition; it simply does not apply the Allow. If there is another policy (e.g., a bucket policy or an identity-based policy) that grants s3:PutObject without the encryption condition, or if the default S3 behavior permits unencrypted uploads (since S3 does not require encryption by default), then unencrypted uploads can still succeed.

To enforce encryption, you must add a Deny statement with a condition like `StringNotEquals` on `s3:x-amz-server-side-encryption` to explicitly reject requests that lack the required encryption header.

Exam trap

The trap here is that candidates assume an Allow statement with a condition implicitly denies requests that don't meet the condition, but AWS IAM requires an explicit Deny to block non-compliant requests, and the absence of that Deny is the root cause of the enforcement failure.

How to eliminate wrong answers

Option A is wrong because using StringEquals is correct for allowing only requests with the specified encryption value; the issue is not the operator but the lack of a Deny statement. Option C is wrong because s3:PutObject is an object-level action, so the policy's Resource should match objects (e.g., `arn:aws:s3:::bucket-name/*`); switching it to the bucket ARN alone would stop the statement from matching uploads rather than enforce encryption. Option D is wrong because there is no such action as s3:PutEncryptedObject in AWS S3; encryption is controlled via request headers and conditions, not a separate API action.
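
For reference, an explicit Deny of the kind described above might look like the following sketch, written here as a Python dict that is serialized to JSON. The bucket name is hypothetical, and the example assumes SSE-KMS (`aws:kms`) is the required encryption mode; the exhibit's actual policy is not reproduced.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            # Object-level action, so the resource is the object ARN pattern.
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                # Also matches requests that omit the header entirely.
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "aws:kms"
                }
            },
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

Because an explicit Deny overrides any Allow, this statement blocks non-compliant uploads regardless of which other policy grants `s3:PutObject`.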

Practice this question →

212

Multi-Selecthard

Which THREE strategies can improve the resilience of an Amazon RDS for PostgreSQL database?

Select 3 answers

A.Disable automated backups to save costs

B.Enable automated backups with a retention period

C.Create read replicas in another Availability Zone

D.Use a single-AZ instance to reduce complexity

E.Enable Multi-AZ deployment

AnswersB, C, E

Allows point-in-time recovery.

Why this answer

Multi-AZ deployment provides automatic failover. Automated backups allow point-in-time recovery. Read replicas in other AZs provide read scalability and can be promoted in a disaster.

Option D is wrong because a single-AZ instance is not resilient. Option A is wrong because disabling backups prevents recovery.

Practice this question →

213

MCQhard

A company runs a critical application on EC2 instances in an Auto Scaling group. The application uses an EBS volume attached to each instance for temporary data. The company needs to ensure that if an instance fails, the data is not lost, and the new instance can resume quickly. What should they do?

A.Store temporary data on instance store volumes instead of EBS

B.Use an EBS snapshot and create a new volume from the snapshot for the new instance

C.Migrate to Amazon EFS and mount the same file system on all instances

D.Use an EBS Multi-Attach volume attached to all instances in the Auto Scaling group

AnswerD

Multi-Attach allows shared block storage across instances in the same AZ.

Why this answer

Using an EBS Multi-Attach volume allows multiple instances in the same AZ to attach the same volume, providing shared access and quick recovery. Option B is wrong because EBS snapshots are point-in-time and take time to restore. Option C is wrong because EFS is a file system, not block storage, and may not be compatible with the application.

Option A is wrong because instance store is ephemeral.

Practice this question →

214

Multi-Selecthard

A company is migrating a monolithic application to a microservices architecture on AWS. To improve resilience, which THREE design patterns should be implemented? (Select THREE.)

Select 3 answers

A.Synchronous communication between services to ensure consistency

B.Single shared database to maintain data consistency

C.Retry with exponential backoff for transient failures

D.Bulkhead pattern to isolate critical services from non-critical ones

E.Circuit breaker pattern to stop calls to a failing service

AnswersC, D, E

Handles transient errors gracefully.

Why this answer

Option C is correct because implementing retry with exponential backoff allows services to handle transient failures (e.g., network timeouts, throttling) by automatically retrying operations after increasing delays, reducing load on recovering systems. This pattern is essential for microservices on AWS, where services like DynamoDB or Lambda may throttle requests, and exponential backoff (e.g., using jitter as per AWS SDK defaults) prevents cascading failures.

Exam trap

The trap here is that candidates confuse synchronous communication (Option A) with resilience, but in microservices, synchronous calls increase failure propagation, while asynchronous patterns and the three selected patterns (retry, circuit breaker, bulkhead) are the correct resilience mechanisms.
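
Retry with exponential backoff is the easiest of the three patterns to show in code. The following is a minimal sketch, not the AWS SDK's implementation: `TransientError`, `retry_with_backoff`, and `flaky` are hypothetical names, and the delay values are arbitrary. It uses "full jitter", where each wait is a random fraction of an exponentially growing cap.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a timeout or throttle."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds, sleeping with full jitter
    between attempts; re-raise after `max_attempts` failures."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)  # full jitter: random delay in [0, cap)


calls = {"count": 0}


def flaky():
    """Fails twice, then succeeds, to simulate a transient problem."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("throttled")
    return "ok"


# Inject a recording sleep and a fixed rng so the run is deterministic.
delays = []
result = retry_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
```

Passing `sleep` and `rng` as parameters is a design choice for testability; in production code the defaults would apply, and the randomness keeps many clients from retrying in lockstep against a recovering service.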

Practice this question →

215

MCQhard

A company runs a containerized application on Amazon ECS with Fargate launch type. The application experiences intermittent failures when the ECS service scheduler attempts to place tasks during a deployment. The DevOps engineer notices that tasks fail to start due to insufficient IP addresses in the VPC subnets. What is the MOST resilient solution to prevent this issue?

A.Create an ECS service-linked role with permissions to allocate IPs.

B.Increase the desired task count in the ECS service to pre-warm IP addresses.

C.Use VPC endpoints for ECS to reduce IP usage.

D.Configure the ECS service to use multiple subnets with larger CIDR blocks across multiple Availability Zones.

AnswerD

More subnets and larger CIDRs increase available IPs and resilience.

Why this answer

Option D is correct because using a larger CIDR block for subnets provides more IP addresses, and using multiple subnets across Availability Zones increases availability and capacity. Option B is wrong because increasing desired count does not solve IP shortage. Option A is wrong because an ECS service-linked role does not affect IP allocation.

Option C is wrong because VPC endpoints do not provide IP addresses for tasks.
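
The capacity difference is easy to estimate, because each Fargate task gets its own network interface and private IP address, and AWS reserves five addresses in every subnet. The sketch below uses made-up CIDR blocks and hypothetical helper names to compare one small subnet with three larger ones.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address


def usable_ips(cidr):
    """Number of addresses AWS leaves assignable in a subnet of this CIDR."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def total_capacity(cidrs):
    """Upper bound on tasks (one IP each) across the given subnets."""
    return sum(usable_ips(c) for c in cidrs)


# Example CIDR blocks are illustrative, not taken from the question.
small = total_capacity(["10.0.1.0/28"])
larger = total_capacity(["10.0.16.0/22", "10.0.20.0/22", "10.0.24.0/22"])
```

This is an upper bound only: other resources in the same subnets (load balancer nodes, interface endpoints, other tasks) also consume addresses, and during a deployment old and new tasks briefly run side by side.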

Practice this question →

216

MCQeasy

A company runs a containerized application on Amazon ECS with Fargate. The application needs to store session state. Which service provides the MOST resilient and scalable solution?

A.Amazon ElastiCache for Redis

B.Amazon EFS

C.Ephemeral storage on the container instance

D.Amazon S3

AnswerA

In-memory, low latency, supports replication and failover.

Why this answer

Option A is correct because ElastiCache for Redis provides a highly available, scalable, and fast in-memory data store for session state. Option B is wrong because EFS is for file storage, not session state. Option D is wrong because S3 is object storage with higher latency.

Option C is wrong because local storage is ephemeral.

Practice this question →

217

MCQhard

A DevOps engineer is troubleshooting an issue where an EC2 instance behind an ALB target group is marked as unhealthy. The instance i-0abcd1234efgh5678 is serving traffic but the health check is timing out. The security group for the instance allows inbound HTTP from the ALB's security group. What is the most likely cause?

A.The instance's web server is overloaded and not responding within the health check interval.

B.The instance is in a private subnet with no route to the internet.

C.The health check path is misconfigured and returns a 404 status code.

D.The security group of the ALB does not allow outbound traffic to the instance.

AnswerA

A timeout indicates the instance is not responding in time, often due to high CPU or application hang.

Why this answer

Option A is correct because the health check is timing out, which suggests the instance is not responding within the timeout period. The instance may be busy or the health check path may be slow. Option D (security group) is unlikely because the instance is already serving traffic from the ALB.

Option B (subnet) doesn't cause a timeout, because ALB health checks do not need an internet route. Option C (health check path) could be a cause if it's incorrect, but a timeout indicates the request is reaching the instance without getting a response in time.

Practice this question →

218

MCQeasy

A company wants to ensure its Amazon RDS DB instance is highly available with automatic failover in case of an AZ failure. Which configuration should they use?

A.Multi-AZ deployment

B.Amazon RDS Proxy

C.Single-AZ with automated backups

D.Read replicas in multiple AZs

AnswerA

Multi-AZ provides automatic failover for high availability.

Why this answer

Multi-AZ deployment provides automatic failover to a standby instance in another AZ.

Practice this question →

219

MCQmedium

A company runs a web application on EC2 instances behind an ALB. To improve resilience, they want to automatically re-register failed instances. Which solution meets this requirement?

A.Set up a CloudWatch alarm to terminate the instance and notify an operator to re-register it.

B.Enable EC2 instance recovery and configure ALB health checks to deregister unhealthy instances.

C.Configure Auto Scaling to launch a new instance on instance failure.

D.Use Route 53 health checks to detect failure and update DNS to remove the instance.

AnswerB

EC2 instance recovery replaces the instance and ALB health checks will automatically re-register it once healthy.

Why this answer

Option B is correct because an ALB health check failure triggers deregistration and, with instance recovery, the instance is replaced and re-registered. Option C is wrong because Auto Scaling launches new instances but doesn't automatically re-register failed ones without a scaling policy. Option D is wrong because Route 53 health checks don't manage instance registration with the ALB.

Option A is wrong because CloudWatch does not directly re-register instances.

Practice this question →

220

Multi-Select · easy

A company is designing a highly available architecture for a web application that uses Amazon EC2 instances. The application must be resilient to the failure of a single instance and a single Availability Zone. Which TWO actions should the company take? (Choose TWO.)

Select 2 answers

A. Use an Auto Scaling group with a minimum of two instances spread across two Availability Zones.

B. Distribute EC2 instances across at least two Availability Zones.

C. Place all EC2 instances in a single Availability Zone and use a Network Load Balancer.

D. Use a single Application Load Balancer in one Availability Zone.

E. Use a single large EC2 instance in one Availability Zone.

Answers: A, B

Auto Scaling group with multiple AZs provides automatic recovery and resilience.

Why this answer

Option A is correct because an Auto Scaling group with a minimum of two instances spread across two Availability Zones ensures that if one instance or one entire AZ fails, the remaining instance(s) in the other AZ can continue serving traffic, and Auto Scaling will automatically launch a replacement instance in the healthy AZ to restore the desired count. Option B is correct because distributing EC2 instances across at least two Availability Zones is the fundamental requirement for AZ-level resilience, as it eliminates a single point of failure at the AZ boundary.
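
The reasoning above can be checked with a small model: given where each instance runs, does at least one instance survive the loss of any single Availability Zone? The instance IDs, zone names, and function names below are hypothetical, and the sketch ignores replacement launches by Auto Scaling.

```python
def surviving_instances(placement: dict, failed_az: str) -> list:
    """Return instance IDs still running after every instance in failed_az is lost."""
    return [iid for iid, az in placement.items() if az != failed_az]


def survives_any_single_az_failure(placement: dict) -> bool:
    """True if at least one instance remains no matter which single AZ fails."""
    zones = set(placement.values())
    return bool(placement) and all(surviving_instances(placement, az) for az in zones)


# Hypothetical placements: instance ID -> Availability Zone.
spread = {"i-a": "us-east-1a", "i-b": "us-east-1b"}
single = {"i-a": "us-east-1a", "i-b": "us-east-1a"}

print(survives_any_single_az_failure(spread))
print(survives_any_single_az_failure(single))
```

Two instances in the same zone protect against an instance failure but not a zone failure, which is why the placement across zones matters as much as the instance count.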

Exam trap

The trap here is that candidates often think a load balancer alone provides high availability, but they overlook that the load balancer itself must be deployed across multiple AZs (or be a Regional service like ALB with cross-zone load balancing enabled) and that instances must be in at least two AZs to survive an AZ failure.

221

Multi-Select · easy

A company is deploying a critical web application on AWS and needs to ensure high availability and disaster recovery across multiple AWS Regions. The application uses an Application Load Balancer (ALB) in the primary Region and an Amazon RDS Multi-AZ DB instance. Which TWO actions should the company take to meet these requirements? (Choose two.)

Select 2 answers

A. Configure an Amazon RDS Multi-AZ deployment in a secondary Region.

B. Set up Amazon CloudFront with multiple origins pointing to each Region's ALB.

C. Create an Auto Scaling group in the secondary Region that automatically scales up when the primary fails.

D. Use AWS Global Accelerator with endpoint groups in multiple Regions.

E. Configure Amazon Route 53 with a failover routing policy and health checks.

Answers: D, E

Global Accelerator provides cross-region failover.

Why this answer

Option D is correct because AWS Global Accelerator provides static IP addresses and routes traffic to the optimal endpoint across Regions, supporting active-passive failover. Option E is correct because Route 53 health checks can monitor the primary ALB and automatically fail over to a secondary ALB in another Region. Option A is wrong because RDS Multi-AZ only provides high availability within a single Region, not cross-Region disaster recovery.

Option B is wrong because CloudFront is a CDN, not designed for Regional failover routing. Option C is wrong because Auto Scaling groups are for scaling within a Region, not cross-Region failover.
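
For the Route 53 part, a failover routing policy comes down to a pair of records with the same name, one marked PRIMARY and one SECONDARY. The sketch below builds a change batch in the shape accepted by boto3's `change_resource_record_sets`, without calling AWS. It assumes alias records that rely on `EvaluateTargetHealth` rather than a separately created health check; the domain name, ALB DNS names, and hosted zone IDs are placeholders.

```python
def failover_record(name: str, role: str, alb_dns: str, alb_zone_id: str) -> dict:
    """Build one alias record of a Route 53 failover pair."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-alb",  # distinguishes the two records
            "Failover": role,
            "AliasTarget": {
                "HostedZoneId": alb_zone_id,   # hosted zone of the ALB (placeholder)
                "DNSName": alb_dns,
                "EvaluateTargetHealth": True,  # follow the ALB's target health
            },
        },
    }


# All names and IDs below are placeholders, not real resources.
change_batch = {
    "Changes": [
        failover_record("app.example.com.", "PRIMARY",
                        "primary-alb.us-east-1.elb.amazonaws.com.", "ZPRIMARYEXAMPLE"),
        failover_record("app.example.com.", "SECONDARY",
                        "standby-alb.us-west-2.elb.amazonaws.com.", "ZSECONDARYEXAMPLE"),
    ]
}
print([c["ResourceRecordSet"]["Failover"] for c in change_batch["Changes"]])
```

Route 53 answers with the PRIMARY record while it is considered healthy and switches to the SECONDARY record otherwise.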

222

MCQ · medium

A DevOps engineer runs the above command and sees that one target is unhealthy with reason 'Target.Timeout'. The target is an EC2 instance running a web server on port 80. The security group for the instance allows inbound traffic on port 80 from the ALB's security group. What is the most likely cause of the health check failure?

A. The instance does not have a public IP address.

B. The instance's network ACL is blocking inbound traffic from the internet.

C. The web server on the instance is not responding to health check requests on port 80.

D. The security group for the ALB does not allow outbound traffic to the instance.

Answer: C

Timeout indicates no response from the web server.

Why this answer

Option C is correct. The 'Target.Timeout' reason indicates that the health check request timed out, meaning the instance did not respond within the timeout period. The most common cause is that the web server is not running or is not listening on the correct port.

Option A is wrong because the health check targets the instance's private IP address, so a public IP is not required. Option B is wrong because the health check comes from the ALB, not the internet. Option D is unlikely because the scenario states that the instance's security group already allows traffic from the ALB, and security groups allow all outbound traffic by default.
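
When triaging a result like this, it can help to pull out just the unhealthy targets and their reason codes. The sketch below assumes the health data has the shape returned by the ELBv2 `describe_target_health` call; the `sample` response is written by hand for illustration, and the instance IDs are made up.

```python
def unhealthy_targets(response: dict) -> list:
    """Return (target_id, port, reason) for each target that is not healthy."""
    findings = []
    for desc in response.get("TargetHealthDescriptions", []):
        health = desc["TargetHealth"]
        if health["State"] != "healthy":
            target = desc["Target"]
            findings.append((target["Id"], target.get("Port"), health.get("Reason")))
    return findings


# Hand-written sample in the assumed response shape; IDs are hypothetical.
sample = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-0aaaaaaaaaaaaaaa1", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "i-0bbbbbbbbbbbbbbb2", "Port": 80},
         "TargetHealth": {"State": "unhealthy",
                          "Reason": "Target.Timeout",
                          "Description": "Request timed out"}},
    ]
}
print(unhealthy_targets(sample))
```

A `Target.Timeout` entry is a cue to check on the instance itself that the web server process is running and listening on the health check port.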

223

MCQ · hard

A company uses an NLB to distribute traffic to a fleet of EC2 instances in a single Availability Zone. During a recent AWS outage in that zone, the application became completely unavailable. The company wants to achieve high availability without rearchitecting the application. Which change is MOST appropriate?

A. Use a larger instance type and enable detailed CloudWatch monitoring

B. Replace the NLB with an Application Load Balancer and enable cross-zone load balancing

C. Create an Auto Scaling group with a scheduled scaling policy to add instances during peak hours

D. Launch EC2 instances in a second Availability Zone and register them with the NLB target group

Answer: D

Distributes traffic across zones, providing high availability.

Why this answer

Option D is correct because an NLB automatically routes traffic to healthy targets across Availability Zones once targets are registered in each zone enabled for the load balancer. Option A is wrong because instance type does not affect availability. Option B is wrong because an ALB is not required; an NLB can handle multiple Availability Zones.

Option C is wrong because scheduled scaling does not address a zone failure.

224

MCQ · easy

A company is designing a multi-region active-active architecture for a stateless web application. The application uses a DynamoDB table as its data store. The company wants to minimize write latency and ensure that writes are accepted in any region with eventual consistency. Which DynamoDB feature should they use?

A. DynamoDB read replicas in each region.

B. Cross-region replication using AWS Lambda function.

C. DynamoDB global tables.

D. DynamoDB Accelerator (DAX) with multi-region endpoints.

Answer: C

Global tables provide multi-region, multi-master replication for active-active.

Why this answer

Option C is correct because DynamoDB global tables provide multi-region, multi-master replication for active-active applications. Option A is wrong because read replicas do not accept writes. Option B is wrong because cross-region replication with Lambda is not built-in and adds complexity.

Option D is wrong because DynamoDB Accelerator (DAX) is a cache for read-heavy workloads, not a mechanism for multi-region writes.
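
Accepting writes in any region with eventual consistency implies some rule for reconciling concurrent writes to the same item; global tables use a last-writer-wins approach for this. The toy model below illustrates that idea only. It does not use DynamoDB APIs, and the item keys, timestamps, and function name are assumptions made up for the example.

```python
def merge_last_writer_wins(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas, keeping the version with the newer timestamp per key.

    Each value is a (timestamp, data) tuple.
    """
    merged = dict(replica_a)
    for key, (ts, data) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, data)
    return merged


# Hypothetical writes accepted independently in two regions.
us_east = {"cart#1": (100, "2 items")}
us_west = {"cart#1": (105, "3 items"), "cart#2": (90, "1 item")}

converged = merge_last_writer_wins(us_east, us_west)
print(converged)
```

Both regions accept writes locally with low latency, and after replication each replica converges to the same state, with the later write to `cart#1` taking precedence.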

225

Multi-Select · hard

A company is designing a disaster recovery plan for a MySQL database running on Amazon RDS. The database is critical and must have an RPO of 5 minutes and an RTO of 1 hour. The primary Region is us-east-1, and the DR Region is us-west-2. Which TWO steps should the company take to meet these requirements? (Choose TWO.)

Select 2 answers

A. Create a cross-Region read replica in us-west-2.

B. Enable automated backups with a 5-minute backup window.

C. Configure cross-Region automated snapshot copy to us-west-2.

D. Set up a process to promote the read replica to a standalone instance in us-west-2 during a disaster.

E. Enable Multi-AZ deployment in us-east-1.

Answers: A, D

Cross-Region read replicas replicate with low lag, achieving RPO under 5 minutes.

Why this answer

A cross-Region read replica in us-west-2 provides a near-real-time copy of the primary database, with replication lag typically measured in seconds, easily meeting the 5-minute RPO. During a disaster, promoting this read replica to a standalone instance in us-west-2 can be completed in minutes, satisfying the 1-hour RTO. This approach avoids the recovery time needed to restore from a snapshot or backup.
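
The RPO and RTO arithmetic can be written out explicitly. In the sketch below, replication lag bounds the data that could be lost (RPO), while promotion time plus application cutover time bounds the outage (RTO). The lag and timing figures passed in are assumed values for illustration, not measurements, and `meets_objectives` is a hypothetical helper.

```python
def meets_objectives(replica_lag_s: float, promotion_min: float, cutover_min: float,
                     rpo_min: float = 5, rto_min: float = 60) -> bool:
    """Check an assumed DR scenario against RPO and RTO targets (in minutes)."""
    worst_case_data_loss_min = replica_lag_s / 60  # unreplicated writes at failure
    recovery_min = promotion_min + cutover_min     # time until the app is serving again
    return worst_case_data_loss_min <= rpo_min and recovery_min <= rto_min


# Assumed figures: 30 s of lag, 10 min to promote, 15 min to repoint the application.
print(meets_objectives(replica_lag_s=30, promotion_min=10, cutover_min=15))

# Assumed degraded case: 10 minutes of lag exceeds the 5-minute RPO.
print(meets_objectives(replica_lag_s=600, promotion_min=10, cutover_min=15))
```

Because replication is asynchronous, lag can grow under heavy write load, so monitoring it against the RPO is part of keeping this plan valid.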

Exam trap

The trap here is that candidates confuse Multi-AZ (which provides automatic failover within a Region) with cross-Region disaster recovery, or assume that automated backups or snapshot copies can meet a low RPO/RTO when they actually require time-consuming restore operations.
