CCNA Resilient Cloud Questions

75 of 259 questions · Page 1/4 · Resilient Cloud topic · Answers revealed

1
MCQ · hard

A company runs a critical application on AWS Lambda that processes messages from an Amazon SQS queue. The application must be resilient to downstream service failures. The team notices that when the downstream service is unhealthy, messages are repeatedly retried and eventually sent to the dead-letter queue (DLQ) before the service recovers. What design change would improve resilience by allowing automatic retries after the downstream service recovers?

A. Configure the SQS queue with a large visibility timeout (e.g., 6 hours) and use a redrive policy only after a high number of receives. Keep the messages in the queue and retry when the downstream service becomes healthy.
B. Reduce the maxReceiveCount to 1 so that messages are sent to DLQ immediately, then reprocess them from DLQ later.
C. Increase the message retention period to 14 days and use a DLQ with high retention.
D. Use Amazon SNS to fan out messages to multiple SQS queues, each with different retry policies.
Answer: A

Long visibility timeout and high maxReceiveCount allow messages to be retried over an extended period.

Why this answer

Option A is correct because increasing the visibility timeout to a long duration (e.g., 6 hours) prevents messages from being repeatedly retried and sent to the DLQ while the downstream service is unhealthy. Instead, messages remain in the SQS queue and become visible again only after the visibility timeout expires, allowing automatic retries once the downstream service recovers. This approach avoids premature DLQ delivery and leverages SQS's built-in redrive policy based on maxReceiveCount.

Exam trap

The trap here is that candidates often think increasing the DLQ retention or reducing retries (maxReceiveCount) is the solution, but the real key is controlling the retry timing via the visibility timeout to allow the downstream service to recover before messages are exhausted.

How to eliminate wrong answers

Option B is wrong because reducing maxReceiveCount to 1 sends messages to the DLQ immediately after the first failure, which defeats resilience by not allowing any retries and requiring manual reprocessing from the DLQ. Option C is wrong because increasing the message retention period and using a DLQ with high retention does not prevent messages from being sent to the DLQ prematurely; it only keeps them in the DLQ longer, but the downstream service may recover before the messages are consumed from the DLQ. Option D is wrong because using SNS to fan out to multiple SQS queues with different retry policies adds complexity and does not address the core issue of preventing premature DLQ delivery; it still relies on the same visibility timeout and retry mechanism.
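The effect of the two settings can be estimated with simple arithmetic. The sketch below is a rough approximation, not an SQS API: it assumes every receive fails, ignores polling delay, and uses a hypothetical helper name. Each failed receive hides the message for one visibility timeout, so the message stays in the source queue for roughly visibility timeout × maxReceiveCount before the redrive policy moves it to the DLQ, capped by the queue's retention period.

```python
MAX_VISIBILITY_TIMEOUT_S = 12 * 3600   # SQS upper limit for the visibility timeout
DEFAULT_RETENTION_S = 4 * 24 * 3600    # SQS default message retention (4 days)


def retry_window_seconds(visibility_timeout_s: int,
                         max_receive_count: int,
                         retention_s: int = DEFAULT_RETENTION_S) -> int:
    """Approximate how long a continuously failing message stays retryable.

    Hypothetical helper for illustration only. Assumes every receive fails
    and the consumer polls again as soon as the message becomes visible.
    """
    if not 0 <= visibility_timeout_s <= MAX_VISIBILITY_TIMEOUT_S:
        raise ValueError("visibility timeout must be between 0 and 12 hours")
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    window = visibility_timeout_s * max_receive_count
    # A message older than the retention period is deleted, so the
    # retention period bounds the useful retry window.
    return min(window, retention_s)


# Default-style settings: 30 s timeout, 5 receives.
print(retry_window_seconds(30, 5))
# Option A style settings: 6 h timeout, 10 receives.
print(retry_window_seconds(6 * 3600, 10))
```

With a 30-second timeout and five receives, the message is exhausted in a few minutes; with a 6-hour timeout the same message remains retryable for days, which gives the downstream service time to recover.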

2
Multi-Select · medium

A company runs a microservices application on Amazon ECS with Fargate. The services need to be resilient to AZ failures. Which TWO actions should the company take? (Choose two.)

Select 2 answers
A. Configure the ECS service to spread tasks across multiple Availability Zones
B. Enable Service Auto Scaling to maintain desired count across AZs
C. Use a Network Load Balancer in each AZ for the service
D. Use a placement group to ensure tasks are launched on the same underlying hardware
E. Place all tasks in a single Availability Zone to minimize cross-AZ latency
Answers: A, B

Spreading across AZs provides fault tolerance.

Why this answer

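Spreading tasks across multiple AZs and using Service Auto Scaling to maintain the desired count keep the service running when one AZ fails. Option E is wrong because a single AZ is not resilient. Option D is wrong because Fargate tasks are not attached to EC2 instances, so placement groups do not apply.

Option C is wrong because a separate load balancer in each AZ is unnecessary; one load balancer can span multiple AZs, whereas a load balancer confined to one AZ is a single point of failure.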

3
MCQ · medium

A company is deploying a critical microservice on Amazon ECS with Fargate. They need to ensure that the service can tolerate an Availability Zone failure. What is the BEST approach?

A. Use a cluster placement constraint to spread tasks across instances
B. Use EC2 launch type and spread tasks across instance types
C. Define the service to spread tasks across multiple Availability Zones
D. Configure service auto scaling to add tasks when CPU is high
Answer: C

ECS Fargate services can spread across AZs.

Why this answer

Spreading tasks across multiple AZs ensures that if one AZ fails, tasks in the other AZs continue to run. Option B is wrong because Fargate has no instance types to diversify, and spreading across instance types does not by itself protect against an AZ failure. Option D is wrong because service auto scaling adds tasks in response to load but does not by itself spread them across AZs.

Option A is wrong because cluster placement constraints are not used with Fargate.

4
Multi-Select · easy

A company wants to ensure that its Amazon S3 bucket is resilient to accidental deletion of objects. Which TWO actions should be taken?

Select 2 answers
A. Enable MFA Delete on the bucket.
B. Enable S3 Object Lock.
C. Enable S3 Versioning.
D. Enable S3 Transfer Acceleration.
E. Configure a lifecycle policy to expire objects after 30 days.
Answers: A, C

Requires MFA to permanently delete versions.

Why this answer

Enable Versioning to keep multiple versions of objects, and enable MFA Delete to require multi-factor authentication for permanent deletions.

5
MCQ · medium

A company experiences intermittent high latency for a web application running on EC2 behind an ALB. They want to monitor and automatically replace instances that have high CPU. Which solution meets this requirement?

A. Create a CloudWatch alarm on CPU utilization that triggers an Auto Scaling policy to replace the instance
B. Use Auto Scaling scheduled scaling actions to replace instances at peak times
C. Use AWS Lambda to periodically check CPU and terminate high-CPU instances
D. Configure the ALB health check to mark instances unhealthy when CPU is high
Answer: A

CloudWatch alarm can trigger a scale-in or terminate action.

Why this answer

Option A is correct because you can configure a CloudWatch alarm on the EC2 instance's CPU utilization metric, and then use that alarm to trigger an Auto Scaling lifecycle hook or a scaling policy that terminates the unhealthy instance and launches a replacement. This directly ties performance monitoring to automated instance replacement, meeting the requirement to replace instances with high CPU.

Exam trap

The trap here is that candidates often confuse ALB health checks with instance health monitoring, assuming ALB can react to CPU metrics, when in fact ALB health checks only verify application-level responsiveness (e.g., HTTP status codes) and cannot directly measure CPU utilization.

How to eliminate wrong answers

Option B is wrong because scheduled scaling actions replace instances at fixed times, not in response to real-time high CPU utilization, so they cannot address intermittent latency. Option C is wrong because while Lambda could terminate instances, it adds unnecessary complexity and latency, and Auto Scaling already provides native health-check-based replacement without custom code. Option D is wrong because ALB health checks are designed to detect application or network failures (e.g., HTTP 5xx, connection timeouts), not CPU utilization; they cannot be configured to mark instances unhealthy based on CPU metrics.

6
MCQ · medium

A DevOps team is designing a disaster recovery solution for an Amazon RDS for MySQL database. The primary database is in us-east-1, and the recovery point objective (RPO) is 5 minutes, recovery time objective (RTO) is 1 hour. Which solution meets these requirements?

A. Enable Multi-AZ deployment for high availability.
B. Create a cross-Region read replica in the secondary Region.
C. Take manual snapshots and copy them to the secondary Region daily.
D. Configure automated backups with a retention period of 35 days.
Answer: B

A cross-Region read replica can be promoted quickly, meeting RPO of 5 minutes and RTO of 1 hour.

Why this answer

A cross-Region read replica in the secondary Region meets the RPO of 5 minutes because replication from the primary RDS instance to the read replica is asynchronous but typically completes within seconds to a few minutes, well under the 5-minute threshold. In a disaster, promoting the read replica to a standalone instance can be done manually or automated, and the RTO of 1 hour is achievable because promotion takes only a few minutes, leaving ample time for DNS and application failover. This solution provides a continuous replication stream without manual intervention, unlike snapshot-based approaches.

Exam trap

The trap here is that candidates confuse Multi-AZ (high availability within a Region) with cross-Region disaster recovery, assuming Multi-AZ protects against Regional failures, but it only protects against Availability Zone failures within the same Region.

How to eliminate wrong answers

Option A is wrong because Multi-AZ deployment provides high availability within a single Region (us-east-1) by synchronously replicating to a standby in a different Availability Zone, but it does not protect against a Regional disaster, so it cannot meet the cross-Region recovery requirement. Option C is wrong because taking manual snapshots daily and copying them to the secondary Region results in an RPO of up to 24 hours, far exceeding the required 5 minutes, and the copy operation adds additional latency. Option D is wrong because automated backups with a retention period of 35 days are stored within the same Region and cannot be used for cross-Region recovery; they also do not provide a mechanism to restore in a secondary Region within the required RPO/RTO.
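The RPO/RTO comparison above can be written as a small check. This is only a sketch: the class, function, and the replica lag and recovery times are illustrative assumptions rather than measured values. Only the 24-hour worst-case loss for daily snapshots and the 5-minute/1-hour objectives come from the question.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    """Hypothetical description of a disaster recovery approach."""
    name: str
    worst_case_data_loss_min: float  # how much recent data could be lost
    recovery_time_min: float         # how long until service is restored


def meets_objectives(option: DrOption, rpo_min: float, rto_min: float) -> bool:
    """Return True when the option satisfies both the RPO and the RTO."""
    return (option.worst_case_data_loss_min <= rpo_min
            and option.recovery_time_min <= rto_min)


RPO_MIN = 5    # from the question
RTO_MIN = 60   # from the question

candidates = [
    # Assumed figures: replica lag of about a minute, promotion plus DNS
    # and application failover in roughly 15 minutes.
    DrOption("cross-Region read replica", 1, 15),
    # Daily snapshot copy: up to 24 hours of data loss; restore time assumed.
    DrOption("daily snapshot copy", 24 * 60, 60),
]

for candidate in candidates:
    print(candidate.name, meets_objectives(candidate, RPO_MIN, RTO_MIN))
```

The snapshot approach fails on RPO alone, regardless of how quickly a restore completes.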

7
MCQ · medium

A company runs a stateless web application on Amazon ECS with Fargate. The application must be highly available across multiple Availability Zones. What is the BEST way to achieve this?

A. Create an ECS service with tasks in multiple AZs and place an ALB in front.
B. Use an Auto Scaling group of EC2 instances in a single AZ and run ECS tasks on them.
C. Deploy a CloudFront distribution with multiple origins in different AZs.
D. Deploy a single ECS service with tasks in one AZ and use an ALB.
Answer: A

Multi-AZ tasks and ALB provide HA.

Why this answer

Creating an ECS service with tasks distributed across multiple AZs and using an ALB to distribute traffic ensures high availability.

8
MCQ · easy

A company runs a stateless web application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Application Load Balancer. The operations team needs to ensure that the application remains available if one AZ fails. Which solution is MOST resilient?

A. Configure the Auto Scaling group to launch instances in a single Availability Zone with a desired capacity of 6.
B. Configure the Auto Scaling group to launch instances in two Availability Zones with a desired capacity of 4.
C. Configure the Auto Scaling group to launch instances in three Availability Zones with a desired capacity of 3.
D. Configure the Auto Scaling group to launch instances in two Availability Zones with a desired capacity of 6, all in one AZ.
Answer: C

Three AZs with at least one instance each ensures capacity remains in two AZs if one fails.

Why this answer

Option C is correct because distributing instances across three Availability Zones (AZs) with a desired capacity of 3 ensures that even if one AZ fails, the remaining two AZs still have at least 2 instances running, maintaining service capacity. The Application Load Balancer (ALB) automatically routes traffic away from the failed AZ, and the Auto Scaling group will replace lost instances in the healthy AZs, providing the highest resilience against a single-AZ failure.

Exam trap

The trap here is that candidates often think using two AZs is sufficient for high availability, but the question specifically asks for the 'MOST resilient' solution, and three AZs provide better fault isolation and recovery capacity than two, especially when the desired capacity is low.

How to eliminate wrong answers

Option A is wrong because launching all instances in a single AZ creates a single point of failure; if that AZ fails, all instances are lost and the application becomes unavailable. Option B is wrong because with only two AZs and a desired capacity of 4, losing one AZ leaves 2 instances (if evenly split), so total capacity drops by 50%; the Auto Scaling group cannot launch instances in the failed AZ, which can leave the application with insufficient capacity. Option D is wrong because it configures two AZs but places all 6 instances in one AZ, which is functionally identical to a single-AZ deployment and provides no resilience against an AZ failure.
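The capacity comparison between the options reduces to a short calculation. The sketch below uses a hypothetical helper and assumes instances are spread as evenly as possible and that the busiest AZ is the one that fails; it shows how many instances survive immediately after the failure, before any replacements launch.

```python
import math


def surviving_instances(desired_capacity: int, az_count: int) -> int:
    """Instances left right after the most heavily loaded AZ fails.

    Hypothetical helper: assumes an even spread across AZs, so the worst
    case loses ceil(desired_capacity / az_count) instances.
    """
    if az_count < 1 or desired_capacity < 0:
        raise ValueError("az_count must be >= 1 and capacity >= 0")
    worst_case_loss = math.ceil(desired_capacity / az_count)
    return desired_capacity - worst_case_loss


for label, capacity, azs in [("A", 6, 1), ("B", 4, 2), ("C", 3, 3)]:
    left = surviving_instances(capacity, azs)
    print(f"Option {label}: {left} of {capacity} instances survive")
```

Option C keeps two thirds of its capacity, Option B keeps half, and Option A keeps none.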

9
MCQ · hard

A company runs a critical application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Amazon RDS Multi-AZ DB instance. During a recent incident, one Availability Zone experienced a complete failure. The application remained available, but performance degraded significantly. What is the most likely cause of the degradation?

A. The Route 53 health checks failed and directed traffic to another Region, increasing latency
B. The RDS DB instance failed over to a read replica in a different Region
C. The Auto Scaling group was configured to span only two Availability Zones, and the failed AZ contained a majority of the running instances
D. The EBS volumes in the failed AZ were not available, causing data loss
Answer: C

Loss of one AZ halves capacity if only two AZs used.

Why this answer

Option C is correct because if the Auto Scaling group is configured to span only two Availability Zones, the failure of one AZ would result in a disproportionate loss of capacity. Since the failed AZ contained a majority of the running instances, the remaining instances in the surviving AZ would be overloaded, causing significant performance degradation. The application remained available because the surviving instances and the Multi-AZ RDS instance continued to operate, but the reduced compute capacity led to degraded performance.

Exam trap

The trap here is that candidates may assume Multi-AZ RDS or Route 53 health checks are the primary cause of degradation, but the real issue is the Auto Scaling group's AZ configuration and the resulting imbalance in compute capacity after an AZ failure.

How to eliminate wrong answers

Option A is wrong because Route 53 health checks do not direct traffic to another Region in this scenario; they are used for DNS-based routing within the same Region or across Regions, and the question does not mention any cross-Region setup. Option B is wrong because an RDS Multi-AZ DB instance does not use a read replica for failover; it uses a synchronous standby replica in a different Availability Zone within the same Region, and failover would not cause performance degradation as it is automatic and transparent. Option D is wrong because EBS volumes in the failed AZ would not cause data loss for the application; the EC2 instances in that AZ would be terminated, but the Auto Scaling group would launch new instances in other AZs, and the RDS database remains available, so no data loss occurs.

10
Multi-Select · medium

A company is deploying a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must be resilient to regional outages. Which THREE steps should the company take to achieve multi-Region resilience? (Choose THREE.)

Select 3 answers
A. Use Amazon CloudFront with multiple origins pointing to each Region's API Gateway.
B. Configure Route 53 with a failover routing policy to direct traffic to the secondary Region if the primary fails.
C. Use DynamoDB global tables to replicate data across Regions.
D. Deploy Lambda@Edge functions to handle requests at edge locations.
E. Deploy a second API Gateway and Lambda function in another Region.
Answers: B, C, E

Route 53 failover routing enables traffic redirection.

Why this answer

Option B is correct because Amazon Route 53 with a failover routing policy allows the company to route traffic to a secondary Region when health checks detect a failure in the primary Region. This provides DNS-level failover, which is a fundamental component of multi-Region resilience for HTTP-based applications.

Exam trap

The trap here is that candidates often confuse CloudFront's origin failover capability (which requires manual configuration of origin groups) with automatic multi-Region failover, or they mistakenly believe Lambda@Edge can serve as a full application backend across Regions, when in fact it is limited to edge processing and cannot replace regional Lambda deployments.

11
MCQ · hard

A company runs a stateless web application on AWS Lambda behind an Application Load Balancer (ALB). During a deployment, the team updates the Lambda function to a new version. Some users report seeing the old version of the application for several minutes after the deployment. What is the MOST likely cause?

A. The Lambda function versions are not immutable, causing a gradual rollout.
B. Lambda@Edge is overriding the function version at the edge locations.
C. Amazon CloudFront is caching the old response and has not been invalidated.
D. The ALB target group is still pointing to the old Lambda function version due to connection draining.
Answer: D

Connection draining and warm-up can cause ALB to serve old versions until all connections are drained.

Why this answer

Option D is correct because the ALB has a warm-up effect and may keep old connections alive while they drain, so traffic continues to reach the old Lambda version if the alias is not updated atomically. Option A is wrong because Lambda versions are immutable. Option C is wrong because CloudFront is not part of this architecture, so its caching is unrelated to the ALB.

Option B is wrong because Lambda@Edge is not involved in this setup.

12
MCQ · medium

Refer to the exhibit. A DevOps engineer applies the IAM policy shown to an S3 bucket to enforce server-side encryption. However, users report that some uploads succeed without encryption. What is the most likely reason?

A. The resource ARN is incorrect; it should be the bucket ARN.
B. The policy only allows the action but does not deny actions that do not meet the condition.
C. The action should be s3:PutEncryptedObject instead of s3:PutObject.
D. The policy uses StringEquals instead of StringNotEquals.
Answer: B

Without an explicit Deny, other policies may allow uploads without encryption.

Why this answer

Option B is correct because the IAM policy shown only allows the s3:PutObject action when the encryption condition is met, but it does not include a Deny statement to explicitly block uploads that do not satisfy the condition. In AWS IAM, an Allow statement alone does not prevent actions that fail the condition; it simply grants permission when the condition is true. Without a corresponding Deny, users with other permissions (e.g., from a broader policy) can still upload objects without encryption, as the Allow does not override other effective allows.

Exam trap

The trap here is that candidates assume an Allow statement with a condition implicitly denies requests that don't meet the condition, but AWS IAM requires an explicit Deny to block non-compliant actions.

How to eliminate wrong answers

Option A is wrong because the resource ARN in the policy (arn:aws:s3:::example-bucket/*) is correct for object-level operations like s3:PutObject, which require the object ARN (bucket/*), not just the bucket ARN. Option C is wrong because s3:PutEncryptedObject is not a valid AWS S3 action; the correct action is s3:PutObject, and encryption is enforced via conditions, not a separate action. Option D is wrong because using StringEquals is appropriate here to require the encryption header to equal 'AES256'; StringNotEquals would incorrectly allow uploads that do not specify encryption or specify a different value.
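The explicit Deny that the explanation calls for might look like the bucket policy below. This is a sketch built as a Python dictionary, not the policy from the exhibit: the statement IDs and the helper name are illustrative, and the bucket name is the example-bucket ARN quoted above. Inside a Deny statement, StringNotEquals is the appropriate operator, because the statement must match the uploads that should be rejected; the second statement uses the Null operator to reject requests that omit the encryption header altogether.

```python
import json

ENCRYPTION_KEY = "s3:x-amz-server-side-encryption"  # S3 condition key


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Build a bucket policy that explicitly denies unencrypted PutObject.

    Hypothetical helper for illustration; statement IDs are made up.
    """
    resource = f"arn:aws:s3:::{bucket}/*"  # object-level ARN for PutObject
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject uploads whose header names a different algorithm.
                "Sid": "DenyWrongEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": resource,
                "Condition": {"StringNotEquals": {ENCRYPTION_KEY: "AES256"}},
            },
            {
                # Reject uploads that do not send the header at all.
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": resource,
                "Condition": {"Null": {ENCRYPTION_KEY: "true"}},
            },
        ],
    }


policy = deny_unencrypted_uploads_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

Because an explicit Deny overrides any Allow, these statements block non-compliant uploads even for users who hold broader permissions from other policies.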

13
Multi-Select · medium

Which TWO actions can help protect against accidental deletion of an Amazon S3 bucket? (Select TWO.)

Select 2 answers
A. Enable versioning on the bucket.
B. Enable AWS CloudTrail to log delete events.
C. Enable MFA Delete on the bucket.
D. Configure a lifecycle policy to expire objects.
E. Add a bucket policy that explicitly denies the s3:DeleteBucket action.
Answers: C, E

Requires MFA to delete objects.

Why this answer

Options C (MFA Delete) and E (bucket policy denying s3:DeleteBucket) are correct. MFA Delete requires multi-factor authentication to delete objects. A bucket policy can explicitly deny the s3:DeleteBucket action.

Option A is wrong because versioning does not prevent bucket deletion. Option D is wrong because lifecycle policies delete objects rather than protect them. Option B is wrong because CloudTrail provides auditing, not prevention.

14
MCQ · easy

A startup runs a web application on EC2 instances behind an Application Load Balancer. They want to improve resilience by distributing instances across multiple Availability Zones. Currently, all instances are in us-east-1a. They create a launch template and an Auto Scaling group with a desired capacity of 2. They configure the Auto Scaling group to use two subnets: one in us-east-1a and one in us-east-1b. However, after updating, all instances remain in us-east-1a. What is the most likely reason?

A. The instance type is not available in us-east-1b.
B. The Auto Scaling group's subnet configuration was not updated to include the new subnet.
C. The new subnet in us-east-1b has no route to the internet.
D. The launch template specifies a single subnet in us-east-1a.
Answer: B

The group must have both subnets associated to distribute instances.

Why this answer

An Auto Scaling group distributes instances across the subnets it is configured to use. If instances land in only one subnet, either the launch template specifies a single subnet or the Auto Scaling group's subnet list was not updated. The most common cause is that the subnet list still contains only the original subnet.

15
MCQ · medium

A company runs a microservices architecture on Amazon ECS with Fargate. Services communicate via an internal Application Load Balancer. Recently, one service became unavailable due to a memory leak, causing cascading failures in downstream services. What design change would MOST effectively improve resilience and limit the blast radius?

A. Increase the memory limit for each ECS task to accommodate memory leaks.
B. Implement circuit breaker patterns in the service discovery and client libraries to stop calling unhealthy services.
C. Enable connection draining on the ALB to allow in-flight requests to complete.
D. Implement automatic scaling policies for ECS services based on memory utilization.
Answer: B

Circuit breakers isolate failures and prevent cascading.

Why this answer

Option B is correct because circuit breakers prevent cascading failures by stopping calls to unhealthy services. Option A is wrong because increasing task memory may delay the failure but does not prevent it. Option D is wrong because, while auto scaling helps with load, it does not prevent requests from being sent to failing services.

Option C is wrong because connection draining only manages in-flight connections during deregistration, not cascading failures.
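A circuit breaker in a client library can be sketched in a few lines. The class below is a minimal illustration with hypothetical names and thresholds, not a specific library's API: after a set number of consecutive failures it "opens" and rejects calls immediately, and after a cool-down it lets one trial call through to test whether the service has recovered.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker sketch; names and defaults are illustrative."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            elapsed = self._clock() - self._opened_at
            if elapsed < self.reset_timeout_s:
                # Open: fail fast instead of calling the unhealthy service.
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream request, for example `breaker.call(fetch_orders, customer_id)` where `fetch_orders` is a placeholder for the real client call, and treat `CircuitOpenError` as a signal to return a fallback response rather than wait on a failing dependency.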

16
MCQ · hard

A company runs an application on EC2 with a shared Elastic IP. The instance fails and an engineer manually attaches the Elastic IP to a standby instance. To automate this failover, which service should be used?

A. Use an Auto Scaling group with a lifecycle hook
B. AWS Elastic Beanstalk
C. CloudWatch Events with a Lambda target
D. Configure a second Elastic IP
Answer: C

Can react to health check failures and reassign EIP.

Why this answer

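A CloudWatch Events rule can react to the instance failure and invoke a Lambda function that reassociates the Elastic IP with the standby instance. CloudWatch cannot reassociate an Elastic IP by itself, which is why the Lambda target is needed. Option D is wrong because a second Elastic IP still leaves the failover manual.

Option A is wrong because an Auto Scaling group does not manage Elastic IPs.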
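The reassociation step that a Lambda function would perform can be sketched as follows. The function name and identifiers are hypothetical; the only AWS call used is the EC2 `associate_address` operation, with the client passed in as a parameter so the logic can be exercised without AWS access.

```python
def fail_over_elastic_ip(ec2_client, allocation_id, standby_instance_id):
    """Move an Elastic IP to the standby instance.

    Hypothetical helper. `ec2_client` is expected to behave like a boto3
    EC2 client; AllowReassociation=True lets the address move even though
    it is still associated with the failed instance.
    """
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

Inside a Lambda handler this would typically be called with `boto3.client("ec2")` and with the allocation ID and standby instance ID read from configuration, both of which are deployment-specific assumptions here.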

17
Multi-Select · hard

A company runs a critical application on AWS that uses an Auto Scaling group of EC2 instances. The application must remain available even if an entire Availability Zone fails. Which THREE actions should the company take?

Select 3 answers
A. Configure an ALB health check to automatically replace unhealthy instances.
B. Use a single instance in each Availability Zone to minimize cost.
C. Use multiple subnets in each Availability Zone for the instances.
D. Configure the Auto Scaling group to launch instances in at least two Availability Zones.
E. Use an Elastic Load Balancer (ELB) to distribute traffic across the instances in different AZs.
Answers: A, D, E

Health checks ensure replacement of failed instances.

Why this answer

Options A, D, and E are correct. Launching instances in at least two AZs provides AZ failure tolerance; an ELB distributes traffic across the AZs; an ALB health check ensures unhealthy instances are replaced. Option B is wrong because running only a single instance in each AZ to minimize cost is not resilient when one AZ fails.

Option C is wrong because a single subnet per AZ is sufficient; multiple subnets per AZ are not required for resilience.

18
MCQ · easy

A company runs a web application on EC2 instances behind an Application Load Balancer (ALB). The application stores session state in an RDS MySQL database. During a recent spike in traffic, the database CPU utilization reached 100%, causing slow responses. To improve resilience, what should a DevOps engineer do?

A. Migrate session state to Amazon ElastiCache for Memcached.
B. Increase the RDS instance size and enable Multi-AZ deployment.
C. Migrate session state to Amazon ElastiCache for Redis with replication.
D. Configure Auto Scaling groups for the EC2 instances based on CPU utilization.
Answer: C

Redis provides a resilient, in-memory session store that offloads the database and can handle high traffic with replication.

Why this answer

Option C is correct because Amazon ElastiCache for Redis provides a low-latency, in-memory session store that offloads the database, improving resilience under load. Option B (scaling RDS) only addresses database capacity, not offloading session state. Option A (ElastiCache for Memcached) is possible, but Redis is more commonly used for session state because it offers persistence and replication.

Option D (Auto Scaling groups) does not address the database bottleneck.

19
Multi-Select · easy

A startup runs a stateless web application on AWS Elastic Beanstalk with a single environment. The application uses an Amazon RDS for MySQL database instance. The startup is preparing for a marketing campaign that is expected to increase traffic by 10x. The CTO is concerned about the application's ability to handle the load and wants to ensure high availability and resilience. The current architecture has a single RDS instance (db.t3.medium) and a single Elastic Beanstalk environment with one EC2 instance (t3.medium). The startup has a limited budget but wants to improve resilience without over-provisioning. Which combination of actions should the DevOps engineer recommend? (Choose THREE.)

Select 3 answers
A. Add an Amazon ElastiCache cluster to cache frequent database queries.
B. Use dedicated instances for the EC2 instances to ensure consistent performance.
C. Switch the Elastic Beanstalk environment to a load-balanced, auto-scaled environment with a minimum of 2 instances across 2 Availability Zones.
D. Enable Multi-AZ deployment for the RDS instance to provide a standby in another AZ.
E. Add Amazon RDS Proxy in front of the RDS instance to handle connection pooling.
Answers: C, D, E

Provides compute resilience and scalability.

Why this answer

Option C is correct because Elastic Beanstalk can be configured for load-balanced, auto-scaled environments. Option D is correct because Multi-AZ RDS provides high availability for the database. Option E is correct because RDS Proxy helps manage connections efficiently during traffic spikes.

Option B (dedicated instances) is costly and not necessary. Option A (ElastiCache) adds cost; core resilience comes first.

20
MCQ · medium

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must be resilient to sudden spikes in traffic without manual intervention. Which combination of services should be used?

A. API Gateway with throttling, Lambda with reserved concurrency, and DynamoDB auto scaling.
B. API Gateway with usage plans, Lambda with provisioned concurrency, and DynamoDB on-demand.
C. API Gateway with WAF, Lambda with function URLs, and DynamoDB Accelerator (DAX).
D. API Gateway with caching, Lambda with no concurrency limits, and DynamoDB global tables.
Answer: A

Throttling prevents overload, reserved concurrency ensures Lambda capacity, auto scaling handles DB load.

Why this answer

API Gateway with throttling and Lambda with reserved concurrency help manage spikes. DynamoDB auto scaling adjusts capacity.

21
Multi-Select · easy

A company wants to design a highly available web application using AWS services. The application must be resilient to the failure of an entire AWS Region. Which THREE components should the architecture include? (Choose THREE.)

Select 3 answers
A. An Application Load Balancer (ALB) deployed in one Region.
B. Amazon Route 53 with a failover routing policy.
C. Auto Scaling groups in each Region with appropriate instance types.
D. Amazon EC2 instances in a single Region.
E. Amazon RDS Multi-AZ deployment with a cross-Region read replica.
Answers: B, C, E

Failover routing directs traffic to a secondary Region if the primary fails.

Why this answer

Amazon Route 53 with a failover routing policy is correct because it enables DNS-based health checking and automatic traffic routing to a secondary region when the primary region becomes unavailable. This is essential for cross-region disaster recovery, as Route 53 can monitor endpoint health and update DNS records to direct users to the healthy region, ensuring application availability despite a full region failure.

Exam trap

The trap here is that candidates often confuse Multi-AZ deployments (which provide high availability within a single Region) with cross-Region disaster recovery, and they may incorrectly assume that a single-Region ALB or EC2 instances can survive a full Region failure without a multi-Region architecture.

22
MCQ · medium

A company runs a critical web application on AWS using an Application Load Balancer (ALB) in front of an Auto Scaling group of EC2 instances. The application experiences periodic traffic spikes. To handle these spikes, the company wants to use a combination of proactive scaling based on a predictable schedule and reactive scaling based on CPU utilization. What is the MOST resilient scaling strategy?

A. Use a scheduled scaling policy for the predictable spikes and a step scaling policy for CPU utilization.
B. Use predictive scaling based on historical traffic patterns.
C. Use manual scaling by increasing the desired capacity before expected spikes.
D. Use a target tracking scaling policy based on average CPU utilization.
Answer: A

Combines proactive and reactive scaling for maximum resilience.

Why this answer

Option A is correct because it combines scheduled scaling for predictable traffic with dynamic scaling for reactive adjustments, ensuring both proactive and reactive resilience. Option D is wrong because target tracking alone may not respond quickly to sudden spikes. Option B is wrong because predictive scaling requires historical data and may not be accurate for new patterns.

Option C is wrong because manual scaling is not resilient for spikes.

23
MCQ · medium

A DevOps team is designing a highly available multi-tier application on AWS. The application runs on EC2 instances in an Auto Scaling group across two Availability Zones. The team uses an Application Load Balancer (ALB) to distribute traffic. The application requires the ALB to be accessible via a single, static IP address for whitelisting by third-party partners. What is the most resilient solution?

A. Use a Network Load Balancer (NLB) with a static IP address and Route 53 weighted routing to multiple NLBs.
B. Use a Network Load Balancer (NLB) with an Elastic IP per Availability Zone and front it with AWS Global Accelerator.
C. Use a Network Load Balancer (NLB) with Elastic IP addresses attached to each subnet in each AZ.
D. Use an Application Load Balancer (ALB) with AWS Global Accelerator.
Answer: B

Global Accelerator provides two static IP addresses that act as a fixed entry point, routing traffic to the NLB endpoints in each AZ, offering high resilience and static IPs.

Why this answer

Option B is correct because a Network Load Balancer (NLB) with an Elastic IP in each AZ, fronted by Global Accelerator, provides static IP addresses while combining the NLB's high availability with Global Accelerator's resilience. Option C (NLB with Elastic IPs only) is simpler but requires managing IP failover. Option D (ALB with Global Accelerator) relies entirely on the accelerator for static addresses, because the ALB's own IPs change.

Option A (Route 53 weighted records across multiple NLBs) adds latency and complexity.

24
MCQmedium

A company uses AWS CloudFormation to deploy infrastructure. They want to ensure that if a stack update fails, the stack is automatically rolled back to the last known good state. However, they also want to preserve any resources that were created successfully before the failure. Which CloudFormation stack policy should be used?

A.Define a creation policy with a resource signal.
B.Use a stack policy that denies update actions on resources that should be preserved.
C.Use a stack policy that allows all actions except delete.
D.Set the RollbackConfiguration property with a monitoring time.
AnswerB

Stack policies can prevent modification of critical resources during stack updates.

Why this answer

Option B is correct because a stack policy that denies updates on specific resources protects those resources during stack updates and rollback. Option D is wrong because RollbackConfiguration is not a stack policy. Option C is wrong because stack policies define allowed actions, not rollback behavior.

Option A is wrong because a creation policy is not a stack policy; it only controls when a resource is considered successfully created.

25
MCQmedium

Refer to the exhibit. An IAM policy is attached to an IAM role used by an EC2 instance. The instance is part of an Auto Scaling group. During a scale-in event, the instance fails to stop itself. What is the MOST likely cause?

A.The policy allows ec2:StopInstances but not ec2:StartInstances.
B.The policy does not allow ec2:DescribeInstanceStatus.
C.The policy does not allow ec2:TerminateInstances.
D.The policy does not allow ec2:ModifyInstanceAttribute.
AnswerC

Auto Scaling uses TerminateInstances, not StopInstances.

Why this answer

The policy allows StopInstances but does not allow ec2:TerminateInstances, which is required for Auto Scaling to terminate instances during scale-in.

26
MCQhard

A company is building a global application that requires low-latency access to static content across multiple AWS Regions. The content changes infrequently. Which solution is MOST resilient and cost-effective?

A.Use Amazon S3 Transfer Acceleration
B.Set up a VPN to a single Region
C.Deploy EC2 instances in each Region with a global load balancer
D.Use Amazon CloudFront with an S3 bucket as origin
AnswerD

CloudFront caches content at edge locations, improving latency and resilience.

Why this answer

CloudFront with an S3 origin provides global edge caching, low latency, and high resilience at low cost.

27
MCQhard

Refer to the exhibit. An IAM policy is attached to a user. A developer tries to upload an object to s3://my-bucket/confidential/report.pdf without specifying server-side encryption. What will happen?

A.The upload succeeds because the Allow statement grants PutObject.
B.The upload succeeds because the Deny condition uses a wrong condition key.
C.The upload fails because the Deny statement requires SSE-KMS.
D.The upload fails only if the object name matches the prefix.
AnswerC

The Deny condition requires SSE-KMS; without it, the request is denied.

Why this answer

Option C is correct because the Deny statement explicitly denies PutObject if SSE-KMS is not used. Since the developer did not specify SSE, the Deny applies and the request fails. Option A is wrong because the Allow statement does not override the Deny.

Option B is wrong because the Deny uses the s3:x-amz-server-side-encryption condition key, not kms:EncryptionContext. Option D is wrong because the Deny is not limited to AES256.
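
The exhibit is not reproduced here, so the following is only a sketch of the evaluation logic under an assumed policy: one Allow for PutObject plus one Deny with a StringNotEquals condition on s3:x-amz-server-side-encryption. The function name and the dictionary of request headers are hypothetical; this is a simulation, not an AWS API.

```python
def evaluate_put_object(request_headers, required_sse="aws:kms"):
    """Return 'Allow' or 'Deny' for a PutObject request under the assumed policy."""
    # Assumed Deny statement: StringNotEquals on the encryption header.
    # A missing header also counts as "not equal", so the Deny matches.
    sse = request_headers.get("x-amz-server-side-encryption")
    if sse != required_sse:
        # An explicit Deny always overrides any Allow.
        return "Deny"
    # Assumed Allow statement: PutObject is permitted on the bucket.
    return "Allow"
```

Under these assumptions, a request with no encryption header and a request with AES256 are both denied; only a request that specifies aws:kms is allowed.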

28
MCQhard

A company uses DynamoDB global tables in two AWS Regions with strong consistency reads. They observe occasional write conflicts that are not being resolved automatically. The application uses DynamoDBMapper with optimistic locking. What should the DevOps engineer do to ensure conflict resolution?

A.Implement a custom conflict resolution using DynamoDB Streams and AWS Lambda.
B.Switch to eventual consistency reads to reduce conflicts.
C.Add a third global table region to increase redundancy.
D.Use conditional writes with a version number attribute to ensure updates are applied only to the latest version.
AnswerD

Conditional writes with versioning enable optimistic locking, allowing only the latest version to be updated, which aligns with LWW.

Why this answer

Option D is correct because DynamoDB global tables use last-writer-wins (LWW) for conflict resolution by default, but when using DynamoDBMapper with optimistic locking, the application must implement conditional writes with a version number attribute to ensure updates are applied only to the latest version. This prevents stale updates from overwriting newer data, as the conditional write will fail if the version number in the request does not match the current version in the table, allowing the application to retry with the updated version.

Exam trap

The trap here is that candidates may think DynamoDB global tables automatically resolve all conflicts, but they overlook that DynamoDBMapper's optimistic locking requires explicit conditional writes with a version attribute to prevent lost updates in multi-region scenarios.

How to eliminate wrong answers

Option A is wrong because DynamoDB Streams and Lambda can be used for custom conflict resolution, but the question states that the application uses DynamoDBMapper with optimistic locking, which already provides a built-in mechanism (conditional writes with versioning) that should be used instead of introducing unnecessary complexity. Option B is wrong because switching to eventually consistent reads does not resolve write conflicts; it only changes how fresh read results are (and makes stale reads more likely), while conflicts still occur and must be handled at the write level. Option C is wrong because adding a third region does not resolve write conflicts; it increases the number of regions where concurrent writes can occur, potentially increasing conflict frequency, and global tables still rely on LWW or conditional writes for conflict resolution.
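
The version-number check described above can be sketched with an in-memory stand-in for a table. VersionedTable, ConditionalCheckFailed, and update_with_retry are hypothetical names for illustration; they are not DynamoDB or DynamoDBMapper APIs, and the sketch ignores cross-Region replication entirely.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored version differs from the expected one."""


class VersionedTable:
    """In-memory stand-in for a table whose items carry a version attribute."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._items[key])

    def put(self, key, attrs, expected_version=None):
        current = self._items.get(key)
        current_version = current["version"] if current else None
        # The conditional check: write only if the caller saw the latest version.
        if current_version != expected_version:
            raise ConditionalCheckFailed(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = 1 if current_version is None else current_version + 1
        self._items[key] = {**attrs, "version": new_version}
        return new_version


def update_with_retry(table, key, mutate, max_attempts=3):
    """Read, modify, and conditionally write; re-read and retry on conflict."""
    for _ in range(max_attempts):
        item = table.get(key)
        version = item.pop("version")
        try:
            return table.put(key, mutate(item), expected_version=version)
        except ConditionalCheckFailed:
            continue
    raise ConditionalCheckFailed("gave up after repeated conflicts")
```

A writer holding a stale version gets ConditionalCheckFailed instead of silently overwriting newer data, and update_with_retry shows the usual response: re-read the item and try again.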

29
MCQhard

An AWS account owner (Account A) owns an S3 bucket named my-bucket. The bucket policy shown in the exhibit is attached to the bucket. A user from Account B attempts to upload an object to the bucket without specifying the x-amz-acl header. What will happen?

A.The upload fails because the bucket policy requires the object ACL to be set, but the default ACL allows the upload anyway.
B.The upload succeeds because the bucket policy does not explicitly deny the request.
C.The upload succeeds because the bucket policy allows s3:PutObject for any principal.
D.The upload fails because the bucket policy requires the x-amz-acl header to be set to bucket-owner-full-control.
AnswerD

Without the header, the condition fails.

Why this answer

Option D is correct. The condition requires the x-amz-acl header to be set to bucket-owner-full-control. If the header is not specified, the condition fails, and the request is denied.

Option A is wrong because the condition is not met. Option B is wrong because the policy does not grant permission without the header. Option C is wrong because the bucket policy evaluates before the object ACL.

30
MCQhard

A company runs a stateful application on EC2 instances with instance store volumes. The application requires low-latency access to data. The operations team needs to ensure that instance failure does not result in data loss. Which solution is MOST resilient?

A.Use instance store volumes with RAID 1 across multiple instances.
B.Replicate data in real time to an EBS volume and take periodic snapshots.
C.Use larger instance types with more instance store capacity.
D.Create an AMI of the instance periodically to capture the data.
AnswerB

EBS provides persistent storage and snapshots enable recovery.

Why this answer

Option B is correct because replicating data to EBS and taking snapshots provides persistent storage and disaster recovery. Option A is wrong because instance store volumes are ephemeral. Option C is wrong because increasing instance size does not prevent data loss.

Option D is wrong because an AMI does not capture data from instance store volumes.

31
MCQmedium

A company runs a critical application on Amazon ECS with Fargate launch type. The application is deployed across multiple Availability Zones. The DevOps team needs to ensure that if an entire Availability Zone fails, the application continues to serve traffic without manual intervention. What should the team do?

A.Use an Amazon ECS service auto-scaling policy to automatically replace tasks in the failed AZ.
B.Configure the ALB to enable cross-zone load balancing and enable the ECS service's AZ rebalancing feature.
C.Configure the ECS service to run tasks in at least two Availability Zones and enable the ECS service auto-recovery feature.
D.Set the ECS service's minimum healthy percent to 100 and maximum percent to 200.
AnswerC

Multi-AZ deployment plus auto-recovery ensures resilience.

Why this answer

Option C is correct because the ECS service's AZ rebalancing feature automatically redistributes tasks across Availability Zones when an imbalance is detected, such as after an AZ failure. By configuring the service to run tasks in at least two AZs and enabling this feature, the ECS service will automatically launch replacement tasks in the remaining healthy AZs to maintain the desired count, ensuring continued traffic serving without manual intervention.

Exam trap

The trap here is that candidates often confuse auto-scaling (which adjusts capacity based on demand) with AZ rebalancing (which redistributes tasks after an AZ failure), leading them to choose Option A or B, or they mistakenly think deployment configuration settings like minimum/maximum percent (Option D) can handle AZ failures.

How to eliminate wrong answers

Option A is wrong because ECS service auto-scaling policies adjust the desired task count based on metrics like CPU or memory, but they do not automatically replace tasks lost due to an AZ failure; they only scale based on demand, not availability. Option B is wrong because ALB cross-zone load balancing distributes traffic across all AZs but does not replace failed tasks; the ECS service's AZ rebalancing feature is the correct mechanism for task redistribution after an AZ failure. Option D is wrong because setting minimum healthy percent to 100 and maximum percent to 200 controls deployment behavior (e.g., rolling updates) but does not address AZ failure recovery; it prevents task replacement during deployments but does not trigger automatic task redistribution after an AZ outage.

32
MCQeasy

A company is using Amazon RDS for MySQL with Multi-AZ deployment. During a recent failover, the application experienced a brief downtime because the DNS cache on the application servers still pointed to the old primary. How can a DevOps engineer minimize this downtime?

A.Use an RDS Proxy to manage connections and reduce DNS dependency.
B.Configure the application to use the Multi-AZ endpoint instead of the primary endpoint.
C.Configure application servers to use a hardcoded IP address instead of the RDS endpoint.
D.Increase the TTL on the RDS DNS record.
AnswerA

RDS Proxy provides a stable endpoint and handles failover transparently.

Why this answer

Option A is correct because RDS Proxy provides a stable endpoint and connection pooling, reducing failover impact. Option D is wrong because increasing the TTL lengthens the caching duration, worsening the issue. Option C is wrong because a hardcoded IP address goes stale on failover: RDS updates the endpoint's CNAME to point at the new primary, so the underlying IP changes.

Option B is wrong because there is no dedicated endpoint for Multi-AZ; the same CNAME is used.

33
MCQmedium

A company uses Amazon RDS Multi-AZ for disaster recovery. The primary DB instance in us-east-1a fails. What happens next?

A.The standby DB instance in us-east-1b is promoted automatically and the CNAME record is updated
B.The administrator must manually promote the standby instance
C.The primary instance is automatically rebuilt in the same AZ
D.A read replica in us-east-1b is automatically promoted to primary
AnswerA

RDS Multi-AZ performs automatic failover.

Why this answer

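Option A is correct because RDS Multi-AZ automatically fails over to the standby in a different Availability Zone within minutes and updates the CNAME to point to it. Option B is wrong because no manual intervention is needed. Option C is wrong because failover promotes the standby in another AZ rather than rebuilding the primary in the same AZ.

Option D is wrong because the Multi-AZ standby is not a read replica; Multi-AZ provides automatic failover to the standby.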

34
MCQeasy

A company runs a stateless web application on EC2 instances behind an Application Load Balancer. To improve resilience, which configuration should be used for the EC2 instances?

A.Use one EC2 instance with a larger instance type
B.Use a single, large EC2 instance in one Availability Zone
C.Use multiple EC2 instances in one Availability Zone with health checks disabled
D.Use multiple EC2 instances across two or more Availability Zones
AnswerD

Provides fault tolerance across AZs.

Why this answer

D is correct because deploying multiple EC2 instances across two or more Availability Zones (AZs) ensures high availability and fault tolerance. If one AZ fails, the Application Load Balancer (ALB) automatically routes traffic to healthy instances in other AZs, maintaining service continuity. This aligns with the AWS Well-Architected Framework's resilience best practices for stateless applications.

Exam trap

The trap here is that candidates may think scaling vertically (larger instance) or using multiple instances in a single AZ is sufficient, but the DOP-C02 exam specifically tests the requirement for multi-AZ deployment to achieve resilience against AZ failures.

How to eliminate wrong answers

Option A is wrong because using a single, larger EC2 instance creates a single point of failure; if that instance fails, the entire application goes down. Option B is wrong because placing a single large instance in one AZ does not protect against AZ-level failures, such as power outages or network disruptions. Option C is wrong because using multiple instances in one AZ with health checks disabled means the ALB cannot detect and route away from failed instances, and a single AZ failure still takes down all instances.

35
Multi-Selecthard

A company runs a critical application on AWS Lambda functions that process real-time streaming data from Amazon Kinesis Data Streams. Each Lambda function processes a batch of records and writes results to an Amazon DynamoDB table. The application is sensitive to data loss and requires exactly-once processing semantics. Recently, the operations team observed that the Lambda function is failing intermittently with 'ProvisionedThroughputExceededException' errors from DynamoDB. The Lambda function's batch size is 100, and the function is configured with a reserved concurrency of 500. The DynamoDB table has 100 read capacity units (RCUs) and 100 write capacity units (WCUs) with auto scaling enabled up to 1000 WCUs. The function's execution role has the necessary DynamoDB permissions. The Kinesis stream has 10 shards. The DevOps engineer needs to resolve the throttling errors without losing data. Which combination of actions should the engineer take? (Choose THREE.)

Select 3 answers
A.Set the Lambda function's batch size to a lower value (e.g., 10) and enable parallelization factor per shard.
B.Increase the DynamoDB table's read capacity units to 1000.
C.Configure the Lambda function event source mapping to retry with a maximum retry count and set the function to not discard failed records.
D.Increase the Lambda function's reserved concurrency to 1000.
E.Increase the DynamoDB table's write capacity units maximum auto scaling limit to 5000.
AnswersA, C, E

Reduces the number of concurrent writes per shard, decreasing throttling.

Why this answer

Option C is correct because the Kinesis event source mapping supports retries and can reprocess records after failures, ensuring no data loss. Option A is correct because a lower batch size reduces the number of concurrent writes. Option E is correct because raising the maximum WCU for DynamoDB auto scaling ensures the table can handle bursts.

Option B (increasing RCUs) is not needed for writes. Option D (increasing Lambda concurrency) may increase throttling.
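
As a rough sketch of the event source mapping settings discussed above, the helper below builds a parameter dictionary using field names from the Lambda CreateEventSourceMapping API (BatchSize, ParallelizationFactor, MaximumRetryAttempts). The helper name, the ARN, the function name, and the chosen values are illustrative assumptions, not a recommended configuration.

```python
def build_mapping_params(stream_arn, function_name, batch_size=10, parallelization=2):
    """Build keyword arguments for a Kinesis-to-Lambda event source mapping."""
    # ParallelizationFactor accepts values from 1 to 10 concurrent batches per shard.
    if not 1 <= parallelization <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "TRIM_HORIZON",
        "BatchSize": batch_size,                   # smaller batches per invocation
        "ParallelizationFactor": parallelization,
        "MaximumRetryAttempts": -1,                # -1: retry until the record expires
    }


# Hypothetical identifiers, for illustration only.
params = build_mapping_params(
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "example-processor",
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("lambda").create_event_source_mapping(**params)`.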

36
MCQeasy

A company wants to ensure that its Amazon S3 bucket can withstand the loss of an entire AWS Availability Zone. Which configuration meets this requirement?

A.Use the S3 Standard storage class.
B.Configure cross-Region replication to another bucket.
C.Enable S3 Versioning on the bucket.
D.Use the S3 One Zone-IA storage class.
AnswerA

S3 Standard automatically stores data in at least three AZs.

Why this answer

Option A is correct because the S3 Standard storage class automatically replicates data across at least three AZs. Option C is wrong because versioning does not provide AZ resilience. Option B is wrong because cross-Region replication is for geographic resilience, not AZ.

Option D is wrong because S3 One Zone-IA stores data in a single AZ.

37
MCQmedium

A company's application runs on Amazon ECS with Fargate launch type. The application must be resilient to an Availability Zone failure. Which configuration should be used?

A.Create an ECS service with tasks distributed across multiple Availability Zones using a spread placement strategy
B.Use an ECS cluster with a cluster placement strategy that prefers the same Availability Zone
C.Define multiple task definitions, one for each Availability Zone
D.Use an ECS service with a single task in one Availability Zone and rely on auto-scaling
AnswerA

Spread strategy across AZs ensures resilience.

Why this answer

Option A is correct because ECS services using the Fargate launch type can distribute tasks across multiple Availability Zones (AZs) by defining a spread placement strategy with the 'availabilityZone' dimension. This ensures that if one AZ fails, the tasks in the other AZs continue to serve traffic, providing resilience to an AZ failure. The spread strategy explicitly instructs ECS to place tasks evenly across AZs, which is essential for high availability.

Exam trap

The trap here is that candidates often confuse 'spread placement strategy' with 'binpack' or 'random' strategies, or they assume that simply using multiple subnets automatically distributes tasks without explicitly setting the spread strategy.

How to eliminate wrong answers

Option B is wrong because a cluster placement strategy that prefers the same Availability Zone would concentrate tasks in a single AZ, creating a single point of failure and violating the requirement for AZ resilience. Option C is wrong because defining multiple task definitions, one for each AZ, is unnecessary and does not inherently distribute tasks across AZs; task definitions are templates for containers, not placement mechanisms, and ECS services handle AZ distribution via placement strategies. Option D is wrong because a single task in one AZ cannot provide resilience to an AZ failure—if that AZ fails, the application becomes unavailable, and auto-scaling cannot react quickly enough to prevent downtime during an AZ outage.

38
MCQmedium

A company runs a microservices application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) to route traffic to services. Each service has a required number of tasks for capacity. The company recently experienced a prolonged outage when a bug caused all tasks of the critical 'payment' service to crash simultaneously. The DevOps team needs to implement a deployment strategy that reduces the risk of a full service outage during updates. The strategy must also allow for quick rollback if a deployment fails. Which deployment strategy should the team implement?

A.Implement a rolling update with a fixed number of tasks to replace at a time.
B.Use a canary deployment by creating a new service with a small number of tasks, test, then shift all traffic.
C.Deploy changes during maintenance windows with manual approval steps.
D.Implement blue/green deployment using ECS with target tracking alarms to automate traffic shifting.
AnswerD

Blue/green with automated traffic shifting and rollback capability.

Why this answer

Blue/green deployment with target tracking allows you to gradually shift traffic to the new version while monitoring. If issues arise, you can instantly roll back by switching traffic back to the old version.

39
Multi-Selecthard

A company runs a critical application on AWS using Amazon EC2 instances in an Auto Scaling group, an Application Load Balancer (ALB), and an Amazon RDS for PostgreSQL Multi-AZ DB cluster. The application must maintain an RTO of 5 minutes and an RPO of 1 second for database transactions. The current setup meets these requirements, but the DevOps team wants to improve the resilience of the application tier to withstand a regional failure. Which THREE actions should be taken? (Choose three.)

Select 3 answers
A.Replace the RDS Multi-AZ cluster with Amazon Aurora Global Database to replicate data across regions.
B.Use an active-passive architecture with a second Auto Scaling group and ALB in another region.
C.Use Amazon EFS Replication to replicate application data across regions with a recovery point objective (RPO) of 1 second.
D.Extend the existing Auto Scaling group to launch instances in two regions by specifying a second region in the launch template.
E.Set up Amazon Route 53 with health checks and failover routing policy to direct traffic to the secondary region if the primary fails.
AnswersA, B, E

Aurora Global Database provides cross-region replication with low RPO.

Why this answer

Amazon Aurora Global Database is the correct choice because it provides cross-region replication with a typical RPO of 1 second and RTO of 1 minute, meeting the stated requirements. Unlike standard RDS Multi-AZ, which is limited to a single region, Aurora Global Database replicates data asynchronously across multiple regions with minimal lag, ensuring the database tier can survive a regional failure while maintaining the required RPO of 1 second.

Exam trap

The trap here is that candidates often confuse Multi-AZ with cross-region disaster recovery, assuming Multi-AZ alone provides regional failover, when in fact it only protects against Availability Zone failures within a single region.

40
Multi-Selectmedium

A company is designing a disaster recovery (DR) strategy for a critical application that runs on EC2 instances with an RDS database. The DR site must be in a different AWS Region. The Recovery Point Objective (RPO) is 15 minutes, and Recovery Time Objective (RTO) is 1 hour. Which TWO actions should the company take to meet these objectives? (Choose TWO.)

Select 2 answers
A.Use AWS Backup to copy EC2 AMIs and RDS snapshots to the DR region every 15 minutes.
B.Use AWS CloudFormation to pre-provision resources in the DR region manually.
C.Configure Amazon Route 53 with health checks and failover routing to the DR region.
D.Create an RDS cross-Region read replica in the DR region.
E.Configure S3 cross-Region replication for application data stored in S3.
AnswersC, D

Route 53 can automatically redirect traffic, meeting RTO.

Why this answer

Options C and D are correct. C: Amazon Route 53 health checks and failover routing can redirect traffic to the DR region within the RTO. D: RDS cross-Region read replicas can be promoted to become the primary in the DR region, meeting the RPO with minimal data loss.

A is wrong because AMI copying and instance launch can exceed 1 hour. E is wrong because S3 cross-Region replication does not help with EC2 and RDS. B is wrong because CloudFormation alone does not automatically fail over.

41
Multi-Selectmedium

A company is designing a disaster recovery plan for an application running on AWS. The plan must meet an RTO of 1 hour and an RPO of 15 minutes. Which TWO strategies can achieve these objectives? (Select TWO.)

Select 2 answers
A.Backup and restore using daily snapshots to a different Region
B.Warm standby in a different AWS Region with database replication
C.Cold standby in a different Region with infrastructure deployed on demand
D.Pilot light in a different Region with database replication
E.Multi-AZ deployment in the same Region
AnswersB, D

Can meet RTO 1 hr and RPO 15 min.

Why this answer

Option B (Warm standby) is correct because it maintains a scaled-down but fully functional copy of the production environment in a different AWS Region, with database replication (e.g., Amazon RDS cross-Region read replicas or Aurora Global Database) ensuring an RPO of 15 minutes or less. The standby infrastructure can be scaled up within the 1-hour RTO, as it is already running and configured.

Exam trap

The trap here is that candidates often confuse Multi-AZ deployments (which are high availability within a Region) with cross-Region disaster recovery, failing to recognize that Multi-AZ does not protect against a full Regional outage.

42
Multi-Selecthard

A company is designing a disaster recovery plan for a critical application with an RPO of 15 minutes and RTO of 1 hour. The application runs on EC2 instances with an RDS MySQL database. The primary Region is us-east-1. Which THREE actions should they take to meet the RPO and RTO? (Choose three.)

Select 3 answers
A.Schedule automated AMI backups of EC2 instances every 15 minutes
B.Launch EC2 instances in a single Availability Zone in the secondary Region to reduce costs
C.Configure Route 53 health checks and DNS failover to the secondary Region
D.Create a cross-Region read replica of the RDS MySQL database in us-west-2
E.Use AWS CloudFormation StackSets to deploy identical infrastructure in the secondary Region
AnswersA, C, D

Quick recovery of EC2 instances.

Why this answer

Option A is correct because automated AMI backups of EC2 instances every 15 minutes align with the 15-minute RPO by capturing incremental snapshots of the instance volumes. These AMIs can be used to launch replacement EC2 instances in the secondary Region within the 1-hour RTO, provided the infrastructure is pre-staged. The frequency of 15 minutes ensures that data loss is limited to at most 15 minutes of changes.
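
The RPO reasoning above is simple arithmetic and can be sketched as follows. The function names and the copy_minutes parameter are hypothetical; the default of zero assumes each backup becomes usable in the secondary Region immediately, which is an idealization.

```python
def worst_case_data_loss_minutes(backup_interval_minutes, copy_minutes=0):
    """Worst case: failure strikes just before the next backup becomes usable."""
    # Changes made since the last usable backup are lost: one full interval,
    # plus however long a backup takes to become usable in the DR Region.
    return backup_interval_minutes + copy_minutes


def meets_rpo(backup_interval_minutes, rpo_minutes, copy_minutes=0):
    """Check whether periodic backups can satisfy the recovery point objective."""
    return worst_case_data_loss_minutes(backup_interval_minutes, copy_minutes) <= rpo_minutes
```

Under the idealized assumption, a 15-minute interval just meets a 15-minute RPO, while hourly backups do not; any nonzero copy time pushes the worst case past the objective.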

Exam trap

The trap here is that candidates often confuse infrastructure-as-code deployment (CloudFormation StackSets) with actual data replication, mistakenly believing that deploying identical infrastructure alone satisfies the RPO, when in fact continuous database replication is required to meet the 15-minute RPO.

43
Multi-Selecteasy

A company is designing a disaster recovery strategy for its application. The application runs on EC2 instances and uses an RDS MySQL database. The RTO is 1 hour, and the RPO is 15 minutes. Which TWO approaches meet these requirements?

Select 2 answers
A.Use a warm standby strategy: run a scaled-down version of the application in the DR region with RDS Multi-AZ across regions.
B.Use a pilot light strategy: replicate data using RDS cross-region automated backups and have a small environment running in the DR region.
C.Use a read replica in the DR region and promote it on failover.
D.Use a Multi-Zone deployment with RDS in the same region.
E.Use a backup and restore strategy: take snapshots every hour and restore in the DR region on failover.
AnswersA, B

Warm standby with cross-region replication meets RPO and RTO.

Why this answer

Options A and B are correct. A warm standby with Multi-AZ in another region meets the targets. A pilot light with RDS cross-region replication can also achieve low RPO and RTO.

Option E is wrong because hourly snapshots exceed the 15-minute RPO, and restoring from them may exceed the RTO. Option D is wrong because Multi-Zone is not an AWS term; Multi-AZ in the same region does not protect against region failure. Option C is wrong because read replicas do not provide automatic failover.

44
MCQhard

A company runs a microservices architecture on Amazon ECS with Fargate. Each service is deployed in its own ECS service. The company wants to ensure that if one Availability Zone (AZ) fails, the services can continue to operate with minimal impact. What is the MOST resilient task placement strategy?

A.Use a task placement constraint to run tasks on distinct instances.
B.Use a task placement strategy that uses the random algorithm.
C.Use a task placement strategy that uses the binpack algorithm to maximize resource utilization.
D.Use a task placement strategy that spreads tasks across Availability Zones.
AnswerD

Spreading across AZs ensures high availability even if one AZ fails.

Why this answer

Option D is correct because a spread strategy across AZs distributes tasks evenly across Availability Zones, so the loss of one AZ removes only a fraction of the tasks. Option C is wrong because binpack does not guarantee AZ distribution. Option B is wrong because random may not distribute evenly.

Option A is wrong because the distinct-instance constraint is for EC2, not Fargate.
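
To make the effect of spreading concrete, here is a small simulation. It assigns tasks round-robin across zones as an approximation of an even spread; the zone names and helper functions are hypothetical and this is not how the ECS scheduler is implemented.

```python
from itertools import cycle


def spread_tasks(task_count, zones):
    """Assign tasks to zones round-robin, approximating an even AZ spread."""
    assignment = {zone: 0 for zone in zones}
    for _, zone in zip(range(task_count), cycle(zones)):
        assignment[zone] += 1
    return assignment


def surviving_tasks(assignment, failed_zone):
    """Count the tasks still running after one zone fails."""
    return sum(count for zone, count in assignment.items() if zone != failed_zone)
```

With six tasks over three zones, losing one zone leaves four tasks running, whereas packing all six into a single zone leaves none if that zone fails.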

45
MCQmedium

A company uses an ALB with target groups for a microservices architecture. They need to ensure that if a target group has no healthy targets, the ALB returns a custom error page instead of a 503. How can this be achieved?

A.Configure a Lambda@Edge function to replace the 503 response.
B.Enable the ALB's custom error response and specify the error page.
C.Use Amazon CloudFront to cache a custom error page.
D.Use the ALB's fixed response action to return a custom error page.
AnswerB

ALB supports custom error responses for 503 errors.

Why this answer

Option B is correct because ALB's custom error response feature allows configuring a custom error page for specific error codes. Option C is wrong because CloudFront can return custom errors but doesn't integrate directly with ALB for this purpose without being in front. Option A is wrong because Lambda@Edge is for CloudFront, not ALB.

Option D is wrong because a fixed response action returns a static response but cannot serve a custom error page from an S3 bucket.

46
MCQeasy

A company runs a critical application on Amazon EC2 instances in an Auto Scaling group. To ensure high availability, the instances are deployed across three Availability Zones. Which additional step should the company take to protect against a regional failure?

A.Place all instances in a single Availability Zone to simplify management.
B.Use EC2 Dedicated Hosts to ensure capacity.
C.Increase the minimum size of the Auto Scaling group to 10 instances.
D.Deploy the application in a second AWS Region and use Route 53 with failover routing.
AnswerD

Multi-Region deployment with DNS failover protects against region failure.

Why this answer

Option D is correct because deploying resources in multiple AWS Regions protects against a regional outage. Option A is wrong because placing all instances in a single AZ reduces resilience. Option C is wrong because increasing the minimum number of instances does not protect against region failure.

Option B is wrong because Dedicated Hosts do not affect regional resilience.

47
Multi-Selectmedium

A company is designing a highly available architecture for a web application using AWS services. The application must be resilient to the failure of an entire AWS Region. Which TWO strategies should the company implement? (Choose TWO.)

Select 2 answers
A.Deploy the application in multiple AWS Regions and use Route 53 with failover routing policy.
B.Use Amazon CloudFront with multiple origins in the same region.
C.Enable S3 cross-Region replication for static assets.
D.Configure Amazon RDS for Multi-AZ and enable cross-Region read replicas.
E.Use Auto Scaling groups in a single region with multiple Availability Zones.
AnswersA, D

Multi-Region deployment with DNS failover is a key strategy for regional resilience.

Why this answer

Option A is correct because deploying to multiple regions with Route 53 failover provides cross-region disaster recovery. Option D is correct because using Amazon RDS Multi-AZ with cross-Region read replicas or Aurora Global Database ensures database resilience across regions. Option B is wrong because CloudFront alone does not provide compute failover.

Option C is wrong because S3 cross-Region replication is for data, not compute. Option E is wrong because single-region Auto Scaling does not protect against region failure.

48
Multi-Selecteasy

A company is designing a resilient storage solution for a critical application. The data must be highly available and durable. Which TWO services meet these requirements? (Choose TWO.)

Select 2 answers
A.Amazon EFS
B.Amazon EBS with RAID 1
C.Amazon S3 Glacier Deep Archive
D.Amazon S3
E.Amazon EC2 instance store
AnswersB, D

RAID 1 mirrors data across multiple EBS volumes for redundancy.

Why this answer

Options B and D are correct. B: Amazon EBS volumes in a RAID 1 configuration provide redundancy within an AZ. D: Amazon S3 provides 99.999999999% durability and is highly available.

A is wrong because EFS is a file system, not necessarily the highest durability. C is wrong because S3 Glacier Deep Archive is for archival, not high availability. E is wrong because instance store is ephemeral.

49
Multi-Selecteasy

A company is deploying a web application on Amazon ECS with Fargate. The application consists of a frontend service and a backend service. The DevOps team needs to ensure that the frontend service can communicate with the backend service securely without exposing the backend to the internet. Which THREE steps should the team take? (Choose THREE.)

Select 3 answers
A.Deploy the backend service in a private subnet with no internet access.
B.Use AWS Cloud Map service discovery for the backend service.
C.Configure a security group for the backend service that allows inbound traffic only from the frontend service's security group.
D.Deploy the backend service in a public subnet with an internet-facing Application Load Balancer.
E.Use an internet-facing Network Load Balancer for the backend service.
AnswersA, B, C

Keeps the backend isolated.

Why this answer

Option A is correct because deploying the backend service in a private subnet with no internet access ensures that the backend is not reachable from the internet, which is a fundamental security requirement. In Amazon ECS with Fargate, tasks in a private subnet use an elastic network interface (ENI) with no public IP address, and outbound traffic can be routed through a NAT gateway if needed, but inbound traffic from the internet is blocked. This isolates the backend from direct external exposure while still allowing communication from the frontend service within the same VPC.

Exam trap

The trap here is that candidates might think a load balancer is required for service-to-service communication in ECS, but AWS Cloud Map service discovery combined with security group rules can achieve secure, direct communication without exposing the backend to the internet.

50
MCQhard

A company runs a critical application on EC2 instances in an Auto Scaling group behind an ALB. They want to ensure that if an instance fails, the application remains available with minimal disruption. Which combination of services provides the best resilience?

A.Auto Scaling group with minimum 2 in a single AZ.
B.EC2 instance recovery with CloudWatch alarms.
C.Auto Scaling group with desired capacity of 2 and a lifecycle hook.
D.Auto Scaling group with ELB health checks and multiple AZs.
AnswerD

This ensures automatic replacement and distribution across AZs.

Why this answer

Option D is correct because the combination of Auto Scaling to replace instances, health checks to detect failure, and Multi-AZ deployment ensures availability. Option A is incorrect because it uses Auto Scaling in a single AZ. Option C is incorrect because it lacks ELB health checks and Multi-AZ placement.

Option B is incorrect because it uses EC2 recovery, which does not replace the instance automatically if it's terminated.

51
MCQeasy

A company uses AWS Lambda for processing events from Amazon S3. Recently, the Lambda function started timing out after the 15-minute limit for some large files. The function downloads the entire file to /tmp before processing. What should a DevOps engineer do to resolve this issue with minimal code changes?

A.Use S3 Select to filter and retrieve only necessary data, reducing file size
B.Switch the Lambda runtime from Python to Node.js for faster execution
C.Increase the Lambda function memory to 10,240 MB to improve CPU performance
D.Modify the function to read the file in streaming chunks from S3
AnswerA

S3 Select allows retrieving only required columns, reducing data transfer and processing time.

Why this answer

Option A is correct because using S3 Select to retrieve only the required data reduces the amount transferred and processed, so the function can finish within the Lambda timeout without changing the overall architecture. Option C is wrong because increasing memory does not increase the timeout beyond 15 minutes. Option D is wrong because reading the file in chunks from S3 still requires processing within the Lambda timeout.

Option B is wrong because moving to a different runtime does not remove the timeout limit.

52
MCQmedium

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. During a recent traffic spike, the application became unavailable for 10 minutes. Analysis shows that the ALB's healthy host count dropped to zero because the instances failed health checks due to high CPU load. What is the MOST effective design change to improve resilience during future traffic spikes?

A.Use predictive scaling with a scheduled scaling policy for known peak times.
B.Increase the instance size to handle higher load.
C.Configure step scaling policies based on CPU utilization.
D.Set a higher CPU threshold for health checks.
AnswerA

Predictive scaling anticipates demand and scales out in advance, preventing overload.

Why this answer

Predictive scaling uses historical traffic data to forecast future demand and proactively adjust capacity before a spike occurs. This prevents the CPU from reaching critical levels that cause health check failures, ensuring the ALB always has healthy hosts. Scheduled scaling alone would not adapt to unexpected spikes, but predictive scaling combined with dynamic scaling provides both proactive and reactive resilience.

Exam trap

The trap here is that candidates confuse reactive scaling (step/target tracking) with proactive scaling (predictive/scheduled), assuming any CPU-based policy will suffice, but the question explicitly states the spike caused a drop to zero healthy hosts—meaning reactive scaling was too slow to prevent the outage.

How to eliminate wrong answers

Option B is wrong because simply increasing instance size (vertical scaling) is a single-point-of-failure approach and does not address the root cause of insufficient capacity during spikes; it also increases cost without improving elasticity. Option C is wrong because step scaling policies based on CPU utilization are reactive—they only add instances after CPU is already high, which can lead to a lag that causes health check failures during rapid spikes. Option D is wrong because raising the CPU threshold for health checks masks the underlying performance issue and risks allowing unhealthy instances to serve traffic, degrading user experience and potentially causing cascading failures.

53
MCQmedium

A DevOps engineer runs the above command and sees that instance i-0abcd1234efgh5678 is unhealthy with reason 'Target.Timeout'. The instance is running and the application on port 80 responds to curl from the instance itself. What is the MOST likely cause?

A.The ALB health check interval is set too high.
B.The web server process is not running on the instance.
C.The health check path returns a 404 status code.
D.The security group for the instance does not allow inbound traffic from the ALB on port 80.
AnswerD

A timeout typically indicates a network connectivity issue between ALB and instance.

Why this answer

Option D is correct because a timeout suggests the ALB cannot reach the instance, likely due to a security group blocking traffic from the ALB. Option B is wrong because the application responds locally, so the process is running. Option C is wrong because a health check path that returns HTTP 404 would be reported as a response code mismatch, not a timeout.

Option A is wrong because a long health check interval only delays detection of a failure; it does not cause individual checks to time out.
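A minimal sketch of the fix follows: an ingress rule on the instance's security group that admits port 80 only from the ALB's security group. The security group ID is a hypothetical placeholder. The dictionary uses the `IpPermissions` shape accepted by boto3's `authorize_security_group_ingress`, but no AWS call is made.

```python
def allow_from_alb(alb_sg_id, port=80):
    """Ingress rule that admits TCP traffic on `port` only from the ALB's security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing a security group instead of a CIDR keeps the rule valid
        # even when the ALB's node IP addresses change.
        "UserIdGroupPairs": [{"GroupId": alb_sg_id}],
    }


rule = allow_from_alb("sg-0123456789abcdef0")  # hypothetical ALB security group ID
```

With a rule like this in place, health checks from the ALB reach the instance, while port 80 stays closed to other sources.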

54
MCQmedium

An IAM policy is attached to an S3 bucket to allow access from a specific VPC CIDR range. However, users from the VPC are receiving 'Access Denied' errors when trying to access objects in the bucket. What is the MOST likely reason?

A.The users are assuming an IAM role that does not have permission to access S3
B.The condition key 'aws:SourceIp' evaluates the public IP address, but the VPC uses private IP addresses
C.The policy should use 'aws:sourceVpce' instead of 'aws:SourceIp' to restrict access to a VPC endpoint
D.The bucket policy requires HTTPS and the requests are using HTTP
AnswerB

'aws:SourceIp' checks the public IP of the client, not the private IP.

Why this answer

Option B is correct because the condition uses 'aws:SourceIp', which checks the public IP of the request, not the private IP. Since the VPC uses private IPs, the condition fails. Option C is wrong because the policy does not specify sourceVpce.

Option D is wrong because the policy does not enforce HTTPS. Option A is wrong because the IAM role does not affect the source IP check.
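For comparison, one way to scope bucket access to traffic arriving from inside a VPC is to match on the VPC endpoint ID rather than on an IP range. The sketch below builds such a bucket policy as a plain dictionary. The bucket name and endpoint ID are hypothetical, and the statement is an illustration rather than a hardened policy.

```python
import json


def vpce_read_policy(bucket, vpce_id):
    """Bucket policy allowing GetObject only for requests that come through one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadViaVpcEndpoint",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # aws:SourceVpce matches the endpoint the request used,
                # so private client IP addresses do not matter.
                "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


policy = vpce_read_policy("example-bucket", "vpce-0123456789abcdef0")  # hypothetical IDs
policy_json = json.dumps(policy, indent=2)
```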

55
MCQhard

A financial services company runs a multi-region application on AWS. They need to ensure that if one AWS Region becomes unavailable, traffic is automatically rerouted to another region with no manual intervention. The application uses an Application Load Balancer in each region. What is the MOST resilient approach to meet this requirement?

A.Use Amazon CloudFront with origins in each region and configure origin failover.
B.Use a Network Load Balancer in each region and configure Route 53 with failover routing.
C.Use Route 53 with latency-based routing and health checks on the ALB endpoints.
D.Use AWS Global Accelerator with endpoint groups in each region and health checks.
AnswerD

Global Accelerator provides automatic failover across regions using health checks and traffic dials.

Why this answer

Option D is correct because AWS Global Accelerator health-checks the endpoints in each Region's endpoint group and automatically shifts traffic to a healthy Region, without depending on DNS caching. Option C is wrong because Route 53 latency-based routing with health checks can also redirect traffic, but clients may keep resolving to the failed Region until cached DNS records expire. Option B is wrong because the application already uses ALBs, and Route 53 failover routing has the same DNS caching delay.

Option A is wrong because CloudFront primarily accelerates content delivery, not dynamic failover.

56
MCQhard

A company is designing a disaster recovery (DR) strategy for a stateless web application deployed on Amazon ECS with Fargate. The application is fronted by an Application Load Balancer (ALB) and uses Amazon ElastiCache for Redis for session state. The primary region is us-east-1. The DR plan requires a Recovery Point Objective (RPO) of 15 minutes and a Recovery Time Objective (RTO) of 30 minutes. Which solution meets these requirements with the LEAST operational overhead?

A.Deploy an ALB with a warm standby ECS service in us-west-2. Use Route 53 health checks to route traffic to the secondary region if primary fails. Use ElastiCache Global Datastore for Redis to replicate data across regions.
B.Deploy an Active-Active configuration across two AWS regions using Route 53 latency routing. Use ElastiCache for Redis Global Datastore with multi-region writes.
C.Deploy a Pilot Light environment in us-west-2 with a scaled-down ECS service and Redis cluster. Use Route 53 DNS failover. On disaster, scale up the ECS service and promote the Redis cluster.
D.Use Amazon ECS with Fargate in us-east-1 only, and schedule daily snapshots of ElastiCache for Redis. In case of disaster, restore the snapshot in a new region and update DNS.
AnswerA

The warm standby approach with automatic failover and cross-region replication meets RPO and RTO with low operational overhead.

Why this answer

Option A is correct because it uses ElastiCache Global Datastore for Redis, which provides cross-region replication with an RPO of seconds (well within 15 minutes) and automatic failover, minimizing operational overhead. The warm standby ECS service in us-west-2 with Route 53 health checks allows traffic to be redirected within the 30-minute RTO without manual intervention, as the ALB and ECS service are pre-provisioned.

Exam trap

The trap here is that candidates may confuse Pilot Light (Option C) as lower overhead, but it requires manual scaling and promotion steps, whereas a warm standby with Global Datastore automates failover, making it the least operational overhead for the given RPO/RTO.

How to eliminate wrong answers

Option B is wrong because an Active-Active configuration with multi-region writes for ElastiCache Global Datastore is not supported; Global Datastore only supports active-passive (one primary, one replica) to avoid write conflicts. Option C is wrong because a Pilot Light approach requires manual scaling of the ECS service and promoting the Redis cluster on disaster, which adds operational overhead and risks exceeding the 30-minute RTO due to provisioning delays. Option D is wrong because daily snapshots of ElastiCache cannot achieve a 15-minute RPO (snapshots are at most daily), and restoring a snapshot in a new region plus updating DNS would likely exceed the 30-minute RTO due to manual steps and data transfer time.
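The elimination above comes down to comparing each strategy's worst-case data loss and recovery time against the stated objectives. The sketch below makes that comparison explicit. The per-strategy figures are illustrative assumptions chosen for this example, not measured or documented values.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    worst_case_data_loss_min: float   # effective RPO
    estimated_recovery_min: float     # effective RTO


def meets(strategy, rpo_min, rto_min):
    """True when the strategy stays within both the RPO and the RTO."""
    return (strategy.worst_case_data_loss_min <= rpo_min
            and strategy.estimated_recovery_min <= rto_min)


# Assumed figures: replication lag of about a minute for the warm standby,
# and up to 24 hours of lost writes with one snapshot per day.
warm_standby = Strategy("warm standby + Global Datastore", 1, 10)
daily_snapshot = Strategy("daily snapshot restore", 24 * 60, 120)

results = {s.name: meets(s, rpo_min=15, rto_min=30)
           for s in (warm_standby, daily_snapshot)}
```

Under these assumptions only the warm standby satisfies the 15-minute RPO and 30-minute RTO. A daily snapshot fails on RPO alone, whatever the restore time.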

57
MCQeasy

A company uses Amazon CloudFront to distribute content from an S3 bucket origin. Some users report intermittent access errors. The DevOps team suspects the origin is overwhelmed. What is the MOST effective way to improve resilience?

A.Set up an origin failover with two S3 buckets behind an Application Load Balancer (ALB).
B.Reduce the CloudFront cache TTL to serve fresher content.
C.Increase the CloudFront cache TTL to reduce requests to the origin.
D.Configure CloudFront to perform health checks on the origin.
AnswerA

Origin failover provides high availability.

Why this answer

Option A is correct because using an ALB or multiple origins with failover provides redundancy and load distribution. Option C is wrong because increasing cache TTL only reduces origin load but does not address origin failures. Option B is wrong because reducing TTL increases origin load.

Option D is wrong because CloudFront does not have an origin health check feature; it uses error rate.

58
MCQhard

A company's application runs on Amazon EC2 instances in an Auto Scaling group. The application writes logs to local instance storage. The operations team needs to ensure logs are not lost during instance termination or scaling events. What should be done?

A.Increase the size of the instance store volumes.
B.Use an Amazon EFS file system and mount it to each instance for log storage.
C.Configure the Auto Scaling group to terminate instances after logs are copied to S3.
D.Install the CloudWatch Logs agent on each instance and stream logs to CloudWatch Logs.
AnswerD

Real-time streaming to CloudWatch prevents loss.

Why this answer

Configuring the CloudWatch Logs agent to stream logs to CloudWatch Logs in real time ensures logs are centralized and not lost.

59
MCQmedium

A company runs a stateless web application on a fleet of EC2 instances in an Auto Scaling group. The application stores session state in a shared ElastiCache Redis cluster. During traffic spikes, the application becomes slow. Monitoring shows that the Redis cluster has high CPU utilization. Which solution is MOST cost-effective and scalable?

A.Upgrade the Redis instance to a larger node type to handle more operations
B.Enable cluster mode on the ElastiCache Redis cluster and add more shards
C.Add read replicas to offload read traffic from the primary node
D.Migrate session state to DynamoDB with DAX for caching
AnswerB

Cluster mode distributes data across shards, improving performance and scalability.

Why this answer

Enabling cluster mode on ElastiCache Redis and adding more shards horizontally scales the cluster, distributing write and read operations across multiple nodes. This directly reduces CPU utilization on any single node and is more cost-effective than vertical scaling (upgrading to a larger node type) because it allows granular, pay-as-you-grow capacity. Cluster mode also supports automatic shard rebalancing and is ideal for stateless web applications with session state that can be partitioned by session key.

Exam trap

The trap here is that candidates often confuse read replicas with horizontal scaling for write-heavy workloads, not realizing that replicas only help with read scaling and cannot reduce CPU from write operations, while cluster mode directly addresses both read and write scaling by splitting the data set.

How to eliminate wrong answers

Option A is wrong because upgrading to a larger node type (vertical scaling) is less cost-effective and has an upper limit; it does not provide the linear scalability of horizontal sharding and can lead to over-provisioning during low traffic. Option C is wrong because adding read replicas offloads only read traffic, but session state in Redis involves both reads and writes, and the high CPU is likely from write-heavy operations (e.g., SET/GET) that replicas cannot offload; replicas also introduce eventual consistency issues for session data. Option D is wrong because migrating to DynamoDB with DAX introduces unnecessary complexity and cost for session state that is already well-served by Redis; DAX is a separate caching layer that adds latency and cost, and DynamoDB's throughput pricing can be less predictable than ElastiCache for bursty traffic.
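To illustrate how cluster mode partitions session keys, here is a small sketch of the Redis Cluster key-to-slot mapping: CRC16 (XMODEM variant) of the key, modulo 16384. The even split of slots across shards in `shard_for` is a simplifying assumption, hash tags (`{...}`) are ignored, and the session key format is hypothetical.

```python
import binascii

SLOTS = 16384  # number of hash slots in a Redis cluster


def key_slot(key: str) -> int:
    """CRC16 (XMODEM) of the key modulo 16384; hash tags are not handled here."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOTS


def shard_for(key: str, num_shards: int) -> int:
    """Assume slots are divided into equal contiguous ranges, one range per shard."""
    return key_slot(key) * num_shards // SLOTS


# Hypothetical session keys spread over three shards.
placement = {k: shard_for(k, 3) for k in ("session:alice", "session:bob", "session:carol")}
```

Because each key maps to exactly one shard, adding shards spreads both reads and writes across more primaries, which read replicas alone cannot do.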

60
MCQmedium

A company hosts a static website on Amazon S3 with a CloudFront distribution. The website is critical for business operations and must be available even if the primary AWS Region fails. Currently, the S3 bucket is in us-east-1, and CloudFront uses that bucket as the origin. The company has a secondary bucket in us-west-2 with a replica of the data. The company wants to use CloudFront to automatically fail over to the secondary bucket if the primary becomes unavailable. The DevOps engineer needs to implement a solution that requires minimal operational overhead. What should the engineer do?

A.Use an Application Load Balancer in front of both S3 buckets and point CloudFront to the ALB.
B.Create a second CloudFront distribution pointing to the secondary bucket and use Route 53 failover routing between the two distributions.
C.Modify the application to switch the CloudFront origin URL using Lambda@Edge when health checks fail.
D.Configure CloudFront Origin Failover by adding both buckets as origins, with the primary in us-east-1 and secondary in us-west-2.
AnswerD

CloudFront natively supports origin failover with minimal configuration.

Why this answer

CloudFront Origin Failover allows you to set up a primary and secondary origin. If the primary returns an error (e.g., 503), CloudFront automatically routes requests to the secondary origin.
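The failover behaviour can be sketched as an origin group with a primary and a secondary member plus a list of status codes that trigger failover. The origin IDs are hypothetical. The dictionary mirrors the general shape of the `OriginGroups` entry in a CloudFront distribution config, and `serve` is only a local simulation of the routing decision, not CloudFront's implementation.

```python
origin_group = {
    "Id": "s3-failover-group",  # hypothetical group ID
    "FailoverCriteria": {
        "StatusCodes": {"Quantity": 4, "Items": [500, 502, 503, 504]},
    },
    "Members": {
        "Quantity": 2,
        "Items": [
            {"OriginId": "primary-us-east-1"},
            {"OriginId": "secondary-us-west-2"},
        ],
    },
}


def serve(group, fetch):
    """Try members in order; move to the next one only on a configured failover status code."""
    failover_codes = set(group["FailoverCriteria"]["StatusCodes"]["Items"])
    origin_id, status = None, None
    for member in group["Members"]["Items"]:
        origin_id = member["OriginId"]
        status = fetch(origin_id)
        if status not in failover_codes:
            break
    return origin_id, status


# Simulated outage: the primary bucket returns 503, the secondary is healthy.
outage = {"primary-us-east-1": 503, "secondary-us-west-2": 200}
chosen = serve(origin_group, outage.get)
```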

61
MCQmedium

A company runs a critical web application on EC2 instances behind an Application Load Balancer. The application stores session state in an in-memory cache on each instance. During deployment of a new version, users experience session timeouts and errors. Which design change will MOST effectively improve resilience and avoid session loss during deployments?

A.Enable sticky sessions (session affinity) on the ALB.
B.Migrate session state to ElastiCache for Redis.
C.Increase the ALB idle timeout to 600 seconds.
D.Increase the EC2 instance size to handle higher memory.
AnswerB

Offloading session state to ElastiCache makes sessions durable across instance replacements.

Why this answer

Option B is correct because migrating session state from in-memory EC2 instance storage to ElastiCache for Redis decouples session data from individual instances. This ensures that when a new deployment replaces instances, sessions persist independently, preventing timeouts and errors. ElastiCache provides a centralized, highly available session store that survives instance termination and scaling events.

Exam trap

The trap here is that candidates often confuse sticky sessions (which only route traffic consistently) with session persistence (which requires external storage), leading them to choose option A despite it not preserving session data across instance replacements.

How to eliminate wrong answers

Option A is wrong because enabling sticky sessions (session affinity) on the ALB would lock users to a specific instance, but during deployment that instance is terminated and replaced, causing session loss regardless of stickiness. Option C is wrong because increasing the ALB idle timeout to 600 seconds only extends how long the ALB keeps a connection open without data transfer; it does not preserve session state stored in the instance's memory when the instance is replaced. Option D is wrong because increasing the EC2 instance size to handle higher memory does not solve the fundamental problem of session state being ephemeral and lost during instance replacement in a deployment.

62
MCQmedium

A company is designing a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must tolerate a Regional failure. Which design provides the most resilience?

A.Use Lambda@Edge to run functions at AWS edge locations
B.Use DynamoDB auto-scaling and run Lambda in a single Region
C.Use DynamoDB global tables with Lambda functions deployed in multiple Regions and Route 53 multi-Region routing
D.Use DynamoDB Accelerator (DAX) to cache data across Regions
AnswerC

Global tables and multi-Region deployment provide resilience.

Why this answer

Option C is correct because DynamoDB global tables provide multi-Region, fully replicated tables with automatic conflict resolution, ensuring data availability during a Regional outage. Deploying Lambda functions in multiple Regions with Route 53 multi-Region routing (using health checks and latency-based or weighted routing) allows traffic to fail over to a healthy Region, making the entire serverless stack resilient to a Regional failure.

Exam trap

The trap here is that candidates often confuse caching (DAX) or edge computing (Lambda@Edge) with true multi-Region replication and failover, assuming they provide Regional resilience when they do not.

How to eliminate wrong answers

Option A is wrong because Lambda@Edge runs at CloudFront edge locations, not in multiple AWS Regions, and is designed for lightweight request/response modification, not for hosting a full serverless application backend; it does not provide Regional failover for DynamoDB or API Gateway. Option B is wrong because DynamoDB auto-scaling only adjusts throughput within a single Region and does not replicate data across Regions, so a Regional failure would still cause complete data unavailability; running Lambda in a single Region creates a single point of failure for compute. Option D is wrong because DAX is a caching layer that operates within a single Region and does not replicate data across Regions; it cannot provide data durability or availability during a Regional outage, and it is not designed for cross-Region failover.

63
MCQmedium

A company's production database on Amazon RDS Multi-AZ DB instance experienced a failover. The application experienced a brief outage. How can the company reduce the failover time?

A.Switch to a Single-AZ deployment
B.Increase the DB instance size
C.Use Amazon RDS Proxy
D.Enable Enhanced Monitoring
AnswerC

RDS Proxy reduces failover time by pooling connections and rerouting them quickly.

Why this answer

Using RDS Proxy reduces failover time by maintaining connections and routing them to the new primary instance quickly.

64
MCQhard

A company uses AWS Lambda functions to process events from Amazon SQS. The Lambda function sometimes fails due to timeouts. The team wants to preserve the event for reprocessing. How should they configure the integration?

A.Set up a DLQ on the SQS queue that receives the events
B.Use Lambda reserved concurrency
C.Enable Lambda function DLQ with SNS topic
D.Increase Lambda timeout to maximum
AnswerA

SQS DLQ stores messages that failed processing, allowing reprocessing.

Why this answer

By configuring a dead-letter queue (DLQ) on the SQS queue, failed messages are preserved for later reprocessing.
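As a small sketch, the DLQ is attached through the source queue's `RedrivePolicy` attribute, a JSON string naming the DLQ's ARN and the receive count after which a message is moved. The ARN and the count of 5 are hypothetical. The returned dictionary is the kind of value passed as `Attributes` to boto3's `set_queue_attributes`, but no AWS call is made here.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count):
    """Queue attributes that send a message to the DLQ after max_receive_count failed receives."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy as a JSON-encoded string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}


attrs = redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:events-dlq",  # hypothetical DLQ ARN
    max_receive_count=5,
)
```

A higher `maxReceiveCount` gives the function more attempts before a message is parked in the DLQ.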

65
MCQhard

A company runs a stateful web application on EC2 instances in an Auto Scaling group. The application uses an Application Load Balancer (ALB) and an Amazon ElastiCache Redis cluster. Users report that after a scaling event, they are logged out and lose session data. What is the most likely cause?

A.The ALB health check interval is too short, causing healthy instances to be marked unhealthy
B.The ElastiCache cluster is configured with in-transit encryption, causing session tokens to be invalidated
C.The Auto Scaling group is using a termination policy that terminates the oldest instance first, which holds active sessions
D.The ElastiCache cluster is not configured for Multi-AZ and a node failure caused all sessions to be lost
AnswerD

Without Multi-AZ, a node failure can cause data loss.

Why this answer

Option D is correct because the scenario describes a stateful web application that relies on ElastiCache Redis for session storage. If the ElastiCache cluster is not configured for Multi-AZ, a node failure can cause all cached session data to be lost, logging users out. This is the most likely cause of session loss after a scaling event, as scaling events do not directly affect ElastiCache data persistence.

Exam trap

The trap here is that candidates may focus on the Auto Scaling group's termination policy or ALB health checks, overlooking that the session data is stored externally in ElastiCache and that its lack of high availability is the root cause of session loss.

How to eliminate wrong answers

Option A is wrong because a short health check interval would cause instances to be marked unhealthy and replaced, but it would not directly cause session data loss; sessions are stored in ElastiCache, not on the instances. Option B is wrong because in-transit encryption on ElastiCache protects data during transmission and does not invalidate session tokens; it is unrelated to session persistence. Option C is wrong because terminating the oldest instance first is a common termination policy that does not cause session loss if sessions are stored externally in ElastiCache; the issue is with the session store itself, not the instance termination order.

66
MCQhard

A company runs a containerized microservices architecture on Amazon ECS with Fargate. The services communicate via an internal Application Load Balancer. Recently, a new deployment of Service A caused its health checks to fail. The DevOps engineer notices that the old tasks remain running and the service is unavailable. What configuration change would prevent this issue in future deployments?

A.Set the deployment minimum healthy percent to 50 and maximum percent to 100 with a health check grace period
B.Set the deployment circuit breaker to rollback on deployment failure and disable rollback
C.Change the deployment controller from ECS to CodeDeploy for blue/green deployments
D.Set the deployment minimum healthy percent to 0 and maximum percent to 200
AnswerA

This configuration ensures old tasks remain until new tasks pass health checks.

Why this answer

Setting the deployment minimum healthy percent to 50 and maximum percent to 100 ensures that during a deployment, at least 50% of tasks are healthy, but if health checks fail, the deployment can roll back because the old tasks are not replaced until new tasks are healthy. Option D is wrong because it allows replacing all tasks before health checks pass. Option C is wrong because it is a deployment controller, not a configuration to prevent failure.

Option B is wrong because it removes the ability to roll back.
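The effect of the two percentages is easier to see as arithmetic on the service's desired count. In this sketch the lower bound is rounded up and the upper bound rounded down, which is how ECS is understood to apply these settings for rolling updates. The desired count of 4 is an assumed example.

```python
def deployment_bounds(desired, min_healthy_pct, max_pct):
    """Return (minimum running tasks, maximum running tasks) during a rolling deployment."""
    lower = -(-desired * min_healthy_pct // 100)  # integer ceiling
    upper = desired * max_pct // 100              # integer floor
    return lower, upper


# Assumed service with 4 desired tasks.
conservative = deployment_bounds(4, 50, 100)  # keep half the tasks running throughout
aggressive = deployment_bounds(4, 0, 200)     # allows stopping every old task first
```

A lower bound of zero means the scheduler may stop all old tasks before any new task has passed its health checks.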

67
MCQhard

A company runs a stateless web application on Amazon ECS with Fargate launch type. The application experiences intermittent traffic spikes. The company wants to ensure that the application can scale automatically and remain resilient to underlying infrastructure failures. Which combination of actions should the DevOps engineer take?

A.Configure a scheduled scaling policy for the Amazon ECS service to add tasks during known peak hours.
B.Launch tasks in a single Availability Zone and use an Application Auto Scaling target tracking policy based on CPU utilization.
C.Configure a step scaling policy for the Amazon ECS service and increase the task memory size.
D.Configure an Application Auto Scaling target tracking policy based on memory utilization and enable Amazon ECS service auto-recovery.
AnswerD

Target tracking scales based on demand; service auto-recovery replaces failed tasks, ensuring resilience.

Why this answer

Option D is correct because it combines Application Auto Scaling target tracking based on memory utilization, which is a relevant metric for a stateless web application to handle traffic spikes, with Amazon ECS service auto-recovery, which automatically replaces unhealthy tasks to ensure resilience against underlying infrastructure failures. This approach provides both automatic scaling and fault tolerance without manual intervention.

Exam trap

The trap here is that candidates often assume CPU utilization is the only valid scaling metric for web applications, but memory utilization can be more appropriate for stateless workloads, and they may overlook the critical need for service auto-recovery to handle infrastructure failures in Fargate.

How to eliminate wrong answers

Option A is wrong because scheduled scaling only covers known peak hours and cannot handle intermittent, unpredictable traffic spikes, and it does not address resilience to infrastructure failures. Option B is wrong because launching tasks in a single Availability Zone creates a single point of failure, violating resilience best practices, and while target tracking based on CPU utilization can scale, it does not provide auto-recovery for failed tasks. Option C is wrong because step scaling policies can be effective, but increasing task memory size does not directly improve scaling or resilience; it may reduce the need for scaling but does not automate recovery from failures.

68
Multi-Selecthard

A company runs a containerized application on Amazon ECS with Fargate. The application needs to be resilient to Availability Zone failures. Which THREE actions should the company take? (Choose THREE.)

Select 3 answers
A.Configure the ECS service to spread tasks across multiple Availability Zones.
B.Disable managed service scaling to avoid resource contention.
C.Use a multi-AZ Amazon RDS or DynamoDB for persistent data.
D.Deploy an Application Load Balancer (ALB) with targets in multiple Availability Zones.
E.Use a single service discovery namespace for all tasks.
AnswersA, C, D

Spreading tasks across AZs prevents total loss from a single AZ failure.

Why this answer

Option A is correct because spreading tasks across multiple AZs ensures that an AZ failure does not affect all tasks. Option C is correct because storing state in a multi-AZ database such as DynamoDB or RDS ensures data durability. Option D is correct because an ALB with targets in multiple AZs distributes traffic across AZs.

Option B is wrong because disabling managed scaling may reduce availability. Option E is wrong because a single service discovery namespace does not provide AZ resilience.
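The effect of spreading tasks can be illustrated with a small simulation. This is only a sketch: the AZ names and the `spread_tasks` and `surviving_tasks` helpers are hypothetical and are not ECS APIs, and round-robin assignment is a simplified stand-in for AZ spread placement.

```python
from collections import Counter

def spread_tasks(task_count, zones):
    """Assign tasks to zones round-robin (simplified stand-in for AZ spread)."""
    return Counter(zones[i % len(zones)] for i in range(task_count))

def surviving_tasks(placement, failed_zone):
    """Count the tasks still running after one zone becomes unavailable."""
    return sum(n for zone, n in placement.items() if zone != failed_zone)

zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # hypothetical AZ names
spread = spread_tasks(6, zones)      # two tasks per AZ
single = spread_tasks(6, zones[:1])  # all six tasks in one AZ

print(surviving_tasks(spread, "us-east-1a"))  # 4 of 6 tasks remain
print(surviving_tasks(single, "us-east-1a"))  # 0 tasks remain
```

With three AZs the service loses a third of its capacity; with one AZ it loses everything.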

69
MCQhard

A company is designing a multi-region active-active architecture for a stateless web application using Route 53 latency-based routing. The application uses an RDS MySQL database. What should be done to ensure database resilience across regions?

A.Configure automated snapshots and copy them to the secondary region
B.Use RDS Cross-Region Synchronous Replication
C.Create cross-region read replicas and promote to master during failover
D.Enable Multi-AZ deployment in each region
AnswerC

Read replicas can be promoted for cross-region DR.

Why this answer

Cross-region read replicas replicate asynchronously and can be promoted to a standalone primary (master) during a disaster. Aurora Global Database would be a better fit, but it is not offered as an option. Option A is wrong because restoring from copied snapshots is manual and slow.

Option B is wrong because RDS does not support synchronous cross-region replication. Option D is wrong because Multi-AZ protects only within a single region.

70
MCQhard

A company is designing a multi-Region disaster recovery strategy for a stateless web application. The application runs on EC2 instances in an Auto Scaling group behind an ALB in us-east-1. The recovery point objective (RPO) is 15 minutes and recovery time objective (RTO) is 30 minutes. The application data is stored in Amazon RDS for PostgreSQL. Which combination of actions should the company take to meet the RPO and RTO?

A.Use RDS cross-Region replication to a standby DB instance in another Region. Maintain a warm standby environment (Auto Scaling group, ALB) in the disaster Region. Configure Route 53 health checks to fail over automatically.
B.Use RDS cross-Region snapshot copy every 15 minutes. In the disaster Region, manually launch a new environment and restore the latest snapshot.
C.Use RDS Multi-AZ in us-east-1. In the disaster Region, keep a standby Auto Scaling group and ALB. On failure, promote the Multi-AZ standby to primary and update DNS.
D.Use RDS read replicas in another Region. On failure, promote the read replica to a standalone instance and update the application.
AnswerA

Cross-Region replication provides low RPO; warm standby with automatic failover meets RTO.

Why this answer

Option A is correct because it meets both the 15-minute RPO and 30-minute RTO. RDS cross-Region replication provides continuous asynchronous replication with minimal lag, typically well under 15 minutes, ensuring data is nearly up-to-date. The warm standby environment (pre-provisioned Auto Scaling group and ALB) in the disaster Region allows automatic failover via Route 53 health checks, enabling recovery within the 30-minute RTO without manual intervention.

Exam trap

The trap here is confusing Multi-AZ (single-Region HA) with cross-Region DR, leading candidates to choose Option C, which fails to protect against a Regional outage.

How to eliminate wrong answers

Option B is wrong because snapshot copies every 15 minutes cannot guarantee a 15-minute RPO due to snapshot creation and transfer delays, and manually launching a new environment and restoring the latest snapshot far exceeds the 30-minute RTO. Option C is wrong because RDS Multi-AZ in us-east-1 provides high availability within a single Region only; it does not replicate data to another Region, so a Regional failure would leave no usable copy of the data and no DR capability. Option D is wrong because it replicates the data but pre-provisions no compute environment in the disaster Region and relies on manual promotion and application updates; building the web tier and repointing the application after a failure puts the 30-minute RTO at risk.
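The elimination logic above amounts to checking each strategy against both objectives at once. The sketch below shows that check; the `Strategy` class and every number in it are illustrative assumptions, not measured AWS figures.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    data_loss_minutes: float  # assumed worst-case data loss window
    recovery_minutes: float   # assumed time to restore service

def meets_objectives(s, rpo_minutes, rto_minutes):
    """A strategy qualifies only if it fits both the RPO and the RTO."""
    return (s.data_loss_minutes <= rpo_minutes
            and s.recovery_minutes <= rto_minutes)

# All figures are hypothetical and chosen only to illustrate the comparison.
strategies = [
    Strategy("A: cross-Region replica + warm standby", 1, 10),
    Strategy("B: snapshot copy + manual rebuild", 45, 180),
]

for s in strategies:
    print(s.name, meets_objectives(s, rpo_minutes=15, rto_minutes=30))
```

A strategy that satisfies only one of the two objectives still fails the requirement, which is why both conditions are combined with `and`.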

71
Multi-Selecthard

A company uses Amazon ECS with Fargate for containerized applications. They need to ensure that if a task fails, it is automatically restarted and the application remains available. Which THREE actions should they take? (Choose THREE.)

Select 3 answers
A.Configure the ECS service to automatically restart failed tasks.
B.Place tasks across multiple Availability Zones.
C.Use an Application Load Balancer with health checks.
D.Set up a CloudWatch alarm to trigger AWS Lambda to restart tasks.
E.Configure an EC2 Auto Scaling group for the ECS cluster.
AnswersA, B, C

Service auto-restart replaces failed tasks.

Why this answer

Options A, B, and C are correct. Service auto-restart ensures failed tasks are replaced. Deploying tasks across multiple AZs provides AZ resilience.

Using an ALB with health checks ensures traffic is routed only to healthy tasks. Option D is incorrect because the ECS service already replaces failed tasks, so a CloudWatch alarm and Lambda function are not needed. Option E is incorrect because Fargate tasks do not run on an EC2 Auto Scaling group.
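Conceptually, a service that maintains a desired task count behaves like a reconciliation loop. The following is a minimal sketch of that idea only; `reconcile`, `launch`, and the task dictionaries are hypothetical and do not reflect how the ECS scheduler is actually implemented.

```python
def reconcile(running_tasks, desired_count, launch):
    """Drop stopped tasks and launch replacements up to desired_count."""
    healthy = [t for t in running_tasks if t["status"] == "RUNNING"]
    while len(healthy) < desired_count:
        healthy.append(launch())
    return healthy

counter = iter(range(100, 200))

def launch():
    # Hypothetical launcher; a real scheduler would start a new task here.
    return {"id": f"task-{next(counter)}", "status": "RUNNING"}

tasks = [
    {"id": "task-1", "status": "RUNNING"},
    {"id": "task-2", "status": "STOPPED"},  # the failed task
    {"id": "task-3", "status": "RUNNING"},
]
tasks = reconcile(tasks, desired_count=3, launch=launch)
print([t["id"] for t in tasks])  # ['task-1', 'task-3', 'task-100']
```

Because the loop compares actual state with desired state, no external alarm or function is needed to trigger the replacement.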

72
Multi-Selectmedium

A company runs a stateful web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application stores session data in local instance memory. To improve resiliency, the company wants to make the application stateless and distribute the load across multiple Availability Zones. Which THREE actions should the company take? (Choose three.)

Select 3 answers
A.Enable sticky sessions (session affinity) on the ALB.
B.Replace the ALB with a Network Load Balancer (NLB) to improve performance.
C.Configure the ALB to distribute traffic across EC2 instances in multiple Availability Zones.
D.Implement an Amazon ElastiCache cluster to store session data externally.
E.Use Amazon DynamoDB to store session state.
AnswersC, D, E

Provides high availability and fault tolerance.

Why this answer

Option C is correct because an ALB that spans multiple AZs distributes traffic and provides high availability. Option D is correct because ElastiCache provides a centralized, highly available session store. Option E is correct because DynamoDB is a fully managed, durable session store.

Option A is wrong because sticky sessions (session affinity) pin each client to the same instance, which keeps the application dependent on local state instead of making it stateless. Option B is wrong because an NLB operates at layer 4 and does not support path-based routing; swapping load balancers also does nothing to move session data out of instance memory.
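The difference between local and external session state can be shown with a short sketch. `SessionStore` is an in-memory stand-in for an external store such as ElastiCache or DynamoDB, and all class and instance names here are hypothetical.

```python
class SessionStore:
    """In-memory stand-in for an external session store."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, value):
        self._data[session_id] = value

    def get(self, session_id):
        return self._data.get(session_id)


class AppInstance:
    """A web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store.put(session_id, {"user": user})

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
instance_a = AppInstance("az-1-instance", store)
instance_b = AppInstance("az-2-instance", store)

instance_a.login("sess-42", "alice")
print(instance_b.whoami("sess-42"))  # alice
```

Because the session lives in the shared store, a request that lands on a different instance, or in a different AZ, still finds it, so sticky sessions are unnecessary.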

73
MCQhard

A company has a stateless web application on EC2 instances behind an ALB. They want to ensure that if an entire Availability Zone fails, the application remains available with minimal impact. Which architecture best meets this requirement?

A.Use a global secondary index in DynamoDB to replicate data across regions
B.Deploy EC2 instances in two AZs but use a single-AZ ALB
C.Deploy the application in two AZs with an Auto Scaling group and an ALB that is enabled for multiple AZs
D.Deploy the application in a single AZ with an Auto Scaling group that launches instances in the same AZ
AnswerC

Multi-AZ ALB and Auto Scaling provide resilience.

Why this answer

Using Auto Scaling with multiple AZs and a multi-AZ ALB ensures that if one AZ fails, the ALB routes traffic to healthy instances in the other AZs. Option A is wrong because a global secondary index is a query feature within a DynamoDB table, not a cross-region replication mechanism, and it does nothing for the availability of the EC2 tier. Option B is wrong because spreading instances across AZs while using a single-AZ ALB still leaves the ALB as a single point of failure.

Option D is wrong because a single AZ is a single point of failure.

74
Multi-Selectmedium

A company is designing a multi-region disaster recovery strategy for a stateless web application. They want to minimize RTO and RPO. Which TWO of the following should they implement? (Choose TWO.)

Select 2 answers
A.Use cross-region replication for data stores.
B.Use a passive standby in a single Availability Zone.
C.Perform periodic backups and restore in the DR region.
D.Configure cross-region read replicas for the database.
E.Deploy an active-active workload using Route 53 weighted routing.
AnswersA, E

Cross-region replication keeps data synchronized, reducing RPO.

Why this answer

Options A and E are correct. Active-active with Route 53 weighted records provides immediate failover with minimal RTO. Cross-region replication of data keeps the data in sync, reducing RPO.

Option B is incorrect because a standby in a single AZ does not help with regional failures. Option C is incorrect because backup and restore has high RTO and RPO. Option D is incorrect because read replicas are intended primarily for read scaling; they must be promoted before they accept writes, so they fit an active-active design less well than cross-region replication of the data stores.
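Weighted routing with health checks can be approximated as follows: each healthy record receives traffic in proportion to its weight divided by the sum of the healthy weights. The `routing_shares` function and the record layout are hypothetical illustrations, not the Route 53 API.

```python
def routing_shares(records):
    """Return each healthy record's traffic share: weight / sum of healthy weights."""
    healthy = {name: r["weight"] for name, r in records.items() if r["healthy"]}
    total = sum(healthy.values())
    if total == 0:
        # Simplification: Route 53 has its own fallback behavior when no
        # record is healthy; this sketch just returns an empty mapping.
        return {}
    return {name: weight / total for name, weight in healthy.items()}

records = {
    "us-east-1": {"weight": 50, "healthy": True},
    "eu-west-1": {"weight": 50, "healthy": True},
}
print(routing_shares(records))  # {'us-east-1': 0.5, 'eu-west-1': 0.5}

records["us-east-1"]["healthy"] = False  # simulate a regional outage
print(routing_shares(records))  # {'eu-west-1': 1.0}
```

Since both regions already serve traffic, losing one only changes the shares; nothing has to be started or promoted, which is what keeps RTO low.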

75
MCQeasy

A company wants to design a resilient architecture for a web application using AWS services. Which of the following is a best practice for improving resilience?

A.Deploy EC2 instances in multiple Availability Zones.
B.Use an Auto Scaling group in a single AZ.
C.Use a single AZ with RDS Multi-AZ.
D.Use one large EC2 instance to handle all traffic.
AnswerA

Multi-AZ deployment prevents single AZ failures from impacting availability.

Why this answer

Option A is correct because deploying across multiple Availability Zones ensures that if one AZ fails, the application remains available. Option B is incorrect because an Auto Scaling group in one AZ is still vulnerable to an AZ failure. Option C is incorrect because a single AZ with RDS Multi-AZ still leaves the application tier with a single point of failure at the AZ level.

Option D is incorrect because a single EC2 instance is not resilient, however large it is.
