How many Resilient Cloud Solutions questions are on the DOP-C02 exam?

The Resilient Cloud Solutions domain is one of the weighted domains on the DOP-C02 exam. The Courseiva question bank has 259 practice questions for this domain.

How can I practice Resilient Cloud Solutions questions for DOP-C02?

Click any of the 259 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Resilient Cloud Solutions domain.

Free DOP-C02 Resilient Cloud Solutions Practice Questions (2026)

All DOP-C02 Resilient Cloud Solutions questions (259)

Click any question to see the full explanation and answer options, or start a focused practice session above.

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. During a recent traffic spike, the application became unavailable for 10 minutes. Analysis shows that the ALB's healthy host count dropped to zero because the instances failed health checks due to high CPU load. What is the MOST effective design change to improve resilience during future traffic spikes?

A company uses DynamoDB global tables in two AWS Regions with strongly consistent reads. They observe occasional write conflicts that are not being resolved automatically. The application uses DynamoDBMapper with optimistic locking. What should the DevOps engineer do to ensure conflict resolution?

A company's application runs on EC2 instances in a single Availability Zone. The operations team wants to improve resilience without redesigning the application. Which action is the MOST effective?

A company uses a third-party backup solution to back up its EC2 instances daily. The backups are stored in an S3 bucket with default settings. The company wants to ensure that backups are protected from accidental deletion and are available for at least one year. Which combination of S3 features should the DevOps engineer implement?

A company runs a stateful web application on EC2 instances behind a Network Load Balancer (NLB) in a single Availability Zone. The application stores session state locally on the instance. The company wants to achieve high availability across multiple AZs with minimal application changes. What should the DevOps engineer do?

A company's DevOps team is designing a disaster recovery plan for a critical application. The application runs on EC2 instances with an RDS MySQL database. The Recovery Time Objective (RTO) is 15 minutes, and the Recovery Point Objective (RPO) is 1 hour. Which approach BEST meets these requirements?

A company's application uses Amazon SQS to decouple microservices. During peak hours, the SQS queue backlog grows significantly, causing processing delays. The DevOps team wants to reduce latency without increasing costs unnecessarily. What should the team do?

A company runs a microservices application on Amazon ECS with Fargate. The application includes a service that processes orders and stores them in an RDS PostgreSQL database. The company wants to ensure that the order service is resilient to AZ failures and can handle a sudden increase in order volume. Which TWO actions should the DevOps engineer take? (Choose TWO.)

A company's application uses Amazon DynamoDB as its primary data store. The application experiences occasional throttling errors during traffic spikes. The DevOps team needs to implement a solution that ensures consistent performance without manual intervention. Which TWO actions should the team take? (Choose TWO.)

A company wants to design a highly available web application using AWS services. The application must be resilient to the failure of an entire AWS Region. Which THREE components should the architecture include? (Choose THREE.)

A company runs a critical e-commerce platform on AWS. The architecture includes an Application Load Balancer (ALB) that distributes traffic to a fleet of EC2 instances in an Auto Scaling group across three Availability Zones. The instances run a Java application that connects to an Amazon RDS Multi-AZ MySQL database. The application also uses Amazon ElastiCache for Redis for session caching. The company recently experienced a severe outage where the ALB's 5xx error rate spiked to 100% for 45 minutes. The root cause was a slow-running query on the RDS primary instance, which made the primary unresponsive and triggered a Multi-AZ failover that caused the application to lose its database connections. During the failover, the application's connection pool was exhausted, and new connections failed. The application logs show a high rate of 'java.sql.SQLTimeoutException' and 'com.mysql.cj.exceptions.CJCommunicationsException'. The DevOps team needs to implement a long-term solution that minimizes the impact of similar incidents. The solution must be cost-effective and require minimal application changes. Which combination of actions should the DevOps team take?

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application uses an Amazon RDS for MySQL Multi-AZ DB instance for data storage. During an AWS infrastructure event, the primary Availability Zone (AZ) becomes unavailable, and the application experiences downtime. The RDS Multi-AZ failover completes automatically, but the application takes several minutes to reconnect. Which combination of actions would MOST reduce the recovery time for the application during such an event?

A company is designing a disaster recovery (DR) strategy for a stateless web application deployed on Amazon ECS with Fargate. The application is fronted by an Application Load Balancer (ALB) and uses Amazon ElastiCache for Redis for session state. The primary region is us-east-1. The DR plan requires a Recovery Point Objective (RPO) of 15 minutes and a Recovery Time Objective (RTO) of 30 minutes. Which solution meets these requirements with the LEAST operational overhead?

A development team wants to ensure that their application can continue serving traffic even if an entire AWS Availability Zone (AZ) becomes unavailable. The application runs on Amazon EC2 instances in an Auto Scaling group and uses an Application Load Balancer (ALB). Which configuration should the team implement to meet this requirement?

A company runs a containerized microservices application on Amazon EKS. The application includes a critical service that processes real-time financial transactions. This service must be highly available and resilient to node failures. The current setup uses a Deployment with 3 replicas and a ClusterIP service. During a recent node failure, the application experienced a brief period of unavailability. Which action should the DevOps engineer take to improve resilience without changing the underlying infrastructure?

A company is building a multi-tier web application on AWS. The application must be resilient to the failure of an entire Availability Zone. The architecture includes an Application Load Balancer (ALB), EC2 instances in an Auto Scaling group, and an Amazon RDS for MySQL database. Which TWO actions should be taken to achieve this resilience? (Choose TWO.)

A company runs a critical application on AWS using Amazon EC2 instances in an Auto Scaling group, an Application Load Balancer (ALB), and an Amazon RDS for PostgreSQL Multi-AZ DB cluster. The application must maintain an RTO of 5 minutes and an RPO of 1 second for database transactions. The current setup meets these requirements, but the DevOps team wants to improve the resilience of the application tier to withstand a regional failure. Which THREE actions should be taken? (Choose THREE.)

A company runs a production e-commerce platform on AWS. The architecture includes an Application Load Balancer (ALB) that distributes traffic to a fleet of Amazon EC2 instances running in an Auto Scaling group across three Availability Zones (AZs). The application stores session state in Amazon ElastiCache for Redis (cluster mode disabled) with a single node. The database is an Amazon Aurora MySQL DB cluster with one writer and two reader instances in different AZs. The platform experiences intermittent slowdowns and occasional timeouts during peak traffic hours. The CloudWatch metrics show that the ALB's TargetResponseTime is elevated, and the Redis CPU utilization is consistently above 80% during these periods. The Auto Scaling group is scaling out, but new instances take several minutes to become healthy. The DevOps team has been asked to improve the resilience and performance of the application with minimal changes to the application code. Which solution should the team implement?

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. The application stores session data in a shared Amazon ElastiCache for Redis cluster. The operations team reports that during a recent AZ failure, users experienced session loss and application errors. Which combination of actions should the company take to improve resilience and maintain session state during an AZ failure? (Choose TWO.)

An AWS Lambda function that processes sensitive data writes objects to an S3 bucket. The security team requires that all objects be encrypted at rest using SSE-S3. The Lambda execution role uses the above IAM policy. Despite the policy, some objects are uploaded without server-side encryption. What is the most likely cause?

A company runs a stateless web application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Application Load Balancer. The operations team needs to ensure that the application remains available if one AZ fails. Which solution is MOST resilient?

Drag and drop the steps to troubleshoot a failed deployment in AWS CodeDeploy into the correct order.

Drag and drop the steps to perform a disaster recovery failover from a primary region to a secondary region using AWS Route 53 and RDS.

Match each AWS CloudFormation concept to its description.

Match each AWS CLI command to its function.

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB). The application stores session state in an Amazon DynamoDB table. During a recent traffic spike, users experienced session timeouts and the application became unavailable. Which design change would BEST improve resilience?

A company runs a stateless web application on AWS Lambda behind an Application Load Balancer (ALB). During a deployment, the team updates the Lambda function to a new version. Some users report seeing the old version of the application for several minutes after the deployment. What is the MOST likely cause?

A company is designing a disaster recovery strategy for its primary RDS for PostgreSQL database in us-east-1. The RTO is 15 minutes and RPO is 1 minute. Which solution meets these requirements?

A company runs a microservices application on Amazon ECS with Fargate launch type. The application experiences intermittent failures when calling an external API. The errors are transient and usually resolve within a few seconds. How should the company improve resilience?

A company uses AWS CloudFormation to deploy a multi-tier application. During an update, the stack fails and rolls back. The rollback also fails, leaving the stack in UPDATE_ROLLBACK_FAILED state. The operations team needs to resolve this with minimal disruption. What is the MOST efficient approach?

A company wants to ensure that its Amazon S3 bucket can withstand the loss of an entire AWS Availability Zone. Which configuration meets this requirement?

A company runs a stateful application on EC2 instances in an Auto Scaling group. The application stores state on local instance storage. During a scaling event, users lose session data. How can the company make the application resilient without modifying the application code?

A company uses Amazon Route 53 with a failover routing policy to direct traffic to an active and a standby endpoint. The health checks are configured to check the active endpoint every 10 seconds. During a recent outage, the failover took over 3 minutes to detect and switch. How can the company improve the failover time to under 1 minute?

A company runs a critical application on Amazon EC2 instances in an Auto Scaling group. To ensure high availability, the instances are deployed across three Availability Zones. Which additional step should the company take to protect against a regional failure?

A company is designing a highly available architecture for a web application using AWS services. The application must be resilient to the failure of an entire AWS Region. Which TWO strategies should the company implement? (Choose TWO.)

A company runs a containerized application on Amazon ECS with Fargate. The application needs to be resilient to Availability Zone failures. Which THREE actions should the company take? (Choose THREE.)

A company is implementing a disaster recovery plan for its on-premises database using AWS. The plan must have a Recovery Time Objective (RTO) of 2 hours and a Recovery Point Objective (RPO) of 15 minutes. Which TWO AWS services should the company use? (Choose TWO.)

Refer to the exhibit. An IAM policy is attached to an IAM role used by an EC2 instance to manage other EC2 instances. The operations team reports that the instance can start and stop other instances but cannot terminate them. However, they also notice that the instance cannot describe instances in any region other than us-east-1. What is the reason for this behavior?

Refer to the exhibit. An Auto Scaling group is configured with an Application Load Balancer. The group has a desired capacity of 2 instances spread across two Availability Zones. Recently, the application has been experiencing high error rates during deployments. The team suspects that new instances are being marked as healthy before they are fully ready. What should the team do to resolve this issue?

Refer to the exhibit. A Lambda function uses the IAM role with the above policy. The function is configured to access a DynamoDB table MyTable and an RDS instance in a VPC. When invoked, the function fails with an error indicating it cannot describe VPC subnets. What is the MOST likely cause?

A company runs a web application on EC2 instances behind an Application Load Balancer. The application experiences intermittent failures due to a single Availability Zone failing. Which solution is MOST resilient and cost-effective?

A DevOps engineer is designing a multi-Region active-active architecture for a stateless web application using Route 53 latency-based routing and DynamoDB global tables. The application must continue to serve traffic even if an entire AWS Region becomes unavailable. Which additional step is MOST critical for resilience?

A company uses AWS CloudFormation to deploy a multi-tier application. The stack includes an RDS DB instance with Multi-AZ enabled. The database experiences a failover during maintenance. The application reports connection errors for several minutes. What is the MOST likely cause and solution?

A company runs a critical batch processing job on Amazon ECS using Fargate. The job must complete within 2 hours. If the job fails, it must be retried automatically up to 3 times. Which solution meets these requirements?

A DevOps team is designing a disaster recovery plan for a production RDS for PostgreSQL database. The RPO must be less than 5 minutes and the RTO less than 1 hour. The database size is 2 TB. Which solution is MOST cost-effective?

A company uses an NLB to distribute traffic to a fleet of EC2 instances in a single Availability Zone. During a recent AWS outage in that zone, the application became completely unavailable. The company wants to achieve high availability without rearchitecting the application. Which change is MOST appropriate?

A company runs a containerized application on Amazon ECS with Fargate. The application needs to store session state. Which service provides the MOST resilient and scalable solution?

A company uses AWS CodePipeline to deploy a web application. The pipeline includes a deploy action that uses AWS CloudFormation to update a stack. The deployment occasionally fails because of a transient resource limit error. Which automatic retry strategy should a DevOps engineer implement?

A company has a critical application running on EC2 instances in an Auto Scaling group across two Availability Zones. The application uses an EBS volume for local caching. The company wants to ensure that if an instance fails, the cache data is not lost and the replacement instance can use it. Which solution meets this requirement?

A company wants to ensure that its application running on AWS can withstand the failure of an entire AWS Region. Which TWO strategies should the company implement?

A company runs a stateful web application on EC2 instances behind an ALB. The application stores session data in memory. The company wants to make the application stateless to improve resilience. Which TWO changes should the company make?

A company is designing a disaster recovery plan for a critical application that uses Amazon RDS for MySQL with Multi-AZ. The RPO must be less than 1 minute and RTO less than 15 minutes. The primary Region is us-east-1. Which THREE steps should the company take to meet these requirements?

A company runs a production web application on EC2 instances behind an Application Load Balancer. The application experiences intermittent high latency. The operations team needs to identify the root cause without affecting live traffic. Which approach is the MOST efficient?

A company uses AWS Lambda for processing events from Amazon S3. Recently, the Lambda function started timing out after the 15-minute limit for some large files. The function downloads the entire file to /tmp before processing. What should a DevOps engineer do to resolve this issue with minimal code changes?

A company runs a critical database on Amazon RDS for PostgreSQL with Multi-AZ deployment. The application experiences a brief outage during automatic failover. To improve availability, the company wants to reduce the failover time. What should they do?

A company uses AWS CodeDeploy for blue/green deployments to an Auto Scaling group. The deployment fails because the new instances do not pass health checks. The DevOps engineer discovers that the health check URL returns a 503 error. What is the MOST likely cause?

A company runs a stateless web application on a fleet of EC2 instances in an Auto Scaling group. The application stores session state in a shared ElastiCache Redis cluster. During traffic spikes, the application becomes slow. Monitoring shows that the Redis cluster has high CPU utilization. Which solution is MOST cost-effective and scalable?

A company runs a containerized microservices architecture on Amazon ECS with Fargate. The services communicate via an internal Application Load Balancer. Recently, a new deployment of Service A caused its health checks to fail. The DevOps engineer notices that the old tasks remain running and the service is unavailable. What configuration change would prevent this issue in future deployments?

A company runs a critical application on EC2 instances in an Auto Scaling group. The application uses an EBS volume attached to each instance for temporary data. The company needs to ensure that if an instance fails, the data is not lost, and the new instance can resume quickly. What should they do?

A company uses AWS CloudFormation to deploy infrastructure. The stack creation fails with the error: 'Resource handler returned message: 'The security group does not exist in VPC'.' The template references a security group by name. What is the MOST likely cause?

A company runs a web application on EC2 instances behind an Application Load Balancer. The application uses an Aurora MySQL database. Recently, the database experienced a failover, and the application started throwing connection errors. The DevOps engineer needs to make the application resilient to database failovers with minimal code changes. What should they do?

A company wants to design a highly available and fault-tolerant architecture for a stateless web application on AWS. Which TWO actions should they take? (Choose TWO.)

A company runs a microservices application on Amazon ECS with Fargate. The services need to be resilient to AZ failures. Which TWO actions should the company take? (Choose TWO.)

A company is designing a disaster recovery plan for a critical application with an RPO of 15 minutes and RTO of 1 hour. The application runs on EC2 instances with an RDS MySQL database. The primary Region is us-east-1. Which THREE actions should they take to meet the RPO and RTO? (Choose THREE.)

A bucket policy is attached to an S3 bucket to allow access from a specific VPC CIDR range. However, users from the VPC are receiving 'Access Denied' errors when trying to access objects in the bucket. What is the MOST likely reason?

A DevOps engineer runs the above command and sees that one target is unhealthy with a 503 error. The application is a web server running on port 80. The health check is configured to hit the root path '/'. Which action should the engineer take to resolve the issue?

A company deploys the above CloudFormation stack. They want to enforce HTTPS for all requests to the S3 bucket. After deployment, users are still able to make HTTP requests. What is the problem?

A company runs a critical web application on EC2 instances behind an Application Load Balancer. To improve resilience, they want to automatically replace failed instances. Which AWS service should they use?

A company is designing a multi-region active-active architecture for a stateless web application using Route 53 latency-based routing. The application uses an RDS MySQL database. What should be done to ensure database resilience across regions?

A DevOps engineer needs to ensure that an application running on EC2 can automatically recover from an underlying hardware failure without manual intervention. Which AWS feature should be enabled?

A company is deploying a critical microservice on Amazon ECS with Fargate. They need to ensure that the service can tolerate an Availability Zone failure. What is the BEST approach?

A company runs an application on EC2 with a shared Elastic IP. The instance fails and an engineer manually attaches the Elastic IP to a standby instance. To automate this failover, which service should be used?

A company wants to protect its S3 bucket data from accidental deletion or overwrite. Which feature should be enabled?

A company has a multi-region application with an RDS for MySQL database in us-east-1. They want to minimize downtime if the primary region fails. They set up a cross-region read replica in us-west-2. What additional step is needed for automated failover?

A company is designing a disaster recovery strategy for a critical application. They need a Recovery Time Objective (RTO) of 15 minutes and a Recovery Point Objective (RPO) of 1 minute. Which AWS database service configuration meets these requirements?

A company uses an Application Load Balancer (ALB) to distribute traffic to EC2 instances. The ALB is in us-east-1a and us-east-1b. They want to ensure that if one AZ fails, traffic is routed only to healthy instances in the other AZ. What configuration is necessary?

A company is designing a highly available architecture for a stateless web application using AWS services. Which TWO steps should they take to achieve high availability?

A company uses DynamoDB global tables for a multi-region application. They notice that write conflicts are occurring. Which TWO strategies can reduce write conflicts?

A company is designing a disaster recovery plan for an RDS PostgreSQL database. They have a cross-region read replica. Which THREE steps should they take to ensure a successful failover?

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. During a recent failure of one AZ, the application experienced downtime because the Auto Scaling group did not launch new instances quickly enough. What should a DevOps engineer do to improve resilience?

An application running on Amazon ECS with Fargate experiences intermittent failures. The task definition includes a single container with a health check command. Despite the health check passing, the application occasionally returns HTTP 500 errors. The application logs are sent to CloudWatch Logs. What is the MOST likely root cause?

A company wants to ensure its data in Amazon S3 is protected against accidental deletion. The bucket stores critical documents. Which approach provides the HIGHEST level of resilience?

A DevOps team uses AWS CodePipeline to deploy a web application. The pipeline has a deploy stage that uses CodeDeploy to deploy to an Auto Scaling group. During deployment, the new instances fail health checks and the deployment rolls back. However, the rollback also fails because the old instances have been terminated. What should the team do to avoid this issue?

A company runs a stateful application on EC2 instances with instance store volumes. The application requires low-latency access to data. The operations team needs to ensure that instance failure does not result in data loss. Which solution is MOST resilient?

A company is using Amazon RDS for MySQL with Multi-AZ deployment. During a recent failover, the application experienced a brief downtime because the DNS cache on the application servers still pointed to the old primary. How can a DevOps engineer minimize this downtime?

A company uses AWS Lambda to process messages from an Amazon SQS queue. The Lambda function occasionally times out after 15 seconds. To improve resilience, the team wants to ensure messages are not lost and are retried. Which configuration is MOST appropriate?

A company runs a containerized application on Amazon EKS. The application uses an ALB Ingress Controller. During a cluster upgrade, the ingress controller stops responding, causing downtime. The team wants to ensure resilience during upgrades. Which approach is BEST?

A company uses Amazon CloudFront to distribute content from an S3 bucket origin. Some users report intermittent access errors. The DevOps team suspects the origin is overwhelmed. What is the MOST effective way to improve resilience?

A company is designing a disaster recovery (DR) strategy for a critical application that runs on EC2 instances with an RDS database. The DR site must be in a different AWS Region. The Recovery Point Objective (RPO) is 15 minutes, and Recovery Time Objective (RTO) is 1 hour. Which TWO actions should the company take to meet these objectives? (Choose TWO.)

A company runs a microservices architecture on Amazon ECS with Fargate. Services communicate via an internal Application Load Balancer (ALB). The operations team notices that occasional traffic spikes cause increased latency and timeouts. The team wants to improve resilience without over-provisioning. Which THREE steps should be taken? (Choose THREE.)

A company is designing a resilient storage solution for a critical application. The data must be highly available and durable. Which TWO services meet these requirements? (Choose TWO.)

A company runs a critical web application on AWS using an Application Load Balancer (ALB) in front of an Auto Scaling group of EC2 instances. The application experiences periodic traffic spikes. To handle these spikes, the company wants to use a combination of proactive scaling based on a predictable schedule and reactive scaling based on CPU utilization. What is the MOST resilient scaling strategy?

A company runs a stateful web application on EC2 instances behind an ALB. The application uses sticky sessions (session affinity) to maintain user sessions. During a deployment, the company wants to update the application with zero downtime and ensure that in-flight sessions are not lost. Which deployment strategy should they use?

A company is designing a multi-region active-active architecture for a stateless web application. The application uses a DynamoDB table as its data store. The company wants to minimize write latency and ensure that writes are accepted in any region with eventual consistency. Which DynamoDB feature should they use?

A company runs a microservices architecture on Amazon ECS with Fargate. Each service is deployed in its own ECS service. The company wants to ensure that if one Availability Zone (AZ) fails, the services can continue to operate with minimal impact. What is the MOST resilient task placement strategy?

A company has a production RDS for PostgreSQL database. They need to perform a major version upgrade with minimal downtime. Which strategy provides the LEAST downtime while maintaining data integrity?

A company is designing a disaster recovery strategy for its on-premises database to AWS using AWS Elastic Disaster Recovery (AWS DRS). The recovery time objective (RTO) is 15 minutes, and the recovery point objective (RPO) is 1 minute. Which configuration should they use?

A company runs a containerized application on Amazon EKS. They want to ensure that if a node fails, the pods are rescheduled on healthy nodes. Which configuration is necessary?

A company has a serverless application using AWS Lambda functions that process messages from an Amazon SQS queue. The Lambda function sometimes fails due to transient errors. The company wants to ensure that failed messages are retried and eventually processed or sent to a dead-letter queue after 3 retries. What is the correct configuration?

A company is building a multi-tier web application on AWS. The web tier runs on EC2 instances behind an ALB. The application tier runs on EC2 instances that are not publicly accessible. The database tier runs on RDS MySQL. Which design provides the HIGHEST level of resilience for the database tier?

A company runs a stateful web application on EC2 instances that store session data locally. They want to migrate to a stateless architecture for better resilience. Which TWO actions should they take?

A company runs a critical application on AWS that uses an Auto Scaling group of EC2 instances. The application must remain available even if an entire Availability Zone fails. Which THREE actions should the company take?

A company is designing a disaster recovery strategy for its application. The application runs on EC2 instances and uses an RDS MySQL database. The RTO is 1 hour, and the RPO is 15 minutes. Which TWO approaches meet these requirements?

A company runs a web application on EC2 instances behind an ALB. To improve resilience, they want to automatically re-register failed instances. Which solution meets this requirement?

A company uses DynamoDB global tables with two regions. They notice that writes in one region are not replicating to the other region after a brief network partition. Which configuration will ensure replication resumes automatically?

A company wants to ensure its RDS Multi-AZ deployment automatically fails over to a standby instance in a different Availability Zone. Which additional step is required?

A company uses an ALB with target groups for a microservices architecture. They need to ensure that if a target group has no healthy targets, the ALB returns a custom error page instead of a 503. How can this be achieved?

A company runs a stateful application on EC2 that requires sticky sessions. They use an ALB with duration-based stickiness. During a deployment, they want to drain existing connections gracefully before terminating instances. Which step is necessary?

109

A company wants to automate the recovery of an Amazon RDS DB instance in a different region if the primary region becomes unavailable. Which service should they use?

110

A company uses AWS Lambda to process messages from an SQS queue. They need to ensure that if the Lambda function fails, the message is not lost and can be processed again. Which configuration is required?

111

A company runs a critical application on EC2 instances in an Auto Scaling group behind an ALB. They want to ensure that if an instance fails, the application remains available with minimal disruption. Which combination of services provides the best resilience?

112

A company wants to design a resilient architecture for a web application using AWS services. Which of the following is a best practice for improving resilience?

113

A company is designing a multi-region disaster recovery strategy for a stateless web application. They want to minimize RTO and RPO. Which TWO of the following should they implement? (Choose TWO.)

114

A company uses Amazon ECS with Fargate for containerized applications. They need to ensure that if a task fails, it is automatically restarted and the application remains available. Which THREE actions should they take? (Choose THREE.)

115

A company runs a mission-critical database on Amazon RDS for MySQL. They need to ensure that if the primary DB instance fails, the database remains available with minimal downtime. Which TWO configurations should they implement? (Choose TWO.)

116

Refer to the exhibit. An IAM policy is attached to a user. A developer tries to upload an object to s3://my-bucket/confidential/report.pdf without specifying server-side encryption. What will happen?

117

A company runs a high-traffic e-commerce application on EC2 instances in an Auto Scaling group behind an ALB. The application uses an in-memory cache on the EC2 instances. During a recent deployment, the Auto Scaling group terminated an instance that had active user sessions, causing users to lose their cart data and leading to a poor customer experience. The company wants to prevent this in future deployments. They need a solution that allows existing sessions to complete before instance termination, without manual intervention. Which solution should they use?

118

A company runs a critical microservices application on Amazon EKS with multiple services. They use an ingress controller (ALB Ingress Controller) to route traffic to services. They notice that when a pod fails, new requests are still sent to the failed pod for a few seconds, causing errors. The health check interval is set to 5 seconds. They want to minimize the time during which failed pods receive traffic. They also need to ensure that during rolling updates, traffic is not sent to pods that are terminating. Which solution should they implement?

119

A company runs a web application on EC2 instances behind an Application Load Balancer (ALB). The application stores session state in an RDS MySQL database. During a recent spike in traffic, the database CPU utilization reached 100%, causing slow responses. To improve resilience, what should a DevOps engineer do?

120

A DevOps team is designing a highly available multi-tier application on AWS. The application runs on EC2 instances in an Auto Scaling group across two Availability Zones. The team uses an Application Load Balancer (ALB) to distribute traffic. The application requires the ALB to be accessible via a single, static IP address for whitelisting by third-party partners. What is the most resilient solution?

121

A company has a microservices architecture running on Amazon ECS with Fargate launch type. Each service is deployed in multiple Availability Zones. The services communicate via REST APIs. Recently, a downstream service experienced a partial outage, causing upstream services to time out and leading to cascading failures. The team wants to improve resilience against such failures. Which combination of actions should the DevOps engineer take? (Choose TWO.)

122

A company is running a critical application on Amazon RDS for PostgreSQL with Multi-AZ deployment. The application performs frequent writes. During a recent failover test, the team observed that the application experienced a 30-second write outage. To minimize downtime during automatic failovers, which configuration changes should the DevOps engineer implement? (Choose TWO.)

123

A DevOps engineer is designing a disaster recovery (DR) strategy for a stateless web application running on EC2 instances with an Application Load Balancer. The application stores data in Amazon S3 and uses a DynamoDB table for session data. The primary region is us-east-1 and the DR region is us-west-2. The RTO is 15 minutes and RPO is 1 minute. Which strategy is most cost-effective and meets the requirements?

124

A company runs a containerized application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) and stores data in Amazon Aurora Serverless v2. The application experiences intermittent timeouts during periods of rapid scaling. The DevOps engineer notices that the Aurora database's ACU utilization spikes to 100% during these events. What should the engineer do to improve resilience? (Choose THREE.)

125

A company uses AWS CloudFormation to deploy infrastructure. The DevOps team wants to ensure that if a stack update fails, the stack automatically rolls back to the previous known good state. The team also wants to receive notifications of the rollback. Which combination of steps should the team take? (Choose THREE.)

126

A DevOps engineer is troubleshooting an issue where an EC2 instance behind an ALB target group is marked as unhealthy. The instance i-0abcd1234efgh5678 is serving traffic but the health check is timing out. The security group for the instance allows inbound HTTP from the ALB's security group. What is the most likely cause?

127

A DevOps engineer created this IAM policy for a CI/CD pipeline role. The pipeline needs to stop and start production EC2 instances and manage Auto Scaling groups. However, the pipeline fails when trying to stop an instance. What is the most likely reason?

128

A DevOps engineer is reviewing a CloudFormation template for an S3 bucket that stores application logs. The bucket has versioning enabled and a lifecycle rule to expire noncurrent versions after 30 days. The bucket policy allows public read access to all objects. The company's security policy requires that all S3 buckets block public access. Which change should the engineer make to comply?

129

A company has a critical application running on Amazon EC2 instances in an Auto Scaling group. The application writes logs to an Amazon EFS file system. The DevOps team needs to ensure that log data is durable and available even if an Availability Zone fails. The EFS file system is currently in one AZ. What should the team do? (Choose TWO.)

130

A company is migrating a monolithic application to a microservices architecture on Amazon EKS. The application uses a relational database. The team wants to ensure that database connections are managed efficiently and that the database can withstand a sudden spike in connections from multiple microservices. Which combination of solutions should the DevOps engineer implement? (Choose THREE.)

131

A company runs a serverless application using AWS Lambda functions behind an Amazon API Gateway. The application processes user uploads stored in an S3 bucket. The Lambda function writes results to a DynamoDB table. Recently, the function started timing out when processing large files. What should the DevOps engineer do to improve resilience for large file processing?

132

A company runs a web application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Application Load Balancer (ALB) and stores session data in an ElastiCache for Redis cluster with cluster mode enabled. During a recent deployment, a new version of the application caused a memory leak in the Redis cluster, leading to out-of-memory errors and evictions. The DevOps team wants to prevent future deployments from affecting the Redis cluster's health. What should the team do? (Choose TWO.)

133

A company runs a critical application on Amazon RDS for MySQL with Multi-AZ deployment. The database is 2 TB in size. The DevOps team needs to perform a major version upgrade (e.g., MySQL 5.7 to 8.0) with minimal downtime. The RTO is 5 minutes and RPO is 1 minute. Which approach should the team take?

134

Your company runs a multi-tier web application on AWS. The web tier consists of EC2 instances behind an Application Load Balancer (ALB) in an Auto Scaling group across three Availability Zones. The application tier runs on a separate Auto Scaling group of EC2 instances that process requests from the web tier. The database tier uses an Amazon RDS for PostgreSQL Multi-AZ deployment. All application servers write logs to Amazon CloudWatch Logs. Recently, the operations team reported that during peak hours, the web tier experiences intermittent 503 errors. The ALB access logs show that the errors occur when the target group's healthy host count drops to zero momentarily. The Auto Scaling group's minimum and desired capacity are both 6, with a maximum of 12. The scaling policy is based on average CPU utilization, with a target of 60%. The health check grace period is 300 seconds. The application health check endpoint returns a 200 status when healthy. The DevOps engineer suspects that the scaling policy is too slow to react to traffic spikes. The engineer wants to implement a more proactive scaling approach. Which solution should the engineer implement?

135

A financial services company runs a critical application on Amazon ECS with Fargate launch type. The application consists of three microservices: Service A (frontend), Service B (processing), and Service C (database access). Services communicate via REST APIs. The application stores data in Amazon Aurora PostgreSQL Serverless v2. The company has a disaster recovery (DR) requirement: RTO of 30 minutes and RPO of 15 minutes. The primary region is us-east-1 and the DR region is us-west-2. The DevOps team has configured cross-region replication for the Aurora database using an Aurora Global Database. The ECS services are deployed with a service-linked role for Fargate. The team wants to automate the failover process to meet the RTO. Which solution should the team implement?

136

A startup runs a stateless web application on AWS Elastic Beanstalk with a single environment. The application uses an Amazon RDS for MySQL database instance. The startup is preparing for a marketing campaign that is expected to increase traffic by 10x. The CTO is concerned about the application's ability to handle the load and wants to ensure high availability and resilience. The current architecture has a single RDS instance (db.t3.medium) and a single Elastic Beanstalk environment with one EC2 instance (t3.medium). The startup has a limited budget but wants to improve resilience without over-provisioning. Which combination of actions should the DevOps engineer recommend? (Choose THREE.)

137

A company runs a critical application on AWS Lambda functions that process real-time streaming data from Amazon Kinesis Data Streams. Each Lambda function processes a batch of records and writes results to an Amazon DynamoDB table. The application is sensitive to data loss and requires exactly-once processing semantics. Recently, the operations team observed that the Lambda function is failing intermittently with 'ProvisionedThroughputExceededException' errors from DynamoDB. The Lambda function's batch size is 100, and the function is configured with a reserved concurrency of 500. The DynamoDB table has 100 read capacity units (RCUs) and 100 write capacity units (WCUs) with auto scaling enabled up to 1000 WCUs. The function's execution role has the necessary DynamoDB permissions. The Kinesis stream has 10 shards. The DevOps engineer needs to resolve the throttling errors without losing data. Which combination of actions should the engineer take? (Choose THREE.)

138

A company runs a containerized application on Amazon EKS. The application uses an Application Load Balancer (ALB) as the ingress controller. The DevOps team wants to ensure that the application can automatically recover from node failures. The cluster consists of managed node groups across three Availability Zones. The team noticed that when a node fails, the pods on that node are not rescheduled for several minutes. The team wants to reduce the time to reschedule pods. Which configuration change should the team make?

139

A company is deploying a critical web application on AWS and needs to ensure high availability and disaster recovery across multiple AWS Regions. The application uses an Application Load Balancer (ALB) in the primary Region and an Amazon RDS Multi-AZ DB instance. Which TWO actions should the company take to meet these requirements? (Choose two.)

140

A company runs a stateful web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application stores session data in local instance memory. To improve resiliency, the company wants to make the application stateless and distribute the load across multiple Availability Zones. Which THREE actions should the company take? (Choose three.)

141

A company runs a production application on Amazon ECS with Fargate, fronted by an Application Load Balancer (ALB). The application experiences periodic latency spikes and occasional 502 errors. The ECS service is configured with a desired count of 2 tasks, and the ALB health check is set to /health with a 30-second interval and 2 consecutive failures threshold. The team uses CloudWatch Container Insights and has noticed that CPU and memory utilization of tasks remain below 50%. However, the ALB TargetGroup's HealthyHostCount metric occasionally drops to 0 for a few minutes before recovering. The deployment strategy is rolling update with a minimum healthy percent of 50% and maximum percent of 200%. The team recently updated the task definition to increase memory and CPU, but the issue persists. What is the MOST likely cause of the problem?

142

A company runs a critical web application on EC2 instances behind an Application Load Balancer. To improve resilience, they want to automatically replace unhealthy instances. Which AWS feature should they use?

143

A company uses Amazon RDS Multi-AZ for disaster recovery. The primary DB instance in us-east-1a fails. What happens next?

144

A company has a stateless web application on EC2 instances behind an ALB. They want to ensure that if an entire Availability Zone fails, the application remains available with minimal impact. Which architecture best meets this requirement?

145

A company runs a stateful application on EC2 instances. They want to distribute traffic evenly and maintain session stickiness. Which AWS service should they use?

146

A company uses AWS CodeDeploy to deploy a new version of an application to EC2 instances. They want to minimize downtime and roll back quickly if the deployment fails. Which deployment type should they use?

147

A company runs a critical microservices architecture on Amazon ECS with Fargate. They want to ensure that if a task fails, it is automatically restarted, and the service remains available across multiple Availability Zones. How should they configure the ECS service?

148

A company runs a global web application on EC2 instances behind an ALB in us-east-1. They want to improve resilience by routing users to the nearest healthy region. Which service should they use?

149

A company uses Amazon DynamoDB with global tables for a multi-region active-active application. They notice that occasionally, concurrent updates to the same item in different regions cause data inconsistency. How can they resolve this?

150

A company experiences intermittent high latency for a web application running on EC2 behind an ALB. They want to monitor and automatically replace instances that have high CPU. Which solution meets this requirement?

151

Which TWO AWS services can be used to distribute incoming traffic across multiple AWS resources in different Availability Zones within a single region?

152

Which THREE strategies can improve the resilience of an Amazon RDS for PostgreSQL database?

153

Which TWO actions can help ensure that an application running on EC2 instances can survive the loss of an entire Availability Zone?

154

A company runs a critical web application on EC2 instances behind an Application Load Balancer. The application stores session state in an in-memory cache on each instance. During deployment of a new version, users experience session timeouts and errors. Which design change will MOST effectively improve resilience and avoid session loss during deployments?

155

A company is designing a multi-Region disaster recovery strategy for a stateless web application. The application runs on EC2 instances in an Auto Scaling group behind an ALB in us-east-1. The recovery point objective (RPO) is 15 minutes and recovery time objective (RTO) is 30 minutes. The application data is stored in Amazon RDS for PostgreSQL. Which combination of actions should the company take to meet the RPO and RTO?

156

A DevOps engineer is designing a resilient architecture for a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application experiences occasional spikes in traffic that cause Lambda function throttling and increased error rates. What is the MOST effective way to improve resilience and reduce throttling?

157

A company runs a microservices architecture on Amazon ECS with Fargate. Services communicate via an internal Application Load Balancer. Recently, one service became unavailable due to a memory leak, causing cascading failures in downstream services. What design change would MOST effectively improve resilience and limit the blast radius?

158

A company runs a critical batch processing workload on Amazon EMR that must complete within a 2-hour window each night. The workload is fault-tolerant but must be resilient to instance failures. Currently, the EMR cluster uses instance fleets with Spot Instances. Recently, Spot Instance interruptions caused the cluster to take over 3 hours to complete. Which change will MOST effectively ensure the workload completes within the 2-hour window despite Spot interruptions?

159

A company uses Amazon Route 53 for DNS and wants to ensure high availability for a web application hosted on two EC2 instances in different Availability Zones. The application uses an Application Load Balancer. What is the simplest way to achieve resilience if one Availability Zone becomes unavailable?

160

A company is deploying a stateful application on Amazon EKS. The application requires persistent storage that can be reattached to a new pod if the original pod fails. The cluster spans multiple Availability Zones. Which storage solution provides the BEST resilience and meets these requirements?

161

A company runs a critical application on AWS Lambda that processes messages from an Amazon SQS queue. The application must be resilient to downstream service failures. The team notices that when the downstream service is unhealthy, messages are repeatedly retried and eventually sent to the dead-letter queue (DLQ) before the service recovers. What design change would improve resilience by allowing automatic retries after the downstream service recovers?

162

A company is designing a resilient architecture for a web application using AWS Global Accelerator and two Application Load Balancers in different AWS Regions. The application is stateless and uses a global DynamoDB table for data. What is the primary benefit of using Global Accelerator in this architecture?

163

A company runs a containerized application on Amazon ECS with Fargate. The application experiences intermittent failures due to resource exhaustion. The company wants to improve resilience by automatically replacing unhealthy tasks and scaling based on demand. Which TWO actions should the company take? (Choose TWO.)

164

A company is designing a disaster recovery plan for a MySQL database running on Amazon RDS. The database is critical and must have an RPO of 5 minutes and an RTO of 1 hour. The primary Region is us-east-1, and the DR Region is us-west-2. Which TWO steps should the company take to meet these requirements? (Choose TWO.)

165

A company is deploying a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must be resilient to regional outages. Which THREE steps should the company take to achieve multi-Region resilience? (Choose THREE.)

166

A DevOps engineer ran the above command and saw this output. What is the MOST likely cause of the stack creation failure?

167

A DevOps engineer applies this S3 bucket policy to an S3 bucket. What is the effect of this policy?

168

A DevOps engineer runs the above command and sees that instance i-0abcd1234efgh5678 is unhealthy with reason 'Target.Timeout'. The instance is running and the application on port 80 responds to curl from the instance itself. What is the MOST likely cause?

169

A company is running a stateful web application on EC2 instances behind an Application Load Balancer. During a deployment, users report session timeouts. What should the DevOps engineer implement to ensure zero-downtime deployments without losing in-flight sessions?

170

A financial services company runs a multi-region application on AWS. They need to ensure that if one AWS Region becomes unavailable, traffic is automatically rerouted to another region with no manual intervention. The application uses an Application Load Balancer in each region. What is the MOST resilient approach to meet this requirement?

171

A DevOps engineer is designing a disaster recovery plan for a critical database. The RTO is 15 minutes and RPO is 1 minute. Which solution meets these requirements?

172

An e-commerce platform uses Amazon DynamoDB as its primary database. The platform experiences occasional read throttling during flash sales. The operations team needs to ensure that read traffic is handled without errors, while keeping costs low. What should a DevOps engineer recommend?

173

A company runs a containerized application on Amazon ECS with Fargate launch type. The application experiences intermittent failures when the ECS service scheduler attempts to place tasks during a deployment. The DevOps engineer notices that tasks fail to start due to insufficient IP addresses in the VPC subnets. What is the MOST resilient solution to prevent this issue?

174

A DevOps engineer is designing a highly available web application using Amazon Route 53. The application is deployed in two AWS Regions. The engineer wants to route traffic to the nearest healthy endpoint. Which routing policy should be used?

175

A company uses AWS CloudFormation to deploy infrastructure. They want to ensure that if a stack update fails, the stack is automatically rolled back to the last known good state. However, they also want to preserve any resources that were created successfully before the failure. Which CloudFormation stack policy should be used?

176

An organization runs a critical application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer. The application requires that all traffic be encrypted in transit. The security team mandates the use of TLS 1.2 or higher and specific ciphers. What is the MOST efficient way to enforce this requirement?

177

A company wants to automatically recover an Amazon RDS DB instance if the underlying hardware fails. Which feature should the DevOps engineer enable?

178

Which TWO strategies can be used to improve the resilience of an application running on Amazon ECS with Fargate? (Select TWO.)

179

Which THREE components are required to implement a global application that can withstand the failure of an entire AWS Region? (Select THREE.)

180

Which TWO actions can help protect against accidental deletion of an Amazon S3 bucket? (Select TWO.)

181

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application frequently experiences high latency during peak hours. The DevOps team needs to implement a solution that automatically adds capacity based on demand and reduces cost during off-peak hours. Which combination of AWS services should the team use?

182

A company's DevOps team is designing a multi-region disaster recovery solution for a stateless web application. The application runs on Amazon EC2 instances behind an Application Load Balancer (ALB) in the us-east-1 region. The team needs to fail over to a secondary region (us-west-2) with minimal downtime in case of a regional outage. Which AWS service should the team use to route traffic to the healthy region?

183

A company is running a production database on Amazon RDS for PostgreSQL with Multi-AZ deployment. The database experiences a failover due to an AZ outage. What happens to the existing database connections during the failover?

184

A company runs a critical application on Amazon ECS with Fargate launch type. The application is deployed across multiple Availability Zones. The DevOps team needs to ensure that if an entire Availability Zone fails, the application continues to serve traffic without manual intervention. What should the team do?

185

A company uses AWS Lambda functions to process events from an Amazon SQS queue. The Lambda function occasionally fails due to a transient downstream service error. The DevOps team wants to ensure that failed messages are not lost and can be retried later. The team also wants to reduce the number of invocations on the downstream service. Which configuration should the team use?

186

A company is designing a highly available architecture for a web application. The application runs on Amazon EC2 instances in an Auto Scaling group across three Availability Zones. The instances are behind an Application Load Balancer (ALB). Which additional step should the team take to ensure that traffic is evenly distributed across all healthy instances in all Availability Zones?

187

A company is running a stateful web application on Amazon EC2 instances. The application stores session data locally on the instance. The company wants to make the application stateless and improve resilience. The DevOps team decides to use Amazon ElastiCache for Redis to store session data. What additional step should the team take to ensure that the session data is highly available?

188

A company runs a microservices application on Amazon EKS. The application's frontend service needs to communicate with the backend service. The DevOps team wants to implement service-to-service authentication using AWS IAM. Which method should the team use?

189

A company has an Amazon S3 bucket that stores critical data. The company wants to protect the data from accidental deletion and ensure that even the root user cannot delete the bucket. Which S3 feature should the company enable?

190

A company runs a critical application on Amazon EC2 instances in an Auto Scaling group. The application generates logs that are sent to Amazon CloudWatch Logs. The DevOps team needs to configure a metric filter to monitor for error patterns and trigger an alarm when the error rate exceeds 5% of total requests over a 5-minute period. Which TWO steps should the team take? (Choose TWO.)

191

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application is expected to have unpredictable traffic patterns. The DevOps team needs to ensure that the application can handle sudden spikes in traffic without throttling. Which TWO actions should the team take? (Choose TWO.)

192

A company is deploying a web application on Amazon ECS with Fargate. The application consists of a frontend service and a backend service. The DevOps team needs to ensure that the frontend service can communicate with the backend service securely without exposing the backend to the internet. Which THREE steps should the team take? (Choose THREE.)

193

An AWS account owner (Account A) owns an S3 bucket named my-bucket. The bucket policy shown in the exhibit is attached to the bucket. A user from Account B attempts to upload an object to the bucket without specifying the x-amz-acl header. What will happen?

194

A DevOps engineer runs the above command and sees that one target is unhealthy with reason 'Target.Timeout'. The target is an EC2 instance running a web server on port 80. The security group for the instance allows inbound traffic on port 80 from the ALB's security group. What is the most likely cause of the health check failure?

195

A DevOps team uses the above CloudFormation template to create an S3 bucket. What does the bucket policy accomplish?

196

A company runs a web application on EC2 instances behind an ALB. To improve resilience, they want to automatically replace failed instances and maintain a minimum number of instances. Which AWS service should be used?

197

A company's production database on an Amazon RDS Multi-AZ DB instance experienced a failover. The application experienced a brief outage. How can the company reduce the failover time?

198

A company uses AWS Lambda functions to process events from Amazon SQS. The Lambda function sometimes fails due to timeouts. The team wants to preserve the event for reprocessing. How should they configure the integration?

199

A company wants to design a disaster recovery solution for its primary AWS Region. The solution should have a Recovery Point Objective (RPO) of a few seconds and a Recovery Time Objective (RTO) of a few minutes. Which strategy meets these requirements?

200

An application on EC2 instances in an Auto Scaling group uses an ALB. The ALB health checks are failing for some instances, but the instances are healthy from the OS perspective. What is the most likely cause?

201

A company is building a global application that requires low-latency access to static content across multiple AWS Regions. The content changes infrequently. Which solution is MOST resilient and cost-effective?

202

A company wants to ensure its Amazon RDS DB instance is highly available with automatic failover in case of an AZ failure. Which configuration should they use?

203

A company runs a stateful application on EC2 instances. The application stores session data locally. The instances are behind an ALB with sticky sessions enabled. A scaling event terminates an instance, causing loss of session data. How can the company prevent this while maintaining performance?

204

A company's application on Amazon ECS experiences intermittent failures when the task attempts to access an S3 bucket. The task role has the correct S3 permissions. What is the most likely cause?

205

A company is designing a resilient architecture for a critical application. Which TWO strategies improve resilience?

206

A company runs a microservices architecture on Amazon ECS. They want to ensure that if a service fails, it does not cascade to other services. Which TWO design patterns should they implement?

207

A company wants to protect its application from DDoS attacks. Which THREE AWS services should they use?

208

A company is deploying a critical application on Amazon EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. The application must be resilient to the failure of an entire Availability Zone. Which design should the company implement?

209

A DevOps team is designing a disaster recovery solution for an Amazon RDS for MySQL database. The primary database is in us-east-1. The recovery point objective (RPO) is 5 minutes, and the recovery time objective (RTO) is 1 hour. Which solution meets these requirements?

210

A company runs a stateless web application on Amazon ECS with Fargate launch type. The application experiences intermittent traffic spikes. The company wants to ensure that the application can scale automatically and remain resilient to underlying infrastructure failures. Which combination of actions should the DevOps engineer take?

211

A company is using Amazon S3 to store critical data. The company requires that all versions of objects be retained, including deleted objects, to meet compliance requirements. Which S3 feature should be enabled?

212

A company has a production environment that uses Amazon Route 53 for DNS and an Application Load Balancer (ALB) to distribute traffic to EC2 instances. The company wants to implement a disaster recovery plan that automatically fails over to a secondary region in case the primary region becomes unavailable. Which configuration should be used?

213

A company runs a critical microservice on Amazon ECS with AWS Fargate. The service must be highly available across multiple Availability Zones. The DevOps engineer configured the service with a desired count of 4 tasks spread across 2 Availability Zones. During a deployment, a new task fails to start due to a missing environment variable. The deployment fails, but the old tasks continue to run. What is the most likely cause of the deployment failure and how can the engineer ensure future deployments are resilient?

214

A company uses Amazon DynamoDB as the database for a mobile application. The application requires single-digit millisecond read and write latency and must be resilient to the failure of an entire AWS Region. Which DynamoDB feature should the company use?

215

A company runs a web application on AWS that uses Amazon SQS to decouple the frontend from the backend processing. The application experiences sudden spikes in traffic, causing the SQS queue to accumulate a large number of messages. The backend workers are unable to process messages fast enough, leading to increased latency. What solution can the company implement to improve the resilience and scalability of the backend?

216

A company is implementing a disaster recovery strategy for its Amazon Aurora MySQL database. The primary database is in us-west-2. The company requires an RPO of less than 1 minute and an RTO of less than 5 minutes. Which solution meets these requirements?

217

A company is designing a highly available architecture for a web application that uses Amazon EC2 instances. The application must be resilient to the failure of a single instance and a single Availability Zone. Which TWO actions should the company take? (Choose TWO.)

218

A company is using AWS CloudFormation to deploy a critical application stack. The company wants to ensure that the stack can be recovered quickly in case of a failure. Which THREE strategies should the company implement? (Choose THREE.)

219

A company is designing a disaster recovery plan for an Amazon S3 data lake. The data lake stores sensitive data that must be replicated to a secondary Region with an RPO of 15 minutes. Which THREE actions should the company take? (Choose THREE.)

220

Refer to the exhibit. A DevOps engineer applies the IAM policy shown to an S3 bucket to enforce server-side encryption. However, users report that some uploads succeed without encryption. What is the most likely reason?

221

222

Refer to the exhibit. A DevOps engineer runs the describe-target-health command and receives the output shown. The ALB target group has two instances. One instance is healthy, and the other is unhealthy with a 502 error. What is the most likely cause of the 502 error?

223

A company runs a stateless web application on EC2 instances behind an Application Load Balancer. To improve resilience, which configuration should be used for the EC2 instances?

224

A company is designing a disaster recovery strategy for a critical application that requires a Recovery Time Objective (RTO) of 15 minutes and a Recovery Point Objective (RPO) of 1 hour. The application runs on EC2 with data stored in Amazon RDS Multi-AZ. Which approach meets these requirements?

225

A company runs a critical application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Amazon RDS Multi-AZ DB instance. During a recent incident, one Availability Zone experienced a complete failure. The application remained available, but performance degraded significantly. What is the most likely cause of the degradation?

226

A company wants to ensure that its application can recover from an Amazon S3 service disruption. The application reads and writes data to S3. Which strategy should the application implement to achieve resilience?

227

A company is designing a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must tolerate a Regional failure. Which design provides the most resilience?

228

A company runs a stateful web application on EC2 instances in an Auto Scaling group. The application uses an Application Load Balancer (ALB) and an Amazon ElastiCache Redis cluster. Users report that after a scaling event, they are logged out and lose session data. What is the most likely cause?

229

A company uses Amazon Route 53 to route traffic to an Application Load Balancer. They want to improve availability by routing traffic to multiple ALBs in different AWS Regions. Which routing policy should they use?

230

A company's application runs on Amazon ECS with Fargate launch type. The application must be resilient to an Availability Zone failure. Which configuration should be used?

231

A company runs a critical application on EC2 instances behind an Application Load Balancer. The application uses an Amazon RDS for PostgreSQL Multi-AZ DB instance. During a recent failover test, the application experienced a 5-minute downtime. The RDS failover completed within 30 seconds. What is the most likely cause of the prolonged downtime?

232

A company is designing a highly available architecture for a web application using AWS. Which TWO of the following design principles should be applied? (Select TWO.)

233

A company is designing a disaster recovery plan for an application running on AWS. The plan must meet an RTO of 1 hour and an RPO of 15 minutes. Which TWO strategies can achieve these objectives? (Select TWO.)

234

A company is migrating a monolithic application to a microservices architecture on AWS. To improve resilience, which THREE design patterns should be implemented? (Select THREE.)

235

A company runs a critical web application on EC2 instances in an Auto Scaling group. The application uses an Application Load Balancer (ALB) with health checks pointing to /health. Recently, the application experienced intermittent failures where the ALB would mark instances as unhealthy and route traffic away, causing a reduction in capacity. The development team noticed that the /health endpoint occasionally returns HTTP 503 when the application is under heavy load, but the application can recover quickly. The team wants to avoid unnecessary instance replacements while ensuring availability. Which solution should the DevOps engineer implement?

236

A company has deployed a multi-tier application on AWS. The web tier uses an Auto Scaling group of EC2 instances behind an Application Load Balancer. The application tier uses another Auto Scaling group of EC2 instances that process messages from an Amazon SQS queue. The database tier uses Amazon RDS Multi-AZ. Recently, the application experienced a complete outage when the SQS queue became overwhelmed with messages due to a sudden spike in traffic. The application tier could not process messages fast enough, causing the queue to grow indefinitely and eventually exceed the visibility timeout, leading to message loss and degraded performance. The DevOps engineer needs to improve the resilience of the architecture to handle traffic spikes without losing messages. Which solution should be implemented?

237

A company runs a critical e-commerce application on Amazon EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. The application must be resilient to an Availability Zone (AZ) failure. What is the MOST resilient configuration?

238

A company uses AWS Lambda with Amazon DynamoDB to process orders. During peak hours, the Lambda function sometimes fails with throttling errors from DynamoDB. The system must be resilient and cost-effective. What should a DevOps engineer do?

239

A DevOps team is designing a disaster recovery plan for an RDS MySQL database. The database must be recoverable with minimal data loss in case of a regional failure. Which solution provides the LOWEST Recovery Point Objective (RPO)?

240

A company runs a stateless web application on Amazon ECS with Fargate. The application must be highly available across multiple Availability Zones. What is the BEST way to achieve this?

241

A company's application runs on Amazon EC2 instances in an Auto Scaling group. The application writes logs to local instance storage. The operations team needs to ensure logs are not lost during instance termination or scaling events. What should be done?

242

A company uses Amazon Route 53 for DNS. They want to ensure that if their primary website endpoint fails, traffic is automatically routed to a secondary endpoint in a different Region. Which routing policy should be used?

243

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must be resilient to sudden spikes in traffic without manual intervention. Which combination of services should be used?

244

A company is designing a resilient architecture for a web application that uses Amazon RDS for MySQL. The application must be able to withstand the loss of an entire AWS Region. Which TWO actions should the company take?

245

A company runs a containerized application on Amazon EKS. The application must be highly available across multiple Availability Zones and must automatically recover from node failures. Which THREE steps should be taken?

246

A company wants to ensure that its Amazon S3 bucket is resilient to accidental deletion of objects. Which TWO actions should be taken?

247

Refer to the exhibit. An IAM policy is attached to an IAM role used by an EC2 instance. The instance is part of an Auto Scaling group. During a scale-in event, the instance fails to stop itself. What is the MOST likely cause?

248

A company runs a high-traffic web application on a fleet of EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. The application uses an Amazon RDS for PostgreSQL database. Recently, during a traffic spike, the application became unresponsive. Investigation revealed that the database CPU utilization reached 100%, causing queries to time out. The Auto Scaling group added more EC2 instances, which only increased the load on the database. The DevOps team needs to implement a solution that prevents the database from being overwhelmed during traffic spikes while maintaining application availability. The solution must be cost-effective and require minimal changes to the application code. Which solution should the DevOps team implement?

249

A company hosts a static website on Amazon S3 with a CloudFront distribution. The website is critical for business operations and must be available even if the primary AWS Region fails. Currently, the S3 bucket is in us-east-1, and CloudFront uses that bucket as the origin. The company has a secondary bucket in us-west-2 with a replica of the data. The company wants to use CloudFront to automatically fail over to the secondary bucket if the primary becomes unavailable. The DevOps engineer needs to implement a solution that requires minimal operational overhead. What should the engineer do?

250

A company uses AWS CloudFormation to deploy infrastructure. During a recent deployment, the stack failed to create an Amazon RDS DB instance because of a parameter validation error. The DevOps engineer fixed the parameter and wants to resume the stack creation without recreating the resources that were already successfully created. The stack template is parameterized and uses nested stacks. What is the MOST efficient way to resume the stack creation?

251

A company runs a microservices application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) to route traffic to services. Each service has a required number of tasks for capacity. The company recently experienced a prolonged outage when a bug caused all tasks of the critical 'payment' service to crash simultaneously. The DevOps team needs to implement a deployment strategy that reduces the risk of a full service outage during updates. The strategy must also allow for quick rollback if a deployment fails. Which deployment strategy should the team implement?

252

A media company runs a video processing pipeline on AWS. Raw videos are uploaded to an S3 bucket, which triggers a Lambda function to start an AWS Batch job for transcoding. The Batch job reads the source video from S3, processes it, and writes the output to another S3 bucket. Recently, the company has seen an increase in processing failures. Investigation shows that the Batch jobs are being terminated with a 'TIMEOUT' status after running for exactly 30 minutes. The video files are large, and some jobs legitimately take up to 45 minutes. The Batch job definition has a 'timeout' setting configured. Which action should be taken to resolve this issue?

253

A financial services company runs a critical application on Amazon ECS with Fargate launch type. The application has strict availability requirements and must survive an Availability Zone failure. The ECS service is configured with a desired count of 4 tasks, spread across two Availability Zones using a spread strategy. The service is fronted by an Application Load Balancer. During a recent AZ outage, one AZ became completely unavailable, but the application continued to serve traffic. However, after the AZ recovered, the ECS service did not automatically place new tasks in the recovered AZ to restore the desired count. The service remains with only 2 tasks in the remaining AZ. What is the most likely cause and solution?

254

A startup runs a web application on EC2 instances behind an Application Load Balancer. They want to improve resilience by distributing instances across multiple Availability Zones. Currently, all instances are in us-east-1a. They create a launch template and an Auto Scaling group with a desired capacity of 2. They configure the Auto Scaling group to use two subnets: one in us-east-1a and one in us-east-1b. However, after updating, all instances remain in us-east-1a. What is the most likely reason?

255

A company runs a multi-tier web application on AWS. The application consists of an Application Load Balancer, EC2 instances in an Auto Scaling group, and an Amazon RDS Multi-AZ DB instance. The application experiences intermittent failures when the RDS primary instance fails over to the standby. The engineer needs to ensure that the application handles failover gracefully without manual intervention.

256

A gaming company runs a real-time multiplayer game on AWS using Amazon EC2 instances in an Auto Scaling group behind a Network Load Balancer. The game state is stored in Amazon ElastiCache for Redis. The team needs to ensure that the architecture can survive a regional failure with minimal data loss and recovery time. The RTO is 15 minutes and RPO is 5 minutes. The game currently uses a single Redis cluster in us-east-1.

257

A company runs a static website on Amazon S3 with public read access. The website content is stored in an S3 bucket and served through an Amazon CloudFront distribution for better performance and security. Recently, the company noticed that some users are accessing the S3 bucket directly via the S3 endpoint, bypassing CloudFront. This increases costs and exposes the bucket to potential attacks. The company wants to ensure that all access to the website goes through CloudFront only. Which solution should the company implement?

258

A company runs a critical application on Amazon ECS with the Fargate launch type. The application is deployed across three Availability Zones. Each service has its own Application Load Balancer. The company wants to implement a blue/green deployment strategy to reduce risk. They currently use AWS CodeDeploy for ECS deployments. During a recent deployment, the company noticed that the new version (green) was not receiving any traffic even after passing all health checks. The CodeDeploy configuration uses a 'Linear10PercentEvery3Minutes' traffic shifting configuration. What is the most likely reason that the green tasks are not receiving traffic?

259

A company runs a serverless application using AWS Lambda functions that process messages from an Amazon SQS queue. The function scales up to handle high traffic but sometimes experiences throttling errors (HTTP 429) from Lambda. The company wants to improve the resilience of the application by reducing throttling. The SQS queue is configured as a Lambda event source with a batch size of 10. The Lambda function has a reserved concurrency of 100. Which combination of actions will best reduce throttling? (Choose the single best answer.)

Practice all 259 Resilient Cloud Solutions questions

Other DOP-C02 exam domains

Configuration Management and IaC · Monitoring and Logging · Incident and Event Response · Security and Compliance · SDLC Automation

Frequently asked questions

What does the Resilient Cloud Solutions domain cover on the DOP-C02 exam?

The Resilient Cloud Solutions domain covers designing for high availability, scaling to meet demand, and automating recovery to meet RTO and RPO targets, as set out in the DOP-C02 exam blueprint published by Amazon Web Services. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DOP-C02 domains, with no account required.

How many Resilient Cloud Solutions questions are in the DOP-C02 question bank?

The Courseiva DOP-C02 question bank contains 259 questions in the Resilient Cloud Solutions domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Resilient Cloud Solutions for DOP-C02?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Resilient Cloud Solutions questions for DOP-C02?

Yes — the session launcher on this page draws questions exclusively from the Resilient Cloud Solutions domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
