Courseiva

Free IT certification practice questions with explained answers for CCNA, CompTIA, AWS, Azure, Google Cloud, and more.

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

DOP-C02 Resilient Cloud Solutions — All Questions With Answers

Complete DOP-C02 Resilient Cloud Solutions question bank: all 259 questions with answers and detailed explanations.

259 questions · Free · No signup
Question 1 (medium, multiple choice)

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. During a recent traffic spike, the application became unavailable for 10 minutes. Analysis shows that the ALB's healthy host count dropped to zero because the instances failed health checks due to high CPU load. What is the MOST effective design change to improve resilience during future traffic spikes?

Question 2 (hard, multiple choice)

A company uses DynamoDB global tables in two AWS Regions with strong consistency reads. They observe occasional write conflicts that are not being resolved automatically. The application uses DynamoDBMapper with optimistic locking. What should the DevOps engineer do to ensure conflict resolution?

Question 3 (easy, multiple choice)

A company's application runs on EC2 instances in a single Availability Zone. The operations team wants to improve resilience without redesigning the application. Which action is the MOST effective?

Question 4 (medium, multiple choice)

A company uses a third-party backup solution to back up its EC2 instances daily. The backups are stored in an S3 bucket with default settings. The company wants to ensure that backups are protected from accidental deletion and are available for at least one year. Which combination of S3 features should the DevOps engineer implement?

Question 5 (hard, multiple choice)

A company runs a stateful web application on EC2 instances behind a Network Load Balancer (NLB) in a single Availability Zone. The application stores session state locally on the instance. The company wants to achieve high availability across multiple AZs with minimal application changes. What should the DevOps engineer do?

Question 6 (easy, multiple choice)

A company's DevOps team is designing a disaster recovery plan for a critical application. The application runs on EC2 instances with an RDS MySQL database. The Recovery Time Objective (RTO) is 15 minutes, and the Recovery Point Objective (RPO) is 1 hour. Which approach BEST meets these requirements?
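
A scenario like this comes down to comparing each candidate approach against the stated objectives. The sketch below is only an illustration of that comparison: the two candidate profiles and their timings are hypothetical placeholders, not measured figures for any AWS service, and the names `RecoveryProfile` and `meets_objectives` are invented for this example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryProfile:
    """Hypothetical description of how a DR approach behaves."""
    name: str
    recovery_time: timedelta      # how long until service is restored
    data_loss_window: timedelta   # how much recent data could be lost


def meets_objectives(profile: RecoveryProfile, rto: timedelta, rpo: timedelta) -> bool:
    """True when the profile fits inside both the RTO and the RPO."""
    return profile.recovery_time <= rto and profile.data_loss_window <= rpo


# Objectives taken from the question: RTO 15 minutes, RPO 1 hour.
rto = timedelta(minutes=15)
rpo = timedelta(hours=1)

# Made-up candidates, for illustration only.
candidates = [
    RecoveryProfile("hypothetical option A", timedelta(hours=4), timedelta(hours=24)),
    RecoveryProfile("hypothetical option B", timedelta(minutes=10), timedelta(minutes=5)),
]

for candidate in candidates:
    print(candidate.name, meets_objectives(candidate, rto, rpo))
```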

Question 7 (medium, multiple choice)

A company's application uses Amazon SQS to decouple microservices. During peak hours, the SQS queue backlog grows significantly, causing processing delays. The DevOps team wants to reduce latency without increasing costs unnecessarily. What should the team do?
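
One way to reason about a growing queue backlog is a backlog-per-worker figure: how many messages each consumer can have waiting while still meeting a latency target. The sketch below assumes hypothetical numbers and function names (none come from the question) and only shows the arithmetic, not a scaling configuration.

```python
import math


def acceptable_backlog_per_worker(max_latency_s: float, avg_processing_s: float) -> float:
    """Messages one worker can have queued while still meeting the latency target."""
    return max_latency_s / avg_processing_s


def workers_needed(visible_messages: int, max_latency_s: float, avg_processing_s: float) -> int:
    """Workers required so each worker's share of the backlog stays within the target."""
    per_worker = acceptable_backlog_per_worker(max_latency_s, avg_processing_s)
    return max(1, math.ceil(visible_messages / per_worker))


# Hypothetical figures: 1,500 visible messages, 10 s latency target, 0.5 s per message.
print(workers_needed(1500, 10.0, 0.5))
```

Scaling on a ratio like this, rather than on raw queue depth, keeps the worker count proportional to the work outstanding.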

Question 8 (medium, multi select)

A company runs a microservices application on Amazon ECS with Fargate. The application includes a service that processes orders and stores them in an RDS PostgreSQL database. The company wants to ensure that the order service is resilient to AZ failures and can handle a sudden increase in order volume. Which TWO actions should the DevOps engineer take? (Choose TWO.)

Question 9 (hard, multi select)

A company's application uses Amazon DynamoDB as its primary data store. The application experiences occasional throttling errors during traffic spikes. The DevOps team needs to implement a solution that ensures consistent performance without manual intervention. Which TWO actions should the team take? (Choose TWO.)

Question 10 (easy, multi select)

A company wants to design a highly available web application using AWS services. The application must be resilient to the failure of an entire AWS Region. Which THREE components should the architecture include? (Choose THREE.)

Question 11 (hard, multiple choice)

A company runs a critical e-commerce platform on AWS. The architecture includes an Application Load Balancer (ALB) that distributes traffic to a fleet of EC2 instances in an Auto Scaling group across three Availability Zones. The instances run a Java application that connects to an Amazon RDS Multi-AZ MySQL database. The application also uses Amazon ElastiCache for Redis for session caching. The company recently experienced a severe outage where the ALB's 5xx error rate spiked to 100% for 45 minutes. The root cause was a combination of a slow-running query on the RDS primary instance and a subsequent failover that caused the application to lose connections to the database. The failover happened because the slow query caused the primary to become unresponsive, triggering a Multi-AZ failover. During the failover, the application's connection pool was exhausted, and new connections failed. The application logs show a high rate of 'java.sql.SQLTimeoutException' and 'com.mysql.cj.exceptions.CJCommunicationsException'. The DevOps team needs to implement a long-term solution that minimizes the impact of similar incidents. The solution must be cost-effective and require minimal application changes. Which combination of actions should the DevOps team take?

Question 12 (medium, multiple choice)

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application uses an Amazon RDS for MySQL Multi-AZ DB instance for data storage. During an AWS infrastructure event, the primary Availability Zone (AZ) becomes unavailable, and the application experiences downtime. The RDS Multi-AZ failover completes automatically, but the application takes several minutes to reconnect. Which combination of actions would MOST reduce the recovery time for the application during such an event?

Question 13 (hard, multiple choice)

A company is designing a disaster recovery (DR) strategy for a stateless web application deployed on Amazon ECS with Fargate. The application is fronted by an Application Load Balancer (ALB) and uses Amazon ElastiCache for Redis for session state. The primary region is us-east-1. The DR plan requires a Recovery Point Objective (RPO) of 15 minutes and a Recovery Time Objective (RTO) of 30 minutes. Which solution meets these requirements with the LEAST operational overhead?

Question 14 (easy, multiple choice)

A development team wants to ensure that their application can continue serving traffic even if an entire AWS Availability Zone (AZ) becomes unavailable. The application runs on Amazon EC2 instances in an Auto Scaling group and uses an Application Load Balancer (ALB). Which configuration should the team implement to meet this requirement?

Question 15 (hard, multiple choice)

A company runs a containerized microservices application on Amazon EKS. The application includes a critical service that processes real-time financial transactions. This service must be highly available and resilient to node failures. The current setup uses a Deployment with 3 replicas and a ClusterIP service. During a recent node failure, the application experienced a brief period of unavailability. Which action should the DevOps engineer take to improve resilience without changing the underlying infrastructure?

Question 16 (medium, multi select)

A company is building a multi-tier web application on AWS. The application must be resilient to the failure of an entire Availability Zone. The architecture includes an Application Load Balancer (ALB), EC2 instances in an Auto Scaling group, and an Amazon RDS for MySQL database. Which TWO actions should be taken to achieve this resilience? (Choose two.)

Question 17 (hard, multi select)

A company runs a critical application on AWS using Amazon EC2 instances in an Auto Scaling group, an Application Load Balancer (ALB), and an Amazon RDS for PostgreSQL Multi-AZ DB cluster. The application must maintain an RTO of 5 minutes and an RPO of 1 second for database transactions. The current setup meets these requirements, but the DevOps team wants to improve the resilience of the application tier to withstand a regional failure. Which THREE actions should be taken? (Choose three.)

Question 18 (hard, multiple choice)

A company runs a production e-commerce platform on AWS. The architecture includes an Application Load Balancer (ALB) that distributes traffic to a fleet of Amazon EC2 instances running in an Auto Scaling group across three Availability Zones (AZs). The application stores session state in Amazon ElastiCache for Redis (cluster mode disabled) with a single node. The database is an Amazon Aurora MySQL DB cluster with one writer and two reader instances in different AZs. The platform experiences intermittent slowdowns and occasional timeouts during peak traffic hours. The CloudWatch metrics show that the ALB's TargetResponseTime is elevated, and the Redis CPU utilization is consistently above 80% during these periods. The Auto Scaling group is scaling out, but new instances take several minutes to become healthy. The DevOps team has been asked to improve the resilience and performance of the application with minimal changes to the application code. Which solution should the team implement?

Question 19 (medium, multi select)

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. The application stores session data in a shared Amazon ElastiCache for Redis cluster. The operations team reports that during a recent AZ failure, users experienced session loss and application errors. Which combination of actions should the company take to improve resilience and maintain session state during an AZ failure? (Choose TWO.)

Question 20 (hard, multiple choice)

An AWS Lambda function that processes sensitive data writes objects to an S3 bucket. The security team requires that all objects be encrypted at rest using SSE-S3. The Lambda execution role uses the IAM policy shown in the exhibit. Despite the policy, some objects are uploaded without server-side encryption. What is the most likely cause?

Exhibit

Refer to the exhibit.
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```

Question 21 (easy, multiple choice)

A company runs a stateless web application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Application Load Balancer. The operations team needs to ensure that the application remains available if one AZ fails. Which solution is MOST resilient?
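
When judging whether a fleet survives the loss of one Availability Zone, a quick capacity calculation can help. The sketch below assumes instances are spread evenly across zones and that the remaining zones must carry the full load; the function name and the example figures are hypothetical.

```python
import math


def instances_per_az(required_instances: int, az_count: int) -> int:
    """Instances to run in each AZ so that losing any single AZ still
    leaves at least `required_instances` running in the others."""
    if az_count < 2:
        raise ValueError("at least two Availability Zones are needed to tolerate an AZ failure")
    return math.ceil(required_instances / (az_count - 1))


# Hypothetical example: peak load needs 6 instances, spread over 3 AZs.
per_az = instances_per_az(6, 3)
print(per_az, per_az * 3)  # per-AZ count and total fleet size
```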

Question 22 (medium, drag order)

Drag and drop the steps to troubleshoot a failed deployment in AWS CodeDeploy into the correct order.

Question 23 (medium, drag order)

Drag and drop the steps to perform a disaster recovery failover from a primary Region to a secondary Region using Amazon Route 53 and Amazon RDS.

Question 24 (medium, matching)

Match each AWS CloudFormation concept to its description.

Concepts
Matches

Collection of AWS resources managed as a single unit

JSON or YAML document describing AWS resources

Preview of changes before applying to a stack

Enables stack creation across multiple accounts and regions

Identifies differences between stack and actual resource configurations

Question 25 (medium, matching)

Match each AWS CLI command to its function.

Concepts
Matches

Deploys a CloudFormation stack from a template

Syncs directories and S3 buckets

Retrieves information about EC2 instances

Updates the code of a Lambda function

Starts a new build project run

Question 26 (medium, multiple choice)

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB). The application stores session state in an Amazon DynamoDB table. During a recent traffic spike, users experienced session timeouts and the application became unavailable. Which design change would BEST improve resilience?

Question 27 (hard, multiple choice)

A company runs a stateless web application on AWS Lambda behind an Application Load Balancer (ALB). During a deployment, the team updates the Lambda function to a new version. Some users report seeing the old version of the application for several minutes after the deployment. What is the MOST likely cause?

Question 28 (easy, multiple choice)

A company is designing a disaster recovery strategy for its primary RDS for PostgreSQL database in us-east-1. The RTO is 15 minutes and RPO is 1 minute. Which solution meets these requirements?

Question 29 (medium, multiple choice)

A company runs a microservices application on Amazon ECS with Fargate launch type. The application experiences intermittent failures when calling an external API. The errors are transient and usually resolve within a few seconds. How should the company improve resilience?
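
For transient errors of the kind described, one common pattern is retrying with exponential backoff and jitter. The following is a minimal sketch under stated assumptions: `call_with_retries` and `flaky_call` are hypothetical names, `ConnectionError` stands in for whatever transient error the real client raises, and the delay values are illustrative.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay_s=0.2,
                      max_delay_s=5.0, sleep=time.sleep):
    """Run `operation`, retrying on ConnectionError with capped
    exponential backoff and full jitter. Re-raises after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Delay ceiling doubles each attempt, up to max_delay_s.
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Hypothetical stand-in for the external API: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# The sleep function is stubbed out so the demo finishes immediately.
print(call_with_retries(flaky_call, sleep=lambda _seconds: None))
```

The jitter spreads retries from many callers over time, which avoids synchronized retry bursts against an API that is already struggling.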

Question 30 (hard, multiple choice)

A company uses AWS CloudFormation to deploy a multi-tier application. During an update, the stack fails and rolls back. The rollback also fails, leaving the stack in UPDATE_ROLLBACK_FAILED state. The operations team needs to resolve this with minimal disruption. What is the MOST efficient approach?

Question 31 (easy, multiple choice)

A company wants to ensure that its Amazon S3 bucket can withstand the loss of an entire AWS Availability Zone. Which configuration meets this requirement?

Question 32 (medium, multiple choice)

A company runs a stateful application on EC2 instances in an Auto Scaling group. The application stores state on local instance storage. During a scaling event, users lose session data. How can the company make the application resilient without modifying the application code?

Question 33 (hard, multiple choice)

A company uses Amazon Route 53 with a failover routing policy to direct traffic to an active and a standby endpoint. The health checks are configured to check the active endpoint every 10 seconds. During a recent outage, the failover took over 3 minutes to detect and switch. How can the company improve the failover time to under 1 minute?
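
A rough way to reason about DNS failover time is to add the time needed to detect the failure (check interval multiplied by the failure threshold) to the time clients may keep using a cached record (the record TTL). The sketch below is a simplified model: the failure threshold and TTL values are assumptions (the question only states the 10-second interval), and real failover time also depends on resolver and client caching behavior.

```python
def estimated_failover_seconds(check_interval_s: int, failure_threshold: int,
                               record_ttl_s: int) -> int:
    """Approximate worst-case failover time: detection time plus DNS TTL."""
    detection_s = check_interval_s * failure_threshold
    return detection_s + record_ttl_s


# 10-second interval from the question; threshold and TTL values are assumed.
print(estimated_failover_seconds(10, 3, 300))  # 330
print(estimated_failover_seconds(10, 3, 60))   # 90
```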

Question 34 (easy, multiple choice)

A company runs a critical application on Amazon EC2 instances in an Auto Scaling group. To ensure high availability, the instances are deployed across three Availability Zones. Which additional step should the company take to protect against a regional failure?

Question 35 (medium, multi select)

A company is designing a highly available architecture for a web application using AWS services. The application must be resilient to the failure of an entire AWS Region. Which TWO strategies should the company implement? (Choose TWO.)

Question 36 (hard, multi select)

A company runs a containerized application on Amazon ECS with Fargate. The application needs to be resilient to Availability Zone failures. Which THREE actions should the company take? (Choose THREE.)

Question 37 (easy, multi select)

A company is implementing a disaster recovery plan for its on-premises database using AWS. The plan must have a Recovery Time Objective (RTO) of 2 hours and a Recovery Point Objective (RPO) of 15 minutes. Which TWO AWS services should the company use? (Choose TWO.)

Question 38 (hard, multiple choice)

Refer to the exhibit. An IAM policy is attached to an IAM role used by an EC2 instance to manage other EC2 instances. The operations team reports that the instance can start and stop other instances but cannot terminate them. However, they also notice that the instance cannot describe instances in any region other than us-east-1. What is the reason for this behavior?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Deny",
      "Action": "ec2:TerminateInstances",
      "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/*"
    }
  ]
}
```

Question 39 (medium, multiple choice)

Refer to the exhibit. An Auto Scaling group is configured with an Application Load Balancer. The group has a desired capacity of 2 instances spread across two Availability Zones. Recently, the application has been experiencing high error rates during deployments. The team suspects that new instances are being marked as healthy before they are fully ready. What should the team do to resolve this issue?

Exhibit

```
$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-asg
{
  "AutoScalingGroups": [
    {
      "AutoScalingGroupName": "my-asg",
      "MinSize": 1,
      "MaxSize": 5,
      "DesiredCapacity": 2,
      "AvailabilityZones": ["us-east-1a", "us-east-1b"],
      "LoadBalancerNames": ["my-alb"],
      "HealthCheckType": "EC2",
      "HealthCheckGracePeriod": 300,
      "CreatedTime": "2023-01-01T00:00:00Z"
    }
  ]
}
```

Question 40 (hard, multiple choice)

Refer to the exhibit. A Lambda function uses an IAM role with the policy shown in the exhibit. The function is configured to access a DynamoDB table MyTable and an RDS instance in a VPC. When invoked, the function fails with an error indicating it cannot describe VPC subnets. What is the MOST likely cause?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowVPCAccess",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowWriteToTable",
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem",
        "dynamodb:UpdateItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable"
    }
  ]
}
```
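
When reading a policy like this one, it can help to check mechanically which actions its Allow statements cover. The sketch below is deliberately simplified: it looks only at the `Action` element of Allow statements and ignores `Resource`, `Condition`, explicit Deny statements, and any other policies attached to the role. The helper name `allows` is hypothetical, and the policy is copied from the exhibit.

```python
from fnmatch import fnmatchcase

# Policy copied from the exhibit above.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowVPCAccess",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface",
            ],
            "Resource": "*",
        },
        {
            "Sid": "AllowWriteToTable",
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable",
        },
    ],
}


def allows(policy: dict, action: str) -> bool:
    """True if any Allow statement lists `action` (wildcards supported).
    Simplification: Resource, Condition, and Deny are not evaluated."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        # Action names are compared case-insensitively.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


for action in ("ec2:CreateNetworkInterface", "ec2:DescribeSubnets", "dynamodb:PutItem"):
    print(action, allows(POLICY, action))
```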

Question 41 (easy, multiple choice)

A company runs a web application on EC2 instances behind an Application Load Balancer. The application experiences intermittent failures due to a single Availability Zone failing. Which solution is MOST resilient and cost-effective?

Question 42 (medium, multiple choice)

A DevOps engineer is designing a multi-Region active-active architecture for a stateless web application using Route 53 latency-based routing and DynamoDB global tables. The application must continue to serve traffic even if an entire AWS Region becomes unavailable. Which additional step is MOST critical for resilience?

Question 43 (hard, multiple choice)

A company uses AWS CloudFormation to deploy a multi-tier application. The stack includes an RDS DB instance with Multi-AZ enabled. The database experiences a failover during maintenance. The application reports connection errors for several minutes. What is the MOST likely cause and solution?

Question 44 (easy, multiple choice)

A company runs a critical batch processing job on Amazon ECS using Fargate. The job must complete within 2 hours. If the job fails, it must be retried automatically up to 3 times. Which solution meets these requirements?

Question 45 (medium, multiple choice)

A DevOps team is designing a disaster recovery plan for a production RDS for PostgreSQL database. The RPO must be less than 5 minutes and the RTO less than 1 hour. The database size is 2 TB. Which solution is MOST cost-effective?

Question 46 (hard, multiple choice)

A company uses an NLB to distribute traffic to a fleet of EC2 instances in a single Availability Zone. During a recent AWS outage in that zone, the application became completely unavailable. The company wants to achieve high availability without rearchitecting the application. Which change is MOST appropriate?

Question 47 (easy, multiple choice)

A company runs a containerized application on Amazon ECS with Fargate. The application needs to store session state. Which service provides the MOST resilient and scalable solution?

Question 48 (medium, multiple choice)

A company uses AWS CodePipeline to deploy a web application. The pipeline includes a deploy action that uses AWS CloudFormation to update a stack. The deployment occasionally fails because of a transient resource limit error. Which automatic retry strategy should a DevOps engineer implement?

Question 49 (hard, multiple choice)

A company has a critical application running on EC2 instances in an Auto Scaling group across two Availability Zones. The application uses an EBS volume for local caching. The company wants to ensure that if an instance fails, the cache data is not lost and the replacement instance can use it. Which solution meets this requirement?

Question 50 (easy, multi select)

A company wants to ensure that its application running on AWS can withstand the failure of an entire AWS Region. Which TWO strategies should the company implement?

Question 51 (medium, multi select)

A company runs a stateful web application on EC2 instances behind an ALB. The application stores session data in memory. The company wants to make the application stateless to improve resilience. Which TWO changes should the company make?

Question 52 (hard, multi select)

A company is designing a disaster recovery plan for a critical application that uses Amazon RDS for MySQL with Multi-AZ. The RPO must be less than 1 minute and RTO less than 15 minutes. The primary Region is us-east-1. Which THREE steps should the company take to meet these requirements?

Question 53 (easy, multiple choice)

A company runs a production web application on EC2 instances behind an Application Load Balancer. The application experiences intermittent high latency. The operations team needs to identify the root cause without affecting live traffic. Which approach is the MOST efficient?

Question 54 (easy, multiple choice)

A company uses AWS Lambda for processing events from Amazon S3. Recently, the Lambda function started timing out after the 15-minute limit for some large files. The function downloads the entire file to /tmp before processing. What should a DevOps engineer do to resolve this issue with minimal code changes?

Question 55 (medium, multiple choice)

A company runs a critical database on Amazon RDS for PostgreSQL with Multi-AZ deployment. The application experiences a brief outage during automatic failover. To improve availability, the company wants to reduce the failover time. What should they do?

Question 56 (medium, multiple choice)

A company uses AWS CodeDeploy for blue/green deployments to an Auto Scaling group. The deployment fails because the new instances do not pass health checks. The DevOps engineer discovers that the health check URL returns a 503 error. What is the MOST likely cause?

Question 57 (medium, multiple choice)

A company runs a stateless web application on a fleet of EC2 instances in an Auto Scaling group. The application stores session state in a shared ElastiCache Redis cluster. During traffic spikes, the application becomes slow. Monitoring shows that the Redis cluster has high CPU utilization. Which solution is MOST cost-effective and scalable?

Question 58 (hard, multiple choice)

A company runs a containerized microservices architecture on Amazon ECS with Fargate. The services communicate via an internal Application Load Balancer. Recently, a new deployment of Service A caused its health checks to fail. The DevOps engineer notices that the old tasks remain running and the service is unavailable. What configuration change would prevent this issue in future deployments?

Question 59 (hard, multiple choice)

A company runs a critical application on EC2 instances in an Auto Scaling group. The application uses an EBS volume attached to each instance for temporary data. The company needs to ensure that if an instance fails, the data is not lost, and the new instance can resume quickly. What should they do?

Question 60 (hard, multiple choice)

A company uses AWS CloudFormation to deploy infrastructure. The stack creation fails with the error: "Resource handler returned message: 'The security group does not exist in VPC'." The template references a security group by name. What is the MOST likely cause?

Question 61 (medium, multiple choice)

A company runs a web application on EC2 instances behind an Application Load Balancer. The application uses an Aurora MySQL database. Recently, the database experienced a failover, and the application started throwing connection errors. The DevOps engineer needs to make the application resilient to database failovers with minimal code changes. What should they do?

Question 62 (easy, multi select)

A company wants to design a highly available and fault-tolerant architecture for a stateless web application on AWS. Which TWO actions should they take? (Choose two.)

Question 63 (medium, multi select)

A company runs a microservices application on Amazon ECS with Fargate. The services need to be resilient to AZ failures. Which TWO actions should the company take? (Choose two.)

Question 64 (hard, multi select)

A company is designing a disaster recovery plan for a critical application with an RPO of 15 minutes and RTO of 1 hour. The application runs on EC2 instances with an RDS MySQL database. The primary Region is us-east-1. Which THREE actions should they take to meet the RPO and RTO? (Choose three.)

Question 65 (medium, multiple choice)

An IAM policy is attached to an S3 bucket to allow access from a specific VPC CIDR range. However, users from the VPC are receiving 'Access Denied' errors when trying to access objects in the bucket. What is the MOST likely reason?

Exhibit

Refer to the exhibit.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "10.0.0.0/16"
        }
      }
    }
  ]
}
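
The `IpAddress` condition in the exhibit is a CIDR membership test on the request's source address. The sketch below only illustrates that membership test with Python's standard `ipaddress` module; the sample addresses are hypothetical, and which source address the service actually evaluates depends on how the request reaches S3, which this example does not model.

```python
import ipaddress

# CIDR range from the exhibit's aws:SourceIp condition.
ALLOWED_RANGE = ipaddress.ip_network("10.0.0.0/16")


def in_allowed_range(address: str) -> bool:
    """True when `address` falls inside the CIDR range from the policy."""
    return ipaddress.ip_address(address) in ALLOWED_RANGE


# Hypothetical sample addresses.
print(in_allowed_range("10.0.12.34"))    # inside the range
print(in_allowed_range("203.0.113.10"))  # outside the range
print(ALLOWED_RANGE.is_private)          # the range is private address space
```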

Question 66 (hard, multiple choice)

A DevOps engineer runs the command shown in the exhibit and sees that one target is unhealthy with a 503 error. The application is a web server running on port 80. The health check is configured to hit the root path '/'. Which action should the engineer take to resolve the issue?

Exhibit

$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/1234567890123456
{
  "TargetHealthDescriptions": [
    {
      "Target": {
        "Id": "i-0abcd1234efgh5678",
        "Port": 80
      },
      "HealthCheckPort": "80",
      "TargetHealth": {
        "State": "unhealthy"
      }
    },
    {
      "Target": {
        "Id": "i-0abcd1234efgh5679"
      },
      "TargetHealth": {
        "State": "healthy"
      }
    }
  ]
}

Question 67 (hard, multiple choice)

A company deploys the CloudFormation stack shown in the exhibit. They want to enforce HTTPS for all requests to the S3 bucket. After deployment, users are still able to make HTTP requests. What is the problem?

Exhibit

Refer to the exhibit.

Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled
  MyBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref MyBucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Deny
            Action: s3:*
            Principal: '*'
            Resource: !Sub '${MyBucket.Arn}/*'
            Condition:
              Bool:
                aws:SecureTransport: 'false'
Question 68mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical web application on EC2 instances behind an Application Load Balancer. To improve resilience, they want to automatically replace failed instances. Which AWS service should they use?

Question 69hardmultiple choice
Review the full routing breakdown →

A company is designing a multi-region active-active architecture for a stateless web application using Route 53 latency-based routing. The application uses an RDS MySQL database. What should be done to ensure database resilience across regions?

Question 70easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps engineer needs to ensure that an application running on EC2 can automatically recover from an underlying hardware failure without manual intervention. Which AWS feature should be enabled?

Question 71mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is deploying a critical microservice on Amazon ECS with Fargate. They need to ensure that the service can tolerate an Availability Zone failure. What is the BEST approach?

Question 72hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs an application on EC2 with a shared Elastic IP. The instance fails and an engineer manually attaches the Elastic IP to a standby instance. To automate this failover, which service should be used?

Question 73easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company wants to protect its S3 bucket data from accidental deletion or overwrite. Which feature should be enabled?

Question 74hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company has a multi-region application with an RDS for MySQL database in us-east-1. They want to minimize downtime if the primary region fails. They set up a cross-region read replica in us-west-2. What additional step is needed for automated failover?

Question 75mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery strategy for a critical application. They need a Recovery Time Objective (RTO) of 15 minutes and a Recovery Point Objective (RPO) of 1 minute. Which AWS database service configuration meets these requirements?

Question 76mediummultiple choice
Review the full routing breakdown →

A company uses an Application Load Balancer (ALB) to distribute traffic to EC2 instances. The ALB is in us-east-1a and us-east-1b. They want to ensure that if one AZ fails, traffic is routed only to healthy instances in the other AZ. What configuration is necessary?

Question 77mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a highly available architecture for a stateless web application using AWS services. Which TWO steps should they take to achieve high availability?

Question 78hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company uses DynamoDB global tables for a multi-region application. They notice that write conflicts are occurring. Which TWO strategies can reduce write conflicts?

Question 79mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery plan for an RDS PostgreSQL database. They have a cross-region read replica. Which THREE steps should they take to ensure a successful failover?

Question 80mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. During a recent failure of one AZ, the application experienced downtime because the Auto Scaling group did not launch new instances quickly enough. What should a DevOps engineer do to improve resilience?

Question 81hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

An application running on Amazon ECS with Fargate experiences intermittent failures. The task definition includes a single container with a health check command. Despite the health check passing, the application occasionally returns HTTP 500 errors. The application logs are sent to CloudWatch Logs. What is the MOST likely root cause?

Question 82easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company wants to ensure its data in Amazon S3 is protected against accidental deletion. The bucket stores critical documents. Which approach provides the HIGHEST level of resilience?

Question 83mediummultiple choice
Read the full NAT/PAT explanation →

A DevOps team uses AWS CodePipeline to deploy a web application. The pipeline has a deploy stage that uses CodeDeploy to deploy to an Auto Scaling group. During deployment, the new instances fail health checks and the deployment rolls back. However, the rollback also fails because the old instances have been terminated. What should the team do to avoid this issue?

Question 84hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a stateful application on EC2 instances with instance store volumes. The application requires low-latency access to data. The operations team needs to ensure that instance failure does not result in data loss. Which solution is MOST resilient?

Question 85easymultiple choice
Read the full DNS explanation →

A company is using Amazon RDS for MySQL with Multi-AZ deployment. During a recent failover, the application experienced a brief downtime because the DNS cache on the application servers still pointed to the old primary. How can a DevOps engineer minimize this downtime?

Question 86mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses AWS Lambda to process messages from an Amazon SQS queue. The Lambda function occasionally times out after 15 seconds. To improve resilience, the team wants to ensure messages are not lost and are retried. Which configuration is MOST appropriate?

Question 87hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a containerized application on Amazon EKS. The application uses an ALB Ingress Controller. During a cluster upgrade, the ingress controller stops responding, causing downtime. The team wants to ensure resilience during upgrades. Which approach is BEST?

Question 88easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses Amazon CloudFront to distribute content from an S3 bucket origin. Some users report intermittent access errors. The DevOps team suspects the origin is overwhelmed. What is the MOST effective way to improve resilience?

Question 89mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery (DR) strategy for a critical application that runs on EC2 instances with an RDS database. The DR site must be in a different AWS Region. The Recovery Point Objective (RPO) is 15 minutes, and Recovery Time Objective (RTO) is 1 hour. Which TWO actions should the company take to meet these objectives? (Choose TWO.)

Question 90hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a microservices architecture on Amazon ECS with Fargate. Services communicate via an internal Application Load Balancer (ALB). The operations team notices that occasional traffic spikes cause increased latency and timeouts. The team wants to improve resilience without over-provisioning. Which THREE steps should be taken? (Choose THREE.)

Question 91easymulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a resilient storage solution for a critical application. The data must be highly available and durable. Which TWO services meet these requirements? (Choose TWO.)

Question 92mediummultiple choice
Read the full NAT/PAT explanation →

A company runs a critical web application on AWS using an Application Load Balancer (ALB) in front of an Auto Scaling group of EC2 instances. The application experiences periodic traffic spikes. To handle these spikes, the company wants to use a combination of proactive scaling based on a predictable schedule and reactive scaling based on CPU utilization. What is the MOST resilient scaling strategy?

Question 93hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a stateful web application on EC2 instances behind an ALB. The application uses sticky sessions (session affinity) to maintain user sessions. During a deployment, the company wants to update the application with zero downtime and ensure that in-flight sessions are not lost. Which deployment strategy should they use?

Question 94easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is designing a multi-region active-active architecture for a stateless web application. The application uses a DynamoDB table as its data store. The company wants to minimize write latency and ensure that writes are accepted in any region with eventual consistency. Which DynamoDB feature should they use?

Question 95hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a microservices architecture on Amazon ECS with Fargate. Each service is deployed in its own ECS service. The company wants to ensure that if one Availability Zone (AZ) fails, the services can continue to operate with minimal impact. What is the MOST resilient task placement strategy?

Question 96mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company has a production RDS for PostgreSQL database. They need to perform a major version upgrade with minimal downtime. Which strategy provides the LEAST downtime while maintaining data integrity?

Question 97easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery strategy for its on-premises database to AWS using AWS Elastic Disaster Recovery (AWS DRS). The recovery time objective (RTO) is 15 minutes, and the recovery point objective (RPO) is 1 minute. Which configuration should they use?

Question 98mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a containerized application on Amazon EKS. They want to ensure that if a node fails, the pods are rescheduled on healthy nodes. Which configuration is necessary?

Question 99hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company has a serverless application using AWS Lambda functions that process messages from an Amazon SQS queue. The Lambda function sometimes fails due to transient errors. The company wants to ensure that failed messages are retried and eventually processed or sent to a dead-letter queue after 3 retries. What is the correct configuration?
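The retry-then-dead-letter behaviour described in this scenario can be sketched as a small simulation. It assumes the SQS redrive rule that a message moves to the dead-letter queue once its receive count exceeds `maxReceiveCount`, where the count includes the first delivery. The function `deliver_until_dlq` and the handler callbacks are hypothetical names used only for illustration.

```python
def deliver_until_dlq(max_receive_count, handler):
    """Redeliver a message until the handler succeeds or the receive limit is passed.

    Returns a tuple of (outcome, number of times the handler was invoked).
    """
    receive_count = 0
    while True:
        receive_count += 1
        if receive_count > max_receive_count:
            # The limit is exceeded, so the message is moved instead of delivered.
            return ("dead-letter queue", receive_count - 1)
        if handler(receive_count):
            return ("processed", receive_count)


# A handler that always fails ends in the dead-letter queue after 3 receives.
print(deliver_until_dlq(3, lambda attempt: False))
# A transient error that clears on the second receive is processed normally.
print(deliver_until_dlq(3, lambda attempt: attempt >= 2))
```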

Question 100easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is building a multi-tier web application on AWS. The web tier runs on EC2 instances behind an ALB. The application tier runs on EC2 instances that are not publicly accessible. The database tier runs on RDS MySQL. Which design provides the HIGHEST level of resilience for the database tier?

Question 101mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a stateful web application on EC2 instances that store session data locally. They want to migrate to a stateless architecture for better resilience. Which TWO actions should they take?

Question 102hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a critical application on AWS that uses an Auto Scaling group of EC2 instances. The application must remain available even if an entire Availability Zone fails. Which THREE actions should the company take?

Question 103easymulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery strategy for its application. The application runs on EC2 instances and uses an RDS MySQL database. The RTO is 1 hour, and the RPO is 15 minutes. Which TWO approaches meet these requirements?

Question 104mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a web application on EC2 instances behind an ALB. To improve resilience, they want to automatically re-register failed instances. Which solution meets this requirement?

Question 105hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses DynamoDB global tables with two regions. They notice that writes in one region are not replicating to the other region after a brief network partition. Which configuration will ensure replication resumes automatically?

Question 106easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company wants to ensure its RDS Multi-AZ deployment automatically fails over to a standby instance in a different Availability Zone. Which additional step is required?

Question 107mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses an ALB with target groups for a microservices architecture. They need to ensure that if a target group has no healthy targets, the ALB returns a custom error page instead of a 503. How can this be achieved?

Question 108hardmultiple choice
Read the full NAT/PAT explanation →

A company runs a stateful application on EC2 that requires sticky sessions. They use an ALB with duration-based stickiness. During a deployment, they want to drain existing connections gracefully before terminating instances. Which step is necessary?

Question 109easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company wants to automate the recovery of an Amazon RDS DB instance in a different region if the primary region becomes unavailable. Which service should they use?

Question 110mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses AWS Lambda to process messages from an SQS queue. They need to ensure that if the Lambda function fails, the message is not lost and can be processed again. Which configuration is required?

Question 111hardmultiple choice
Read the full NAT/PAT explanation →

A company runs a critical application on EC2 instances in an Auto Scaling group behind an ALB. They want to ensure that if an instance fails, the application remains available with minimal disruption. Which combination of services provides the best resilience?

Question 112easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company wants to design a resilient architecture for a web application using AWS services. Which of the following is a best practice for improving resilience?

Question 113mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a multi-region disaster recovery strategy for a stateless web application. They want to minimize RTO and RPO. Which TWO of the following should they implement? (Choose TWO.)

Question 114hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company uses Amazon ECS with Fargate for containerized applications. They need to ensure that if a task fails, it is automatically restarted and the application remains available. Which THREE actions should they take? (Choose THREE.)

Question 115mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a mission-critical database on Amazon RDS for MySQL. They need to ensure that if the primary DB instance fails, the database remains available with minimal downtime. Which TWO configurations should they implement? (Choose TWO.)

Question 116hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

Refer to the exhibit. An IAM policy is attached to a user. A developer tries to upload an object to s3://my-bucket/confidential/report.pdf without specifying server-side encryption. What will happen?

Exhibit

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/confidential/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}
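One way to reason about the exhibit is a toy evaluator for its two statements. This is a sketch under stated assumptions: it models only this identity policy for `s3:PutObject` on `my-bucket` (no bucket policy, SCP, or permissions boundary), and the function names are hypothetical. It relies on the documented IAM behaviour that a negated operator such as `StringNotEquals` evaluates to true when the condition key is absent from the request.

```python
def string_not_equals(request_value, policy_value):
    """Simplified StringNotEquals: a missing request key counts as 'not equal'."""
    return request_value is None or request_value != policy_value


def put_object_decision(key, sse_header):
    """Model only the two statements in the exhibit for s3:PutObject on my-bucket."""
    in_confidential = key.startswith("confidential/")
    # An explicit Deny takes precedence over any Allow.
    if in_confidential and string_not_equals(sse_header, "aws:kms"):
        return "Deny"
    # The first statement allows PutObject on every key in my-bucket.
    return "Allow"


print(put_object_decision("confidential/report.pdf", None))       # Deny
print(put_object_decision("confidential/report.pdf", "aws:kms"))  # Allow
print(put_object_decision("public/readme.txt", None))             # Allow
```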
Question 117mediummultiple choice
Read the full NAT/PAT explanation →

A company runs a high-traffic e-commerce application on EC2 instances in an Auto Scaling group behind an ALB. The application uses an in-memory cache on the EC2 instances. During a recent deployment, the Auto Scaling group terminated an instance that had active user sessions, causing users to lose their cart data and leading to a poor customer experience. The company wants to prevent this in future deployments. They need a solution that allows existing sessions to complete before instance termination, without manual intervention. Which solution should they use?

Question 118hardmultiple choice
Read the full NAT/PAT explanation →

A company runs a critical microservices application on Amazon EKS with multiple services. They use an ingress controller (ALB Ingress Controller) to route traffic to services. They notice that when a pod fails, new requests are still sent to the failed pod for a few seconds, causing errors. The health check interval is set to 5 seconds. They want to minimize the time during which failed pods receive traffic. They also need to ensure that during rolling updates, traffic is not sent to pods that are terminating. Which solution should they implement?

Question 119easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a web application on EC2 instances behind an Application Load Balancer (ALB). The application stores session state in an RDS MySQL database. During a recent spike in traffic, the database CPU utilization reached 100%, causing slow responses. To improve resilience, what should a DevOps engineer do?

Question 120mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps team is designing a highly available multi-tier application on AWS. The application runs on EC2 instances in an Auto Scaling group across two Availability Zones. The team uses an Application Load Balancer (ALB) to distribute traffic. The application requires the ALB to be accessible via a single, static IP address for whitelisting by third-party partners. What is the most resilient solution?

Question 121hardmulti select
Read the full NAT/PAT explanation →

A company has a microservices architecture running on Amazon ECS with Fargate launch type. Each service is deployed in multiple Availability Zones. The services communicate via REST APIs. Recently, a downstream service experienced a partial outage, causing upstream services to time out and leading to cascading failures. The team wants to improve resilience against such failures. Which combination of actions should the DevOps engineer take? (Choose TWO.)

Question 122hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company is running a critical application on Amazon RDS for PostgreSQL with Multi-AZ deployment. The application performs frequent writes. During a recent failover test, the team observed that the application experienced a 30-second write outage. To minimize downtime during automatic failovers, which configuration changes should the DevOps engineer implement? (Choose TWO.)

Question 123easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps engineer is designing a disaster recovery (DR) strategy for a stateless web application running on EC2 instances with an Application Load Balancer. The application stores data in Amazon S3 and uses a DynamoDB table for session data. The primary region is us-east-1 and the DR region is us-west-2. The RTO is 15 minutes and RPO is 1 minute. Which strategy is most cost-effective and meets the requirements?

Question 124mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a containerized application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) and stores data in Amazon Aurora Serverless v2. The application experiences intermittent timeouts during periods of rapid scaling. The DevOps engineer notices that the Aurora database's ACU utilization spikes to 100% during these events. What should the engineer do to improve resilience? (Choose THREE.)

Question 125mediummulti select
Read the full NAT/PAT explanation →

A company uses AWS CloudFormation to deploy infrastructure. The DevOps team wants to ensure that if a stack update fails, the stack automatically rolls back to the previous known good state. The team also wants to receive notifications of the rollback. Which combination of steps should the team take? (Choose THREE.)

Question 126hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps engineer is troubleshooting an issue where an EC2 instance behind an ALB target group is marked as unhealthy. The instance i-0abcd1234efgh5678 is serving traffic but the health check is timing out. The security group for the instance allows inbound HTTP from the ALB's security group. What is the most likely cause?

Network Topology
Refer to the exhibit.

```
$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/1234567890123456
{
  "TargetHealthDescriptions": [
    {
      "Target": {
        "Id": "i-0abcd1234efgh5678",
        "Port": 80
      },
      "HealthCheckPort": "80",
      "TargetHealth": {
        "State": "unhealthy",
        "Reason": "Target.Timeout",
        "Description": "Request timed out"
      }
    },
    {
      "Target": {
        "Id": "i-0abcd1234efgh5679"
      },
      "TargetHealth": {
        "State": "healthy",
        "Description": "Target health check passed"
      }
    }
  ]
}
```
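When triaging output like this, it can help to pull out only the targets that are not healthy, together with the reported reason. The sketch below uses the standard `json` module on a hypothetical sample shaped like `describe-target-health` output; the `SAMPLE` data and the helper name `unhealthy_targets` are assumptions for illustration, not part of the question.

```python
import json

# Hypothetical sample shaped like `aws elbv2 describe-target-health` output.
SAMPLE = """
{
  "TargetHealthDescriptions": [
    {"Target": {"Id": "i-0abcd1234efgh5678", "Port": 80},
     "HealthCheckPort": "80",
     "TargetHealth": {"State": "unhealthy", "Reason": "Target.Timeout",
                      "Description": "Request timed out"}},
    {"Target": {"Id": "i-0abcd1234efgh5679", "Port": 80},
     "HealthCheckPort": "80",
     "TargetHealth": {"State": "healthy"}}
  ]
}
"""


def unhealthy_targets(raw_json):
    """Return (target id, reason) for every target whose state is not 'healthy'."""
    descriptions = json.loads(raw_json)["TargetHealthDescriptions"]
    return [
        (d["Target"]["Id"], d["TargetHealth"].get("Reason", "unknown"))
        for d in descriptions
        if d["TargetHealth"]["State"] != "healthy"
    ]


print(unhealthy_targets(SAMPLE))
```

A reason of `Target.Timeout` means the health check request received no response within the configured timeout, so the network path and the listener on the health check port are natural places to look first.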
Question 127mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps engineer created this IAM policy for a CI/CD pipeline role. The pipeline needs to stop and start production EC2 instances and manage Auto Scaling groups. However, the pipeline fails when trying to stop an instance. What is the most likely reason?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/Environment": "production"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:CreateAutoScalingGroup",
        "autoscaling:DeleteAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}
```
Question 128easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps engineer is reviewing a CloudFormation template for an S3 bucket that stores application logs. The bucket has versioning enabled and a lifecycle rule to expire noncurrent versions after 30 days. The bucket policy allows public read access to all objects. The company's security policy requires that all S3 buckets block public access. Which change should the engineer make to comply?

Exhibit

Refer to the exhibit.

```
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-data-123
      VersioningConfiguration:
        Status: Enabled
      LifecycleConfiguration:
        Rules:
          - Id: ExpireOldVersions
            Status: Enabled
            NoncurrentVersionExpirationInDays: 30
  MyBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref MyBucket
      PolicyDocument:
        Statement:
          - Action: s3:GetObject
            Effect: Allow
            Principal: '*'
            Resource: !Sub arn:aws:s3:::my-app-data-123/*
```
Question 129mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company has a critical application running on Amazon EC2 instances in an Auto Scaling group. The application writes logs to an Amazon EFS file system. The DevOps team needs to ensure that log data is durable and available even if an Availability Zone fails. The EFS file system is currently in one AZ. What should the team do? (Choose TWO.)

Question 130hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company is migrating a monolithic application to a microservices architecture on Amazon EKS. The application uses a relational database. The team wants to ensure that database connections are managed efficiently and that the database can withstand a sudden spike in connections from multiple microservices. Which solutions should the DevOps engineer implement? (Choose THREE.)

Question 131easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a serverless application using AWS Lambda functions behind an Amazon API Gateway. The application processes user uploads stored in an S3 bucket. The Lambda function writes results to a DynamoDB table. Recently, the function started timing out when processing large files. What should the DevOps engineer do to improve resilience for large file processing?

Question 132hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a web application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Application Load Balancer (ALB) and stores session data in an ElastiCache for Redis cluster with cluster mode enabled. During a recent deployment, a new version of the application caused a memory leak in the Redis cluster, leading to out-of-memory errors and evictions. The DevOps team wants to prevent future deployments from affecting the Redis cluster's health. What should the team do? (Choose TWO.)

Question 133mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical application on Amazon RDS for MySQL with Multi-AZ deployment. The database is 2 TB in size. The DevOps team needs to perform a major version upgrade (e.g., MySQL 5.7 to 8.0) with minimal downtime. The RTO is 5 minutes and RPO is 1 minute. Which approach should the team take?

Question 134mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

Your company runs a multi-tier web application on AWS. The web tier consists of EC2 instances behind an Application Load Balancer (ALB) in an Auto Scaling group across three Availability Zones. The application tier runs on a separate Auto Scaling group of EC2 instances that process requests from the web tier. The database tier uses an Amazon RDS for PostgreSQL Multi-AZ deployment. All application servers write logs to Amazon CloudWatch Logs. Recently, the operations team reported that during peak hours, the web tier experiences intermittent 503 errors. The ALB access logs show that the errors occur when the target group's healthy host count drops to zero momentarily. The Auto Scaling group's minimum and desired capacity is 6, with a maximum of 12. The scaling policy is based on average CPU utilization, with a target of 60%. The health check grace period is 300 seconds. The application health check endpoint returns a 200 status when healthy. The DevOps engineer suspects that the scaling policy is too slow to react to traffic spikes. The engineer wants to implement a more proactive scaling approach. Which solution should the engineer implement?

Question 135hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A financial services company runs a critical application on Amazon ECS with Fargate launch type. The application consists of three microservices: Service A (frontend), Service B (processing), and Service C (database access). Services communicate via REST APIs. The application stores data in Amazon Aurora PostgreSQL Serverless v2. The company has a disaster recovery (DR) requirement: RTO of 30 minutes and RPO of 15 minutes. The primary region is us-east-1 and the DR region is us-west-2. The DevOps team has configured cross-region replication for the Aurora database using an Aurora Global Database. The ECS services are deployed with a service-linked role for Fargate. The team wants to automate the failover process to meet the RTO. Which solution should the team implement?

Question 136easymulti select
Read the full NAT/PAT explanation →

A startup runs a stateless web application on AWS Elastic Beanstalk with a single environment. The application uses an Amazon RDS for MySQL database instance. The startup is preparing for a marketing campaign that is expected to increase traffic by 10x. The CTO is concerned about the application's ability to handle the load and wants to ensure high availability and resilience. The current architecture has a single RDS instance (db.t3.medium) and a single Elastic Beanstalk environment with one EC2 instance (t3.medium). The startup has a limited budget but wants to improve resilience without over-provisioning. Which combination of actions should the DevOps engineer recommend? (Choose THREE.)

Question 137hardmulti select
Read the full NAT/PAT explanation →

A company runs a critical application on AWS Lambda functions that process real-time streaming data from Amazon Kinesis Data Streams. Each Lambda function processes a batch of records and writes results to an Amazon DynamoDB table. The application is sensitive to data loss and requires exactly-once processing semantics. Recently, the operations team observed that the Lambda function is failing intermittently with 'ProvisionedThroughputExceededException' errors from DynamoDB. The Lambda function's batch size is 100, and the function is configured with a reserved concurrency of 500. The DynamoDB table has 100 read capacity units (RCUs) and 100 write capacity units (WCUs) with auto scaling enabled up to 1000 WCUs. The function's execution role has the necessary DynamoDB permissions. The Kinesis stream has 10 shards. The DevOps engineer needs to resolve the throttling errors without losing data. Which combination of actions should the engineer take? (Choose THREE.)
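The numbers in this scenario can be put side by side with a rough calculation. The sketch below assumes the default Kinesis event source behaviour of one concurrent Lambda invocation per shard (parallelization factor 1), one DynamoDB write per record, items of at most 1 KB, and each batch being written within about one second. The helper names are hypothetical.

```python
import math


def burst_write_demand(shards, batch_size, parallelization_factor=1):
    """Upper bound on records written at once: one batch per concurrent invocation."""
    return shards * parallelization_factor * batch_size


def wcus_per_item(item_size_bytes):
    """A standard write consumes 1 WCU per 1 KB of item size, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


shards, batch_size = 10, 100   # from the scenario
provisioned_wcu = 100          # from the scenario
item_size = 1024               # assumption: items are at most 1 KB

records = burst_write_demand(shards, batch_size)
demand = records * wcus_per_item(item_size)

print(records)                   # 1000
print(demand)                    # 1000
print(demand > provisioned_wcu)  # True
```

Under these assumptions the burst demand is about ten times the provisioned write capacity, and the stream's shard count, not the reserved concurrency of 500, bounds how many batches run at once.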

Question 138mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a containerized application on Amazon EKS. The application uses an Application Load Balancer (ALB) as the ingress controller. The DevOps team wants to ensure that the application can automatically recover from node failures. The cluster consists of managed node groups across three Availability Zones. The team noticed that when a node fails, the pods on that node are not rescheduled for several minutes. The team wants to reduce the time to reschedule pods. Which configuration change should the team make?

Question 139easymulti select
Read the full Resilient Cloud Solutions explanation →

A company is deploying a critical web application on AWS and needs to ensure high availability and disaster recovery across multiple AWS Regions. The application uses an Application Load Balancer (ALB) in the primary Region and an Amazon RDS Multi-AZ DB instance. Which TWO actions should the company take to meet these requirements? (Choose two.)

Question 140mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a stateful web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application stores session data in local instance memory. To improve resiliency, the company wants to make the application stateless and distribute the load across multiple Availability Zones. Which THREE actions should the company take? (Choose three.)

Question 141 • hard • multiple choice

A company runs a production application on Amazon ECS with Fargate, fronted by an Application Load Balancer (ALB). The application experiences periodic latency spikes and occasional 502 errors. The ECS service is configured with a desired count of 2 tasks, and the ALB health check is set to /health with a 30-second interval and 2 consecutive failures threshold. The team uses CloudWatch Container Insights and has noticed that CPU and memory utilization of tasks remain below 50%. However, the ALB TargetGroup's HealthyHostCount metric occasionally drops to 0 for a few minutes before recovering. The deployment strategy is rolling update with a minimum healthy percent of 50% and maximum percent of 200%. The team recently updated the task definition to increase memory and CPU, but the issue persists. What is the MOST likely cause of the problem?

Question 142 • easy • multiple choice

A company runs a critical web application on EC2 instances behind an Application Load Balancer. To improve resilience, they want to automatically replace unhealthy instances. Which AWS feature should they use?

Question 143 • medium • multiple choice

A company uses Amazon RDS Multi-AZ for disaster recovery. The primary DB instance in us-east-1a fails. What happens next?

Question 144 • hard • multiple choice

A company has a stateless web application on EC2 instances behind an ALB. They want to ensure that if an entire Availability Zone fails, the application remains available with minimal impact. Which architecture best meets this requirement?

Question 145 • medium • multiple choice

A company runs a stateful application on EC2 instances. They want to distribute traffic evenly and maintain session stickiness. Which AWS service should they use?

Question 146 • easy • multiple choice

A company uses AWS CodeDeploy to deploy a new version of an application to EC2 instances. They want to minimize downtime and roll back quickly if the deployment fails. Which deployment type should they use?

Question 147 • hard • multiple choice

A company runs a critical microservices architecture on Amazon ECS with Fargate. They want to ensure that if a task fails, it is automatically restarted, and the service remains available across multiple Availability Zones. How should they configure the ECS service?

Question 148 • medium • multiple choice

A company runs a global web application on EC2 instances behind an ALB in us-east-1. They want to improve resilience by routing users to the nearest healthy region. Which service should they use?

Question 149 • hard • multiple choice

A company uses Amazon DynamoDB with global tables for a multi-region active-active application. They notice that occasionally, concurrent updates to the same item in different regions cause data inconsistency. How can they resolve this?

Question 150 • medium • multiple choice

A company experiences intermittent high latency for a web application running on EC2 behind an ALB. They want to monitor and automatically replace instances that have high CPU. Which solution meets this requirement?

Question 151 • medium • multi select

Which TWO AWS services can be used to distribute incoming traffic across multiple AWS resources in different Availability Zones within a single region?

Question 152 • hard • multi select

Which THREE strategies can improve the resilience of an Amazon RDS for PostgreSQL database?

Question 153 • easy • multi select

Which TWO actions can help ensure that an application running on EC2 instances can survive the loss of an entire Availability Zone?

Question 154 • medium • multiple choice

A company runs a critical web application on EC2 instances behind an Application Load Balancer. The application stores session state in an in-memory cache on each instance. During deployment of a new version, users experience session timeouts and errors. Which design change will MOST effectively improve resilience and avoid session loss during deployments?

Question 155 • hard • multiple choice

A company is designing a multi-Region disaster recovery strategy for a stateless web application. The application runs on EC2 instances in an Auto Scaling group behind an ALB in us-east-1. The recovery point objective (RPO) is 15 minutes and recovery time objective (RTO) is 30 minutes. The application data is stored in Amazon RDS for PostgreSQL. Which combination of actions should the company take to meet the RPO and RTO?

Question 156 • easy • multiple choice

A DevOps engineer is designing a resilient architecture for a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application experiences occasional spikes in traffic that cause Lambda function throttling and increased error rates. What is the MOST effective way to improve resilience and reduce throttling?

Question 157 • medium • multiple choice

A company runs a microservices architecture on Amazon ECS with Fargate. Services communicate via an internal Application Load Balancer. Recently, one service became unavailable due to a memory leak, causing cascading failures in downstream services. What design change would MOST effectively improve resilience and limit the blast radius?

Question 158 • hard • multiple choice

A company runs a critical batch processing workload on Amazon EMR that must complete within a 2-hour window each night. The workload is fault-tolerant but must be resilient to instance failures. Currently, the EMR cluster uses instance fleets with Spot Instances. Recently, Spot Instance interruptions caused the cluster to take over 3 hours to complete. Which change will MOST effectively ensure the workload completes within the 2-hour window despite Spot interruptions?

Question 159 • easy • multiple choice

A company uses Amazon Route 53 for DNS and wants to ensure high availability for a web application hosted on two EC2 instances in different Availability Zones. The application uses an Application Load Balancer. What is the simplest way to achieve resilience if one Availability Zone becomes unavailable?

Question 160 • medium • multiple choice

A company is deploying a stateful application on Amazon EKS. The application requires persistent storage that can be reattached to a new pod if the original pod fails. The cluster spans multiple Availability Zones. Which storage solution provides the BEST resilience and meets these requirements?

Question 161 • hard • multiple choice

A company runs a critical application on AWS Lambda that processes messages from an Amazon SQS queue. The application must be resilient to downstream service failures. The team notices that when the downstream service is unhealthy, messages are repeatedly retried and eventually sent to the dead-letter queue (DLQ) before the service recovers. What design change would improve resilience by allowing automatic retries after the downstream service recovers?

Question 162 • easy • multiple choice

A company is designing a resilient architecture for a web application using AWS Global Accelerator and two Application Load Balancers in different AWS Regions. The application is stateless and uses a global DynamoDB table for data. What is the primary benefit of using Global Accelerator in this architecture?

Question 163 • medium • multi select

A company runs a containerized application on Amazon ECS with Fargate. The application experiences intermittent failures due to resource exhaustion. The company wants to improve resilience by automatically replacing unhealthy tasks and scaling based on demand. Which TWO actions should the company take? (Choose TWO.)

Question 164 • hard • multi select

A company is designing a disaster recovery plan for a MySQL database running on Amazon RDS. The database is critical and must have an RPO of 5 minutes and an RTO of 1 hour. The primary Region is us-east-1, and the DR Region is us-west-2. Which TWO steps should the company take to meet these requirements? (Choose TWO.)

Question 165 • medium • multi select

A company is deploying a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must be resilient to regional outages. Which THREE steps should the company take to achieve multi-Region resilience? (Choose THREE.)

Question 166 • medium • multiple choice

A DevOps engineer runs the command shown in the exhibit and sees the output shown. What is the MOST likely cause of the stack creation failure?

Exhibit

Refer to the exhibit.

```
$ aws cloudformation describe-stack-events --stack-name MyStack
{
  "StackEvents": [
    {
      "EventId": "1",
      "ResourceStatus": "CREATE_FAILED",
      "ResourceType": "AWS::AutoScaling::AutoScalingGroup",
      "Timestamp": "2023-01-15T10:00:00Z"
    }
  ]
}
```
Question 167 • hard • multiple choice

A DevOps engineer applies this S3 bucket policy to an S3 bucket. What is the effect of this policy?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```
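
The two condition operators in this policy treat a missing header differently, which can be sketched in a few lines. This is a deliberately simplified model, assuming only the two statements in the exhibit apply and ignoring principals, other policies, and bucket default encryption; it is not the IAM evaluation engine, and the function names are hypothetical.

```python
from typing import Optional


def string_equals(actual: Optional[str], expected: str) -> bool:
    # StringEquals: if the key is absent from the request, the condition is false.
    return actual is not None and actual == expected


def string_not_equals(actual: Optional[str], expected: str) -> bool:
    # StringNotEquals: if the key is absent from the request, the condition is true.
    return actual is None or actual != expected


def evaluate_put_object(sse_header: Optional[str]) -> str:
    """Model the exhibit's two statements for one PutObject request.

    `sse_header` is the x-amz-server-side-encryption value, or None
    when the caller did not send the header.
    """
    if string_not_equals(sse_header, "AES256"):
        return "Deny"  # an explicit deny always overrides an allow
    if string_equals(sse_header, "AES256"):
        return "Allow"
    return "ImplicitDeny"


for header in ("AES256", "aws:kms", None):
    print(header, "->", evaluate_put_object(header))
```

In this model, only a request that explicitly sends `AES256` reaches the Allow statement; a different value or a missing header matches the Deny statement.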
Question 168 • medium • multiple choice

A DevOps engineer runs the command shown in the exhibit and sees that instance i-0abcd1234efgh5678 is unhealthy with reason 'Target.Timeout'. The instance is running, and the application on port 80 responds to curl from the instance itself. What is the MOST likely cause?

Exhibit

Refer to the exhibit.

```
$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/1234567890123456
{
  "TargetHealthDescriptions": [
    {
      "Target": {
        "Id": "i-0abcd1234efgh5678",
        "Port": 80
      },
      "HealthCheckPort": "80",
      "TargetHealth": {
        "State": "unhealthy",
        "Reason": "Target.Timeout",
        "Description": "Request timed out"
      }
    },
    {
      "Target": {
        "Id": "i-0abcd1234efgh5679"
      },
      "TargetHealth": {
        "State": "healthy"
      }
    }
  ]
}
```
Question 169 • medium • multiple choice

A company is running a stateful web application on EC2 instances behind an Application Load Balancer. During a deployment, users report session timeouts. What should the DevOps engineer implement to ensure zero-downtime deployments without losing in-flight sessions?

Question 170 • hard • multiple choice

A financial services company runs a multi-region application on AWS. They need to ensure that if one AWS Region becomes unavailable, traffic is automatically rerouted to another region with no manual intervention. The application uses an Application Load Balancer in each region. What is the MOST resilient approach to meet this requirement?

Question 171 • easy • multiple choice

A DevOps engineer is designing a disaster recovery plan for a critical database. The RTO is 15 minutes and RPO is 1 minute. Which solution meets these requirements?

Question 172 • medium • multiple choice

An e-commerce platform uses Amazon DynamoDB as its primary database. The platform experiences occasional read throttling during flash sales. The operations team needs to ensure that read traffic is handled without errors, while keeping costs low. What should a DevOps engineer recommend?

Question 173 • hard • multiple choice

A company runs a containerized application on Amazon ECS with Fargate launch type. The application experiences intermittent failures when the ECS service scheduler attempts to place tasks during a deployment. The DevOps engineer notices that tasks fail to start due to insufficient IP addresses in the VPC subnets. What is the MOST resilient solution to prevent this issue?
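
The IP pressure described here can be estimated with a short calculation. The sketch below assumes each Fargate task takes one private IP from its subnet, that AWS reserves five addresses in every subnet, and that a rolling deployment may briefly run up to `maximumPercent` of the desired count. The CIDR ranges and task counts are hypothetical examples, not values from the question.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Addresses in a subnet that can be assigned to network interfaces."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AWS_RESERVED_PER_SUBNET, 0)


def peak_task_ips(desired_count: int, maximum_percent: int = 200) -> int:
    """Upper bound on tasks (one IP each) during a rolling deployment.

    ECS rounds the maximumPercent limit down to the nearest integer.
    """
    return desired_count * maximum_percent // 100


# Hypothetical example: 20 desired tasks placed in a single /27 subnet.
print(usable_ips("10.0.1.0/27"), "usable IPs vs", peak_task_ips(20), "needed at peak")
```

A check like this only bounds task demand; other interfaces in the same subnets (load balancer nodes, VPC endpoints) also consume addresses.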

Question 174 • easy • multiple choice

A DevOps engineer is designing a highly available web application using Amazon Route 53. The application is deployed in two AWS Regions. The engineer wants to route traffic to the nearest healthy endpoint. Which routing policy should be used?

Question 175 • medium • multiple choice

A company uses AWS CloudFormation to deploy infrastructure. They want to ensure that if a stack update fails, the stack is automatically rolled back to the last known good state. However, they also want to preserve any resources that were created successfully before the failure. Which CloudFormation stack policy should be used?

Question 176 • hard • multiple choice

An organization runs a critical application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer. The application requires that all traffic be encrypted in transit. The security team mandates the use of TLS 1.2 or higher and specific ciphers. What is the MOST efficient way to enforce this requirement?

Question 177 • easy • multiple choice

A company wants to automatically recover an Amazon RDS DB instance if the underlying hardware fails. Which feature should the DevOps engineer enable?

Question 178 • medium • multi select

Which TWO strategies can be used to improve the resilience of an application running on Amazon ECS with Fargate? (Select TWO.)

Question 179 • hard • multi select

Which THREE components are required to implement a global application that can withstand the failure of an entire AWS Region? (Select THREE.)

Question 180 • medium • multi select

Which TWO actions can help protect against accidental deletion of an Amazon S3 bucket? (Select TWO.)

Question 181 • hard • multiple choice

A company runs a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application frequently experiences high latency during peak hours. The DevOps team needs to implement a solution that automatically adds capacity based on demand and reduces cost during off-peak hours. Which combination of AWS services should the team use?

Question 182 • medium • multiple choice

A company's DevOps team is designing a multi-region disaster recovery solution for a stateless web application. The application runs on Amazon EC2 instances behind an Application Load Balancer (ALB) in the us-east-1 region. The team needs to fail over to a secondary region (us-west-2) with minimal downtime in case of a regional outage. Which AWS service should the team use to route traffic to the healthy region?

Question 183 • easy • multiple choice

A company is running a production database on Amazon RDS for PostgreSQL with Multi-AZ deployment. The database experiences a failover due to an AZ outage. What happens to the existing database connections during the failover?

Question 184 • medium • multiple choice

A company runs a critical application on Amazon ECS with Fargate launch type. The application is deployed across multiple Availability Zones. The DevOps team needs to ensure that if an entire Availability Zone fails, the application continues to serve traffic without manual intervention. What should the team do?

Question 185 • hard • multiple choice

A company uses AWS Lambda functions to process events from an Amazon SQS queue. The Lambda function occasionally fails due to a transient downstream service error. The DevOps team wants to ensure that failed messages are not lost and can be retried later. The team also wants to reduce the number of invocations on the downstream service. Which configuration should the team use?

Question 186 • easy • multiple choice

A company is designing a highly available architecture for a web application. The application runs on Amazon EC2 instances in an Auto Scaling group across three Availability Zones. The instances are behind an Application Load Balancer (ALB). Which additional step should the team take to ensure that traffic is evenly distributed across all healthy instances in all Availability Zones?

Question 187 • medium • multiple choice

A company is running a stateful web application on Amazon EC2 instances. The application stores session data locally on the instance. The company wants to make the application stateless and improve resilience. The DevOps team decides to use Amazon ElastiCache for Redis to store session data. What additional step should the team take to ensure that the session data is highly available?

Question 188 • hard • multiple choice

A company runs a microservices application on Amazon EKS. The application's frontend service needs to communicate with the backend service. The DevOps team wants to implement service-to-service authentication using AWS IAM. Which method should the team use?

Question 189 • easy • multiple choice

A company has an Amazon S3 bucket that stores critical data. The company wants to protect the data from accidental deletion and ensure that even the root user cannot delete the bucket. Which S3 feature should the company enable?

Question 190 • hard • multi select

A company runs a critical application on Amazon EC2 instances in an Auto Scaling group. The application generates logs that are sent to Amazon CloudWatch Logs. The DevOps team needs to configure a metric filter to monitor for error patterns and trigger an alarm when the error rate exceeds 5% of total requests over a 5-minute period. Which TWO steps should the team take? (Choose TWO.)
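
The 5% condition is a ratio of two counts over the same 5-minute period, so it needs both an error count and a total request count. A minimal sketch of that arithmetic follows; the function names are hypothetical, and in CloudWatch the same ratio would typically be expressed with metric math over two metrics rather than in application code.

```python
def error_rate_percent(error_count: int, request_count: int) -> float:
    """Errors as a percentage of total requests in one evaluation period."""
    if request_count == 0:
        return 0.0  # no traffic in the period, so there is no error rate to report
    return 100.0 * error_count / request_count


def should_alarm(error_count: int,
                 request_count: int,
                 threshold_percent: float = 5.0) -> bool:
    # "Exceeds 5%" is a strict comparison: exactly 5% does not trigger.
    return error_rate_percent(error_count, request_count) > threshold_percent


# Hypothetical 5-minute samples: (errors, total requests).
for errors, total in [(6, 100), (5, 100), (0, 0)]:
    print(errors, total, should_alarm(errors, total))
```

Treating a period with zero requests as a 0% error rate is a design choice made for this sketch; an alarm could equally be configured to treat missing data differently.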

Question 191 • medium • multi select

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application is expected to have unpredictable traffic patterns. The DevOps team needs to ensure that the application can handle sudden spikes in traffic without throttling. Which TWO actions should the team take? (Choose TWO.)

Question 192 • easy • multi select

A company is deploying a web application on Amazon ECS with Fargate. The application consists of a frontend service and a backend service. The DevOps team needs to ensure that the frontend service can communicate with the backend service securely without exposing the backend to the internet. Which THREE steps should the team take? (Choose THREE.)

Question 193 • hard • multiple choice

An AWS account owner (Account A) owns an S3 bucket named my-bucket. The bucket policy shown in the exhibit is attached to the bucket. A user from Account B attempts to upload an object to the bucket without specifying the x-amz-acl header. What will happen?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}
```
Question 194 • medium • multiple choice

A DevOps engineer runs the command shown in the exhibit and sees that one target is unhealthy with reason 'Target.Timeout'. The target is an EC2 instance running a web server on port 80. The security group for the instance allows inbound traffic on port 80 from the ALB's security group. What is the MOST likely cause of the health check failure?

Exhibit

Refer to the exhibit.

```
$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/1234567890123456
{
  "TargetHealthDescriptions": [
    {
      "Target": {
        "Id": "i-0abcd1234efgh5678",
        "Port": 80
      },
      "HealthCheckPort": "80",
      "TargetHealth": {
        "State": "unhealthy",
        "Reason": "Target.Timeout",
        "Description": "Request timed out"
      }
    },
    {
      "Target": {
        "Id": "i-0abcd1234efgh5679"
      },
      "TargetHealth": {
        "State": "healthy",
        "Description": "Target registration is healthy"
      }
    }
  ]
}
```
Question 195 • easy • multiple choice

A DevOps team uses the CloudFormation template shown in the exhibit to create an S3 bucket. What does the bucket policy accomplish?

Exhibit

Refer to the exhibit.

```
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled
      BucketName: my-critical-bucket
  MyBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref MyBucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Deny
            Principal: '*'
            Action: 's3:*'
            Resource:
              - !Sub '${MyBucket.Arn}/*'
            Condition:
              Bool:
                'aws:SecureTransport': false
```
Question 196 • easy • multiple choice

A company runs a web application on EC2 instances behind an ALB. To improve resilience, they want to automatically replace failed instances and maintain a minimum number of instances. Which AWS service should be used?

Question 197 • medium • multiple choice

A company's production database on Amazon RDS Multi-AZ DB instance experienced a failover. The application experienced a brief outage. How can the company reduce the failover time?

Question 198 • hard • multiple choice

A company uses AWS Lambda functions to process events from Amazon SQS. The Lambda function sometimes fails due to timeouts. The team wants to preserve the event for reprocessing. How should they configure the integration?

Question 199 • easy • multiple choice

A company wants to design a disaster recovery solution for its primary AWS Region. The solution should have a Recovery Point Objective (RPO) of a few seconds and a Recovery Time Objective (RTO) of a few minutes. Which strategy meets these requirements?

Question 200 • medium • multiple choice

An application on EC2 instances in an Auto Scaling group uses an ALB. The ALB health checks are failing for some instances, but the instances are healthy from the OS perspective. What is the most likely cause?

Question 201 • hard • multiple choice

A company is building a global application that requires low-latency access to static content across multiple AWS Regions. The content changes infrequently. Which solution is MOST resilient and cost-effective?

Question 202 • easy • multiple choice

A company wants to ensure its Amazon RDS DB instance is highly available with automatic failover in case of an AZ failure. Which configuration should they use?

Question 203 • medium • multiple choice

A company runs a stateful application on EC2 instances. The application stores session data locally. The instances are behind an ALB with sticky sessions enabled. A scaling event terminates an instance, causing loss of session data. How can the company prevent this while maintaining performance?

Question 204 • hard • multiple choice

A company's application on Amazon ECS experiences intermittent failures when the task attempts to access an S3 bucket. The task role has the correct S3 permissions. What is the most likely cause?

Question 205 • medium • multi select

A company is designing a resilient architecture for a critical application. Which TWO strategies improve resilience?

Question 206 • hard • multi select

A company runs a microservices architecture on Amazon ECS. They want to ensure that if a service fails, it does not cascade to other services. Which TWO design patterns should they implement?

Question 207 • easy • multi select

A company wants to protect its application from DDoS attacks. Which THREE AWS services should they use?

Question 208 • easy • multiple choice

A company is deploying a critical application on Amazon EC2 instances behind an Application Load Balancer (ALB) across multiple Availability Zones. The application must be resilient to the failure of an entire Availability Zone. Which design should the company implement?

Question 209 • medium • multiple choice

A DevOps team is designing a disaster recovery solution for an Amazon RDS for MySQL database. The primary database is in us-east-1, and the recovery point objective (RPO) is 5 minutes, recovery time objective (RTO) is 1 hour. Which solution meets these requirements?

Question 210 • hard • multiple choice

A company runs a stateless web application on Amazon ECS with Fargate launch type. The application experiences intermittent traffic spikes. The company wants to ensure that the application can scale automatically and remain resilient to underlying infrastructure failures. Which combination of actions should the DevOps engineer take?

Question 211 • easy • multiple choice

A company is using Amazon S3 to store critical data. The company requires that all versions of objects be retained, including deleted objects, to meet compliance requirements. Which S3 feature should be enabled?

Question 212 • medium • multiple choice

A company has a production environment that uses Amazon Route 53 for DNS and an Application Load Balancer (ALB) to distribute traffic to EC2 instances. The company wants to implement a disaster recovery plan that automatically fails over to a secondary region in case the primary region becomes unavailable. Which configuration should be used?

Question 213 • hard • multiple choice

A company runs a critical microservice on Amazon ECS with AWS Fargate. The service must be highly available across multiple Availability Zones. The DevOps engineer configured the service with a desired count of 4 tasks spread across 2 Availability Zones. During a deployment, a new task fails to start due to a missing environment variable. The deployment fails, but the old tasks continue to run. What is the most likely cause of the deployment failure and how can the engineer ensure future deployments are resilient?

Question 214 • easy • multiple choice

A company uses Amazon DynamoDB as the database for a mobile application. The application requires single-digit millisecond read and write latency and must be resilient to the failure of an entire AWS Region. Which DynamoDB feature should the company use?

Question 215 • medium • multiple choice

A company runs a web application on AWS that uses Amazon SQS to decouple the frontend from the backend processing. The application experiences sudden spikes in traffic, causing the SQS queue to accumulate a large number of messages. The backend workers are unable to process messages fast enough, leading to increased latency. What solution can the company implement to improve the resilience and scalability of the backend?

Question 216 • hard • multiple choice

A company is implementing a disaster recovery strategy for its Amazon Aurora MySQL database. The primary database is in us-west-2. The company requires an RPO of less than 1 minute and an RTO of less than 5 minutes. Which solution meets these requirements?

Question 217 • easy • multi select

A company is designing a highly available architecture for a web application that uses Amazon EC2 instances. The application must be resilient to the failure of a single instance and a single Availability Zone. Which TWO actions should the company take? (Choose TWO.)

Question 218 • medium • multi select

A company is using AWS CloudFormation to deploy a critical application stack. The company wants to ensure that the stack can be recovered quickly in case of a failure. Which THREE strategies should the company implement? (Choose THREE.)

Question 219 • hard • multi select

A company is designing a disaster recovery plan for an Amazon S3 data lake. The data lake stores sensitive data that must be replicated to a secondary Region with an RPO of 15 minutes. Which THREE actions should the company take? (Choose THREE.)

Question 220 • easy • multiple choice

Refer to the exhibit. A DevOps engineer applies the bucket policy shown to an S3 bucket to enforce server-side encryption. However, users report that some uploads succeed without encryption. What is the MOST likely reason?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```
Question 221 • medium • multiple choice

Refer to the exhibit. A DevOps engineer applies the bucket policy shown to an S3 bucket to enforce server-side encryption. However, users report that some uploads succeed without encryption. What is the MOST likely reason?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
```
Question 222hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

Refer to the exhibit. A DevOps engineer runs the describe-target-health command and receives the output shown. The ALB target group has two instances. One instance is healthy, and the other is unhealthy with a 502 error. What is the most likely cause of the 502 error?

Exhibit

```
$ aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/1234567890123456

"TargetHealthDescriptions": [
  {
    "Target": {
      "Id": "i-0abcdef1234567890",
      "Port": 80
    },
    "HealthCheckPort": "80",
    "TargetHealth": {
      "State": "healthy",
      "Description": "Target is healthy"
    }
  },
  {
    "Target": {
      "Id": "i-1234567890abcdef0",
      ...
    },
    "TargetHealth": {
      "State": "unhealthy",
      ...
```

Question 223easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a stateless web application on EC2 instances behind an Application Load Balancer. To improve resilience, which configuration should be used for the EC2 instances?

Question 224mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery strategy for a critical application that requires a Recovery Time Objective (RTO) of 15 minutes and a Recovery Point Objective (RPO) of 1 hour. The application runs on EC2 with data stored in Amazon RDS Multi-AZ. Which approach meets these requirements?

Question 225hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical application on EC2 instances in an Auto Scaling group across three Availability Zones. The application uses an Amazon RDS Multi-AZ DB instance. During a recent incident, one Availability Zone experienced a complete failure. The application remained available, but performance degraded significantly. What is the most likely cause of the degradation?

Question 226easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company wants to ensure that its application can recover from an Amazon S3 service disruption. The application reads and writes data to S3. Which strategy should the application implement to achieve resilience?

Question 227mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company is designing a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must tolerate a Regional failure. Which design provides the most resilience?

Question 228hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a stateful web application on EC2 instances in an Auto Scaling group. The application uses an Application Load Balancer (ALB) and an Amazon ElastiCache Redis cluster. Users report that after a scaling event, they are logged out and lose session data. What is the most likely cause?

Question 229easymultiple choice
Review the full routing breakdown →

A company uses Amazon Route 53 to route traffic to an Application Load Balancer. They want to improve availability by routing traffic to multiple ALBs in different AWS Regions. Which routing policy should they use?

Question 230mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company's application runs on Amazon ECS with Fargate launch type. The application must be resilient to an Availability Zone failure. Which configuration should be used?

Question 231hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical application on EC2 instances behind an Application Load Balancer. The application uses an Amazon RDS for PostgreSQL Multi-AZ DB instance. During a recent failover test, the application experienced a 5-minute downtime. The RDS failover completed within 30 seconds. What is the most likely cause of the prolonged downtime?

Question 232easymulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a highly available architecture for a web application using AWS. Which TWO of the following design principles should be applied? (Select TWO.)

Question 233mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a disaster recovery plan for an application running on AWS. The plan must meet an RTO of 1 hour and an RPO of 15 minutes. Which TWO strategies can achieve these objectives? (Select TWO.)
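
As a reading aid, the comparison behind any RTO/RPO question can be sketched as a simple check: a strategy qualifies only if both its expected recovery time and its expected data loss fall within the stated limits. This is a minimal sketch; `Objectives`, `meets_objectives`, and the two candidate estimates are hypothetical values chosen for illustration, not figures published by AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objectives:
    rto_minutes: float  # time needed to restore service
    rpo_minutes: float  # maximum window of data that may be lost


def meets_objectives(expected: Objectives, required: Objectives) -> bool:
    """True when both expected values are within the required limits."""
    return (
        expected.rto_minutes <= required.rto_minutes
        and expected.rpo_minutes <= required.rpo_minutes
    )


# Requirements from the question: RTO of 1 hour, RPO of 15 minutes.
required = Objectives(rto_minutes=60, rpo_minutes=15)

# Hypothetical estimates (assumptions for illustration only).
candidate_a = Objectives(rto_minutes=45, rpo_minutes=5)
candidate_b = Objectives(rto_minutes=240, rpo_minutes=1440)

print(meets_objectives(candidate_a, required))
print(meets_objectives(candidate_b, required))
```

A strategy that satisfies only one of the two limits still fails the check, which is why both numbers in the question matter.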

Question 234hardmulti select

A company is migrating a monolithic application to a microservices architecture on AWS. To improve resilience, which THREE design patterns should be implemented? (Select THREE.)

Question 235hardmultiple choice

A company runs a critical web application on EC2 instances in an Auto Scaling group. The application uses an Application Load Balancer (ALB) with health checks pointing to /health. Recently, the application experienced intermittent failures where the ALB would mark instances as unhealthy and route traffic away, causing a reduction in capacity. The development team noticed that the /health endpoint occasionally returns HTTP 503 when the application is under heavy load, but the application can recover quickly. The team wants to avoid unnecessary instance replacements while ensuring availability. Which solution should the DevOps engineer implement?
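
For context, a load balancer target is marked unhealthy only after a configured number of consecutive failed health checks. The sketch below models that counting rule to show how occasional failures interact with the threshold. `marked_unhealthy` and the sample results are hypothetical and only illustrate the rule; they are not taken from the scenario.

```python
def marked_unhealthy(results: list, unhealthy_threshold: int) -> bool:
    """Return True if the results contain enough consecutive failures.

    `results` holds one boolean per health check (True means the check passed).
    """
    consecutive_failures = 0
    for passed in results:
        # A passing check resets the streak; a failing one extends it.
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= unhealthy_threshold:
            return True
    return False


# Hypothetical sequence: brief bursts of 503 responses under load.
checks = [True, False, True, False, False, True]

print(marked_unhealthy(checks, unhealthy_threshold=2))
print(marked_unhealthy(checks, unhealthy_threshold=3))
```

With the same sequence of checks, a lower threshold flags the target while a higher one tolerates the short burst of failures.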

Question 236mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company has deployed a multi-tier application on AWS. The web tier uses an Auto Scaling group of EC2 instances behind an Application Load Balancer. The application tier uses another Auto Scaling group of EC2 instances that process messages from an Amazon SQS queue. The database tier uses Amazon RDS Multi-AZ. Recently, the application experienced a complete outage when the SQS queue became overwhelmed with messages due to a sudden spike in traffic. The application tier could not process messages fast enough, causing the queue to grow indefinitely and eventually exceed the visibility timeout, leading to message loss and degraded performance. The DevOps engineer needs to improve the resilience of the architecture to handle traffic spikes without losing messages. Which solution should be implemented?

Question 237mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical e-commerce application on Amazon EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. The application must be resilient to an Availability Zone (AZ) failure. What is the MOST resilient configuration?

Question 238hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses AWS Lambda with Amazon DynamoDB to process orders. During peak hours, the Lambda function sometimes fails with throttling errors from DynamoDB. The system must be resilient and cost-effective. What should a DevOps engineer do?

Question 239easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A DevOps team is designing a disaster recovery plan for an RDS MySQL database. The database must be recoverable with minimal data loss in case of a regional failure. Which solution provides the LOWEST Recovery Point Objective (RPO)?

Question 240mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a stateless web application on Amazon ECS with Fargate. The application must be highly available across multiple Availability Zones. What is the BEST way to achieve this?

Question 241hardmultiple choice

A company's application runs on Amazon EC2 instances in an Auto Scaling group. The application writes logs to local instance storage. The operations team needs to ensure logs are not lost during instance termination or scaling events. What should be done?

Question 242easymultiple choice
Read the full DNS explanation →

A company uses Amazon Route 53 for DNS. They want to ensure that if their primary website endpoint fails, traffic is automatically routed to a secondary endpoint in a different Region. Which routing policy should be used?

Question 243mediummultiple choice

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must be resilient to sudden spikes in traffic without manual intervention. Which combination of services should be used?

Question 244mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company is designing a resilient architecture for a web application that uses Amazon RDS for MySQL. The application must be able to withstand the loss of an entire AWS Region. Which TWO actions should the company take?

Question 245hardmulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a containerized application on Amazon EKS. The application must be highly available across multiple Availability Zones and must automatically recover from node failures. Which THREE steps should be taken?

Question 246easymulti select
Read the full Resilient Cloud Solutions explanation →

A company wants to ensure that its Amazon S3 bucket is resilient to accidental deletion of objects. Which TWO actions should be taken?

Question 247mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

Refer to the exhibit. An IAM policy is attached to an IAM role used by an EC2 instance. The instance is part of an Auto Scaling group. During a scale-in event, the instance fails to stop itself. What is the MOST likely cause?

Exhibit

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": "*"
    }
  ]
}
```

Question 248hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a high-traffic web application on a fleet of EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. The application uses an Amazon RDS for PostgreSQL database. Recently, during a traffic spike, the application became unresponsive. Investigation revealed that the database CPU utilization reached 100%, causing queries to time out. The Auto Scaling group added more EC2 instances, which only increased the load on the database. The DevOps team needs to implement a solution that prevents the database from being overwhelmed during traffic spikes while maintaining application availability. The solution must be cost-effective and require minimal changes to the application code. Which solution should the DevOps team implement?

Question 249mediummultiple choice
Read the full Resilient Cloud Solutions explanation →

A company hosts a static website on Amazon S3 with a CloudFront distribution. The website is critical for business operations and must be available even if the primary AWS Region fails. Currently, the S3 bucket is in us-east-1, and CloudFront uses that bucket as the origin. The company has a secondary bucket in us-west-2 with a replica of the data. The company wants to use CloudFront to automatically fail over to the secondary bucket if the primary becomes unavailable. The DevOps engineer needs to implement a solution that requires minimal operational overhead. What should the engineer do?

Question 250easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company uses AWS CloudFormation to deploy infrastructure. During a recent deployment, the stack failed to create an Amazon RDS DB instance because of a parameter validation error. The DevOps engineer fixed the parameter and wants to resume the stack creation without recreating the resources that were already successfully created. The stack template is parameterized and uses nested stacks. What is the MOST efficient way to resume the stack creation?

Question 251mediummultiple choice

A company runs a microservices application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) to route traffic to services. Each service has a required number of tasks for capacity. The company recently experienced a prolonged outage when a bug caused all tasks of the critical 'payment' service to crash simultaneously. The DevOps team needs to implement a deployment strategy that reduces the risk of a full service outage during updates. The strategy must also allow for quick rollback if a deployment fails. Which deployment strategy should the team implement?

Question 252mediummultiple choice

A media company runs a video processing pipeline on AWS. Raw videos are uploaded to an S3 bucket, which triggers a Lambda function to start an AWS Batch job for transcoding. The Batch job reads the source video from S3, processes it, and writes the output to another S3 bucket. Recently, the company has seen an increase in processing failures. Investigation shows that the Batch jobs are being terminated with a 'TIMEOUT' status after running for exactly 30 minutes. The video files are large, and some jobs legitimately take up to 45 minutes. The Batch job definition has a 'timeout' setting configured. Which action should be taken to resolve this issue?
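
The numbers in this scenario can be checked with a little arithmetic. AWS Batch expresses a job timeout in seconds (the `attemptDurationSeconds` field of the job definition's `timeout` setting), so the sketch below converts the runtimes from the question into seconds and compares them. The helper names are hypothetical and exist only for this illustration.

```python
def minutes_to_seconds(minutes: float) -> int:
    """Convert a duration in minutes to whole seconds."""
    return int(minutes * 60)


def timeout_covers_runtime(timeout_seconds: int, longest_runtime_minutes: float) -> bool:
    """True when the timeout is at least as long as the longest expected run."""
    return timeout_seconds >= minutes_to_seconds(longest_runtime_minutes)


# Values taken from the question: jobs stop at exactly 30 minutes,
# while legitimate jobs can take up to 45 minutes.
current_timeout_seconds = minutes_to_seconds(30)
longest_runtime_minutes = 45

print(current_timeout_seconds)
print(minutes_to_seconds(longest_runtime_minutes))
print(timeout_covers_runtime(current_timeout_seconds, longest_runtime_minutes))
```

The comparison makes the gap explicit: a limit of 1800 seconds cannot accommodate a job that needs 2700 seconds.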

Question 253hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A financial services company runs a critical application on Amazon ECS with Fargate launch type. The application has strict availability requirements and must survive an Availability Zone failure. The ECS service is configured with a desired count of 4 tasks, spread across two Availability Zones using a spread strategy. The service is fronted by an Application Load Balancer. During a recent AZ outage, one AZ became completely unavailable, but the application continued to serve traffic. However, after the AZ recovered, the ECS service did not automatically place new tasks in the recovered AZ to restore the desired count. The service remains with only 2 tasks in the remaining AZ. What is the most likely cause and solution?

Question 254easymultiple choice

A startup runs a web application on EC2 instances behind an Application Load Balancer. They want to improve resilience by distributing instances across multiple Availability Zones. Currently, all instances are in us-east-1a. They create a launch template and an Auto Scaling group with a desired capacity of 2. They configure the Auto Scaling group to use two subnets: one in us-east-1a and one in us-east-1b. However, after updating, all instances remain in us-east-1a. What is the most likely reason?

Question 255mediummulti select
Read the full Resilient Cloud Solutions explanation →

A company runs a multi-tier web application on AWS. The application consists of an Application Load Balancer, EC2 instances in an Auto Scaling group, and an Amazon RDS Multi-AZ DB instance. The application experiences intermittent failures when the RDS primary instance fails over to the standby. The engineer needs to ensure that the application handles failover gracefully without manual intervention.

Which TWO actions should the DevOps engineer take to improve the resilience of the architecture? (Choose two.)
Question 256hardmulti select
Read the full Resilient Cloud Solutions explanation →

A gaming company runs a real-time multiplayer game on AWS using Amazon EC2 instances in an Auto Scaling group behind a Network Load Balancer. The game state is stored in Amazon ElastiCache for Redis. The team needs to ensure that the architecture can survive a regional failure with minimal data loss and recovery time. The RTO is 15 minutes and RPO is 5 minutes. The game currently uses a single Redis cluster in us-east-1.

Which THREE steps should the team take to meet these requirements? (Choose three.)
Question 257easymultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a static website on Amazon S3 with public read access. The website content is stored in an S3 bucket and served through an Amazon CloudFront distribution for better performance and security. Recently, the company noticed that some users are accessing the S3 bucket directly via the S3 endpoint, bypassing CloudFront. This increases costs and exposes the bucket to potential attacks. The company wants to ensure that all access to the website goes through CloudFront only. Which solution should the company implement?

Question 258hardmultiple choice
Read the full Resilient Cloud Solutions explanation →

A company runs a critical application on Amazon ECS with the Fargate launch type. The application is deployed across three Availability Zones. Each service has its own Application Load Balancer. The company wants to implement a blue/green deployment strategy to reduce risk. They currently use AWS CodeDeploy for ECS deployments. During a recent deployment, the company noticed that the new version (green) was not receiving any traffic even after passing all health checks. The CodeDeploy configuration uses a 'Linear10PercentEvery3Minutes' traffic shifting configuration. What is the most likely reason that the green tasks are not receiving traffic?
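
For background, the schedule implied by the name `Linear10PercentEvery3Minutes` can be sketched as follows: traffic moves to the replacement task set in equal 10 percent increments, three minutes apart, until all of it has shifted. This is a minimal sketch that assumes the first increment is applied at minute 0; `linear_shift_schedule` is a hypothetical helper, not part of any AWS SDK.

```python
def linear_shift_schedule(step_percent: int, interval_minutes: int) -> list:
    """Return (minute, percent_on_green) pairs for a linear traffic shift.

    Assumption: the first increment is applied at minute 0.
    """
    schedule = []
    percent = 0
    minute = 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


schedule = linear_shift_schedule(step_percent=10, interval_minutes=3)
for minute, percent in schedule:
    print(f"minute {minute:2d}: {percent:3d}% of traffic on green")
```

Under this configuration the green tasks would be expected to receive some traffic from the first step onward, so a green task set with no traffic at all points to something other than the shifting schedule.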

Question 259mediummultiple choice

A company runs a serverless application using AWS Lambda functions that process messages from an Amazon SQS queue. The function scales up to handle high traffic but sometimes experiences throttling errors (HTTP 429) from Lambda. The company wants to improve the resilience of the application by reducing throttling. The SQS queue is configured as a Lambda event source with a batch size of 10. The Lambda function has a reserved concurrency of 100. Which combination of actions will best reduce throttling? (Choose the single best answer.)
