How many Incident and Event Response questions are on the DOP-C02 exam?

The Incident and Event Response domain accounts for 14% of the scored content on the DOP-C02 exam, according to the official exam guide. The Courseiva question bank has 254 practice questions for this domain.

Free DOP-C02 Incident and Event Response Practice Questions (2026)

How can I practice Incident and Event Response questions for DOP-C02?

Click any of the 254 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Incident and Event Response domain.


All DOP-C02 Incident and Event Response questions (254)

Click any question to see the full explanation and answer options, or start a focused practice session above.

A company uses an Auto Scaling group with a dynamic scaling policy based on a custom CloudWatch metric. After a recent deployment, the metric spikes unexpectedly, causing the Auto Scaling group to launch several EC2 instances. The operations team wants to quickly determine whether the spike was caused by a real load increase or a deployment issue. What is the MOST efficient way to investigate this?

A company runs a critical application on Amazon ECS with Fargate launch type. The application uses an Application Load Balancer (ALB) in front. During a load test, the team notices a sudden increase in 5xx errors from the ALB, and some tasks become unhealthy. The task logs show occasional 'OutOfMemoryError' exceptions. The task definition currently has 512 CPU units and 1024 MiB memory. What should the team do to mitigate the issue while maintaining a cost-effective approach?

A DevOps engineer is investigating an incident where an EC2 instance became unreachable. The engineer checks the AWS Management Console and finds the instance is running, but the status check shows '2/2 checks passed' and the system log shows no errors. What should the engineer do NEXT to diagnose the connectivity issue?

A company has an AWS Lambda function that processes S3 events. The function is invoked multiple times for the same S3 object, causing duplicate processing. The engineer suspects the issue is related to retries from the S3 event notification or Lambda's built-in retry behavior. What is the MOST effective way to ensure idempotent processing?
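As background for the scenario above, here is a minimal sketch of one common idempotency technique: derive a key from each S3 event record and skip records whose key has already been handled. The `process_once` and `idempotency_key` helpers and the in-memory `_seen` set are hypothetical; a real function would need a durable store with an atomic "write if absent" operation, which this sketch does not implement. It is an illustration, not necessarily the answer the question expects.

```python
from typing import Callable, Set, Tuple

# Hypothetical in-memory stand-in for a durable store; not an AWS API.
_seen: Set[Tuple[str, str, str]] = set()


def idempotency_key(record: dict) -> Tuple[str, str, str]:
    """Build a key from the bucket name, object key, and event sequencer."""
    s3 = record["s3"]
    return (s3["bucket"]["name"], s3["object"]["key"], s3["object"]["sequencer"])


def process_once(record: dict, handler: Callable[[dict], None]) -> bool:
    """Run handler only the first time a given key is seen.

    Returns True if the handler ran, False if the record was a duplicate.
    """
    key = idempotency_key(record)
    if key in _seen:
        return False
    handler(record)
    # Record the key only after success so a failed attempt can be retried.
    _seen.add(key)
    return True
```

Calling `process_once` twice with the same record runs the handler once; the second call returns `False`.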

An organization uses AWS CloudFormation to manage infrastructure. During an incident, a stack update fails with 'UPDATE_ROLLBACK_FAILED' status. The engineer needs to bring the stack to a consistent state without losing data. What is the BEST approach?

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The database instance fails and AWS automatically fails over to the standby. After the failover, the application cannot connect to the database. The engineer checks the RDS console and sees that the instance status is Available. What is the MOST likely cause of the connectivity issue?

A DevOps team observes that an Amazon CloudFront distribution is returning HTTP 504 errors for a small percentage of requests. The origin is an Application Load Balancer (ALB) that distributes traffic to EC2 instances. The team has already checked the ALB's access logs and found that the ALB returns 200 OK for all requests. What should the team investigate NEXT?

A company uses AWS Organizations with multiple accounts. The security team notices that an IAM user in the production account has been making changes to security group rules that are not compliant with the company's policy. The team wants to automatically revoke any non-compliant security group rules and notify the security team. What is the MOST efficient way to achieve this?

A company is experiencing a DDoS attack on their web application hosted on Amazon EC2 behind an Application Load Balancer (ALB). The attack is causing high CPU utilization on the instances. The security team needs to mitigate the attack with minimal disruption to legitimate users. Which TWO actions should the team take? (Choose two.)

An e-commerce platform uses Amazon DynamoDB as its primary database. During a flash sale, the application experiences throttling errors. The operations team needs to implement a solution to handle sudden traffic spikes while keeping costs under control. Which TWO actions should the team take? (Choose two.)

A DevOps engineer is troubleshooting an Amazon RDS for PostgreSQL instance that is running out of storage. The engineer wants to resolve the issue without downtime. Which TWO actions can achieve this? (Choose two.)

An application log excerpt shows repeated HTTP 500 errors for the /api/orders endpoint, with occasional successful health checks. The application runs on EC2 instances behind an ALB. What is the MOST likely cause of this pattern?

An IAM policy is attached to a role used by an operations team. The team reports that they are unable to start or stop EC2 instances tagged with Environment=Production. Other instances can be described. What is the MOST likely reason for this failure?

A company runs a microservices application on Amazon ECS with Fargate. The application includes a service that processes messages from an Amazon SQS queue. Recently, the processing time has increased, and the SQS queue depth is growing. The CloudWatch metrics show that the ECS service's CPU utilization is consistently around 70%, memory utilization is 80%, and the number of running tasks is at the maximum allowed (10). The service is configured with a target tracking scaling policy based on CPU utilization with a target value of 50%. However, the auto scaling does not seem to be adding tasks. The engineer checks the ECS service events and finds no scaling activity. What is the MOST likely reason the auto scaling is not working, and what action should be taken to resolve the issue?

A company uses AWS Organizations with multiple accounts. The security team needs to automatically isolate a compromised EC2 instance by removing it from its security group and attaching a quarantine security group that only allows traffic to a forensic instance. Which combination of actions should be implemented?

A DevOps engineer notices that an EC2 instance in an Auto Scaling group is repeatedly failing health checks and being terminated. The engineer needs to capture the root cause by collecting memory dumps and system logs before termination. What should the engineer do?

A company is using AWS CloudFormation to deploy infrastructure. An engineer needs to ensure that any changes to the production stack are reviewed and approved before they are applied. The engineer also wants to prevent unauthorized changes. Which solution should the engineer implement?

A company has a legacy application running on an EC2 instance that is not part of an Auto Scaling group. The instance is experiencing a memory leak. The DevOps engineer needs to collect memory metrics to analyze the issue without modifying the application. What should the engineer do?

A company uses AWS Lambda functions to process incoming events from Amazon S3. The operations team notices that some events are not being processed, and there is no error in the Lambda function logs. What is the most likely cause?

A company is using Amazon RDS for MySQL with Multi-AZ deployment. The database experiences a failover due to an availability zone outage. After the failover, the application team reports that the database endpoint is not resolving to the new primary. What is the most likely reason?

A DevOps engineer is investigating why an Amazon ECS service is not scaling out as expected. The service has a target tracking scaling policy based on average CPU utilization. The CloudWatch alarm shows that CPU utilization has exceeded the target for several minutes, but no scaling activity has occurred. What is the most likely cause?

A company uses AWS CloudTrail to log API activity. The security team wants to be alerted when an IAM user creates a new access key. Which TWO steps should be taken to accomplish this? (Choose TWO.)
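For readers unfamiliar with how CloudTrail-recorded API calls are usually filtered, the sketch below shows an event pattern in the shape Amazon EventBridge uses for such events, plus a deliberately simplified local matcher. The `matches` function is hypothetical and supports only exact-value lists and nested objects; it is not EventBridge's full matching logic.

```python
# Event pattern in the shape EventBridge uses for CloudTrail-delivered API calls.
CREATE_ACCESS_KEY_PATTERN = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["iam.amazonaws.com"],
        "eventName": ["CreateAccessKey"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: exact-value lists and nested dicts only."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be a dict that also matches.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True
```

An event whose `detail.eventName` is `CreateAccessKey` matches the pattern; one with `DeleteAccessKey` does not.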

A company is experiencing a DDoS attack on its application hosted on AWS. The application uses an Application Load Balancer (ALB) with an Auto Scaling group of EC2 instances. The security team needs to mitigate the attack with minimal latency impact on legitimate users. Which THREE actions should the team take? (Choose THREE.)

A security engineer reviews the CloudTrail log entry above and notices that a security group was modified to allow SSH access from anywhere. The engineer wants to ensure that such changes are automatically detected and remediated in the future. What should the engineer do?

A DevOps engineer updates an ECS service via CloudFormation. The stack update fails with the message 'Resource update cancelled'. The engineer notices that the ECS service's desired count was temporarily reduced during the update. What is the most likely cause of the failure?

A company runs a critical web application on AWS. The application is deployed across multiple Availability Zones using an Application Load Balancer (ALB) with an Auto Scaling group of EC2 instances. The Auto Scaling group uses a launch template that specifies an Amazon Linux 2 AMI. The application stores session state in an ElastiCache Redis cluster. Recently, the operations team received alerts that the application is returning 503 errors intermittently. Investigation shows that the ALB target group health checks are failing for some instances, but those instances are still in service. The CloudWatch logs from the instances show that the application is running, but the health check endpoint is timing out after 5 seconds. The health check is configured with a 5-second timeout, 10-second interval, and 2 consecutive successes required to mark healthy. The DevOps engineer suspects that the issue is due to high CPU utilization on the instances causing the health check to respond slowly. The engineer wants to implement a solution that prevents the ALB from routing traffic to instances that are experiencing high CPU, and also automatically scales out to handle the increased load. What should the engineer do?

A company uses AWS CloudTrail to audit API activity. During an incident investigation, they find that a user with the IAM policy 'AdministratorAccess' deleted an S3 bucket. The security team wants to know the source IP address and user agent used for the delete operation. Which action should the team take to obtain this information?

A DevOps engineer is troubleshooting an Auto Scaling group (ASG) that is not launching instances as expected. The ASG is configured with a launch template that uses an Amazon Linux 2 AMI. The engineer checks the EC2 Auto Scaling console and sees that the group's desired capacity is set to 2, but only 1 instance is running. The last scaling activity shows 'Failed to launch instance. Error: Your quota allows for 0 more running instance(s).' What is the most likely cause?

A company runs a critical application on Amazon ECS with Fargate. The application is deployed across multiple Availability Zones and uses an Application Load Balancer (ALB) as the front-end. During a recent incident, users experienced intermittent connectivity failures. The DevOps team suspects that tasks are being stopped due to resource exhaustion. Which combination of metrics and actions should the team use to diagnose and prevent recurrence?

A company uses AWS Organizations with multiple accounts. The security team wants to ensure that all IAM roles in member accounts have a maximum session duration of 1 hour. They need a way to detect any roles that violate this policy. What should they do?

A company is experiencing an ongoing security incident where an unauthorized user gained access to an AWS access key and is making API calls. The security team needs to immediately stop the unauthorized access and preserve evidence for investigation. Which TWO actions should the team take? (Choose TWO.)

A DevOps engineer is troubleshooting an application running on an EC2 instance. The application needs to access an Amazon RDS database using IAM database authentication. The EC2 instance is associated with an IAM role 'EC2-AppRole', and the RDS instance has a resource-based policy that allows 'DatabaseAccessRole' to connect. The engineer sees the error in the exhibit. What is the most likely cause?

A company runs a multi-tier web application on AWS. The application consists of an Application Load Balancer (ALB), an EC2 Auto Scaling group (ASG) for web servers, and an Amazon RDS Multi-AZ DB instance. The ASG uses a launch template with Amazon Linux 2 and a user data script that installs the web application and connects to the RDS database using a static password stored in the user data. Recently, the security team discovered that the user data script is exposed in the EC2 console and could be viewed by anyone with EC2 describe-instances permissions. The team wants to remediate this immediately without causing downtime. The ASG is configured with a min size of 2, max size of 6, and desired capacity of 4. The application is currently under load. Which option describes the best course of action?

A company runs a critical application on Amazon EC2 instances behind an Application Load Balancer. During a security incident, the security team needs to isolate a compromised instance for forensic analysis without affecting the application's availability. What is the MOST effective action to take?

A DevOps team is investigating a production incident where an Amazon RDS for MySQL database experienced a sudden spike in connections and CPU utilization. The team suspects a SQL injection attack. Which TWO actions should the team take to investigate and mitigate the incident?

An incident response team is analyzing an IAM policy attached to a role used by a forensic tool. The tool needs to create snapshots of EBS volumes during an incident. However, when the tool runs from an IP address in the 203.0.113.0/24 range, the CreateSnapshot API call fails with an access denied error. What is the MOST likely cause?
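The policy exhibit is not reproduced on this page, so the following does not reconstruct it. It is only a small sketch of how the IAM `IpAddress` and `NotIpAddress` condition operators behave for a source address, which is the kind of reasoning this scenario calls for. The `evaluate_condition` helper is hypothetical and models a single condition, not full policy evaluation.

```python
import ipaddress
from typing import List


def ip_in_cidr(source_ip: str, cidr: str) -> bool:
    """Return True if source_ip falls inside the CIDR block."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)


def evaluate_condition(operator: str, source_ip: str, cidrs: List[str]) -> bool:
    """Evaluate one IP condition the way IpAddress / NotIpAddress read."""
    in_any = any(ip_in_cidr(source_ip, cidr) for cidr in cidrs)
    if operator == "IpAddress":
        return in_any
    if operator == "NotIpAddress":
        return not in_any
    raise ValueError(f"unsupported operator: {operator}")
```

Whether a true condition helps or hurts the caller depends on the statement's `Effect`: a `Deny` statement whose condition is true for the caller's address blocks the call even if another statement allows it.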

Drag and drop the steps to implement a blue/green deployment using AWS CodeDeploy.

Drag and drop the steps to troubleshoot an AWS CloudTrail that is not logging API calls.

Match each AWS compute or container service with its description.

Match each AWS automation or configuration management tool to its description.

A DevOps engineer notices that an EC2 instance running a critical web application has been terminated unexpectedly. The instance was part of an Auto Scaling group. Which step should the engineer take FIRST to investigate the root cause?

A company uses AWS Lambda functions to process S3 events. After a recent deployment, some functions fail with timeout errors. The engineer needs to implement a solution that automatically captures and stores the function's input payload for all failed invocations without modifying the Lambda code. Which approach meets these requirements?

A company uses Amazon RDS for MySQL as its database. The operations team notices that the database CPU utilization is consistently above 90% during peak hours, causing slow query responses. The team needs to quickly reduce CPU load without changing the application code. Which action should the team take?

A company stores sensitive data in Amazon S3. A security audit reveals that several S3 buckets are publicly accessible. The DevOps engineer needs to implement a solution that automatically detects and alerts on any S3 bucket that becomes public. Which AWS service should the engineer use?

A company runs a critical application on Amazon ECS with Fargate launch type. The application experiences intermittent connection timeouts when calling an external API. The engineer needs to capture network traffic to diagnose the issue. Which solution is most appropriate?

A company uses AWS CloudFormation to deploy infrastructure. A recent stack update failed, and the engineer needs to roll back to the previous stable state. Which CloudFormation feature should the engineer use?

A company's DevOps team uses AWS CodePipeline to automate deployments. A recent pipeline execution failed at the 'Deploy' stage. The engineer needs to view the detailed logs for the failed action. Which AWS service or feature should the engineer use?

A company runs a web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application experiences intermittent 503 errors. The engineer suspects the ALB is returning these errors because the target instances are unhealthy. Which metric should the engineer monitor to confirm this suspicion?

A company uses AWS Key Management Service (KMS) to encrypt data at rest. The security team needs to know who attempted to decrypt data using a specific KMS key and whether the attempt succeeded. Which AWS service should the team use?

A company uses Amazon CloudWatch Synthetics canaries to monitor its web application endpoints. The canaries are failing intermittently with 'ClientError' status codes. Which TWO actions should the engineer take to diagnose the issue? (Choose two.)

A company has a multi-account AWS organization. The security team needs to detect and respond to security incidents across all accounts centrally. Which THREE services should the team use together? (Choose three.)

A DevOps engineer is troubleshooting an AWS CodeDeploy deployment that failed. Which TWO resources should the engineer examine to identify the cause of the failure? (Choose two.)

Refer to the exhibit. An IAM policy is attached to a user. The user tries to upload an object to the S3 bucket 'my-bucket' without server-side encryption. What will happen?

Refer to the exhibit. A CloudWatch alarm is configured for an EC2 instance. The CPU utilization exceeds 80% for two consecutive minutes. What action will occur?

A DevOps engineer notices that an EC2 instance running a critical application is unresponsive. The instance is part of an Auto Scaling group with a minimum size of 2. What is the quickest way to restore service with minimal data loss?

A company uses AWS CloudTrail to log API events. During an incident investigation, they need to identify who deleted an S3 bucket. Which CloudTrail feature should be used to retrieve the event details quickly?

An application running on Amazon ECS (Fargate) experiences intermittent HTTP 503 errors. The application uses an Application Load Balancer. The ECS service has a desired count of 2. CPU and memory utilization are below 50%. What is the most likely cause?

A DevOps team receives a CloudWatch alarm that an RDS DB instance's CPU utilization has exceeded 90% for 5 minutes. The application is experiencing latency. What is the best immediate step to mitigate the issue?

A company uses AWS Systems Manager Patch Manager to patch EC2 instances. During a patching window, some instances fail to apply patches. The engineer checks the SSM Agent logs and sees 'ERROR: Failed to download patch files from the source.' What is the most likely cause?

A company uses AWS Organizations with multiple accounts. The security team needs a centralized solution to detect and respond to EC2 instances that are publicly accessible with SSH open to 0.0.0.0/0. Which combination of services provides the most automated detection and remediation?
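To make the detection target in this scenario concrete, here is a rough sketch that scans inbound rules for SSH open to `0.0.0.0/0`. It assumes the rules are plain dictionaries in the `IpPermissions` shape returned by the EC2 `DescribeSecurityGroups` API; `open_ssh_rules` is a hypothetical helper, it checks IPv4 ranges only, and it says nothing about which managed service the question expects.

```python
from typing import List


def open_ssh_rules(ip_permissions: List[dict]) -> List[dict]:
    """Return the inbound rules that expose TCP port 22 to 0.0.0.0/0."""
    findings = []
    for perm in ip_permissions:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            covers_ssh = True
        elif protocol == "tcp":
            covers_ssh = perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", 65535)
        else:
            covers_ssh = False
        if not covers_ssh:
            continue
        if any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])):
            findings.append(perm)
    return findings
```

A rule allowing TCP 22 from `0.0.0.0/0` is reported; the same port restricted to a private range, or HTTPS open to the world, is not.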

A DevOps engineer is troubleshooting a Lambda function that times out after 3 seconds. The function makes an HTTP request to an external API. The function's timeout setting is 10 seconds. What is the most likely cause of the timeout?

A company uses Amazon CloudFront to serve static content from an S3 bucket. Users in a specific region report slow load times. The DevOps team checks CloudFront metrics and sees a high error rate (5xx) for that region. The S3 bucket is healthy. What is the most likely cause?

A company runs a stateful web application on EC2 instances behind an Application Load Balancer. The application uses sticky sessions (session affinity) based on cookies. During a deployment, the Auto Scaling group launches new instances, but users experience session loss. What is the most likely cause?

Which TWO actions should be taken to ensure a highly available and resilient architecture for a critical web application on AWS? (Choose two.)

Which THREE steps should a DevOps engineer take to troubleshoot an EC2 instance that cannot be reached via SSH? (Choose three.)

Which TWO metrics should be monitored in Amazon CloudWatch to detect a potential memory leak in an EC2 instance? (Choose two.)

A DevOps engineer notices that an Amazon RDS for MySQL instance has failed over to a standby replica. The engineer needs to identify the root cause by examining metrics. Which AWS service should the engineer use to view the database load, replication lag, and failover events?

An application running on Amazon EC2 instances in an Auto Scaling group is experiencing intermittent connectivity issues. The DevOps team suspects a security group configuration problem. Which approach should the team use to analyze security group traffic and identify denied requests?

A company uses AWS Lambda functions to process events from Amazon SQS. Recently, the Lambda function has been throttled, causing messages to accumulate in the dead-letter queue (DLQ). The function’s reserved concurrency is set to 100, and the account’s regional concurrency limit is 1000. What is the MOST likely cause of the throttling?

A DevOps engineer receives an alert that an Amazon EC2 instance’s CPU utilization has been above 90% for the past hour. The instance is part of an Auto Scaling group with a step scaling policy based on average CPU. The engineer checks the CloudWatch alarm and sees that it is in the ALARM state. What should the engineer do to verify that the Auto Scaling group is scaling out properly?

A company uses AWS CloudFormation to deploy infrastructure. During an incident, a stack update fails with a stack rollback. The engineer needs to prevent the stack from rolling back on future failures and instead retain the resources for debugging. Which CloudFormation feature should the engineer use?

An application running on Amazon ECS with Fargate is experiencing increased latency. The DevOps team suspects that the task is running out of memory and swapping. Which set of CloudWatch metrics should the team examine to confirm this suspicion?

A DevOps engineer receives an alert that an Amazon S3 bucket has become publicly accessible. The engineer needs to identify who made the bucket public. Which AWS service should the engineer use to find the API call that changed the bucket policy?

A company uses AWS Organizations with multiple accounts. The security team wants to ensure that all accounts automatically forward their CloudWatch Logs to a central logging account. Which solution should the team implement?

A critical application is deployed on Amazon EKS. The DevOps team notices that pods are failing with 'CrashLoopBackOff' status. The team needs to capture the application logs before the pod restarts to debug the issue. Which approach should the team use?

A DevOps engineer is troubleshooting an issue where an EC2 instance in a private subnet cannot reach the internet. The instance has a route to a NAT gateway. Which TWO of the following should the engineer check? (Choose TWO.)

A company uses AWS Lambda with an Amazon DynamoDB trigger. Recently, the Lambda function started failing with 'ProvisionedThroughputExceededException' errors. The DevOps team needs to mitigate the issue. Which TWO actions should the team take? (Choose TWO.)

A company uses an Application Load Balancer (ALB) in front of an Auto Scaling group of EC2 instances. The application is experiencing intermittent HTTP 503 errors. The DevOps team needs to diagnose the cause. Which THREE of the following should the team investigate? (Choose THREE.)

An IAM policy attached to a user is shown in the exhibit. The user reports that they are unable to delete an object in the 'example-bucket' bucket. What is the reason for this?

A DevOps engineer observes the CloudWatch alarm output shown in the exhibit. The alarm is in ALARM state for instance i-0abcd1234efgh5678. The engineer checks the EC2 console and sees that the instance's CPU utilization is currently 10%. What is the MOST likely explanation?

The CloudFormation template in the exhibit deploys an S3 bucket with a bucket policy. After deployment, the DevOps team discovers that the bucket is publicly accessible. Which change should be made to prevent public access while allowing only authenticated users from a specific AWS account to read objects?

A DevOps engineer receives an alarm that an EC2 instance's CPU utilization has exceeded 90% for 5 minutes. The engineer needs to automatically recover the instance. Which AWS service should be used to configure automatic recovery?

A company's production RDS MySQL instance experienced a failover. The DevOps team needs to understand the root cause. Which set of logs should be reviewed first?

An application running on an EC2 instance in a private subnet needs to access an S3 bucket. The instance has an IAM role with S3 access. However, the application is failing with timeout errors. The security group allows all outbound traffic, and the NACL allows outbound ephemeral ports. What is the most likely cause?

A DevOps engineer is troubleshooting an AWS Lambda function that is intermittently timing out. The function is configured with a 3-second timeout and 128 MB memory. The function processes messages from an SQS queue. What is the most cost-effective change to reduce timeouts?

After deploying a new application version using AWS CodeDeploy, an EC2 instance fails the deployment. The deployment group is configured with an in-place deployment. The engineer sees the error 'ScriptMissing' in the CodeDeploy logs. What should the engineer check?

A company uses Amazon CloudWatch Logs to collect application logs from EC2 instances. The security team requires that log data be encrypted at rest using a customer-managed AWS KMS key. The logs are currently being delivered, but they are not encrypted. What is the most likely reason?

A DevOps engineer receives a CloudWatch alarm that the 'StatusCheckFailed' metric for an EC2 instance is in ALARM state. The instance is part of an Auto Scaling group. What should the engineer do first to restore service?

An application running on Amazon ECS Fargate is experiencing intermittent 'CannotPullContainerError' errors. The task definition references a Docker image in a private Amazon ECR repository. The task execution role has the 'AmazonECSTaskExecutionRolePolicy' policy attached. What is the most likely cause?

A company uses AWS CloudFormation to manage infrastructure. A recent stack update failed with the error 'UPDATE_ROLLBACK_FAILED'. The stack is now in an 'UPDATE_ROLLBACK_FAILED' state, and the engineer needs to fix the stack. What is the correct course of action?

A DevOps engineer is designing an incident response plan for a serverless application using AWS Lambda, API Gateway, and DynamoDB. Which TWO services should be used to monitor and alert on errors and latency?

A company runs a critical application on EC2 instances behind an Application Load Balancer (ALB) in an Auto Scaling group. The team wants to automate the response to an instance failure. Which THREE steps should be taken to ensure automatic recovery and notification?

During a security incident, a DevOps engineer discovers that an EC2 instance has been compromised. The instance has an IAM role with permissions to access S3 and DynamoDB. Which THREE immediate actions should the engineer take to contain the incident?

A company's production EC2 instance running a web application becomes unresponsive. The operations team checks CloudWatch metrics and sees a CPU Utilization spike to 100% for the last 10 minutes. What is the MOST efficient first step to restore service?

A DevOps engineer receives a CloudWatch alarm that an Auto Scaling group has been in an 'Insufficient data' state for 20 minutes. What does this indicate?

A company uses AWS CloudTrail to log API calls. An IAM user's credentials are compromised, and the attacker launches multiple EC2 instances in regions that are not typically used. The security team wants to receive near-real-time notifications of any API calls from this user. What is the MOST effective solution?

A company experiences an unexpected spike in network traffic to a web application hosted on EC2 instances behind an Application Load Balancer. The DevOps team needs to investigate the source IP addresses generating the traffic. Which AWS service should they use to capture the traffic?

An organization uses AWS Systems Manager to manage its EC2 instances. After a security incident, the security team wants to ensure that all future API calls to Systems Manager are logged and monitored. What is the MOST efficient way to achieve this?

A company runs a critical application on an Amazon RDS for MySQL DB instance. The application experiences intermittent connection timeouts. The DevOps team notices that the DB instance's CPU and memory metrics are normal. What should the team check NEXT to diagnose the issue?

A DevOps engineer receives an alarm that an EC2 instance's status check has failed. The instance is part of an Auto Scaling group. How should the engineer respond?

A company uses Amazon CloudFront to distribute content globally. Users in some regions report slow load times. The DevOps team wants to identify the geographic regions where performance is worst. Which tool should they use?

A company runs a containerized application on Amazon ECS with Fargate launch type. The application experiences periodic spikes in response times. The CloudWatch metrics show high CPU and memory usage for the tasks during these spikes. What is the MOST effective approach to handle these spikes?

A company uses AWS CloudFormation to manage infrastructure. An engineer notices that a stack update has failed, leaving the stack in a ROLLBACK_IN_PROGRESS state. Which TWO actions should the engineer take to investigate and resolve the issue?

A company runs a web application on EC2 instances behind an Application Load Balancer. The application is experiencing intermittent 503 errors. The DevOps team suspects that the target group's health check settings may be causing healthy instances to be marked as unhealthy. Which THREE configurations should the team review?

A DevOps team needs to implement a solution to automatically remediate an S3 bucket that becomes publicly accessible. Which TWO services should they use together?

A company uses AWS CloudTrail to monitor API activity. The security team notices that an IAM user 'dev-user' deleted an S3 bucket. They need to quickly identify the source IP address of the delete request. Which CloudTrail feature should they use to find this information?

A DevOps engineer receives a CloudWatch alarm that an EC2 instance's CPU utilization has exceeded 90% for 10 minutes. The instance hosts a critical web application. What is the MOST appropriate immediate response to mitigate performance impact?

A company uses an Application Load Balancer (ALB) in front of a fleet of EC2 instances. The security team reports that a specific client IP address is sending malicious requests and must be blocked immediately. The ALB's security group only allows HTTP/HTTPS from 0.0.0.0/0. What is the FASTEST way to block traffic from this IP address without affecting other traffic?

A company uses AWS CloudFormation to manage infrastructure. During a deployment, a stack update fails and the stack is in ROLLBACK_IN_PROGRESS state. The DevOps engineer needs to investigate the failure while preserving the resources that were created before the failure. What should the engineer do?

110

A company runs a critical Amazon RDS for PostgreSQL database. The database suddenly becomes unresponsive. The DevOps team checks CloudWatch metrics and notices that the 'DatabaseConnections' metric spiked to the maximum limit. What is the MOST likely cause and immediate action?

111

A company uses AWS Lambda functions to process messages from an Amazon SQS queue. The Lambda function sometimes fails due to a transient error in a downstream API. The DevOps engineer wants to ensure that failed messages are retried automatically and eventually sent to a dead-letter queue (DLQ) after 3 failed attempts. The SQS queue is configured with a redrive policy that moves messages to the DLQ after 3 receive attempts. However, messages from failed Lambda invocations are not being retried. What is the MOST likely reason?

112

A company's production environment uses an Amazon ElastiCache Redis cluster for session caching. The operations team reports that the cache hit ratio has dropped significantly, causing increased load on the backend database. What is the MOST likely cause?

113

A company uses AWS CodePipeline for CI/CD. A recent deployment to an Amazon ECS service failed because the new task definition referenced an ECR image that does not exist. The pipeline uses a source stage (CodeCommit), build stage (CodeBuild), and deploy stage (ECS). The engineer wants to catch such errors earlier. What should the engineer add to the pipeline?

114

A company runs a web application on EC2 instances behind an ALB. The security team notices that the ALB is receiving a large number of requests from a single IP address, causing high CPU on the instances. They want to block this IP at the load balancer level without affecting other traffic. The ALB currently has a default action of forwarding to the target group. What is the MOST effective way to block this IP?

115

A DevOps engineer is investigating a security incident where an EC2 instance was compromised. The engineer needs to collect forensic data without losing volatile information. Which TWO actions should the engineer take? (Choose two.)

116

A company uses AWS Organizations with multiple accounts. The security team needs to ensure that all CloudTrail trails across the organization are delivering events to a centralized S3 bucket in the management account. Currently, some member accounts have their own trails. Which THREE steps should the security team take to enforce this? (Choose three.)

117

A company uses Amazon CloudWatch for monitoring. The operations team wants to receive an alert when an EC2 instance's status check fails for 2 consecutive minutes. Which THREE resources should the team configure? (Choose three.)

118

A DevOps engineer is troubleshooting a failed deployment. The engineer needs to identify the root cause. Which TWO AWS services can provide information about the deployment events and errors? (Choose two.)

119

A DevOps engineer notices that an EC2 instance running a critical application is unresponsive. CloudWatch alarms for CPU utilization and memory usage did not trigger. The engineer checks the system logs and finds an 'Out of memory: Kill process' error. What is the MOST likely cause of the missed alarms?

120

A DevOps team applies the above IAM policy to a group. A developer in this group tries to upload an object to the S3 bucket using the AWS CLI without specifying any encryption. The upload fails with an AccessDenied error. Why does the upload fail?

121

A company uses AWS CloudFormation to deploy a multi-tier web application. During an incident, the stack update fails with an 'UPDATE_ROLLBACK_IN_PROGRESS' status. The operations team needs to investigate the root cause quickly without losing the stack's current state. What is the MOST efficient approach?

122

A company runs a production database on Amazon RDS for MySQL. The database experiences a sudden spike in connections, causing the application to time out. The DevOps team needs to diagnose the issue quickly. Which combination of actions should be taken? (Choose two.)

123

A Lambda function 'my-function' is invoked multiple times, but no logs appear in CloudWatch. The DevOps engineer runs the above CLI command and sees that the log group exists but 'storedBytes' is 0. What is the MOST likely cause?

124

A company uses AWS Elastic Beanstalk for its web application. After a deployment, the environment health changes to 'Severe' and the application becomes unresponsive. The DevOps team needs to quickly revert to the previous working version. What is the FASTEST way to achieve this?

125

A DevOps engineer receives an alert that an Amazon ECS service is failing to start tasks. The service uses the Fargate launch type. The task definition includes a container that requires port 8080. The security group associated with the service allows inbound traffic on port 8080. What should the engineer check NEXT?

126

A company configures AWS CloudTrail to deliver logs to S3 bucket 'my-app-logs'. However, no log files appear. The DevOps engineer runs the above command and sees the bucket policy. What is the issue?

127

A DevOps team uses AWS CodePipeline to deploy a web application. The pipeline has a manual approval step. During an incident, the deployment is stuck at the approval step because the approver is on leave. The team needs to unblock the pipeline quickly. What is the BEST action to take?

128

An application running on Amazon EC2 instances behind an Application Load Balancer (ALB) is experiencing intermittent 503 errors. The target group health checks are failing. The DevOps engineer checks the instance logs and finds that the application is running but taking longer than 30 seconds to respond. What is the MOST likely cause?

129

An EC2 instance is in 'running' state according to the CLI output, but the application hosted on it is unreachable. The DevOps engineer checks the security group and finds it allows inbound HTTP traffic from 0.0.0.0/0. The instance has a public IP. What is the MOST likely issue?

130

A company uses AWS Systems Manager Patch Manager to patch its EC2 instances. After a patch window, some instances report a 'Failed' status. The DevOps engineer needs to investigate the cause. Which THREE actions should the engineer take? (Choose three.)

131

A company has a multi-account strategy using AWS Organizations. The security team needs to respond to incidents across all accounts. They want to ensure that all CloudTrail trails are enabled and logging to a central S3 bucket in the management account. What is the MOST efficient way to monitor compliance?

132

A DevOps team uses AWS CodeDeploy to deploy an application to an Auto Scaling group. The deployment fails with an error 'The overall deployment failed because too many individual instances failed deployment'. The team checks the instance logs and finds that the 'BeforeInstall' lifecycle event script returned a non-zero exit code. What is the BEST approach to resolve this?

133

A company runs a critical application on a fleet of EC2 instances managed by an Auto Scaling group. The application generates logs that are sent to CloudWatch Logs using the CloudWatch agent. Recently, the operations team noticed that some instances are missing logs for certain periods. The CloudWatch agent is configured to batch log events and send them every 5 seconds. The instances have high CPU utilization (90%+) during the missing periods. The DevOps engineer suspects that the agent is being throttled or failing. Which of the following is the MOST likely cause and the BEST course of action?

134

A company uses AWS Lambda functions to process messages from an Amazon SQS queue. The Lambda function is configured with a reserved concurrency of 5. The SQS queue has a large backlog of messages, and the Lambda function is processing them slowly. The DevOps team wants to increase throughput without making changes to the Lambda code. The team decides to increase the reserved concurrency to 10. However, after the change, the Lambda function starts to experience throttling errors (RateExceeded). The team notices that other Lambda functions in the same account are also being throttled. What is the MOST likely cause?

135

A company hosts a static website on Amazon S3 with CloudFront as the CDN. Users report that they see an old version of the website even after the DevOps team updated the S3 objects. The team verified that the new objects are in the S3 bucket and are publicly accessible. The CloudFront distribution has a default TTL of 24 hours. To immediately serve the new content to users, the team needs to invalidate the CloudFront cache. Which of the following is the CORRECT approach to achieve this with minimal impact?

136

A DevOps engineer receives a CloudWatch alarm indicating that an EC2 instance's CPU utilization has exceeded 90% for 10 minutes. The instance is part of an Auto Scaling group behind an Application Load Balancer. What is the MOST efficient initial step to troubleshoot the high CPU usage?

137

An organization uses AWS Systems Manager Incident Manager for incident response. They have created a response plan with an engagement plan that pages the on-call engineer via SMS. The engineer acknowledges the incident but then does not take any further action. What is the BEST way to automate escalation?

138

A company's production environment consists of EC2 instances in an Auto Scaling group behind an Application Load Balancer (ALB). The instances run a web application that stores session data in an ElastiCache Redis cluster. The company has enabled detailed CloudWatch metrics and set up a dashboard. The operations team notices that the average CPU utilization across the Auto Scaling group spikes to 95% every 15 minutes, coinciding with a high number of Redis connections. What is the MOST likely cause?

139

A DevOps engineer is setting up an incident response system for a critical application. The engineer needs to ensure that notifications are sent to the appropriate team when specific CloudWatch alarms trigger. Which TWO services can be used to trigger notifications based on CloudWatch alarms? (Choose TWO.)

140

A company uses AWS CloudTrail to log API calls in a multi-account environment. The security team wants to be alerted immediately when an IAM user or role performs a specific sensitive action (e.g., DeleteTrail, DeleteDBInstance). Which TWO services can be used together to achieve near real-time alerting? (Choose TWO.)

141

A company is experiencing intermittent connectivity issues between an EC2 instance and an RDS database. The EC2 instance is in a public subnet, and the RDS instance is in a private subnet. The security group for the RDS instance allows inbound traffic from the EC2 instance's security group. The network ACLs are default (all traffic allowed). Which THREE steps should the engineer take to troubleshoot the connectivity issue? (Choose THREE.)

142

An IAM policy attached to a DevOps engineer's role is shown above. The engineer is trying to restart a stopped EC2 instance in the us-east-1 region but receives an 'AccessDenied' error. The instance ID is i-0abcd1234efgh5678. What is the MOST likely reason?

143

A company runs a critical web application on AWS. The application is deployed on EC2 instances behind an Application Load Balancer (ALB). The instances are in an Auto Scaling group across multiple Availability Zones. The company uses Amazon Route 53 for DNS with a failover routing policy. Recently, the operations team noticed that during a regional outage, the failover did not trigger as expected, and users experienced downtime. The health checks in Route 53 are configured to check the ALB endpoint. The ALB's health checks are configured to check the instances. What is the MOST likely reason the failover did not work?

144

A DevOps team uses AWS Systems Manager Incident Manager to manage incidents. They have configured a response plan that sends notifications to an SNS topic, which triggers an AWS Lambda function to post messages to a Slack channel. Recently, the Slack notifications have stopped working. The CloudWatch logs for the Lambda function show no invocations when an incident is created. The SNS topic has a subscription to the Lambda function, and the Lambda function's resource policy allows invocation from SNS. What is the MOST likely cause?

145

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The application experiences increased latency during peak hours. The DevOps engineer investigates and notices that the read replicas are not being utilized effectively. The application is configured to use the primary database endpoint. The engineer wants to offload read traffic to the read replicas without changing the application code. What is the BEST solution?

146

A company runs a containerized application on Amazon ECS with Fargate launch type. The application is behind an Application Load Balancer (ALB). The operations team notices that the ALB's 5xx error rate increases periodically. The ECS service is configured with a target tracking scaling policy based on CPU utilization. The CloudWatch logs from the application show no errors. The health check on the ALB is configured to hit the /health endpoint. What is the MOST likely cause of the 5xx errors?

147

A company uses AWS CloudFormation to deploy infrastructure. During a production deployment, the stack update fails, and the stack enters the UPDATE_ROLLBACK_COMPLETE state. The DevOps engineer needs to investigate the failure. The engineer checks the CloudFormation console and sees a stack event with a status of UPDATE_FAILED and a reason of 'Internal failure'. The engineer wants to find more details. What is the BEST way to get detailed error information?

148

A company has a serverless application using AWS Lambda functions and Amazon API Gateway. The application has been running fine, but recently users report that some requests are timing out with a 504 error. The Lambda function's timeout is set to 30 seconds, and API Gateway's integration timeout is 29 seconds. The CloudWatch logs for the Lambda function show that the function executes in under 5 seconds on average. What is the MOST likely cause of the 504 errors?

149

A DevOps engineer notices that an EC2 instance running a web application is unresponsive. CloudWatch alarms are not triggering. What is the FIRST step the engineer should take to diagnose the issue?

150

A company uses AWS CloudTrail to monitor API activity. During an incident, they need to quickly identify any unauthorized IAM role assumption attempts. Which CloudTrail feature should be used to filter and alert on this specific event?

151

An application running on Amazon ECS Fargate is experiencing intermittent HTTP 503 errors from the Application Load Balancer (ALB). The target group health checks are passing. Which configuration is MOST likely causing this issue?

152

A DevOps team is designing an incident response plan for a critical microservices architecture. They need to automatically collect and analyze logs from all services during an incident. Which solution should they use?

153

During an incident, an engineer needs to quickly revoke access to a compromised IAM user. Which action should be taken FIRST?

154

A company uses AWS Config to track resource changes. They want to automatically remediate non-compliant security group rules that allow public SSH access. What is the MOST effective approach?

155

An application running on Amazon RDS for PostgreSQL is experiencing slow query performance. The DevOps team suspects a specific query is causing high CPU usage. Which tool should they use to identify the problematic query?

156

A company's incident response process requires that all changes to production resources are automatically paused when a P1 incident is declared. Which AWS service can be used to enforce this by preventing modifications to CloudFormation stacks?

157

An organization uses a multi-account AWS environment with AWS Organizations. During an incident, the security team needs to isolate a compromised account by preventing all API calls from that account's root user and IAM users. Which action should be taken?

158

A DevOps engineer is troubleshooting a Lambda function that is timing out. Which TWO actions should the engineer take to diagnose the issue?

159

A company is designing an incident response strategy for its Amazon EKS cluster. Which THREE steps should be taken to ensure rapid response to a compromised pod?

160

During an incident, a DevOps engineer needs to block traffic from a specific IP address that is attacking an Application Load Balancer. Which TWO actions can the engineer take to mitigate this?

161

A DevOps engineer is troubleshooting an issue where an EC2 instance running a web application becomes unresponsive every few hours. CloudWatch logs show no application errors, but the instance's status checks are passing. The engineer suspects a memory leak. Which AWS service can be used to capture memory utilization metrics at a granular level to confirm the leak?

162

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The primary DB instance fails, and automatic failover does not occur within the expected 1-2 minutes. The DevOps team needs to quickly restore database availability. What should the team do first?

163

A DevOps engineer receives an alert that an EC2 instance's CPU utilization has been above 90% for the last 30 minutes. The engineer needs to investigate the root cause. Which AWS service should the engineer use to get OS-level process details and identify which process is consuming the CPU?

164

A company uses Amazon CloudWatch Logs to store application logs from multiple EC2 instances. The DevOps team needs to create a real-time dashboard that displays the count of ERROR-level log entries across all instances. Which combination of services should be used?

165

A DevOps team is configuring an Auto Scaling group for a web application behind an Application Load Balancer. The team wants to automatically replace instances that fail the health check. Which scaling policy should be used?

166

An application running on Amazon ECS experiences intermittent failures. The DevOps engineer wants to capture the application's standard output and error logs and send them to CloudWatch Logs. What is the simplest way to achieve this?

167

A company uses AWS Lambda functions to process messages from an Amazon SQS queue. The DevOps team notices that messages are not being processed and are going to the dead-letter queue. The Lambda function code is correct. What is the most likely cause?

168

A DevOps engineer is troubleshooting an issue where an Amazon RDS instance's CPU utilization is consistently high. The engineer has enabled Performance Insights and sees that the top SQL query is a SELECT statement that scans many rows. What is the best course of action to reduce CPU utilization?

169

A company uses Amazon CloudFront to serve static content from an S3 bucket. Users report that they see outdated content even after the engineer has updated the files in the S3 bucket. What should the engineer do to ensure users see the latest content?

170

A DevOps team is investigating a security incident where an unauthorized user accessed an S3 bucket. The team needs to determine what actions were taken by the user. Which TWO AWS services should be used together to investigate? (Choose TWO.)

171

A company runs a critical application on Amazon ECS with Fargate launch type. The application is experiencing intermittent failures due to resource exhaustion. The DevOps team wants to implement automated responses to scale the service. Which THREE steps should the team take to achieve this? (Choose THREE.)

172

A DevOps engineer needs to receive notifications when an EC2 instance's status check fails. Which TWO services should the engineer use? (Choose TWO.)

173

An EC2 instance shows as 'running' in the AWS console, but the system status check is 'impaired'. What is the most likely cause?

174

A Lambda function has the above IAM policy attached. The function is failing to write logs to CloudWatch Logs. What is the most likely reason?

175

The CloudWatch alarm 'HighCPU' has transitioned to ALARM state. What does the alarm history indicate about the metric that triggered it?

176

A company experiences an EC2 instance failure in an Auto Scaling group. The instance is terminated and replaced automatically. The DevOps engineer needs to troubleshoot why the instance failed. Which AWS service should the engineer use to view the instance's console output and screenshots before termination?

177

A DevOps team is debugging a production incident where an Application Load Balancer (ALB) is returning 503 errors for some requests. The target group instances are healthy. What is the most likely cause?

178

A company uses AWS CloudFormation to manage infrastructure. During an incident, a stack update fails with the error 'The following resource(s) failed to create: [AWS::RDS::DBInstance]'. Which AWS service should the engineer use to view detailed error messages for the failed resource creation?

179

A company uses Amazon RDS for MySQL with Multi-AZ deployment. During an incident, the primary DB instance becomes unreachable. The failover to the standby instance succeeds, but application connections are failing with 'Access denied for user'. What is the most likely cause?

180

A DevOps engineer receives a CloudWatch alarm for high CPU utilization on an EC2 instance. The engineer needs to investigate the cause. Which AWS service can provide a detailed analysis of the running processes and their resource consumption?

181

A company uses Amazon S3 to store critical data. An incident occurs where an S3 bucket is accidentally deleted. The DevOps engineer needs to recover the bucket and its objects. What should the engineer do?

182

A company uses AWS Lambda functions behind an Amazon API Gateway REST API. During an incident, the API returns 502 Bad Gateway errors. The Lambda function logs show no errors. What is the most likely cause?

183

A DevOps engineer is investigating a security incident where an EC2 instance was used to launch an outbound DDoS attack. Which AWS service can provide details about the source IP addresses and network traffic from the instance?

184

A company uses an AWS Elastic Load Balancer (ELB) to distribute traffic to EC2 instances. During an incident, some users report slow response times. The DevOps engineer suspects that one instance is unhealthy but the health check is not detecting it. What should the engineer do to improve health check accuracy?

185

Which TWO actions should a DevOps engineer take to ensure that an Amazon RDS for PostgreSQL database is automatically recovered in the event of a failure?

186

A company experiences a security incident where an IAM user's access key is compromised. Which THREE steps should the DevOps engineer take immediately?

187

Which TWO metrics should a DevOps engineer monitor to detect an EC2 instance that is unresponsive due to resource exhaustion?

188

A DevOps engineer notices that an EC2 instance running a critical application is unresponsive. The engineer checks CloudWatch metrics and sees that CPU utilization spiked to 100% just before the instance became unresponsive. However, the instance status check passed. What should the engineer do NEXT to troubleshoot the issue?

189

A company uses AWS CloudTrail to record API calls across multiple accounts and regions. The security team needs to be alerted immediately when an IAM user creates a new access key. Which combination of services should be used to achieve this with minimal latency?

190

A company runs a multi-tier web application on EC2 instances behind an Application Load Balancer. The application experiences intermittent 503 errors during peak traffic. The Auto Scaling group is configured with a step scaling policy based on CPU utilization. CloudWatch metrics show that CPU utilization never exceeds 70%, but the ALB target group reports that some targets are unhealthy. What is the MOST likely cause?

191

A DevOps engineer is troubleshooting a Lambda function that processes S3 events. The function has been running successfully for months, but today it started timing out. The engineer checks CloudWatch Logs and sees 'Task timed out after 3.01 seconds' errors. The function is configured with a 3-second timeout. What should the engineer do to resolve the issue?

192

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The database experiences a sudden spike in connections, causing some application requests to fail with 'Too many connections' errors. The DevOps team needs to automate a response to this incident. What is the MOST effective solution?

193

A company uses AWS CodePipeline for CI/CD. During a production deployment, the pipeline fails at the 'Deploy' stage with an error: 'The deployment failed because the deployment group does not have enough capacity to handle the deployment.' The engineer checks the CodeDeploy deployment group and sees that it is configured with a minimum healthy hosts value of 100% and a deployment configuration of 'CodeDeployDefault.OneAtATime'. What is the MOST likely cause?

194

A company uses Amazon CloudFront to serve static content from an S3 bucket. Users report that they see outdated content even after the engineer invalidated the CloudFront cache. What is the MOST likely reason?

195

A DevOps engineer is investigating a security incident where an unauthorized user accessed an S3 bucket containing sensitive data. The engineer needs to determine what actions the user performed and from which IP address. Which AWS service should be used to retrieve this information?

196

A company runs a stateful web application on EC2 instances behind an Application Load Balancer. The application uses sticky sessions (session affinity) based on cookies. During a deployment, the DevOps engineer notices that some users are being logged out and losing session data. The deployment uses a rolling update strategy. What is the MOST likely cause?

197

A company uses Amazon ECS with the Fargate launch type for a microservices application. The application experiences intermittent HTTP 5xx errors from the ALB. The DevOps team needs to diagnose the issue. Which TWO steps should be taken to gather diagnostic information? (Choose TWO.)

198

A company uses AWS CloudFormation to manage infrastructure. A stack update fails, and the stack enters the 'UPDATE_ROLLBACK_IN_PROGRESS' state. The DevOps engineer needs to investigate the cause. Which THREE steps should the engineer take? (Choose THREE.)

199

A company runs a critical application on EC2 instances in an Auto Scaling group. The application must be highly available across multiple Availability Zones. Which TWO configurations are necessary to achieve this? (Choose TWO.)

200

Refer to the exhibit. The DevOps engineer runs the commands and sees the output. What is the most likely issue with the instance?

201

Refer to the exhibit. An IAM policy is attached to a user. The user tries to upload an object to 'my-bucket' without specifying server-side encryption. What will happen?

202

Refer to the exhibit. A DevOps engineer set up a CloudWatch alarm for a Lambda function. The alarm fires when the error count metric exceeds 10 in 5 minutes. The engineer receives an alarm notification, but when checking the Lambda logs, only 3 errors are found in that 5-minute window. What is the MOST likely reason for the discrepancy?

203

A DevOps engineer notices that an EC2 instance is unresponsive and the CloudWatch alarm 'StatusCheckFailed' is in ALARM state. The instance was launched in a private subnet with no public IP. Which action should the engineer take to diagnose the issue without creating a new instance?

204

A company uses AWS CloudFormation to manage infrastructure. After a failed stack update, the stack is in the UPDATE_ROLLBACK_COMPLETE state. The DevOps team needs to identify the specific resource that caused the rollback and review the error message. Which approach provides the most efficient way to achieve this?

205

A company uses Amazon RDS for MySQL and has enabled automated backups. The database administrator accidentally deleted a critical row from a table. The deletion occurred 15 minutes ago. What is the fastest way to recover the lost data?

206

An application running on Amazon ECS Fargate is experiencing intermittent 503 errors. The task definition sets a soft limit of 512 CPU units and 1024 MiB of memory. The errors occur when traffic spikes. Which change is most likely to resolve the issue?

207

A DevOps engineer is troubleshooting an Amazon CloudWatch alarm that is not triggering as expected. The alarm monitors an SQS queue's ApproximateNumberOfMessagesVisible metric with a threshold of 100 for 1 evaluation period. The queue has had over 100 messages for the past 30 minutes, but the alarm remains in OK state. What is the most likely cause?

208

A company uses AWS Organizations with multiple accounts. The security team needs to receive real-time notifications when any IAM user in any account creates an access key. Which solution is the most operationally efficient?

209

A company's application running on EC2 instances behind an Application Load Balancer (ALB) is returning intermittent 504 errors. The instances are in an Auto Scaling group with a health check grace period of 300 seconds. What should the DevOps engineer check first to troubleshoot the issue?

210

A company experiences a security incident where an unauthorized user accessed an S3 bucket containing sensitive data. The DevOps team needs to identify the source IP address and user agent of the request. Which AWS service provides this information?

211

An Amazon RDS for PostgreSQL instance is running low on storage. The DevOps engineer needs to increase the allocated storage without downtime. Which action should be taken?

212

A company uses AWS CloudTrail to log API calls. The security team needs to be alerted when an IAM user performs a ConsoleLogin event from an IP address outside the corporate network. Which TWO steps should be taken to achieve this? (Choose TWO.)

213

A DevOps team is investigating a performance issue where an application's response time spiked during a deployment. The deployment used AWS CodeDeploy to update an Auto Scaling group. Which THREE actions should the team take to identify the root cause? (Choose THREE.)

214

A company uses Amazon CloudFront to distribute content globally. Users in certain geographic regions report slow load times. Which TWO configurations can improve performance for these users? (Choose TWO.)

215

A company uses AWS Systems Manager to manage a fleet of EC2 instances. During an incident, a DevOps engineer needs to execute a script on a specific instance to collect diagnostic data. The engineer does not have SSH key access. Which approach should the engineer use to execute the script?

216

A DevOps team is configuring CloudWatch alarms for their production environment. They want to receive notifications when the CPUUtilization metric of an EC2 instance exceeds 90% for three consecutive 5-minute periods. Which combination of settings should they use?

217

During an incident, a DevOps engineer needs to quickly revoke access to a set of IAM users who are suspected to be compromised. The users have programmatic access keys and console passwords. The engineer wants to minimize the impact on non-compromised users. Which action should the engineer take FIRST?

218

A company uses an Application Load Balancer (ALB) to distribute traffic to a set of EC2 instances in an Auto Scaling group. During an incident, the DevOps team notices that the ALB is returning 503 errors. The instances are healthy according to the target group health checks. What is the MOST likely cause?

219

A DevOps engineer is troubleshooting an issue where an EC2 instance running Amazon Linux 2 is not receiving commands from AWS Systems Manager Run Command. The instance has the SSM Agent installed and is running. What should the engineer verify FIRST?

220

A company uses AWS CloudTrail to log all API calls. During an incident investigation, a security engineer needs to identify who deleted an S3 bucket named 'critical-data' two days ago. Which approach will provide the necessary information?

221

A company runs a critical application on an Amazon RDS for MySQL DB instance. During a recent incident, the database became unresponsive. The DevOps team suspects that a long-running query is blocking other operations. Which metric should they monitor in Amazon CloudWatch to detect this type of issue?

222

A company uses an Auto Scaling group with a dynamic scaling policy based on the average CPU utilization of the instances. During an incident, the DevOps team notices that the Auto Scaling group is not launching new instances quickly enough to handle a traffic spike. What is a possible cause for the slow scaling response?

223

An application running on AWS Lambda is experiencing increased error rates. The DevOps engineer needs to quickly identify the root cause. Which AWS service should the engineer use to analyze the logs and errors?

224

A DevOps engineer is designing an incident response plan for a multi-region application. The application runs on EC2 instances behind an Application Load Balancer (ALB) and uses Amazon RDS for MySQL with Multi-AZ. Which TWO actions should the engineer include to ensure high availability and fast failover during a regional incident?

225

A security team is investigating a potential data exfiltration from an S3 bucket. They need to identify which IAM user accessed a specific object and whether the access was from a known IP address. Which THREE AWS services or features should they use together to conduct this investigation?

226

A company runs a critical application on Amazon ECS with Fargate launch type. During an incident, the DevOps engineer notices that tasks are failing with 'CannotPullContainerError: API error (500)'. Which TWO steps should the engineer take to resolve this issue?

227

A DevOps engineer notices that an Auto Scaling group is repeatedly launching and terminating instances. CloudWatch alarms show high CPU but the group's metrics are erratic. What is the most likely cause?

228

A company uses AWS Systems Manager to patch EC2 instances. After a patch window, several instances are unreachable. The engineer checks the SSM Agent logs and finds no errors. What should the engineer do next to diagnose the issue?

229

During a deployment, a new application version on an ECS service starts failing health checks. The previous version is still running. The deployment is a rolling update with a maximum percent of 200%. Which ECS feature should the engineer use to automatically revert to the previous version?

230

An application runs on EC2 instances behind an ALB. Users report intermittent 503 errors. The engineer checks ALB metrics and sees 'SurgeQueueLength' increasing periodically. What is the most likely cause?

231

A company uses AWS CloudFormation to deploy infrastructure. A stack update fails and the stack ends in the 'UPDATE_ROLLBACK_FAILED' state. What should the engineer do to resolve this?

232

A company uses RDS Multi-AZ with a read replica. During a failover test, the application experiences a 30-second write outage. The application uses a single DB endpoint. How can the outage be minimized?

233

A Lambda function processes SQS messages but sometimes times out after 15 seconds. The function performs a database call that occasionally takes longer. What is the best way to handle this without losing messages?

234

A company uses CloudWatch Logs to store application logs. The security team requires that logs be encrypted at rest using a customer-managed KMS key. What must be done to enable this?

235

An organization uses AWS Config to track resource changes. They notice that a particular S3 bucket policy was deleted, but the Config rule 's3-bucket-policy-grantee-check' did not trigger a remediation. What is the most likely reason?

236

A company uses Amazon CloudFront with an S3 origin. Users in Europe report slow load times. The engineer needs to improve performance for European users. Which TWO actions should the engineer take?

237

A DevOps team is troubleshooting an application that occasionally throws 'Connection reset by peer' errors when connecting to an RDS MySQL instance. The errors are intermittent and seem to correlate with high traffic. Which TWO steps should the team take to diagnose the issue?

238

A company uses AWS CodePipeline for CI/CD. A recent pipeline execution failed at the 'Deploy' stage with the error 'Action execution failed: Access Denied'. The pipeline uses an IAM service role. Which THREE checks should the engineer perform to resolve this?

239

A DevOps engineer applied the above S3 bucket policy to restrict access. Users report that they can download objects from the bucket only when using HTTPS from within the 10.0.0.0/8 network. However, users outside that network receive access denied errors even over HTTPS. What is wrong with the policy?

240

A company runs a critical application on EC2 instances behind an Application Load Balancer. The application is deployed across three Availability Zones. The DevOps team uses AWS CloudFormation to manage the infrastructure. During a recent deployment, a stack update failed, and the stack entered the UPDATE_ROLLBACK_IN_PROGRESS state. However, the rollback also failed, leaving the stack in the UPDATE_ROLLBACK_FAILED state. The engineer needs to restore the application to a working state. The stack includes an Auto Scaling group, an ALB, security groups, and a DynamoDB table. The DynamoDB table is defined with deletion protection enabled. The engineer is considering the following actions:
A) Use ContinueUpdateRollback to retry the rollback after fixing the resource that caused the failure.
B) Delete the stack and recreate it from the last known good template.
C) Use CloudFormation's 'SignalResource' to manually complete the rollback.
D) Manually update the resources to match the previous template, then resume the rollback.
Which action should the engineer take?

241

A company is using AWS Lambda to process events from an Amazon SQS queue. The Lambda function is configured with a batch size of 10 and a maximum concurrency of 5. Recently, the function started experiencing high error rates and the SQS queue's ApproximateNumberOfMessagesVisible metric is increasing. The CloudWatch logs show that the function is timing out after 30 seconds. The function makes calls to an external API that sometimes takes more than 30 seconds to respond. The DevOps engineer needs to reduce the backlog and prevent message loss. The engineer is considering the following actions:
A) Increase the Lambda function timeout to 60 seconds and increase the SQS visibility timeout to 90 seconds.
B) Decrease the batch size to 1 to avoid processing multiple messages at once.
C) Increase the Lambda function reserved concurrency to 100 to allow more concurrent executions.
D) Use a dead-letter queue to capture messages that fail processing after all retries.
Which combination of actions should the engineer take?

242

A DevOps engineer is responsible for monitoring a production environment that uses Amazon EC2 Auto Scaling. The engineer notices that the Auto Scaling group has been launching and terminating instances frequently over the past hour. The group uses a dynamic scaling policy based on average CPU utilization. The CloudWatch alarm that triggers scaling is set to a threshold of 70% CPU for scale-out and 30% for scale-in. The engineer checks the CloudWatch metrics and sees that CPU utilization is oscillating between 40% and 60%, never reaching the thresholds. The engineer suspects that the scaling policy is not working correctly. The engineer is considering the following actions:
A) Change the scaling policy to use a target tracking policy with a target value of 50% CPU utilization.
B) Increase the cooldown period for the scaling policy to 300 seconds.
C) Disable the scale-in policy to prevent frequent terminations.
D) Use a simple scaling policy instead of a dynamic scaling policy.
Which action should the engineer take?

243

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. Users report intermittent 503 errors. CloudWatch metrics show that the ALB's 'RequestCount' is normal, but 'HTTPCode_ELB_5XX_Count' spikes. The 'TargetResponseTime' metric shows occasional high latency. Which troubleshooting step should the DevOps engineer take FIRST?

244

A DevOps team uses AWS Systems Manager Incident Manager for incident response. They have an escalation plan that sends notifications to an SNS topic. However, during a recent incident, the on-call engineer did not receive the notification. The engineer's phone number and email are correct in the SSM Incident Manager contact settings. What is the MOST likely cause of the missed notification?

245

A company uses CloudWatch Synthetics canaries to monitor a critical API endpoint. Recently, a canary started failing with a '403 Forbidden' error. The DevOps engineer verifies that the canary's IAM role has the necessary permissions to invoke the API and that the API endpoint is publicly accessible. What should the engineer check NEXT?

246

A company's DevOps team is designing an automated incident response workflow using AWS Systems Manager Incident Manager and AWS Lambda. The workflow should automatically acknowledge incidents and send notifications to the appropriate response team. Which TWO actions should the team take to achieve this?

247

A company uses Amazon RDS for MySQL with a Multi-AZ deployment. The database experiences a sudden spike in connections, causing the application to time out. The DevOps engineer notices that the 'DatabaseConnections' metric is high, but 'CPUUtilization' is low. Which THREE actions should the engineer take to diagnose the issue?

248

A company uses AWS CloudFormation to manage infrastructure. A stack update fails and the stack enters the 'UPDATE_ROLLBACK_IN_PROGRESS' status. The DevOps engineer needs to investigate the failure. Which TWO actions should the engineer take?

249

A company runs a containerized microservices application on Amazon ECS with Fargate launch type. The application uses an Application Load Balancer to route traffic to the ECS service. Recently, the DevOps team noticed that the ECS service is failing to deploy new tasks during a rolling update. The CloudWatch Logs for the ECS service show that new tasks are failing to start because they cannot pull the container image from Amazon ECR. The error message indicates 'AccessDenied' when attempting to pull the image. The task execution role has the necessary permissions, and the image URI is correct. The VPC has a VPC endpoint for ECR configured. The security group for the tasks allows outbound traffic to the VPC endpoint. What is the MOST likely cause of the access denied error?

250

A company uses AWS CloudTrail to log API activity in their AWS account. The DevOps engineer needs to ensure that all management events are logged and that the logs are delivered to an S3 bucket in another account for centralized auditing. The engineer has already created an S3 bucket in the central auditing account and applied a bucket policy that grants the CloudTrail service permission to write logs. However, logs are not being delivered. The engineer verifies that the CloudTrail trail is configured to point to the correct S3 bucket name and that the bucket exists. What is the MOST likely reason the logs are not being delivered?

251

A company runs a production database on Amazon RDS for PostgreSQL. The DevOps team has set up a read replica to offload read traffic. Recently, the replica started experiencing replication lag that is increasing over time. The primary instance's CPU and memory utilization are normal. The network bandwidth between the primary and replica is not saturated. The team has already increased the replica's instance class, but the lag persists. The primary database is under heavy write load due to a batch job that runs hourly. What is the MOST likely cause of the increasing replication lag?

252

A company uses AWS Lambda functions that are triggered by S3 events (object creation). The Lambda function processes the file and stores results in DynamoDB. Recently, the function started timing out after 15 seconds, causing some files to not be processed. The average file size has increased significantly. The DevOps engineer increases the Lambda function's timeout to 30 seconds and the memory to 512 MB, but the function still times out for large files. The CloudWatch Logs show that the timeout occurs during the 'dynamodb.put_item' call for a large item. The DynamoDB table's write capacity is set to on-demand, and there are no throttling errors. What should the engineer do to resolve the timeout issue?

253

A company uses EC2 instances in an Auto Scaling group behind an ALB. The DevOps team receives alerts that the CPU utilization on the instances is consistently above 90% during peak hours. The Auto Scaling group is configured with a simple scaling policy that adds one instance when CPU exceeds 80% and removes one when below 30%. However, during sudden traffic spikes, the scaling policy reacts too slowly, causing performance degradation. The team wants to improve the scaling responsiveness without over-provisioning. What should the team do?

254

A company's DevOps team uses AWS Config to monitor resource compliance. They have created a custom AWS Config rule that triggers an AWS Lambda function to evaluate whether EC2 instances have the 'Environment' tag with value 'Production' or 'Staging'. The rule is set to evaluate resources on configuration changes. However, the team notices that the rule does not trigger when an EC2 instance is launched. The Lambda function's IAM role has the necessary permissions to describe EC2 instances. The CloudWatch Logs for the Lambda function show that it is not being invoked. What is the MOST likely reason?

Practice all 254 Incident and Event Response questions

Other DOP-C02 exam domains

Configuration Management and IaC · Resilient Cloud Solutions · Monitoring and Logging · Security and Compliance · SDLC Automation

Frequently asked questions

What does the Incident and Event Response domain cover on the DOP-C02 exam?

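The Incident and Event Response domain tests how you detect, investigate, and respond to operational events on AWS. The questions on this page cover, for example, troubleshooting Auto Scaling, load balancer, ECS, Lambda, and RDS failures, investigating activity with CloudTrail and CloudWatch, and recovering from failed CloudFormation stack updates. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DOP-C02 domains, with no account required.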

How many Incident and Event Response questions are in the DOP-C02 question bank?

The Courseiva DOP-C02 question bank contains 254 questions in the Incident and Event Response domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Incident and Event Response for DOP-C02?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Incident and Event Response questions for DOP-C02?

Yes — the session launcher on this page draws questions exclusively from the Incident and Event Response domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
