Knowledge + Practice

CCNA Incident Response Questions

29 of 254 questions · Page 4/4 · Incident Response topic · Answers revealed


226

MCQhard

A DevOps engineer notices that an EC2 instance in an Auto Scaling group is repeatedly failing health checks and being terminated. The engineer needs to capture the root cause by collecting memory dumps and system logs before termination. What should the engineer do?

A.Configure the CloudWatch Agent to collect memory and system logs and publish them to CloudWatch Logs.

B.Use EC2Rescue for Windows Server or Linux, configure it to run at instance startup, and extend the Auto Scaling health check grace period.

C.Use AWS Systems Manager Run Command to execute a script on the instance that collects diagnostics before it is terminated.

D.Enable EC2 instance metadata service (IMDS) to capture diagnostic data that persists after termination.

AnswerB

EC2Rescue can run diagnostics at startup; extending the grace period gives time for the tool to collect data before termination.

Why this answer

Option B is correct because EC2Rescue is specifically designed to collect memory dumps and system logs from EC2 instances, and by configuring it to run at startup and extending the Auto Scaling health check grace period, the engineer ensures diagnostics are captured before the instance is terminated for failing health checks. This approach directly addresses the need to gather root cause data from a failing instance that is about to be replaced.

Exam trap

The trap here is that candidates often assume Systems Manager Run Command (Option C) can reliably execute scripts on failing instances, but they overlook that the instance must be in a running and reachable state, which is not guaranteed when health checks are repeatedly failing and termination is imminent.

How to eliminate wrong answers

Option A is wrong because the CloudWatch Agent collects logs and metrics during normal operation but does not capture memory dumps or system logs at the point of failure before termination; it cannot guarantee data collection from an instance that is being terminated due to health check failures. Option C is wrong because AWS Systems Manager Run Command requires the instance to be running and reachable to execute commands, but the instance is repeatedly failing health checks and may be terminated before the command can run, making it unreliable for capturing pre-termination diagnostics. Option D is wrong because EC2 instance metadata service (IMDS) provides metadata about the instance (e.g., instance ID, AMI ID) but does not capture diagnostic data like memory dumps or system logs, and it does not persist after termination.


227

MCQeasy

A DevOps engineer receives a CloudWatch alarm indicating that an EC2 instance's CPU utilization has exceeded 90% for 10 minutes. The instance is part of an Auto Scaling group behind an Application Load Balancer. What is the MOST efficient initial step to troubleshoot the high CPU usage?

A.Review the EC2 instance's CloudWatch metrics for CPU credit balance and network utilization.

B.Modify the Auto Scaling group to use a larger instance type.

C.Check the ALB's HTTP 5xx error rate metric for the target group.

D.Immediately increase the desired capacity of the Auto Scaling group.

AnswerA

This helps identify resource contention.

Why this answer

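Option A is correct because the instance's CloudWatch metrics provide historical data that helps identify the root cause, such as a depleted CPU credit balance or a surge in network traffic. Option C is wrong because the ALB's HTTP 5xx error rate describes request failures rather than the cause of high CPU, and no 5xx alarm has fired. Option D is wrong because adding capacity before investigating does not identify the cause.

Option B is wrong because increasing the instance size does not address the cause.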


228

Multi-Selecthard

A company experiences a security incident where an IAM user's access key is compromised. Which THREE steps should the DevOps engineer take immediately?

Select 3 answers

A.Review AWS CloudTrail logs for any unauthorized API calls

B.Rotate the access key by creating a new key and deleting the old one

C.Change the IAM user's password

D.Delete the IAM user and recreate it

E.Revoke any temporary security credentials issued to the user

AnswersA, B, E

Investigation is critical to understand the impact.

Why this answer

Options A, B, and E are correct. Rotating the access key invalidates the old key. Revoking temporary credentials ensures no active sessions remain.

Reviewing CloudTrail logs helps assess the impact. Option D is incorrect because deleting the user is too drastic and not necessary. Option C is incorrect because changing the password does not affect access keys.
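The CloudTrail review step can be sketched as a simple filter. This is a minimal illustration, assuming the records have already been exported as dictionaries; `userIdentity.accessKeyId` and `eventName` are fields of a CloudTrail record, while the function name, the sample records, and the key IDs are hypothetical.

```python
from typing import Iterable


def calls_made_with_key(records: Iterable[dict], access_key_id: str) -> list[dict]:
    """Return the CloudTrail records whose caller used the given access key."""
    return [
        r for r in records
        if r.get("userIdentity", {}).get("accessKeyId") == access_key_id
    ]


# Hypothetical sample records, trimmed to the fields used here.
sample_records = [
    {"eventName": "RunInstances", "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY1"}},
    {"eventName": "ListBuckets", "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY2"}},
    {"eventName": "CreateUser", "userIdentity": {"accessKeyId": "AKIAEXAMPLEKEY1"}},
]

suspicious = calls_made_with_key(sample_records, "AKIAEXAMPLEKEY1")
print([r["eventName"] for r in suspicious])
```

In practice the same filter would run over the full exported trail for the period in which the key was exposed.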


229

MCQeasy

A company has a legacy application running on an EC2 instance that is not part of an Auto Scaling group. The instance is experiencing a memory leak. The DevOps engineer needs to collect memory metrics to analyze the issue without modifying the application. What should the engineer do?

A.Install the CloudWatch agent on the instance and configure it to collect memory metrics.

B.Use the AWS Management Console to view memory metrics from the EC2 monitoring tab.

C.Use EC2Rescue to generate a memory dump and analyze it.

D.Enable CloudWatch detailed monitoring on the instance.

AnswerA

The CloudWatch agent can collect custom memory metrics from the OS.

Why this answer

The CloudWatch agent is required to collect custom metrics like memory utilization from an EC2 instance because the standard EC2 monitoring only captures hypervisor-level metrics (CPU, network, disk I/O). By installing and configuring the CloudWatch agent, the engineer can collect memory metrics without modifying the application code, directly addressing the memory leak analysis requirement.

Exam trap

The trap here is that candidates often assume the EC2 monitoring tab or detailed monitoring includes memory metrics, but AWS does not provide OS-level metrics (memory, disk space, swap usage) without the CloudWatch agent.

How to eliminate wrong answers

Option B is wrong because the AWS Management Console EC2 monitoring tab only displays default metrics (CPU, network, disk, status checks) and does not include memory metrics, which require a custom agent. Option C is wrong because EC2Rescue is a tool for troubleshooting and repairing common EC2 issues (e.g., OS boot failures, disk corruption), not for collecting ongoing memory metrics; it can generate a memory dump but that is a one-time snapshot, not a continuous metric stream for trend analysis. Option D is wrong because enabling CloudWatch detailed monitoring only increases the frequency of default metric collection (from 5 minutes to 1 minute) but does not add memory metrics, which are not available at the hypervisor level.
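As a rough sketch of what "configure it to collect memory metrics" involves, the snippet below builds a minimal agent configuration as a Python dictionary and prints it as JSON. The `metrics` / `metrics_collected` / `mem` / `mem_used_percent` names follow the CloudWatch agent configuration schema as commonly documented; the function name and the 60-second interval are assumptions for illustration, and the output should be checked against the agent documentation before use.

```python
import json


def memory_metrics_config(interval_seconds: int = 60) -> dict:
    """Build a minimal CloudWatch agent config that collects memory usage."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {
                    # Percentage of memory in use, reported by the OS.
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            }
        }
    }


config = memory_metrics_config()
print(json.dumps(config, indent=2))
```

Nothing in this configuration touches the application itself, which is why the approach satisfies the "without modifying the application" constraint.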


230

MCQeasy

A company uses Amazon RDS for MySQL as its database. The operations team notices that the database CPU utilization is consistently above 90% during peak hours, causing slow query responses. The team needs to quickly reduce CPU load without changing the application code. Which action should the team take?

A.Enable Multi-AZ deployment.

B.Modify the DB parameter group to increase max_connections.

C.Add a read replica to offload read traffic.

D.Enable Performance Insights and analyze the top queries.

AnswerD

Performance Insights helps identify queries consuming the most resources.

Why this answer

Option D is correct because enabling Performance Insights and analyzing the top queries helps identify the inefficient queries that cause high CPU. Option C is wrong because a read replica does not reduce CPU on the primary instance unless the application is changed to send its reads to the replica endpoint. Option B is wrong because modifying DB parameters such as max_connections does not reduce CPU load.

Option A is wrong because Multi-AZ is for high availability, not performance.


231

MCQmedium

A company is using AWS Lambda to process events from an Amazon SQS queue. The Lambda function is configured with a batch size of 10 and a maximum concurrency of 5. Recently, the function started experiencing high error rates and the SQS queue's ApproximateNumberOfMessagesVisible metric is increasing. The CloudWatch logs show that the function is timing out after 30 seconds. The function makes calls to an external API that sometimes takes more than 30 seconds to respond. The DevOps engineer needs to reduce the backlog and prevent message loss. Which of the following actions should the engineer take?

A.Use a dead-letter queue to capture messages that fail processing after all retries.

B.Decrease the batch size to 1 to avoid processing multiple messages at once.

C.Increase the Lambda function timeout to 60 seconds and increase the SQS visibility timeout to 90 seconds.

D.Increase the Lambda function reserved concurrency to 100 to allow more concurrent executions.

AnswerC

This allows the function to complete and prevents premature retries.

Why this answer

Option C is correct because increasing the function timeout allows the function to wait longer for the external API, and increasing the visibility timeout prevents messages from becoming visible again before the function completes. This reduces retries and the backlog. Option B is wrong because decreasing the batch size reduces throughput, worsening the backlog.

Option D is wrong because increasing concurrency may only cause more timeouts if the function still times out. Option A is useful, but alone it does not reduce the backlog; it only captures failed messages. The best approach is to fix the timeout first.
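The relationship between the three durations can be checked mechanically. The sketch below is a hypothetical helper, not an AWS API: it flags a function timeout that cannot cover the slowest downstream call and a visibility timeout shorter than the function timeout. The 45-second API latency and the 30-second starting visibility timeout are assumed values for illustration.

```python
def timeout_findings(function_timeout_s: int,
                     visibility_timeout_s: int,
                     slowest_call_s: int) -> list[str]:
    """List the timeout misconfigurations that lead to retries and backlog."""
    findings = []
    if function_timeout_s <= slowest_call_s:
        findings.append("function timeout is shorter than the slowest downstream call")
    if visibility_timeout_s < function_timeout_s:
        findings.append("visibility timeout is shorter than the function timeout")
    return findings


# Before the change: 30 s function timeout, assumed 30 s visibility timeout.
print(timeout_findings(30, 30, slowest_call_s=45))
# After the change described in the answer: 60 s and 90 s.
print(timeout_findings(60, 90, slowest_call_s=45))
```

These are minimum conditions only; a larger margin on the visibility timeout leaves room for retries of a batch.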


232

Multi-Selecthard

A company has a multi-account AWS organization. The security team needs to detect and respond to security incidents across all accounts centrally. Which THREE services should the team use together? (Choose three.)

Select 3 answers

A.AWS Security Hub

B.Amazon Inspector

C.Amazon Macie

D.Amazon GuardDuty

E.Amazon Detective

AnswersA, D, E

Aggregates findings and provides a central view.

Why this answer

Option D is correct because GuardDuty provides threat detection across accounts. Option A is correct because Security Hub aggregates findings from multiple services and accounts. Option E is correct because Detective analyzes and investigates security findings.

Option B is wrong because Inspector is for vulnerability assessments, not incident response. Option C is wrong because Macie is for data classification, not incident response.


233

MCQhard

A company runs a web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application experiences intermittent 503 errors. The engineer suspects the ALB is returning these errors because the target instances are unhealthy. Which metric should the engineer monitor to confirm this suspicion?

A.RequestCount

B.UnhealthyHostCount

C.HealthyHostCount

D.TargetResponseTime

AnswerC

Shows number of healthy targets; if zero, ALB returns 503.

Why this answer

Option C is correct because the ALB publishes a metric 'HealthyHostCount' that shows the number of healthy targets. If this metric drops to zero, the ALB returns 503 errors. Option B is less direct: the ALB also publishes 'UnHealthyHostCount', but a nonzero value does not by itself show that no healthy targets remain.

Option A is wrong because 'RequestCount' does not indicate health. Option D is wrong because 'TargetResponseTime' does not indicate health status.


234

MCQmedium

An organization uses AWS Systems Manager to manage its EC2 instances. After a security incident, the security team wants to ensure that all future API calls to Systems Manager are logged and monitored. What is the MOST efficient way to achieve this?

A.Enable S3 server access logging on the Systems Manager log bucket

B.Enable AWS CloudTrail for the Systems Manager service

C.Install the CloudWatch Logs agent on each instance to capture Systems Manager logs

D.Create an AWS Config rule to monitor Systems Manager usage

AnswerB

CloudTrail logs all API calls to Systems Manager.

Why this answer

Option B is correct because enabling CloudTrail for Systems Manager logs all API calls. Option A is wrong because S3 server access logs are for S3 bucket access, not Systems Manager; C is wrong because the CloudWatch Logs agent runs on instances and does not record API calls; D is wrong because Config rules track configuration, not API calls.


235

MCQhard

A company uses AWS Lambda functions behind an Amazon API Gateway REST API. During an incident, the API returns 502 Bad Gateway errors. The Lambda function logs show no errors. What is the most likely cause?

A.The Lambda function is throwing an unhandled exception

B.The Lambda function is returning a response that exceeds the API Gateway payload size limit

C.The API Gateway has reached its maximum concurrency limit

D.The Lambda function is timing out and API Gateway is not handling the timeout correctly

AnswerB

API Gateway has a 10 MB payload limit; exceeding it causes 502.

Why this answer

When an API Gateway REST API returns 502 Bad Gateway errors but the Lambda function logs show no errors, the most likely cause is that the Lambda function is returning a response that exceeds the API Gateway payload size limit. API Gateway has a maximum payload size of 10 MB for REST APIs, and if the Lambda function returns a response larger than this, API Gateway will reject it and return a 502 error without the Lambda function itself throwing an exception or logging an error.

Exam trap

AWS often tests the distinction between different HTTP status codes (502 vs 504 vs 429) and the specific conditions under which each is returned, leading candidates to incorrectly attribute 502 errors to Lambda timeouts or API Gateway throttling instead of payload size limits.

How to eliminate wrong answers

Option A is wrong because an unhandled exception in the Lambda function would cause the function to fail and log an error in Amazon CloudWatch Logs, but the question states that the Lambda function logs show no errors. Option C is wrong because API Gateway does not have a maximum concurrency limit; it scales automatically, and reaching a concurrency limit would result in 429 Too Many Requests errors, not 502 Bad Gateway errors. Option D is wrong because if the Lambda function were timing out, the Lambda service would log a timeout error in CloudWatch Logs, and API Gateway would typically return a 504 Gateway Timeout error, not a 502 Bad Gateway error.
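One way to catch this failure mode before it reaches API Gateway is to measure the serialized response inside a test or a wrapper. The sketch below is illustrative only: the 10 MB figure is taken from the explanation above, and the function names are hypothetical.

```python
import json

# Limit as stated in the explanation above; treat it as an assumption to verify.
PAYLOAD_LIMIT_BYTES = 10 * 1024 * 1024


def response_size_bytes(body) -> int:
    """Size of the JSON-serialized body in bytes, as it would go over the wire."""
    return len(json.dumps(body).encode("utf-8"))


def exceeds_limit(body, limit: int = PAYLOAD_LIMIT_BYTES) -> bool:
    """True when the serialized body is larger than the configured limit."""
    return response_size_bytes(body) > limit


print(exceeds_limit({"ok": True}))
print(exceeds_limit({"data": "x" * (PAYLOAD_LIMIT_BYTES + 1)}))
```

Because the function itself completes normally, a check like this is often the only place the oversized response becomes visible in the function's own logs.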


236

Multi-Selectmedium

A company uses AWS CloudFormation to manage infrastructure. An engineer notices that a stack update has failed, leaving the stack in a ROLLBACK_IN_PROGRESS state. Which TWO actions should the engineer take to investigate and resolve the issue?

Select 2 answers

A.Manually stop the rollback and continue with the update

B.Re-launch the stack with the same template

C.View the stack events in the CloudFormation console to see the specific error message

D.Delete the stack and re-launch it

E.Review the change set that was applied during the update

AnswersC, E

Stack events provide detailed error information.

Why this answer

Options C and E are correct. The engineer should view the stack events to see the specific error, then review the change set to understand what changes were attempted. Option D is wrong because deleting the stack would lose resources; B is wrong because re-launching may repeat the error; A is wrong because stopping the rollback is not recommended without understanding the issue.


237

Multi-Selectmedium

A DevOps engineer is designing an incident response plan for a multi-region application. The application runs on EC2 instances behind an Application Load Balancer (ALB) and uses Amazon RDS for MySQL with Multi-AZ. Which TWO actions should the engineer include to ensure high availability and fast failover during a regional incident?

Select 2 answers

A.Set up an Amazon RDS read replica in a second region and promote it during failover.

B.Create an Auto Scaling group that can launch instances in multiple regions.

C.Deploy an Application Load Balancer that spans both regions.

D.Configure Amazon RDS Multi-AZ in a second region.

E.Use Amazon Route 53 with health checks to fail over DNS to a secondary region.

AnswersA, E

Read replica can be promoted to primary in another region.

Why this answer

Options A and E are correct. E is correct because using Route 53 with health checks on the ALB endpoint can route traffic away from an unhealthy region. A is correct because a warm standby with a read replica in another region allows for promoting the replica to primary quickly.

Option D is wrong because a Multi-AZ RDS instance is in a single region. Option C is wrong because a single ALB cannot span regions. Option B is wrong because Auto Scaling groups are per-region, not cross-region.


238

MCQeasy

A DevOps engineer notices that an EC2 instance running a web application is unresponsive. CloudWatch alarms are not triggering. What is the FIRST step the engineer should take to diagnose the issue?

A.Terminate the instance and launch a new one from the latest AMI.

B.Review the EC2 instance system log and CloudWatch Logs for error messages.

C.Restart the EC2 instance immediately to restore service.

D.Create a new CloudWatch alarm with a lower threshold to get alerted quicker next time.

AnswerB

System logs provide immediate insight into crashes, OOM, or application errors.

Why this answer

Option B is correct because checking the system logs (e.g., EC2 console or CloudWatch Logs) helps identify the root cause. Option C is wrong because restarting without diagnosis may lose transient logs. Option D is wrong because creating a new alarm doesn't help diagnose the current issue.

Option A is wrong because it assumes the instance is permanently failed without investigation.


239

MCQhard

A company uses Amazon RDS for MySQL with Multi-AZ deployment. During an incident, the primary DB instance becomes unreachable. The failover to the standby instance succeeds, but application connections are failing with 'Access denied for user'. What is the most likely cause?

A.The DNS CNAME for the RDS endpoint has not propagated to the application's DNS resolver

B.The standby instance has a different storage configuration than the primary

C.The application is using the old master user credentials that were changed on the primary but not replicated to the standby

D.The security group for the RDS instance does not allow inbound traffic from the application's new IP address

AnswerC

Credentials are not replicated across Multi-AZ; they must be the same.

Why this answer

Option C is correct because after failover, the standby instance may have different credentials if not using the same secret. Option A is incorrect because DNS propagation does not cause access denied errors. Option D is incorrect because the security group is associated with the RDS instance, not the endpoint, and a blocked connection would time out rather than return 'Access denied'.

Option B is incorrect because the standby instance has the same storage as the primary.


240

MCQhard

A company uses AWS Lambda functions to process S3 events. After a recent deployment, some functions fail with timeout errors. The engineer needs to implement a solution that automatically captures and stores the function's input payload for all failed invocations without modifying the Lambda code. Which approach meets these requirements?

A.Enable CloudWatch Logs Insights on the Lambda function's log group.

B.Use CloudWatch Events to capture Lambda invocation errors.

C.Configure a dead-letter queue (DLQ) for the Lambda function.

D.Use AWS Lambda Destinations with a failure destination pointing to an SQS queue.

AnswerD

Lambda Destinations send invocation records with payload for failed async invocations.

Why this answer

Option D is correct because Lambda Destinations allow sending invocation records (including input payload) to SQS, SNS, Lambda, or EventBridge for failed events, without code changes. Option C is less suitable because a DLQ message carries only the event and a few error attributes, whereas a Destinations invocation record includes the full request and response context. Option A is wrong because CloudWatch Logs does not capture the payload by default; it only contains what the function writes to its logs, and Logs Insights merely queries that data.

Option B is wrong because CloudWatch Events (EventBridge) can capture Lambda invocations but not the payload without custom code.


241

Multi-Selectmedium

A company uses Amazon CloudFront with an S3 origin. Users in Europe report slow load times. The engineer needs to improve performance for European users. Which TWO actions should the engineer take?

Select 2 answers

A.Add additional price classes to the CloudFront distribution (e.g., Price Class 200).

B.Enable S3 Transfer Acceleration on the origin bucket.

C.Enable CloudFront access logs to identify slow requests.

D.Enable AWS WAF on the CloudFront distribution to block malicious traffic.

E.Enable CloudFront Origin Shield in a European region.

AnswersA, E

Price Class 200 includes more edge locations in Europe.

Why this answer

Option A is correct because adding an additional price class can include more edge locations in Europe. Option E is correct because using Origin Shield reduces load on the origin and improves performance. Option B is wrong because S3 Transfer Acceleration is for uploads, not CloudFront.

Option C is wrong because enabling logging does not improve performance. Option D is wrong because enabling WAF does not improve performance.


242

MCQeasy

A company uses AWS CloudFormation to manage infrastructure. During an incident, a stack update fails with the error 'The following resource(s) failed to create: [AWS::RDS::DBInstance]'. Which AWS service should the engineer use to view detailed error messages for the failed resource creation?

A.AWS Config timeline

B.AWS CloudFormation console Events tab

C.AWS Service Catalog

D.AWS CloudTrail event history

AnswerB

CloudFormation events show detailed error messages for failed resources.

Why this answer

Option B is correct because CloudFormation events provide detailed error messages for resource creation failures. Option D (CloudTrail) records API calls but not CloudFormation-specific resource errors. Option A (Config) is for configuration compliance.

Option C (Service Catalog) is for product provisioning, not troubleshooting stack failures.


243

Multi-Selecthard

Which TWO metrics should be monitored in Amazon CloudWatch to detect a potential memory leak in an EC2 instance? (Choose two.)

Select 2 answers

A.DiskReadOps

B.MemoryUtilization (custom metric published via CloudWatch Agent).

C.NetworkIn

D.SwapUsage (custom metric published via CloudWatch Agent).

E.CPUUtilization

AnswersB, D

Directly measures memory usage.

Why this answer

MemoryUtilization is a custom metric that must be published via the CloudWatch Agent because EC2 does not expose memory metrics by default. Monitoring this metric over time can reveal a steady upward trend in memory usage that does not drop after processes complete, which is a classic symptom of a memory leak.

Exam trap

The trap here is that candidates assume EC2 provides memory metrics by default (like CPUUtilization), but they must be explicitly enabled via the CloudWatch Agent, and they overlook SwapUsage as a complementary indicator of memory pressure from a leak.
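The "steady upward trend" described above can be approximated with a least-squares slope over recent samples. This is a minimal sketch under stated assumptions: the sample values are invented, the 0.5 percentage-points-per-sample threshold is arbitrary, and the function names are hypothetical rather than part of any AWS SDK.

```python
def slope_per_sample(values: list[float]) -> float:
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(values: list[float], min_slope: float = 0.5) -> bool:
    """Flag a series whose memory usage climbs by at least min_slope per sample."""
    return slope_per_sample(values) >= min_slope


# Invented MemoryUtilization samples (percent), one per collection interval.
leaking = [40, 42, 45, 47, 50, 53, 55, 58]
steady = [50, 55, 45, 50, 50, 45, 55, 50]

print(looks_like_leak(leaking), looks_like_leak(steady))
```

The same test applied to SwapUsage gives a second, independent signal, which is why the question pairs the two metrics.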


244

MCQhard

A company's production environment consists of EC2 instances in an Auto Scaling group behind an Application Load Balancer (ALB). The instances run a web application that stores session data in an ElastiCache Redis cluster. The company has enabled detailed CloudWatch metrics and set up a dashboard. The operations team notices that the average CPU utilization across the Auto Scaling group spikes to 95% every 15 minutes, coinciding with a high number of Redis connections. What is the MOST likely cause?

A.The application is using Memcached instead of Redis, causing increased load.

B.The Auto Scaling group's scaling policy is based on memory utilization instead of CPU.

C.The ALB has session stickiness enabled, causing traffic to be routed to the same instances.

D.The ElastiCache cluster is not large enough to handle the number of requests.

AnswerC

Stickiness can overload specific instances.

Why this answer

Option C is correct because session persistence can lead to uneven load distribution. Option D is wrong because Redis itself doesn't cause CPU spikes on the instances. Option A is wrong because Memcached is not used.

Option B is wrong because it doesn't explain the periodic spikes.


245

MCQhard

A company has a multi-account strategy using AWS Organizations. The security team needs to respond to incidents across all accounts. They want to ensure that all CloudTrail trails are enabled and logging to a central S3 bucket in the management account. What is the MOST efficient way to monitor compliance?

A.Create a CloudTrail organization trail and use CloudTrail Insights to detect configuration changes.

B.Use AWS Config conformance packs with a managed rule to check CloudTrail is enabled.

C.Set up CloudWatch Events rules in each account to detect trail disabling.

D.Use AWS Trusted Advisor to check CloudTrail configuration in each account.

AnswerB

Conformance packs can be applied to all accounts in the organization.

Why this answer

Option B is correct because AWS Config conformance packs can evaluate whether CloudTrail trails are enabled across accounts using managed rules. Option D is wrong because Trusted Advisor does not monitor CloudTrail configuration across all accounts. Option A is wrong because CloudTrail itself does not monitor trail configuration.

Option C is wrong because CloudWatch Events can be used but requires custom rules per account; Config is more efficient.


246

MCQeasy

A company experiences an unexpected spike in network traffic to a web application hosted on EC2 instances behind an Application Load Balancer. The DevOps team needs to investigate the source IP addresses generating the traffic. Which AWS service should they use to capture the traffic?

A.Amazon CloudWatch Logs

B.AWS Config

C.AWS CloudTrail

D.VPC Flow Logs

AnswerD

VPC Flow Logs capture network traffic information.

Why this answer

Option D is correct because VPC Flow Logs capture IP traffic information including source and destination IPs. Option C is wrong because CloudTrail logs API calls, not network traffic; A is wrong because CloudWatch Logs captures application logs; B is wrong because AWS Config records resource configuration changes.
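Once flow logs are available, finding the noisiest sources is a small parsing job. The sketch below assumes records in the default (version 2) flow log format, with fields in the order listed in `FIELDS`; the sample lines use documentation IP ranges and a made-up interface ID, and `top_sources` is a hypothetical helper.

```python
from collections import Counter

# Field order of the default flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def top_sources(lines: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Rank source addresses by total packets across the given records."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip malformed or custom-format lines
        record = dict(zip(FIELDS, parts))
        if record["log_status"] != "OK":
            continue  # NODATA and SKIPDATA records carry no addresses
        counts[record["srcaddr"]] += int(record["packets"])
    return counts.most_common(n)


sample_lines = [
    "2 123456789012 eni-0abc 203.0.113.10 10.0.1.5 44321 443 6 20 4000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.5 51512 443 6 5 900 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 203.0.113.10 10.0.1.5 44322 443 6 10 2100 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0abc - - - - - - - 1700000060 1700000120 - NODATA",
]

print(top_sources(sample_lines))
```

Ranking by `bytes` instead of `packets` is a one-word change and may suit a bandwidth-driven spike better.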


247

MCQeasy

A company runs a critical application on Amazon EC2 instances behind an Application Load Balancer. During a security incident, the security team needs to isolate a compromised instance for forensic analysis without affecting the application's availability. What is the MOST effective action to take?

A.Deregister the instance from the target group and stop the instance for forensic analysis.

B.Modify the security group of the instance to deny all inbound and outbound traffic.

C.Terminate the compromised instance immediately to prevent further damage.

D.Change the subnet route table to route traffic away from the compromised instance.

AnswerA

Isolates the instance while maintaining availability.

Why this answer

Deregistering the instance from the target group removes it from the Application Load Balancer's routing, ensuring no new traffic is sent to it while existing connections drain (connection draining). Stopping the instance, rather than terminating it, preserves its EBS volumes for forensic analysis without impacting application availability, as the remaining healthy instances continue to serve traffic. Because the contents of memory are lost when an instance stops, any memory capture has to happen before the stop.

Exam trap

The trap here is that candidates confuse network-level isolation (security groups or route tables) with application-level isolation (target group deregistration), failing to recognize that the ALB continues to route traffic to a registered instance regardless of its security group or subnet routing.

How to eliminate wrong answers

Option B is wrong because modifying the security group to deny all traffic only blocks network-level access; the instance remains registered in the target group, and the ALB may still attempt to route traffic to it, potentially causing connection timeouts or errors. Option C is wrong because terminating the instance immediately destroys volatile data (e.g., memory contents, running processes) needed for forensic analysis and could cause a sudden loss of capacity if the instance was handling active requests. Option D is wrong because changing the subnet route table affects all instances in that subnet, not just the compromised one, and does not prevent the ALB from sending traffic to the instance via its private IP; route tables control layer-3 routing, not load balancer target group membership.


248

MCQmedium

A company uses AWS Organizations with multiple accounts. The security team wants to ensure that all IAM roles in member accounts have a maximum session duration of 1 hour. They need a way to detect any roles that violate this policy. What should they do?

A.Use IAM Access Analyzer to validate the roles against a policy template.

B.Use AWS Config with the managed rule iam-role-max-session-duration to evaluate roles.

C.Run AWS Trusted Advisor and check the IAM report for roles with long session durations.

D.Enable AWS CloudTrail and create a metric filter to detect role creation with session duration greater than 1 hour.

AnswerB

AWS Config can continuously evaluate role configurations against this rule.

Why this answer

AWS Config provides a managed rule called `iam-role-max-session-duration` that specifically evaluates IAM roles to ensure their `MaxSessionDuration` setting does not exceed a specified threshold (default 1 hour). This rule can be deployed across all member accounts in AWS Organizations using a conformance pack or AWS Config aggregator, allowing the security team to continuously detect and report any roles that violate the policy without manual intervention.

Exam trap

The trap here is that candidates often confuse AWS Config's ability to evaluate resource configurations (like IAM role session duration) with CloudTrail's event logging or IAM Access Analyzer's policy analysis, leading them to choose options that detect creation events rather than continuously assess the current state of all roles.

How to eliminate wrong answers

Option A is wrong because IAM Access Analyzer is designed to analyze resource-based policies (like S3 bucket policies or KMS key policies) for unintended public or cross-account access, not to validate IAM role session duration settings against a policy template. Option C is wrong because AWS Trusted Advisor checks for IAM use (e.g., unused IAM users, MFA on root) but does not include a specific check for IAM role maximum session duration. Option D is wrong because while CloudTrail can log `CreateRole` and `UpdateRole` events, a metric filter cannot directly evaluate the `MaxSessionDuration` parameter from the event; it would require complex custom parsing and still not provide ongoing compliance evaluation like AWS Config.
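The evaluation such a rule performs can be pictured as a comparison against each role's current setting. The sketch below is not the Config rule itself: it works on plain dictionaries with `RoleName` and `MaxSessionDuration` keys (the names IAM uses for these attributes), the sample roles are invented, and a missing duration is treated as IAM's default of 3600 seconds.

```python
MAX_ALLOWED_SECONDS = 3600  # the 1-hour policy from the question


def noncompliant_roles(roles: list[dict],
                       max_seconds: int = MAX_ALLOWED_SECONDS) -> list[str]:
    """Return the names of roles whose session duration exceeds the limit."""
    return sorted(
        role["RoleName"]
        for role in roles
        if role.get("MaxSessionDuration", 3600) > max_seconds
    )


# Invented role descriptions for illustration.
sample_roles = [
    {"RoleName": "deploy-role", "MaxSessionDuration": 3600},
    {"RoleName": "legacy-admin", "MaxSessionDuration": 43200},
    {"RoleName": "batch-runner", "MaxSessionDuration": 7200},
    {"RoleName": "default-role"},
]

print(noncompliant_roles(sample_roles))
```

The key property is that the check looks at the current state of every role, not at the events that created or changed them.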


249

MCQeasy

A DevOps engineer receives a CloudWatch alarm that an Auto Scaling group has been in an 'Insufficient data' state for 20 minutes. What does this indicate?

A.All instances in the Auto Scaling group are unhealthy

B.The Auto Scaling group needs to scale up

C.The alarm has not received enough metric data to evaluate

D.The CloudWatch agent is not installed on the instances

AnswerC

Insufficient data means not enough data points.

Why this answer

The 'Insufficient data' state in CloudWatch alarms indicates that the alarm has not received enough metric data points to determine whether the threshold has been breached. This can occur when the metric is not being published, the data collection period is too short, or the metric namespace is misconfigured. It does not directly indicate instance health, scaling needs, or agent installation status.

Exam trap

The trap here is that candidates confuse 'Insufficient data' with a problem state (like unhealthy instances or scaling failures), when it actually means the alarm simply lacks enough data to make a determination.

How to eliminate wrong answers

Option A is wrong because 'Insufficient data' does not imply unhealthy instances; it means the alarm lacks metric data to evaluate, whereas unhealthy instances would trigger 'ALARM' state if health check metrics are configured. Option B is wrong because the alarm state does not indicate a scaling need; scaling decisions are based on threshold breaches, not insufficient data. Option D is wrong because the CloudWatch agent is not required for all metrics; many metrics (e.g., EC2 basic monitoring) are published automatically without an agent, and 'Insufficient data' can occur even with the agent installed if data is not flowing.
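The three alarm states can be illustrated with a deliberately simplified model. This sketch is an assumption-laden toy, not CloudWatch's actual evaluation logic: `evaluate_alarm` is a hypothetical helper, it handles only a "greater than threshold" comparison, and it treats a period with no published value as simply missing.

```python
from typing import Optional, Sequence


def evaluate_alarm(datapoints: Sequence[Optional[float]],
                   threshold: float,
                   evaluation_periods: int) -> str:
    """Toy model of a 'greater than threshold' alarm.

    Each entry is one period's value, or None when nothing was published.
    """
    recent = datapoints[-evaluation_periods:]
    present = [value for value in recent if value is not None]
    if not present:
        # No data at all in the window: the alarm cannot decide either way.
        return "INSUFFICIENT_DATA"
    if len(present) == evaluation_periods and all(v > threshold for v in present):
        return "ALARM"
    return "OK"


no_data = evaluate_alarm([None, None, None], threshold=90, evaluation_periods=3)
breaching = evaluate_alarm([95, 97, 99], threshold=90, evaluation_periods=3)
healthy = evaluate_alarm([95, 40, 99], threshold=90, evaluation_periods=3)
print(no_data, breaching, healthy)
```

The point of the model is that the first case says nothing about instance health: the state reflects missing data, not a breached or satisfied threshold.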

250

MCQeasy

A DevOps engineer receives an alert that an Amazon EC2 instance’s CPU utilization has been above 90% for the past hour. The instance is part of an Auto Scaling group with a step scaling policy based on average CPU. The engineer checks the CloudWatch alarm and sees that it is in the ALARM state. What should the engineer do to verify that the Auto Scaling group is scaling out properly?

A.Ensure the scaling policy is configured to scale in

B.Check the CloudWatch Logs for the instance

C.Verify that the CloudWatch alarm is in INSUFFICIENT_DATA state

D.Review the Auto Scaling group’s activity history in the EC2 console

AnswerD

Activity history shows scaling actions taken.

Why this answer

Option D is correct because the Auto Scaling group's activity history shows whether scaling actions were triggered and whether they succeeded. Option A is wrong because the policy in question scales out on high CPU; a scale-in setting is irrelevant here. Option B is wrong because CloudWatch Logs contains instance and application logs, not scaling actions. Option C is wrong because the alarm is already in the ALARM state, which is the state that triggers the policy.
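The same activity history is available through the Auto Scaling `DescribeScalingActivities` API. The sketch below assumes boto3 is installed and credentials are configured for the live call; `summarize_activities`, `fetch_activity_summary`, and the sample response are hypothetical names and data used only for illustration.

```python
from typing import List


def summarize_activities(response: dict) -> List[str]:
    """Turn a DescribeScalingActivities-style response into readable lines."""
    lines = []
    for activity in response.get("Activities", []):
        lines.append(f"{activity['StatusCode']}: {activity['Description']}")
    return lines


def fetch_activity_summary(group_name: str) -> List[str]:
    """Fetch recent scaling activities for one group (requires AWS access)."""
    import boto3  # imported here so the helper above works without AWS access

    client = boto3.client("autoscaling")
    response = client.describe_scaling_activities(AutoScalingGroupName=group_name)
    return summarize_activities(response)


# Hypothetical response fragment, for illustration only.
sample_response = {
    "Activities": [
        {"StatusCode": "Successful",
         "Description": "Launching a new EC2 instance: i-0123456789abcdef0"},
        {"StatusCode": "Failed",
         "Description": "Launching a new EC2 instance"},
    ]
}
summary = summarize_activities(sample_response)
print("\n".join(summary))
```

A run of `Failed` entries while the alarm stays in ALARM suggests the policy fired but the launch itself could not complete.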

251

MCQhard

A company uses AWS Organizations with multiple accounts. The security team needs a centralized solution to detect and respond to EC2 instances that are publicly accessible with SSH open to 0.0.0.0/0. Which combination of services provides the most automated detection and remediation?

A.AWS CloudTrail and Amazon EventBridge

B.Amazon GuardDuty and AWS Lambda

C.AWS Config and Amazon Simple Notification Service (SNS)

D.AWS Config and AWS Systems Manager Automation

AnswerD

Config rule detects non-compliant SG; Systems Manager Automation remediates.

Why this answer

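Option D is correct because an AWS Config rule detects security groups that allow SSH from 0.0.0.0/0, and a Systems Manager Automation runbook can remediate automatically by modifying the security group rules. Option A is wrong because CloudTrail audits API calls and EventBridge only routes events; neither remediates on its own. Option B is wrong because GuardDuty detects threats, not configuration compliance. Option C is wrong because SNS only sends notifications and leaves remediation to a person.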
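The detection half of this pattern comes down to inspecting a security group's ingress rules. The following is a minimal sketch, not the logic of any specific AWS Config rule: `allows_public_ssh` is a hypothetical helper, the input is assumed to follow the `IpPermissions` shape returned by the EC2 `DescribeSecurityGroups` API, and IPv6 ranges are ignored for brevity.

```python
from typing import List


def allows_public_ssh(ip_permissions: List[dict]) -> bool:
    """Return True if any ingress rule exposes TCP port 22 to 0.0.0.0/0."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            covers_ssh = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535)
        else:
            covers_ssh = False
        open_to_world = any(ip_range.get("CidrIp") == "0.0.0.0/0"
                            for ip_range in rule.get("IpRanges", []))
        if covers_ssh and open_to_world:
            return True
    return False


# Hypothetical rule sets, for illustration only.
open_group = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
               "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
restricted_group = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                     "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]
print(allows_public_ssh(open_group), allows_public_ssh(restricted_group))
```

In the exam scenario, a non-compliant result would hand off to an automated remediation step rather than a notification.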

252

MCQhard

A company runs a containerized microservices application on Amazon ECS with Fargate launch type. The application uses an Application Load Balancer to route traffic to the ECS service. Recently, the DevOps team noticed that the ECS service is failing to deploy new tasks during a rolling update. The CloudWatch Logs for the ECS service show that new tasks are failing to start because they cannot pull the container image from Amazon ECR. The error message indicates 'AccessDenied' when attempting to pull the image. The task execution role has the necessary permissions, and the image URI is correct. The VPC has a VPC endpoint for ECR configured. The security group for the tasks allows outbound traffic to the VPC endpoint. What is the MOST likely cause of the access denied error?

A.The task execution role does not have the 'ecr:GetAuthorizationToken' permission.

B.The security group for the ALB does not allow inbound traffic to the ECS tasks.

C.The VPC endpoint for ECR does not have 'Private DNS names enabled' selected.

D.The task role does not have the 'ecr:BatchGetImage' permission.

AnswerC

When private DNS is not enabled, the DNS resolution for ECR endpoints defaults to public IPs, which may not be reachable from the VPC, causing access denied.

Why this answer

Option C is correct because the VPC endpoint for ECR requires private DNS resolution to be enabled for the task to resolve the ECR repository endpoint to the private IP address. Without this, the task tries to connect via the public endpoint, which may be blocked by the security group or route table. Option A is wrong because the task execution role already has the required permissions (the error is not about IAM). Option B is wrong because the issue is not related to the ALB security group. Option D is wrong because the task role is for application-specific permissions, not for pulling images.
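The private DNS setting can be checked from the output of the EC2 `DescribeVpcEndpoints` API. This is a minimal sketch under stated assumptions: `ecr_endpoints_without_private_dns` is a hypothetical helper, the endpoint IDs and region are made up, and the input is assumed to follow the `VpcEndpoints` response shape.

```python
from typing import List


def ecr_endpoints_without_private_dns(response: dict) -> List[str]:
    """Return IDs of ECR interface endpoints that have private DNS disabled."""
    flagged = []
    for endpoint in response.get("VpcEndpoints", []):
        service = endpoint.get("ServiceName", "")
        # ECR uses two interface endpoint services: ecr.api and ecr.dkr.
        is_ecr = service.endswith(".ecr.api") or service.endswith(".ecr.dkr")
        if is_ecr and not endpoint.get("PrivateDnsEnabled", False):
            flagged.append(endpoint["VpcEndpointId"])
    return flagged


# Hypothetical response fragment, for illustration only.
sample_response = {
    "VpcEndpoints": [
        {"VpcEndpointId": "vpce-0aaa", "PrivateDnsEnabled": True,
         "ServiceName": "com.amazonaws.us-east-1.ecr.api"},
        {"VpcEndpointId": "vpce-0bbb", "PrivateDnsEnabled": False,
         "ServiceName": "com.amazonaws.us-east-1.ecr.dkr"},
        {"VpcEndpointId": "vpce-0ccc", "PrivateDnsEnabled": False,
         "ServiceName": "com.amazonaws.us-east-1.s3"},
    ]
}
flagged = ecr_endpoints_without_private_dns(sample_response)
print(flagged)
```

Any endpoint this returns is one where the standard ECR hostnames would still resolve to public addresses from inside the VPC.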

253

MCQmedium

A company uses AWS Lambda functions to process messages from an Amazon SQS queue. The DevOps team notices that messages are not being processed and are going to the dead-letter queue. The Lambda function code is correct. What is the most likely cause?

A.The Lambda function's execution role lacks the sqs:ReceiveMessage permission.

B.The SQS queue's visibility timeout is too long.

C.The Lambda function timeout is too short.

D.The Lambda function's dead-letter queue is misconfigured.

AnswerA

Without ReceiveMessage permission, Lambda cannot fetch messages, so they eventually go to DLQ.

Why this answer

Option A is correct because if the Lambda function's execution role does not have permission to receive messages from SQS, it cannot process them, causing them to go to DLQ. Option B is wrong because the visibility timeout is a queue setting that only controls how long a received message stays hidden; a long value delays retries but does not by itself send messages to the DLQ. Option C is wrong because the scenario reports no invocations timing out. Option D is wrong because the function's own dead-letter queue setting applies to failed asynchronous invocations; with an SQS event source, unprocessed messages are moved by the queue's redrive policy.
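A quick way to test the permissions theory is to check the execution role's policy for the SQS actions a Lambda poller needs: `sqs:ReceiveMessage`, `sqs:DeleteMessage`, and `sqs:GetQueueAttributes`. The sketch below is a rough approximation, not IAM's evaluation logic: `missing_sqs_actions` is a hypothetical helper that looks only at `Allow` statements and ignores `Deny`, `Resource`, and `Condition`.

```python
from fnmatch import fnmatchcase
from typing import List

REQUIRED_ACTIONS = (
    "sqs:ReceiveMessage",
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
)


def missing_sqs_actions(policy_document: dict) -> List[str]:
    """Return required SQS actions that no Allow statement appears to grant."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed_patterns.extend(actions)
    # Action names are case-insensitive and may contain wildcards like "sqs:*".
    return [action for action in REQUIRED_ACTIONS
            if not any(fnmatchcase(action.lower(), pattern.lower())
                       for pattern in allowed_patterns)]


# Hypothetical policy, for illustration only.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Action": ["sqs:DeleteMessage", "sqs:GetQueueAttributes"],
                   "Resource": "*"}],
}
missing = missing_sqs_actions(sample_policy)
print(missing)
```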

254

MCQhard

A DevOps engineer is troubleshooting an issue where an Amazon RDS instance's CPU utilization is consistently high. The engineer has enabled Performance Insights and sees that the top SQL query is a SELECT statement that scans many rows. What is the best course of action to reduce CPU utilization?

A.Create a read replica to offload read traffic.

B.Increase the allocated storage to improve I/O.

C.Increase the DB instance size to handle the load.

D.Add appropriate indexes to optimize the query.

AnswerD

Indexes reduce the number of rows scanned, lowering CPU usage.

Why this answer

Option D is correct because adding appropriate indexes can reduce the number of rows scanned, thus reducing CPU usage. Option A is wrong because a read replica only moves read traffic elsewhere; the query still scans the same rows wherever it runs. Option B is wrong because increasing storage does not reduce CPU. Option C is wrong because increasing instance size is a temporary fix that leaves the inefficient query in place.
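The effect of an index on a row-scanning SELECT can be seen in a query plan. This sketch uses Python's built-in `sqlite3` as a local stand-in, which is an assumption made for portability: RDS engines such as MySQL or PostgreSQL have their own `EXPLAIN` output, and the `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, sql, (42,))  # full table scan expected

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))   # index lookup expected

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search that uses `idx_orders_customer`, which is the reduction in rows examined that lowers CPU.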
