Knowledge + Practice

AWS Certified DevOps Engineer Professional DOP-C02 (DOP-C02) — Questions 601–675

1740 questions total · 24 pages · All types, answers revealed


Page 9 of 24

601

MCQhard

A company runs a containerized application on Amazon ECS with Fargate launch type. The application consists of three microservices: frontend, backend, and database. The ECS cluster is in a VPC with public and private subnets. The frontend service is publicly accessible via an Application Load Balancer (ALB) in public subnets. The backend service communicates with the database service, which runs as a stateful service with persistent storage using Amazon EFS. The DevOps team is using CloudWatch Container Insights and has enabled Prometheus metrics for the ECS cluster. Recently, the team observed that the frontend service's response time has increased significantly, and some requests are timing out. The team checked the ALB metrics and saw an increase in 5xx errors. They also noticed that the backend service's CPU utilization is high, and the database service's disk I/O is high. The team suspects a bottleneck in the backend service. Which course of action should the team take FIRST to identify the root cause?

A.Disable the health check for the backend service in the ALB target group.

B.Migrate the database service to Amazon RDS for better performance.

C.Check the backend service's application logs in CloudWatch Logs to identify errors or slow database queries.

D.Increase the desired count of the backend service to reduce load per task.

AnswerC

Logs will help pinpoint the issue.

Why this answer

Option C is correct. The first step is to analyze the backend service's application logs to identify any errors or slow operations. The high CPU and disk I/O could be symptoms of inefficient queries or code.

Option D is incorrect because increasing capacity without understanding the root cause may not solve the issue and could increase costs. Option B is incorrect because migrating to a different database does not address the immediate issue. Option A is incorrect because disabling health checks would hide the problem, not fix it.


602

Multi-Selectmedium

A DevOps engineer is setting up centralized logging for a multi-account environment using AWS Organizations. The engineer needs to aggregate logs from all accounts into a single Amazon S3 bucket. Which TWO steps are necessary?

Select 2 answers

A.Create IAM roles in each account to allow the central bucket to read logs.

B.Create a bucket policy on the central S3 bucket that grants permissions to the source accounts.

C.Enable CloudTrail organization trail in the management account to deliver logs to the central bucket.

D.Set up a cross-account subscription in CloudWatch Logs to forward logs to the central account.

E.Configure each account’s services (e.g., CloudTrail, VPC Flow Logs) to deliver logs to the central S3 bucket.

AnswersB, E

The bucket policy must allow the source accounts to write logs.

Why this answer

Option B is correct because a bucket policy on the central S3 bucket can grant cross-account permissions to source accounts to write logs. Option E is correct because each account's services (e.g., CloudTrail, VPC Flow Logs) must then be configured to deliver their logs to that bucket. Together, these let member accounts deliver logs directly to the central bucket without requiring IAM roles in each account for reading logs.

Exam trap

The trap here is that candidates often confuse the need for IAM roles in each account (Option A) with the correct bucket policy approach, or they assume that enabling an organization trail (Option C) is mandatory when the question allows for individual account configuration.
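As a rough illustration of the bucket policy in Option B, the sketch below builds a policy document that lets a list of source accounts write objects into the central bucket. The bucket name, the account IDs, and the `AWSLogs/` prefix are assumptions made for the example. It follows the question's wording of account principals; service-delivered logs such as CloudTrail and VPC Flow Logs generally also need their service principals allowed in the bucket policy.

```python
import json


def central_log_bucket_policy(bucket, source_account_ids):
    """Build a bucket policy that lets the listed accounts write log objects.

    The bucket name and account IDs passed in are hypothetical examples.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSourceAccountsToWriteLogs",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        f"arn:aws:iam::{account_id}:root"
                        for account_id in source_account_ids
                    ]
                },
                "Action": "s3:PutObject",
                # Assumed prefix; adjust to wherever the logs should land.
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/*",
            }
        ],
    }


policy = central_log_bucket_policy(
    "example-central-logs", ["111111111111", "222222222222"]
)
print(json.dumps(policy, indent=2))
```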


603

MCQhard

A company uses Amazon S3 to store sensitive data. The security team wants to be notified when an S3 bucket policy is modified. Which approach is most efficient?

A.Create an Amazon EventBridge rule that matches the 'PutBucketPolicy' API call and sends a notification to an SNS topic.

B.Set up an AWS Config rule to detect changes to the bucket policy.

C.Configure S3 event notifications for 's3:PutBucketPolicy' on the bucket.

D.Enable S3 server access logs and use CloudWatch Logs Insights to run queries periodically.

AnswerA

EventBridge provides real-time filtering of CloudTrail events.

Why this answer

Option A is correct because CloudTrail logs S3 bucket policy changes, and EventBridge can filter those events to trigger an SNS notification. Option D is wrong because CloudWatch Logs Insights is a query tool, not real-time alerting. Option C is wrong because S3 event notifications are for object-level events, not bucket policy changes.

Option B is wrong because Config can detect changes but is not the most efficient for real-time alerting.
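To make the EventBridge approach concrete, here is a minimal sketch of an event pattern that matches the 'PutBucketPolicy' call recorded by CloudTrail. The `matches` helper is a deliberately simplified stand-in for EventBridge's matching (exact values and nested objects only), and the sample event is hypothetical and trimmed to the fields the pattern inspects.

```python
# Event pattern for a rule that fires on S3 bucket policy changes.
bucket_policy_change_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutBucketPolicy"],
    },
}


def matches(pattern, event):
    """Simplified matcher: supports exact-value lists and nested dicts only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields used by the pattern.
sample_event = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "s3.amazonaws.com", "eventName": "PutBucketPolicy"},
}

print(matches(bucket_policy_change_pattern, sample_event))
```

The rule's target would be the SNS topic; the pattern relies on CloudTrail recording management events for the account.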


604

MCQhard

Refer to the exhibit. An IAM policy is attached to a role used by a CI/CD system. The policy is intended to allow starting the pipeline 'MyPipeline' from the same account. However, the CI/CD system receives an 'AccessDenied' error when trying to start the pipeline. What is the problem?

A.The Allow statement does not specify the correct pipeline ARN.

B.The policy needs an additional Allow for 'codepipeline:GetPipeline' to start the pipeline.

C.The role does not have permission to pass the policy to the CI/CD system.

D.The Deny statement with the 'aws:SourceAccount' condition denies access if the condition key is not present in the request.

AnswerD

If the condition key is absent, the Deny applies, causing AccessDenied.

Why this answer

Option D is correct. The Deny statement uses StringNotEquals on 'aws:SourceAccount' to deny all CodePipeline actions unless the source account is 123456789012, while the Allow statement permits StartPipelineExecution on the specific pipeline. If 'aws:SourceAccount' were present and equal to 123456789012, the condition would be false, the Deny would not apply, and the Allow would take effect.

The request fails because 'aws:SourceAccount' is not present in the request context. The key is populated when an AWS service makes a request on behalf of a resource, not when a role calls the API directly. For a negated operator such as StringNotEquals, a missing key makes the condition evaluate to true, so the Deny applies, and an explicit Deny always overrides an Allow. This is a common pitfall with negated condition operators.

Option A is wrong because the Allow statement already targets the specific pipeline. Option B is wrong because StartPipelineExecution alone is sufficient to start the pipeline; 'codepipeline:GetPipeline' is not required. Option C is wrong because the policy is attached directly to the role, so no permission to pass it is involved.
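The missing-key behavior can be sketched with a toy model. The functions below are not the IAM evaluation engine; they only mimic how a negated operator such as StringNotEquals treats an absent key, and the request contexts are hypothetical.

```python
def string_not_equals(request_context, key, expected):
    """Toy model of StringNotEquals on a single-valued key."""
    if key not in request_context:
        # Negated operators evaluate to true when the key is absent.
        return True
    return request_context[key] != expected


def deny_applies(request_context):
    """Does the Deny statement's condition match this request?"""
    return string_not_equals(request_context, "aws:SourceAccount", "123456789012")


# A role calling the API directly: aws:SourceAccount is not in the context.
direct_call = {"aws:PrincipalAccount": "123456789012"}
# A service-to-service call where the key is populated.
service_call = {"aws:SourceAccount": "123456789012"}

print(deny_applies(direct_call))   # True: the Deny matches, so AccessDenied
print(deny_applies(service_call))  # False: the Deny is skipped, the Allow can apply
```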


605

MCQhard

A company is running a critical application on Amazon ECS with Fargate launch type. The application writes logs to Amazon CloudWatch Logs. The DevOps team needs to set up an alert when the application generates more than 100 error logs in any 5-minute window. Which configuration should be used?

A.Create a CloudWatch Logs Insights query that runs every 5 minutes and triggers an SNS notification

B.Create an Amazon EventBridge rule that matches CloudWatch Logs events for the word 'ERROR' and triggers an alarm

C.Create a CloudWatch Logs metric filter for 'ERROR' and a CloudWatch alarm on the resulting metric with a period of 5 minutes

D.Enable AWS CloudTrail logging for the ECS task and create a metric filter on CloudTrail logs

AnswerC

Metric filter extracts error count into a custom metric, and alarm triggers when threshold exceeded.

Why this answer

Option C is correct because a CloudWatch Logs metric filter can parse logs for the word 'ERROR' and create a custom metric, and a CloudWatch alarm can be set on that metric. Option A is wrong because CloudWatch Logs Insights is for querying, not real-time alerting. Option B is wrong because EventBridge can't directly parse log contents.

Option D is wrong because CloudTrail is for API calls.
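A minimal sketch of the two pieces of configuration follows, written as the keyword arguments one would pass to boto3's `put_metric_filter` (CloudWatch Logs client) and `put_metric_alarm` (CloudWatch client). The log group, metric names, namespace, and SNS topic ARN are hypothetical, and the calls themselves are left as comments so the block runs without AWS credentials.

```python
# Arguments for logs_client.put_metric_filter(**metric_filter)
metric_filter = {
    "logGroupName": "/ecs/example-app",  # hypothetical log group
    "filterName": "ErrorCount",
    "filterPattern": "ERROR",            # match log events containing ERROR
    "metricTransformations": [
        {
            "metricName": "AppErrorCount",
            "metricNamespace": "Example/App",
            "metricValue": "1",          # each matching event counts as 1
            "defaultValue": 0.0,
        }
    ],
}

# Arguments for cloudwatch_client.put_metric_alarm(**alarm)
alarm = {
    "AlarmName": "example-app-error-rate",
    "Namespace": "Example/App",
    "MetricName": "AppErrorCount",
    "Statistic": "Sum",                  # total errors in the period
    "Period": 300,                       # 5 minutes, in seconds
    "EvaluationPeriods": 1,
    "Threshold": 100,
    "ComparisonOperator": "GreaterThanThreshold",  # more than 100 errors
    "TreatMissingData": "notBreaching",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:example-alerts"],
}

# The alarm must read the same metric the filter publishes.
transformation = metric_filter["metricTransformations"][0]
same_metric = (
    transformation["metricName"] == alarm["MetricName"]
    and transformation["metricNamespace"] == alarm["Namespace"]
)
print(same_metric)
```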


606

MCQmedium

A company uses AWS CodeBuild to compile Java applications. The builds often fail due to insufficient memory. The buildspec currently specifies 'compute-type: BUILD_GENERAL1_SMALL'. What is the most cost-effective solution to resolve the memory issues without changing the build logic?

A.Change the compute-type to 'BUILD_GENERAL1_MEDIUM' or 'BUILD_GENERAL1_LARGE' in the buildspec.

B.Enable Amazon S3 caching for the build artifacts to reduce memory usage.

C.Split the build into multiple parallel CodeBuild actions in the pipeline, each compiling a subset of the code.

D.Set the environment variable 'MEMORY_OVERPROVISION=2' in the buildspec.

AnswerA

Larger compute types provide more memory.

Why this answer

Option A is correct because increasing the compute type to a larger size provides more memory. Option B is wrong because enabling caching does not increase memory. Option D is wrong because CodeBuild does not support memory overprovisioning.

Option C is wrong because using multiple build environments in parallel does not help a single build's memory needs.


607

MCQhard

A DevOps engineer receives the error shown in the exhibit when attempting to update an existing CloudFormation stack that deploys a VPC with subnets. The stack was created successfully earlier using the same template. What is the most likely cause of this error?

A.The subnet ID in the template is already used by another stack in the same account.

B.The IAM role used for the stack update lacks the 'ec2:DescribeSubnets' permission.

C.The subnet specified in the template does not exist in the selected AWS region.

D.The CloudFormation template has a syntax error in the subnet definition.

AnswerB

The error message explicitly states the user is not authorized to perform ec2:DescribeSubnets.

Why this answer

When updating a CloudFormation stack that deploys a VPC with subnets, the update operation must be able to read the current state of the subnet resources to determine if changes are needed. The IAM role used for the stack update must have the 'ec2:DescribeSubnets' permission to query the existing subnet configuration. Without this permission, CloudFormation cannot verify the subnet's current properties, leading to the error shown in the exhibit.

Exam trap

The trap here is that candidates often assume the error is due to a template syntax issue or resource conflict, but the real cause is insufficient IAM permissions for the update operation, which is a subtle but critical distinction in CloudFormation stack management.

How to eliminate wrong answers

Option A is wrong because subnet IDs are unique within an AWS account per region, and CloudFormation does not reuse subnet IDs across stacks; the error is not about ID conflicts. Option C is wrong because the stack was created successfully earlier using the same template, so the subnet does exist in the region; the error occurs during the update, not the initial creation. Option D is wrong because a syntax error in the template would have been caught during the initial stack creation, not during an update of an existing stack that was previously created successfully.


608

MCQeasy

A development team wants to automate infrastructure provisioning using AWS CloudFormation. Which tool is specifically designed to manage CloudFormation templates as part of a deployment pipeline?

A.AWS CodePipeline

B.AWS CodeCommit

C.AWS CodeBuild

D.AWS CodeDeploy

AnswerA

CodePipeline orchestrates build, test, and deploy actions including CloudFormation.

Why this answer

Option A is correct because AWS CodePipeline natively integrates with CloudFormation to deploy infrastructure. Option B is wrong because AWS CodeCommit is a source control service. Option C is wrong because AWS CodeBuild is a build service.

Option D is wrong because AWS CodeDeploy is for deploying applications, not infrastructure.


609

MCQhard

A DevOps team is using Amazon CloudWatch Logs to centralize logs from multiple EC2 instances running a custom application. The team notices that logs are missing from some instances intermittently. The CloudWatch agent configuration is identical across all instances. What is the MOST likely cause of the missing logs?

A.The CloudWatch Logs agent's state file has become corrupted due to disk full condition

B.The VPC Flow Logs are consuming all available network bandwidth

C.The EC2 instances are running out of CPU credits, causing the agent to skip log batches

D.The IAM role attached to the instances has been rotated incorrectly

AnswerA

If the agent's buffer disk is full, it stops sending logs until space is freed.

Why this answer

Option A is correct because the CloudWatch agent will stop sending logs if the disk space dedicated to the log buffer is full, which can happen if the destination log group or stream is throttled. Option B is wrong because VPC Flow Logs do not affect CloudWatch Logs. Option D is wrong because IAM roles are typically checked at startup, and intermittent issues are unlikely.

Option C is wrong because running low on CPU credits does not directly cause log loss.


610

MCQhard

A company runs a production e-commerce platform on AWS. The architecture includes an Application Load Balancer (ALB) that distributes traffic to a fleet of Amazon EC2 instances running in an Auto Scaling group across three Availability Zones (AZs). The application stores session state in Amazon ElastiCache for Redis (cluster mode disabled) with a single node. The database is an Amazon Aurora MySQL DB cluster with one writer and two reader instances in different AZs. The platform experiences intermittent slowdowns and occasional timeouts during peak traffic hours. The CloudWatch metrics show that the ALB's TargetResponseTime is elevated, and the Redis CPU utilization is consistently above 80% during these periods. The Auto Scaling group is scaling out, but new instances take several minutes to become healthy. The DevOps team has been asked to improve the resilience and performance of the application with minimal changes to the application code. Which solution should the team implement?

A.Replace the ALB with a Network Load Balancer (NLB) to reduce latency, and use an Auto Scaling group with a step scaling policy based on Redis CPU utilization.

B.Increase the instance size of the ElastiCache for Redis node and the size of the Aurora writer instance. Also, increase the cooldown period for the Auto Scaling group to allow new instances to warm up.

C.Implement Amazon RDS Proxy in front of the Aurora cluster to reduce database connection overhead, and increase the size of the Redis instance to handle more connections.

D.Migrate ElastiCache for Redis to a cluster mode enabled configuration with multiple shards and enable Multi-AZ with automatic failover. Also, use an ElastiCache replication group with read replicas in different AZs.

AnswerD

Cluster mode shards data, reducing per-node CPU. Multi-AZ and replicas improve resilience and reduce failover time.

Why this answer

Option D is correct because the primary bottleneck is the single-node Redis instance (CPU > 80%), which cannot scale reads or handle failover. Migrating to cluster mode enabled with multiple shards distributes the CPU load across shards, while Multi-AZ with automatic failover and read replicas in different AZs provides high availability and read scaling. This directly addresses the elevated ALB TargetResponseTime caused by Redis latency without requiring application code changes.

Exam trap

The trap here is that candidates focus on scaling the database or load balancer (options A, B, C) instead of recognizing that the single-node Redis cache is the bottleneck and requires horizontal scaling and high availability to resolve both performance and resilience issues.

How to eliminate wrong answers

Option A is wrong because replacing the ALB with an NLB does not reduce application-layer latency (NLB operates at Layer 4, not Layer 7, and cannot offload TLS or inspect HTTP sessions), and a step scaling policy based on Redis CPU utilization does not fix the single-node Redis bottleneck or the slow instance warm-up. Option B is wrong because increasing the instance size of the single Redis node and the Aurora writer instance only vertically scales the existing bottlenecks, and increasing the Auto Scaling group cooldown period would delay scaling further, worsening the timeouts. Option C is wrong because RDS Proxy reduces database connection overhead but does not address the Redis CPU bottleneck (the primary cause of elevated response times), and increasing the Redis instance size alone does not provide the read scaling or high availability needed.


611

MCQeasy

A DevOps engineer needs to grant cross-account access to an S3 bucket. The source account is 111111111111 and the target account is 222222222222. Which combination of a bucket policy and an IAM policy correctly grants the target account access?

A.Bucket policy: { "Effect": "Allow", "Principal": "222222222222", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*" } and IAM policy: same as A

B.Bucket policy: { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::222222222222:root" }, "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*" } and IAM policy: { "Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*" }

C.Only bucket policy as in A

D.Only IAM policy as in A

AnswerB

Both policies are needed and correctly specified.

Why this answer

Cross-account S3 access requires a bucket policy in the source account that grants permissions to the target account's root or a specific role, and an IAM policy in the target account that allows the user to access the bucket. Option B is correct. Option A uses an incorrect Principal syntax.

Option C is missing the IAM policy. Option D is missing the bucket policy.
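For reference, the two statements from Option B can be written out as full policy documents. The sketch below wraps them in the usual `Version` and `Statement` envelope, which the option text omits; the bucket name and account ID are the ones given in the question.

```python
import json

TARGET_ACCOUNT = "222222222222"
OBJECTS_ARN = "arn:aws:s3:::example-bucket/*"

# Bucket policy, attached to the bucket in the source account (111111111111).
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{TARGET_ACCOUNT}:root"},
            "Action": "s3:GetObject",
            "Resource": OBJECTS_ARN,
        }
    ],
}

# IAM policy, attached to a user or role in the target account.
# Identity-based policies have no Principal element.
iam_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": OBJECTS_ARN,
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
print(json.dumps(iam_policy, indent=2))
```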


612

MCQmedium

A DevOps engineer is setting up centralized logging for multiple AWS accounts. They need to collect VPC Flow Logs, CloudTrail logs, and application logs into a single Amazon S3 bucket. What is the most efficient approach?

A.Configure a Lambda function in each account to copy logs to a central S3 bucket.

B.Create an S3 bucket in each account and use S3 replication.

C.Use Amazon Kinesis Data Firehose to stream logs from all accounts to a central S3 bucket.

D.Use an S3 bucket in a centralized logging account with a bucket policy that grants write access from all other accounts.

AnswerD

This allows direct cross-account log delivery.

Why this answer

Option D is correct because using an S3 bucket policy in a centralized account to allow cross-account log delivery is a common pattern. Option B is wrong because creating a bucket in every account and replicating adds buckets and replication rules to manage when logs can be delivered directly to the central bucket. Option C is wrong because Kinesis Data Firehose is not needed for simple log aggregation.

Option A is wrong because Lambda functions would add complexity and cost.


613

Multi-Selecteasy

Which TWO actions should a DevOps engineer take to ensure that an AWS CodeBuild project can access a private Amazon S3 bucket to download build artifacts? (Choose two.)

Select 2 answers

A.Generate an access key and secret key for the CodeBuild project to use in the buildspec

B.Add a bucket policy that explicitly allows access from the CodeBuild project's IAM role ARN

C.Configure the CodeBuild project to run in a VPC with an S3 VPC endpoint

D.Attach an IAM role to the CodeBuild project with s3:GetObject permissions for the bucket

E.Make the S3 bucket publicly readable

AnswersB, D

Ensures the bucket allows access from the role.

Why this answer

Options B and D are correct. The CodeBuild project needs an IAM role with permissions to access the S3 bucket, and the S3 bucket policy must allow access from that role. Option C is wrong because VPC configuration is not required for S3 access.

Option E is wrong because the bucket does not need to be public. Option A is wrong because CodeBuild uses its service role for credentials rather than access keys placed in the buildspec.


614

Multi-Selecthard

A company runs a web application on Amazon EC2 instances behind an Application Load Balancer. The DevOps team has enabled detailed CloudWatch metrics for the ALB and is using CloudWatch Logs for the EC2 instances. Recently, users report intermittent 503 errors. The team notices that the ALB's 'RequestCount' metric shows a sudden drop during error periods, while the 'ActiveConnectionCount' remains steady. Which TWO steps should the team take to diagnose the issue? (Choose two.)

Select 2 answers

A.Enable and analyze the ALB access logs to see the HTTP response codes and target processing time.

B.Check the Amazon Route 53 health checks for the ALB DNS name.

C.Review the EC2 instances' CloudWatch metrics for CPU utilization and network traffic.

D.Observe the ALB's 'UnhealthyHostCount' metric and check target group health checks.

E.Inspect AWS CloudTrail logs for the ALB to see if there are any configuration changes.

AnswersA, D

Access logs provide detailed per-request information, including 503 responses and which target handled the request.

Why this answer

Option A is correct because ALB access logs contain detailed per-request data, including HTTP response codes (e.g., 503) and target processing time. Analyzing these logs will reveal whether the 503 errors are coming from the ALB itself (e.g., due to request queue overflow) or from the targets, and whether the sudden drop in RequestCount is due to clients aborting or the ALB throttling requests. Option D is correct because the 'UnhealthyHostCount' metric and the target group health checks show whether targets were unavailable during the error periods.

Exam trap

The trap here is that candidates often focus on EC2-level metrics (CPU, network) or CloudTrail config changes, missing that the ALB's own health check and access log data are the direct sources for diagnosing 503 errors tied to target unavailability or request queue limits.


615

MCQmedium

A DevOps engineer receives an alarm that an EC2 instance's StatusCheckFailed metric has been in ALARM state for 10 minutes. Which action should the engineer take first to investigate?

A.Review the instance's system log and application logs in CloudWatch Logs

B.Use AWS Config to check the instance's configuration compliance

C.Check AWS CloudTrail for any API calls that modified the instance

D.Restart the EC2 instance to clear the alarm

AnswerA

Correct. System logs help diagnose instance status check failures.

Why this answer

Option A is correct because StatusCheckFailed indicates an instance issue; reviewing system logs in CloudWatch Logs can reveal the cause. Option D (restart) might resolve the alarm temporarily but does not diagnose the problem. Option C (CloudTrail) logs API calls, not OS-level issues.

Option B (Config) is for configuration compliance.


616

MCQmedium

A security engineer runs the above CLI command to investigate IAM user 'Bob'. The output shows Bob logged in and then created a new IAM user. Which additional information should the engineer look for to determine if this was a security incident?

A.The event name for the user creation.

B.The time the new user was last modified.

C.The source IP address from the CloudTrail event details.

D.The IAM group memberships of the new user.

AnswerC

The source IP can help identify if the login came from an unusual location.

Why this answer

To determine if the activity is malicious, the engineer should check the source IP address from the CloudTrail event details. Option C is correct. Option A is wrong because the event name is already known.

Option B is wrong because CloudTrail does not log the timestamp of the resource's last modification. Option D is wrong because the IAM user's group membership is not directly relevant to the login event.
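As an illustration of that check, the sketch below groups one user's CloudTrail events by `sourceIPAddress`. The records are hypothetical and trimmed to a few standard CloudTrail fields; the IP addresses come from documentation ranges.

```python
import json
from collections import defaultdict

# Hypothetical CloudTrail records, trimmed to the fields used here.
SAMPLE_EVENTS = json.loads("""
[
  {"eventTime": "2024-01-01T10:00:00Z", "eventName": "ConsoleLogin",
   "sourceIPAddress": "203.0.113.10", "userIdentity": {"userName": "Bob"}},
  {"eventTime": "2024-01-01T10:05:00Z", "eventName": "CreateUser",
   "sourceIPAddress": "203.0.113.10", "userIdentity": {"userName": "Bob"}},
  {"eventTime": "2024-01-01T09:00:00Z", "eventName": "ConsoleLogin",
   "sourceIPAddress": "198.51.100.7", "userIdentity": {"userName": "Alice"}}
]
""")


def events_by_source_ip(events, user_name):
    """Map each source IP to the event names the given user generated from it."""
    grouped = defaultdict(list)
    for event in events:
        if event.get("userIdentity", {}).get("userName") == user_name:
            ip = event.get("sourceIPAddress", "unknown")
            grouped[ip].append(event["eventName"])
    return dict(grouped)


bob_activity = events_by_source_ip(SAMPLE_EVENTS, "Bob")
print(bob_activity)
```

An address outside the organization's known ranges for both the login and the user creation would be a signal worth escalating.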


617

Multi-Selecteasy

A company runs a critical application on EC2 instances in an Auto Scaling group. The application must be highly available across multiple Availability Zones. Which TWO configurations are necessary to achieve this? (Choose TWO.)

Select 2 answers

A.Use a single Availability Zone to reduce latency.

B.Use Spot Instances to reduce costs.

C.Use a Classic Load Balancer to distribute traffic.

D.Place an Application Load Balancer in front of the Auto Scaling group.

E.Configure the Auto Scaling group to launch instances in multiple Availability Zones.

AnswersD, E

ALB distributes traffic across AZs and instances.

Why this answer

Options D and E are correct. Option E: Distributing instances across multiple AZs ensures that if one AZ fails, the application continues in other AZs. Option D: An Application Load Balancer (ALB) distributes traffic across instances in multiple AZs.

Option A is wrong because a single AZ defeats high availability. Option C is wrong because a Classic Load Balancer is a previous-generation load balancer; an ALB is the recommended choice. Option B is wrong because Spot Instances can be interrupted, reducing availability.


618

MCQhard

An organization uses AWS Elastic Beanstalk for application deployments. They want to implement immutable updates to minimize downtime and ensure that if the new environment fails health checks, the old environment remains intact. Which deployment policy should they choose?

A.Traffic splitting.

B.Immutable update.

C.All at once.

D.Rolling update based on health.

AnswerB

Immutable updates create a completely new environment and only swap when healthy.

Why this answer

Immutable updates in AWS Elastic Beanstalk launch a full set of new instances running the new application version in a temporary Auto Scaling group, alongside the existing instances. If the new instances fail health checks, Elastic Beanstalk terminates them, leaving the original instances untouched. This provides a safe rollback with minimal downtime, which matches the requirement to keep the old environment intact if health checks fail.

Exam trap

The trap here is that candidates confuse 'immutable update' with 'traffic splitting' because both involve a new environment, but traffic splitting does not automatically terminate the new environment on health check failure—it requires manual intervention or additional automation to roll back.

How to eliminate wrong answers

Option A is wrong because traffic splitting gradually shifts a percentage of traffic to a new environment, but if health checks fail, the old environment is not guaranteed to remain intact—the new environment may still be partially serving traffic and the rollback is not fully automated. Option C is wrong because all-at-once deploys replace all instances simultaneously, causing downtime and leaving no fallback environment if health checks fail. Option D is wrong because rolling update based on health replaces instances in batches and can terminate unhealthy instances in the old environment, potentially disrupting the original environment before the new one is fully verified.


619

MCQmedium

A company uses AWS CloudTrail to log API activity across multiple accounts. The security team needs to ensure that all CloudTrail logs are delivered to a centralized S3 bucket in the audit account, and that any log file validation failures trigger an immediate notification. What should the engineer do to meet this requirement?

A.Enable CloudTrail log file validation and create a CloudWatch alarm on the DigestDeliveryFailed metric

B.Create a Lambda function that checks the integrity of logs and publishes to SNS

C.Configure CloudTrail to deliver logs to the S3 bucket and enable SNS notifications for all events

D.Send CloudTrail logs to CloudWatch Logs and create a metric filter for validation errors

AnswerA

Log file validation detects tampering; DigestDeliveryFailed metric triggers alarm.

Why this answer

Option A is correct because enabling log file validation and creating a CloudWatch alarm on the DigestDeliveryFailed metric will trigger notifications. Option C is wrong because CloudTrail's SNS notifications announce log file delivery, not validation failures. Option D is wrong because it adds a CloudWatch Logs pipeline and metric filter that this alert does not need.

Option B is wrong because Lambda is not needed for this simple alert.


620

MCQmedium

A company has a multi-account AWS environment using AWS Organizations. They want to centrally manage user access to all accounts using single sign-on (SSO) and enforce multi-factor authentication (MFA). Which service should they use?

A.Use AWS Secrets Manager to store and rotate IAM user credentials.

B.Create IAM users in each account and share the credentials securely.

C.Use Amazon Cognito user pools with an identity broker.

D.Use AWS IAM Identity Center (AWS SSO) to manage access and enforce MFA.

AnswerD

IAM Identity Center provides centralized SSO and MFA enforcement across multiple AWS accounts.

Why this answer

AWS IAM Identity Center (formerly AWS SSO) is the correct service because it provides a centralized place to manage user access and permissions across all AWS accounts in an AWS Organization. It natively supports enforcing multi-factor authentication (MFA) through an identity source (e.g., the built-in identity store or an external IdP) and integrates directly with AWS Organizations to grant single sign-on access without needing to create IAM users in each account.

Exam trap

The trap here is that candidates often confuse Amazon Cognito (a customer identity service) with workforce identity management, or assume that storing credentials in Secrets Manager or creating per-account IAM users is a viable centralized solution, when in fact AWS IAM Identity Center is the only service designed for multi-account SSO with MFA enforcement in an AWS Organizations context.

How to eliminate wrong answers

Option A is wrong because AWS Secrets Manager is designed to securely store and rotate secrets (like database credentials or API keys), not to manage user identities or enforce MFA for SSO access. Option B is wrong because creating IAM users in each account and sharing credentials manually violates the principle of least privilege, creates a massive administrative overhead, and does not provide centralized SSO or consistent MFA enforcement across accounts. Option C is wrong because Amazon Cognito user pools are intended for customer-facing identity and access management for web and mobile applications, not for managing workforce access to AWS accounts via SSO with MFA enforcement across an AWS Organization.


621

MCQmedium

A DevOps team needs to monitor failed API calls in their AWS account. They want to receive notifications when specific IAM actions, such as DeleteBucket, fail. Which service should they use?

A.AWS CloudTrail and Amazon EventBridge.

B.AWS Config rules.

C.Amazon S3 server access logs.

D.CloudWatch Logs and metric filters.

AnswerA

CloudTrail logs API calls, EventBridge can filter and route to SNS.

Why this answer

Option A is correct because CloudTrail records API calls, including failed ones, and Amazon EventBridge (formerly CloudWatch Events) rules can match specific calls and route them to a notification target such as Amazon SNS. Option B is wrong because AWS Config monitors resource configuration, not API calls.

Option C is wrong because S3 server access logs only record requests made to a bucket and do not send notifications. Option D is wrong because CloudWatch Logs metric filters do not capture API calls on their own; CloudTrail would first have to deliver the events to a log group.
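As an illustration of the CloudTrail plus EventBridge approach, the sketch below shows an event pattern that matches DeleteBucket calls carrying an `errorCode` field, which CloudTrail adds when a call fails. The `matches` helper is a simplified local checker written for this example (it handles only literal values and `exists`), not the EventBridge matching engine, and both sample events are hypothetical and heavily trimmed.

```python
import json

# Event pattern for an EventBridge rule: match DeleteBucket calls recorded by
# CloudTrail that include an errorCode field (present only when the call failed).
FAILED_DELETE_BUCKET_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["DeleteBucket"],
        "errorCode": [{"exists": True}],
    },
}


def matches(pattern, event):
    """Simplified local check: supports literal values and {'exists': bool} only."""
    for key, rule in pattern.items():
        if isinstance(rule, dict):
            # Nested pattern: the event must contain a matching nested object.
            if not isinstance(event.get(key), dict) or not matches(rule, event[key]):
                return False
            continue
        ok = False
        for candidate in rule:
            if isinstance(candidate, dict) and "exists" in candidate:
                ok = ok or ((key in event) == candidate["exists"])
            else:
                ok = ok or (event.get(key) == candidate)
        if not ok:
            return False
    return True


# Hypothetical, trimmed CloudTrail events used only to exercise the pattern.
failed_call = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "s3.amazonaws.com",
        "eventName": "DeleteBucket",
        "errorCode": "AccessDenied",
    },
}
successful_call = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "s3.amazonaws.com", "eventName": "DeleteBucket"},
}

print(json.dumps(FAILED_DELETE_BUCKET_PATTERN, indent=2))
print(matches(FAILED_DELETE_BUCKET_PATTERN, failed_call))
print(matches(FAILED_DELETE_BUCKET_PATTERN, successful_call))
```

In a real setup the pattern would be attached to an EventBridge rule whose target is an SNS topic; the local checker is only there to make the matching logic visible.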

Full explanation →

622

MCQeasy

A DevOps engineer is troubleshooting an Auto Scaling group (ASG) that is not launching instances as expected. The ASG is configured with a launch template that uses an Amazon Linux 2 AMI. The engineer checks the EC2 Auto Scaling console and sees that the group's desired capacity is set to 2, but only 1 instance is running. The last scaling activity shows 'Failed to launch instance. Error: Your quota allows for 0 more running instance(s).' What is the most likely cause?

A.The launch template has insufficient IAM permissions to create instances.

B.The account has reached the EC2 instance limit for the selected instance type in the region.

C.The VPC subnet does not have enough available IP addresses.

D.There is an instance that is not passing health checks, preventing new instances.

AnswerB

B is correct because the error explicitly states quota for running instances is exceeded.

Why this answer

The error message 'Your quota allows for 0 more running instance(s)' directly indicates that the AWS account has reached its EC2 instance limit for the specific instance type in the region. Auto Scaling groups cannot launch instances beyond the service quota, regardless of the desired capacity. This commonly happens when the account's default On-Demand quota has been exhausted; the fix is to request a quota increase through Service Quotas.

Exam trap

The trap here is that candidates confuse IAM permissions with service quotas, or assume a VPC subnet IP shortage is the cause, when the explicit quota error message is the definitive clue.

How to eliminate wrong answers

Option A is wrong because IAM permissions affect the ability to call EC2 APIs (e.g., RunInstances), but the error message explicitly cites a quota issue, not an authorization failure (which would return 'UnauthorizedOperation' or 'AccessDenied'). Option C is wrong because insufficient IP addresses in the subnet would produce an error like 'InsufficientFreeAddressesInSubnet' or 'NoFreeAddressesInSubnet', not a quota limit message. Option D is wrong because an instance failing health checks does not prevent new instances from launching; the ASG would still attempt to launch replacements, and the error would relate to health check failures, not a quota limit.

Full explanation →

623

MCQmedium

A company uses AWS Elastic Beanstalk to deploy a web application. The environment is running behind an Application Load Balancer. The DevOps team notices that during deployments, the new application version fails health checks and the deployment rolls back. The team wants to reduce deployment time while maintaining safety. Which configuration change should the engineer recommend?

A.Increase the number of EC2 instances in the environment.

B.Use the all-at-once deployment policy.

C.Change the deployment policy to immutable.

D.Increase the rolling update batch size to 100%.

AnswerC

Immutable deployments launch a new fleet, test health, and then swap, providing safety and speed.

Why this answer

Option C is correct because immutable deployments create a new Auto Scaling group and only shift traffic after health checks pass, which is safe and faster than rolling with batch size 1. Option D is wrong because a rolling batch size of 100% updates every instance at once, leaving no healthy capacity if the new version fails. Option B is wrong because all-at-once is fastest but unsafe.

Option A is wrong because adding more instances to the environment does not change the deployment strategy.

Full explanation →

624

MCQhard

A team manages a large fleet of EC2 instances using AWS Systems Manager. They want to enforce a consistent configuration across all instances, including installed software packages, firewall rules, and user accounts. The team also needs to audit configuration changes and remediate drift automatically. Which AWS service should the team use?

A.AWS OpsWorks for Chef Automate

B.AWS Systems Manager State Manager

C.AWS Systems Manager Run Command

D.AWS Config

AnswerB

State Manager can define a desired state configuration and automatically apply it to instances.

Why this answer

AWS Systems Manager State Manager is the correct choice because it is designed to enforce a consistent configuration across EC2 instances by defining and applying desired state configurations (DSCs). It can manage software packages, firewall rules, and user accounts, and it automatically remediates drift by re-applying the desired state on a schedule. This directly meets the requirement for configuration enforcement, auditing, and automated drift remediation.

Exam trap

The trap here is confusing AWS Config (which only audits and detects drift) with State Manager (which enforces and remediates drift), leading candidates to choose Config because they focus on the auditing requirement without realizing it lacks enforcement capabilities.

How to eliminate wrong answers

Option A is wrong because AWS OpsWorks for Chef Automate is a configuration management service that uses Chef cookbooks, but it requires managing a Chef server and does not natively integrate with Systems Manager for drift remediation or auditing without additional setup. Option C is wrong because AWS Systems Manager Run Command is designed for ad-hoc, one-time command execution across instances, not for enforcing ongoing desired state configurations or automatically remediating drift. Option D is wrong because AWS Config is a service for auditing resource configurations and tracking changes, but it does not enforce configurations or remediate drift; it only detects non-compliance and can trigger remediation actions via other services like Systems Manager Automation.

Full explanation →

625

MCQmedium

A company is using Amazon CloudWatch Logs Insights to analyze application logs. The DevOps team needs to create a metric filter that counts occurrences of the word 'ERROR' in the log events. Which CloudWatch Logs Insights query should be used to test the metric filter?

A.fields @timestamp, @message | stats count() by bin(5m)

B.fields @timestamp, @message | filter @message like /ERROR/

C.fields @timestamp, @message | parse @message '[*] *' as @severity, @log

D.fields @timestamp, @message | sort @timestamp desc

AnswerB

This query filters events containing 'ERROR', useful for testing the metric filter.

Why this answer

Option B is correct because the `fields @timestamp, @message` command retrieves the relevant fields, and `filter @message like /ERROR/` filters events containing 'ERROR'. Option A is wrong because `stats count() by bin(5m)` aggregates but doesn't show individual matches. Option C is wrong because `parse @message` extracts fields but doesn't filter.

Option D is wrong because `sort @timestamp desc` only sorts.

Full explanation →

626

MCQmedium

A company uses Amazon CloudFront to distribute content globally. Users in some regions report slow load times. The DevOps team wants to identify the geographic regions where performance is worst. Which tool should they use?

A.Amazon CloudWatch Metrics for CloudFront

B.CloudFront access logs in S3

C.Amazon Route 53 latency records

D.CloudFront reports in the AWS Management Console

AnswerD

CloudFront reports provide geographic performance data.

Why this answer

Option D is correct because CloudFront reports show metrics like total requests, error rates, and latency by geographic region. Option A is wrong because CloudWatch metrics are per distribution, not per region. Option B is wrong because access logs contain raw request records and do not aggregate performance. Option C is wrong because Route 53 latency records are a DNS routing feature, not a performance analysis tool.

Full explanation →

627

MCQhard

A company runs a high-traffic web application on a fleet of EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. The application uses an Amazon RDS for PostgreSQL database. Recently, during a traffic spike, the application became unresponsive. Investigation revealed that the database CPU utilization reached 100%, causing queries to timeout. The Auto Scaling group added more EC2 instances, which only increased the load on the database. The DevOps team needs to implement a solution that prevents the database from being overwhelmed during traffic spikes while maintaining application availability. The solution must be cost-effective and require minimal changes to the application code. Which solution should the DevOps team implement?

A.Implement read replicas for the RDS database and modify the application to use read replicas for read queries.

B.Increase the instance size of the RDS database to a larger instance type to handle more connections.

C.Use Amazon RDS Proxy between the application and the database to pool and reuse connections.

D.Configure Auto Scaling to launch EC2 instances based on a custom metric that tracks database CPU utilization, and throttle the number of instances.

AnswerC

RDS Proxy reduces the number of database connections, lowering CPU usage and improving scalability.

Why this answer

RDS Proxy manages database connections efficiently, reducing the number of connections and CPU overhead. It also provides connection pooling, which helps handle spikes without overwhelming the database.

Full explanation →

628

MCQmedium

A company uses Amazon CloudFront to serve static content from an S3 bucket. Users in a specific region report slow load times. The DevOps team checks CloudFront metrics and sees a high error rate (5xx) for that region. The S3 bucket is healthy. What is the most likely cause?

A.The AWS WAF web ACL is blocking requests from that region.

B.The S3 bucket policy does not grant access to the CloudFront origin access identity (OAI).

C.The CloudFront distribution's default TTL is too short.

D.The CloudFront origin shield is misconfigured.

AnswerB

Misconfigured OAI causes 403 errors that may appear as 5xx if not handled properly.

Why this answer

A high 5xx error rate from CloudFront combined with a healthy S3 bucket strongly indicates that CloudFront cannot fetch objects from the origin. If the S3 bucket policy does not grant access to the CloudFront origin access identity (OAI), CloudFront receives an access denied (403) response from S3, which CloudFront translates into a 5xx error for the user. This is the most common cause of regional 5xx errors when the origin is otherwise healthy.

Exam trap

The trap here is that candidates often assume 5xx errors always indicate an origin server problem, but in CloudFront, a 5xx can also result from an authentication failure (403) at the origin, which CloudFront converts to a 5xx for the viewer.

How to eliminate wrong answers

Option A is wrong because AWS WAF web ACLs block requests at the edge with a 403 Forbidden response, not a 5xx error, and the question states the error rate is 5xx. Option C is wrong because a short default TTL would cause more frequent origin fetches and potentially increase latency, but it would not cause 5xx errors; the origin is healthy and would still return valid content. Option D is wrong because a misconfigured Origin Shield could increase latency or cause connectivity issues, but it would not produce a high 5xx error rate if the underlying S3 bucket is healthy and accessible; Origin Shield is an additional caching layer, not an authentication mechanism.

Full explanation →

629

Multi-Selecteasy

A company is designing a CI/CD pipeline using AWS CodePipeline, CodeBuild, and CodeDeploy. They need to ensure that the pipeline can deploy to multiple environments (dev, test, prod) with manual approval gates. Which TWO actions should they take? (Choose TWO.)

Select 2 answers

A.Create separate stages in the pipeline for dev, test, and prod

B.Configure a single pipeline with multiple branches in the source stage

C.Use CodeDeploy deployment groups to represent each environment

D.Add a manual approval stage before each environment deployment

E.Use CodeBuild batch builds to manage environment promotion

AnswersA, D

Stages allow you to sequence deployments across environments.

Why this answer

To implement manual approval gates, CodePipeline supports approval actions. Each environment can be a separate stage. Options A and D are correct.

Option B is incorrect because multiple branches in a single source stage do not model separate environments; you would use separate stages or separate pipelines. Option C is incorrect because CodeDeploy deployment groups can represent different environments, but the approval gate lives in CodePipeline. Option E is incorrect because CodeBuild is for building, not for approvals.

Full explanation →

630

MCQmedium

A company uses AWS CodeCommit to store infrastructure as code templates. The DevOps team has set up an AWS CodePipeline that automatically deploys a CloudFormation stack when changes are pushed to the main branch. The pipeline includes a deployment action that uses the CloudFormation create/update stack action. Recently, a developer pushed a change that caused the CloudFormation stack update to fail because the change would have deleted a critical resource. The pipeline did not catch this issue, and the stack update failed midway, leaving the stack in a partially updated state. The team wants to implement a safety mechanism to prevent such issues in the future. Which solution should they implement?

A.Add a CloudFormation change set action in the pipeline with a manual approval step to review the changes before executing the stack update.

B.Create a stack policy that denies deletion of critical resources, and include the policy in the CloudFormation template.

C.Add a manual approval step before the deployment action to review the code change.

D.Add a 'Test' stage in the pipeline that deploys the stack to a test environment first.

AnswerA

Change sets show what will be changed, allowing review before update.

Why this answer

Option A is correct. Creating a change set and adding a manual approval step before it is executed lets the team review the exact resource changes, including deletions, before they are applied. Option C is wrong because a manual approval that only reviews the code change does not show which resources CloudFormation would add, modify, or remove.

Option B is wrong because stack policies protect resources but do not prevent the update from starting; they can block the deletion, but the update still runs and fails. Option D is wrong because the 'Test' stage is for testing, not for reviewing changes to the production stack.
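To make the review step concrete, here is a minimal sketch of how a reviewer (or an automated gate) might scan a change set for destructive changes. It assumes input shaped like the response of CloudFormation's DescribeChangeSet API (`Changes`, `ResourceChange`, `Action`, `Replacement`); the `risky_changes` helper and the sample resource names are hypothetical.

```python
def risky_changes(change_set):
    """Return (logical_id, reason) pairs for changes that remove or replace resources."""
    findings = []
    for change in change_set.get("Changes", []):
        resource = change.get("ResourceChange", {})
        action = resource.get("Action")
        if action == "Remove":
            findings.append((resource.get("LogicalResourceId"), "Remove"))
        elif action == "Modify" and resource.get("Replacement") == "True":
            # Replacement deletes the old physical resource after creating a new one.
            findings.append((resource.get("LogicalResourceId"), "Replacement"))
    return findings


# Hypothetical change set, trimmed to the fields the helper reads.
sample_change_set = {
    "Changes": [
        {"Type": "Resource", "ResourceChange": {
            "Action": "Remove", "LogicalResourceId": "OrdersTable",
            "ResourceType": "AWS::DynamoDB::Table"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "WebServerRole",
            "ResourceType": "AWS::IAM::Role", "Replacement": "False"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "AppDatabase",
            "ResourceType": "AWS::RDS::DBInstance", "Replacement": "True"}},
    ]
}

for logical_id, reason in risky_changes(sample_change_set):
    print(f"Review required: {logical_id} ({reason})")
```

A check like this complements the manual approval rather than replacing it: the approver still decides whether a flagged removal is intended.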

Full explanation →

631

MCQhard

A DevOps engineer is designing a CI/CD pipeline for a microservices architecture. The pipeline must deploy to Amazon ECS using blue/green deployments. The team wants to automatically roll back if the new deployment fails health checks. Which combination of AWS services and configurations should the engineer use?

A.Use AWS CodeDeploy with an ECS compute platform, configure a CloudWatch alarm for health checks, and enable automatic rollback.

B.Use AWS CloudFormation with a custom resource to perform blue/green deployment.

C.Use AWS Elastic Beanstalk with blue/green environment swapping.

D.Use AWS CodePipeline with ECS deployment action and manual approval for rollback.

AnswerA

CodeDeploy supports blue/green deployments for ECS and can automatically roll back based on CloudWatch alarms.

Why this answer

Option A is correct because CodeDeploy with the ECS compute platform supports blue/green deployments and automatic rollback when a CloudWatch alarm fires. Option D is wrong because a CodePipeline ECS deployment action with manual approval does not roll back automatically. Option B is wrong because a custom resource would require the team to build and maintain the blue/green and rollback logic themselves.

Option C is wrong because Elastic Beanstalk manages its own environments and does not deploy to ECS services.

Full explanation →

632

MCQhard

A DevOps team is deploying a multi-tier application on AWS. The application must comply with PCI DSS. Which combination of services should be used to encrypt data in transit between the web tier and the application tier?

A.AWS Certificate Manager (ACM) and Application Load Balancer (ALB)

B.AWS CloudHSM and Classic Load Balancer

C.AWS KMS and VPC Peering

D.AWS WAF and Amazon CloudFront

AnswerA

ACM provides TLS certificates that can be used with ALB to encrypt traffic in transit between layers.

Why this answer

For encryption in transit, using TLS certificates managed by ACM and enforced by an Application Load Balancer is the standard approach. CloudHSM or KMS are for key storage but not directly for in-transit encryption. VPC Peering does not provide encryption.

Full explanation →

633

MCQeasy

A company wants to be alerted when the root user signs in to the AWS Management Console. Which service should be used to create a monitoring rule for this event?

A.AWS Config

B.Amazon S3

C.AWS IAM

D.Amazon CloudWatch Events (now Amazon EventBridge) with a rule for the 'RootSignIn' event from CloudTrail

AnswerD

CloudWatch Events can match the root sign-in event and send notifications.

Why this answer

Amazon CloudWatch Events (now Amazon EventBridge) can monitor events recorded by AWS CloudTrail and trigger actions based on specific events. The 'RootSignIn' event in the option refers to the console sign-in event that CloudTrail records when the root user signs in to the AWS Management Console. By creating a rule that matches this event, you can send notifications (e.g., via SNS) or invoke automated responses, meeting the alerting requirement.

Exam trap

The trap here is that candidates may confuse AWS Config (which can detect configuration changes but not real-time API call events) with CloudWatch Events, or mistakenly think IAM can generate alerts, when in fact only CloudWatch Events can create rules based on CloudTrail events like 'RootSignIn'.

How to eliminate wrong answers

Option A is wrong because AWS Config is used for resource inventory, configuration history, and compliance auditing, not for real-time event monitoring or alerting on specific API calls like root user sign-ins. Option B is wrong because Amazon S3 is an object storage service and cannot natively create monitoring rules or trigger alerts based on CloudTrail events; it can only store logs or serve as a target for notifications. Option C is wrong because AWS IAM manages users, roles, and permissions but does not provide event-driven monitoring or alerting capabilities; it cannot create rules to detect and respond to root user sign-ins.

Full explanation →

634

Multi-Selectmedium

A company uses AWS CloudTrail to log API calls in a multi-account environment. The security team wants to be alerted immediately when an IAM user or role performs a specific sensitive action (e.g., DeleteTrail, DeleteDBInstance). Which TWO services can be used together to achieve near real-time alerting? (Choose TWO.)

Select 2 answers

A.CloudWatch Logs metric filters and alarms

B.CloudTrail with CloudWatch Logs integration

C.CloudTrail with Amazon S3 event notifications

D.Amazon Athena and CloudWatch dashboards

E.AWS Config and AWS Lambda

AnswersA, B

Metric filters can trigger alarms.

Why this answer

Option B is correct because CloudTrail can deliver events to CloudWatch Logs. Option A is correct because a CloudWatch Logs metric filter can match those events and drive an alarm. Option C is wrong because S3 delivery is not near real-time.

Option D is wrong because Athena is for ad hoc analysis, not alerting. Option E is wrong because Config is for resource compliance.
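The metric filter and alarm pairing could look roughly like the sketch below. It only builds the request parameters that would be passed to the CloudWatch Logs `put_metric_filter` and CloudWatch `put_metric_alarm` operations and makes no AWS calls. The log group name, metric namespace, alarm name, and SNS topic ARN are placeholders invented for this example, and `build_filter_pattern` is a hypothetical helper.

```python
def build_filter_pattern(event_names):
    """Build a JSON-style metric filter pattern matching any of the given eventName values."""
    clauses = ['($.eventName = "{}")'.format(name) for name in event_names]
    return "{ " + " || ".join(clauses) + " }"


SENSITIVE_ACTIONS = ["DeleteTrail", "DeleteDBInstance"]

# Parameters for logs.put_metric_filter (all names are placeholders).
metric_filter_params = {
    "logGroupName": "example-cloudtrail-log-group",
    "filterName": "SensitiveApiCalls",
    "filterPattern": build_filter_pattern(SENSITIVE_ACTIONS),
    "metricTransformations": [
        {
            "metricName": "SensitiveApiCallCount",
            "metricNamespace": "Example/Security",
            "metricValue": "1",
        }
    ],
}

# Parameters for cloudwatch.put_metric_alarm: alarm as soon as one match is counted.
alarm_params = {
    "AlarmName": "example-sensitive-api-call",
    "Namespace": "Example/Security",
    "MetricName": "SensitiveApiCallCount",
    "Statistic": "Sum",
    "Period": 60,
    "EvaluationPeriods": 1,
    "Threshold": 1,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "TreatMissingData": "notBreaching",
    "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:example-topic"],
}

print(metric_filter_params["filterPattern"])
```

With boto3 these dictionaries would be unpacked into the respective client calls, for example `logs.put_metric_filter(**metric_filter_params)`.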

Full explanation →

635

MCQhard

A DevOps engineer executed the CLI command shown in the exhibit. After creation, the security team requires that the log files be encrypted with a KMS key that is rotated every 90 days. The current key is a customer managed key with automatic rotation enabled set to 365 days. What should the engineer do to meet the requirement?

A.Use the existing key and change the rotation period in KMS

B.Disable automatic rotation and manually rotate the key every 90 days

C.Modify the KMS key to set the rotation period to 90 days

D.Create a new KMS key with automatic rotation set to 90 days and update the trail with the new key

AnswerD

Create a new key with the desired rotation and update the trail's KMS key.

Why this answer

Option D is correct. The rotation period is a property of the KMS key, not of the trail, and the question treats the existing key's 365-day rotation period as fixed. The engineer should therefore create a new customer managed key with automatic rotation set to 90 days and update the trail to use it, specifying the new key with the --kms-key-id parameter.

Options A and C are wrong because, under the question's premise, the rotation period of the existing key cannot be modified; a new key is required.

Option B is wrong because disabling automatic rotation replaces an automated control with a manual process, which is the opposite of what is needed.
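The key swap can be sketched as three API calls: create the key, enable rotation with a 90-day period, and point the trail at the new key. The function below takes the KMS and CloudTrail clients as arguments so it can be exercised without AWS access; `RecordingClient` is a stand-in written for this example, not part of boto3, and the trail name and key ID are placeholders. It also assumes the new key's policy already allows CloudTrail to use it, which is not shown here.

```python
class RecordingClient:
    """Stand-in for a boto3 client: records calls instead of contacting AWS."""

    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, operation):
        # Any unknown attribute is treated as an API operation name.
        def call(**kwargs):
            self.calls.append((operation, kwargs))
            return self._responses.get(operation, {})
        return call


def move_trail_to_new_key(kms, cloudtrail, trail_name, rotation_days=90):
    """Create a key with the desired rotation period and attach it to the trail."""
    key_id = kms.create_key(
        Description="CloudTrail log encryption (example)"
    )["KeyMetadata"]["KeyId"]
    # Rotation is configured on the key itself, not on the trail.
    kms.enable_key_rotation(KeyId=key_id, RotationPeriodInDays=rotation_days)
    # Equivalent to `aws cloudtrail update-trail --kms-key-id ...`.
    cloudtrail.update_trail(Name=trail_name, KmsKeyId=key_id)
    return key_id


# Dry run with recording stand-ins; with real access these would be
# boto3.client("kms") and boto3.client("cloudtrail").
kms = RecordingClient({"create_key": {"KeyMetadata": {"KeyId": "example-key-id"}}})
cloudtrail = RecordingClient()
new_key_id = move_trail_to_new_key(kms, cloudtrail, "example-trail")
print(new_key_id)
print(kms.calls)
print(cloudtrail.calls)
```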

Full explanation →

636

MCQhard

A company runs a stateful web application on EC2 instances behind a Network Load Balancer (NLB) in a single Availability Zone. The application stores session state locally on the instance. The company wants to achieve high availability across multiple AZs with minimal application changes. What should the DevOps engineer do?

A.Add more AZs and configure the NLB with cross-zone load balancing.

B.Replace the NLB with an ALB and use ElastiCache for session storage.

C.Use a Multi-AZ RDS instance to store session state.

D.Replace the NLB with an ALB and enable sticky sessions (session affinity) using the ALB's cookie.

AnswerD

Sticky sessions ensure that requests from the same client are routed to the same instance, preserving local session state.

Why this answer

Option D is correct because replacing the NLB with an ALB and enabling sticky sessions (session affinity) using the ALB's cookie allows the stateful web application to maintain session state across multiple AZs without modifying the application code. The ALB generates a cookie (AWSALB) that binds a client's session to a specific target instance, ensuring subsequent requests from the same client are routed to the same EC2 instance. This achieves high availability across AZs with minimal changes, as the application continues to store session state locally on the instance.

Exam trap

The trap here is that candidates often assume cross-zone load balancing or adding more AZs inherently solves high availability for stateful applications, but they overlook that session affinity is required to keep a client's requests directed to the same instance when session state is stored locally.

How to eliminate wrong answers

Option A is wrong because adding more AZs and configuring cross-zone load balancing with an NLB does not solve the session state problem; the NLB distributes traffic across instances without session affinity, so a client's requests may be routed to different instances in different AZs, breaking the locally stored session. Option B is wrong because replacing the NLB with an ALB and using ElastiCache for session storage requires application code changes to read/write session data to ElastiCache, which contradicts the requirement for minimal application changes. Option C is wrong because using a Multi-AZ RDS instance for session storage also requires significant application code changes to store and retrieve session data from the database, and it introduces unnecessary complexity and latency for session management.

Full explanation →

637

Multi-Selecthard

A company's application uses Amazon DynamoDB as its primary data store. The application experiences occasional throttling errors during traffic spikes. The DevOps team needs to implement a solution that ensures consistent performance without manual intervention. Which TWO actions should the team take? (Choose TWO.)

Select 2 answers

A.Use eventually consistent reads for all queries.

B.Move the data to Amazon RDS with read replicas.

C.Implement DynamoDB Accelerator (DAX) to cache read requests.

D.Enable DynamoDB Auto Scaling for read and write capacity.

E.Switch DynamoDB to On-Demand capacity mode.

AnswersC, D

DAX reduces read load on DynamoDB, mitigating throttling for read-heavy workloads.

Why this answer

DynamoDB Accelerator (DAX) is an in-memory cache that reduces read response times from milliseconds to microseconds, offloading read-heavy workloads from the DynamoDB table. By caching frequently accessed items, DAX absorbs traffic spikes and reduces the likelihood of throttling on read requests, ensuring consistent performance without manual intervention.

Exam trap

The trap here is that candidates may think On-Demand capacity mode (Option E) is the only way to handle spikes without manual intervention, but it ignores the cost implications and the fact that DAX plus Auto Scaling provides a more balanced and cost-effective solution for read-heavy workloads.

Full explanation →

638

MCQhard

A CloudFormation stack creation failed. The engineer runs the describe-stack-events command and sees the output above. What is the root cause of the failure?

A.The stack is rolling back because a network resource failed to create.

B.The stack lacks permissions to launch EC2 instances.

C.The EC2 instance failed because the instance type 't2.micro' is not supported.

D.The EC2 instance failed to create because the AMI ID is invalid or does not exist in the region.

AnswerD

The status reason explicitly states the AMI ID is invalid.

Why this answer

The event for the EC2 instance shows 'CREATE_FAILED' with a status reason indicating the AMI ID is invalid. The error says 'Expected: 'ami-0abcdef1234567890'' which suggests a typo or wrong AMI ID. Option D is correct.

Option A is incorrect because the stack is in ROLLBACK_IN_PROGRESS due to the EC2 failure, not a network resource. Option C is incorrect because the error is about the AMI, not the instance type. Option B is incorrect because the error is not about permissions.

Full explanation →

639

MCQmedium

An organization uses OpsWorks to manage application stacks. They notice that custom cookbooks are not being executed during the lifecycle events. What is the most likely cause?

A.The layer's IAM role does not have permissions to execute the cookbook

B.The custom cookbook repository URL is misconfigured or inaccessible

C.The cookbook is not configured with CodeDeploy

D.The cookbook uses a Chef version that is not supported by OpsWorks

AnswerB

If OpsWorks cannot fetch the cookbook from the repository, it will not execute the recipes.

Why this answer

Custom cookbooks must be stored in a repository (S3, Git, etc.) and the stack must be configured to use that repository. Option B is correct. Option D is incorrect because Chef version compatibility is usually not the cause of recipes not executing at all.

Option A is incorrect because IAM roles govern AWS API calls, not cookbook execution. Option C is incorrect because OpsWorks does not use CodeDeploy for cookbook execution.

Full explanation →

640

MCQeasy

A company wants to securely store database credentials used by an application running on Amazon EC2. The credentials should be automatically rotated every 90 days. Which AWS service should be used?

A.AWS IAM

B.AWS KMS

C.AWS Systems Manager Parameter Store

D.AWS Secrets Manager

AnswerD

Secrets Manager provides automatic rotation of secrets, including database credentials.

Why this answer

AWS Secrets Manager is designed for securely storing secrets and provides automatic rotation. Systems Manager Parameter Store can store secrets but does not natively support rotation without custom automation. KMS is for encryption keys, not secret rotation.

Full explanation →

641

Multi-Selectmedium

A company's DevOps team is designing an automated incident response workflow using AWS Systems Manager Incident Manager and AWS Lambda. The workflow should automatically acknowledge incidents and send notifications to the appropriate response team. Which TWO actions should the team take to achieve this?

Select 2 answers

A.Have the Lambda function publish a message to an SNS topic that the response team subscribes to.

B.Configure the CloudWatch alarm to directly invoke the Lambda function for incident creation.

C.Use Incident Manager's built-in 'Acknowledge' and 'Notify' actions in a response plan.

D.Use AWS Step Functions to orchestrate the Lambda function and SNS topic.

E.Create an EventBridge rule that matches Incident Manager events and triggers a Lambda function.

AnswersA, E

SNS can send notifications via email, SMS, etc., to the team.

Why this answer

Option E is correct because an EventBridge rule can match Incident Manager events and trigger the Lambda function. Option A is correct because the Lambda function can publish to an SNS topic to notify the team. Option B is wrong because a CloudWatch alarm that invokes Lambda to create incidents addresses incident creation, not acknowledgement or notification.

Option C is wrong because response plans define engagements, chat channels, and runbooks rather than built-in 'Acknowledge' and 'Notify' actions. Option D is wrong because Step Functions is unnecessary for this simple workflow.

Full explanation →

642

MCQmedium

A company runs a critical application on Amazon RDS for MySQL with Multi-AZ deployment. The database is 2 TB in size. The DevOps team needs to perform a major version upgrade (e.g., MySQL 5.7 to 8.0) with minimal downtime. The RTO is 5 minutes and RPO is 1 minute. Which approach should the team take?

A.Create a read replica of the database in the same region, upgrade the replica to the new version, and then promote it to primary.

B.Take a snapshot of the database, restore it as a new instance, upgrade the restored instance, and redirect traffic.

C.Perform an in-place major version upgrade directly on the primary instance.

D.Use AWS Database Migration Service (DMS) to migrate data to a new database instance with the upgraded version.

AnswerA

This minimizes downtime as the replica syncs continuously; cutover is quick.

Why this answer

Option A is correct because creating a read replica, upgrading it, and then promoting it to primary limits downtime to the brief cutover. The replica syncs continuously, so RPO is near zero. Option C (in-place upgrade) takes the primary offline for the duration of the upgrade.

Option D (DMS) is for migration, not upgrade, and may have longer downtime. Option B (snapshot restore) has significant downtime and loses any writes made after the snapshot.

Full explanation →

643

MCQhard

A company is implementing a disaster recovery strategy for its Amazon Aurora MySQL database. The primary database is in us-west-2. The company requires an RPO of less than 1 minute and an RTO of less than 5 minutes. Which solution meets these requirements?

A.Create a cross-Region read replica in the secondary Region and promote it during failover.

B.Use automated backups and restore to a new DB instance in the secondary Region.

C.Use Amazon Aurora Global Database with a secondary Region cluster.

D.Take manual snapshots of the DB instance and copy them to the secondary Region every hour.

AnswerC

Aurora Global Database provides low-latency replication and fast failover (under 1 minute for RPO, minutes for RTO).

Why this answer

Amazon Aurora Global Database is designed for low-latency cross-Region replication with a typical RPO of 1 second and RTO of 1 minute or less, meeting the <1 minute RPO and <5 minute RTO requirements. It uses a dedicated storage-level replication channel that keeps the secondary cluster fully synchronized without impacting primary performance, and failover involves promoting the secondary cluster to primary in under a minute.

Exam trap

The trap here is that candidates confuse a cross-Region read replica (Option A) with Aurora Global Database, assuming both provide similar failover speed, but the read replica's promotion process is slower and less reliable for meeting strict RTO/RPO targets.

How to eliminate wrong answers

Option A is wrong because a cross-Region read replica for Aurora MySQL uses asynchronous replication with a typical RPO of several seconds to minutes, but the promotion process can take longer than 5 minutes due to the need to apply remaining redo logs and reconfigure endpoints, failing the RTO requirement. Option B is wrong because automated backups are taken once per day (default retention of 1-35 days) and restoring to a new instance in a secondary Region requires copying the backup across Regions, which can take hours and far exceeds both the RPO and RTO limits. Option D is wrong because manual snapshots taken every hour provide an RPO of up to 60 minutes, which violates the <1 minute RPO requirement, and restoring from a snapshot in a secondary Region also takes significantly longer than 5 minutes.

Full explanation →

644

Multi-Selectmedium

A company runs a stateful web application on EC2 instances that store session data locally. They want to migrate to a stateless architecture for better resilience. Which TWO actions should they take?

Select 2 answers

A.Use Amazon CloudFront to cache session data at the edge.

B.Use Amazon DynamoDB to store session data.

C.Use Amazon S3 to store session data as objects.

D.Use ElastiCache for Redis to store session data externally.

E.Use Amazon EFS to store session data as files.

AnswersB, D

DynamoDB is a scalable, low-latency session store.

Why this answer

Options B and D are correct. ElastiCache provides a centralized session store, and DynamoDB provides a durable, scalable session store. Option E is wrong because EFS is a file system, not ideal for session data.

Option A is wrong because CloudFront is a CDN, not a session store. Option C is wrong because S3 is object storage, not suitable for high-frequency writes like sessions.

Full explanation →

645

MCQeasy

A company runs a serverless application using AWS Lambda functions behind an Amazon API Gateway. The application processes user uploads stored in an S3 bucket. The Lambda function writes results to a DynamoDB table. Recently, the function started timing out when processing large files. What should the DevOps engineer do to improve resilience for large file processing?

A.Increase the Lambda function memory to improve CPU performance.

B.Use S3 event notifications to trigger an AWS Step Functions workflow that processes the file asynchronously.

C.Increase the Lambda function timeout to the maximum 15 minutes.

D.Add Amazon ElastiCache to cache processed results and reduce Lambda execution time.

AnswerB

Step Functions can orchestrate long-running tasks, breaking the file into chunks or using parallel processing, avoiding Lambda timeouts.

Why this answer

Option B is correct because Lambda has a maximum execution timeout of 15 minutes. For large files, using S3 event notifications to trigger an asynchronous Step Functions workflow is more resilient. Option C (increasing the timeout) might not be enough if the file is very large.

Option D (ElastiCache) is not relevant. Option A (increasing memory) might help but is still bound by the timeout limit.

Full explanation →

646

Multi-Selecthard

A DevOps team uses AWS CodePipeline with an Amazon S3 source action. The pipeline deploys a static website to an S3 bucket. The engineer wants to ensure that only approved changes are deployed to production. The team uses Git feature branches and wants to deploy only when a pull request is merged to the main branch. Which THREE actions should the engineer take?

Select 3 answers

A.Configure a CloudWatch Events rule to start the pipeline when a CodeCommit pull request is merged

B.Use an S3 source and configure event notifications for all object uploads

C.Use AWS CodeCommit as the source and configure a trigger for pull request merge events

D.Set the pipeline to execute on every push to any branch

E.Add an approval stage in the pipeline that requires manual sign-off

AnswersA, C, E

Another way to trigger on merge.

Why this answer

Options A, C, and E are correct. Option A: Use a CloudWatch Events rule to trigger the pipeline on pull request merge. Option C: Use CodeCommit as the source and configure a trigger for pull request merge events.

Option E: Add a manual approval stage before deployment. Option D is incorrect because it would deploy from any branch. Option B is incorrect because an S3 source triggered on every object upload does not enforce approval.

Full explanation →

647

MCQhard

Refer to the exhibit. A CloudFormation template deploys a Lambda function with X-Ray tracing enabled. However, traces are not appearing in the X-Ray console. What is the most likely missing configuration?

A.The Lambda runtime (nodejs18.x) does not support X-Ray tracing.

B.The Lambda execution role does not have permissions to upload trace data to X-Ray.

C.The TracingConfig mode is set to 'Active' but should be 'PassThrough'.

D.The Lambda function code does not use the AWS X-Ray SDK.

AnswerB

The role needs xray:PutTraceSegments and xray:PutTelemetryRecords.

Why this answer

Option B is correct because the Lambda function's IAM role only has the AWSLambdaBasicExecutionRole, which grants permissions to write to CloudWatch Logs but not to send trace data to X-Ray. The function needs the AWSXRayDaemonWriteAccess policy or equivalent permissions. Option A is wrong because Node.js 18 is supported.

Option D is wrong because the function code only logs the event, and Lambda records function-level traces under active tracing even without the X-Ray SDK. Option C is wrong because the TracingConfig is already set to Active.
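As a minimal sketch of the missing permissions, the following Python builds an IAM policy document containing the two X-Ray actions named in the answer. The function name `xray_write_policy` is hypothetical, and the wildcard resource is an assumption made for illustration; the managed AWSXRayDaemonWriteAccess policy is the alternative the explanation mentions.

```python
import json


def xray_write_policy() -> dict:
    """Build an IAM policy document that lets a role send trace data to X-Ray.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "xray:PutTraceSegments",
                    "xray:PutTelemetryRecords",
                ],
                # Assumption: a wildcard resource is used for this sketch.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be pasted into a role's inline policy.
    print(json.dumps(xray_write_policy(), indent=2))
```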

Full explanation →

648

MCQhard

A company runs a multi-region application on Amazon EC2 instances across us-east-1 and eu-west-1. The application uses an Amazon Aurora global database for writes in us-east-1 and reads in eu-west-1. The DevOps team wants to monitor the replication lag between the primary and secondary regions. They have set up a CloudWatch alarm on the AuroraReplicaLag metric in both regions. However, they notice that the alarm in eu-west-1 sometimes triggers false positives when the lag spikes briefly but then recovers. The team wants to reduce false alarms while still being alerted to sustained high lag that could impact read replicas. The team is already using a standard CloudWatch alarm with a period of 1 minute and evaluation periods of 1. What should the team change to reduce false positives?

A.Increase the alarm threshold to a higher value, such as 10 seconds.

B.Reduce the metric period to 30 seconds to get more granular data.

C.Increase the number of evaluation periods to 3, so the alarm triggers only if the lag is high for 3 consecutive minutes.

D.Create a composite alarm that triggers when both the AuroraReplicaLag and CPUUtilization metrics are high.

AnswerC

This requires sustained high lag, filtering out brief spikes.

Why this answer

Option C is correct because increasing evaluation periods requires the lag to be high for a longer duration before triggering an alarm. Option B is wrong because reducing the period would cause more frequent data points, potentially increasing false alarms. Option A is wrong because increasing the threshold would only alert on higher lag, but false positives are due to brief spikes.

Option D is wrong because a composite alarm combining multiple metrics is not needed; the issue is with the evaluation period.
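The effect of the evaluation-period setting can be illustrated with a simplified model. This sketch is not CloudWatch's actual evaluation logic (which also supports "M out of N" datapoints and missing-data handling); the function name, sample values, and threshold are all hypothetical.

```python
from typing import Sequence


def alarm_fires(samples: Sequence[float], threshold: float, evaluation_periods: int) -> bool:
    """Return True if `samples` breach `threshold` for `evaluation_periods`
    consecutive datapoints (simplified model of a CloudWatch alarm)."""
    consecutive = 0
    for value in samples:
        # Extend the breach streak, or reset it when the value recovers.
        consecutive = consecutive + 1 if value > threshold else 0
        if consecutive >= evaluation_periods:
            return True
    return False


# One-minute lag datapoints (hypothetical values).
brief_spike = [20, 25, 900, 30, 22, 18]
sustained_lag = [20, 850, 900, 950, 40]

print(alarm_fires(brief_spike, threshold=500, evaluation_periods=1))    # brief spike alarms
print(alarm_fires(brief_spike, threshold=500, evaluation_periods=3))    # brief spike is ignored
print(alarm_fires(sustained_lag, threshold=500, evaluation_periods=3))  # sustained lag alarms
```

With one evaluation period the single spike is enough to alarm; with three, only the sustained breach does.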

Full explanation →

649

MCQhard

A company runs a critical application on Amazon EC2 instances behind an Application Load Balancer. The application is deployed using AWS CodeDeploy with an in-place deployment configuration. During a recent deployment, the deployment failed because the new application version caused a health check failure, and CodeDeploy did not automatically roll back. What should the engineer do to ensure automatic rollback on health check failure?

A.Set up an EC2 instance lifecycle hook to trigger a rollback script when the instance enters a pending state

B.Configure an Amazon SQS queue to monitor health checks and invoke a rollback Lambda function

C.Enable automatic rollback in the CodeDeploy deployment group and set up a CloudWatch alarm for the ALB health check

D.Modify the Auto Scaling group to replace unhealthy instances automatically

AnswerC

CodeDeploy can be configured to roll back when a CloudWatch alarm (e.g., based on health check metrics) is in ALARM state.

Why this answer

Option C is correct because CodeDeploy can automatically roll back a deployment when a CloudWatch alarm, such as one monitoring ALB health check failures, enters the ALARM state. By enabling automatic rollback in the deployment group and associating the CloudWatch alarm, the deployment will revert to the previous version as soon as the health check fails, without manual intervention.

Exam trap

The trap here is that candidates often assume Auto Scaling group health checks or lifecycle hooks can handle deployment rollbacks, but they operate at the instance level and do not revert application code, whereas CodeDeploy's native automatic rollback with CloudWatch alarms is the correct, integrated solution.

How to eliminate wrong answers

Option A is wrong because EC2 instance lifecycle hooks are designed to pause an instance during launch or termination for custom actions, not to trigger rollbacks based on health check failures; they operate at the instance lifecycle level, not the deployment level. Option B is wrong because SQS queues are message brokers and cannot directly monitor health checks or invoke rollbacks; while a Lambda function could be triggered, this approach adds unnecessary complexity and is not the native, supported mechanism for automatic rollback in CodeDeploy. Option D is wrong because Auto Scaling group health checks replace unhealthy instances but do not revert the application version; they would launch a new instance with the same failing code, perpetuating the failure rather than rolling back the deployment.
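A rough sketch of the deployment group settings involved is shown below. It only builds the request parameters; the helper name and the application, group, and alarm names are hypothetical, and the CloudWatch alarm is assumed to exist already.

```python
def rollback_settings(application: str, deployment_group: str, alarm_name: str) -> dict:
    """Build parameters that enable alarm-based automatic rollback.

    Hypothetical helper; the returned dict mirrors the shape of the
    CodeDeploy UpdateDeploymentGroup request.
    """
    return {
        "applicationName": application,
        "currentDeploymentGroupName": deployment_group,
        # Associate the (pre-existing) CloudWatch alarm with the group.
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": alarm_name}],
        },
        # Roll back when the deployment fails or the alarm stops it.
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }


settings = rollback_settings("web-app", "prod-group", "alb-unhealthy-hosts")
print(settings["autoRollbackConfiguration"]["events"])
```

Assuming boto3 is installed and credentials are configured, the dict could be passed as `boto3.client("codedeploy").update_deployment_group(**settings)`.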

Full explanation →

650

MCQhard

A team uses AWS CodePipeline to deploy a serverless application using AWS SAM. The pipeline includes a build stage that runs 'sam build' and a deploy stage that runs 'sam deploy'. The deployment fails with an error: 'The security token included in the request is invalid.' What is the MOST likely cause?

A.The build stage did not produce the correct output artifact.

B.The SAM template has a syntax error.

C.The IAM role used in the deploy stage does not have permission to assume the CloudFormation execution role.

D.The 'sam deploy' command is missing the '--capabilities' parameter.

AnswerC

The error indicates an invalid token, often due to role assumption failure.

Why this answer

Option C is correct because 'sam deploy' requires valid AWS credentials; if the IAM role used in the deploy stage cannot assume the CloudFormation execution role, the request fails with a credentials error. Option A is incorrect because the build stage succeeded. Option B is incorrect because the template was already processed by 'sam build', and a syntax error would not surface as an invalid token.

Option D is incorrect because the error points to credentials, not a missing parameter.

Full explanation →

651

MCQmedium

A team uses AWS CodeBuild to run unit tests. They notice that builds are taking longer than expected. The build environment includes many dependencies that are downloaded every time. Which change would MOST reduce build time?

A.Enable local caching for dependencies.

B.Store dependencies in an Amazon S3 bucket and configure CodeBuild to use S3 cache.

C.Increase the compute type to a larger instance.

D.Use Amazon EFS to share dependencies across builds.

AnswerB

S3 cache allows dependencies to be stored and reused across builds, significantly reducing download time.

Why this answer

Option B is correct because caching dependencies in S3 avoids re-downloading them on every build. Option C is wrong because more CPU may not help if the network is the bottleneck. Option D is wrong because EFS is not a CodeBuild cache type.

Option A is wrong because a local cache lives on the build host and is not guaranteed to be available to later builds.

Full explanation →

652

MCQhard

A company runs a critical e-commerce platform on AWS. The architecture includes an Application Load Balancer (ALB) that distributes traffic to a fleet of EC2 instances in an Auto Scaling group. The EC2 instances run a custom Java application that uses an RDS for MySQL database and an ElastiCache Redis cluster for session caching. The DevOps team has set up CloudWatch alarms for CPU utilization, memory, and database connections. Recently, customers have been reporting slow page load times and occasional timeouts. The team notices that during peak hours, the ALB's TargetResponseTime metric spikes, and the number of healthy hosts in the target group fluctuates. The CPU and memory metrics on the EC2 instances remain within normal ranges. The database CPU is also normal. The team suspects the issue is related to the application's session management. Which course of action should the DevOps team take to identify the root cause?

A.Enable RDS Performance Insights and look for slow queries or connection storms

B.Install the CloudWatch Agent on the EC2 instances to collect additional application-level metrics

C.Monitor ElastiCache metrics such as CPUUtilization, CacheHits, and Evictions to determine if the Redis cluster is overloaded

D.Increase the number of ALB targets and adjust the target group health check interval

AnswerC

ElastiCache metrics will reveal if session caching is causing latency.

Why this answer

Option C is correct because the symptoms (increased latency, fluctuating healthy hosts) and the role of ElastiCache in session caching suggest that Redis performance may be degrading. Monitoring ElastiCache metrics like CPUUtilization, cache hits/misses, and evictions can confirm if Redis is the bottleneck. Option D is wrong because adding ALB targets does not address session management issues.

Option A is wrong because a database problem is not indicated; database CPU is normal. Option B is wrong because the CloudWatch Agent on EC2 would not capture Redis metrics.

Full explanation →

653

MCQhard

A company uses Amazon Route 53 with a failover routing policy to direct traffic to an active and a standby endpoint. The health checks are configured to check the active endpoint every 10 seconds. During a recent outage, the failover took over 3 minutes to detect and switch. How can the company improve the failover time to under 1 minute?

A.Configure a Route 53 calculated health check that aggregates multiple fast health checks with a lower failure threshold.

B.Add additional health checks for the same endpoint.

C.Reduce the health check interval to 5 seconds.

D.Change the routing policy to latency based.

AnswerA

Calculated health checks can combine quick checks to detect failure faster.

Why this answer

Option A is correct because using a calculated health check with faster endpoint checks and a lower failure threshold can reduce detection time. Option C is wrong because reducing the health check interval to 5 seconds is not supported (minimum is 10 seconds). Option D is wrong because latency routing does not provide active/passive failover.

Option B is wrong because adding more health checks does not reduce failover time.
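The role of the failure threshold can be made concrete with back-of-the-envelope arithmetic. This sketch assumes a simplified model in which failover time is roughly the request interval multiplied by the failure threshold, plus the DNS record TTL that resolvers may cache; the TTL values and the function name are hypothetical and not taken from the question.

```python
def estimated_failover_seconds(interval_s: int, failure_threshold: int, record_ttl_s: int) -> int:
    """Rough upper estimate of failover time under a simplified model:
    detection (interval x consecutive failures) plus DNS TTL expiry."""
    detection = interval_s * failure_threshold
    return detection + record_ttl_s


# Fast (10 s) checks with three required failures and a 60 s TTL (assumed values).
print(estimated_failover_seconds(10, 3, 60))  # 90
# Same interval, failure threshold lowered to 1 and TTL shortened to 30 s.
print(estimated_failover_seconds(10, 1, 30))  # 40
```

Under this model, lowering the failure threshold (and keeping the record TTL short) is what brings the estimate under one minute, since the interval itself cannot go below 10 seconds.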

Full explanation →

654

MCQeasy

A company uses AWS CodeDeploy to deploy a new version of an application to EC2 instances. They want to minimize downtime and roll back quickly if the deployment fails. Which deployment type should they use?

A.Canary deployment

B.Linear deployment

C.Blue/green deployment

D.In-place deployment

AnswerC

Blue/green allows instant rollback by switching back.

Why this answer

Blue/green deployment creates a new environment (green) and shifts traffic after testing, allowing instant rollback by switching back to the original (blue). Option D is wrong because in-place updates cause downtime during the deployment. Option A is wrong because canary is a subset of blue/green but not the full strategy.

Option B is wrong because linear is a traffic shifting pattern.

Full explanation →

655

MCQeasy

A company hosts a static website on Amazon S3 with CloudFront as the CDN. Users report that they see an old version of the website even after the DevOps team updated the S3 objects. The team verified that the new objects are in the S3 bucket and are publicly accessible. The CloudFront distribution has a default TTL of 24 hours. To immediately serve the new content to users, the team needs to invalidate the CloudFront cache. Which of the following is the CORRECT approach to achieve this with minimal impact?

A.Create a CloudFront invalidation request for the path '/*'.

B.Change the CloudFront origin path to point to a new S3 bucket.

C.Update the CloudFront distribution's default TTL to 0 and wait for the changes to propagate.

D.Delete the S3 objects and re-upload them with different names.

AnswerA

Invalidation removes cached objects immediately.

Why this answer

Option A is correct because creating a CloudFront invalidation for the path '/*' will remove all cached objects and force CloudFront to fetch new content from S3. Option B is wrong because changing the origin path does not invalidate the cache. Option C is wrong because updating the TTL affects future caching, not objects already cached.

Option D is wrong because S3 does not control the CloudFront cache.
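As an illustration, an invalidation request for '/*' could be assembled as follows. This is a sketch that only builds the parameters; the helper name and the distribution ID are hypothetical placeholders.

```python
import time
from typing import Iterable


def invalidation_request(distribution_id: str, paths: Iterable[str] = ("/*",)) -> dict:
    """Build parameters for a CloudFront CreateInvalidation call.

    Hypothetical helper for illustration only.
    """
    items = list(paths)
    return {
        "DistributionId": distribution_id,
        "InvalidationBatch": {
            "Paths": {"Quantity": len(items), "Items": items},
            # CallerReference must be unique per request; a timestamp is one option.
            "CallerReference": str(time.time_ns()),
        },
    }


request = invalidation_request("EXAMPLEDISTID")  # placeholder distribution ID
print(request["InvalidationBatch"]["Paths"])
```

Assuming boto3 and valid credentials, the dict could be passed as `boto3.client("cloudfront").create_invalidation(**request)`.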

Full explanation →

656

MCQeasy

A company uses AWS CodeDeploy to deploy applications to an Auto Scaling group. The deployment fails because the new version of the application crashes the instances. The DevOps engineer needs the Auto Scaling group to automatically replace the unhealthy instances with the previous working version. Which deployment configuration should the engineer use?

A.In-place deployment with a deployment group that has a failure threshold of 0.

B.Blue/Green deployment with a load balancer to switch traffic only after health checks pass.

C.Canary deployment that shifts 10% of traffic to the new version, then 100% after 10 minutes.

D.Linear deployment that shifts 10% of traffic every 10 minutes.

AnswerB

Correct. Blue/Green allows rolling back by switching traffic back to the original environment.

Why this answer

Option B (Blue/Green deployment) is correct because it keeps the old environment (blue) running while the new environment (green) is tested; if the green environment fails, traffic can be redirected back to blue. Option A is wrong because in-place deployments replace instances gradually and do not automatically revert. Option C is wrong because canary deployments are for gradual traffic shifting, not full rollback.

Option D is wrong because linear deployments incrementally shift traffic and do not automatically roll back.

Full explanation →

657

MCQhard

A company uses AWS CodeDeploy to deploy a web application to an Auto Scaling group. The deployment fails during the 'ValidateService' lifecycle event. The CloudWatch Agent reports that the target process is running but the health check endpoint returns HTTP 503. The CodeDeploy agent logs show no errors. What is the most likely cause of the failure?

A.The Auto Scaling group is not healthy

B.The CodeDeploy agent is not installed on the instances

C.The application is not fully functional due to missing configuration files

D.The target process is not listening on the expected port

AnswerC

Process is running but health check fails, suggesting configuration issue.

Why this answer

Option C is correct because the health check is failing (HTTP 503) even though the process is running, indicating the application is not serving traffic properly, likely due to missing dependencies or configuration. Option B is incorrect because the CodeDeploy agent logs showed no errors, which means the agent is installed and running. Option A is incorrect because Auto Scaling group health checks are separate from CodeDeploy's validation.

Option D is incorrect because the endpoint returns an HTTP 503 response, so the process is listening and responding.

Full explanation →

658

Multi-Selectmedium

A company is using Amazon CloudWatch Logs to collect logs from multiple EC2 instances. They need to filter logs in real time and send specific log events to a custom application for processing. Which TWO services can they use to achieve this?

Select 2 answers

A.Use Amazon Kinesis Data Analytics to process the log stream.

B.Configure a CloudWatch Logs subscription filter that invokes an AWS Lambda function.

C.Create a CloudWatch Events rule to capture log events and send them to Amazon SQS.

D.Configure a CloudWatch Logs subscription filter that sends data to Amazon Kinesis Data Firehose.

E.Use Amazon S3 event notifications to trigger a Lambda function on new log files.

AnswersB, D

Lambda can process filtered log events in real time.

Why this answer

Correct: B (CloudWatch Logs subscription filter with Lambda) and D (CloudWatch Logs subscription filter with Kinesis Data Firehose). Option A is wrong because Kinesis Data Analytics is for analytics, not forwarding. Option C is wrong because CloudWatch Events is for events, not log filtering.

Option E is wrong because S3 is a destination, not a real-time processing service.

Full explanation →

659

MCQmedium

A company is running a stateful web application on EC2 instances behind an Application Load Balancer. During a deployment, users report session timeouts. What should the DevOps engineer implement to ensure zero-downtime deployments without losing in-flight sessions?

A.Update the Auto Scaling group's launch template to use a new AMI and perform a rolling update.

B.Use an Auto Scaling group with a lifecycle hook that waits for instance termination.

C.Enable connection draining (deregistration delay) on the ALB target group and use lifecycle hooks to wait for the draining period.

D.Increase the health check interval and unhealthy threshold on the ALB target group.

AnswerC

Deregistration delay ensures in-flight requests complete; lifecycle hooks provide additional control over termination.

Why this answer

Option C is correct because deregistration delay (connection draining) on the ALB target group allows in-flight requests to complete before instances are terminated. Option B is wrong because a lifecycle hook alone delays termination but does not drain in-flight connections at the load balancer. Option A is wrong because updating the launch template does not prevent session loss during replacement.

Option D is wrong because increasing the health check interval and unhealthy threshold does not ensure existing sessions are preserved.

Full explanation →

660

Multi-Selecthard

A company has a microservices architecture running on Amazon ECS with Fargate launch type. Each service is deployed in multiple Availability Zones. The services communicate via REST APIs. Recently, a downstream service experienced a partial outage, causing upstream services to time out and leading to cascading failures. The team wants to improve resilience against such failures. Which combination of actions should the DevOps engineer take? (Choose TWO.)

Select 2 answers

A.Increase the HTTP timeout values for all service-to-service calls.

B.Implement circuit breaker patterns in the service clients.

C.Remove all retry logic from service calls.

D.Adopt an asynchronous communication pattern using Amazon SQS or Amazon EventBridge.

E.Configure Auto Scaling for all services based on request count.

AnswersB, D

Circuit breakers stop cascading failures by failing fast when downstream is unhealthy.

Why this answer

Option B is correct because implementing circuit breakers (e.g., with AWS App Mesh or client libraries) prevents cascading failures by failing fast when a downstream service is unhealthy. Option D is correct because using an asynchronous messaging pattern (SQS, SNS, EventBridge) decouples services, allowing upstream services to continue processing even if downstream is unavailable. Option A (increasing timeouts) could worsen the situation by holding resources longer.

Option E (Auto Scaling) helps with capacity but not with handling unavailability. Option C (removing retries) is too drastic and could cause data loss.
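The fail-fast behaviour of a circuit breaker can be sketched in a few lines. This is a minimal, single-threaded illustration rather than a production implementation; the class names, default thresholds, and injectable clock are all assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: reject immediately instead of waiting on a timeout.
                raise CircuitOpenError("downstream marked unhealthy; failing fast")
            # Reset timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

An upstream client would wrap each downstream request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to return a fallback response rather than tying up a thread waiting on a failing dependency.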

Full explanation →

661

MCQhard

A company runs a critical e-commerce application on AWS. The application is deployed on Amazon EC2 instances behind an Application Load Balancer (ALB) in an Auto Scaling group. The instances store session data in an ElastiCache for Redis cluster. Recently, users have reported intermittent session timeouts during peak traffic hours. The operations team notices that CloudWatch alarms for the Redis cluster's CPUUtilization and Evictions metrics are frequently breaching thresholds. The team wants to resolve the issue without incurring unnecessary costs. Which solution should the team implement?

A.Configure a Lambda function to offload session data to Amazon DynamoDB and use DAX for caching.

B.Enable auto scaling for the ElastiCache Redis cluster to add replicas or shards based on CPU and memory utilization.

C.Enable encryption in transit (TLS) for the Redis cluster to reduce overhead.

D.Migrate the Redis cluster to a memory-optimized instance type like r6g.large.

AnswerB

Auto scaling can dynamically add resources to handle peak traffic, reducing evictions and CPU utilization.

Why this answer

Option B is correct because enabling auto scaling to add replicas or shards directly addresses the resource contention causing evictions and high CPU. Option D is wrong because a memory-optimized instance type may not solve the CPU issue; increasing memory without scaling CPU may not help. Option C is wrong because enabling encryption in transit does not reduce load on the cluster.

Option A is wrong because Lambda integration adds complexity and latency without scaling the cache.

Full explanation →

662

MCQhard

An organization has an AWS CodePipeline that deploys a critical application. The pipeline uses a manual approval step before deploying to production. The team wants to ensure that only authorized users can approve the deployment, and that the approval action is logged for compliance. Which combination of actions should the team take? (Select TWO.)

A.Configure the approval action to invoke an AWS Lambda function that validates the approver's IAM role tags.

B.Enable AWS CloudTrail to log all approval API calls for auditing.

C.Use Amazon Simple Notification Service (SNS) to send approval notifications and allow any subscriber to approve.

D.Use AWS CodeCommit to manage approval permissions via repository policies.

E.Store approval logs in Amazon CloudWatch Logs for real-time monitoring.

AnswersA, B

This allows custom authorization based on tags.

Why this answer

Options A and B are correct. Option A: Configuring the Lambda function for approval to check IAM tags ensures only users with specific tags (e.g., 'role=approver') can call the approval API. Option B: Using CloudTrail to log approval actions meets compliance logging requirements.

Option C is wrong because SNS does not provide fine-grained authorization. Option E is wrong because CloudWatch Logs can log but CloudTrail is the correct service for API logging. Option D is wrong because CodeCommit is not involved in approval authorization.

Full explanation →

663

Multi-Selectmedium

A DevOps engineer needs to set up centralized logging for an application running on multiple EC2 instances across different AWS accounts. The logs must be aggregated in a single S3 bucket and also be analyzed in near real-time. Which TWO services should be used together to achieve this?

Select 2 answers

A.Amazon Simple Queue Service (SQS)

B.Amazon Kinesis Data Firehose

C.AWS CloudTrail

D.Amazon CloudWatch Logs subscription

E.AWS Lambda

AnswersB, D

Can receive logs from CloudWatch subscription and deliver to S3.

Why this answer

Option B (Kinesis Data Firehose) and Option D (CloudWatch Logs subscription) are correct. A CloudWatch Logs subscription can forward logs to Kinesis Data Firehose, which can then deliver to S3 in near real-time. Option C is wrong because CloudTrail is for API logs.

Option E is wrong because Lambda alone cannot efficiently aggregate logs from multiple accounts. Option A is wrong because SQS is for decoupling, not for log aggregation.

Full explanation →

664

MCQmedium

An application log excerpt shows repeated HTTP 500 errors for the /api/orders endpoint, with occasional successful health checks. The application runs on EC2 instances behind an ALB. What is the MOST likely cause of this pattern?

A.The backend service that the /api/orders endpoint depends on is unavailable or failing.

B.The EC2 instances are running out of memory and the application is crashing.

C.The ALB is misconfigured and routing requests to the wrong target group.

D.The EC2 instances are not passing health checks and are being deregistered from the target group.

AnswerA

The endpoint consistently fails while health check succeeds, indicating a dependency issue.

Why this answer

The pattern of repeated HTTP 500 errors for /api/orders with occasional successful health checks strongly indicates that the backend service dependency (e.g., a database, cache, or another microservice) is intermittently failing or unavailable. HTTP 500 errors are server-side errors, meaning the application code is running but cannot complete the request due to a downstream failure. Successful health checks confirm the EC2 instances themselves are healthy and in-service, ruling out instance-level or ALB misconfiguration issues.

Exam trap

The trap here is that candidates confuse HTTP 500 errors with instance-level failures (like OOM or health check failures), but the key differentiator is that successful health checks prove the instances are operational, shifting the root cause to a failing backend dependency rather than the compute layer.

How to eliminate wrong answers

Option B is wrong because running out of memory typically causes the application process to crash (e.g., OOM killer), leading to connection timeouts or immediate 503 errors, not repeated HTTP 500 errors with successful health checks. Option C is wrong because a misconfigured ALB routing to the wrong target group would cause requests to reach instances that don't serve the /api/orders endpoint, resulting in 404 or 503 errors, not 500 errors from the application itself. Option D is wrong because if instances were failing health checks and being deregistered, they would be removed from the target group and stop receiving traffic entirely, which contradicts the observed pattern of occasional successful health checks and persistent 500 errors on /api/orders.

Full explanation →

665

MCQmedium

A company is migrating a legacy application to AWS. The application requires cross-account access to an S3 bucket in a different AWS account. The security team wants to follow the principle of least privilege. How should the DevOps engineer configure the access?

A.Generate an access key for the root user of the source account and use it in the application.

B.Create an IAM role in the source account with necessary permissions and attach a bucket policy in the target account granting access to that role.

C.Create an IAM user in the target account with access keys and store them in AWS Secrets Manager.

D.Create an IAM user in the source account with programmatic access and a bucket policy allowing that user.

AnswerB

Least privilege and secure cross-account access.

Why this answer

Option B is correct because using an IAM role in the source account with a bucket policy in the target account that allows the role is the recommended cross-account access pattern. Option A is wrong because using root user credentials is insecure. Option C is wrong because it relies on long-lived IAM user access keys, which are less secure than temporary role credentials even when stored in Secrets Manager.

Option D is wrong because IAM users in the source account should not be used directly; a role is preferred.
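A minimal sketch of the bucket policy side of this pattern is shown below. The helper name, role ARN, account ID, and bucket name are hypothetical, and the action list is deliberately narrow to reflect least privilege; the role in the source account would also need a matching identity policy.

```python
import json
from typing import Iterable


def cross_account_bucket_policy(role_arn: str, bucket: str,
                                actions: Iterable[str] = ("s3:GetObject",)) -> dict:
    """Build a bucket policy granting one IAM role access to a bucket's objects.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRole",
                "Effect": "Allow",
                # Only the named role in the other account is trusted.
                "Principal": {"AWS": role_arn},
                "Action": list(actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = cross_account_bucket_policy(
    "arn:aws:iam::111122223333:role/legacy-app-role",  # placeholder role ARN
    "example-target-bucket",                           # placeholder bucket name
)
print(json.dumps(policy, indent=2))
```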

Full explanation →

666

MCQeasy

A company uses Amazon CloudWatch to monitor its production environment. The DevOps team wants to receive an email notification whenever the average CPU utilization of any EC2 instance exceeds 90% for 5 consecutive minutes. Which steps should be taken to set up this notification?

A.Install the CloudWatch Logs agent on each EC2 instance and configure a metric filter to trigger an SNS notification

B.Create a CloudWatch alarm on CPUUtilization with a threshold of 90% for 5 consecutive periods, and configure an SNS topic to send email

C.Use AWS CloudTrail to monitor CPU utilization and send notifications via SNS

D.Use AWS Config to create a rule that triggers an SNS notification when CPU utilization exceeds 90%

AnswerB

Correct. This is the standard way to set up metric-based notifications.

Why this answer

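Option B is correct: create a CloudWatch alarm on the CPUUtilization metric with a threshold of 90% that must be breached for 5 consecutive periods (5 minutes with one-minute periods), and set an SNS topic with an email subscription as the alarm action. Option A is wrong because the CloudWatch Logs agent collects logs, and CPUUtilization is already published as a metric without it. Option C is wrong because CloudTrail records API activity, not performance metrics.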

Option D is wrong because Config is for configuration auditing.
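
The alarm described in option B can be sketched as a set of parameters for the CloudWatch `PutMetricAlarm` API. The instance ID, topic ARN, and alarm name below are hypothetical, and the one-minute period assumes detailed monitoring is enabled on the instance (basic monitoring publishes EC2 metrics at five-minute intervals).

```python
def cpu_alarm_params(instance_id: str, topic_arn: str) -> dict:
    """Parameters for an alarm that fires after 5 consecutive minutes above 90% CPU."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,             # seconds per evaluation period
        "EvaluationPeriods": 5,   # 5 x 60 s = 5 consecutive minutes
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic with an email subscription
    }


# Hypothetical identifiers, for illustration only.
params = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",
)
print(params)

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Because the alarm is scoped to one `InstanceId` dimension, covering "any EC2 instance" means creating one such alarm per instance.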

667

Multi-Selecthard

A DevOps team is designing a CI/CD pipeline for a microservices application. Each microservice has its own code repository and build artifacts. The team wants to use AWS CodePipeline with multiple parallel actions to build and test all microservices simultaneously. They also want to ensure that if one microservice's build fails, the pipeline does not block other microservices. Which THREE steps should the team take? (Choose THREE.)

Select 3 answers

A.Use a parallel action group with separate build actions for each microservice.

B.Create a separate pipeline for each microservice to fully isolate failures.

C.Configure the pipeline to block subsequent stages if any build action fails.

D.Configure a single build action that sequentially builds all microservices.

E.Set the 'RunOrder' field for each build action to the same number to run them in parallel.

AnswersA, B, E

Parallel actions allow simultaneous builds.

Why this answer

Option A is correct because a parallel action group with separate build actions for each microservice allows all microservices to be built simultaneously within a single pipeline. Option E is correct because giving those build actions the same 'RunOrder' value is how CodePipeline runs actions within a stage in parallel. Option B is correct because a failed action does not stop the other parallel actions from finishing, but it does fail the stage they share; a separate pipeline per microservice is what fully isolates one microservice's failure from the rest.

Exam trap

The trap here is that candidates may treat parallel actions and separate pipelines as mutually exclusive and pick only one approach. Options A and E address building simultaneously, and option B addresses failure isolation. Option C is wrong because blocking subsequent stages on any failed build is the opposite of the stated goal, and option D is wrong because a single sequential build action is neither parallel nor isolated.
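
To make the 'RunOrder' idea concrete, here is a minimal sketch of a build stage in the shape CodePipeline uses for its pipeline structure. The microservice names, CodeBuild project names, and artifact names are hypothetical.

```python
def parallel_build_stage(services: list) -> dict:
    """Build one stage whose actions all share runOrder 1, so they run in parallel."""
    actions = [
        {
            "name": f"Build-{svc}",
            "actionTypeId": {
                "category": "Build",
                "owner": "AWS",
                "provider": "CodeBuild",
                "version": "1",
            },
            "runOrder": 1,  # same value on every action means parallel execution
            "configuration": {"ProjectName": f"{svc}-build"},
            "inputArtifacts": [{"name": f"{svc}-source"}],
            "outputArtifacts": [{"name": f"{svc}-build-output"}],
        }
        for svc in services
    ]
    return {"name": "Build", "actions": actions}


# Hypothetical microservice names, for illustration only.
stage = parallel_build_stage(["orders", "payments", "users"])
for action in stage["actions"]:
    print(action["name"], action["runOrder"])
```

Actions with a higher `runOrder` in the same stage would wait for every action with a lower value to succeed first.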

668

MCQhard

A DevOps engineer applied the above S3 bucket policy to restrict access. Users report that they can download objects from the bucket only when using HTTPS from within the 10.0.0.0/8 network. However, users outside that network receive access denied errors even over HTTPS. What is wrong with the policy?

A.The Deny statement blocks all requests over HTTPS because of the BoolIfExists condition.

B.The Allow statement should include a condition for SecureTransport.

C.The Deny statement should use a Condition for SourceIp instead of SecureTransport.

D.The Allow statement only allows requests from the specific IP range; requests from other IPs are implicitly denied even over HTTPS.

AnswerD

Implicit deny applies to all not explicitly allowed.

Why this answer

Option D is correct because the Deny statement with SecureTransport false only denies non-HTTPS requests; it does not grant anything to HTTPS requests. The Allow statement only covers the specific IP range, so requests from outside that range match no Allow statement and are implicitly denied, even over HTTPS. Option A is wrong because the Deny statement matches only requests where SecureTransport is false, so it does not block HTTPS requests.

Option B is wrong because non-HTTPS requests are already blocked by the Deny statement, so adding a SecureTransport condition to the Allow statement would not help users outside the network. Option C is wrong because the Deny statement is not the issue; the lack of an Allow for HTTPS requests from other IPs is the issue.
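
The evaluation order behind this answer (explicit deny, then allow, then implicit deny) can be sketched in a few lines. The two statements below are a hypothetical stand-in built from the question's description; the actual policy is not reproduced on this page.

```python
import ipaddress

ALLOWED_NETWORK = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical statements modeled on the scenario, not the real policy.
STATEMENTS = [
    # Deny anything that is not sent over HTTPS.
    {"effect": "Deny", "matches": lambda req: not req["https"]},
    # Allow requests that originate inside 10.0.0.0/8.
    {
        "effect": "Allow",
        "matches": lambda req: ipaddress.ip_address(req["ip"]) in ALLOWED_NETWORK,
    },
]


def evaluate(req: dict) -> str:
    """Simplified evaluation: explicit deny wins, then allow, else implicit deny."""
    effects = [s["effect"] for s in STATEMENTS if s["matches"](req)]
    if "Deny" in effects:
        return "explicit deny"
    if "Allow" in effects:
        return "allow"
    return "implicit deny"


print(evaluate({"ip": "10.1.2.3", "https": True}))     # allow
print(evaluate({"ip": "203.0.113.9", "https": True}))  # implicit deny
print(evaluate({"ip": "10.1.2.3", "https": False}))    # explicit deny
```

The second call is the case the users report: nothing denies the request, but nothing allows it either.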

669

MCQhard

A company runs a critical application on an Amazon RDS for MySQL DB instance. The application experiences intermittent connection timeouts. The DevOps team notices that the DB instance's CPU and memory metrics are normal. What should the team check NEXT to diagnose the issue?

A.Enable Enhanced Monitoring to check OS-level metrics

B.Examine the slow query log to identify long-running queries

C.Verify that the DB instance's storage is not full

D.Check the 'DatabaseConnections' CloudWatch metric to see if the connection count is near the max_connections limit

AnswerD

Connection timeouts often result from hitting the max connections limit.

Why this answer

Option D is correct because connection timeouts with normal CPU and memory often indicate that the DB instance has reached its max_connections limit. Option A is wrong because Enhanced Monitoring provides OS-level metrics, not connection counts. Option B is wrong because slow queries increase latency but are not the most direct cause of connection timeouts. Option C is wrong because nothing in the scenario points to a storage problem.
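
A small sketch of the comparison this answer implies: take the peak 'DatabaseConnections' value and compare it with the instance's max_connections setting. The numbers and the 90% warning ratio are hypothetical; on MySQL the real limit can be read with `SHOW VARIABLES LIKE 'max_connections'`.

```python
def connection_headroom(
    current_connections: float,
    max_connections: int,
    warn_ratio: float = 0.9,
) -> dict:
    """Report how close the connection count is to the configured limit."""
    ratio = current_connections / max_connections
    return {
        "utilization": round(ratio, 2),
        "near_limit": ratio >= warn_ratio,
    }


# Hypothetical values, for illustration only.
report = connection_headroom(current_connections=142, max_connections=150)
print(report)  # {'utilization': 0.95, 'near_limit': True}
```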

670

MCQmedium

A development team uses AWS CodePipeline with multiple stages including source, build, and deploy. The pipeline uses an Amazon S3 source action that triggers on changes to a specific bucket. Recently, the pipeline stopped triggering automatically. The IAM role for CodePipeline has the necessary permissions. What is the most likely cause?

A.The IAM role for CodePipeline does not have s3:GetObject permission.

B.The S3 bucket policy denies access to CodePipeline.

C.The S3 bucket does not have event notifications configured.

D.AWS CloudTrail is not configured to deliver S3 data events to CloudWatch Logs.

AnswerD

CodePipeline relies on CloudWatch Events, which require CloudTrail to log S3 data events.

Why this answer

Option D is correct because CodePipeline uses an Amazon CloudWatch Events rule to detect changes to an S3 source, and that rule matches S3 API calls recorded by AWS CloudTrail. If CloudTrail is not logging data events for the bucket, the rule never matches and the pipeline won't trigger. Option C is wrong because S3 event notifications are not used for CodePipeline triggers.

Option A is wrong because the scenario states that the pipeline role has the necessary permissions. Option B is wrong because the S3 bucket policy is not relevant for triggering.
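
The dependency on CloudTrail is visible in the event pattern that such a rule uses. The sketch below builds a pattern of that shape; the bucket name and object key are hypothetical.

```python
import json


def s3_source_event_pattern(bucket: str, key: str) -> dict:
    """Event pattern that matches CloudTrail-recorded writes to one S3 object."""
    return {
        "source": ["aws.s3"],
        # This detail-type only exists when a CloudTrail trail records the call.
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject", "CompleteMultipartUpload", "CopyObject"],
            "requestParameters": {"bucketName": [bucket], "key": [key]},
        },
    }


# Hypothetical bucket and key, for illustration only.
pattern = s3_source_event_pattern("example-source-bucket", "app/source.zip")
print(json.dumps(pattern, indent=2))
```

Without a trail that captures write data events for the bucket, no event with this detail-type is emitted, so the rule has nothing to match.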

671

MCQeasy

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The database instance fails and AWS automatically fails over to the standby. After the failover, the application cannot connect to the database. The engineer checks the RDS console and sees that the instance status is Available. What is the MOST likely cause of the connectivity issue?

A.The security group for the RDS instance has changed during failover.

B.The application is using the database's DNS endpoint for the old primary, which is no longer the writer.

C.The DNS record for the RDS endpoint has not propagated to the application's DNS resolver.

D.The database instance is still in the process of failover and is not yet accepting connections.

AnswerB

After failover, the writer endpoint points to the new primary, but if the application caches the old endpoint, it may fail.

Why this answer

After an RDS Multi-AZ failover, the DNS endpoint for the DB instance remains the same but its underlying IP address changes to point to the new primary (formerly the standby). If the application caches the IP address of the old primary or uses a direct connection to the old writer endpoint, it will attempt to connect to a node that is no longer the writer. The correct practice is to always connect using the RDS instance endpoint (CNAME), which automatically resolves to the current writer, and to avoid caching the resolved IP address.

Since the instance status is 'Available', the new primary is ready, so the issue is a stale connection target.

Exam trap

The trap here is that candidates assume a failed Multi-AZ failover or a DNS propagation delay, when in reality the instance is healthy and DNS updates quickly, but the application's cached IP address from the old primary is the root cause.

How to eliminate wrong answers

Option A is wrong because security groups are associated with the RDS instance itself, not with a specific node; during a failover, the security group configuration is preserved and does not change. Option C is wrong because the DNS record (CNAME) for the RDS endpoint is managed by Amazon Route 53 with a very low TTL (typically 5 seconds) and propagates quickly, so the application's DNS resolver picks up the updated record within seconds of the failover. Option D is wrong because the RDS console shows the instance status as 'Available', which means the failover has completed and the new primary is accepting connections; the issue is not that the instance is still transitioning.
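
One way to avoid the stale-IP problem is to resolve the endpoint again on every connection attempt instead of holding on to an address. The sketch below simulates that with a fake resolver and a fake connect function; the hostname, IP addresses, and helper names are all hypothetical, and a real client would pass its driver's connect call instead.

```python
import socket


def resolve_host(hostname: str, port: int) -> str:
    """Look up the current IP address for a hostname, with no caching here."""
    return socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)[0][4][0]


def connect_with_fresh_dns(hostname, port, connect, resolve=resolve_host, attempts=3):
    """Re-resolve the endpoint before each attempt so a failover is picked up."""
    last_error = None
    for _ in range(attempts):
        ip = resolve(hostname, port)  # never reuse an address from an earlier attempt
        try:
            return connect(ip, port)
        except OSError as exc:
            last_error = exc
    raise ConnectionError(f"could not connect to {hostname}") from last_error


# Simulated failover: the first lookup returns the old primary, later ones the new one.
answers = iter(["10.0.1.10", "10.0.2.20", "10.0.2.20"])


def fake_resolve(hostname, port):
    return next(answers)


def fake_connect(ip, port):
    if ip == "10.0.1.10":
        raise OSError("old primary no longer accepts connections")
    return f"connected to {ip}:{port}"


result = connect_with_fresh_dns("mydb.example.internal", 3306, fake_connect, fake_resolve)
print(result)  # connected to 10.0.2.20:3306
```

Runtimes and connection pools that cache DNS results themselves need their cache TTL lowered as well, or the fresh lookup never reaches the resolver.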

672

MCQeasy

A company uses AWS CloudFormation to deploy infrastructure. They have a template that creates an Amazon EC2 instance and an Elastic IP address. The template uses the AWS::EC2::EIP resource. The team notices that when they delete the stack, the Elastic IP address is not released, leading to charges. They want to ensure that the Elastic IP is automatically released when the stack is deleted. What should they do?

A.Set the DeletionPolicy attribute to 'Retain' to keep the EIP

B.Create a custom resource to release the EIP before stack deletion

C.Set the DeletionPolicy attribute to 'Delete' on the EIP resource

D.Add a DependsOn clause to ensure proper order of deletion

AnswerC

Default is Delete, but ensure it's explicitly set to avoid accidental retention

Why this answer

By default, CloudFormation deletes the EIP when the stack is deleted. However, if the EIP is associated with an instance, the association may prevent deletion. To ensure release, set the 'DeletionPolicy' attribute to 'Delete' (which is default) and ensure that the EIP is not associated with an ENI that is not being deleted.

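Option C is correct. Option D (DependsOn) controls the order in which resources are created and deleted but does not change whether the EIP is released. Option A (Retain policy) would keep the EIP, causing charges.

Option B (custom resource) adds custom code for a release that CloudFormation already performs when the DeletionPolicy is Delete.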
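
For reference, a minimal template with the attribute set explicitly might look like the sketch below, written as a Python dictionary and printed as JSON. The logical IDs, AMI ID, and instance type are hypothetical placeholders.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AppInstance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-EXAMPLE",  # placeholder, not a real AMI ID
                "InstanceType": "t3.micro",
            },
        },
        "AppEip": {
            "Type": "AWS::EC2::EIP",
            # DeletionPolicy sits beside Type and Properties, not inside Properties.
            "DeletionPolicy": "Delete",
            "Properties": {
                "Domain": "vpc",
                "InstanceId": {"Ref": "AppInstance"},
            },
        },
    },
}

print(json.dumps(template, indent=2))
```

If an EIP survives stack deletion, checking whether the deployed template sets `DeletionPolicy` to `Retain` on that resource is a reasonable first step.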

673

Multi-Selectmedium

A company runs a microservices application on Amazon ECS with Fargate. The application includes a service that processes orders and stores them in an RDS PostgreSQL database. The company wants to ensure that the order service is resilient to AZ failures and can handle a sudden increase in order volume. Which TWO actions should the DevOps engineer take? (Choose TWO.)

Select 2 answers

A.Increase the CPU and memory limits for the ECS task definition.

B.Place an Amazon CloudFront distribution in front of the order service.

C.Deploy the RDS instance in a Multi-AZ configuration.

D.Configure the ECS service to run tasks in multiple Availability Zones.

E.Use RDS Proxy to manage database connections.

AnswersC, D

Multi-AZ RDS provides automatic failover to a standby in another AZ.

Why this answer

Option C is correct because deploying the RDS instance in a Multi-AZ configuration provides automatic failover to a standby replica in a different Availability Zone, ensuring database resilience to AZ failures. Option D is correct because configuring the ECS service to run tasks in multiple Availability Zones distributes the order processing workload across AZs, improving both fault tolerance and scalability during sudden traffic spikes.

Exam trap

The trap here is that candidates often confuse connection pooling (RDS Proxy) with high availability (Multi-AZ) or assume that vertical scaling (increasing task limits) is sufficient for both resilience and sudden load, when in fact horizontal distribution across AZs is required for fault tolerance and elasticity.

674

MCQhard

A team uses AWS CodePipeline to deploy a containerized application to Amazon ECS. The pipeline uses a source stage from CodeCommit, a build stage that builds a Docker image and pushes it to Amazon ECR, and a deploy stage that updates an ECS service. The team wants to add a manual approval step before the deploy stage to allow QA to verify the image. What is the BEST way to implement this?

A.Configure an AWS Lambda function in the pipeline that checks a DynamoDB table for approval status and pauses until approved.

B.Use an Amazon SNS topic to send a notification to QA, and have them manually trigger the deploy stage by clicking a link in the email.

C.Use Amazon CloudWatch Events to trigger a custom action that waits for an approval signal.

D.Add a manual approval stage in CodePipeline between the build and deploy stages, and configure SNS to notify approvers.

AnswerD

CodePipeline supports manual approval actions that pause the pipeline and notify approvers via SNS.

Why this answer

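Option D is correct because CodePipeline has a built-in manual approval action that can be added in a stage between build and deploy, with an SNS topic to notify approvers. Option B is incorrect because SNS only delivers the notification; it does not provide an approval mechanism. Option A is incorrect because a Lambda function that polls DynamoDB is a custom workaround for what the built-in approval action already does.

Option C is incorrect because CloudWatch Events cannot approve or pause a pipeline stage.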
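
A manual approval stage has a small, fixed shape in the pipeline structure. The sketch below builds one and places it ahead of the deploy stage; the stage names, topic ARN, message text, and the `insert_before` helper are hypothetical.

```python
def approval_stage(topic_arn: str, message: str) -> dict:
    """Build a stage containing a single manual approval action."""
    return {
        "name": "QA-Approval",
        "actions": [
            {
                "name": "ManualApproval",
                "actionTypeId": {
                    "category": "Approval",
                    "owner": "AWS",
                    "provider": "Manual",
                    "version": "1",
                },
                "runOrder": 1,
                "configuration": {
                    "NotificationArn": topic_arn,  # SNS topic that notifies approvers
                    "CustomData": message,         # text shown to the approver
                },
            }
        ],
    }


def insert_before(stages: list, new_stage: dict, before: str) -> list:
    """Return a new stage list with new_stage placed just ahead of the named stage."""
    index = next(i for i, s in enumerate(stages) if s["name"] == before)
    return stages[:index] + [new_stage] + stages[index:]


# Stage bodies are omitted here; only the ordering matters for this sketch.
stages = [{"name": "Source"}, {"name": "Build"}, {"name": "Deploy"}]
updated = insert_before(
    stages,
    approval_stage(
        "arn:aws:sns:us-east-1:111122223333:qa-approvals",
        "Verify the new image before deploy",
    ),
    "Deploy",
)
print([s["name"] for s in updated])  # ['Source', 'Build', 'QA-Approval', 'Deploy']
```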

675

Multi-Selectmedium

A company runs a containerized application on Amazon ECS with Fargate. The application experiences intermittent failures due to resource exhaustion. The company wants to improve resilience by automatically replacing unhealthy tasks and scaling based on demand. Which TWO actions should the company take? (Choose TWO.)

Select 2 answers

A.Enable cluster auto scaling to add more Fargate capacity.

B.Use an Auto Scaling group to manage the number of tasks.

C.Define a health check command in the task definition to restart unhealthy containers.

D.Configure ECS service auto scaling with a target tracking policy based on CPU utilization.

E.Set a minimum healthy percent of 50% and maximum percent of 200% in the service configuration.

AnswersD, E

Auto scaling adjusts task count to meet demand.

Why this answer

Option D is correct because ECS service auto scaling with a target tracking policy based on CPU utilization automatically adjusts the desired count of tasks to match demand, preventing resource exhaustion by scaling out when CPU is high and scaling in when low. This directly addresses the intermittent failures caused by resource exhaustion by ensuring sufficient task capacity during load spikes.

Exam trap

The trap here is confusing cluster auto scaling (which adds EC2 container instances and does not apply to Fargate) with ECS service auto scaling (which adjusts the number of tasks), and assuming that a health check command restarts a container in place, when the ECS service scheduler instead stops and replaces a task that fails its health check.
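
A target tracking policy for an ECS service is configured through Application Auto Scaling in two steps: register the service's desired count as a scalable target, then attach the policy. The sketch below builds both parameter sets; the cluster and service names, capacity bounds, 70% target, and cooldowns are hypothetical.

```python
def cpu_target_tracking(cluster: str, service: str, target_cpu: float = 70.0):
    """Build parameters for a scalable target and a CPU target tracking policy."""
    resource_id = f"service/{cluster}/{service}"
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": 2, "MaxCapacity": 10}
    policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,  # average CPU percent to hold
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 120,
        },
    }
    return target, policy


# Hypothetical cluster and service names, for illustration only.
target, policy = cpu_target_tracking("prod-cluster", "orders-service")
print(target)
print(policy)

# With boto3 installed and credentials configured, these could be applied with:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```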
