DOP-C02 Exam Questions and Answers

A company uses AWS CloudFormation to deploy a multi-tier web application. The template includes a nested stack for the database layer. When updating the stack, the database stack fails with a 'CREATE_FAILED' status, but the parent stack continues updating other resources. What is the most likely cause and best practice to prevent this?

The parent stack's update policy is set to 'CONTINUE' by default. To prevent this, set 'OnFailure' to 'ROLLBACK' in the stack update options.

Setting 'OnFailure' to 'ROLLBACK' during update ensures the entire stack rolls back if any resource fails, maintaining consistency.

The parent stack was created without the '--capabilities' parameter, so it cannot roll back.

The nested stack failure automatically triggers a rollback of the parent stack, but the rollback also failed.

The parent stack is configured with 'OnFailure' set to 'DO_NOTHING'. Change it to 'DELETE'.

Why: Option A is the keyed answer: it attributes the behavior to the parent stack continuing past the nested-stack failure, and it prescribes rolling back the entire parent stack if any resource (including a nested stack) fails, so the deployment stays consistent. In the CloudFormation API itself, `OnFailure` (`DO_NOTHING`, `ROLLBACK`, or `DELETE`) is a `CreateStack` parameter; a failed stack update rolls back by default unless rollback has been disabled.

A DevOps engineer manages infrastructure using Terraform. The team needs to store secrets such as database passwords in a secure manner and reference them in Terraform configurations. They have configured AWS Secrets Manager. What is the recommended approach to reference secrets in Terraform without exposing them in state files?

Store the secret ARN in a Terraform variable and use 'var.secret_arn' in the resource.

Store the secret in AWS Systems Manager Parameter Store and reference it using 'data.aws_ssm_parameter'.

Pass the secret as an environment variable to Terraform and reference it with 'var.secret_value'.

Use the 'data.aws_secretsmanager_secret_version' data source and mark the attribute as 'sensitive = true' in the output.

The data source retrieves the secret, and marking outputs as sensitive prevents them from being shown in CLI output and logs.

Why: Option D is the keyed answer because the `data.aws_secretsmanager_secret_version` data source reads the secret from Secrets Manager when Terraform runs, so the value never has to appear in configuration files or variable definitions. Marking the output as `sensitive = true` redacts the value from CLI output. Values read through a data source are still recorded in the Terraform state file, however, so the state itself must be protected, for example with an encrypted remote backend and tightly restricted access.

A company uses AWS OpsWorks to manage a set of EC2 instances. They need to ensure that a custom recipe runs on all instances during the 'Configure' lifecycle event. What is the correct way to achieve this?

Modify the stack's CloudFormation template to include the recipe.

Upload the recipe to a custom cookbook repository and assign it to the 'Configure' lifecycle event in the stack settings.

This is the standard way to run custom recipes on OpsWorks lifecycle events.

Add the recipe commands to the instance's user data script.

Use AWS CodeDeploy to trigger the recipe during the Configure event.

Why: In AWS OpsWorks, lifecycle events (such as Configure) are tied to layers, not individual instances. To run a custom recipe on all instances during the Configure event, you must upload the recipe to a custom cookbook repository (e.g., S3 or Git) and then assign that recipe to the Configure lifecycle event in the stack's layer settings. This ensures OpsWorks Chef runs the recipe on every instance in that layer whenever the Configure event fires (e.g., after scaling or instance state changes).

A DevOps team uses AWS CodePipeline to automate deployments. The pipeline has a Deploy stage that uses AWS CloudFormation to create or update a stack. Recently, a stack update failed because the template referenced an AMI that was deprecated. The team wants to automatically roll back the stack to the last known good state if a deployment fails. What should they do?

Configure the CloudFormation deployment action in CodePipeline with 'ActionMode' set to 'CREATE_UPDATE' and check the 'Rollback on failure' option.

CodePipeline's CloudFormation action supports automatic rollback on failure.

Use the CodePipeline console to enable 'Automatic rollback' for the Deploy stage.

Set the stack's 'DisableRollback' parameter to 'true' in the template.

Add a stack policy to the CloudFormation stack that denies updates to the AMI parameter.

Why: Option A is correct because the CloudFormation deployment action in CodePipeline supports a 'Rollback on failure' option when 'ActionMode' is set to 'CREATE_UPDATE'. When enabled, if the stack update fails, CloudFormation automatically rolls back the stack to the last known good state (the previously deployed stack). This directly addresses the team's requirement to revert to a stable state after a failed deployment due to a deprecated AMI.

An organization uses AWS Elastic Beanstalk for application deployments. They want to implement immutable updates to minimize downtime and ensure that if the new environment fails health checks, the old environment remains intact. Which deployment policy should they choose?

Traffic splitting.

Immutable update.

Immutable updates launch new instances in a separate, temporary Auto Scaling group and only put them into service once they are healthy.

All at once.

Rolling update based on health.

Why: Immutable updates in AWS Elastic Beanstalk launch a fresh set of instances running the new application version in a temporary Auto Scaling group alongside the existing instances. If the new instances fail health checks, Elastic Beanstalk terminates them and leaves the original instances untouched; if they pass, they are moved into the environment's permanent Auto Scaling group and the old instances are terminated. This avoids downtime and gives a safe rollback, which matches the requirement to keep the old deployment intact if health checks fail.

A developer wants to use AWS CloudFormation to create an Amazon RDS DB instance. The template includes a DB instance resource. Which property is required for the DB instance to be created successfully?

DBInstanceClass and Engine

These are required properties for the DB instance resource.

AllocatedStorage

DBInstanceIdentifier

MasterUsername and MasterUserPassword

Why: In AWS CloudFormation, when creating an Amazon RDS DB instance using the AWS::RDS::DBInstance resource, the only truly required properties are DBInstanceClass (the compute and memory capacity) and Engine (the database engine, e.g., MySQL, PostgreSQL). These two properties are mandatory in the CloudFormation resource specification; without them, the template will fail validation. All other properties, such as AllocatedStorage, DBInstanceIdentifier, MasterUsername, and MasterUserPassword, have default behaviors or can be omitted under certain conditions (e.g., AllocatedStorage defaults to 20 GB for some engines, and MasterUsername/MasterUserPassword are not required if you use a snapshot or a source DB instance).

Domain 2: Resilient Cloud Solutions

A company runs a critical web application on EC2 instances behind an Application Load Balancer (ALB) with Auto Scaling. During a recent traffic spike, the application became unavailable for 10 minutes. Analysis shows that the ALB's healthy host count dropped to zero because the instances failed health checks due to high CPU load. What is the MOST effective design change to improve resilience during future traffic spikes?

Use predictive scaling with a scheduled scaling policy for known peak times.

Predictive scaling anticipates demand and scales out in advance, preventing overload.

Increase the instance size to handle higher load.

Configure step scaling policies based on CPU utilization.

Set a higher CPU threshold for health checks.

Why: Predictive scaling uses historical traffic data to forecast future demand and proactively adjust capacity before a spike occurs. This prevents the CPU from reaching critical levels that cause health check failures, ensuring the ALB always has healthy hosts. Scheduled scaling alone would not adapt to unexpected spikes, but predictive scaling combined with dynamic scaling provides both proactive and reactive resilience.

A company uses DynamoDB global tables in two AWS Regions with strong consistency reads. They observe occasional write conflicts that are not being resolved automatically. The application uses DynamoDBMapper with optimistic locking. What should the DevOps engineer do to ensure conflict resolution?

Implement a custom conflict resolution using DynamoDB Streams and AWS Lambda.

Switch to eventual consistency reads to reduce conflicts.

Add a third global table region to increase redundancy.

Use conditional writes with a version number attribute to ensure updates are applied only to the latest version.

Conditional writes with versioning enable optimistic locking, allowing only the latest version to be updated, which aligns with LWW.

Why: Option D is correct because DynamoDB global tables use last-writer-wins (LWW) for conflict resolution by default, but when using DynamoDBMapper with optimistic locking, the application must implement conditional writes with a version number attribute to ensure updates are applied only to the latest version. This prevents stale updates from overwriting newer data, as the conditional write will fail if the version number in the request does not match the current version in the table, allowing the application to retry with the updated version.
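
The version check described above can be sketched as the arguments for a DynamoDB `UpdateItem` call. This is a minimal illustration, not the application's actual code: the table name `Orders`, the key `order_id`, and the `status` and `version` attributes are all hypothetical, and the function only builds the request so it can be inspected without AWS access.

```python
from typing import Any, Dict


def build_versioned_update(table: str, key: Dict[str, Any],
                           new_status: str, expected_version: int) -> Dict[str, Any]:
    """Build UpdateItem arguments that succeed only if the stored version matches."""
    return {
        "TableName": table,
        "Key": key,
        # Write the new value and bump the version in one atomic update.
        "UpdateExpression": "SET #s = :status, #v = :next",
        # Reject the write if another writer has already changed the version.
        "ConditionExpression": "#v = :expected",
        # Placeholders avoid clashes with DynamoDB reserved words such as "status".
        "ExpressionAttributeNames": {"#s": "status", "#v": "version"},
        "ExpressionAttributeValues": {
            ":status": {"S": new_status},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }


# Hypothetical item: the caller last read version 4 of order o-123.
params = build_versioned_update(
    "Orders", {"order_id": {"S": "o-123"}}, "SHIPPED", expected_version=4
)
```

With boto3, these arguments would be passed as `client.update_item(**params)`; a stale version raises `ConditionalCheckFailedException`, at which point the caller re-reads the item and retries.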

A company's application runs on EC2 instances in a single Availability Zone. The operations team wants to improve resilience without redesigning the application. Which action is the MOST effective?

Use a larger instance type to handle more traffic.

Enable EC2 Auto Recovery to automatically restart the instance if it fails.

Deploy EC2 instances across multiple Availability Zones using an Auto Scaling group.

Multi-AZ deployment ensures application availability even if one AZ fails.

Place the instance in a placement group to ensure low latency.

Why: Deploying EC2 instances across multiple Availability Zones (AZs) using an Auto Scaling group is the most effective action because it eliminates the single point of failure at the AZ level. If one AZ experiences an outage, the Auto Scaling group automatically launches replacement instances in the remaining healthy AZs, ensuring application availability without requiring any application-level changes. This directly addresses the goal of improving resilience by leveraging AWS's fault-isolated infrastructure.

A company uses a third-party backup solution to back up its EC2 instances daily. The backups are stored in an S3 bucket with default settings. The company wants to ensure that backups are protected from accidental deletion and are available for at least one year. Which combination of S3 features should the DevOps engineer implement?

Enable MFA Delete and set a lifecycle policy to transition to S3 Glacier after 30 days.

Enable versioning and set a lifecycle policy to expire noncurrent versions after 365 days.

Enable cross-Region replication to a bucket with versioning enabled.

Enable S3 Object Lock with Governance mode and a retention period of 365 days, and set a lifecycle policy to transition to S3 Glacier Deep Archive after 30 days.

Object Lock prevents deletion during the retention period, and lifecycle transition reduces costs.

Why: Option D is correct because S3 Object Lock in Governance mode blocks deletion or overwrite of locked object versions for the 365-day retention period unless a user holds the `s3:BypassGovernanceRetention` permission and explicitly bypasses the lock. (Compliance mode is the variant that no user, including the root user, can override.) This protects against accidental deletion and meets the one-year availability requirement. The lifecycle policy to transition to S3 Glacier Deep Archive after 30 days reduces storage costs while still keeping the data retrievable within 12 hours, which is acceptable for backup retention.
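
The two settings in the correct option can be sketched as the configuration documents that the S3 API expects. This is an illustrative sketch only: the rule ID is a made-up name, and the functions build the dictionaries without calling AWS.

```python
from typing import Any, Dict


def object_lock_config(days: int, mode: str = "GOVERNANCE") -> Dict[str, Any]:
    """Default retention rule applied to new objects in an Object Lock bucket."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def deep_archive_lifecycle(after_days: int) -> Dict[str, Any]:
    """Lifecycle rule that moves every object to Glacier Deep Archive."""
    return {
        "Rules": [
            {
                "ID": "backups-to-deep-archive",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Transitions": [
                    {"Days": after_days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


lock = object_lock_config(365)
lifecycle = deep_archive_lifecycle(30)
```

With boto3, `lock` would be the `ObjectLockConfiguration` argument of `put_object_lock_configuration` and `lifecycle` the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.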

A company runs a stateful web application on EC2 instances behind a Network Load Balancer (NLB) in a single Availability Zone. The application stores session state locally on the instance. The company wants to achieve high availability across multiple AZs with minimal application changes. What should the DevOps engineer do?

Add more AZs and configure the NLB with cross-zone load balancing.

Replace the NLB with an ALB and use ElastiCache for session storage.

Use a Multi-AZ RDS instance to store session state.

Replace the NLB with an ALB and enable sticky sessions (session affinity) using the ALB's cookie.

Sticky sessions ensure that requests from the same client are routed to the same instance, preserving local session state.

Why: Option D is correct because replacing the NLB with an ALB and enabling sticky sessions (session affinity) using the ALB's cookie allows the stateful web application to maintain session state across multiple AZs without modifying the application code. The ALB generates a cookie (AWSALB) that binds a client's session to a specific target instance, ensuring subsequent requests from the same client are routed to the same EC2 instance. This achieves high availability across AZs with minimal changes, as the application continues to store session state locally on the instance.
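
Sticky sessions are switched on through target group attributes. The sketch below builds the attribute list for load-balancer-generated cookies; the one-hour default duration is an arbitrary choice for illustration.

```python
from typing import Dict, List


def stickiness_attributes(duration_seconds: int = 3600) -> List[Dict[str, str]]:
    """Target group attributes that enable ALB duration-based stickiness."""
    # The load balancer cookie may last from 1 second up to 7 days.
    if not 1 <= duration_seconds <= 604800:
        raise ValueError("duration must be between 1 and 604800 seconds")
    return [
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds",
         "Value": str(duration_seconds)},
    ]


attrs = stickiness_attributes()
```

With boto3, the list would be passed as `Attributes` to `modify_target_group_attributes` on the `elbv2` client. Note that stickiness keeps a client on one instance but does not preserve the session if that instance is lost.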

A company's DevOps team is designing a disaster recovery plan for a critical application. The application runs on EC2 instances with an RDS MySQL database. The Recovery Time Objective (RTO) is 15 minutes, and the Recovery Point Objective (RPO) is 1 hour. Which approach BEST meets these requirements?

Use backup and restore with daily snapshots stored in S3 and cross-Region replication.

Use a multi-Region application with Route 53 latency-based routing and RDS read replicas in the DR Region.

Use a warm standby strategy with a scaled-down copy of the production environment in the DR Region, and replicate data using RDS Multi-AZ with synchronous replication.

Warm standby allows quick failover; synchronous replication meets RPO of 1 hour.

Use a pilot light strategy with EC2 instances stopped and RDS snapshots copied to the DR Region.

Why: Option C is the keyed answer because a warm standby strategy keeps a scaled-down copy of the production environment running in the DR Region, so it can be scaled up quickly to handle production traffic and satisfy the 15-minute RTO. RDS Multi-AZ synchronous replication provides near-zero data loss and automatic failover within minutes, comfortably inside the 1-hour RPO. Multi-AZ replication stays within a single Region, so the database copy in the DR Region still depends on a cross-Region mechanism such as a read replica or copied snapshots.

Domain 3: Monitoring and Logging

A company is running a critical web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The DevOps team wants to monitor HTTP 5xx errors and receive alerts when the error rate exceeds 5% over a 5-minute period. Which combination of services and configurations should be used to meet these requirements?

Enable CloudWatch Logs for the ALB and use CloudWatch Logs Insights to query 5xx logs, then create a metric filter and alarm.

Configure AWS Config rules to check ALB 5xx error counts and trigger alarms.

Use CloudWatch ALB metrics (HTTPCode_ELB_5XX_Count) and create a CloudWatch Alarm on the Sum statistic with a threshold based on total request count.

Correct: ALB publishes HTTP 5xx metrics to CloudWatch, and alarms can be set on these metrics.

Use AWS X-Ray to trace requests and create a CloudWatch alarm based on X-Ray error rate.

Why: Option C is correct because ALB automatically publishes the `HTTPCode_ELB_5XX_Count` metric to CloudWatch, and you can create a CloudWatch alarm using the `Sum` statistic over a 5-minute period. To detect when the error rate exceeds 5%, combine this metric with the `RequestCount` metric in a metric math expression (e.g., `100 * m1 / m2`) and alarm when the result is greater than 5, because the threshold must apply to the ratio of 5xx errors to total requests, not to the raw count.
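
The ratio-based alarm can be sketched as the arguments for a CloudWatch `PutMetricAlarm` call that uses metric math. This is one possible shape, not a definitive setup: the alarm name and the load balancer dimension value are placeholders, and treating missing data as not breaching is an assumption.

```python
from typing import Any, Dict


def error_rate_alarm(load_balancer: str, threshold_pct: float = 5.0) -> Dict[str, Any]:
    """Alarm arguments: 5xx errors as a percentage of requests over 5 minutes."""

    def stat(metric_id: str, metric_name: str) -> Dict[str, Any]:
        # Input metric: summed over a 300-second period, not alarmed on directly.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "LoadBalancer", "Value": load_balancer}
                    ],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": "alb-5xx-error-rate",  # hypothetical name
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "Threshold": threshold_pct,
        "TreatMissingData": "notBreaching",  # assumption: no traffic is not an error
        "Metrics": [
            stat("m1", "HTTPCode_ELB_5XX_Count"),
            stat("m2", "RequestCount"),
            # Only the expression returns data, so the alarm evaluates the ratio.
            {"Id": "e1", "Expression": "100 * m1 / m2",
             "Label": "5xx error rate (%)", "ReturnData": True},
        ],
    }


# Placeholder dimension value in the app/<name>/<id> form used by ALB metrics.
alarm = error_rate_alarm("app/my-alb/0123456789abcdef")
```

With boto3, the dictionary would be passed as `cloudwatch.put_metric_alarm(**alarm)`, typically with an `AlarmActions` entry pointing at an SNS topic.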

A DevOps team is using Amazon CloudWatch Logs to collect application logs from multiple EC2 instances. They notice that some log entries are missing and that the CloudWatch agent is consuming high CPU. The log group has a retention policy of 30 days. Which action should the team take to reduce CPU usage without losing log data?

Increase the batch size in the CloudWatch agent configuration.

Correct: Larger batch size reduces API calls and CPU usage.

Use JSON format for logs instead of plain text.

Set the agent's timezone to UTC.

Change the log group retention policy to 7 days.

Why: Increasing the batch size in the CloudWatch agent configuration reduces the number of HTTP API calls made to CloudWatch Logs, which lowers CPU overhead from frequent network I/O and serialization. The agent buffers log events and sends them in larger, less frequent batches, directly addressing high CPU consumption without discarding any log data.

A company wants to monitor the number of messages in an Amazon SQS queue and send an alert if the queue depth exceeds 1000 for more than 5 minutes. Which AWS service should be used to create the alarm?

Amazon EventBridge

Amazon CloudWatch Alarms

Correct: CloudWatch Alarms monitor metrics and trigger actions.

AWS X-Ray

Amazon CloudWatch Logs

Why: Amazon CloudWatch Alarms is the correct service because it can monitor SQS queue metrics (such as ApproximateNumberOfMessagesVisible) and trigger an alarm when the metric exceeds a threshold (e.g., 1000) for a specified evaluation period (e.g., 5 minutes). CloudWatch Alarms directly integrate with SQS via the AWS/SQS namespace and support actions like sending notifications through Amazon SNS.
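
The "above 1000 for 5 minutes" condition can be illustrated with a simplified model of alarm evaluation, assuming one datapoint per minute and five consecutive breaching datapoints. This is a sketch of the idea, not CloudWatch's actual evaluation engine.

```python
from typing import List


def queue_depth_alarm_state(datapoints: List[float],
                            threshold: float = 1000,
                            periods: int = 5) -> str:
    """Return ALARM when the last `periods` datapoints all exceed the threshold."""
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# One-minute samples of ApproximateNumberOfMessagesVisible (made-up values).
sustained = queue_depth_alarm_state([900, 1100, 1200, 1300, 1250, 1400])
brief_dip = queue_depth_alarm_state([1100, 1200, 900, 1300, 1400])
```

A single dip below the threshold resets the condition, which is why a short burst does not trigger the alarm.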

A company is using Amazon CloudWatch Synthetics canaries to monitor its web application endpoints. The canaries are deployed in multiple AWS regions. The team wants to aggregate the canary results into a single dashboard in the US East (N. Virginia) region. What is the MOST efficient way to achieve this?

Replicate the canaries to US East (N. Virginia) and run them from there.

Create a cross-region CloudWatch dashboard and add metrics from each region using metric math.

Correct: Cross-region dashboards natively support displaying metrics from different regions.

Set up a Lambda function in each region to push canary results to a central S3 bucket, then create a dashboard from S3.

Create a CloudWatch Logs Insights query across all regions and visualize results.

Why: Option B is correct because CloudWatch cross-region dashboards allow you to aggregate metrics from multiple regions into a single dashboard without data movement. By using metric math, you can reference metric IDs from different regions directly in the dashboard widget, enabling real-time aggregation of Synthetics canary success/failure rates and latency metrics from all regions into a unified view in US East (N. Virginia). This approach avoids unnecessary data replication, reduces latency, and minimizes operational overhead.

A DevOps team is troubleshooting a slow application. They enabled AWS X-Ray tracing and see that one of the downstream services has a high average response time. However, the traces show that the service itself is fast; the delay is in the network call from the upstream service. Which X-Ray feature should the team use to identify the root cause?

Examine the trace map to see the connection between services.

Correct: The trace map visualizes service connections and latency.

Add annotations to the traces for better filtering.

View the raw segments of the upstream service.

Adjust the sampling rules to capture more traces.

Why: The trace map in AWS X-Ray provides a visual representation of the service graph, showing the connections and latency between services. Since the delay is in the network call from the upstream service to the downstream service, the trace map can highlight the specific edge where the high latency occurs, allowing the team to pinpoint whether the issue is due to network congestion, DNS resolution, or a slow HTTP connection. This is the most direct way to identify the root cause of the inter-service communication delay.

A company needs to monitor the CPU utilization of its Amazon RDS for PostgreSQL instance. The metric should be available in Amazon CloudWatch with a granularity of 1 minute. Which action should the team take?

Install the CloudWatch agent on the RDS instance.

Enable Enhanced Monitoring for the RDS instance.

No additional configuration is needed; RDS automatically sends metrics to CloudWatch.

Correct: RDS publishes CPU utilization to CloudWatch by default.

Enable Performance Insights for the RDS instance.

Why: Amazon RDS for PostgreSQL automatically publishes metrics, including CPU utilization, to CloudWatch with a default granularity of 1 minute for standard instances. No additional configuration is required to enable this basic monitoring. The metrics are collected by the RDS hypervisor layer and sent to CloudWatch without needing an agent or extra setup.

Domain 4: Incident and Event Response

A company uses an Auto Scaling group with a dynamic scaling policy based on a custom CloudWatch metric. After a recent deployment, the metric spikes unexpectedly, causing the Auto Scaling group to launch several EC2 instances. The operations team wants to quickly determine whether the spike was caused by a real load increase or a deployment issue. What is the MOST efficient way to investigate this?

Check the SNS topic that the scaling policy publishes to for notifications.

Use CloudWatch Logs Insights to query application logs for error patterns or deployment markers that coincide with the metric spike.

CloudWatch Logs Insights allows querying logs to find patterns related to the spike.

Use AWS CloudTrail to review API calls that modified the scaling policy.

Temporarily disable the scaling policy and manually increase the desired capacity to handle the load.

Why: Option B is correct because CloudWatch Logs Insights allows you to query application logs for error patterns or deployment markers (e.g., new version tags, exception stack traces) that coincide with the metric spike. This directly correlates the scaling event with application-level evidence, enabling rapid root-cause analysis without altering infrastructure or relying on indirect notifications.
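
A query of this kind might be prepared as follows. The sketch builds the arguments for a Logs Insights `StartQuery` call around the time of the spike; the log group name, the search pattern, and the 15-minute window on each side are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Count log lines per minute that mention errors, exceptions, or deployments.
QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /(?i)(error|exception|deploy)/ "
    "| stats count(*) as matches by bin(1m)"
)


def spike_query_args(log_group: str, spike_time: datetime,
                     window_minutes: int = 15) -> Dict[str, Any]:
    """StartQuery arguments covering a window on each side of the spike."""
    half = timedelta(minutes=window_minutes)
    return {
        "logGroupName": log_group,
        # The API takes start and end times as epoch seconds.
        "startTime": int((spike_time - half).timestamp()),
        "endTime": int((spike_time + half).timestamp()),
        "queryString": QUERY,
    }


# Hypothetical log group and spike time.
spike = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
args = spike_query_args("/app/web", spike)
```

With boto3, `logs.start_query(**args)` returns a query ID whose results are then fetched with `get_query_results`.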

A company runs a critical application on Amazon ECS with Fargate launch type. The application uses an Application Load Balancer (ALB) in front. During a load test, the team notices a sudden increase in 5xx errors from the ALB, and some tasks become unhealthy. The task logs show occasional 'OutOfMemoryError' exceptions. The task definition currently has 512 CPU units and 1024 MiB memory. What should the team do to mitigate the issue while maintaining a cost-effective approach?

Increase the task definition CPU to 1024 units and memory to 2048 MiB.

Increase the task definition memory to 2048 MiB while keeping CPU at 512 units.

This directly addresses the memory error without wasting resources on extra CPU.

Configure the ECS service to use a rolling update with a longer health check grace period.

Decrease the task definition memory to 512 MiB to force garbage collection more frequently.

Why: Option B is correct because the application is experiencing OutOfMemoryError, indicating the current 1024 MiB memory allocation is insufficient. Increasing memory to 2048 MiB while keeping CPU at 512 units directly resolves the memory constraint without unnecessary CPU cost. ECS Fargate allows independent scaling of CPU and memory within valid combinations, and this change maintains a cost-effective approach by only increasing the resource that is actually constrained.
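
The "valid combinations" constraint can be checked with a small lookup. The sketch below covers only the task sizes from 256 to 4096 CPU units (larger sizes exist and are omitted here), so treat it as a partial table rather than a complete reference.

```python
from typing import List


def valid_memory_options(cpu_units: int) -> List[int]:
    """Memory sizes (MiB) that Fargate accepts for a given CPU value."""
    if cpu_units == 256:
        return [512, 1024, 2048]
    # For larger sizes, memory is allowed in 1024 MiB steps within a range.
    ranges = {
        512: (1024, 4096),
        1024: (2048, 8192),
        2048: (4096, 16384),
        4096: (8192, 30720),
    }
    if cpu_units not in ranges:
        raise ValueError(f"CPU value {cpu_units} is not covered by this sketch")
    low, high = ranges[cpu_units]
    return list(range(low, high + 1, 1024))


options_for_512 = valid_memory_options(512)
```

Because 2048 MiB appears in the list for 512 CPU units, the memory can be doubled without paying for additional CPU.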

A DevOps engineer is investigating an incident where an EC2 instance became unreachable. The engineer checks the AWS Management Console and finds the instance is running, but the status check shows '2/2 checks passed' and the system log shows no errors. What should the engineer do NEXT to diagnose the connectivity issue?

Review the CloudWatch metrics for CPU utilization and network throughput.

Reboot the instance to reset the network interface.

Stop and start the instance to move it to new underlying hardware.

Check the security group and network ACL rules to ensure inbound traffic is allowed.

Connectivity issues often stem from network permissions.

Why: Since the instance is running, status checks pass, and the system log shows no errors, the issue is not with the operating system or underlying hardware. The most likely cause is a network-layer restriction, such as security group or network ACL rules blocking inbound traffic. Checking these rules is the correct next step because they control traffic at the instance and subnet levels, respectively, and misconfigurations here are a common cause of unreachability despite healthy instance status.

A company has an AWS Lambda function that processes S3 events. The function is invoked multiple times for the same S3 object, causing duplicate processing. The engineer suspects the issue is related to retries from the S3 event notification or Lambda's built-in retry behavior. What is the MOST effective way to ensure idempotent processing?

Modify the S3 bucket event notification configuration to use a prefix filter that excludes duplicate objects.

Use a DynamoDB table to store a record of processed S3 object keys and check for existence before processing.

This pattern ensures idempotency by tracking processed objects.

Set the Lambda function's ReservedConcurrency to 1 to prevent concurrent executions.

Use an Amazon SQS FIFO queue as the event source and enable content-based deduplication.

Why: Option B is correct because storing processed S3 object keys in a DynamoDB table and checking for existence before processing ensures idempotency at the application level. This approach directly handles duplicate invocations caused by S3 event retries or Lambda's built-in retry behavior, as the function can conditionally skip processing if the key already exists in DynamoDB. It provides a durable, consistent, and scalable mechanism to prevent duplicate processing regardless of how many times the function is invoked for the same object.
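
The check-before-processing pattern can be sketched with an in-memory stand-in for the DynamoDB table. The class and function names are hypothetical; in a real function the `claim` step would be a DynamoDB `PutItem` with a condition such as `attribute_not_exists(object_key)` so that the check and the write happen atomically.

```python
from typing import Any, Callable, Dict, List, Set


class ProcessedKeys:
    """In-memory stand-in for the DynamoDB table of processed object keys."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def claim(self, object_key: str) -> bool:
        """Return True only for the first caller to claim this key."""
        if object_key in self._seen:
            return False
        self._seen.add(object_key)
        return True


def handle_event(event: Dict[str, Any], store: ProcessedKeys,
                 process: Callable[[str], None]) -> List[str]:
    """Process each S3 record once; skip keys that were already claimed."""
    processed = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if store.claim(key):
            process(key)
            processed.append(key)
    return processed


store = ProcessedKeys()
event = {"Records": [{"s3": {"object": {"key": "a.csv"}}}]}
first = handle_event(event, store, lambda key: None)
second = handle_event(event, store, lambda key: None)  # duplicate delivery
```

The second invocation with the same event does no work, which is the behavior needed when the function is retried or invoked more than once for the same object.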

An organization uses AWS CloudFormation to manage infrastructure. During an incident, a stack update fails with 'UPDATE_ROLLBACK_FAILED' status. The engineer needs to bring the stack to a consistent state without losing data. What is the BEST approach?

Use the 'ContinueUpdateRollback' API to skip the resource that caused the failure.

This is the designed method to resolve rollback failures.

Create a new stack from the same template and migrate resources.

Manually correct the resource configuration that caused the failure, then perform a stack update.

Delete the stack and then recreate it from the same template.

Why: The 'ContinueUpdateRollback' API is the best approach because it allows the stack to resume the rollback process, skipping the resource that caused the failure, and bringing the stack to a consistent 'UPDATE_ROLLBACK_COMPLETE' state without manual intervention or data loss. This API is specifically designed for the 'UPDATE_ROLLBACK_FAILED' status, enabling you to skip resources that cannot be rolled back (e.g., due to a non-reversible change) while preserving the rest of the stack's state.
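
The call itself takes a stack name and an optional list of logical resource IDs to skip. A minimal sketch of the arguments, with a hypothetical stack and resource name:

```python
from typing import Any, Dict, List


def continue_rollback_args(stack_name: str, skip: List[str]) -> Dict[str, Any]:
    """Arguments for ContinueUpdateRollback, skipping resources that cannot roll back."""
    args: Dict[str, Any] = {"StackName": stack_name}
    if skip:
        # Logical IDs of resources currently in UPDATE_FAILED state.
        args["ResourcesToSkip"] = list(skip)
    return args


# Hypothetical stack name and logical resource ID.
args = continue_rollback_args("web-app-prod", ["AppLaunchConfig"])
```

With boto3, this would be passed as `cloudformation.continue_update_rollback(**args)`. Skipped resources are marked as complete without being changed, so their real state may differ from the template and should be reconciled afterwards.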

A company uses Amazon RDS for MySQL with Multi-AZ deployment. The database instance fails and AWS automatically fails over to the standby. After the failover, the application cannot connect to the database. The engineer checks the RDS console and sees that the instance status is Available. What is the MOST likely cause of the connectivity issue?

The security group for the RDS instance has changed during failover.

The application is using the database's DNS endpoint for the old primary, which is no longer the writer.

After failover, the writer endpoint points to the new primary, but if the application caches the old endpoint, it may fail.

The DNS record for the RDS endpoint has not propagated to the application's DNS resolver.

The database instance is still in the process of failover and is not yet accepting connections.

Why: After an RDS Multi-AZ failover, the DNS endpoint for the DB instance remains the same but its underlying IP address changes to point to the new primary (formerly the standby). If the application caches the IP address of the old primary or uses a direct connection to the old writer endpoint, it will attempt to connect to a node that is no longer the writer. The correct practice is to always connect using the RDS instance endpoint (CNAME), which automatically resolves to the current writer, and to avoid caching the resolved IP address. Since the instance status is 'Available', the new primary is ready, so the issue is a stale connection target.

Domain 5: Security and Compliance

A company is using AWS Organizations with multiple accounts. The Security team wants to centrally manage IAM roles that can be assumed by users in member accounts. Which solution should be used to enforce that only specific roles can be assumed across accounts, while ensuring that the policy updates are automatically applied to all accounts?

Create an IAM role in each member account with a trust policy that allows the Security account, and use AWS CloudFormation StackSets to deploy the roles.

Use AWS Single Sign-On (SSO) to assign permissions to users across accounts.

Create an IAM role in the Security account with a trust policy that references a service control policy (SCP) in AWS Organizations.

SCPs can restrict IAM actions across accounts, and the trust policy can reference the SCP to enforce central control.

Create a resource-based policy on each IAM role in the member accounts that allows the Security account to assume the role.

Why: Option C is correct because it leverages AWS Organizations and Service Control Policies (SCPs) to centrally enforce which IAM roles can be assumed across member accounts. An SCP applied to an OU or account can explicitly deny the `sts:AssumeRole` action for any role that does not match a specific ARN pattern, ensuring that only the Security account's designated roles are assumable. Since SCPs are automatically inherited by all accounts in the organization, policy updates are applied without manual intervention.

A company is running a critical application on an Amazon EC2 instance that needs to access an S3 bucket. The application must use temporary credentials that automatically rotate. The DevOps engineer must ensure that the credentials are never stored on disk. Which approach meets these requirements?

Store the credentials in AWS Secrets Manager and retrieve them at application startup.

Attach an IAM role to the EC2 instance and use the instance profile to obtain temporary credentials from the instance metadata service.

Instance profiles provide temporary credentials that are automatically rotated and never stored on disk.

Use AWS Systems Manager Parameter Store to store the credentials and retrieve them using the EC2 instance's IAM role.

Generate an access key and secret key for an IAM user and store them in a configuration file on the EC2 instance.

Why: Option B is correct because attaching an IAM role to the EC2 instance and using the instance profile allows the application to obtain temporary credentials from the EC2 instance metadata service (IMDS). These credentials are automatically rotated by AWS before they expire, and they are never stored on disk—they are fetched on-demand from the metadata endpoint (http://169.254.169.254/latest/meta-data/iam/security-credentials/). This satisfies both the requirement for automatic rotation and the prohibition against disk storage.

A DevOps engineer needs to ensure that all API calls made to AWS are recorded for auditing purposes. Which AWS service should be used?

AWS CloudTrail

CloudTrail records all AWS API calls for auditing.

AWS Config

Amazon CloudWatch Logs

Amazon VPC Flow Logs

Why: AWS CloudTrail is the correct service because it records AWS API activity as events, including the identity of the caller, the time of the call, the source IP address, and the request parameters. Management events are logged by default; data events, such as S3 object-level operations, are recorded once a trail or event data store is configured to capture them. This provides the audit trail of user activity and API usage needed for auditing, security analysis, and compliance.
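To make the audit fields concrete, here is a minimal sketch that pulls the "who, when, what, from where" out of a single CloudTrail event record. The field names (`userIdentity`, `eventTime`, `eventSource`, `eventName`, `sourceIPAddress`, `requestParameters`) are the ones CloudTrail uses; the helper name and the sample record are made up for illustration.

```python
def summarize_event(record):
    """Reduce one CloudTrail event record to the fields an auditor usually asks for."""
    identity = record.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "when": record["eventTime"],
        "what": f'{record["eventSource"]}:{record["eventName"]}',
        "from": record.get("sourceIPAddress"),
        "params": record.get("requestParameters"),
    }


# Hypothetical record, trimmed to the fields used above.
sample_record = {
    "eventTime": "2030-01-01T12:00:00Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "DeleteBucket",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::111122223333:assumed-role/ops/alice",
    },
    "requestParameters": {"bucketName": "example-bucket"},
}
summary = summarize_event(sample_record)
```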

A company uses AWS Key Management Service (KMS) to encrypt data at rest in Amazon S3. The security team wants to ensure that only users with a specific attribute in their SAML assertion can decrypt the data. Which KMS key policy should be used?

Create an S3 bucket policy that denies kms:Decrypt unless the request includes a specific tag.

Modify the KMS key policy to include a condition that allows kms:Decrypt only if the SAML assertion contains the specific attribute.

KMS key policies can use conditions based on SAML attributes to control decryption.

Attach a resource-based policy to the S3 bucket that allows decryption only for users with the specific attribute.

Use an IAM policy that grants kms:Decrypt only if the user has the specific attribute.

Why: Option B is correct because a KMS key policy is evaluated on every `kms:Decrypt` call and can carry a `Condition` block. SAML-specific keys such as `saml:sub` are only available in a role's trust policy during `AssumeRoleWithSAML`, so the usual way to carry an assertion attribute through to KMS is as a session tag: the identity provider sends the attribute as `https://aws.amazon.com/SAML/Attributes/PrincipalTag:<key>`, the role's trust policy allows `sts:TagSession`, and the key policy then tests `aws:PrincipalTag/<key>`. Only sessions whose SAML assertion carried that attribute with the expected value are allowed to call `kms:Decrypt`. This enforces the security team's requirement at the key itself rather than in an S3 bucket policy, which cannot control KMS actions.
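A key policy statement along these lines might look like the sketch below, which builds the statement as a Python dict. It assumes the SAML attribute is passed as a session tag and checked with `aws:PrincipalTag`; the role ARN, the tag key `Department`, the value `Finance`, and the function name are hypothetical.

```python
import json


def decrypt_statement(role_arn, tag_key, tag_value):
    """Key policy statement: allow kms:Decrypt only for sessions carrying the tag."""
    return {
        "Sid": "AllowDecryptForTaggedSessions",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",  # in a key policy, "*" means this key
        "Condition": {
            "StringEquals": {f"aws:PrincipalTag/{tag_key}": tag_value}
        },
    }


# Hypothetical federated role and tag; the tag originates from the SAML assertion.
statement = decrypt_statement(
    "arn:aws:iam::111122223333:role/FederatedAnalysts", "Department", "Finance"
)
policy_json = json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)
```

The statement would sit alongside the key's administrative statements; a session without the tag simply does not match this `Allow`.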

A company has a requirement to rotate database credentials every 30 days for an Amazon RDS for MySQL instance. The credentials are currently stored in AWS Secrets Manager. The DevOps engineer needs to implement automatic rotation without modifying the application code. Which solution should be used?

Create a scheduled job that runs every 30 days to update the secret in Secrets Manager with a new password.

Store the credentials in AWS Systems Manager Parameter Store and configure automatic rotation using a Lambda function.

Use the AWS RDS automatic password rotation feature, which automatically updates the password every 30 days.

Configure Secrets Manager to automatically rotate the secret every 30 days using a Lambda rotation function, and have the application retrieve the secret using the Secrets Manager API.

Secrets Manager provides built-in rotation for RDS with a Lambda function, and the application can retrieve credentials on-the-fly.

Why: Option D is correct because AWS Secrets Manager natively supports automatic rotation of secrets using a Lambda function that updates both the secret in Secrets Manager and the password on the RDS for MySQL instance. This meets the 30-day rotation requirement without modifying application code, because the application retrieves the secret through the Secrets Manager API at run time and always receives the version staged as `AWSCURRENT`. Caching, where used, is handled on the client side rather than by the API.

A company uses AWS Organizations to manage multiple accounts. The Security team wants to prevent member accounts from disabling AWS CloudTrail or deleting CloudTrail log files. Which TWO actions should the Security team take in the organization's management account? (Choose TWO.)

Create an SCP to deny cloudtrail:UpdateTrail.

Create an IAM policy in each member account to deny cloudtrail:StopLogging.

Create an SCP to deny s3:DeleteObject on the CloudTrail log bucket.

This prevents deletion of log files.

Enable AWS CloudTrail from the management account with organization trail.

Create an SCP to deny cloudtrail:StopLogging and cloudtrail:DeleteTrail.

This prevents disabling or deleting the trail.

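Why: Options C and E are correct. An SCP that denies `cloudtrail:StopLogging` and `cloudtrail:DeleteTrail` prevents principals in member accounts from disabling or removing the trail, and an SCP that denies `s3:DeleteObject` on the CloudTrail log bucket prevents them from deleting log files, even if they have full administrative permissions in their own account. SCPs are created in the management account and apply to every member account they are attached to, which a per-account IAM policy (Option B) cannot guarantee because a local administrator could remove it.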
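The two SCP-based controls can be combined into one policy document, sketched here as a Python dict. The bucket name `org-cloudtrail-logs` and the function name are hypothetical; the actions are the ones named in the answer options.

```python
import json


def cloudtrail_guardrail_scp(log_bucket):
    """SCP denying trail tampering and deletion of CloudTrail log objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyTrailTampering",
                "Effect": "Deny",
                "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
                "Resource": "*",
            },
            {
                "Sid": "DenyLogDeletion",
                "Effect": "Deny",
                "Action": "s3:DeleteObject",
                "Resource": f"arn:aws:s3:::{log_bucket}/*",
            },
        ],
    }


scp = cloudtrail_guardrail_scp("org-cloudtrail-logs")  # hypothetical bucket name
scp_json = json.dumps(scp, indent=2)
```

Attaching the policy to the organization root or to specific OUs determines which member accounts it restricts.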


Domain 6: SDLC Automation

A company uses AWS CodePipeline with a multi-branch strategy. A new feature branch triggers a pipeline that runs unit tests and deploys to a test environment. The deployment step uses AWS CodeDeploy with a deployment group configured for in-place deployment to Amazon EC2 instances. The deployment fails intermittently with the error 'The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.' The instances are healthy and pass health checks. What is the most likely cause?

The pipeline has a failed execution that is blocking subsequent executions.

The CodeDeploy agent on the instances is not running, causing the deployment to fail.

The pipeline is configured with a high frequency of changes, causing throttling from CodePipeline.

A previous deployment is still in progress or frozen in the CodeDeploy deployment group.

CodeDeploy limits concurrent deployments per deployment group; a frozen deployment prevents new ones.

Why: Option D is correct because CodeDeploy enforces a per-deployment-group concurrency limit of one deployment at a time. If a previous deployment is still in progress or in a 'frozen' state (e.g., due to a failed or stopped deployment that hasn't been explicitly rolled back or cleaned up), new deployments will fail with the 'too many individual instances failed' error even when instances are healthy. The error message is misleading because it reflects CodeDeploy's inability to proceed with the new deployment, not actual instance health issues.

A development team uses AWS CodeBuild to compile a Java application and run unit tests. The build takes 30 minutes, but the team wants to reduce build time. The codebase has not changed significantly, and dependencies are stable. Which action would be MOST effective in reducing build time?

Configure CodeBuild to cache dependencies in an Amazon S3 bucket.

Caching avoids re-fetching dependencies every build.

Move the build process to a local developer machine to avoid CodeBuild overhead.

Reduce the number of unit tests executed in the build phase.

Increase the compute type of the build environment to a larger instance.

Why: Caching dependencies in an Amazon S3 bucket allows CodeBuild to reuse previously downloaded Maven/Gradle dependencies across builds, eliminating the need to re-download them each time. Since the codebase and dependencies are stable, this directly reduces the build time by avoiding repeated network transfers of large artifact repositories.
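As a small sketch of the project-side setting, the helper below builds the `cache` argument in the shape that boto3's CodeBuild `update_project` call accepts. The project name, bucket, and prefix are hypothetical. The buildspec also needs a `cache` section with `paths` listing the directories to save, for example Maven's local repository under `/root/.m2`.

```python
def s3_cache_settings(bucket, prefix):
    """CodeBuild project cache configuration pointing at an S3 location."""
    return {"type": "S3", "location": f"{bucket}/{prefix}"}


def update_project_kwargs(project_name, bucket, prefix):
    """Arguments for e.g. codebuild_client.update_project(**kwargs)."""
    return {"name": project_name, "cache": s3_cache_settings(bucket, prefix)}


# Hypothetical project and bucket names.
kwargs = update_project_kwargs("java-app-build", "build-cache-bucket", "java-app")
```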

A company uses AWS CodePipeline with multiple stages: Source (Amazon S3), Build (AWS CodeBuild), and Deploy (AWS CodeDeploy). The build stage runs a series of tests, and if they pass, the pipeline proceeds to deploy. Recently, a developer committed a change that passed all tests but caused a production outage. The team wants to add an approval step before the deploy stage, but they also want to ensure that only changes from specific branches can be deployed. What is the MOST secure and maintainable way to enforce this?

Use a Lambda function in the pipeline to check the branch name and fail if not allowed.

Add a manual approval step in the pipeline and rely on the approver to verify the branch.

Create a separate pipeline for each allowed branch, with the approval step only in the production pipeline.

Isolating pipelines prevents direct deployment from unauthorized branches.

Tag the source artifacts with the branch name and use a condition in CodePipeline to allow only specific tags.

Why: Option C is correct because it enforces branch-based deployment at the pipeline level, ensuring that only changes from specific branches trigger the production pipeline with the approval step. This approach is secure and maintainable as it leverages AWS CodePipeline's native ability to trigger on branch events, avoiding custom logic or manual verification. By isolating production deployments to a dedicated pipeline, the team reduces the risk of unauthorized or untested code reaching production.

A company uses AWS CodeCommit for source control. Developers frequently push large binary files (e.g., compiled JARs) to the repository, causing the repository size to grow rapidly and slowing down clone operations. The team wants to enforce a policy to reject pushes that contain files larger than 50 MB. Which approach should be used?

Configure a CodeCommit trigger that invokes an AWS Lambda function to validate file sizes and reject the push.

CodeCommit triggers allow custom validation before accepting a push.

Set up an Amazon CloudWatch Events rule to monitor repository size and alert when it exceeds a threshold.

Create an IAM policy that denies the `codecommit:GitPush` action if the file size exceeds 50 MB.

Use a pre-receive hook in the repository to reject large files by generating an S3 pre-signed URL.

Why: Option A is the intended answer because AWS CodeCommit supports triggers that invoke AWS Lambda functions on repository events, such as pushes to an existing branch. The function can inspect the files introduced by the pushed commits and check each one against the 50 MB threshold, enforcing the policy at the repository level without requiring client-side changes. Note that CodeCommit triggers fire after the push has been accepted, so the function can flag, notify on, or revert an oversized commit, but it cannot block the push the way a pre-receive hook would. CodeCommit does not offer server-side hooks, which rules out the pre-receive option.

An organization uses AWS CodePipeline to orchestrate deployments to multiple environments (dev, test, prod). Each environment uses a different AWS account. The pipeline uses cross-account actions with IAM roles. Recently, the pipeline failed at the deploy stage for the prod account with the error 'Access Denied' when assuming the cross-account role. The role ARN is correct and the trust policy allows the pipeline's service role. What is the MOST likely cause?

The EC2 instances in the prod account do not have an appropriate instance profile.

The pipeline's service role lacks the `sts:AssumeRole` permission for the cross-account role.

The service role needs explicit permission to assume the cross-account role.

The cross-account role's permissions boundary denies the deploy action.

The pipeline's service role does not have permission to perform the deploy action in the prod account.

Why: The pipeline's service role must have an `sts:AssumeRole` permission on the cross-account role to perform the role assumption. Even if the trust policy on the cross-account role allows the pipeline's service role, the pipeline's service role itself needs an IAM policy granting `sts:AssumeRole` for the cross-account role ARN. Without this permission, the `AssumeRole` API call fails with 'Access Denied', which is the exact error described.
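Both halves of the cross-account handshake are sketched below as Python dicts: the identity policy attached to the pipeline's service role, and the trust policy on the role in the prod account. The account IDs, role names, and function names are hypothetical.

```python
def assume_role_permission(cross_account_role_arn):
    """Identity policy for the pipeline's service role (tooling account)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": cross_account_role_arn,
            }
        ],
    }


def trust_policy(pipeline_role_arn):
    """Trust policy on the cross-account role (prod account)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": pipeline_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical ARNs: 111111111111 is the tooling account, 222222222222 is prod.
PIPELINE_ROLE = "arn:aws:iam::111111111111:role/PipelineServiceRole"
PROD_DEPLOY_ROLE = "arn:aws:iam::222222222222:role/ProdDeployRole"

identity_side = assume_role_permission(PROD_DEPLOY_ROLE)
trust_side = trust_policy(PIPELINE_ROLE)
```

The call succeeds only when both documents are in place; in the scenario above the trust side exists and the identity side is missing.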

A team uses AWS CodeDeploy to deploy a web application to an Auto Scaling group. The deployment strategy is Blue/Green. During a recent deployment, the new instances passed all health checks, but traffic was not routed to them. What is the most likely reason?

The target group associated with the Auto Scaling group is not properly configured to route traffic.

The target group must be correctly set up to forward traffic to the new instances.

The deployment group is not configured to use a load balancer.

The Auto Scaling group's lifecycle hook failed to signal readiness.

The CodeDeploy agent on the new instances is not installed.

Why: In a Blue/Green deployment with CodeDeploy and an Auto Scaling group, traffic routing is handled by a load balancer target group. If the target group is not properly configured to route traffic to the new instances (e.g., missing or incorrect listener rules, deregistration delay, or health check thresholds), the instances may pass health checks but never receive traffic. This is the most likely cause because the deployment succeeded in provisioning and validating the new instances, but the load balancer did not forward requests to them.


Frequently asked questions

How many questions are on the DOP-C02 exam?

The DOP-C02 exam has 75 questions and must be completed in 180 minutes. The passing score is 750/1000.

What types of questions appear on the DOP-C02 exam?

Scenario-based questions covering exam objectives with detailed answer explanations.

How are DOP-C02 questions organised by domain?

The exam covers 6 domains: Configuration Management and IaC, Resilient Cloud Solutions, Monitoring and Logging, Incident and Event Response, Security and Compliance, SDLC Automation. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual DOP-C02 exam questions?

No. These are original exam-style practice questions written against the official Amazon Web Services DOP-C02 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.


Amazon Web Services · Free Practice Questions · Last reviewed May 2026

DOP-C02 Exam Questions and Answers

36 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right, and why the others are wrong.

75 exam questions

180 min time limit

Pass: 750/1000

6 exam domains





