CCNA Troubleshooting and Optimization Questions

75 of 291 questions · Page 1/4 · Troubleshooting and Optimization · Answers revealed

1

Multi-Select · hard

A developer is optimizing an Amazon S3 bucket that stores millions of small objects. The application frequently lists objects with prefix-based queries. Which THREE strategies should the developer implement to improve performance?

Select 3 answers

A. Use Amazon S3 Inventory to generate a daily list of objects and query that list.

B. Use Amazon S3 Select or Amazon Athena to query objects instead of listing.

C. Move infrequently accessed objects to Amazon S3 Glacier Deep Archive.

D. Increase the TPS limit for ListObjects requests by requesting a quota increase.

E. Use AWS Glue to create a catalog of object metadata for faster querying.

Answers: A, B, E

S3 Inventory provides a CSV/Parquet file of objects, avoiding List API calls.

Why this answer

Option A is correct because Amazon S3 Inventory provides a daily or weekly CSV/Parquet report listing all objects and their metadata. By querying this inventory file (e.g., with Athena or SQL), you avoid making thousands of individual ListObjects API calls, which is far more efficient for prefix-based queries on millions of small objects.

Exam trap

The trap here is that candidates might think the request rate can simply be raised (Option D). Amazon S3 request-rate limits apply per prefix and cannot be raised through a quota increase request; instead, spread keys across more prefixes or avoid List calls altogether with strategies such as S3 Inventory.
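To illustrate the Option A approach, here is a minimal Python sketch that filters one decompressed S3 Inventory CSV file by key prefix instead of calling ListObjects. It assumes the inventory is configured so each row starts with Bucket, Key, Size; the sample rows and bucket name are hypothetical. In practice you would more likely point Athena at the inventory files.

```python
import csv
import io
from urllib.parse import unquote


def keys_under_prefix(inventory_csv: str, prefix: str):
    """Yield (key, size_bytes) for inventory rows whose key starts with prefix.

    Assumes columns Bucket, Key, Size in that order; the real order depends on
    which optional fields the inventory configuration includes.
    """
    for row in csv.reader(io.StringIO(inventory_csv)):
        # CSV inventory reports URL-encode key names (assumed percent-encoded
        # here); decode before comparing against the prefix.
        key = unquote(row[1])
        if key.startswith(prefix):
            yield key, int(row[2])


sample = (
    '"my-bucket","logs/2024/a.txt","120"\n'
    '"my-bucket","images/b.png","2048"\n'
    '"my-bucket","logs/2024/c%20d.txt","64"\n'
)
for key, size in keys_under_prefix(sample, "logs/"):
    print(key, size)
```

Because the report is generated on a schedule, results can lag behind the bucket by up to a day (or a week for weekly reports); that is the trade-off for avoiding List calls.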

2

MCQ · medium

A developer is troubleshooting a Lambda function that processes S3 events. The function runs successfully in the AWS Management Console test but fails when triggered by actual S3 PUT events. The error logs show 'AccessDenied' when attempting to read the object from S3. What is the most likely cause?

A. The S3 bucket has versioning enabled and the event does not include the version ID.

B. The Lambda execution role lacks 's3:GetObject' permission for the bucket.

C. The S3 bucket policy does not grant 's3:GetObject' to the Lambda service principal.

D. The Lambda function is not in the same VPC as the S3 bucket.

Answer: B

The Lambda role must have permission to read the object from S3.

Why this answer

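The function reads the object with the credentials of its execution role, so that role needs 's3:GetObject' on the bucket's objects. A console test runs under the same execution role, so a passing test most likely means the test event never exercised the same read (for example, it referenced a different bucket or key), while real S3 events hit the missing permission. Option B directly addresses the missing permission.

Option A is wrong because S3 event notifications do not have versioning restrictions, and a missing version ID would not produce AccessDenied. Option C is wrong because the function does not read the object as the Lambda service principal; within the same account, an identity-based policy on the execution role is sufficient. Option D is wrong because S3 is not deployed in a VPC, and a function that could not reach S3 would time out rather than receive AccessDenied.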
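A minimal sketch of the permission Option B calls for, written in Python so the policy document can be generated and checked. The bucket name and prefix are hypothetical; the resulting JSON would be attached to the function's execution role.

```python
import json


def s3_read_policy(bucket: str, prefix: str = "") -> dict:
    """Identity-based policy allowing s3:GetObject on objects in one bucket,
    optionally limited to a key prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # GetObject is an object-level action, so the Resource is the
                # bucket ARN followed by "/" and a key pattern.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


print(json.dumps(s3_read_policy("example-upload-bucket", "incoming/"), indent=2))
```

If the bucket uses SSE-KMS with a customer managed key, the role also needs kms:Decrypt on that key.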

3

MCQ · hard

A company is using Amazon DynamoDB with on-demand capacity. A developer notices that write requests are being throttled during peak hours. What is the MOST effective way to resolve this issue?

A. Switch to provisioned capacity mode with auto-scaling.

B. Increase the write capacity units.

C. Review the partition key design and consider adding a suffix to distribute writes.

D. Increase the read capacity units.

Answer: C

Hot partitions can cause throttling even in on-demand mode.

Why this answer

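Option C is correct because on-demand mode scales table capacity, not the limit of a single partition: each partition supports up to 1,000 write request units per second, so a hot partition key throttles even when the table as a whole has headroom. On-demand tables can also throttle if traffic exceeds double the previous peak within 30 minutes. Adding a suffix to the partition key (write sharding) spreads writes for the same logical key across multiple partitions. Option A is wrong because on-demand mode already scales automatically, and switching modes does not fix a hot key. Option D is wrong because increasing read capacity does not help writes.

Option B is wrong because on-demand mode has no provisioned write capacity units to increase.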
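The suffix idea in Option C can be sketched as follows. This is a hypothetical helper, not a DynamoDB API: it derives the suffix from a hash of a stable attribute, so writes for one hot logical key spread over shard_count partition key values while a reader can still recompute where an item lives. The key format base#n is an assumption.

```python
import hashlib


def sharded_partition_key(base_key: str, item_id: str, shard_count: int) -> str:
    """Append a calculated suffix (0..shard_count-1) to a hot partition key.

    Hashing a stable attribute instead of picking a random number means any
    reader that knows item_id can recompute the same suffix.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    suffix = int(digest, 16) % shard_count
    return f"{base_key}#{suffix}"


def all_shard_keys(base_key: str, shard_count: int) -> list[str]:
    """Every partition key a reader must query to gather all items for base_key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


print(sharded_partition_key("2024-06-01", "order-123", 10))
```

The cost is on the read side: to fetch every item for a logical key, the application must query each of the shard_count keys and merge the results.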

4

Multi-Select · hard

A company is running a serverless application using AWS Lambda and Amazon API Gateway. The application experiences increased latency during peak hours. CloudWatch metrics show that Lambda function duration remains stable, but API Gateway latency spikes. Which THREE actions should the developer take to reduce API Gateway latency?

Select 3 answers

A. Increase the Lambda function timeout.

B. Enable compression for API responses.

C. Increase the API Gateway throttling limits.

D. Enable API Gateway caching for the endpoints.

E. Switch API Gateway endpoint type from Edge-optimized to Regional.

Answers: B, D, E

Compression reduces response size, speeding up transfer.

Why this answer

Options B, D, and E are correct. Enabling caching reduces calls to the backend, switching to a Regional endpoint reduces network latency when most clients are in or near the API's Region, and enabling compression reduces payload size. Option C is wrong because throttling limits do not reduce latency.

Option A is wrong because increasing the Lambda timeout does not affect API Gateway latency, and Lambda duration is already stable.
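To show why compression helps, here is a small Python sketch that gzips a JSON response only when it is above a size threshold, mirroring the idea behind the minimumCompressionSize setting on API Gateway REST APIs. The threshold and payload are hypothetical, and API Gateway only compresses when the client sends an Accept-Encoding header such as gzip.

```python
import gzip
import json


def maybe_compress(body: bytes, min_size: int) -> tuple[bytes, bool]:
    """Gzip the body only if it is at least min_size bytes.

    Small payloads are returned unchanged because compressing them saves little.
    """
    if len(body) < min_size:
        return body, False
    return gzip.compress(body), True


payload = json.dumps([{"id": i, "status": "ACTIVE"} for i in range(500)]).encode()
compressed, used_gzip = maybe_compress(payload, min_size=1024)
print(len(payload), "->", len(compressed), "bytes; gzip used:", used_gzip)
```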

5

MCQ · hard

An application running on Amazon ECS Fargate is experiencing intermittent connection timeouts when calling an external API. The task has a public IP and a security group that allows outbound HTTPS. What is the most likely cause?

A. The ECS service is not configured to auto-assign public IP.

B. The task's security group does not allow inbound traffic.

C. The security group outbound rules are misconfigured.

D. The task is running in a private subnet without a NAT gateway.

Answer: D

Private subnets need a NAT gateway for outbound internet access.

Why this answer

Option D is correct because ECS Fargate tasks in a private subnet have no direct path to the internet. The subnet's route table has no default route (0.0.0.0/0) to an internet gateway, and without a NAT gateway there is no other way out, so connections to the external API time out. A public IP assigned to the task does not help in a private subnet, because the route table alone determines whether traffic can reach the internet. If the service spans both public and private subnets, only tasks placed in the private subnet would fail, which would explain why the timeouts are intermittent.

Exam trap

The trap here is assuming that a public IP on the task guarantees internet access. The subnet's route table decides where traffic can go, and a private subnet without a NAT gateway blocks outbound internet traffic regardless of the task's public IP.

How to eliminate wrong answers

Option A is wrong because the task already has a public IP (as stated in the question), so the service's auto-assign public IP setting is not the issue. Option B is wrong because inbound rules do not apply to outbound HTTPS connections; security groups are stateful, so response traffic is allowed automatically. Option C is wrong because the outbound rules already allow HTTPS (port 443).
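The route-table reasoning can be checked mechanically. This Python sketch takes routes shaped like the EC2 DescribeRouteTables response and reports whether 0.0.0.0/0 leads to an internet gateway or a NAT gateway. It is a simplification: it does not check the task's public IP, which a task in a public subnet still needs.

```python
def has_internet_egress(routes: list[dict]) -> bool:
    """True if an active 0.0.0.0/0 route targets an internet gateway or NAT gateway.

    Field names follow DescribeRouteTables: DestinationCidrBlock, GatewayId,
    NatGatewayId, State.
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State", "active") != "active":
            continue  # e.g. a blackhole route to a deleted NAT gateway
        if route.get("GatewayId", "").startswith("igw-"):
            return True
        if route.get("NatGatewayId"):
            return True
    return False


private_subnet_routes = [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]
print(has_internet_egress(private_subnet_routes))  # no default route
```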

6

MCQ · easy

A developer is troubleshooting an AWS Lambda function that times out when processing large files from Amazon S3. The function has a 15-minute timeout and 512 MB memory. What should the developer do to resolve this issue?

A. Use Amazon S3 batch operations to split the files before processing.

B. Add an S3 Event Notification to trigger the function asynchronously.

C. Reduce the Lambda timeout to 5 minutes to force faster processing.

D. Increase the Lambda function memory to 3008 MB.

Answer: D

More memory improves CPU and network, speeding up execution.

Why this answer

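Option D is correct because Lambda allocates CPU power and network throughput in proportion to memory, so raising memory from 512 MB can shorten the run time of a function that processes large files. Option A is wrong because S3 Batch Operations runs operations such as copying, tagging, or invoking a Lambda function across many objects; it does not split files, and increasing memory is a more direct fix. Option C is wrong because the timeout is already at the 15-minute maximum, and reducing it does not make processing faster; it only makes the function time out sooner.

Option B is wrong because adding an S3 Event Notification changes how the function is triggered, not how long each invocation takes.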
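A back-of-the-envelope model of Option D. The Lambda documentation states that a function has the equivalent of one vCPU at 1,769 MB; the linear scaling below that point and the single-thread cap are simplifying assumptions, so treat the output as a rough estimate for CPU-bound work only.

```python
LAMBDA_MB_PER_VCPU = 1769  # documented: about one vCPU at 1,769 MB


def approx_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share, assuming CPU scales linearly with memory."""
    return memory_mb / LAMBDA_MB_PER_VCPU


def estimated_duration_s(current_s: float, current_mb: int, new_mb: int) -> float:
    """Idealized duration for a purely CPU-bound, single-threaded workload.

    A single thread cannot use more than one vCPU, so the speed-up stops
    once memory reaches about 1,769 MB.
    """
    old_cpu = min(approx_vcpus(current_mb), 1.0)
    new_cpu = min(approx_vcpus(new_mb), 1.0)
    return current_s * old_cpu / new_cpu


print(round(estimated_duration_s(900, 512, 3008)))
```

Under these assumptions, a single-threaded job that hits the 900-second limit at 512 MB would finish in about 260 seconds at 3008 MB. Multi-threaded code can benefit from memory settings above 1,769 MB; single-threaded code generally cannot.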

7

Multi-Select · medium

A company's application runs on Amazon EC2 instances in an Auto Scaling group. The application experiences intermittent failures, and the developer suspects the application is not properly handling termination notifications. Which TWO steps should the developer take to diagnose the issue?

Select 2 answers

A. Enable detailed monitoring on the Auto Scaling group.

B. Configure a CloudWatch Events rule to capture Auto Scaling termination events.

C. Install the CloudWatch Logs agent on the instances to capture application logs.

D. Add a lifecycle hook to the Auto Scaling group to pause termination.

E. Use an Elastic Load Balancer to replace instances automatically.

Answers: B, D

CloudWatch Events can invoke a Lambda function to log or handle the event.

Why this answer

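Options B and D are correct. B: a CloudWatch Events (Amazon EventBridge) rule can capture Auto Scaling termination events and lifecycle actions, giving the developer a record of when each instance was told to terminate. D: a lifecycle hook pauses the instance in the Terminating:Wait state, giving the developer time to inspect it or collect logs before it is terminated.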

Option A is wrong because detailed monitoring does not capture termination signals. Option C is wrong because CloudWatch Logs agent is for logs, not for termination notifications. Option E is wrong because replacing instances does not diagnose the issue.
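For Option B, the rule needs an event pattern. The pattern below uses the source and detail-type that Auto Scaling emits for termination lifecycle actions; the group name is hypothetical. The matches function is a deliberately simplified stand-in for EventBridge matching, included only so the pattern can be tested locally.

```python
# EventBridge event pattern for Auto Scaling termination lifecycle actions.
TERMINATION_PATTERN = {
    "source": ["aws.autoscaling"],
    "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
    "detail": {"AutoScalingGroupName": ["web-asg"]},  # hypothetical group name
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must exist in the event and the
    event value must be one of the listed values (nested dicts recurse).
    Real EventBridge patterns support more operators (prefix, numeric, ...)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.autoscaling",
    "detail-type": "EC2 Instance-terminate Lifecycle Action",
    "detail": {
        "AutoScalingGroupName": "web-asg",
        "EC2InstanceId": "i-0123456789abcdef0",
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
    },
}
print(matches(TERMINATION_PATTERN, sample_event))
```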

8

MCQ · easy

A developer is troubleshooting a Lambda function that occasionally times out. The function makes HTTPS calls to an external API. Which configuration change is MOST likely to resolve the issue without increasing the risk of further timeouts?

A. Increase the function's memory allocation.

B. Decrease the function's reserved concurrency.

C. Increase the function's timeout setting.

D. Change the function invocation type to synchronous.

Answer: C

Increasing timeout allows the function to wait longer for the external API response.

Why this answer

Option C is correct because raising the timeout setting gives the function more time to wait for the external API response. Option A is wrong because more memory and CPU do not help when the function is waiting on a slow external API. Option D is wrong because changing the invocation type does not change how long the function is allowed to run.

Option B is wrong because decreasing reserved concurrency limits how many instances can run at once; it does not give any single invocation more time, and it can add throttling.
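Alongside a longer timeout, it helps to bound the external call so the function fails with a clear error instead of being stopped mid-request. This Python sketch derives an HTTP timeout from context.get_remaining_time_in_millis() on the Lambda context object; the URL, the 1-second margin, and the 10-second cap are hypothetical choices.

```python
import urllib.request


def http_timeout_seconds(remaining_ms: int, safety_margin_ms: int = 1000,
                         max_timeout_s: float = 10.0) -> float:
    """Return an HTTP timeout that expires before the Lambda timeout does,
    keeping safety_margin_ms to log the failure and return cleanly."""
    budget_s = (remaining_ms - safety_margin_ms) / 1000
    if budget_s <= 0:
        raise TimeoutError("not enough time left for the external call")
    return min(budget_s, max_timeout_s)


def handler(event, context):
    # get_remaining_time_in_millis() is provided by the Lambda context object.
    timeout = http_timeout_seconds(context.get_remaining_time_in_millis())
    # Hypothetical endpoint; urlopen raises an error if the timeout elapses.
    with urllib.request.urlopen("https://api.example.com/status", timeout=timeout) as resp:
        return {"status": resp.status}
```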

9

MCQ · hard

A developer deployed an AWS Lambda function that is invoked by an Amazon SQS queue. The function is configured with a batch size of 10 and a timeout of 30 seconds. CloudWatch metrics show that the function's Duration is consistently around 28 seconds, but occasionally spikes to 35 seconds causing timeouts. The function makes a synchronous HTTP call to an external API. Which approach will MOST effectively prevent timeouts while maximizing throughput?

A. Use async HTTP calls with a callback

B. Increase the function's timeout to 60 seconds

C. Reduce the batch size to 5

D. Increase the SQS visibility timeout to 60 seconds

Answer: A

Correct. Async calls allow the Lambda function to handle multiple HTTP requests concurrently, reducing overall execution time and preventing timeouts.

Why this answer

Option A is correct because using async HTTP calls with a callback prevents the Lambda function from blocking on the synchronous HTTP request. This allows the function to process the batch of 10 messages concurrently, reducing the overall execution time below the 30-second timeout even when individual API calls occasionally take longer. By not waiting for each response sequentially, the function maximizes throughput and avoids timeouts.

Exam trap

The trap here is that candidates often assume increasing the Lambda timeout or adjusting SQS settings will fix performance issues, when the real problem is synchronous blocking I/O that can be resolved with asynchronous programming to improve concurrency and throughput.

How to eliminate wrong answers

Option B is wrong because increasing the timeout to 60 seconds only masks the symptom without addressing the root cause—blocking on synchronous HTTP calls—and reduces throughput by keeping the function running longer per invocation. Option C is wrong because reducing the batch size to 5 decreases the number of messages processed per invocation, lowering throughput and not preventing timeouts caused by slow API calls within the batch. Option D is wrong because increasing the SQS visibility timeout does not affect the Lambda function's execution timeout; it only delays message redelivery if the function fails, which does not prevent the function from timing out.
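A minimal sketch of Option A using Python's asyncio. The call_external_api coroutine is a hypothetical stand-in that sleeps instead of making a real request; a real function would use an async HTTP client. asyncio uses await rather than explicit callbacks, but the effect is the same: the ten calls overlap instead of running one after another.

```python
import asyncio
import time


async def call_external_api(record_id: int) -> str:
    """Hypothetical stand-in for a non-blocking HTTP call; sleeps to
    simulate network latency."""
    await asyncio.sleep(0.2)
    return f"processed-{record_id}"


async def process_batch(record_ids: list[int]) -> list[str]:
    # gather() schedules every call before awaiting them, so the batch takes
    # roughly as long as the slowest call instead of the sum of all calls.
    return await asyncio.gather(*(call_external_api(r) for r in record_ids))


start = time.perf_counter()
results = asyncio.run(process_batch(list(range(10))))
elapsed = time.perf_counter() - start
print(len(results), f"records in {elapsed:.2f}s")
```

Run this way, the ten simulated 0.2-second calls complete in roughly 0.2 seconds rather than 2 seconds. Inside a Lambda handler, you would call asyncio.run(process_batch(...)) on the records from the SQS event.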

10

MCQ · hard

A company runs a web application on EC2 instances behind an Application Load Balancer (ALB). The application stores session data in an RDS MySQL database. Recently, users have reported that they are being logged out unexpectedly and their session data is lost. The developer investigates and finds that the RDS instance's CPU utilization spikes periodically, coinciding with the logout events. The application uses connection pooling via an RDS Proxy. The developer suspects that the session table is being dropped or truncated. After checking the application logs, the developer finds no evidence of truncation commands. The RDS instance has automated backups enabled, and the binary logs are retained for 24 hours. The developer wants to identify the root cause and prevent future occurrences. Which course of action should the developer take?

A. Enable Multi-AZ deployment for RDS to improve availability and prevent data loss during failover.

B. Increase the RDS instance size to handle the CPU spikes and prevent future issues.

C. Disable RDS Proxy and implement connection pooling in the application code to reduce database load.

D. Check the session table's storage engine; if it uses MEMORY, change it to InnoDB to persist data across restarts.

Answer: D

MEMORY engine loses data on restart; InnoDB is durable.

Why this answer

The CPU spikes that coincide with session loss suggest the database is restarting or failing over. Rows in an ordinary table survive a restart, but a table that uses the MEMORY storage engine keeps its rows only in RAM and comes back empty after every restart. That matches the symptoms and explains why the application logs show no truncation commands.

Option D is correct: check the session table's storage engine and convert it to InnoDB if it uses MEMORY. Option C is wrong because RDS Proxy pools connections and does not cause data loss. Option A is wrong because a Multi-AZ failover is itself a restart on the standby, so a MEMORY table would still come back empty.

Option B is wrong because increasing the instance size may delay the restarts but does not make a MEMORY table durable.
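A sketch of the check behind Option D. The SQL queries information_schema for MEMORY tables, and the helper turns the result rows into ALTER TABLE statements. The %s placeholder style assumes a driver such as PyMySQL, and the table names in the example are hypothetical.

```python
# Query to run against the RDS MySQL instance with any MySQL driver.
FIND_MEMORY_TABLES = """
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = %s AND ENGINE = 'MEMORY'
"""


def conversion_statements(rows: list[tuple[str, str]]) -> list[str]:
    """Given (table_name, engine) rows, return ALTER TABLE statements that
    convert MEMORY tables to InnoDB so their rows survive restarts."""
    return [
        f"ALTER TABLE `{name}` ENGINE=InnoDB;"
        for name, engine in rows
        if engine.upper() == "MEMORY"
    ]


example_rows = [("sessions", "MEMORY"), ("users", "InnoDB")]  # hypothetical result
for statement in conversion_statements(example_rows):
    print(statement)
```

Converting a table rebuilds it, so it is worth scheduling for a quiet period.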

11

Multi-Select · hard

A developer is optimizing an AWS Lambda function that processes streaming data from Amazon Kinesis. The function is CPU-bound. Which TWO actions should the developer take to improve performance?

Select 2 answers

A. Rewrite the function in a compiled language like Go.

B. Increase the function's reserved concurrency.

C. Increase the function's memory allocation.

D. Increase the Kinesis stream's shard count.

E. Enable GPU acceleration for the function.

Answers: A, C

Compiled languages generally have better CPU performance than interpreted ones.

Why this answer

Option A is correct because a compiled language such as Go generally runs CPU-bound code faster than an interpreted one. Option C is correct because Lambda allocates CPU in proportion to memory, so more memory means more CPU. Option B is wrong because reserved concurrency limits how many instances can run; it does not make an individual invocation faster.

Option E is wrong because Lambda does not support GPU acceleration. Option D is wrong because more shards add parallel invocations but do not make the CPU-bound work in each invocation any faster.

12

MCQ · medium

A developer is troubleshooting an AWS Lambda function that processes records from an Amazon Kinesis Data Stream. The function is configured with a batch size of 100 and a parallelization factor of 1. The developer notices that the function is processing records slowly, and the iterator age is increasing. CloudWatch Logs show that the function is not experiencing errors or throttling, but the execution time per invocation is close to the 5-minute timeout. The stream has 10 shards. What is the most cost-effective way to increase processing throughput?

A. Increase the batch size to 1000

B. Increase the parallelization factor to 10

C. Increase the memory of the Lambda function

D. Split the stream into more shards

Answer: B

The parallelization factor determines the number of concurrent Lambda invocations per shard. Increasing it allows multiple invocations to process records from the same shard simultaneously, dramatically increasing throughput without additional shard costs.

Why this answer

Increasing the parallelization factor to 10 allows each shard to be processed by up to 10 concurrent Lambda invocations, which directly increases throughput without additional shard costs. Since the function is not throttled or erroring, the bottleneck is the per-invocation processing time; parallelization reduces the iterator age by processing multiple batches per shard simultaneously.

Exam trap

The trap here is that candidates often assume increasing shards is the only way to scale Kinesis processing, but the parallelization factor is a cost-effective Lambda-specific tuning knob that increases concurrency without additional shard costs.

How to eliminate wrong answers

Option A is wrong because increasing the batch size from 100 to 1000 gives each invocation ten times as many records, which would likely push it past the 5-minute timeout and worsen the iterator age. Option C is wrong because nothing in the scenario shows the function is CPU-bound; more memory raises cost without a guaranteed throughput gain and does not remove the one-batch-per-shard limit. Option D is wrong because splitting the stream into more shards increases cost and complexity, and the existing 10 shards are underused with a parallelization factor of 1; adding shards does not address the per-shard concurrency bottleneck.
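The arithmetic behind Option B: a Kinesis event source mapping can run up to one concurrent batch per shard for each unit of parallelization factor, so 10 shards with a factor of 10 allow up to 100 concurrent invocations. Lambda still processes records with the same partition key in order. The boto3 call in the comment shows where the setting lives; mapping_uuid is a placeholder.

```python
def max_concurrent_invocations(shard_count: int, parallelization_factor: int) -> int:
    """Upper bound on concurrent invocations for a Kinesis event source mapping."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return shard_count * parallelization_factor


print(max_concurrent_invocations(10, 1), "->", max_concurrent_invocations(10, 10))

# Changing the setting with boto3 (mapping_uuid is a placeholder):
#   boto3.client("lambda").update_event_source_mapping(
#       UUID=mapping_uuid, ParallelizationFactor=10)
```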

13

MCQ · easy

A developer is using Amazon DynamoDB for a new application. The developer wants to reduce read latency. Which design pattern should the developer use?

A. Create a global secondary index (GSI) for the table.

B. Increase the provisioned read capacity units (RCUs) for the table.

C. Use DynamoDB Global Tables to replicate data to multiple regions.

D. Use DynamoDB Accelerator (DAX) as a cache for frequently read items.

Answer: D

DAX provides microsecond read latency.

Why this answer

DynamoDB Accelerator (DAX) is an in-memory cache designed specifically for DynamoDB, providing microsecond read latency for frequently accessed items. By caching read-heavy workloads, DAX offloads requests from the DynamoDB table, reducing read latency without requiring application-level caching logic. This directly addresses the developer's goal of reducing read latency.

Exam trap

The trap here is that candidates often confuse increasing provisioned capacity (Option B) with reducing latency, when in fact it only increases throughput, while DAX (Option D) directly addresses latency by caching reads in memory.

How to eliminate wrong answers

Option A is wrong because a global secondary index (GSI) provides an alternative partition key and sort key for queries but does not reduce read latency, and GSI reads are only eventually consistent. Option B is wrong because increasing provisioned read capacity units (RCUs) improves throughput (more requests per second) but does not reduce per-request latency, which is already low and consistent regardless of RCU level. Option C is wrong because Global Tables replicate data across Regions for disaster recovery and low-latency reads in remote Regions; for a single-Region application they add complexity and cost without reducing local read latency.
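DAX is a managed read-through cache. The Python class below is not the DAX client; it is a minimal, hypothetical illustration of the read-through pattern DAX applies, showing why repeated reads of the same item skip the table. The 300-second TTL mirrors DAX's default item cache TTL of five minutes.

```python
import time
from typing import Callable


class ReadThroughCache:
    """Minimal read-through cache with a per-item TTL (illustration only)."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 300.0):
        self._fetch = fetch  # e.g. a function wrapping a DynamoDB GetItem call
        self._ttl = ttl_seconds
        self._items: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        entry = self._items.get(key)
        if entry and time.monotonic() - entry[0] < self._ttl:
            return entry[1]  # cache hit: no call to the table
        item = self._fetch(key)  # cache miss or expired: read from the table
        self._items[key] = (time.monotonic(), item)
        return item
```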

14

MCQ · medium

The above CloudWatch alarm is configured for an EC2 instance. The alarm state is 'INSUFFICIENT_DATA' and has been for 2 days. Which of the following is the most likely cause?

A. The EC2 instance is stopped

B. The alarm period and evaluation periods are misconfigured

C. The CloudWatch agent is not installed on the EC2 instance

D. The threshold is set too high for the metric

Answer: C

Basic EC2 metrics such as CPUUtilization are published without any agent, but metrics such as memory and disk utilization come only from the CloudWatch agent; if the agent is missing or lacks permissions, those metrics have no data.

Why this answer

Option C is correct because INSUFFICIENT_DATA means the alarm has no metric data to evaluate. The most common reason is that the CloudWatch agent is not installed, or not configured, to publish the metric the alarm watches. Option A is wrong because the instance in this scenario is running, although a running instance alone does not guarantee metric data.

Option B is wrong because period settings change how data is evaluated; if data were arriving, the alarm would move to OK or ALARM rather than stay in INSUFFICIENT_DATA for two days. Option D is wrong because the threshold decides between OK and ALARM; the issue here is the lack of data.
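An example of an alarm that would stay in INSUFFICIENT_DATA without the agent. The function returns keyword arguments for CloudWatch PutMetricAlarm (boto3 put_metric_alarm); it assumes the agent publishes mem_used_percent to the default CWAgent namespace with an InstanceId dimension, which depends on the agent's configuration. The instance ID is hypothetical.

```python
def memory_alarm_params(instance_id: str, threshold_percent: float = 80.0) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm(**params) on an
    agent-published memory metric."""
    return {
        "AlarmName": f"{instance_id}-memory-high",
        "Namespace": "CWAgent",  # agent metrics, not the built-in AWS/EC2 namespace
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # "missing" (the default) leaves the alarm in INSUFFICIENT_DATA
        # when no datapoints arrive, for example because the agent is absent.
        "TreatMissingData": "missing",
    }


params = memory_alarm_params("i-0123456789abcdef0")
print(params["Namespace"], params["MetricName"])
```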

15

MCQ · hard

An application uses an Amazon SQS queue to decouple microservices. The producer is sending messages, but the consumer is not processing them. The consumer is an Auto Scaling group of EC2 instances. The SQS queue's ApproximateNumberOfMessagesVisible metric is increasing. What is the MOST likely cause?

A. The SQS queue policy denies access to the consumer.

B. The consumer instances are not polling the SQS queue.

C. The visibility timeout is set too low.

D. The SQS queue has a dead-letter queue configured.

Answer: B

If consumers are not polling, messages accumulate in the queue.

Why this answer

Option B is correct because a steadily increasing number of visible messages means consumers are not receiving them, most likely because the instances are not polling the queue. Option D is wrong because a dead-letter queue only receives messages after the maximum receive count is exceeded, which requires consumers to receive them first. Option C is wrong because a visibility timeout that is too short makes messages reappear before processing finishes; that causes duplicate processing, not a continuous rise in visible messages.

Option A is less likely because a queue policy that denied the consumer would show up as AccessDenied errors on ReceiveMessage in the consumer logs; the producer's success only shows that the producer has access.
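If Option B is the cause, the fix is a consumer loop on each instance. This sketch expects a boto3 SQS client (or any object with the same receive_message and delete_message methods) to be passed in; the handler and queue URL are hypothetical. Long polling (WaitTimeSeconds=20) reduces empty responses, and deleting only after successful processing lets failed messages reappear.

```python
def poll_once(sqs_client, queue_url: str, handle) -> int:
    """Receive up to 10 messages with long polling, process each, then delete it.

    Returns the number of messages processed.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: wait up to 20 s for messages
    )
    messages = response.get("Messages", [])
    for message in messages:
        handle(message["Body"])
        # Delete only after successful processing; otherwise the message
        # becomes visible again when its visibility timeout expires.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return len(messages)
```

A process running poll_once in a loop on every instance keeps ApproximateNumberOfMessagesVisible from growing as long as consumers keep up with producers.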

16

MCQ · medium

A developer deployed a new version of an AWS Lambda function using the AWS CLI update-function-code command. The function uses environment variables to store database credentials. After the update, the function returns errors indicating it cannot connect to the database. What is the MOST likely cause?

A. The function ARN changed after the update.

B. The database credentials are encrypted but the function cannot decrypt them.

C. The environment variables were not set in the new version.

D. The function's IAM role was changed during the update.

Answer: C

If the environment variables are missing from the version being invoked, the function has no database credentials and cannot connect.

Why this answer

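Option C is correct: without its environment variables, the function has no database credentials. Note that update-function-code itself only replaces the deployment package; environment variables are part of the function configuration and are changed with update-function-configuration, whose --environment option replaces the entire set of variables. A deployment script that passes only some variables there produces exactly this failure. Option D is wrong because update-function-code changes the code, not the IAM role. Option B is wrong because Lambda encrypts environment variables at rest and decrypts them automatically; a decryption problem with a customer managed key would surface as a KMS error rather than a database connection failure.

Option A is wrong because updating the code does not change the function ARN, and the ARN does not affect environment variables.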
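Because UpdateFunctionConfiguration replaces the whole variable set, a safe deployment script merges changes into the current variables first. The helper below is a hypothetical sketch; the commented boto3 calls (get_function_configuration and update_function_configuration) are standard Lambda API operations, and the function name and variable values are placeholders.

```python
def merged_environment(current: dict[str, str], updates: dict[str, str]) -> dict:
    """Build the Environment argument for UpdateFunctionConfiguration by
    applying only the changes on top of the existing variables."""
    variables = dict(current)
    variables.update(updates)
    return {"Variables": variables}


# With boto3 (not executed here; "orders-fn" is a hypothetical function name):
#   client = boto3.client("lambda")
#   config = client.get_function_configuration(FunctionName="orders-fn")
#   existing = config.get("Environment", {}).get("Variables", {})
#   client.update_function_configuration(
#       FunctionName="orders-fn",
#       Environment=merged_environment(existing, {"LOG_LEVEL": "debug"}))
```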

17

MCQ · hard

A company runs a containerized web application on Amazon ECS with Fargate launch type. The application experiences intermittent HTTP 503 errors. The ECS service auto-scales based on CPU, but the errors persist. What is the most likely cause and solution?

A. The ALB health check interval is too short; increase the health check interval.

B. The CPU threshold is too low; increase the target value to 80%.

C. The ALB idle timeout is too low; increase it to 300 seconds.

D. The cluster does not have enough EC2 instances; add more instances.

Answer: A

Short intervals cause healthy containers to be marked unhealthy, leading to 503 errors.

Why this answer

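Option A is correct because 503 errors from an ALB often mean no healthy targets are available. If the health check is too aggressive for the application, healthy containers are marked unhealthy and deregistered; a longer interval (and, where needed, a higher unhealthy threshold or an ECS health check grace period) gives them time to respond. Option B is wrong because raising the CPU target would make the service scale out later, which would worsen the errors.

Option D is wrong because the Fargate launch type does not use EC2 instances that you manage. Option C is wrong because an idle timeout that is too low typically causes 504 (gateway timeout) errors, not 503.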
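A quick way to reason about Option A: an ALB marks a target unhealthy after UnhealthyThresholdCount consecutive failed checks spaced HealthCheckIntervalSeconds apart. The sketch ignores the per-check timeout, so the numbers are approximate.

```python
def seconds_until_unhealthy(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time for an ALB to mark a failing target unhealthy."""
    if not 5 <= interval_s <= 300:
        raise ValueError("ALB health check interval must be 5-300 seconds")
    if not 2 <= unhealthy_threshold <= 10:
        raise ValueError("unhealthy threshold must be 2-10")
    return interval_s * unhealthy_threshold


print(seconds_until_unhealthy(5, 2), seconds_until_unhealthy(30, 3))
```

For example, with a 5-second interval and a threshold of 2, a container that stalls for 15 seconds (hypothetically, during garbage collection) is taken out of service after about 10 seconds; with 30 seconds and 3, it would survive the stall.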

18

Multi-Select · hard

A developer is using AWS CodePipeline to deploy a web application. The pipeline has a source stage from GitHub and a deploy stage to Elastic Beanstalk. The deploy stage fails with the error 'The S3 bucket does not allow access to the artifact'. Which THREE actions could resolve this issue?

Select 3 answers

A. Specify a different artifact bucket in the pipeline configuration.

B. Add a bucket policy that grants the pipeline's service role access to the artifact bucket.

C. Ensure the pipeline's IAM role has s3:GetObject and s3:PutObject permissions on the artifact bucket.

D. If the artifact bucket is encrypted with AWS KMS, ensure the pipeline role has kms:Decrypt permission.

E. Enable versioning on the artifact bucket.

Answers: B, C, D

Bucket policy can grant cross-account access.

Why this answer

Option B: if the artifact bucket is in another account, a bucket policy that grants the pipeline's service role access is required. Option C: the pipeline role itself must have s3:GetObject and s3:PutObject permissions on the bucket. Option D: if the artifact bucket is encrypted with AWS KMS, the role also needs kms:Decrypt.

Option A is wrong because the bucket is already specified, and pointing the pipeline at a different bucket does not fix the permissions. Option E is wrong because versioning is not related to access.

19

MCQ · medium

A developer is deploying a serverless application using AWS CloudFormation. The stack creation fails with the error 'CREATE_FAILED: The following resource(s) failed to create: [MyLambdaFunction]'. The developer checks the CloudFormation events and sees 'Resource creation cancelled'. What is the most likely cause?

A. The Lambda function code is too large and exceeds the deployment limit.

B. The Lambda function creation timed out due to a network issue.

C. Another resource in the stack failed, triggering a rollback and cancelling the Lambda creation.

D. The Lambda function's execution role is missing permissions.

Answer: C

CloudFormation cancels creation of remaining resources when a stack rolls back.

Why this answer

When a resource fails and the stack is set to roll back on failure, CloudFormation cancels the creation of resources that are still in progress. 'Resource creation cancelled' therefore points to a failure in another resource, not in the Lambda function itself. Option C is correct.

Option D is wrong because missing execution-role permissions would produce an IAM-specific error for that resource, not a cancellation. Option B is wrong because the Lambda function creation was cancelled, not failed because of a timeout. Option A is wrong because an oversized deployment package would fail with its own size-limit error rather than be cancelled.
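To find the real failure behind 'Resource creation cancelled', skip the cancellations and look for the other CREATE_FAILED events. This sketch assumes events shaped like the StackEvents list from the CloudFormation DescribeStackEvents response; the sample resources and reasons are hypothetical.

```python
CANCELLED = "Resource creation cancelled"


def root_cause_failures(stack_events: list[dict]) -> list[dict]:
    """Return CREATE_FAILED events that are not just cancellations.

    DescribeStackEvents lists newest events first, so the last element
    returned is the earliest failure.
    """
    return [
        event
        for event in stack_events
        if event.get("ResourceStatus") == "CREATE_FAILED"
        and event.get("ResourceStatusReason") != CANCELLED
    ]


sample_events = [
    {"LogicalResourceId": "MyLambdaFunction", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": CANCELLED},
    {"LogicalResourceId": "MyTable", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Table already exists"},  # hypothetical reason
    {"LogicalResourceId": "MyRole", "ResourceStatus": "CREATE_COMPLETE"},
]
for event in root_cause_failures(sample_events):
    print(event["LogicalResourceId"], "-", event["ResourceStatusReason"])
```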

20

MCQ · easy

A developer notices that an S3 bucket configured for static website hosting returns 403 Forbidden errors when accessed from a browser. The bucket policy allows s3:GetObject for principal "*" but only over HTTPS. What is the MOST likely reason for the error?

A. The requester is not authenticated with AWS IAM.

B. The bucket policy denies s3:GetObject for anonymous principals.

C. The request was sent over HTTP instead of HTTPS.

D. The S3 bucket website endpoint does not support HTTPS.

Answer: C

The bucket policy condition requires HTTPS; HTTP requests are denied.

Why this answer

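Option C is correct because the bucket policy's condition requires HTTPS, and the browser's request arrived over HTTP. Option A is wrong because the policy grants access to principal "*", so no IAM authentication is needed. Option D is a true statement (S3 website endpoints serve only HTTP; HTTPS requires a service such as CloudFront in front of the bucket), but it explains why the request used HTTP rather than being the reason it was denied.

Option B is wrong because the bucket policy explicitly allows s3:GetObject for anonymous principals.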
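The policy in the question probably looks like the one below, which uses the aws:SecureTransport condition key. Because the S3 website endpoint only serves HTTP, browser requests through it never satisfy the condition. The is_allowed function is a toy evaluator for this one policy shape, not IAM's real evaluation logic, and the bucket name is hypothetical.

```python
def https_only_read_policy(bucket: str) -> dict:
    """Public-read policy that allows GetObject only over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadOverHttpsOnly",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"Bool": {"aws:SecureTransport": "true"}},
            }
        ],
    }


def is_allowed(policy: dict, secure_transport: bool) -> bool:
    """Toy check for this policy shape only: the Allow applies only when the
    request's SecureTransport value matches the condition."""
    statement = policy["Statement"][0]
    required = statement["Condition"]["Bool"]["aws:SecureTransport"]
    return statement["Effect"] == "Allow" and required == str(secure_transport).lower()


policy = https_only_read_policy("example-site")
print(is_allowed(policy, secure_transport=False))  # website endpoint: HTTP only
```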

21

MCQ · easy

A developer deployed a web application using AWS Elastic Beanstalk. The application uses an RDS MySQL database. After a recent deployment, the application's health status turned from 'Ok' to 'Severe'. The developer checks the environment events and sees that the database connection is failing. The RDS instance is in the same VPC and security group. The developer confirms that the database endpoint and credentials are correct. What is the MOST likely cause of the connection failure?

A. The security group for the Elastic Beanstalk environment no longer allows outbound traffic to the RDS database port.

B. The RDS database endpoint changed after the deployment.

C. The RDS instance is not publicly accessible.

D. The application code is failing to start due to a bug.

Answer: A

A deployment can modify security group rules, blocking the connection.

Why this answer

Option A is correct because a deployment can modify the environment's security group rules; if outbound traffic to the database port is no longer allowed, connections fail even though the endpoint and credentials are correct. Option B is wrong because the developer confirmed the database endpoint is correct. Option D is wrong because the application is failing to connect, not failing to start.

Option C is wrong because the RDS instance is in the same VPC, so public accessibility, which only matters for connections from outside the VPC, is irrelevant.
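A sketch for checking Option A against the environment's security group. It takes a group shaped like the EC2 DescribeSecurityGroups response and checks whether outbound TCP traffic on the database port is allowed to the RDS instance's group or to 0.0.0.0/0; other CIDR ranges are ignored for brevity, and the group IDs are hypothetical.

```python
def allows_egress(security_group: dict, port: int, dest_group_id: str) -> bool:
    """True if an egress rule permits TCP `port` to dest_group_id or 0.0.0.0/0."""
    for rule in security_group.get("IpPermissionsEgress", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        to_group = any(
            pair.get("GroupId") == dest_group_id
            for pair in rule.get("UserIdGroupPairs", [])
        )
        to_anywhere = any(
            ip_range.get("CidrIp") == "0.0.0.0/0" for ip_range in rule.get("IpRanges", [])
        )
        if to_group or to_anywhere:
            return True
    return False


https_only = {"IpPermissionsEgress": [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
]}
print(allows_egress(https_only, 3306, "sg-0db"))  # MySQL port not allowed
```

The RDS instance's security group must also allow inbound traffic on the same port from the environment.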

22

MCQ · hard

A company has a DynamoDB table with a global secondary index (GSI) for querying. The write capacity is provisioned at 1000 WCU, and the GSI has 500 WCU. During a traffic spike, writes to the table are throttled, but the GSI is not throttled. What is the MOST likely cause?

A. DynamoDB adaptive capacity is reducing the table's write capacity.

B. A hot partition in the table is exceeding its partition-level write limits.

C. The total write traffic to the table exceeds 1000 WCU.

D. The GSI is throttling writes because its write capacity is insufficient.

Answer: C

If the write traffic exceeds the provisioned 1000 WCU, the table will throttle writes regardless of the GSI capacity.

Why this answer

Option C is correct because the table throttles writes when total write traffic exceeds its provisioned 1,000 WCU, regardless of the GSI's capacity. Option B is wrong because a hot partition would throttle only requests to specific partition keys, while the scenario describes throttling of table writes in general during a traffic spike. Option D is wrong because the GSI is not being throttled; if it were, DynamoDB would also throttle writes to the base table, but the metrics rule that out.

Option A is wrong because adaptive capacity mitigates throttling; it does not reduce a table's capacity.
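Write traffic in WCU is not the same as writes per second: a standard (non-transactional) write consumes one WCU per 1 KB of item size, rounded up. This sketch does the arithmetic; the traffic figures in the example are hypothetical.

```python
import math


def required_wcu(writes_per_second: float, item_size_bytes: int) -> int:
    """WCU consumed by standard writes: one unit per started 1 KB of item size."""
    units_per_write = math.ceil(item_size_bytes / 1024)
    return math.ceil(writes_per_second * units_per_write)


print(required_wcu(600, 1500))
```

For example, 600 writes per second of 1.5 KB items need 1,200 WCU, which exceeds the table's 1,000 WCU even though there are fewer than 1,000 writes per second.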

23

MCQ · medium

A web application running on EC2 instances behind an Application Load Balancer (ALB) is experiencing intermittent 503 errors. The ALB target group health checks are succeeding. Which step should the developer take FIRST to diagnose the issue?

A. Increase the number of EC2 instances in the target group.

B. Examine the ALB access logs for 503 responses.

C. Check the Route 53 record for the ALB.

D. Verify that the EC2 instances are in a running state.

Answer: B

Access logs provide details on each request and can identify patterns causing 503s.

Why this answer

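Option B is correct because the ALB access logs record the status code, target, and processing times of every request, which can reveal when and why the 503s occur. Option D is wrong because the health checks are succeeding, so the instances are running. Option A is wrong because adding instances may not address the root cause.

Option C is wrong because requests are reaching the ALB, so DNS resolution is working; the problem lies between the ALB and its targets.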
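A sketch of a first pass through the access logs for Option B. It uses the field positions of the documented ALB access log format (type, time, elb, client:port, target:port, three processing times, elb_status_code, target_status_code, ...); the sample lines are hypothetical and truncated after the user agent, and real logs arrive as gzip files in S3.

```python
import shlex
from collections import Counter

# Zero-based positions of fields in an ALB access log entry.
TARGET, ELB_STATUS, TARGET_STATUS = 4, 8, 9


def summarize_503s(log_lines):
    """Count ALB 503 responses by target and by the target's own status code.

    A target of "-" means the ALB did not send the request to any target.
    """
    by_target, by_target_status = Counter(), Counter()
    for line in log_lines:
        fields = shlex.split(line)  # keeps quoted fields such as "request" intact
        if fields[ELB_STATUS] != "503":
            continue
        by_target[fields[TARGET]] += 1
        by_target_status[fields[TARGET_STATUS]] += 1
    return by_target, by_target_status


line_503 = ('http 2024-01-01T00:00:00.000000Z app/my-alb/50dc6c495c0c9188 '
            '198.51.100.7:52000 - -1 -1 -1 503 - 34 366 '
            '"GET http://example.com:80/ HTTP/1.1" "curl/8.0"')
line_200 = ('http 2024-01-01T00:00:01.000000Z app/my-alb/50dc6c495c0c9188 '
            '198.51.100.7:52001 10.0.1.15:80 0.001 0.020 0.000 200 200 34 512 '
            '"GET http://example.com:80/ HTTP/1.1" "curl/8.0"')
print(summarize_503s([line_503, line_200, line_503]))
```

A target of "-" together with a target_status_code of "-" suggests the ALB returned the 503 without reaching any target, which points toward target registration or capacity rather than application errors.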

24

MCQ · easy

A developer is deploying a serverless application using AWS SAM. The deployment fails with the error 'Resource creation cancelled'. What is the most likely cause?

A. The SAM template is malformed.

B. A resource in the stack failed to create.

C. The Lambda function code has a timeout.

D. The IAM role does not have sufficient permissions.

Answer: B

AWS CloudFormation cancels the creation of subsequent resources after a failure.

Why this answer

Option B is correct because 'Resource creation cancelled' typically indicates that the stack is rolling back after another resource failed to create. Option D is wrong because missing permissions cause access-denied errors. Option C is wrong because a function timeout would cause a different error.

Option A is wrong because a malformed template fails validation before any resources are created.

25

MCQ · easy

A company uses Amazon S3 to store sensitive data. A developer needs to ensure that all objects uploaded to a specific S3 bucket are encrypted at rest. Which approach should the developer take?

A. Add a bucket policy that denies PutObject if the object is not encrypted.

B. Use pre-signed URLs with server-side encryption parameters.

C. Attach an IAM policy to all users requiring them to include the x-amz-server-side-encryption header.

D. Enable default encryption on the bucket using SSE-S3.

Answer: A

A bucket policy with a condition key s3:x-amz-server-side-encryption can deny uploads without encryption.

Why this answer

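Option A is correct because a bucket policy can deny s3:PutObject whenever the request does not specify server-side encryption, using the s3:x-amz-server-side-encryption condition key. Non-compliant uploads are then rejected outright. Option D is wrong because default encryption only encrypts objects whose upload requests do not specify encryption. It never rejects a request, so it cannot enforce a required encryption setting the way an explicit Deny can. Option C is wrong because an IAM policy would have to be attached to every user and role, which is cumbersome and not bucket-specific.

Option B is wrong because pre-signed URLs do not enforce encryption by themselves. Uploads made through any other path are unaffected.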
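
The sketch below builds the bucket policy from Option A as a Python dictionary that could be serialized to JSON. It uses a `Null` condition to deny uploads that omit the `x-amz-server-side-encryption` header. The bucket name is hypothetical. Uploads that use SSE-C send a different header (`x-amz-server-side-encryption-customer-algorithm`), so this particular condition would reject them as well. The `is_denied` helper is a toy model of that single condition, not an IAM evaluator.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

# Deny any PutObject request that does not carry the
# x-amz-server-side-encryption header (used by SSE-S3 and SSE-KMS).
deny_unencrypted_uploads = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUploadsWithoutSSEHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "Null": {"s3:x-amz-server-side-encryption": "true"}
            },
        }
    ],
}


def is_denied(request_headers):
    """Toy check of the Null condition above: the Deny applies when the
    SSE header is absent from the request (header names are case-insensitive)."""
    return "x-amz-server-side-encryption" not in {h.lower() for h in request_headers}


print(json.dumps(deny_unencrypted_uploads, indent=2))
```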

26

Multi-Selecthard

A Lambda function reading from Kinesis is falling behind. Which two metrics/settings should be reviewed first?

Select 2 answers

A.IteratorAge for the event source mapping

B.S3 bucket public access settings

C.Route 53 hosted zone count

D.Batch size, parallelization factor, and shard count

AnswersA, D

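IteratorAge shows how far behind the consumer is. Batch size, parallelization factor, and shard count determine how quickly it can catch up.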

Why this answer

The IteratorAge metric measures how far behind the Lambda function is in processing records from the Kinesis stream. A high IteratorAge indicates the function is falling behind, making it the primary metric to review. The batch size, parallelization factor, and shard count directly control the concurrency and throughput of the event source mapping, so adjusting these settings can help catch up.
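
A rough capacity model shows why these settings matter. A Kinesis event source mapping processes up to `parallelization_factor` concurrent batches per shard, so throughput is roughly concurrency × batch size ÷ average batch duration. The sketch below is a simplified estimate under stated assumptions: it ignores batching windows, retries, and partially filled batches. All numbers in the example are made up.

```python
import math


def max_concurrency(shard_count, parallelization_factor):
    """Concurrent batches for a Kinesis event source mapping:
    up to `parallelization_factor` batches per shard (factor 1-10)."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return shard_count * parallelization_factor


def records_per_second(shard_count, parallelization_factor, batch_size, avg_batch_seconds):
    """Rough steady-state throughput; ignores batching windows, retries,
    and partially filled batches (simplifying assumptions)."""
    concurrency = max_concurrency(shard_count, parallelization_factor)
    return concurrency * batch_size / avg_batch_seconds


def seconds_to_drain(backlog_records, incoming_per_second, capacity_per_second):
    """Time to work off a backlog; infinite if capacity cannot outpace input,
    in which case IteratorAge keeps growing."""
    surplus = capacity_per_second - incoming_per_second
    return math.inf if surplus <= 0 else backlog_records / surplus


# Made-up numbers: 4 shards, 100-record batches taking 2 s each, 300 records/s arriving.
for factor in (1, 2):
    capacity = records_per_second(4, factor, 100, 2.0)
    print(factor, capacity, seconds_to_drain(60_000, 300, capacity))
```

With a factor of 1, the consumer handles about 200 records per second against 300 arriving, so the backlog never shrinks. Doubling the factor lets it catch up.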

Exam trap

The trap here is that candidates may overlook the direct performance-tuning metrics (IteratorAge, batch size, parallelization factor) and instead focus on unrelated AWS services like S3 or Route 53, which are red herrings in this troubleshooting context.

27

MCQeasy

A developer is deploying a CloudFormation stack and sees the event above. What should the developer do to fix the error?

A.Update the Lambda function code to use a different programming language.

B.Increase the Lambda function timeout in the template.

C.Change the runtime to a supported version like nodejs18.x.

D.Add permissions to the Lambda function's execution role.

AnswerC

Updating the runtime to a supported version resolves the error.

Why this answer

The error indicates that the Node.js 12.x runtime is deprecated. The developer should update the runtime to a supported version, such as nodejs18.x, in the CloudFormation template and redeploy. Option B is wrong because the timeout is not the issue.

Option A is wrong because the function code does not need to be rewritten in another language; only the runtime version is the problem. Option D is wrong because adding permissions won't fix an unsupported runtime.

28

MCQeasy

An application running on Amazon ECS with Fargate is unable to pull an image from Amazon ECR. The task definition uses the 'default' task execution role. What is the most likely cause?

A.The task role does not have permissions to access ECR.

B.The ECS cluster does not have permissions to access ECR.

C.The ECS service role does not have permissions to access ECR.

D.The task execution role does not have permissions to pull from ECR.

AnswerD

The execution role needs the ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage permissions.

Why this answer

Option D is correct because, with Amazon ECS on Fargate, the ECS agent uses the task execution role (not the task role) to pull container images from Amazon ECR. That role must allow ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage, which the AWS managed policy AmazonECSTaskExecutionRolePolicy grants. If the execution role referenced by the task definition lacks these permissions (for example, because that managed policy was never attached), the image pull fails. That makes missing execution-role permissions the most likely cause.
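
As a quick reference, the sketch below writes out an execution-role policy whose actions mirror the AWS managed policy AmazonECSTaskExecutionRolePolicy. It also includes a small helper that reports which of the four ECR pull actions a policy does not grant. The helper is a toy check under stated assumptions: it matches exact action names only and ignores wildcards, conditions, and Deny statements. In practice, attaching the managed policy is usually simpler than maintaining a hand-written copy.

```python
# Actions an ECS task execution role needs to pull a private ECR image.
REQUIRED_PULL_ACTIONS = {
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
}

# Pull actions plus the CloudWatch Logs actions used by the awslogs driver.
execution_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": sorted(REQUIRED_PULL_ACTIONS) + [
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        }
    ],
}


def missing_pull_actions(policy):
    """Return required ECR actions not granted by any Allow statement.

    Toy check: exact action names only (no wildcards, conditions, or Deny).
    """
    granted = set()
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        actions = statement["Action"]
        granted.update([actions] if isinstance(actions, str) else actions)
    return REQUIRED_PULL_ACTIONS - granted


print(missing_pull_actions(execution_role_policy))
```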

Exam trap

The trap here is that candidates often confuse the task execution role with the task role, assuming the task role handles all permissions including image pulling, when in fact the task execution role is a separate IAM role specifically required for ECR image pulls and CloudWatch Logs.

How to eliminate wrong answers

Option A is wrong because the task role is used by the application code running inside the container to interact with AWS services (e.g., DynamoDB, S3), not for pulling images from ECR; image pulling is handled by the ECS agent using the task execution role. Option B is wrong because an ECS cluster itself does not have an IAM role or permissions; permissions are assigned to the task execution role or the ECS service role, not to the cluster resource. Option C is wrong because the ECS service role (formerly ecsServiceRole) is used for actions like registering/deregistering targets with a load balancer, not for pulling container images from ECR; image pulling is exclusively the responsibility of the task execution role.

29

MCQmedium

A developer is troubleshooting an AWS Lambda function that is triggered by an Amazon SQS queue. The function processes messages but occasionally fails. The failed messages are not being sent to the dead-letter queue (DLQ). What is the most likely reason?

A.The Lambda function's execution role does not have permission to send messages to the DLQ.

B.The SQS queue's redrive policy is not configured.

C.The Lambda function's reserved concurrency is set to 0.

D.The Lambda function does not have a dead-letter queue configured.

AnswerB

For SQS event sources, failed messages reach a DLQ only through the source queue's redrive policy.

Why this answer

When Lambda reads from SQS through an event source mapping, a failed batch is not deleted. The messages become visible again after the visibility timeout and are retried. Once a message has been received more times than the queue's maxReceiveCount, SQS moves it to the dead-letter queue named in the source queue's redrive policy. If no redrive policy is configured, failed messages keep being retried until they succeed or reach the end of the queue's message retention period.

Option B is correct: the SQS queue's redrive policy is what sends failed messages to a DLQ. Option A is wrong because Lambda does not send the messages to the DLQ in this flow; SQS moves them itself. Option C is wrong because a reserved concurrency of 0 would stop all invocations, yet the function is processing messages.

Option D is wrong because a DLQ configured on the Lambda function applies only to asynchronous invocations, not to event source mappings such as SQS.
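
The sketch below shows the shape of the fix: a `RedrivePolicy` attribute set on the source queue. SQS expects this attribute as a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`, and it could be applied with the SetQueueAttributes API or the console. The ARN is hypothetical. The `ends_in_dlq` helper is a simplified model of the receive-count rule, not SQS behavior in every detail.

```python
import json

# Hypothetical ARN, for illustration only.
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"

# Attributes to set on the SOURCE queue (for example, with SetQueueAttributes).
# SQS expects RedrivePolicy as a JSON string.
source_queue_attributes = {
    "RedrivePolicy": json.dumps({"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": "5"})
}


def ends_in_dlq(attributes, failed_receives):
    """Simplified model: each failed Lambda batch leaves the message on the
    queue, and it becomes visible again after the visibility timeout. Once it
    has been received maxReceiveCount times without being deleted, SQS moves
    it to the DLQ instead of delivering it again."""
    policy = json.loads(attributes["RedrivePolicy"])
    return failed_receives >= int(policy["maxReceiveCount"])


print(ends_in_dlq(source_queue_attributes, 4), ends_in_dlq(source_queue_attributes, 5))
```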

30

MCQeasy

A developer receives an alert that an EC2 instance's status check fails. The instance is running, but the developer cannot SSH into it. What is the most likely cause?

A.The security group inbound rules are incorrect.

B.The instance has been terminated.

C.There is a problem with the underlying host.

D.The key pair is missing.

AnswerC

A system status check failure indicates a problem with the underlying host.

Why this answer

System status checks monitor the AWS infrastructure the instance runs on, such as network connectivity, system power, and hardware on the physical host. When the system status check fails, the instance can still show as 'running' but stop responding to network traffic, so SSH fails. Of the options given, a problem with the underlying host is the only one that explains both the failed status check and the loss of SSH access. An instance status check failure, by contrast, points to problems inside the instance itself, such as a misconfigured operating system or a corrupted file system.

Exam trap

The trap here is that candidates confuse a status check failure with a network configuration issue (security group) or authentication problem (key pair), but status checks are internal health probes that do not depend on security group rules or SSH keys.

How to eliminate wrong answers

Option A is wrong because incorrect security group inbound rules would block SSH traffic at the network level, but the instance would still pass its status checks (status checks test the instance's internal health, not external network access). Option B is wrong because if the instance were terminated, it would not appear as 'running' in the console and the status check would show 'insufficient data' or 'not applicable', not a failure. Option D is wrong because a missing key pair would prevent SSH authentication, but the instance would still be reachable and pass status checks; the key pair is only used for authentication, not for instance health or network connectivity.

31

MCQmedium

A developer monitors an AWS Lambda function that processes messages from an Amazon SQS queue. CloudWatch logs show that the function's execution time has increased significantly over the past week. The function's code has not been changed recently. The function makes calls to an Amazon DynamoDB table. CloudWatch metrics show a high rate of DynamoDBProvisionedThroughputExceededException errors. The DynamoDB table has 5 read and 5 write capacity units (RCU/WCU). What is the most effective action to reduce the function's execution time?

A.Increase the Lambda function's memory allocation.

B.Increase the Lambda function's reserved concurrency.

C.Increase the DynamoDB table's read and write capacity units.

D.Increase the Lambda function's timeout.

AnswerC

Raising the provisioned capacity reduces the frequency of throttling exceptions. With fewer throttles, the function's retries decrease, leading to faster execution and lower overall latency.

Why this answer

Option C is correct because the high rate of ProvisionedThroughputExceededException errors shows that the table's read and write capacity units are insufficient, so DynamoDB is throttling the function's requests. The AWS SDKs retry throttled requests with exponential backoff, which significantly increases execution time. Raising the RCU/WCU from 5 to a value that matches the actual request rate removes the bottleneck. Requests then complete without retries, and overall execution time drops.
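
Sizing the new capacity is simple arithmetic. One WCU covers one write per second for an item up to 1 KB. One RCU covers one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB. The sketch below applies those rules. The request rates and item sizes in the example are made up, and transactional reads and writes, which cost double, are not modeled.

```python
import math


def required_wcu(writes_per_second, item_size_kb):
    # One WCU = one write per second for an item up to 1 KB (round size up).
    return writes_per_second * math.ceil(item_size_kb / 1)


def required_rcu(reads_per_second, item_size_kb, strongly_consistent=True):
    # One RCU = one strongly consistent read per second for an item up to 4 KB,
    # or two eventually consistent reads per second.
    total = reads_per_second * math.ceil(item_size_kb / 4)
    return total if strongly_consistent else math.ceil(total / 2)


# Made-up workload: 20 writes/s of 2.5 KB items, 10 reads/s of 6 KB items.
print(required_wcu(20, 2.5), required_rcu(10, 6), required_rcu(10, 6, strongly_consistent=False))
```

Against the 5 WCU in the question, a workload like this one (60 WCU) would be throttled almost constantly.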

Exam trap

The trap here is that candidates often confuse performance issues caused by Lambda resource limits (memory, concurrency, timeout) with downstream service throttling, leading them to adjust Lambda settings instead of addressing the root cause in DynamoDB capacity.

How to eliminate wrong answers

Option A is wrong because increasing memory allocation improves CPU performance and execution speed for compute-bound tasks, but the issue here is a DynamoDB throughput limitation, not a lack of compute resources. Option B is wrong because reserved concurrency controls how many concurrent Lambda invocations are allowed, which does not affect the per-invocation execution time or resolve DynamoDB throttling errors. Option D is wrong because increasing the timeout only allows the function to run longer before being terminated, but it does not reduce the actual time taken to process each message; the function will still be delayed by DynamoDB retries.

32

MCQeasy

A developer reports that an AWS Lambda function is timing out after 3 seconds. The function reads from an Amazon SQS queue. What is the most likely cause?

A.The Lambda function memory is set too low, causing slow execution.

B.The Lambda function timeout is set to 3 seconds, which is too low.

C.The Lambda execution role lacks permissions to poll SQS.

D.The SQS queue is empty, causing the function to wait indefinitely.

AnswerB

The default timeout is 3 seconds; increasing it resolves the timeout.

Why this answer

The Lambda function is timing out after exactly 3 seconds because its configured timeout is set to 3 seconds, which is too low for the workload. Lambda has a maximum execution timeout of 15 minutes (900 seconds), but the default timeout is 3 seconds. Since the function reads from an SQS queue, it likely needs more time to process messages, and increasing the timeout value will resolve the issue.
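
Raising the timeout is the fix. As an additional safeguard, an SQS handler can check how much time it has left and hand unprocessed messages back instead of being cut off mid-batch. The sketch below uses the Python Lambda context method `get_remaining_time_in_millis()` and the `batchItemFailures` response format. It assumes the event source mapping has ReportBatchItemFailures enabled. `SAFETY_MARGIN_MS`, `process`, and `FakeContext` are hypothetical names for illustration.

```python
SAFETY_MARGIN_MS = 5_000  # hypothetical buffer before the configured timeout


def process(body):
    # Placeholder for the real per-message work.
    return body.upper()


def handler(event, context):
    """Process an SQS batch, stopping early when the timeout is close.

    Assumes the event source mapping has ReportBatchItemFailures enabled.
    Without it, Lambda treats a normal return as success for the whole
    batch and deletes every message, including the ones skipped here.
    """
    failures = []
    for record in event["Records"]:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Not enough time left: report as failed so SQS redelivers it.
            failures.append({"itemIdentifier": record["messageId"]})
            continue
        try:
            process(record["body"])
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


class FakeContext:
    """Local stand-in for the Lambda context object (testing only)."""

    def __init__(self, remaining_ms):
        self._remaining = list(remaining_ms)

    def get_remaining_time_in_millis(self):
        return self._remaining.pop(0)


sample_event = {"Records": [{"messageId": "m1", "body": "a"}, {"messageId": "m2", "body": "b"}]}
print(handler(sample_event, FakeContext([60_000, 1_000])))
```

AWS also recommends setting the source queue's visibility timeout to at least six times the function timeout, so that retried messages do not become visible again while a batch is still running.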

Exam trap

The trap here is that candidates often confuse timeout with memory or permissions issues, but the exact 3-second timeout is a direct indicator of the default Lambda timeout being too low, not a resource or authorization problem.

How to eliminate wrong answers

Option A is wrong because low memory can slow execution, but it would not produce a timeout at exactly 3 seconds; memory affects performance, not the timeout limit. Option C is wrong because, without permission to poll SQS, the event source mapping could not read from the queue at all. Creating or enabling the trigger would fail with a permissions error, so the function would never be invoked rather than time out. Option D is wrong because an empty SQS queue does not make the function wait. The event source mapping polls the queue and invokes the function only when there are messages to process.

33

MCQeasy

A developer is using Amazon DynamoDB as the database for a web application. The application experiences occasional spikes in traffic, and some write requests fail with a ProvisionedThroughputExceededException. What is the MOST cost-effective way to handle these spikes without manual intervention?

A.Switch to on-demand mode for the table.

B.Enable DynamoDB auto scaling for the table.

C.Increase the provisioned write capacity to the peak expected value.

D.Use DynamoDB Accelerator (DAX) to cache writes.

AnswerB

Auto scaling adjusts capacity dynamically.

Why this answer

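Option B is correct because DynamoDB auto scaling adjusts provisioned capacity to actual traffic. It handles spikes without manual intervention while keeping capacity low the rest of the time. Option A is wrong because, although on-demand mode also absorbs spikes, provisioned capacity with auto scaling is typically cheaper for traffic that is mostly steady with occasional spikes. Option C is wrong because provisioning for the peak pays for unused capacity most of the time.

Option D is wrong because DAX is a cache: writes still go to the table, so it does not increase write throughput.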

34

Multi-Selectmedium

A developer is troubleshooting a slow web application. The application uses an Application Load Balancer, EC2 instances, and an RDS database. The developer suspects the database is the bottleneck. Which TWO CloudWatch metrics should the developer examine to confirm this? (Select TWO.)

Select 2 answers

A.DatabaseConnections

B.RequestCount

C.NetworkIn

D.ReadIOPS and WriteIOPS

E.CPUUtilization

AnswersA, D

High connection count can indicate the database is overwhelmed.

Why this answer

Options A and D are correct. DatabaseConnections shows how many connections the database is handling. ReadIOPS and WriteIOPS show disk I/O activity, which can cause slowness as it approaches the storage's limits. Option E (CPUUtilization) is a general metric; for confirming a database bottleneck, connections and I/O are more indicative.

Option C (NetworkIn) reflects network load rather than database contention. Option B (RequestCount) is an ALB metric, not a database metric.

35

Multi-Selectmedium

A company uses AWS Lambda functions that are invoked by an Amazon S3 bucket notification. The function sometimes fails with a 'ResourceNotFoundException' for the S3 bucket. Which THREE steps should the developer take to resolve the issue?

Select 3 answers

A.Ensure the S3 bucket has versioning enabled.

B.Check the S3 bucket notification configuration for the correct Lambda function ARN.

C.Verify that the Lambda execution role has permissions to read from the S3 bucket.

D.Increase the Lambda function timeout.

E.Confirm that the S3 bucket exists and is in the same region as the Lambda function.

AnswersB, C, E

Incorrect ARN can cause the function not to be invoked.

Why this answer

Option B is correct because S3 invokes whichever function is listed in the bucket's notification configuration. If that ARN is outdated (for example, it points to a deleted or renamed function), events are not delivered to the function that is expected to handle them. Option C is correct because the function's S3 calls succeed only if its execution role allows them, for example s3:GetObject on the bucket's objects. Option E is correct because the error names the S3 bucket itself. The function's requests fail if no bucket exists under the name it uses, and S3 event notifications can invoke a Lambda function only in the same Region as the bucket.
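
One practical safeguard is to take the bucket name and key from the notification event instead of hard-coding them. The sketch below extracts the Region, bucket, and key from each record of an S3 notification event. It uses the standard record fields `awsRegion`, `s3.bucket.name`, and `s3.object.key`, and URL-decodes the key, which S3 delivers encoded. The sample event is made up and trimmed to the fields used.

```python
from urllib.parse import unquote_plus


def s3_targets(event):
    """Return (region, bucket, key) for each record in an S3 notification event.

    Reading the bucket name from the event, rather than hard-coding it,
    avoids calling a bucket that does not exist under the expected name.
    """
    targets = []
    for record in event.get("Records", []):
        targets.append((
            record["awsRegion"],
            record["s3"]["bucket"]["name"],
            # Object keys arrive URL-encoded (for example, spaces become '+').
            unquote_plus(record["s3"]["object"]["key"]),
        ))
    return targets


# Trimmed-down, made-up notification event with only the fields used above.
sample_event = {
    "Records": [{
        "awsRegion": "us-east-1",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "reports/q1+summary.csv"},
        },
    }]
}

for region, bucket, key in s3_targets(sample_event):
    print(region, bucket, key)
```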

Exam trap

The trap here is choosing Option A or Option D: versioning and the function timeout have nothing to do with a resource that cannot be found. The correct steps all check that the resources involved exist, are referenced correctly, and are accessible.

36

MCQhard

A web application runs on Amazon EC2 instances behind an Application Load Balancer (ALB). During peak hours, users report receiving HTTP 503 (Service Unavailable) errors. The developer checks Amazon CloudWatch metrics and finds that the ALB's request count is high but below the limit, and the target group's healthy host count drops to zero intermittently. The Auto Scaling group for the instances is configured with a minimum of 2, maximum of 10, and a simple scaling policy to add 2 instances when CPU utilization exceeds 70% for 5 consecutive minutes. What is the most likely cause of the 503 errors?

A.The Auto Scaling group's cooldown period prevents new instances from being added quickly enough during rapid traffic spikes

B.The ALB's idle timeout is set too low, causing dropped connections

C.The Auto Scaling group's maximum capacity of 10 is insufficient

D.The health check grace period is preventing instances from being marked healthy

AnswerA

After a scaling activity, the cooldown period (300s by default) pauses further scaling, causing delays that can result in all instances becoming unhealthy and returning 503 errors.

Why this answer

The 503 errors occur because the simple scaling policy has a cooldown period (default 300 seconds) that prevents the Auto Scaling group from launching new instances during rapid traffic spikes. When CPU exceeds 70% for 5 minutes, the policy adds 2 instances, but the cooldown blocks further scaling actions until it expires, even if the newly launched instances are still initializing and the healthy host count drops to zero. This mismatch between traffic demand and scaling responsiveness causes the ALB to have no healthy targets, resulting in 503 errors.
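
A toy minute-by-minute model makes the delay visible. The policy in the question adds 2 instances after CPU has exceeded 70% for 5 consecutive minutes, with a 300-second (five-minute) cooldown. The sketch below models that policy as a simplified illustration. It ignores instance boot and warm-up time, so real delays are longer, and the load profile is made up.

```python
def simulate_simple_scaling(cpu_by_minute, start=2, max_size=10, step=2,
                            breach_minutes=5, cooldown_minutes=5, threshold=70):
    """Toy minute-by-minute model of the simple scaling policy in the question.

    Capacity grows by `step` once CPU has been above `threshold` for
    `breach_minutes` consecutive minutes, but only if no cooldown is active.
    Instance boot and warm-up time are ignored, so real delays are longer.
    Returns the desired capacity after each minute.
    """
    capacity, breached, cooldown_until = start, 0, 0
    timeline = []
    for minute, cpu in enumerate(cpu_by_minute):
        breached = breached + 1 if cpu > threshold else 0
        can_scale = minute >= cooldown_until and capacity < max_size
        if breached >= breach_minutes and can_scale:
            capacity = min(capacity + step, max_size)
            cooldown_until = minute + cooldown_minutes
        timeline.append(capacity)
    return timeline


# Fifteen minutes of sustained 90% CPU (made-up load).
print(simulate_simple_scaling([90] * 15))
print(simulate_simple_scaling([90] * 15, cooldown_minutes=0))
```

With the cooldown, the group reaches only 8 instances after 15 minutes of overload. Without it, the group would hit the maximum of 10 within 8 minutes. AWS documentation generally recommends target tracking or step scaling policies over simple scaling for workloads like this.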

Exam trap

The trap here is that candidates often assume 503 errors are always due to capacity limits (Option C) or misconfigured health checks (Option D), but the real issue is the cooldown period's impact on scaling responsiveness during rapid traffic spikes.

How to eliminate wrong answers

Option B is wrong because the ALB's idle timeout (default 60 seconds) controls how long the ALB keeps an idle connection open. It does not affect target health status or cause the healthy host count to drop. Option C is wrong because the maximum capacity of 10 is not the issue: the healthy host count drops to zero intermittently, which points to a scaling responsiveness problem, not a capacity ceiling. Option D is wrong because the health check grace period only delays when Auto Scaling starts acting on health check results for newly launched instances. It does not make already-running instances unhealthy, so it cannot explain intermittent drops of the healthy host count to zero.

37

MCQmedium

An IAM user has the above policy attached. The user tries to stop an EC2 instance. What happens?

A.The stop operation fails because ec2:StopInstances is not allowed explicitly.

B.The stop operation fails because of the Deny statement.

C.The stop operation fails because the policy is invalid (conflict).

D.The stop operation succeeds.

AnswerD

Allow for StopInstances, no Deny.

Why this answer

Option D is correct because the Allow statement explicitly allows ec2:StopInstances for all resources, and the Deny statement applies only to ec2:TerminateInstances. Option A is incorrect because StopInstances is explicitly allowed. Option B is incorrect because there is no Deny for StopInstances.

Option C is incorrect because the policy is valid; an Allow and a Deny on different actions do not conflict.
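
IAM evaluation for a single identity-based policy follows a short rule: an explicit Deny wins, then an explicit Allow, and anything else is implicitly denied. The sketch below applies that rule to a policy reconstructed from this explanation, since the original exhibit is not shown. It is a toy evaluator under stated assumptions: it looks only at actions (with `*` and `?` wildcards) and ignores resources, conditions, and all other policy types.

```python
from fnmatch import fnmatchcase

# Reconstructed from the explanation above; the original exhibit is not shown.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:StopInstances", "Resource": "*"},
        {"Effect": "Deny", "Action": "ec2:TerminateInstances", "Resource": "*"},
    ],
}


def _action_matches(patterns, action):
    """IAM action names are case-insensitive; '*' and '?' act as wildcards."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def evaluate(policy, action):
    """Toy evaluation of a single identity-based policy: an explicit Deny wins,
    then an explicit Allow, otherwise the request is implicitly denied.
    Resources, conditions, and all other policy types are ignored."""
    relevant = [s for s in policy["Statement"] if _action_matches(s["Action"], action)]
    if any(s["Effect"] == "Deny" for s in relevant):
        return "Deny"
    if any(s["Effect"] == "Allow" for s in relevant):
        return "Allow"
    return "ImplicitDeny"


for action in ("ec2:StopInstances", "ec2:TerminateInstances", "ec2:RunInstances"):
    print(action, evaluate(policy, action))
```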

38

MCQhard

An application running on EC2 instances behind an Application Load Balancer (ALB) occasionally returns HTTP 503 errors. The instances are in an Auto Scaling group. Which action should be taken to resolve this issue?

A.Enable cross-zone load balancing on the ALB.

B.Review the ALB access logs to identify the target response codes.

C.Increase the ALB idle timeout setting.

D.Increase the size of the EC2 instances.

AnswerB

Access logs show whether the 503 is from targets or the ALB, guiding further action.

Why this answer

An HTTP 503 from an ALB can come from two places. The load balancer can generate it itself (for example, when the target group has no registered or healthy targets), or a target can return it and the ALB passes it through. The access logs record both the elb_status_code and the target_status_code for every request, so they show which component produced each 503. That helps pinpoint whether the issue is overloaded instances, application errors, or health check failures. This diagnostic step is essential before making any configuration changes.
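
The sketch below is a minimal log classifier built on that distinction. It assumes the standard ALB access log layout, in which `elb_status_code` and `target_status_code` are the 9th and 10th space-separated fields and a missing target response is logged as `-`. The sample entries are made up and truncated; real entries carry more trailing fields, which the parser ignores.

```python
import shlex
from collections import Counter

# 0-based positions of the status-code fields in an ALB access log entry
# (type, time, elb, client:port, target:port, 3 timing fields, then these).
ELB_STATUS_CODE = 8
TARGET_STATUS_CODE = 9


def classify_503s(lines):
    """Count 503 responses by where they came from.

    "target": the target itself returned 503.
    "alb":    the ALB generated the 503 without a target response ("-").
    "other":  the ALB returned 503 while the target sent some other code.
    """
    counts = Counter()
    for line in lines:
        fields = shlex.split(line)  # keeps quoted fields such as "request" intact
        if len(fields) <= TARGET_STATUS_CODE or fields[ELB_STATUS_CODE] != "503":
            continue
        target_code = fields[TARGET_STATUS_CODE]
        if target_code == "503":
            counts["target"] += 1
        elif target_code == "-":
            counts["alb"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up, truncated log entries (real entries carry more trailing fields).
sample_lines = [
    'http 2024-01-01T12:00:00.000000Z app/my-alb/abc 203.0.113.10:5000 - '
    '-1 -1 -1 503 - 120 272 "GET http://example.com:80/ HTTP/1.1" "curl/8.0"',
    'http 2024-01-01T12:00:01.000000Z app/my-alb/abc 203.0.113.10:5001 10.0.1.5:80 '
    '0.001 0.020 0.000 503 503 120 300 "GET http://example.com:80/ HTTP/1.1" "curl/8.0"',
    'http 2024-01-01T12:00:02.000000Z app/my-alb/abc 203.0.113.10:5002 10.0.1.5:80 '
    '0.001 0.015 0.000 200 200 120 512 "GET http://example.com:80/ HTTP/1.1" "curl/8.0"',
]

print(classify_503s(sample_lines))
```

If most 503s are ALB-generated, look at target registration and health. If most come from the targets, look at the application itself.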

Exam trap

The trap here is that candidates often jump to scaling or instance size changes (Option D) without first using access logs to diagnose whether the 503s originate from the ALB or the targets, leading to ineffective fixes.

How to eliminate wrong answers

Option A is wrong because cross-zone load balancing is enabled by default on ALBs and affects traffic distribution across Availability Zones, not the root cause of 503 errors from unresponsive targets. Option C is wrong because the ALB idle timeout setting controls how long the ALB keeps a connection open without data transfer; increasing it does not resolve 503 errors caused by target failures or overload. Option D is wrong because simply increasing EC2 instance size may mask the problem but does not address the underlying cause (e.g., application bugs, scaling policies, or health check misconfigurations) and could lead to unnecessary cost.

39

MCQmedium

Refer to the exhibit. A developer invoked a Lambda function and received this response. What does the FunctionError field indicate?

A.The function executed successfully.

B.The function threw an unhandled exception.

C.The function was throttled.

D.The function timed out.

AnswerB

Unhandled means the error was not caught.

Why this answer

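Option B is correct because FunctionError: Unhandled indicates that the function threw an exception that was not caught by the code. Option A is wrong because a StatusCode of 200 only means the invocation request succeeded; the FunctionError field shows that the function code itself failed. Option C is wrong because a throttled invocation is rejected with HTTP 429 instead of returning a function result.

Option D is wrong because the FunctionError field alone does not identify a timeout. A timeout is identified by the error message in the response payload (for example, 'Task timed out after ... seconds').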

40

MCQmedium

A developer notices that an AWS Lambda function configured with a VPC is timing out when trying to access an Amazon S3 bucket. The function has the necessary IAM permissions. What is the most likely cause?

A.Lambda functions cannot be configured inside a VPC.

B.The Lambda function's execution role lacks S3 permissions.

C.The Lambda function does not have a route to the internet or a VPC endpoint for S3.

D.The security group attached to the Lambda function does not allow outbound traffic to S3.

AnswerC

Without a NAT gateway/instance or VPC endpoint, the function cannot reach S3 over the internet.

Why this answer

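A Lambda function attached to a VPC has no internet access, even in a public subnet, unless its subnets route outbound traffic through a NAT gateway or NAT instance. To access S3, the function needs either a gateway VPC endpoint for S3 in its subnets' route tables or a route to the internet through a NAT device. The IAM permissions are not the issue.

Option B is wrong because the function already has the necessary IAM permissions, and a permissions problem would produce an access-denied error rather than a timeout. Option D is wrong because security groups allow all outbound traffic by default, and even with outbound traffic allowed there is still no route to S3. Option C is correct.

Option A is wrong because Lambda functions can be attached to a VPC; they just need proper routing.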
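
The routing requirement can be expressed as a small check over a route table. The sketch below mimics the `Routes` entries of EC2 DescribeRouteTables output (`DestinationCidrBlock`, `DestinationPrefixListId`, `GatewayId`, `NatGatewayId`). It is a toy model under stated assumptions: it considers only a gateway endpoint route for the S3 prefix list and a default route through a NAT gateway, not NAT instances or S3 interface endpoints. All IDs are placeholders.

```python
def can_reach_s3(routes, s3_prefix_list_id):
    """Toy check: can a Lambda function in a subnet with these routes reach S3?

    `routes` mimics the "Routes" entries of EC2 DescribeRouteTables output.
    Only two paths are modeled: a gateway VPC endpoint route for the S3
    prefix list, or a default route through a NAT gateway. A default route
    to an internet gateway is not enough, because Lambda's network
    interfaces in a VPC do not receive public IP addresses.
    """
    for route in routes:
        gateway = route.get("GatewayId") or ""
        if route.get("DestinationPrefixListId") == s3_prefix_list_id and gateway.startswith("vpce-"):
            return True
        if route.get("DestinationCidrBlock") == "0.0.0.0/0" and route.get("NatGatewayId"):
            return True
    return False


# Made-up route tables; the IDs are placeholders.
S3_PREFIX_LIST = "pl-0example"
local = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}
public_subnet = [local, {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"}]
with_nat = [local, {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"}]
with_endpoint = [local, {"DestinationPrefixListId": S3_PREFIX_LIST, "GatewayId": "vpce-0example"}]

for name, table in [("public subnet", public_subnet), ("NAT", with_nat), ("endpoint", with_endpoint)]:
    print(name, can_reach_s3(table, S3_PREFIX_LIST))
```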

41

MCQmedium

A developer deploys an application on EC2 instances behind an Application Load Balancer (ALB). The application uses sticky sessions (session affinity) based on a cookie. Users report that they are intermittently logged out during their session. What is the MOST likely cause?

A.The deregistration delay value is too low, causing connections to be dropped during scaling events.

B.The ALB health check interval is too short, causing healthy instances to be marked unhealthy frequently.

C.Cross-zone load balancing is disabled, causing uneven traffic distribution.

D.The stickiness cookie expiration duration is set too low, causing the cookie to expire before the user's session ends.

AnswerD

Short cookie duration causes loss of stickiness.

Why this answer

Option D is correct because a stickiness cookie that expires before the user's session ends lets the load balancer route the user to a different instance, which does not hold the session state. Option C is wrong because cross-zone load balancing affects how traffic is distributed across Availability Zones, not stickiness. Option A is wrong because deregistration delay affects connection draining for instances being removed, not stickiness.

Option B is wrong because health checks do not remove or expire stickiness cookies.

42

MCQhard

A company runs a production web application on EC2 instances behind an Application Load Balancer. Users report intermittent 502 errors. The developers find that the ALB access logs show 'target_response_code' of 502 for some requests. What is the MOST likely cause?

A.The EC2 instances are unable to resolve DNS for the ALB.

B.The security group for the EC2 instances is blocking traffic from the ALB.

C.The EC2 instances are closing idle connections prematurely due to a short keep-alive timeout.

D.The ALB health checks are failing and the target group has unhealthy instances.

AnswerC

If the EC2 instance closes the connection before the ALB finishes sending the request, the ALB returns a 502.

Why this answer

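Option C is correct because a 502 from the ALB indicates that the target closed the connection before the ALB could finish writing the request or reading the response. A common cause is an application keep-alive timeout shorter than the ALB's idle timeout. Setting the application's keep-alive timeout higher than the ALB idle timeout prevents it. Option B is wrong because a security group blocking ALB traffic would cause failed health checks (503) or connection timeouts (504). Option D is wrong because failing health checks leave no healthy targets, which produces 503 errors.

Option A is wrong because DNS resolution is not involved in ALB-to-target communication.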

43

MCQhard

A developer is using AWS CodePipeline to automate a multi-stage pipeline. The pipeline includes a manual approval step before deploying to production. The developer wants to receive an email notification when the pipeline reaches the approval step. Which service should the developer use?

A.Configure CodePipeline to send an email using the 'Email' action

B.Use Amazon CloudWatch Logs to monitor the pipeline logs and trigger an alarm

C.Use Amazon Simple Email Service (SES) to send an email from the pipeline

D.Use Amazon CloudWatch Events to detect the pipeline state change and trigger an SNS notification

AnswerD

CloudWatch Events can monitor pipeline stage transitions and publish to an SNS topic.

Why this answer

Option D is correct because Amazon CloudWatch Events (now Amazon EventBridge) can detect the pipeline's state change when the approval action starts and route it to an SNS topic, which emails its subscribers. Option A is wrong because CodePipeline has no 'Email' action. Option B is wrong because CloudWatch Logs is for logging, not for reacting to pipeline state changes.

Option C is wrong because SES would require custom code to send the email; CloudWatch Events with SNS is the standard approach.
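
The sketch below shows the kind of event pattern such a rule might use, written as a Python dictionary, plus a toy matcher for testing it locally. The field names follow the CodePipeline "Action Execution State Change" event as the author understands it (`detail.state`, `detail.type.category`). Treat them as assumptions and compare them against a real event from your pipeline before relying on them. The matcher supports only a small subset of EventBridge pattern syntax, and the sample event is made up.

```python
# Event pattern for an EventBridge (CloudWatch Events) rule whose target
# would be an SNS topic with an email subscription.
approval_started_pattern = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Action Execution State Change"],
    "detail": {
        "state": ["STARTED"],
        "type": {"category": ["Approval"]},
    },
}


def matches(pattern, event):
    """Toy matcher: supports only nested objects and lists of exact values,
    a small subset of EventBridge pattern syntax."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(expected, dict):
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif value not in expected:
            return False
    return True


# Made-up event trimmed to the fields used above plus a few for context.
sample_event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Action Execution State Change",
    "detail": {
        "pipeline": "my-pipeline",
        "stage": "ApproveProduction",
        "action": "ManualApproval",
        "state": "STARTED",
        "type": {"owner": "AWS", "category": "Approval", "provider": "Manual", "version": "1"},
    },
}

print(matches(approval_started_pattern, sample_event))
```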

44

MCQhard

A developer is using Amazon S3 Transfer Acceleration to upload a large file. The upload is slower than expected. Which metric should the developer check to determine if Transfer Acceleration is providing a benefit?

A.CloudWatch metric 'Requests' for the S3 bucket.

B.CloudWatch metric 'BytesDownloaded' for the S3 bucket.

C.CloudWatch metric 'TotalRequestLatency' for the S3 bucket.

D.CloudWatch metric 'FirstByteLatency' for the S3 bucket.

AnswerC

This metric shows the time for a complete upload, indicating acceleration benefit.

Why this answer

TotalRequestLatency measures the time taken for a complete S3 request, including the time to send the request to the S3 endpoint and receive the response. For S3 Transfer Acceleration, this metric reflects the end-to-end latency improvement achieved by routing traffic through AWS edge locations and the optimized network path. A lower TotalRequestLatency compared to a non-accelerated upload indicates that Transfer Acceleration is providing a benefit.

Exam trap

The trap here is confusing 'FirstByteLatency' (the time until S3 starts returning a response) with 'TotalRequestLatency' (the full request duration, including transferring the uploaded data). Candidates who make this mistake select D when they should focus on the end-to-end time for uploads.

How to eliminate wrong answers

Option A is wrong because the 'Requests' metric simply counts the number of requests made to the bucket, which does not indicate whether Transfer Acceleration is improving upload speed. Option B is wrong because 'BytesDownloaded' tracks data downloaded from the bucket, not uploaded, and is irrelevant to upload acceleration. Option D is wrong because 'FirstByteLatency' measures the time to receive the first byte of a response, which is more relevant to read operations and does not capture the total upload duration that Transfer Acceleration optimizes.

45

MCQmedium

A developer has an AWS Lambda function that processes messages from an Amazon SQS queue. The function is configured with a reserved concurrency of 5. Recently, the SQS queue has experienced a high volume of messages, and the developer notices that many invocations are being throttled, leading to increased processing time. What is the most likely cause of the throttling?

A.The function's execution role lacks permissions to invoke the function.

B.The reserved concurrency is too low, causing SQS to throttle Lambda invocations.

C.The SQS queue visibility timeout is set too high.

D.The Lambda function has a VPC configuration that causes cold starts.

AnswerB

Reserved concurrency caps the number of concurrent executions. When the queue has many messages, SQS tries to invoke Lambda concurrently, but if the reserved limit is reached, invocations are throttled, delaying processing.

Why this answer

The correct answer is B because reserved concurrency limits the maximum number of concurrent executions for a Lambda function. When the SQS queue has a high volume of messages, Lambda attempts to scale up to process them, but with a reserved concurrency of 5, it can only run 5 concurrent invocations. Any additional invocation requests are throttled with a 429 error, causing messages to remain in the queue and increasing processing time.

Exam trap

The trap here is that candidates often confuse throttling (due to concurrency limits) with cold starts (due to VPC or initialization delays), or they mistakenly think that SQS itself throttles Lambda invocations rather than understanding that Lambda's reserved concurrency is the bottleneck.

How to eliminate wrong answers

Option A is wrong because the execution role's permissions affect whether the function can access other AWS services (like SQS or CloudWatch), not whether Lambda itself can invoke the function; invocation permissions are controlled by the resource-based policy or the SQS trigger configuration. Option C is wrong because a high visibility timeout would cause messages to become invisible for longer after being polled, potentially leading to duplicate processing or delays, but it does not cause throttling of Lambda invocations. Option D is wrong because VPC configuration can cause cold starts due to ENI creation delays, but cold starts affect latency on the first invocation, not throttling due to concurrency limits.

46

MCQhard

A developer optimized an Amazon S3 bucket for high request rates. The bucket receives over 5,000 PUT requests per second. Recently, some requests are failing with a 503 Slow Down error. What is the most likely cause and how should the developer fix it?

A.Use multipart upload for all objects to improve throughput.

B.The request rate exceeds the account-level PUT quota; request a quota increase.

C.The bucket policy is too permissive; restrict access to prevent abuse.

D.Add a random prefix to the object keys to distribute across partitions.

AnswerD

Spreading keys across more prefixes raises the aggregate request rate S3 can serve, reducing 503 Slow Down errors.

Why this answer

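Option D is correct because S3 returns 503 Slow Down when the request rate against a prefix exceeds what that prefix currently supports (at least 3,500 PUT/COPY/POST/DELETE requests per second per partitioned prefix). Spreading keys across multiple prefixes lets S3 serve each prefix's share of the load in parallel. Option A is wrong because multipart upload improves throughput for large objects. It does not raise the request-rate limit, and it actually adds requests, one per part.

Option B is wrong because S3 has no account-level PUT quota to raise; the limit applies per prefix and scales as S3 partitions the workload. Option C is wrong because 503 Slow Down is a throttling response, not an authorization problem, so tightening the bucket policy does not address it.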
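
This is a minimal Python sketch of one way to spread keys across prefixes, assuming the application controls how keys are named. It derives a deterministic hash prefix rather than a purely random one, so the full key can always be recomputed from the original name. The helper name, the prefix count of 16, and the `xx/` layout are illustrative choices, not an S3 requirement.

```python
import hashlib


def prefixed_key(original_key: str, num_prefixes: int = 16) -> str:
    """Return original_key under a short hash-derived prefix such as '0a/'.

    Hypothetical helper: the prefix count and 'xx/' layout are illustrative.
    The same input always maps to the same prefix, so keys stay addressable.
    """
    digest = hashlib.sha256(original_key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_prefixes
    return f"{shard:02x}/{original_key}"


print(prefixed_key("uploads/2024/photo.jpg"))
```

Because the prefix is derived from the key, readers and writers agree on it without a lookup table. The gain only materializes if traffic is actually spread across the prefixes.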

47

MCQhard

A developer is using Amazon CloudFront to distribute content from an S3 bucket. The bucket is configured as an origin with Origin Access Control (OAC). Recently, some users have reported that they receive 403 Forbidden errors when accessing certain objects. The developer checks the CloudFront distribution and confirms that the OAC is set up correctly. The S3 bucket policy allows the CloudFront service principal to get objects. The developer also notes that the objects in question have been updated recently. What is the MOST likely cause of the 403 errors?

A.The objects are encrypted with SSE-C (server-side encryption with customer-provided keys).

B.The OAC configuration is not correctly associated with the CloudFront distribution.

C.The S3 bucket policy denies access to the CloudFront service principal.

D.The CloudFront distribution is configured to use the S3 website endpoint instead of the REST endpoint.

AnswerA

CloudFront cannot decrypt objects encrypted with SSE-C, resulting in 403 errors.

Why this answer

Option A is correct because CloudFront cannot supply the customer-provided key that S3 requires to read an object encrypted with SSE-C (server-side encryption with customer-provided keys). The detail that the affected objects were updated recently points to them having been re-uploaded with SSE-C. Requests for those objects therefore fail even though OAC and the bucket policy are configured correctly, while unchanged objects continue to work.

Option B is wrong because the developer has confirmed that OAC is set up correctly, and a broken OAC association would affect every object, not only the recently updated ones. Option C is wrong because the bucket policy explicitly allows the CloudFront service principal to get objects.

Option D is wrong because OAC works with the S3 REST endpoint, and an endpoint misconfiguration would likewise break access to all objects rather than only the updated ones.

48

MCQhard

A company uses AWS CodePipeline with CodeBuild to deploy a Node.js application. The build fails intermittently with 'npm ERR! network' errors. What is the most likely cause and solution?

A.A unit test is failing; fix the test code.

B.The npm cache is corrupted; clear the cache in CodeBuild.

C.The build environment lacks outbound internet access; configure a NAT gateway or use a VPC endpoint for npm.

D.The npm token has expired; regenerate the token.

AnswerC

Network errors indicate connectivity issues.

Why this answer

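Option C is correct because intermittent 'npm ERR! network' errors indicate that the build cannot reliably reach the npm registry. If the CodeBuild project runs in a VPC, its build containers need one of two things: a route to the internet through a NAT gateway, or access to a private package source through a VPC endpoint (for example, an AWS CodeArtifact repository). Option A is wrong because a failing unit test produces a test failure, not a network error. Option B is wrong because a corrupted npm cache typically produces integrity or install errors rather than network connectivity errors.

Option D is wrong because an expired npm token results in an authentication error (HTTP 401), not a network error.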

49

Multi-Selecteasy

A developer is troubleshooting a slow Amazon RDS for MySQL database. The application experiences high latency on write operations. Which TWO actions can improve write performance?

Select 2 answers

A.Add a read replica to offload read traffic.

B.Increase the allocated storage size to get better I/O performance.

C.Enable deletion protection.

D.Increase the DB instance class to a larger size.

E.Enable Multi-AZ deployment for high availability.

AnswersB, D

A larger gp2 volume has a higher baseline IOPS rate, and a larger instance class adds CPU, memory, and I/O bandwidth.

Why this answer

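Option B is correct because, on General Purpose SSD (gp2) storage, increasing the allocated storage of an Amazon RDS for MySQL instance raises its baseline IOPS, which scales at 3 IOPS per GiB. More IOPS headroom reduces write latency under heavy load. (gp3 storage works differently: its baseline does not scale linearly with size, and IOPS can be provisioned separately.) Option D is correct because a larger DB instance class provides more CPU, memory, and I/O bandwidth, which helps when writes are constrained by compute or instance-level limits.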
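
The gp2 baseline rule can be written as a small Python function. This sketch assumes the standard single-volume gp2 formula: 3 IOPS per GiB, with a floor of 100 and a cap of 16,000 IOPS. RDS can stripe storage across several volumes, so its ceiling for large allocations may differ, and gp3 storage does not follow this formula at all.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a single gp2 volume: 3 IOPS per GiB, min 100, max 16,000."""
    return max(100, min(3 * size_gib, 16_000))


for size in (20, 100, 500, 1_000):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

For example, going from 100 GiB to 500 GiB raises the baseline from 300 to 1,500 IOPS.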

Exam trap

The trap here is that candidates often confuse high availability (Multi-AZ) or read scaling (read replicas) with performance improvements for write operations, but neither addresses the underlying I/O or compute bottleneck causing write latency.

50

MCQmedium

A developer runs the AWS CLI command shown in the exhibit. The output includes 'FunctionError': 'Unhandled'. What does this indicate?

A.The function threw an error that was caught by the code.

B.The function timed out.

C.The function threw an unhandled exception.

D.The function was not invoked successfully.

AnswerC

Unhandled means an uncaught error occurred.

Why this answer

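Option C is correct because 'Unhandled' means the function's execution ended in an error that the function code did not catch. Option A is wrong because an error that the code catches and handles itself does not produce 'Unhandled'. Option D is wrong because a StatusCode of 200 means Lambda accepted and ran the invocation; the error happened inside the function.

Option B is wrong because 'Unhandled' does not by itself mean a timeout. A timeout is only one possible cause of an unhandled error, and the response payload's error message identifies which one occurred.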
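
The same check can be done programmatically. This Python sketch interprets a response dict shaped like the one boto3's `lambda` client returns from `invoke`. The helper name is hypothetical, and it assumes the error payload carries the `errorType` and `errorMessage` fields that Lambda runtimes typically emit for uncaught exceptions. If the Invoke call itself is rejected, boto3 raises an exception instead of returning a response.

```python
import io
import json


def summarize_invoke_response(response: dict) -> str:
    """Summarize a Lambda Invoke response (hypothetical helper)."""
    status = response["StatusCode"]
    function_error = response.get("FunctionError")
    if function_error is None:
        return f"status {status}: function completed without error"
    # The function ran but failed; the payload holds the error details as JSON.
    body = json.loads(response["Payload"].read())
    error_type = body.get("errorType", "unknown")
    message = body.get("errorMessage", "")
    return f"status {status}: {function_error} error {error_type}: {message}"


# Example with a hand-built response (real boto3 responses carry a StreamingBody).
example = {
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "Payload": io.BytesIO(b'{"errorType": "ValueError", "errorMessage": "bad input"}'),
}
print(summarize_invoke_response(example))
```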

51

MCQhard

Refer to the exhibit. The developer runs the AWS CLI command to invoke a Lambda function. The output shows 'FunctionError': 'Unhandled'. What should the developer do to get more details about the error?

A.Re-invoke the function with a valid payload because the error is due to invalid input.

B.Enable AWS X-Ray tracing on the function.

C.Decode the base64-encoded 'LogResult' field to view the log output.

D.Check the CloudWatch Logs for the function's log group.

AnswerC

The 'LogResult' contains the base64-encoded log; decoding it shows the error.

Why this answer

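Option C is correct because the 'LogResult' field contains the last 4 KB of the function's execution log, base64-encoded. It is returned when the function is invoked with the Tail log type (--log-type Tail in the CLI). Decoding it shows the error message and stack trace directly from the CLI output.

Option A is wrong because nothing in the output indicates that the payload was invalid; 'Unhandled' only says the function raised an uncaught error. Option B is wrong because enabling X-Ray affects only future invocations and shows request timing and faults, not the log output of this invocation. Option D is wrong because, although CloudWatch Logs also holds the details, the decoded 'LogResult' provides them immediately from the output the developer already has.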
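
Here is a minimal Python sketch of the decoding step. It assumes the response was obtained with the Tail log type, so that `LogResult` is present. The helper name is hypothetical.

```python
import base64


def decode_log_result(log_result: str) -> str:
    """Decode the base64 'LogResult' field from a Lambda Invoke response."""
    return base64.b64decode(log_result).decode("utf-8", errors="replace")
```

With boto3, this would be called as `decode_log_result(response["LogResult"])` on the dict returned by `invoke(..., LogType="Tail")`.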

52

MCQhard

A developer is using Amazon API Gateway with a Lambda authorizer to control access to APIs. The authorizer is failing with a 500 error. The Lambda function logs show 'User: arn:aws:iam::123456789012:role/MyLambdaRole is not authorized to perform: sts:AssumeRole'. What is the most likely cause?

A.The Lambda authorizer is not returning a valid policy.

B.The Lambda function's resource-based policy is missing.

C.The API Gateway does not have permission to invoke the Lambda function.

D.The Lambda function's execution role does not have sts:AssumeRole permission for the target role.

AnswerD

The error shows the Lambda function's role is trying to assume another role but lacks permission.

Why this answer

Option D is correct because the error names the function's own execution role (MyLambdaRole) as the principal calling sts:AssumeRole. API Gateway invoked the authorizer successfully, and the authorizer code then tried to assume another IAM role, for example to call another service. The execution role needs an identity-based policy that allows sts:AssumeRole on the target role, and the target role's trust policy must allow MyLambdaRole to assume it.

Option A is wrong because an invalid policy returned by the authorizer would make API Gateway reject the request; it would not produce an sts:AssumeRole error in the function's logs. Option B is wrong because the resource-based policy controls who may invoke the function, and the function clearly ran.

Option C is wrong for the same reason: if API Gateway could not invoke the function, there would be no function logs at all.
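
The fix on the execution-role side can be sketched as an identity-based policy, built here in Python for illustration. The target role ARN in the example is hypothetical. The target role's trust policy must separately allow MyLambdaRole to assume it.

```python
import json


def assume_role_policy(target_role_arn: str) -> str:
    """Identity-based policy letting an execution role assume one specific role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Scope to the one role the authorizer needs, not "*"
                "Resource": target_role_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(assume_role_policy("arn:aws:iam::123456789012:role/TargetRole"))
```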

53

Multi-Selecteasy

A web application running on Amazon EC2 instances behind an Application Load Balancer (ALB) is experiencing intermittent 503 errors. Which TWO steps should be taken to diagnose the issue?

Select 2 answers

A.Check the Route 53 health checks for the domain.

B.Check the CPU utilization of the EC2 instances.

C.Check the target group health check settings and instance health status.

D.Check the security group rules for the ALB.

E.Check the EBS volume type of the EC2 instances.

AnswersB, C

High CPU can cause instances to fail health checks and return 503.

Why this answer

Option B is correct because high CPU utilization can make EC2 instances slow or unresponsive, so they fail ALB health checks or drop requests. As healthy capacity shrinks, clients start receiving 5xx errors such as 503. Option C is correct because the target group's health check settings (path, port, timeout, thresholds) and each target's current health status show whether, and why, the ALB is treating targets as unhealthy.

Exam trap

The trap here is that candidates may confuse Route 53 health checks (DNS-level) with ALB target group health checks (application-level), or assume that security groups or EBS volumes are the root cause of HTTP 503 errors when they are not directly related to load balancer routing failures.

54

Multi-Selecteasy

A developer is troubleshooting a slow RDS MySQL instance. Which TWO metrics in Amazon CloudWatch should the developer examine first?

Select 2 answers

A.NetworkReceiveThroughput

B.SwapUsage

C.CPUUtilization

D.FreeStorageSpace

E.ReadLatency

AnswersC, E

High CPUUtilization and high ReadLatency are the most direct indicators of a compute or storage bottleneck.

Why this answer

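Options C and E are correct because high CPUUtilization and high ReadLatency are common indicators of database performance problems. Option A is wrong because NetworkReceiveThroughput measures network traffic, which is rarely the first bottleneck for a slow database. Option B is wrong because SwapUsage is a secondary indicator of memory pressure and is less commonly the first metric to check for RDS.

Option D is wrong because FreeStorageSpace measures remaining storage capacity, not performance.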

55

MCQmedium

An application running on Amazon ECS with Fargate is experiencing high latency. The application writes logs to Amazon CloudWatch Logs. Which AWS service can be used to analyze the logs to pinpoint the cause of the latency?

A.Amazon CloudWatch Logs

B.Amazon CloudWatch Logs Insights

C.AWS X-Ray

D.Amazon S3

AnswerB

Logs Insights allows querying and analyzing logs to identify latency causes.

Why this answer

Amazon CloudWatch Logs Insights is the correct choice because it is purpose-built for interactively querying and analyzing log data stored in CloudWatch Logs. Its query language lets you filter, aggregate, and visualize log events. That is essential for pinpointing latency patterns, such as slow API calls or database queries, without exporting logs to another service.

Exam trap

The trap here is that candidates confuse CloudWatch Logs (storage/monitoring) with CloudWatch Logs Insights (query/analysis), assuming the former can perform deep log analysis, when in fact it only supports basic metric filters and real-time monitoring.

How to eliminate wrong answers

Option A is wrong because Amazon CloudWatch Logs itself is a log storage and monitoring service, not a query engine; it can only view raw log streams or set metric filters, not perform ad-hoc analytical queries to diagnose latency. Option C is wrong because AWS X-Ray is a distributed tracing service that traces requests through microservices, but it does not analyze CloudWatch Logs; it uses its own trace data and segments, not log files. Option D is wrong because Amazon S3 is an object storage service; while logs can be exported to S3, it provides no built-in querying capability for log analysis without additional services like Athena.
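
Here is a sketch of how a Logs Insights query could be run from Python with boto3's `logs` client, using its `start_query` and `get_query_results` calls. The query assumes the application writes JSON logs containing a numeric `latency_ms` field; that field name, the threshold, and the helper name are hypothetical.

```python
import time

# Hypothetical query: assumes JSON logs with a numeric "latency_ms" field,
# which Logs Insights discovers automatically for JSON log events.
SLOW_REQUESTS_QUERY = """
fields @timestamp, @message, latency_ms
| filter latency_ms > 1000
| sort latency_ms desc
| limit 20
"""

TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout"}


def run_insights_query(logs_client, log_group, start_epoch_s, end_epoch_s,
                       query=SLOW_REQUESTS_QUERY, poll_seconds=1.0):
    """Start a Logs Insights query and poll until it reaches a terminal status."""
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_epoch_s,
        endTime=end_epoch_s,
        queryString=query,
    )
    while True:
        result = logs_client.get_query_results(queryId=started["queryId"])
        if result["status"] in TERMINAL_STATUSES:
            return result
        time.sleep(poll_seconds)
```

In practice this would be called as `run_insights_query(boto3.client("logs"), "/ecs/my-app", start, end)`, where the log group name is a placeholder. Passing the client in also lets the polling loop be tested with a stub.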

56

MCQhard

An application running on Amazon ECS (Fargate) uses an Application Load Balancer (ALB) with connection draining enabled. The application is experiencing intermittent 502 (Bad Gateway) errors during rolling updates of the ECS service. The developer notices that the ALB is routing requests to tasks that are in the 'Draining' state. The ECS service is configured with a deployment circuit breaker that automatically rolls back a failed deployment. What is the most likely cause of the 502 errors?

A.The ALB's idle timeout is too short, causing connections to be dropped before the application responds.

B.The ALB's connection draining timeout is set to 0 seconds, causing connections to be dropped immediately when deregistering targets.

C.The ECS deployment circuit breaker is incorrectly configured to roll back on health check failures.

D.The application is not handling the SIGTERM signal from ECS, causing it to terminate abruptly while the ALB still routes traffic to it.

AnswerD

When ECS stops a task, it sends a SIGTERM signal to allow the application to gracefully shut down. If the application does not catch this signal and stop accepting new connections or complete in-flight requests before exiting, the ALB may still send traffic to the task after it stops, resulting in 502 errors. This is a common issue during rolling updates.

Why this answer

Option D is correct because when ECS sends a SIGTERM signal to a Fargate task during a rolling update, the task is expected to gracefully shut down. If the application does not handle SIGTERM, it terminates immediately, but the ALB may still have the task registered as a target and continue routing requests to it. Since the task is already dead or unresponsive, the ALB receives no valid HTTP response and returns a 502 Bad Gateway error.

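Connection draining (the target group's deregistration delay) is enabled, but it only lets in-flight requests finish on a target that is still running. If the application exits as soon as it receives SIGTERM instead of completing its in-flight work, those requests fail before draining completes.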

Exam trap

The trap here is that candidates often assume connection draining is a silver bullet that prevents all errors during rolling updates, but they overlook that the application must handle SIGTERM to allow the draining process to work as intended.

How to eliminate wrong answers

Option A is wrong because the ALB's idle timeout (default 60 seconds) controls how long the ALB keeps a connection open without data transfer. It does not explain errors that appear specifically during rolling updates, where the target is stopping rather than slow. Option B is wrong because the developer observes tasks in the 'Draining' state, which shows that a non-zero deregistration delay is in effect; a 0-second timeout would end draining immediately. Option C is wrong because the deployment circuit breaker rolls back a deployment after repeated failures; it is a recovery mechanism, not the root cause of the errors.

57

MCQeasy

A developer receives an AccessDeniedException when trying to invoke a Lambda function from an Amazon API Gateway REST API. The Lambda resource-based policy allows API Gateway. What is the most likely issue?

A.The Lambda function is in a VPC without a NAT gateway.

B.The Lambda function concurrency limit is exceeded.

C.The API Gateway execution role lacks lambda:InvokeFunction permission.

D.API Gateway caching is enabled, returning stale responses.

AnswerC

Execution role needs invoke permission.

Why this answer

Option C is correct because, when an API Gateway Lambda integration is configured with an execution (credentials) role, API Gateway assumes that role to call Lambda. The role itself must therefore allow the `lambda:InvokeFunction` action. In that case, a resource-based policy that grants the API Gateway service does not authorize the call, so without the permission on the role the invocation fails with an AccessDeniedException.

Exam trap

The trap here is that candidates often assume the Lambda resource-based policy alone is sufficient. Once an execution role is configured on the integration, however, that role's own `lambda:InvokeFunction` permission governs the call.

How to eliminate wrong answers

Option A is wrong because a Lambda function in a VPC without a NAT gateway cannot reach the internet, so its outbound calls time out. That is a network failure, not an AccessDeniedException, which is an IAM permissions error. Option B is wrong because exceeding the Lambda concurrency limit results in throttling (`TooManyRequestsException`, HTTP 429), not an AccessDeniedException. Option D is wrong because API Gateway caching returns stale responses only when caching is enabled and a cached response exists; it does not cause an AccessDeniedException, which is a permissions issue unrelated to caching.
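
For illustration, this Python sketch builds the two policy documents such a credentials role needs. The first is a trust policy that lets API Gateway assume the role. The second is a permissions policy that allows `lambda:InvokeFunction` on the function. The helper name and the function ARN in the example are hypothetical.

```python
import json


def apigw_invoke_role_policies(function_arn: str) -> tuple:
    """Build (trust_policy_json, permissions_policy_json) for an API Gateway credentials role."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the API Gateway service assume this role
                "Effect": "Allow",
                "Principal": {"Service": "apigateway.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the role invoke one specific function
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }
    return json.dumps(trust_policy, indent=2), json.dumps(permissions_policy, indent=2)


trust_json, perms_json = apigw_invoke_role_policies(
    "arn:aws:lambda:us-east-1:123456789012:function:my-function"
)
print(perms_json)
```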

58

MCQhard

A company runs a critical application on EC2 instances behind an Application Load Balancer. The application experiences intermittent 503 errors. The health checks are configured correctly and the instances pass health checks consistently. What is the most likely cause?

A.The application response time exceeds the ALB idle timeout

B.The health check path is incorrect

C.The target group is deregistering instances prematurely

D.A security group is blocking traffic from the ALB to the instances

AnswerA

The ALB idle timeout is 60 seconds by default; if the application takes longer to respond, the request fails at the load balancer.

Why this answer

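Option A is correct because the targets pass health checks consistently, so the errors come from individual slow requests. If the application takes longer than the idle timeout to respond, the ALB closes the connection and returns an error to the client. (AWS documents this case as an HTTP 504 Gateway Timeout from the ALB, so confirm the exact status code in the ALB access logs.) Option B is wrong because an incorrect health check path would make health checks fail, but they pass. Option C is wrong because a target group that deregisters instances prematurely would show them as draining or unhealthy, which contradicts the consistent health check results.

Option D is wrong because a security group blocking traffic from the ALB would typically also block the health checks, which are passing.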

59

MCQmedium

A developer monitors an AWS Lambda function that processes messages from an Amazon SQS queue. CloudWatch logs show that the function's execution time has increased significantly over the past week, and it now frequently times out at the 5-minute timeout. The function's code has not been changed recently. The function makes calls to an Amazon DynamoDB table. What is the most likely cause of the increased execution time?

A.The DynamoDB table's read capacity units are underprovisioned, causing throttling.

B.The SQS queue's visibility timeout is too short, causing duplicate processing.

C.The Lambda function's memory is too low, causing CPU throttling.

D.The DynamoDB table's indexes are missing, causing full table scans.

AnswerA

Underprovisioned capacity throttles read/write requests, causing Lambda to retry, increasing execution time and potentially causing timeouts.

Why this answer

The most likely cause is that the DynamoDB table's read capacity units are underprovisioned, leading to throttling (ProvisionedThroughputExceededException). When DynamoDB throttles requests, the AWS SDK retries them with exponential backoff, which adds latency and can push the function past its 5-minute timeout. Since the code has not changed, a gradual slowdown points to a capacity issue on the DynamoDB side, such as growing traffic against fixed provisioned capacity.

Exam trap

The trap here is that candidates may confuse DynamoDB throttling with Lambda timeout configuration, overlooking that gradual performance degradation often points to downstream resource contention rather than function configuration.

How to eliminate wrong answers

Option B is wrong because a short SQS visibility timeout would cause duplicate processing, not increased execution time; duplicates would result in more invocations, not slower individual runs. Option C is wrong because low memory in Lambda causes CPU throttling only if the function is CPU-bound; memory allocation affects CPU proportionally, but the described symptom (increased execution time without code changes) is not typically caused by memory alone. Option D is wrong because missing indexes would cause full table scans, which would increase execution time from the start, not gradually over a week; this would be a code or schema issue, not a gradual degradation.

60

MCQhard

A developer deployed a new version of an AWS Lambda function that is part of a serverless application. The function uses an Amazon DynamoDB table as a data store. After deployment, the developer notices that the function's latency has increased significantly for some requests. CloudWatch traces show that the increase is due to DynamoDB throttle events. The function is configured with a reserved concurrency of 100 and the DynamoDB table has 5 read capacity units (RCUs) and 5 write capacity units (WCUs). What is the most effective way to reduce the throttling while maintaining application performance?

A.Decrease the reserved concurrency of the Lambda function to 10

B.Increase the read and write capacity units on the DynamoDB table

C.Enable DynamoDB Accelerator (DAX) for caching reads

D.Enable auto scaling on the DynamoDB table

AnswerB

Increasing RCU and WCU directly increases the number of operations the table can handle, reducing throttling.

Why this answer

The primary cause of the throttling is insufficient DynamoDB capacity to handle the request volume from the Lambda function. Increasing the read and write capacity units (RCUs/WCUs) directly addresses the throttle events by providing more throughput to match the function's concurrency of 100. This is the most effective solution because it resolves the bottleneck at the data store level without reducing the application's ability to process requests concurrently.
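
To see how far 5 RCUs and 5 WCUs fall short, the capacity-unit arithmetic can be sketched in Python. It uses the standard provisioned-mode definitions. One RCU is one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. One WCU is one write per second of an item up to 1 KB. The traffic numbers in the example are hypothetical, not taken from the question.

```python
import math


def required_rcu(reads_per_sec: float, item_kb: float, strongly_consistent: bool = True) -> int:
    """Read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_kb / 4)  # each 4 KB block costs one RCU
    rcu = reads_per_sec * units_per_read
    if not strongly_consistent:
        rcu /= 2  # eventually consistent reads cost half
    return math.ceil(rcu)


def required_wcu(writes_per_sec: float, item_kb: float) -> int:
    """Write capacity units needed for a steady write rate."""
    return math.ceil(writes_per_sec * math.ceil(item_kb))  # each 1 KB block costs one WCU


print(required_wcu(100, 1))
```

For example, if each of 100 concurrent invocations wrote one 1 KB item per second, `required_wcu(100, 1)` returns 100, twenty times the table's 5 WCUs.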

Exam trap

The trap here is that candidates may choose auto scaling (Option D) thinking it dynamically handles spikes, but they overlook that auto scaling has a significant lag and cannot prevent immediate throttling, whereas increasing the base capacity is the immediate and effective solution.

How to eliminate wrong answers

Option A is wrong because decreasing reserved concurrency to 10 would reduce the number of concurrent Lambda invocations, which would lower the request rate to DynamoDB and potentially reduce throttling, but it would also severely degrade application performance by limiting throughput and increasing latency for legitimate traffic. Option C is wrong because DynamoDB Accelerator (DAX) is an in-memory cache that only accelerates read operations (GetItem, Query, Scan) and does not help with write throttling or reduce write capacity consumption; the question does not specify that the throttling is read-only, and DAX cannot mitigate write capacity throttling. Option D is wrong because enabling auto scaling on the DynamoDB table would adjust capacity over time based on traffic patterns, but it cannot react instantly to sudden spikes in demand; auto scaling has a lag of several minutes, so it would not prevent the immediate throttle events that are already occurring, and it does not address the need for a higher baseline capacity to match the Lambda's concurrency.

61

MCQmedium

A company runs a web application on EC2 instances behind an Application Load Balancer (ALB). Users report intermittent 503 errors. The ALB health checks are failing for a few instances, but the instances themselves are running and have healthy application processes. What is the MOST likely cause?

A.The ALB is not scaled to handle the traffic.

B.The security group for the EC2 instances is not allowing traffic from the ALB.

C.The DNS resolution via Route 53 is misconfigured.

D.Sticky sessions are not enabled on the ALB.

AnswerB

Health checks fail if security group blocks ALB traffic.

Why this answer

The ALB health checks are failing despite the instances and application processes being healthy, which indicates a network-level issue. The most likely cause is that the EC2 instances' security group is not allowing inbound traffic from the ALB's security group on the health check port (e.g., HTTP/HTTPS). Without this rule, the ALB cannot reach the health check endpoint, marking the instances as unhealthy and causing intermittent 503 errors when traffic is routed to those instances.

Exam trap

The trap here is that candidates often assume health check failures are always due to application issues (e.g., process crashes) rather than network-layer misconfigurations like security group rules, especially when the instance appears healthy from within the OS.

How to eliminate wrong answers

Option A is wrong because the ALB scales automatically based on traffic patterns and does not require manual scaling; 503 errors from insufficient capacity would be persistent, not intermittent, and would affect all instances. Option C is wrong because DNS misconfiguration in Route 53 would cause resolution failures (e.g., NXDOMAIN) or routing to the wrong endpoint, not intermittent 503 errors from healthy instances behind an ALB. Option D is wrong because sticky sessions (session affinity) do not affect health checks or 503 errors. They only control whether a client's requests keep going to the same target, and their absence would not cause health check failures.

62

MCQhard

A developer receives an Access Denied error when trying to download an object from an S3 bucket. The developer's IAM policy is shown in the exhibit. The bucket policy also grants access. What is the MOST likely cause?

A.The S3 bucket has block public access enabled.

B.The S3 bucket uses SSE-KMS and the user lacks kms:Decrypt permission.

C.The IAM policy does not allow s3:GetObject.

D.The bucket policy denies access to the user.

AnswerB

KMS permissions are required to decrypt objects.

Why this answer

Option B is correct because, if the object is encrypted with SSE-KMS, the user also needs kms:Decrypt permission on the KMS key, and the key policy must permit it. Option C is wrong because the IAM policy explicitly allows s3:GetObject. Option D is wrong because the bucket policy grants access, so it is not the source of the denial.

Option A is wrong because Block Public Access only restricts public access; it does not block a request authorized by the user's IAM policy and the bucket policy.

63

MCQhard

A developer is troubleshooting an AWS Lambda function that experiences high latency for the first few invocations after being idle. The function is written in Python and uses a large library (e.g., Pandas). The function connects to an RDS database in a VPC. What is the most effective way to reduce the latency for the first invocation after idle?

A.Increase the function's memory allocation to 3008 MB.

B.Enable provisioned concurrency on the function.

C.Move the large library to a Lambda layer.

D.Replace the RDS database with Amazon DynamoDB.

AnswerB

Provisioned concurrency keeps the function initialized and ready, eliminating cold starts.

Why this answer

Provisioned concurrency keeps a specified number of execution environments initialized and ready to respond immediately, eliminating the cold start latency that occurs after a period of idle time. This is the most direct solution for reducing latency on the first invocation after idle, especially for functions with large libraries like Pandas that take significant time to load.

Exam trap

The trap here is that candidates often confuse cold start mitigation strategies like increasing memory or using layers with the only AWS feature that truly eliminates cold starts for idle functions: provisioned concurrency.

How to eliminate wrong answers

Option A is wrong because increasing memory allocation can improve CPU performance and reduce cold start time slightly, but it does not eliminate the cold start itself; the function still needs to load the large library and establish the VPC connection from scratch after idle. Option C is wrong because moving the library to a Lambda layer does not reduce cold start latency; layers are simply a packaging mechanism and the library still must be loaded into memory during initialization. Option D is wrong because replacing RDS with DynamoDB addresses database connection latency, not the cold start latency caused by loading the large Python library and initializing the function runtime.

64

MCQmedium

A developer notices that an AWS Lambda function processing S3 events is being retried frequently due to throttling errors from Amazon DynamoDB. The function writes records to a DynamoDB table and has reserved concurrency set to 100. The DynamoDB table uses on-demand capacity mode. What should the developer do to reduce retries and improve overall throughput?

A.Increase the Lambda function's reserved concurrency to 500.

B.Implement exponential backoff and retry in the Lambda function code for DynamoDB API calls.

C.Disable the Lambda function's S3 event source mapping and use Amazon SQS to buffer events.

D.Switch the DynamoDB table to provisioned capacity with a high write capacity unit setting.

AnswerB

Exponential backoff and retry automatically handle throttling errors by retrying with increasing delays, reducing the chance of repeated failures.

Why this answer

Option B is correct because implementing exponential backoff and retry in the Lambda function code for DynamoDB API calls directly addresses the throttling errors. Even in on-demand capacity mode, DynamoDB can throttle requests when traffic spikes well above the table's previous peak or concentrates on a hot partition. Exponential backoff reduces the retry rate, allowing DynamoDB to recover and improving overall throughput without changing the Lambda concurrency or capacity mode.
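
AWS SDKs already retry throttled DynamoDB calls with backoff by default. This sketch shows the technique itself, exponential backoff with full jitter, as it might wrap an application-level operation. `ThrottlingError` is a hypothetical stand-in for the SDK's throttling exception, and the delay values are illustrative.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for an SDK throttling exception."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, max_delay=2.0,
                      sleep=time.sleep):
    """Call operation(), retrying on ThrottlingError with capped, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The random jitter keeps many concurrent invocations from retrying in lockstep. Passing `sleep` in makes the function testable without real waits. In boto3, the built-in retry behavior can also be tuned through `botocore.config.Config(retries={...})`.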

Exam trap

The trap here is that candidates assume increasing Lambda concurrency or switching to provisioned capacity will solve throttling, but the real issue is the retry strategy at the application layer, not the infrastructure scaling.

How to eliminate wrong answers

Option A is wrong because increasing reserved concurrency to 500 would only increase the number of concurrent Lambda invocations, which would exacerbate DynamoDB throttling by sending more requests simultaneously. Option C is wrong because disabling the S3 event source mapping and using SQS to buffer events would add latency and complexity but does not address the root cause of DynamoDB throttling; it only decouples the invocation, not the write errors. Option D is wrong because switching to provisioned capacity with a high write capacity unit setting does not guarantee elimination of throttling; on-demand mode already scales automatically, and the issue is likely due to request patterns or hot partitions, not capacity mode.

65

MCQeasy

A developer is troubleshooting an AWS Lambda function that is failing with an 'AccessDenied' error when trying to write to an S3 bucket. The function's execution role has the following policy. What is the most likely cause of the failure? (Policy: { 'Version': '2012-10-17', 'Statement': [ { 'Effect': 'Allow', 'Action': 's3:PutObject', 'Resource': 'arn:aws:s3:::my-bucket/*' } ] })

A.The resource ARN does not include the bucket itself; it only includes objects

B.The policy is missing a 'Principal' element

C.The action 's3:PutObject' is not allowed for Lambda execution roles

D.The action 's3:PutObject' is not sufficient; need 's3:*'

AnswerA

The statement grants s3:PutObject only on object ARNs (arn:aws:s3:::my-bucket/*), which is the correct resource for PutObject itself. The failure is therefore most likely a bucket-level call the function also makes, such as s3:ListBucket, which is authorized against the bucket ARN arn:aws:s3:::my-bucket and is not covered by this policy.

Why this answer

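Option A is correct because the resource ARN 'arn:aws:s3:::my-bucket/*' matches only objects inside the bucket; the bucket itself is 'arn:aws:s3:::my-bucket'. Bucket-level actions such as s3:ListBucket are evaluated against the bucket ARN, so a function that lists the bucket or checks for existing keys before writing receives AccessDenied. (Without s3:ListBucket, S3 also returns 403 rather than 404 for a key that does not exist.) The policy needs both ARNs when the function performs both bucket-level and object-level operations.

Option B is wrong because identity-based policies attached to a role do not use a Principal element; the role is the principal. Option C is wrong because Lambda execution roles can be granted s3:PutObject like any other IAM role. Option D is wrong because s3:PutObject is the action that authorizes writing objects; s3:* is not required and would violate least privilege.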
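The sketch below, assuming the function also needs s3:ListBucket, shows the policy with both ARNs alongside a deliberately simplified matcher. `is_allowed` is a hypothetical helper, not how IAM actually evaluates policies; it only makes the ARN distinction concrete: the object pattern `my-bucket/*` never matches the bucket ARN itself.

```python
import fnmatch
import json

# Policy from the question: PutObject on object ARNs only.
ORIGINAL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:PutObject",
         "Resource": "arn:aws:s3:::my-bucket/*"},
    ],
}

# Assumed fix when the function also lists the bucket: bucket-level
# actions use the bucket ARN, object-level actions use the object ARN.
FIXED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:PutObject",
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Allow", "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::my-bucket"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource_arn):
    """Very simplified Allow check: exact action match plus fnmatch-style
    wildcards on the resource. Ignores Deny, Conditions, and action wildcards."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow" or action not in _as_list(stmt["Action"]):
            continue
        if any(fnmatch.fnmatchcase(resource_arn, pattern)
               for pattern in _as_list(stmt["Resource"])):
            return True
    return False


if __name__ == "__main__":
    bucket_arn = "arn:aws:s3:::my-bucket"
    print(is_allowed(ORIGINAL_POLICY, "s3:ListBucket", bucket_arn))  # False
    print(is_allowed(FIXED_POLICY, "s3:ListBucket", bucket_arn))     # True
    print(json.dumps(FIXED_POLICY, indent=2))
```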

66

MCQhard

The exhibit shows an IAM policy attached to a Lambda function's execution role. The function writes objects to an S3 bucket that is encrypted with a KMS key (the key specified in the policy). When the function tries to write an object, it receives an access denied error. What is the MOST likely missing permission?

A.kms:GenerateDataKey is missing.

B.The KMS key policy does not allow the Lambda function role.

C.s3:GetObject is missing for the bucket.

D.kms:ReEncrypt is missing.

AnswerA

S3 calls kms:GenerateDataKey on the caller's behalf when it writes an object with SSE-KMS.

Why this answer

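Option A is correct because, with SSE-KMS, S3 calls kms:GenerateDataKey on the caller's behalf to obtain a data key for each new object, so the writer's role must be allowed kms:GenerateDataKey on that key. Option B is wrong because, although a restrictive key policy can also cause this error, the missing kms:GenerateDataKey permission in the role policy is the most direct cause. Option C is wrong because s3:GetObject is needed to read objects, not to write them.

Option D is wrong because kms:ReEncrypt re-encrypts existing ciphertext under a different KMS key and is not used when writing new objects.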
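A sketch of an execution-role policy for writing SSE-KMS objects, assuming the bucket name and key ARN are supplied by the caller (`build_writer_policy` and the example values are hypothetical). It covers only the identity side; the KMS key policy must also allow the role, which is why option B is a plausible distractor.

```python
import json


def build_writer_policy(bucket, kms_key_arn):
    """Return an execution-role policy that lets a function write SSE-KMS
    objects to `bucket`. Both arguments are placeholders supplied by the caller."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # S3 calls GenerateDataKey on the caller's behalf to obtain
                # the data key that encrypts each new object.
                "Effect": "Allow",
                "Action": "kms:GenerateDataKey",
                "Resource": kms_key_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and key ARN, for illustration only.
    policy = build_writer_policy(
        "example-bucket",
        "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    )
    print(json.dumps(policy, indent=2))
```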

67

MCQmedium

A developer is running a Docker container on Amazon ECS with Fargate. The container logs are not appearing in CloudWatch Logs even though the task definition has a logConfiguration specifying the awslogs driver and a log group. What is the MOST likely missing configuration?

A.The container image does not have the awslogs log driver installed.

B.The task execution role lacks the necessary IAM permissions to write to CloudWatch Logs.

C.The CloudWatch Logs log group does not exist.

D.The EC2 instance profile does not have CloudWatch Logs permissions.

AnswerB

The execution role needs logs:CreateLogStream and logs:PutLogEvents.

Why this answer

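Option B is correct because on Fargate the awslogs driver calls CloudWatch Logs using the task execution role; without logs:CreateLogStream and logs:PutLogEvents, no log streams or events are written. Option A is wrong because the log driver is configured in the task definition and provided by the platform, not installed in the container image. Option D is wrong because Fargate tasks do not run on EC2 instances you manage, so no instance profile is involved.

Option C is less likely because the task definition already names the log group; the scenario points to a permissions gap rather than a missing resource.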
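A sketch of the two pieces that must line up, written as Python dicts: the `logConfiguration` block of a container definition and the execution-role statement that lets the awslogs driver write to CloudWatch Logs. The group name, region, and stream prefix are placeholders, and `missing_log_actions` is a hypothetical helper that compares exact action names only.

```python
# Fragment of a container definition in an ECS task definition.
# Group name, region, and prefix are placeholders.
LOG_CONFIGURATION = {
    "logDriver": "awslogs",
    "options": {
        "awslogs-group": "/ecs/example-app",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "app",
    },
}

# Statement for the task *execution* role (not the task role). The managed
# policy AmazonECSTaskExecutionRolePolicy grants these same two actions.
EXECUTION_ROLE_LOGS_STATEMENT = {
    "Effect": "Allow",
    "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
    "Resource": "*",
}

REQUIRED_ACTIONS = {"logs:CreateLogStream", "logs:PutLogEvents"}


def missing_log_actions(statements):
    """Return the awslogs-related actions not granted by any Allow statement.
    Simplified: exact action names only, no wildcards, Deny, or conditions."""
    granted = set()
    for stmt in statements:
        if stmt.get("Effect") == "Allow":
            actions = stmt["Action"]
            granted.update(actions if isinstance(actions, list) else [actions])
    return REQUIRED_ACTIONS - granted


if __name__ == "__main__":
    print(sorted(missing_log_actions([])))
    print(sorted(missing_log_actions([EXECUTION_ROLE_LOGS_STATEMENT])))
```

If the task definition also sets `awslogs-create-group` to `"true"`, the execution role additionally needs `logs:CreateLogGroup`.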

68

MCQhard

A developer notices that an AWS Lambda function, which processes messages from an SQS queue, is taking longer than expected. The function has a reserved concurrency of 5 and a batch size of 10. The SQS queue has a large backlog. CloudWatch metrics show that the function's throttles are high. The function is idempotent and can process up to 100 messages per invocation. What is the most effective way to increase throughput without increasing reserved concurrency?

A.Increase the batch size to 100.

B.Increase reserved concurrency to 10.

C.Change the function timeout to 15 minutes.

D.Enable SQS short polling to reduce latency.

AnswerA

Since the function can handle more messages per invocation, increasing the batch size reduces the number of invocations, which reduces throttling and increases throughput without changing reserved concurrency.

Why this answer

Increasing the batch size to 100 allows each invocation to process up to 100 messages instead of 10, directly increasing throughput per invocation without changing the reserved concurrency of 5. Since the function is idempotent and can handle up to 100 messages per invocation, this change maximizes the number of messages processed per Lambda execution, reducing the backlog more efficiently.

Exam trap

The trap here is the batch-size limit. For a standard queue, an SQS event source mapping accepts a batch size of up to 10,000, but values above 10 require a batching window (MaximumBatchingWindowInSeconds) of at least 1 second; FIFO queues are capped at 10. Candidates who remember only the 10-message limit of a single ReceiveMessage call may wrongly rule out option A.

How to eliminate wrong answers

Option B is wrong because raising reserved concurrency to 10 directly contradicts the requirement not to increase reserved concurrency. Option C is wrong because a longer timeout only lets each invocation run longer; the bottleneck is throttling at the concurrency limit, not execution duration. Option D is wrong because Lambda's SQS event source mapping already polls the queue with long polling, and short polling would neither increase the number of messages processed per invocation nor reduce throttling; it tends to return more empty responses.
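A sketch of the batching constraints behind this question, assuming the documented SQS event source mapping limits described above. The parameter names `BatchSize` and `MaximumBatchingWindowInSeconds` come from Lambda's CreateEventSourceMapping API; `validate_sqs_batching` and `messages_per_round` are hypothetical helpers. With reserved concurrency fixed at 5, at most 5 batches are in flight, so the batch size sets how many messages move per round.

```python
def validate_sqs_batching(batch_size, batching_window_s, fifo=False):
    """Return a list of problems with an SQS event source mapping's
    BatchSize / MaximumBatchingWindowInSeconds combination."""
    problems = []
    max_batch = 10 if fifo else 10_000
    if not 1 <= batch_size <= max_batch:
        problems.append(f"BatchSize must be between 1 and {max_batch}")
    if batch_size > 10 and batching_window_s < 1:
        problems.append("BatchSize above 10 requires MaximumBatchingWindowInSeconds >= 1")
    return problems


def messages_per_round(reserved_concurrency, batch_size):
    """Upper bound on messages in flight at once: one batch per concurrent invocation."""
    return reserved_concurrency * batch_size


if __name__ == "__main__":
    print(validate_sqs_batching(100, 1))          # []
    print(messages_per_round(5, 10), messages_per_round(5, 100))  # 50 500
```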

69

MCQhard

An application running on Amazon ECS Fargate is experiencing intermittent high latency and timeout errors. The application makes API calls to an external third-party service. The ECS service is configured with a target group using HTTP health checks. The ALB health check logs show occasional 503 responses. What is the MOST likely cause?

A.The security group for the ECS tasks is blocking inbound traffic from the ALB.

B.The ECS tasks are running out of CPU credits, causing slow response times.

C.The ECS service is configured with a task placement strategy that is causing tasks to be stopped and restarted frequently.

D.The application is not properly handling timeouts to the third-party service, causing the health check endpoint to hang.

AnswerD

If the health check endpoint blocks on a slow call to the external service, the ALB health checks time out, tasks are marked unhealthy and replaced, and requests that arrive while targets are cycling can receive 503 responses.

Why this answer

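Option D is correct because if the application waits on the third-party service without a timeout, the health check endpoint may not respond in time. The ALB then marks the task unhealthy and stops routing traffic to it, which shifts load onto the remaining tasks and makes the latency worse. Option A is wrong because a security group blocking traffic from the ALB would cause consistent failures, not intermittent ones. Option B is wrong because Fargate tasks do not use CPU credits; CPU pressure could raise latency but does not specifically explain 503 responses.

Option C is wrong because task placement strategies do not apply to Fargate tasks, and frequent task replacement here is a symptom of failing health checks rather than their cause.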
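One way to keep a slow dependency from hanging the service, sketched with the Python standard library: bound the wait on the third-party call and keep the health check independent of it. `call_with_deadline`, `health_check`, and the timeout values are hypothetical. With a real HTTP client, also pass the client's own timeout (for example `urllib.request.urlopen(url, timeout=...)`) so the worker thread is released.

```python
import concurrent.futures
import time

# Shared pool for outbound calls; the size is an arbitrary choice for the sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, timeout_s, fallback=None):
    """Run `fn` in a worker thread and stop waiting after `timeout_s` seconds,
    returning `fallback` instead. The worker keeps running in the background."""
    future = _POOL.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback


def health_check():
    """Liveness response: reports the process is up without calling the
    third-party service, so a slow dependency cannot make it hang."""
    return 200, "ok"


if __name__ == "__main__":
    def slow_third_party():
        time.sleep(0.5)  # simulated slow external API
        return "payload"

    start = time.monotonic()
    result = call_with_deadline(slow_third_party, timeout_s=0.1, fallback="degraded")
    print(result, round(time.monotonic() - start, 2))
    print(health_check())
```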

70

MCQmedium

A developer is troubleshooting an AWS Lambda function that processes records from an Amazon Kinesis Data Stream. The function is configured with a batch size of 100 and a parallelization factor of 1. The developer notices that the iterator age is increasing, indicating that the function is not keeping up with the stream. CloudWatch Logs show that the function is not experiencing errors or throttling, but the execution time per invocation is close to the 5-minute timeout. The stream has 10 shards. Which action will most likely increase processing throughput?

A.Increase the batch size to 500.

B.Increase the parallelization factor to 10.

C.Increase the Lambda function memory and CPU allocation.

D.Split the stream into more shards.

AnswerC

Increasing memory increases CPU allocation proportionally, which can make each invocation faster. This reduces the per-batch processing time, allowing the function to keep up with the stream and decrease the iterator age.

Why this answer

Option C is correct because the function's execution time is already near the 5-minute timeout, so per-invocation duration is the bottleneck. Lambda allocates CPU in proportion to configured memory, so if the work is CPU- or memory-bound, raising memory shortens each invocation, lets each batch finish faster, and increases overall throughput without changing the batch size or shard count.

Exam trap

The trap here is reaching for more parallelism (more shards or a higher parallelization factor) first. Those options multiply concurrent invocations but leave each invocation running close to its timeout; when per-invocation execution time is the symptom, shortening it with more memory and CPU is the direct fix.

How to eliminate wrong answers

Option A is wrong because a batch size of 500 gives each invocation five times as many records; since invocations already run close to the 5-minute timeout, larger batches would likely time out and be retried. Option B is less direct: with 10 shards and a parallelization factor of 1, up to 10 invocations run at once, and a factor of 10 would allow up to 100 concurrent batches. That adds concurrency and downstream load but leaves every invocation running close to its timeout. Option D is wrong for a similar reason: resharding adds concurrent consumers but is an operational change to the stream, and each invocation would still run close to the timeout; because the function is not throttled, the symptoms do not point to a concurrency limit.
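A back-of-the-envelope model of the throughput arithmetic, assuming every batch is full and takes the same time to process. The function is an illustration, not an AWS formula, and the 280-second batch duration is a hypothetical figure near the 5-minute timeout.

```python
def kinesis_lambda_throughput(shards, parallelization_factor, batch_size, seconds_per_batch):
    """Rough upper bound on records/second for a Kinesis event source mapping:
    each shard runs up to `parallelization_factor` concurrent batches, and each
    batch of `batch_size` records takes `seconds_per_batch` to process.
    Ignores polling overhead, partial batches, and retries."""
    concurrent_batches = shards * parallelization_factor
    return concurrent_batches * batch_size / seconds_per_batch


if __name__ == "__main__":
    before = kinesis_lambda_throughput(10, 1, 100, 280)  # hypothetical 280 s per batch
    after = kinesis_lambda_throughput(10, 1, 100, 140)   # if more memory halved that time
    print(round(before, 1), round(after, 1))
```

Under these assumptions the current setup moves about 3.6 records per second, and halving the batch time to 140 seconds roughly doubles that to about 7.1. The model also shows why options B and D are tempting: they multiply the number of concurrent batches instead of shortening each one.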

71

MCQhard

A developer is using Amazon API Gateway with a Lambda authorizer to protect a REST API. The authorizer is configured with a TTL of 300 seconds. After updating the IAM policy attached to the authorizer's execution role, some users still receive 403 Forbidden errors for requests that should be allowed. What is the MOST likely cause?

A.The Lambda authorizer function has a timeout that prevents it from evaluating the new policy.

B.The authorizer's cached results are still valid, so the old policy is being applied.

C.The IAM policy update has not propagated to all regions.

D.The API Gateway endpoint was not redeployed after the policy change.

AnswerB

Caching causes the authorizer to return previous decisions until the TTL expires.

Why this answer

Option B is correct because API Gateway caches the Lambda authorizer's result for the configured TTL of 300 seconds, so the old decision is returned until the cache entry expires. Option A is wrong because nothing in the scenario points to an authorizer timeout; a timeout would produce errors rather than replaying an earlier decision. Option C is wrong because IAM is a global service, so there is no per-region propagation to wait for; the stale results come from the authorizer cache.

Option D is wrong because the deployment does not affect authorizer caching.
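A toy model of the authorizer cache, assuming results are keyed by the token and reused until the TTL expires. `AuthorizerCache` is hypothetical, not an API Gateway class; it shows why a corrected decision only takes effect once the cached entry ages out. Setting the authorizer TTL to 0 disables caching, which is useful while testing policy changes.

```python
import time


class AuthorizerCache:
    """Toy model of an authorizer result cache: decisions are keyed by the
    identity source (here, the token) and reused until the TTL expires."""

    def __init__(self, authorize, ttl_s=300, clock=time.monotonic):
        self._authorize = authorize  # function: token -> "Allow" or "Deny"
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}           # token -> (decision, expires_at)

    def decide(self, token):
        now = self._clock()
        cached = self._entries.get(token)
        if cached and cached[1] > now:
            return cached[0]         # may be stale if the logic changed meanwhile
        decision = self._authorize(token)
        self._entries[token] = (decision, now + self._ttl_s)
        return decision
```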

72

MCQeasy

A developer notices that an S3 bucket used for static website hosting returns 403 Forbidden for anonymous requests. The bucket policy allows s3:GetObject for Principal "*". What is the most likely issue?

A.The bucket does not have server access logging enabled.

B.The bucket ACL does not allow public read.

C.The bucket policy is not attached to the correct bucket.

D.The S3 Block Public Access settings are enabled.

AnswerD

Block Public Access overrides bucket policies.

Why this answer

D is correct because S3 Block Public Access settings, enabled at the account or bucket level, override bucket policies and ACLs that grant public access. With RestrictPublicBuckets enabled, S3 ignores the public grant in the bucket policy, so even though the policy allows s3:GetObject for Principal "*", anonymous requests receive 403 Forbidden.

Exam trap

The trap here is that candidates often assume a bucket policy granting public access is sufficient, overlooking the S3 Block Public Access settings which silently override such policies and cause 403 errors.

How to eliminate wrong answers

Option A is wrong because server access logging records requests to the bucket; it is not a permission control and does not affect whether requests are allowed or denied. Option B is wrong because a bucket policy granting s3:GetObject to Principal "*" is sufficient on its own; an ACL grant is not also required, so the missing ACL is not the cause. Option C is wrong because the question states the bucket policy is attached and allows s3:GetObject, so the policy is correctly associated; the problem lies with a separate security mechanism.
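A deliberately simplified model of how Block Public Access interacts with public grants. The four keys match S3's PublicAccessBlockConfiguration; `anonymous_get_allowed` is a hypothetical helper that ignores many real factors (object ownership, how account-level and bucket-level settings combine, other policy statements).

```python
def anonymous_get_allowed(bpa, policy_grants_public_read, acl_grants_public_read=False):
    """Would an anonymous GET succeed, given Block Public Access settings `bpa`
    (dict of the four flags) and whether the bucket policy or ACL grants public read?"""
    policy_ok = policy_grants_public_read and not bpa.get("RestrictPublicBuckets", False)
    acl_ok = acl_grants_public_read and not bpa.get("IgnorePublicAcls", False)
    return policy_ok or acl_ok


ALL_BLOCKED = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

if __name__ == "__main__":
    print(anonymous_get_allowed(ALL_BLOCKED, policy_grants_public_read=True))  # False
    print(anonymous_get_allowed({}, policy_grants_public_read=True))           # True
```

BlockPublicPolicy only rejects new public bucket policies; RestrictPublicBuckets is the setting that stops an existing public policy from granting anonymous access, which is why the model checks only the latter.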

73

MCQhard

A company runs a monolithic application on EC2 behind an Application Load Balancer. They want to migrate to a microservices architecture using ECS Fargate. What is the most important optimization to ensure minimal downtime during the migration?

A.Use a blue/green deployment strategy with weighted target groups.

B.Increase the EC2 instance size to handle the microservices load.

C.Deploy all microservices in a single ECS service for simplicity.

D.Scale horizontally by adding more EC2 instances.

AnswerA

Blue/green allows controlled traffic shift.

Why this answer

Option A is correct because a blue/green deployment strategy with weighted target groups allows you to gradually shift traffic from the existing monolithic EC2 application (blue) to the new microservices on ECS Fargate (green) while monitoring for errors. This minimizes downtime by enabling instant rollback if issues arise, and it leverages Application Load Balancer (ALB) features like stickiness and health checks to ensure a seamless transition without disrupting active connections.
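A sketch of the weighted routing behind the traffic shift, building the listener action as a Python dict. The `ForwardConfig` / `TargetGroups` / `Weight` structure is what the Elastic Load Balancing v2 API accepts; `forward_action`, the ARNs, and the rollout percentages are hypothetical. Each step could be applied with boto3's elbv2 `modify_listener(ListenerArn=..., DefaultActions=[...])` while watching error rates, and rolled back by setting the green weight to 0.

```python
def forward_action(blue_tg_arn, green_tg_arn, green_weight):
    """Build an ALB listener 'forward' action that splits traffic between two
    target groups. Weights are relative; here they are kept summing to 100."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg_arn, "Weight": green_weight},
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical target groups: the EC2 monolith (blue) and the Fargate service (green).
    blue = "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/blue-monolith/0123456789abcdef"
    green = "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/green-fargate/fedcba9876543210"
    for pct in (10, 25, 50, 100):  # hypothetical rollout steps
        groups = forward_action(blue, green, pct)["ForwardConfig"]["TargetGroups"]
        print(pct, [tg["Weight"] for tg in groups])
```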

Exam trap

The trap here is that candidates confuse scaling strategies (horizontal/vertical) with deployment strategies, assuming that adding more capacity or consolidating services will inherently reduce downtime, when in fact only a controlled traffic-shifting method like blue/green with weighted routing ensures minimal disruption during a live migration.

How to eliminate wrong answers

Option B is wrong because increasing EC2 instance size does not address the migration to microservices or ECS Fargate; it only scales the monolithic application vertically, which contradicts the goal of moving to a serverless container architecture and does not reduce downtime during migration. Option C is wrong because deploying all microservices in a single ECS service defeats the purpose of microservices isolation, scaling, and independent deployment; it introduces tight coupling and increases the blast radius of failures, leading to higher downtime risk. Option D is wrong because scaling horizontally by adding more EC2 instances only scales the monolithic application, not the microservices on Fargate, and does not provide a controlled traffic-shifting mechanism to minimize downtime during migration.

74

MCQmedium

A developer configured an S3 bucket to trigger a Lambda function on object creation. The Lambda function processes the object and then deletes it. Some objects are not being processed. What should the developer do to ensure all objects are processed?

A.Assign a new IAM role to the Lambda function with S3 permissions.

B.Enable S3 versioning on the bucket.

C.Send S3 events to an SQS queue and configure the Lambda function to poll the queue.

D.Increase the Lambda function timeout.

AnswerC

SQS provides reliable message delivery with retries.

Why this answer

Option C is correct because sending S3 events to an SQS queue decouples event delivery from Lambda invocation. If the function fails or is throttled, the message stays in the queue (or moves to a dead-letter queue after repeated failures) and can be retried, so no object is silently skipped. Without a queue, S3 invokes Lambda asynchronously: by default Lambda retries throttled events for up to 6 hours and failed invocations twice, then discards the event unless an on-failure destination or dead-letter queue is configured.

Exam trap

The trap here is that candidates assume the issue is a permission or timeout problem, when in fact the root cause is the loss of S3 event notifications due to Lambda throttling or transient failures, which a queue-based architecture resolves.

How to eliminate wrong answers

Option A is wrong because the Lambda function already processes and deletes objects, so it must already have S3 permissions; assigning a new IAM role would not fix lost events. Option B is wrong because enabling S3 versioning preserves object versions but does not affect event delivery reliability or retry behavior. Option D is wrong because increasing the Lambda function timeout addresses execution duration, not the loss of events due to throttling or invocation failures.

75

MCQmedium

A developer is troubleshooting an AWS Lambda function that returns timeout errors when calling an external HTTPS API. The function is configured with a 30-second timeout and runs in a VPC with a public subnet and NAT Gateway. The developer checks CloudWatch logs and sees that the function is timing out at exactly 30 seconds. What is the most likely cause?

A.The NAT Gateway is not configured with a route to the internet.

B.The Lambda function's security group does not allow outbound traffic.

C.The external API's response time exceeds 30 seconds.

D.The Lambda function's VPC does not have an internet gateway.

AnswerB

If the security group's outbound rules do not permit HTTPS traffic, the connection cannot be established and the call hangs until the function times out.

Why this answer

Option B is correct because Lambda functions running in a VPC do not automatically get internet access; they need a private subnet whose route table sends internet-bound traffic to a NAT Gateway or NAT instance. Even with that route in place, the function's security group must allow outbound traffic (for example, HTTPS on port 443) to reach the external API. The default security group allows all outbound traffic, so this happens only when egress rules have been restricted; in that case outbound packets are dropped and the function hangs until its configured 30-second timeout expires.

Exam trap

The trap here is that candidates assume a NAT Gateway alone provides internet access to Lambda, overlooking that security group egress rules must explicitly allow outbound traffic to the destination.

How to eliminate wrong answers

Option A is less likely because the stem states that a NAT Gateway is in place. A routing problem would also surface as a timeout, so the exact 30-second timing does not rule it out, but the scenario points to the security group egress rule. Option C is wrong because failing at exactly 30 seconds only shows that the function hit its own configured timeout; a slow external API is possible, but given the VPC setup a blocked outbound path is the more likely cause. Option D is wrong because the stem describes a public subnet, which by definition has a route to an internet gateway; the NAT Gateway sends traffic out through that internet gateway, so a missing internet gateway is not the problem.
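A quick connectivity probe, sketched with Python's standard `socket` module, that fails fast with a short timeout instead of letting the whole invocation hang until the 30-second function timeout. `can_reach` and the example host are hypothetical; logging its result at the start of the handler helps separate "no network path" from "slow API".

```python
import socket


def can_reach(host, port=443, timeout_s=3.0):
    """Try a TCP connection with a short timeout. Returns False on refusal,
    timeout, or DNS failure instead of hanging until the Lambda timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers socket.timeout, ConnectionRefusedError, socket.gaierror
        return False
```

For example, `can_reach("api.example.com")` returning False within 3 seconds points at the network path (security group egress, route table, or DNS) rather than at a slow API.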
