DVA-C02 domain

Troubleshooting and Optimization

Use this page to practise DVA-C02 Troubleshooting and Optimization questions. The goal is not to memorise dumps but to understand each concept, review the explanation, and improve your exam readiness.

54 questions

Focused practice

Start a Troubleshooting and Optimization session

All sessions draw only from this domain. Pick a length or try interactive practice with inline explanations.

Start 20-question practice session →

What the exam tests

What to know about Troubleshooting and Optimization

Troubleshooting and Optimization questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Question index

All Troubleshooting and Optimization questions (54)

Click any question to see the full explanation, or start a practice session above.

1

A developer deployed a new version of an AWS Lambda function that is part of a serverless application. The function uses an Amazon DynamoDB table as a data store. After deployment, the developer notices that the function's latency has increased significantly for some requests. CloudWatch traces show that the increase is due to DynamoDB throttle events. The function is configured with a reserved concurrency of 100 and the DynamoDB table has 5 read capacity units (RCUs) and 5 write capacity units (WCUs). What is the most effective way to reduce the throttling while maintaining application performance?

2

A developer is running an AWS Lambda function that is triggered by Amazon S3 events. The function writes processed data to an Amazon DynamoDB table. Over time, the function's execution time has increased significantly. CloudWatch Logs show many DynamoDB ProvisionedThroughputExceededException errors. The table is configured with 5 read capacity units (RCUs) and 5 write capacity units (WCUs). The function performs both reads and writes. Which optimization will MOST effectively reduce throttling errors while maintaining performance?

3

A web application runs on Amazon EC2 instances behind an Application Load Balancer (ALB). During peak hours, users report receiving HTTP 503 (Service Unavailable) errors. The developer checks Amazon CloudWatch metrics and finds that the ALB's request count is high but below the limit, and the target group's healthy host count drops to zero intermittently. The Auto Scaling group for the instances is configured with a minimum of 2, maximum of 10, and a simple scaling policy to add 2 instances when CPU utilization exceeds 70% for 5 consecutive minutes. What is the most likely cause of the 503 errors?

4

A developer is troubleshooting an AWS Lambda function that processes large CSV files (up to 1 GB) uploaded to an Amazon S3 bucket. The function uses Python and the pandas library to perform data transformations. Recently, the function started timing out on large files. CloudWatch Logs show that the function's execution time is close to the 15-minute Lambda timeout, and memory utilization peaks at around 80% of the configured 3,008 MB. The function has not been modified in months. Which action will most likely resolve the timeout issue without requiring code changes?

5

A developer is troubleshooting an AWS Lambda function that processes records from an Amazon Kinesis Data Stream. The function is configured with a batch size of 100 and a parallelization factor of 1. The developer notices that the iterator age is increasing, indicating that the function is not keeping up with the stream. CloudWatch Logs show that the function is not experiencing errors or throttling, but the execution time per invocation is close to the 5-minute timeout. The stream has 10 shards. Which action will most likely increase processing throughput?
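The relationship between shards, parallelization factor, and concurrent batches in this scenario can be sketched as simple arithmetic. This is a minimal illustration, assuming the documented behaviour that Lambda processes up to `parallelization_factor` batches per shard at once (allowed values 1 to 10); the function name is hypothetical and not part of any AWS SDK.

```python
def max_concurrent_batches(shard_count: int, parallelization_factor: int = 1) -> int:
    """Upper bound on concurrent Lambda invocations for a Kinesis event source.

    Hypothetical helper for illustration only; it performs no AWS calls.
    """
    if shard_count < 1:
        raise ValueError("a stream has at least one shard")
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("Lambda accepts a parallelization factor from 1 to 10")
    return shard_count * parallelization_factor


# Scenario values: 10 shards with the default factor of 1.
print(max_concurrent_batches(10, 1))
# Same stream with the factor raised to its maximum of 10.
print(max_concurrent_batches(10, 10))
```

With 10 shards and a factor of 1, at most 10 batches are in flight at a time, so a slow invocation holds up its whole shard.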

6

A developer is troubleshooting an AWS Lambda function that is invoked from an Amazon S3 bucket via event notifications. The function processes images and stores metadata in Amazon DynamoDB. The developer notices that some images are being processed multiple times, resulting in duplicate entries in DynamoDB. The S3 event notification is configured to send events to the Lambda function with the 's3:ObjectCreated:*' event type. The function uses the 'uuid' library to generate a unique ID for each image upon processing. What is the most likely cause of the duplicate processing?
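One way to reason about this scenario: a random UUID generated at processing time differs on every delivery, so a redelivered event produces a new item. The sketch below derives a deterministic ID from the event record instead. It assumes the standard S3 event notification record layout (`s3.bucket.name`, `s3.object.key`, `s3.object.sequencer`); the function name `image_item_id` is hypothetical.

```python
import hashlib
import json


def image_item_id(record: dict) -> str:
    """Derive a stable ID from one S3 event record (hypothetical helper).

    The same record always maps to the same ID, so a repeated delivery
    overwrites (or can be rejected by) the existing DynamoDB item.
    """
    s3 = record["s3"]
    parts = [
        s3["bucket"]["name"],
        s3["object"]["key"],
        s3["object"].get("sequencer", ""),
    ]
    # json.dumps gives an unambiguous encoding of the three fields.
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


record = {
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "photos/cat.jpg", "sequencer": "0055AED6DCD90281E5"},
    }
}
# Two deliveries of the same event yield the same ID.
print(image_item_id(record) == image_item_id(record))
```

Such an ID is often paired with a conditional write (for example, a condition that the key attribute does not already exist) so that a duplicate delivery is detected rather than silently stored.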

7

A developer is troubleshooting an AWS Lambda function that processes records from an Amazon Kinesis Data Stream. The function is configured with a batch size of 100 and a parallelization factor of 1. The developer notices that the function is processing records slowly, and the iterator age is increasing. CloudWatch Logs show that the function is not experiencing errors or throttling, but the execution time per invocation is close to the 5-minute timeout. The stream has 10 shards. What is the most cost-effective way to increase processing throughput?

8

A web application runs on Amazon EC2 instances behind an Application Load Balancer (ALB). During rolling updates of the Auto Scaling group, users intermittently receive HTTP 502 (Bad Gateway) errors. The developer checks the ALB access logs and notices that requests are being routed to instances that are in the 'Draining' state. The ALB has connection draining enabled with a timeout of 30 seconds. The Auto Scaling group terminates instances after they are taken out of service. What is the most likely cause of the 502 errors?

9

A developer notices that an AWS Lambda function, which processes messages from an SQS queue, is taking longer than expected. The function has a reserved concurrency of 5 and a batch size of 10. The SQS queue has a large backlog. CloudWatch metrics show that the function's throttles are high. The function is idempotent and can process up to 100 messages per invocation. What is the most effective way to increase throughput without increasing reserved concurrency?

10

An application running on Amazon ECS (Fargate) uses an Application Load Balancer (ALB) with connection draining enabled. The application is experiencing intermittent 502 (Bad Gateway) errors during rolling updates of the ECS service. The developer notices that the ALB is routing requests to tasks that are in the 'Draining' state. The ECS service is configured with a deployment circuit breaker that automatically rolls back a failed deployment. What is the most likely cause of the 502 errors?

11

A developer notices that an AWS Lambda function, which uses Amazon RDS Proxy to connect to an Aurora MySQL database, is experiencing increased latency and occasional connection timeouts. The function is configured with a reserved concurrency of 100 and is deployed in a VPC. The RDS Proxy's maximum connections is set to 1000. CloudWatch metrics show that the DatabaseConnections metric for the proxy is consistently at 1000. What is the most likely cause of the increased latency and timeouts?

12

A developer monitors an AWS Lambda function that processes records from an Amazon SQS queue and writes results to an Amazon DynamoDB table. CloudWatch Logs show that execution time has increased over the past week, and the function frequently times out at the 5-minute timeout. The function's code has not been changed recently. CloudWatch metrics show a high rate of DynamoDB ProvisionedThroughputExceededException errors. The DynamoDB table has 5 write capacity units (WCUs). What action will MOST effectively reduce the function's execution time?

13

A developer is troubleshooting performance issues in an application that uses Amazon DynamoDB as the primary data store. The application reads a large set of items using a Query operation on a Global Secondary Index (GSI). The developer notices high read latency and throttled requests on the GSI. The base table has sufficient read capacity. The GSI is projected with KEYS_ONLY. Which action would most likely reduce the latency and throttling?

14

A developer monitors an AWS Lambda function that processes messages from an Amazon SQS queue. CloudWatch Logs show that the function's execution time has increased significantly over the past week. The function's code has not been changed recently. The function makes calls to an Amazon DynamoDB table. CloudWatch metrics show a high rate of DynamoDB ProvisionedThroughputExceededException errors. The DynamoDB table has 5 read capacity units (RCUs) and 5 write capacity units (WCUs). What is the most effective action to reduce the function's execution time?

15

A developer is using AWS Lambda with a function that processes messages from an SQS queue. The function is configured with a batch size of 10 and reserved concurrency of 5. The queue has a large backlog, and invocations are being throttled, leading to retries and messages eventually landing in the dead-letter queue (DLQ). The function is idempotent and can handle up to 100 messages per invocation. What is the most effective way to increase throughput without increasing throttling?

16

An AWS Lambda function that processes messages from an SQS queue is experiencing throttling (TooManyRequestsException). The function has reserved concurrency set to 100. The SQS queue has a redrive policy configured with maxReceiveCount of 5. CloudWatch metrics show that the function's concurrent executions occasionally spike to 100, and throttling occurs. The function execution time averages 2 seconds. What is the most effective way to reduce throttling?
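The numbers in this scenario can be related with Little's law: concurrency is roughly the invocation rate multiplied by the average duration. The sketch below is a back-of-the-envelope aid under that assumption, not a model of how Lambda schedules work; both function names are hypothetical.

```python
def required_concurrency(invocations_per_second: float, avg_duration_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return invocations_per_second * avg_duration_seconds


def max_sustained_rate(concurrency_limit: int, avg_duration_seconds: float) -> float:
    """Highest steady invocation rate a concurrency limit can absorb."""
    return concurrency_limit / avg_duration_seconds


# Scenario values: reserved concurrency 100, average duration 2 seconds.
print(max_sustained_rate(100, 2.0))
# A burst of 60 invocations per second would need more than the limit.
print(required_concurrency(60, 2.0))
```

Under these assumptions, anything that raises the arrival rate above the sustainable rate, or lengthens the average duration, pushes concurrent executions toward the limit.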

17

A developer deployed an AWS Lambda function that is invoked by an Amazon SQS queue. The function is configured with a batch size of 10 and a timeout of 30 seconds. CloudWatch metrics show that the function's Duration is consistently around 28 seconds, but occasionally spikes to 35 seconds causing timeouts. The function makes a synchronous HTTP call to an external API. Which approach will MOST effectively prevent timeouts while maximizing throughput?

18

A developer is troubleshooting slow response times in a serverless application. The application consists of an Amazon API Gateway REST API that invokes an AWS Lambda function, which then writes data to an Amazon DynamoDB table with on-demand capacity. The function also calls an external API for enrichment. The developer observes that the API Gateway integration latency is high, but the Lambda function duration is low. What is the most likely cause?

19

A developer is troubleshooting an AWS Lambda function that is triggered by an S3 event. The function occasionally fails with a timeout error. CloudWatch logs show that the timeout occurs during the processing of large files. The function has a memory setting of 128 MB and a timeout of 3 seconds. The developer wants to process large files without modifying the code. Which parameter should the developer adjust first?

20

A developer is troubleshooting performance issues in an application that uses Amazon ElastiCache for Redis. The application experiences periodic latency spikes during peak hours. The developer checks CloudWatch metrics and sees that the 'Evictions' metric is consistently high and the 'CacheHitRate' metric is low. The cluster uses a single cache.t3.small node. Which action will most likely improve the cache hit rate and reduce latency?

21

A developer is troubleshooting a DynamoDB table that is experiencing high write throttling (ProvisionedThroughputExceededException) on certain days. The table has provisioned write capacity of 1000 WCU. The table has a partition key of 'user_id' which is a UUID. The table is accessed by multiple services. CloudWatch metrics show that the WriteThrottleEvents are spiking during specific hours, and the ConsumedWriteCapacityUnits often reaches 1000. What is the most likely cause of the throttling?

22

A developer is troubleshooting an AWS Lambda function that writes items to an Amazon DynamoDB table. The function frequently fails with ProvisionedThroughputExceededException. The table has provisioned write capacity of 500 write capacity units (WCUs). The function has reserved concurrency of 10, and each invocation writes 10 items of approximately 1 KB each. There are no other writers to the table. What is the most likely cause of the throttling?
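The capacity figures in this scenario can be checked with a short calculation. It assumes the standard DynamoDB rule that one WCU covers one standard write per second for an item up to 1 KB, with larger items rounded up to the next 1 KB; the helper names are hypothetical. How quickly each invocation completes is not stated in the question, so the sketch only computes the cost of one "wave" of concurrent invocations.

```python
import math


def wcu_per_item(item_size_bytes: int) -> int:
    """WCUs consumed by one standard write, rounded up to the next 1 KB."""
    return max(1, math.ceil(item_size_bytes / 1024))


def wcu_per_wave(concurrency: int, items_per_invocation: int, item_size_bytes: int) -> int:
    """WCUs consumed when every concurrent invocation writes all of its items once."""
    return concurrency * items_per_invocation * wcu_per_item(item_size_bytes)


# Scenario values: 10 concurrent invocations, 10 items each, about 1 KB per item.
print(wcu_per_wave(10, 10, 1024))
# An item even one byte over 1 KB costs twice as much.
print(wcu_per_item(1025))
```

A table-wide total below the provisioned figure does not rule out throttling, because capacity is also limited per partition; the distribution of partition keys matters as well as the sum.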

23

A developer is monitoring an AWS Lambda function that processes events from an Amazon Kinesis stream. The function's CloudWatch metrics show high IteratorAge and the function is often throttled. The function's batch size is 100, maximum record age is 60s, and reserved concurrency is 100. The Kinesis stream has 10 shards, each with 5000 records/sec. Which action is most effective to reduce the IteratorAge and throttle rate?

24

A developer is troubleshooting an Amazon API Gateway REST API that returns 504 Gateway Timeout errors for certain requests. The backend is a Lambda function that performs a resource-intensive operation that occasionally takes up to 30 seconds. API Gateway has a default integration timeout of 29 seconds. The developer cannot reduce the execution time. What should the developer do to resolve the timeout issue?

25

A developer is troubleshooting an application that uses Amazon ElastiCache for Redis to improve performance. The application periodically experiences high latency during peak hours. The developer checks the ElastiCache metrics and sees that the 'Evictions' metric is consistently high and the 'CacheHitRate' metric is low. The cluster has a single node with a cache.t3.small instance type. Which action will most likely improve the cache hit rate and reduce latency?

26

An AWS Lambda function processes messages from an Amazon SQS queue and writes results to an Amazon DynamoDB table. The function is configured with a reserved concurrency of 5 and a batch size of 10. CloudWatch metrics show high throttling and a growing queue backlog. The function's execution time averages 1 second per message. What is the MOST effective action to reduce throttling while improving throughput?

27

A developer is debugging an issue where an Amazon S3 bucket policy is not allowing cross-account access for a user from another AWS account. The bucket policy grants access to the other account's root user. The IAM user in the other account has an IAM policy that allows s3:GetObject on the bucket. When the user tries to download an object, they get an Access Denied error. What is the most likely cause?

28

A developer has deployed an AWS Lambda function that is triggered by an Amazon S3 event. The function processes image files and stores metadata in an Amazon DynamoDB table. CloudWatch metrics show that the function's error count has increased. The developer checks CloudWatch Logs and sees errors related to insufficient memory. The function is configured with 128 MB of memory. What should the developer do to resolve the errors?

29

A developer is troubleshooting an AWS Lambda function that processes streaming data from Amazon Kinesis Data Streams. The function processes records in batches. The developer notices that the function often experiences high latency even though the average invocation rate is well below the account concurrency limit. Which action would MOST effectively reduce latency?

30

A developer is monitoring an AWS Lambda function that is triggered by an Amazon SQS queue. The function's CloudWatch metrics show a high number of throttles. The function has a reserved concurrency of 10 and the SQS queue has a large backlog of messages. The function processes each message in about 2 seconds and has a timeout of 60 seconds. Which action will most effectively reduce the throttles and increase throughput?

31

A developer is monitoring an AWS Lambda function that processes messages from an SQS queue. CloudWatch metrics show that the function's throttles are high when the queue backlog grows. The function has a reserved concurrency of 50 and a batch size of 10. The SQS queue has a visibility timeout of 30 seconds. The function processes each batch in about 5 seconds. Which action will most effectively reduce throttles?

32

A developer is using AWS X-Ray to trace a microservices application. The trace shows that a downstream service is failing with HTTP 500 errors intermittently. The developer wants to set up trace annotations to capture the error details for further analysis. Which AWS service can the developer use to search and filter traces based on these annotations?

33

A developer is troubleshooting an AWS Lambda function that experiences high latency for the first few invocations after being idle. The function is written in Python and uses a large library (e.g., Pandas). The function connects to an RDS database in a VPC. What is the most effective way to reduce the latency for the first invocation after idle?

34

A developer notices that an AWS Lambda function, configured to access an Amazon RDS database in the same VPC, is timing out. The function has a 30-second timeout. CloudWatch Logs show that the function starts execution but never reaches the database. The VPC configuration includes private subnets without a NAT gateway. The RDS database is in the same VPC. What is the most likely cause of the timeout?

35

A developer notices that an AWS Lambda function processing S3 events is being retried frequently due to throttling errors from Amazon DynamoDB. The function writes records to a DynamoDB table and has reserved concurrency set to 100. The DynamoDB table uses on-demand capacity mode. What should the developer do to reduce retries and improve overall throughput?

36

A developer is managing an application running on Amazon EC2 instances behind an Application Load Balancer. Users report that the application becomes unresponsive after several hours, and restarting the instance temporarily fixes the issue. The developer suspects a memory leak but cannot add custom instrumentation. Which AWS service can collect memory utilization metrics and help identify the memory leak with minimal configuration?

37

A developer is troubleshooting an AWS Lambda function that processes records from an Amazon Kinesis Data Stream. The function is configured with a batch size of 100 and a parallelization factor of 1. The iterator age metric is increasing, and CloudWatch Logs show the function execution time is around 4 minutes (timeout is 5 minutes). The stream has 10 shards. What is the most cost-effective way to increase processing throughput?

38

A developer is using Amazon DynamoDB and notices that read requests are frequently throttled. The table has provisioned read capacity of 100 read capacity units (RCUs) and is used by a web application that experiences bursty traffic. The developer wants to minimize throttling without manual intervention. Which action should the developer take?

39

A developer is troubleshooting an application that uses Amazon ElastiCache for Redis to cache database query results. The application experiences high latency during cache misses. The developer notices that frequently accessed keys (hot keys) are often missing from the cache, suggesting they are being evicted. Which action should the developer take to reduce cache misses for hot keys?

40

A developer monitors an AWS Lambda function that processes messages from an Amazon SQS queue. CloudWatch logs show that the function's execution time has increased significantly over the past week, and it now frequently times out at the 5-minute timeout. The function's code has not been changed recently. The function makes calls to an Amazon DynamoDB table. What is the most likely cause of the increased execution time?

41

A developer is troubleshooting an AWS Lambda function that processes files uploaded to an Amazon S3 bucket. The function sometimes times out when processing large files. CloudWatch Logs show that the function's execution time correlates with file size. The function is configured with 128 MB memory and a timeout of 30 seconds. Which action should the developer take to resolve the timeout for large files without refactoring the code?

42

A developer has an AWS Lambda function that processes messages from an Amazon SQS queue. The function is configured with a reserved concurrency of 5. Recently, the SQS queue has experienced a high volume of messages, and the developer notices that many invocations are being throttled, leading to increased processing time. What is the most likely cause of the throttling?

43

A developer is troubleshooting an AWS Lambda function that returns timeout errors when calling an external HTTPS API. The function is configured with a 30-second timeout and runs in a VPC with a public subnet and NAT Gateway. The developer checks CloudWatch logs and sees that the function is timing out at exactly 30 seconds. What is the most likely cause?

44

A Lambda function behind API Gateway intermittently times out only during cold starts. Which two actions can reduce cold-start impact?

45

A DynamoDB application receives ProvisionedThroughputExceededException during predictable daily peaks. The workload is not cacheable. What should be changed?

46

Messages in an SQS queue are processed successfully but later reappear and are processed again. What is the most likely configuration issue?

47

A developer needs to trace a request across API Gateway, Lambda, and downstream AWS service calls. Which service should be enabled?

48

A Lambda function using a Kinesis event source repeatedly retries one bad record and blocks progress in the shard. Which feature helps isolate failed records after retry limits?

49

An API Gateway API returns 429 errors during load testing. Which two areas should the developer investigate first?

50

An API backed by Lambda returns high p95 latency after deployment. Which two telemetry sources are most useful first?

51

A DynamoDB table shows throttling on one partition key value. Which two signs point to a hot partition problem?

52

A Lambda function reading from Kinesis is falling behind. Which two metrics/settings should be reviewed first?

53

Users receive AccessDenied when downloading SSE-KMS encrypted S3 objects cross-account. Which two policies may need changes?

54

An SQS-triggered Lambda repeatedly processes the same poison message. Which two settings help contain the issue?
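As an illustration of containing a poison message, the sketch below shows a handler that reports only the failed message IDs back to Lambda. It assumes the event source mapping has `ReportBatchItemFailures` enabled and that the queue has a redrive policy with a `maxReceiveCount`, so the failing message eventually moves to a dead-letter queue; `process` and the `poison` flag are hypothetical stand-ins for real business logic.

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic; raises on a message it cannot handle."""
    if body.get("poison"):
        raise ValueError("unprocessable message")


def handler(event, context=None):
    """Return the IDs of failed messages so only those are retried."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible again; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


event = {
    "Records": [
        {"messageId": "1", "body": json.dumps({"order": 42})},
        {"messageId": "2", "body": json.dumps({"poison": True})},
    ]
}
print(handler(event))
```

Without a partial batch response, one failing message causes the whole batch to be retried, so healthy messages are reprocessed alongside the poison one.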

Watch out for

Common Troubleshooting and Optimization exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Frequently asked questions

What does the Troubleshooting and Optimization domain cover on the DVA-C02 exam?
Troubleshooting and Optimization questions test whether you can apply the concept in context, not just recognise a definition.
How many questions are in this domain?
This page lists all 54 Troubleshooting and Optimization questions in the DVA-C02 question bank. The actual exam draws from this domain proportionally to its weighting in the official exam blueprint.
What is the best way to practise this domain?
Start with a short focused session (10 questions) to identify gaps, then use the interactive practice page to work through explanations. Repeat with a longer session once the weak areas feel solid.
Can I practise only Troubleshooting and Optimization questions?
Yes. The session launcher on this page filters questions to this domain only. Choose any session length, or try the interactive practice page for inline explanations.