Knowledge + Practice

CCNA Troubleshooting and Optimization Questions

75 of 291 questions · Page 2/4 · Troubleshooting and Optimization · Answers revealed


76

Multi-Selecthard

A company is using Amazon ElastiCache for Redis to improve the performance of a high-traffic web application. Recently, the application has been experiencing increased latency. The developer suspects that cache misses are causing the application to read from the database more frequently. Which THREE metrics should the developer examine in Amazon CloudWatch to troubleshoot this issue? (Choose THREE.)

Select 3 answers

A.CacheMisses

B.Evictions

C.SwapUsage

D.CurrConnections

E.CPUUtilization

AnswersA, B, D

High cache misses indicate that the cache is not being used effectively, leading to database reads.

Why this answer

Options A, B, and D are correct. Option A: CacheMisses shows the number of requests that did not find a key in the cache. Option B: Evictions indicates keys removed due to memory pressure, which can cause increased misses.

Option D: CurrConnections helps determine whether there is a connection bottleneck. Option E (CPUUtilization) is not directly related to cache efficiency. Option C (SwapUsage) indicates memory pressure, which is related to evictions, but it is not the best metric for this issue.

The three best are CacheMisses, Evictions, and CurrConnections.
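As a rough illustration of why CacheMisses matters here, the sketch below turns miss counts into a hit rate. It assumes the CacheHits metric is read alongside CacheMisses; the function name and the sample numbers are hypothetical, not taken from the question.

```python
def cache_hit_rate(cache_hits: int, cache_misses: int) -> float:
    """Return the fraction of lookups that were served from the cache."""
    total = cache_hits + cache_misses
    if total == 0:
        return 0.0  # no lookups in the period, so no meaningful rate
    return cache_hits / total


# Hypothetical one-minute sums before and after the latency increase.
rate_before = cache_hit_rate(cache_hits=9_500, cache_misses=500)
rate_after = cache_hit_rate(cache_hits=7_000, cache_misses=3_000)

print(f"hit rate before: {rate_before:.2f}, after: {rate_after:.2f}")
```

A falling hit rate together with rising Evictions would be consistent with keys being pushed out under memory pressure.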


77

MCQhard

A developer is troubleshooting performance issues in an application that uses Amazon DynamoDB as the primary data store. The application reads a large set of items using a Query operation on a Global Secondary Index (GSI). The developer notices high read latency and throttled requests on the GSI. The base table has sufficient read capacity. The GSI is projected with KEYS_ONLY. Which action would most likely reduce the latency and throttling?

A.Increase the read capacity units (RCU) of the base table.

B.Change the GSI projection to ALL.

C.Increase the read capacity units (RCU) of the GSI.

D.Create a Local Secondary Index instead.

AnswerC

Correct. Throttling on a GSI indicates that the provisioned read capacity of that index is exhausted. Increasing its RCU alleviates throttling and reduces latency.

Why this answer

The correct answer is C because a Global Secondary Index (GSI) has its own provisioned read capacity, separate from the base table. When a Query operation reads from a GSI, it consumes RCUs from the GSI's capacity, not the base table's. Since the base table has sufficient read capacity but the GSI is experiencing throttling and high latency, increasing the GSI's RCU directly addresses the bottleneck by allowing more read requests per second against the index.

Exam trap

The trap here is that candidates often assume increasing the base table's capacity will resolve all read performance issues, failing to recognize that GSIs have independent capacity allocations and that throttling on a GSI requires adjusting the index's RCU, not the base table's.

How to eliminate wrong answers

Option A is wrong because increasing the base table's RCU does not affect the GSI's throughput; the GSI has its own independent capacity settings, and throttling on the GSI is caused by insufficient RCU on the index itself. Option B is wrong because changing the GSI projection to ALL would increase the size of each item returned, consuming more RCUs per query and potentially worsening latency and throttling, not reducing it. Option D is wrong because a Local Secondary Index (LSI) shares the base table's partition key and RCU/WCU, but it does not solve the issue of insufficient read capacity on the index; additionally, LSIs cannot be created after table creation if not initially defined, and they have different partition key constraints that do not address the GSI-specific throttling.
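The point that a larger projection consumes more read capacity per query can be checked with simple arithmetic. The sketch below assumes the usual DynamoDB accounting (a Query is charged on the total size of the items read, rounded up to the next 4 KB, and eventually consistent reads cost half), treats 4 KB as 4096 bytes, and uses hypothetical item counts and sizes.

```python
import math

READ_UNIT_BYTES = 4096  # assumption: one read unit covers up to 4 KB


def query_rcus(item_count: int, avg_item_bytes: int,
               strongly_consistent: bool = False) -> float:
    """Estimate read capacity units consumed by a single Query call."""
    total_bytes = item_count * avg_item_bytes
    units = math.ceil(total_bytes / READ_UNIT_BYTES)
    # Eventually consistent reads (the only mode on a GSI) cost half.
    return float(units) if strongly_consistent else units / 2


# Hypothetical: 1,000 matching items, about 100 bytes each with KEYS_ONLY
# versus about 2,000 bytes each if every attribute were projected.
keys_only_cost = query_rcus(1_000, 100)
project_all_cost = query_rcus(1_000, 2_000)

print(keys_only_cost, project_all_cost)
```

Under these assumed sizes the same query costs roughly twenty times more against the wider projection, which is why widening the projection would not relieve throttling on the index.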


78

MCQhard

An AWS Lambda function that processes messages from an SQS queue is experiencing throttling (TooManyRequestsException). The function has reserved concurrency set to 100. The SQS queue has a redrive policy configured with maxReceiveCount of 5. CloudWatch metrics show that the function's concurrent executions occasionally spike to 100, and throttling occurs. The function execution time averages 2 seconds. What is the most effective way to reduce throttling?

A.Increase the batch size of the SQS event source mapping

B.Increase the reserved concurrency of the Lambda function

C.Decrease the batch window of the event source mapping

D.Add a dead-letter queue (DLQ) for the Lambda function

AnswerA

A larger batch size means fewer invocations for the same number of messages, reducing the peak concurrent executions and thus throttling.

Why this answer

Increasing the batch size allows the Lambda function to process more messages per invocation, reducing the number of concurrent executions needed to handle the same message volume. Since the function already spikes to its reserved concurrency of 100, processing more messages per batch lowers the invocation rate and thus reduces throttling without requiring additional concurrency.

Exam trap

The trap here is that candidates often assume throttling is always solved by increasing reserved concurrency, but the question explicitly states the function already spikes to its limit, so the correct approach is to reduce the number of invocations by increasing the batch size.

How to eliminate wrong answers

Option B is wrong because reserved concurrency is already at 100 and throttling occurs at that limit; increasing it would raise costs and may hit account-level concurrency limits, but the question asks for the most effective way to reduce throttling given the existing spike, not to increase capacity. Option C is wrong because decreasing the batch window would cause the SQS event source mapping to poll more frequently, increasing the number of invocations and worsening throttling. Option D is wrong because adding a dead-letter queue (DLQ) only captures messages that fail processing after the maxReceiveCount is exceeded; it does not reduce the invocation rate or throttling.
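The relationship between batch size and concurrency can be sketched with a quick estimate. The arrival rate below is hypothetical, and the sketch makes a simplifying assumption that the 2-second average duration does not grow with batch size, which will not hold exactly in practice.

```python
import math


def invocations_needed(message_count: int, batch_size: int) -> int:
    """Number of invocations required to drain a given number of messages."""
    return math.ceil(message_count / batch_size)


def estimated_concurrency(messages_per_second: float, batch_size: int,
                          avg_duration_s: float) -> float:
    """Little's law estimate: invocation rate multiplied by duration."""
    invocations_per_second = messages_per_second / batch_size
    return invocations_per_second * avg_duration_s


# Hypothetical load of 1,000 messages per second, 2-second executions.
small_batches = estimated_concurrency(1_000, batch_size=10, avg_duration_s=2.0)
large_batches = estimated_concurrency(1_000, batch_size=25, avg_duration_s=2.0)

print(small_batches, large_batches)
```

With these assumed numbers the estimate drops from 200 concurrent executions to 80, which would fit under a reserved concurrency of 100.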


79

MCQhard

A developer is monitoring an AWS Lambda function that processes events from an Amazon Kinesis stream. The function's CloudWatch metrics show high IteratorAge and the function is often throttled. The function's batch size is 100, maximum record age is 60s, and reserved concurrency is 100. The Kinesis stream has 10 shards, each with 5000 records/sec. Which action is most effective to reduce the IteratorAge and throttle rate?

A.Increase the batch size to 1000

B.Increase the number of shards

C.Decrease the maximum record age

D.Increase the function's memory and CPU allocation

AnswerB

Correct. More shards increase the concurrency of Lambda invocations (each shard processed independently), directly reducing the IteratorAge and throttle rate by providing more parallel processing capacity.

Why this answer

The high IteratorAge indicates that the Lambda function is falling behind in processing records from the Kinesis stream. Throttling occurs because the function's reserved concurrency of 100 is insufficient to handle the total throughput of 10 shards × 5000 records/sec = 50,000 records/sec. Increasing the number of shards (option B) directly increases the parallelism of the stream, allowing more Lambda invocations to run concurrently (up to the reserved concurrency limit) and reducing the backlog, thereby decreasing IteratorAge and throttle rate.

Exam trap

The trap here is that candidates often assume increasing batch size or memory will solve throughput issues, but the real bottleneck is concurrency limits and shard-level parallelism, not per-invocation processing capacity.

How to eliminate wrong answers

Option A is wrong because increasing the batch size to 1000 would cause each invocation to process more records, but it does not increase concurrency; the function is already throttled due to reserved concurrency limits, and larger batches may increase processing time per invocation, worsening IteratorAge. Option C is wrong because decreasing the maximum record age (from 60s to a lower value) would cause records to be discarded sooner, which does not address the root cause of throttling or backlog; it only changes the retention policy for unprocessed records. Option D is wrong because increasing memory and CPU allocation improves per-invocation performance but does not increase the number of concurrent executions; the throttling is due to concurrency limits, not individual invocation speed.


80

MCQeasy

A developer is troubleshooting an EC2 instance that cannot connect to the internet. The instance has a public IP address and is in a public subnet with a route to an internet gateway. The security group allows all outbound traffic. What is the most likely cause?

A.The subnet's route table does not have a route to an internet gateway.

B.The security group's outbound rules are too restrictive.

C.The network ACL's outbound rules are blocking traffic.

D.The instance does not have a public IP address.

AnswerC

Network ACLs are stateless and must allow both inbound and outbound.

Why this answer

Option C is correct because a misconfigured network ACL can block traffic even if the security group allows it. Option A is wrong because the route table has a route to an internet gateway. Option B is wrong because the security group allows all outbound traffic.

Option D is wrong because the instance has a public IP address.


81

Multi-Selectmedium

A developer is investigating why an Amazon API Gateway REST API is returning 504 errors during peak traffic. The API integrates with a Lambda function. Which TWO factors are MOST likely causing the 504 errors? (Choose TWO.)

Select 2 answers

A.The Lambda function is hitting its reserved concurrency limit and being throttled.

B.The API Gateway cache is full and cannot store responses.

C.The AWS WAF is blocking the request.

D.The Lambda function's configured timeout is too short.

E.The API Gateway account-level throttling limits are exceeded.

AnswersA, D

When Lambda throttles, it returns a 429 error, which API Gateway converts to a 504 because it cannot complete the request.

Why this answer

Option A and Option D are correct. Option D: If the Lambda function's timeout is shorter than the API Gateway integration timeout (29 seconds), the function may time out before API Gateway does, causing a 504. Option A: If the Lambda function is throttled, API Gateway receives a 429 from Lambda and returns a 504.

Option E (API Gateway throttling) would cause a 429, not a 504. Option B (cache) would not cause a 504. Option C (WAF) would cause a 403.


82

MCQhard

A developer is troubleshooting an AWS Lambda function that processes messages from an Amazon SQS queue. The function is configured with a batch size of 10 and a maximum concurrency of 5. The function frequently reports errors related to message processing timeouts. The function code is idempotent. Which combination of actions will reduce the number of timeouts and improve processing efficiency?

A.Increase the function timeout to 30 seconds and set the SQS visibility timeout to 6 minutes.

B.Increase the batch size to 20 and increase the function timeout to 30 seconds.

C.Reduce the batch size to 5 and increase the maximum concurrency to 10.

D.Increase the maximum concurrency to 10 and set the SQS visibility timeout to 30 seconds.

AnswerC

Smaller batches reduce processing time per invocation; more concurrency increases throughput.

Why this answer

Option C is correct because reducing batch size lowers the work per invocation, and increasing concurrency allows more parallel processing. Option B is wrong because increasing batch size would increase processing time. Option A is wrong because increasing the timeout may not solve the root cause if messages are heavy.

Option D is wrong because increasing concurrency alone may help but not as much as reducing batch size.


83

MCQhard

A developer is using Amazon API Gateway with a Lambda integration. The API returns a 502 Bad Gateway error. The developer checks the Lambda function logs and finds no invocations. What is the most likely cause?

A.The Lambda function is returning a non-JSON response.

B.The API Gateway does not have permission to invoke the Lambda function.

C.The Lambda function is timing out due to a long-running operation.

D.The Lambda function's memory is insufficient, causing it to crash.

AnswerB

Without a resource-based policy, API Gateway cannot invoke Lambda, leading to a 502.

Why this answer

A 502 Bad Gateway error from API Gateway with Lambda integration typically indicates that the Lambda function either returned an invalid response or was not invoked at all. Since the Lambda logs show no invocations, the most likely cause is that API Gateway lacks the necessary resource-based policy permission to invoke the Lambda function. Without this permission, API Gateway cannot trigger the function, resulting in a 502 error without any invocation record.

Exam trap

The trap here is that candidates often assume a 502 error always means the Lambda function returned malformed data (Option A), but the absence of invocation logs points to a permissions issue, not a response formatting problem.

How to eliminate wrong answers

Option A is wrong because a non-JSON response from Lambda would still result in an invocation (and thus appear in logs), and the 502 would occur after invocation due to response formatting issues. Option C is wrong because a Lambda timeout would still generate an invocation log entry and a 502 error would occur after the timeout, not before any invocation. Option D is wrong because insufficient memory would cause the function to crash during execution, which would still produce an invocation log entry before the crash.


84

Multi-Selectmedium

A company is using Amazon S3 to store large objects. Users report that uploads are slow. Which THREE actions should the developer take to optimize upload performance?

Select 3 answers

A.Use multipart upload for objects over 100 MB.

B.Use S3 Select to upload only specific parts of the object.

C.Enable S3 Transfer Acceleration.

D.Transition objects to S3 Glacier after upload.

E.Use multiple S3 prefixes to increase request rate.

AnswersA, C, E

Multipart upload improves throughput by parallelizing uploads.

Why this answer

Option A is correct because multipart upload improves performance for large objects. Option C is correct because S3 Transfer Acceleration reduces latency. Option E is correct because using multiple S3 prefixes increases request rate performance.

Option B is wrong because S3 Select is for retrieving subsets of data, not uploads. Option D is wrong because Glacier is for archival, not for active uploads.
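To make the multipart idea concrete, the sketch below only plans the byte ranges that could be uploaded in parallel; it does not call S3. The 16 MiB default part size and the function name are hypothetical choices, while the 5 MiB minimum part size and 10,000-part maximum are the documented S3 multipart limits.

```python
import math

MIB = 1024 * 1024
MIN_PART_BYTES = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000         # S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 16 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that together cover the whole object."""
    if part_size < MIN_PART_BYTES:
        raise ValueError("part size is below the 5 MiB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts


# Hypothetical 100 MiB object split into 16 MiB parts.
parts = plan_parts(100 * MIB)
print(len(parts), parts[-1])
```

Each planned range could then be sent as its own part by a separate worker, which is where the throughput gain comes from.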


85

MCQeasy

A developer notices that an EC2 instance running a web application is unreachable via its public IP. The instance passes status checks but security group rules appear correct. What should the developer check NEXT?

A.Verify that the instance has an Elastic IP associated.

B.Check the network ACL associated with the subnet for rules that may block traffic.

C.Review the route table for a route to an internet gateway.

D.Inspect the IAM role attached to the instance for network permissions.

AnswerB

NACLs are stateless and can block traffic even if security groups allow it.

Why this answer

The instance passes status checks and security group rules appear correct, which rules out OS-level and security group issues. Since the instance is unreachable via its public IP, the next logical step is to check the network ACL (NACL) associated with the subnet, because NACLs are stateless and can block inbound or outbound traffic even if security groups allow it. NACLs evaluate rules in order by rule number, and a deny rule (or missing allow rule) for the required ephemeral ports (e.g., 1024-65535 for return traffic) could silently drop packets.

Exam trap

The trap here is that candidates often assume security group rules are the only network filter and overlook the stateless nature of network ACLs, which can block traffic even when security groups are correctly configured.

How to eliminate wrong answers

Option A is wrong because an Elastic IP is not required for public IP reachability; an instance with a public IP assigned by AWS (from the subnet's auto-assign public IP setting) is reachable without an Elastic IP, so this check is premature and not the next step. Option C is wrong because the route table must have a route to an internet gateway for public traffic, but the question states the instance is unreachable via its public IP, and a missing route would typically cause a different symptom (e.g., no connectivity at all) rather than passing status checks; also, route tables are often checked earlier in troubleshooting, but the question specifies security groups appear correct, making NACL the more likely culprit. Option D is wrong because IAM roles control permissions for AWS API actions (e.g., S3, DynamoDB), not network-level traffic to/from the instance; network permissions are governed by security groups and NACLs, not IAM.
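The stateless, ordered evaluation described above can be sketched as a small simulation. This is a simplified model that matches only on destination port (real network ACL rules also match protocol and CIDR range); the class, function names, and rule numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # lower numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def is_allowed(rules: list[NaclRule], port: int) -> bool:
    """Apply the first matching rule in rule-number order; default is deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # the implicit final deny rule


def reachable(inbound, outbound, server_port: int, client_port: int) -> bool:
    """A request must pass inbound and its reply must pass outbound."""
    return is_allowed(inbound, server_port) and is_allowed(outbound, client_port)


inbound = [NaclRule(100, 443, 443, True)]
outbound_before = [NaclRule(100, 443, 443, True)]  # no ephemeral range
outbound_after = outbound_before + [NaclRule(110, 1024, 65535, True)]

reachable_before = reachable(inbound, outbound_before, 443, client_port=50000)
reachable_after = reachable(inbound, outbound_after, 443, client_port=50000)
print(reachable_before, reachable_after)
```

Because nothing is tracked between the request and the reply, the return traffic to the client's ephemeral port is dropped until the outbound rule for that range is added.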


86

MCQeasy

Refer to the exhibit. A developer created this CloudFormation template. After deployment, the stack creation fails with 'Bucket name already exists'. What should the developer do to fix the issue?

A.Change the BucketName to include a random suffix.

B.Remove the MyQueue resource.

C.Remove the VersioningConfiguration from the bucket.

D.Set SqsManagedSseEnabled to false.

AnswerA

Ensures globally unique bucket name.

Why this answer

Option A is correct because S3 bucket names must be globally unique, and the bucket name, which is derived from the stack name, may already be in use. Changing the bucket name to include a unique suffix avoids the conflict. Option C is wrong because removing versioning does not affect bucket name uniqueness.

Option B is wrong because the queue is not the issue. Option D is wrong because disabling SSE does not affect naming.


87

MCQhard

An application running on EC2 instances behind an Application Load Balancer is experiencing high error rates. The ALB target group health checks are failing. The instances are in an Auto Scaling group with a minimum of 2 and maximum of 10. What should a developer do to troubleshoot?

A.Check the EC2 instance system log and screenshot.

B.Review the ALB access logs.

C.Modify the Auto Scaling group's scaling policy.

D.Increase the maximum size of the Auto Scaling group.

AnswerA

Can reveal OS boot issues or application crashes.

Why this answer

Option A is correct because checking the instance system log and screenshot helps diagnose OS-level issues. Option B is wrong because ALB access logs only record request traffic and do not explain why targets fail health checks. Option C is wrong because scaling policies don't affect health check failures.

Option D is wrong because increasing max size doesn't fix existing instances.


88

MCQmedium

Why is the Lambda function not being invoked?

A.The Lambda execution role does not have permission to be invoked by S3.

B.The Lambda permission does not specify the correct source account.

C.The Lambda function has a runtime that is not supported.

D.The S3 bucket does not have a notification configuration for the Lambda function.

AnswerD

Missing NotificationConfiguration property in the bucket.

Why this answer

The S3 bucket has no notification configuration to trigger the Lambda function. The Lambda permission allows S3 to invoke, but the bucket must have a NotificationConfiguration event. Option D is correct.

Option C is wrong because the runtime is supported. Option B is wrong because the permission is correct. Option A is wrong because the execution role governs what the function itself can do, not whether S3 can invoke it.


89

MCQmedium

A developer is troubleshooting an AWS Lambda function that is triggered by an S3 event. The function occasionally fails with a timeout error. CloudWatch logs show that the timeout occurs during the processing of large files. The function has a memory setting of 128 MB and a timeout of 3 seconds. The developer wants to process large files without modifying the code. Which parameter should the developer adjust first?

A.Increase the function's memory

B.Increase the function's timeout

C.Increase the function's reserved concurrency

D.Increase the S3 event notification batch size

AnswerA

More memory provides more CPU, which can speed up processing and reduce the chance of timeout without code changes.

Why this answer

Increasing the function's memory is the correct first step because Lambda allocates CPU proportionally to memory, and more CPU reduces processing time for CPU-bound tasks like decompressing or parsing large files. This directly addresses the timeout by making the function complete faster, without requiring code changes. The current 128 MB setting is the minimum, which provides the least CPU, so even a modest increase can significantly reduce execution time.

Exam trap

The trap here is that candidates often assume a timeout error must be fixed by increasing the timeout, but the question explicitly states the timeout occurs during processing of large files, indicating a performance bottleneck that memory (and thus CPU) increase can resolve without code changes.

How to eliminate wrong answers

Option B is wrong because increasing the timeout alone does not speed up processing; it only allows the function to run longer, which may mask the underlying performance issue but does not prevent future timeouts on even larger files. Option C is wrong because reserved concurrency controls the number of concurrent executions, not the execution duration of a single invocation; it would not resolve a timeout caused by slow processing. Option D is wrong because the S3 event notification batch size controls how many events are sent per invocation, not the processing speed of a single file; increasing it would only make the function handle more files per invocation, worsening the timeout.


90

MCQmedium

A developer needs to trace a request across API Gateway, Lambda, and downstream AWS service calls. Which service should be enabled?

A.AWS X-Ray

B.AWS Budgets

C.AWS Artifact

D.AWS License Manager

AnswerA

AWS X-Ray traces requests end to end across API Gateway, Lambda, and downstream AWS service calls.

Why this answer

AWS X-Ray is the correct service because it provides end-to-end tracing for requests flowing through distributed applications, including API Gateway, Lambda functions, and downstream AWS services like DynamoDB or S3. It captures trace data as the request traverses each component, allowing developers to identify performance bottlenecks and errors across the entire request path. X-Ray integrates natively with API Gateway and Lambda via the X-Ray SDK or active tracing configuration, requiring no code changes for basic tracing.

Exam trap

The trap here is that candidates may confuse AWS X-Ray with CloudWatch Logs or CloudTrail, thinking those services provide the same distributed tracing capability, but X-Ray is the only service that correlates trace data across multiple components in a single request.

How to eliminate wrong answers

Option B (AWS Budgets) is wrong because it is a cost management service that monitors AWS spending and sends alerts when usage exceeds thresholds, not a tracing or observability tool. Option C (AWS Artifact) is wrong because it provides access to AWS compliance reports, security documentation, and agreements, such as SOC and PCI reports, not request tracing capabilities. Option D (AWS License Manager) is wrong because it manages software licenses (e.g., Microsoft, Oracle) to prevent license violations, and has no role in tracing API requests or debugging distributed applications.


91

MCQeasy

A developer is using AWS X-Ray to trace requests through a microservices application. One of the services, Service B, is not appearing in the trace map. What is the MOST likely reason?

A.Service B is using HTTP/2, which is not supported by X-Ray.

B.Service B is running in a different AWS region.

C.The X-Ray sampling rate is set too low.

D.Service B is not instrumented with the X-Ray SDK.

AnswerD

Without instrumentation, X-Ray cannot receive trace data from that service.

Why this answer

For X-Ray to trace requests across services, each service must be instrumented with the X-Ray SDK. If Service B is not instrumented, it won't send trace data, and it won't appear in the trace map.


92

MCQmedium

A developer is optimizing a Node.js Lambda function that processes CSV files from S3. The function reads the entire file into memory, processes it, and writes results to DynamoDB. For large files, the function runs out of memory. What is the MOST effective optimization?

A.Increase the Lambda timeout to allow more processing time.

B.Increase the Lambda function memory to 3008 MB.

C.Use the AWS SDK's S3 GetObject with a stream and process in chunks.

D.Use S3 Select to retrieve only necessary columns.

AnswerC

Streaming prevents loading entire file into memory.

Why this answer

Option C is correct because streaming the file from S3 avoids loading the entire file into memory. Option A is wrong because increasing the Lambda timeout does not address memory. Option B is wrong because increasing memory may help but is less efficient than streaming.

Option D is wrong because S3 Select is for filtering, not streaming.
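The streaming approach can be sketched as follows. The question concerns a Node.js function; this Python sketch only illustrates the same idea, and it uses an in-memory `io.BytesIO` as a stand-in for the object body stream. Wiring it to an actual S3 response is left out, and the function name and batch size are hypothetical.

```python
import csv
import io


def process_csv_stream(binary_stream, batch_size: int = 2):
    """Yield small batches of rows while reading the stream incrementally."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
    reader = csv.DictReader(text)
    batch = []
    for row in reader:            # rows are parsed one at a time
        batch.append(row)
        if len(batch) == batch_size:
            yield batch           # e.g. hand this batch to a database writer
            batch = []
    if batch:
        yield batch               # flush the final partial batch


# Stand-in for a large object body; only one batch is held at a time.
fake_body = io.BytesIO(b"id,name\n1,a\n2,b\n3,c\n")
batches = list(process_csv_stream(fake_body, batch_size=2))
print(batches)
```

Peak memory then depends on the batch size rather than on the size of the file.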


93

MCQhard

A company deploys a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application allows users to retrieve data by calling a REST API. Recently, users have reported that some requests return HTTP 500 errors. The developer investigates and finds that the Lambda function logs show occasional 'ProvisionedThroughputExceededException' errors when writing to a DynamoDB table. The table has provisioned read capacity of 5 and write capacity of 5. The Lambda function is configured with a reserved concurrency of 10. The developer wants to minimize errors without significantly increasing costs. Which action should the developer take?

A.Enable auto scaling for the DynamoDB table with a minimum write capacity of 5 and a maximum of 10.

B.Add a DynamoDB Accelerator (DAX) cluster in front of the table.

C.Increase the reserved concurrency for the Lambda function to 20.

D.Implement retry logic with exponential backoff in the Lambda function for DynamoDB write operations.

AnswerA

Auto scaling adjusts capacity based on actual traffic, reducing throttling while keeping costs low.

Why this answer

Option A is correct because enabling DynamoDB auto scaling lets the table's provisioned write capacity rise with demand, reducing throttling while controlling cost. Option C is wrong because increasing reserved concurrency would cause more Lambda invocations and more writes, potentially worsening the throttling. Option D is wrong because adding retries with exponential backoff in the Lambda function may help, but it does not address the root cause of insufficient write capacity; it only reduces the immediate error rate and could increase latency and costs due to retries.

Option B is wrong because DAX is for read-heavy workloads and does not help with write capacity.
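For reference, the retry-with-backoff approach that the explanation treats as a partial mitigation might look like the sketch below. It uses a hypothetical `ThrottledError` and a simulated write rather than a real DynamoDB call, and the delay values are illustrative only.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from the data store."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=1.0, sleep=time.sleep):
    """Retry a throttled operation with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


# Simulated write that is throttled twice and then succeeds.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ThrottledError("simulated throttle")
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky_write, sleep=delays.append)
print(result, attempts["count"], len(delays))
```

Every retry adds latency to the request, which matches the explanation's point that retries hide the capacity shortfall without removing it.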


94

MCQmedium

A developer is deploying a new version of an AWS Lambda function using the AWS CLI. The deployment fails with a 'ResourceConflictException' error. What is the MOST likely cause?

A.Another deployment is currently in progress for the same Lambda function.

B.The Lambda function code exceeds the maximum allowed size.

C.The Lambda function has an alias that conflicts with the version number.

D.The IAM role associated with the Lambda function does not have sufficient permissions.

AnswerA

Lambda does not allow concurrent updates to the same function.

Why this answer

Option A is correct because the error indicates that the function code or configuration is being updated while a previous update is still in progress. Option D is wrong because the error is not related to IAM permissions. Option C is wrong because publishing a new version does not conflict with an alias.

Option B is wrong because the error is not about exceeding the function code size limit.


95

MCQhard

A developer is using AWS Lambda with a VPC configuration. The function needs to access an Amazon RDS instance in the same VPC. The function is timing out after 3 seconds. What is the MOST likely cause?

A.The Lambda function's execution role does not have rds:Connect permission.

B.The Lambda function's security group does not allow outbound traffic to the RDS instance.

C.The Lambda function does not have an RDS proxy configured.

D.The Lambda function timeout is set too low.

AnswerB

The Lambda function's security group must allow outbound traffic to the RDS security group on the database port.

Why this answer

Option B is correct. A Lambda function in a VPC needs a NAT gateway or VPC endpoints only to reach destinations outside the VPC. Because the RDS instance is in the same VPC, no VPC peering or transit gateway route is involved, so the most likely cause of the timeout is that the Lambda function's security group does not allow outbound traffic to the RDS security group. Option C is wrong because an RDS proxy is not required.

Option A is wrong because the function reaches the database directly via its private IP; an IAM permission is not what blocks that network path. Option D is wrong because the function timeout is set to 3 seconds, which is the default; it could be increased, but the root cause is connectivity.


96

MCQmedium

A developer is troubleshooting an AWS Lambda function that writes to an S3 bucket. The function is configured with a resource-based policy that allows the S3 service to invoke the function. However, the function fails with an access denied error when trying to write to S3. What is the MOST likely cause?

A.The Lambda function is configured in a VPC without an S3 VPC endpoint.

B.The Lambda function's execution role does not have an IAM policy that allows s3:PutObject.

C.The Lambda function's trigger (S3 event notification) is misconfigured.

D.The S3 bucket policy does not grant the Lambda function write access.

AnswerB

Execution role must have S3 write permissions.

Why this answer

Option B is correct because the Lambda function needs an execution role with permissions to write to S3. The resource-based policy only allows S3 to invoke the function, not the function to write. Option D is wrong because a bucket policy is not needed if the execution role has permissions.

Option C is wrong because the function can be triggered correctly. Option A is wrong because a missing VPC endpoint does not cause an access denied error.


97

MCQmedium

A Lambda function processing SQS messages is failing with concurrency errors. The function is configured with reserved concurrency of 5. The SQS queue has a batch size of 10. What is the most effective way to prevent throttling?

A.Reduce the batch size to 1 to spread out invocations.

B.Increase the Lambda function memory to get more concurrency.

C.Increase the reserved concurrency to a higher value.

D.Set the SQS queue's concurrency limit to match the Lambda reserved concurrency.

AnswerC

More reserved concurrency prevents throttling.

Why this answer

Option C is correct because the function is throttling due to insufficient reserved concurrency. With a batch size of 10, each SQS batch triggers one invocation, but the function's reserved concurrency of 5 limits concurrent executions to 5. Increasing reserved concurrency allows more concurrent invocations to handle the SQS messages without throttling.

Exam trap

The trap here is that candidates often confuse batch size with concurrency, thinking reducing batch size reduces load, but it actually increases invocation count and worsens throttling.

How to eliminate wrong answers

Option A is wrong because reducing the batch size to 1 would increase the number of invocations needed for the same number of messages, worsening concurrency pressure and potentially increasing throttling. Option B is wrong because increasing Lambda memory does not affect concurrency limits; memory and concurrency are independent settings. Option D is wrong because SQS queues do not have a configurable concurrency limit; Lambda's event source mapping manages polling, so there is no such queue setting to change.

Practice this question →

98

MCQmedium

The developer invokes a Lambda function using the AWS CLI and gets the output shown. What is the most likely cause of the error?

A.The Lambda function code has a syntax error.

B.The Lambda function's execution role lacks permissions.

C.The event payload does not contain the expected data.

D.The Lambda function timed out.

AnswerC

The code assumes a property exists on the event object, but it is undefined.

Why this answer

The error 'Cannot read property 'x' of undefined' indicates that the code is trying to access a property on an undefined object. This is a runtime error in the function code, most likely because the event payload does not have the expected structure, so Option C is correct. The invocation returned a 200 status code, meaning the function was invoked successfully but the code threw an error.

Option A is wrong because the error is in the code logic, not syntax (a syntax error would stop the handler from running at all). Option B is wrong because the execution role permissions affect access to other AWS services, not errors in the code. Option D is wrong because a timeout is reported as a 'Task timed out' error, not as a property access on an undefined object.

Practice this question →

99

Multi-Selectmedium

An API Gateway API returns 429 errors during load testing. Which two areas should the developer investigate first?

Select 2 answers

A.Usage plan or stage throttling limits

B.S3 lifecycle expiration rules

C.CloudFormation stack drift only

D.Account-level or method-level API Gateway throttling

AnswersA, D

Correct for the stated requirement.

Why this answer

A is correct because API Gateway uses usage plans and stage-level throttling to limit request rates. When a client exceeds the configured rate limit (e.g., 10,000 requests per second) or burst limit, API Gateway returns a 429 Too Many Requests error. Investigating these limits is the first step to identify if the load test is hitting predefined caps.

Exam trap

The trap here is that candidates may overlook account-level throttling (option D) as a separate investigation area, but both usage plan/stage limits and account/method-level limits are valid first checks for 429 errors.

Practice this question →

100

MCQeasy

A developer is using Amazon DynamoDB with provisioned throughput. The application is receiving ProvisionedThroughputExceededException errors. What is the BEST way to handle this error?

A.Contact AWS Support to increase the DynamoDB service limits.

B.Reduce the read and write capacity units.

C.Implement exponential backoff and retry in the application code.

D.Switch the table to on-demand capacity mode.

AnswerC

Exponential backoff allows the application to retry after a delay, reducing the chance of further throttling.

Why this answer

Option C is correct because the best practice is to implement exponential backoff and retry logic. Option A is wrong because the error is transient, not a permanent issue that requires a limit increase. Option D is wrong because switching to on-demand would resolve the issue but is not the best way to handle the error; it is a capacity change, not an error handling strategy.

Option B is wrong because reducing the read and write capacity would make the problem worse.
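The retry strategy described above can be sketched in a few lines. This is a minimal illustration, not SDK code: `ThrottledError` is a hypothetical stand-in for ProvisionedThroughputExceededException, and `call_with_backoff`, its parameter names, and its default delays are assumptions chosen for the example. It uses capped exponential delays with random jitter.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with capped backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential cap: base, 2x base, 4x base, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random time up to the cap so clients spread out.
            sleep(random.uniform(0, cap))
```

The AWS SDKs generally apply their own retries with backoff, so application-level logic like this usually matters once those built-in retries are exhausted or when batching writes manually.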

Practice this question →

101

Multi-Selecthard

A developer is optimizing an application that uses Amazon DynamoDB. The application reads items by primary key and also performs queries on a Global Secondary Index (GSI). The developer notices that some queries on the GSI are slow. Which TWO actions would improve the performance of GSI queries? (Choose TWO.)

Select 2 answers

A.Use a more distributed GSI partition key to avoid hot partitions.

B.Ensure that the GSI key attribute exists on all items that need to be queried.

C.Increase the read capacity of the GSI.

D.Use strongly consistent reads on the GSI.

E.Increase the write capacity of the DynamoDB table.

AnswersA, B

A hot partition can cause throttling; a more distributed key spreads read traffic evenly.

Why this answer

Option A and Option B are correct. Option A: overloading a single GSI partition key value concentrates traffic and can cause hot partitions; a more evenly distributed key pattern spreads reads across partitions. Option B: a GSI is sparse, meaning it contains only the items that have the GSI key attribute, so the attribute must exist on every item that needs to be queried through the index.

Option C (increase the read capacity of the GSI) could help if the issue were throttling, but the question describes slow queries rather than throttled requests. Option D (use strongly consistent reads) is wrong because GSIs support only eventually consistent reads. Option E (increase the write capacity of the table) does not help read performance on the GSI.

Practice this question →

102

MCQeasy

A developer deploys a new version of an AWS Lambda function using the AWS CLI. After deployment, the function returns stale results. What is the most likely cause?

A.The function's environment variables are cached and not updated.

B.The Lambda function alias is still pointing to the previous version.

C.The Amazon CloudFront distribution is caching the old response.

D.The Lambda function's code is cached by the Lambda service.

AnswerB

Aliases are used to point to specific versions; if not updated, the old code runs.

Why this answer

When a developer deploys a new version of a Lambda function using the AWS CLI without updating the function alias, the alias continues to point to the previous version. Invoking the function via the alias (e.g., via an API Gateway endpoint or a CloudFront origin) will execute the old code, returning stale results. The `$LATEST` version is updated, but unless the alias is repointed, it does not automatically use the new code.

Exam trap

The trap here is that candidates may assume deploying new code automatically updates the invoked version, overlooking that aliases must be explicitly repointed to the new version to change which code is executed.

How to eliminate wrong answers

Option A is wrong because environment variables are not cached; they are read from the function's configuration at invocation time and are updated immediately when the function is deployed with new environment variables. Option C is wrong because CloudFront caching is a separate concern; while it can serve stale responses, the question states the function itself returns stale results, and CloudFront would only cache the HTTP response, not the Lambda execution output directly. Option D is wrong because the Lambda service does not cache the function's code in a way that persists across deployments; the new code is immediately available when the function version is updated, and the issue is about which version is being invoked, not code caching.

Practice this question →

103

MCQhard

A company runs a microservices application on Amazon ECS with Fargate. The application includes a service that processes messages from an SQS queue. The service's CPU utilization is consistently above 80%, and messages are accumulating in the queue. The service is configured with a desired count of 2 tasks and auto scaling based on CPU utilization. What should a developer do to improve message processing throughput?

A.Increase the desired count of tasks to 5.

B.Increase the task size to use more CPU and memory.

C.Change the auto scaling metric to use the SQS queue's ApproximateNumberOfMessagesVisible.

D.Decrease the batch size of messages polled from SQS.

AnswerC

Queue-based scaling is more direct and responsive.

Why this answer

Option C is correct because using SQS-based metrics (e.g., ApproximateNumberOfMessagesVisible) for auto scaling is more responsive to queue depth than CPU utilization. Option B is wrong because increasing task size may help but does not address the scaling trigger. Option A is wrong because a fixed desired count of 5 does not adapt to changes in queue depth.

Option D is wrong because reducing batch size reduces throughput.
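One way to reason about queue-based scaling is a "backlog per task" calculation: divide the visible queue depth by the backlog a single task can absorb within an acceptable delay. The sketch below is illustrative only; `desired_task_count`, its parameters, and the min/max bounds are assumptions for the example, not part of any AWS API.

```python
import math


def desired_task_count(visible_messages, msgs_per_task_per_sec,
                       acceptable_latency_sec, min_tasks=1, max_tasks=50):
    """Estimate how many tasks keep the backlog within the latency target."""
    # Backlog one task can work through inside the acceptable delay.
    acceptable_backlog_per_task = msgs_per_task_per_sec * acceptable_latency_sec
    needed = math.ceil(visible_messages / acceptable_backlog_per_task)
    # Clamp to the configured scaling bounds.
    return max(min_tasks, min(max_tasks, needed))
```

For example, with 1,000 visible messages, one message per second per task, and a 100-second target, the estimate is 10 tasks.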

Practice this question →

104

Multi-Selectmedium

Which TWO actions should a developer take to optimize cost and performance for a Lambda function that processes real-time streaming data from Amazon Kinesis? (Choose 2.)

Select 2 answers

A.Enable provisioned concurrency to reduce cold starts.

B.Increase the Lambda function memory to improve processing speed.

C.Use a larger Kinesis shard count.

D.Enable the parallelization factor to process multiple batches concurrently per shard.

E.Increase the batch size to process more records per invocation.

AnswersD, E

Parallelization factor improves throughput without increasing shards.

Why this answer

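Increasing the batch size reduces the number of Lambda invocations needed for the same number of records, and enabling the parallelization factor lets Lambda process multiple batches from each shard concurrently, which improves throughput without adding shards. Provisioned concurrency targets cold-start latency rather than cost or throughput for a Kinesis consumer. Increasing memory may be needed for speed but does not directly optimize cost.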

Practice this question →

105

MCQmedium

A developer is debugging an issue where an Amazon S3 bucket policy is not allowing cross-account access for a user from another AWS account. The bucket policy grants access to the other account's root user. The IAM user in the other account has an IAM policy that allows s3:GetObject on the bucket. When the user tries to download an object, they get an Access Denied error. What is the most likely cause?

A.The bucket is encrypted with SSE-KMS and the user does not have kms:Decrypt permission

B.The bucket policy does not specify the user's ARN

C.The object's ACL is set to private

D.The IAM policy does not include s3:ListBucket

AnswerA

SSE-KMS requires explicit kms:Decrypt permission on the customer master key. Without it, even valid S3 permissions result in Access Denied.

Why this answer

The most likely cause is that the bucket is encrypted with SSE-KMS. When an S3 bucket uses AWS KMS customer master keys (CMKs) for server-side encryption, the bucket policy granting access to the root user of the other account is not sufficient. The IAM user in the other account must also have explicit kms:Decrypt permission on the KMS key, because S3 GetObject calls require decrypting the object before returning it.

Without this KMS permission, the request fails with Access Denied even though the S3 bucket policy and IAM policy appear correct.

Exam trap

The trap here is that candidates assume a valid S3 bucket policy and IAM policy are sufficient, forgetting that KMS encryption adds an independent authorization layer that requires explicit kms:Decrypt permissions, which is a common oversight in cross-account S3 access scenarios.

How to eliminate wrong answers

Option B is wrong because the bucket policy grants access to the other account's root user, which covers all IAM users and roles in that account by default; specifying the individual user's ARN is not required. Option C is wrong because object ACLs are evaluated after bucket policies, and if the bucket policy explicitly grants access, a private object ACL would be overridden (unless the bucket policy has a condition denying access). Option D is wrong because s3:ListBucket is only needed for listing objects (e.g., GET Bucket (List Objects) requests), not for downloading a specific object using s3:GetObject.

Practice this question →

106

MCQhard

A developer is troubleshooting an AWS Lambda function that processes streaming data from Amazon Kinesis Data Streams. The function processes records in batches. The developer notices that the function often experiences high latency even though the average invocation rate is well below the account concurrency limit. Which action would MOST effectively reduce latency?

A.Increase the batch size for the Kinesis event source mapping.

B.Enable reserved concurrency for the function.

C.Increase the number of shards in the Kinesis data stream.

D.Use provisioned concurrency for the function.

AnswerD

Provisioned concurrency keeps execution environments initialized, eliminating cold starts and reducing invocation latency.

Why this answer

Provisioned concurrency pre-warms a specified number of execution environments, eliminating cold starts and ensuring that the function can handle sudden bursts of traffic without latency spikes. Since the function processes streaming data from Kinesis and experiences high latency despite low average concurrency, the issue is likely cold starts or initialization overhead, which provisioned concurrency directly mitigates.

Exam trap

The trap here is that candidates often confuse concurrency limits (reserved concurrency) with performance optimization, or assume that increasing shards or batch sizes will always reduce latency, when in fact the root cause is cold start latency that provisioned concurrency directly addresses.

How to eliminate wrong answers

Option A is wrong because increasing the batch size may reduce the number of invocations but can increase per-record processing latency and risk of timeout, especially if records are large or processing is complex. Option B is wrong because reserved concurrency only caps the maximum concurrency for the function to prevent it from competing with other functions; it does not reduce latency or address cold starts. Option C is wrong because increasing the number of shards increases parallelism and throughput but does not reduce per-invocation latency caused by cold starts or initialization; it may even exacerbate the problem by creating more concurrent invocations that each face cold starts.

Practice this question →

107

MCQmedium

A developer is troubleshooting an AWS Lambda function that writes items to an Amazon DynamoDB table. The function frequently fails with ProvisionedThroughputExceededException. The table has provisioned write capacity of 500 write capacity units (WCUs). The function has reserved concurrency of 10, and each invocation writes 10 items of approximately 1 KB each. There are no other writers to the table. What is the most likely cause of the throttling?

A.The function is writing to a single partition key, causing hot partition throttling

B.The function's reserved concurrency is too high, causing excessive write requests

C.The table's read capacity is insufficient, causing write throttling

D.The function is exceeding the DynamoDB item size limit of 400 KB

AnswerA

Correct. Hot partitions can cause throttling even when table-level capacity is sufficient. The function's writes are likely concentrated on one partition key, exceeding that partition's throughput limits.

Why this answer

The most likely cause is that the function is writing to a single partition key, creating a 'hot partition'. DynamoDB divides a table's throughput across its partitions based on the partition key, and a single partition supports at most 1,000 WCU (and 3,000 RCU). If all writes target the same key, they can be throttled at the partition level even though the table's total provisioned capacity of 500 WCU is not exhausted.

Exam trap

The trap here is that candidates assume table-level provisioned capacity is the only throttle boundary, but DynamoDB enforces per-partition throughput limits, so a hot partition can cause throttling even when the table's total WCU is not fully consumed.

How to eliminate wrong answers

Option B is wrong because reserved concurrency of 10 caps the number of concurrent invocations, and each invocation writes 10 items of about 1 KB, so at most 100 writes (about 100 WCU, since each 1 KB item consumes 1 WCU) are in flight at once, which the scenario treats as well within the 500 WCU table capacity. Option C is wrong because read capacity is irrelevant to write throttling; writes consume only write capacity units, and the table has sufficient write capacity. Option D is wrong because the DynamoDB item size limit is 400 KB, and each item is approximately 1 KB, so size is not a factor.
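The capacity arithmetic in this question can be checked with a short sketch. The helper names are hypothetical, and `per_partition_share` assumes, purely for illustration, that provisioned throughput is split evenly across a made-up partition count; actual partition counts are managed by DynamoDB and are not visible to the developer.

```python
import math


def wcu_per_item(item_size_bytes):
    """A standard write consumes 1 WCU per 1 KB of item size, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


def wcu_in_flight(concurrency, items_per_invocation, item_size_bytes):
    """WCU consumed by one round of fully concurrent invocations."""
    return concurrency * items_per_invocation * wcu_per_item(item_size_bytes)


def per_partition_share(table_wcu, partition_count):
    """Assumed even split of provisioned WCU across partitions."""
    return table_wcu / partition_count
```

With 10 concurrent invocations writing 10 items of 1 KB each, one round consumes 100 WCU, which is under the 500 WCU table total but can still exceed what a single partition receives if every write lands on the same key.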

Practice this question →

108

Matchingmedium

Match each AWS storage class to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Frequent access, low latency

Automatic cost optimization

Long-term archival

Infrequent access, single AZ

Lowest cost retrieval

Why these pairings

S3 storage classes are important for cost management.

Practice this question →

109

MCQhard

An application uses an Auto Scaling group with a launch configuration that includes a user data script to configure instances. After a scaling event, new instances launch but fail to register with the target group. The existing instances continue to work. What should the developer do to resolve this issue?

A.Modify the existing launch configuration with the correct user data

B.Create a new launch configuration with corrected user data and update the Auto Scaling group

C.Update the Auto Scaling group to use the latest launch configuration version

D.Delete and recreate the Auto Scaling group

AnswerB

Creating a new launch configuration and associating it with the Auto Scaling group will ensure new instances use the correct user data.

Why this answer

Option B is correct because a launch configuration is immutable; it cannot be edited after creation. Creating a new launch configuration with the corrected user data and updating the Auto Scaling group ensures that newly launched instances use it. Option A is wrong because an existing launch configuration cannot be modified.

Option C is wrong because launch configurations are not versioned, so there is no "latest version" to select. Option D is wrong because recreating the Auto Scaling group is unnecessary.

Practice this question →

110

Multi-Selecthard

An SQS-triggered Lambda repeatedly processes the same poison message. Which two settings help contain the issue?

Select 2 answers

A.Configure maxReceiveCount and a dead-letter queue

B.Disable CloudWatch Logs

C.Use partial batch response or failure reporting where applicable

D.Set message retention to zero

AnswersA, C

Correct for the stated requirement.

Why this answer

Option A is correct because setting a maxReceiveCount on the SQS queue limits how many times a message can be received before it is automatically moved to a dead-letter queue (DLQ). This prevents the Lambda function from repeatedly processing the same poison message, as the message is redirected to the DLQ after exceeding the threshold, allowing you to isolate and analyze the failure.

Exam trap

The trap here is that candidates often confuse message retention period (how long a message stays in the queue) with receive count limits, and they may overlook that disabling CloudWatch Logs only hides the problem rather than solving it.
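Option C (partial batch response) can be illustrated with a small handler sketch. It assumes the event source mapping has ReportBatchItemFailures enabled; `process` is a hypothetical placeholder for the real business logic, and the "poison" body is made up for the example. Only the failed message IDs are returned, so successfully processed messages in the same batch are not retried.

```python
def process(body):
    """Hypothetical business logic; raises on a message it cannot handle."""
    if body == "poison":
        raise ValueError("cannot process message")


def handler(event, context):
    """SQS-triggered Lambda handler that reports only the failed records."""
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Report this message so only it returns to the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Combined with maxReceiveCount and a dead-letter queue, the poison message is retried on its own and eventually moved aside instead of dragging the whole batch back into the queue.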

Practice this question →

111

MCQhard

Refer to the exhibit. A developer attached this S3 bucket policy to my-bucket. Users from IP 10.0.0.5 can access objects, but users from IP 10.0.1.5 cannot. What is the most likely reason?

A.The bucket policy does not apply to users.

B.The action should be s3:GetObjectVersion.

C.The resource ARN is incorrect.

D.The IP address condition restricts access to a specific range.

AnswerD

Only IPs in 10.0.0.0/24 are allowed.

Why this answer

Option D is correct because the policy only allows access from the IP range 10.0.0.0/24. Users from 10.0.1.5 are outside this range. Option A is wrong because the bucket policy does apply to users; requests from 10.0.0.5 succeed.

Option B is wrong because the action is correct. Option C is wrong because the resource ARN is correct.
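The range check behind this answer can be reproduced with Python's standard `ipaddress` module. The helper name `ip_allowed` is an assumption for the example; the CIDR block and the two addresses come from the question.

```python
import ipaddress


def ip_allowed(source_ip, allowed_cidr):
    """Return True if source_ip falls inside the allowed CIDR block."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(allowed_cidr)


# 10.0.0.0/24 covers 10.0.0.0 through 10.0.0.255 only.
inside = ip_allowed("10.0.0.5", "10.0.0.0/24")
outside = ip_allowed("10.0.1.5", "10.0.0.0/24")
```

Here `inside` is True and `outside` is False, which matches the behavior described in the question.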

Practice this question →

112

MCQeasy

A developer is troubleshooting a slow-running query in Amazon RDS for MySQL. The query is used by a reporting dashboard. Which AWS service should the developer use to identify the bottleneck?

A.AWS X-Ray

B.AWS CloudTrail

C.Amazon RDS Performance Insights

D.Amazon CloudWatch Logs

AnswerC

Performance Insights is designed for database performance troubleshooting.

Why this answer

Amazon RDS Performance Insights provides a detailed analysis of database performance, including wait events and SQL query performance, helping identify bottlenecks.

Practice this question →

113

MCQmedium

A developer is deploying a serverless application using AWS SAM. The application includes an API Gateway REST API and several Lambda functions. The developer runs 'sam deploy' and the deployment succeeds. However, when the developer tests the API endpoint using curl, the request times out. The CloudWatch logs for the Lambda function show that the function is not being invoked. The API Gateway logs are not enabled. The developer checks the API Gateway console and sees that the integration type is 'AWS Service' instead of 'Lambda Function'. The developer used the following SAM template snippet:

    Resources:
      MyApi:
        Type: AWS::Serverless::Api
        Properties:
          StageName: Prod
          DefinitionBody:
            swagger: 2.0
            info:
              title: My API
            paths:
              /items:
                get:
                  x-amazon-apigateway-integration:
                    type: aws_proxy
                    uri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${MyFunction.Arn}/invocations
                  responses: {}
      MyFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: ./src
          Handler: index.handler
          Runtime: nodejs14.x
          Events:
            ApiEvent:
              Type: Api
              Properties:
                RestApiId: !Ref MyApi
                Path: /items
                Method: GET

What is the most likely cause of the timeout?

A.The API Gateway endpoint is not publicly accessible due to a resource policy.

B.The Lambda function's code is throwing an unhandled exception before it can log.

C.The Lambda function has reached the concurrency limit.

D.The API Gateway integration type is misconfigured; the SAM template should use 'AWS::Serverless::Function' event source instead of manual Swagger integration.

AnswerD

The manual Swagger integration may conflict with the event source, causing incorrect integration.

Why this answer

The SAM template defines the API using both the 'AWS::Serverless::Api' resource with an inline Swagger definition and the 'AWS::Serverless::Function' with an Api event. Defining the same route in two places may cause a conflict or an incorrect configuration. Option D is correct: the integration type shown in the console as 'AWS Service' instead of 'Lambda Function' indicates that the integration was not set up as intended, and the most likely cause is the manual 'x-amazon-apigateway-integration' definition in the template.

Option A is wrong because nothing in the scenario indicates a resource policy restricting public access. Option B is wrong because the function is not invoked at all, so its code cannot be throwing an exception.

Option C is wrong because the timeout is not due to Lambda concurrency limits.

Practice this question →

114

MCQhard

A developer deployed the above AWS SAM template. Messages are not being processed; they end up in the DeadLetterQueue after 3 receives. The Lambda function timeout is 30 seconds. What is the most likely cause?

A.The maxReceiveCount is too low; increase it to 5.

B.The batch size is too high, causing the function to timeout.

C.The Lambda function does not have permission to poll the SQS queue.

D.The SQS visibility timeout is equal to the Lambda function timeout.

AnswerD

If the function runs for the full timeout, the message becomes visible again, leading to duplicate processing and eventual DLQ.

Why this answer

Option D is correct because the VisibilityTimeout (30 seconds) equals the Lambda timeout (30 seconds). If the function takes close to 30 seconds, the message becomes visible again before processing completes, causing another receive and eventually a move to the DLQ. Option C is wrong because the function has the SQSPollerPolicy.

Option B is wrong because a BatchSize of 10 is fine. Option A is wrong because a higher maxReceiveCount would only delay the move to the DLQ, not prevent it.
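As a quick sanity check, AWS guidance for SQS event sources is to set the queue's visibility timeout to at least six times the function timeout. The sketch below encodes that rule of thumb; the function name and the `factor` parameter are assumptions for the example.

```python
def visibility_timeout_ok(visibility_timeout_sec, function_timeout_sec, factor=6):
    """Check the queue visibility timeout against the function timeout."""
    # The message must stay hidden long enough for processing and retries.
    return visibility_timeout_sec >= factor * function_timeout_sec
```

With the values from this question (30 seconds for both), the check fails; a visibility timeout of 180 seconds or more would pass for a 30-second function timeout.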

Practice this question →

115

MCQhard

A company has a production application running on Amazon ECS with Fargate. The application consists of a front-end service and a backend service that processes orders. The backend service consumes messages from an Amazon SQS queue and writes order records to an Amazon DynamoDB table. Recently, during a marketing campaign, traffic increased significantly, causing the backend service to fall behind processing messages. The SQS queue depth grew to over 100,000 messages, and some orders were not processed in time, leading to customer complaints. The operations team noticed that the ECS service's CPU utilization never exceeded 60%, and memory utilization was around 50%. The service is configured with a desired count of 2 tasks and a target tracking scaling policy based on average CPU utilization. The DynamoDB table has on-demand capacity mode. After analyzing the logs, the development team found that each message processing takes about 2 seconds, but the backend service has a bottleneck: it makes an HTTP call to a third-party API that sometimes takes up to 10 seconds to respond. The team wants to optimize the architecture to handle traffic spikes better without over-provisioning resources. Which solution is MOST effective?

A.Increase the batch size of messages polled from SQS to 20 to process more messages per task.

B.Refactor the backend service to send the order processing request to a separate SQS queue and have a dedicated set of tasks poll that queue to make the HTTP call. This decouples the main processing from the slow API call.

C.Increase the desired count of the ECS service to 10 tasks and set a target tracking scaling policy based on SQS queue depth.

D.Change the scaling policy to use memory utilization instead of CPU, since CPU is underutilized.

AnswerB

Offloading the slow call improves throughput.

Why this answer

Option B is correct because the bottleneck is the third-party API call. Moving the call behind a separate SQS queue with its own set of tasks lets the main service hand off that work and process more messages concurrently. Option C is wrong because increasing the task count may not help if the bottleneck is the API latency.

Option D is wrong because changing the scaling metric to memory does not address the root cause. Option A is wrong because increasing the batch size may increase the processing time per batch and exacerbate the bottleneck.

Practice this question →

116

MCQhard

A developer is troubleshooting an AWS Lambda function that is invoked from an Amazon S3 bucket via event notifications. The function processes images and stores metadata in Amazon DynamoDB. The developer notices that some images are being processed multiple times, resulting in duplicate entries in DynamoDB. The S3 event notification is configured to send events to the Lambda function with the 's3:ObjectCreated:*' event type. The function uses the 'uuid' library to generate a unique ID for each image upon processing. What is the most likely cause of the duplicate processing?

A.S3 event notifications are delivered at least once, and the Lambda function is not idempotent.

B.The Lambda function's concurrency is set too high, causing race conditions.

C.The DynamoDB table does not have a primary key that prevents duplicates.

D.The S3 bucket is configured with versioning, causing multiple object creation events.

AnswerA

S3 can send the same event multiple times. Without idempotency checks (e.g., using the S3 object key as the DynamoDB primary key), each event creates a new item, causing duplicates.

Why this answer

Amazon S3 event notifications are delivered on an 'at least once' basis, meaning the same event can be sent to Lambda multiple times. If the Lambda function is not idempotent—i.e., processing the same event multiple times produces duplicate side effects—then duplicate DynamoDB entries will occur. The use of a 'uuid' library inside the function does not help because a new UUID is generated on each invocation, so the same image gets different IDs and is stored as a separate item each time.

Exam trap

The trap here is that candidates assume generating a unique ID inside the function solves duplication, but they miss that idempotency requires using a stable, external identifier (like the S3 object key) to detect and skip already-processed events.

How to eliminate wrong answers

Option B is wrong because high concurrency can cause race conditions, but the core issue here is duplicate event delivery, not concurrent writes; even with low concurrency, duplicate events would still be processed. Option C is wrong because the DynamoDB table's primary key design does not cause duplicate processing; it only affects whether duplicate writes are rejected or overwritten—the problem is that the function is invoked multiple times for the same image. Option D is wrong because S3 versioning generates separate object versions, each with a unique version ID, and the 's3:ObjectCreated:*' event fires once per version; versioning does not cause multiple events for the same object version.
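The idempotency fix described above can be sketched with an in-memory stand-in for the table. `FakeTable` and `handle_s3_event` are hypothetical names for illustration; against a real DynamoDB table the same idea would typically be a conditional write (for example, a condition expression using `attribute_not_exists` on the key attribute). The stable identifier is built from the bucket name and object key rather than a freshly generated UUID.

```python
class FakeTable:
    """In-memory stand-in for a table keyed on one attribute (hypothetical)."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, item):
        """Write the item only if the key is new; report whether it was written."""
        if key in self.items:
            return False
        self.items[key] = item
        return True


def handle_s3_event(event, table):
    """Store metadata once per object, even if the event is delivered twice."""
    written = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        stable_id = f"{bucket}/{key}"  # same ID on every redelivery
        if table.put_if_absent(stable_id, {"bucket": bucket, "key": key}):
            written += 1
    return written
```

Delivering the same event twice writes one item the first time and none the second, so the duplicate delivery has no additional side effect.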

Practice this question →

117

Multi-Selecthard

A developer is troubleshooting an AWS Lambda function that is invoked by an Amazon S3 bucket notification. The function processes new objects and writes results to a DynamoDB table. Recently, some objects are not being processed. The developer checks the CloudWatch Logs for the Lambda function and sees no errors. Which TWO actions should the developer take to investigate the issue?

Select 2 answers

A.Enable DynamoDB Streams on the table and process records.

B.Increase the Lambda function's memory allocation.

C.Check the S3 bucket notification configuration to ensure it is properly set for the correct events.

D.Configure a dead-letter queue on the Lambda function to capture unprocessed events.

E.Review the Lambda function's CloudWatch Logs for timeout messages.

AnswersC, D

Misconfigured notifications can cause some events to not trigger the function.

Why this answer

Options C and D are correct. Option C: S3 event notifications are not sent if the notification is not configured for the right events; checking the bucket notification configuration confirms that the events are being sent. Option D: Lambda dead-letter queues (DLQ) capture events that failed to be processed; if configured, they can reveal unprocessed events.

Option A is wrong because DynamoDB Streams are not involved in this flow. Option B is wrong because more memory does not help when the function is not invoked for those objects. Option E is wrong because the developer has already checked the logs and found no errors, which suggests the issue occurs before invocation.

Practice this question →

118

MCQeasy

The exhibit shows a CloudFormation template that creates an S3 bucket with versioning enabled. After deploying the stack, a developer uploads an object to the bucket. Later, the developer updates the object by uploading a new version. The developer wants to retrieve the original object. What is the correct way to do this?

A.Restore the original object using the S3 Object Lambda.

B.Use the S3 Batch Operations to revert to the original version.

C.The original object is overwritten and cannot be retrieved.

D.Use the S3 console or CLI to list object versions and retrieve the version ID of the original object.

AnswerD

Versioning stores all versions; you can retrieve by version ID.

Why this answer

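Option D is correct because versioning preserves previous versions. A request without a version ID returns only the latest version, so the developer lists the object's versions and retrieves the original by its version ID. Option A is wrong because S3 Object Lambda does not restore earlier versions. Option B is wrong because there is no undo in S3, and S3 Batch Operations is not needed to fetch one earlier version.

Option C is wrong because the original is still present as a previous version.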
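A minimal sketch of picking the original version from a version listing. The response shape mirrors the `Versions` list returned by boto3's `list_object_versions`; the helper name, key, and version IDs are made up for illustration.

```python
from datetime import datetime, timezone


def original_version_id(list_versions_response, key):
    """Return the VersionId of the oldest stored version of `key`, or None."""
    versions = [v for v in list_versions_response.get("Versions", []) if v["Key"] == key]
    if not versions:
        return None
    oldest = min(versions, key=lambda v: v["LastModified"])
    return oldest["VersionId"]


# Hypothetical listing for an object that was uploaded and then overwritten once.
sample_response = {
    "Versions": [
        {"Key": "report.pdf", "VersionId": "v2-example", "IsLatest": True,
         "LastModified": datetime(2024, 5, 2, tzinfo=timezone.utc)},
        {"Key": "report.pdf", "VersionId": "v1-example", "IsLatest": False,
         "LastModified": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    ]
}

print(original_version_id(sample_response, "report.pdf"))  # prints v1-example
```

With a real bucket, the listing would come from `list_object_versions(Bucket=..., Prefix=...)`, and the returned ID would be passed to `get_object(Bucket=..., Key=..., VersionId=...)`.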

Practice this question →

119

Multi-Selecthard

An API backed by Lambda returns high p95 latency after deployment. Which two telemetry sources are most useful first?

Select 2 answers

A.AWS Billing console only

B.CloudWatch Lambda duration/init duration/logs

C.S3 Inventory reports

D.X-Ray traces across API Gateway and Lambda

AnswersB, D

Correct for the stated requirement.

Why this answer

CloudWatch Lambda duration and init duration metrics directly measure the time your function spends executing and initializing, which are the primary drivers of p95 latency. Logs can reveal cold starts, timeouts, or inefficient code paths that cause high latency. X-Ray traces across API Gateway and Lambda complement these metrics by showing how much time each segment of a request takes, which helps separate API Gateway overhead from function time. Together, these are the most immediate telemetry sources for identifying performance bottlenecks.

Exam trap

The trap here is that candidates often overlook the combination of CloudWatch metrics and X-Ray traces, mistakenly thinking that only one telemetry source (like CloudWatch logs) is sufficient, or they confuse billing data with performance monitoring.

Practice this question →

120

Multi-Selecthard

A Lambda function behind API Gateway intermittently times out only during cold starts. Which two actions can reduce cold-start impact?

Select 2 answers

A.Use provisioned concurrency for predictable low latency

B.Move all logs to S3 Glacier

C.Reduce deployment package size and initialize clients outside unnecessary hot paths

D.Disable all retries

AnswersA, C

Correct for the stated requirement.

Why this answer

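Provisioned concurrency keeps a specified number of Lambda execution environments initialized and ready to respond immediately, eliminating the cold-start latency for those invocations. This is the most direct way to ensure predictable low latency for a function that intermittently times out during cold starts. Reducing the deployment package size and keeping client initialization out of unnecessary hot paths shortens the initialization phase, so any cold starts that still occur finish faster.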

Exam trap

The trap here is that candidates may confuse reducing cold-start impact with optimizing log storage or retry behavior, but only provisioned concurrency and minimizing initialization code directly address the cold-start latency issue.

Practice this question →

121

MCQmedium

A developer notices that an Amazon RDS for MySQL DB instance's CPU utilization is consistently above 90% during peak hours. Which AWS service can the developer use to analyze the database queries and identify the root cause?

A.AWS X-Ray

B.Amazon CloudWatch Logs

C.Amazon RDS Performance Insights

D.AWS Trusted Advisor

AnswerC

Performance Insights shows database load and top SQL queries.

Why this answer

Option C is correct because Performance Insights provides database performance analysis with query-level metrics. Option A is wrong because X-Ray traces application requests, not database queries. Option B is wrong because CloudWatch Logs is for log data, not query analysis.

Option D is wrong because Trusted Advisor provides cost and security checks, not query analysis.

Practice this question →

122

MCQmedium

A developer is managing an application that runs on an Amazon EC2 instance. The application uses an IAM role attached to the instance to access an S3 bucket. The developer recently updated the IAM role to add a new policy that grants access to a different S3 bucket. However, when testing, the application cannot access the new bucket and still returns 'Access Denied'. The developer verifies that the instance profile is correctly associated with the EC2 instance and that the new policy is attached. The application was restarted after the policy change. What is the MOST likely cause of the issue?

A.The S3 bucket has a bucket policy that denies access to the IAM role.

B.The temporary security credentials cached by the instance are still valid and do not reflect the new policy.

C.The application needs to be restarted multiple times to pick up the new policy.

D.The instance profile is not correctly attached to the EC2 instance.

AnswerB

The credentials are cached and will not include the new policy until they expire and are refreshed.

Why this answer

Option B is correct because the AWS SDK or CLI on the EC2 instance caches the temporary credentials from the instance metadata service. These credentials may not include the new policy if they were obtained before the policy update. The credentials are valid for a certain period (default 6 hours) and are not automatically refreshed when the role changes.

Option A is wrong because S3 bucket policies are not required; the IAM role policy is sufficient. Option C is wrong because restarting the application does not refresh the credentials; it continues to use the cached credentials. Option D is wrong because the developer has verified that the instance profile is correctly associated.

Practice this question →

123

Multi-Selectmedium

A DynamoDB table shows throttling on one partition key value. Which two signs point to a hot partition problem?

Select 2 answers

A.Most traffic targets the same partition key

B.The table has point-in-time recovery enabled

C.Consumed capacity is uneven despite total table capacity being available

D.CloudTrail is enabled in all regions

AnswersA, C

Correct for the stated requirement.

Why this answer

Option A is correct because a hot partition occurs when a single partition key value receives a disproportionate share of read/write traffic, causing throttling on that partition even if the table's total provisioned capacity is not fully utilized. This imbalance means the partition's capacity is exhausted while other partitions remain underutilized, leading to request throttling for that specific key.

Exam trap

The trap here is that candidates confuse overall table capacity with partition-level capacity, assuming throttling only happens when total consumed capacity exceeds provisioned capacity, rather than recognizing that uneven key distribution can cause throttling on a single partition.
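One rough way to spot the skew described above is to measure how much of the traffic goes to the busiest partition key. The sketch below assumes you already have a sample of requested partition key values (for example from application logs); the function name and sample data are hypothetical.

```python
from collections import Counter


def top_key_share(partition_keys):
    """Return (busiest_key, fraction_of_requests) for a list of partition key values."""
    counts = Counter(partition_keys)
    if not counts:
        return None, 0.0
    key, hits = counts.most_common(1)[0]
    return key, hits / len(partition_keys)


# Hypothetical sample: 8 of 10 requests target the same partition key.
sample_keys = ["tenant-1"] * 8 + ["tenant-2", "tenant-3"]
print(top_key_share(sample_keys))  # prints ('tenant-1', 0.8)
```

A share this lopsided, combined with throttling while total table capacity is still available, is consistent with a hot partition.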

Practice this question →

124

MCQhard

A company uses an S3 bucket to store sensitive documents. The bucket policy allows access only from a specific VPC endpoint. However, a developer in the same VPC is unable to access the bucket from an EC2 instance. What is the MOST likely cause?

A.The EC2 instance is routing traffic to S3 through the internet instead of the VPC endpoint.

B.The S3 bucket policy requires encryption in transit, which is not configured.

C.The VPC endpoint policy does not grant access to the developer's IAM role.

D.The EC2 instance does not have an IAM role assigned.

AnswerA

If traffic goes through the internet, the request does not arrive through the VPC endpoint, so it fails the bucket policy's endpoint condition and is denied.

Why this answer

S3 bucket policies that restrict access to a specific VPC endpoint require that requests originate from that endpoint. The EC2 instance must route S3 traffic through the VPC endpoint. If the instance has a public IP and routes directly to S3, the request won't go through the endpoint.
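For reference, a bucket policy of this kind is commonly written as a deny statement keyed on the `aws:SourceVpce` condition. The sketch below builds such a policy as a Python dictionary; the bucket name and endpoint ID are placeholders, and the exact policy in the question is not shown, so treat this as an assumed shape.

```python
import json


def vpce_only_bucket_policy(bucket_name, vpc_endpoint_id):
    """Build a policy that denies any request not arriving via the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
            # Requests that reach S3 over the internet carry no matching endpoint ID.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}},
        }],
    }


# Placeholder names for illustration only.
policy = vpce_only_bucket_policy("example-sensitive-docs", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```

Under a policy like this, an instance whose route to S3 bypasses the endpoint is denied even though it sits in the same VPC.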

Practice this question →

125

MCQmedium

A developer is monitoring an AWS Lambda function that is triggered by an Amazon SQS queue. The function's CloudWatch metrics show a high number of throttles. The function has a reserved concurrency of 10 and the SQS queue has a large backlog of messages. The function processes each message in about 2 seconds and has a timeout of 60 seconds. Which action will most effectively reduce the throttles and increase throughput?

A.Increase the reserved concurrency of the Lambda function to 50

B.Increase the batch size in the SQS event source mapping to 100

C.Increase the function timeout to 120 seconds

D.Decrease the reserved concurrency to 5

AnswerA

Increasing reserved concurrency allows more concurrent executions, reducing the chance of throttling and enabling more messages to be processed in parallel.

Why this answer

The high throttles indicate that the Lambda function's reserved concurrency of 10 is insufficient to handle the incoming messages from the SQS queue. By increasing reserved concurrency to 50, you allow more concurrent executions, which reduces throttling and increases throughput. The function's 2-second processing time and 60-second timeout are not the bottleneck; the concurrency limit is.

Exam trap

The trap here is that candidates may think increasing batch size or timeout will help, but they overlook that the root cause is the reserved concurrency cap, which directly limits the number of concurrent executions and is the primary driver of throttles.

How to eliminate wrong answers

Option B is wrong because increasing the batch size to 100 would cause the function to receive more messages per invocation, but with a reserved concurrency of 10, the function can only process 10 batches concurrently, so throttles would persist and latency could increase due to longer processing per batch. Option C is wrong because increasing the timeout to 120 seconds does not address the concurrency limit; the function already completes in 2 seconds, so a longer timeout has no effect on throttles. Option D is wrong because decreasing reserved concurrency to 5 would reduce the number of concurrent executions, worsening throttles and decreasing throughput.

Practice this question →

126

MCQhard

A developer has attached the IAM policy shown in the exhibit to a user. The user reports that they can upload and delete objects in the bucket 'my-bucket', but cannot list the objects in the bucket. What is the MOST likely reason?

A.The IAM policy is missing the s3:GetObject permission for listing.

B.The bucket has a bucket policy that denies s3:ListBucket.

C.The user is using the ListObjectsV2 API call instead of ListObjects.

D.The bucket is in a different region than the default region configured in the AWS CLI.

AnswerB

An explicit deny in a bucket policy overrides an allow from an IAM policy.

Why this answer

Option B is correct. The IAM policy grants s3:ListBucket at the bucket level (arn:aws:s3:::my-bucket), so the policy itself looks correct, yet the user still cannot list objects. When listing fails despite an IAM allow for s3:ListBucket, the most likely reason is a bucket policy that explicitly denies s3:ListBucket, because an explicit deny overrides an allow.

Option A is wrong because s3:GetObject is not needed for listing; listing requires s3:ListBucket. Option C is wrong because the choice of API call (ListObjectsV2 versus ListObjects) is unlikely to matter. Option D is wrong because the bucket being in a different region would not affect permissions.

Practice this question →

127

MCQhard

A developer runs the above CLI command to check a Lambda function's logs. The function is invoked but no logs appear in CloudWatch. The IAM role for the Lambda function has the AWSLambdaBasicExecutionRole managed policy attached. What is the most likely cause?

A.The log group's retention policy has deleted older logs.

B.The CloudWatch Logs service is not enabled in the region.

C.The Lambda function's IAM role does not have permission to create log groups.

D.The Lambda function is not being invoked.

AnswerA

Retention 30 days may have purged logs.

Why this answer

Option A is correct because the log group exists (storedBytes > 0) but its retention is 30 days, so logs older than 30 days are deleted. The developer may be looking at a time frame beyond the retention period. Option C is incorrect because the role has the necessary permissions (AWSLambdaBasicExecutionRole includes CreateLogGroup, CreateLogStream, and PutLogEvents).

Option B is incorrect because storedBytes is not zero, indicating that logs were written. Option D is incorrect because the function is being invoked (otherwise storedBytes would be 0).

Practice this question →

128

Multi-Selecthard

Which TWO are best practices for optimizing DynamoDB performance? (Choose two.)

Select 2 answers

A.Use SQS to decouple write-heavy workloads and handle spikes.

B.Use partition keys with high cardinality to distribute traffic evenly.

C.Provision maximum write capacity units to handle any spike.

D.Use Scan operations instead of Query for retrieving data.

E.Enable strongly consistent reads for all read operations.

AnswersA, B

SQS buffers writes to DynamoDB.

Why this answer

Option A is correct because using SQS to decouple write-heavy workloads allows DynamoDB to absorb traffic spikes by buffering writes in a queue, preventing throttling and enabling batch processing. This pattern, often called 'queue-based load leveling,' ensures that DynamoDB's provisioned capacity is not overwhelmed by sudden bursts, improving overall system resilience and cost efficiency. Option B is correct because a high-cardinality partition key spreads requests evenly across partitions, which avoids hot partitions.

Exam trap

The trap here is that candidates often confuse 'handling spikes' with over-provisioning capacity (Option C) instead of using decoupling patterns like SQS, or they mistakenly believe that Scan operations are acceptable for frequent data retrieval, ignoring the cost and performance penalties.

Practice this question →

129

MCQeasy

A developer is troubleshooting an AWS Lambda function that is timing out. The function processes S3 events and writes to DynamoDB. The average execution time is 5 seconds, but the function times out after 3 seconds. What is the most likely cause?

A.The S3 bucket is not configured to send event notifications.

B.DynamoDB write capacity is insufficient.

C.The Lambda function timeout is set to 3 seconds.

D.The Lambda function concurrency limit is exceeded.

AnswerC

Default timeout is 3 seconds; increase it to match execution time.

Why this answer

Option C is correct because the Lambda timeout setting defaults to 3 seconds, which is shorter than the function's 5-second average execution time. To fix this, increase the timeout in the Lambda configuration to a value higher than the maximum expected execution time (e.g., 10 seconds). Option A is wrong because S3 event notifications are asynchronous and do not cause a timeout.

Option B is wrong because DynamoDB write capacity issues would cause throttling, not timeout. Option D is wrong because Lambda concurrency limits affect invocation throttling, not execution timeout.
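A small sketch of choosing a timeout with headroom above the slowest observed run. The helper name and the 2x headroom factor are assumptions for illustration; the 900-second cap is Lambda's maximum timeout.

```python
import math

LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's upper limit for the timeout setting


def suggested_timeout(observed_durations_seconds, headroom=2.0):
    """Suggest a timeout: slowest observed run times a headroom factor, capped at the limit."""
    if not observed_durations_seconds:
        raise ValueError("need at least one observed duration")
    target = math.ceil(max(observed_durations_seconds) * headroom)
    return min(max(target, 1), LAMBDA_MAX_TIMEOUT_SECONDS)


# Hypothetical durations around the 5-second average from the question.
print(suggested_timeout([4.5, 5.0, 4.8]))  # prints 10
```

The resulting value would then be applied as the function's `Timeout` setting in its configuration.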

Practice this question →

130

Multi-Selectmedium

Which THREE are valid methods to handle application configuration in AWS? (Choose three.)

Select 3 answers

A.AWS CloudFormation template parameters

B.AWS Secrets Manager

C.AWS IAM roles

D.Lambda environment variables

E.AWS Systems Manager Parameter Store

AnswersB, D, E

Manages secrets like database passwords.

Why this answer

AWS Secrets Manager is a valid method for handling application configuration because it securely stores and manages sensitive configuration data such as database credentials, API keys, and other secrets. It supports automatic rotation of secrets, fine-grained access control via IAM policies, and integrates with AWS services like RDS, Redshift, and Lambda. This makes it ideal for managing dynamic configuration values that require high security and lifecycle management.

Exam trap

The trap here is that candidates often confuse IAM roles with configuration storage, thinking that roles can hold configuration data, when in fact roles only define permissions and cannot store key-value pairs or secrets.

Practice this question →

131

MCQhard

A developer notices that an AWS Lambda function, which uses Amazon RDS Proxy to connect to an Aurora MySQL database, is experiencing increased latency and occasional connection timeouts. The function is configured with a reserved concurrency of 100 and is deployed in a VPC. The RDS Proxy's maximum connections is set to 1000. CloudWatch metrics show that the DatabaseConnections metric for the proxy is consistently at 1000. What is the most likely cause of the increased latency and timeouts?

A.The Lambda function is not reusing database connections properly, exhausting the proxy connection pool

B.The RDS Proxy target group is not configured with the correct DB instance

C.The Lambda function's execution role is missing the rds-db:connect permission

D.The VPC does not have a NAT Gateway for outbound traffic

AnswerA

Correct. Each invocation opens a new connection without reuse, causing the proxy to reach its connection limit.

Why this answer

The RDS Proxy's DatabaseConnections metric is consistently at 1000, which equals the proxy's maximum connections setting. This indicates the proxy connection pool is fully saturated. When all connections are in use, new connection requests from Lambda invocations must wait, causing increased latency, and if the wait exceeds the timeout, connection timeouts occur.

The most likely cause is that the Lambda function is not reusing database connections (e.g., not using connection pooling or keeping connections open across invocations), exhausting the pool.

Exam trap

The trap here is that candidates may focus on the reserved concurrency (100) versus proxy max connections (1000) and assume the numbers are fine, missing that the real issue is connection reuse per invocation, not the total count.

How to eliminate wrong answers

Option B is wrong because if the target group were misconfigured, the proxy would fail to connect to the database entirely, not just experience latency and timeouts while the connection pool is full. Option C is wrong because missing the rds-db:connect permission would cause immediate authentication failures (e.g., 'Access denied') for all connection attempts, not gradual pool exhaustion. Option D is wrong because Lambda functions in a VPC use Elastic Network Interfaces (ENIs) for outbound traffic to RDS Proxy within the same VPC; a NAT Gateway is only needed for internet-bound traffic, not for connecting to RDS Proxy in the same VPC.
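The connection-reuse point can be sketched as follows. `FakeConnection` is a stand-in that only counts how many connections were opened; in a real function it would be replaced by an actual database client pointed at the proxy endpoint. All names here are hypothetical.

```python
class FakeConnection:
    """Stand-in for a real database connection; counts how many were opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, sql):
        return f"ran: {sql}"


# Module scope: this value survives across invocations in a warm execution environment.
_connection = None


def get_connection():
    """Open the connection once and reuse it on later invocations."""
    global _connection
    if _connection is None:
        _connection = FakeConnection()
    return _connection


def handler(event, context):
    conn = get_connection()
    return conn.query("SELECT 1")


# Simulate three invocations landing on the same warm environment.
for _ in range(3):
    handler({}, None)
print(FakeConnection.opened)  # prints 1
```

Opening the connection inside `handler` instead would create one connection per invocation, which is the pattern that exhausts the pool.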

Practice this question →

132

MCQmedium

A company's application uses Amazon S3 to store user-uploaded images. Users report that recently uploaded images are sometimes not immediately available for viewing. The application uses S3 Event Notifications to trigger a Lambda function that processes images and stores metadata in DynamoDB. What is the MOST likely cause of the delay?

A.Lambda function has a cold start that adds several seconds to processing time.

B.S3 is eventually consistent for new object writes, so the object may not be immediately available.

C.S3 Event Notifications may have a slight delay, and the application polls for the processed image before the notification triggers Lambda.

D.DynamoDB has insufficient read capacity causing throttling on metadata retrieval.

AnswerC

Event notifications are asynchronous and may have latency.

Why this answer

Option C is correct because S3 Event Notifications are typically delivered within seconds but can be delayed; the application should not assume the processed image is available immediately after upload. Option A is wrong because a Lambda cold start may add a brief delay, but not a long one.

Option B is wrong because S3 provides strong read-after-write consistency for new object uploads, so the uploaded object itself is available immediately. Option D is wrong because nothing in the scenario points to DynamoDB read throttling.

Practice this question →

133

MCQmedium

A developer is monitoring an AWS Lambda function that processes messages from an SQS queue. CloudWatch metrics show that the function's throttles are high when the queue backlog grows. The function has a reserved concurrency of 50 and a batch size of 10. The SQS queue has a visibility timeout of 30 seconds. The function processes each batch in about 5 seconds. Which action will most effectively reduce throttles?

A.Increase the SQS queue visibility timeout

B.Increase the Lambda function's reserved concurrency

C.Increase the Lambda function's batch size

D.Decrease the SQS queue message retention period

AnswerC

Increasing the batch size reduces the number of invocations needed to process the same number of messages, thereby reducing the number of concurrent executions and decreasing throttles.

Why this answer

Option C is correct because increasing the batch size allows each Lambda invocation to process more messages per batch (e.g., from 10 to a higher value up to 10,000 for standard queues). This reduces the number of concurrent invocations needed to clear the backlog, directly lowering the throttle count without requiring additional reserved concurrency. Since the function processes each batch in ~5 seconds and the visibility timeout is 30 seconds, there is ample time to handle larger batches, making this the most effective adjustment.

Exam trap

The trap here is that candidates often assume throttles are always solved by increasing concurrency (Option B), overlooking that batch size optimization can achieve the same throughput with fewer invocations, which is more efficient and directly addresses the backlog-driven throttle pattern.

How to eliminate wrong answers

Option A is wrong because increasing the visibility timeout does not reduce throttles; it only prevents messages from becoming visible again before processing completes, which is irrelevant since the function finishes in 5 seconds (well under the current 30-second timeout). Option B is wrong because increasing reserved concurrency would raise the throttle ceiling but does not address the root cause—the backlog grows due to insufficient throughput per invocation, and simply adding more concurrency may lead to other resource limits or costs without optimizing batch processing. Option D is wrong because decreasing the message retention period only causes messages to be deleted sooner if not processed, which does not reduce throttles and could lead to data loss; it does not affect the rate at which Lambda invocations are throttled.

Practice this question →

134

MCQeasy

A developer notices that an S3 bucket's 'PutObject' API calls are failing intermittently for a specific application. The application uses the AWS SDK for Java to upload files. The error message is 'RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period.' The bucket is in the same region as the application. The developer checks the S3 metrics and sees no throttling errors. The application runs on EC2 instances behind an ALB. The developer suspects a network issue. What should the developer do to resolve the issue?

A.Add a bucket policy that grants s3:PutObject to the application's IAM role.

B.Use an S3 VPC endpoint to improve network reliability.

C.Increase the socket timeout in the AWS SDK configuration and ensure the network path (ALB) has a higher idle timeout.

D.Enable S3 Transfer Acceleration on the bucket and update the application to use the accelerated endpoint.

AnswerC

The ALB may have a default idle timeout of 60 seconds; increasing it and adjusting SDK timeout can prevent the error.

Why this answer

The 'RequestTimeout' error indicates that the connection sat idle for too long. This often happens when a proxy or load balancer in the path has a shorter idle timeout than the SDK's timeout. Option C is correct: increase the S3 client's socket timeout (or use HTTP keep-alive) and raise the idle timeout on the network path.

Option A is wrong because the issue is not about permissions; a bucket policy is unrelated to timeouts. Option B is wrong because a VPC endpoint changes how traffic reaches S3 but does not change the socket or idle timeout settings behind this error. Option D is wrong because S3 Transfer Acceleration is for large files over long distances, not for timeouts.

Practice this question →

135

MCQeasy

A developer is troubleshooting a slow Amazon RDS MySQL database query. The query is frequently executed and takes 5 seconds to complete. Which AWS service should the developer use to analyze the query performance?

A.AWS CloudTrail

B.Amazon RDS Performance Insights

C.Amazon CloudWatch Logs

D.AWS X-Ray

AnswerB

Provides detailed query performance analysis.

Why this answer

Option B is correct because RDS Performance Insights provides database performance analysis with query-level metrics. Option A is wrong because CloudTrail is for API activity, not database performance. Option C is wrong because CloudWatch Logs is for log data, not query analysis.

Option D is wrong because X-Ray is for distributed tracing, not database queries.

Practice this question →

136

MCQmedium

An AWS Lambda function processes messages from an Amazon SQS queue and writes results to an Amazon DynamoDB table. The function is configured with a reserved concurrency of 5 and a batch size of 10. CloudWatch metrics show high throttling and a growing queue backlog. The function's execution time averages 1 second per message. What is the MOST effective action to reduce throttling while improving throughput?

A.Increase the reserved concurrency to 20.

B.Increase the batch size to 100.

C.Decrease the reserved concurrency to 2.

D.Increase the provisioned write capacity of the DynamoDB table.

AnswerA

Increasing reserved concurrency allows Lambda to scale and invoke more function instances concurrently. This directly reduces throttling and allows the function to process more messages from the SQS queue simultaneously, improving throughput and reducing backlog.

Why this answer

The Lambda function is throttling because its reserved concurrency of 5 limits it to 5 concurrent executions. With a batch size of 10 and about 1 second per message, and assuming the messages in a batch are processed one after another, each invocation takes roughly 10 seconds, so 5 concurrent executions handle at most about 5 messages per second. Increasing reserved concurrency to 20 allows 20 concurrent executions, raising the ceiling to about 20 messages per second, which directly reduces throttling and helps clear the backlog.

Exam trap

The trap here is that candidates may confuse Lambda throttling with downstream resource throttling (like DynamoDB) and choose to increase write capacity, or they may think increasing batch size alone will solve the problem without considering the concurrency bottleneck.

How to eliminate wrong answers

Option B is wrong because increasing batch size to 100 would cause each invocation to process more messages, but with only 5 concurrent executions, the function would still be limited to 5 invocations at a time, and the 1-second execution time per message would scale linearly, likely causing timeouts or increased latency without addressing the root cause of throttling. Option C is wrong because decreasing reserved concurrency to 2 would reduce throughput to about 2 messages per second, worsening throttling and backlog. Option D is wrong because increasing DynamoDB write capacity addresses potential write throttling from DynamoDB, but the CloudWatch metrics show Lambda throttling, not DynamoDB throttling; the bottleneck is Lambda concurrency, not the database.
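The throughput ceiling depends on how long one batch takes, so it helps to make that assumption explicit. The sketch below is a back-of-the-envelope calculation, not a model of Lambda's actual scaling behavior; the function name is hypothetical.

```python
def max_messages_per_second(reserved_concurrency, batch_size, seconds_per_batch):
    """Upper bound on throughput: concurrent batches in flight, divided by batch duration."""
    return reserved_concurrency * batch_size / seconds_per_batch


# If messages in a batch are processed one after another at ~1 s each,
# a batch of 10 takes ~10 s.
print(max_messages_per_second(5, 10, 10.0))   # prints 5.0
print(max_messages_per_second(20, 10, 10.0))  # prints 20.0

# If a whole batch of 10 finishes in ~1 s, the same concurrency gives a higher ceiling.
print(max_messages_per_second(5, 10, 1.0))    # prints 50.0
```

Under either assumption the ceiling scales linearly with reserved concurrency, which is why raising it from 5 to 20 is the effective change.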

Practice this question →

137

MCQmedium

A developer is troubleshooting an application that uses Amazon ElastiCache for Redis to improve performance. The application periodically experiences high latency during peak hours. The developer checks the ElastiCache metrics and sees that the 'Evictions' metric is consistently high and the 'CacheHitRate' metric is low. The cluster has a single node with a cache.t3.small instance type. Which action will most likely improve the cache hit rate and reduce latency?

A.Scale up to a larger node type (e.g., cache.t3.medium) to increase available memory.

B.Enable cluster mode and distribute data across multiple shards to reduce memory pressure.

C.Change the eviction policy to 'allkeys-lfu' to better manage which keys are evicted.

D.Add a read replica for the Redis cluster to offload read traffic.

AnswerA

Increasing memory reduces the need for evictions, allowing more data to remain in cache, which improves the cache hit rate and reduces latency.

Why this answer

The high 'Evictions' and low 'CacheHitRate' metrics indicate that the Redis node is running out of memory, forcing it to evict keys to make room for new data. Scaling up to a larger node type (cache.t3.medium) increases the available memory, allowing more data to be cached and reducing evictions, which directly improves the cache hit rate and reduces latency.

Exam trap

The trap here is that candidates may focus on optimizing eviction policies or adding replicas, but the core issue is insufficient memory capacity, which only scaling up can resolve.

How to eliminate wrong answers

Option B is wrong because enabling cluster mode and distributing data across multiple shards does not increase the total memory per node; it only partitions data, and if the total memory across shards is insufficient, evictions will still occur. Option C is wrong because changing the eviction policy to 'allkeys-lfu' only changes which keys are evicted (least frequently used) but does not address the root cause of insufficient memory; evictions will continue at the same rate. Option D is wrong because adding a read replica offloads read traffic but does not increase the primary node's memory, so evictions and low cache hit rate will persist on the primary node.

Practice this question →

138

Multi-Selecteasy

A developer is using AWS X-Ray to trace requests through a microservices application. The developer notices that some traces are incomplete. Which TWO actions can help ensure complete traces?

Select 2 answers

A.Use the X-Ray SDK to instrument the application code.

B.Open port 2000 on the security groups for TCP traffic.

C.Deploy the X-Ray daemon as a centralized service in a separate instance.

D.Install the CloudWatch agent on all instances.

E.Ensure the X-Ray daemon is running on all EC2 instances.

AnswersA, E

SDK intercepts requests and sends trace data.

Why this answer

Option A: Instrumenting the application with the X-Ray SDK sends trace data to the daemon. Option E: The X-Ray daemon must be running on each EC2 instance to send trace data. Option B is wrong because the daemon uses UDP, not TCP.

Option C is wrong because the daemon is not centralized. Option D is wrong because the CloudWatch agent is separate from X-Ray.

Practice this question →

139

MCQhard

A Lambda function using a Kinesis event source repeatedly retries one bad record and blocks progress in the shard. Which feature helps isolate failed records after retry limits?

A.Increase memory to 10 GB only

B.Disable batch processing

C.Configure failure handling with bisect batch on error and an on-failure destination where supported

D.Convert the stream to an S3 bucket

AnswerC

Correct for the stated requirement.

Why this answer

Option C is correct because Lambda's Kinesis event source mapping supports a 'bisect batch on error' feature that splits a failed batch into two smaller batches, allowing the bad record to be isolated and retried separately. Additionally, configuring an on-failure destination (e.g., an SQS queue or SNS topic) sends the record to a dead-letter destination after the retry limit is exhausted, preventing the shard from blocking progress.

Exam trap

The trap here is that candidates often think increasing memory or disabling batch processing will solve the blocking issue, but they fail to recognize that only explicit failure handling with bisect and a dead-letter destination can isolate and remove the bad record without manual intervention.

How to eliminate wrong answers

Option A is wrong because increasing memory to 10 GB only allocates more CPU and memory to the function, but does not address the root cause of a single bad record blocking the shard; it does not provide any mechanism to isolate or skip failed records. Option B is wrong because disabling batch processing (setting batch size to 1) would still cause the same blocking behavior—each record would be processed individually, but a persistent bad record would still be retried indefinitely, blocking the shard. Option D is wrong because converting the stream to an S3 bucket is not a direct replacement for Kinesis event processing; S3 does not support the same record-level retry and failure handling semantics, and this would require a complete architectural change, not a simple configuration fix.
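The failure-handling settings from option C correspond to parameters on the event source mapping. The sketch below only assembles the parameter dictionary; the parameter names match boto3's `create_event_source_mapping`, while the helper name, ARNs, and retry count are placeholders.

```python
def kinesis_failure_handling_params(function_name, stream_arn, on_failure_arn, max_retries=3):
    """Assemble event source mapping settings that isolate and divert a bad record."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": "LATEST",
        # Split a failing batch in two so the bad record ends up isolated.
        "BisectBatchOnFunctionError": True,
        # Stop retrying after this many attempts instead of blocking the shard.
        "MaximumRetryAttempts": max_retries,
        # Where details of the discarded batch are sent once retries are exhausted.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }


# Placeholder identifiers for illustration only.
params = kinesis_failure_handling_params(
    "process-stream-records",
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "arn:aws:sqs:us-east-1:123456789012:example-failed-records",
)
print(params["BisectBatchOnFunctionError"], params["MaximumRetryAttempts"])  # prints True 3
```

These values could be passed as keyword arguments to `create_event_source_mapping` on a boto3 Lambda client.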


140

Multi-Selecthard

A company is using AWS CodePipeline for CI/CD. The pipeline has a build stage using AWS CodeBuild, and a deploy stage using AWS CodeDeploy. The deployment is failing with 'Error: Health checks failed'. Which TWO steps should the developer take to troubleshoot this issue? (Select TWO.)

Select 2 answers

A.Verify that the target group's health check path and port are correctly configured.

B.Check the S3 bucket where the build artifacts are stored.

C.Check the CodeDeploy deployment logs for detailed error messages.

D.Check the CodeBuild build logs for errors.

E.Increase the number of EC2 instances in the Auto Scaling group.

AnswersA, C

Misconfigured health checks are a common cause of deployment failures.

Why this answer

Options A and C are correct. Verifying the target group's health check path and port (A) confirms that the load balancer is probing an endpoint the application actually serves. Checking the CodeDeploy deployment logs (C) shows detailed error messages for the failed deployment.

Option D (checking CodeBuild logs) is irrelevant because the failure occurs in the deploy stage, after the build stage succeeded. Option E (increasing the instance count) does not address a health check failure. Option B (checking the S3 artifact bucket) is not directly related.


141

MCQhard

A company runs a critical application on Amazon EC2 instances behind an Application Load Balancer (ALB). The application experiences intermittent errors where some requests return HTTP 503 (Service Unavailable) errors. The developers have verified that the application code is healthy and the EC2 instances pass health checks. The ALB health check is configured to hit a specific endpoint (/health) with a healthy threshold of 2 and an unhealthy threshold of 2. The health check interval is 30 seconds, and the timeout is 5 seconds. The application's /health endpoint sometimes takes up to 6 seconds to respond due to a dependency on a third-party service. The developers want to minimize the 503 errors without changing the application code. Which action should the developer take?

A.Increase the health check timeout to 10 seconds to accommodate the slow /health endpoint.

B.Decrease the unhealthy threshold to 1 so that instances are marked unhealthy after one failed health check.

C.Increase the deregistration delay to 300 seconds to allow connections to drain.

D.Decrease the health check interval to 10 seconds to detect health changes faster.

AnswerA

Prevents false negatives due to slow responses.

Why this answer

Option A is correct because the /health endpoint can take up to 6 seconds, which exceeds the 5-second timeout; raising the timeout to 10 seconds stops the ALB from counting slow but successful responses as failures and marking instances unhealthy prematurely. Option D is wrong because decreasing the interval only runs health checks more often, which adds load and does nothing about the timeout. Option B is wrong because decreasing the unhealthy threshold marks instances unhealthy even faster, which would increase the 503 errors.

Option C is wrong because the deregistration delay controls connection draining, not health check results.
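To see why the timeout matters here, the following minimal sketch replays a series of /health response times against a timeout and an unhealthy threshold. It is a simplified model, not the ALB's actual algorithm: the function name and the sample response times are hypothetical, and a probe is assumed to fail only when its response time exceeds the timeout.

```python
from typing import Sequence


def becomes_unhealthy(
    response_times_s: Sequence[float],
    timeout_s: float,
    unhealthy_threshold: int,
) -> bool:
    """Return True if enough consecutive probes exceed the timeout."""
    consecutive_failures = 0
    for response_time in response_times_s:
        if response_time > timeout_s:
            consecutive_failures += 1
        else:
            consecutive_failures = 0
        if consecutive_failures >= unhealthy_threshold:
            return True
    return False


# Hypothetical probe results: two slow responses in a row (about 6 seconds).
samples = [1.2, 6.0, 5.8, 0.9]

with_5s_timeout = becomes_unhealthy(samples, timeout_s=5, unhealthy_threshold=2)
with_10s_timeout = becomes_unhealthy(samples, timeout_s=10, unhealthy_threshold=2)
print(with_5s_timeout, with_10s_timeout)
```

With the 5-second timeout the two slow responses count as consecutive failures and reach the unhealthy threshold of 2; with a 10-second timeout the same responses pass.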


142

MCQeasy

A developer is optimizing an S3 bucket that stores large CSV files for analytics. The files are accessed frequently for the first 30 days, then rarely accessed. After 90 days, the data must be retained for compliance but accessed infrequently. What is the MOST cost-effective lifecycle policy?

A.Use S3 Intelligent-Tiering to automatically move data between access tiers.

B.Transition to S3 One Zone-IA after 30 days, then delete after 90 days.

C.Transition to S3 Standard-IA after 30 days, then to S3 Glacier Deep Archive after 90 days.

D.Transition to S3 Glacier Flexible Retrieval after 30 days, then to S3 Glacier Deep Archive after 90 days.

AnswerC

Standard-IA is cost-effective for infrequent access after 30 days, and Deep Archive provides the lowest cost for long-term compliance.

Why this answer

Option C is correct because it transitions to S3 Standard-IA after 30 days (when access becomes rare) and to S3 Glacier Deep Archive after 90 days (for long-term compliance at the lowest storage cost). Option B is wrong because it deletes the data after 90 days, which violates the retention requirement, and S3 One Zone-IA stores data in a single Availability Zone. Option D is wrong because S3 Glacier Flexible Retrieval is an archive class, so objects that are still read occasionally between days 30 and 90 would incur retrieval delays and fees.

Option A is wrong because S3 Intelligent-Tiering has monitoring costs and is not needed if access patterns are known.
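The lifecycle policy in the correct answer could be expressed roughly as follows. This is a sketch: the dictionary follows the shape that the S3 lifecycle configuration API accepts, but the rule ID is a made-up name, the empty prefix (apply to every object) is an assumption, and `storage_class_on_day` is a hypothetical helper used only to check the rule locally.

```python
# Lifecycle configuration for the "Standard-IA at 30 days, Deep Archive at 90 days" policy.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "analytics-csv-tiering",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: rule covers the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_on_day(age_days: int, config: dict) -> str:
    """Return the storage class an object would be in at a given age."""
    current = "STANDARD"
    transitions = sorted(config["Rules"][0]["Transitions"], key=lambda t: t["Days"])
    for transition in transitions:
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (10, 45, 120):
    print(age, storage_class_on_day(age, lifecycle_configuration))
```

There is no expiration action in the rule, which matches the requirement that the data be retained for compliance.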


143

Multi-Selectmedium

Which THREE factors should a developer consider when designing a stateless application on AWS? (Choose 3)

Select 3 answers

A.Avoid storing data on the local file system of the instances

B.Store session state in a shared external datastore like ElastiCache

C.Store session state in the instance memory for low latency

D.Use sticky sessions on the load balancer to maintain session affinity

E.Use a shared database like Amazon DynamoDB for persistent data

AnswersA, B, E

Local storage is not shared and can be lost if the instance terminates.

Why this answer

Options A, B, and E are correct. A stateless application should avoid storing state on the local file system of the instances (A), store session state externally, for example in ElastiCache (B), and use a shared database such as Amazon DynamoDB for persistent data (E). Option D is wrong because sticky sessions tie a user to one instance, which creates statefulness.

Option C is wrong because storing session state in instance memory makes the instance stateful.
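The idea of keeping session state outside the instance can be sketched in a few lines. Everything here is illustrative: `SharedSessionStore` is a dictionary-backed stand-in for an external store such as ElastiCache, and `AppInstance` is a hypothetical application server that holds no session data of its own.

```python
from typing import Dict, List


class SharedSessionStore:
    """Stand-in for an external session store shared by all instances."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[str]] = {}

    def get(self, session_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored state directly.
        return list(self._sessions.get(session_id, []))

    def put(self, session_id: str, items: List[str]) -> None:
        self._sessions[session_id] = list(items)


class AppInstance:
    """Hypothetical stateless server: all session state lives in the store."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        cart = self.store.get(session_id)
        cart.append(item)
        self.store.put(session_id, cart)
        return cart


store = SharedSessionStore()
instance_a = AppInstance("a", store)
instance_b = AppInstance("b", store)

instance_a.add_to_cart("session-1", "book")
cart = instance_b.add_to_cart("session-1", "pen")
print(cart)
```

Because neither instance keeps the cart in its own memory, the second request can land on a different instance without sticky sessions and still see the first item.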


144

MCQhard

A company runs a Node.js application on Amazon EC2 instances behind an Application Load Balancer (ALB). Users report intermittent 503 errors. The ALB target group health checks are failing. The developer checks the EC2 instance logs and sees no application errors. What is the MOST likely cause?

A.The health check path is set to '/' but the application serves on a different path.

B.The EC2 instances are running out of memory.

C.The health check path returns a 5xx status code due to a missing dependency.

D.The security group for the EC2 instances does not allow inbound traffic from the ALB.

AnswerC

An intermittently unavailable dependency can make the health check path return a 5xx status, so the health check fails only some of the time.

Why this answer

Option C is correct because the health checks fail only intermittently, which points to the health check path itself returning a 5xx status whenever a dependency is unavailable. Option D is wrong because a security group that blocks the ALB would make every health check fail, not just some. Option A is wrong because a health check path that the application does not serve would also fail consistently rather than intermittently.

Option B is wrong because instances running out of memory would normally leave evidence in the instance logs, and the developer sees none.


145

MCQeasy

A developer is troubleshooting a web application that intermittently returns HTTP 504 errors. The application runs on EC2 instances behind an Application Load Balancer. What is the most likely cause of these errors?

A.The target group is using an HTTPS health check but the instances only support HTTP.

B.The load balancer's cross-zone load balancing is disabled.

C.The load balancer idle timeout is set too low, and the application takes longer than the timeout to respond.

D.The security group for the EC2 instances is missing an inbound rule for the load balancer.

AnswerC

Idle timeout exceeded leads to 504.

Why this answer

HTTP 504 (Gateway Timeout) errors from an Application Load Balancer indicate that the load balancer successfully connected to the target (EC2 instance) but the target did not respond within the configured idle timeout period. The default idle timeout is 60 seconds, and if the application's processing time exceeds this value, the load balancer terminates the connection and returns a 504. Option C directly addresses this mismatch between the load balancer timeout and the application response time.

Exam trap

The trap here is that candidates often confuse HTTP 504 (Gateway Timeout) with HTTP 502 (Bad Gateway) or health check failures, leading them to select options related to security groups or health check mismatches instead of the correct idle timeout configuration.

How to eliminate wrong answers

Option A is wrong because HTTPS health checks require the target to support HTTPS; if the instances only support HTTP, the health check would fail and the instances would be marked unhealthy, leading to 503 errors (not 504). Option B is wrong because disabling cross-zone load balancing affects traffic distribution across Availability Zones, not the timeout behavior that causes 504 errors. Option D is wrong because a missing inbound security group rule for the load balancer would prevent the load balancer from establishing connections to the instances, resulting in 502 errors or health check failures, not intermittent 504 timeouts.
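One way to reason about the fix is to derive the idle timeout from the slowest response the application is expected to produce. The sketch below builds the attribute list in the form used for load balancer attributes, with the `idle_timeout.timeout_seconds` key; the helper name, the 10-second headroom, and the 1 to 4000 second bounds used for clamping are assumptions to check against the current ALB documentation.

```python
import math
from typing import Dict, List


def idle_timeout_attributes(
    slowest_response_s: float,
    headroom_s: int = 10,
) -> List[Dict[str, str]]:
    """Build an attribute list that sets the idle timeout above the slowest response."""
    timeout = math.ceil(slowest_response_s) + headroom_s
    # Assumed valid range for the ALB idle timeout, in seconds.
    timeout = max(1, min(timeout, 4000))
    return [{"Key": "idle_timeout.timeout_seconds", "Value": str(timeout)}]


# Hypothetical case: the slowest request takes about 75 seconds,
# which is longer than the default 60-second idle timeout.
attributes = idle_timeout_attributes(75)
print(attributes)
```

Raising the timeout only hides slow responses from the load balancer; if requests routinely take this long, shortening the application's processing time is usually the better long-term change.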


146

Multi-Selecteasy

A developer is using an Amazon SQS queue with a Lambda function as a consumer. Messages are being sent to the queue but the Lambda function is not processing them. Which THREE of the following are possible causes?

Select 3 answers

A.The SQS queue has a dead-letter queue configured.

B.The SQS queue policy denies access to the Lambda function.

C.The Lambda function's execution role does not have sqs:ReceiveMessage permission.

D.The SQS queue has a rate limit that prevents Lambda from polling.

E.The event source mapping between SQS and Lambda is disabled.

AnswersB, C, E

A queue policy can explicitly deny access.

Why this answer

Option C is correct because, without sqs:ReceiveMessage on its execution role, Lambda cannot poll the queue. Option B is correct because a queue policy that denies access blocks Lambda from reading messages. Option E is correct because a disabled event source mapping stops Lambda from polling.

Option A is wrong because a DLQ only receives messages that fail processing; it does not prevent processing. Option D is wrong because SQS has no rate limit that prevents Lambda from polling.
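A quick way to check the execution-role cause is to compare the role's policy against the SQS actions Lambda needs in order to poll a queue: sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes. The sketch below is a simplified check under stated assumptions: `missing_sqs_actions` is a hypothetical helper, the sample policy and queue ARN are invented, and wildcards, Deny statements, and resource scoping are deliberately ignored.

```python
from typing import List

# Actions the Lambda poller needs on the queue.
REQUIRED_ACTIONS = {
    "sqs:ReceiveMessage",
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
}


def missing_sqs_actions(policy: dict) -> List[str]:
    """Return required SQS actions that no Allow statement lists explicitly."""
    allowed = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.update(actions)
    return sorted(REQUIRED_ACTIONS - allowed)


# Hypothetical execution-role policy that forgot sqs:ReceiveMessage.
role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sqs:DeleteMessage", "sqs:GetQueueAttributes"],
            "Resource": "arn:aws:sqs:us-east-1:123456789012:example-queue",
        }
    ],
}

missing = missing_sqs_actions(role_policy)
print(missing)
```

An empty result from this check does not rule out the other two causes; the queue policy and the event source mapping state still need to be inspected separately.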


147

MCQhard

A developer is trying to decrypt an S3 object using an AWS KMS key. The decryption fails with an 'AccessDenied' error. The IAM policy attached to the developer's user includes the statement in the exhibit. The KMS key policy includes the following statement: { "Sid": "Enable IAM User Permissions", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::123456789012:root" }, "Action": "kms:*", "Resource": "*" } What is the most likely reason for the failure?

A.The KMS key policy does not grant access to the developer's IAM user.

B.The KMS key policy specifies 'kms:*' which is too broad and causes a conflict.

C.The developer's IAM policy does not include 'kms:GenerateDataKey' permission.

D.The developer's IAM policy uses 'Resource' with the full key ARN but the key policy requires a different format.

AnswerC

S3 SSE-KMS decryption often requires GenerateDataKey as well.

Why this answer

The IAM policy in the exhibit grants only kms:Decrypt. Per the answer key, Option C is correct because the policy lacks 'kms:GenerateDataKey', which S3 calls when it writes objects encrypted with SSE-KMS, so workflows that also upload or re-encrypt objects need it in addition to kms:Decrypt.

Option A is wrong because the key policy statement for the account root principal enables IAM policies in the account to grant access to the key. Option B is wrong because 'kms:*' in a key policy is broad but does not cause a conflict. Option D is wrong because the key policy does not require any particular ARN format in the IAM policy.


148

MCQeasy

A developer is deploying a serverless application using AWS CloudFormation. The stack creation fails with the error 'The following resource(s) failed to create: [MyLambdaFunction]'. The developer checks the CloudWatch logs but finds no logs for the Lambda function. What is the most likely reason?

A.The Lambda function code has a syntax error that prevents creation.

B.The Lambda function was never invoked.

C.The Lambda function's IAM role does not have permission to write to CloudWatch Logs.

D.The CloudFormation template has a syntax error.

AnswerC

If the role lacks logs:CreateLogGroup and logs:CreateLogStream, the function cannot create a log group, causing creation to fail.

Why this answer

Option C is the keyed answer. Without the `logs:CreateLogGroup`, `logs:CreateLogStream`, and `logs:PutLogEvents` permissions on its execution role, Lambda cannot write to CloudWatch Logs, so no log group or log events appear for the function. This is a common misconfiguration when deploying Lambda via CloudFormation without attaching the proper IAM policy. The status reason for a resource that failed to create is recorded in the CloudFormation stack events, not in the function's CloudWatch logs.

Exam trap

The trap here is that candidates assume the absence of logs means the function was never invoked or had a code error, but the key clue is the creation failure—if the function resource itself fails to create, logs cannot exist, pointing to a permissions issue during the initial provisioning phase.

How to eliminate wrong answers

Option A is wrong because a syntax error in the Lambda function code would not prevent the resource from being created; CloudFormation would still create the function, but invocation would fail, and logs would appear (if permissions allow). Option B is wrong because the error message states the resource failed to create, meaning the function was never successfully created, so it cannot be invoked; the absence of logs is not due to lack of invocation but due to creation failure. Option D is wrong because a CloudFormation template syntax error would cause a different error (e.g., 'Template validation error') and would prevent the entire stack from being parsed, not just a single resource creation failure.


149

MCQmedium

A company has a Lambda function that processes messages from an SQS queue. The function sometimes fails to process a message, and the message is not retried. The developer wants to ensure that failed messages are retried at least once. What should the developer do?

A.Configure a dead-letter queue (DLQ) for the SQS queue.

B.Configure the SQS queue's redrive policy with a maxReceiveCount of 2 and a DLQ.

C.Set the Lambda function's retry attempts to 0 in the event source mapping.

D.Increase the SQS queue's visibility timeout to 6 hours.

AnswerB

This ensures each message can be received up to 2 times (the original attempt plus one retry) before being sent to the DLQ.

Why this answer

Option B is correct because a redrive policy with a maxReceiveCount of 2 lets each message be received twice, the original attempt plus one retry, before SQS moves it to the DLQ. Option D is wrong because increasing the visibility timeout only delays when a failed message becomes visible again; it does not guarantee a retry. Option A is wrong because a DLQ on its own only captures failed messages; the maxReceiveCount in the redrive policy determines how many attempts happen first.

Option C is wrong because setting retry attempts to 0 reduces retries, which is the opposite of the requirement.
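The effect of maxReceiveCount can be sketched with a small simulation. The `redrive_policy` helper produces the JSON string form used for the queue's RedrivePolicy attribute; the DLQ ARN is a made-up example, and `simulate_delivery` is a hypothetical model of receive counting rather than SQS itself.

```python
import json
from typing import Callable, Tuple


def redrive_policy(dlq_arn: str, max_receive_count: int = 2) -> str:
    """Build the RedrivePolicy attribute value as a JSON string."""
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )


def simulate_delivery(
    process: Callable[[int], bool],
    max_receive_count: int,
) -> Tuple[str, int]:
    """Deliver a message until it is processed or the receive limit is reached."""
    receives = 0
    while receives < max_receive_count:
        receives += 1
        if process(receives):
            return ("processed", receives)
    return ("dead-lettered", receives)


# Hypothetical DLQ ARN, for illustration only.
policy = redrive_policy("arn:aws:sqs:us-east-1:123456789012:example-dlq")

# A consumer that fails on the first receive and succeeds on the second.
retried = simulate_delivery(lambda attempt: attempt >= 2, max_receive_count=2)
# A consumer that always fails.
poisoned = simulate_delivery(lambda attempt: False, max_receive_count=2)
print(policy, retried, poisoned)
```

With a maxReceiveCount of 2, a message that fails once gets exactly one more attempt, and a message that keeps failing is moved aside after the second receive.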


150

MCQmedium

A company runs a critical application on Amazon ECS with Fargate. The application uses an Application Load Balancer (ALB) to distribute traffic. Recently, the company noticed that the ALB returns 502 Bad Gateway errors during peak traffic hours. The developer checks the ECS service metrics and sees that the number of running tasks remains constant, while CPU and memory utilization are below 50%. The ALB target group health checks are failing intermittently for some tasks. What is the MOST likely cause of the 502 errors?

A.The task definition's startup command is failing, causing the tasks to never become healthy.

B.The ECS service is not configured with auto scaling to handle the increased traffic.

C.The ALB idle timeout is set too low, causing connections to be closed before the application responds.

D.The security group for the ECS tasks is blocking traffic from the ALB.

AnswerC

A low idle timeout can cause the ALB to close connections prematurely, resulting in 502 errors.

Why this answer

Option C is correct because, when tasks respond slowly during peak traffic, a low idle timeout lets the ALB close connections before the application responds, and the ALB returns 502. The intermittent health check failures have the same cause: some tasks are slow to respond. Option B is wrong because CPU and memory utilization are below 50%, so the tasks are not at capacity and the service does not need to scale.

Option D is wrong because a blocking security group would fail every health check, yet the checks succeed some of the time. Option A is wrong because the tasks are not failing to start; they are running but slow.

