SAA-C03 Questions 601–675 | Page 9/14

601

MCQeasy

Based on the exhibit, which Amazon EFS performance mode is the best fit for this workload?

A.Use General Purpose performance mode for low-latency access.

B.Use Max I/O performance mode to optimize for the highest possible latency tolerance.

C.Use One Zone storage class to increase metadata speed.

D.Use Provisioned Throughput mode because it is the only performance mode available.

AnswerA

General Purpose is the best EFS performance mode when the priority is low latency for small file operations. The exhibit describes a moderate number of clients and latency-sensitive metadata access, which matches the strengths of General Purpose. It is the usual choice for most applications unless the workload specifically needs very large-scale parallel throughput.

Why this answer

The General Purpose performance mode is the best fit for this workload because it provides the lowest latency for file operations, which is critical for latency-sensitive applications such as web serving, content management, and development environments. EFS General Purpose mode is optimized for workloads where consistent low-latency access is required, making it the default and recommended choice for most use cases.

Exam trap

The trap here is that candidates confuse performance modes (General Purpose vs. Max I/O) with throughput modes (Bursting vs. Provisioned) or storage classes (Standard vs. One Zone), leading them to select options that address throughput or availability rather than latency requirements.

How to eliminate wrong answers

Option B is wrong because Max I/O performance mode is designed for workloads that require high throughput and parallel processing, but it introduces higher latency, making it unsuitable for low-latency access requirements. Option C is wrong because One Zone storage class is a storage class, not a performance mode; it reduces durability and availability by storing data in a single Availability Zone and does not affect metadata speed. Option D is wrong because Provisioned Throughput is a throughput mode, not a performance mode; EFS offers General Purpose and Max I/O as performance modes, and Provisioned Throughput can be used with either performance mode to set a specific throughput level.

602

MCQmedium

Your EC2 instances run in private subnets with no NAT gateway. The instances use the AWS SDK to call STS AssumeRole to obtain temporary credentials for other services. Application logs show errors like: "EndpointConnectionError: Could not connect to https://sts.<region>.amazonaws.com". Which change most directly resolves this while keeping instances private?

A.Create an interface VPC endpoint for STS (com.amazonaws.<region>.sts) and associate it with the instance subnets and a security group that allows HTTPS.

B.Create a gateway VPC endpoint for S3 and route the STS traffic through the S3 endpoint gateway.

C.Open an inbound rule in the instances’ security group to allow outbound HTTPS to the internet CIDR block directly.

D.Attach an Internet Gateway to the private subnet route table so the STS API can be reached over public internet.

AnswerA

Interface endpoints provide private, in-VPC connectivity to AWS APIs like STS without requiring internet access or NAT.

Why this answer

The error indicates the EC2 instances cannot reach the STS public endpoint over the internet because they are in private subnets without a NAT gateway. An interface VPC endpoint for STS (com.amazonaws.<region>.sts) allows private, direct connectivity to the STS API using AWS PrivateLink, without requiring internet access. Associating the endpoint with the instance subnets and a security group that allows HTTPS (port 443) resolves the connectivity issue while keeping the instances private.

Exam trap

The trap here is that candidates often confuse gateway endpoints (which only work for S3 and DynamoDB) with interface endpoints (which work for many services like STS), or they mistakenly think security group rules alone can enable outbound internet access without a route.

How to eliminate wrong answers

Option B is wrong because a gateway VPC endpoint for S3 only provides private connectivity to S3, not to STS; STS is a different service and cannot be reached through an S3 endpoint. Option C is wrong because a security group rule does not provide a route to the internet; the instances are in private subnets with no NAT gateway or internet gateway, so outbound traffic to the internet is blocked regardless of security group rules (and an inbound rule would not govern outbound traffic in any case). Option D is wrong because adding a route to an internet gateway in the private subnets' route table would make those subnets public, violating the requirement to keep instances private; it could also expose the instances to inbound internet traffic.
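
As a rough illustration of option A, the sketch below only builds the parameters that an EC2 CreateVpcEndpoint request would carry; it does not call AWS. The VPC, subnet, and security group IDs are hypothetical placeholders, and enabling private DNS is an assumption made so that the regional STS hostname resolves to the endpoint inside the VPC.

```python
def sts_interface_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build CreateVpcEndpoint-style parameters for an STS interface endpoint.

    Nothing is sent to AWS; this only shows the shape of the request.
    """
    return {
        "VpcEndpointType": "Interface",               # interface (PrivateLink), not gateway
        "ServiceName": f"com.amazonaws.{region}.sts",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),                # the private instance subnets
        "SecurityGroupIds": list(security_group_ids), # must allow HTTPS (443) from the instances
        "PrivateDnsEnabled": True,                    # assumption: resolve the STS hostname privately
    }


# Hypothetical identifiers, for illustration only.
params = sts_interface_endpoint_params(
    "us-east-1", "vpc-0example", ["subnet-0a", "subnet-0b"], ["sg-0https"]
)
print(params["ServiceName"])
```

A gateway endpoint request would not fit here, since gateway endpoints exist only for S3 and DynamoDB.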

603

MCQhard

A warehouse integration service must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable? The architecture review board prefers a managed AWS-native control.

A.Use CloudFront signed URLs

B.Use Amazon SQS standard queue and design consumers to be idempotent

C.Use UDP messages sent directly to workers

D.Use an in-memory queue on one EC2 instance

AnswerB

SQS standard queues provide at-least-once delivery and high throughput; consumers must handle occasional duplicates.

Why this answer

Amazon SQS standard queues provide at-least-once delivery: every message is delivered to a consumer at least once, and occasionally more than once. That meets the requirement that every event must be processed, and the resulting duplicates are acceptable because the consumer can be designed to be idempotent. SQS is also a managed, AWS-native service, which aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates may confuse 'at-least-once' delivery with 'exactly-once' delivery and incorrectly choose a solution like FIFO queues (not listed) or dismiss SQS standard queues due to the duplicate processing allowance, but the question explicitly states duplicates are acceptable if idempotency is handled, making SQS standard the correct choice.

How to eliminate wrong answers

Option A is wrong because CloudFront signed URLs are used for securing content delivery, not for event processing or messaging. Option C is wrong because UDP is a connectionless, unreliable protocol that does not guarantee delivery, so it cannot ensure at-least-once processing. Option D is wrong because an in-memory queue on a single EC2 instance is not managed, not AWS-native, and introduces a single point of failure, violating the requirement for a resilient, managed service.

604

Multi-Selectmedium

An order-processing worker consumes messages from Amazon SQS. Occasionally, the worker times out after successfully creating a payment record but before deleting the message, which causes duplicate charges during retries. Some messages also fail validation repeatedly because required fields are missing. Which two changes should the team make? Select two.

Select 2 answers

A.Make the payment step idempotent using a unique transaction identifier.

B.Configure an SQS dead-letter queue with a redrive policy.

C.Reduce the visibility timeout so failed messages return to the queue faster.

D.Run only one long-lived worker instance so the queue can never be processed twice.

E.Switch from a standard queue to a FIFO queue and remove all other changes.

AnswersA, B

Correct. SQS provides at-least-once delivery, so the same message can be processed more than once if the worker times out, retries, or crashes after partially completing the work. An idempotency key lets the application recognize that the payment was already created and prevents duplicate charges.

Why this answer

Option A is correct because making the payment step idempotent using a unique transaction identifier ensures that if the same message is processed multiple times due to a timeout, the payment is only charged once. This is a common pattern for handling at-least-once delivery semantics in Amazon SQS, where the worker must be designed to handle duplicate messages safely.

Exam trap

The trap here is that candidates often think reducing the visibility timeout will speed up recovery, but it actually increases the chance of duplicate processing, and they may also overlook that a FIFO queue alone does not fix the worker's failure to delete the message after processing.
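
The idempotency idea in option A can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `PaymentLedger`, `charge_once`, and the message fields are hypothetical names, and a real system would need an atomic conditional write in its datastore rather than a Python dictionary.

```python
class PaymentLedger:
    """Records each charge under its transaction ID so retries are harmless."""

    def __init__(self):
        self._charges = {}  # transaction_id -> amount

    def charge_once(self, transaction_id, amount):
        """Return True if a new charge was recorded, False if it already existed."""
        if transaction_id in self._charges:
            return False  # duplicate delivery: do not charge again
        self._charges[transaction_id] = amount
        return True

    def total_charged(self):
        return sum(self._charges.values())


ledger = PaymentLedger()
message = {"transaction_id": "order-1001", "amount": 25}

# First delivery: the worker charges, then times out before deleting the message.
first = ledger.charge_once(message["transaction_id"], message["amount"])
# SQS redelivers the same message; the second attempt is recognized and skipped.
second = ledger.charge_once(message["transaction_id"], message["amount"])

print(first, second, ledger.total_charged())
```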

605

Multi-Selecthard

A company has three workloads. First, a stable EC2 application will remain on the same instance family for at least one year. Second, an ECS service on Fargate may shift between launch types but has steady baseline usage. Third, a fault-tolerant nightly batch job can be interrupted and restarted. Which three pricing choices should the architect recommend? Select three.

Select 3 answers

A.Standard Reserved Instances for the EC2 application.

B.Compute Savings Plans for the Fargate service.

C.Spot Instances for the nightly batch job.

D.On-Demand Instances for all three workloads.

E.Dedicated Hosts for the batch job.

AnswersA, B, C

A stable EC2 workload on the same family is a classic Reserved Instance use case because the commitment matches the predictable baseline.

Why this answer

Standard Reserved Instances (A) are ideal for the stable EC2 application because the workload has a predictable, steady-state usage and will remain on the same instance family for at least one year. By committing to a 1-year or 3-year term, the company receives a significant discount (up to 72%) compared to On-Demand pricing, making it the most cost-effective choice for this stable, long-running workload.

Exam trap

The trap here is that candidates often assume On-Demand is the only safe choice for all workloads, failing to recognize that the stable EC2 workload qualifies for Reserved Instances, the steady Fargate usage qualifies for Compute Savings Plans, and the fault-tolerant batch job is a textbook use case for Spot Instances.

606

MCQmedium

An analytics dashboard uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered?

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.AWS Backup cross-Region copy

D.CloudFront only with long TTLs

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator uses the AWS global network and Anycast IPs to route traffic to the optimal Regional endpoint, reducing latency for global users without caching dynamic responses. It does not cache content, so dynamic data is always fetched from the origin, meeting the requirement of no caching while improving network performance via the AWS backbone.

Exam trap

The trap here is that candidates often choose CloudFront for any global latency improvement, but the requirement of 'no caching dynamic responses' disqualifies CloudFront unless TTL=0 is used, which still incurs edge request overhead, whereas Global Accelerator is purpose-built for non-cached dynamic traffic.

How to eliminate wrong answers

Option B (S3 Cross-Region Replication) is wrong because it replicates static objects across S3 buckets in different Regions, which does not reduce latency for dynamic application responses served by an ALB. Option C (AWS Backup cross-Region copy) is wrong because it is a backup and disaster recovery feature for copying backup data across Regions, not a mechanism to lower network latency for live application traffic. Option D (CloudFront only with long TTLs) is wrong because CloudFront caches content at edge locations, and using long TTLs would serve stale cached responses, violating the requirement of no caching for dynamic responses.

607

MCQhard

A DynamoDB table for a retail API has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The architecture review board prefers a managed AWS-native control.

A.Use a higher-cardinality partition key that distributes writes across partitions

B.Create a global secondary index with the same date key

C.Reduce the table's write capacity

D.Move the table to S3 Glacier Instant Retrieval

AnswerA

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Using only the current date as a partition key creates a hot partition because all writes for the day target a single partition, leading to throttling. A higher-cardinality partition key, such as a composite key combining date with a unique attribute like user ID or order ID, distributes writes evenly across multiple partitions, fully utilizing DynamoDB's provisioned throughput. This is the best managed-native solution to resolve write throttling without changing the table's capacity or moving data.

Exam trap

The trap here is that candidates often think adding a GSI or adjusting capacity solves throttling, but the root cause is the partition key's low cardinality, which only a higher-cardinality key can fix by distributing writes across partitions.

How to eliminate wrong answers

Option B is wrong because a global secondary index (GSI) with the same date key does not solve the hot partition issue; GSIs have their own throughput and inherit the same write distribution problem, potentially causing throttling on the index. Option C is wrong because reducing the table's write capacity would worsen throttling during business hours, not resolve the underlying hot partition caused by the poor key design. Option D is wrong because S3 Glacier Instant Retrieval is an object storage class for infrequently accessed data with millisecond retrieval, not a replacement for DynamoDB's low-latency, high-throughput key-value access, and moving the table would break the API's real-time requirements.
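
To see why key cardinality matters, the following simulation hashes partition key values onto a fixed number of buckets. It is only a sketch: the partition count, the SHA-256 hash, and the `date#order-id` key format are assumptions chosen for illustration and do not describe DynamoDB's internal partitioning.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count for the simulation


def partition_for(key: str) -> int:
    """Map a partition key value to a simulated partition by hashing it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


order_ids = [f"order-{n}" for n in range(1000)]

# Date-only key: every write of the day carries the same key value.
date_only = Counter(partition_for("2024-05-01") for _ in order_ids)

# Higher-cardinality key: the date combined with the order ID.
composite = Counter(partition_for(f"2024-05-01#{oid}") for oid in order_ids)

print("date-only partitions used:", len(date_only))
print("composite partitions used:", len(composite))
```

With the date-only key, all 1,000 simulated writes land on one partition; with the combined key they spread across several.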

608

MCQhard

A payments API uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The design must avoid adding custom operational scripts.

A.A FIFO queue without a redrive policy

B.A dead-letter queue with an appropriate maxReceiveCount

C.A larger message retention period only

D.Short polling instead of long polling

AnswerB

A DLQ isolates messages that fail repeatedly so they can be investigated without disrupting normal processing.

Why this answer

A dead-letter queue (DLQ) with an appropriate maxReceiveCount allows messages that repeatedly fail processing to be moved out of the source queue after a specified number of receive attempts. This prevents poison messages from blocking retries and consuming processing resources, without requiring custom operational scripts.

Exam trap

The trap here is that candidates may think increasing retention or switching polling modes solves poison messages, but only a DLQ with maxReceiveCount directly addresses repeated failures without custom scripts.

How to eliminate wrong answers

Option A is wrong because a FIFO queue without a redrive policy does not automatically handle poison messages; it still requires a DLQ configuration to move failing messages out. Option C is wrong because increasing the message retention period only keeps messages longer but does not prevent poison messages from repeatedly failing and blocking retries. Option D is wrong because short polling (immediate return with fewer messages) does not address poison message handling; it only affects message availability and latency, not failure management.
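
For reference, the redrive policy on the source queue is a JSON string with two fields. The sketch below builds that string without calling AWS; the queue ARN and the value 5 for maxReceiveCount are hypothetical and would be chosen to suit the workload.

```python
import json


def redrive_policy(dlq_arn: str, max_receive_count: int) -> str:
    """Return the JSON string used as the source queue's RedrivePolicy attribute."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )


# Hypothetical DLQ ARN; the attribute is set on the source (payments) queue.
attributes = {
    "RedrivePolicy": redrive_policy(
        "arn:aws:sqs:us-east-1:123456789012:payments-dlq", 5
    )
}
print(attributes["RedrivePolicy"])
```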

609

MCQmedium

A company stores application logs in an S3 bucket. They retain logs for 180 days. Compliance requires that the logs be immutable once written, but the business only reviews logs about once per month. Currently, the team stores everything in S3 Standard, and their monthly S3 bill is too high. They want to reduce storage cost without changing the requirement to keep logs for 180 days. Which lifecycle approach best meets the goal?

A.Use a lifecycle policy to transition objects older than 30 days to S3 Standard-IA, and keep them there until day 180.

B.Use a lifecycle policy to transition objects older than 30 days to S3 Glacier Deep Archive and delete after 30 days.

C.Use a lifecycle policy to transition objects older than 30 days to S3 Intelligent-Tiering with no minimum storage duration.

D.Disable lifecycle management and instead lower costs by deleting objects immediately after they are written.

AnswerA

Logs accessed about monthly match Standard-IA economics and still provide fast retrieval.

Why this answer

Option A is correct because it transitions logs to S3 Standard-IA after 30 days, which reduces storage costs while still meeting the 180-day retention requirement. S3 Standard-IA is designed for data that is accessed less frequently but must be available quickly when needed, which matches the monthly review pattern. The lifecycle policy keeps objects in S3 Standard-IA until day 180, avoiding premature deletion. Immutability is enforced separately, for example with S3 Object Lock, and is not affected by the storage-class transition.

Exam trap

The trap here is that candidates may choose S3 Intelligent-Tiering (Option C) thinking it automatically optimizes costs. For a known monthly access pattern it does not guarantee savings: objects reach its lower-cost Infrequent Access tier only after 30 consecutive days without access, and it adds a monitoring and automation charge, so it is less cost-effective here than a direct transition to S3 Standard-IA.

How to eliminate wrong answers

Option B is wrong because transitioning to S3 Glacier Deep Archive and deleting after 30 days violates the 180-day retention requirement, as objects would be removed far too early. Option C is wrong because S3 Intelligent-Tiering moves an object to its Infrequent Access tier only after 30 consecutive days without access, so logs read about once per month may keep returning to the Frequent Access tier and see little saving; it may also incur monitoring and automation costs. Option D is wrong because deleting objects immediately after they are written violates the 180-day retention requirement and compliance needs, and it does not address cost optimization through lifecycle transitions.
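
A lifecycle rule matching option A might look like the structure below, written in the shape of an S3 lifecycle configuration. It is a sketch under assumptions: the rule ID and the `logs/` prefix are hypothetical, and expiring objects at day 180 is one reading of "keep them there until day 180" rather than something the question states.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "logs-to-standard-ia",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},        # hypothetical key prefix
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            "Expiration": {"Days": 180},          # assumption: delete once retention ends
        }
    ]
}


def retention_is_respected(config: dict, retention_days: int) -> bool:
    """Check that no rule expires objects before the retention period ends."""
    for rule in config["Rules"]:
        expiration = rule.get("Expiration", {}).get("Days")
        if expiration is not None and expiration < retention_days:
            return False
    return True


print(retention_is_respected(lifecycle_configuration, 180))
```

The same check applied to option B's rule (delete after 30 days) would fail, which is the quickest way to rule that option out.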

610

MCQeasy

A content publishing system exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The design must avoid adding custom operational scripts.

A.IAM Access Analyzer

B.AWS Backup Vault Lock

C.CloudFront caching with appropriate TTLs

D.S3 Select

AnswerC

CloudFront can serve cached content from edge locations when the origin is temporarily unavailable.

Why this answer

CloudFront caches responses from the S3 origin based on configured TTLs (Cache-Control or Expires headers). If the S3 origin experiences a short outage, CloudFront can still serve cached content to users as long as the TTL has not expired, ensuring availability without custom scripts. This is the most direct and resilient feature for this use case.

Exam trap

The trap here is that candidates may confuse backup or access control features (like Backup Vault Lock or IAM Access Analyzer) with availability mechanisms, or think S3 Select provides caching, when the correct answer is simply leveraging CloudFront's built-in caching TTLs to serve stale content during origin outages.

How to eliminate wrong answers

Option A is wrong because IAM Access Analyzer helps identify unintended access to resources, not caching or origin resilience. Option B is wrong because AWS Backup Vault Lock prevents deletion of backups, not caching or serving stale content during origin outages. Option D is wrong because S3 Select is a feature to retrieve subsets of data from objects using SQL queries, not related to caching or origin resilience.

611

MCQmedium

A marketing site stores logs in S3. Logs are queried for 30 days, rarely accessed for one year, and then retained for compliance. What should reduce storage cost?

A.S3 lifecycle policy that transitions objects to lower-cost storage classes over time

B.Keep all logs in S3 Standard indefinitely

C.Use EBS snapshots for the logs

D.Move all logs immediately to S3 Glacier Deep Archive

AnswerA

Lifecycle rules automate transitions based on age, matching storage cost to access patterns.

Why this answer

An S3 Lifecycle policy automates the transition of objects from S3 Standard (frequently accessed) to lower-cost storage classes like S3 Standard-IA (infrequent access) after 30 days, then to S3 Glacier Deep Archive for long-term compliance retention. This matches the access pattern: frequent queries for 30 days, rare access for a year, then archival storage, minimizing cost without manual intervention.

Exam trap

The trap here is that candidates might choose immediate archiving (Option D) to minimize storage cost, overlooking the 30-day query requirement and the retrieval latency/cost of Glacier Deep Archive, or mistakenly think EBS snapshots (Option C) are a valid alternative for log storage.

How to eliminate wrong answers

Option B is wrong because keeping all logs in S3 Standard indefinitely incurs the highest per-GB storage cost, ignoring the significant cost savings from transitioning to lower-cost tiers for rarely accessed and archived data. Option C is wrong because EBS snapshots are block-level backups for EC2 volumes, not designed for object storage of logs; using them would require an EC2 instance to manage the logs, adding compute and management overhead. Option D is wrong because immediately moving all logs to S3 Glacier Deep Archive would incur retrieval costs and delays (hours) for the 30-day query period, violating the requirement for frequent queries during that time.
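
The age-based tiering described above can be expressed as a small function. This is only a sketch of the reasoning: the thresholds assume the "rarely accessed for one year" period starts after the first 30 days, which the question does not state precisely, and the function name is hypothetical.

```python
QUERY_DAYS = 30          # logs are queried during this window
RARE_ACCESS_DAYS = 365   # assumption: the rarely-accessed year follows the first 30 days


def storage_class_for_age(age_days: int) -> str:
    """Pick the storage class a lifecycle policy would target for a log of this age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < QUERY_DAYS:
        return "STANDARD"        # frequent queries
    if age_days < QUERY_DAYS + RARE_ACCESS_DAYS:
        return "STANDARD_IA"     # rare access, still retrieved quickly
    return "DEEP_ARCHIVE"        # compliance retention


for age in (0, 30, 394, 395):
    print(age, storage_class_for_age(age))
```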

612

MCQmedium

A marketing team runs a report-generation process that must execute once per day at 02:00 UTC. It usually completes in 10–15 minutes, but sometimes takes up to 45 minutes due to varying data volumes. They currently run the workload on an EC2 instance that is always on, which wastes money during off-hours. The team wants to minimize operational overhead and pay mainly for actual execution time. What is the best architecture choice?

A.Use a scheduled Amazon EC2 Auto Scaling group that keeps a minimum of one instance running at all times.

B.Use an EventBridge schedule to run the report as an Amazon ECS task on AWS Fargate and write results to S3.

C.Use AWS Lambda triggered by an EventBridge schedule at 02:00 UTC and write results to S3.

D.Use an EMR cluster provisioned daily with manual teardown to ensure the instance is always available before 02:00.

AnswerB

Fargate allows the containerized job to run only when scheduled, so the team pays for task runtime instead of keeping an EC2 instance always on.

Why this answer

Amazon ECS on AWS Fargate is the best choice because it eliminates the need to manage servers, scales automatically, and charges only for the vCPU and memory resources consumed during task execution. The EventBridge schedule triggers the Fargate task at 02:00 UTC, and the report is written to S3, which provides durable, cost-effective storage. This architecture minimizes operational overhead and cost by avoiding an always-on EC2 instance.

Exam trap

The trap here is that candidates may choose AWS Lambda without considering its 15-minute execution timeout, which cannot handle the 45-minute maximum runtime of this report-generation process.

How to eliminate wrong answers

Option A is wrong because an Auto Scaling group with a minimum of one instance still keeps an EC2 instance running 24/7, incurring costs during off-hours and not paying mainly for actual execution time. Option C is wrong because AWS Lambda has a maximum execution timeout of 15 minutes (900 seconds), which cannot accommodate the report-generation process that sometimes takes up to 45 minutes. Option D is wrong because provisioning an EMR cluster daily with manual teardown introduces significant operational overhead and does not minimize costs, as EMR clusters incur charges for underlying EC2 instances even when idle.
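
The elimination of option C comes down to one comparison, shown here as a tiny check. The function name is hypothetical; the 900-second limit is the Lambda maximum timeout cited above.

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # 15 minutes


def fits_in_lambda(max_runtime_minutes: float) -> bool:
    """Return True if the worst-case runtime fits within Lambda's maximum timeout."""
    return max_runtime_minutes * 60 <= LAMBDA_MAX_TIMEOUT_SECONDS


print(fits_in_lambda(15))  # typical run: 900 s, just within the limit
print(fits_in_lambda(45))  # worst case: 2700 s, exceeds the limit
```

Sizing against the worst case (45 minutes) rather than the typical run is what rules Lambda out and points to a Fargate task.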

613

MCQhard

Based on the exhibit, downstream payment timeouts cause EventBridge deliveries to back up and some events are retried until they age out. What change best improves resilience and preserves events during downstream outages?

A.Increase the Lambda timeout so each invocation can wait longer for the payment API.

B.Put an Amazon SQS queue between EventBridge and the consumer, and have workers drain the queue with a DLQ for poison messages.

C.Switch the target to a Lambda function with reserved concurrency of zero during outages.

D.Replace EventBridge with CloudWatch Logs subscriptions so the consumer can poll the log stream later.

AnswerB

SQS is the right durability and buffering layer for this requirement. EventBridge can publish orders.checkout events to a queue, and workers can consume them at a controlled rate even when the payment API is unavailable. This decouples event ingestion from downstream processing, absorbs bursts, and preserves events until the outage ends. A DLQ provides a safe landing zone for messages that continue to fail after retries so they are not silently dropped.

Why this answer

Option B is correct because introducing an SQS queue between EventBridge and the consumer decouples the event delivery from the downstream payment API. During outages, events are stored durably in SQS and can be processed later without being lost. A Dead Letter Queue (DLQ) captures events that fail repeatedly, preventing poison messages from blocking the queue and ensuring no events age out due to retry exhaustion.

Exam trap

The trap here is that candidates often assume increasing timeouts or concurrency adjustments can fix backpressure issues, but they fail to recognize that decoupling with a durable queue is the only way to preserve events during extended downstream outages without losing them to retry expiration.

How to eliminate wrong answers

Option A is wrong because increasing the Lambda timeout does not prevent events from backing up or aging out; it only allows a single invocation to wait longer, which can consume more concurrency and still end in a timeout if the downstream API is unavailable. Option C is wrong because setting reserved concurrency to zero throttles every invocation of the function, so no events are processed and they still depend on EventBridge retries, which can expire before the outage ends. Option D is wrong because replacing EventBridge with CloudWatch Logs subscriptions does not provide a durable, replayable buffer; log subscriptions are designed for real-time streaming and cannot natively pause or retry deliveries during outages, leading to event loss.
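
The buffering behavior of option B can be imitated with an in-memory queue. This is a simplified sketch, not SQS itself: the message bodies, `MAX_RECEIVE_COUNT`, and the idea that workers simply pause during the outage are assumptions, and a message here moves to the dead-letter list on its last failed receive, which only approximates how SQS applies maxReceiveCount.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # hypothetical redrive threshold


def handler(body):
    """Hypothetical consumer: rejects a message it can never parse."""
    if body == "malformed":
        raise ValueError("cannot parse event")


def drain_once(queue, dead_letters):
    """Make one pass over the queue; failed messages return or go to the DLQ."""
    processed = []
    for _ in range(len(queue)):
        message = queue.popleft()
        message["receives"] += 1
        try:
            handler(message["body"])
            processed.append(message["body"])
        except ValueError:
            if message["receives"] >= MAX_RECEIVE_COUNT:
                dead_letters.append(message["body"])  # isolate the poison message
            else:
                queue.append(message)                 # retry on a later pass
    return processed


queue = deque({"body": b, "receives": 0} for b in ["order-1", "malformed", "order-2"])
dead_letters = []

# During the outage the workers are paused, so the events simply wait in the queue.
backlog_during_outage = len(queue)

# After recovery the workers drain the backlog.
processed = []
for _ in range(MAX_RECEIVE_COUNT):
    processed += drain_once(queue, dead_letters)

print(backlog_during_outage, processed, dead_letters)
```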

614

MCQeasy

Your application uses ElastiCache Redis as a cache for user profiles stored in DynamoDB. You must ensure that when a profile is updated, subsequent reads see the latest value quickly. Which cache strategy is generally the best fit for this requirement?

A.Write to DynamoDB only, and never update or invalidate the Redis cache.

B.Use a cache-aside approach with TTL plus explicit invalidation after writes.

C.Cache only for reads, and do not fetch from DynamoDB when a key is missing.

D.Rely on eventual consistency of Redis replication to propagate updates to all nodes.

AnswerB

A cache-aside (lazy loading) pattern reads from cache first; if missing/expired, it fetches from the source of truth. After an update, explicitly invalidating or updating the cached entry ensures subsequent reads quickly reflect changes. TTL provides protection against missed invalidations while invalidation accelerates correctness after writes.

Why this answer

The cache-aside (lazy loading) pattern with TTL plus explicit invalidation ensures that after a write to DynamoDB, the stale Redis entry is removed, forcing the next read to fetch the updated profile from DynamoDB and repopulate the cache. This minimizes the window of inconsistency while keeping cache management simple and efficient for user profile workloads.

Exam trap

The trap here is that candidates may confuse cache-aside with write-through or write-behind patterns, or assume that Redis replication alone can solve cache consistency, when in fact explicit invalidation is required to ensure reads see the latest value after a write to the primary data store.

How to eliminate wrong answers

Option A is wrong because never updating or invalidating the cache means reads will serve stale data indefinitely, violating the requirement that subsequent reads see the latest value quickly. Option C is wrong because caching only for reads and not fetching from DynamoDB on a cache miss would result in cache misses returning no data, effectively breaking the application's ability to serve profiles. Option D is wrong because relying on eventual consistency of Redis replication does not address the core issue of cache staleness after a write; Redis replication propagates data between nodes but does not invalidate or update cached entries that were written before the DynamoDB update.
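
A minimal sketch of cache-aside with TTL and explicit invalidation follows. Plain dictionaries stand in for Redis and DynamoDB, and `ProfileCache`, its methods, and the injected clock are hypothetical names used only to make the example self-contained.

```python
class ProfileCache:
    """Cache-aside reads with a TTL, plus invalidation after each write."""

    def __init__(self, database, ttl_seconds, clock):
        self._db = database      # stand-in for the DynamoDB table
        self._ttl = ttl_seconds
        self._clock = clock      # injected so the example needs no sleeping
        self._cache = {}         # stand-in for Redis: user_id -> (value, expires_at)

    def get(self, user_id):
        entry = self._cache.get(user_id)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit
        value = self._db[user_id]                # miss or expired: read the source of truth
        self._cache[user_id] = (value, now + self._ttl)
        return value

    def update(self, user_id, value):
        self._db[user_id] = value                # write to the source of truth first
        self._cache.pop(user_id, None)           # then invalidate the cached entry


now = [0.0]
db = {"u1": "Ada"}
profiles = ProfileCache(db, ttl_seconds=300, clock=lambda: now[0])

before = profiles.get("u1")        # populates the cache
profiles.update("u1", "Ada L.")    # write + invalidate
after = profiles.get("u1")         # next read reloads the new value
print(before, after)
```

The TTL acts as a backstop: if a write bypasses `update` and the invalidation is missed, the stale entry still disappears once it expires.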

615

MCQmedium

A customer-managed KMS key (CMK) encrypts SQS messages. A consumer service uses an IAM role that includes kms:Decrypt permission for that CMK. After a security change, the consumer fails with: "AccessDeniedException: kms:Decrypt is not allowed". CloudTrail indicates the KMS request is reaching KMS, but the CMK key policy no longer includes the consumer role (or its principal). What is the best fix?

A.Update the CMK key policy to allow the consumer role principal to perform kms:Decrypt on the CMK.

B.Update only the consumer role identity policy because identity policies always override key policies.

C.Enable default encryption on SQS so that KMS permissions are no longer required.

D.Create an S3 bucket policy statement allowing kms:Decrypt because the messages are stored in S3.

AnswerA

For customer-managed keys, the CMK key policy is the authoritative authorization control for KMS operations. Even if the role’s identity policy allows kms:Decrypt, KMS will still deny the request unless the key policy also permits the principal (or a grant permits it).

Why this answer

Option A is correct because the error shows that KMS is rejecting the consumer role's kms:Decrypt request now that the key policy no longer allows that principal (an implicit deny, not an explicit Deny statement). Since the key policy is the primary access control for a CMK, adding the consumer role principal with kms:Decrypt permission resolves the issue. The CloudTrail log confirms the request reaches KMS, so the problem is solely the missing key policy statement.

Exam trap

The trap here is that candidates assume identity policies alone are sufficient for KMS access, but KMS requires explicit authorization in the key policy unless the key policy includes a statement delegating access to IAM policies.

How to eliminate wrong answers

Option B is wrong because identity policies (like IAM role policies) do not override key policies; the key policy must either allow the principal directly or delegate access to IAM policies, and if it does neither, the request fails regardless of the identity policy. Option C is wrong because changing the queue's encryption setting does not restore access to the customer-managed CMK used in this scenario; the consumer still needs kms:Decrypt on that key. Option D is wrong because SQS messages are not stored in S3; SQS is a separate service, and an S3 bucket policy has no effect on KMS permissions for SQS message decryption.
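
The statement that option A adds back to the key policy could look roughly like this. It is a sketch: the role ARN and the Sid are hypothetical, and a real key policy would contain other statements (such as key administration) that are not shown.

```python
import json


def allow_decrypt_statement(role_arn: str) -> dict:
    """Build a key policy statement letting one role call kms:Decrypt on this key."""
    return {
        "Sid": "AllowConsumerDecrypt",      # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",                    # in a key policy, "*" refers to this key
    }


# Hypothetical consumer role ARN.
statement = allow_decrypt_statement("arn:aws:iam::123456789012:role/sqs-consumer")
print(json.dumps(statement, indent=2))
```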

616

MCQhard

Based on the exhibit, which change will most improve the CloudFront cache hit ratio for the static assets while still serving the same files to all users?

A.Create a custom cache policy that includes only the v query string and excludes cookies.

B.Enable Origin Shield and keep the current cache behavior unchanged.

C.Move the static assets to individual presigned URLs for each viewer.

D.Increase the CloudFront default TTL to 24 hours while continuing to forward all cookies and query strings.

AnswerA

This removes unnecessary cache-key fragmentation. Since all users receive identical static files, forwarding user-specific cookies and irrelevant query strings destroys cache reuse. Keeping only the version parameter preserves correct object variation while allowing many more requests to hit the same cached object at the edge.

Why this answer

Option A is correct because static assets (e.g., images, CSS, JS) are served identically to all users, so including only the 'v' version query string in the cache key lets CloudFront cache a single object per version. By excluding cookies and other query strings, you prevent cache fragmentation caused by irrelevant variations, directly improving the cache hit ratio. This custom cache policy ensures that requests for the same 'v' value are served from the edge cache rather than forwarded to the origin.

Exam trap

The trap here is that candidates assume increasing TTL or enabling Origin Shield will fix a low cache hit ratio, when the real issue is cache key fragmentation caused by forwarding all cookies and query strings.

How to eliminate wrong answers

Option B is wrong because enabling Origin Shield reduces load on the origin and improves cache fill efficiency, but it does not address the root cause of a low cache hit ratio—forwarding all cookies and query strings still fragments the cache at the edge. Option C is wrong because moving static assets to individual presigned URLs for each viewer would make every request unique, destroying any possibility of caching and drastically reducing the cache hit ratio. Option D is wrong because increasing the default TTL to 24 hours while continuing to forward all cookies and query strings does not solve cache fragmentation; CloudFront still treats requests with different cookies or query strings as separate cache objects, so the cache hit ratio remains low.
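The fragmentation effect can be illustrated with a small simulation. This is a simplified model of a cache key, not CloudFront's actual implementation; the function name, request shapes, and the `utm_source` parameter are assumptions made for the example.

```python
from urllib.parse import parse_qsl


def cache_key(path, query, cookies, *, allowed_query=None, include_cookies=True):
    """Build a simplified cache key.

    allowed_query=None means every query parameter is part of the key.
    """
    params = parse_qsl(query, keep_blank_values=True)
    if allowed_query is not None:
        params = [(k, v) for k, v in params if k in allowed_query]
    cookie_part = tuple(sorted(cookies.items())) if include_cookies else ()
    return (path, tuple(sorted(params)), cookie_part)


# 100 viewers request the same file version, each with its own session cookie
# and a different tracking parameter (hypothetical traffic).
requests = [
    ("/static/app.css", f"v=42&utm_source=mail{i}", {"session": f"user-{i}"})
    for i in range(100)
]

# Current behavior: all cookies and query strings are in the cache key.
forward_all = {cache_key(p, q, c) for p, q, c in requests}

# Option A: only the v query string is in the cache key, cookies excluded.
version_only = {
    cache_key(p, q, c, allowed_query={"v"}, include_cookies=False)
    for p, q, c in requests
}

print(len(forward_all))   # 100 distinct cache entries
print(len(version_only))  # 1 shared cache entry
```

In this model, every request after the first is a cache hit under the version-only policy, while forwarding everything produces a miss for each viewer.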

617

MCQeasy

You need to run batch jobs on EC2. The jobs can tolerate interruptions: if an instance is terminated, the job can restart from checkpoints. To reduce compute cost as much as possible, what is the best choice?

A.EC2 On-Demand Instances to avoid interruptions

B.EC2 Spot Instances with checkpoint-based interruption handling

C.Savings Plans to guarantee capacity for the entire year

D.Reserved Instances with no interruption handling

AnswerB

Spot Instances are priced lower because AWS can reclaim capacity. When your workload can be interrupted and later restarted from checkpoints, the interruption model is compatible with Spot, making it the most cost-optimized option among the choices.

Why this answer

Spot Instances offer significant cost savings (up to 90% compared to On-Demand) but can be reclaimed by AWS with a two-minute warning. Since the batch jobs can tolerate interruptions and restart from checkpoints, Spot Instances are the most cost-effective choice. This aligns with the requirement to reduce compute cost as much as possible while handling interruptions gracefully.

Exam trap

The trap here is that candidates often choose On-Demand or Reserved Instances because they fear interruptions, but the question explicitly states the jobs can tolerate interruptions, so the most cost-effective option is Spot Instances, not a more expensive but stable alternative.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances are fully priced and do not provide cost savings; the question explicitly asks to reduce cost as much as possible, and the jobs can tolerate interruptions, so paying full price is unnecessary. Option C is wrong because Savings Plans provide a discount in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) over a 1- or 3-year term, but they do not inherently reduce cost for interruptible batch workloads as much as Spot Instances, and they still require paying for the committed usage even if the job is not running. Option D is wrong because Reserved Instances require a 1- or 3-year commitment and provide a discount over On-Demand, but they are still more expensive than Spot Instances and do not leverage the interruptible nature of the workload; additionally, they lock in capacity that may not be needed continuously.
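Checkpoint-based interruption handling can be sketched as follows. This is an illustrative simulation only: the checkpoint file format, the `run_batch` function, and the `interrupt_after` parameter are hypothetical, and a real job would keep its checkpoint on durable storage outside the instance (for example Amazon S3) so that it survives the reclaim.

```python
import json
import os
import tempfile


def run_batch(items, checkpoint_path, interrupt_after=None):
    """Process items in order, saving the next index after each item.

    interrupt_after simulates a Spot reclaim after that many items in this run.
    Returns the items processed during this run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    done = []
    for i in range(start, len(items)):
        if interrupt_after is not None and len(done) >= interrupt_after:
            break  # simulated interruption: saved progress is kept
        done.append(items[i])  # stand-in for the real unit of work
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return done


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    work = list(range(10))
    first_run = run_batch(work, ckpt, interrupt_after=4)  # instance reclaimed
    second_run = run_batch(work, ckpt)                    # replacement instance

print(first_run)   # [0, 1, 2, 3]
print(second_run)  # [4, 5, 6, 7, 8, 9]
```

The replacement run picks up at item 4 instead of starting over, which is what makes the lower Spot price usable for this workload.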

618

MCQmedium

Account Y provides a role named AnalyticsReadOnly to engineers in Account X. The role trust policy currently allows sts:AssumeRole from the Account X principal. A new security requirement states that only STS sessions created with MFA are allowed to assume the role. Which trust policy condition is the best choice to enforce MFA for sts:AssumeRole?

A.Add a condition "Bool": { "aws:MultiFactorAuthPresent": "true" } in the role trust policy for the sts:AssumeRole action.

B.Add a condition "StringEquals": { "aws:username": "mfa-user" } in the IAM policy attached to the role.

C.Add a condition requiring "sts:ExternalId" to equal a fixed value in the trust policy.

D.Add a condition "Bool": { "aws:SecureTransport": "true" } in the trust policy to require HTTPS.

AnswerA

aws:MultiFactorAuthPresent is a condition key designed to reflect whether MFA was used when establishing the STS session. By requiring it to be true in the trust policy, STS denies AssumeRole when the caller did not authenticate with MFA.

Why this answer

Option A is correct because the `aws:MultiFactorAuthPresent` condition key checks whether the principal used MFA to obtain the session credentials. By adding a `Bool` condition set to `"true"` in the trust policy for the `sts:AssumeRole` action, only STS sessions that were created after MFA authentication will be allowed to assume the role. This directly enforces the security requirement without affecting other authentication methods.

Exam trap

The trap here is that candidates may confuse `aws:MultiFactorAuthPresent` with other condition keys like `aws:SecureTransport` or `aws:username`, or think that `sts:ExternalId` can enforce MFA, when in fact only the `Bool` condition on `aws:MultiFactorAuthPresent` directly checks MFA status.

How to eliminate wrong answers

Option B is wrong because `aws:username` refers to the IAM user name, not the MFA status; requiring a specific username does not enforce MFA and can be bypassed if that user does not use MFA. Option C is wrong because `sts:ExternalId` is used to prevent the confused deputy problem in cross-account access, not to enforce MFA. Option D is wrong because `aws:SecureTransport` only ensures the session uses HTTPS/TLS, which is already required for AWS API calls, and does not verify MFA usage.
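For illustration, a trust policy with the condition from option A might look like the sketch below, written as a Python dictionary. The account ID is a hypothetical placeholder for Account X.

```python
import json

ACCOUNT_X_ID = "111122223333"  # hypothetical account ID for Account X

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_X_ID}:root"},
            "Action": "sts:AssumeRole",
            # Only callers whose credentials were obtained with MFA match.
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Because the condition sits in the trust policy of AnalyticsReadOnly, Account Y enforces it regardless of what the identity policies in Account X allow.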

619

MCQmedium

A trading analytics system deploys 10 EC2 instances that exchange very frequent, low-latency messages over the network. The instances must be placed as close together as possible to minimize network hop count and inter-node jitter. Which deployment choice best matches this requirement?

A.Use a spread placement group to distribute instances across multiple underlying hardware to improve overall availability.

B.Use a cluster placement group so the instances are placed close together to reduce latency and jitter.

C.Use no placement group and rely on the Auto Scaling group to balance instance placement automatically.

D.Use a partition placement group so each instance is assigned to separate failure domains for low variance.

AnswerB

Cluster placement groups place instances close together (within a single Availability Zone when supported) to reduce network hop count and improve inter-instance network performance. This directly targets low-latency, jitter-sensitive communication between many nodes.

Why this answer

A cluster placement group is designed for low-latency, high-throughput scenarios: it packs instances close together inside a single Availability Zone, minimizing network hop count and inter-node jitter. This directly meets the requirement for very frequent, low-latency messaging between 10 EC2 instances.

Exam trap

The trap here is that candidates often confuse 'low latency' with 'high availability' and choose a spread placement group (Option A) thinking it reduces jitter, when in fact it increases network distance and latency by distributing instances across hardware.

How to eliminate wrong answers

Option A is wrong because a spread placement group distributes instances across distinct underlying hardware to maximize availability, which increases network distance and latency, opposite to the requirement. Option C is wrong because relying on an Auto Scaling group without a placement group does not guarantee close physical proximity; instances may be placed across different racks or AZs, increasing jitter. Option D is wrong because a partition placement group isolates instances into separate failure domains (partitions) to reduce correlated failures, but this increases network hops between partitions, not minimizing latency.

620

MCQmedium

An analytics dashboard uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered? The design must avoid adding custom operational scripts.

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.AWS Backup cross-Region copy

D.CloudFront only with long TTLs

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator uses the AWS global network to route traffic from edge locations to the optimal regional endpoint, reducing latency and jitter for global users. It does not cache content, making it ideal for dynamic responses that cannot be cached. The service requires no custom scripts, as it integrates directly with the Application Load Balancer via a static IP address or DNS name.

Exam trap

The trap here is that candidates often confuse Global Accelerator with CloudFront, assuming both are for caching, but Global Accelerator does not cache content and is specifically designed for non-cacheable, dynamic traffic requiring low latency and fast failover.

How to eliminate wrong answers

Option B (S3 Cross-Region Replication) is wrong because it replicates objects across S3 buckets in different regions, but it does not reduce network latency for dynamic application traffic; it is designed for data redundancy and disaster recovery, not for real-time request routing. Option C (AWS Backup cross-Region copy) is wrong because it copies backup data across regions for compliance or disaster recovery, and it has no impact on live application latency or traffic routing. Option D (CloudFront only with long TTLs) is wrong because CloudFront caches content at edge locations, which violates the requirement to avoid caching dynamic responses; long TTLs would serve stale data, and disabling caching would negate the latency benefit, while custom scripts would be needed to bypass caching for dynamic content.

621

MCQhard

Based on the exhibit, what is the best change to improve read performance without increasing write latency on the primary database?

A.Create an RDS read replica and direct the reporting queries to the replica endpoint.

B.Convert the DB instance to Multi-AZ so the primary can serve more reads.

C.Increase the primary instance class to a larger size and keep all traffic on one writer.

D.Migrate the reporting workload to DynamoDB to gain faster reads.

AnswerA

A read replica offloads the long-running read-only reports from the primary database, which preserves write performance and reduces read latency for the reporting workload. Because the business accepts slightly stale report data, the asynchronous replication delay is acceptable. This is the most direct and AWS-native way to separate read pressure from writes.

Why this answer

Creating an RDS read replica offloads read-heavy reporting queries from the primary database instance, improving read performance without increasing write latency on the primary. The replica operates asynchronously, so writes on the primary are not blocked or delayed by the reporting workload.

Exam trap

The trap here is confusing Multi-AZ (which only provides failover redundancy) with read replicas (which provide read scaling), leading candidates to incorrectly select Multi-AZ as a performance solution.

How to eliminate wrong answers

Option B is wrong because Multi-AZ provides high availability and automatic failover, not additional read capacity; the standby instance cannot serve reads. Option C is wrong because scaling up the instance class increases both read and write capacity but does not isolate reporting traffic, so write latency could still be impacted by heavy reads. Option D is wrong because migrating to DynamoDB is a full architectural change that does not address the existing RDS read performance issue without increasing write latency; it also introduces data synchronization complexity.

622

MCQhard

A risk simulation workload in private subnets downloads large amounts of data from S3 through a NAT gateway. NAT data processing charges are high. What should the architect use to reduce cost? The architecture review board prefers a managed AWS-native control.

A.A larger NAT gateway

B.Gateway VPC endpoint for Amazon S3

C.S3 Object Lambda

D.AWS Shield Advanced

AnswerB

A gateway endpoint routes S3 traffic privately without NAT gateway data processing charges.

Why this answer

A Gateway VPC Endpoint for Amazon S3 allows instances in private subnets to access S3 directly over the AWS network without traversing a NAT gateway, eliminating NAT data processing charges. This is a managed AWS-native control that meets the architecture review board's preference. It works by adding a route for the S3 prefix list to the subnet route tables, so no internet gateway or NAT device is involved. Gateway endpoints do not use AWS PrivateLink; interface endpoints do.

Exam trap

The trap here is that candidates often confuse Gateway VPC Endpoints with Interface Endpoints, thinking both reduce costs equally, but Gateway Endpoints are free to use and specifically designed for S3 and DynamoDB, while Interface Endpoints incur hourly charges and per-GB data processing fees.

How to eliminate wrong answers

Option A is wrong because NAT gateways are not offered in sizes (they scale bandwidth automatically), and any NAT gateway still incurs per-GB data processing charges for all traffic through it, so the cost would not go down. Option C is wrong because S3 Object Lambda is used to transform data on the fly during retrieval from S3, not to reduce data transfer costs from S3 to a VPC; it does not address NAT gateway charges. Option D is wrong because AWS Shield Advanced is a DDoS protection service that does not reduce data transfer costs or NAT gateway charges; it adds cost for enhanced security.
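A rough back-of-the-envelope comparison shows why the gateway endpoint wins on cost. The per-GB price and monthly volume below are assumptions for illustration only; NAT gateway pricing varies by Region and should be checked against the current AWS pricing page.

```python
def nat_processing_cost(gb_transferred: float, price_per_gb: float) -> float:
    """Data processing charge for traffic that passes through a NAT gateway."""
    return gb_transferred * price_per_gb


# Assumption: illustrative per-GB NAT data processing price, not a quoted rate.
ASSUMED_NAT_PRICE_PER_GB = 0.045
# Assumption: hypothetical monthly S3 download volume for the simulation fleet.
MONTHLY_S3_GB = 50_000

via_nat = nat_processing_cost(MONTHLY_S3_GB, ASSUMED_NAT_PRICE_PER_GB)
# Gateway endpoints for S3 carry no hourly or per-GB endpoint charge.
via_gateway_endpoint = 0.0
monthly_savings = via_nat - via_gateway_endpoint

print(f"NAT data processing: ${via_nat:,.2f} per month")
print(f"Gateway endpoint:    ${via_gateway_endpoint:,.2f} per month")
print(f"Estimated savings:   ${monthly_savings:,.2f} per month")
```

The NAT gateway itself may still be needed for other outbound traffic; only the S3-bound bytes move to the endpoint.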

623

MCQmedium

A company runs a stateful analytics workload on EC2 instances that use EBS volumes. The data must be restorable in another Region after a major outage, with frequent point-in-time recovery. Which approach provides the most suitable replication mechanism for the EBS-backed data?

A.Create scheduled EBS snapshots and copy them to another Region, then restore the volumes from those snapshots during recovery.

B.Enable EBS multi-attach to spread the workload across AZs and replicate snapshots automatically between Regions.

C.Use RDS read replicas in another Region and keep the analytics dataset in an RDS instance only.

D.Rely on instance store for durability and copy only AMIs across Regions.

AnswerA

Snapshotting and cross-Region copying gives point-in-time images of EBS volumes that can be restored in the target Region.

Why this answer

Scheduled EBS snapshots provide point-in-time backups of EBS volumes, which can be copied to another Region using the cross-Region snapshot copy feature. During recovery, you restore volumes from those snapshots in the target Region, ensuring the data is restorable after a major outage. This approach meets the requirements for frequent point-in-time recovery and cross-Region durability.

Exam trap

The trap here is that candidates may confuse EBS multi-attach (which is for high availability within a single AZ) with cross-Region replication, or mistakenly think instance store provides durability for long-term data recovery.

How to eliminate wrong answers

Option B is wrong because EBS multi-attach allows a single EBS volume to be attached to multiple EC2 instances within the same Availability Zone, but it does not replicate snapshots automatically between Regions or provide cross-Region disaster recovery. Option C is wrong because RDS read replicas are for relational databases, not for analytics workloads running on EC2 with EBS volumes; moving the dataset to RDS would require a database migration and does not replicate EBS-backed data. Option D is wrong because instance store volumes are ephemeral and do not persist data across instance stops or terminations, making them unsuitable for durable data that needs point-in-time recovery; copying only AMIs across Regions captures the image as it was when the AMI was created, not the ongoing changes to the analytics data.

624

MCQmedium

A company hosts a web application on EC2 instances behind an Application Load Balancer (ALB) in us-east-1. A static failover site is hosted in an S3 bucket with static website hosting enabled. The company needs automatic DNS failover to the S3 bucket if the primary ALB becomes unhealthy. Which Route 53 configuration achieves this?

A.Configure Route 53 Failover routing with a health check on the ALB as PRIMARY and the S3 bucket website endpoint as SECONDARY

B.Configure Route 53 Weighted routing with 100% weight on the ALB and 0% on the S3 bucket

C.Configure Route 53 Latency routing with records in both regions to route to the healthiest endpoint

D.Configure Route 53 Geolocation routing with North American users directed to the ALB and all others to S3

AnswerA

Failover routing with a health-checked PRIMARY (ALB) and SECONDARY (S3) provides automatic DNS switchover. When the ALB health check fails, Route 53 returns the S3 endpoint automatically.

Why this answer

Route 53 Failover routing uses health checks to route traffic to a primary resource and automatically switch to a secondary when the primary health check fails.

Configuration: Create a Route 53 health check targeting the ALB endpoint. Create a PRIMARY alias A record pointing to the ALB with the health check associated. Create a SECONDARY alias A record pointing to the S3 static website endpoint. When the ALB health check fails, Route 53 returns the S3 endpoint automatically.

Exam trap

Route 53 offers multiple routing policies. Failover routing is active-passive — one primary resource, one standby. Weighted routing splits traffic percentages (active-active).

Latency routing picks the lowest-latency endpoint. Geolocation routes by user geography. Only Failover routing provides automatic primary/secondary switchover based on health checks.

Weighted routing at 100%/0% with no health check attached does NOT fail over when the 100% target fails.

Why the other options are wrong

Weighted routing at 100%/0%, as described in the option, does not fail over. No health check is attached, so when the 100% target (ALB) is unhealthy, Route 53 keeps returning it and does not redirect to the 0% target (S3). Weighted routing is meant for splitting traffic in proportion to record weights, not for defining a primary/secondary pair.

Latency routing routes to the lowest-latency endpoint for each client. It does not implement primary/secondary logic. If us-east-1 is unhealthy, some clients may still be routed there unless combined with health checks (but even then this is not a defined primary/secondary failover).

Geolocation routing directs traffic by user geography — North American users always go to the ALB even when it fails. S3 only receives other-region traffic. This is not a failover configuration.
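The failover pair can be sketched as a Route 53 change batch, shown here as plain Python data. The record name, ALB DNS name, alias hosted zone IDs, and health check ID are hypothetical placeholders; the real alias zone IDs depend on the Region and service and must be looked up.

```python
import json


def failover_change(name, role, alias_dns_name, alias_zone_id,
                    evaluate_target_health, health_check_id=None):
    """Build one UPSERT change for an alias A record with failover routing."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-record",
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alias_zone_id,
            "DNSName": alias_dns_name,
            "EvaluateTargetHealth": evaluate_target_health,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


RECORD_NAME = "app.example.com."  # hypothetical; must match the S3 bucket name

change_batch = {
    "Changes": [
        failover_change(
            RECORD_NAME, "PRIMARY",
            "my-alb-1234567890.us-east-1.elb.amazonaws.com.",  # hypothetical
            "ALB_ALIAS_ZONE_ID_PLACEHOLDER", True,
            health_check_id="HEALTH_CHECK_ID_PLACEHOLDER",
        ),
        failover_change(
            RECORD_NAME, "SECONDARY",
            "s3-website-us-east-1.amazonaws.com.",
            "S3_WEBSITE_ALIAS_ZONE_ID_PLACEHOLDER", False,
        ),
    ]
}

print(json.dumps(change_batch, indent=2))
```

A structure of this shape is what the Route 53 `ChangeResourceRecordSets` API accepts as its change batch; both records share one name and differ only in their failover role and alias target.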

625

MCQhard

A DynamoDB table for a travel booking site has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The architecture review board prefers a managed AWS-native control.

A.Create a global secondary index with the same date key

B.Move the table to S3 Glacier Instant Retrieval

C.Reduce the table's write capacity

D.Use a higher-cardinality partition key that distributes writes across partitions

AnswerD

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Option D is correct because using a low-cardinality partition key like the current date causes all writes to land on a single partition, leading to throttling. By choosing a higher-cardinality partition key (e.g., combining date with a user ID or booking ID), writes are distributed evenly across multiple partitions, leveraging DynamoDB's internal partitioning to handle the throughput. This is a managed, AWS-native design change that resolves hot partition issues without additional services.

Exam trap

The trap here is that candidates often confuse a GSI as a solution for write performance, when in fact GSIs only help with read query patterns and do not alleviate write hot spots on the base table.

How to eliminate wrong answers

Option A is wrong because creating a global secondary index (GSI) with the same date key does not solve the write throttling; GSIs have their own write capacity and inherit the same hot partition problem from the base table's partition key. Option B is wrong because moving the table to S3 Glacier Instant Retrieval is not a managed AWS-native control for DynamoDB write throttling; S3 is a different storage service and cannot replace DynamoDB's real-time transactional write capabilities. Option C is wrong because reducing the table's write capacity would worsen throttling during business hours, as it lowers the maximum allowed writes per second, directly contradicting the need to handle high write demand.
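The effect of key cardinality can be shown with a toy model. DynamoDB's internal hash function and partition count are not public, so the SHA-256 hash, the eight partitions, and the `date#bookingId` key format below are assumptions chosen only to illustrate the distribution.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count for the illustration


def partition_for(key: str) -> int:
    """Map a partition key value to a partition using a stand-in hash."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


bookings = [f"booking-{i}" for i in range(10_000)]
date = "2024-06-01"

# Current design: every write on a given day uses the same key value.
date_only = Counter(partition_for(date) for _ in bookings)

# Option D: a composite, higher-cardinality key spreads the same writes.
composite = Counter(partition_for(f"{date}#{b}") for b in bookings)

print("date-only partitions used:", len(date_only))
print("composite partitions used:", len(composite))
print("busiest composite partition:", max(composite.values()), "writes")
```

With the date-only key all 10,000 writes land on one partition, while the composite key spreads them across every partition. The trade-off is that queries for a whole day must then go through an index or fan out across key values.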

626

MCQmedium

A company hosts an internal HTTP API on an internal Network Load Balancer (NLB) in VPC A. A partner team in a separate AWS account needs access, but their VPC CIDR overlaps with VPC A, so VPC peering is not feasible. Security requirements state the API must remain non-public (no internet-facing ALB/NLB) and access must use AWS private networking. Which architecture best meets these requirements?

A.Use AWS PrivateLink by creating a VPC endpoint service backed by the NLB in VPC A, then create an interface VPC endpoint in the partner VPC with appropriate endpoint access controls.

B.Expose the NLB to the internet with an Elastic IP and restrict access using the NLB’s security group only.

C.Use VPC peering between VPC A and the partner VPC and update route tables to resolve the overlap.

D.Deploy a NAT gateway in VPC A and route the partner’s traffic to the NLB through the NAT gateway.

AnswerA

PrivateLink exposes the service privately via interface endpoints, avoiding peering and keeping the NLB non-public for secure partner access.

Why this answer

Option A is correct because AWS PrivateLink allows you to expose an internal NLB in VPC A as a VPC endpoint service, and the partner team can create an interface VPC endpoint in their own VPC to connect privately. This solution avoids overlapping CIDR issues because traffic flows through PrivateLink’s network interfaces using private IPs, not through VPC peering or internet routing. It also satisfies the non-public requirement since the API remains accessible only via private networking within AWS.

Exam trap

The trap here is that candidates may assume VPC peering can handle overlapping CIDRs with route table adjustments, but AWS explicitly prohibits overlapping CIDRs in VPC peering connections, making PrivateLink the only viable private networking option.

How to eliminate wrong answers

Option B is wrong because exposing the NLB to the internet with an Elastic IP makes the API publicly accessible, violating the security requirement that the API must remain non-public. Option C is wrong because VPC peering cannot resolve overlapping CIDR blocks; overlapping CIDRs cause routing conflicts and are explicitly unsupported for VPC peering. Option D is wrong because a NAT gateway is used for outbound internet traffic from a private subnet, not for inbound traffic from another VPC, and it would not provide private connectivity between VPCs with overlapping CIDRs.

627

Multi-Selecthard

A low-latency market-data engine runs 10 EC2 instances that exchange small messages thousands of times per second. The team wants the lowest possible network latency and jitter, and they can tolerate single-AZ placement for this tier because another layer handles disaster recovery. Which changes should they make? Select three.

Select 3 answers

A.Use a cluster placement group for the instances.

B.Use Nitro-based instances with enhanced networking support.

C.Launch all latency-sensitive nodes in one Availability Zone to fit the cluster placement group constraint.

D.Use a spread placement group to maximize low-latency communication across the fleet.

E.Distribute the instances across multiple Availability Zones to reduce intra-cluster latency.

AnswersA, B, C

Cluster placement groups place instances physically close together within an Availability Zone to minimize latency and maximize network throughput. This is the standard AWS design for tightly coupled workloads that depend on frequent east-west communication.

Why this answer

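A cluster placement group is designed for low-latency, high-throughput scenarios by ensuring instances are in close proximity within a single Availability Zone, which minimizes network latency and jitter. This directly meets the requirement for the lowest possible network latency and jitter for the market-data engine. Nitro-based instances with enhanced networking (option B) add higher packet-per-second performance and lower, more consistent inter-instance latency, and launching every node in one Availability Zone (option C) is required because a cluster placement group cannot span Availability Zones.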

Exam trap

The trap here is that candidates may confuse spread placement groups (designed for fault tolerance) with cluster placement groups (designed for low latency), or incorrectly think distributing across AZs reduces latency when it actually increases it.

628

Multi-Selectmedium

A company runs an order system on EC2 with a self-managed PostgreSQL database, a self-managed RabbitMQ broker, and a shared file server for attachments. The team wants to reduce patching, backups, and cluster administration while keeping the architecture simple and using managed services where possible. Which three changes should they make? Select three.

Select 3 answers

A.Replace the database with Amazon RDS for PostgreSQL.

B.Replace the broker with Amazon MQ for RabbitMQ.

C.Store attachments in Amazon S3 instead of the shared file server.

D.Keep the database on EC2 and add more EBS volumes.

E.Move RabbitMQ to Dedicated Hosts for better isolation.

AnswersA, B, C

Amazon RDS handles routine database operations such as backups, patching, and maintenance windows, which reduces administrative overhead. It is the managed-service replacement for a self-managed PostgreSQL database.

Why this answer

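Amazon RDS for PostgreSQL is a managed database service that automates patching, backups, and replication, eliminating the need for self-managing PostgreSQL on EC2. This directly reduces the operational overhead of cluster administration and aligns with the goal of using managed services. Amazon MQ for RabbitMQ (option B) does the same for the broker by handling provisioning, patching, and maintenance, and Amazon S3 (option C) replaces the shared file server with durable object storage that has no servers to manage.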

Exam trap

The trap here is that candidates may think adding more EBS volumes or using Dedicated Hosts reduces administrative overhead, but these options actually increase complexity or cost without moving to a managed service, which is the core requirement of the question.

629

MCQhard

Based on the exhibit, the platform team wants developers to create application roles for Lambda and ECS, but no developer-created role may ever exceed the approved permission set. Which change best meets this requirement?

A.Remove all IAM permissions from AppProvisioner and require a central security team to create every role manually.

B.Attach a permissions boundary strategy to the delegated workflow and require every created role to include that boundary using the iam:PermissionsBoundary condition.

C.Allow developers to keep creating roles, but add a CloudTrail rule that alerts security after a privileged policy is attached.

D.Move the delegated IAM workflow into a separate VPC and restrict it with security groups and network ACLs.

AnswerB

A permissions boundary creates an upper limit on what any developer-created role can ever do, even if someone later attaches broader policies. Requiring the boundary during role creation prevents privilege escalation while still allowing delegated self-service for approved application roles. This is the standard AWS pattern when teams need to create roles but must remain inside a strict security envelope.

Why this answer

Option B is correct because it uses an IAM permissions boundary attached to the delegated role creation workflow, combined with the `iam:PermissionsBoundary` condition key to enforce that every developer-created role must include that boundary. This ensures no role can exceed the approved permission set, as the boundary acts as a maximum limit on permissions, even if the role's policy grants more. The delegated workflow (e.g., AWS Service Catalog or IAM Role creation via Lambda) can create roles, but the boundary prevents any escalation beyond the predefined scope.

Exam trap

The trap here is that candidates confuse reactive monitoring (like CloudTrail alerts) with preventive controls, or mistakenly think network isolation (VPC/security groups) can restrict IAM permissions, when only IAM boundaries or service control policies (SCPs) can cap permissions at the identity level.

How to eliminate wrong answers

Option A is wrong because removing all IAM permissions from AppProvisioner and requiring manual role creation by a central security team eliminates the delegation entirely, which contradicts the requirement that developers create application roles for Lambda and ECS; it also introduces operational bottlenecks and does not leverage IAM boundaries. Option C is wrong because adding a CloudTrail rule to alert after a privileged policy is attached is reactive, not preventive; it does not stop a developer-created role from exceeding the approved permission set at creation time, violating the 'may never exceed' requirement. Option D is wrong because moving the delegated IAM workflow into a separate VPC with security groups and network ACLs addresses network-level access control, not IAM permission boundaries; it cannot restrict the permissions of IAM roles created by developers, as IAM policies are not governed by network constructs.
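A minimal sketch of the delegation policy behind option B is shown below as a Python dictionary. The account ID, boundary policy name, role name prefix, and Sids are hypothetical; the exhibit's actual AppProvisioner policy is not reproduced in this section.

```python
import json

ACCOUNT_ID = "111122223333"  # hypothetical account ID
# Hypothetical managed policy that defines the approved permission set.
BOUNDARY_ARN = f"arn:aws:iam::{ACCOUNT_ID}:policy/ApprovedAppBoundary"

delegation_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageRolesOnlyWithBoundary",
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
            ],
            # Hypothetical naming convention for developer-created roles.
            "Resource": f"arn:aws:iam::{ACCOUNT_ID}:role/app-*",
            # The request is allowed only if this boundary is on the role.
            "Condition": {
                "StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}
            },
        },
        {
            "Sid": "DenyBoundaryRemoval",
            "Effect": "Deny",
            "Action": "iam:DeleteRolePermissionsBoundary",
            "Resource": "*",
        },
    ],
}

print(json.dumps(delegation_policy, indent=2))
```

The explicit deny on removing the boundary is an addition assumed here to close the obvious escape route; a production policy would likely also restrict changes to the boundary policy itself.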

630

MCQhard

A Lambda-based travel booking site has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The team wants the control to be enforceable during normal operations.

A.Provisioned concurrency during campaign windows

B.A larger deployment package

C.CloudTrail data events

D.Reserved concurrency only

AnswerA

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, eliminating cold starts for those instances. During campaign windows, this ensures consistent latency by keeping functions warm and ready to handle spikes immediately. Because provisioned concurrency is configured on a function version or alias and can be scheduled with Application Auto Scaling, the team can apply it only during expected high-traffic periods and leave normal operations unaffected.

Exam trap

The trap here is confusing reserved concurrency (which only limits scaling) with provisioned concurrency (which pre-warms instances); candidates often pick reserved concurrency thinking it prevents cold starts, but it does not address initialization latency.

How to eliminate wrong answers

Option B is wrong because a larger deployment package increases the time needed to download and initialize the function code, which worsens cold starts rather than solving them. Option C is wrong because CloudTrail data events record API activity for auditing and do not affect Lambda execution latency or concurrency. Option D is wrong because reserved concurrency only sets aside and caps the number of concurrent executions for a function; it does not pre-initialize execution environments, so cold starts still occur.

631

MCQeasy

Your team hosts versioned static assets (for example, /static/app-<buildHash>.js). Each build hash never changes, but you release new files on new URLs. To maximize cache hit rate and reduce origin load using CloudFront, what should you do when generating HTTP responses for these assets?

A.Set Cache-Control: no-cache so CloudFront always revalidates with the origin

B.Set Cache-Control: public, max-age=31536000, immutable for the versioned assets

C.Set Cache-Control: max-age=0 and rely on CloudFront to cache by default

D.Disable CloudFront caching and forward all headers and query strings to the origin

AnswerB

For content-addressed/versioned URLs, a long max-age lets CloudFront treat the object as fresh for a long period. Adding the immutable directive tells clients not to revalidate while the max-age is still valid, supporting high cache hit rates and fewer origin fetches for repeat requests.

Why this answer

Option B is correct because setting `Cache-Control: public, max-age=31536000, immutable` tells CloudFront and browsers to cache the versioned asset for one year (31536000 seconds) and never revalidate, since the URL changes with each new build. The `immutable` directive (RFC 8246) signals that the content will never change on that URL, eliminating conditional revalidation requests and maximizing cache hits, which reduces origin load.
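The header choice above can be sketched as a small helper that an origin might call when building responses. The file-naming pattern (a hyphen followed by at least eight hex characters before the extension) and the `no-cache` fallback for unversioned paths are assumptions made for illustration, not part of the question.

```python
import re

# Hypothetical naming convention: app-<buildHash>.js, where the build hash
# is at least eight lowercase hex characters.
VERSIONED_ASSET = re.compile(r"-[0-9a-f]{8,}\.(js|css|png|svg|woff2)$")


def cache_control_for(path: str) -> str:
    """Return a Cache-Control value for the given URL path."""
    if VERSIONED_ASSET.search(path):
        # The URL changes on every build, so the object can be cached
        # for a year and never revalidated.
        return "public, max-age=31536000, immutable"
    # Assumed fallback for unversioned paths such as index.html:
    # cache, but revalidate with the origin before reuse.
    return "no-cache"


versioned = cache_control_for("/static/app-3f9a1c2b.js")
unversioned = cache_control_for("/index.html")
```

Keeping the long-lived header restricted to hashed file names matters because an unversioned URL cached for a year could not be updated without an invalidation.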

Exam trap

The trap here is that candidates confuse 'no-cache' (which still allows caching but forces revalidation) with 'no-store' (which forbids caching entirely), or they assume that `max-age=0` is acceptable for versioned assets, not realizing it forces revalidation and reduces cache efficiency.

How to eliminate wrong answers

Option A is wrong because `Cache-Control: no-cache` forces CloudFront to revalidate every request with the origin, which increases origin load and defeats the purpose of caching immutable assets. Option C is wrong because `max-age=0` tells CloudFront and browsers to treat the response as stale immediately, requiring revalidation on every request, which reduces cache hit rate and increases origin load. Option D is wrong because disabling CloudFront caching and forwarding all headers/query strings bypasses the CDN entirely, causing every request to go to the origin, which maximizes origin load and eliminates caching benefits.

632

Multi-Selectmedium

A single EC2 instance hosts a low-latency database cache that writes a large random working set to block storage. The application needs sustained high IOPS and low latency, and the storage must remain attached to the instance while it runs. Which two design choices best meet the requirement? Select two.

Select 2 answers

A.Use an io2 Block Express EBS volume for the highest sustained IOPS and low-latency performance.

B.Stripe multiple EBS volumes together with RAID 0 to increase aggregate IOPS and throughput.

C.Use an S3 bucket as the backing store because object storage scales automatically.

D.Choose a cold HDD-based volume so the cache has durable low-cost storage.

E.Use the root volume from a T-series instance because burst credits can absorb the write spikes.

AnswersA, B

io2 Block Express is designed for demanding block-storage workloads that need very high, consistent IOPS with low latency. It is a strong fit when the data must remain on attached EBS storage rather than on ephemeral instance store.

Why this answer

Option A is correct because io2 Block Express EBS volumes are designed for mission-critical workloads requiring sustained high IOPS and low latency. They offer up to 256,000 IOPS per volume with sub-millisecond latency, making them ideal for a low-latency database cache that writes a large random working set to block storage. The storage remains attached to the EC2 instance while it runs, meeting the requirement for persistent block-level storage.

Exam trap

The trap here is that candidates may confuse burst credits (which apply to CPU performance on T-series instances) with storage performance, or assume that object storage like S3 can serve as a low-latency block device, when in fact only EBS volumes provide the required persistent, low-latency block storage attached to an EC2 instance.

633

Multi-Selecthard

A SaaS vendor has a steady 24/7 control plane on ECS and several small event-driven tasks that currently run on a separate always-on service. Management wants the billing discount that applies across both ECS and Lambda usage without committing to a specific instance family. Which two actions are best? Select two.

Select 2 answers

A.Buy a Compute Savings Plan for the predictable baseline usage.

B.Move the event-driven tasks to AWS Lambda instead of keeping a separate always-on service.

C.Buy an EC2 Instance Savings Plan tied to one instance family for all workloads.

D.Use Spot Instances for the control plane because it is the largest bill.

E.Increase the ECS desired count so Lambda can be removed.

AnswersA, B

A Compute Savings Plan discounts predictable compute spend across ECS and Lambda without binding the team to one instance family. That flexibility matches a mixed compute estate and avoids overcommitting.

Why this answer

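A Compute Savings Plan offers savings of up to 66% and applies automatically across Amazon EC2, AWS Fargate, and AWS Lambda usage regardless of instance family, size, or Region, which matches the requirement to cover both services without a family commitment. It covers the compute behind the ECS control plane, whether that runs on EC2 or Fargate, as well as Lambda, making it well suited to a mixed workload with a predictable baseline. An EC2 Instance Savings Plan can discount more deeply (up to 72%), but it is tied to one instance family in one Region and does not cover Lambda. Moving the event-driven tasks to Lambda removes the separate always-on service and brings that usage under the same plan.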

Exam trap

The trap here is that candidates confuse Savings Plans with Reserved Instances or Spot Instances, assuming a specific instance family commitment is required, or they think Spot Instances can replace a billing discount mechanism for a steady workload.

634

MCQhard

Based on the exhibit, which storage design best supports the application servers' shared working directory requirement?

A.Mount Amazon EFS on every EC2 instance and use it as the shared workspace.

B.Attach one gp3 EBS volume to each instance and synchronize the files with cron jobs.

C.Store the artifacts in S3 and have each node read them directly from S3 as a filesystem.

D.Use instance store on each instance because it provides the fastest local file access.

AnswerA

EFS provides shared, persistent, POSIX-compliant file access across multiple EC2 instances and Availability Zones. That matches the requirement that all nodes see the same workspace immediately and that files survive instance replacement. It is the right choice when the application needs a common filesystem rather than an object store or local-only disk.

Why this answer

Amazon EFS provides a fully managed, NFS-based shared file system that can be mounted concurrently on multiple EC2 instances across multiple Availability Zones. This directly satisfies the requirement for a shared working directory where all application servers can read and write files simultaneously without additional synchronization overhead.

Exam trap

The trap here is that candidates often confuse object storage (S3) with shared file storage, assuming S3 can serve as a drop-in replacement for a POSIX filesystem, but S3 lacks file locking, atomic renames, and low-latency metadata operations required for a shared working directory.

How to eliminate wrong answers

Option B is wrong because attaching a separate gp3 EBS volume to each instance and synchronizing files with cron jobs adds complexity, risks data inconsistency, and delays the visibility of changes; it is not a true shared filesystem. An EBS volume attaches to one instance at a time, and Multi-Attach is available only for io1 and io2 volumes within a single Availability Zone, not for gp3. Option C is wrong because S3 is an object store, not a POSIX-compliant filesystem; exposing it as a filesystem through tools like s3fs adds performance overhead and lacks native file locking and atomic renames, making it unsuitable for a shared working directory with concurrent writes. Option D is wrong because instance store is ephemeral block storage physically attached to the host; its data is lost when the instance stops or terminates, and it cannot be shared across instances, so it does not meet the shared working directory requirement.

635

MCQeasy

A web application runs on an Amazon EC2 Auto Scaling group (ASG) behind an Application Load Balancer (ALB). The ALB is configured to use at least two Availability Zones (AZs), but the ASG currently uses subnets in only one AZ. If that AZ becomes unavailable, the application stops serving requests. Which change most directly improves resilience to an AZ outage?

A.Keep the ASG in one Availability Zone, but reduce ALB health check intervals.

B.Place the ASG across multiple Availability Zones by configuring it with subnets in at least two AZs.

C.Switch the load balancer from an ALB to an NLB to remove HTTP health check dependency.

D.Add an Amazon SQS queue to buffer requests during failures.

AnswerB

An ASG launches instances into the AZs of the subnets you specify. With the ASG spread across at least two AZs, the ALB can route traffic to healthy targets in the remaining AZ(s) if one AZ fails, while the ASG launches replacement instances there to restore the desired capacity.

Why this answer

Option B is correct because distributing an Auto Scaling group across multiple Availability Zones (AZs) ensures that if one AZ fails, the remaining AZs continue to serve traffic. The Application Load Balancer (ALB) is already configured for at least two AZs, but the ASG’s single-AZ subnet placement creates a single point of failure. By adding subnets in at least two AZs to the ASG, the application becomes resilient to an AZ outage without any other architectural changes.
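The single-AZ gap described above can be checked before deployment with a small helper. This is a minimal sketch: the subnet IDs and AZ names are hypothetical, and the subnet-to-AZ mapping is assumed to have been looked up beforehand. The second function produces the comma-separated string that an Auto Scaling group's `VPCZoneIdentifier` setting expects.

```python
def spans_multiple_azs(subnet_to_az, asg_subnets, minimum=2):
    """Return True if the chosen subnets cover at least `minimum` distinct AZs."""
    zones = {subnet_to_az[subnet] for subnet in asg_subnets}
    return len(zones) >= minimum


def vpc_zone_identifier(asg_subnets):
    """Join subnet IDs into the comma-separated form used by VPCZoneIdentifier."""
    return ",".join(asg_subnets)


# Hypothetical subnets: two in us-east-1a, one in us-east-1b.
subnet_to_az = {
    "subnet-aaa": "us-east-1a",
    "subnet-bbb": "us-east-1a",
    "subnet-ccc": "us-east-1b",
}

single_az = spans_multiple_azs(subnet_to_az, ["subnet-aaa", "subnet-bbb"])
multi_az = spans_multiple_azs(subnet_to_az, ["subnet-aaa", "subnet-ccc"])
identifier = vpc_zone_identifier(["subnet-aaa", "subnet-ccc"])
```

Note that two subnets are not enough on their own: the first pair sits in the same AZ and would still leave a single point of failure.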

Exam trap

The trap here is that candidates assume the ALB’s multi-AZ configuration automatically protects the application, overlooking that the ASG must also span multiple AZs to provide compute redundancy.

How to eliminate wrong answers

Option A is wrong because reducing health check intervals only detects failures faster but does not eliminate the single point of failure; if the sole AZ becomes unavailable, no healthy instances exist to serve traffic. Option C is wrong because switching from an ALB to an NLB does not address the root cause—the ASG is still in one AZ—and HTTP health checks are not the issue; the ALB can already perform health checks across AZs. Option D is wrong because adding an SQS queue buffers requests but does not provide compute capacity in another AZ; without instances in a second AZ, the queue cannot process requests during an AZ outage.

636

MCQmedium

A test environment runs on x86 EC2 instances and uses open-source software with no architecture-specific licensing restriction. What should be evaluated to reduce compute cost? The design must avoid adding custom operational scripts.

A.Cross-Region data replication for all data

B.AWS Graviton-based instances after performance testing

C.io2 Block Express volumes for all instances

D.Dedicated Hosts by default

AnswerB

Graviton instances often provide better price performance for compatible workloads.

Why this answer

Option B is correct because AWS Graviton-based instances (ARM architecture) offer up to 40% better price-performance compared to x86 instances for many workloads. Since the environment uses open-source software with no architecture-specific licensing restrictions, migrating to Graviton after performance testing can significantly reduce compute costs without requiring custom operational scripts, as AWS provides native support for ARM-based instances.

Exam trap

The trap here is that candidates may confuse cost optimization with performance improvement or licensing requirements, leading them to select Dedicated Hosts or high-performance storage options that actually increase costs.

How to eliminate wrong answers

Option A is wrong because cross-region data replication increases data transfer and storage costs, and it does not directly address compute cost reduction. Option C is wrong because io2 Block Express volumes are high-performance, high-cost EBS volumes designed for I/O-intensive workloads, not for reducing compute costs, and they would increase storage costs unnecessarily. Option D is wrong because Dedicated Hosts incur additional per-host charges and are used for licensing or compliance requirements, not for cost optimization; they would increase compute costs rather than reduce them.

637

MCQeasy

An order-processing system publishes an event whenever a payment succeeds. Three downstream services (inventory, shipping, and analytics) must react independently. Analytics sometimes has high latency, but order processing must not be blocked. What is the best AWS approach to decouple these consumers?

A.Have order processing call each service synchronously via HTTPS and retry on failures.

B.Publish payment events to SNS (or EventBridge) and let each downstream service consume independently (for example, via SQS queues or other async targets).

C.Store events in a single relational database table and let consumers poll continuously for new rows.

D.Send events directly from the producer to each consumer EC2 instance using SSH tunnels.

AnswerB

Using pub/sub decouples the producer from consumers. Order processing publishes once and can complete without waiting for each downstream service. Each consumer receives events independently, so analytics latency does not directly block inventory or shipping processing.

Why this answer

Option B is correct because Amazon SNS (or EventBridge) enables asynchronous, fan-out messaging where a single payment-success event is published once and delivered independently to multiple downstream services (inventory, shipping, analytics) via SQS queues or other targets. This decouples the producer from consumer latency—analytics can take its time without blocking order processing—and ensures each consumer processes the event at its own pace, meeting the requirement for independent, non-blocking reactions.
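The fan-out behavior can be illustrated with an in-memory model. This is not the SNS or SQS API; the `Topic` class and the three deques are stand-ins, assumed only to show that one publish reaches every subscriber queue and that a slow consumer does not hold up the others.

```python
from collections import deque


class Topic:
    """Toy stand-in for a pub/sub topic with queue subscribers."""

    def __init__(self):
        self._queues = []

    def subscribe(self, queue):
        self._queues.append(queue)

    def publish(self, event):
        # The producer publishes once; each subscriber queue gets its own copy.
        for queue in self._queues:
            queue.append(dict(event))
        return len(self._queues)


inventory, shipping, analytics = deque(), deque(), deque()
topic = Topic()
for queue in (inventory, shipping, analytics):
    topic.subscribe(queue)

topic.publish({"order_id": "o-1", "status": "PAID"})
topic.publish({"order_id": "o-2", "status": "PAID"})

# Inventory and shipping drain their queues immediately; analytics lags.
while inventory:
    inventory.popleft()
while shipping:
    shipping.popleft()
```

After this runs, the analytics queue still holds both events while the other two are empty, and the producer never waited on any consumer.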

Exam trap

The trap here is that candidates may choose synchronous integration (Option A) because it seems simpler, failing to recognize that the requirement 'must not be blocked' explicitly demands asynchronous decoupling, not just retries.

How to eliminate wrong answers

Option A is wrong because synchronous HTTPS calls with retries tightly couple the producer to all consumers; if analytics has high latency, order processing is blocked waiting for responses, violating the requirement that it must not be blocked. Option C is wrong because storing events in a single relational database table introduces a single point of failure, creates a polling bottleneck, and tightly couples consumers to a shared schema and table, which is not a decoupled, scalable architecture. Option D is wrong because sending events directly via SSH tunnels requires direct network connectivity to each EC2 instance, introduces security risks, and tightly couples the producer to consumer instances, making it brittle and unscalable.

638

Multi-Selectmedium

A public API is delivered through CloudFront and an Application Load Balancer. The security team wants AWS to automatically block repetitive bursts from the same client IP and also reduce exposure to common web exploits without custom code. Which two AWS WAF features should be enabled? Select two.

Select 2 answers

A.A rate-based rule that blocks clients exceeding a request threshold from the same source IP.

B.An AWS Managed Rules group for common web exploits.

C.ALB sticky sessions for all requests.

D.A security group rule that blocks requests based on HTTP path.

E.CloudFront origin access control for the API endpoint.

AnswersA, B

Rate-based rules are the native WAF feature for detecting and blocking unusually high request rates from the same IP or set of IPs. This helps stop bursts that can indicate abuse or application-layer flooding.

Why this answer

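A rate-based rule in AWS WAF automatically tracks request rates per source IP and blocks clients that exceed a configured threshold within the evaluation window (5 minutes by default). This directly addresses the security team's requirement to block repetitive bursts from the same client IP without custom code. An AWS Managed Rules group, such as the core rule set, covers the second requirement by supplying AWS-maintained rules for common web exploits.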

Exam trap

The trap here is that candidates often confuse security group rules (Layer 3/4) with WAF rules (Layer 7), mistakenly thinking a security group can filter HTTP paths or block application-layer attacks.
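The two WAF features in the correct answers can be sketched as rule definitions in the shape used by the WAFv2 `Rules` parameter of a web ACL. The code only builds dictionaries and does not call AWS. The rule names, metric names, and the limit of 1000 requests are hypothetical values chosen for illustration.

```python
def _visibility(metric_name):
    """Common visibility settings attached to each rule."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def rate_limit_rule(limit, priority=0):
    """Block any single source IP that exceeds `limit` requests per window."""
    return {
        "Name": "rate-limit-per-ip",
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP"}
        },
        "Action": {"Block": {}},
        "VisibilityConfig": _visibility("rateLimitPerIp"),
    }


def managed_common_rule(priority=1):
    """Reference the AWS-managed core rule set for common web exploits."""
    return {
        "Name": "aws-common-rule-set",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesCommonRuleSet",
            }
        },
        # Rule groups take OverrideAction instead of Action.
        "OverrideAction": {"None": {}},
        "VisibilityConfig": _visibility("awsCommonRuleSet"),
    }


rules = [rate_limit_rule(1000), managed_common_rule()]
```

Neither rule needs application code changes, which is what separates these options from security groups or sticky sessions.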

639

MCQmedium

An internal worker consumes messages from an Amazon SQS Standard queue. Recently, some messages fail validation in the worker (for example, missing required fields), causing the worker to crash before it can successfully process those messages. Those messages keep getting retried repeatedly, slowing down processing of valid messages. The team wants a resilient mechanism to quarantine bad messages after a limited number of receive attempts. What should they implement?

A.Increase the SQS visibility timeout to several hours so the worker does not retry too quickly.

B.Configure a redrive policy with a Dead-Letter Queue (DLQ) and set maxReceiveCount so poison messages are moved to the DLQ after repeated failures.

C.Switch the queue to an SNS topic and subscribe the worker directly, eliminating message retries.

D.Enable KMS encryption with a new CMK to ensure validation errors stop occurring.

AnswerB

An SQS DLQ with a redrive policy is specifically designed for poison-message handling. When a message exceeds maxReceiveCount without successful processing (for example, the worker crashes before deletion), SQS moves the message to the DLQ. This quarantines bad messages and protects throughput for valid messages.

Why this answer

Option B is correct because Amazon SQS supports configuring a redrive policy with a Dead-Letter Queue (DLQ) that automatically moves messages after a specified number of receive attempts (maxReceiveCount). This isolates poison messages that fail validation and cause crashes, preventing them from being retried indefinitely and slowing down valid message processing. The worker can then focus on valid messages while the DLQ stores the problematic ones for later analysis or manual intervention.
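A redrive policy is stored as a JSON string in the source queue's `RedrivePolicy` attribute. The sketch below builds that attribute without calling AWS; the DLQ ARN and the choice of five receive attempts are hypothetical.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the queue attributes that route poison messages to a DLQ.

    After a message has been received `max_receive_count` times without
    being deleted, SQS moves it to the queue identified by `dlq_arn`.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical ARN for illustration only.
attributes = redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq", max_receive_count=5
)
```

The resulting dictionary could be supplied as the `Attributes` argument when setting attributes on the source queue. A low count quarantines bad messages quickly, while a higher count tolerates more transient failures before giving up.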

Exam trap

The trap here is that candidates may think increasing the visibility timeout (Option A) solves the retry problem, but it only delays retries without eliminating the root cause, while the DLQ mechanism (Option B) provides a proper quarantine by moving messages after a configurable number of receive attempts.

How to eliminate wrong answers

Option A is wrong because increasing the visibility timeout to several hours only delays retries; the worker still crashes on the same invalid messages each time the timeout expires, and any message whose processing fails for a transient reason stays invisible for hours before it can be retried. Option C is wrong because SNS pushes messages to subscribers instead of holding them for a polling worker; the worker would still crash on invalid messages, and an SNS subscription dead-letter queue captures only failed deliveries, not messages the consumer fails to process. Option D is wrong because enabling KMS encryption with a new CMK protects message data at rest and has no effect on content validation errors or crash handling; encryption does not fix missing required fields or prevent retries.

640

Multi-Selectmedium

A company is using AWS for big data analytics and wants to optimize costs for its data processing pipeline. Which three design choices will help achieve this goal? (Choose three.)

Select 3 answers

A.Use Amazon EMR with Spot Instances for task nodes to reduce compute costs for fault-tolerant jobs.

B.Store all intermediate data on Amazon EBS io2 Block Express volumes for high performance.

C.Use Amazon S3 for input and output data, and enable S3 Select to reduce the amount of data transferred to processing.

D.Provision a fixed number of EC2 instances for processing to avoid scaling delays.

E.Use AWS Glue with a scheduled crawler to catalog data, and choose a lower number of Data Processing Units (DPUs) for smaller jobs.

F.Store all data in Amazon Redshift Spectrum external tables to avoid loading data into Redshift clusters.

Why this answer

Using Amazon EMR with Spot Instances for task nodes is correct because task nodes are typically used for fault-tolerant, stateless processing in EMR clusters. Spot Instances offer significant cost savings (up to 90% compared to On-Demand) and can be interrupted, but since task nodes can be replaced without affecting job completion, this is a cost-optimized design for big data pipelines.

Amazon S3 Select is correct because it allows you to retrieve only a subset of data from an object (e.g., specific columns or rows) using SQL expressions, reducing the amount of data transferred to your processing application and lowering data transfer and processing costs.

Using AWS Glue with a scheduled crawler and choosing a lower number of DPUs for smaller jobs is correct because DPUs (Data Processing Units) are the billing unit for AWS Glue jobs; selecting a lower DPU count for jobs that do not require high parallelism directly reduces costs.

Exam trap

The trap here is that candidates often confuse high-performance storage (like io2 Block Express) with cost optimization, or assume that fixed provisioning avoids scaling costs, when in reality both increase costs in a variable-load big data pipeline.

641

MCQmedium

A web application for a claims portal is behind an Application Load Balancer. The application must be protected from common SQL injection and cross-site scripting attacks with minimum operational overhead. What should the architect deploy?

A.AWS WAF associated with the Application Load Balancer

B.AWS Shield Advanced only

C.Network ACLs on the public subnets

D.Security groups on the application instances

AnswerA

AWS WAF can inspect HTTP requests and block common web exploits when associated with an ALB.

Why this answer

AWS WAF is a web application firewall that integrates directly with an Application Load Balancer to filter and monitor HTTP/HTTPS requests. It provides managed rules specifically designed to block common attack patterns like SQL injection and cross-site scripting (XSS) with minimal operational overhead, as the rules are pre-configured and automatically updated by AWS.

Exam trap

The trap here is that candidates often confuse network-layer controls (like security groups or NACLs) with application-layer protection, assuming they can block SQL injection or XSS, but these operate at Layer 3/4 and cannot inspect HTTP request bodies or headers for malicious content.

How to eliminate wrong answers

Option B is wrong because AWS Shield Advanced provides DDoS protection, not application-layer attack filtering for SQL injection or XSS. Option C is wrong because Network ACLs operate at the subnet level (Layer 3/4) and cannot inspect application-layer payloads for SQL injection or XSS patterns. Option D is wrong because security groups act as stateful firewalls at the instance level (Layer 3/4) and cannot perform deep packet inspection for application-layer attacks.

642

MCQmedium

A ticket booking system runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The design must avoid adding custom operational scripts.

A.Subnets in at least two Availability Zones with health checks enabled

B.All instances in one larger subnet

C.A Network Load Balancer in one subnet

D.A single EC2 instance with detailed monitoring

AnswerA

An Auto Scaling group spanning multiple AZs can replace unhealthy instances and maintain capacity during an AZ failure.

Why this answer

Option A is correct because placing subnets in at least two Availability Zones ensures that if one AZ fails, the Auto Scaling group can launch instances in the remaining healthy AZ, maintaining application availability. Health checks integrated with the Application Load Balancer allow the Auto Scaling group to automatically replace unhealthy instances without custom scripts, aligning with the requirement to avoid operational overhead.

Exam trap

The trap here is that candidates often assume a single larger subnet or a Network Load Balancer provides AZ resilience, but they fail to recognize that without multiple subnets in distinct AZs, the architecture cannot survive an AZ failure, and custom scripts would be needed for health checks without ELB integration.

How to eliminate wrong answers

Option B is wrong because placing all instances in one larger subnet confines them to a single Availability Zone, since a subnet cannot span AZs, so the design cannot tolerate the failure of one AZ. Option C is wrong because a Network Load Balancer placed in one subnet serves a single AZ and becomes a single point of failure; changing the load balancer type also does nothing to spread the Auto Scaling group's instances across AZs. Option D is wrong because a single EC2 instance, even with detailed monitoring, cannot survive an AZ failure and does not leverage Auto Scaling for automatic recovery.

643

Multi-Selecthard

A public web application sits behind Amazon CloudFront with an Application Load Balancer as the origin. The security team wants all edge traffic inspected by AWS WAF and also wants to prevent anyone on the internet from reaching the ALB directly. Which two changes should be made? Select two.

Select 2 answers

A.Associate an AWS WAF web ACL with the CloudFront distribution.

B.Restrict the ALB security group inbound rules to the AWS-managed CloudFront origin-facing prefix list.

C.Place the ALB in private subnets and keep the CloudFront distribution unchanged.

D.Use an S3 Origin Access Control instead of a security group change.

E.Open the ALB to 0.0.0.0/0 and rely on WAF alone for protection.

AnswersA, B

CloudFront supports AWS WAF at the edge, so requests can be inspected and filtered before they reach the origin. This placement stops malicious traffic early and applies the protection globally at the distribution layer.

Why this answer

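Option A is correct because AWS WAF can be associated directly with a CloudFront distribution to inspect all edge traffic before it reaches the origin. This allows the security team to filter malicious requests at the AWS edge locations, reducing the attack surface and offloading processing from the Application Load Balancer. Option B is correct because limiting the ALB security group's inbound rules to the AWS-managed prefix list for CloudFront origin-facing servers (com.amazonaws.global.cloudfront.origin-facing) blocks direct internet access to the ALB while still allowing CloudFront to reach it.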

Exam trap

The trap here is that candidates often think placing the ALB in private subnets is sufficient, but they forget that CloudFront cannot route traffic to private subnets without a public endpoint or a VPC origin configuration, making option C invalid.

644

MCQmedium

A company uses an Amazon Aurora DB cluster in a Multi-AZ configuration. During a planned failover of the writer instance, the database endpoints in the application are updated incorrectly. After failover, reads work but writes fail with connection errors and timeouts for several minutes. The team currently uses the instance endpoint for the writer. What should they change to improve write resilience during failovers?

A.Continue using the instance endpoint, but increase application retry count so the writer changes are handled more quickly.

B.Use the Aurora cluster writer endpoint for all write operations.

C.Use a read replica endpoint for writes because it is typically stable across failovers.

D.Disable Multi-AZ failover so the writer instance never changes and writes remain consistent.

AnswerB

Aurora provides a writer endpoint designed specifically for write traffic. During failover, Aurora updates where the writer endpoint points, so the same DNS name continues to resolve to the current writer instance without requiring manual endpoint changes in the application.

Why this answer

The Aurora cluster writer endpoint always points to the current primary (writer) instance, even after a failover. By using this endpoint instead of a static instance endpoint, the application automatically resolves to the new writer without manual updates, eliminating connection errors and timeouts during failover transitions.

Exam trap

The trap here is that candidates confuse the instance endpoint (which is static and tied to a specific instance) with the cluster endpoint (which is dynamic and always points to the current writer), assuming any endpoint will automatically follow failover.

How to eliminate wrong answers

Option A is wrong because increasing the retry count does not fix the root cause—the application is still pointing to the old (now read-only) instance endpoint, so writes will continue to fail until the endpoint is manually corrected. Option C is wrong because read replica endpoints point to read-only instances; writes to a read replica will always fail with an error, regardless of failover state. Option D is wrong because disabling Multi-AZ failover removes high availability entirely, making the database vulnerable to a single point of failure, which contradicts the goal of improving write resilience.

645

Multi-Selecthard

A claims workflow requires point-in-time recovery and accidental-delete protection for a DynamoDB table. Which two settings should the architect enable? The design must avoid adding custom operational scripts.

Select 2 answers

A.Point-in-time recovery

B.DAX

C.Deletion protection or tightly controlled delete permissions

D.Global secondary indexes

AnswersA, C

PITR allows restoration to a specific second within the supported recovery window.

Why this answer

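Point-in-time recovery (PITR) for DynamoDB enables continuous backups with per-second granularity, allowing restoration to any second within the recovery window (up to the preceding 35 days). This directly satisfies the point-in-time recovery requirement without custom scripts, as it is a native AWS feature. Deletion protection, or tightly scoped permissions on the DeleteTable action, prevents the table itself from being removed accidentally.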

Exam trap

The trap here is that candidates often confuse DAX with a data protection feature, but DAX only accelerates reads and has no role in backup or deletion prevention.
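Both protections are table-level settings, so they can be expressed as two small requests. The sketch below only builds the parameters, in the shape accepted by the DynamoDB `UpdateContinuousBackups` and `UpdateTable` operations, and does not call AWS. The table name `claims` is hypothetical.

```python
def protection_requests(table_name):
    """Return (pitr_request, deletion_protection_request) for a table."""
    # Parameters for UpdateContinuousBackups: turns on point-in-time recovery.
    pitr_request = {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }
    # Parameters for UpdateTable: the table cannot be deleted while this is on.
    deletion_request = {
        "TableName": table_name,
        "DeletionProtectionEnabled": True,
    }
    return pitr_request, deletion_request


pitr_request, deletion_request = protection_requests("claims")
```

The two settings cover different failures: PITR recovers from bad writes or deleted items, while deletion protection stops the table itself from being dropped.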

646

MCQmedium

A web application uses an Amazon Aurora DB cluster for a read-heavy workload. The application team needs higher read throughput but cannot change the database schema. They want to avoid blocking writes and are willing to route read traffic separately. What is the most appropriate architecture change?

A.Create Aurora read replicas and route SELECT queries to an Aurora reader endpoint.

B.Scale up the writer instance storage only; read capacity will automatically increase without using a reader endpoint.

C.Move the Aurora cluster to Multi-AZ deployment mode only; read scaling is handled automatically without replicas.

D.Replace the cluster with a single RDS instance because it offers consistent performance for both reads and writes.

AnswerA

Read replicas increase read capacity, and using the Aurora reader endpoint sends read traffic to replicas.

Why this answer

Creating Aurora read replicas and routing SELECT queries to the Aurora reader endpoint is the most appropriate architecture change because Aurora's reader endpoint distributes read traffic across up to 15 low-latency read replicas, providing higher aggregate read throughput without blocking writes. This approach requires no schema changes and allows the application to separate read and write traffic, directly addressing the read-heavy workload requirement.

Exam trap

The trap here is that candidates often confuse Multi-AZ deployment (which provides failover only) with read replica scaling, or mistakenly believe that scaling storage or using a single instance can improve read throughput without schema changes.

How to eliminate wrong answers

Option B is wrong because scaling up the writer instance storage does not increase read throughput; Aurora's read capacity is tied to compute resources (e.g., instance class) and the number of replicas, not storage size. Option C is wrong because Multi-AZ deployment in Aurora is for high availability and failover, not for scaling read throughput; read scaling requires dedicated read replicas with a reader endpoint. Option D is wrong because replacing the cluster with a single RDS instance would eliminate read scaling capabilities and introduce a single point of failure, making it unsuitable for a read-heavy workload that needs higher throughput without blocking writes.

647

MCQeasy

An S3 bucket stores application logs. After 30 days, the team rarely accesses the logs, but compliance requires keeping them for 18 months. Which setup most directly reduces storage cost while maintaining compliance?

A.Configure an S3 Lifecycle policy to transition objects to a colder storage class after 30 days and expire (delete) them after 18 months.

B.Enable S3 Versioning and rely on deleting old versions after 30 days to reduce storage costs while keeping the latest data.

C.Move the bucket to a different AWS region farther from the users to reduce the likelihood of accidental reads and thereby lower storage costs.

D.Switch all objects to S3 Glacier Instant Retrieval immediately, regardless of object age, to minimize storage charges.

AnswerA

Lifecycle transitions lower the storage cost for older objects, and expiring objects only after 18 months keeps them for the full compliance retention period before they are deleted.

Why this answer

Option A is correct because S3 Lifecycle policies allow you to automatically transition objects to cheaper storage classes (e.g., S3 Standard-IA or S3 Glacier Deep Archive) after 30 days, reducing storage costs for rarely accessed logs. The policy also sets an expiration action to delete objects after 18 months, meeting the compliance requirement without manual intervention.

Exam trap

The trap here is that candidates may think moving to a different region or using versioning reduces costs, but the core concept is that S3 Lifecycle policies directly automate cost optimization by transitioning to colder storage classes and expiring data, which is the most direct and compliant approach.

How to eliminate wrong answers

Option B is wrong because enabling S3 Versioning and deleting old versions does not address the need to keep logs for 18 months; it only manages versions, not the primary objects, and can increase costs due to storing multiple versions. Option C is wrong because moving the bucket to a different region does not reduce storage costs; it may increase data transfer costs and does not change the storage class or lifecycle management. Option D is wrong because switching all objects to S3 Glacier Instant Retrieval immediately, regardless of age, would likely increase costs for frequently accessed logs in the first 30 days, as this storage class has higher retrieval costs and is not optimal for data that is still being accessed.
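
A lifecycle rule for this scenario might look like the sketch below. The `logs/` prefix, the choice of S3 Standard-IA as the colder class, and the approximation of 18 months as 548 days are all assumptions for illustration; the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`.

```python
import json

# Assumptions for illustration: logs live under the "logs/" prefix, the colder
# class is S3 Standard-IA, and 18 months is approximated as 548 days.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Move objects to the colder class once they are 30 days old.
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            # Delete objects once the retention period has passed.
            "Expiration": {"Days": 548},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

One rule covers both the cost reduction and the deletion, so nothing has to run on a schedule outside S3.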

648

Multi-Selectmedium

A company is running a production web application on Amazon EC2 instances behind an Application Load Balancer (ALB). The workload has predictable traffic spikes during business hours and low traffic at night. The current architecture uses On-Demand EC2 instances, leading to high costs. The company wants to reduce costs without sacrificing availability or performance. Which three of the following strategies would help achieve this goal? (Choose three.)

Select 3 answers

A.Purchase Reserved Instances for the baseline capacity that runs 24/7.

B.Add Spot Instances for the entire workload during peak hours.

C.Use Auto Scaling with a mixed instances policy that includes On-Demand and Spot Instances.

D.Migrate to AWS Lambda for all web application traffic.

E.Implement a scheduled scaling action to increase capacity before business hours and decrease after.

F.Consolidate all instances into a single larger instance to reduce overhead.

Why this answer

Purchasing Reserved Instances for the baseline 24/7 capacity provides a significant discount (up to 72%) compared to On-Demand pricing, directly reducing costs for the always-running portion of the workload. This strategy is correct because it matches the predictable, steady-state traffic component without sacrificing availability or performance.

Exam trap

The trap here is that candidates may think Spot Instances can be used for the entire workload during peak hours, but they overlook the interruption risk and the requirement for the workload to be fault-tolerant, which a production web application behind an ALB typically is not without careful design.

649

Multi-Selecthard

A development environment runs a small web app on EC2 and an Amazon RDS database, but it is used only on weekdays during office hours. The team wants to minimize spend and can tolerate a short startup delay after the environment is started. Which two changes should the architect recommend? Select two.

Select 2 answers

A.Stop the EC2 instances outside business hours and start them on a schedule.

B.Replace the database with Aurora Serverless v2 so capacity can scale down during idle periods.

C.Move the app to larger EC2 instances so fewer machines are managed.

D.Keep RDS and EC2 running all weekend because start/stop is operationally risky.

E.Use Spot Instances for the database tier.

AnswersA, B

Scheduled stop and start removes idle compute spend from nights and weekends while preserving the same application architecture.

Why this answer

Option A is correct because stopping EC2 instances outside business hours eliminates compute costs (you are not charged for stopped instances, only for attached EBS volumes). The team tolerates a short startup delay, so a scheduled stop/start (e.g., via AWS Instance Scheduler or Lambda) directly reduces spend without architectural changes.

Exam trap

The trap here is that candidates may think Aurora Serverless v2 is always cheaper than stopping a traditional RDS instance, but they overlook that Serverless v2 still has a minimum ACU charge and storage costs, while stopping an RDS instance eliminates compute cost entirely (only storage and backups are billed).
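
To put a rough number on the scheduled stop/start saving, the arithmetic below assumes a hypothetical schedule of 10 office hours per day, Monday to Friday. It counts only instance-hours; EBS storage is still billed while instances are stopped.

```python
# Assumed schedule for illustration: 10 office hours per day, Monday to Friday.
HOURS_PER_WEEK = 24 * 7
OFFICE_HOURS_PER_DAY = 10
WORKDAYS_PER_WEEK = 5

running_hours = OFFICE_HOURS_PER_DAY * WORKDAYS_PER_WEEK
stopped_hours = HOURS_PER_WEEK - running_hours
savings_fraction = stopped_hours / HOURS_PER_WEEK

print(f"Running {running_hours} of {HOURS_PER_WEEK} hours per week")
print(f"Instance-hours avoided: {savings_fraction:.0%}")
```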

650

MCQmedium

A web application for an IoT ingestion API is behind an Application Load Balancer. The application must be protected from common SQL injection and cross-site scripting attacks with minimum operational overhead. What should the architect deploy? The design must avoid adding custom operational scripts.

A.AWS WAF associated with the Application Load Balancer

B.Network ACLs on the public subnets

C.Security groups on the application instances

D.AWS Shield Advanced only

AnswerA

AWS WAF can inspect HTTP requests and block common web exploits when associated with an ALB.

Why this answer

AWS WAF is a web application firewall that helps protect web applications from common web exploits like SQL injection and cross-site scripting (XSS). By associating an AWS WAF web ACL with the Application Load Balancer, you can filter and monitor HTTP(S) requests based on rules that match malicious patterns, with no custom scripts or operational overhead. This is the most efficient and managed solution for the stated requirements.

Exam trap

The trap here is that candidates often confuse network-layer controls (NACLs, security groups) with application-layer protection, or assume Shield Advanced alone covers all web threats, when in fact WAF is specifically designed for Layer 7 attack mitigation like SQLi and XSS.

How to eliminate wrong answers

Option B is wrong because Network ACLs are stateless packet filters that operate at the subnet level (Layer 3/4) and cannot inspect application-layer payloads like SQL or XSS patterns. Option C is wrong because security groups are stateful virtual firewalls that control traffic based on IP addresses, ports, and protocols (Layer 3/4), and they lack the ability to perform deep packet inspection for web application attacks. Option D is wrong because AWS Shield Advanced provides DDoS protection and enhanced detection, but it does not include the rule-based filtering needed to block SQL injection or XSS attacks.

651

MCQhard

A DynamoDB table for a retail API has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change?

A.Use a higher-cardinality partition key that distributes writes across partitions

B.Create a global secondary index with the same date key

C.Reduce the table's write capacity

D.Move the table to S3 Glacier Instant Retrieval

AnswerA

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Using a partition key based solely on the current date creates a 'hot partition' because all writes for that day target the same partition, leading to throttling. A higher-cardinality partition key (e.g., combining date with a unique attribute like user ID or order ID) distributes write traffic more evenly across multiple partitions, so DynamoDB can use more of the table's throughput capacity and the hot-partition throttling is relieved.

Exam trap

The trap here is that candidates may think a GSI can solve write throttling, but GSIs only help with read patterns and do not redistribute write load on the base table.

How to eliminate wrong answers

Option B is wrong because creating a global secondary index (GSI) with the same date key does not change the base table's partition key; writes still target the same hot partition, so throttling persists. Option C is wrong because reducing the table's write capacity would worsen throttling, not solve it, as the issue is uneven distribution, not insufficient total capacity. Option D is wrong because S3 Glacier Instant Retrieval is an S3 storage class for rarely accessed archival objects, not a transactional database; it cannot support DynamoDB's low-latency read/write operations or query patterns.
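
The effect of key cardinality can be illustrated with a small simulation. This is only a sketch: the partition count, the order IDs, and the use of MD5 as a stand-in for DynamoDB's internal hashing are all assumptions made for the example.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, for illustration only


def partition_for(key: str) -> int:
    """Map a partition key to a bucket with a stable hash (a stand-in for DynamoDB's internal hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


# 1,000 writes on one business day, identified by hypothetical order IDs.
orders = [f"order-{i}" for i in range(1000)]
day = "2024-05-01"

# Date-only key: every write hashes to the same bucket.
date_only = Counter(partition_for(day) for _ in orders)
# Composite key: writes spread across the buckets.
date_plus_order = Counter(partition_for(f"{day}#{order}") for order in orders)

print("date-only key:     ", dict(date_only))
print("date#order_id key: ", dict(date_plus_order))
```

The trade-off is on the read side: with a composite key, fetching everything for a given date is no longer a single-key query, so the access pattern has to be planned along with the key.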

652

MCQmedium

A document portal requires consistent high IOPS for a transactional database on EC2. Which EBS volume type is most suitable? The architecture review board prefers a managed AWS-native control.

A.sc1 Cold HDD

B.Instance store only

C.Provisioned IOPS SSD such as io2

D.st1 Throughput Optimized HDD

AnswerC

io2 is designed for business-critical workloads requiring consistent high IOPS and durability.

Why this answer

The scenario requires consistent high IOPS for a transactional database, which demands low-latency, predictable performance. Provisioned IOPS SSD volumes such as io2 let you specify an IOPS rate independently of volume size and are designed to sustain it for IOPS-intensive, latency-sensitive workloads. (gp3 also lets you provision IOPS separately from size, but with a lower ceiling, and it is not among the options.) Because EBS is a managed AWS-native service, io2 also satisfies the architecture review board's preference.

Exam trap

The trap here is that candidates may confuse 'high IOPS' with throughput-optimized HDDs (st1) or mistakenly think instance store offers managed persistence, but the key differentiator is the need for consistent, provisioned IOPS and managed durability, which among these options only io2 provides.

How to eliminate wrong answers

Option A is wrong because sc1 Cold HDD is designed for infrequently accessed, cold data with low cost, offering burstable throughput but very low IOPS, making it unsuitable for transactional databases requiring consistent high IOPS. Option B is wrong because instance store provides temporary, block-level storage that is physically attached to the host, but it is not managed (data is lost on instance stop/termination) and does not qualify as a managed AWS-native control. Option D is wrong because st1 Throughput Optimized HDD is optimized for large, sequential workloads like big data and log processing, not for random I/O patterns of transactional databases, and it cannot guarantee high IOPS.

653

MCQhard

A media processing workflow in private subnets downloads large amounts of data from S3 through a NAT gateway. NAT data processing charges are high. What should the architect use to reduce cost? The design must avoid adding custom operational scripts.

A.S3 Object Lambda

B.AWS Shield Advanced

C.Gateway VPC endpoint for Amazon S3

D.A larger NAT gateway

AnswerC

A gateway endpoint routes S3 traffic privately without NAT gateway data processing charges.

Why this answer

A Gateway VPC endpoint for Amazon S3 allows instances in private subnets to access S3 directly over the AWS network without traversing a NAT gateway. This eliminates NAT data processing charges because traffic stays within the AWS backbone, reducing costs significantly for large data downloads.

Exam trap

The trap here is that candidates often confuse Gateway VPC endpoints with Interface VPC endpoints, assuming both incur costs, or mistakenly think NAT gateways are required for all private subnet outbound traffic, missing the S3-specific optimization.

How to eliminate wrong answers

Option A is wrong because S3 Object Lambda is used to transform data on the fly during retrieval, not to reduce data transfer costs or bypass NAT gateways. Option B is wrong because AWS Shield Advanced provides DDoS protection, not cost optimization for S3 data transfer. Option D is wrong because NAT gateways are not sized by the customer; they scale automatically, and the per-GB data processing charge applies to all traffic that passes through them, so the S3 downloads would cost the same.
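
A back-of-the-envelope comparison shows why the endpoint matters at volume. Both figures below are assumptions for illustration: a NAT data processing rate of 0.045 USD per GB (check current pricing for your Region) and a hypothetical 50,000 GB downloaded per month. Gateway endpoints for S3 carry no hourly or data processing charge.

```python
# Assumed figures, for illustration only; verify against current AWS pricing.
NAT_PROCESSING_USD_PER_GB = 0.045
MONTHLY_S3_DOWNLOAD_GB = 50_000

# Gateway endpoints for S3 have no data processing charge.
GATEWAY_ENDPOINT_USD_PER_GB = 0.0

nat_cost = MONTHLY_S3_DOWNLOAD_GB * NAT_PROCESSING_USD_PER_GB
endpoint_cost = MONTHLY_S3_DOWNLOAD_GB * GATEWAY_ENDPOINT_USD_PER_GB

print(f"Via NAT gateway:      {nat_cost:,.2f} USD per month in processing charges")
print(f"Via gateway endpoint: {endpoint_cost:,.2f} USD per month in processing charges")
```

The NAT gateway can stay in place for other outbound traffic; only the S3 route moves to the endpoint.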

654

MCQhard

A media processing workflow generates analytics files that are accessed unpredictably. Some files become hot again months later. The team wants automatic storage cost optimisation without retrieval delays. What should be used? The design must avoid adding custom operational scripts.

A.S3 Intelligent-Tiering

B.Manual monthly review and object copying

C.S3 Glacier Flexible Retrieval for all files

D.EFS One Zone for analytics files

AnswerA

Intelligent-Tiering automatically moves objects between access tiers based on usage while preserving low-latency access.

Why this answer

S3 Intelligent-Tiering automatically moves objects between access tiers (frequent, infrequent, and archive instant access) based on changing access patterns, with no retrieval delays for hot objects and no operational overhead. This matches the unpredictable access pattern where files become hot again months later, as the service monitors access and adjusts storage class without custom scripts.

Exam trap

The trap here is that candidates may choose S3 Glacier Flexible Retrieval thinking it is the cheapest option, overlooking the retrieval delay requirement and the fact that Intelligent-Tiering provides automatic cost optimisation without latency penalties for unpredictable access patterns.

How to eliminate wrong answers

Option B is wrong because manual monthly review and object copying introduces operational overhead and potential delays, which conflicts with the requirements for automatic cost optimisation and no custom operational scripts. Option C is wrong because S3 Glacier Flexible Retrieval has retrieval delays (minutes to hours) for files that become hot again, which violates the 'no retrieval delays' requirement. Option D is wrong because EFS One Zone is a file system, not an object storage class, and does not provide automatic tiering or cost optimisation for unpredictable access patterns; it also incurs costs for all data stored regardless of access frequency.

655

MCQmedium

A trading dashboard stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The design must avoid adding custom operational scripts.

A.An EBS snapshot schedule

B.S3 Cross-Region Replication with versioning enabled

C.S3 lifecycle transition to Glacier Flexible Retrieval

D.A CloudFront distribution

AnswerB

CRR asynchronously replicates objects to a bucket in another Region and requires versioning.

Why this answer

S3 Cross-Region Replication (CRR) automatically replicates objects to a destination bucket in a different AWS Region, meeting the disaster recovery requirement without custom scripts. Versioning must be enabled on both source and destination buckets for CRR to function, as it tracks object versions and ensures consistency during replication.

Exam trap

The trap here is that candidates may confuse S3 Cross-Region Replication with S3 lifecycle policies or CloudFront, thinking they provide cross-region replication, but only CRR with versioning enabled meets the DR requirement without custom scripts.

How to eliminate wrong answers

Option A is wrong because EBS snapshots are for Amazon Elastic Block Store volumes attached to EC2 instances, not for S3 objects; they cannot replicate S3 data across regions. Option C is wrong because S3 lifecycle transitions to Glacier Flexible Retrieval only change storage class within the same region for cost optimization, not replicate data to another region. Option D is wrong because CloudFront is a content delivery network (CDN) that caches content at edge locations for low-latency access, not a replication mechanism for disaster recovery across regions.
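
The two pieces of configuration involved can be sketched as follows. The bucket name and IAM role ARN are hypothetical; the dictionaries follow the shapes that boto3's `put_bucket_versioning` and `put_bucket_replication` accept, and nothing here contacts AWS.

```python
# Hypothetical bucket name and IAM role ARN, for illustration only.
versioning_configuration = {"Status": "Enabled"}  # needed on source and destination

replication_configuration = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "replicate-documents-for-dr",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter: the rule applies to every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            # Destination bucket in the other Region.
            "Destination": {"Bucket": "arn:aws:s3:::documents-dr-replica"},
        }
    ],
}

# With boto3 installed and credentials configured, these could be applied as:
#   s3 = boto3.client("s3")
#   s3.put_bucket_versioning(Bucket=..., VersioningConfiguration=versioning_configuration)
#   s3.put_bucket_replication(Bucket=..., ReplicationConfiguration=replication_configuration)
print(replication_configuration["Rules"][0]["Destination"]["Bucket"])
```

Replication applies to objects written after the rule is in place, so existing documents need a separate one-time copy if they must also be in the second Region.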

656

MCQeasy

A stateless web application runs on Amazon EC2 instances across two Availability Zones. The team wants unhealthy instances to be removed automatically and replaced without manual action. What is the best solution?

A.Place the instances in a single subnet and increase the instance size.

B.Use an Application Load Balancer with an Auto Scaling group and configure health checks.

C.Use a network ACL to detect failed instances and restart them.

D.Store the web servers on EBS volumes so the data survives failures.

AnswerB

An Application Load Balancer distributes traffic across healthy targets, and an Auto Scaling group can replace instances that fail health checks. Together, they provide automatic recovery from instance failure and keep the application available across multiple Availability Zones. This is the standard resilient design for stateless EC2 web tiers.

Why this answer

Option B is correct because an Application Load Balancer (ALB) with an Auto Scaling group provides automated health checks and instance replacement. The ALB performs HTTP/HTTPS health checks against the instances and stops routing traffic to targets that fail. When the Auto Scaling group is configured to use Elastic Load Balancing health checks in addition to its default EC2 status checks, it terminates an instance that the ALB reports as unhealthy and launches a new one to maintain the desired capacity. This ensures the stateless web application remains available across both Availability Zones without manual intervention.

Exam trap

The trap here is that candidates confuse network ACLs (stateless packet filters) with health check mechanisms, or assume that persistent storage (EBS) alone provides high availability without an orchestration layer like Auto Scaling.

How to eliminate wrong answers

Option A is wrong because placing instances in a single subnet and increasing instance size does not provide automated health monitoring or replacement; it only changes the capacity of a single instance and introduces a single point of failure. Option C is wrong because a network ACL is a stateless firewall that filters traffic at the subnet level and cannot detect failed instances or restart them; it has no awareness of application health or instance lifecycle. Option D is wrong because storing web servers on EBS volumes does not automate health checks or replacement; while EBS provides persistent storage, the stateless nature of the application means data persistence is irrelevant to fault tolerance, and manual action would still be required to replace failed instances.

657

MCQmedium

A media archive requires consistent high IOPS for a transactional database on EC2. Which EBS volume type is most suitable? The team wants the control to be enforceable during normal operations.

A.Provisioned IOPS SSD such as io2

B.st1 Throughput Optimized HDD

C.Instance store only

D.sc1 Cold HDD

AnswerA

io2 is designed for business-critical workloads requiring consistent high IOPS and durability.

Why this answer

The io2 Provisioned IOPS SSD volume type is designed for latency-sensitive transactional database workloads that require consistent high IOPS. It allows you to specify a guaranteed IOPS rate (up to 256,000 IOPS for io2 Block Express) and provides 99.999% durability, making it ideal for enforcing performance control during normal operations.

Exam trap

The trap here is that candidates often confuse throughput-optimized HDD (st1) with IOPS-focused SSD, assuming 'high throughput' implies high IOPS, but st1 is designed for sequential access and cannot provide the low-latency random I/O that transactional databases require.

How to eliminate wrong answers

Option B (st1 Throughput Optimized HDD) is wrong because it is a throughput-optimized HDD volume designed for large, sequential workloads like big data and log processing, not for transactional databases requiring consistent low-latency IOPS. Option C (Instance store only) is wrong because instance store volumes are ephemeral and data is lost on instance stop or termination, making them unsuitable for persistent database storage. Option D (sc1 Cold HDD) is wrong because it is a cold HDD volume optimized for infrequently accessed data with the lowest cost, offering very low IOPS and throughput that cannot meet the demands of a transactional database.

658

MCQmedium

An order-quote Lambda function is invoked directly by API Gateway. Traffic is predictable during the business day, and the first request after scaling from zero causes unacceptable latency. The team wants to keep the current architecture and reduce cold-start impact. Which configuration should they use?

A.Increase the function timeout so the first invocation has more time to finish.

B.Enable provisioned concurrency for the Lambda function.

C.Set reserved concurrency to a fixed number and leave the rest unchanged.

D.Increase the memory size only to eliminate cold starts.

AnswerB

Provisioned concurrency keeps a set number of Lambda execution environments initialized and ready to serve traffic. That directly reduces or removes cold starts for predictable workloads such as business-hours APIs. It is the most appropriate choice when the team wants to preserve serverless architecture while delivering consistent response times for the first request and subsequent requests.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, so when the first request arrives after scaling from zero, it is served by a pre-warmed instance instead of incurring a cold start. This directly addresses the unacceptable latency without changing the architecture or requiring code modifications.

Exam trap

The trap here is that candidates confuse reserved concurrency (which caps concurrent executions) with provisioned concurrency (which pre-warms instances), or mistakenly believe that increasing memory or timeout can eliminate the cold-start initialization delay.

How to eliminate wrong answers

Option A is wrong because increasing the function timeout does not prevent cold starts; it only allows the invocation to run longer, but the initial cold-start delay (e.g., loading runtime, initializing code) still occurs. Option C is wrong because reserved concurrency reserves and caps the number of concurrent executions but does not pre-warm instances; if the limit is set too low, it can cause throttling during spikes. Option D is wrong because increasing memory size can reduce cold-start duration (since CPU allocation scales with memory) but does not eliminate cold starts entirely; the first invocation after idle still requires initialization.
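
As a rough sketch, provisioned concurrency is set per published version or alias. The function name, alias, and environment count below are hypothetical; the dictionary follows the parameters of boto3's `put_provisioned_concurrency_config`.

```python
# Hypothetical function name, alias, and environment count, for illustration only.
provisioned_concurrency_request = {
    "FunctionName": "order-quote",
    "Qualifier": "prod",  # must be a published version or alias, not $LATEST
    "ProvisionedConcurrentExecutions": 10,
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("lambda").put_provisioned_concurrency_config(
#       **provisioned_concurrency_request
#   )
print(provisioned_concurrency_request)
```

Because provisioned concurrency is billed while it is configured, a predictable business-day pattern is a natural fit for raising it in the morning and lowering it in the evening.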

659

MCQmedium

An order processing workflow uses Amazon SQS as the decoupling layer between a producer and a consumer Lambda function. The consumer intermittently fails due to a downstream dependency. The team has observed that certain “poison” messages keep being retried repeatedly and prevent other messages from being processed efficiently. Which SQS configuration most directly addresses this issue?

A.Set the SQS queue’s retention period to 10 years and rely on application retries to eventually succeed.

B.Increase visibility timeout to a very large value and avoid dead-letter queues to keep ordering stable.

C.Configure a redrive policy with a dead-letter queue (DLQ) and set an appropriate visibility timeout greater than the maximum processing time.

D.Switch the queue to FIFO and remove retries in the Lambda event source mapping entirely.

AnswerC

A DLQ isolates poison messages after a receive count threshold, and correct visibility timeout prevents premature retries.

Why this answer

Option C is correct because configuring a redrive policy with a dead-letter queue (DLQ) allows messages that repeatedly fail processing to be moved out of the main queue after a specified number of receive attempts. Setting an appropriate visibility timeout greater than the maximum processing time ensures that messages are not made visible again before the consumer finishes processing, preventing premature retries. This directly isolates poison messages so they no longer block the processing of other messages in the queue.

Exam trap

The trap here is that candidates may think increasing visibility timeout or switching to FIFO alone will handle failed messages, but without a DLQ, poison messages remain in the queue and continue to block other messages, which is the core issue described.

How to eliminate wrong answers

Option A is wrong because SQS retains messages for at most 14 days, so a 10-year retention period cannot be configured, and a longer retention period would not stop poison messages from being retried in any case; it would only keep them in the queue longer. Option B is wrong because increasing visibility timeout to a very large value without a DLQ means poison messages will remain in the queue indefinitely, still preventing other messages from being processed efficiently. Option D is wrong because switching to FIFO does not solve the poison message problem; FIFO queues still require a DLQ for handling failed messages, and removing retries entirely would cause messages to be lost permanently without any recovery mechanism.
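
The queue attributes for this setup might be built as in the sketch below. The DLQ ARN, the 60-second processing time, and the receive count of 5 are hypothetical; the visibility timeout uses six times the processing time, following AWS's guidance for queues consumed by Lambda. The dictionary is assumed to be passed as `Attributes` to boto3's `set_queue_attributes`.

```python
import json

# Hypothetical values, for illustration only.
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"
MAX_PROCESSING_SECONDS = 60  # assumed worst-case consumer processing time
MAX_RECEIVE_COUNT = 5        # receives allowed before a message moves to the DLQ

queue_attributes = {
    # SQS expects the redrive policy as a JSON string.
    "RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": MAX_RECEIVE_COUNT}
    ),
    # Attribute values are strings; keep the timeout above the processing time.
    "VisibilityTimeout": str(MAX_PROCESSING_SECONDS * 6),
}

print(queue_attributes)
```

A lower `maxReceiveCount` moves poison messages aside sooner, at the cost of giving transient downstream failures fewer retries before a message lands in the DLQ.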

660

MCQeasy

Based on the exhibit, the team wants to improve application performance without changing the code. Which EC2 instance family should they choose next?

A.Choose a compute-optimized instance family such as C6i to increase CPU performance.

B.Choose a memory-optimized instance family such as R6i to provide more RAM.

C.Choose a storage-optimized instance family such as I4i to improve block storage throughput.

D.Choose a burstable instance family such as T3 to reduce cost and improve performance.

AnswerB

Memory-optimized instances are the best fit when memory pressure is causing slowdowns. The exhibit shows CPU is low while memory is consistently near saturation, which strongly suggests the application needs more RAM rather than more compute. Moving to an R6i family should reduce paging and improve response times without changing the application design.

Why this answer

The exhibit shows a memory-constrained application (e.g., high cache miss rates or swap usage) where performance is bottlenecked by insufficient RAM. Choosing a memory-optimized instance family like R6i provides more RAM per vCPU, allowing the application to keep more data in memory and reduce disk I/O, directly improving performance without code changes.

Exam trap

The trap here is that candidates assume 'improve performance' always means faster CPU, but the exhibit's memory pressure metric (e.g., high swap usage or cache miss rate) directly points to RAM as the bottleneck, making memory-optimized instances the correct choice.

How to eliminate wrong answers

Option A is wrong because compute-optimized instances (C6i) increase CPU performance, but the exhibit indicates the bottleneck is memory, not CPU; adding CPU power won't reduce swap or cache misses. Option C is wrong because storage-optimized instances (I4i) improve block storage throughput, which helps I/O-bound workloads, but the exhibit shows memory pressure, not storage latency or throughput issues. Option D is wrong because burstable instances (T3) are designed for variable workloads with CPU credits and actually have less consistent performance and lower baseline RAM, which could worsen memory-constrained applications.

661

MCQeasy

Your application runs in private subnets with no NAT gateway. It needs to call AWS Secrets Manager to retrieve secrets. For private connectivity without internet egress, which VPC endpoint type should you create for AWS Secrets Manager?

A.An Interface VPC endpoint (AWS PrivateLink) for secretsmanager in your Region

B.A Gateway VPC endpoint for secretsmanager

C.A NAT gateway in the private subnet route table

D.A VPC peering connection to the AWS public network hosting Secrets Manager

AnswerA

Secrets Manager supports Interface VPC endpoints. An interface endpoint provides private connectivity from subnets to the Secrets Manager API without traversing the public internet.

Why this answer

An Interface VPC endpoint (AWS PrivateLink) creates an elastic network interface in your subnet with a private IP address, allowing your instances to communicate with AWS Secrets Manager over the AWS network without traversing the internet. Since your application runs in private subnets with no NAT gateway, this is the only supported endpoint type for Secrets Manager, as Gateway endpoints are only available for S3 and DynamoDB.

Exam trap

The trap here is that candidates often confuse Gateway endpoints (which are free and only for S3/DynamoDB) with Interface endpoints (which incur hourly charges and support many services like Secrets Manager), leading them to incorrectly select option B.

How to eliminate wrong answers

Option B is wrong because Gateway VPC endpoints are only supported for Amazon S3 and DynamoDB, not for AWS Secrets Manager. Option C is wrong because a NAT gateway requires an internet gateway and public subnet, and the question explicitly states there is no NAT gateway and no internet egress allowed. Option D is wrong because VPC peering connects two VPCs within the AWS network, not to the AWS public network hosting Secrets Manager; Secrets Manager is accessed via service endpoints, not through peering.
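
A minimal sketch of the endpoint request, assuming a hypothetical Region and placeholder resource IDs. The dictionary follows the parameters of boto3's EC2 `create_vpc_endpoint` call; nothing here contacts AWS.

```python
# Hypothetical Region and resource IDs, for illustration only.
REGION = "us-east-1"

endpoint_request = {
    "VpcEndpointType": "Interface",
    "VpcId": "vpc-0123456789abcdef0",
    "ServiceName": f"com.amazonaws.{REGION}.secretsmanager",
    "SubnetIds": ["subnet-0123456789abcdef0"],
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    # Private DNS lets the default Secrets Manager hostname resolve to the endpoint.
    "PrivateDnsEnabled": True,
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2", region_name=REGION).create_vpc_endpoint(**endpoint_request)
print(endpoint_request["ServiceName"])
```

The security group attached to the endpoint needs to allow inbound HTTPS (port 443) from the application's subnets, otherwise calls to Secrets Manager will time out.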

662

MCQmedium

A dev sandbox has unpredictable DynamoDB traffic with long idle periods and occasional spikes. Which capacity mode should minimize operational overhead and avoid paying for idle provisioned capacity?

A.Reserved capacity for maximum daily traffic

B.Provisioned capacity set for peak traffic

C.DynamoDB on-demand capacity mode

D.Global tables in every Region

AnswerC

On-demand capacity is suitable for unpredictable workloads and charges per request without capacity planning.

Why this answer

DynamoDB on-demand capacity mode (Option C) is ideal for unpredictable workloads with long idle periods and occasional spikes because it automatically scales to handle traffic without requiring any capacity planning. You pay only for the reads and writes you actually perform, eliminating the cost of idle provisioned capacity and the operational overhead of managing scaling.

Exam trap

The trap here is that candidates confuse reserved capacity (a pricing commitment that discounts provisioned throughput) with a capacity mode, or think that provisioned capacity with auto scaling is always cheaper, ignoring the cost of the capacity that sits idle during long quiet periods.

How to eliminate wrong answers

Option A is wrong because reserved capacity is not a capacity mode; it is a one- or three-year pricing commitment that discounts provisioned throughput, so reserving for maximum daily traffic means paying for that capacity even while the sandbox is idle. Option B is wrong because provisioned capacity set for peak traffic would incur costs for idle capacity during long idle periods, and any spike above the provisioned level would still require manual adjustment or auto scaling, increasing operational overhead. Option D is wrong because global tables are a replication feature for multi-Region active-active setups, not a capacity mode, and they do not address cost or overhead from idle capacity or traffic spikes.
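
The cost argument can be sketched with some rough arithmetic. The unit prices, the traffic figures, and the function names below are assumptions made up for illustration, not current AWS pricing; the point is only that provisioned capacity is billed per hour whether or not it is used, while on-demand is billed per request.

```python
# Hypothetical unit prices for illustration only; check current AWS pricing.
ON_DEMAND_PRICE_PER_MILLION_WRITES = 1.25   # assumed USD
PROVISIONED_PRICE_PER_WCU_HOUR = 0.00065    # assumed USD


def on_demand_cost(total_writes: int) -> float:
    """On-demand mode: pay only for the requests actually made."""
    return total_writes / 1_000_000 * ON_DEMAND_PRICE_PER_MILLION_WRITES


def provisioned_cost(peak_writes_per_second: int, hours: int) -> float:
    """Provisioned for peak: pay for peak capacity every hour, busy or idle."""
    return peak_writes_per_second * hours * PROVISIONED_PRICE_PER_WCU_HOUR


# Assumed sandbox profile: 2 million writes in a month, with short spikes
# that reach 500 writes per second (items of 1 KB or less).
HOURS_IN_MONTH = 730
sandbox_on_demand = on_demand_cost(2_000_000)
sandbox_provisioned = provisioned_cost(500, HOURS_IN_MONTH)
```

Under these assumed numbers the provisioned-for-peak bill is dominated by idle hours, which is the situation the question describes.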

663

Multi-Selecthard

A serverless checkout API runs on AWS Lambda behind API Gateway. Traffic spikes are predictable every weekday at 09:00 UTC, and p95 latency jumps for the first few minutes after each deployment because execution environments are cold. The team wants to reduce this startup impact without changing the API contract. Which changes should they make? Select three.

Select 3 answers

A.Configure provisioned concurrency on the production Lambda alias during the busy windows.

B.Initialize SDK clients and other reusable objects outside the handler so they are created once per execution environment.

C.Reduce the deployment package size and remove unnecessary layers to shorten function initialization.

D.Replace provisioned concurrency with reserved concurrency because reserved concurrency keeps instances warm.

E.Increase the function timeout so the first request has more time to warm up.

AnswersA, B, C

Correct. Provisioned concurrency keeps a pool of pre-initialized execution environments ready to handle invocations, which directly reduces cold-start latency. Using an alias allows the team to manage production traffic separately from development or canary versions and to schedule capacity for the predictable weekday peak.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, so when traffic spikes at 09:00 UTC, the Lambda function is already warm and can serve requests without cold start latency. This directly addresses the p95 latency jump after deployment without altering the API contract.

Exam trap

The trap here is confusing reserved concurrency (which only limits concurrency) with provisioned concurrency (which pre-warms instances), leading candidates to incorrectly select reserved concurrency as a solution for cold starts.
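
Option B can be illustrated with a minimal Python handler sketch. `make_client` is a hypothetical stand-in for an expensive setup step such as constructing an SDK client, and `INIT_COUNT` exists only to make the behavior observable; neither comes from the question.

```python
import time

INIT_COUNT = 0


def make_client():
    """Hypothetical stand-in for expensive setup, such as creating an SDK client."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


# Module scope: runs once per execution environment, during the init phase.
CLIENT = make_client()


def handler(event, context):
    # Every invocation served by this environment reuses CLIENT
    # instead of rebuilding it inside the handler.
    return {"statusCode": 200, "client_created_at": CLIENT["created_at"]}


# Two invocations in the same environment share one initialization.
first = handler({}, None)
second = handler({}, None)
```

With provisioned concurrency, this module-level work is done before traffic arrives, so the first request does not pay for it.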

664

MCQmedium

A production internal reporting portal runs continuously on EC2 with predictable usage for the next three years. The team wants a discount while retaining some instance-family flexibility. What should they buy? The design must avoid adding custom operational scripts.

A.Spot Instances only

B.Dedicated Instances

C.Compute Savings Plan

D.S3 Intelligent-Tiering

AnswerC

Compute Savings Plans provide discounts for a committed spend while allowing flexibility across instance families, sizes, Regions, and compute services.

Why this answer

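Compute Savings Plans discount EC2 usage by up to 66% off On-Demand in exchange for a 1- or 3-year commitment to a consistent hourly spend, while allowing flexibility across instance families, sizes, operating systems, and Regions. EC2 Instance Savings Plans discount more deeply but tie the commitment to one instance family in one Region, which conflicts with the flexibility requirement. A Compute Savings Plan therefore matches the need for a discount on predictable, continuous usage without locking into a specific instance type, and it requires no custom scripts.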

Exam trap

The trap here is that candidates confuse Savings Plans with Reserved Instances, assuming they require instance-family lock-in, or they incorrectly apply storage services (S3 Intelligent-Tiering) to compute cost optimization.

How to eliminate wrong answers

Option A is wrong because Spot Instances are not suitable for a continuously running production portal; they can be interrupted with a 2-minute warning, making them unreliable for steady-state workloads. Option B is wrong because Dedicated Instances provide physical isolation at a higher cost and do not offer a discount mechanism; they are for compliance or licensing needs, not cost savings. Option D is wrong because S3 Intelligent-Tiering is an object storage class for data with changing access patterns, not applicable to EC2 compute instances.

665

MCQmedium

A website serves mostly cacheable images, CSS, and JavaScript from an ALB. Users in Europe and Asia report slower page loads, and the ALB receives far more requests than expected. The team also wants text assets compressed automatically. Which change is the best first step?

A.Increase the ALB size and add more target instances behind it.

B.Use Route 53 latency-based routing to send users to the nearest ALB.

C.Place Amazon CloudFront in front of the ALB and enable compression and caching.

D.Replace the ALB with an NLB to reduce latency for web requests.

AnswerC

CloudFront is the right choice because it caches static content at edge locations close to users, reducing latency and lowering the number of requests that reach the ALB. It also supports compression for text-based assets such as CSS, JavaScript, and HTML. This improves both performance and origin offload without changing the application logic.

Why this answer

CloudFront is the correct first step because it acts as a CDN that caches cacheable content (images, CSS, JS) at edge locations close to users in Europe and Asia, reducing load on the ALB and improving page load times. It also supports automatic compression of text assets (e.g., via gzip or Brotli) without requiring backend changes, directly addressing the team's requirement for compressed text assets. By offloading requests from the ALB, CloudFront reduces the number of requests hitting the origin, solving the 'far more requests than expected' issue.

Exam trap

The trap here is that candidates often think scaling the ALB (Option A) or using latency-based routing (Option B) will solve performance issues, but they overlook that caching and compression at the edge (CloudFront) directly address both latency and request volume without requiring backend changes.

How to eliminate wrong answers

Option A is wrong because increasing ALB size and adding instances only addresses capacity, not the root causes of high latency for distant users or the unexpectedly high request volume; it does not cache content or compress text assets automatically. Option B is wrong because Route 53 latency-based routing still sends all requests to an ALB, which does not cache content or compress text assets; it also does not reduce the number of requests hitting the ALB, as each user request still reaches the origin. Option D is wrong because replacing the ALB with an NLB does not provide caching, compression, or any application-layer features; NLBs operate at Layer 4 and cannot inspect or cache HTTP content, nor can they compress text assets.

666

MCQmedium

A company uses IAM permission boundaries to prevent developers from escalating privileges. The security team created a permission boundary that allows only read-only actions on most AWS services, but teams can still manage their own resources. A developer can create an IAM role with broad permissions, and the boundary does not appear to be restricting it. Which corrective action best aligns with how permission boundaries work?

A.Rely on an AWS-managed policy attached to the developer’s IAM user; permission boundaries only apply to users.

B.Ensure the role creation process sets the permission boundary on the new role, using the boundary’s ARN in the CreateRole call or role template.

C.Attach the permission boundary policy as an SCP in AWS Organizations so it automatically applies to all roles.

D.Grant the developer IAM permissions to add a “deny” statement to the boundary policy so the boundary blocks escalation.

AnswerB

Permission boundaries are evaluated based on the boundary attached to the principal/role being created or used. If a developer creates roles without specifying the boundary, the boundary won’t restrict the resulting permissions. Enforcing boundary attachment via role templates or required parameters ensures every created role is constrained.

Why this answer

A permission boundary restricts a role only if it is attached to that role, typically at creation through the `PermissionsBoundary` parameter of the `CreateRole` API call or an infrastructure-as-code template. If the request omits the boundary ARN, the new role has no boundary, and the developer can attach broad permissions that exceed the intended restriction. Option B correctly identifies that the role creation process must include the boundary ARN to enforce the limitation.

Exam trap

The trap here is that candidates assume permission boundaries are automatically inherited or enforced globally, when in fact they must be explicitly applied to each role during creation, and SCPs are a separate mechanism that does not replace boundaries.

How to eliminate wrong answers

Option A is wrong because permission boundaries apply to IAM roles and users, not just users; the boundary can be attached to a role to limit its effective permissions. Option C is wrong because SCPs (Service Control Policies) are organization-level guardrails that cap what principals in member accounts can do, but they do not replace the need for a permission boundary on the role itself; SCPs and boundaries work at different layers and are not interchangeable. Option D is wrong because letting developers edit the boundary policy defeats its purpose: whoever can change the boundary can also loosen it. The boundary should stay under the security team's control, and an extra deny statement would not help anyway, because a boundary restricts only the roles it is attached to.
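
One common way to make boundary attachment a required parameter is to deny `iam:CreateRole` unless the request carries the expected boundary, using the `iam:PermissionsBoundary` condition key. The sketch below builds such a statement as a Python dictionary; the account ID, policy name, and the helper `create_role_denied` are hypothetical, and the helper evaluates only this one condition, not the full IAM policy logic.

```python
import json

# Hypothetical ARN; substitute the real account ID and policy name.
BOUNDARY_ARN = "arn:aws:iam::111122223333:policy/DeveloperBoundary"

# Deny iam:CreateRole unless the request attaches the expected boundary.
REQUIRE_BOUNDARY_STATEMENT = {
    "Sid": "DenyCreateRoleWithoutBoundary",
    "Effect": "Deny",
    "Action": "iam:CreateRole",
    "Resource": "*",
    "Condition": {"StringNotEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}},
}


def create_role_denied(request_boundary_arn):
    """Toy evaluation of the single condition above (not the full IAM engine).

    A request with no boundary (None) or a different boundary is denied.
    """
    condition = REQUIRE_BOUNDARY_STATEMENT["Condition"]["StringNotEquals"]
    return request_boundary_arn != condition["iam:PermissionsBoundary"]


policy_document = json.dumps(
    {"Version": "2012-10-17", "Statement": [REQUIRE_BOUNDARY_STATEMENT]},
    indent=2,
)
```

Attached to the developers' identities, a statement like this turns "remember to set the boundary" into a request that fails when the boundary is missing.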

667

MCQeasy

A system processes events from Amazon SQS and sometimes sees duplicate messages due to retries. The business requirement is that each payment must be charged at most once. What design choice best addresses this resiliency requirement?

A.Assume duplicates never occur because the consumer deletes messages immediately after receiving them.

B.Implement idempotent processing using a deduplication key (for example, paymentId) and record completed charges so duplicates are safely ignored.

C.Increase the SQS visibility timeout until duplicates never happen.

D.Use SNS topics instead of SQS so retries are disabled by default.

AnswerB

Idempotency ensures at-most-once side effects even when duplicates are delivered. Persist a record keyed by paymentId (e.g., a unique constraint/conditional write). If the record indicates the payment was already charged, skip the charge for any subsequent duplicate message.

Why this answer

Option B is correct because implementing idempotent processing with a deduplication key (e.g., paymentId) ensures that even if duplicate messages arrive from SQS (due to retries or at-least-once delivery), the consumer can check a record of completed charges and safely ignore duplicates. This satisfies the business requirement of charging each payment at most once without relying on queue-level deduplication or message ordering.

Exam trap

The trap here is that candidates assume SQS guarantees exactly-once delivery or that increasing visibility timeouts can prevent duplicates, but SQS is designed for at-least-once delivery, and the only reliable way to handle duplicates is to make the consumer idempotent.

How to eliminate wrong answers

Option A is wrong because SQS provides at-least-once delivery, and deleting a message immediately after receiving it does not prevent duplicates that arrive before the delete is processed or after a visibility timeout expires; assuming duplicates never occur contradicts that delivery model. Option C is wrong because increasing the visibility timeout cannot eliminate duplicates; it only delays the redelivery of unacknowledged messages, and duplicates can still occur due to network retries, consumer crashes, or SQS's internal replication. Option D is wrong because SNS topics do not disable retries by default; SNS also delivers at least once and retries failed deliveries to HTTP/S endpoints, so switching to SNS does not solve the duplicate problem and may add delivery attempts without any built-in deduplication.
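
The idempotent consumer described in option B can be sketched as follows. `ChargeLedger` is an in-memory stand-in, assumed here for illustration, for a durable store with a unique key on paymentId (for example a database unique constraint or a conditional write); the message shape and the `charge` callback are also assumptions.

```python
class ChargeLedger:
    """In-memory stand-in for a durable store with a unique key on paymentId."""

    def __init__(self):
        self._completed = {}

    def record_if_absent(self, payment_id, amount):
        """Return True only for the first caller; mimics a conditional write."""
        if payment_id in self._completed:
            return False
        self._completed[payment_id] = amount
        return True


def process_message(message, ledger, charge):
    """Charge the payment unless an earlier delivery already claimed it."""
    if not ledger.record_if_absent(message["paymentId"], message["amount"]):
        return "duplicate-ignored"
    charge(message["paymentId"], message["amount"])
    return "charged"


# The same message delivered twice results in a single charge.
charges = []
ledger = ChargeLedger()
message = {"paymentId": "pay-123", "amount": 4200}
results = [
    process_message(message, ledger, lambda pid, amt: charges.append((pid, amt)))
    for _ in range(2)
]
```

Recording the payment before charging is a deliberate choice in this sketch: if the consumer crashes between the two steps, the payment is never charged twice, which is the at-most-once side of the trade-off the question asks for.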

668

MCQmedium

A latency-sensitive API is implemented with AWS Lambda. The team enabled provisioned concurrency to avoid cold starts, setting provisioned concurrency to 50 because marketing campaigns occasionally cause spikes. However, during most weekdays the API receives little traffic (near zero), and the team is seeing high monthly Lambda costs from idle provisioned capacity. What is the best cost-optimized strategy that still meets the requirement of fast initial responses during traffic spikes?

A.Increase provisioned concurrency to 100 so that cold starts never occur, regardless of traffic patterns.

B.Use Application Auto Scaling scheduled actions to increase provisioned concurrency on the Lambda alias before campaign windows and reduce it to a minimal baseline afterward.

C.Turn provisioned concurrency off permanently and rely on retries at the client side to mask cold starts.

D.Replace Lambda with a single always-on EC2 instance sized for peak demand to eliminate cold starts.

AnswerB

Provisioned concurrency is billed while allocated, even when idle. Scheduling higher provisioned concurrency only during known spike windows reduces idle cost while preserving fast startup behavior during campaigns.

Why this answer

Option B is correct because it uses Application Auto Scaling scheduled actions to adjust provisioned concurrency over time, scaling up to 50 before marketing campaigns and reducing it to a minimal baseline (e.g., 1-5) during low-traffic weekdays. This cuts the cost of idle capacity while ensuring fast initial responses during spikes, as provisioned concurrency keeps Lambda environments initialized and ready to handle requests without cold starts.

Exam trap

The trap here is that candidates may assume provisioned concurrency must be set to a static high value to handle spikes, ignoring AWS's native Auto Scaling capabilities that can dynamically adjust capacity based on schedule or metrics, thus missing the cost-optimization aspect of the question.

How to eliminate wrong answers

Option A is wrong because increasing provisioned concurrency to 100 would double the idle capacity cost during low-traffic periods, exacerbating the cost issue without addressing the root problem of over-provisioning. Option C is wrong because turning off provisioned concurrency permanently would bring cold starts back whenever Lambda has to create new execution environments, which is exactly what happens during a traffic spike, violating the latency-sensitive requirement; client-side retries add delay rather than hiding the initialization time of a cold start. Option D is wrong because replacing Lambda with a single always-on EC2 instance sized for peak demand would incur higher costs (24/7 compute) and eliminate the serverless benefits of automatic scaling and pay-per-use, while still risking performance degradation if the single instance is overwhelmed.
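
A scheduled scale-up and scale-down pair might look like the sketch below, which only builds the keyword arguments for Application Auto Scaling's PutScheduledAction operation. The function name, alias, action names, and campaign window are hypothetical, and the helper `scheduled_action` is not part of any AWS SDK.

```python
def scheduled_action(function_name, alias, name, cron, capacity):
    """Build keyword arguments for Application Auto Scaling's PutScheduledAction."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "ScheduledActionName": name,
        "Schedule": f"cron({cron})",
        # Pinning min and max to the same value sets the capacity directly.
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


# Hypothetical function, alias, and campaign window (Mondays 08:00-20:00 UTC).
scale_up = scheduled_action("latency-api", "prod", "campaign-start", "0 8 ? * MON *", 50)
scale_down = scheduled_action("latency-api", "prod", "campaign-end", "0 20 ? * MON *", 1)
```

In a real setup the alias would first have to be registered as a scalable target, and each dictionary would be passed to the SDK call, for example `put_scheduled_action(**scale_up)` on a boto3 `application-autoscaling` client.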

669

MCQeasy

A company hosts a web application on Amazon EC2 instances in an Auto Scaling group behind an Application Load Balancer (ALB). The ALB and the Auto Scaling group are currently deployed in only one Availability Zone (AZ). The business wants the application to keep running if that AZ has an outage. What is the best change?

A.Increase the desired capacity in the existing Availability Zone to handle all traffic during an outage.

B.Deploy the ALB and the Auto Scaling group across at least two Availability Zones so healthy targets remain.

C.Enable longer ALB health check intervals so failing instances are detected more slowly.

D.Switch from the ALB to an Internet Gateway so instances can fail over to the public internet.

AnswerB

To tolerate an AZ outage, both the load-balancing entry point (the ALB) and the compute capacity (the Auto Scaling instances) must be available in more than one AZ. With the ALB in multiple AZs and the Auto Scaling group using multiple subnets/AZs, requests can be routed to healthy targets in a remaining AZ while Auto Scaling replaces unhealthy instances.

Why this answer

Deploying the ALB and Auto Scaling group across at least two Availability Zones (AZs) ensures that if one AZ fails, the ALB can route traffic to healthy EC2 instances in the remaining AZ(s). This is the fundamental AWS best practice for high availability: an ALB is a regional service that requires targets in multiple AZs to survive an AZ outage, and the Auto Scaling group must also span those AZs to maintain capacity. Without multi-AZ deployment, a single AZ failure makes the entire application unavailable regardless of instance health checks.

Exam trap

The trap here is that candidates think increasing capacity or adjusting health check intervals can compensate for a single-AZ deployment, but AWS high availability fundamentally requires distributing resources across multiple isolated failure domains (AZs).

How to eliminate wrong answers

Option A is wrong because increasing the desired capacity in a single AZ does not protect against an AZ outage; all instances are in the same failure domain, so they all become unreachable simultaneously. Option C is wrong because enabling longer health check intervals would delay detection of failing instances, making the application less responsive to failures and increasing downtime, not improving availability. Option D is wrong because an Internet Gateway (IGW) is a VPC component that provides connectivity between a VPC and the internet, not a load balancer; it cannot perform health checks, distribute traffic, or fail over traffic between instances, and it does not replace the ALB's role in high availability.

670

Multi-Selectmedium

A containerized service on Amazon ECS connects to a database with a password that must never be stored in plaintext or hardcoded in the image. The application reads the password at startup and occasionally reconnects later, so it needs to retrieve the current secret when needed. Which three actions should the architect take? Select three.

Select 3 answers

A.Store the database password in AWS Secrets Manager.

B.Have the application retrieve the secret from Secrets Manager at runtime when it needs the password.

C.Grant the ECS task role least-privilege permission to read only that secret.

D.Store the password in a plain environment variable and update it manually during maintenance windows.

E.Use an IAM user access key inside the container so the database password can be embedded in code.

AnswersA, B, C

Secrets Manager is designed for sensitive credentials and integrates with IAM and rotation features. It is a better fit than putting passwords in code, images, or plain variables.

Why this answer

AWS Secrets Manager is the correct service for storing sensitive data like database passwords because it provides encryption at rest (using AWS KMS) and automatic rotation capabilities. By storing the password in Secrets Manager, the architect ensures it is never exposed in plaintext or hardcoded in the container image, meeting the security requirement.

Exam trap

The trap here is that candidates might think environment variables or IAM access keys are acceptable for secrets, but the exam requires using a dedicated secrets management service like Secrets Manager to avoid plaintext exposure and enable rotation.
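
Option B, retrieving the secret at runtime, is sketched below with a small cache so that reconnects pick up a rotated password after a short delay. `SecretCache` and its parameters are hypothetical; `fetch` is an injected callable that, in a real task, could wrap the Secrets Manager GetSecretValue call (in boto3, `get_secret_value(SecretId=...)["SecretString"]`) made with the ECS task role's credentials.

```python
import time


class SecretCache:
    """Fetch a secret on demand and reuse it for a short time-to-live."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # callable: secret_id -> secret string
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = {}        # secret_id -> (value, fetched_at)

    def get(self, secret_id):
        entry = self._cached.get(secret_id)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]
        # Cache miss or expired entry: ask the secrets store again.
        value = self._fetch(secret_id)
        self._cached[secret_id] = (value, self._clock())
        return value
```

The password lives only in process memory, never in the image or a plain environment variable, and the time-to-live bounds how long a stale value can be used after rotation.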

671

MCQmedium

A telemetry pipeline uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered? The design must avoid adding custom operational scripts.

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.CloudFront only with long TTLs

D.AWS Backup cross-Region copy

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator uses the AWS global network to route traffic from global users to the Application Load Balancer, reducing latency and jitter by leveraging Anycast IP addresses and edge locations. It does not cache responses, making it ideal for dynamic content where low latency is required without custom operational scripts.

Exam trap

The trap here is that candidates often confuse CloudFront with Global Accelerator. CloudFront can proxy dynamic content when caching is disabled, but option C as written depends on long TTLs, which means caching responses at edge locations and conflicts with the requirement not to cache dynamic responses. Global Accelerator improves the network path to the ALB without caching anything.

How to eliminate wrong answers

Option B (S3 Cross-Region Replication) is wrong because it replicates objects across S3 buckets in different Regions, which does not reduce network latency for dynamic application traffic and is unrelated to ALB routing. Option C (CloudFront only with long TTLs) is wrong because CloudFront caches responses at edge locations, and long TTLs would serve stale dynamic content, contradicting the requirement to avoid caching dynamic responses. Option D (AWS Backup cross-Region copy) is wrong because it is a backup and disaster recovery service that copies backup data across Regions, not a solution for reducing network latency to an application endpoint.

672

Multi-Selectmedium

A containerized service runs in private subnets and retrieves secrets from AWS Secrets Manager and configuration parameters from AWS Systems Manager Parameter Store on startup. A NAT Gateway is currently used only for these AWS API calls, and the security team wants to eliminate that recurring charge. Which two endpoints should be added? Select two.

Select 2 answers

A.Create an interface VPC endpoint for AWS Secrets Manager.

B.Create an interface VPC endpoint for AWS Systems Manager.

C.Create a gateway VPC endpoint for Amazon S3 instead.

D.Add an Internet Gateway and send the traffic through public subnets.

E.Replace the NAT Gateway with a NAT instance.

AnswersA, B

Secrets Manager uses an interface endpoint in a private-subnet design. That keeps startup traffic off the NAT Gateway while still letting the service retrieve secrets privately over the AWS network.

Why this answer

Option A is correct because AWS Secrets Manager is accessed via API calls over HTTPS, and an interface VPC endpoint (powered by AWS PrivateLink) allows private connectivity to the service without traversing the internet or a NAT Gateway. This eliminates the need for the NAT Gateway for Secrets Manager traffic, reducing costs and improving security by keeping traffic within the AWS network.

Exam trap

The trap here is that candidates often confuse gateway VPC endpoints (for S3/DynamoDB) with interface VPC endpoints (for most other AWS services), leading them to incorrectly select option C instead of recognizing that both Secrets Manager and Systems Manager require interface endpoints.
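
The two endpoints can be sketched as request parameters for the EC2 CreateVpcEndpoint operation. The helper `interface_endpoint_params` and the VPC, subnet, and security group IDs are hypothetical placeholders; the service names follow the `com.amazonaws.<region>.<service>` pattern, with `ssm` covering Parameter Store.

```python
def interface_endpoint_params(region, service, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for the EC2 CreateVpcEndpoint API."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets existing SDK calls resolve to the endpoint unchanged.
        "PrivateDnsEnabled": True,
    }


# Hypothetical VPC, subnet, and security group IDs.
endpoints = [
    interface_endpoint_params(
        "us-east-1", service, "vpc-0example", ["subnet-0a", "subnet-0b"], ["sg-0example"]
    )
    for service in ("secretsmanager", "ssm")
]
```

Each dictionary could be passed to an SDK call such as boto3's `ec2.create_vpc_endpoint(**params)`; the security group would need to allow HTTPS from the service's subnets.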

673

MCQmedium

Account A has an IAM role named FinanceDataRole that is assumed by a principal in Account B. The role’s trust policy includes a condition requiring sts:ExternalId to equal "Fin-2026-Q2". A developer in Account B calls AssumeRole but receives an error: AccessDenied: ExternalId mismatch. The security team requires that you do not remove the ExternalId condition. What is the correct remediation?

A.Add kms:Decrypt to the developer’s IAM policy so KMS can validate the ExternalId during AssumeRole.

B.Update the AssumeRole call in Account B to include sts:ExternalId="Fin-2026-Q2" exactly as required.

C.Increase the role’s MaxSessionDuration to reduce authentication failures.

D.Remove the ExternalId condition from the trust policy to allow all AssumeRole requests.

AnswerB

The trust policy explicitly checks the ExternalId value provided in the AssumeRole request. Supplying the exact required ExternalId satisfies the condition and allows STS to issue credentials for the role.

Why this answer

The error 'AccessDenied: ExternalId mismatch' occurs because the AssumeRole API call from Account B does not supply an ExternalId that matches the trust policy, either because the parameter is omitted or because its value differs. The trust policy on the FinanceDataRole uses the sts:ExternalId condition key to require the value 'Fin-2026-Q2' as a security measure against the confused deputy problem. Option B is correct because the developer must pass the exact ExternalId value in the AssumeRole request to satisfy the condition and successfully assume the role.

Exam trap

The trap here is that candidates may think the ExternalId is automatically passed or that the error is due to permission issues (like KMS or session duration), when in fact the developer must explicitly include the correct ExternalId in the AssumeRole API call.

How to eliminate wrong answers

Option A is wrong because KMS is not involved in validating ExternalId during AssumeRole; the ExternalId check is performed by the AWS STS service based on the role's trust policy, not by KMS. Option C is wrong because MaxSessionDuration controls the maximum session length for an assumed role, not authentication failures related to ExternalId mismatches. Option D is wrong because the security team explicitly requires that the ExternalId condition not be removed, and removing it would weaken security by eliminating the confused deputy protection.
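
The trust-policy check can be sketched as a toy comparison. The principal ARN is a hypothetical placeholder, and `external_id_matches` evaluates only the `StringEquals` condition, not the full IAM evaluation; the ExternalId value is the one given in the question.

```python
# Hypothetical principal ARN; the ExternalId value comes from the question.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222233334444:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "Fin-2026-Q2"}},
        }
    ],
}


def external_id_matches(policy, supplied_external_id):
    """Toy check of the StringEquals condition only; not the full IAM evaluation."""
    condition = policy["Statement"][0]["Condition"]["StringEquals"]
    return supplied_external_id == condition["sts:ExternalId"]
```

The comparison is exact and case-sensitive, so a missing value or a variant such as `fin-2026-q2` fails. On the caller's side, the value is supplied through the `ExternalId` parameter of the AssumeRole request.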

674

MCQeasy

A company has a primary application in us-east-1 and a standby environment in us-west-2. Users should go to the primary site while it is healthy and automatically switch to the standby site if the primary fails. Which Route 53 routing policy should they use?

A.Weighted routing

B.Failover routing with health checks

C.Geolocation routing

D.Latency-based routing

AnswerB

Route 53 failover routing is designed for active-passive resilience. You define a primary record and a secondary record, then attach health checks so DNS answers shift to the standby when the primary is unhealthy. This provides a simple disaster recovery pattern for user-facing endpoints without requiring application-level traffic management.

Why this answer

Failover routing with health checks is the correct choice because it allows you to configure an active-passive failover pattern where Route 53 directs traffic to the primary resource (us-east-1) as long as it passes a health check. If the health check fails, Route 53 automatically routes traffic to the secondary resource (us-west-2), ensuring high availability without manual intervention.

Exam trap

The trap here is that candidates often confuse failover routing with latency-based routing, thinking that latency routing will automatically switch to a healthy region, but latency routing only optimizes for speed and does not consider health status unless combined with health checks, which is not its primary purpose.

How to eliminate wrong answers

Option A is wrong because weighted routing distributes traffic across multiple resources based on assigned weights, which is used for load balancing or testing, not for automatic failover when a primary site fails. Option C is wrong because geolocation routing directs traffic based on the geographic location of the user, which is designed for content localization or regional restrictions, not for active-passive failover. Option D is wrong because latency-based routing sends traffic to the resource with the lowest latency for the user, which optimizes performance but does not provide automatic failover when a primary resource becomes unhealthy.
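
The active-passive behavior can be modeled in a few lines. The record values, set identifiers, and health check ID below are hypothetical, the record shape is simplified from a real Route 53 record set, and `resolve` is a toy model of the failover decision rather than Route 53's actual resolver logic.

```python
# Hypothetical, simplified failover records for one DNS name.
RECORDS = [
    {
        "Name": "app.example.com",
        "SetIdentifier": "primary-us-east-1",
        "Failover": "PRIMARY",
        "HealthCheckId": "hc-primary-example",
        "Value": "primary.us-east-1.example.com",
    },
    {
        "Name": "app.example.com",
        "SetIdentifier": "standby-us-west-2",
        "Failover": "SECONDARY",
        "Value": "standby.us-west-2.example.com",
    },
]


def resolve(records, healthy_check_ids):
    """Answer with the primary while its health check passes, else the secondary."""
    primary = next(r for r in records if r["Failover"] == "PRIMARY")
    secondary = next(r for r in records if r["Failover"] == "SECONDARY")
    if primary["HealthCheckId"] in healthy_check_ids:
        return primary["Value"]
    return secondary["Value"]
```

Weighted, geolocation, and latency-based records have no primary/secondary role, which is why they do not express this pattern on their own.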

675

MCQhard

Based on the exhibit, an application runs on Amazon Aurora MySQL. The writer instance is frequently near 85% CPU while the reader instance is under 20% CPU. Application traces show that most of the database traffic is read-only SELECT queries, but the code currently sends all queries to the writer endpoint. What should the solutions architect recommend to improve performance with the smallest functional change?

A.Increase the writer instance size and keep all traffic on the writer endpoint.

B.Point read-only database traffic to the Aurora reader endpoint and keep writes on the writer endpoint.

C.Convert the cluster to a Multi-AZ RDS PostgreSQL deployment to get automatic failover and better read performance.

D.Enable cross-Region read replicas so SELECT queries are routed to a remote Region for improved performance.

AnswerB

This directly uses the cluster’s read scale-out capability. The reader endpoint distributes read traffic across replicas, reducing load on the writer and increasing read throughput without changing schema or database engine.

Why this answer

Option B is correct because the Aurora reader endpoint distributes read-only traffic across all available reader instances, offloading the writer instance and reducing its CPU utilization. Since the application traces show most traffic is read-only SELECT queries, this change requires only modifying the connection string for reads while keeping writes on the writer endpoint, making it the smallest functional change.

Exam trap

The trap here is that candidates may think increasing instance size (Option A) is the simplest fix, but they overlook the fact that Aurora's architecture is designed to offload reads to reader instances, which is a more cost-effective and scalable solution with minimal code change.

How to eliminate wrong answers

Option A is wrong because increasing the writer instance size does not address the root cause—the writer is overloaded with read traffic that could be handled by readers—and it incurs higher cost without leveraging Aurora's built-in read scaling. Option C is wrong because converting to RDS PostgreSQL Multi-AZ does not provide the same read scaling as Aurora readers; Multi-AZ only provides a standby for failover, not active read offloading, and it requires a full migration. Option D is wrong because cross-Region read replicas introduce significant latency for read queries and are intended for disaster recovery or global read scaling, not for reducing CPU on the local writer instance.
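
The connection-string change in option B can be sketched as a small routing helper. The endpoint hostnames are hypothetical placeholders, and `endpoint_for` is an assumed helper that uses a deliberately simple rule (statement prefix), not a full SQL parser.

```python
# Hypothetical endpoint hostnames; copy the real ones from the cluster's details.
WRITER_ENDPOINT = "mycluster.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com"


def endpoint_for(sql, in_transaction=False):
    """Send plain SELECT statements to the reader endpoint, everything else to the writer."""
    statement = sql.lstrip().upper()
    is_plain_select = statement.startswith("SELECT") and "FOR UPDATE" not in statement
    if is_plain_select and not in_transaction:
        return READER_ENDPOINT
    # Writes, locking reads, and anything inside a transaction stay on the writer.
    return WRITER_ENDPOINT
```

Reads that must see a write made a moment earlier may still need the writer endpoint, since replicas can lag slightly behind it.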
