SAA-C03 Questions 376–450

376

Multi-Select hard

An internal reporting portal has old unattached EBS volumes and many stale snapshots. Which two actions reduce storage cost without affecting running instances? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A. Disable CloudTrail logging

B. Stop all EC2 instances in the account

C. Delete unattached EBS volumes after verifying they are no longer needed

D. Apply snapshot lifecycle policies to expire obsolete snapshots

Answers: C, D

Unattached volumes continue to incur charges until deleted.

Why this answer

Option C is correct because unattached EBS volumes incur storage costs without providing any benefit to running instances. Deleting them after verification directly reduces costs while having zero impact on running workloads. Option D is correct because snapshot lifecycle policies automate the deletion of obsolete snapshots based on age or count, eliminating manual cleanup and reducing storage costs without affecting running instances.
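
The verification step in option C can be sketched as a simple filter. This is a minimal illustration, assuming volume records shaped like the `Volumes` entries that EC2's DescribeVolumes API returns, where a `State` of `available` means the volume is not attached. The sample records, the 30-day threshold, and the helper name `find_unattached_volumes` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def find_unattached_volumes(volumes, min_age_days, now):
    """Return IDs of volumes that are unattached and older than min_age_days."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        # "available" is the EBS state for a volume with no attachment.
        if v["State"] == "available" and v["CreateTime"] < cutoff
    ]

# Hypothetical sample records, not real resources.
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-aaa", "State": "in-use", "CreateTime": NOW - timedelta(days=200)},
    {"VolumeId": "vol-bbb", "State": "available", "CreateTime": NOW - timedelta(days=90)},
    {"VolumeId": "vol-ccc", "State": "available", "CreateTime": NOW - timedelta(days=2)},
]

candidates = find_unattached_volumes(SAMPLE_VOLUMES, min_age_days=30, now=NOW)
print(candidates)
```

Only the old, unattached volume is flagged; the in-use volume and the recently created one are left alone, which mirrors the "verify before deleting" wording of the option.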

Exam trap

The trap here is that candidates may confuse stopping instances (which stops billing for instance hours but not for EBS storage) with a cost-saving measure, or think disabling logging reduces storage costs, when the actual savings come from removing orphaned storage resources.

377

MCQ hard

Based on the exhibit, the company stores application logs in Amazon S3 for 400 days. The logs are read heavily for the first 30 days, occasionally for the next 90 days, and very rarely after that. Retrieval after day 120 can take up to several hours, but the data must remain available until day 400. Which lifecycle policy is the most cost-effective fit?

A. Keep all logs in S3 Standard for 400 days and enable requester pays to reduce the company's bill.

B. Transition logs to S3 Standard-IA after 30 days, then to S3 Glacier Flexible Retrieval after 120 days, and expire them at 400 days.

C. Transition logs directly from S3 Standard to S3 Glacier Deep Archive after 30 days and expire them at 400 days.

D. Move logs to S3 Intelligent-Tiering only and disable lifecycle transitions because access is unpredictable.

Answer: B

This follows the access pattern and the retrieval-time requirement. S3 Standard fits the heavy-read period in the first 30 days. Standard-IA is a lower-cost choice for the next 90 days when access is only occasional, and Glacier Flexible Retrieval is appropriate after day 120 because the logs are rarely read and can tolerate retrieval in hours. Expiration at day 400 satisfies the retention requirement exactly.

Why this answer

Option B is correct because it aligns the storage class transitions with the access patterns: S3 Standard for the first 30 days (heavy reads), S3 Standard-IA for the next 90 days (occasional reads), and S3 Glacier Flexible Retrieval for the remaining period (rare access, with retrieval up to several hours acceptable). This minimizes storage costs while ensuring data availability until day 400, where lifecycle expiration removes the objects.
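
The policy in option B can be sketched as a lifecycle rule. The dictionary below follows the shape of an S3 lifecycle configuration (`Rules`, `Transitions`, `Expiration`), where `GLACIER` is the storage-class value for S3 Glacier Flexible Retrieval. The rule ID, the `logs/` prefix, and the helper `storage_class_at` are hypothetical and only illustrate how an object's age maps to a tier.

```python
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "app-logs-tiering",       # hypothetical rule name
            "Filter": {"Prefix": "logs/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 120, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 400},
        }
    ]
}

def storage_class_at(age_days, rule):
    """Return the storage class for an object of this age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current

rule = LIFECYCLE_CONFIG["Rules"][0]
for age in (10, 45, 200, 400):
    print(age, storage_class_at(age, rule))
```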

Exam trap

The trap here is that candidates may choose Option C (S3 Glacier Deep Archive) because it is the cheapest storage class, but they overlook the occasional access requirement between days 30 and 120 and the retrieval time constraints, which make S3 Glacier Flexible Retrieval the correct choice for the final tier.

How to eliminate wrong answers

Option A is wrong because keeping all logs in S3 Standard for 400 days is the most expensive option, and enabling Requester Pays does not reduce the company's storage bill; it only shifts request and data transfer charges to the requester, which is irrelevant here because the company reads its own data. Option C is wrong because transitioning directly from S3 Standard to S3 Glacier Deep Archive after 30 days ignores the occasional access between days 30 and 120; Deep Archive retrievals take up to 12 hours (standard) or 48 hours (bulk), which does not suit occasionally accessed data, and the class carries a 180-day minimum storage duration charge. Option D is wrong because S3 Intelligent-Tiering is designed for unpredictable access patterns, but here the access pattern is predictable (heavy, occasional, rare), and disabling lifecycle transitions gives up the explicit tiering that this known pattern allows, leading to higher costs than a tailored lifecycle policy.

378

MCQ easy

A Lambda function processes CPU-heavy JSON transformations and often runs slower than expected. The team wants to improve performance without changing the code. What should they try first?

A. Increase the Lambda memory setting

B. Move the function to Amazon S3

C. Change the function to an ALB target

D. Disable CloudWatch logging

Answer: A

Increasing memory also gives Lambda more CPU, which often improves performance for CPU-bound functions.

Why this answer

Increasing the Lambda memory setting also increases CPU power, because AWS Lambda allocates CPU in proportion to configured memory (up to 10,240 MB). For CPU-heavy JSON transformations, this directly reduces execution time without any code changes, making it the simplest and most effective first step.
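
A rough cost sketch shows why this is a cheap first experiment. Lambda's duration charge is billed in GB-seconds (memory multiplied by run time), so if a CPU-bound function finishes proportionally faster with more memory, the duration cost stays about the same while latency drops. The price constant and the run times below are hypothetical, the "double memory, half duration" scaling is an idealized assumption, and the per-request charge is ignored.

```python
def invocation_cost(memory_mb, duration_s, price_per_gb_second):
    """Duration cost of one invocation under GB-second billing."""
    return (memory_mb / 1024) * duration_s * price_per_gb_second

PRICE = 0.0000167  # hypothetical per-GB-second rate, for illustration only

baseline = invocation_cost(1024, 8.0, PRICE)
# Idealized assumption: doubling memory doubles CPU and halves a CPU-bound run.
doubled = invocation_cost(2048, 4.0, PRICE)

print(f"baseline: {baseline:.7f}  doubled memory: {doubled:.7f}")
```

In practice the speed-up is rarely perfectly linear, so measuring a few memory settings is the safer way to pick one.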

Exam trap

The trap here is that candidates assume performance issues must be solved by code optimization or architectural changes, overlooking that Lambda's memory setting directly controls CPU power, making it the simplest fix for CPU-bound functions.

How to eliminate wrong answers

Option B is wrong because Amazon S3 is an object storage service, not a compute environment; moving a Lambda function to S3 is impossible and reflects a misunderstanding of service boundaries. Option C is wrong because changing the function to an ALB target does not alter its CPU or memory allocation; ALB merely routes HTTP requests to the Lambda, offering no performance improvement for CPU-bound tasks. Option D is wrong because disabling CloudWatch logging does not free up CPU resources; Lambda execution and logging are asynchronous, and logging overhead is negligible compared to CPU-intensive processing.

379

MCQ medium

Company A stores encrypted log files in its S3 bucket using SSE-KMS with a customer-managed KMS key. A partner application in Company B uploads objects into Company A's bucket using an IAM role in Company B. Uploads fail with an error indicating KMS access is denied (kms:Encrypt not authorized). Neither the partner IAM policy nor the S3 bucket policy currently mentions KMS. What is the most secure and correct change to allow cross-account uploads to succeed?

A. In Company A's KMS key policy, allow Company B's partner role principal to use the key for kms:Encrypt, kms:GenerateDataKey, and kms:DescribeKey, and also add a matching IAM policy in Company B that grants the partner role those same KMS actions on Company A's key ARN, constrained to the target S3 bucket context when possible.

B. In Company B's IAM policy, allow kms:Encrypt on Company A's KMS key ARN, without changing Company A's key policy.

C. Create a new KMS key in Company B and configure Company A's S3 bucket to use that key for SSE-KMS.

D. Disable key policy restrictions by setting the KMS key to enabled and removing all policy statements so that encryption automatically works for any principal.

Answer: A

Cross-account SSE-KMS requires both the KMS key policy in the key owner account and an IAM policy in the caller account to allow the required KMS actions. Scoping the permissions to the specific bucket or encryption context reduces blast radius.

Why this answer

Option A is correct because cross-account SSE-KMS uploads require both the KMS key policy in Company A to explicitly grant the partner role principal the necessary KMS actions (kms:Encrypt, kms:GenerateDataKey, kms:DescribeKey) and an IAM policy in Company B that allows the partner role to call those actions on Company A's key ARN. The bucket policy alone cannot authorize KMS operations; KMS key policies act as the primary access control for customer-managed keys, and without the key policy grant, the partner role's IAM permissions are insufficient. Constraining the IAM policy to the target S3 bucket context (using kms:ViaService or kms:EncryptionContext conditions) adds a security best practice by limiting the key's use to only that specific S3 bucket.
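
The two halves of option A can be sketched as a pair of policy documents. This is an illustrative outline only: the account IDs, key ID, role name, and Region are hypothetical placeholders, and the `kms:ViaService` condition is one possible way to scope the grant to requests that arrive through S3.

```python
import json

KEY_ARN = "arn:aws:kms:us-east-1:111111111111:key/EXAMPLE-KEY-ID"   # hypothetical
PARTNER_ROLE_ARN = "arn:aws:iam::222222222222:role/partner-upload"  # hypothetical
KMS_ACTIONS = ["kms:Encrypt", "kms:GenerateDataKey", "kms:DescribeKey"]
VIA_S3 = {"StringEquals": {"kms:ViaService": "s3.us-east-1.amazonaws.com"}}

# Statement added to Company A's key policy (key owner side).
key_policy_statement = {
    "Sid": "AllowPartnerRoleUseOfKey",
    "Effect": "Allow",
    "Principal": {"AWS": PARTNER_ROLE_ARN},
    "Action": KMS_ACTIONS,
    "Resource": "*",  # inside a key policy, "*" refers to this key
    "Condition": VIA_S3,
}

# Identity policy attached to the partner role in Company B (caller side).
partner_iam_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": KMS_ACTIONS,
            "Resource": KEY_ARN,
            "Condition": VIA_S3,
        }
    ],
}

print(json.dumps(key_policy_statement, indent=2))
print(json.dumps(partner_iam_policy, indent=2))
```

Both documents name the same actions; removing either one reproduces the access-denied failure described in the question.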

Exam trap

The trap here is that candidates assume an IAM policy in the partner account is sufficient for cross-account KMS access, overlooking that KMS key policies are the mandatory gatekeeper for external principals, and that the key policy must explicitly grant the external role.

How to eliminate wrong answers

Option B is wrong because KMS key policies are the authoritative access control for customer-managed keys; without a key policy grant to Company B's role, Company B's IAM policy alone cannot authorize KMS actions on Company A's key, even if the IAM policy allows it. Option C is wrong because S3 bucket SSE-KMS encryption is configured per object or bucket default; Company A's bucket is already set to use Company A's KMS key, and using a different key from Company B would break the existing encryption setup and likely cause decryption failures for Company A. Option D is wrong because removing all policy statements from a KMS key effectively denies all principals (including the key's AWS account) from using the key, making encryption impossible for anyone; KMS requires an explicit allow in the key policy for any principal to use the key.

380

MCQ medium

A test environment has EC2 instances that are oversized based on CPU, memory, and network utilisation. Which AWS service should identify rightsizing recommendations?

A. AWS DataSync

B. AWS Shield

C. AWS Artifact

D. AWS Compute Optimizer

Answer: D

Compute Optimizer analyses utilisation metrics and recommends rightsizing for supported resources.

Why this answer

AWS Compute Optimizer uses machine learning to analyze historical utilization metrics (CPU, memory, network, and storage) and provides rightsizing recommendations for EC2 instances, including over-provisioned resources. It helps reduce costs by suggesting instance types or sizes that better match actual workload demands.

Exam trap

The trap here is that candidates may confuse AWS Compute Optimizer with AWS Trusted Advisor, but Trusted Advisor provides general cost optimization checks (e.g., idle instances) while Compute Optimizer delivers granular, ML-driven rightsizing recommendations for specific instance types.

How to eliminate wrong answers

Option A is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services, not a resource optimization tool. Option B is wrong because AWS Shield is a managed DDoS protection service that safeguards applications from distributed denial-of-service attacks, not a rightsizing advisor. Option C is wrong because AWS Artifact is a self-service portal for accessing AWS compliance reports and agreements, such as SOC and PCI reports, and does not provide compute optimization recommendations.

381

Multi-Select hard

An e-learning platform uses CloudFront in front of an S3 origin. Which two settings help keep users from bypassing CloudFront and accessing the bucket directly?

Select 2 answers

A. Enable CloudFront standard logging

B. Configure Origin Access Control for the S3 origin

C. Enable S3 static website hosting

D. Use an S3 bucket policy that allows access only from the CloudFront distribution

Answers: B, D

Origin Access Control allows CloudFront to securely access a private S3 bucket.

Why this answer

Origin Access Control (OAC) is the recommended way to restrict an S3 bucket so that it accepts requests only from a specific CloudFront distribution. With OAC configured, CloudFront signs its requests to S3, and a bucket policy that allows access only from that distribution (option D) keeps the bucket private, so direct requests that bypass CloudFront are denied.
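
The bucket policy that pairs with OAC can be sketched as follows. It grants `s3:GetObject` to the CloudFront service principal only when the request comes from one named distribution. The bucket name, account ID, and distribution ID are hypothetical, and the helper `oac_bucket_policy` is just a convenience for this illustration.

```python
import json

def oac_bucket_policy(bucket_name, distribution_arn):
    """Build a bucket policy that admits reads only from one CloudFront distribution."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Only requests made on behalf of this distribution match.
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }

policy = oac_bucket_policy(
    "example-course-assets",                                        # hypothetical bucket
    "arn:aws:cloudfront::111111111111:distribution/EXAMPLEDISTID",  # hypothetical ARN
)
print(json.dumps(policy, indent=2))
```

Because the bucket has no other allow statement and stays private, a request sent straight to the S3 endpoint matches nothing and is denied by default.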

Exam trap

The trap here is that candidates often think enabling logging or static website hosting somehow restricts access, when in fact only a properly configured bucket policy combined with OAC (or OAI) can enforce that all traffic goes through CloudFront.

382

Multi-Select medium

A customer portal must recover from a regional outage within a few hours. The business wants lower ongoing cost than a fully active second Region and does not want to rebuild everything from scratch during the outage. Which two DR patterns best fit that goal? Select two.

Select 2 answers

A. Backup and restore

B. Pilot light

C. Warm standby

D. Multi-site active-active

E. Single-AZ deployment

Answers: B, C

Pilot light keeps only core components running in the secondary Region, which lowers cost while reducing recovery time.

Why this answer

Pilot light is correct because it keeps only the core of the workload (e.g., continuously replicated data and a minimal set of provisioned infrastructure) in the secondary Region. During a regional outage, you scale the environment up by provisioning the remaining resources (e.g., EC2 instances from pre-baked AMIs), meeting a recovery time objective (RTO) of a few hours while keeping ongoing costs lower than a fully active second Region. Warm standby is correct for a similar reason: it runs a scaled-down but functional copy of the workload in the secondary Region, which costs more than pilot light but still less than active-active, and nothing has to be rebuilt from scratch.

Exam trap

AWS often tests the distinction between pilot light and warm standby by making candidates confuse the minimal 'pilot light' core with a fully scaled 'warm standby' environment, or by assuming that backup and restore can meet a few-hour RTO when it typically cannot due to provisioning and data restoration latency.

383

MCQ medium

A legacy market-data service runs on EC2 and exposes a custom TCP protocol. Clients must connect over TCP with very low latency, and the team wants static IP addresses at the load-balancing layer. Which AWS service is the best fit?

A. Application Load Balancer, because it provides advanced routing for all protocols.

B. Network Load Balancer, because it supports TCP, static IPs, and very low latency.

C. Amazon API Gateway, because it can front any network protocol with throttling.

D. Amazon CloudFront, because it can route traffic to EC2 instances at the edge.

Answer: B

A Network Load Balancer is the best fit for a custom TCP service that needs extremely low latency and static IP addresses. NLB operates at Layer 4, preserves high throughput, and is commonly used when protocol simplicity and performance matter more than application-layer routing features. It matches the workload's network requirements without adding unnecessary HTTP-specific behavior.

Why this answer

The Network Load Balancer (NLB) operates at Layer 4, supports TCP traffic natively, provides static IP addresses per Availability Zone, and delivers very low latency by processing packets without inspecting application-layer headers. This makes it the ideal choice for a legacy market-data service that requires a custom TCP protocol and fixed IPs at the load-balancing layer.

Exam trap

The trap here is that candidates often confuse the ALB's 'advanced routing' capabilities with support for all protocols, but ALB is strictly Layer 7 and cannot handle raw TCP or custom protocols, making NLB the only correct choice for TCP with static IPs and low latency.

How to eliminate wrong answers

Option A is wrong because the Application Load Balancer (ALB) operates at Layer 7 (HTTP/HTTPS/gRPC) and cannot handle raw TCP or custom TCP protocols; it would terminate the TCP connection and require HTTP-level routing. Option C is wrong because Amazon API Gateway only supports REST, HTTP, and WebSocket APIs, not arbitrary TCP protocols. Option D is wrong because Amazon CloudFront is a content delivery network (CDN) that works over HTTP/HTTPS and cannot forward raw TCP traffic or provide static IP addresses at the load-balancing layer.

384

Multi-Select hard

A company processes product-image uploads in bursts. Each transform takes up to ten minutes, and every job can be retried safely from the beginning. The current EC2 worker fleet is idle most of the day. Which two changes most reduce cost and idle capacity? Select two.

Select 2 answers

A. Buffer jobs in Amazon SQS and let workers scale from queue depth.

B. Run the workers on AWS Fargate Spot, since interruptions are acceptable.

C. Keep a fixed fleet of m6i.large instances in an Auto Scaling group with a higher minimum.

D. Use Reserved Instances for the workers even though demand is highly bursty.

E. Process uploads only during a nightly window so the fleet looks busier.

Answers: A, B

SQS decouples uploads from processing and smooths bursty demand. Queue depth is a practical scaling signal, so the company avoids paying for idle workers while still absorbing traffic spikes.

Why this answer

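Option A is correct because Amazon SQS decouples the bursty upload workload from the worker fleet. By using SQS queue depth as the metric for an Auto Scaling policy, workers scale up only when jobs are waiting and scale down to zero during idle periods, eliminating wasted capacity. This directly reduces cost by matching compute resources to actual demand. Option B is correct because every job can be retried safely from the beginning, so the workers tolerate Fargate Spot interruptions in exchange for a lower price than regular Fargate capacity.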
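
The queue-depth scaling idea can be sketched as a small calculation: divide the visible backlog by the number of jobs each worker is expected to absorb, then clamp the result. The function name, the per-worker target of 5 jobs, and the cap of 40 workers are hypothetical values chosen for illustration, not settings from the question.

```python
import math

def desired_workers(queue_depth, jobs_per_worker, max_workers):
    """Workers needed so that each one handles at most jobs_per_worker queued jobs."""
    if queue_depth <= 0:
        return 0  # nothing queued: scale in to zero
    return min(max_workers, math.ceil(queue_depth / jobs_per_worker))

for depth in (0, 3, 47, 1000):
    print(depth, desired_workers(depth, jobs_per_worker=5, max_workers=40))
```

An empty queue yields zero workers, which is where the idle-capacity saving comes from, while a large burst is capped at the configured maximum.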

Exam trap

The trap here is that candidates may think a fixed fleet or Reserved Instances are cheaper for predictable workloads, but they overlook that bursty, idle-heavy patterns require elastic scaling and spot pricing to truly minimize cost.

385

MCQ hard

A claims workflow uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The team wants the control to be enforceable during normal operations.

A. A FIFO queue without a redrive policy

B. Short polling instead of long polling

C. A dead-letter queue with an appropriate maxReceiveCount

D. A larger message retention period only

Answer: C

A DLQ isolates messages that fail repeatedly so they can be investigated without disrupting normal processing.

Why this answer

A dead-letter queue (DLQ) with an appropriate maxReceiveCount is the correct solution because it automatically moves messages that have failed processing a specified number of times to a separate queue, preventing them from blocking subsequent retries. This enforces control during normal operations by isolating poison messages without manual intervention, allowing the main queue to continue processing valid messages.
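
The redrive behavior can be sketched as follows. SQS takes the redrive policy as a JSON string in the source queue's `RedrivePolicy` attribute, naming the dead-letter queue ARN and a `maxReceiveCount`. The queue ARN and the threshold of 5 are hypothetical, and `route_after_receive` is only a simplified model of the rule that a message moves to the DLQ once its receive count exceeds the threshold.

```python
import json

DLQ_ARN = "arn:aws:sqs:us-east-1:111111111111:claims-dlq"  # hypothetical ARN

# SQS expects RedrivePolicy as a JSON string inside the queue attributes.
queue_attributes = {
    "RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": 5}
    )
}

def route_after_receive(receive_count, max_receive_count):
    """Simplified model: past the threshold, the message goes to the DLQ."""
    if receive_count > max_receive_count:
        return "dead-letter-queue"
    return "source-queue"

for count in (1, 5, 6):
    print(count, route_after_receive(count, max_receive_count=5))
```

A low threshold isolates poison messages quickly, while a higher one gives transient failures more chances to succeed before the message is set aside.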

Exam trap

The trap here is that candidates may confuse a dead-letter queue with simply increasing retention or changing polling methods, failing to recognize that only a DLQ with a maxReceiveCount enforces automatic removal of poison messages during normal operations.

How to eliminate wrong answers

Option A is wrong because a FIFO queue without a redrive policy does not handle poison messages; it only preserves message order and exactly-once processing, but without a DLQ, failed messages will continue to be retried indefinitely. Option B is wrong because short polling (returning immediately even if the queue is empty) does not address poison messages; it affects message availability timing, not retry behavior or failure handling. Option D is wrong because increasing the message retention period only extends how long messages stay in the queue; it does not limit retries or remove failing messages, so poison messages would still block retries until the retention period expires.

386

MCQ easy

A company stores compliance reports in Amazon S3. Objects are written once and rarely accessed. They need to keep the data for 3 years. When retrieval is needed for an audit, the reports can be restored within hours (not minutes). What storage class should the company use for new objects, assuming minimal operational overhead?

A. S3 Standard

B. S3 Glacier Flexible Retrieval

C. S3 Intelligent-Tiering

D. S3 Glacier Deep Archive

Answer: B

Glacier Flexible Retrieval is designed for infrequent access with retrieval typically on the order of hours.

Why this answer

S3 Glacier Flexible Retrieval is the correct choice because it offers retrieval times of minutes to hours (typically 1–5 minutes for expedited, 3–5 hours for standard), which aligns with the 'within hours' requirement. It is designed for data that is rarely accessed but must be retained for long periods (3 years), and it provides a low-cost storage class with minimal operational overhead since objects can be transitioned via lifecycle policies or stored directly.

Exam trap

The trap here is that candidates often confuse 'Glacier Deep Archive' as the cheapest option for long-term storage, but fail to consider the retrieval time constraint of 12–48 hours, which violates the 'within hours' requirement, making S3 Glacier Flexible Retrieval the correct balance of cost and retrieval speed.

How to eliminate wrong answers

Option A is wrong because S3 Standard is optimized for frequently accessed data with millisecond retrieval, which is unnecessary and cost-inefficient for rarely accessed compliance reports stored for 3 years. Option C is wrong because S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns, but it incurs a monthly monitoring fee per object and is not cost-optimal for data that is written once and never accessed again, as it would remain in the infrequent access tier without savings over Glacier Flexible Retrieval. Option D is wrong because S3 Glacier Deep Archive has a retrieval time of 12–48 hours, which exceeds the 'within hours' requirement and would not meet the audit retrieval window.

387

MCQ easy

A database administrator wants a regular backup of an Amazon RDS database so the team can restore to a recent point in time if needed. Which AWS feature should they use?

A. RDS automated backups and snapshots

B. Amazon Route 53 alias records

C. Security groups

D. AWS WAF rules

Answer: A

RDS automated backups and snapshots provide point-in-time recovery capability for the database.

Why this answer

Amazon RDS automated backups and snapshots provide the ability to restore a database to any point within the backup retention period (up to 35 days). Automated backups include transaction logs for point-in-time recovery, while manual snapshots are user-initiated backups stored until explicitly deleted. This directly meets the requirement for regular backups and point-in-time restore capability.

Exam trap

The trap here is that candidates may confuse security groups or WAF rules with backup mechanisms because they are common security services, but they have no role in data persistence or recovery.

How to eliminate wrong answers

Option B is wrong because Amazon Route 53 alias records are a DNS routing feature used to map domain names to AWS resources, not a backup mechanism. Option C is wrong because security groups act as virtual firewalls controlling inbound and outbound traffic to RDS instances, they do not perform backups. Option D is wrong because AWS WAF rules are used to filter and monitor HTTP/HTTPS requests to protect web applications from common exploits, they have no role in database backup or restore operations.

388

MCQ easy

A. S3 Standard

B. S3 Glacier Flexible Retrieval

C. S3 Intelligent-Tiering

D. S3 Glacier Deep Archive

Answer: B

Glacier Flexible Retrieval is designed for infrequent access with retrieval typically on the order of hours.

Why this answer

S3 Glacier Flexible Retrieval is the correct choice because it is designed for long-term archival data that is rarely accessed, with retrieval times ranging from minutes to hours. It offers a lower storage cost than S3 Standard while still meeting the 3-year retention requirement and the acceptable retrieval window of 'within hours'. Operational overhead stays minimal because new objects can be written directly to this storage class, with no lifecycle rules to manage.

Exam trap

The trap here is that candidates often confuse S3 Glacier Deep Archive with S3 Glacier Flexible Retrieval, assuming both have similar retrieval times, but Deep Archive standard retrievals can take up to 12 hours (and bulk retrievals up to 48 hours), which fails the 'within hours' constraint.

How to eliminate wrong answers

Option A is wrong because S3 Standard is optimized for frequently accessed data with millisecond retrieval, making it unnecessarily expensive for data that is rarely accessed and only needed for audits. Option C is wrong because S3 Intelligent-Tiering is designed for data with unknown or changing access patterns and incurs a monthly monitoring fee per object, which adds cost for a workload where the access pattern is already known (rare access). Option D is wrong because S3 Glacier Deep Archive has retrieval times of up to 12 hours (standard) or 48 hours (bulk), which exceeds the 'within hours' requirement and would not meet the audit retrieval window.

389

MCQ medium

A ticket booking system runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include?

A. Subnets in at least two Availability Zones with health checks enabled

B. All instances in one larger subnet

C. A Network Load Balancer in one subnet

D. A single EC2 instance with detailed monitoring

Answer: A

An Auto Scaling group spanning multiple AZs can replace unhealthy instances and maintain capacity during an AZ failure.

Why this answer

Option A is correct because an Auto Scaling group configured with subnets in at least two Availability Zones and health checks enabled ensures that if one AZ fails, EC2 instances in the remaining AZs continue to serve traffic. The Application Load Balancer routes requests only to healthy instances, and the Auto Scaling group launches replacement instances in the surviving AZs to restore capacity. This design meets the requirement to tolerate the failure of one Availability Zone.
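
One way to reason about capacity during an AZ failure is to size each AZ so that the survivors can carry the full load before any replacement instances launch. This goes slightly beyond what the question asks and is only a sizing sketch; the required capacity of 6 instances and the function name are hypothetical.

```python
import math

def instances_per_az(required_capacity, az_count):
    """Instances per AZ so that losing one AZ still leaves required_capacity."""
    if az_count < 2:
        raise ValueError("surviving an AZ failure needs at least two AZs")
    return math.ceil(required_capacity / (az_count - 1))

for azs in (2, 3):
    per_az = instances_per_az(required_capacity=6, az_count=azs)
    print(f"{azs} AZs: {per_az} per AZ, {per_az * azs} total")
```

Spreading across more AZs lowers the amount of spare capacity needed, which is one reason three-AZ layouts are common when the Region offers them.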

Exam trap

The trap here is that candidates often think a single larger subnet or a different load balancer type provides resilience, but only distributing subnets across multiple Availability Zones with health checks ensures the system can survive an AZ failure.

How to eliminate wrong answers

Option B is wrong because placing all instances in one larger subnet within a single Availability Zone creates a single point of failure; if that AZ goes down, all instances become unavailable. Option C is wrong because a Network Load Balancer placed in a single subnet is itself confined to one Availability Zone, so it adds no multi-AZ resilience, and swapping the load balancer type does nothing for the Auto Scaling group configuration the question asks about. Option D is wrong because a single EC2 instance, even with detailed monitoring, cannot survive an AZ failure; there is no redundancy or automatic failover.

390

MCQ medium

A batch analytics job runs for several hours each night and can be interrupted and restarted. Which EC2 purchasing option should minimize cost? The architecture review board prefers a managed AWS-native control.

A. On-Demand Instances only

B. Dedicated Hosts

C. Spot Instances

D. Provisioned IOPS volumes

Answer: C

Spot Instances offer deep discounts for interruptible workloads.

Why this answer

Spot Instances are correct because the batch job is fault-tolerant (can be interrupted and restarted) and runs for several hours each night, making it an ideal candidate for Spot Instances, which offer up to 90% cost savings compared to On-Demand. AWS-managed services like EC2 Auto Scaling or Amazon EMR can automatically handle Spot Instance interruptions by replacing instances or checkpointing the job, aligning with the architecture review board's preference for a managed AWS-native control.

Exam trap

The trap here is that candidates may choose On-Demand Instances (Option A) due to a misconception that Spot Instances are unreliable for any workload, failing to recognize that fault-tolerant, interruptible jobs like batch processing are exactly the use case for which Spot Instances are designed and recommended for cost optimization.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances provide no interruption but are significantly more expensive than Spot Instances for fault-tolerant workloads, failing to minimize cost. Option B is wrong because Dedicated Hosts are designed for licensing or compliance requirements (e.g., per-socket or per-core licensing) and are the most expensive option, not cost-optimal for a batch job that can tolerate interruptions. Option D is wrong because Provisioned IOPS volumes are a storage type (EBS), not an EC2 purchasing option, and thus irrelevant to the question of minimizing compute cost.

391

MCQ medium

An application runs on EC2 instances in private subnets behind an Application Load Balancer (ALB). Security groups allow inbound HTTPS (443) from the ALB’s security group to the instance security group, and outbound from instances is set to allow ephemeral ports. Despite this, clients see connection timeouts. After reviewing network ACLs, you find the NACL associated with the instance subnet has an inbound allow for destination port 443, but it does not have a corresponding outbound allow for ephemeral ports. What is the most likely reason the traffic fails, and what should be updated?

A. NACLs are stateless, so you must update the NACL to allow the return (outbound) ephemeral port range; security groups alone cannot override a blocked NACL.

B. NACLs are stateful and automatically track connections; the fix is to add a new inbound rule to the security group for client source ports.

C. The issue is caused by ALB health checks; configure a new target group health check on port 80 so traffic can be routed.

D. Because instances are in private subnets, add a NAT gateway so return traffic can reach the internet over dynamic routing.

Answer: A

Stateless NACLs require both inbound and outbound rules. Missing outbound for ephemeral ports will block return traffic even if SG rules are correct.

Why this answer

Network ACLs are stateless, meaning they do not automatically allow return traffic. Even though the security group allows inbound HTTPS from the ALB, the NACL blocks the response traffic because it lacks an outbound rule for ephemeral ports (typically 1024-65535). NACLs are evaluated at the subnet boundary independently of security groups, so a permissive security group cannot compensate for the missing outbound allow rule, and the connection times out.
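
The stateless behavior can be modeled with a toy rule evaluator. This is a deliberately simplified sketch: it matches on port only (ignoring protocol and CIDR), evaluates rules in ascending rule-number order with first match winning, and falls back to an implicit deny. The rule numbers and the reply port 50123 are hypothetical.

```python
def nacl_allows(rules, port):
    """Evaluate rules in ascending rule-number order; first match wins."""
    for rule in sorted(rules, key=lambda r: r["number"]):
        low, high = rule["ports"]
        if low <= port <= high:
            return rule["action"] == "allow"
    return False  # implicit deny when no rule matches

INBOUND = [{"number": 100, "ports": (443, 443), "action": "allow"}]
OUTBOUND_BROKEN = []  # no outbound rule for ephemeral ports
OUTBOUND_FIXED = [{"number": 100, "ports": (1024, 65535), "action": "allow"}]

def round_trip_ok(inbound, outbound, dest_port, reply_port):
    # Stateless: the request and the reply are checked independently.
    return nacl_allows(inbound, dest_port) and nacl_allows(outbound, reply_port)

print(round_trip_ok(INBOUND, OUTBOUND_BROKEN, 443, 50123))
print(round_trip_ok(INBOUND, OUTBOUND_FIXED, 443, 50123))
```

The request is admitted in both cases; only the second configuration lets the reply leave the subnet, which is the difference between a timeout and a working connection.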

Exam trap

The trap here is that candidates assume security groups alone handle all traffic filtering, forgetting that NACLs are stateless and require explicit outbound rules for return traffic, especially for ephemeral ports.

How to eliminate wrong answers

Option B is wrong because NACLs are stateless, not stateful; they do not automatically track connections, so adding an inbound rule to the security group does not fix the missing outbound NACL rule. Option C is wrong because the issue is not related to ALB health checks; health checks use the same traffic path and would also fail due to the NACL, and changing the health check port does not address the stateless NACL problem. Option D is wrong because the instances are in private subnets behind an ALB, and return traffic to the ALB does not require a NAT gateway; the ALB and instances communicate within the VPC, and the failure is due to NACL rules, not internet routing.

392

MCQ medium

An application runs on EC2 in us-east-1 and frequently reads objects from an S3 bucket that is physically located in us-west-2. The finance team reports unexpectedly high inter-Region data transfer charges because the application retrieves objects for many user requests. A constraint: the bucket in us-west-2 must remain the system of record for compliance, but the application can read from a replica in us-east-1. What should the solutions architect do to minimize network spend while meeting the compliance constraint?

A.Enable S3 Cross-Region Replication from the us-west-2 source bucket to a destination bucket in us-east-1, and update the app to read from the us-east-1 bucket.

B.Create an interface VPC endpoint for S3 in us-east-1 and keep all object reads pointing to the us-west-2 bucket.

C.Use VPC peering between two regions and route all requests to the us-west-2 bucket over the peering link.

D.Use Route 53 latency-based routing to send users to a us-west-2 web endpoint and keep the S3 bucket unchanged.

AnswerA

CRR keeps the west bucket as the source of record while creating a near-region copy to reduce inter-Region transfer on reads.

Why this answer

Option A is correct because enabling S3 Cross-Region Replication (CRR) automatically copies objects from the us-west-2 source bucket to a destination bucket in us-east-1, satisfying the compliance requirement that the us-west-2 bucket remains the system of record. By updating the application to read from the us-east-1 replica, read traffic stays within the same Region, so the recurring inter-Region data transfer charge on every read (typically about $0.02/GB between US Regions) goes away. Replication still transfers each object across Regions once and adds storage for the replica, but for frequently read objects that one-time cost is far lower than paying for a cross-Region transfer on every request. This approach directly addresses the cost issue while preserving the original bucket as the authoritative source.

Exam trap

The trap here is that candidates may assume VPC endpoints or peering can magically route S3 traffic within a region, but S3 is a regional service and inter-Region data transfer charges apply regardless of the network path used.

How to eliminate wrong answers

Option B is wrong because an interface VPC endpoint for S3 in us-east-1 does not change the physical location of the bucket; the application would still read from the us-west-2 bucket, incurring inter-Region data transfer charges for each request. Option C is wrong because routing requests over an inter-Region VPC peering connection does not change where the bucket lives; every read still moves data from us-west-2 to us-east-1, so inter-Region transfer charges persist, and an S3 gateway endpoint cannot be reached across a peering connection in any case. Option D is wrong because Route 53 latency-based routing only directs user traffic to a web endpoint in us-west-2, but the application still reads from the same us-west-2 bucket, so inter-Region data transfer charges persist; this option does not create a local replica or reduce cross-region traffic.
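
A rough cost comparison makes the trade-off concrete. This is a back-of-the-envelope sketch that assumes the $0.02/GB inter-Region rate quoted above and hypothetical workload numbers; it ignores replica storage and request charges.

```python
RATE_PER_GB = 0.02  # assumed inter-Region transfer rate in USD per GB


def cross_region_read_cost(dataset_gb, reads_per_object):
    # Every read pulls the data across Regions again.
    return dataset_gb * reads_per_object * RATE_PER_GB


def replicated_read_cost(dataset_gb):
    # Data crosses Regions once during replication; later reads are in-Region.
    return dataset_gb * RATE_PER_GB


dataset_gb = 500         # hypothetical dataset size
reads_per_object = 40    # hypothetical average reads per object

print(round(cross_region_read_cost(dataset_gb, reads_per_object), 2))
print(round(replicated_read_cost(dataset_gb), 2))
```

Under these assumptions the replica pays off as soon as objects are read more than once on average.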

393

MCQhard

Based on the exhibit, a company wants EC2 instances in private subnets to access Amazon S3 without using a NAT gateway, and bucket access must be allowed only when requests come through the approved VPC endpoint. Which design is the most appropriate?

A.Use the S3 gateway VPC endpoint and keep the bucket policy that denies requests unless aws:SourceVpce matches the approved endpoint.

B.Use an interface VPC endpoint for S3 only, because gateway endpoints cannot be used with bucket policies.

C.Add a NAT gateway and remove the bucket policy condition because the NAT route will automatically secure the S3 traffic.

D.Move the bucket policy restriction to a security group attached to the S3 bucket so only the VPC endpoint can reach it.

AnswerA

For S3, a gateway VPC endpoint is the correct private-connectivity option for EC2 instances in private subnets. The route table sends S3 prefix-list traffic to the gateway endpoint, so requests stay on the AWS network instead of traversing a NAT gateway or the public internet. The bucket policy condition on aws:SourceVpce then ensures that even valid AWS-authenticated requests are accepted only when they arrive through the approved endpoint ID.

Why this answer

Option A is correct because an S3 gateway VPC endpoint allows EC2 instances in private subnets to access S3 without traversing the internet or requiring a NAT gateway. By adding a bucket policy condition that denies access unless `aws:SourceVpce` matches the approved VPC endpoint ID, you ensure that only requests originating from that specific endpoint are allowed, meeting the security requirement.

Exam trap

The trap here is that candidates often confuse gateway endpoints with interface endpoints, assuming gateway endpoints cannot enforce bucket policies, or they mistakenly think security groups can be applied to S3 buckets, leading them to choose option D.

How to eliminate wrong answers

Option B is wrong because gateway endpoints for S3 can absolutely be used with bucket policies; the `aws:SourceVpce` condition key matches requests that arrive through a gateway endpoint as well as an interface endpoint. Option C is wrong because adding a NAT gateway would send S3 traffic through the NAT gateway to the public S3 endpoint, which is unnecessary and violates the requirement to avoid using a NAT gateway; also, removing the bucket policy condition would no longer restrict access to requests that arrive through the approved VPC endpoint. Option D is wrong because S3 buckets do not support security groups; security groups attach to elastic network interfaces (for example, those of EC2 instances) and cannot be attached to S3 buckets.
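
The bucket policy condition described above might look like the following. This is a sketch only: the bucket name and endpoint ID are hypothetical placeholders, and the exhibit's actual policy is not shown in the question.

```python
import json

BUCKET = "example-bucket"                  # hypothetical bucket name
APPROVED_VPCE = "vpce-0123456789abcdef0"   # hypothetical endpoint ID

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnlessApprovedEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Deny every request that did not arrive through the approved endpoint.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": APPROVED_VPCE}},
        }
    ],
}

print(json.dumps(policy, indent=2))
```

A blanket deny like this also blocks console and administrative access from outside the endpoint, so it is worth testing on a non-production bucket first.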

394

Multi-Selecthard

A company operates 40 AWS accounts and wants chargeback by application, environment, and business unit. Finance needs detailed line items, and engineering wants consistent monthly reports without manual spreadsheet work. The current tagging scheme is inconsistent, and many resources are missing billing metadata. Which three actions should the architect recommend? Select three.

Select 3 answers

A.Standardize cost allocation tags such as Application, Environment, and BusinessUnit.

B.Activate the approved tags as cost allocation tags in the billing console.

C.Enable the AWS Cost and Usage Report and store it in S3 for Athena queries.

D.Use AWS Budgets alone because it provides the most detailed line-item attribution.

E.Rely only on the Cost Explorer console for chargeback accuracy.

AnswersA, B, C

Consistent tags are the foundation of chargeback because they let costs be grouped accurately by the business dimensions finance cares about.

Why this answer

Option A is correct because standardizing cost allocation tags like Application, Environment, and BusinessUnit ensures consistent metadata across all 40 accounts. This is a prerequisite for accurate chargeback, as AWS Cost Allocation Tags allow you to group resources by these dimensions for detailed billing reports.

Exam trap

The trap here is that candidates may think AWS Budgets or Cost Explorer alone can provide detailed chargeback data, but neither offers the raw line-item granularity and queryability that the CUR with Athena provides for automated reporting.
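
To show why tagged line items matter, here is a small sketch that groups cost rows by a tag column. The rows are invented sample data, and the column names are assumptions that follow the naming the Cost and Usage Report typically exposes in Athena (`line_item_unblended_cost`, `resource_tags_user_<tag>`).

```python
from collections import defaultdict

# Hypothetical line items; real CUR rows carry many more columns.
rows = [
    {"resource_tags_user_application": "portal", "line_item_unblended_cost": 120.0},
    {"resource_tags_user_application": "portal", "line_item_unblended_cost": 30.5},
    {"resource_tags_user_application": "billing", "line_item_unblended_cost": 75.25},
    {"resource_tags_user_application": "", "line_item_unblended_cost": 10.25},
]


def chargeback(rows, tag_column):
    """Sum unblended cost per tag value; rows missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for row in rows:
        key = row.get(tag_column) or "untagged"
        totals[key] += row["line_item_unblended_cost"]
    return dict(totals)


print(chargeback(rows, "resource_tags_user_application"))
```

The "untagged" bucket is the part of the bill that cannot be charged back, which is why standardizing and activating the tags comes first.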

395

Multi-Selectmedium

A media company stores daily financial exports in Amazon S3. The files must be protected against accidental overwrite or deletion, and the business also wants a second copy in another Region for recovery after a regional outage. Which two actions should the architect take? Select two.

Select 2 answers

A.Enable bucket versioning on the S3 bucket.

B.Turn on S3 Transfer Acceleration for the bucket.

C.Use only lifecycle policies to move objects to Glacier.

D.Configure replication to a bucket in a second AWS Region.

E.Enable S3 Block Public Access on the bucket.

AnswersA, D

Versioning preserves prior object versions so accidental deletes and overwrites can be recovered later.

Why this answer

Option A is correct because enabling S3 Versioning on the bucket protects objects from accidental overwrite or deletion by preserving previous versions of each object. When versioning is enabled, a delete marker is placed instead of permanently removing the object, and overwrites create a new version while retaining the old one. This directly meets the requirement to guard against accidental data loss.

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration or Block Public Access with data protection features, when in fact only versioning and replication directly address the requirements for preventing accidental deletion and providing cross-region recovery.
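
The versioning behavior described above (overwrites add a version, deletes add a delete marker) can be modeled in a few lines. This is a toy in-memory model for illustration, not the S3 API; the `VersionedBucket` class is hypothetical.

```python
class VersionedBucket:
    """Toy model of S3 versioning semantics for a single bucket."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, body):
        # An overwrite appends a new version and keeps the old ones.
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A simple delete adds a delete marker; prior versions stay in place.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self.DELETE_MARKER:
            return None
        return versions[-1]

    def restore_previous(self, key):
        # Removing the newest entry (marker or overwrite) exposes the prior version.
        self._versions[key].pop()


bucket = VersionedBucket()
bucket.put("export.csv", "v1")
bucket.put("export.csv", "v2")   # accidental overwrite
bucket.delete("export.csv")      # accidental delete
print(bucket.get("export.csv"))  # None
bucket.restore_previous("export.csv")
print(bucket.get("export.csv"))  # v2
```

Replication to a second Region is a separate concern: versioning protects against mistakes, while the replica protects against losing the Region.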

396

MCQeasy

An order-processing application becomes slow when traffic spikes. The frontend should stay responsive even if downstream workers are temporarily overloaded. What should the team add to the design?

A.Amazon SQS queue between the frontend and the workers

B.A larger NAT Gateway

C.A single bigger EC2 instance for the worker

D.An Amazon Route 53 health check on the frontend

AnswerA

SQS buffers work so the frontend can respond quickly while workers process messages at their own pace. It smooths spikes and supports retries when processing is delayed.

Why this answer

Adding an Amazon SQS queue between the frontend and the workers decouples the components, allowing the frontend to remain responsive by immediately offloading requests to the queue even when downstream workers are overloaded. The workers can then process messages at their own pace, and the queue acts as a buffer to absorb traffic spikes without blocking the frontend.

Exam trap

The trap here is that candidates often confuse scaling solutions (like larger instances or NAT Gateways) with decoupling patterns, failing to recognize that asynchronous message queuing is the correct approach to keep the frontend responsive under load.

How to eliminate wrong answers

Option B is wrong because a larger NAT Gateway only increases the network throughput for outbound traffic from private subnets, but does not address the decoupling or buffering needed to handle worker overload. Option C is wrong because a single bigger EC2 instance for the worker still represents a single point of failure and cannot elastically scale to absorb traffic spikes; it does not provide the asynchronous decoupling required. Option D is wrong because an Amazon Route 53 health check on the frontend only monitors the frontend's availability and can trigger failover, but it does not buffer or decouple requests from overloaded downstream workers.
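
The decoupling pattern can be sketched with an in-process queue standing in for SQS. This is an illustration of the buffering idea only; the function names are hypothetical, and a real system would use the SQS API rather than `queue.Queue`.

```python
import queue
import threading
import time

orders = queue.Queue()   # stands in for the SQS queue
processed = []


def frontend_accept(order_id):
    # Enqueue and return immediately; the caller never waits on the worker.
    orders.put(order_id)
    return "accepted"


def worker():
    while True:
        order_id = orders.get()
        if order_id is None:       # sentinel: stop the worker
            orders.task_done()
            break
        time.sleep(0.01)           # simulate slow downstream processing
        processed.append(order_id)
        orders.task_done()


thread = threading.Thread(target=worker)
thread.start()
replies = [frontend_accept(i) for i in range(20)]   # traffic burst
orders.put(None)
orders.join()
thread.join()
print(len(replies), len(processed))
```

The frontend returns for all 20 requests without waiting, while the worker drains the backlog at its own pace.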

397

MCQmedium

A media archive requires consistent high IOPS for a transactional database on EC2. Which EBS volume type is most suitable? The architecture review board prefers a managed AWS-native control.

A.Provisioned IOPS SSD such as io2

B.st1 Throughput Optimized HDD

C.Instance store only

D.sc1 Cold HDD

AnswerA

io2 is designed for business-critical workloads requiring consistent high IOPS and durability.

Why this answer

The io2 Provisioned IOPS SSD volume type is designed for latency-sensitive transactional workloads that require consistent, high IOPS. It is designed for 99.999% durability and, with io2 Block Express, supports up to 256,000 IOPS per volume, making it ideal for a database on EC2 that demands predictable performance. As a managed AWS-native EBS volume, it aligns with the architecture review board's preference for a fully AWS-controlled storage solution.

Exam trap

The trap here is that candidates often confuse throughput-optimized HDD (st1) with IOPS requirements, mistakenly thinking high throughput equals high IOPS, but transactional databases need random I/O performance, not sequential throughput.

How to eliminate wrong answers

Option B (st1 Throughput Optimized HDD) is wrong because it is optimized for sequential throughput, not random IOPS, and cannot deliver the consistent high IOPS required by a transactional database. Option C (Instance store only) is wrong because instance store volumes are ephemeral and data is lost on instance stop or termination, making them unsuitable for a persistent database. Option D (sc1 Cold HDD) is wrong because it is designed for infrequently accessed data with the lowest cost and lowest IOPS, far below the needs of a transactional database.

398

MCQmedium

A public API for a financial reporting platform is deployed on API Gateway. Clients must authenticate with standards-based tokens issued by an external OpenID Connect provider. Which authorization mechanism should be used?

A.JWT authorizer configured for the OpenID Connect issuer

B.IAM authorization for all internet users

C.API keys only

D.A VPC endpoint policy

AnswerA

A JWT authorizer validates tokens from a trusted OIDC issuer with low operational overhead.

Why this answer

Option A is correct because the scenario requires standards-based token authentication from an external OpenID Connect (OIDC) provider. API Gateway's JWT authorizer can validate JSON Web Tokens (JWTs) directly against the OIDC issuer's well-known configuration (JWKS URI) without custom Lambda code, making it the simplest and most secure choice for token-based authentication.

Exam trap

The trap here is that candidates often confuse IAM authorization (for AWS principals that sign requests with Signature Version 4) with token-based authentication for external clients, or they assume API keys alone are sufficient for security, ignoring the requirement for standards-based token validation.

How to eliminate wrong answers

Option B is wrong because IAM authorization is designed for AWS principals (users/roles) using Signature Version 4 signing, not for internet clients with external OIDC tokens. Option C is wrong because API keys only provide basic identification and rate limiting, not authentication or authorization against a standards-based token issuer. Option D is wrong because a VPC endpoint policy controls access to the API Gateway via VPC endpoints, not authentication of client tokens from an external OIDC provider.
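
For intuition, the claim checks that follow signature verification can be sketched as below. This is an assumption-laden illustration, not API Gateway's implementation: it deliberately omits signature verification against the issuer's JWKS, and the function name, issuer URL, and audience value are hypothetical.

```python
import time


def claims_acceptable(claims, issuer, audiences, now=None):
    """Check iss, aud, and exp on claims whose signature was ALREADY verified elsewhere."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    aud_list = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == issuer
        and any(a in audiences for a in aud_list)
        and claims.get("exp", 0) > now
    )


claims = {"iss": "https://issuer.example.com", "aud": "reporting-api", "exp": 2000}
print(claims_acceptable(claims, "https://issuer.example.com", {"reporting-api"}, now=1000))  # True
print(claims_acceptable(claims, "https://issuer.example.com", {"reporting-api"}, now=3000))  # False
```

The value of the managed authorizer is that it handles key retrieval and signature verification as well as these checks, so none of this code has to be written or maintained.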

399

MCQeasy

An S3 bucket uses a customer-managed KMS key as the default for SSE-KMS encryption. A service role will upload objects using s3:PutObject. Assuming the role already has permission to write to the bucket, which KMS permission is most directly required for the role to let S3 encrypt the object during upload?

A.kms:GenerateDataKey (and optionally kms:DescribeKey)

B.kms:Decrypt only

C.kms:CreateAlias and kms:UpdateAlias only

D.kms:ScheduleKeyDeletion and kms:CancelKeyDeletion only

AnswerA

For SSE-KMS uploads, S3 uses KMS to generate a data key for encrypting the object. kms:GenerateDataKey is the direct permission required for that flow. kms:DescribeKey can be useful for validation or troubleshooting, but it is not the core cryptographic permission.

Why this answer

When S3 uses SSE-KMS with a customer-managed KMS key, the service calls KMS to generate a data key for encrypting the object. The s3:PutObject operation requires the caller to have kms:GenerateDataKey permission on the KMS key so that S3 can obtain the plaintext and encrypted versions of the data key. Optionally, kms:DescribeKey may be needed for S3 to verify the key exists, but kms:GenerateDataKey is the most directly required permission.

Exam trap

The trap here is that candidates often confuse kms:Decrypt (needed for GET/read operations) with kms:GenerateDataKey (needed for PUT/write operations), or they assume any KMS permission will work because S3 handles encryption transparently.

How to eliminate wrong answers

Option B is wrong because kms:Decrypt is used for reading or decrypting objects, not for uploading new objects with SSE-KMS. Option C is wrong because kms:CreateAlias and kms:UpdateAlias are for managing key aliases, not for encrypting data during upload. Option D is wrong because kms:ScheduleKeyDeletion and kms:CancelKeyDeletion are key lifecycle management actions, unrelated to the encryption process for PutObject.
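
An identity policy statement granting the upload role the KMS permissions discussed above might look like this. It is a sketch; the key ARN is a hypothetical placeholder.

```python
import json

# Hypothetical key ARN; substitute the bucket's default KMS key.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

upload_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSseKmsUpload",
            "Effect": "Allow",
            # GenerateDataKey is the core permission; DescribeKey is optional.
            "Action": ["kms:GenerateDataKey", "kms:DescribeKey"],
            "Resource": KEY_ARN,
        }
    ],
}

print(json.dumps(upload_policy, indent=2))
```

Scoping `Resource` to the single key ARN keeps the role from generating data keys under any other key in the account.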

400

Multi-Selectmedium

A serverless order-ingestion API writes directly to a database. During traffic spikes, the database occasionally throttles, Lambda retries create duplicate order records, and some requests time out. Which two changes best improve buffering and safe retry behavior? Select two.

Select 2 answers

A.Increase the Lambda timeout and keep writing directly to the database.

B.Put an Amazon SQS queue between the API and the database-processing function.

C.Replace SQS with SNS so every request is delivered immediately to all subscribers.

D.Make the database write idempotent by using a unique request token or order ID.

E.Disable retries so failed writes are never duplicated.

AnswersB, D

SQS buffers bursts and decouples producers from consumers, so the database can be processed at a steadier rate.

Why this answer

Option B is correct because inserting an SQS queue between the API Gateway and the Lambda function decouples the ingestion from the database write. During traffic spikes, SQS buffers the requests, allowing the Lambda function to poll at a controlled rate, which prevents database throttling. SQS also redelivers a message automatically if it is not deleted before the visibility timeout expires, so failed writes are retried without the client having to resend. Because that delivery is at-least-once, a retry can still reach the database twice, which is why the idempotent write in Option D is needed alongside the queue.

Exam trap

The trap here is that candidates often think SNS (Option C) is a suitable replacement for SQS because both are messaging services, but SNS pushes each message to subscribers immediately and has no queue that consumers can drain at their own pace, making it inappropriate for smoothing traffic spikes and handling failures gracefully.
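
The idempotent write from Option D can be sketched with a unique key on the order ID. This example uses an in-memory SQLite table purely for illustration; the table layout and function name are hypothetical, and a production system would apply the same idea with its own database's unique constraint or conditional write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL)")


def write_order(order_id, amount):
    """Insert once per order_id; a retried message with the same ID is a no-op."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO orders (order_id, amount) VALUES (?, ?)",
        (order_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1   # True only when a new row was written


print(write_order("order-1001", 49.9))   # first delivery
print(write_order("order-1001", 49.9))   # retry of the same message
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

The retry is harmless because the database, not the caller, decides whether the order already exists.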

401

MCQeasy

A CI pipeline needs to upload build artifacts only to s3://ci-artifacts/uploads/*. You also want the pipeline to list only objects under uploads/ to verify that the upload succeeded. Which IAM policy approach is the best fit for least privilege?

A.Allow s3:PutObject on arn:aws:s3:::ci-artifacts/uploads/* and allow s3:ListBucket on arn:aws:s3:::ci-artifacts with a condition that restricts s3:prefix to uploads/.

B.Allow s3:PutObject on arn:aws:s3:::ci-artifacts/* and allow s3:ListBucket on arn:aws:s3:::ci-artifacts without any prefix condition.

C.Allow s3:GetObject on arn:aws:s3:::ci-artifacts/uploads/* so the pipeline can confirm artifacts exist.

D.Allow s3:PutObject on arn:aws:s3:::ci-artifacts/uploads/* and also allow s3:DeleteObject on arn:aws:s3:::ci-artifacts/uploads/*.

AnswerA

This scopes object writes to only the uploads/ prefix (resource-level restriction for s3:PutObject) and scopes object listing to only that same prefix by restricting the ListBucket request via the s3:prefix condition key (bucket-level authorization for s3:ListBucket).

Why this answer

Option A is correct because it grants the minimum required permissions: s3:PutObject on the specific uploads/ path for uploading artifacts, and s3:ListBucket on the bucket with a condition restricting the s3:prefix to uploads/ to list only objects under that prefix. This follows the least privilege principle by scoping both actions to the exact resources needed.

Exam trap

The trap here is that candidates often confuse s3:GetObject with s3:ListBucket for verifying uploads, or they forget to restrict the s3:prefix condition on ListBucket, leading to overly permissive policies.

How to eliminate wrong answers

Option B is wrong because it allows s3:PutObject on the entire bucket (arn:aws:s3:::ci-artifacts/*) instead of restricting to uploads/, and allows s3:ListBucket without a prefix condition, which would list all objects in the bucket, violating least privilege. Option C is wrong because s3:GetObject is not needed to verify upload success; listing objects (s3:ListBucket) is the correct action to confirm an object exists after upload. Option D is wrong because it grants s3:DeleteObject, which is unnecessary for the pipeline's requirements and introduces excessive permissions that could allow accidental or malicious deletion of artifacts.
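
The policy shape described in Option A could be written as follows. This is a sketch built from the ARNs in the question; the `Sid` values and the choice of `StringLike` with `uploads/*` are assumptions about how the prefix restriction would be expressed.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UploadToPrefix",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            # Object-level action: scope by object ARN.
            "Resource": "arn:aws:s3:::ci-artifacts/uploads/*",
        },
        {
            "Sid": "ListPrefixOnly",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            # Bucket-level action: scope by bucket ARN plus the s3:prefix condition.
            "Resource": "arn:aws:s3:::ci-artifacts",
            "Condition": {"StringLike": {"s3:prefix": ["uploads/*"]}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```

The two statements differ in resource type on purpose: `s3:PutObject` is authorized against object ARNs, while `s3:ListBucket` is authorized against the bucket ARN.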

402

MCQeasy

A web service runs continuously on AWS 24/7. The team expects steady compute usage for the next 12–24 months, but may change instance families/sizes as performance tuning continues. Which purchase option best reduces cost while keeping flexibility to change instance types?

A.Buy EC2 On-Demand instances and rely on future Spot capacity for discounts

B.Use Compute Savings Plans for the expected steady usage

C.Buy Reserved Instances with a fixed instance type and region

D.Buy Spot Instances and stop scaling to avoid interruption risk

AnswerB

Compute Savings Plans provide a discounted hourly rate in exchange for a commitment. They are the most flexible Savings Plans option and can apply across EC2 usage regardless of instance family or size changes, so the team can continue tuning instance types while still receiving discounted pricing for the committed usage.

Why this answer

Compute Savings Plans offer discounts of up to 66% off On-Demand in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) for a 1- or 3-year term. Unlike Standard Reserved Instances, they automatically apply to any EC2 instance family, size, OS, or region, giving you the flexibility to change instance types as performance tuning evolves, while still reducing costs for steady-state workloads. (EC2 Instance Savings Plans discount more deeply but are tied to one instance family in one Region.)

Exam trap

The trap here is that candidates often confuse Reserved Instances (which lock instance family) with Savings Plans (which offer flexibility across families), leading them to choose Option C because they think 'Reserved' is the only way to get a discount for steady usage.

How to eliminate wrong answers

Option A is wrong because On-Demand instances provide no discount for steady 24/7 usage, and Spot capacity is not suitable for continuous workloads due to potential interruptions (Spot instances can be reclaimed with a 2-minute warning). Option C is wrong because Reserved Instances lock you into a specific instance family and region (e.g., m5.large in us-east-1), removing the flexibility to change instance types during performance tuning. Option D is wrong because Spot Instances are designed for fault-tolerant, interruptible workloads; relying on them for a continuous 24/7 service risks abrupt termination, and stopping scaling does not eliminate interruption risk.

403

MCQhard

A company runs a production MySQL database on Amazon RDS in us-east-1. A read replica exists in us-west-2 for disaster recovery. The primary region experiences a complete outage. Which of the following describes the correct procedure to restore database service using the cross-region read replica?

A.Wait for AWS to automatically fail over the read replica to become the new primary

B.Restore the primary database from the most recent automated snapshot in us-west-2

C.Manually promote the us-west-2 read replica to a standalone DB instance and update application endpoints

D.Create a new RDS instance in us-west-2 and manually restore data from application logs

AnswerC

Manual promotion is the correct procedure. The replica becomes a writable standalone DB in us-west-2. Applications must update their connection strings to the new endpoint.

Why this answer

Cross-region RDS read replicas support manual promotion to a standalone database instance. When the primary region fails, the replica must be manually promoted — this makes it an independent writable instance in us-west-2.

Key points: Promotion is NOT automatic (unlike RDS Multi-AZ failover). Promotion breaks the replication link — the replica becomes autonomous. After promotion, application connection strings must be updated to the new endpoint. Any replication lag at the time of the outage represents potential data loss (RPO > 0).

Exam trap

RDS Multi-AZ provides automatic failover — no manual action required. Cross-region read replicas do NOT failover automatically — promotion must be manually triggered. This distinction appears frequently.

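For faster cross-region recovery with a typical RPO of about one second, consider Amazon Aurora Global Database. Its cross-region failover is still initiated by an operator rather than triggered automatically.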

Why the other options are wrong

Option A is wrong because RDS cross-region read replicas do NOT fail over automatically. Only RDS Multi-AZ provides automatic same-region failover. Manual promotion is required for cross-region replicas.

Option B is wrong because restoring from a snapshot creates a new instance from an older state. The read replica contains more recent data (continuously synchronized). Promotion is faster and yields less data loss than snapshot restoration when the replica is available.

Option D is wrong because creating a new instance and rebuilding its data from application logs is not a valid DR procedure. The read replica already contains synchronized production data, so promoting it is faster and more complete than any manual reconstruction.

404

MCQmedium

A solutions architect is designing an S3 bucket for a claims portal. The objects must never be publicly accessible, even if a developer later adds an overly broad bucket policy. What should the architect configure?

A.Enable S3 Block Public Access at the account or bucket level

B.Create an IAM policy that denies s3:GetObject to anonymous users

C.Enable server access logging on the bucket

D.Enable S3 Transfer Acceleration

AnswerA

S3 Block Public Access prevents public ACLs and public bucket policies from exposing the bucket.

Why this answer

S3 Block Public Access overrides bucket policies and object ACLs that would otherwise grant public access. By enabling this setting at the account or bucket level, the architect ensures that even if a developer later adds an overly broad bucket policy, S3 rejects the public policy or ignores the public grant. This is the only option that directly prevents public exposure for as long as the setting stays enabled.

Exam trap

The trap here is that candidates often think an IAM deny policy (Option B) is sufficient, but they miss that bucket policies can grant access to anonymous users independently of IAM, making S3 Block Public Access the only reliable safeguard.

How to eliminate wrong answers

Option B is wrong because an IAM identity policy attaches to IAM users and roles, so it never applies to anonymous requests; a bucket policy that allows public access would still expose the objects regardless of what the IAM policy denies. Option C is wrong because server access logging only records requests to the bucket; it does not enforce any access restrictions. Option D is wrong because S3 Transfer Acceleration is a performance feature that speeds up uploads over long distances; it has no effect on access control or public accessibility.
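
S3 Block Public Access consists of four independent settings. The sketch below lists them in the shape used by the S3 `PublicAccessBlockConfiguration` structure; the helper function is a hypothetical convenience for checking that none was left off.

```python
block_public_access = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject bucket policies that grant public access
    "RestrictPublicBuckets": True,  # limit access on buckets that already have public policies
}


def fully_blocked(config):
    """Return True only when all four Block Public Access settings are enabled."""
    required = (
        "BlockPublicAcls",
        "IgnorePublicAcls",
        "BlockPublicPolicy",
        "RestrictPublicBuckets",
    )
    return all(config.get(name) is True for name in required)


print(fully_blocked(block_public_access))  # True
```

For the scenario in this question, `BlockPublicPolicy` and `RestrictPublicBuckets` are the settings that neutralize an overly broad bucket policy.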

405

MCQeasy

An engineering team deploys a stateless web API on EC2 using an Auto Scaling group and an Application Load Balancer (ALB). During a recent test, they noticed that when one Availability Zone was unavailable, traffic failed until new instances were manually launched. Which change most directly improves automatic failover for the compute layer within a single Region?

A.Place the Auto Scaling group in only one subnet so instance launches are simpler.

B.Ensure the ALB and Auto Scaling group span multiple subnets in at least two Availability Zones.

C.Increase the target group deregistration delay to allow old instances to stay longer.

D.Use a Network Load Balancer, but keep all subnets in a single Availability Zone.

AnswerB

Spreading the ALB and Auto Scaling group across at least two AZs provides redundant capacity. If one AZ fails, the ALB continues routing to healthy targets in the other AZ.

Why this answer

Option B is correct because placing both the ALB and the Auto Scaling group across multiple subnets in at least two Availability Zones ensures that if one AZ becomes unavailable, the ALB can route traffic to healthy instances in the remaining AZs, and the Auto Scaling group can automatically launch replacement instances in the other AZs. This directly provides automatic failover for the compute layer within a single Region without manual intervention.

Exam trap

The trap here is that candidates may think a single-AZ setup with a load balancer is sufficient for high availability, but without multi-AZ subnets for both the ALB and Auto Scaling group, the architecture remains vulnerable to AZ failure and requires manual recovery.

How to eliminate wrong answers

Option A is wrong because placing the Auto Scaling group in only one subnet (single AZ) creates a single point of failure; if that AZ becomes unavailable, all instances are lost and traffic fails until new instances are manually launched in another AZ. Option C is wrong because increasing the target group deregistration delay only keeps old instances longer during a deregistration process, which does not help with failover when an entire AZ is unavailable; it delays traffic draining but does not provide automatic recovery from AZ failure. Option D is wrong because using a Network Load Balancer in a single AZ still creates a single point of failure; the NLB cannot route traffic to other AZs if the only AZ is down, and it does not improve automatic failover compared to an ALB spanning multiple AZs.

406

Multi-Selecthard

A private application in two private subnets must download objects from S3 and read parameters from Systems Manager Parameter Store without routing traffic through the public internet. Which two components should the architect use? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A.Interface VPC endpoint for Systems Manager

B.Internet gateway attached to the VPC

C.NAT gateway in each Availability Zone

D.Gateway VPC endpoint for Amazon S3

AnswersA, D

Systems Manager/Parameter Store access uses interface endpoints powered by AWS PrivateLink.

Why this answer

Interface VPC endpoints (AWS PrivateLink) for Systems Manager allow EC2 instances in private subnets to access Parameter Store without traversing the internet. Gateway VPC endpoints for S3 provide a highly available, managed route to S3 via the VPC route table, requiring no NAT or internet gateway. Both are AWS-native, managed services that meet the architecture review board's preference.

Exam trap

The trap here is that candidates often confuse gateway VPC endpoints (for S3 and DynamoDB) with interface endpoints (for most other AWS services), or incorrectly assume NAT gateways are required for all private subnet outbound traffic, when managed endpoints can bypass the internet entirely.

407

MCQeasy

An orders service consumes payment instructions from an Amazon SQS queue. Sometimes the consumer times out after applying the payment but before deleting the SQS message. As a result, the same payment instruction is processed again. Which design change most directly prevents duplicate side effects caused by message retries?

A.Delete the SQS message immediately after it is received, before processing, to ensure it is not retried.

B.Implement idempotency by recording a processed marker keyed by the instruction ID and ignoring duplicates.

C.Increase the SQS visibility timeout to a maximum value to avoid retries entirely.

D.Convert the queue to FIFO and enable content-based deduplication.

AnswerB

Idempotency ensures that repeated deliveries of the same instruction do not cause repeated side effects. By persisting a record keyed by instruction ID (or enforcing a unique constraint in a transactional store), the service can detect duplicates and safely skip or reconcile them even if SQS redelivers the message.

Why this answer

Option B is correct because implementing idempotency ensures that even if the same payment instruction is processed multiple times due to a timeout and retry, the side effect (e.g., applying the payment) occurs only once. By recording a processed marker keyed by the instruction ID (e.g., using a DynamoDB table or Redis), the consumer can check the marker before processing and ignore duplicates. This directly addresses the root cause—duplicate processing—without altering the queue's retry behavior.

Exam trap

The trap here is that candidates confuse message deduplication (dropping duplicate sends from a producer) with idempotent processing (preventing duplicate side effects), leading them to choose Option D, which does not handle the case where the same message is redelivered and processed twice because the consumer timed out before deleting it.

How to eliminate wrong answers

Option A is wrong because deleting the SQS message immediately after receipt, before processing, defeats the purpose of at-least-once delivery; if the consumer crashes after deletion but before processing, the payment instruction is lost permanently, leading to data loss. Option C is wrong because increasing the visibility timeout to a maximum value (e.g., 12 hours) does not prevent retries entirely; the message will still be redelivered if the consumer fails to delete it within the timeout, and a long timeout delays recovery of messages whose consumer genuinely failed. Option D is wrong because FIFO content-based deduplication only drops duplicate messages that a producer sends within the 5-minute deduplication interval, using a hash of the message body as the deduplication ID. It does not stop SQS from redelivering a message that was received but not deleted before the visibility timeout expired, so the payment can still be applied twice; it also requires moving to a FIFO queue, which may not suit the existing architecture.
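
The processed-marker approach from Option B can be sketched as below. This is a minimal in-memory illustration: the set stands in for a durable store, and the function and field names are hypothetical.

```python
processed_ids = set()   # stands in for a durable store such as a database table
ledger = []             # records each payment actually applied


def apply_payment(instruction):
    ledger.append(instruction["amount"])


def handle_message(instruction):
    """Apply a payment at most once per instruction ID, even if redelivered."""
    instruction_id = instruction["id"]
    if instruction_id in processed_ids:
        return "duplicate-skipped"
    apply_payment(instruction)
    processed_ids.add(instruction_id)
    return "applied"


message = {"id": "pay-42", "amount": 100}
print(handle_message(message))   # applied
print(handle_message(message))   # duplicate-skipped (redelivery after a timeout)
print(ledger)                    # [100]
```

In a real system the marker and the side effect should be written atomically, for example in one database transaction or with a conditional write, so that a crash between the two steps cannot reintroduce the duplicate.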


408

MCQmedium

Account B has an IAM role that includes kms:Decrypt for a specific KMS key ARN in account A. However, when the role tries to read an S3 object encrypted with that CMK, the application fails with AccessDenied: not authorized to perform kms:Decrypt. CloudTrail shows the KMS API call is denied by key policy. What is the most secure and correct fix?

A.Update the IAM role in account B to include kms:Encrypt and kms:GenerateDataKey; then kms:Decrypt will start working automatically.

B.Update the KMS key policy in account A to allow the account B role principal to use kms:Decrypt on the key.

C.Disable key policy for the CMK by switching to S3-managed encryption, because KMS key policies are always enforced regardless of grants.

D.Create an SCP in account A that allows kms:Decrypt for all accounts, avoiding changes to the key policy.

AnswerB

Cross-account use of a CMK requires the KMS key policy (in the CMK’s account) to allow the external principal to perform kms:Decrypt. Since CloudTrail shows the denial is by key policy, updating the key policy to grant the account B role kms:Decrypt on the specific key is the correct and least-privilege solution.

Why this answer

Option B is correct because cross-account access to a customer managed KMS key (CMK) requires the key policy to explicitly grant the external IAM role principal the necessary permissions (e.g., kms:Decrypt). Even if the IAM role in Account B has an IAM policy allowing kms:Decrypt, the KMS key policy in Account A acts as a resource-based policy that must also allow the action; without this, the request is denied by the key policy, as shown in CloudTrail.
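A key policy statement of the kind described above might look like the following sketch. The account ID and role name are hypothetical placeholders, and the statement would be added to the existing key policy in account A rather than replacing it.

```python
import json

# Hypothetical identifier for illustration only.
ACCOUNT_B_ROLE_ARN = "arn:aws:iam::222222222222:role/ReportReaderRole"

key_policy_statement = {
    "Sid": "AllowAccountBRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {"AWS": ACCOUNT_B_ROLE_ARN},
    "Action": "kms:Decrypt",
    # In a key policy, "*" refers to the key the policy is attached to.
    "Resource": "*",
}

print(json.dumps(key_policy_statement, indent=2))
```

Granting only `kms:Decrypt` to the one role keeps the change least-privilege; the IAM policy in account B still has to allow the same action on the key ARN.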

Exam trap

The trap here is that candidates often assume IAM permissions alone are sufficient for cross-account KMS operations, forgetting that KMS key policies are resource-based and must explicitly grant access to external principals.

How to eliminate wrong answers

Option A is wrong because adding kms:Encrypt and kms:GenerateDataKey to the IAM role does not resolve the key policy denial; the issue is the key policy in Account A, not the IAM permissions in Account B, and kms:Decrypt does not automatically work from other actions. Option C is wrong because disabling the CMK and switching to S3-managed encryption (SSE-S3) is not a secure fix for cross-account access; it removes customer control over encryption keys and does not address the need for cross-account KMS decryption. Option D is wrong because SCPs (Service Control Policies) are used to restrict permissions within an AWS organization, not to grant cross-account access; they cannot override a key policy denial, and creating an SCP that allows kms:Decrypt for all accounts would be insecure and ineffective.


409

MCQmedium

A public API is deployed in two AWS Regions: us-east-1 (primary) and us-west-2 (secondary). The team wants Route 53 to automatically route users to the secondary region if the primary API becomes unhealthy. They will use Route 53 health checks that monitor the API’s /status endpoint over HTTPS. Which Route 53 configuration most directly implements this failover behavior?

A.Create two latency-based alias records for the same name, each with different health checks; Route 53 will automatically shift to the secondary when primary is unhealthy.

B.Create a primary alias record and a failover alias record (secondary), configure failover routing policy, and attach health checks to both records.

C.Use geolocation routing with a health check; when the primary is unhealthy, Route 53 will automatically change the region mapping globally.

D.Use simple routing with weighted records and a low health check threshold so traffic quickly moves to the secondary region.

AnswerB

Route 53 failover routing (primary/secondary) is designed for active-passive regional DR. When the primary health check fails, Route 53 automatically stops returning the primary alias and returns the secondary alias target; attaching health checks ensures the change is driven by the /status endpoint health.

Why this answer

B is correct because the failover routing policy in Route 53 is specifically designed for active-passive failover. By creating a primary alias record and a secondary failover alias record, each with an associated health check, Route 53 will automatically route traffic to the secondary region when the health check for the primary fails. This directly implements the required behavior without relying on latency or geographic proximity.
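As a rough sketch, the two failover records could be expressed as a change batch in the shape that the Route 53 `ChangeResourceRecordSets` API accepts. The domain name, alias targets, hosted zone IDs, and health check IDs below are hypothetical placeholders; the code only builds the request body and does not call AWS.

```python
def failover_record(role, alias_dns_name, alias_zone_id, health_check_id):
    """Build one UPSERT change for a failover alias record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "api.example.com.",          # hypothetical record name
            "Type": "A",
            "SetIdentifier": f"api-{role.lower()}",
            "Failover": role,                    # selects failover routing
            "HealthCheckId": health_check_id,    # checks /status over HTTPS
            "AliasTarget": {
                "HostedZoneId": alias_zone_id,
                "DNSName": alias_dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


change_batch = {
    "Comment": "Active-passive failover between us-east-1 and us-west-2",
    "Changes": [
        failover_record("PRIMARY", "primary-api.example.net.",
                        "ZPRIMARYEXAMPLE", "hc-primary-placeholder"),
        failover_record("SECONDARY", "secondary-api.example.net.",
                        "ZSECONDARYEXAMPLE", "hc-secondary-placeholder"),
    ],
}
print([c["ResourceRecordSet"]["Failover"] for c in change_batch["Changes"]])
```

Both records share the same name and type and differ only in `SetIdentifier`, `Failover` role, target, and health check, which is what lets Route 53 treat them as a primary/secondary pair.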

Exam trap

The trap here is that candidates often confuse failover routing with latency-based or geolocation routing, assuming that health checks automatically trigger failover in those policies, but only failover routing provides the explicit active-passive failover behavior described in the question.

How to eliminate wrong answers

Option A is wrong because latency-based routing is an active-active policy: Route 53 answers with the healthy record that gives the lowest latency for each user, so users closer to us-west-2 are sent to the secondary Region even while the primary is healthy. Health checks remove an unhealthy record from consideration, but the policy has no notion of primary and secondary. Option C is wrong because geolocation routing maps users to records by their geographic location, not by endpoint health; if the record for a location is unhealthy, Route 53 falls back to a record for a larger geographic area or to the default record if one exists, and otherwise returns no answer. It does not remap all users to another Region. Option D is wrong because simple routing and weighted routing are different policies, and weighted routing distributes traffic across records according to their weights, so with non-zero weights the secondary receives traffic even while the primary is healthy. Health checks can take an unhealthy weighted record out of rotation, but that is not the primary/secondary behavior the question asks for.


410

Multi-Selectmedium

A development team stores application logs in CloudWatch Logs and has enabled detailed EC2 monitoring on every instance. Auditors only require 90 days of logs, and operations only needs 5-minute instance metrics. Which three changes would most directly reduce recurring monitoring costs while still meeting those requirements? Select three.

Select 3 answers

A.Set CloudWatch Logs retention to 90 days.

B.Use standard EC2 monitoring instead of detailed monitoring where 1-minute metrics are unnecessary.

C.Export older logs to Amazon S3 and use an S3 lifecycle policy for long-term archive.

D.Keep CloudWatch Logs retention set to Never Expire.

E.Enable high-resolution custom metrics for every request.

AnswersA, B, C

Applying a retention policy prevents CloudWatch Logs from storing data indefinitely. Because the auditors only require 90 days, this removes unnecessary log-storage charges beyond the compliance window without losing required history.

Why this answer

Option A is correct because setting CloudWatch Logs retention to 90 days ensures logs are automatically deleted after the required period, eliminating storage costs for older data. By default, logs never expire, which would incur ongoing charges for data beyond the audit requirement. This directly reduces recurring costs by limiting the log data stored.
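A small helper along these lines could apply the 90-day retention to a set of log groups. It assumes a client object exposing `put_retention_policy(logGroupName=..., retentionInDays=...)`, as the boto3 CloudWatch Logs client does; the recording stub and the log group names are hypothetical and exist only so the sketch runs without AWS credentials.

```python
def apply_retention(logs_client, log_group_names, days=90):
    """Set the same retention period on every listed log group."""
    for name in log_group_names:
        logs_client.put_retention_policy(logGroupName=name, retentionInDays=days)
    return len(log_group_names)


class RecordingLogsClient:
    """Stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_retention_policy(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingLogsClient()
updated = apply_retention(client, ["/app/web", "/app/worker"])
print(updated, client.calls)
```

With a real client, the stub would be replaced by `boto3.client("logs")`.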

Exam trap

The trap here is that candidates might think 'Never Expire' is safe for compliance, but it actually increases costs by retaining data indefinitely, while the requirement is only 90 days; also, high-resolution metrics sound useful but are an unnecessary expense when only 5-minute metrics are needed.


411

Multi-Selecthard

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The team wants the control to be enforceable during normal operations.

Select 2 answers

A.Route 53 failover routing with health checks

B.S3 Transfer Acceleration

C.A deployed standby application stack in the secondary Region

D.AWS Organizations service control policies

AnswersA, C

Route 53 can monitor endpoint health and return the standby endpoint when the primary is unhealthy.

Why this answer

Route 53 failover routing with health checks is correct because it enables automatic DNS-level failover to a secondary Region when the primary endpoint is unhealthy. Route 53 health checks monitor the primary endpoint, and when they report a failure, Route 53 answers DNS queries with the secondary Region's record instead of the primary's. Because the routing policy and health checks are in place at all times, the control is enforced during normal operations rather than only after an incident.

Exam trap

The trap here is that candidates may think DNS-level failover alone is sufficient, forgetting that a fully deployed standby stack in the secondary Region is required to actually serve traffic after failover.


412

Multi-Selectmedium

A production Amazon Aurora MySQL database is corrupted by a bad migration at 10:30 UTC, and the problem is discovered at 10:45 UTC. The team wants to recover to the state just before the migration with minimal manual effort. Which two actions should they take? Select two.

Select 2 answers

A.Restore only the affected table from the latest snapshot and keep the current cluster online.

B.Perform a point-in-time restore to a new DB cluster or instance using automated backups.

C.Reboot the writer so Aurora automatically rolls back the bad migration.

D.Validate the restored database, then repoint the application or DNS name to the restored endpoint.

E.Promote a read replica from the same cluster without restoring from backup.

AnswersB, D

Point-in-time restore is the supported mechanism for recovering to a specific timestamp before the corruption occurred. It uses automated backups and transaction logs to recreate a clean copy of the database state.

Why this answer

Option B is correct because Amazon Aurora supports point-in-time recovery (PITR) to any point within the backup retention window, allowing you to restore the database to a state just before the migration (e.g., 10:29 UTC). This uses automated backups and requires minimal manual effort, as you simply specify the target time and a new DB cluster is created.
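The restore request could be assembled roughly as below. The parameter names follow the RDS `RestoreDBClusterToPointInTime` API; the cluster identifiers and the calendar date are hypothetical (the question gives only the time of day), and the sketch builds the arguments without calling AWS.

```python
from datetime import datetime, timezone


def pitr_request(source_cluster, restored_cluster, restore_to):
    """Build arguments for a point-in-time restore into a new cluster."""
    if restore_to.tzinfo is None:
        raise ValueError("restore_to must be timezone-aware (UTC expected)")
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": restored_cluster,  # name of the NEW cluster
        "RestoreToTime": restore_to,
    }


# One minute before the bad migration at 10:30 UTC; the date is an assumption.
target = datetime(2024, 1, 15, 10, 29, tzinfo=timezone.utc)
request = pitr_request("prod-aurora", "prod-aurora-restored", target)
print(request)
```

With boto3 this dictionary would be passed as `rds.restore_db_cluster_to_point_in_time(**request)`. The restored Aurora cluster then needs at least one DB instance added before the application can be repointed to it.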

Exam trap

The trap here is that candidates may think rebooting or promoting a replica can undo data changes, but these actions do not affect committed transactions; only a point-in-time restore can recover to a pre-migration state.


413

MCQmedium

Based on the exhibit, what is the most appropriate change to restore application access while keeping encryption at rest with customer-managed KMS controls?

A.Change the bucket to SSE-S3 so the application no longer depends on KMS permissions.

B.Update the KMS key policy or add a grant so AppServerRole can use the key for decrypt and data key operations.

C.Move the EC2 instance into the same Availability Zone as the S3 bucket to reduce encryption errors.

D.Attach AmazonS3FullAccess to the application role so S3 can bypass KMS authorization.

AnswerB

For SSE-KMS objects, the caller needs permission to use the KMS key as well as S3 permissions. The role already has S3 access, but KMS is denying Decrypt because the key policy does not allow the role. Adding the role through the key policy or a grant, together with the needed KMS actions, resolves the failure while preserving customer-managed encryption.

Why this answer

The application is failing because AppServerRole lacks the necessary permissions to use the customer-managed KMS key for decrypting S3 objects. By updating the KMS key policy or adding a grant to allow the role to perform `kms:Decrypt` and `kms:GenerateDataKey` operations, you restore access while maintaining encryption at rest with customer-managed KMS controls.

Exam trap

The trap here is that candidates often assume S3 bucket policies alone control access to encrypted objects, forgetting that SSE-KMS requires separate KMS key permissions that must be explicitly granted to the IAM role or user.

How to eliminate wrong answers

Option A is wrong because switching to SSE-S3 removes customer-managed KMS controls, violating the requirement to keep encryption at rest with customer-managed KMS. Option C is wrong because S3 buckets are Regional resources that are not tied to a single Availability Zone, and the failure is an authorization error that has nothing to do with AZ placement. Option D is wrong because attaching AmazonS3FullAccess does not grant KMS permissions; reading SSE-KMS objects still requires KMS authorization, which S3 permissions cannot bypass.


414

MCQmedium

A startup runs a mix of workloads using both EC2 instances and AWS Lambda functions. Over the next 12 months, the team expects the overall level of compute usage to be fairly steady, but they may change EC2 instance types for performance tuning and they may add or remove Lambda functions. They want the lowest-cost commitment that will discount *both* EC2 and Lambda usage without requiring them to commit to a specific EC2 instance family (or a fixed instance type). Which AWS option best meets this requirement?

A.Purchase an EC2 Instance Reserved Instance for the current instance family only, and rely on On-Demand pricing for Lambda and any future EC2 types.

B.Purchase a Compute Savings Plan with a 1-year term for the expected average hourly spend across both EC2 and Lambda.

C.Purchase a Standard Reserved Instance for a specific Region and use it only to reduce network egress/transfer charges.

D.Use EC2 Spot for EC2 workloads and keep Lambda on-demand, because Spot will automatically discount Lambda too.

AnswerB

Compute Savings Plans provide discounts on eligible compute usage for EC2 and AWS Lambda (among other services) based on a committed $/hour amount. They do not require committing to a specific instance family/type, and they allow the team to change EC2 instance types and adjust which Lambda functions run while still receiving the discount.

Why this answer

B is correct because a Compute Savings Plan is the lowest-cost commitment that covers both EC2 and Lambda usage, with the flexibility to change instance family, size, operating system, tenancy, and Region. It also applies to AWS Fargate. The discounted rates apply to eligible compute usage up to the committed hourly amount, regardless of instance type or compute service, which meets the startup's need for a steady level of usage with room for changes.

Exam trap

The trap here is that candidates often confuse Compute Savings Plans with EC2 Instance Savings Plans or Reserved Instances, assuming any 'Savings Plan' requires a specific instance family, when in fact Compute Savings Plans offer the broadest flexibility across EC2, Lambda, and Fargate.

How to eliminate wrong answers

Option A is wrong because purchasing an EC2 Instance Reserved Instance locks the discount to a specific instance family and size, and does not cover Lambda usage, forcing the startup to pay On-Demand for Lambda and any new EC2 types. Option C is wrong because Standard Reserved Instances are for EC2 compute capacity, not network egress/transfer charges, and they still require a specific instance family commitment. Option D is wrong because EC2 Spot Instances provide discounts only for EC2, not Lambda, and they are interruptible, making them unsuitable for steady workloads; they also do not offer a committed discount across both services.


415

MCQeasy

A media company uses CloudFront in front of an S3 bucket origin for video thumbnails. They want to prevent users from bypassing CloudFront and accessing the S3 bucket directly, while still allowing CloudFront to fetch objects. What is the best option?

A.Keep the bucket public and rely on signed cookies for all thumbnail requests.

B.Use CloudFront Origin Access Control (OAC) or Origin Access Identity (OAI) and update the bucket policy to allow only CloudFront.

C.Enable S3 static website hosting so users access thumbnails directly from the S3 website endpoint.

D.Set S3 bucket permissions to allow all IAM users and block access only by using a WAF rule at CloudFront.

AnswerB

OAC/OAI ensures only CloudFront can access the bucket while keeping the bucket private.

Why this answer

CloudFront Origin Access Control (OAC) or Origin Access Identity (OAI) lets you keep the S3 bucket private and restrict reads to CloudFront. With OAC, the bucket policy grants read permission to the CloudFront service principal, conditioned on the distribution's ARN; with the legacy OAI, it grants read permission to the OAI's identity. Users can then retrieve thumbnails only through CloudFront, with its caching and security features, while direct S3 requests are denied.
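For the OAC case, a bucket policy of roughly this shape is typical. The bucket name, account ID, and distribution ID are hypothetical placeholders; the sketch only builds and prints the policy document.

```python
import json


def oac_bucket_policy(bucket, account_id, distribution_id):
    """Allow s3:GetObject only to CloudFront acting for one distribution."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {"AWS:SourceArn": distribution_arn}
                },
            }
        ],
    }


policy = oac_bucket_policy("thumbnails-example", "111111111111", "EDFDVBD6EXAMPLE")
print(json.dumps(policy, indent=2))
```

Because the bucket stays private (Block Public Access on, no public grants), any request that does not come from the named distribution has no matching allow and is denied.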

Exam trap

The trap here is that candidates often think signed cookies or URLs alone are sufficient to secure direct S3 access, but they forget that those mechanisms only control access through CloudFront and do not restrict the S3 bucket's public endpoint unless the bucket policy explicitly denies direct access.

How to eliminate wrong answers

Option A is wrong because making the bucket public and relying on signed cookies does not prevent users from bypassing CloudFront and accessing the S3 bucket directly via its public URL; signed cookies only control access through CloudFront, not direct S3 access. Option C is wrong because enabling S3 static website hosting exposes a separate website endpoint that users could access directly, defeating the purpose of restricting access to CloudFront. Option D is wrong because setting S3 bucket permissions to allow all IAM users does not block direct access; WAF rules at CloudFront only filter traffic reaching CloudFront, not requests made directly to the S3 bucket endpoint.


416

MCQmedium

A payments platform requires disaster recovery across Regions. Requirements: RPO of 15 minutes and RTO of about 1 hour. The business cannot afford full duplicate capacity in both Regions all the time, but the team wants automated readiness so failover is mostly operationally guided rather than a slow rebuild. Which DR strategy is the best fit?

A.Backup and restore only, relying on scheduled snapshots and manual restores during incidents.

B.Pilot light, keeping only minimal infrastructure in the secondary Region and starting full services after failover.

C.Warm standby, keeping core infrastructure and a partially provisioned environment ready in the secondary Region with frequent data replication.

D.Active/active, routing production traffic to both Regions continuously and accepting dual-region complexity.

AnswerC

Warm standby balances cost and readiness by keeping enough capacity and services running to shorten recovery time while meeting RPO needs.

Why this answer

Warm standby is the best fit because it maintains a partially provisioned environment in the secondary Region with core infrastructure (e.g., a smaller EC2 instance fleet, a replicated database) and uses frequent data replication (e.g., Amazon RDS cross-Region replication or DynamoDB global tables) to achieve an RPO of 15 minutes. The RTO of about 1 hour is achievable by scaling up the standby environment and redirecting traffic, which is faster than a full rebuild but avoids the cost of full duplicate capacity. This balances the business constraint of not affording active/active with the need for automated readiness and guided failover.

Exam trap

The trap here is that candidates often confuse pilot light with warm standby, assuming minimal infrastructure is sufficient for a 1-hour RTO, but pilot light requires provisioning compute resources after failover, which adds significant time, whereas warm standby already has compute running and only needs scaling.

How to eliminate wrong answers

Option A is wrong because backup and restore only relies on scheduled snapshots (e.g., EBS snapshots or RDS automated backups) and manual restores, which typically cannot achieve an RPO of 15 minutes (snapshots are often taken every few hours) and would result in an RTO far exceeding 1 hour due to manual intervention and data restoration time. Option B is wrong because pilot light keeps only minimal infrastructure (e.g., a small database replica and no application servers) in the secondary Region, and starting full services after failover requires provisioning compute resources, which would likely exceed the 1-hour RTO target. Option D is wrong because active/active requires full duplicate capacity in both Regions all the time, which contradicts the business constraint that they cannot afford this, and it introduces dual-region complexity that is unnecessary for the stated RPO/RTO goals.


417

MCQmedium

An S3 bucket stores user-uploaded media. Most objects are never read again, but compliance requires keeping them for at least 18 months. Retrieval is rare and typically only needed during investigations. The current design keeps everything in S3 Standard, increasing storage cost. Which configuration best optimizes cost while meeting the retention and rare-access requirements?

A.Move all objects to S3 Glacier Instant Retrieval immediately upon upload and disable lifecycle policies.

B.Use an S3 lifecycle policy to transition objects to S3 Glacier Deep Archive after 30 days, and expire them after 18 months.

C.Keep objects in S3 Standard but compress them with a custom process to reduce storage size.

D.Enable S3 Intelligent-Tiering for all objects and delete any object not accessed within 24 hours.

AnswerB

Lifecycle policies can automatically move data to lower-cost storage classes after it becomes infrequently accessed. Because reads are rare and required only during investigations, Glacier Deep Archive is a strong cost-optimization choice. Setting expiration after 18 months ensures compliance retention is met.

Why this answer

Option B is correct because S3 Glacier Deep Archive offers the lowest storage cost for data that is rarely accessed and can tolerate slow retrieval (standard retrievals complete within 12 hours). The lifecycle policy transitions objects after 30 days to minimize time spent in S3 Standard, and expiration after 18 months deletes objects automatically once the retention period has passed, avoiding manual cleanup.
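The lifecycle rule described above could be written as the following configuration, in the shape accepted by the S3 `PutBucketLifecycleConfiguration` API. The rule ID is hypothetical, and 548 days is an assumed approximation of 18 months; a stricter reading of "at least 18 months" might round up further.

```python
# 18 months expressed in days; an assumption (1.5 years of 365 days, rounded up).
RETENTION_DAYS = 548

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-media",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},            # apply to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "DEEP_ARCHIVE"}
            ],
            "Expiration": {"Days": RETENTION_DAYS},
        }
    ]
}

rule = lifecycle_configuration["Rules"][0]
print(rule["Transitions"][0], rule["Expiration"])
```

Both day counts are measured from object creation, so an object spends 30 days in S3 Standard and the remainder of the retention period in Deep Archive.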

Exam trap

The trap here is that candidates often choose S3 Glacier Instant Retrieval (Option A) thinking it balances cost and retrieval speed, but they overlook that Deep Archive is far cheaper for data that is almost never accessed, and that disabling lifecycle policies removes the ability to automate retention management.

How to eliminate wrong answers

Option A is wrong because moving all objects to S3 Glacier Instant Retrieval immediately increases cost unnecessarily (Instant Retrieval is more expensive than Deep Archive for data that is rarely accessed) and disabling lifecycle policies prevents automated cost optimization. Option C is wrong because compressing objects in S3 Standard does not reduce the storage cost enough to match the savings of transitioning to a cold storage class, and it adds custom processing overhead without addressing the rare-access requirement. Option D is wrong because S3 Intelligent-Tiering is designed for unpredictable access patterns, not for data that is almost never read, and deleting objects not accessed within 24 hours violates the 18-month compliance retention requirement.


418

MCQeasy

Your application uses ElastiCache Redis as a cache for user profiles stored in DynamoDB. You must ensure that when a profile is updated, subsequent reads see the latest value quickly. Which cache strategy is generally the best fit for this requirement?

A.Write to DynamoDB only, and never update or invalidate the Redis cache.

B.Use a cache-aside approach with TTL plus explicit invalidation after writes.

C.Cache only for reads, and do not fetch from DynamoDB when a key is missing.

D.Rely on eventual consistency of Redis replication to propagate updates to all nodes.

AnswerB

A cache-aside (lazy loading) pattern reads from cache first; if missing/expired, it fetches from the source of truth. After an update, explicitly invalidating or updating the cached entry ensures subsequent reads quickly reflect changes. TTL provides protection against missed invalidations while invalidation accelerates correctness after writes.

Why this answer

Option B is correct because a cache-aside (lazy loading) strategy with TTL and explicit invalidation ensures that after a write to DynamoDB, the stale Redis entry is removed, forcing the next read to fetch the fresh profile from DynamoDB and repopulate the cache. This combination minimizes the window of stale reads while maintaining high read performance, which is critical for user profile caches where consistency matters.
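The pattern can be sketched with plain dictionaries standing in for Redis and DynamoDB. The class name `ProfileCache`, the TTL value, and the sample profile are hypothetical; only the read-through, TTL, and invalidate-on-write logic is the point.

```python
import time


class ProfileCache:
    """Cache-aside with TTL and explicit invalidation after writes."""

    def __init__(self, database, ttl_seconds=300.0, clock=time.monotonic):
        self._db = database          # stand-in for the DynamoDB table
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # stand-in for Redis: id -> (expiry, value)

    def get(self, user_id):
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                       # fresh cache hit
        profile = self._db[user_id]               # miss or expired: read source
        self._entries[user_id] = (self._clock() + self._ttl, profile)
        return profile

    def update(self, user_id, profile):
        self._db[user_id] = profile               # write the source of truth
        self._entries.pop(user_id, None)          # then invalidate the cache


db = {"u1": {"name": "Ada"}}
cache = ProfileCache(db)
first = cache.get("u1")
cache.update("u1", {"name": "Ada L."})
second = cache.get("u1")
print(first, second)
```

The TTL acts as a safety net: if an invalidation is ever missed, the stale entry still ages out after `ttl_seconds`.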

Exam trap

The trap here is that candidates often confuse eventual consistency within Redis replication (which only applies to Redis-to-Redis sync) with the need to synchronize the cache with the authoritative data store (DynamoDB), leading them to pick option D, which does not address the core requirement of reflecting DynamoDB updates in the cache.

How to eliminate wrong answers

Option A is wrong because never updating or invalidating the Redis cache means stale data persists indefinitely, violating the requirement that subsequent reads see the latest value quickly. Option C is wrong because caching only for reads and not fetching from DynamoDB when a key is missing would result in cache misses returning no data, effectively breaking the application's ability to serve user profiles. Option D is wrong because relying on eventual consistency of Redis replication does not guarantee that updates to DynamoDB are reflected in the cache; Redis replication only synchronizes data between Redis nodes, not between DynamoDB and Redis, and does not address cache invalidation after writes.


419

Multi-Selecthard

A financial reporting platform uses CloudFront in front of an S3 origin. Which two settings help keep users from bypassing CloudFront and accessing the bucket directly? The design must avoid adding custom operational scripts.

Select 2 answers

A.Use an S3 bucket policy that allows access only from the CloudFront distribution

B.Enable CloudFront standard logging

C.Enable S3 static website hosting

D.Configure Origin Access Control for the S3 origin

AnswersA, D

The bucket policy should trust the CloudFront distribution and deny direct public access.

Why this answer

Option A is correct because an S3 bucket policy that allows access only to the CloudFront distribution, through its origin access control (OAC) or legacy origin access identity (OAI), leaves no permission for direct requests to the S3 bucket endpoint, so they are rejected. With OAC, the policy uses the aws:SourceArn condition key to restrict access to the specific distribution, preventing users from bypassing the CDN and hitting the bucket directly.

Exam trap

The trap here is that candidates often confuse logging or static website hosting with security controls, not realizing that only explicit bucket policies or OAC/OAI configurations actually enforce the restriction to CloudFront traffic.


420

Multi-Selecteasy

A batch processing job can be interrupted and restarted from checkpoints. The business wants to lower compute cost while still keeping the workload resilient to interruptions. Which two choices are best? Select two.

Select 2 answers

A.Run the workload on Amazon EC2 Spot Instances.

B.Store checkpoints in durable storage such as Amazon S3.

C.Use a single On-Demand instance in one Availability Zone only.

D.Disable automatic replacement so the job is never restarted.

E.Keep all intermediate state only in instance memory.

AnswersA, B

Spot Instances are significantly cheaper than On-Demand capacity and are a good fit when the workload can tolerate interruption. Because the job can restart from checkpoints, interruptions are acceptable.

Why this answer

Amazon EC2 Spot Instances offer significant cost savings (up to 90% compared to On-Demand) and are ideal for fault-tolerant, stateless batch processing jobs that can be interrupted and restarted from checkpoints. The workload's ability to resume from checkpoints makes it resilient to Spot Instance interruptions, aligning perfectly with the business goal of lowering compute costs while maintaining resilience.
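Resuming from a checkpoint can be illustrated with a short sketch. A local JSON file stands in for durable storage such as Amazon S3, the `stop_after` parameter simulates a Spot interruption, and the doubling step is a placeholder for real work; all names here are hypothetical.

```python
import json
import os
import tempfile


def run_batch(items, checkpoint_path, stop_after=None):
    """Process items in order, resuming from the index stored in the checkpoint."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_index"]

    processed = []
    for index in range(start, len(items)):
        if stop_after is not None and len(processed) >= stop_after:
            break  # simulated interruption
        processed.append(items[index] * 2)        # placeholder unit of work
        with open(checkpoint_path, "w") as fh:    # persist progress each step
            json.dump({"next_index": index + 1}, fh)
    return processed


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    first_run = run_batch([1, 2, 3, 4, 5], path, stop_after=2)
    second_run = run_batch([1, 2, 3, 4, 5], path)

print(first_run, second_run)
```

The second run picks up at the third item, so no work is repeated after the interruption. Keeping the checkpoint outside the instance is what makes this survive a reclaimed Spot Instance.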

Exam trap

The trap here is that candidates may overlook the synergy between Spot Instances and checkpointing, mistakenly thinking that any cost-saving measure (like a single On-Demand instance) suffices, or that resilience can be achieved without durable external storage for state.


421

MCQeasy

An ECS service runs on EC2 capacity. During peak traffic, tasks frequently wait for available container instances. The team wants faster scale-out for the underlying EC2 capacity when tasks increase. What is the best first architectural step?

A.Tune the container health check settings so tasks stop failing and stay running.

B.Use an ECS capacity provider (or Auto Scaling integration) to scale the EC2 instances based on ECS demand.

C.Pin all tasks to a single Availability Zone to reduce placement overhead.

D.Switch the tasks to run only on Fargate so EC2 scaling is no longer relevant.

AnswerB

When ECS tasks need compute, capacity must scale at the EC2 layer so there are enough container instances to place tasks. Integrating ECS with an Auto Scaling capacity provider allows the cluster to scale out in response to pending tasks. This reduces waiting time and improves responsiveness under load.

Why this answer

Option B is correct because an ECS capacity provider (or Auto Scaling integration) directly links ECS task-level demand to EC2 instance scaling. When tasks are pending due to insufficient container instances, the capacity provider triggers a scale-out event on the Auto Scaling group, adding EC2 instances to accommodate the workload. This is the most efficient architectural step to reduce placement delays during peak traffic.
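A capacity provider definition might be built along these lines, using the parameter names of the ECS `CreateCapacityProvider` API. The provider name, Auto Scaling group ARN, and the chosen scaling values are hypothetical; the sketch builds the arguments only and does not call AWS.

```python
def capacity_provider_request(name, asg_arn, target_capacity=100):
    """Build arguments for an ECS capacity provider with managed scaling."""
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            "managedScaling": {
                "status": "ENABLED",              # let ECS drive ASG scaling
                "targetCapacity": target_capacity,
            },
            "managedTerminationProtection": "DISABLED",
        },
    }


request = capacity_provider_request(
    "web-ec2-capacity",  # hypothetical provider name
    "arn:aws:autoscaling:us-east-1:111111111111:autoScalingGroup:example",
)
print(request["autoScalingGroupProvider"]["managedScaling"])
```

With boto3 this would be passed as `ecs.create_capacity_provider(**request)` and the provider then attached to the cluster. A target capacity below 100 keeps spare headroom in the group, which can shorten the wait for pending tasks at the cost of some idle instances.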

Exam trap

The trap here is that candidates may confuse task-level scaling (e.g., Service Auto Scaling) with infrastructure-level scaling, and incorrectly assume that tuning health checks or placement strategies will resolve a capacity shortage caused by insufficient EC2 instances.

How to eliminate wrong answers

Option A is wrong because tuning container health check settings addresses task failures, not the underlying shortage of EC2 container instances; tasks waiting for available instances is a capacity issue, not a health check issue. Option C is wrong because pinning all tasks to a single Availability Zone increases risk of failure and does not solve the capacity shortage; placement overhead is negligible compared to the lack of instances. Option D is wrong because switching to Fargate is a migration, not an architectural step to improve EC2 scaling; it avoids the EC2 scaling problem rather than solving it, and may not be feasible or cost-effective for all workloads.


422

Multi-Selectmedium

A SaaS application is deployed in us-east-1 and us-west-2 behind separate ALBs. The business wants DNS to send new clients to the primary Region when it is healthy and automatically fail over to the secondary Region when the primary endpoint is unhealthy. Which two Route 53 settings are required? Select two.

Select 2 answers

A.Use a failover routing policy with a primary and secondary record.

B.Create a health check and associate it with the primary endpoint.

C.Use weighted routing with a 50/50 traffic split between both Regions.

D.Use latency-based routing so clients always choose the fastest Region.

E.Use a geolocation policy without health checks.

AnswersA, B

Failover routing is designed specifically to send traffic to a secondary endpoint when the primary becomes unhealthy.

Why this answer

Option A is correct because a failover routing policy lets you designate one record as primary and another as secondary. Route 53 routes traffic to the primary record as long as it is healthy and automatically fails over to the secondary record when the primary becomes unhealthy. Option B is correct because Route 53 relies on a health check associated with the primary record to detect that the primary endpoint is unhealthy and trigger the failover. Together, these settings send new clients to the primary Region when it is healthy and fail over automatically.
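As a rough sketch of how the two settings fit together, the Python below builds the pair of record sets one might submit to Route 53's `ChangeResourceRecordSets` API. It only constructs the request body and makes no AWS call. The domain name, ALB DNS names, hosted zone IDs, and health check ID are placeholders, not values from the question.

```python
def failover_record(name, role, alb_dns, alb_zone_id, health_check_id=None):
    """Build one alias record set for a Route 53 failover pair."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-region",
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,
            "DNSName": alb_dns,
            # Health is decided by the explicit health check in this sketch.
            "EvaluateTargetHealth": False,
        },
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return record


# All identifiers below are hypothetical placeholders.
primary = failover_record(
    "app.example.com.", "PRIMARY",
    "primary-alb.us-east-1.elb.amazonaws.com.", "ZPRIMARYPLACEHOLDER",
    health_check_id="hc-primary-placeholder",
)
secondary = failover_record(
    "app.example.com.", "SECONDARY",
    "secondary-alb.us-west-2.elb.amazonaws.com.", "ZSECONDARYPLACEHOLDER",
)

change_batch = {
    "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": record}
        for record in (primary, secondary)
    ]
}
```

Both records share one name and type; only the `Failover` role and the health check on the primary differ, which mirrors answers A and B.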

Exam trap

The trap here is that candidates often confuse failover routing with weighted or latency-based routing, assuming any multi-region setup with health checks will automatically fail over, but only failover routing provides the explicit primary/secondary failover behavior required.

423

MCQ | easy

Based on the exhibit, the database must continue serving if the current Availability Zone fails. What should you change?

A. Create a read replica in another Availability Zone and promote it manually if needed.

B. Modify the DB instance to use a Multi-AZ deployment.

C. Increase the automated backup retention period to 30 days.

D. Resize the DB instance to a larger class.

Answer: B

A Multi-AZ RDS deployment provides synchronous standby replication in another Availability Zone and automatic failover if the primary AZ becomes unavailable. This directly matches the requirement to keep the database serving after an AZ failure. It is the simplest resilient design change when the application needs high availability rather than just backups.

Why this answer

Option B is correct because Multi-AZ deployment automatically provisions and maintains a synchronous standby replica in a different Availability Zone. If the primary AZ fails, Amazon RDS automatically fails over to the standby, ensuring database availability without manual intervention. This meets the requirement of continuing service during an AZ failure.

Exam trap

The trap here is confusing read replicas (which are for read scaling and asynchronous replication) with Multi-AZ (which is for high availability and synchronous replication), leading candidates to choose Option A for failover scenarios.

How to eliminate wrong answers

Option A is wrong because a read replica is asynchronous and not designed for automatic failover; promoting it manually introduces downtime and data loss risk, which does not satisfy the requirement for continued service without manual action. Option C is wrong because increasing the backup retention period only affects point-in-time recovery duration, not high availability or failover capability. Option D is wrong because resizing the instance class improves performance but does not provide any redundancy or failover across Availability Zones.

424

MCQ | hard

An EC2 instance in a private subnet must access an S3 bucket that contains regulated exports for an e-learning platform. The security team requires access to be allowed only when traffic comes through a specific VPC endpoint. What should the architect add to the bucket policy?

A. A security group rule that allows HTTPS to S3

B. A condition that matches aws:RequestedRegion to the bucket Region

C. A deny statement for all IAM users except the EC2 role

D. A condition that matches aws:sourceVpce to the endpoint ID

Answer: D

The aws:sourceVpce condition restricts S3 access to requests that arrive through the specified VPC endpoint.

Why this answer

Option D is correct because the bucket policy can use the `aws:sourceVpce` condition key to restrict access to requests that arrive through a specific VPC endpoint (gateway or interface). Only traffic routed through that endpoint can then access the S3 bucket, which meets the security team's requirement for the regulated exports.
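A minimal sketch of such a policy, written as Python that assembles the JSON document: it denies every S3 action on the bucket unless the request arrived through the named endpoint. The bucket name and endpoint ID are hypothetical, and a production policy would usually need further statements.

```python
import json


def vpce_only_policy(bucket: str, vpce_id: str) -> dict:
    """Bucket policy that denies requests not arriving via `vpce_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromSpecificVpce",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Condition keys are case-insensitive; aws:SourceVpce and
                # aws:sourceVpce refer to the same key.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder names for illustration only.
policy = vpce_only_policy("regulated-exports-example", "vpce-0123456789abcdef0")
policy_json = json.dumps(policy, indent=2)
```

Because the statement is a `Deny` with `StringNotEquals`, it blocks every other network path, even for principals that other policies allow.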

Exam trap

The trap here is that candidates often confuse `aws:sourceVpce` with `aws:SourceIp` or `aws:sourceVpc`, thinking a VPC or IP-based condition is sufficient, but only the specific VPC endpoint ID ensures the traffic used the designated endpoint.

How to eliminate wrong answers

Option A is wrong because security group rules control network traffic at the instance level, not the bucket policy level, and cannot enforce that traffic must come through a specific VPC endpoint. Option B is wrong because `aws:RequestedRegion` checks the AWS Region in the request, not the VPC endpoint, and does not restrict traffic to a specific endpoint. Option C is wrong because denying all IAM users except the EC2 role does not enforce the VPC endpoint requirement; it only controls identity-based access, not the network path.

425

MCQ | easy

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most?

A. CloudFront caching with appropriate TTLs

B. AWS Backup Vault Lock

C. IAM Access Analyzer

D. S3 Select

Answer: A

CloudFront can serve cached content from edge locations when the origin is temporarily unavailable.

Why this answer

CloudFront caches responses from the S3 origin based on configured TTLs (Cache-Control or Expires headers). If the S3 origin experiences a short outage, CloudFront can still serve cached pages to users from its edge locations, maintaining availability. This is the most direct way to ensure users receive content during origin failures.

Exam trap

The trap here is that candidates may confuse CloudFront's caching with other AWS services like S3 Transfer Acceleration or S3 Cross-Region Replication, which do not provide cached responses during origin outages.

How to eliminate wrong answers

Option B (AWS Backup Vault Lock) is wrong because it is a data protection feature for backups, enforcing retention policies and preventing deletion, not related to serving cached web content during origin outages. Option C (IAM Access Analyzer) is wrong because it analyzes resource-based policies to identify unintended public or cross-account access, not for caching or origin failover. Option D (S3 Select) is wrong because it is a feature to retrieve subsets of object data using SQL queries, not for caching or serving static content during outages.

426

MCQ | medium

A media company has users around the world uploading 1 to 5 GB files directly to a single Amazon S3 bucket. Upload times are slow from distant regions, but the app must keep using S3 as the destination. What should the architects enable to improve upload performance?

A. Amazon CloudFront for origin caching of uploaded files.

B. Amazon S3 Transfer Acceleration on the bucket.

C. Provisioned IOPS EBS volumes attached to a transfer server.

D. Amazon EFS with a mount target in each Region.

Answer: B

S3 Transfer Acceleration improves upload performance over long distances by routing traffic through AWS edge locations and optimized network paths to the target bucket. This is a strong fit for globally distributed users uploading large files directly to S3. It preserves the same storage destination while making the transfer path faster and more consistent for remote clients.

Why this answer

Amazon S3 Transfer Acceleration (B) uses AWS edge locations to accelerate uploads over the public internet. When a user uploads a file, the data is sent to the nearest edge location via optimized network paths, then forwarded over AWS's private backbone to the S3 bucket. This reduces latency and improves throughput for large files (1–5 GB) from distant regions, directly addressing the slow upload times while keeping S3 as the destination.
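From the client's side, the visible change is the hostname: once acceleration is enabled on the bucket, uploads target the bucket's `s3-accelerate` endpoint instead of the regional one. The sketch below only builds that URL, under the assumption of a hypothetical bucket and key; real uploads still need signed requests, which an SDK would normally handle.

```python
from urllib.parse import quote


def accelerate_url(bucket: str, key: str) -> str:
    """Return the Transfer Acceleration URL for an object in `bucket`."""
    if "." in bucket:
        # Transfer Acceleration requires bucket names without periods.
        raise ValueError("bucket name must not contain periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


# Hypothetical bucket and object key.
url = accelerate_url("media-uploads-example", "raw/clip 01.mp4")
```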

Exam trap

The trap here is confusing CloudFront's edge caching for downloads with S3 Transfer Acceleration's edge-based upload optimization, leading candidates to select CloudFront (A) even though it does not improve upload performance to S3.

How to eliminate wrong answers

Option A is wrong because Amazon CloudFront is a content delivery network (CDN) for caching and delivering frequently accessed objects to end users, not for accelerating uploads to an origin bucket; it does not optimize the upload path from clients to S3. Option C is wrong because Provisioned IOPS EBS volumes are block storage for EC2 instances, not a service for improving network upload performance to S3; attaching them to a transfer server would add unnecessary complexity and cost without addressing the geographic latency issue. Option D is wrong because Amazon EFS is a shared file system for EC2 instances, not a replacement for S3 as the upload destination; mounting it in each region would require a separate application architecture and does not accelerate direct uploads to the existing S3 bucket.

427

MCQ | medium

A production internal reporting portal runs continuously on EC2 with predictable usage for the next three years. The team wants a discount while retaining some instance-family flexibility. What should they buy? The architecture review board prefers a managed AWS-native control.

A. Spot Instances only

B. Dedicated Instances

C. Compute Savings Plan

D. S3 Intelligent-Tiering

Answer: C

Compute Savings Plans provide discounts for a committed spend while allowing flexibility across instance families, sizes, Regions, and compute services.

Why this answer

A Compute Savings Plan offers a discount of up to 66% in exchange for committing to a consistent amount of compute usage (measured in $/hour) for a 1- or 3-year term, while retaining flexibility across instance families, sizes, Regions, and compute services (EC2, Fargate, Lambda). EC2 Instance Savings Plans can discount more deeply (up to 72%) but lock the commitment to one instance family in one Region. The Compute Savings Plan therefore matches the requirement for a discount on predictable three-year usage with instance-family flexibility, and it is a managed AWS-native purchasing option.

Exam trap

The trap here is that candidates often confuse Savings Plans with Reserved Instances, assuming that Standard Reserved Instances (which lock to a specific instance family) are the only way to get a discount, but the question explicitly requires instance-family flexibility, making the Compute Savings Plan the correct choice.

How to eliminate wrong answers

Option A is wrong because Spot Instances involve no commitment and can be interrupted with a two-minute notice, making them unsuitable for a continuously running production portal. Option B is wrong because Dedicated Instances are a physical isolation model that incurs additional costs (per instance and per Region) and does not inherently provide a discount; they are for compliance or licensing needs, not cost savings. Option D is wrong because S3 Intelligent-Tiering is a storage class for data with changing access patterns, not a compute pricing model, and it does not apply to EC2 instances or provide a discount on compute usage.

428

MCQ | easy

An application uses DynamoDB to store order status. Reads happen extremely frequently for the same few keys (for example, the most recent orders), and the team wants lower read latency without changing the table’s partition key design. Which AWS service best fits this requirement?

A. Amazon DAX (DynamoDB Accelerator) to cache frequently read items

B. Provision AWS WAF rules to reduce DynamoDB read latency caused by bots

C. Enable multi-region writes in DynamoDB Global Tables to speed up reads locally

D. Add more read capacity units to DynamoDB and avoid caching entirely

Answer: A

DAX is an in-memory caching layer specifically built for DynamoDB. It reduces read latency for hot keys by serving cached responses quickly while still reading from DynamoDB when a key is not cached (or when the cached entry expires). This avoids the need to redesign partition keys.

Why this answer

Amazon DAX (DynamoDB Accelerator) is an in-memory cache that sits between your application and DynamoDB, providing microsecond read latency for frequently accessed items. Because the workload involves extremely frequent reads of the same few keys (hot keys), DAX reduces the load on the DynamoDB table and delivers faster responses without requiring any changes to the partition key design.
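The read-through behavior described here can be illustrated with a small in-process simulation. This is not the DAX client API; `ReadThroughCache`, `load_from_table`, and the TTL value are hypothetical names and numbers chosen for the sketch.

```python
import time


class ReadThroughCache:
    """Toy read-through cache: serve hot keys from memory, load on a miss."""

    def __init__(self, load_item, ttl_seconds=300.0, clock=time.monotonic):
        self._load_item = load_item   # fallback reader, e.g. a table lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}            # key -> (expires_at, item)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # cache hit: no table read
        item = self._load_item(key)   # miss or expired: read the table
        self._entries[key] = (now + self._ttl, item)
        return item


table_reads = []


def load_from_table(key):
    """Stand-in for a DynamoDB GetItem call; records each backend read."""
    table_reads.append(key)
    return {"order_id": key, "status": "SHIPPED"}


cache = ReadThroughCache(load_from_table)
for _ in range(1000):
    item = cache.get("order-1001")    # the same hot key, read repeatedly
```

In this run, one thousand reads of the same key reach the backing table once; the rest are served from memory, which is the effect the explanation attributes to DAX for hot keys.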

Exam trap

The trap here is that candidates often confuse throughput scaling (adding RCUs) with latency optimization, or they mistakenly think Global Tables improve read latency within a single region, when in fact DAX is the only option that directly caches hot keys to reduce read latency without altering the table design.

How to eliminate wrong answers

Option B is wrong because AWS WAF is a web application firewall that protects against web exploits and bots at the HTTP/HTTPS layer, but it does not reduce DynamoDB read latency or cache data. Option C is wrong because DynamoDB Global Tables replicate data across regions for disaster recovery and local writes, but they do not reduce read latency for a single-region application and still require reads to go to the DynamoDB API without caching. Option D is wrong because simply adding more read capacity units increases throughput but does not lower latency for hot keys; the read requests still hit the DynamoDB service, and without caching, the same hot keys continue to experience the same base latency.

429

MCQ | medium

Developers for a B2B file exchange site need temporary elevated access to production resources for troubleshooting. The security team wants approvals, expiry, and audit logging. Which approach is best?

A. Attach AdministratorAccess permanently to every developer role

B. Create shared administrator access keys for the team

C. Disable CloudTrail during troubleshooting

D. Use IAM Identity Center permission sets with time-bound access processes and CloudTrail auditing

Answer: D

Federated access with permission sets and audited temporary assignments reduces standing privilege.

Why this answer

IAM Identity Center (formerly AWS SSO) grants access through permission sets, and a time-bound access process can assign a permission set only after approval and remove the assignment when it expires. This satisfies the security team's requirements for approvals and expiry, and it follows the principle of least privilege by providing just-in-time access rather than standing permissions. All actions are recorded in AWS CloudTrail for audit and compliance.

Exam trap

The trap here is that candidates may think permanent AdministratorAccess (Option A) is acceptable for developers, failing to recognize that AWS security best practices call for temporary, approved, and audited elevated access rather than standing privileges.

How to eliminate wrong answers

Option A is wrong because permanently attaching AdministratorAccess to every developer role violates the principle of least privilege, creates a standing privilege that cannot enforce expiry or approvals, and increases the attack surface. Option B is wrong because shared administrator access keys lack individual accountability, cannot enforce time-bound access or approvals, and bypass CloudTrail's ability to attribute actions to specific users. Option C is wrong because disabling CloudTrail during troubleshooting removes audit logging entirely, which directly contradicts the security team's requirement for audit logging and violates AWS security best practices.

430

MCQ | medium

A company uses Amazon RDS for a PostgreSQL database powering a customer-facing application. The application’s availability depends on fast database failover with minimal manual intervention. The RDS instance currently runs as a single-AZ deployment in one DB subnet group. Which change most directly meets the goal?

A. Create a read replica in a different Availability Zone and configure the application to fail over manually.

B. Enable Multi-AZ for the RDS DB instance so AWS manages a standby in another Availability Zone with automatic failover.

C. Switch the database to use EBS snapshots more frequently and restore in case of failure.

D. Pin the DB to a specific instance type with higher CPU credits to prevent CPU-related disconnects.

Answer: B

RDS Multi-AZ maintains a standby in another AZ and supports automatic failover, improving resilience and reducing manual work.

Why this answer

Enabling Multi-AZ for the RDS DB instance creates a synchronous standby replica in a different Availability Zone. AWS automatically handles failover to the standby with no manual intervention required, meeting the goal of fast database failover with minimal manual intervention.
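In API terms the change is a single flag on the existing instance. The sketch below builds the arguments one might pass to the RDS `ModifyDBInstance` operation (boto3's `modify_db_instance`); it makes no AWS call, and the instance identifier is a placeholder.

```python
def enable_multi_az_request(db_instance_id: str, apply_immediately: bool = False) -> dict:
    """Arguments for ModifyDBInstance that convert an instance to Multi-AZ."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MultiAZ": True,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


# "customer-app-db" is a hypothetical identifier.
request = enable_multi_az_request("customer-app-db", apply_immediately=True)
```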

Exam trap

The trap here is that candidates confuse a read replica (asynchronous, manual promotion) with Multi-AZ (synchronous, automatic failover), or assume that frequent backups or instance sizing improvements can substitute for a dedicated high-availability standby.

How to eliminate wrong answers

Option A is wrong because a read replica is asynchronous and intended for read scaling, not automatic failover; manual failover requires promoting the replica, which involves data loss risk and does not meet the 'minimal manual intervention' requirement. Option C is wrong because EBS snapshots are point-in-time backups that require manual restore and significant downtime, not fast automated failover. Option D is wrong because CPU credits apply to burstable instance types (e.g., T-series) and do not address database availability or failover; higher CPU credits prevent CPU throttling but do not provide a standby or automatic failover mechanism.

431

MCQ | medium

A web application for a mobile banking backend is behind an Application Load Balancer. The application must be protected from common SQL injection and cross-site scripting attacks with minimum operational overhead. What should the architect deploy?

A. Security groups on the application instances

B. AWS WAF associated with the Application Load Balancer

C. Network ACLs on the public subnets

D. AWS Shield Advanced only

Answer: B

AWS WAF can inspect HTTP requests and block common web exploits when associated with an ALB.

Why this answer

AWS WAF is a web application firewall that helps protect web applications from common web exploits like SQL injection and cross-site scripting (XSS). By associating an AWS WAF web ACL with the Application Load Balancer, you can filter and monitor HTTP(S) requests based on rules that block these attack patterns, all without managing any infrastructure. This provides the required protection with minimal operational overhead because AWS WAF is a fully managed service that integrates directly with ALB.

Exam trap

The trap here is that candidates often confuse network-level controls (security groups, NACLs) with application-layer protection, assuming that blocking ports or IP ranges is sufficient to stop web application attacks like SQL injection and XSS.

How to eliminate wrong answers

Option A is wrong because security groups act as a virtual firewall at the instance level, controlling inbound and outbound traffic based on IP addresses and ports; they cannot inspect application-layer payloads for SQL injection or XSS patterns. Option C is wrong because network ACLs are stateless, operate at the subnet level, and only filter traffic based on IP addresses, ports, and protocols — they have no capability to parse HTTP request bodies or headers for malicious content. Option D is wrong because AWS Shield Advanced provides DDoS protection and cost protection against scaling, but it does not include the application-layer rule sets needed to block SQL injection or XSS attacks; those require a web application firewall like AWS WAF.

432

MCQmedium

Based on the exhibit, a faulty deployment corrupted production data at 10:30 UTC and the issue was discovered at 10:55 UTC. The team needs to recover the database to the last good state before the corruption. Which action should they take?

A.Restore the latest manual snapshot and accept data loss since the snapshot was taken overnight.

B.Use point-in-time restore to create a new database instance at 10:29 UTC, then switch the application to it.

C.Restart the database instance so the transaction log replays the failed migration cleanly.

D.Create a read replica and promote it, because replicas always contain the previous transaction state.

AnswerB

Point-in-time restore is the correct recovery method when automated backups are enabled and the team needs the database just before a known corruption event. Restoring to 10:29 UTC brings the data back to the last safe moment before the migration began. Creating a new instance first avoids modifying the damaged database until the restored copy is validated.

Why this answer

Option B is correct because Amazon RDS for MySQL (and other engines) supports point-in-time recovery (PITR) when automated backups are enabled. PITR creates a new DB instance restored to any second within the backup retention period, up to the latest restorable time, which is typically within the last five minutes. By restoring to 10:29 UTC (one minute before the corruption at 10:30 UTC), the team can recover the database to its last good state with minimal data loss. After restoring, the application can be pointed to the new instance, avoiding the corrupted data.
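To make the timing concrete, the sketch below derives the restore target from the corruption time and assembles the arguments one might pass to the RDS `RestoreDBInstanceToPointInTime` operation (boto3's `restore_db_instance_to_point_in_time`). No AWS call is made. The calendar date and both instance identifiers are assumptions; the question gives only the times.

```python
from datetime import datetime, timedelta, timezone


def pitr_request(source_id, target_id, corruption_time, margin=timedelta(minutes=1)):
    """Arguments for a point-in-time restore to just before `corruption_time`."""
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # PITR always creates a new instance
        "RestoreTime": corruption_time - margin,
    }


# Hypothetical date; 10:30 UTC is the corruption time from the question.
corruption = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
request = pitr_request("prod-db", "prod-db-restored", corruption)
```

The restored copy gets its own endpoint, so the application has to be repointed once the data is validated, as answer B describes.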

Exam trap

The trap here is that candidates may confuse point-in-time restore with snapshot restore, assuming snapshots are the only recovery option, or incorrectly believe that restarting or promoting a replica can undo a logical corruption that has already been written to disk.

How to eliminate wrong answers

Option A is wrong because restoring the latest manual snapshot would revert the database to the time the snapshot was taken (likely overnight), losing all transactions between that snapshot and 10:30 UTC, which is unacceptable when a more precise recovery is available. Option C is wrong because restarting the database instance does not undo a faulty deployment; recovery on restart only brings the database back to a consistent state that includes every committed change, so the corrupted data would remain. Option D is wrong because a read replica receives the same changes as the primary, usually after a short replication lag, rather than holding a previous transaction state; promoting it would still include the corrupted data once the replica has applied the faulty changes.

433

MCQ | easy

Based on the exhibit, which AWS feature should the team use to minimize network latency between EC2 instances that exchange messages very frequently?

A. Use a spread placement group to maximize instance separation across hardware.

B. Use a cluster placement group to place instances close together.

C. Use a partition placement group to distribute instances across many partitions.

D. Use multiple Auto Scaling groups to spread traffic across more subnets.

Answer: B

A cluster placement group is designed for workloads that need very low network latency and high packet-per-second performance between instances. The exhibit describes frequent small-message traffic and a need for the lowest possible latency, which makes a cluster placement group the right choice. It keeps instances physically close in the AWS network for faster communication.

Why this answer

A cluster placement group is the correct choice because it packs EC2 instances close together within a single Availability Zone, giving them low-latency, high-throughput networking. This is ideal for applications that exchange messages very frequently, as it minimizes the network distance between instances.

Exam trap

The trap here is that candidates may confuse placement group types, incorrectly assuming a spread or partition group reduces latency when they actually prioritize fault isolation over network performance.

How to eliminate wrong answers

Option A is wrong because a spread placement group places instances on distinct hardware to reduce correlated failures; it optimizes for fault isolation rather than latency and is unsuitable for high-frequency messaging. Option C is wrong because a partition placement group distributes instances across logical partitions to isolate failures in large distributed systems, but it does not optimize for low latency between instances. Option D is wrong because using multiple Auto Scaling groups to spread traffic across more subnets does nothing to place instances closer together, and spreading them across Availability Zones can add latency, counteracting the goal.

434

MCQ | easy

A travel booking site uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured? The design must avoid adding custom operational scripts.

A. A VPC endpoint for CloudWatch only

B. Auto Scaling policy based on an appropriate CloudWatch metric

C. S3 Object Lock

D. Disable health checks

Answer: B

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

An Auto Scaling policy based on a CloudWatch metric like CPUUtilization or request latency directly addresses the high CPU and rising latency by automatically adding EC2 instances during peak traffic. This eliminates the need for custom scripts and ensures the application scales horizontally to maintain performance.
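One common form of such a policy is target tracking on average CPU. The sketch below builds the arguments one might pass to the EC2 Auto Scaling `PutScalingPolicy` operation (boto3's `put_scaling_policy`); it makes no AWS call, and the group name, policy name, and 50% target are illustrative assumptions rather than values from the question.

```python
def cpu_target_tracking_policy(asg_name: str, target_cpu_percent: float) -> dict:
    """Arguments for a target tracking policy on average ASG CPU utilization."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            # Auto Scaling adds or removes instances to hold CPU near this value.
            "TargetValue": float(target_cpu_percent),
        },
    }


# Hypothetical group name and target.
policy = cpu_target_tracking_policy("booking-web-asg", 50)
```

Target tracking manages the underlying CloudWatch alarms itself, which fits the constraint of avoiding custom operational scripts.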

Exam trap

The trap here is that candidates might think a VPC endpoint (Option A) is needed for CloudWatch metrics, but CloudWatch metrics are already available without a VPC endpoint, and scaling requires an Auto Scaling policy, not just metric access.

How to eliminate wrong answers

Option A is wrong because a VPC endpoint for CloudWatch only enables private connectivity to CloudWatch, but does not provide any scaling or performance improvement for EC2 instances. Option C is wrong because S3 Object Lock is used for data retention and compliance, not for scaling compute resources or reducing latency. Option D is wrong because disabling health checks would cause the ALB to route traffic to unhealthy instances, worsening latency and availability issues.

435

Multi-Select | hard

A private application in two private subnets must download objects from S3 and read parameters from Systems Manager Parameter Store without routing traffic through the public internet. Which two components should the architect use? The team wants the control to be enforceable during normal operations.

Select 2 answers

A. Interface VPC endpoint for Systems Manager

B. Internet gateway attached to the VPC

C. NAT gateway in each Availability Zone

D. Gateway VPC endpoint for Amazon S3

Answers: A, D

Systems Manager/Parameter Store access uses interface endpoints powered by AWS PrivateLink.

Why this answer

Option A is correct because an interface VPC endpoint (AWS PrivateLink) for Systems Manager lets workloads in the private subnets reach Parameter Store over private IP addresses inside the VPC, without traversing the internet. Option D is correct because a gateway VPC endpoint for Amazon S3 adds a route to the subnets' route tables so that S3 traffic stays on the AWS network. Both are enforceable through VPC endpoint policies, and the interface endpoint additionally through security groups.
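The two endpoint types take different attachment parameters, which the sketch below makes visible by building arguments for the EC2 `CreateVpcEndpoint` operation (boto3's `create_vpc_endpoint`). No AWS call is made; the Region and every resource ID are placeholders.

```python
def s3_gateway_endpoint(vpc_id, region, route_table_ids):
    """Gateway endpoint: attached to route tables, used for S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def ssm_interface_endpoint(vpc_id, region, subnet_ids, security_group_ids):
    """Interface endpoint: network interfaces in subnets, used for Systems Manager."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.ssm",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the default service hostname resolve to the endpoint's private IPs.
        "PrivateDnsEnabled": True,
    }


# Hypothetical IDs for illustration.
s3_endpoint = s3_gateway_endpoint("vpc-0example", "us-east-1", ["rtb-0private"])
ssm_endpoint = ssm_interface_endpoint(
    "vpc-0example", "us-east-1",
    ["subnet-0aaa", "subnet-0bbb"], ["sg-0endpoints"],
)
```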

Exam trap

AWS often tests the distinction between Interface VPC endpoints (for services like Systems Manager, API Gateway, and Kinesis) and Gateway VPC endpoints (for S3 and DynamoDB), and candidates mistakenly assume a NAT gateway or internet gateway is required for private subnets to access these services.

436

MCQ | medium

A retail company lets developers deploy ECS services but they must never be able to modify IAM. The team currently uses an IAM user per developer with an admin-like policy, and several access keys have been leaked. You are asked to redesign access so that: (1) developers authenticate with temporary credentials, (2) they can create/update ECS services and related autoscaling resources, and (3) IAM changes are impossible even if a developer tries to attach new policies. Which design best meets all requirements?

A. Create an IAM user for each developer and keep the existing broad permissions, rotating keys every 90 days.

B. Use an IAM role that developers assume for deployments; attach least-privilege policies for ECS and Auto Scaling; and attach a permission boundary that does not allow iam:* actions, so additional inline or managed policies cannot grant IAM permissions.

C. Attach a policy that allows ecs:* and autoscaling:* and rely on developers to self-review that no IAM statements are added to their roles.

D. Create a single shared IAM role with full administrator permissions so developers can troubleshoot faster when deployments fail.

Answer: B

Assuming a role provides temporary credentials and removes long-lived keys. Least-privilege policies limit allowed actions, and a permission boundary caps the role's effective permissions so IAM actions cannot be gained through later policy changes.

Why this answer

Option B is correct because it uses an IAM role with temporary credentials (via AWS STS AssumeRole), satisfying the requirement to avoid long-lived access keys. The least-privilege policies for ECS and Auto Scaling grant only the necessary permissions, while the permission boundary does not allow any iam:* actions. Because a permission boundary caps the role's effective permissions, a policy attached later cannot grant IAM access. This combination ensures developers can deploy ECS services but cannot modify IAM in any way.
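The capping effect can be shown with a toy evaluation: an action is effectively allowed only when both the identity policies and the boundary allow it. This simplification ignores explicit denies, resources, and conditions, and the action patterns below are illustrative.

```python
from fnmatch import fnmatchcase


def allows(action: str, patterns) -> bool:
    """True if any Allow pattern (with * wildcards) matches the action."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(action: str, identity_allows, boundary_allows) -> bool:
    """Effective permission is the intersection of identity policy and boundary."""
    return allows(action, identity_allows) and allows(action, boundary_allows)


boundary = ["ecs:*", "application-autoscaling:*", "autoscaling:*"]

# Suppose someone later attaches a policy that also allows iam:*.
identity = ["ecs:*", "application-autoscaling:*", "iam:*"]

can_update_service = effectively_allowed("ecs:UpdateService", identity, boundary)
can_attach_policy = effectively_allowed("iam:AttachRolePolicy", identity, boundary)
```

Under these assumptions `can_update_service` is true and `can_attach_policy` is false, even though the identity policy lists `iam:*`.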

Exam trap

The trap here is that candidates may overlook what a permission boundary does: it sets the maximum permissions an identity can have, so an action is allowed only if both the identity policy and the boundary allow it. That cap is the key to meeting the 'IAM changes impossible' requirement.

How to eliminate wrong answers

Option A is wrong because rotating keys every 90 days still uses long-lived access keys, which violates the requirement for temporary credentials and does not prevent key leakage or IAM modification. Option C is wrong because relying on developers to self-review is not a technical control; it allows them to potentially attach IAM policies to their roles, violating the requirement that IAM changes be impossible. Option D is wrong because a single shared IAM role with full administrator permissions violates least privilege, allows IAM modifications, and does not use temporary credentials per developer.

437

MCQ | easy

A system uses multiple AWS Lambda functions behind different event sources. One Lambda occasionally spikes and causes other Lambdas to be throttled due to shared concurrency limits. Which setting best helps ensure the important Lambda keeps capacity during spikes?

A. Increase the function timeout so throttling is less likely.

B. Set Reserved Concurrency for the important Lambda function.

C. Enable Provisioned Concurrency for every Lambda in the account.

D. Reduce the number of IAM policies attached to the Lambda roles.

Answer: B

Reserved concurrency allocates a guaranteed amount of concurrent execution capacity to a specific Lambda. This prevents other functions from consuming all concurrency and throttling the important one. If the reserved limit is reached, only that function is throttled, isolating impact.

Why this answer

Reserved Concurrency guarantees a set number of concurrent executions for a specific Lambda function, isolating it from the account-level concurrency pool. This ensures that the important function always has capacity available, even when other functions spike and consume the shared pool. Without this setting, all functions compete for the same 1,000 concurrent executions (default regional limit), and a spike in one can throttle others.
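The arithmetic behind this isolation is simple enough to sketch. The function names and reservation sizes below are hypothetical; the 1,000 default comes from the explanation above, and the 100-execution floor reflects Lambda's rule that at least 100 executions must stay unreserved.

```python
MIN_UNRESERVED = 100  # Lambda keeps at least this much concurrency unreserved


def unreserved_pool(account_limit: int, reserved: dict) -> int:
    """Concurrency left for functions that have no reservation."""
    remaining = account_limit - sum(reserved.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError("reservations must leave at least 100 unreserved executions")
    return remaining


# Hypothetical function name and reservation.
reservations = {"process-orders": 200}
shared = unreserved_pool(1000, reservations)
```

With 200 executions reserved for the important function, the other functions share the remaining 800; a spike among them can exhaust that pool but cannot touch the reserved 200.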

Exam trap

The trap here is that candidates confuse Provisioned Concurrency (which reduces cold starts) with Reserved Concurrency (which guarantees capacity), leading them to choose Option C, even though Provisioned Concurrency does not protect against throttling from other functions.

How to eliminate wrong answers

Option A is wrong because increasing the function timeout does not affect concurrency limits; it only extends how long a single invocation can run, which could actually increase the chance of throttling by holding concurrency slots longer. Option C is wrong because Provisioned Concurrency pre-warms environments to reduce cold starts but does not reserve capacity away from the shared pool; it still counts toward the account concurrency limit and does not prevent throttling of other functions. Option D is wrong because reducing IAM policies affects permissions, not concurrency limits; it has no impact on Lambda's throttling behavior.

438

MCQhard

A DynamoDB table for a retail API has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The design must avoid adding custom operational scripts.

A.Use a higher-cardinality partition key that distributes writes across partitions

B.Create a global secondary index with the same date key

C.Reduce the table's write capacity

D.Move the table to S3 Glacier Instant Retrieval

AnswerA

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Using only the current date as a partition key creates a 'hot partition' because all writes for the day target a single partition, exceeding its 1,000 WCU limit. A higher-cardinality partition key (e.g., combining date with user ID or order ID) distributes writes evenly across partitions, eliminating throttling without custom scripts.
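
One common way to raise key cardinality is to append a hashed suffix to the date. The sketch below assumes a hypothetical shard count of 10 and hypothetical order IDs; it only shows how the key values spread, not how DynamoDB places them on physical partitions.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 10  # hypothetical; size it to the expected write rate


def partition_key(order_date: str, order_id: str, shards: int = SHARD_COUNT) -> str:
    """Build a key such as '2024-05-01#7' from the date plus a hashed suffix."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{order_date}#{shard}"


# Simulate one day's writes and count how many land on each key value.
keys = [partition_key("2024-05-01", f"order-{i}") for i in range(10_000)]
spread = Counter(keys)

print(len(spread))           # number of distinct partition key values in use
print(max(spread.values()))  # writes that landed on the busiest key
```

The trade-off is on the read side: fetching a full day now means querying every suffix and merging the results. Using the order ID itself as the partition key avoids the suffix entirely when per-day reads are not needed.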

Exam trap

The trap here is that candidates often confuse GSIs as a solution for write hot spots, but GSIs only help with read patterns and do not change the base table's write distribution.

How to eliminate wrong answers

Option B is wrong because a global secondary index (GSI) keyed on the same date concentrates its own writes on one hot key as well, and it does nothing to redistribute writes on the base table; a throttled GSI can even cause writes to the base table to be throttled. Option C is wrong because reducing write capacity would worsen throttling, not solve it. Option D is wrong because S3 Glacier Instant Retrieval is for archival data with infrequent access, not for a DynamoDB table requiring low-latency writes for a retail API.

439

MCQmedium

A media processing pipeline uses EBS-backed storage for an application that performs sustained random I/O with low latency requirements. During peak processing windows, the team sees increased read latency and occasional timeouts at the application layer. They need predictable, high IOPS performance rather than best-effort throughput. Which EBS configuration choice is most appropriate?

A.Use gp2 volumes and rely on burst credits to handle peak random I/O latency requirements.

B.Use io1 or io2 EBS volumes configured with a high provisioned IOPS value, and attach them to EBS-optimized instances.

C.Use standard HDD (st1) volumes, because they provide high throughput and will reduce latency automatically.

D.Use S3 instead of EBS for random I/O latency reduction without changing the application.

AnswerB

io1/io2 are designed for predictable, low-latency IOPS for sustained I/O workloads. By provisioning a sufficient IOPS level, you improve consistency during peak windows. Using EBS-optimized instances ensures the instance-to-EBS bandwidth and I/O performance are adequate so the instance does not become the bottleneck before EBS can deliver the provisioned IOPS.

Why this answer

Option B is correct because io1 and io2 volumes are provisioned IOPS SSD volumes designed for sustained, predictable high IOPS performance, which directly addresses the application's need for low-latency random I/O during peak loads. Attaching them to EBS-optimized instances provides dedicated bandwidth for EBS traffic, so the instance's network path does not become the bottleneck before the volume can deliver its provisioned IOPS.

Exam trap

The trap here is that candidates may choose gp2 (Option A) assuming burst credits will cover peak loads, but they fail to recognize that sustained peak I/O exhausts credits, leading to performance degradation, whereas provisioned IOPS volumes guarantee consistent performance regardless of duration.
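
The credit exhaustion can be estimated with simple arithmetic. The sketch below uses the published gp2 figures as assumptions (3 IOPS per GiB baseline with a floor of 100 and a ceiling of 16,000, a 3,000 IOPS burst, and a bucket of 5.4 million I/O credits); the 100 GiB volume size is hypothetical.

```python
GP2_IOPS_PER_GIB = 3           # baseline IOPS earned per GiB
GP2_MIN_BASELINE = 100         # floor for small volumes
GP2_MAX_BASELINE = 16_000      # ceiling for large volumes
GP2_BURST_IOPS = 3_000         # burst ceiling
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return min(GP2_MAX_BASELINE, max(GP2_MIN_BASELINE, size_gib * GP2_IOPS_PER_GIB))


def seconds_of_burst(size_gib: int, demand_iops: int = GP2_BURST_IOPS) -> float:
    """Seconds a full credit bucket lasts at `demand_iops` (capped at the burst rate)."""
    baseline = gp2_baseline_iops(size_gib)
    demand = min(demand_iops, GP2_BURST_IOPS)
    if demand <= baseline:
        return float("inf")  # credits are never drained
    return GP2_CREDIT_BUCKET / (demand - baseline)


print(gp2_baseline_iops(100))      # 300
print(seconds_of_burst(100) / 60)  # roughly 33 minutes of full burst
```

Under these assumptions a 100 GiB gp2 volume sustains 3,000 IOPS for about half an hour before it drops back to 300 IOPS, which is shorter than a typical peak processing window.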

How to eliminate wrong answers

Option A is wrong because gp2 volumes rely on burst credits that can be exhausted during sustained peak I/O, leading to throttled performance and increased latency, not predictable high IOPS. Option C is wrong because st1 volumes are HDD-based and optimized for sequential throughput, not random I/O; they cannot provide low latency or high IOPS for random access patterns. Option D is wrong because S3 is an object storage service with higher latency and no support for low-latency random I/O; it cannot replace EBS for block-level access without significant application changes.

440

MCQhard

A SaaS vendor’s automation account in Account B needs to assume a role in a customer account in Account A to read a specific S3 bucket and publish a deployment status file. The customer is worried about confused deputy attacks because multiple customers use the same vendor software. Which trust-policy design best meets the requirement?

A.Allow the Account B root principal to assume the role if the caller knows the role ARN.

B.Allow only the vendor’s specific IAM principal to assume the role and require a unique sts:ExternalId condition.

C.Attach a permissions boundary to the role so that the vendor cannot exceed the approved permissions.

D.Require MFA for the role assumption because it ensures only the vendor’s production automation can use the role.

AnswerB

This is the standard confused deputy protection pattern for third-party cross-account access. The trust policy limits who can call AssumeRole, and the sts:ExternalId condition lets the customer require a customer-specific value that the vendor must supply. That prevents another customer or a malicious party from reusing the same role ARN successfully.

Why this answer

Option B is correct because the `sts:ExternalId` condition is specifically designed to prevent the confused deputy problem in cross-account role assumptions. The customer's trust policy requires an external ID that is unique to that customer, and the vendor must pass the matching value on every AssumeRole call. Because the vendor associates each ID with exactly one customer, its automation can assume the role only when acting on behalf of that customer, even if multiple customers use the same vendor software.
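
A trust policy of this shape might look like the following, built here as a Python dictionary so it can be printed as JSON. The account ID, role name, and external ID are hypothetical placeholders; only the policy structure and the `sts:ExternalId` condition key come from IAM itself.

```python
import json


def vendor_trust_policy(vendor_principal_arn: str, external_id: str) -> dict:
    """Trust policy that lets one vendor principal assume the role with a matching ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # A specific principal, not the whole vendor account.
                "Principal": {"AWS": vendor_principal_arn},
                "Action": "sts:AssumeRole",
                # The call is rejected unless the vendor passes this exact value.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = vendor_trust_policy(
    "arn:aws:iam::111122223333:role/vendor-automation",  # hypothetical Account B role
    "customer-a-7f3c",                                    # hypothetical per-customer ID
)
print(json.dumps(policy, indent=2))
```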

Exam trap

The trap here is that candidates often confuse MFA or permissions boundaries as solutions for the confused deputy problem, when in fact only the `sts:ExternalId` condition directly mitigates this specific threat by providing a customer-specific identifier in the trust policy.

How to eliminate wrong answers

Option A is wrong because allowing the Account B root principal to assume the role based solely on knowing the role ARN provides no protection against confused deputy attacks; any IAM entity in Account B (including compromised or malicious principals) could assume the role. Option C is wrong because a permissions boundary limits the maximum permissions the role can grant, but it does not address the confused deputy threat; it controls scope, not identity verification. Option D is wrong because requiring MFA ensures the caller is authenticated via a second factor, but it does not prevent a confused deputy scenario where the vendor's automation could be tricked into assuming the role on behalf of a different customer; MFA does not provide a customer-specific identifier.

441

MCQeasy

An order system receives events and uses a Lambda function to write each order into a database. During traffic spikes, the database sometimes throttles, and Lambda retries lead to occasional message loss in the event flow. The team wants buffering, automatic retries, and a way to isolate messages that repeatedly fail so they can be inspected later. What design change best meets this need?

A.Send events directly from EventBridge to Lambda without any queue to simplify the flow.

B.Use Amazon SQS as a buffer between the event source and Lambda, with an SQS dead-letter queue (DLQ).

C.Use SNS fan-out to multiple Lambda functions, but keep no retry logic and no DLQ.

D.Store events in an S3 bucket and trigger Lambda immediately after each upload, without using DLQs.

AnswerB

SQS buffers bursts, supports retries via visibility timeouts, and DLQs capture messages that fail repeatedly for later review.

Why this answer

Option B is correct because Amazon SQS acts as a durable buffer between the event source and Lambda, absorbing traffic spikes and decoupling the producer from the consumer. The SQS dead-letter queue (DLQ) automatically captures messages that exceed the configured maximum retries, allowing the team to inspect and reprocess them later without loss. This design provides the required buffering, automatic retries via the Lambda event source mapping, and isolation of repeatedly failing messages.
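
The DLQ link is configured on the source queue through its `RedrivePolicy` attribute. The sketch below builds that attribute map in plain Python; the queue ARN, receive count, and function timeout are hypothetical, and the six-times multiplier follows the AWS guidance for the visibility timeout of queues that feed Lambda.

```python
import json


def queue_attributes(dlq_arn: str, max_receive_count: int, function_timeout_s: int) -> dict:
    """Attributes for the source queue; SQS expects every attribute value as a string."""
    return {
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                # After this many failed receives, the message moves to the DLQ.
                "maxReceiveCount": max_receive_count,
            }
        ),
        # Keep in-flight messages hidden long enough for the function to finish.
        "VisibilityTimeout": str(6 * function_timeout_s),
    }


attrs = queue_attributes(
    "arn:aws:sqs:us-east-1:111122223333:orders-dlq",  # hypothetical DLQ ARN
    max_receive_count=5,
    function_timeout_s=30,
)
print(attrs)
```

A map like this would typically be passed as the `Attributes` argument when the source queue is created or updated.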

Exam trap

The trap here is that candidates often assume a direct event-driven flow (like EventBridge to Lambda) is simpler and sufficient, but they overlook the need for buffering and a DLQ to handle throttling and isolate persistent failures, which SQS explicitly provides.

How to eliminate wrong answers

Option A is wrong because sending events directly from EventBridge to Lambda without a queue removes any buffering, so during traffic spikes Lambda will be overwhelmed and retries can still lead to message loss. Option C is wrong because SNS fan-out to multiple Lambda functions with no retry logic and no DLQ provides no buffering, no automatic retries, and no mechanism to isolate failed messages, so throttling and message loss remain unaddressed. Option D is wrong because storing events in S3 and triggering Lambda immediately after each upload does not provide a built-in retry mechanism for Lambda failures, and S3 does not offer a dead-letter queue to isolate repeatedly failing messages; this approach also introduces latency and complexity for real-time order processing.

442

Multi-Selectmedium

An application team sees that a fleet of EC2 instances averages 15% CPU utilization and has no memory pressure. The service must keep running continuously, but the team wants to lower cost with minimal risk. Which two actions should they take first? Select two.

Select 2 answers

A.Use Compute Optimizer recommendations to identify a smaller instance type.

B.Update the launch template or Auto Scaling group to the smaller instance type after testing.

C.Move the workload to Dedicated Hosts.

D.Add a second NAT Gateway.

E.Enable Provisioned Concurrency for the EC2 workload.

AnswersA, B

Compute Optimizer analyzes historical utilization and suggests instance sizes that better match actual demand. That makes it a low-risk first step for finding a cheaper right-sized option before changing production capacity.

Why this answer

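AWS Compute Optimizer analyzes historical utilization metrics (CPU, memory, network) and provides rightsizing recommendations. Since the fleet averages only 15% CPU with no memory pressure, downsizing to a smaller instance type should reduce cost with little performance risk, which makes the recommendation the first step toward identifying the right instance family and size. Updating the launch template or Auto Scaling group after testing then rolls the smaller type out in a controlled way while the service keeps running.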

Exam trap

The trap here is that candidates may confuse Provisioned Concurrency (Lambda-only) with EC2 features, or assume Dedicated Hosts are cost-saving when they actually increase cost for most workloads.

443

MCQmedium

A.Create a read replica in a different Availability Zone and configure the application to fail over manually.

B.Enable Multi-AZ for the RDS DB instance so AWS manages a standby in another Availability Zone with automatic failover.

C.Switch the database to use EBS snapshots more frequently and restore in case of failure.

D.Pin the DB to a specific instance type with higher CPU credits to prevent CPU-related disconnects.

AnswerB

RDS Multi-AZ maintains a standby in another AZ and supports automatic failover, improving resilience and reducing manual work.

Why this answer

Enabling Multi-AZ for the RDS DB instance creates a synchronous standby replica in a different Availability Zone. AWS automatically handles failover to the standby with no manual intervention required, which directly meets the goal of fast database failover with minimal manual intervention.

Exam trap

The trap here is that candidates often confuse read replicas (which are for read scaling and manual promotion) with Multi-AZ (which provides automatic failover and high availability), leading them to choose Option A incorrectly.

How to eliminate wrong answers

Option A is wrong because a read replica is asynchronous and not designed for automatic failover; manual failover requires application changes and introduces significant downtime. Option C is wrong because restoring from EBS snapshots is a slow, manual process that can take minutes to hours, far from the fast, automated failover required. Option D is wrong because CPU credits (relevant to burstable instances like T-series) do not address database availability or failover; they only prevent CPU throttling, not instance or AZ failures.

444

MCQmedium

A security analyst needs to let an external vendor (AWS account 555566667777) read data from a set of internal resources in your AWS account. You created an IAM role called VendorReadRole with a policy that allows the required API calls. However, when the vendor tries to access, CloudTrail shows the call fails at AssumeRole with: "Not authorized to perform: sts:AssumeRole". What is the most appropriate fix?

A.Add an allow statement for the vendor in the role’s trust policy to permit sts:AssumeRole from the vendor account (and include any required ExternalId condition).

B.Attach the same allow policy to the vendor account’s existing IAM user so the user can call sts:AssumeRole directly into your role.

C.Replace the AssumeRole call with GetCallerIdentity so the vendor can infer permissions without assuming the role.

D.Enable MFA on the vendor’s IAM user and require MFA for your role using condition keys in the permissions policy.

AnswerA

AssumeRole is blocked unless the role trust policy allows the vendor principal. The role’s permissions policy alone cannot permit assumption.

Why this answer

The error 'Not authorized to perform: sts:AssumeRole' indicates that the role's trust policy does not grant the external AWS account (555566667777) permission to assume the role. The trust policy must include an Allow statement with the sts:AssumeRole action, specifying the external account as the principal, and optionally an ExternalId condition to prevent the confused deputy problem. Without this trust policy configuration, even if the permissions policy allows the required API calls, the vendor cannot assume the role.
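
The difference between the failing and the fixed configuration can be illustrated with a toy check over the trust policy document. This is a deliberately simplified sketch, not IAM's real evaluation logic (it ignores conditions, explicit denies, and non-AWS principals), and the account ID 999900001111 is a hypothetical stand-in for your own account.

```python
def trust_policy_allows(trust_policy: dict, caller_account: str) -> bool:
    """Toy check: does any Allow statement for sts:AssumeRole name the caller's account?"""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict):
            continue
        arns = principal.get("AWS", [])
        if isinstance(arns, str):
            arns = [arns]
        prefix = f"arn:aws:iam::{caller_account}:"
        if any(a == caller_account or a.startswith(prefix) for a in arns):
            return True
    return False


def trust_policy_for(account_id: str) -> dict:
    """Trust policy that names a single account as the principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


before = trust_policy_for("999900001111")  # hypothetical: trusts only your own account
after = trust_policy_for("555566667777")   # names the vendor account from the question

print(trust_policy_allows(before, "555566667777"))  # False: the vendor is not a trusted principal
print(trust_policy_allows(after, "555566667777"))   # True: the vendor account is named
```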

Exam trap

The trap here is that candidates often confuse the role's permissions policy (which defines what actions the role can perform) with the trust policy (which defines who can assume the role), leading them to incorrectly modify the permissions policy or the vendor's IAM user instead of the trust policy.

How to eliminate wrong answers

Option B is wrong because attaching the read policy to an IAM user in the vendor account does not change who is allowed to assume the role. Cross-account assumption needs two things: an identity policy in the vendor account that allows sts:AssumeRole on the role, and a trust policy on the role that names the vendor as a principal. Here the role's trust policy is the missing piece, and the permissions available after assumption come from the role's own permissions policy, not from the user's. Option C is wrong because GetCallerIdentity only returns details about the caller's identity and does not grant any permissions to read data; it cannot substitute for assuming a role to access resources. Option D is wrong because enabling MFA on the vendor's IAM user and requiring MFA in the role's permissions policy does not address the missing trust policy; the role's trust policy must first allow the vendor to assume the role, and MFA conditions are additional constraints, not a fix for the fundamental authorization failure.

445

MCQmedium

A public API for an e-learning platform is deployed on API Gateway. Clients must authenticate with standards-based tokens issued by an external OpenID Connect provider. Which authorization mechanism should be used?

A.A VPC endpoint policy

B.IAM authorization for all internet users

C.API keys only

D.JWT authorizer configured for the OpenID Connect issuer

AnswerD

A JWT authorizer validates tokens from a trusted OIDC issuer with low operational overhead.

Why this answer

Option D is correct because API Gateway supports JWT authorizers that validate JSON Web Tokens (JWTs) issued by an external OpenID Connect (OIDC) provider. This allows the API to authenticate clients using standards-based tokens without managing a custom Lambda authorizer, and it directly integrates with the OIDC issuer's JWKS endpoint to verify token signatures.
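
Beyond the signature, a JWT authorizer checks standard claims such as issuer, audience, and expiry. The sketch below illustrates only those claim checks on an already-decoded payload; signature verification against the JWKS endpoint is omitted, and the issuer URL and audience value are hypothetical.

```python
import time
from typing import Optional


def claims_acceptable(claims: dict, issuer: str, audiences: set,
                      now: Optional[float] = None) -> bool:
    """Check iss, aud, and exp on a decoded token payload (signature check not shown)."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    # `aud` may be a single string or a list of strings.
    token_audiences = {aud} if isinstance(aud, str) else set(aud or [])
    return (
        claims.get("iss") == issuer
        and bool(token_audiences & audiences)
        and claims.get("exp", 0) > now
    )


ISSUER = "https://idp.example.com"  # hypothetical OIDC issuer
AUDIENCES = {"elearning-api"}       # hypothetical audience configured on the authorizer

good = {"iss": ISSUER, "aud": "elearning-api", "exp": 2_000}
expired = {"iss": ISSUER, "aud": "elearning-api", "exp": 500}
wrong_issuer = {"iss": "https://other.example.com", "aud": "elearning-api", "exp": 2_000}

print(claims_acceptable(good, ISSUER, AUDIENCES, now=1_000))
print(claims_acceptable(expired, ISSUER, AUDIENCES, now=1_000))
print(claims_acceptable(wrong_issuer, ISSUER, AUDIENCES, now=1_000))
```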

Exam trap

The trap here is that candidates often confuse API keys (which only identify the caller for usage plans) with authentication mechanisms, or assume IAM authorization can validate third-party OIDC tokens, when in fact IAM authorization requires AWS credentials, not external tokens.

How to eliminate wrong answers

Option A is wrong because a VPC endpoint policy controls access to API Gateway from within a VPC, not authentication for internet-based clients using OIDC tokens. Option B is wrong because IAM authorization is designed for AWS-authenticated principals (e.g., IAM users, roles) and does not validate tokens from external OpenID Connect providers; it uses AWS Signature Version 4, not OIDC tokens. Option C is wrong because API keys only provide simple rate limiting and usage plans, not authentication or authorization; they do not validate the identity of the caller or support OIDC token verification.

446

MCQmedium

A media archive requires consistent high IOPS for a transactional database on EC2. Which EBS volume type is most suitable?

A.Provisioned IOPS SSD such as io2

B.st1 Throughput Optimized HDD

C.Instance store only

D.sc1 Cold HDD

AnswerA

io2 is designed for business-critical workloads requiring consistent high IOPS and durability.

Why this answer

The scenario requires consistent high IOPS for a transactional database, which demands low-latency, predictable performance. Provisioned IOPS SSD volumes like io2 are designed specifically for such workloads, offering up to 256,000 IOPS per volume with 99.999% durability, making them the most suitable choice for consistent high IOPS.

Exam trap

The trap here is that candidates often confuse throughput-optimized HDD (st1) with IOPS-focused workloads, mistakenly thinking high throughput equals high IOPS, but IOPS measures random access operations while throughput measures sequential data transfer, and transactional databases require low-latency random I/O.
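
The relationship is throughput = IOPS × I/O size, which a short calculation makes concrete. The 500 IOPS figure is the documented per-volume maximum for st1 (measured with 1 MiB I/O); the 16 KiB I/O size and the 16,000 provisioned IOPS are hypothetical values typical of a database workload.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput implied by an IOPS rate at a given I/O size."""
    return iops * io_size_kib / 1024


print(throughput_mib_per_s(500, 1024))   # 500.0  (st1, large sequential I/O)
print(throughput_mib_per_s(500, 16))     # 7.8125 (st1, small random I/O)
print(throughput_mib_per_s(16_000, 16))  # 250.0  (provisioned IOPS SSD, small random I/O)
```

The same HDD volume that streams 500 MiB/s sequentially delivers under 8 MiB/s when the workload issues small random requests, which is why st1 does not fit a transactional database.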

How to eliminate wrong answers

Option B (st1 Throughput Optimized HDD) is wrong because it is a throughput-optimized HDD volume designed for large, sequential workloads like big data and log processing, not for transactional databases requiring consistent high IOPS and low latency. Option C (Instance store only) is wrong because instance store volumes provide ephemeral storage that is not persistent; data is lost if the instance stops or terminates, making it unsuitable for a transactional database that requires data durability. Option D (sc1 Cold HDD) is wrong because it is a cold HDD volume optimized for infrequently accessed data with the lowest cost, offering very low IOPS and throughput, which cannot meet the consistent high IOPS demands of a transactional database.

447

MCQmedium

A high-volume telemetry pipeline writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate? The design must avoid adding custom operational scripts.

A.Amazon Kinesis Data Streams

B.AWS DataSync

C.Amazon EBS

D.Amazon Route 53

AnswerA

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is designed for real-time streaming of high-volume data, such as click events, and allows multiple independent consumers to process the same stream concurrently via enhanced fan-out or shared throughput. It provides durable, ordered data retention and integrates with AWS Lambda, Kinesis Data Analytics, and Kinesis Data Firehose without requiring custom operational scripts.
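
The difference between shared throughput and enhanced fan-out comes down to per-shard limits, sketched below for a provisioned-mode stream. The limits used are the documented per-shard figures (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads), with 1 MB treated as 1,000 KB for simplicity; the traffic numbers are hypothetical.

```python
import math


def shards_for_ingest(records_per_s: int, avg_record_kb: float) -> int:
    """Shards needed to absorb the write rate, by record count and by volume."""
    by_count = math.ceil(records_per_s / 1000)                  # 1,000 records/s per shard
    by_bytes = math.ceil(records_per_s * avg_record_kb / 1000)  # about 1 MB/s per shard
    return max(1, by_count, by_bytes)


def read_mb_per_consumer(consumers: int, enhanced_fan_out: bool) -> float:
    """Per-shard read throughput available to each of `consumers` (at least one)."""
    # Shared mode splits 2 MB/s across consumers; enhanced fan-out gives each its own 2 MB/s.
    return 2.0 if enhanced_fan_out else 2.0 / consumers


print(shards_for_ingest(5000, 0.5))     # 5
print(read_mb_per_consumer(4, False))   # 0.5
print(read_mb_per_consumer(4, True))    # 2.0
```

With four independent consumers, shared throughput leaves each one about a quarter of a shard's read capacity, which is the case enhanced fan-out is meant to address.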

Exam trap

The trap here is that candidates may confuse AWS DataSync or EBS as viable for streaming data, but DataSync is for batch file transfers and EBS is for block storage, neither supporting real-time, multi-consumer event processing.

How to eliminate wrong answers

Option B (AWS DataSync) is wrong because it is a data transfer service for moving large datasets between on-premises storage and AWS services (e.g., S3, EFS) over the internet or Direct Connect, not a real-time streaming pipeline for multiple consumers. Option C (Amazon EBS) is wrong because it provides block-level storage volumes for EC2 instances, not a streaming data ingestion or processing service, and cannot support multiple independent consumers reading the same event stream. Option D (Amazon Route 53) is wrong because it is a DNS web service for domain name resolution and traffic routing, not a data streaming or processing service.

448

Multi-Selectmedium

A media company is designing a high-performance architecture to serve video content to users worldwide. The solution must minimize latency for end users and reduce the load on the origin servers. The video files are stored in an Amazon S3 bucket. Which three options should be combined to meet these requirements? (Choose three.)

Select 3 answers

A.Use Amazon CloudFront as a content delivery network (CDN) with the S3 bucket as the origin.

B.Enable S3 Transfer Acceleration on the bucket to speed up uploads.

C.Configure CloudFront to use Regional Edge Caches to improve cache hit ratios for less popular content.

D.Use Amazon ElastiCache for Memcached to cache video metadata at the edge.

E.Enable S3 default encryption using AWS KMS to improve data transfer performance.

F.Implement origin shield in CloudFront to reduce the number of requests sent to the S3 origin.

AnswersA, C, F

Why this answer

Amazon CloudFront as a CDN with the S3 bucket as the origin minimizes latency by caching video content at edge locations worldwide, serving users from the nearest edge. This reduces load on the origin S3 bucket by handling requests at the edge. Regional Edge Caches further improve cache hit ratios for less popular content by caching it at regional locations, reducing the need to fetch from the origin.

Origin shield in CloudFront consolidates requests from multiple edge locations into a single request to the S3 origin, significantly reducing the number of direct requests and lowering origin load.

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration (which optimizes uploads) with CloudFront (which optimizes downloads), or think that ElastiCache can be used as a CDN for video content, when it is actually an in-memory cache for application data, not for serving static files at the edge.

449

MCQmedium

A Lambda function for a healthcare document service needs to read a database password. The password must rotate automatically every 30 days and should not be stored in environment variables. Which service should be used?

A.A KMS-encrypted Lambda environment variable

B.An encrypted object in Amazon S3

C.AWS Systems Manager Parameter Store SecureString without automation

D.AWS Secrets Manager with rotation enabled

AnswerD

Secrets Manager stores secrets securely and supports automatic rotation using a rotation Lambda function.

Why this answer

AWS Secrets Manager is the correct choice because it is designed specifically for storing and automatically rotating database credentials. It supports native rotation for Amazon RDS, Redshift, and DocumentDB with a built-in Lambda rotation function, and it can rotate secrets on a schedule (e.g., every 30 days) without storing the password in environment variables. This meets the healthcare document service's requirement for automatic rotation and secure storage.
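
On the function side, the password is fetched at run time rather than read from configuration. The sketch below takes the client as a parameter and uses a stand-in class so it runs offline; in a real function the client would be `boto3.client("secretsmanager")`. The secret name is hypothetical, and the JSON layout with a `password` key is an assumption based on the format used for database secrets.

```python
import json


def load_db_password(secrets_client, secret_id: str) -> str:
    """Fetch the current password at call time instead of from an environment variable."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["password"]


class FakeSecretsClient:
    """Offline stand-in exposing the same get_secret_value call shape."""

    def get_secret_value(self, SecretId: str) -> dict:
        secret = {"username": "app", "password": "example-only"}
        return {"SecretString": json.dumps(secret)}


print(load_db_password(FakeSecretsClient(), "prod/documents/db"))  # hypothetical secret name
```

Because the value is looked up on each call (or cached briefly), the function picks up a rotated password without a redeploy. The 30-day schedule itself is set on the secret, for example through the `AutomaticallyAfterDays` rotation rule.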

Exam trap

The trap here is that candidates often confuse Systems Manager Parameter Store (which can store SecureStrings) with Secrets Manager, but Parameter Store lacks automatic rotation, making it unsuitable for a 30-day rotation requirement without additional custom automation.

How to eliminate wrong answers

Option A is wrong because storing a KMS-encrypted password in a Lambda environment variable does not support automatic rotation; you would have to manually update the environment variable and redeploy the function. Option B is wrong because an encrypted object in Amazon S3 is a static storage mechanism with no built-in rotation capability, and accessing it requires managing S3 permissions and decryption logic manually. Option C is wrong because AWS Systems Manager Parameter Store SecureString without automation can store a secure password but lacks native rotation scheduling; you would need to build a custom rotation solution, whereas Secrets Manager provides this out of the box.

450

Multi-Selecthard

A serverless checkout API uses AWS Lambda behind API Gateway. Every weekday at 09:00 UTC, marketing triggers a predictable surge. The first few minutes after each surge show cold-start latency, but traffic volume is forecastable and the business wants stable p95 latency. Which two changes should the team implement? Select two.

Select 2 answers

A.Publish a Lambda version and attach provisioned concurrency to an alias that points to that version.

B.Use Application Auto Scaling scheduled actions to raise provisioned concurrency before 09:00 UTC and lower it afterward.

C.Increase the Lambda timeout so the function has more time to initialize during the spike.

D.Double the memory size during the spike without changing the concurrency model.

E.Move the function into more Availability Zones so the platform can spread cold starts across regions.

AnswersA, B

Provisioned concurrency keeps execution environments initialized and ready to serve requests, which is the correct way to reduce cold starts. Using an alias tied to a published version is the standard deployment pattern for managing that setting safely. This directly improves p95 latency during predictable bursts.

Why this answer

Provisioned concurrency keeps a specified number of Lambda execution environments initialized and ready to respond immediately, eliminating cold starts for predictable traffic patterns. By publishing a Lambda version and attaching provisioned concurrency to an alias pointing to that version, the team ensures that the surge at 09:00 UTC is handled without cold-start latency, stabilizing p95 latency. Application Auto Scaling scheduled actions then raise the provisioned concurrency shortly before 09:00 UTC and lower it afterward, so the team pays for warm capacity only around the surge.
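
The scheduled actions can be expressed as parameter sets for Application Auto Scaling's `put_scheduled_action` call. The sketch below only builds those parameters in plain Python; the function name, alias, capacities, and the 08:45 and 09:30 UTC times are hypothetical, and the alias must already be registered as a scalable target before the actions are applied.

```python
def provisioned_concurrency_schedule(function_name: str, alias: str,
                                     warm: int, idle: int) -> list:
    """Build scale-up and scale-down scheduled actions for a Lambda alias."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    scale_up = {
        **common,
        "ScheduledActionName": "warm-before-surge",
        "Schedule": "cron(45 8 ? * MON-FRI *)",   # 08:45 UTC on weekdays
        "ScalableTargetAction": {"MinCapacity": warm, "MaxCapacity": warm},
    }
    scale_down = {
        **common,
        "ScheduledActionName": "relax-after-surge",
        "Schedule": "cron(30 9 ? * MON-FRI *)",   # 09:30 UTC on weekdays
        "ScalableTargetAction": {"MinCapacity": idle, "MaxCapacity": idle},
    }
    return [scale_up, scale_down]


actions = provisioned_concurrency_schedule("checkout-api", "live", warm=200, idle=5)
for action in actions:
    print(action["ScheduledActionName"], action["Schedule"], action["ScalableTargetAction"])
```

Targeting the alias rather than `$LATEST` matters here, because provisioned concurrency can only be attached to a published version or an alias.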

Exam trap

The trap here is that candidates often confuse increasing Lambda timeout or memory with solving cold-start latency, but these settings do not pre-warm execution environments; only provisioned concurrency (and optionally scheduled scaling) directly eliminates cold starts for predictable surges.
