Knowledge + Practice

SAA-C03: Questions 1–75

1040 questions total · 14 pages · All types, answers revealed

1

Multi-Select · medium

A production Amazon RDS database already has automated backups enabled. At 10:45 UTC, the team discovers that a faulty migration corrupted rows in a table at 10:30 UTC. The business wants the database restored to exactly the state it had at 10:30 UTC with minimal risk. Which two actions should the team take? Select two.

Select 2 answers

A. Restore the database to a new instance using point-in-time restore for 10:30 UTC.

B. Validate the restored database, then switch the application endpoint to the restored database.

C. Restore the most recent manual snapshot because it will include the 10:30 UTC state.

D. Overwrite the existing database instance in place so the application keeps the same storage volume.

E. Wait for automated backups to complete again, then replay the migration to restore the missing rows.

Answers: A, B

Correct. Point-in-time restore is the RDS recovery method for returning to a specific moment before the corruption occurred. Restoring to a new instance gives the team a clean database copy at the desired timestamp without risking the current production instance.

Why this answer

Option A is correct because Amazon RDS Point-in-Time Restore (PITR) allows you to restore a DB instance to any second within the backup retention period, including 10:30 UTC. This uses automated backups and transaction logs to reconstruct the exact database state at that specific time, providing a precise recovery point with minimal data loss.

Exam trap

The trap here is that candidates may think manual snapshots can be used for point-in-time recovery, but they only capture a single moment and cannot roll forward to a specific time like automated backups can.
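
A minimal sketch of the restore request behind answer A, assuming the boto3 RDS client's restore_db_instance_to_point_in_time call. The instance identifiers and the calendar date are hypothetical; only the 10:30 UTC time comes from the question. The block only builds the request parameters, so it runs without AWS credentials.

```python
from datetime import datetime, timezone


def build_pitr_request(source_id: str, target_id: str, restore_time: datetime) -> dict:
    """Build keyword arguments for a point-in-time restore to a NEW instance."""
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware (UTC)")
    if source_id == target_id:
        # PITR never overwrites the source; it always creates a separate instance.
        raise ValueError("the restored instance needs its own identifier")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
    }


# Hypothetical identifiers and date, for illustration only.
request = build_pitr_request(
    "prod-db",
    "prod-db-restored-1030",
    datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
)
# With credentials configured, this could be passed as:
#   boto3.client("rds").restore_db_instance_to_point_in_time(**request)
```

Because the restore produces a separate instance with its own endpoint, answer B (validate, then switch the application endpoint) is the natural second step.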

2

MCQ · easy

Based on the exhibit, the web team wants the application to continue serving traffic if one Availability Zone fails. Which change best meets the requirement with the least operational overhead?

A. Increase desired capacity to 3 in the same Availability Zone so one extra instance is always available.

B. Add the unused subnet in us-east-1b to the Auto Scaling group so instances can launch in both AZs.

C. Replace the Application Load Balancer with a Network Load Balancer because it will automatically keep the app online.

D. Move the application to a larger EC2 instance type so a single server can handle the full workload.

Answer: B

Placing the Auto Scaling group in at least two Availability Zones allows AWS to distribute and replace instances across zones. Because the Application Load Balancer can route only to healthy targets, adding the second subnet is the lowest-complexity change that gives the application resilience to a full AZ outage.

Why this answer

Option B is correct because it adds the unused subnet in us-east-1b to the Auto Scaling group, enabling EC2 instances to launch across two Availability Zones. This provides fault isolation: if one AZ fails, the ALB can route traffic to healthy instances in the other AZ. The change requires only a configuration update to the Auto Scaling group, minimizing operational overhead while meeting the high-availability requirement.

Exam trap

The trap here is that candidates often assume increasing instance count in a single AZ or using a different load balancer type alone provides high availability, but true resilience requires distributing instances across multiple Availability Zones.

How to eliminate wrong answers

Option A is wrong because increasing desired capacity to 3 in the same Availability Zone does not protect against an AZ failure; all instances remain in a single AZ, so if that AZ fails, all traffic is lost. Option C is wrong because replacing the Application Load Balancer with a Network Load Balancer does not inherently provide cross-AZ failover; the NLB still requires instances in multiple AZs to maintain availability, and the change introduces unnecessary operational overhead. Option D is wrong because moving to a larger EC2 instance type does not eliminate the single point of failure; if the AZ hosting that single instance fails, the application goes down regardless of instance size.

3

MCQ · medium

You deploy a Web ACL with an AWS WAF rate-based rule intended to limit abusive traffic to your API. After the deployment, attackers still reach the backend service. ALB access logs show requests arrive at the ALB, but WAF logs indicate the Web ACL is not evaluating those requests. Which change most likely fixes the issue?

A. Associate the Web ACL with the Application Load Balancer resource ARN so WAF evaluates requests sent to that ALB.

B. Add a security group rule that drops inbound traffic from the attacker IP range at the instances' ENIs.

C. Create a target group stickiness policy so WAF can count requests consistently per client IP.

D. Enable AWS Shield Advanced but keep the Web ACL unattached because Shield automatically applies rate limiting.

Answer: A

For an ALB, the Web ACL must be associated with the load balancer resource itself. If it is not attached to the ALB, WAF will not inspect those requests.

Why this answer

A Web ACL must be explicitly associated with a resource (such as an ALB) for AWS WAF to evaluate incoming requests. In this scenario, the Web ACL was deployed but not associated with the ALB resource ARN, so WAF never inspected the traffic. Associating the Web ACL with the ALB ensures that all requests to the ALB are evaluated by the rate-based rule before reaching the backend.

Exam trap

The trap here is that candidates assume deploying a Web ACL automatically applies it to all resources in the account, when in fact it must be explicitly associated with each resource ARN to take effect.

How to eliminate wrong answers

Option B is wrong because security groups support only allow rules, so they cannot explicitly drop traffic from an attacker IP range; they also provide no application-layer rate limiting and are not updated dynamically by WAF. Option C is wrong because target group stickiness (sticky sessions) ensures a client is routed to the same target, but it does not cause WAF to evaluate requests or count them per client IP; WAF evaluation is independent of load balancer routing policies. Option D is wrong because AWS Shield Advanced provides DDoS protection and automatic application-layer mitigation, but it does not replace the need for an associated Web ACL to enforce custom rate-based rules; Shield Advanced works alongside WAF, not as a substitute for Web ACL association.
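
A minimal sketch of the association step in answer A, assuming the boto3 WAFv2 client's associate_web_acl call. Both ARNs are hypothetical, and the same-Region check reflects the rule that a regional Web ACL protects resources in its own Region. The block only builds and checks the request parameters, so it runs without AWS credentials.

```python
def arn_region(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def build_association(web_acl_arn: str, alb_arn: str) -> dict:
    """Build keyword arguments that associate a regional Web ACL with an ALB."""
    if ":loadbalancer/app/" not in alb_arn:
        raise ValueError("resource is not an Application Load Balancer ARN")
    if arn_region(web_acl_arn) != arn_region(alb_arn):
        raise ValueError("a regional Web ACL must be in the same Region as the ALB")
    return {"WebACLArn": web_acl_arn, "ResourceArn": alb_arn}


# Hypothetical ARNs, for illustration only.
association = build_association(
    "arn:aws:wafv2:us-east-1:111122223333:regional/webacl/api-rate-limit/a1b2c3d4",
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/api-alb/50dc6c495c0c9188",
)
# With credentials configured, this could be passed as:
#   boto3.client("wafv2").associate_web_acl(**association)
```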

4

Multi-Select · medium

A startup runs a 24/7 web tier on Amazon EC2 with a stable baseline of 8 instances and a nightly analytics batch job that can resume from checkpoints if interrupted. The company wants to minimize monthly compute cost without hurting the always-on web tier. Which two actions should it take? Select two.

Select 2 answers

A. Buy a Compute Savings Plan for the steady web tier baseline.

B. Buy Standard Reserved Instances only for the nightly analytics batch job.

C. Run the batch job on Spot Instances and checkpoint progress frequently.

D. Move the entire workload to On-Demand Instances for maximum flexibility.

E. Use Dedicated Hosts for the batch job so the fleet is isolated.

Answers: A, C

A Compute Savings Plan reduces cost for the predictable baseline while preserving flexibility across instance families and Regions. That fits a 24/7 web tier that is expected to run continuously. It is cheaper than On-Demand for the committed portion and avoids overcommitting to a specific instance family.

Why this answer

A Compute Savings Plan offers a discount of up to 66% in exchange for a 1- or 3-year hourly spend commitment, and it automatically applies to EC2 usage regardless of instance family, size, or Region. For the stable 8-instance web tier that runs 24/7, this plan provides significant cost savings while maintaining flexibility to change instance types or even move to containers or Lambda, without affecting the always-on requirement. Option C is correct because the nightly batch job can resume from checkpoints, so a Spot interruption costs only a small amount of repeated work.

Exam trap

The trap here is that candidates often assume Reserved Instances are the best choice for any recurring workload. A batch job that runs only part of the day leaves a reservation idle most of the time, whereas Spot Instances with checkpointing suit fault-tolerant, interruptible workloads and cost far less.
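
The cost reasoning can be sketched as simple arithmetic. Every price and discount below is a hypothetical placeholder, not an AWS rate, and the batch fleet size and nightly duration are assumptions; only the 8-instance 24/7 web tier comes from the question.

```python
HOURS_PER_MONTH = 730  # common approximation for a month of 24/7 usage


def monthly_cost(instances: int, hours: float, hourly_rate: float, discount: float = 0.0) -> float:
    """Cost of running `instances` for `hours` each at `hourly_rate`, after a fractional discount."""
    return instances * hours * hourly_rate * (1.0 - discount)


ON_DEMAND_RATE = 0.10          # hypothetical USD per instance-hour
SAVINGS_PLAN_DISCOUNT = 0.30   # hypothetical discount for the committed baseline
SPOT_DISCOUNT = 0.70           # hypothetical average Spot discount
BATCH_INSTANCES = 4            # assumption: size of the nightly batch fleet
BATCH_HOURS = 4 * 30           # assumption: 4 hours per night for 30 nights

all_on_demand = (
    monthly_cost(8, HOURS_PER_MONTH, ON_DEMAND_RATE)
    + monthly_cost(BATCH_INSTANCES, BATCH_HOURS, ON_DEMAND_RATE)
)
optimized = (
    monthly_cost(8, HOURS_PER_MONTH, ON_DEMAND_RATE, SAVINGS_PLAN_DISCOUNT)
    + monthly_cost(BATCH_INSTANCES, BATCH_HOURS, ON_DEMAND_RATE, SPOT_DISCOUNT)
)
print(f"all On-Demand: ${all_on_demand:,.2f}  Savings Plan + Spot: ${optimized:,.2f}")
```

The point of the sketch is the structure of the comparison: the commitment discount applies to hours that are always consumed, and the Spot discount applies to hours that can tolerate interruption.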

5

MCQ · medium

A read-heavy document portal repeatedly queries the same product catalogue data from DynamoDB with millisecond latency requirements. Which service can reduce read latency and table load? The architecture review board prefers a managed AWS-native control.

A. Amazon Kinesis Data Firehose

B. S3 Transfer Acceleration

C. DynamoDB Accelerator (DAX)

D. AWS Glue Data Catalog

Answer: C

DAX is an in-memory cache for DynamoDB that reduces read latency for suitable access patterns.

Why this answer

DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB that delivers up to 10x read performance improvement, reducing read latency to microseconds for repeated queries. It offloads read traffic from the DynamoDB table, lowering consumed read capacity units and table load, making it ideal for read-heavy workloads with millisecond latency requirements. As a fully managed, AWS-native service, DAX aligns with the architecture review board's preference for managed controls.

Exam trap

The trap here is that candidates may confuse DAX with ElastiCache (which is also a caching service but not DynamoDB-native) or assume that any AWS caching service works interchangeably. DAX is the managed, DynamoDB-specific cache: it is API-compatible with DynamoDB, so adopting it requires only minimal application changes (the application connects through the DAX client instead of calling the table directly).

How to eliminate wrong answers

Option A is wrong because Amazon Kinesis Data Firehose is a streaming data ingestion service for loading data into data stores and analytics tools, not a caching layer for DynamoDB reads; it cannot reduce read latency or table load for repeated queries. Option B is wrong because S3 Transfer Acceleration speeds up uploads and downloads to/from S3 over long distances using AWS edge locations, but it does not cache DynamoDB data or reduce read latency for DynamoDB queries. Option D is wrong because AWS Glue Data Catalog is a metadata repository for ETL jobs and data lake schemas, not a caching service for DynamoDB reads; it has no impact on DynamoDB read latency or table load.

6

MCQ · medium

A trading dashboard uses Aurora MySQL. The company wants fast cross-Region disaster recovery with low RPO. Which architecture should be considered? The architecture review board prefers a managed AWS-native control.

A. A single-AZ Aurora cluster

B. Aurora Global Database

C. Manual snapshots copied monthly

D. An ElastiCache Redis replica

Answer: B

Aurora Global Database replicates with low latency to secondary Regions and supports faster disaster recovery than snapshot-only approaches.

Why this answer

Aurora Global Database is the correct choice because it provides a managed, cross-Region disaster recovery solution with a Recovery Point Objective (RPO) of less than 1 second and a Recovery Time Objective (RTO) of typically less than 1 minute. It uses storage-based replication to keep a secondary cluster in another AWS Region up to date with minimal latency, meeting the low RPO requirement without manual intervention.

Exam trap

The trap here is that candidates may confuse cross-Region replication with multi-AZ deployments, or assume that manual snapshots or caching solutions can meet low RPO requirements, when only a managed global database service like Aurora Global Database provides the necessary sub-second RPO and managed cross-Region failover.

How to eliminate wrong answers

Option A is wrong because a single-AZ Aurora cluster lacks any cross-Region replication or failover capability, offering no disaster recovery across Regions and resulting in an unacceptably high RPO if the primary Region fails. Option C is wrong because manual snapshots copied monthly provide an RPO of up to one month, which is far too high for the low RPO requirement, and the process is not automated or managed natively for rapid recovery. Option D is wrong because ElastiCache Redis is an in-memory cache, not the system of record for the trading dashboard's transactional data; replicating a cache does nothing to protect the Aurora database or make it recoverable in another Region.

7

MCQ · medium

Company A stores encrypted log files in its S3 bucket using SSE-KMS with a customer-managed KMS key. A partner application in Company B uploads objects into Company A's bucket using an IAM role in Company B. Uploads fail with an error indicating KMS access is denied (kms:Encrypt not authorized). Neither the partner IAM policy nor the S3 bucket policy currently mentions KMS. What is the most secure and correct change to allow cross-account uploads to succeed?

A. In Company A's KMS key policy, allow Company B's partner role principal to use the key for kms:Encrypt, kms:GenerateDataKey, and kms:DescribeKey, and also add a matching IAM policy in Company B that grants the partner role those same KMS actions on Company A's key ARN, constrained to the target S3 bucket context when possible.

B. In Company B's IAM policy, allow kms:Encrypt on Company A's KMS key ARN, without changing Company A's key policy.

C. Create a new KMS key in Company B and configure Company A's S3 bucket to use that key for SSE-KMS.

D. Disable key policy restrictions by setting the KMS key to enabled and removing all policy statements so that encryption automatically works for any principal.

Answer: A

Cross-account SSE-KMS requires both the KMS key policy in the key owner account and an IAM policy in the caller account to allow the required KMS actions. Scoping the permissions to the specific bucket or encryption context reduces blast radius.

Why this answer

For cross-account SSE-KMS uploads, the KMS key policy in Company A must explicitly grant the external IAM role principal the required KMS actions (kms:Encrypt, kms:GenerateDataKey, and kms:DescribeKey). Additionally, an IAM policy in Company B must allow the role those same actions on the key ARN. Both sides are required because an IAM policy in the caller's account cannot by itself grant access to a key owned by another account; the key policy is the authoritative gatekeeper.

Option A correctly implements both sides, and constraining use of the key to the target S3 bucket context (via the kms:ViaService or kms:EncryptionContext:aws:s3:arn condition keys) follows security best practice.

Exam trap

The trap here is that candidates assume an IAM policy in the partner account alone is sufficient for cross-account KMS access, forgetting that KMS key policies are the definitive authorization mechanism for external principals.

How to eliminate wrong answers

Option B is wrong because the key policy is the primary access control for cross-account use; an IAM policy in Company B alone is insufficient unless Company A's key policy also grants access to the external principal. Option C is wrong because it moves control of the encryption key to Company B: Company A would depend on another account's key policy to read its own logs, which is less secure and adds cross-account permissions rather than removing them. Option D is wrong because every KMS key must have a key policy, and without statements that allow access no principal can use the key, so encryption would fail rather than work automatically; opening the key to any principal would also be insecure.
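
The two sides of answer A can be sketched as policy statements built in Python. The account IDs, role name, key ID, and Region are hypothetical; the three KMS actions come from the question. Note that the kms:ViaService condition in this sketch limits all three actions to calls made through S3, which is an illustrative choice rather than a requirement.

```python
import json


def key_policy_statement(partner_role_arn: str, region: str) -> dict:
    """Statement for Company A's key policy: lets the partner role use the key via S3."""
    return {
        "Sid": "AllowPartnerRoleViaS3",
        "Effect": "Allow",
        "Principal": {"AWS": partner_role_arn},
        "Action": ["kms:Encrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
        "Resource": "*",  # inside a key policy, "*" refers to this key only
        "Condition": {"StringEquals": {"kms:ViaService": f"s3.{region}.amazonaws.com"}},
    }


def partner_iam_statement(key_arn: str) -> dict:
    """Statement for the IAM policy attached to the role in Company B."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Encrypt", "kms:GenerateDataKey", "kms:DescribeKey"],
        "Resource": key_arn,
    }


# Hypothetical identifiers, for illustration only.
PARTNER_ROLE = "arn:aws:iam::444455556666:role/partner-uploader"
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

key_side = key_policy_statement(PARTNER_ROLE, "us-east-1")
caller_side = partner_iam_statement(KEY_ARN)
print(json.dumps({"key_policy": key_side, "iam_policy": caller_side}, indent=2))
```

Access is granted only when both statements are in place, which is why option B (the IAM side alone) fails.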

8

MCQ · medium

A payments platform requires disaster recovery across Regions. Requirements: RPO of 15 minutes and RTO of about 1 hour. The business cannot afford full duplicate capacity in both Regions all the time, but the team wants automated readiness so failover is mostly operationally guided rather than a slow rebuild. Which DR strategy is the best fit?

A. Backup and restore only, relying on scheduled snapshots and manual restores during incidents.

B. Pilot light, keeping only minimal infrastructure in the secondary Region and starting full services after failover.

C. Warm standby, keeping core infrastructure and a partially provisioned environment ready in the secondary Region with frequent data replication.

D. Active/active, routing production traffic to both Regions continuously and accepting dual-region complexity.

Answer: C

Warm standby balances cost and readiness by keeping enough capacity and services running to shorten recovery time while meeting RPO needs.

Why this answer

Warm standby is the best fit because it maintains a partially provisioned environment in the secondary Region with core infrastructure (e.g., a smaller EC2 Auto Scaling group and a standby database fed by asynchronous cross-Region replication) and frequent data replication, enabling an RPO of 15 minutes and an RTO of about 1 hour. This approach balances cost and automated readiness, as the team can scale up the standby environment during failover without the expense of full duplicate capacity, while still meeting the recovery objectives through automated replication (e.g., Amazon RDS cross-Region read replicas or DynamoDB global tables).

Exam trap

The trap here is that candidates often confuse pilot light with warm standby, assuming minimal infrastructure is sufficient for a 1-hour RTO, but pilot light's need to provision and configure full services after failover typically pushes RTO beyond 1 hour, whereas warm standby's partially provisioned environment allows faster scaling.

How to eliminate wrong answers

Option A is wrong because backup and restore with scheduled snapshots cannot achieve an RPO of 15 minutes (snapshots are typically taken every few hours) and manual restores would far exceed the 1-hour RTO due to data transfer and restoration time. Option B is wrong because pilot light keeps only minimal infrastructure (e.g., a small database and no application servers) and requires starting full services after failover, which would likely exceed the 1-hour RTO due to provisioning and configuration time. Option D is wrong because active/active requires full duplicate capacity in both Regions all the time, which the business explicitly cannot afford, and introduces unnecessary complexity and cost for a scenario where failover is only occasional.

9

MCQ · medium

A marketing site runs on x86 EC2 instances and uses open-source software with no architecture-specific licensing restriction. What should be evaluated to reduce compute cost?

A. Cross-Region data replication for all data

B. io2 Block Express volumes for all instances

C. AWS Graviton-based instances after performance testing

D. Dedicated Hosts by default

Answer: C

Graviton instances often provide better price performance for compatible workloads.

Why this answer

Option C is correct because AWS Graviton-based instances (ARM architecture) offer up to 40% better price-performance compared to comparable x86 instances for many workloads. Since the marketing site uses open-source software with no architecture-specific licensing restrictions, migrating to Graviton after performance testing can significantly reduce compute costs without sacrificing performance.

Exam trap

The trap here is that candidates assume Dedicated Hosts (Option D) always reduce costs due to 'dedicated' implying efficiency, but they actually increase costs unless specific licensing requirements (e.g., Windows Server or SQL Server) mandate physical isolation.

How to eliminate wrong answers

Option A is wrong because cross-Region data replication increases storage and data transfer costs, not reduces compute costs; it is a disaster recovery strategy, not a cost-optimization technique for compute. Option B is wrong because io2 Block Express volumes are high-performance, high-cost SSDs designed for latency-sensitive workloads, not for reducing compute costs; they would increase overall costs without addressing compute efficiency. Option D is wrong because Dedicated Hosts incur additional hourly charges for physical server isolation and are typically used for licensing or compliance requirements, not for cost reduction; they would increase compute costs compared to shared tenancy.

10

MCQ · easy

An internal worker consumes messages from an Amazon SQS queue. Occasionally, a message fails validation in the worker (for example, missing required fields). Reprocessing the same bad message repeatedly wastes processing time and delays healthy messages. What is the best AWS approach to handle these poison messages without blocking the rest of the queue?

A. Configure an SQS dead-letter queue (DLQ) using a redrive policy with a maxReceiveCount.

B. Delete the SQS queue and recreate it daily to clear invalid messages.

C. Increase the consumer timeout/processing time so validation failures take longer to occur.

D. Use SNS fan-out without any DLQ and rely only on application retries.

Answer: A

With a redrive policy, SQS continues delivering the message to consumers until it has been received unsuccessfully maxReceiveCount times. After that threshold, SQS moves the poison message to a DLQ, isolating it from the main processing flow so healthy messages can continue being processed.

Why this answer

Option A is correct because an SQS dead-letter queue (DLQ) with a redrive policy that sets a maxReceiveCount allows the worker to process a message up to a specified number of times. After that threshold is exceeded, the message is automatically moved to the DLQ, isolating the poison message and preventing it from blocking or delaying the processing of healthy messages in the main queue.

Exam trap

The trap here is that candidates may think increasing timeouts or relying on application retries alone can solve the problem, but they fail to recognize that only a DLQ with a redrive policy provides automatic, queue-level isolation of poison messages without blocking healthy message processing.

How to eliminate wrong answers

Option B is wrong because deleting and recreating the queue daily is disruptive, causes data loss of all messages (including valid ones), and does not provide a targeted mechanism to isolate only the poison messages. Option C is wrong because increasing the consumer timeout or processing time does not prevent validation failures; it only delays the retry cycle and does not remove the bad message from the queue, so it will still be reprocessed and waste resources. Option D is wrong because SNS fan-out without a DLQ delivers the poison message to every subscriber and relies on application retries alone, so the bad message keeps consuming processing time; there is no automatic isolation mechanism.
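
A minimal sketch of the redrive behavior in answer A. The first function builds the RedrivePolicy queue attribute in the JSON shape SQS expects; the second is a simplified in-memory simulation, not the SQS service itself, that shows how a receive limit isolates a poison message. The DLQ ARN, message bodies, and the limit of 3 are hypothetical.

```python
import json
from collections import deque


def redrive_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Queue attributes that attach a DLQ to a source queue."""
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    return {"RedrivePolicy": json.dumps(policy)}


def drain(messages, is_valid, max_receive_count):
    """Simulate a consumer: valid messages are processed, repeated failures go to the DLQ."""
    queue = deque((body, 0) for body in messages)
    processed, dead_letter = [], []
    while queue:
        body, receives = queue.popleft()
        receives += 1
        if is_valid(body):
            processed.append(body)          # consumer deletes the message on success
        elif receives >= max_receive_count:
            dead_letter.append(body)        # receive limit reached: isolate the message
        else:
            queue.append((body, receives))  # message becomes visible again and is retried
    return processed, dead_letter


# Hypothetical ARN and messages; the second message is missing a required field.
attributes = redrive_attributes("arn:aws:sqs:us-east-1:111122223333:orders-dlq", 3)
messages = [{"id": 1, "amount": 5}, {"id": 2}, {"id": 3, "amount": 7}]
processed, dead_letter = drain(messages, lambda m: "amount" in m, max_receive_count=3)
```

In the simulation the healthy messages are processed while the invalid one is retried a bounded number of times and then set aside, which is the isolation the question asks for.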

11

Multi-Select · hard

A company processes product-image uploads in bursts. Each transform takes up to ten minutes, and every job can be retried safely from the beginning. The current EC2 worker fleet is idle most of the day. Which two changes most reduce cost and idle capacity? Select two.

Select 2 answers

A. Buffer jobs in Amazon SQS and let workers scale from queue depth.

B. Run the workers on AWS Fargate Spot, since interruptions are acceptable.

C. Keep a fixed fleet of m6i.large instances in an Auto Scaling group with a higher minimum.

D. Use Reserved Instances for the workers even though demand is highly bursty.

E. Process uploads only during a nightly window so the fleet looks busier.

Answers: A, B

Correct. SQS decouples uploads from processing and smooths bursty demand. Queue depth is a practical scaling signal, so the company avoids paying for idle workers while still absorbing traffic spikes.

Why this answer

Option A is correct because Amazon SQS can decouple the upload bursts from the worker fleet, allowing the workers to scale based on the ApproximateNumberOfMessagesVisible metric via a target tracking scaling policy. This eliminates idle capacity by keeping workers at zero when no jobs are queued and scaling up only when bursts arrive. Option B is correct because AWS Fargate Spot provides up to a 70% discount over On-Demand, and since each transform can be retried safely from the beginning, interruptions are acceptable without data loss.

Exam trap

The trap here is that candidates often choose Reserved Instances for any cost reduction scenario, forgetting that bursty workloads with idle periods are better served by Spot capacity and queue-driven scaling, not commitment-based discounts.
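
Scaling from queue depth can be sketched as a backlog-per-worker calculation. The queue depth would come from the ApproximateNumberOfMessagesVisible metric named above; the target backlog per worker and the worker cap are hypothetical tuning values.

```python
import math


def desired_workers(queue_depth: int, target_backlog_per_worker: int, max_workers: int) -> int:
    """Workers needed so that each one holds at most the target backlog, capped at max_workers."""
    if target_backlog_per_worker <= 0:
        raise ValueError("target_backlog_per_worker must be positive")
    if queue_depth <= 0:
        return 0  # no queued jobs, no workers, no idle cost
    return min(max_workers, math.ceil(queue_depth / target_backlog_per_worker))


# Hypothetical values: each worker should hold at most 10 queued jobs, fleet capped at 20.
idle = desired_workers(0, 10, 20)
burst = desired_workers(45, 10, 20)
spike = desired_workers(1000, 10, 20)
```

An empty queue yields zero workers, a moderate burst yields a handful, and a large spike is bounded by the cap, so capacity follows demand instead of sitting idle.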

12

MCQ · easy

A media processing pipeline runs batch jobs overnight. The jobs are stateless, can be restarted from checkpoints, and can tolerate interruptions. The team wants to minimize compute cost. Which EC2 approach is the best fit?

A. Use On-Demand instances to guarantee uninterrupted capacity.

B. Use Spot Instances and design the jobs to handle interruptions by checkpointing and retrying.

C. Use a 1-year Reserved Instance for the current instance type and lock the fleet to it.

D. Use Savings Plans but still treat interruptions as failures that require manual intervention.

Answer: B

Spot is the most cost-effective EC2 option when the workload can handle interruption. Because the jobs are stateless and can resume from checkpoints, losing an instance due to a Spot interruption does not lose progress. The design aligns directly with Spot’s best-effort interruption model, minimizing compute cost while still completing the batch work.

Why this answer

Spot Instances offer significant cost savings (up to 90% off On-Demand) but can be reclaimed by AWS with a 2-minute interruption notice. Since the batch jobs are stateless, checkpointable, and interruption-tolerant, they are an ideal workload for Spot Instances. Designing the jobs to save progress to a durable checkpoint (e.g., Amazon S3) and automatically retry on interruption ensures resilience while minimizing compute cost.

Exam trap

The trap here is that candidates often assume all production workloads need On-Demand or Reserved Instances for reliability, failing to recognize that stateless, checkpointable batch jobs are the perfect use case for Spot Instances to drastically reduce costs.

How to eliminate wrong answers

Option A is wrong because On-Demand instances are the most expensive pricing model and provide no cost optimization benefit for workloads that can tolerate interruptions. Option C is wrong because a 1-year Reserved Instance locks the fleet to a specific instance type and commits to a full year of payment, which is inflexible and not cost-optimal for a batch workload that may vary in size or instance needs. Option D is wrong because Savings Plans provide a discount in exchange for a 1- or 3-year hourly spend commitment, but they do not inherently handle interruptions; treating interruptions as failures requiring manual intervention defeats the purpose of automation and increases operational overhead.
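
A minimal sketch of the checkpoint-and-retry pattern from answer B. The checkpoint dictionary stands in for durable storage such as Amazon S3, the SpotInterruption exception stands in for the instance being reclaimed, and the doubling step is placeholder work; all three are illustrative assumptions.

```python
class SpotInterruption(Exception):
    """Stands in for the instance being reclaimed mid-job."""


def run_job(items, checkpoint, interrupt_at=None):
    """Process items starting from the last checkpoint, saving progress after each item."""
    start = checkpoint.get("next_index", 0)
    for index in range(start, len(items)):
        if interrupt_at is not None and index == interrupt_at:
            raise SpotInterruption(f"interrupted before item {index}")
        checkpoint.setdefault("results", []).append(items[index] * 2)  # placeholder work
        checkpoint["next_index"] = index + 1  # progress survives the interruption
    return checkpoint.get("results", [])


checkpoint = {}
frames = [1, 2, 3, 4, 5]
try:
    run_job(frames, checkpoint, interrupt_at=3)  # first attempt is interrupted
except SpotInterruption:
    pass
results = run_job(frames, checkpoint)  # retry resumes at item 3, not item 0
```

The retry repeats no completed work, which is why an interruption costs little for this kind of job.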

13

MCQ · easy

A team runs a stateless web app on Amazon EC2 behind an Application Load Balancer. During traffic spikes, new EC2 instances take several minutes to finish bootstrapping before they can receive traffic. Which Auto Scaling configuration most directly reduces the time until additional capacity is available?

A. Increase the ALB target group deregistration delay.

B. Use an Auto Scaling warm pool so pre-initialized instances are ready to enter service.

C. Reduce the Auto Scaling group minimum size to one instance.

D. Replace the Application Load Balancer with a Network Load Balancer.

Answer: B

Warm pools keep instances pre-launched and initialized, which reduces the time needed to add capacity during spikes.

Why this answer

Option B is correct because an Auto Scaling warm pool allows you to maintain a pool of pre-initialized instances that are ready to quickly enter the target group and start serving traffic. Instead of waiting for new instances to boot and configure during a scale-out event, the warm pool provides instances that have already completed bootstrapping, drastically reducing the time to additional capacity.

Exam trap

The trap here is that candidates may confuse the deregistration delay (which handles graceful connection draining) with a mechanism to speed up instance readiness, or they may incorrectly assume that reducing the minimum size or switching to a Network Load Balancer will improve scaling speed, when neither addresses the root cause of slow bootstrapping.

How to eliminate wrong answers

Option A is wrong because increasing the ALB target group deregistration delay only affects how long the load balancer waits before terminating existing connections when an instance is deregistered; it does not speed up the provisioning of new instances. Option C is wrong because reducing the Auto Scaling group minimum size to one instance actually decreases the baseline capacity, making the system more vulnerable to traffic spikes and potentially increasing the time to scale out. Option D is wrong because replacing the Application Load Balancer with a Network Load Balancer does not address the bootstrapping delay; NLB operates at Layer 4 and does not reduce instance initialization time, and it lacks the application-layer routing features that ALB provides for HTTP-based workloads.

14

MCQ · hard

Based on the exhibit, your application runs entirely in private subnets and only needs to reach Amazon S3, Amazon DynamoDB, AWS Secrets Manager, and CloudWatch Logs. The monthly bill is dominated by NAT Gateway charges. Which change most directly reduces cost while preserving private connectivity to these AWS services?

A. Replace the NAT Gateway with an Internet Gateway and keep the current private subnet routes unchanged.

B. Add a second NAT Gateway in another Availability Zone to reduce cross-AZ data transfer charges.

C. Create only interface endpoints for all four services and keep the NAT Gateway for fallback.

D. Create S3 and DynamoDB gateway endpoints, create interface endpoints for Secrets Manager and CloudWatch Logs, update route tables, and remove the NAT Gateway.

Answer: D

S3 and DynamoDB use gateway endpoints, which are the cost-effective private path for those services. Secrets Manager and CloudWatch Logs require interface endpoints for private access. Once these are in place, the NAT Gateway is no longer needed for this workload, eliminating the hourly and per-GB NAT charges while keeping traffic on the AWS network.

Why this answer

Option D is correct because it replaces the costly NAT Gateway with gateway VPC endpoints for S3 and DynamoDB, which carry no additional charge, and uses AWS PrivateLink interface endpoints for Secrets Manager and CloudWatch Logs. This removes the NAT Gateway hourly and per-GB processing charges while keeping traffic entirely within the AWS network. The two interface endpoints still carry their own hourly and per-GB charges, but only for the services that have no gateway option.

Exam trap

The trap here is that candidates assume all AWS services require the same type of VPC endpoint, leading them to either use only interface endpoints (costly) or keep the NAT Gateway as a safety net, missing the opportunity to use free gateway endpoints for S3 and DynamoDB.

How to eliminate wrong answers

Option A is wrong because an Internet Gateway alone does not enable private subnets to reach AWS services; private subnets still need a NAT device to route traffic through the Internet Gateway, so removing the NAT Gateway without adding endpoints would break connectivity. Option B is wrong because adding a second NAT Gateway adds another hourly charge and leaves the per-GB NAT processing charges in place; it can reduce cross-AZ data transfer, but the bill is dominated by NAT Gateway charges, not cross-AZ traffic. Option C is wrong because using interface endpoints for all four services would incur per-hour and per-GB data processing charges for S3 and DynamoDB, which are more expensive than using free gateway endpoints for those two services; keeping the NAT Gateway as a fallback also retains unnecessary costs.
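
The endpoint choice in answer D can be sketched as a lookup. The short service names follow the suffix of the VPC endpoint service name (for example "logs" for CloudWatch Logs); the helper itself is hypothetical. S3 also supports interface endpoints, but the sketch picks the gateway type for S3 and DynamoDB because that is the option without an endpoint charge.

```python
# Services for which a gateway endpoint exists; everything else uses an interface endpoint.
GATEWAY_SERVICES = {"s3", "dynamodb"}


def endpoint_plan(services):
    """Map each service to the lowest-cost private endpoint type available for it."""
    return {
        service: "Gateway" if service in GATEWAY_SERVICES else "Interface"
        for service in services
    }


plan = endpoint_plan(["s3", "dynamodb", "secretsmanager", "logs"])
needs_route_table_update = [name for name, kind in plan.items() if kind == "Gateway"]
```

Gateway endpoints are reached through route table entries, which is why answer D includes updating the route tables, whereas interface endpoints appear as network interfaces inside the subnets.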

Full explanation →

15

Multi-Selectmedium

An application runs in private subnets and must access Amazon S3, Amazon DynamoDB, and AWS Secrets Manager. The security team wants the traffic to stay on the AWS network and the finance team wants to eliminate NAT Gateway charges. Which three changes should they make? Select three.

Select 3 answers

A.Create gateway VPC endpoints for S3.

B.Create gateway VPC endpoints for DynamoDB.

C.Create an interface VPC endpoint for Secrets Manager.

D.Place the instances in public subnets with an internet gateway.

E.Keep the NAT Gateway and add a proxy instance for service access.

AnswersA, B, C

A gateway endpoint for S3 allows private access without sending traffic through a NAT Gateway. It keeps S3 traffic on the AWS network and reduces NAT processing charges.

Why this answer

A gateway VPC endpoint for S3 allows instances in private subnets to access S3 over the AWS network without traversing the internet or a NAT gateway. This eliminates NAT Gateway charges and keeps traffic on the AWS backbone, meeting both security and cost requirements.

Exam trap

The trap here is that candidates often confuse gateway endpoints (for S3 and DynamoDB) with interface endpoints (for most other AWS services), leading them to incorrectly select interface endpoints for S3 or DynamoDB, which would incur hourly charges and not be the most cost-optimized solution.

Full explanation →

16

MCQeasy

A production application uses an Amazon RDS Multi-AZ DB instance. During an unplanned failover, the database endpoint remains the same. What change should the application team make to handle the failover reliably?

A.Hard-code the new writer instance IP address after failover completes.

B.Keep using the same RDS endpoint and implement connection retry logic on failures.

C.Disable Multi-AZ and rely on manual intervention to switch endpoints.

D.Move reads to application-side caching only, and avoid reopening DB connections.

AnswerB

For RDS Multi-AZ, the DB endpoint is designed to remain consistent. During failover, in-flight connections may drop, so the application should treat connection/transaction errors as transient and reconnect with retry (for example, exponential backoff).

Why this answer

Option B is correct because the RDS Multi-AZ DNS endpoint remains unchanged during a failover, automatically pointing to the new writer instance. Implementing connection retry logic with exponential backoff allows the application to handle the brief DNS propagation delay and connection interruption, ensuring reliable recovery without manual intervention.

Exam trap

The trap here is that candidates assume the endpoint changes or that Multi-AZ provides seamless failover without any application-side changes, but in reality the application must implement retry logic to handle the brief connection disruption during DNS propagation.

How to eliminate wrong answers

Option A is wrong because hard-coding the new writer instance IP address is impractical and error-prone; the IP address can change after failover, and this approach bypasses the automatic DNS update provided by Multi-AZ. Option C is wrong because disabling Multi-AZ removes high availability entirely, forcing manual endpoint switching which increases downtime and violates the goal of reliable failover handling. Option D is wrong because moving reads to application-side caching does not address the need to re-establish the database connection after failover; the application must still handle connection failures and retries for writes.
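The retry behavior described for Option B might look like the following sketch. It assumes a caller-supplied `connect` function that raises `ConnectionError` while the endpoint is unavailable; the function names and delay values are hypothetical, and a real application would catch its database driver's own exception types.

```python
import random
import time


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially with jitter."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))  # randomize to avoid synchronized retries


# Simulated failover: the first two attempts fail, the third succeeds.
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("endpoint not ready")
    return "connected"


result = connect_with_retry(fake_connect, sleep=lambda seconds: None)
print(result, calls["count"])
```

The application keeps using the same endpoint name on every attempt; only the timing between attempts changes.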

Full explanation →

17

Multi-Selectmedium

A solutions architect is designing a cost-optimized data storage solution for a large dataset that is accessed infrequently but must be retained for compliance for 7 years. Which three actions should the architect take to minimize costs? (Choose three.)

Select 3 answers

A.Store the data in Amazon S3 Glacier Deep Archive immediately after creation.

B.Use Amazon S3 lifecycle policies to transition data from S3 Standard to S3 Glacier Deep Archive after 30 days.

C.Enable S3 Intelligent-Tiering to automatically move data between access tiers based on usage patterns.

D.Store all data in Amazon EBS gp2 volumes attached to an EC2 instance for low-latency access.

E.Use S3 Object Lock in compliance mode to prevent data deletion during the retention period.

F.Replicate all data to a second AWS Region using S3 Cross-Region Replication to ensure durability.

Why this answer

Amazon S3 lifecycle policies allow you to define rules that automatically transition objects to colder storage tiers like S3 Glacier Deep Archive after a specified period. This approach minimizes costs by keeping data in S3 Standard only for the initial 30 days when it might be accessed, then moving it to the lowest-cost storage class for the remaining compliance period. S3 Intelligent-Tiering automatically optimizes costs by monitoring access patterns and moving data between frequent, infrequent, and archive access tiers without manual intervention.

S3 Object Lock in compliance mode prevents any user, including the root user, from deleting or overwriting objects during the retention period, ensuring regulatory compliance.

Exam trap

The trap here is that candidates may think immediate archiving to Glacier Deep Archive is the cheapest option, but they overlook the need for lifecycle policies to balance initial access needs with long-term cost savings, and they may confuse durability (which S3 already provides) with compliance retention, leading them to select unnecessary replication.
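A lifecycle rule of the kind described above could be expressed as follows. This is a minimal sketch of the configuration document only; the rule ID is a hypothetical name, and the empty prefix is an assumption meaning the rule applies to every object in the bucket.

```python
import json

# Lifecycle configuration in the shape S3 expects: one rule that moves
# objects to S3 Glacier Deep Archive 30 days after creation.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-after-30-days",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: apply to all objects
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
        }
    ]
}
print(json.dumps(lifecycle_configuration, indent=2))
```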

Full explanation →

18

MCQmedium

Your company has an internal service hosted behind a Network Load Balancer (NLB) in VPC 10.0.0.0/16. A consumer team in a different VPC (10.1.0.0/16) must call the service without using the public internet. You want private connectivity using AWS PrivateLink. Which configuration best enables least-privilege access while keeping the traffic private?

A.Expose the NLB with an Internet Gateway route and restrict access using a security group attached to the NLB.

B.Create a VPC endpoint (interface endpoint) in the consumer VPC that points to the service name published by the provider account, and limit allowed clients using the endpoint’s security group rules.

C.Create an S3 Gateway endpoint in the consumer VPC and store the service hostname in SSM Parameter Store so clients can resolve privately.

D.Use a bastion host in the provider VPC and allow the consumer VPC to SSH to it; from there, the consumer makes HTTP calls to the NLB.

AnswerB

PrivateLink uses an interface VPC endpoint in the consumer VPC (using the provider’s published service name). Traffic stays on the AWS network, not the public internet. Security groups on the interface endpoint provide least-privilege control over which client resources can reach the endpoint, and the provider side can also restrict who can connect.

Why this answer

Option B is correct because AWS PrivateLink uses an interface VPC endpoint in the consumer VPC to connect privately to a Network Load Balancer (NLB) in the provider VPC, keeping traffic within the AWS network. The endpoint’s security group acts as a stateful firewall to restrict which clients in the consumer VPC can access the service, enforcing least-privilege access. This eliminates exposure to the public internet and avoids complex routing or gateway configurations.

Exam trap

The trap here is that candidates often confuse Gateway Endpoints (which only work with S3 and DynamoDB) with Interface Endpoints (which support PrivateLink for services behind an NLB), leading them to incorrectly select Option C.

How to eliminate wrong answers

Option A is wrong because routing to the NLB through an Internet Gateway exposes the service to the public internet, violating the requirement for private connectivity; a security group on the NLB can narrow which sources connect, but the traffic path is still public. Option C is wrong because an S3 Gateway endpoint is designed exclusively for Amazon S3 access and cannot be used to connect to an NLB or resolve a service hostname; SSM Parameter Store does not provide private network connectivity. Option D is wrong because using a bastion host introduces a single point of failure, requires SSH key management, and violates least-privilege by granting broad network access; it also adds latency and operational overhead compared to a direct PrivateLink connection.

Full explanation →

19

MCQhard

An IoT ingestion API must ensure that only encrypted EBS volumes can be created in the account. What is the strongest preventive control?

A.Use an SCP that denies ec2:CreateVolume when the encrypted condition is false

B.Run a daily Lambda function to encrypt unencrypted volumes

C.Enable VPC Flow Logs

D.Tag encrypted volumes after creation

AnswerA

An SCP can prevent noncompliant volume creation across accounts in an organization.

Why this answer

An SCP (service control policy) is the strongest preventive control because it can deny the ec2:CreateVolume API call when the ec2:Encrypted condition key is false, blocking the creation of any unencrypted EBS volume in the affected accounts before it happens. This is a preventive control that enforces encryption as a mandatory requirement, unlike detective or corrective measures that act after the fact.

Exam trap

The trap here is confusing preventive controls (like SCPs that block the action) with detective or corrective controls (like Lambda scripts or tagging), leading candidates to choose a reactive solution instead of the strongest preventive one.

How to eliminate wrong answers

Option B is wrong because running a daily Lambda function to encrypt unencrypted volumes is a corrective/reactive control, not a preventive one; it only fixes volumes after they have already been created unencrypted, leaving a window of non-compliance. Option C is wrong because VPC Flow Logs are a detective control that captures network traffic metadata, not a mechanism to enforce or prevent the creation of encrypted EBS volumes. Option D is wrong because tagging encrypted volumes after creation is a labeling action that provides visibility but does not prevent the creation of unencrypted volumes in the first place.
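The SCP in Option A could look like the policy below, built here as a Python dictionary so it can be printed as JSON. This is a sketch rather than a complete guardrail: the `Sid` is a hypothetical label, and it covers only `ec2:CreateVolume`, as the question describes.

```python
import json

# Service control policy that denies creating an EBS volume unless it is encrypted.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedVolumes",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": "*",
            # The request is denied when the volume would not be encrypted.
            "Condition": {"Bool": {"ec2:Encrypted": "false"}},
        }
    ],
}
print(json.dumps(scp, indent=2))
```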

Full explanation →

20

MCQmedium

An analytics dashboard uses RDS MySQL and receives many read-only reporting queries that slow down the primary database. What should the architect add?

A.S3 lifecycle policy

B.RDS read replica and route reporting queries to it

C.Multi-AZ standby and route reads to the standby

D.A larger NAT gateway

AnswerB

Read replicas offload read traffic from the primary instance.

Why this answer

Adding an RDS read replica offloads read-heavy reporting queries from the primary MySQL instance, preserving write performance. The read replica asynchronously replicates data using MySQL's native binlog replication, and routing reporting queries to its endpoint reduces contention on the primary.

Exam trap

The trap here is confusing a Multi-AZ standby (which is for high availability only and cannot serve reads) with a read replica (which is explicitly designed to offload read traffic).

How to eliminate wrong answers

Option A is wrong because S3 lifecycle policies manage object transitions and expirations in S3, not database query offloading. Option C is wrong because the standby in a Multi-AZ DB instance deployment is a synchronous replica used only for failover; it does not serve read traffic (RDS does not allow direct reads from that standby). Option D is wrong because a NAT gateway provides outbound internet access for private subnets and scales automatically rather than being resized; it has no effect on database read query performance.

Full explanation →

21

MCQmedium

A team serves static web assets (JS, CSS, images) from an Amazon S3 origin through CloudFront. Recently, the S3 origin has received a high number of requests for the same files, increasing origin data transfer costs. CloudFront access logs show many cache misses, and each request includes a unique query string used only for tracking (for example, ?utm=...). The application does not require query-string-specific content. What CloudFront change will most directly reduce origin fetches and cost?

A.Update the CloudFront cache policy to exclude query strings from the cache key so that requests differing only by tracking query parameters reuse the same cached object.

B.Lower the minimum TTL and set Cache-Control headers to no-store to force CloudFront to revalidate more often.

C.Enable Origin Shield to ensure all origin fetches go through a single regional shield with no other configuration changes.

D.Switch the S3 origin from S3 to a different storage class optimized for request rates, keeping the cache key the same.

AnswerA

CloudFront cache misses increase when the cache key includes values that vary per request. If the tracking query string is part of the cache key, each unique ?utm value generates a separate cache entry even though the underlying object (JS/CSS/image) is identical, causing repeated origin fetches. Excluding query strings from the cache key collapses those variations into a single cached object, increasing the cache hit rate and reducing origin fetches and origin data transfer.

Why this answer

Option A is correct because CloudFront's cache policy controls which parts of a request (including query strings) are included in the cache key. By excluding the tracking query strings (e.g., `?utm=...`) from the cache key, CloudFront will treat all requests for the same file as identical, serving the cached object regardless of the query string. This directly reduces the number of origin fetches (cache misses) and lowers S3 data transfer costs, as the application does not require query-string-specific content.

Exam trap

The trap here is that candidates may think enabling Origin Shield (Option C) or changing storage classes (Option D) will solve the problem, but they overlook the fundamental issue of cache key fragmentation caused by unique query strings, which is directly addressed by adjusting the cache policy.

How to eliminate wrong answers

Option B is wrong because lowering the minimum TTL and setting `Cache-Control: no-store` would force CloudFront to revalidate or bypass the cache entirely, increasing origin fetches and costs, not reducing them. Option C is wrong because enabling Origin Shield alone, without adjusting the cache key to exclude query strings, does not address the root cause of cache misses caused by unique query strings; Origin Shield would still forward each unique query string request to the origin. Option D is wrong because switching the S3 storage class (e.g., to S3 Standard-IA or One Zone-IA) does not change the cache key behavior; the high number of unique query strings would still cause cache misses and origin fetches, and some storage classes may even incur higher per-request costs.
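The effect of the cache key on hit rate can be illustrated with a simplified model. The `cache_key` function below is a hypothetical stand-in, not CloudFront's implementation, and the URLs are made-up examples; it shows only how including or excluding the query string changes the number of distinct cache entries.

```python
from urllib.parse import urlsplit


def cache_key(url: str, include_query: bool) -> str:
    """Simplified cache key: the path, optionally followed by the query string."""
    parts = urlsplit(url)
    if include_query and parts.query:
        return f"{parts.path}?{parts.query}"
    return parts.path


# Three requests for the same file that differ only by a tracking parameter.
requests = [
    "https://example.com/app.js?utm=a1",
    "https://example.com/app.js?utm=b2",
    "https://example.com/app.js?utm=c3",
]
entries_with_query = {cache_key(url, True) for url in requests}
entries_without_query = {cache_key(url, False) for url in requests}
print(len(entries_with_query), len(entries_without_query))
```

With the query string in the key, each request is a separate entry and a separate origin fetch; without it, all three map to one cached object.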

Full explanation →

22

Multi-Selecthard

An image-sharing application uses CloudFront in front of an S3 origin. Which two settings help keep users from bypassing CloudFront and accessing the bucket directly?

Select 2 answers

A.Enable CloudFront standard logging

B.Enable S3 static website hosting

C.Configure Origin Access Control for the S3 origin

D.Use an S3 bucket policy that allows access only from the CloudFront distribution

AnswersC, D

Origin Access Control allows CloudFront to securely access a private S3 bucket.

Why this answer

Origin Access Control (OAC) is a CloudFront feature that lets CloudFront sign its requests to an S3 origin with AWS Signature Version 4. The bucket stays private, and the bucket policy grants read access only to the CloudFront service principal (cloudfront.amazonaws.com) when the request comes from the specific distribution, identified by its ARN in an AWS:SourceArn condition. Direct requests to the S3 bucket URL do not satisfy that policy and are denied, which prevents users from bypassing CloudFront.

Exam trap

The trap here is that candidates often confuse enabling S3 static website hosting (which creates a public endpoint) with a security control, when in fact it would undermine the goal of restricting direct access.
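The bucket policy that pairs with OAC might look like the sketch below. The bucket name, account ID, and distribution ID are placeholder values chosen for the example, not values from the question.

```python
import json

# Placeholder identifiers (assumptions for illustration only).
BUCKET = "example-bucket"
DISTRIBUTION_ARN = "arn:aws:cloudfront::111122223333:distribution/EDFDVBD6EXAMPLE"

# Allow reads only from the CloudFront service principal, and only when the
# request is made on behalf of the named distribution.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"StringEquals": {"AWS:SourceArn": DISTRIBUTION_ARN}},
        }
    ],
}
print(json.dumps(bucket_policy, indent=2))
```

Because the bucket grants nothing to anonymous callers, a direct request to the bucket URL is denied by default.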

Full explanation →

23

MCQeasy

Based on the exhibit, a web application must stay available if one Availability Zone fails. What is the best change to improve resilience?

A.Increase the desired capacity to 8 instances in the same subnet.

B.Add a subnet in another Availability Zone to the Auto Scaling group and keep the ALB spanning both AZs.

C.Replace the Application Load Balancer with a Network Load Balancer.

D.Move the instances to a larger instance type with more CPU and memory.

AnswerB

This places application instances across multiple Availability Zones, which protects the stateless tier from a single-AZ failure. The ALB already spans two AZs, so the missing piece is the Auto Scaling group using subnets in more than one AZ. That allows AWS to replace unhealthy instances and continue serving traffic from the surviving Zone.

Why this answer

Adding a subnet in another Availability Zone (AZ) to the Auto Scaling group and keeping the ALB spanning both AZs ensures that if one AZ fails, the ALB can route traffic to healthy instances in the other AZ. This is the standard pattern for building multi-AZ resilient architectures with Auto Scaling and ALB, as it eliminates the single point of failure at the AZ level.

Exam trap

The trap here is that candidates often think increasing instance count or size improves resilience, but without multi-AZ distribution, all instances remain vulnerable to a single AZ failure.

How to eliminate wrong answers

Option A is wrong because increasing the desired capacity to 8 instances in the same subnet does not protect against an AZ failure; all instances remain in a single AZ, so if that AZ goes down, all instances become unavailable. Option C is wrong because replacing the ALB with a Network Load Balancer does not inherently improve resilience against AZ failure; both ALB and NLB support multi-AZ deployments, but the issue is the lack of cross-AZ instance distribution, not the load balancer type. Option D is wrong because moving to a larger instance type with more CPU and memory addresses performance scaling, not availability; it does not protect against an AZ outage.

Full explanation →

24

MCQmedium

A mobile game backend uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added? The architecture review board prefers a managed AWS-native control.

A.An internet gateway

B.S3 Select

C.RDS Proxy

D.A larger Route 53 hosted zone

AnswerC

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy is a fully managed, AWS-native service that sits between Lambda functions and Aurora, pooling and reusing database connections. This prevents connection storms by reducing the overhead of establishing new connections for each short-lived Lambda invocation, and it also improves scalability and resilience by handling failover transparently.

Exam trap

The trap here is that candidates might confuse network-level components (like an internet gateway) or data retrieval services (like S3 Select) with database connection management, overlooking the purpose-built RDS Proxy service for handling short-lived, high-frequency connections from serverless workloads.

How to eliminate wrong answers

Option A is wrong because an internet gateway is used to enable VPC-to-internet connectivity, not to manage database connections or mitigate connection storms. Option B is wrong because S3 Select is a feature for retrieving subsets of data from objects in S3, not for managing database connections or connection pooling. Option D is wrong because a larger Route 53 hosted zone increases the number of DNS records you can host, but it does not address database connection management or connection storms.

Full explanation →

25

Multi-Selecthard

A customer analytics portal uses CloudFront in front of an S3 origin. Which two settings help keep users from bypassing CloudFront and accessing the bucket directly?

Select 2 answers

A.Enable CloudFront standard logging

B.Configure Origin Access Control for the S3 origin

C.Use an S3 bucket policy that allows access only from the CloudFront distribution

D.Enable S3 static website hosting

AnswersB, C

Origin Access Control allows CloudFront to securely access a private S3 bucket.

Why this answer

Option B is correct because Origin Access Control (OAC) is a CloudFront feature that restricts access to an S3 origin so that only the CloudFront distribution can fetch objects. When OAC is configured, CloudFront signs requests using a trusted identity, and the S3 bucket policy can then deny any request that does not come from that identity, effectively blocking direct S3 access.

Exam trap

The trap here is that candidates often confuse CloudFront standard logging with access control, or mistakenly think enabling S3 static website hosting somehow restricts access, when in fact it opens an additional direct endpoint.

Full explanation →

26

MCQmedium

An order-processing service consumes messages from an Amazon SQS Standard queue using a custom worker. During traffic spikes, the worker occasionally times out after performing some work but before acknowledging the message, so SQS redelivers it and it may be processed again. You also observe that a small set of “poison” messages always fail validation. What change most directly improves resilience by (1) preventing poison messages from retrying indefinitely and (2) avoiding duplicate side effects caused by legitimate retries?

A.Increase the SQS visibility timeout and, when validation fails, call DeleteMessage in the consumer to remove the message immediately.

B.Move to SNS topics with subscriptions and rely on SNS to provide exactly-once delivery to eliminate duplicates automatically.

C.Configure a dead-letter queue (DLQ) with a redrive policy that moves messages after maxReceiveCount, and implement idempotent processing in the consumer using an idempotency key.

D.Change the queue to FIFO and enable content-based deduplication, leaving the consumer logic unchanged.

AnswerC

SQS Standard is at-least-once delivery, so timeouts can cause redelivery and duplicates. A DLQ with a redrive policy prevents poison messages from retrying forever by moving them after repeated failures. Idempotent processing (for example, storing a processed marker in a database with conditional logic keyed by an idempotency key) prevents duplicate side effects when retries occur for valid messages.

Why this answer

Option C is correct because a dead-letter queue (DLQ) with a maxReceiveCount redrive policy directly addresses the poison message problem by moving messages that repeatedly fail validation out of the main queue after a set number of retries, preventing indefinite retries. Implementing idempotent processing using an idempotency key ensures that even if a legitimate message is redelivered due to a visibility timeout, the consumer can detect and skip duplicate side effects, thus solving both requirements most directly.

Exam trap

The trap here is that candidates often confuse FIFO queues as a universal solution for both deduplication and poison message handling, but FIFO only provides exactly-once processing within a deduplication window and does not automatically handle poison messages without a DLQ, nor does it address idempotency for retries outside that window.

How to eliminate wrong answers

Option A is wrong because a longer visibility timeout only makes premature redelivery less likely; it does not make processing idempotent, so a retry after a genuine timeout can still repeat side effects. Deleting a message as soon as validation fails does stop that message from retrying, but it discards the message with no copy left for inspection or replay, which a DLQ preserves. Option B is wrong because SNS topics do not provide exactly-once delivery; SNS is a pub/sub messaging service that delivers messages to multiple subscribers but does not guarantee deduplication, and it does not replace the need for a DLQ or idempotent processing. Option D is wrong because switching to a FIFO queue with content-based deduplication eliminates duplicate sends within a 5-minute deduplication window but does not handle poison messages, which would keep retrying unless a DLQ is configured. Leaving the consumer logic unchanged also means idempotency is not addressed, so a message redelivered after a visibility timeout could still repeat its side effects.
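The two mechanisms in Option C can be sketched with an in-memory simulation. Nothing here calls SQS: the queue is a `deque`, the message fields (`id`, `idempotency_key`, `body`) and the threshold value are assumptions for the example, and the DLQ move is simplified to happen right after the final failed receive.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumed redrive threshold for the example


def handle(message, processed_keys, side_effects):
    """Validate the message, then apply its side effect at most once per key."""
    if message["body"] is None:
        raise ValueError("validation failed")  # poison message
    key = message["idempotency_key"]
    if key in processed_keys:
        return  # duplicate delivery: skip the side effect
    side_effects.append(message["body"])
    processed_keys.add(key)


def run(messages):
    """Consume messages, retrying failures and moving repeat failures to a DLQ."""
    queue, dlq = deque(messages), []
    processed_keys, side_effects, receive_counts = set(), [], {}
    while queue:
        message = queue.popleft()
        message_id = message["id"]
        receive_counts[message_id] = receive_counts.get(message_id, 0) + 1
        try:
            handle(message, processed_keys, side_effects)
        except ValueError:
            if receive_counts[message_id] >= MAX_RECEIVE_COUNT:
                dlq.append(message)  # stop retrying, keep for inspection
            else:
                queue.append(message)  # becomes visible again for a retry
    return side_effects, dlq


deliveries = [
    {"id": "m1", "idempotency_key": "order-1", "body": "charge order-1"},
    {"id": "m1", "idempotency_key": "order-1", "body": "charge order-1"},  # redelivery
    {"id": "m2", "idempotency_key": "order-2", "body": None},  # always fails validation
]
side_effects, dlq = run(deliveries)
print(side_effects, [message["id"] for message in dlq])
```

In a real consumer the set of processed keys would live in a durable store, for example a database table written with a conditional put, so that it survives worker restarts.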

Full explanation →

27

MCQeasy

Based on the exhibit, what change best reduces Lambda cold-start impact for a predictable user-upload workflow?

A.Set a reserved concurrency limit for the function to protect it from throttling.

B.Enable provisioned concurrency for the function.

C.Increase the function timeout to give more time for initialization.

D.Move the function to a larger memory setting only to eliminate all initialization time.

AnswerB

Provisioned concurrency keeps a pre-initialized pool of Lambda execution environments ready to respond immediately. The exhibit shows long init duration after inactivity, which is the classic symptom of cold starts affecting user experience. Because the traffic pattern is predictable during launches, provisioned concurrency is the most direct way to reduce startup latency and smooth response times.

Why this answer

Provisioned concurrency pre-warms a specified number of execution environments so that when a user upload triggers the Lambda function, there is no cold-start latency. This is the most direct way to eliminate initialization time for a predictable workload, as it keeps instances ready to handle requests immediately.

Exam trap

The trap here is that candidates often confuse reserved concurrency (which limits concurrency) with provisioned concurrency (which pre-warms instances), or they assume that increasing memory or timeout will solve cold starts, when in fact only provisioned concurrency directly addresses initialization latency for predictable workloads.

How to eliminate wrong answers

Option A is wrong because reserved concurrency sets aside and caps the number of concurrent executions for the function; it does not pre-initialize execution environments, so cold starts still occur. Option C is wrong because increasing the function timeout does not affect initialization time; it only extends the maximum duration a function can run, which does not address cold starts. Option D is wrong because moving to a larger memory setting can reduce initialization time by providing more CPU and resources, but it does not eliminate all initialization time, and it is not as targeted or effective as provisioned concurrency for predictable workloads.

Full explanation →

28

Multi-Selecthard

An application stores user-uploaded binaries in S3. Access is unpredictable for the first month, then most objects become cold. The team wants the cheapest approach that avoids manually guessing access patterns. Which two actions are best? Select two.

Select 2 answers

A.Enable S3 Intelligent-Tiering on the bucket.

B.Keep all objects in S3 Standard because lifecycle transitions add too much management.

C.Add a lifecycle rule to move very old objects to S3 Glacier Deep Archive when minute-level retrieval is no longer required.

D.Copy all binaries to Amazon EFS so retrieval is faster.

E.Disable versioning because S3 Intelligent-Tiering needs it to work.

AnswersA, C

Correct. Intelligent-Tiering is designed for objects with uncertain or changing access patterns. It automatically moves data between access tiers, reducing the need for manual guessing and avoiding overpaying for standard storage.

Why this answer

Option A is correct because S3 Intelligent-Tiering automatically moves objects between access tiers (Frequent Access, Infrequent Access, and Archive Instant Access, plus the opt-in Archive Access and Deep Archive Access tiers) based on changing access patterns, without manual lifecycle rules. This is ideal for unpredictable access in the first month followed by cold storage, as it optimizes cost by charging only for the storage tier actually used per object, with a small monthly monitoring and automation fee per object.

Exam trap

The trap here is that candidates may think S3 Intelligent-Tiering requires versioning or manual lifecycle rules, but it is a fully automated, versioning-independent feature designed specifically for unpredictable access patterns.

Full explanation →

29

Multi-Selectmedium

A software vendor in Account B must assume a role in Account A to process support tickets. Security wants to prevent confused deputy attacks. Which two configurations are required for this access pattern to work safely? Select two.

Select 2 answers

A.Require a specific sts:ExternalId value in the role trust policy in Account A.

B.Make sure the vendor includes that same ExternalId when calling sts:AssumeRole.

C.Share long-term access keys from Account A with the vendor.

D.Attach a permissions boundary to the role to satisfy the ExternalId requirement.

E.Allow sts:GetSessionToken instead of sts:AssumeRole in the trust policy.

AnswersA, B

A trust policy condition on sts:ExternalId is the standard confused-deputy protection for third-party role assumption. It ensures that only callers who know the shared external identifier can assume the role.

Why this answer

Option A is correct because requiring a specific sts:ExternalId value in the role trust policy in Account A is a standard AWS mechanism to prevent the confused deputy problem. The ExternalId is a unique identifier agreed between the account owner and the vendor (AWS does not treat it as a secret) that the vendor must provide when assuming the role, ensuring that the vendor assumes the role only on behalf of the intended customer and cannot be tricked into using it on behalf of a different party.

Exam trap

The trap here is that candidates often confuse the ExternalId with a permissions boundary or think that long-term keys are acceptable for cross-account access, but the correct answer requires both the trust policy condition and the caller's inclusion of the ExternalId in the API call.
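The two halves of this pattern, the trust policy condition and the matching value in the vendor's call, can be sketched as follows. The account IDs, role name, session name, and external ID are placeholder values, and `trust_allows` is a toy check written for the example, not IAM's evaluation logic.

```python
import json

EXTERNAL_ID = "example-external-id"  # placeholder value agreed with the vendor

# Trust policy on the role in Account A: the vendor account may assume the
# role only when it supplies the expected ExternalId.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222222222222:root"},  # Account B (placeholder)
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

# Parameters the vendor would pass to sts:AssumeRole.
assume_role_request = {
    "RoleArn": "arn:aws:iam::111111111111:role/SupportTicketRole",  # hypothetical role
    "RoleSessionName": "vendor-support-session",
    "ExternalId": EXTERNAL_ID,
}


def trust_allows(policy, external_id):
    """Toy check: does the supplied ExternalId match the trust policy condition?"""
    condition = policy["Statement"][0]["Condition"]["StringEquals"]
    return condition["sts:ExternalId"] == external_id


print(json.dumps(trust_policy, indent=2))
print(trust_allows(trust_policy, assume_role_request["ExternalId"]))
```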

Full explanation →

30

MCQhard

An EC2 instance in a private subnet must access an S3 bucket that contains regulated exports for a customer analytics portal. The security team requires access to be allowed only when traffic comes through a specific VPC endpoint. What should the architect add to the bucket policy? The design must avoid adding custom operational scripts.

A.A security group rule that allows HTTPS to S3

B.A condition that matches aws:RequestedRegion to the bucket Region

C.A deny statement for all IAM users except the EC2 role

D.A condition that matches aws:sourceVpce to the endpoint ID

AnswerD

The aws:sourceVpce condition restricts S3 access to requests that arrive through the specified VPC endpoint.

Why this answer

Option D is correct because the bucket policy can use the `aws:sourceVpce` condition key to restrict access exclusively to traffic originating from a specific VPC endpoint ID. This ensures that only requests sent through that VPC endpoint are allowed, meeting the security team's requirement without requiring custom scripts or additional infrastructure.

Exam trap

The trap here is that candidates may confuse security group rules with bucket policies, or assume that restricting by IAM user or region is sufficient to enforce network-level control, when in fact only the `aws:sourceVpce` condition key directly ties access to a specific VPC endpoint.

How to eliminate wrong answers

Option A is wrong because security group rules operate at the network interface level and cannot be attached to an S3 bucket; S3 bucket policies are resource-based policies that do not support security group references. Option B is wrong because `aws:RequestedRegion` restricts the AWS Region in which the request is made, not the network path or VPC endpoint used, so it does not enforce that traffic comes through a specific VPC endpoint. Option C is wrong because denying all IAM users except the EC2 role would not restrict traffic to a specific VPC endpoint; it only controls which IAM identities can access the bucket, not the network path, and could break legitimate access from other services or users.
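A bucket policy that enforces the endpoint requirement in Option D might look like this sketch. The bucket name and endpoint ID are placeholder values, and the statement uses a broad deny on all S3 actions for illustration; a production policy would need care to avoid locking out administrators.

```python
import json

BUCKET = "example-bucket"      # placeholder bucket name
ENDPOINT_ID = "vpce-1a2b3c4d"  # placeholder VPC endpoint ID

# Deny every request that does not arrive through the named VPC endpoint.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnlessThroughVpcEndpoint",  # hypothetical statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"StringNotEquals": {"aws:sourceVpce": ENDPOINT_ID}},
        }
    ],
}
print(json.dumps(bucket_policy, indent=2))
```

Because an explicit deny overrides any allow, requests from outside the endpoint fail even if the caller's IAM role otherwise grants access.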

31

Multi-Selectmedium

A Lambda function behind API Gateway has predictable traffic spikes every hour. The function does not need access to resources in a VPC, and p95 latency spikes are caused by cold starts during scale-out. Which two actions are most effective? Select two.

Select 2 answers

A.Enable provisioned concurrency for the function.

B.Remove the function from a VPC because it has no VPC dependencies.

C.Set reserved concurrency to a low fixed number.

D.Increase the Lambda timeout to 15 minutes.

E.Add an SQS dead-letter queue to reduce startup latency.

AnswersA, B

Provisioned concurrency keeps a pool of initialized execution environments ready to handle requests. That removes most cold-start delay and is the most direct way to stabilize p95 latency during predictable bursts.

Why this answer

Option A is correct because provisioned concurrency pre-warms a specified number of Lambda execution environments, eliminating cold starts for those instances. This directly addresses the p95 latency spikes caused by cold starts during predictable traffic spikes, as the function will have warm containers ready to handle incoming requests without the initialization delay.

Exam trap

The trap here is that candidates often confuse reserved concurrency (which limits concurrency and can cause throttling) with provisioned concurrency (which pre-warms environments), or they mistakenly believe that increasing timeout or adding a DLQ can mitigate cold start latency.
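To size provisioned concurrency for a predictable spike, a common starting estimate is concurrency = requests per second × average duration in seconds. The sketch below applies that estimate; the traffic figures and the headroom percentage are illustrative assumptions, not values from the question.

```python
import math


def estimate_provisioned_concurrency(peak_rps: float,
                                     avg_duration_s: float,
                                     headroom_pct: int = 10) -> int:
    """Estimate how many execution environments to keep initialized.

    headroom_pct is a hypothetical safety margin added on top of the estimate.
    """
    concurrency = peak_rps * avg_duration_s
    return math.ceil(concurrency * (100 + headroom_pct) / 100)


# Assumed spike: 200 requests/second, 250 ms average duration.
print(estimate_provisioned_concurrency(200, 0.25))
```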

32

Multi-Selectmedium

A company is designing a high-performance database architecture for an e-commerce platform that experiences rapid spikes in read traffic during flash sales. The database must handle millions of reads per second with sub-millisecond latency. The data is key-value in nature, with a small number of attributes per item. Which three options should be included in the architecture? (Choose three.)

Select 3 answers

A.Amazon DynamoDB as the primary database.

B.Amazon RDS for MySQL with Multi-AZ and Read Replicas.

C.DynamoDB Accelerator (DAX) as an in-memory cache.

D.Amazon ElastiCache for Redis with cluster mode enabled.

E.Amazon S3 as a primary data store accessed via Select and Range queries.

F.Amazon Redshift with auto-scaling for real-time reads.

Why this answer

Amazon DynamoDB is a fully managed NoSQL key-value database that delivers single-digit millisecond latency at any scale, making it ideal for high-traffic e-commerce platforms with key-value data. DynamoDB Accelerator (DAX) is an in-memory cache that sits in front of DynamoDB, reducing read latency to microseconds for millions of reads per second. Amazon ElastiCache for Redis with cluster mode enabled provides a distributed in-memory cache that can offload read traffic from the primary database, further reducing latency and handling spikes during flash sales.

Exam trap

The trap here is that candidates often choose Amazon RDS with Read Replicas for read scaling, but they fail to recognize that relational databases cannot achieve sub-millisecond latency for millions of reads per second, and that DynamoDB with caching layers is the correct high-performance key-value solution.

33

MCQeasy

Account A hosts an IAM role (RoleInAccountA). The trust policy in Account A correctly allows a specific principal from Account B to call sts:AssumeRole. However, when Account B’s application calls sts:AssumeRole, it receives an AccessDenied error. What is the most likely missing requirement in Account B?

A.Account B’s calling principal must have an identity-based policy that allows sts:AssumeRole on RoleInAccountA’s role ARN.

B.Account A must attach an S3 bucket policy statement to allow sts:AssumeRole from Account B.

C.Account B must add kms:Decrypt permissions to the caller to satisfy AssumeRole.

D.Account B must create an SCP in the organization to allow sts:AssumeRole.

AnswerA

Cross-account role assumption is authorized on both sides: the trust policy allows who can assume, and the caller’s identity policy must allow sts:AssumeRole on the target role ARN.

Why this answer

Option A is correct because for an IAM role in Account A to be assumed by a principal in Account B, two conditions must be met: (1) the trust policy of the role in Account A must allow the Account B principal to call sts:AssumeRole, and (2) the calling principal in Account B must have an identity-based policy that explicitly allows sts:AssumeRole on the ARN of RoleInAccountA. Without this identity-based policy in Account B, the request fails under IAM's implicit (default) deny, even if the trust policy in Account A is correctly configured.

Exam trap

The trap here is that candidates often assume the trust policy alone is sufficient for cross-account role assumption, forgetting that the calling principal must also have an explicit identity-based policy granting sts:AssumeRole on the target role ARN.

How to eliminate wrong answers

Option B is wrong because S3 bucket policies control access to S3 resources, not sts:AssumeRole calls; role assumption is governed by the role's trust policy and the caller's IAM policies. Option C is wrong because kms:Decrypt matters only if resources accessed after assuming the role are encrypted with KMS; it is not a prerequisite for the sts:AssumeRole API call itself. Option D is wrong because Service Control Policies (SCPs) in AWS Organizations never grant permissions; they only set the maximum permissions available to accounts in the organization. The question also does not indicate that Account B belongs to an organization, and even if it did, the missing element is the identity-based policy, not an SCP.
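The two-sided requirement can be sketched as a toy check. This is not how IAM evaluates policies internally; it only models the idea that both the trust policy and the caller's identity policy must allow the call. The account IDs and the `AppRole` name are placeholders, and matching is by exact string only.

```python
ROLE_ARN = "arn:aws:iam::111111111111:role/RoleInAccountA"
CALLER_ARN = "arn:aws:iam::222222222222:role/AppRole"  # hypothetical caller

# Attached to RoleInAccountA in Account A.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": CALLER_ARN},
        "Action": "sts:AssumeRole",
    }],
}

# Attached to the calling principal in Account B.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": ROLE_ARN,
    }],
}


def trust_allows(policy: dict, caller_arn: str) -> bool:
    """True if the trust policy names the caller for sts:AssumeRole."""
    return any(
        s["Effect"] == "Allow"
        and s["Action"] == "sts:AssumeRole"
        and s["Principal"].get("AWS") == caller_arn
        for s in policy["Statement"]
    )


def identity_allows(policy: dict, role_arn: str) -> bool:
    """True if the caller's policy allows sts:AssumeRole on the role ARN."""
    return any(
        s["Effect"] == "Allow"
        and s["Action"] == "sts:AssumeRole"
        and s["Resource"] == role_arn
        for s in policy["Statement"]
    )


def can_assume(trust: dict, identity: dict, caller_arn: str, role_arn: str) -> bool:
    """Both sides must allow; a missing allow on either side means denial."""
    return trust_allows(trust, caller_arn) and identity_allows(identity, role_arn)


empty_identity = {"Version": "2012-10-17", "Statement": []}
print(can_assume(trust_policy, identity_policy, CALLER_ARN, ROLE_ARN))
print(can_assume(trust_policy, empty_identity, CALLER_ARN, ROLE_ARN))
```

The second call models the scenario in the question: the trust policy is correct, but the caller has no matching identity-based statement.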

34

MCQmedium

A microservice reads a secret from AWS Secrets Manager using its task role (ServiceRole). The secret is configured to use a customer-managed CMK. In production, the service fails with AccessDeniedException on GetSecretValue. CloudTrail shows that Secrets Manager attempted kms:Decrypt but was denied. Which IAM policy change is most appropriate to fix the failure while keeping least privilege?

A.Add kms:Decrypt permission for the specific CMK ARN to ServiceRole, and also keep secretsmanager:GetSecretValue for the specific secret ARN.

B.Add secretsmanager:ListSecrets permission on "*" so the service can discover the secret and retry the read.

C.Add s3:GetObject permission to ServiceRole for the KMS key alias stored in an S3 bucket.

D.Add kms:Encrypt permission instead of kms:Decrypt, because the service only needs to read the secret.

AnswerA

The failure is due to kms:Decrypt being denied. Granting decrypt on the specific CMK and limiting Secrets Manager access to the exact secret preserves least-privilege while allowing Secrets Manager’s decryption step.

Why this answer

The AccessDeniedException occurs because the task role (ServiceRole) lacks the kms:Decrypt permission for the customer-managed CMK used to encrypt the secret. Secrets Manager calls kms:Decrypt on your behalf when retrieving the secret value. Adding kms:Decrypt for the specific CMK ARN to ServiceRole, while retaining secretsmanager:GetSecretValue for the specific secret ARN, grants the minimum required permissions to decrypt and read the secret.

Exam trap

The trap here is that candidates assume secretsmanager:GetSecretValue alone is sufficient, overlooking that Secrets Manager must call kms:Decrypt with the caller's permissions when a customer-managed CMK is used.

How to eliminate wrong answers

Option B is wrong because secretsmanager:ListSecrets on "*" does not grant permission to decrypt the secret; it only lists secret metadata and does not resolve the kms:Decrypt denial. Option C is wrong because the KMS key alias is not stored in an S3 bucket in this scenario, and s3:GetObject is irrelevant to decrypting the secret; the error is about KMS decryption, not S3 access. Option D is wrong because kms:Encrypt is used to encrypt data, not to decrypt it; reading a secret requires kms:Decrypt, not kms:Encrypt.
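A minimal sketch of the least-privilege policy described in option A follows. Both ARNs are placeholders invented for illustration; the point is that each statement names exactly one action and one resource.

```python
import json


def build_secret_read_policy(secret_arn: str, key_arn: str) -> dict:
    """Identity policy for ServiceRole: read one secret, decrypt with one key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOneSecret",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            },
            {
                "Sid": "DecryptWithOneKey",
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                "Resource": key_arn,
            },
        ],
    }


# Placeholder ARNs, not taken from the question.
service_policy = build_secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:111111111111:secret:example-AbCdEf",
    "arn:aws:kms:us-east-1:111111111111:key/00000000-0000-0000-0000-000000000000",
)
print(json.dumps(service_policy, indent=2))
```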

35

Multi-Selectmedium

A media company keeps application logs in Amazon S3 for 400 days. The logs are read heavily for the first 30 days, occasionally for the next 90 days, and almost never after that. The team wants to lower storage cost without affecting retention requirements. Which two lifecycle transitions should it configure? Select two.

Select 2 answers

A.Transition the objects to S3 Standard-IA after 30 days.

B.Transition the objects to S3 Glacier Deep Archive after 120 days.

C.Transition the objects to S3 One Zone-IA after 30 days.

D.Keep the objects in S3 Standard for the full 400 days.

E.Use only S3 Intelligent-Tiering and never add archival transitions.

AnswersA, B

Standard-IA is a good fit after the initial hot period because retrievals become less frequent but still matter. It reduces storage cost compared with S3 Standard while keeping the data quickly accessible for the next several months.

Why this answer

Option A is correct because after the first 30 days of heavy read access, transitioning to S3 Standard-IA reduces storage costs while still providing low-latency retrieval for the occasional reads that occur over the next 90 days. S3 Standard-IA is designed for data that is accessed infrequently but needs rapid access when requested, matching the usage pattern described.

Exam trap

The trap here is that candidates often confuse 'occasional access' with 'rare access' and incorrectly choose Glacier Deep Archive too early, or they overlook that the archive access tiers of S3 Intelligent-Tiering are optional and must be enabled explicitly, so relying on Intelligent-Tiering alone with no archival configuration leads to higher costs for long-term retention.
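The two transitions can be written as a single lifecycle rule. The sketch below uses the rule shape accepted by the S3 lifecycle configuration API; the rule ID, the `logs/` prefix, and the 400-day expiration are assumptions added for illustration (the question only states a 400-day retention requirement). The helper shows which storage class an object would be in at a given age.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "app-logs-tiering",          # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},     # assumed key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 120, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 400},       # assumed: delete once retention ends
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class an object of this age would occupy under the rule."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 200):
    print(age, storage_class_at(age, rule))
```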

36

MCQmedium

In AWS Organizations, a Service Control Policy (SCP) denies kms:Decrypt on a production CMK for all principals in the Finance OU. A developer in the Finance OU created/updated an IAM policy that allows secrets access, but the application still fails with AccessDenied due to the SCP. You must enable only the Finance OU to decrypt that specific CMK while keeping the SCP restrictions for other OUs. What is the correct remediation?

A.Update the developer’s IAM policy to allow kms:Decrypt on the CMK alias ARN so the request bypasses the SCP.

B.Modify the SCP so it no longer denies kms:Decrypt for that specific CMK when applied to the Finance OU, while preserving the deny behavior for other OUs.

C.Add a KMS key policy statement that allows the developer role to decrypt the CMK.

D.Attach a permissions boundary that grants kms:Decrypt so the SCP becomes irrelevant.

AnswerB

Because the SCP is what creates the Deny, the correct fix is to adjust the SCP scope/conditions so that kms:Decrypt for the specific CMK is not denied for the Finance OU. Other OUs remain under the same restrictive SCP behavior.

Why this answer

Option B is correct because an explicit deny in an SCP overrides any allow in an IAM policy, so IAM permissions cannot bypass it. By modifying the SCP so that its deny on the specific CMK no longer applies to the Finance OU (for example, by attaching the deny statement only to the other OUs, or by adding a condition on a key such as `aws:PrincipalOrgPaths`), you remove the explicit deny for that OU while keeping it in place for all other OUs. This ensures the developer's IAM policy can then allow `kms:Decrypt` without being blocked by the SCP.

Exam trap

The trap here is that candidates mistakenly think IAM policies or KMS key policies can override an SCP, but an explicit deny in an SCP always takes precedence over any allow within the account.

How to eliminate wrong answers

Option A is wrong because SCPs take precedence over IAM policies; an IAM policy allowing `kms:Decrypt` cannot bypass an SCP that explicitly denies the same action. Option C is wrong because a KMS key policy statement granting decrypt to the developer role is still subject to the SCP's explicit deny, which overrides any allow from the key policy. Option D is wrong because a permissions boundary limits the maximum permissions an IAM role can have, but it does not override an SCP; the SCP's explicit deny still applies and blocks the action.
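The reasoning behind the wrong answers can be condensed into a deliberately simplified model of policy evaluation. It ignores key policies, permissions boundaries, and session policies, and is only meant to show why an IAM allow cannot rescue a request that an SCP explicitly denies.

```python
def is_allowed(scp_explicit_deny: bool,
               scp_allow: bool,
               identity_allow: bool,
               identity_explicit_deny: bool = False) -> bool:
    """Simplified evaluation: any explicit deny wins; otherwise every layer must allow."""
    if scp_explicit_deny or identity_explicit_deny:
        return False
    return scp_allow and identity_allow


# Before the fix: the SCP denies kms:Decrypt, so the IAM allow is irrelevant.
print(is_allowed(scp_explicit_deny=True, scp_allow=True, identity_allow=True))

# After the SCP deny no longer applies to the Finance OU, the IAM allow takes effect.
print(is_allowed(scp_explicit_deny=False, scp_allow=True, identity_allow=True))
```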

37

MCQeasy

A Lambda function needs to read the current value of exactly one AWS Secrets Manager secret at startup. Which least-privilege IAM permission (action and resource scope) should you grant to the Lambda execution role?

A.secretsmanager:ListSecrets on all secrets (resource set to "*")

B.secretsmanager:GetSecretValue on only the secret’s full ARN

C.secretsmanager:UpdateSecret on the specific secret ARN

D.secretsmanager:DescribeSecret on all secrets (resource set to "*")

AnswerB

GetSecretValue is the specific action required to retrieve the secret value. Scoping the permission to the secret’s full ARN ensures the Lambda role can read only that secret and cannot access other secrets.

Why this answer

The Lambda function needs to read the current value of exactly one secret at startup. The least-privilege permission is `secretsmanager:GetSecretValue` scoped to that secret's full ARN. This action retrieves the secret value, and restricting the resource to the specific ARN ensures the function cannot access any other secrets.

Exam trap

The trap here is that candidates may confuse `ListSecrets` or `DescribeSecret` with `GetSecretValue`, thinking metadata retrieval is sufficient, or they may apply a broad resource scope ("*") instead of the specific ARN, violating the least-privilege principle that AWS emphasizes in the SAA-C03 exam.

How to eliminate wrong answers

Option A is wrong because `secretsmanager:ListSecrets` only returns metadata (names, ARNs) and not the secret value, so it cannot satisfy the requirement to read the current value. Option C is wrong because `secretsmanager:UpdateSecret` is a write operation that modifies the secret, which is unnecessary and violates least privilege for a read-only startup task. Option D is wrong because `secretsmanager:DescribeSecret` returns metadata (e.g., rotation configuration, tags) but not the secret value, and scoping it to all secrets grants excessive access.

38

Multi-Selecthard

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A.Route 53 failover routing with health checks

B.S3 Transfer Acceleration

C.A deployed standby application stack in the secondary Region

D.AWS Organizations service control policies

AnswersA, C

Route 53 can monitor endpoint health and return the standby endpoint when the primary is unhealthy.

Why this answer

Route 53 failover routing with health checks is correct because it provides the DNS-level automatic failover mechanism required to redirect traffic from the primary Region to a secondary Region when the primary endpoint becomes unhealthy. Route 53 health checks monitor the primary endpoint's health, and when they detect a failure, the failover routing policy automatically returns the IP address of the secondary endpoint, enabling seamless failover without manual intervention. This is a managed AWS-native control that meets the architecture review board's preference.

Exam trap

The trap here is that candidates may think a pre-deployed standby stack in the secondary Region is optional or that a single service like Route 53 alone can handle failover, but both the DNS routing mechanism (Route 53) and the actual compute/storage resources in the secondary Region are required for a working failover solution.

39

MCQmedium

A company wants S3 access to be available only from private connectivity. They created an interface VPC endpoint for S3 (which provides private connectivity from their VPC to S3) and configured the application to use it from private subnets. The IAM role allows s3:GetObject on arn:aws:s3:::confidential-bucket/reports/*. The S3 bucket policy includes an allow statement that permits GetObject only if aws:SourceVpce equals "vpce-0abc12345def6789". After redeploying the VPC endpoint, the application still uses the same IAM permissions but gets AccessDenied. What change is most likely to fix the issue?

A.Update the bucket policy to allow the new VPC endpoint ID (the vpce-* value) created by the redeployment.

B.Add internet egress via a NAT Gateway so the requests can reach S3 over the public endpoint.

C.Remove the aws:SourceVpce condition from the bucket policy to ensure the IAM permissions are sufficient.

D.Update the IAM role to add s3:PutObject permissions so the requests can be authorized.

AnswerA

The bucket policy is pinned to a specific endpoint ID using aws:SourceVpce. Redeploying or recreating the endpoint creates a new endpoint ID, so requests now present a different aws:SourceVpce value. Updating the bucket policy to match the new endpoint ID makes the condition true again while keeping access restricted to that specific private endpoint.

Why this answer

Option A is correct because redeploying a VPC Endpoint creates a new endpoint ID (vpce-*). The bucket policy explicitly allows access only if aws:SourceVpce matches the original endpoint ID. Since the new endpoint has a different ID, the condition fails, causing AccessDenied.

Updating the bucket policy to reference the new vpce ID restores access.

Exam trap

The trap here is that candidates assume IAM permissions alone are sufficient, overlooking that bucket policy conditions tied to a specific VPC endpoint ID become invalid after the endpoint is redeployed, causing an AccessDenied even with correct IAM roles.

How to eliminate wrong answers

Option B is wrong because adding a NAT Gateway would send requests to the public S3 endpoint, defeating the purpose of private connectivity; those requests would still fail the bucket policy's aws:SourceVpce condition. Option C is wrong because removing the condition would allow access from any network path, compromising the security requirement for private-only access. Option D is wrong because the role already allows s3:GetObject; the denial comes from the bucket policy condition, and s3:PutObject is irrelevant to GetObject requests.

40

Multi-Selecthard

An internal reporting portal has old unattached EBS volumes and many stale snapshots. Which two actions reduce storage cost without affecting running instances?

Select 2 answers

A.Disable CloudTrail logging

B.Stop all EC2 instances in the account

C.Delete unattached EBS volumes after verifying they are no longer needed

D.Apply snapshot lifecycle policies to expire obsolete snapshots

AnswersC, D

Unattached volumes continue to incur charges until deleted.

Why this answer

Option C is correct because unattached EBS volumes incur storage costs without providing any benefit to running instances. Deleting them after verifying they are no longer needed directly reduces these costs without affecting any active EC2 instances.

Exam trap

The trap here is that candidates may confuse stopping EC2 instances with reducing EBS costs, but stopping instances does not delete the underlying volumes or snapshots, so storage charges continue.
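Finding candidates for deletion usually starts with listing volumes whose state is `available`, meaning they are not attached to any instance. The sketch below filters records shaped like the entries returned by the EC2 `DescribeVolumes` API; the sample data is invented, and any real cleanup should still verify that each volume is no longer needed.

```python
def unattached_volume_ids(volumes: list[dict]) -> list[str]:
    """Return IDs of volumes that are in the 'available' state with no attachments."""
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and not v.get("Attachments")
    ]


# Invented sample records for illustration.
sample_volumes = [
    {"VolumeId": "vol-0001", "State": "in-use",
     "Attachments": [{"InstanceId": "i-0abc"}], "Size": 100},
    {"VolumeId": "vol-0002", "State": "available", "Attachments": [], "Size": 500},
    {"VolumeId": "vol-0003", "State": "available", "Attachments": [], "Size": 50},
]

print(unattached_volume_ids(sample_volumes))
```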

41

Matchinghard

A company runs a stateless application tier behind an Application Load Balancer. Match each observed scaling pattern on the left to the best Auto Scaling strategy or metric on the right.

Concepts

Matches

Scale the Auto Scaling group on ALB RequestCountPerTarget.

Scale on SQS queue depth using a custom CloudWatch metric.

Use scheduled scaling to add capacity before the recurring surge.

Use target tracking on EC2 CPUUtilization.

Why these pairings

Steady increase uses step scaling for gradual adjustments; sudden spikes use simple scaling; cyclical patterns use scheduled scaling; consistent low traffic may need no scaling; unpredictable bursts use target tracking; gradual decrease uses simple scaling.

42

MCQhard

Based on the exhibit, a public API is behind CloudFront. A single client IP is sending bursts of requests that are overwhelming the origin, and the team wants AWS to automatically mitigate the abuse at the edge without changing the application code. What should the team do?

A.Associate an AWS WAF web ACL with CloudFront and add a rate-based rule for the offending IP behavior.

B.Increase the ALB idle timeout to allow the origin to absorb more concurrent requests.

C.Add an Amazon Route 53 health check to fail over traffic to another DNS name.

D.Enable AWS Shield Advanced and rely on automatic DDoS protection for all request bursts.

AnswerA

AWS WAF is the right control at the CloudFront edge because it can inspect requests before they reach the origin and enforce a rate-based rule on abusive traffic patterns. A rate-based rule can automatically count requests by source IP and block or challenge requests that exceed the configured threshold, which directly addresses the burst traffic shown in the logs. This meets the requirement to mitigate at the edge without any application changes.

Why this answer

AWS WAF rate-based rules automatically block or rate-limit requests from a client IP when the request rate exceeds a threshold you define. By associating the web ACL with CloudFront, the rule is enforced at the edge before traffic reaches the origin, mitigating abuse without modifying application code.

Exam trap

The trap here is that candidates confuse AWS Shield Advanced's automatic DDoS mitigation (which handles network/transport layer floods) with the need for a WAF rate-based rule to stop application-layer request bursts from a single IP.

How to eliminate wrong answers

Option B is wrong because increasing the ALB idle timeout does not reduce the volume of requests hitting the origin; it only keeps idle connections open longer, which can actually worsen resource exhaustion. Option C is wrong because Route 53 health checks and failover reroute traffic to another endpoint but do not mitigate bursts from a single IP; the abusive client would simply follow the failover. Option D is wrong because AWS Shield Advanced provides enhanced DDoS protection against volumetric attacks, but it does not automatically apply per-IP rate limiting for application-layer request bursts; a rate-based rule in AWS WAF is required for that granular control.
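For reference, a rate-based rule in an AWS WAF (WAFv2) web ACL has roughly the shape below. The rule name, metric name, and the limit of 1000 requests are illustrative assumptions; the limit is counted per source IP over the rule's evaluation window.

```python
rate_limit_rule = {
    "Name": "per-ip-rate-limit",        # hypothetical rule name
    "Priority": 0,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 1000,              # assumed threshold per evaluation window
            "AggregateKeyType": "IP",   # count requests per source IP
        }
    },
    "Action": {"Block": {}},            # block clients that exceed the limit
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "perIpRateLimit",
    },
}

statement = rate_limit_rule["Statement"]["RateBasedStatement"]
print(statement["AggregateKeyType"], statement["Limit"])
```

A web ACL that carries this rule is then associated with the CloudFront distribution, so the blocking happens at the edge and the origin never sees the excess requests.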

43

MCQeasy

A production Amazon RDS database has automated backups enabled. An application mistakenly updates a table and the issue is discovered one hour later. The team needs to restore the database to the exact state it had 45 minutes ago. Which approach best meets the requirement?

A.Perform a point-in-time restore to a timestamp within the automated backup window.

B.Restore only from the latest daily snapshot, then manually undo the last hour’s changes.

C.Increase Multi-AZ to generate a new standby and redirect traffic back to the previous primary state.

D.Stop the database and change the application to ignore the table going forward.

AnswerA

Point-in-time restore lets RDS recover to a specific time, which matches the “45 minutes ago” requirement.

Why this answer

Amazon RDS point-in-time recovery (PITR) lets you restore a DB instance to any specific second within the automated backup retention period, using the automated backups together with the transaction logs needed to reconstruct the database state at the desired time. Although the issue is discovered one hour after the mistaken update, PITR can restore the database to its state 45 minutes ago by replaying transaction logs up to that precise timestamp. The restore creates a new DB instance, so the existing instance is left untouched.

Exam trap

The trap here is that candidates often confuse automated backups with manual snapshots or assume Multi-AZ provides recovery capabilities, but Multi-AZ only ensures failover, not point-in-time restoration, and PITR requires transaction logs, not just snapshots.

How to eliminate wrong answers

Option B is wrong because restoring from the latest daily snapshot would only provide a backup from the snapshot time (likely hours or days old), not the state 45 minutes ago, and manually undoing changes is error-prone and not supported by RDS as a native feature. Option C is wrong because Multi-AZ provides high availability through a standby replica, but it does not offer point-in-time recovery; the standby is an identical copy of the primary and cannot be used to revert to a previous state. Option D is wrong because stopping the database and ignoring the table does not restore the lost or incorrect data; it only avoids the immediate symptom without recovering the required database state.

44

MCQhard

A Lambda-based retail API has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The design must avoid adding custom operational scripts.

A.A larger deployment package

B.Reserved concurrency only

C.Provisioned concurrency during campaign windows

D.CloudTrail data events

AnswerC

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, eliminating cold starts during campaign windows. This ensures consistent response times for the Lambda-based retail API under unpredictable traffic spikes without requiring custom scripts.

Exam trap

The trap here is confusing reserved concurrency (which guarantees and caps a function's concurrency but does not address cold starts) with provisioned concurrency (which avoids cold starts by keeping environments initialized).

How to eliminate wrong answers

Option A is wrong because a larger deployment package increases cold start duration, making latency worse. Option B is wrong because reserved concurrency only reserves and caps the number of concurrent executions for the function; it does not pre-initialize environments, so cold starts remain. Option D is wrong because CloudTrail data events record API activity for auditing, not performance optimization.

45

MCQmedium

A web application runs on an EC2 Auto Scaling group (ASG) behind an Application Load Balancer (ALB). The ASG spans three Availability Zones. After a deployment, new instances frequently fail the ALB target group health checks with HTTP 5xx responses and are quickly terminated by the ASG. What change most improves resiliency during deployments with minimal downtime by preventing premature removal of instances that are still starting?

A.Reduce the ASG health check grace period to 0 seconds so issues are detected faster.

B.Use a longer ASG health check grace period and deploy new instances using controlled replacement (for example, rolling instance refresh) so existing healthy instances continue serving while new ones warm up.

C.Restrict the ASG to a single Availability Zone so health check evaluation is simpler.

D.Disable ALB health checks so the ASG does not terminate instances on HTTP 5xx responses.

AnswerB

A longer ASG health check grace period prevents instances from being evaluated too early during normal startup time. Controlled replacement or rolling instance refresh ensures capacity is maintained while new instances warm up, so the ALB continues routing requests only to healthy targets.

Why this answer

Option B is correct because increasing the ASG health check grace period gives new instances more time to complete their startup and pass the ALB health checks before the ASG marks them unhealthy. A rolling instance refresh replaces instances in a controlled manner, ensuring that existing healthy instances continue serving traffic while new instances warm up, minimizing downtime and preventing premature termination.

Exam trap

The trap here is that candidates think reducing the grace period or disabling health checks will speed up recovery, when in fact it causes premature termination or serves traffic to unhealthy instances, increasing downtime.

How to eliminate wrong answers

Option A is wrong because reducing the grace period to 0 seconds would cause the ASG to terminate instances even faster when they return HTTP 5xx during startup, worsening the problem. Option C is wrong because restricting to a single Availability Zone reduces fault tolerance and does not address the root cause of premature termination during startup. Option D is wrong because disabling ALB health checks would prevent the ASG from detecting actual instance failures, leading to serving traffic to unhealthy instances and increasing downtime.
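The two settings in option B map to two Auto Scaling API calls: one that sets the health check grace period on the group, and one that starts a rolling instance refresh. The sketch below only assembles the parameters for each call; the group name, the 300-second startup time, and the 90 percent minimum healthy value are assumptions chosen for illustration.

```python
def rollout_settings(asg_name: str, startup_seconds: int,
                     min_healthy_pct: int = 90) -> dict:
    """Build parameters for a grace period update and a rolling instance refresh."""
    return {
        # Parameters for the UpdateAutoScalingGroup API.
        "update_auto_scaling_group": {
            "AutoScalingGroupName": asg_name,
            "HealthCheckGracePeriod": startup_seconds,
        },
        # Parameters for the StartInstanceRefresh API.
        "start_instance_refresh": {
            "AutoScalingGroupName": asg_name,
            "Strategy": "Rolling",
            "Preferences": {
                "MinHealthyPercentage": min_healthy_pct,
                "InstanceWarmup": startup_seconds,
            },
        },
    }


settings = rollout_settings("web-asg", 300)   # hypothetical group name
print(settings["start_instance_refresh"]["Preferences"])
```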

46

MCQmedium

A team serves static assets from an S3 origin through CloudFront. Cache hit ratio is low. Analytics show that requests include an Authorization header (even though the assets are public) and the cache key currently varies on that header, causing CloudFront to treat the same asset as different cache entries. What is the best change to improve cache hit ratio without breaking access controls?

A.Keep Authorization in the CloudFront cache key, but increase the origin response minimum TTL to 1 day.

B.Modify the CloudFront cache policy so the cache key does not include the Authorization header.

C.Switch the S3 origin from the current bucket to a website endpoint to enable automatic caching headers.

D.Enable CloudFront to forward all headers to S3 so origin can decide caching behavior per request.

AnswerB

CloudFront cache hit ratio depends on what constitutes a unique cache key. If Authorization is included, identical public assets requested with different Authorization values will map to different cache objects and reduce reuse. Removing Authorization from the cache key makes those requests share the same edge cache entry, improving hit ratio and reducing origin traffic. Because the scenario states the assets are public, removing Authorization from the cache key does not break access controls (access is not controlled by Authorization at the origin).

Why this answer

The low cache hit ratio is caused by the Authorization header being included in the CloudFront cache key, which creates separate cache entries for the same object even though the assets are public. By modifying the cache policy to exclude the Authorization header, CloudFront will treat all requests for the same asset as identical, dramatically improving the cache hit ratio without affecting access controls because the assets are already public.

Exam trap

The trap here is that candidates may think increasing TTL or changing the origin type will fix caching, when the real issue is the cache key composition—specifically, the Authorization header fragmenting the cache.

How to eliminate wrong answers

Option A is wrong because increasing the minimum TTL does not address the root cause—the cache key still varies on the Authorization header, so separate cache entries will persist and the cache hit ratio will remain low. Option C is wrong because switching to an S3 website endpoint does not change how CloudFront caches based on headers; the cache key is still controlled by the CloudFront cache policy, not the origin type. Option D is wrong because forwarding all headers to S3 would include the Authorization header in the cache key, making the problem worse by further fragmenting the cache.
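The effect of the cache key on hit ratio can be shown with a small simulation. This is a toy model of a cache, not CloudFront itself: ten requests for the same public asset arrive, each with a different Authorization value, and the hit ratio is computed with and without that header in the key.

```python
def hit_ratio(requests: list[tuple[str, dict]], key_headers: list[str]) -> float:
    """Simulate a cache whose key is the path plus the listed header values."""
    cache = set()
    hits = 0
    for path, headers in requests:
        key = (path,) + tuple(headers.get(h, "") for h in key_headers)
        if key in cache:
            hits += 1
        else:
            cache.add(key)   # first request for this key is a miss
    return hits / len(requests)


# Same asset, ten different (made-up) tokens.
requests = [("/logo.png", {"Authorization": f"Bearer token-{i}"}) for i in range(10)]

with_auth = hit_ratio(requests, ["Authorization"])
without_auth = hit_ratio(requests, [])
print(with_auth, without_auth)
```

With the header in the key, every request is a miss; without it, only the first request reaches the origin.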

47

MCQ · medium

Based on the exhibit, the application sees several minutes of connection errors during an Aurora failover. What is the best change to reduce failover impact?

A. Change the application to use the Aurora cluster writer endpoint and retry transient connections.

B. Add an Aurora read replica and keep using the same JDBC URL.

C. Increase the EC2 instance size of the application servers.

D. Switch to a single-AZ RDS PostgreSQL instance for simpler connectivity.

Answer: A

The current configuration targets a specific instance endpoint, which becomes stale after failover. The Aurora cluster writer endpoint always resolves to the current writer, so the application can reconnect without manual endpoint changes. Adding retries with backoff helps the application survive the short DNS and connection transition during failover.

Why this answer

The Aurora cluster writer endpoint always points to the current primary instance, even after a failover. By using this endpoint and implementing retry logic for transient connection errors, the application can automatically reconnect to the new writer without manual intervention, reducing the impact of the failover from several minutes to seconds.
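The retry behaviour can be sketched in a few lines of Python. This is a minimal illustration, assuming exponential backoff with a little jitter; connect_with_retry and fake_connect are hypothetical names, and a real application would wrap its own database driver call against the cluster writer endpoint.

```python
import random
import time

def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially with jitter."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            # 0.5 s, 1 s, 2 s, ... plus up to 100 ms of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Hypothetical stand-in for a driver call: it fails twice (as it might
# during a failover) and then succeeds.
calls = {"n": 0}

def fake_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("writer not reachable yet")
    return "connected"

delays = []  # record the waits instead of actually sleeping
result = connect_with_retry(fake_connect, sleep=delays.append)
print(result, calls["n"])
```

Passing the sleep function as a parameter keeps the sketch testable without real delays.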

Exam trap

The trap here is that candidates often think adding read replicas or scaling application servers will fix failover connectivity, but the real issue is that the application must use the correct endpoint and handle transient disconnections gracefully.

How to eliminate wrong answers

Option B is wrong because adding a read replica does not help with write connection errors during failover; the application would still need to reconnect to the new writer, and the same JDBC URL (which likely points to a specific instance) would fail after failover. Option C is wrong because increasing the EC2 instance size of the application servers does not address the root cause of connection errors during failover; it only improves compute capacity, not database connectivity resilience. Option D is wrong because switching to a single-AZ RDS PostgreSQL instance would actually increase downtime during a failure (no automatic failover) and does not solve the transient connection issue; it also loses Aurora's high-availability features.

48

MCQ · medium

A SOC analyst needs an immutable, centralized audit record of configuration and API changes across multiple AWS accounts. Recently, an operator changed an IAM role trust policy, and investigators must determine exactly which principal made the change and which parameters were used. Your current setup sends application logs to CloudWatch Logs, but there is no organization-level API audit logging. Which approach best satisfies the requirement?

A. Enable an AWS Organizations CloudTrail organization trail that delivers management event logs (including IAM) to a centralized S3 bucket in a dedicated audit account, for all regions.

B. Use CloudWatch Logs metric filters on application logs to infer which principals changed trust policies.

C. Rely on GuardDuty alerts to provide the full request parameters for every IAM policy change.

D. Enable AWS Config only and store periodic snapshots without CloudTrail management events.

Answer: A

CloudTrail management events provide authoritative audit logs for API actions like IAM policy changes and can be centralized via an organization trail.

Why this answer

Option A is correct because an AWS Organizations CloudTrail organization trail captures management events (including IAM API calls like 'UpdateAssumeRolePolicy') across all accounts in the organization and delivers them to a centralized S3 bucket in a dedicated audit account. Each record includes the principal ARN, source IP, user agent, and request parameters of the API call. With log file integrity validation enabled and the bucket locked down (for example, with S3 Object Lock), the logs serve as the centralized, tamper-evident audit record the requirement calls for.
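To show what investigators would read from such a log, here is a small Python sketch that pulls the principal and request parameters out of one record. The field names follow the CloudTrail record format; the record itself is hypothetical and heavily trimmed, and summarize is an illustrative helper, not part of any AWS SDK.

```python
import json

# Hypothetical, heavily trimmed CloudTrail record; all values are made up.
record_json = """
{
  "eventSource": "iam.amazonaws.com",
  "eventName": "UpdateAssumeRolePolicy",
  "sourceIPAddress": "203.0.113.10",
  "userAgent": "aws-cli/2.x",
  "userIdentity": {"type": "AssumedRole",
                   "arn": "arn:aws:sts::111122223333:assumed-role/ops/alice"},
  "requestParameters": {"roleName": "app-role", "policyDocument": "{...}"}
}
"""

def summarize(record):
    """Pull out who made the call and with which parameters."""
    return {
        "action": record["eventName"],
        "principal": record["userIdentity"]["arn"],
        "source_ip": record["sourceIPAddress"],
        "parameters": record["requestParameters"],
    }

summary = summarize(json.loads(record_json))
print(summary["principal"], summary["action"])
```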

Exam trap

The trap here is that candidates confuse AWS Config's resource tracking with CloudTrail's API-level auditing, failing to realize that only CloudTrail captures the 'who' and 'how' (principal and parameters) of a change, while Config only records the 'what' (state after change).

How to eliminate wrong answers

Option B is wrong because CloudWatch Logs metric filters on application logs can only infer patterns from log data, not capture the exact API request parameters or principal identity for IAM policy changes, and application logs are not immutable or centralized across accounts. Option C is wrong because GuardDuty alerts are designed for threat detection (e.g., anomalous API behavior) and do not provide full request parameters for every IAM policy change; they only generate findings based on suspicious activity, not a complete audit trail. Option D is wrong because AWS Config alone records resource configuration changes (e.g., IAM policy state) but does not capture who made the change or the API request parameters; it requires CloudTrail to record the API caller identity and parameters.

49

MCQ · medium

A read-heavy media archive repeatedly queries the same product catalogue data from DynamoDB with millisecond latency requirements. Which service can reduce read latency and table load? The architecture review board prefers a managed AWS-native control.

A. DynamoDB Accelerator (DAX)

B. Amazon Kinesis Data Firehose

C. AWS Glue Data Catalog

D. S3 Transfer Acceleration

Answer: A

DAX is an in-memory cache for DynamoDB that reduces read latency for suitable access patterns.

Why this answer

DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB that delivers microsecond read latency, directly addressing the millisecond requirement. By caching frequently accessed product catalogue data, DAX offloads read requests from the DynamoDB table, reducing table load and read capacity unit consumption. As a fully managed, AWS-native service, it aligns with the architecture review board's preference for managed controls.

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration (which optimizes uploads to S3) with a caching solution for DynamoDB, or mistakenly think Glue Data Catalog or Kinesis Firehose can cache database queries, when only DAX provides in-memory acceleration for DynamoDB reads.

How to eliminate wrong answers

Option B is wrong because Amazon Kinesis Data Firehose is a streaming data ingestion service for loading data into data lakes or analytics tools, not a caching or read-latency reduction solution for DynamoDB. Option C is wrong because AWS Glue Data Catalog is a metadata repository for ETL and data discovery, not a cache that accelerates DynamoDB read queries. Option D is wrong because S3 Transfer Acceleration uses AWS edge locations to speed up uploads to S3 over long distances, but it does not cache DynamoDB data or reduce read latency for repeated queries.

50

MCQ · easy

A company runs a stateless web API on Amazon EC2 behind an Application Load Balancer. The team notices that during business hours, the ALB starts queueing requests and the average request latency rises. They want to scale out quickly and reliably based on demand, not CPU alone. Which Auto Scaling approach best matches this requirement?

A. Use a fixed-size Auto Scaling group and increase capacity manually once per hour.

B. Use target tracking scaling based on ALB request count per target.

C. Scale based only on EC2 instance memory utilization, regardless of load.

D. Use step scaling with a single threshold on average network-in bytes.

Answer: B

Target tracking automatically adjusts capacity to keep the ALB request count per target near a chosen value, so it responds to request load directly instead of waiting for CPU to rise.

Why this answer

Option B is correct because target tracking scaling based on ALB request count per target directly measures the load on each instance, allowing the Auto Scaling group to add or remove instances to maintain a target value. This approach scales out quickly and reliably based on actual demand (request queuing and latency), not just CPU, which aligns with the requirement to respond to rising latency and queueing during business hours.
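The arithmetic behind holding a per-target request count can be sketched as follows. This is a simplified proportional model for illustration only, not the service's exact algorithm; desired_capacity, the group limits, and the load figures are hypothetical.

```python
import math

def desired_capacity(total_requests, target_per_instance, min_size=1, max_size=20):
    """Instances needed so that requests per target stay at or below the target."""
    needed = math.ceil(total_requests / target_per_instance)
    # Clamp to the group's configured minimum and maximum size.
    return max(min_size, min(max_size, needed))

# Hypothetical load: 4,500 requests per minute, target of 500 per instance.
busy = desired_capacity(4500, 500)
# Quiet period: the group shrinks toward its minimum size.
quiet = desired_capacity(300, 500)
print(busy, quiet)
```

Because the metric is requests per target, adding instances lowers the metric directly, which is what makes it a natural fit for target tracking.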

Exam trap

The trap here is that candidates often default to CPU-based scaling (a common but incomplete metric) or memory-based scaling, overlooking that for a stateless web API behind an ALB, request count per target is the most direct indicator of demand and latency issues.

How to eliminate wrong answers

Option A is wrong because manual scaling once per hour cannot react quickly to sudden demand spikes during business hours, leading to continued queueing and latency. Option C is wrong because scaling based solely on memory utilization ignores the actual request load and latency, and a stateless web API may not show memory pressure even when request queueing is high. Option D is wrong because step scaling with a single threshold on average network-in bytes is not directly correlated with request queueing or latency, and network-in can be influenced by factors other than application demand (e.g., large payloads), making it unreliable for scaling based on request count.

51

MCQ · hard

A patient portal must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The architecture review board prefers a managed AWS-native control.

A. Instance store volumes

B. Amazon EFS with mount targets in multiple Availability Zones

C. An EBS volume attached to all instances

D. S3 mounted as a POSIX file system without a file gateway

Answer: B

EFS is regional file storage and supports mount targets across AZs.

Why this answer

Amazon EFS provides a fully managed, shared POSIX-compliant file system that can be mounted concurrently across multiple Linux EC2 instances. By creating mount targets in multiple Availability Zones, the file system remains accessible even if one AZ fails, meeting the high-availability requirement. This aligns with the architecture review board's preference for a managed AWS-native control.

Exam trap

The trap here is that candidates may confuse EBS multi-attach (which is limited to a single AZ and specific volume types) with the cross-AZ shared file system capability of EFS, or incorrectly assume that mounting S3 as a POSIX file system is a reliable, managed solution for shared storage.

How to eliminate wrong answers

Option A is wrong because instance store volumes are ephemeral, tied to a single EC2 instance, and lose their data when the instance stops or terminates, so they cannot provide shared, durable storage across AZs. Option C is wrong because an EBS volume lives in a single Availability Zone and normally attaches to one EC2 instance at a time; Multi-Attach io1/io2 volumes can attach to up to 16 Nitro-based instances, but only within the same AZ, and they do not provide a shared file system. Option D is wrong because mounting an S3 bucket as a POSIX file system (e.g., via s3fs-fuse) does not provide native POSIX locking or consistency semantics, and it introduces performance and reliability issues; it is not a managed AWS-native file system service.

52

MCQ · easy

You store application logs in an S3 bucket. After 30 days, the logs are rarely accessed, but you must retain them for 1 year for compliance. Which S3 feature is the best way to reduce storage cost while meeting the retention requirement?

A. Create an S3 lifecycle rule to transition older objects to a colder storage class after 30 days, then expire after 1 year

B. Keep all logs in S3 Standard and rely on lower request rates to reduce cost

C. Copy logs to EBS snapshots each week and delete the original files

D. Use S3 replication to a second bucket in another region to reduce costs

Answer: A

S3 lifecycle policies can automatically transition objects to lower-cost storage classes based on age. Transitioning after 30 days reduces ongoing storage costs because the logs are rarely accessed, while expiring after 1 year ensures you still meet the compliance retention window.

Why this answer

Option A is correct because an S3 Lifecycle rule can automatically transition objects from S3 Standard to a colder storage class (e.g., S3 Glacier Instant Retrieval or S3 Glacier Deep Archive) after 30 days, reducing storage costs for rarely accessed logs. After 1 year, the rule can expire the objects, which permanently deletes them, meeting the compliance retention requirement without manual intervention.
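A lifecycle rule of this kind might look like the following Python sketch. The dictionary follows the shape of an S3 lifecycle configuration document; the rule ID is hypothetical, and GLACIER_IR (S3 Glacier Instant Retrieval) is only one possible colder class.

```python
import json

# Shape follows the S3 lifecycle configuration document; the rule ID is
# hypothetical and GLACIER_IR is one possible storage class choice.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},          # apply to every object
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER_IR"}],
            "Expiration": {"Days": 365},       # delete after one year
        }
    ]
}

rule = lifecycle["Rules"][0]
print(json.dumps(lifecycle, indent=2))
```

With boto3, a document like this would typically be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration.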

Exam trap

The trap here is that candidates may assume S3 Standard is cheap enough once request rates drop, overlooking the significant per-GB savings from lifecycle transitions to colder storage classes such as S3 Glacier Deep Archive, which are designed for long-term archival with rare access.

How to eliminate wrong answers

Option B is wrong because keeping all logs in S3 Standard incurs higher storage costs for data that is rarely accessed after 30 days, and lower request rates do not offset the per-GB storage cost of S3 Standard. Option C is wrong because copying logs to EBS snapshots is not a cost-effective or scalable solution for log retention; EBS snapshots are designed for block-level backups of volumes, not for storing individual log files, and deleting the original S3 objects would lose the logs. Option D is wrong because S3 replication to a second bucket in another region increases costs due to replication fees, data transfer charges, and storage costs in the destination bucket, and does not inherently reduce storage costs.

53

MCQ · easy

A security team requires that every object uploaded to s3://secure-bucket/uploads/ must be encrypted using SSE-KMS with a specific customer-managed KMS key. Which S3 bucket policy condition approach best enforces this requirement for PutObject requests?

A. Deny PutObject unless s3:x-amz-server-side-encryption equals "aws:kms" and s3:x-amz-server-side-encryption-aws-kms-key-id equals the required CMK ARN

B. Allow PutObject only when aws:SecureTransport is true; encryption is then guaranteed automatically

C. Deny PutObject if the request includes Content-Type other than "application/octet-stream"

D. Deny PutObject when the caller’s role is not allowed to kms:Decrypt in their IAM policy

Answer: A

This enforces the encryption choice at upload time by validating the request headers that specify SSE-KMS and the exact KMS key ID/ARN. Using a Deny condition ensures uploads that do not include the correct SSE-KMS headers (for example, unencrypted uploads or uploads using a different KMS key) are rejected immediately.

Why this answer

Option A is correct because it uses a Deny effect with the s3:x-amz-server-side-encryption condition key set to 'aws:kms' and the s3:x-amz-server-side-encryption-aws-kms-key-id condition key set to the specific customer-managed KMS key ARN. This ensures that any PutObject request that does not include both the required encryption header and the exact KMS key identifier is denied, enforcing the encryption requirement at the bucket policy level.
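One way to express this policy is sketched below in Python. The bucket and prefix come from the question; the key ARN is a hypothetical placeholder, and deny_unless is an illustrative helper. The sketch uses two Deny statements on the assumption that a single statement listing both condition keys would deny only when both checks fail.

```python
import json

BUCKET_PREFIX_ARN = "arn:aws:s3:::secure-bucket/uploads/*"
# Hypothetical key ARN; substitute the real customer-managed key.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

def deny_unless(sid, condition_key, expected):
    """Deny PutObject when condition_key is missing or differs from expected."""
    return {
        "Sid": sid,
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": BUCKET_PREFIX_ARN,
        "Condition": {"StringNotEquals": {condition_key: expected}},
    }

# Two statements, because keys inside one Condition block are ANDed: a single
# Deny listing both keys would only fire when both values are wrong.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        deny_unless("RequireSseKms", "s3:x-amz-server-side-encryption", "aws:kms"),
        deny_unless("RequireSpecificKey",
                    "s3:x-amz-server-side-encryption-aws-kms-key-id", KEY_ARN),
    ],
}
print(json.dumps(policy, indent=2))
```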

Exam trap

The trap here is that candidates often confuse encryption in transit (aws:SecureTransport) with encryption at rest (SSE-KMS), or they mistakenly think that checking the caller's KMS permissions in the bucket policy is sufficient, when in fact the policy must inspect the request headers to enforce the encryption requirement.

How to eliminate wrong answers

Option B is wrong because requiring aws:SecureTransport (HTTPS) only ensures the data is encrypted in transit, not at rest; it does not enforce SSE-KMS or a specific KMS key. Option C is wrong because restricting Content-Type to 'application/octet-stream' has no relation to server-side encryption and would block legitimate uploads with other content types. Option D is wrong because a bucket policy condition cannot inspect the caller's IAM permissions, and kms:Decrypt is not what an SSE-KMS upload depends on (a standard PutObject needs kms:GenerateDataKey on the key); the policy must check the encryption headers on the request instead.

54

MCQ · medium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The team wants the control to be enforceable during normal operations.

A. A larger S3 bucket

B. Amazon CloudFront distribution with the S3 bucket as origin

C. RDS read replicas

D. An EC2 Auto Scaling group in one Region

Answer: B

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content (images, JavaScript) at edge locations worldwide, reducing latency for users in distant countries. By using the S3 bucket as an origin, CloudFront serves cached copies from the nearest edge, drastically improving load times. This solution is enforceable during normal operations because CloudFront provides cache control headers and invalidation APIs to manage content freshness.

Exam trap

The trap here is that candidates may confuse scaling compute (EC2 Auto Scaling) or database (RDS read replicas) with content delivery, failing to recognize that static content performance is solved by a CDN like CloudFront, not by scaling backend resources.

How to eliminate wrong answers

Option A is wrong because S3 buckets have no size setting to increase, and storage capacity has no effect on latency; a bucket lives in one Region and does not cache content globally. Option C is wrong because RDS read replicas are for database read scaling, not for serving static files from S3. Option D is wrong because an EC2 Auto Scaling group in one Region does not address global latency; it only scales compute capacity in a single geographic area, so distant users still see the same round-trip times.

55

Multi-Select · hard

A private application in two private subnets must download objects from S3 and read parameters from Systems Manager Parameter Store without routing traffic through the public internet. Which two components should the architect use? The security team requires the decision to be auditable.

Select 2 answers

A. Interface VPC endpoint for Systems Manager

B. Internet gateway attached to the VPC

C. NAT gateway in each Availability Zone

D. Gateway VPC endpoint for Amazon S3

Answers: A, D

Systems Manager/Parameter Store access uses interface endpoints powered by AWS PrivateLink.

Why this answer

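Interface VPC endpoints (AWS PrivateLink) allow private subnets to access Systems Manager Parameter Store without traversing the internet, using private IP addresses within the VPC. This meets the requirement for private, auditable access because all traffic stays within the AWS network and can be logged via VPC Flow Logs. A gateway VPC endpoint for Amazon S3 adds a route table entry that sends S3 traffic over the AWS network, so the object downloads also avoid the public internet.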

Exam trap

The trap here is that candidates often confuse NAT gateways as a private-only solution, not realizing they still route through the internet gateway and public internet, which fails the 'no public internet' requirement.

56

MCQ · easy

Based on the exhibit, the team wants to improve application performance without changing the code. Which EC2 instance family should they choose next?

A. Choose a compute-optimized instance family such as C6i to increase CPU performance.

B. Choose a memory-optimized instance family such as R6i to provide more RAM.

C. Choose a storage-optimized instance family such as I4i to improve block storage throughput.

D. Choose a burstable instance family such as T3 to reduce cost and improve performance.

Answer: B

Memory-optimized instances are the best fit when memory pressure is causing slowdowns. The exhibit shows CPU is low while memory is consistently near saturation, which strongly suggests the application needs more RAM rather than more compute. Moving to an R6i family should reduce paging and improve response times without changing the application design.

Why this answer

The exhibit shows that the application is experiencing high memory utilization (e.g., memory pressure or swapping), which degrades performance. Choosing a memory-optimized instance family such as R6i provides more RAM per vCPU, directly addressing the bottleneck without requiring code changes. This improves application performance by reducing or eliminating swap usage and allowing more data to be cached in memory.

Exam trap

The trap here is that candidates often assume 'improving performance' always means faster CPU or storage, but the exhibit’s memory utilization metric points directly to a memory bottleneck, making the memory-optimized family the correct choice; changing the instance family also requires no code changes.

How to eliminate wrong answers

Option A is wrong because compute-optimized instances (C6i) increase CPU performance, but the exhibit indicates the bottleneck is memory, not CPU; thus, more CPU would not resolve high memory utilization. Option C is wrong because storage-optimized instances (I4i) improve block storage throughput and IOPS, which is irrelevant if the performance issue stems from insufficient RAM rather than disk I/O. Option D is wrong because burstable instances (T3) are designed for workloads with low average CPU usage and can actually degrade performance under sustained high load due to CPU credit exhaustion; they do not address memory constraints and may worsen the problem.

57

MCQ · medium

An application in Account B (IAM role arn:aws:iam::account-b:role/app-read) reads objects from an S3 bucket in Account A. The bucket uses SSE-KMS with a customer-managed KMS key in Account A. Object reads consistently fail with an error that includes "AccessDenied" and "kms:Decrypt". The IAM permissions in Account B for kms:Decrypt are correct, but the requests still fail. Which change will most directly fix the failure?

A. Add kms:Decrypt to the KMS key policy in Account A for the Account B role arn:aws:iam::account-b:role/app-read, and remove kms:Decrypt from the role policy in Account B.

B. Update the IAM role in Account B to use the s3:GetObject permission only, and rely on S3 to authorize KMS decrypt automatically.

C. Modify the KMS key policy in Account A to allow kms:Decrypt for the Account B role arn:aws:iam::account-b:role/app-read, using the appropriate cross-account conditions (for example, allowing the use via S3 and the expected encryption context for the bucket).

D. Switch the S3 bucket encryption from SSE-KMS to SSE-S3, keeping all existing IAM and KMS configuration unchanged.

Answer: C

For SSE-KMS, S3 must call KMS Decrypt when serving objects. KMS authorization is evaluated against the KMS key policy in Account A in addition to the identity policy in Account B. If the error includes kms:Decrypt AccessDenied in a cross-account scenario, the most direct fix is to update the KMS key policy to allow the Account B role to use the key for decrypt (often with conditions tied to S3 usage and the specific bucket/object encryption context).

Why this answer

Option C is correct because when using SSE-KMS with a customer-managed KMS key in a cross-account scenario, the KMS key policy must explicitly grant the external IAM role (arn:aws:iam::account-b:role/app-read) permission to perform kms:Decrypt. Even if the IAM role in Account B has the correct kms:Decrypt permission, the KMS key policy in Account A acts as a resource-based policy that must also allow the cross-account principal. Without this, the KMS service denies the decrypt request, resulting in the 'AccessDenied' error.
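A key policy statement along these lines is sketched below in Python. The role ARN is the placeholder from the question; the Sid and Region are hypothetical, and only the kms:ViaService condition is shown (an encryption context condition could be added but is omitted here).

```python
import json

# Role ARN is the placeholder used in the question; the Sid and Region
# are hypothetical.
key_policy_statement = {
    "Sid": "AllowAccountBRoleDecryptViaS3",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::account-b:role/app-read"},
    "Action": "kms:Decrypt",
    "Resource": "*",  # in a key policy, "*" refers to the key itself
    "Condition": {
        # Only allow use of the key when the request comes through S3.
        "StringEquals": {"kms:ViaService": "s3.us-east-1.amazonaws.com"}
    },
}
print(json.dumps(key_policy_statement, indent=2))
```

The statement is added to the existing key policy in Account A; the role in Account B keeps its own kms:Decrypt permission, since cross-account use needs both.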

Exam trap

The trap here is that candidates often assume IAM permissions alone are sufficient for cross-account KMS operations, forgetting that KMS key policies are resource-based and must explicitly allow external principals, even when the IAM role has the correct permissions.

How to eliminate wrong answers

Option A is wrong because removing kms:Decrypt from the Account B role policy would remove the necessary permission from the IAM principal, and the KMS key policy alone cannot grant permissions to a cross-account role without the role also having the corresponding IAM permission. Option B is wrong because S3 does not automatically authorize KMS decrypt; the s3:GetObject permission alone does not grant the required kms:Decrypt action, and the KMS key policy must still allow the cross-account role. Option D is wrong because changing the bucket's default encryption to SSE-S3 affects only newly written objects; existing objects remain encrypted with the KMS key and still fail to decrypt, so the missing cross-account key policy permission is left unaddressed.

58

Multi-Select · medium

An ecommerce company runs a 24/7 frontend tier on EC2 and a nightly image-rendering job that can be interrupted and resumed from checkpoints. They want to minimize monthly compute cost without changing the application architecture. Which two actions should they take? Select two.

Select 2 answers

A. Purchase a Compute Savings Plan for the always-on frontend tier.

B. Use Spot Instances for the rendering job fleet.

C. Move both workloads to Dedicated Hosts.

D. Keep the rendering fleet entirely on On-Demand Instances.

E. Use dedicated GPUs for the frontend tier even though it is CPU-light.

Answers: A, B

A Compute Savings Plan reduces cost for steady compute usage while keeping flexibility across supported compute services. It is a strong fit for the always-on frontend tier because the workload runs continuously and is not interruption tolerant.

Why this answer

A Compute Savings Plan offers discounts of up to 66% in exchange for a consistent compute spend commitment, making it a good fit for the always-on frontend tier that runs 24/7. This plan applies to any EC2 instance family and Region, and also to Fargate and Lambda, providing flexibility while reducing costs for steady-state workloads. Spot Instances suit the rendering job because it can be interrupted and resumed from checkpoints.

Exam trap

The trap here is that candidates might choose On-Demand Instances for the rendering job out of fear of interruptions, failing to recognize that checkpointing makes Spot Instances viable and cost-effective.

59

MCQ · hard

A Lambda-based retail API has unpredictable traffic spikes and users see latency caused by cold starts. The function must respond consistently during expected campaign windows. What should be configured? The architecture review board prefers a managed AWS-native control.

A. A larger deployment package

B. Reserved concurrency only

C. Provisioned concurrency during campaign windows

D. CloudTrail data events

Answer: C

Provisioned concurrency keeps execution environments initialized and reduces cold-start latency.

Why this answer

Provisioned concurrency keeps a specified number of Lambda execution environments initialized, so requests served by those environments avoid cold starts during the campaign windows. This is a managed AWS-native feature that gives consistent response times up to the provisioned level, directly addressing the latency issue.
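Limiting provisioned concurrency to the campaign windows is commonly done with scheduled scaling. The Python sketch below only builds request parameters in the shape used by Application Auto Scaling (RegisterScalableTarget and PutScheduledAction); it makes no AWS calls, and the function name, alias, schedule, and capacity numbers are all hypothetical.

```python
# Parameters in the shape accepted by Application Auto Scaling. The function
# name, alias, schedule, and capacity numbers are all hypothetical.
RESOURCE_ID = "function:retail-api:live"
DIMENSION = "lambda:function:ProvisionedConcurrency"

scalable_target = {
    "ServiceNamespace": "lambda",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": DIMENSION,
    "MinCapacity": 0,
    "MaxCapacity": 100,
}

def scheduled_action(name, cron, capacity):
    """Build one scheduled action that pins provisioned concurrency to capacity."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "Schedule": cron,
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }

# Warm 50 environments before the campaign and release them afterwards (UTC).
actions = [
    scheduled_action("campaign-start", "cron(45 8 ? * MON-FRI *)", 50),
    scheduled_action("campaign-end", "cron(0 18 ? * MON-FRI *)", 0),
]
print([a["ScheduledActionName"] for a in actions])
```

Scheduling the capacity keeps the cost of pre-initialized environments confined to the windows where consistent latency matters.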

Exam trap

The trap here is that candidates confuse reserved concurrency (which only limits scaling) with provisioned concurrency (which pre-warms environments), leading them to pick option B as a cost-saving measure without realizing it does not solve cold starts.

How to eliminate wrong answers

Option A is wrong because a larger deployment package increases cold start latency (as more code must be loaded and initialized), making the problem worse. Option B is wrong because reserved concurrency only caps the maximum number of concurrent executions to prevent runaway scaling; it does not pre-warm environments and thus does not eliminate cold starts. Option D is wrong because CloudTrail data events capture API activity for auditing and governance, not for performance optimization or cold start mitigation.

60

MCQ · easy

A cross-account IAM role in Account B reads encrypted S3 objects from Account A. The objects use SSE-KMS with a customer-managed KMS key in Account A. Account B can successfully call s3:GetObject, but decryption fails with an AccessDeniedException from KMS. What change most directly fixes the issue?

A. Add kms:Decrypt only to the Account B role’s IAM policy, without changing the customer-managed KMS key policy in Account A.

B. Update the Account A S3 bucket policy to grant kms:Decrypt to Account B.

C. Update the customer-managed KMS key policy in Account A to allow kms:Decrypt for the specific Account B role principal.

D. Enable KMS key rotation, which automatically allows cross-account decrypt permissions.

Answer: C

With SSE-KMS, S3 calls KMS on your behalf during decryption. KMS checks the customer-managed key policy (and optionally grants). Allowing the Account B role principal in the KMS key policy for kms:Decrypt directly resolves KMS AccessDenied.

Why this answer

SSE-KMS with a customer-managed KMS key requires explicit permission to use the key for decryption. The S3 GetObject call succeeds because the bucket policy allows it, but KMS decryption fails because the KMS key policy in Account A does not grant kms:Decrypt to the IAM role principal in Account B. Updating the KMS key policy to allow the Account B role principal to call kms:Decrypt directly resolves the AccessDeniedException.

Exam trap

The trap here is that candidates assume S3 bucket policies can control KMS permissions, but cross-account access to a customer-managed key must be granted on the key itself (through the key policy or a grant); an IAM policy in the other account is necessary but not sufficient on its own.

How to eliminate wrong answers

Option A is wrong because adding kms:Decrypt only to the Account B role’s IAM policy is insufficient; cross-account access to a customer-managed KMS key requires the key policy in Account A to explicitly grant the permission to the external principal. Option B is wrong because S3 bucket policies can only grant S3 actions (like s3:GetObject), not KMS actions; KMS permissions must be granted through the key policy (or a grant) in Account A, together with an IAM policy in Account B. Option D is wrong because enabling KMS key rotation does not grant any new permissions; it only changes the backing key material periodically and has no effect on cross-account access control.

61

MCQ · hard

A risk simulation workload generates analytics files that are accessed unpredictably. Some files become hot again months later. The team wants automatic storage cost optimisation without retrieval delays. What should be used?

A. Manual monthly review and object copying

B. S3 Glacier Flexible Retrieval for all files

C. S3 Intelligent-Tiering

D. EFS One Zone for analytics files

Answer: C

Intelligent-Tiering automatically moves objects between access tiers based on usage while preserving low-latency access.

Why this answer

S3 Intelligent-Tiering is the correct choice because it automatically moves objects between access tiers (frequent, infrequent, and archive instant access) based on changing access patterns, without any retrieval delays. This handles the unpredictable access described—files that become hot again months later—by keeping them in the archive instant access tier until access resumes, then promoting them instantly. It optimizes storage costs automatically without manual intervention or cold retrieval waits.
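The tiering behaviour can be modelled in a few lines of Python. The sketch assumes the documented thresholds for the automatic tiers (30 consecutive days without access for the infrequent tier, 90 for archive instant access); automatic_tier is a hypothetical helper and ignores the optional asynchronous archive tiers.

```python
def automatic_tier(days_since_last_access):
    """Return the automatic access tier after a run of days without access."""
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"

print(automatic_tier(10))
print(automatic_tier(45))
print(automatic_tier(200))
# Reading the object resets the counter, so it returns to the frequent tier.
print(automatic_tier(0))
```

All three tiers in this model serve reads without a restore step, which is why a file that turns hot again after months is available immediately.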

Exam trap

The trap here is that candidates often choose S3 Glacier Flexible Retrieval for cost savings, overlooking the 'without retrieval delays' requirement, which disqualifies any cold storage option that requires restoration time.

How to eliminate wrong answers

Option A is wrong because manual monthly review and object copying is not automatic, introduces operational overhead, and risks cost inefficiency or retrieval delays if the review cycle misses changing access patterns. Option B is wrong because S3 Glacier Flexible Retrieval has retrieval delays (minutes to hours) and is not suitable for files that may become hot again unpredictably, as it would cause unacceptable wait times for immediate access. Option D is wrong because EFS One Zone is a file system, not an object storage service, and does not provide the cost optimization or automatic tiering needed for unpredictable access patterns on analytics files.

62

MCQ · hard

Based on the exhibit, a central deployment role in Account A is assumed by several CI/CD pipelines from Account B. The role must remain reusable, but the team wants the TeamA pipeline to upload artifacts only to s3://artifact-bucket/teamA/prod/ without creating a separate IAM role. What is the best approach?

A. Use an IAM user in Account B and hard-code the narrower S3 path in its access key policy.

B. Add a bucket ACL that grants write access only to the TeamA pipeline session name.

C. Attach a permissions boundary to the central role so every pipeline session inherits the narrower prefix automatically.

D. Pass an STS session policy when TeamA assumes the role to further restrict the temporary credentials to the teamA/prod prefix.

Answer: D

An STS session policy is specifically designed to reduce the permissions of temporary credentials for a single assume-role session. The reusable base role can remain broad enough for multiple pipelines, while TeamA can pass a session policy that limits effective permissions to the teamA/prod prefix. This preserves the shared role model and achieves least privilege without creating a separate IAM role.

Why this answer

Option D is correct because when the TeamA pipeline assumes the central IAM role in Account A, it can pass an STS session policy that further restricts the temporary credentials to only allow actions on the s3://artifact-bucket/teamA/prod/ prefix. This approach keeps the role reusable for other pipelines while enforcing a narrower permission scope at the session level, without requiring a separate IAM role.

Exam trap

The trap here is that candidates often think a permissions boundary (Option C) can dynamically restrict individual sessions, but permissions boundaries set a hard limit on the role's overall permissions and cannot be applied per-session like an STS session policy can.

How to eliminate wrong answers

Option A is wrong because it replaces temporary role credentials with long-lived IAM user access keys, which violates security best practices and abandons the shared-role model the team wants to keep; policies also attach to the user, not to an access key. Option B is wrong because S3 bucket ACLs grant access to AWS accounts and predefined groups, not to individual role sessions; ACLs are legacy and cannot filter by session name or session tags. Option C is wrong because a permissions boundary sets the maximum permissions for the role itself, not for individual sessions; it would apply to all pipelines assuming the role, not just TeamA, and cannot dynamically restrict to a specific prefix per session.
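As a rough illustration of Option D, the sketch below builds a session policy document in Python. The helper name build_session_policy and the single s3:PutObject action are assumptions for illustration, not details from the exhibit. The resulting JSON string is the kind of value a caller would supply as the Policy parameter of an STS AssumeRole request.

```python
import json


def build_session_policy(bucket: str, prefix: str) -> str:
    """Return a session policy JSON string that allows uploads to one prefix only.

    Hypothetical helper: the action list is an assumption for illustration.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Trailing wildcard covers every object key under the prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy)


session_policy = build_session_policy("artifact-bucket", "teamA/prod/")
print(session_policy)
```

Because a session policy can only narrow what the role already allows, the shared role's own policies stay untouched and other pipelines are unaffected.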


63

MCQmedium

An orders service publishes payment instructions to an Amazon SQS Standard queue. The downstream processor sometimes times out after it has already applied the payment, but before it can delete the message from the queue. As a result, the same payment instruction can be processed more than once. The team wants the strongest way to prevent duplicate side effects while keeping the system decoupled. What should they implement?

A.Keep the queue as SQS Standard but increase the visibility timeout so duplicates are less likely to reappear during timeouts.

B.Change the queue to an SQS FIFO queue and use a stable deduplication ID derived from the payment instruction ID.

C.Make the downstream processor idempotent by recording processed payment instruction IDs in a durable datastore and ignoring repeats.

D.Use an ALB health check to restart the downstream processor when timeouts occur.

AnswerC

SQS Standard is at-least-once delivery, so the same message can be delivered more than once if the consumer times out before deleting it. Idempotent processing is the strongest protection against duplicate side effects because it prevents repeat application of the payment even when the message is redelivered.

Why this answer

Option C is correct because making the downstream processor idempotent ensures that duplicate payment instructions are safely ignored, even if the same message is delivered more than once. This approach provides the strongest guarantee against duplicate side effects without requiring changes to the queue type or increasing visibility timeouts, and it keeps the system fully decoupled.

Exam trap

The trap here is that candidates often assume that switching to a FIFO queue or increasing visibility timeout fully solves duplicate processing, but they overlook that the downstream processor's timeout after applying the payment is the root cause, which idempotency directly addresses.

How to eliminate wrong answers

Option A is wrong because increasing the visibility timeout only reduces the likelihood of duplicates but does not eliminate them; a timeout can still occur after processing, leading to the same duplicate issue. Option B is wrong because FIFO deduplication only suppresses duplicate sends from the producer within the 5-minute deduplication interval; it does not stop a message that was already received from becoming visible again when the processor times out before deleting it, so the same message could still be redelivered and processed again. Option D is wrong because an ALB health check only detects unhealthy targets; restarting or replacing the processor does nothing to prevent the same payment instruction from being processed twice.
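The idempotency idea in Option C can be sketched in a few lines of Python. This is a toy model, not a production design: the class name PaymentProcessor is hypothetical, and an in-memory set stands in for the durable datastore the option describes.

```python
class PaymentProcessor:
    """Toy idempotent consumer; a set stands in for a durable datastore."""

    def __init__(self):
        self.processed_ids = set()  # assumption: a durable table in a real system
        self.applied = []           # side effects that were actually performed

    def handle(self, instruction_id, amount_cents):
        """Apply the payment once; return False when the ID was already seen."""
        if instruction_id in self.processed_ids:
            return False  # duplicate delivery: ignore without side effects
        self.applied.append((instruction_id, amount_cents))
        self.processed_ids.add(instruction_id)
        return True


processor = PaymentProcessor()
first = processor.handle("pay-001", 2500)
# Simulated redelivery after the consumer timed out before deleting the message.
second = processor.handle("pay-001", 2500)
print(first, second, len(processor.applied))  # True False 1
```

In a real system the check and the record would need to happen as one atomic step (for example, a conditional write), otherwise two concurrent consumers could both pass the check.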


64

MCQmedium

Your order-processing system uses EventBridge rules to send events to a Lambda function that updates order status. Over the last week, some events fail with a transient database timeout, and the Lambda retries intermittently but then the events are lost (no alerts after failures). You want at-least-once processing, bounded retries, and a way to inspect unprocessable events for later reprocessing. Which architecture change best meets these requirements?

A.Send EventBridge events to an SQS queue, configure a redrive policy to move messages to a dead-letter queue (DLQ) after a defined receive count, and make the Lambda processing idempotent.

B.Invoke Lambda directly from EventBridge in asynchronous mode, and increase the Lambda timeout to reduce failures.

C.Use SNS topics with Lambda subscriptions, but remove all retry and DLQ configuration to minimize duplicate events.

D.Store failed events only in CloudWatch logs, and have operators manually copy log entries back into the database for reprocessing.

AnswerA

EventBridge-to-SQS provides buffering and decoupling; SQS redrive with a DLQ bounds retries and preserves failed events for analysis and replay.

Why this answer

Option A is correct because it introduces an SQS queue between EventBridge and Lambda, which provides a durable buffer for events. The redrive policy moves events to a dead-letter queue (DLQ) after a defined number of failed processing attempts, ensuring bounded retries and preserving unprocessable events for later inspection and reprocessing. SQS delivers each message at least once, and making the Lambda idempotent ensures that any resulting duplicate deliveries do not apply the same update twice.

Exam trap

The trap here is that candidates may think increasing Lambda timeout or relying on asynchronous invocation retries alone is sufficient, but they overlook the need for a DLQ to capture and inspect events that persistently fail, which is a key requirement for operational visibility and reprocessing.

How to eliminate wrong answers

Option B is wrong because increasing the Lambda timeout does not address transient database timeouts; it only gives the function more time to complete, but failures can still occur and events will be lost if the Lambda's asynchronous invocation retry limit is exhausted without a DLQ. Option C is wrong because removing all retry and DLQ configuration would cause events to be dropped as soon as delivery or processing fails, violating the requirement for bounded retries and inspectable unprocessable events. Option D is wrong because storing failed events only in CloudWatch logs does not provide a structured, queryable, or automated reprocessing mechanism; manual copying is error-prone, unscalable, and does not meet the requirement for bounded retries or at-least-once processing.
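To make the bounded-retry behavior in Option A concrete, here is a minimal Python sketch. The redrive_policy helper produces the JSON shape used for an SQS queue's RedrivePolicy attribute; the deliver function is only a toy model of redelivery, and the ARN, function names, and receive count are assumptions for illustration.

```python
import json


def redrive_policy(dlq_arn, max_receive_count):
    """Return the JSON string for an SQS RedrivePolicy queue attribute."""
    return json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    )


def deliver(message, handler, max_receive_count, dlq):
    """Toy model of bounded retries: retry until success or the receive limit."""
    for receive_count in range(1, max_receive_count + 1):
        try:
            handler(message)
            return receive_count  # processed successfully on this attempt
        except TimeoutError:
            continue  # message becomes visible again and is received once more
    dlq.append(message)  # limit reached: park the message for later inspection
    return None


def always_times_out(message):
    """Stand-in for a handler hitting a persistent database timeout."""
    raise TimeoutError("database timeout")


# Hypothetical DLQ ARN, used only to show the attribute format.
print(redrive_policy("arn:aws:sqs:us-east-1:111122223333:orders-dlq", 5))

dead_letters = []
result = deliver({"orderId": "o-1"}, always_times_out, 3, dead_letters)
print(result, dead_letters)
```

The message that never succeeds ends up in dead_letters rather than disappearing, which is the property the question asks for.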


65

MCQmedium

A retail company lets developers deploy ECS services but they must never be able to modify IAM. The team currently uses an IAM user per developer with an admin-like policy, and several access keys have been leaked. You are asked to redesign access so that: (1) developers authenticate with temporary credentials, (2) they can create/update ECS services and related autoscaling resources, and (3) IAM changes are impossible even if a developer tries to attach new policies. Which design best meets all requirements?

A.Create an IAM user for each developer and keep the existing broad permissions, rotating keys every 90 days.

B.Use an IAM role that developers assume for deployments; attach least-privilege policies for ECS and Auto Scaling; and attach a permission boundary that does not allow iam:* actions, so additional inline or managed policies cannot grant IAM permissions.

C.Attach a policy that allows ecs:* and autoscaling:* and rely on developers to self-review that no IAM statements are added to their roles.

D.Create a single shared IAM role with full administrator permissions so developers can troubleshoot faster when deployments fail.

AnswerB

Assuming a role provides temporary credentials and removes long-lived keys. Least-privilege policies limit allowed actions, and a permission boundary caps the role's effective permissions so IAM actions cannot be gained through later policy changes.

Why this answer

Option B is correct because it uses an IAM role with temporary credentials (via AWS STS AssumeRole), satisfying the requirement that developers never have long-term access keys. The least-privilege policies restrict actions to ECS and Auto Scaling only, and the permission boundary does not allow iam:* actions, so those actions can never become effective even if a policy granting them is attached to the role later. This combination ensures developers can deploy ECS services but cannot modify IAM in any way.

Exam trap

The trap here is that candidates may think a permission boundary is optional because the attached policies already omit iam:* actions; without a boundary, any policy attached to the role later could grant iam:* actions, whereas the boundary caps the role's effective permissions no matter which policies are added.

How to eliminate wrong answers

Option A is wrong because it retains long-term access keys (rotated every 90 days), which violates the requirement for temporary credentials and does not prevent key leaks; it also keeps broad permissions, allowing potential IAM modifications. Option C is wrong because relying on developers to self-review that no IAM statements are added is not a technical control—developers could still attach policies with IAM actions, violating the requirement that IAM changes be impossible. Option D is wrong because a single shared IAM role with full administrator permissions grants developers the ability to modify IAM, directly contradicting the requirement to prevent IAM changes, and shared credentials increase security risk.
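The capping effect of a permission boundary can be modeled with a small Python sketch. This is a deliberate simplification: it only intersects allow patterns, uses case-sensitive wildcard matching, and ignores explicit denies, resources, and conditions. The function names and pattern lists are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def allowed(action, patterns):
    """True when the action matches any allow pattern (simple wildcard match)."""
    return any(fnmatchcase(action, pattern) for pattern in patterns)


def effective(action, identity_allows, boundary_allows):
    """An action is effective only if the identity policy AND the boundary allow it."""
    return allowed(action, identity_allows) and allowed(action, boundary_allows)


boundary = ["ecs:*", "autoscaling:*"]
# Suppose a policy granting iam:* is attached to the role at some later point.
identity = ["ecs:*", "autoscaling:*", "iam:*"]

print(effective("ecs:UpdateService", identity, boundary))     # True
print(effective("iam:AttachRolePolicy", identity, boundary))  # False
```

Even with iam:* present in the identity policy, the IAM action is not effective because the boundary never allows it.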


66

MCQeasy

An inventory service exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The architecture review board prefers a managed AWS-native control.

A.CloudFront caching with appropriate TTLs

B.AWS Backup Vault Lock

C.IAM Access Analyzer

D.S3 Select

AnswerA

CloudFront can serve cached content from edge locations when the origin is temporarily unavailable.

Why this answer

CloudFront caching with appropriate TTLs allows cached responses to be served to users even when the S3 origin is temporarily unavailable. While an object is still within its TTL, edge locations serve it without contacting the origin, so a longer default or maximum TTL keeps more pages available during a short S3 outage. This is a managed AWS-native feature that aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates may confuse data protection features (like Backup Vault Lock) or data retrieval tools (like S3 Select) with caching and origin resilience, overlooking that CloudFront's TTL-based caching is the direct AWS-managed solution for serving content during origin outages.

How to eliminate wrong answers

Option B (AWS Backup Vault Lock) is wrong because it is a data protection feature for backup vaults that prevents deletion of backups, not a mechanism to serve cached content during an origin outage. Option C (IAM Access Analyzer) is wrong because it analyzes resource-based policies to identify unintended public access, not to cache or serve static content. Option D (S3 Select) is wrong because it is a query-in-place feature that retrieves subsets of data from objects using SQL expressions, and it does not provide caching or resilience against origin outages.


67

MCQmedium

A ticket booking system stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured?

A.S3 lifecycle transition to Glacier Flexible Retrieval

B.An EBS snapshot schedule

C.S3 Cross-Region Replication with versioning enabled

D.A CloudFront distribution

AnswerC

CRR asynchronously replicates objects to a bucket in another Region and requires versioning.

Why this answer

S3 Cross-Region Replication (CRR) with versioning enabled automatically copies objects from a source bucket in one AWS Region to a destination bucket in another Region, meeting the disaster recovery requirement for a geographically separate copy. Versioning must be enabled on both buckets to support replication of all object versions, ensuring consistency and recoverability. This is the native S3 feature designed for cross-region data redundancy without custom scripting or third-party tools.

Exam trap

The trap here is that candidates confuse S3 Cross-Region Replication with S3 lifecycle policies or other storage services like EBS snapshots, failing to recognize that CRR is the only option that directly creates a second copy of S3 objects in a different AWS Region for disaster recovery.

How to eliminate wrong answers

Option A is wrong because S3 lifecycle transition to Glacier Flexible Retrieval only moves objects to a lower-cost storage class within the same bucket and region; it does not create a copy in another AWS Region. Option B is wrong because EBS snapshots are for Amazon Elastic Block Store volumes attached to EC2 instances, not for S3 objects, and they cannot replicate data across regions automatically without additional configuration like copying snapshots manually. Option D is wrong because CloudFront is a content delivery network (CDN) that caches content at edge locations for low-latency delivery; it does not provide persistent cross-region storage replication for disaster recovery.
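As a hedged sketch of what Option C involves, the Python below builds a minimal replication configuration as a plain dictionary, in the shape accepted by the S3 PutBucketReplication API. The role ARN, bucket name, and rule ID are hypothetical, and versioning is assumed to be enabled on both buckets already.

```python
def replication_configuration(role_arn, destination_bucket):
    """Build a minimal replication configuration that copies every object."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "dr-copy",  # hypothetical rule name
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter: the rule applies to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = replication_configuration(
    "arn:aws:iam::111122223333:role/s3-replication",  # hypothetical role ARN
    "documents-dr-copy",                              # hypothetical bucket name
)
print(config["Rules"][0]["Destination"]["Bucket"])
```

The destination is an ordinary bucket in the second Region, so the replicated documents remain directly readable there during a disaster recovery event.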


68

MCQeasy

A team stores important documents in Amazon S3. They want to recover earlier versions if someone overwrites or deletes a file by mistake. What should they enable?

A.Amazon S3 Versioning

B.Amazon EBS snapshots

C.Amazon CloudWatch logs

D.VPC flow logs

AnswerA

Versioning keeps previous versions of S3 objects, which lets you recover from accidental overwrite or deletion.

Why this answer

Amazon S3 Versioning is the correct choice because it allows you to preserve, retrieve, and restore every version of every object stored in an S3 bucket. When enabled, S3 automatically maintains a unique version ID for each object, so if a file is overwritten or deleted, the previous version remains accessible. This directly addresses the requirement to recover earlier versions after accidental modification or deletion.

Exam trap

The trap here is that candidates may confuse S3 Versioning with backup services like EBS snapshots, but versioning is an S3-native feature for object-level recovery, not a volume-level backup mechanism.

How to eliminate wrong answers

Option B is wrong because Amazon EBS snapshots are point-in-time backups of Amazon Elastic Block Store volumes, used for EC2 instance data persistence, not for versioning individual objects in S3. Option C is wrong because Amazon CloudWatch logs capture operational metrics and log data from AWS services and applications, not object-level version history in S3. Option D is wrong because VPC flow logs capture IP traffic metadata for network interfaces in a VPC, providing network visibility, not object versioning or recovery capabilities.


69

MCQmedium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The design must avoid adding custom operational scripts.

A.A larger S3 bucket

B.Amazon CloudFront distribution with the S3 bucket as origin

C.RDS read replicas

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content (images, JavaScript files) at edge locations worldwide. By distributing content closer to users, it reduces latency and improves load times for distant countries without requiring any custom operational scripts or changes to the S3 bucket.

Exam trap

The trap here is that candidates may think increasing S3 bucket size or adding compute resources (EC2, RDS) can solve latency issues, but the correct solution is a CDN like CloudFront that brings content physically closer to users.

How to eliminate wrong answers

Option A is wrong because increasing the size of an S3 bucket does not improve performance; S3 bucket size has no impact on latency or throughput for static content delivery. Option C is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3. Option D is wrong because an EC2 Auto Scaling group in a single Region does not reduce latency for users in distant countries; it only provides compute scaling within one geographic area, not global edge caching.


70

MCQhard

Based on the exhibit, a batch platform in Account B must assume a role in Account A. Only the specific role arn:aws:iam::222233334444:role/BatchRunner should be allowed to assume it, and the design must prevent any other role in Account B from reusing the same external ID. Which change best meets the requirement?

A.Add an identity-based policy to the BatchRunner role that allows sts:AssumeRole on the target role.

B.Change the trust policy principal from account root to arn:aws:iam::222233334444:role/BatchRunner and keep the ExternalId condition.

C.Replace the ExternalId condition with a role session name condition so only BatchRunner sessions are accepted.

D.Attach an SCP to Account B that denies sts:AssumeRole unless the request comes from BatchRunner.

AnswerB

This limits assumption to the exact role in Account B while preserving the ExternalId defense against confused deputy attacks.

Why this answer

Option B is correct because the trust policy on the target role in Account A must restrict the principal to the exact BatchRunner role ARN (arn:aws:iam::222233334444:role/BatchRunner) rather than the entire Account B root. This ensures that only that specific role can assume the target role, so no other role in Account B can reuse the external ID. Keeping the ExternalId condition preserves the existing defense against confused deputy attacks.

Exam trap

The trap here is that candidates often think an identity-based policy on the assuming role (Option A) is sufficient, but the trust policy on the target role must explicitly restrict the principal to the specific role ARN, not just the account root.

How to eliminate wrong answers

Option A is wrong because an identity-based policy on the BatchRunner role only lets BatchRunner call sts:AssumeRole; it does not stop other roles in Account B, since the trust policy on the target role still trusts the whole account root. That permission is needed, but the key missing change is the principal restriction. Option C is wrong because a role session name condition (sts:RoleSessionName) is set by the assuming entity and can be spoofed by any role in Account B, so it does not prevent other roles from reusing the same external ID. Option D is wrong because an SCP only limits what principals in Account B may do; it does not change whom the target role in Account A trusts, and a blanket deny on sts:AssumeRole would also block unrelated role assumptions by other principals in Account B.
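The trust policy change in Option B might look like the following, built here as a Python dictionary and printed as JSON. The exhibit is not reproduced on this page, so the external ID value is a placeholder and the helper name trust_policy is hypothetical; only the role ARN comes from the question.

```python
import json


def trust_policy(principal_arn, external_id):
    """Build a trust policy that trusts one role ARN and requires an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # A specific role ARN instead of the account root principal.
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = trust_policy(
    "arn:aws:iam::222233334444:role/BatchRunner",
    "example-external-id",  # placeholder value, not taken from the exhibit
)
print(json.dumps(policy, indent=2))
```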


71

MCQeasy

Based on the exhibit, a web application must stay available if one Availability Zone fails. What is the best change to improve resilience?

A.Increase the desired capacity to 8 instances in the same subnet.

B.Add a subnet in another Availability Zone to the Auto Scaling group and keep the ALB spanning both AZs.

C.Replace the Application Load Balancer with a Network Load Balancer.

D.Move the instances to a larger instance type with more CPU and memory.

AnswerB

This places application instances across multiple Availability Zones, which protects the stateless tier from a single-AZ failure. The ALB already spans two AZs, so the missing piece is the Auto Scaling group using subnets in more than one AZ. That allows AWS to replace unhealthy instances and continue serving traffic from the surviving Zone.

Why this answer

Adding a subnet in another Availability Zone (AZ) to the Auto Scaling group and keeping the ALB spanning both AZs ensures that if one AZ fails, the ALB can route traffic to healthy instances in the other AZ. This is the standard pattern for building multi-AZ resilient architectures with Auto Scaling and ALB, as it eliminates the single point of failure at the AZ level.

Exam trap

The trap here is that candidates often focus on scaling up (more instances or larger instances) or changing the load balancer type, missing the fundamental requirement of distributing resources across multiple Availability Zones to achieve AZ-level resilience.

How to eliminate wrong answers

Option A is wrong because increasing the desired capacity to 8 instances in the same subnet does not protect against an AZ failure; all instances remain in a single AZ, so if that AZ goes down, all instances become unavailable. Option C is wrong because replacing the ALB with a Network Load Balancer does not inherently improve resilience against AZ failure; both ALB and NLB can span multiple AZs, but the key issue is the lack of multi-AZ instance placement, not the load balancer type. Option D is wrong because moving to a larger instance type with more CPU and memory improves performance but does not address AZ-level fault tolerance; a single AZ failure would still take down all instances regardless of size.


72

MCQhard

Based on the exhibit, a batch-processing service runs on Amazon EC2. The workload is Linux-based, can run on ARM64, and is CPU-bound during its nightly processing window. The team wants the best throughput per dollar without changing the application logic. Which EC2 instance family should the solutions architect recommend?

A.C7g instances based on AWS Graviton processors

B.R7i instances because more memory will improve CPU-bound job throughput.

C.M7a instances because general-purpose families are always the safest performance choice.

D.T3 instances because burstable instances can handle occasional nighttime spikes at lower cost.

AnswerA

C7g instances are compute optimized and use Graviton processors, which often deliver strong price-performance for CPU-bound Linux workloads that can run on ARM64. The exhibit shows the application is compatible and even benchmarks faster on ARM.

Why this answer

The C7g instances are based on AWS Graviton processors (ARM64 architecture), which often deliver better performance per dollar than comparable x86-based instances for CPU-bound workloads. Since the workload is Linux-based, can run on ARM64, and is CPU-bound, the C7g family provides the best throughput per dollar without requiring any application logic changes.

Exam trap

The trap here is that candidates may choose memory-optimized or general-purpose instances (like R7i or M7a) thinking they are safer, or burstable instances (T3) assuming they handle spikes cheaply, without recognizing that compute-optimized ARM64 instances (C7g) provide the best throughput per dollar for CPU-bound, ARM64-compatible workloads.

How to eliminate wrong answers

Option B is wrong because R7i instances are memory-optimized, designed for workloads that require large amounts of memory, not for CPU-bound jobs where additional memory does not improve throughput. Option C is wrong because M7a instances are general-purpose and balance compute, memory, and networking, but they are not optimized for CPU-bound workloads and use x86 architecture, which typically offers lower performance per dollar compared to ARM64-based instances for this specific scenario. Option D is wrong because T3 instances are burstable and designed for workloads with low baseline CPU usage and occasional spikes, but they are not suitable for sustained CPU-bound processing during a nightly window, as they would exhaust CPU credits and incur performance throttling or additional costs.


73

MCQhard

Based on the exhibit, a CI pipeline assumes a shared deployment role in Account A. The role can access several artifact prefixes, but this pipeline must only upload to teamA/prod/ and decrypt using a single KMS key for this execution. Changing the shared role would affect other pipelines. Which approach should the pipeline use?

A.Attach a permission boundary to the pipeline's assumed session so the temporary credentials cannot exceed the shared role permissions.

B.Pass an inline session policy in the AssumeRole request that further restricts the temporary credentials to teamA/prod/ and the approved KMS key.

C.Add an SCP to Account A that forces all roles to use the same S3 prefix and key whenever they are assumed.

D.Change the role trust policy to allow only the teamA/prod/ prefix and the key ARN because trust policies can scope S3 object paths directly.

AnswerB

STS session policies are designed to further restrict the permissions of temporary credentials issued by AssumeRole. In this case, the shared role can remain reusable for other pipelines, while this one execution is narrowed to the exact S3 prefix and KMS key required. The effective permissions become the intersection of the role permissions and the session policy, which preserves least privilege without changing the shared role itself.

Why this answer

Option B is correct because an inline session policy passed in the AssumeRole request allows you to further restrict the temporary credentials' permissions without modifying the shared role itself. This ensures the pipeline can only upload to teamA/prod/ and decrypt using the specified KMS key, while other pipelines using the same role remain unaffected.

Exam trap

The trap here is that candidates confuse permission boundaries (which set a maximum limit) with session policies (which further restrict a specific session), or mistakenly think trust policies can scope resource-level permissions like S3 prefixes or KMS keys.

How to eliminate wrong answers

Option A is wrong because a permission boundary is attached to an IAM user or role, not to an individual assumed session; attaching one to the shared role would affect every pipeline and still would not narrow this one execution to a specific prefix or key. Option C is wrong because SCPs apply to all principals in the account and cannot be scoped to a single pipeline's session without affecting other roles and users. Option D is wrong because trust policies control who can assume the role, not what actions the assumed session can perform; S3 object paths cannot be scoped in trust policies.


74

Multi-Selectmedium

An Aurora PostgreSQL application has an OLTP writer and a reporting dashboard that issues many read-only queries. The writer is healthy, but read latency rises noticeably during reporting windows. Which two changes should you make? Select two.

Select 2 answers

A.Add Aurora Replicas to scale out the read workload.

B.Send read-only application traffic to the reader endpoint.

C.Scale up only the writer instance and keep all queries on it.

D.Replace the cluster with a single-AZ RDS instance to reduce replication overhead.

E.Move the dashboard to DynamoDB without changing the query model.

AnswersA, B

Aurora Replicas provide additional read capacity, which lets you spread read-only traffic away from the writer instance.

Why this answer

Adding Aurora Replicas (Option A) is correct because Aurora Replicas are dedicated read-only instances that share the same underlying storage volume as the writer, allowing you to scale read capacity linearly without impacting write performance. Sending read-only traffic to the reader endpoint (Option B) is correct because the reader endpoint automatically load-balances connections across all available Aurora Replicas, ensuring that dashboard queries are distributed and do not overload a single instance.

Exam trap

The trap here is that candidates may think scaling up the writer instance (Option C) is sufficient, but they overlook that read-heavy workloads require horizontal read scaling via replicas, not just vertical scaling of the writer.
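On the application side, Option B amounts to choosing a different connection endpoint for read-only work. The Python sketch below shows one naive way to do that; the hostnames are hypothetical placeholders, and the SELECT-prefix check is an assumption that would misroute statements such as SELECT ... FOR UPDATE.

```python
WRITER_ENDPOINT = "writer.example.internal"  # hypothetical cluster (writer) endpoint
READER_ENDPOINT = "reader.example.internal"  # hypothetical reader endpoint


def endpoint_for(sql):
    """Route plain SELECT statements to the reader endpoint, the rest to the writer."""
    is_read_only = sql.lstrip().upper().startswith("SELECT")
    return READER_ENDPOINT if is_read_only else WRITER_ENDPOINT


print(endpoint_for("SELECT count(*) FROM orders"))
print(endpoint_for("UPDATE orders SET status = 'paid' WHERE id = 1"))
```

Many applications avoid query inspection entirely and instead give the reporting dashboard its own connection settings that point at the reader endpoint.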


75

Multi-Selecthard

A regional web application for an inventory service must fail over automatically to a secondary Region if the primary endpoint becomes unhealthy. Which two services or features are required?

Select 2 answers

A.Route 53 failover routing with health checks

B.S3 Transfer Acceleration

C.A deployed standby application stack in the secondary Region

D.AWS Organizations service control policies

AnswersA, C

Route 53 can monitor endpoint health and return the standby endpoint when the primary is unhealthy.

Why this answer

Route 53 failover routing with health checks is correct because it monitors the health of the primary endpoint via periodic HTTP/HTTPS or TCP checks. If the health check fails, Route 53 automatically updates DNS resolution to point to the secondary Region's endpoint, enabling failover at the DNS level without manual intervention.

Exam trap

The trap here is that candidates often assume Route 53 alone is sufficient for failover, forgetting that a fully deployed standby application stack in the secondary Region is required to actually serve traffic after DNS rerouting.
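The pairing of Options A and C can be sketched as two failover records plus a toy resolver in Python. The record dictionaries follow the general shape of a Route 53 resource record set, but the domain name, IP addresses, TTL, and health check ID are all hypothetical, and the resolve function only imitates the failover decision.

```python
def failover_record(name, role, target_ip, health_check_id=None):
    """Build a failover record set; role is "PRIMARY" or "SECONDARY"."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-region",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": target_ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id  # health check drives failover
    return record


def resolve(primary, secondary, primary_healthy):
    """Toy resolver: answer with the standby only when the primary is unhealthy."""
    chosen = primary if primary_healthy else secondary
    return chosen["ResourceRecords"][0]["Value"]


primary = failover_record("app.example.com", "PRIMARY", "192.0.2.10", "hc-primary-example")
secondary = failover_record("app.example.com", "SECONDARY", "198.51.100.20")

print(resolve(primary, secondary, primary_healthy=True))   # 192.0.2.10
print(resolve(primary, secondary, primary_healthy=False))  # 198.51.100.20
```

The secondary address is only useful if a working application stack is actually deployed behind it, which is why both answers are required.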


