SAA-C03 Questions 151–225 | Page 3/14

151

MCQmedium

A global video platform serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The architecture review board prefers a managed AWS-native control.

A.A larger S3 bucket

B.Amazon CloudFront distribution with the S3 bucket as origin

C.RDS read replicas

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a global content delivery network (CDN) that caches static content (images, JavaScript) at edge locations closer to users, drastically reducing latency for distant countries. By using the S3 bucket as the origin, CloudFront offloads requests from S3 and accelerates delivery via HTTP/2, TCP optimizations, and persistent connections. This is the most effective managed AWS-native solution for improving global load times for static assets.

Exam trap

The trap here is that candidates may confuse scaling storage (larger bucket) or compute (Auto Scaling) with performance improvement, overlooking that latency for static content is primarily a network distance problem solved by a CDN like CloudFront.

How to eliminate wrong answers

Option A is wrong because S3 buckets have no configurable size; storage scales automatically, and a "larger" bucket does nothing for transfer speed or latency, which depend on the network distance between users and the bucket's Region. Option C is wrong because RDS read replicas are designed for scaling database read traffic, not for serving static files or accelerating HTTP content delivery. Option D is wrong because an EC2 Auto Scaling group in a single Region does not reduce latency for users in distant countries; it only provides regional scalability and fault tolerance, not global edge caching.

152

Multi-Selectmedium

A fintech company needs a disaster recovery design for a web application in two Regions. The business requires an RPO of 15 minutes and an RTO under 2 hours, but it cannot afford to keep a full production stack running in both Regions all the time. Which two DR strategies best fit the requirement? Select two.

Select 2 answers

A.Pilot light with critical data and minimal services pre-staged in the secondary Region.

B.Warm standby with a scaled-down but running environment in the secondary Region.

C.Active-active deployment with full production capacity in both Regions.

D.Backup-and-restore only, with no pre-provisioned resources in the secondary Region.

E.Single-Region deployment with Multi-AZ only, because that already covers disaster recovery.

AnswersA, B

Correct because pilot light keeps a small but ready foundation in the recovery Region, which lowers cost while still allowing much faster recovery than restoring everything from scratch. It is a common fit when the business can accept a short recovery window and controlled failover steps.

Why this answer

A pilot light strategy is correct because it pre-stages only critical data (e.g., a continuously replicated database) and minimal core services in the secondary Region, with the application tier provisioned and scaled up during failover, which fits within the RTO of under 2 hours. The RPO of 15 minutes is achievable with continuous asynchronous cross-Region replication (e.g., Amazon RDS cross-Region read replicas, Aurora Global Database, or DynamoDB global tables); RDS Multi-AZ is not an option here because it replicates within a single Region. Warm standby meets the same objectives with a scaled-down but running copy of the environment, recovering faster at a somewhat higher cost. Both approaches avoid the cost of a full production stack while meeting the recovery objectives.
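
As a rough illustration of how the two objectives narrow the choice, the sketch below checks candidate strategies against the RPO and RTO from the question. The data-lag and recovery-time figures are hypothetical placeholders picked for the example, not AWS-published numbers, and the `Strategy` type and `fits` helper are names invented here.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    data_lag_minutes: float   # assumed worst-case data loss window
    recovery_minutes: float   # assumed time until production traffic is served
    runs_full_stack: bool     # True if full production capacity is always running


def fits(strategy: Strategy, rpo_minutes: float, rto_minutes: float) -> bool:
    """Return True if the strategy meets both objectives without a full standby stack."""
    return (
        strategy.data_lag_minutes <= rpo_minutes
        and strategy.recovery_minutes <= rto_minutes
        and not strategy.runs_full_stack
    )


# Hypothetical figures, used only to illustrate the comparison.
candidates = [
    Strategy("backup-and-restore", data_lag_minutes=24 * 60, recovery_minutes=12 * 60, runs_full_stack=False),
    Strategy("pilot light", data_lag_minutes=5, recovery_minutes=90, runs_full_stack=False),
    Strategy("warm standby", data_lag_minutes=5, recovery_minutes=30, runs_full_stack=False),
    Strategy("active-active", data_lag_minutes=1, recovery_minutes=1, runs_full_stack=True),
]

matching = [s.name for s in candidates if fits(s, rpo_minutes=15, rto_minutes=120)]
print(matching)
```

With these assumed inputs, only the two middle strategies pass: backup-and-restore misses both objectives, and active-active is excluded on cost.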

Exam trap

The trap here is that candidates often confuse warm standby with active-active, assuming any running environment in the secondary Region must be at full capacity, but warm standby allows a scaled-down environment that can be scaled up within the RTO, meeting cost constraints.

153

MCQeasy

A content publishing system exposes a static website from S3 and CloudFront. Users should still receive cached pages if the S3 origin has a short outage. Which feature helps most? The architecture review board prefers a managed AWS-native control.

A.IAM Access Analyzer

B.AWS Backup Vault Lock

C.CloudFront caching with appropriate TTLs

D.S3 Select

AnswerC

CloudFront can serve cached content from edge locations when the origin is temporarily unavailable.

Why this answer

CloudFront caching with appropriate TTLs ensures that even if the S3 origin becomes temporarily unavailable, CloudFront can keep serving cached content from its edge locations. Objects are served from cache until their TTL expires (set through Cache-Control headers or the cache policy's minimum, default, and maximum TTLs), so longer TTLs for static pages mean more requests are answered without contacting the origin. If a cached object has expired and the origin returns an error or cannot be reached, CloudFront can also fall back to serving the stale cached copy. This is a managed AWS-native feature that requires no additional infrastructure and aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates might confuse CloudFront's caching with other AWS services like AWS Global Accelerator or Route 53 health checks, or incorrectly assume that S3's built-in redundancy alone handles origin outages, overlooking CloudFront's ability to serve stale cached content during origin failures.

How to eliminate wrong answers

Option A is wrong because IAM Access Analyzer is a tool for analyzing resource-based policies to identify unintended public or cross-account access, not for providing caching or origin resilience. Option B is wrong because AWS Backup Vault Lock is a feature to enforce retention policies on backups and prevent deletion, unrelated to serving cached content during an origin outage. Option D is wrong because S3 Select is a feature to retrieve subsets of data from objects using SQL-like queries, not a caching mechanism or origin failover solution.

154

MCQhard

A claims portal uses Amazon RDS for PostgreSQL. Application credentials must not be stored on the EC2 instances, and authentication should use short-lived credentials. What should the architect recommend?

A.Store the database password in user data

B.Embed the database password in the AMI

C.IAM database authentication for RDS with an EC2 instance role

D.Use a security group rule that allows only application instances

AnswerC

IAM database authentication allows the application to use temporary AWS credentials instead of stored database passwords.

Why this answer

IAM database authentication for RDS with an EC2 instance role allows the application to obtain a short-lived authentication token (valid for 15 minutes) using the AWS CLI or SDK, without storing any credentials on the instance. The EC2 instance role provides the necessary permissions to generate the token, which is then used instead of a static password, meeting both security requirements.
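
A minimal sketch of the permission side, assuming the instance role is granted `rds-db:connect` for a single database user. The Region, account ID, DB resource ID, and user name are hypothetical placeholders; the token itself would then be requested through the SDK (for example boto3's `generate_db_auth_token`), which is not called here.

```python
import json


def rds_connect_policy(region: str, account_id: str, db_resource_id: str, db_user: str) -> dict:
    """Build an identity policy that lets a role request IAM auth tokens for one DB user."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "rds-db:connect",
                # The resource names the DB resource ID and database user, not the instance name.
                "Resource": f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_user}",
            }
        ],
    }


# All identifiers below are hypothetical placeholders.
policy = rds_connect_policy("us-east-1", "111122223333", "db-ABCDEFGHIJKL01234", "claims_app")
print(json.dumps(policy, indent=2))
```

Scoping the resource to one database user keeps the role from authenticating as any other account in the same database.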

Exam trap

The trap here is that candidates often confuse network-level controls (security groups) with authentication mechanisms, or assume that storing credentials in user data or AMIs is acceptable because they are 'hidden' from the application code, but AWS explicitly considers these insecure practices for production workloads.

How to eliminate wrong answers

Option A is wrong because user data is stored unencrypted and can be read through the instance metadata service or the EC2 API and console, so a password placed there is a long-lived credential stored with the instance, violating the requirement to not store credentials on EC2. Option B is wrong because embedding the database password in the AMI hard-codes the credential into the image, making it static and long-lived, and any instance launched from that AMI inherits the password, which cannot be rotated without rebuilding the AMI. Option D is wrong because a security group rule controls network access at the transport layer but does not address credential storage or authentication; it only restricts which IPs or instances can connect, not how the application authenticates.

155

MCQhard

A risk simulation workload in private subnets downloads large amounts of data from S3 through a NAT gateway. NAT data processing charges are high. What should the architect use to reduce cost?

A.A larger NAT gateway

B.Gateway VPC endpoint for Amazon S3

C.S3 Object Lambda

D.AWS Shield Advanced

AnswerB

A gateway endpoint routes S3 traffic privately without NAT gateway data processing charges.

Why this answer

A Gateway VPC endpoint for Amazon S3 allows instances in private subnets to access S3 directly via the AWS network, bypassing the NAT gateway entirely. This eliminates NAT data processing charges (per GB) for S3 traffic, which can be significant for large data downloads. The endpoint is free to use and routes traffic through AWS's private backbone, not the internet.
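
To make the saving concrete, here is a back-of-the-envelope sketch. The per-GB NAT data processing price and the monthly volume are assumptions for illustration only; actual rates vary by Region and should be taken from the current pricing page.

```python
def nat_processing_cost(gb_per_month: float, price_per_gb: float = 0.045) -> float:
    """Monthly NAT gateway data processing charge (price_per_gb is an assumed rate)."""
    return gb_per_month * price_per_gb


s3_gb_per_month = 50_000  # hypothetical volume downloaded from S3

via_nat = nat_processing_cost(s3_gb_per_month)
via_gateway_endpoint = 0.0  # gateway endpoints carry no data processing charge

print(f"Through NAT gateway:      ${via_nat:,.2f}")
print(f"Through gateway endpoint: ${via_gateway_endpoint:,.2f}")
```

The NAT gateway's hourly charge remains if other internet-bound traffic still needs it; only the S3 share of the processing fee disappears.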

Exam trap

The trap here is that candidates often assume all VPC endpoints incur costs or require NAT gateways, but Gateway endpoints for S3 and DynamoDB are free and specifically designed to eliminate NAT data processing charges for those services.

How to eliminate wrong answers

Option A is wrong because NAT gateways are not sized by the customer; they scale automatically, and the per-GB data processing charge applies regardless of capacity, so the S3 traffic would still be billed. Option C is wrong because S3 Object Lambda is a feature for transforming data on the fly during retrieval, not for reducing network costs or bypassing a NAT gateway. Option D is wrong because AWS Shield Advanced is a DDoS protection service that adds cost and does not address NAT gateway data processing charges.

156

MCQmedium

A DynamoDB table stores device status items. The partition key is deviceId, and the partition distribution is healthy (no single partition dominates). However, during peak periods the application experiences high read latency because many clients repeatedly request the latest status for the same devices. Which action best improves read latency without changing the DynamoDB partitioning model?

A.Add Amazon DAX as a caching layer in front of DynamoDB and route repeated read operations through DAX.

B.Change the partition key to a random value for each request to eliminate hot partitions.

C.Increase write capacity only, because writes generally determine read latency in DynamoDB.

D.Create an additional Global Secondary Index (GSI) and read exclusively from the index to accelerate reads.

AnswerA

Amazon DAX is an in-memory caching layer for DynamoDB that accelerates repeated reads. When many clients request the same items (for example, “latest status” point reads by deviceId), DAX can serve cached responses directly, reducing round trips to DynamoDB and lowering read latency during peak periods.

Why this answer

Amazon DAX is a fully managed, in-memory cache for DynamoDB that provides microsecond read latency. By caching the results of repeated GetItem and Query requests for the same device status items, DAX offloads read traffic from the underlying DynamoDB table, reducing the number of read capacity units consumed and eliminating the latency caused by repeated fetches from disk. This directly addresses the high read latency during peak periods without altering the existing partition key or partitioning model.
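
The effect of a read-through cache on repeated point reads can be sketched in a few lines. This is a generic in-process illustration, not the DAX client API: `ReadThroughCache` and `fetch_status` are hypothetical names, and `fetch_status` stands in for a DynamoDB GetItem call.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; go to the backend only on a miss or expiry."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}
        self.backend_reads = 0

    def get(self, key: str) -> dict:
        now = self._clock()
        cached = self._items.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]          # cache hit: no backend round trip
        self.backend_reads += 1       # cache miss: read through to the backend
        value = self._fetch(key)
        self._items[key] = (now + self._ttl, value)
        return value


def fetch_status(device_id: str) -> dict:
    # Stand-in for a DynamoDB GetItem on the deviceId partition key.
    return {"deviceId": device_id, "status": "ONLINE"}


cache = ReadThroughCache(fetch_status, ttl_seconds=5.0)
for _ in range(1000):
    cache.get("device-42")
print(cache.backend_reads)
```

The trade-off is staleness: a cached item can lag the table by up to the TTL, which is acceptable only if "latest status" tolerates that delay.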

Exam trap

The trap here is that candidates may think a GSI can magically speed up reads, but GSIs do not provide caching and still read from the same storage layer, so they do not reduce latency for repeated identical queries.

How to eliminate wrong answers

Option B is wrong because changing the partition key to a random value would break the ability to query for the latest status of a specific device, as the partition key is used to identify the device; this would require a complete redesign of the access pattern and data model. Option C is wrong because increasing write capacity does not reduce read latency; read latency is primarily affected by the number of read requests and the time to fetch data from storage, not by write capacity. Option D is wrong because creating a Global Secondary Index (GSI) does not inherently accelerate reads; while a GSI can support different query patterns, it still reads from the same underlying storage and does not provide caching, so repeated reads for the same items would still incur the same latency.

157

MCQhard

Based on the exhibit, the company runs a self-managed RabbitMQ cluster on EC2 for asynchronous work. The queue only needs durable at-least-once delivery, and the application does not require AMQP-specific features such as exchanges, routing keys, or broker plugins. Which change is the best cost-optimization move?

A.Replace RabbitMQ with Amazon SQS Standard and keep the workers unchanged except for the queue client library.

B.Replace RabbitMQ with Amazon MQ for RabbitMQ to keep the same protocol and reduce costs.

C.Increase the RabbitMQ instance size and add a fourth node for higher availability.

D.Move the queue to Amazon DynamoDB and use scans for consumers to detect new messages.

AnswerA

Amazon SQS is a fully managed queue that satisfies durable, at-least-once messaging without requiring broker administration. It removes the EC2 broker fleet, patching, backups, and failover testing, and it reduces outage risk from broker maintenance. Because the workload does not need AMQP-specific features such as exchanges or routing keys, SQS is the most cost-effective and operationally simple replacement.

Why this answer

Option A is correct because Amazon SQS Standard provides at-least-once delivery and durable message storage without requiring AMQP-specific features like exchanges or routing keys. Replacing RabbitMQ with SQS eliminates the operational overhead of managing EC2 instances and RabbitMQ clusters, while SQS's pay-per-request pricing is more cost-effective than running EC2 instances 24/7 for a self-managed queue.

Exam trap

The trap here is that candidates assume Amazon MQ for RabbitMQ is always the cheapest managed option, but SQS is more cost-effective when AMQP-specific features are not required, as it eliminates per-instance costs and leverages a serverless pricing model.

How to eliminate wrong answers

Option B is wrong because Amazon MQ for RabbitMQ is a managed broker service that still incurs hourly instance costs, which is typically more expensive than SQS's serverless, pay-per-request model for workloads that don't need AMQP features. Option C is wrong because increasing instance size and adding nodes increases costs without addressing the core cost-optimization goal, and the current setup already meets the requirements. Option D is wrong because DynamoDB is not a queue service; using scans to detect new messages is inefficient, costly (consumes read capacity units), and does not provide at-least-once delivery semantics or message visibility timeouts, leading to potential duplicate processing and higher latency.

158

Multi-Selectmedium

A company is designing a secure CI/CD pipeline on AWS. Developers push code to AWS CodeCommit, which triggers AWS CodePipeline to build and deploy applications to Amazon EC2 instances running in a VPC. The security team requires that all code is scanned for secrets and vulnerabilities before deployment, and that deployment artifacts are encrypted at rest in Amazon S3. Which three steps should be taken to meet these requirements? (Choose three.)

Select 3 answers

A.Add a CodeBuild step in the pipeline that runs a static code analysis tool to scan for secrets and vulnerabilities.

B.Configure the S3 bucket used for deployment artifacts to have default encryption with SSE-S3.

C.Use AWS Secrets Manager to store secrets and retrieve them at build time via IAM roles.

D.Enable AWS CloudTrail to log all access to the S3 bucket and CodeCommit repository.

E.Configure the S3 bucket to enforce encryption in transit using a bucket policy that denies requests without HTTPS.

F.Use an S3 bucket policy that grants full public access to the deployment artifacts for faster downloads.

Why this answer

Adding a CodeBuild step that runs static code analysis (e.g., using tools like Checkov, Bandit, or custom scripts) directly addresses the requirement to scan for secrets and vulnerabilities before deployment. This integrates security scanning into the CI/CD pipeline as a gated step, ensuring only compliant code proceeds.

Exam trap

The trap here is that candidates often confuse 'encryption at rest' with 'encryption in transit'. The stated requirement, artifacts encrypted at rest in Amazon S3, is addressed by default bucket encryption such as SSE-S3. A bucket policy that denies requests without HTTPS protects data in transit, which is good practice but a separate control that does not by itself satisfy an at-rest requirement.

159

MCQeasy

A company wants a disaster recovery setup for a web application. They need relatively quick recovery, but they can't afford running full production in the secondary location at all times. Which option best matches this requirement?

A.Pilot light: keep only essential infrastructure in the secondary location and scale up the application during a failure.

B.Warm standby: run a minimal but functional version of the application and supporting services in the secondary location, and scale up during a failure.

C.Active-active: run full production in both the primary and secondary locations at the same time.

D.Backup and restore only: rely on periodic backups and restore the application after a failure.

AnswerB

Warm standby balances cost and recovery time by keeping some capacity running in the secondary environment (for example, smaller Auto Scaling capacity for the app tier and replication for the data tier). When the primary fails, you fail over and scale out quickly.

Why this answer

Warm standby (Option B) is correct because it runs a minimal but functional version of the application in the secondary region, allowing for faster recovery than a pilot light while avoiding the cost of full production. During a failure, you scale up the standby environment to handle production traffic, meeting the requirement for relatively quick recovery without the expense of active-active.

Exam trap

The trap here is confusing 'pilot light' with 'warm standby' — candidates often think pilot light is faster because it sounds minimal, but warm standby actually provides quicker recovery by having the application already deployed and ready to scale.

How to eliminate wrong answers

Option A is wrong because a pilot light keeps only essential infrastructure (e.g., database, core services) without running the application, requiring more time to deploy and scale the application layer during a failure, which does not meet the 'relatively quick recovery' requirement. Option C is wrong because active-active runs full production in both locations at all times, which contradicts the requirement to not afford running full production in the secondary location. Option D is wrong because backup and restore relies on periodic backups and manual restoration, resulting in the slowest recovery time (hours to days) and does not provide the relatively quick recovery needed.

160

MCQmedium

A containerized service fleet running on EC2 instances needs to share user-uploaded files and access them with low latency. The workload is bursty: sometimes dozens of instances concurrently read the same directory for short periods, and then traffic drops. Which Amazon EFS configuration best matches these performance needs?

A.Use Amazon EFS General Purpose performance mode and Throughput mode set to Bursting.

B.Use Amazon EFS Max I/O performance mode with Throughput mode set to Provisioned.

C.Use Amazon EFS General Purpose performance mode with Throughput mode set to Provisioned.

D.Use Amazon EFS Max I/O performance mode with Throughput mode set to Bursting.

AnswerA

EFS General Purpose performance mode is designed for latency-sensitive use cases with a broad range of I/O sizes, including typical file-sharing and web-content workloads. Throughput mode Bursting provides baseline throughput and allows throughput to scale up during demand spikes, which matches the pattern of short read bursts from many instances. When traffic drops, the system returns to baseline without requiring you to provision peak throughput for all time.

Why this answer

Option A is correct because the workload needs low-latency access to shared files and is bursty. General Purpose performance mode offers the lowest per-operation latency, which matters for metadata-heavy operations such as many instances reading the same directory. The Bursting Throughput mode suits bursty traffic because the file system accumulates burst credits during idle periods and consumes them during high-demand spikes, matching the described pattern without paying for provisioned throughput.

Exam trap

The trap here is that candidates often assume Max I/O is always better for high concurrency, but they overlook that General Purpose mode provides lower per-operation latency for directory-heavy bursty reads, which is the actual requirement.

How to eliminate wrong answers

Option B is wrong because Max I/O performance mode is designed for highly parallelized workloads (e.g., thousands of instances) and trades higher per-operation latency for greater aggregate throughput, which is not suitable for low-latency access to the same directory. Option C is wrong because Provisioned Throughput mode is intended for steady-state throughput requirements and would waste cost on a bursty workload that could use Bursting mode's credit-based model. Option D is wrong because Max I/O performance mode is not optimal for low-latency, directory-heavy access patterns, and while Bursting mode fits the bursty nature, the combination with Max I/O undermines the low-latency requirement.

161

MCQmedium

A risk simulation workload uses CloudWatch Logs heavily. Retaining all debug logs forever is increasing costs. What should be configured? The architecture review board prefers a managed AWS-native control.

A.CloudWatch Logs retention policies per log group

B.AWS Config aggregation

C.CloudWatch detailed monitoring on all instances

D.Route 53 health checks

AnswerA

Retention policies automatically delete older logs after the required period.

Why this answer

CloudWatch Logs retention policies allow you to set per-log-group expiration rules (e.g., 30 days, 90 days) to automatically delete old log events, directly reducing storage costs for debug logs that are no longer needed. This is a managed, AWS-native control that requires no custom scripts or external tools, aligning with the architecture review board's preference.
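
A small sketch of preparing a retention setting, under the assumption that the result is passed to the CloudWatch Logs `PutRetentionPolicy` operation (in boto3, `put_retention_policy(logGroupName=..., retentionInDays=...)`). The API accepts only specific day counts; the list below is believed to match the documented values but should be checked against the current API reference. The log group name and the `retention_request` helper are hypothetical.

```python
import bisect

# Day counts accepted for retentionInDays (verify against the current API reference).
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def retention_request(log_group: str, desired_days: int) -> dict:
    """Round the desired retention up to the nearest accepted value and build the request."""
    if desired_days < 1:
        raise ValueError("retention must be at least 1 day")
    index = bisect.bisect_left(ALLOWED_RETENTION_DAYS, desired_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("retention exceeds the largest accepted value")
    return {"logGroupName": log_group, "retentionInDays": ALLOWED_RETENTION_DAYS[index]}


# Hypothetical log group; 45 days is not an accepted value, so it rounds up.
request = retention_request("/app/risk-sim/debug", 45)
print(request)
```

Rounding up rather than down avoids deleting logs earlier than the period the team asked for.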

Exam trap

The trap here is that candidates may confuse CloudWatch Logs retention policies with CloudWatch metric retention (which is fixed at 15 months and cannot be changed), leading them to overlook the simple, cost-effective per-log-group expiration setting.

How to eliminate wrong answers

Option B is wrong because AWS Config aggregation is used to consolidate configuration and compliance data from multiple accounts/regions, not to manage log retention or costs. Option C is wrong because CloudWatch detailed monitoring on all instances increases metric data collection frequency (1-minute intervals) and incurs additional costs, doing nothing to reduce log storage expenses. Option D is wrong because Route 53 health checks monitor endpoint availability and DNS routing, not log retention or cost optimization.

162

MCQmedium

A partner company needs read-only access to reports in an S3 bucket for a financial reporting platform. The partner has its own AWS account. What is the most secure scalable access pattern?

A.Create a bucket policy that grants the partner role least-privilege access to the required prefix

B.Create an IAM user in the company account and share the access keys

C.Copy the objects to a public website bucket

D.Make the objects public and rely on difficult-to-guess object names

AnswerA

A resource policy can grant cross-account access to a specific external role and prefix.

Why this answer

Option A is correct because a bucket policy can name the partner's IAM role as the principal and grant read-only access (for example, `s3:GetObject`) to a specific prefix, so the partner reaches the bucket without the company creating IAM users or sharing long-term credentials. The partner's users or workloads assume that role in their own account and receive temporary credentials; the partner account must also attach an identity policy to the role that allows the same S3 actions, because cross-account access requires an allow on both sides. The policy can be scoped to a specific prefix (e.g., `reports/`) and, if needed, tightened further with conditions such as `aws:PrincipalArn` or `aws:PrincipalOrgID`, providing both security and scalability.
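
A minimal sketch of such a bucket policy, built as a Python dictionary so it can be serialized to JSON. The bucket name, prefix, and partner role ARN are hypothetical placeholders, and `partner_read_policy` is a helper invented for this example.

```python
import json


def partner_read_policy(bucket: str, prefix: str, partner_role_arn: str) -> dict:
    """Grant one external role read-only access to objects under a single prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PartnerReadReports",
                "Effect": "Allow",
                # The principal is the partner's role, not the whole partner account.
                "Principal": {"AWS": partner_role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


# Hypothetical names used only for illustration.
policy = partner_read_policy(
    bucket="example-reports-bucket",
    prefix="reports/",
    partner_role_arn="arn:aws:iam::444455556666:role/PartnerReportReader",
)
print(json.dumps(policy, indent=2))
```

Only `s3:GetObject` is granted; listing the prefix would need a separate `s3:ListBucket` statement on the bucket ARN with an `s3:prefix` condition.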

Exam trap

The trap here is that candidates often choose Option B (sharing IAM user access keys) because it seems simpler, but the exam tests the principle of using cross-account IAM roles with bucket policies for secure, scalable, and auditable access without managing static credentials.

How to eliminate wrong answers

Option B is wrong because creating an IAM user in the company account and sharing access keys introduces long-term static credentials that must be securely rotated and managed, increasing the risk of credential leakage; it also does not scale well across multiple partners. Option C is wrong because copying objects to a public website bucket removes all access controls, making the data publicly accessible over the internet, which is insecure for financial reports and violates compliance requirements. Option D is wrong because making objects public and relying on difficult-to-guess object names (security through obscurity) provides no actual access control; any user who discovers the URL can access the data, and object names can leak through logs, shared links, or referrer headers, making this pattern highly insecure.

163

MCQmedium

A batch process uploads artifacts to an Amazon S3 bucket using multipart uploads. The bucket policy contains a statement that explicitly denies PutObject and CreateMultipartUpload unless the request uses server-side encryption with AWS KMS (SSE-KMS) and includes these request headers/parameters: x-amz-server-side-encryption=aws:kms and x-amz-server-side-encryption-aws-kms-key-id set to a specific CMK. After the process was updated, uploads intermittently fail with AccessDenied errors. Which change is the best way to make uploads succeed while still meeting the bucket policy's encryption requirement?

A.Update the IAM role policy to add s3:PutObject permissions for the bucket prefix.

B.Update the uploader so the CreateMultipartUpload request includes SSE-KMS with the required CMK key ID; any separate PutObject uploads should include the same headers.

C.Remove the bucket policy's explicit Deny statement so the IAM permissions control access.

D.Switch to client-side encryption (SSE-C) because it also encrypts data at rest in S3.

AnswerB

For multipart uploads, SSE-KMS is specified on CreateMultipartUpload rather than on individual UploadPart calls. Supplying the required SSE-KMS settings and CMK key ID on the upload initiation request satisfies the bucket policy's condition without weakening the encryption requirement.

Why this answer

Option B is correct because the bucket policy explicitly denies `PutObject` and `CreateMultipartUpload` unless the request includes both `x-amz-server-side-encryption=aws:kms` and the specific `x-amz-server-side-encryption-aws-kms-key-id` header. The intermittent failures suggest that some upload paths in the updated process send `CreateMultipartUpload` (which initiates the multipart upload) or `PutObject` without these headers, causing the explicit Deny to trigger. Once the `CreateMultipartUpload` request includes SSE-KMS with the correct CMK key ID, the subsequent `UploadPart` calls inherit the encryption settings from the initiation request, and any separate single-request `PutObject` uploads must carry the same headers for the uploads to satisfy the bucket policy and succeed.
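
The sketch below mimics the policy's condition in plain Python and shows the parameters an uploader would pass when initiating the upload, assuming boto3's `create_multipart_upload` with its `ServerSideEncryption` and `SSEKMSKeyId` arguments. The key ARN, bucket, and object key are hypothetical, and `is_denied_by_policy` is a simplified stand-in for the real policy evaluation.

```python
# Hypothetical key ARN standing in for the CMK named in the bucket policy.
REQUIRED_KEY_ID = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def is_denied_by_policy(request_headers: dict) -> bool:
    """Simplified model of the explicit Deny: reject unless both SSE-KMS headers match."""
    return not (
        request_headers.get("x-amz-server-side-encryption") == "aws:kms"
        and request_headers.get("x-amz-server-side-encryption-aws-kms-key-id") == REQUIRED_KEY_ID
    )


def create_multipart_upload_kwargs(bucket: str, key: str) -> dict:
    """Arguments for the initiation call; the SDK turns the last two into the SSE-KMS headers."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": REQUIRED_KEY_ID,
    }


missing = {}  # initiation request sent without encryption headers
correct = {
    "x-amz-server-side-encryption": "aws:kms",
    "x-amz-server-side-encryption-aws-kms-key-id": REQUIRED_KEY_ID,
}
print(is_denied_by_policy(missing), is_denied_by_policy(correct))
print(create_multipart_upload_kwargs("example-artifacts", "build/output.zip"))
```

A request that names `aws:kms` but a different key is still denied, which is why the key ID has to be set explicitly rather than left to the bucket default.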

Exam trap

The trap here is that candidates assume the encryption requirement only applies to the final object or to `PutObject` calls, but the explicit Deny in the bucket policy applies to the `CreateMultipartUpload` API call itself, which must also include the required headers to avoid AccessDenied errors.

How to eliminate wrong answers

Option A is wrong because adding `s3:PutObject` permissions to the IAM role does not override the bucket policy's explicit Deny statement; an explicit Deny in a bucket policy always takes precedence over any Allow, regardless of IAM permissions. Option C is wrong because removing the Deny statement would stop enforcing the requirement that objects be encrypted with the specific KMS key, failing the security objective. Option D is wrong because SSE-C is server-side encryption with customer-provided keys (the option mislabels it as client-side encryption) and does not use the `x-amz-server-side-encryption` or `x-amz-server-side-encryption-aws-kms-key-id` headers required by the policy; SSE-C uses different headers (such as `x-amz-server-side-encryption-customer-algorithm`) and a customer-provided key, so it would still be denied by the explicit Deny.

164

Multi-Selectmedium

A company is designing a cost-optimized serverless architecture using AWS Lambda for a data processing workload. The workload runs multiple times per day and processes files up to 500 MB. Which three design decisions will help minimize costs? (Choose three.)

Select 3 answers

A.Allocate the maximum memory (10,240 MB) to each Lambda function to reduce execution time.

B.Use Lambda Provisioned Concurrency to ensure zero cold starts, improving performance.

C.Optimize the Lambda function code to reduce execution duration, as cost is based on compute time.

D.Store processed output in Amazon S3 using the S3 Standard-Infrequent Access (S3 Standard-IA) storage class if accessed infrequently.

E.Use Amazon S3 batch operations to invoke Lambda functions in parallel, reducing overall processing time.

F.Deploy Lambda functions in multiple Regions to reduce latency for global users.

Why this answer

Optimizing the Lambda function code to reduce execution duration directly lowers costs because AWS Lambda pricing is based on the number of requests and the duration of code execution, measured in GB-seconds. By writing efficient code, you reduce the compute time, which minimizes the billable GB-seconds. Storing processed output in S3 Standard-IA is cost-effective for infrequently accessed data, as it offers lower storage costs than S3 Standard while still providing high durability and availability.

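Using S3 Batch Operations to invoke Lambda functions in parallel shortens the overall (wall-clock) processing time by spreading the workload across concurrent invocations without custom orchestration infrastructure. Billed compute remains the sum of GB-seconds across all invocations, so parallelism by itself shortens elapsed time rather than the Lambda bill.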

Exam trap

AWS often tests the misconception that more memory always reduces cost because it speeds up execution, but the trap is that cost is a product of memory and duration, so the optimal memory must be empirically determined, not set to the maximum.
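
The memory-times-duration trade-off is easy to see with arithmetic. In the sketch below, the per-GB-second and per-request prices are assumed rates for illustration, and the duration measured at each memory setting is invented; real values would come from profiling the function.

```python
PRICE_PER_GB_SECOND = 0.0000166667      # assumed rate; check current pricing
PRICE_PER_REQUEST = 0.20 / 1_000_000    # assumed rate; check current pricing


def invocation_cost(memory_mb: int, duration_seconds: float) -> float:
    """Cost of one invocation: GB-seconds consumed plus the per-request fee."""
    gb_seconds = (memory_mb / 1024) * duration_seconds
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


# Hypothetical profiling results: memory setting (MB) -> measured duration (s).
measurements = {1024: 40.0, 2048: 19.0, 4096: 11.0, 10240: 9.0}

costs = {memory: invocation_cost(memory, duration) for memory, duration in measurements.items()}
cheapest = min(costs, key=costs.get)

for memory, cost in costs.items():
    print(f"{memory:>6} MB: ${cost:.6f} per invocation")
print("cheapest setting:", cheapest, "MB")
```

In this made-up profile the speed-up flattens out above 2,048 MB, so extra memory only adds GB-seconds; the maximum setting is the most expensive despite being the fastest.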

165

MCQeasy

A production Amazon RDS database has automated backups enabled. At 10:45 UTC, an issue is discovered. The team needs to restore the database to its state as of 10:30 UTC. Which capability should they use?

A.Point-in-time restore (PITR) using automated backups to a specific timestamp.

B.Perform a Multi-AZ manual failover of the standby to recover to the earlier timestamp.

C.Promote a cross-region replication target to replace the current database with the last-known good copy.

D.Switch to a read replica to access an older view of data without restoring.

AnswerA

PITR restores an RDS DB instance to a chosen moment within the retention period for automated backups, allowing the team to roll back to 10:30 UTC.

Why this answer

Amazon RDS automated backups enable point-in-time recovery (PITR) to any second within the backup retention period, up to the latest restorable time (typically within the last five minutes), and the restore always creates a new DB instance. Since the issue was discovered at 10:45 UTC and the desired recovery point is 10:30 UTC, PITR can restore the database to that exact timestamp, provided it falls within the retention period.
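
A sketch of the request such a restore would use, assuming the RDS `RestoreDBInstanceToPointInTime` operation (in boto3, `restore_db_instance_to_point_in_time`). The instance identifiers and the calendar date are hypothetical; only the 10:30 UTC time comes from the scenario, and `pitr_request` is a helper invented for this example.

```python
from datetime import datetime, timezone


def pitr_request(source_id: str, target_id: str, restore_time: datetime) -> dict:
    """Build the restore arguments; the restore creates a new instance named target_id."""
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware to avoid local-time mistakes")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time.astimezone(timezone.utc),
    }


# Hypothetical identifiers and date; the time of day matches the scenario.
request = pitr_request(
    source_id="prod-db",
    target_id="prod-db-restored-1030",
    restore_time=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
)
print(request)
```

Because the restored database is a separate instance with its own endpoint, the application has to be repointed (or the instances renamed) once the data is verified.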

Exam trap

The trap here is confusing Multi-AZ failover or read replicas with point-in-time recovery capabilities, leading candidates to think failover or replica promotion can roll back to a specific past state when they only provide high availability or read scaling.

How to eliminate wrong answers

Option B is wrong because Multi-AZ failover switches to a standby that is synchronously replicated from the primary; it can only bring up the current state of the data, not an earlier point in time. Option C is wrong because cross-region replication (e.g., using a read replica in another region) replicates data asynchronously and cannot be used to restore to a specific past timestamp; promoting it would give you a copy that lags the primary by an unpredictable amount, not the state at 10:30 UTC. Option D is wrong because a read replica provides a live, near-real-time copy of the primary database and does not retain historical snapshots or allow accessing an older view of data without a full restore.
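As a rough illustration of the restore in option A, the sketch below builds the parameters for a point-in-time restore. The instance identifiers and the calendar date are hypothetical (the question gives only the time of day); the parameter names follow the RDS `RestoreDBInstanceToPointInTime` API.

```python
from datetime import datetime, timezone


def build_pitr_request(source_id: str, target_id: str, restore_time: datetime) -> dict:
    """Build keyword arguments for an RDS point-in-time restore."""
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware")
    if source_id == target_id:
        # PITR always creates a new DB instance; it never overwrites the source.
        raise ValueError("target must differ from source")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time.astimezone(timezone.utc),
    }


# Hypothetical identifiers and date; 10:30 UTC comes from the scenario.
request = build_pitr_request(
    "prod-db",
    "prod-db-restored-1030",
    datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
)
print(request)

# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("rds").restore_db_instance_to_point_in_time(**request)
```

Because the restore lands in a new instance, the application still has to be repointed (or the instances renamed) afterwards.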

166

MCQeasy

A customer-facing application has a relational data model and needs frequent complex queries (joins and aggregations), but it also experiences a significant read-heavy workload. Which design choice best improves read performance while keeping relational features?

A.Use DynamoDB with a single partition key and avoid indexes to keep writes simple.

B.Add read replicas to an RDS or Aurora cluster and keep the primary for writes.

C.Store the data in S3 and query it directly from the application without a database.

D.Switch the database to DynamoDB but keep using the same relational SQL queries and joins.

AnswerB

Read replicas offload read operations from the primary database instance, improving read throughput and reducing contention with writes. RDS/Aurora preserve relational capabilities like joins and SQL queries. This is a common and practical way to scale performance for read-heavy workloads without completely changing the data model.

Why this answer

Adding read replicas to an RDS or Aurora cluster offloads read traffic from the primary instance, improving read performance for complex queries (joins and aggregations) while preserving the full relational data model and SQL capabilities. Aurora’s distributed storage layer also allows replicas to serve reads with minimal replication lag, making this the optimal choice for read-heavy workloads that require relational features.

Exam trap

The trap here is that candidates often assume NoSQL (DynamoDB) is always the best choice for read-heavy workloads, overlooking that complex relational queries and joins are not supported, making read replicas on RDS/Aurora the correct relational scaling solution.

How to eliminate wrong answers

Option A is wrong because DynamoDB with a single partition key and no indexes would severely limit query flexibility, and DynamoDB does not natively support the joins and aggregations the application needs. Option C is wrong because S3 is an object store with no native support for relational queries, joins, or aggregations; querying it directly would require scanning entire datasets and implementing complex logic in the application, leading to poor performance and high latency. Option D is wrong because DynamoDB does not support SQL joins or complex relational queries; attempting to use the same SQL queries would fail or require significant application-level re-engineering, defeating the purpose of keeping relational features.
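To make the read-replica pattern in option B concrete, here is a minimal application-side sketch that sends writes to the primary and rotates reads across replicas. The endpoint names are hypothetical, and the `SELECT` prefix check is a deliberate simplification rather than a full SQL classifier.

```python
import itertools


class ReadWriteRouter:
    """Route read statements to replicas and everything else to the writer."""

    def __init__(self, writer: str, readers: list):
        self.writer = writer
        # Round-robin over replicas; fall back to the writer if none exist.
        self._readers = itertools.cycle(readers) if readers else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._readers is not None:
            return next(self._readers)
        return self.writer


# Hypothetical endpoint names.
router = ReadWriteRouter("primary.example.internal",
                         ["replica-1.example.internal", "replica-2.example.internal"])
print(router.endpoint_for("SELECT o.id, SUM(i.price) FROM orders o JOIN items i ON i.order_id = o.id GROUP BY o.id"))
print(router.endpoint_for("INSERT INTO orders (id) VALUES (1)"))
```

With Aurora, the cluster's reader endpoint can take over the rotation, so the application may only need to distinguish one writer endpoint from one reader endpoint.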

167

MCQmedium

A team serves static content (JavaScript, CSS, images) from S3 through CloudFront. After a recent release, CloudFront reports a low cache hit ratio and the S3 origin receives a much higher request rate. The site still works, but billing shows higher origin and data transfer costs. Which change is most likely to improve cache hit ratio and reduce origin load?

A.Configure a CloudFront cache policy (or update HTTP cache-control headers) to increase TTLs for versioned static assets and enable compression for text assets.

B.Disable CloudFront access logging so fewer requests are recorded and billing decreases automatically.

C.Set the distribution’s origin to use S3 Transfer Acceleration to reduce the number of requests hitting S3.

D.Force CloudFront to forward query strings to the origin for all static content so the latest versions are always fetched.

AnswerA

CloudFront cache hit ratio improves when objects are cacheable for longer and requests can be served from edge caches. Proper TTLs for versioned assets prevent unnecessary revalidation. Compression reduces payload size for eligible content types, lowering transfer costs.

Why this answer

Option A is correct because increasing TTLs for versioned static assets via a CloudFront cache policy or HTTP Cache-Control headers ensures that CloudFront caches these immutable objects for longer periods, reducing the number of requests forwarded to the S3 origin. Enabling compression for text assets shrinks the responses CloudFront sends to viewers, which lowers data transfer costs. Together these changes address the low cache hit ratio, the high origin request rate, and the higher transfer bill described in the scenario.

Exam trap

The trap here is that candidates may think forwarding query strings (Option D) ensures freshness, but it actually fragments the cache for static assets, while the real solution is to use versioned filenames and increase TTLs to maximize edge caching.

How to eliminate wrong answers

Option B is wrong because disabling CloudFront access logging does not affect cache hit ratio or origin request rate; it only stops the generation of log files, which does not reduce billing for data transfer or origin requests. Option C is wrong because S3 Transfer Acceleration is designed to speed up uploads over long distances by using AWS edge locations, but it does not reduce the number of requests hitting S3; it actually adds a network hop and does not improve cache hit ratio. Option D is wrong because forwarding query strings for all static content makes them part of the cache key, so the same file is cached and fetched separately for every query-string variation; this lowers the cache hit ratio and increases origin load, which is the opposite of the desired outcome.
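The effect of query strings on the cache key can be shown with a toy model. This sketch ignores TTLs, eviction, and multiple edge locations; the URLs and the `session` parameter are hypothetical.

```python
from urllib.parse import urlsplit


def hit_ratio(urls, include_query_in_key: bool) -> float:
    """Fraction of requests served from a single, never-expiring toy cache."""
    cache = set()
    hits = 0
    for url in urls:
        parts = urlsplit(url)
        key = parts.path
        if include_query_in_key and parts.query:
            key += "?" + parts.query
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: this request would go to the origin
    return hits / len(urls)


# Four requests for the same versioned file, each with a different query string.
requests = [
    "/static/app.3f9a.js?session=1",
    "/static/app.3f9a.js?session=2",
    "/static/app.3f9a.js?session=3",
    "/static/app.3f9a.js?session=4",
]

print(hit_ratio(requests, include_query_in_key=True))
print(hit_ratio(requests, include_query_in_key=False))
```

When the query string is part of the key, every request is a miss; when it is excluded, only the first request reaches the origin.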

168

MCQhard

Based on the exhibit, an automation pipeline in several member accounts creates IAM roles for application deployments. Security says no future role may exceed the approved boundary arn:aws:iam::123456789012:policy/DeployBoundary, even if someone later attaches AdministratorAccess. What should you implement to enforce this across the organization?

A.Attach DeployBoundary to the automation role only, because that automatically forces every created role to inherit the same boundary.

B.Create an SCP that denies iam:CreateRole and iam:PutRolePermissionsBoundary unless aws:RequestTag equals DeployBoundary.

C.Create an SCP that denies iam:CreateRole unless iam:PermissionsBoundary equals arn:aws:iam::123456789012:policy/DeployBoundary, and also deny removing that boundary from created roles.

D.Use AWS Access Analyzer to automatically attach the approved boundary whenever a role is created without one.

AnswerC

This is the strongest organization-wide enforcement. The SCP prevents role creation unless the approved permissions boundary is attached, and it can also prevent boundary removal later. That ensures the maximum effective permissions for all created roles remain capped, even if someone attaches a broader identity policy afterward.

Why this answer

Option C is correct because it uses an SCP to enforce that any IAM role creation must include the specific permissions boundary `arn:aws:iam::123456789012:policy/DeployBoundary`, and also prevents removal or modification of that boundary from existing roles. This ensures that even if an attacker or administrator later attaches a policy like AdministratorAccess, the effective permissions are still limited by the boundary, meeting the security requirement across all member accounts in the organization.

Exam trap

The trap here is confusing the condition key `aws:RequestTag` (used for tagging) with `iam:PermissionsBoundary` (the actual boundary ARN), leading candidates to pick Option B, which would not enforce the boundary requirement.

How to eliminate wrong answers

Option A is wrong because attaching a permissions boundary to the automation role does not automatically propagate that boundary to roles created by that role; each role must have its own boundary explicitly set. Option B is wrong because it uses `aws:RequestTag` to match the boundary, but permissions boundaries are not tags; the correct condition key is `iam:PermissionsBoundary`, not a request tag. Option D is wrong because AWS Access Analyzer is a tool for analyzing resource policies and identifying unintended access, not for automatically attaching permissions boundaries to roles.
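A minimal sketch of what such an SCP might look like, generated as JSON from Python. The statement IDs are hypothetical, `Resource` is left as `*` for brevity, and including `iam:PutRolePermissionsBoundary` in the first statement (to stop the boundary being swapped for a different one) is an assumption that goes slightly beyond the wording of option C.

```python
import json

BOUNDARY_ARN = "arn:aws:iam::123456789012:policy/DeployBoundary"

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Deny creating a role, or setting a boundary, unless the boundary
            # in the request is the approved one. StringNotEquals also matches
            # when no boundary is supplied at all.
            "Sid": "RequireApprovedBoundary",
            "Effect": "Deny",
            "Action": ["iam:CreateRole", "iam:PutRolePermissionsBoundary"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}
            },
        },
        {
            # Deny removing the boundary from a role after creation.
            "Sid": "DenyBoundaryRemoval",
            "Effect": "Deny",
            "Action": "iam:DeleteRolePermissionsBoundary",
            "Resource": "*",
        },
    ],
}

print(json.dumps(scp, indent=2))
```

In practice the policy would usually be scoped to specific role paths and would exempt a break-glass administrative role; both are omitted here.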

169

Multi-Selectmedium

A marketing site serves versioned JavaScript and CSS files from Amazon S3 through CloudFront. The origin bill is rising because CloudFront keeps fetching the same files too often, and the application never changes a file at the same URL once it is published. Which two changes should you make? Select two.

Select 2 answers

A.Set long-lived Cache-Control headers, such as a high max-age and immutable policy, on the versioned assets.

B.Configure the CloudFront cache policy to avoid forwarding unnecessary query strings, headers, and cookies.

C.Move the static assets to an EC2 web server behind an Application Load Balancer.

D.Disable CloudFront caching so every request always reaches the origin.

E.Add more viewer-facing headers to the cache key so each browser variation gets a unique cached object.

AnswersA, B

Versioned assets are ideal for long cache lifetimes because their URLs change when the content changes. Strong Cache-Control headers let CloudFront serve more requests from edge locations instead of repeatedly fetching the same files from S3.

Why this answer

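Option A is correct because setting long-lived Cache-Control headers (e.g., `max-age=31536000` and `immutable`) on versioned assets tells CloudFront and browsers to cache the files aggressively. Since the application never changes a file at the same URL, this eliminates redundant origin fetches, directly reducing the origin bill. Option B is correct because every query string, header, or cookie that CloudFront forwards becomes part of the cache key; removing the ones the origin does not need lets many more requests map to the same cached object.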

Exam trap

The trap here is that candidates may think disabling caching (Option D) or adding more cache key variations (Option E) will improve performance, but both increase origin load and costs, while the correct approach is to leverage versioned URLs with aggressive caching headers.
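One way to apply option A at upload time is to choose the Cache-Control value from the object key. This is a sketch under assumptions: the naming scheme (a hex content hash before the extension) and the `no-cache` fallback for unversioned files are illustrative choices, not something the scenario specifies.

```python
import re

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31536000

# Hypothetical naming scheme: a hex content hash before the extension,
# for example "static/app.3f9a1c.js".
VERSIONED = re.compile(r"\.[0-9a-f]{6,}\.(js|css)$")


def cache_control_for(key: str) -> str:
    """Return a Cache-Control header value for an S3 object key."""
    if VERSIONED.search(key):
        # The URL changes whenever the content changes, so cache aggressively.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    # Assumed fallback: unversioned entry points must be revalidated.
    return "no-cache"


print(cache_control_for("static/app.3f9a1c.js"))
print(cache_control_for("index.html"))
```

The returned string would be set as the object's `Cache-Control` metadata when the file is uploaded to S3.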

170

Multi-Selectmedium

A solutions architect is designing a highly available and resilient architecture for a critical internal application that processes financial transactions. The application runs on Amazon EC2 instances inside an Auto Scaling group. The database layer uses an Amazon Aurora MySQL cluster. The company requires that if an entire AWS Availability Zone (AZ) fails, the application must remain operational with minimal impact and automatically recover without manual intervention. Which combination of architectural decisions will meet these requirements? (Choose four.)

Select 4 answers

A.Configure the Auto Scaling group to span at least three Availability Zones in the same AWS Region.

B.Deploy the Aurora cluster with a single DB instance to reduce complexity and cost.

C.Configure the Aurora cluster to include at least one Aurora Replica in a different Availability Zone than the primary instance.

D.Use an Application Load Balancer (ALB) to distribute traffic across EC2 instances in multiple Availability Zones.

E.Place the EC2 instances in a single Availability Zone to ensure data locality with the primary database.

F.Set up an Amazon RDS Proxy to manage database connections and provide connection pooling for improved resilience.

Why this answer

Configuring the Auto Scaling group to span at least three Availability Zones ensures that if one AZ fails, the remaining AZs can carry the load (provided they are sized for it), and the Auto Scaling group can automatically launch new instances in the healthy AZs. Deploying the Aurora cluster with at least one Aurora Replica in a different AZ than the primary instance provides automatic failover to that replica, typically in well under a minute, ensuring database resilience without manual intervention. Using an Application Load Balancer (ALB) to distribute traffic across EC2 instances in multiple AZs allows the ALB to automatically route traffic away from failed AZs and only to healthy targets, maintaining application availability.

Setting up an Amazon RDS Proxy manages database connections by pooling and reusing them, which reduces the load on the database during failover and improves resilience by providing seamless connection handling across AZ failures.

Exam trap

The trap here is that candidates often think a single Aurora instance with multi-AZ storage is sufficient, but without an Aurora Replica in a different AZ there is no existing instance to fail over to, so Aurora must create a replacement instance and recovery takes considerably longer; similarly, they may assume that placing all EC2 instances in one AZ simplifies data locality, but this sacrifices availability for a false sense of performance optimization.
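The capacity side of the multi-AZ decision can be checked with simple arithmetic. This sketch assumes instances are spread evenly and that the application needs a fixed number of instances to carry peak load; the figure of six instances is hypothetical.

```python
import math


def instances_per_az(required: int, az_count: int) -> int:
    """Instances to run in each AZ so that losing one AZ still leaves `required`."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive an AZ failure")
    return math.ceil(required / (az_count - 1))


required = 6  # hypothetical number of instances needed at peak
for azs in (2, 3):
    per_az = instances_per_az(required, azs)
    print(f"{azs} AZs: {per_az} per AZ, {per_az * azs} total")
```

Spreading across three AZs instead of two lowers the total that must be kept running to tolerate one AZ failure (9 instead of 12 in this example), which is one reason three AZs are commonly recommended.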

171

MCQmedium

Your public API is hosted in two regions. You want Route 53 to automatically send traffic to the secondary region when the primary region’s endpoint fails. The primary API health check is returning failure codes, but clients still reach the primary region for several minutes. Which Route 53 configuration most directly addresses this behavior?

A.Use a single Alias A record with simple routing and a short TTL so Route 53 quickly changes the IP address.

B.Use Route 53 failover routing with a primary record and a secondary record, each associated with its own health check, so Route 53 answers with the healthy region.

C.Use weighted routing to send a small percentage of traffic to the secondary region, increasing it manually when the primary fails.

D.Use latency routing only, letting Route 53 choose the lowest-latency region at query time, without health checks.

AnswerB

Failover routing is designed for this: Route 53 evaluates health checks and returns the primary record while it is healthy. When the primary health check fails, Route 53 automatically returns the secondary record. Note that clients may still see traffic for a few minutes due to DNS caching, but failover routing is the configuration that enables automatic region switching.

Why this answer

Option B is correct because Route 53 failover routing with health checks on both primary and secondary records ensures that when the primary health check fails, Route 53 stops returning the primary record's IP and instead returns the secondary record's IP. This directly addresses the observed behavior where clients still reach the primary region for several minutes. The likely cause is that the health check was never associated with the record, or that a simple routing policy was used without health check integration, so Route 53 keeps answering with the primary record regardless of endpoint health.

Exam trap

The trap here is that candidates assume a short TTL alone (Option A) is sufficient for fast failover, but without health checks, Route 53 has no mechanism to detect endpoint failure and will continue returning the primary record until the TTL expires and the record is manually updated, causing the observed delay.

How to eliminate wrong answers

Option A is wrong because simple routing with a short TTL does not incorporate health checks; Route 53 will continue to return the primary record's IP even if the endpoint is unhealthy, and clients will still reach the failing region until the TTL expires and the record is manually updated. Option C is wrong because weighted routing requires manual intervention to adjust weights when the primary fails, which does not provide automatic failover and can still result in clients reaching the unhealthy primary region. Option D is wrong because latency routing without health checks will continue to return the primary region's IP if it has the lowest latency, even when the primary endpoint is returning failure codes, so clients will still be directed to the failing region.
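Even with failover routing in place, switching is not instantaneous. A rough upper-bound estimate adds the time for the health check to declare failure to the time cached DNS answers take to expire. This is a simplified sketch: the interval, threshold, and TTL values are example settings, and real resolvers may behave differently.

```python
def approx_failover_seconds(request_interval_s: int,
                            failure_threshold: int,
                            record_ttl_s: int) -> int:
    """Rough worst-case time before clients stop reaching a failed primary."""
    # Consecutive failed checks needed before the endpoint is marked unhealthy.
    detection = request_interval_s * failure_threshold
    # Clients and resolvers may keep the old answer until the TTL runs out.
    return detection + record_ttl_s


print(approx_failover_seconds(30, 3, 300))  # slower checks, long TTL
print(approx_failover_seconds(10, 3, 60))   # faster checks, short TTL
```

This is why a short TTL helps only in combination with health-checked failover records: the TTL bounds the caching delay, but something still has to change the answer.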

172

MCQeasy

A company serves private images stored in S3 through Amazon CloudFront. Only authenticated users should be able to access each image, and access should expire after 1 hour. Which CloudFront feature best meets this requirement?

A.Signed URLs or signed cookies with an expiration time of 1 hour

B.A WAF rule that blocks requests without valid JWTs, without using signed URLs

C.Turning on S3 bucket public access block, without any CloudFront viewer authentication

D.Enabling CloudFront geo restriction to allow only one country

AnswerA

Signed URLs/cookies provide cryptographic, edge-enforced authorization for specific CloudFront resources and include an expiration timestamp. After expiry, CloudFront rejects requests (for example, with 403) without needing the origin to handle time-based authorization.

Why this answer

Signed URLs or signed cookies allow CloudFront to grant temporary access to private content by embedding authentication information (policy, signature, key pair ID) directly in the request. By setting an expiration time of 1 hour in the policy statement, access automatically becomes invalid after that period, meeting both the authentication and expiry requirements without exposing the S3 bucket publicly.

Exam trap

The trap here is that candidates often confuse CloudFront signed URLs with S3 pre-signed URLs, but S3 pre-signed URLs work at the S3 bucket level and do not leverage CloudFront's edge caching or origin access control, whereas CloudFront signed URLs are the correct feature for controlling access at the CDN edge with expiration.

How to eliminate wrong answers

Option B is wrong because AWS WAF rules alone cannot validate JWTs or enforce CloudFront signed URL authentication; WAF operates at the HTTP request layer and does not have native capability to verify CloudFront private content signatures. Option C is wrong because S3 Block Public Access only keeps the bucket itself from being exposed publicly; without any CloudFront viewer authentication, anyone who can reach the distribution can still fetch the images, and nothing enforces the 1-hour expiry. Option D is wrong because geo restriction only limits access based on geographic location, not user identity, and does not provide any authentication or time-based expiration of access.
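The 1-hour expiry lives in the policy statement that gets signed. The sketch below builds only that policy document; the resource URL is hypothetical, and the signing step (an RSA signature made with the private key of a CloudFront key pair) is deliberately left out because it needs a cryptography library and a real key.

```python
import json
from datetime import datetime, timedelta, timezone


def build_policy(resource_url: str, issued_at: datetime, lifetime: timedelta) -> str:
    """Return a CloudFront policy statement that expires after `lifetime`."""
    expires_epoch = int((issued_at + lifetime).timestamp())
    policy = {
        "Statement": [
            {
                "Resource": resource_url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    # Compact separators: the policy is signed byte-for-byte, so avoid stray whitespace.
    return json.dumps(policy, separators=(",", ":"))


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical issue time
policy_json = build_policy(
    "https://d111111abcdef8.cloudfront.net/images/photo.jpg",  # hypothetical URL
    issued,
    timedelta(hours=1),
)
print(policy_json)
```

After the `AWS:EpochTime` value passes, CloudFront rejects the request at the edge, so the origin never has to evaluate expiry itself.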

173

MCQeasy

A team runs an Amazon NLB in a VPC with targets registered in multiple Availability Zones (AZs). Their bill shows high inter-AZ data transfer charges. They want to reduce unnecessary cross-AZ traffic costs while still maintaining healthy targets per AZ. What change is most likely to reduce inter-AZ charges?

A.Disable cross-zone load balancing on the NLB so each client is routed to targets in the same AZ when possible.

B.Enable cross-zone load balancing so all targets receive traffic from every AZ.

C.Move the NLB to a different Region so traffic is always kept local.

D.Replace the NLB with a NAT gateway to reduce data charges between AZs.

AnswerA

Disabling cross-zone load balancing helps keep traffic within the same AZ, reducing inter-AZ data transfer charges.

Why this answer

Option A is correct because disabling cross-zone load balancing on an NLB ensures that each client is routed only to targets within the same Availability Zone as the NLB node that receives the traffic. This avoids inter-AZ data transfer between the load balancer nodes and their targets because that traffic stays inside the AZ, while still maintaining healthy targets per AZ as each AZ independently handles its own client requests.

Exam trap

The trap here is that candidates often assume enabling cross-zone load balancing always reduces costs or improves performance, but in reality it increases inter-AZ data transfer charges, and the question specifically asks for cost reduction, not performance optimization.

How to eliminate wrong answers

Option B is wrong because enabling cross-zone load balancing distributes traffic across all targets in all AZs, which increases inter-AZ data transfer and raises costs, the opposite of the desired outcome. Option C is wrong because moving the NLB to a different Region does not reduce inter-AZ charges within the original VPC; it introduces cross-Region data transfer costs and latency, which are typically higher than inter-AZ charges. Option D is wrong because a NAT gateway is used for outbound internet traffic from private subnets, not for load balancing inbound traffic, and it does not reduce inter-AZ data transfer charges; in fact, NAT gateways themselves incur inter-AZ charges if used across AZs.
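A back-of-the-envelope model shows why cross-zone load balancing drives inter-AZ traffic. It assumes clients reach each AZ's load balancer node equally and that a node spreads traffic evenly over the targets it can use; the AZ names and target counts are hypothetical.

```python
def cross_az_fraction(targets_per_az: dict, cross_zone_enabled: bool) -> float:
    """Estimated share of node-to-target traffic that crosses an AZ boundary."""
    if not cross_zone_enabled:
        # Each node only uses targets in its own AZ.
        return 0.0
    total = sum(targets_per_az.values())
    # For a node in a given AZ, every target outside that AZ is a cross-AZ hop.
    per_node = [(total - local) / total for local in targets_per_az.values()]
    return sum(per_node) / len(per_node)


targets = {"az-a": 2, "az-b": 2, "az-c": 2}  # hypothetical layout
print(cross_az_fraction(targets, cross_zone_enabled=True))
print(cross_az_fraction(targets, cross_zone_enabled=False))
```

With three evenly populated AZs, roughly two thirds of node-to-target traffic crosses an AZ boundary when cross-zone balancing is on.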

174

MCQeasy

Based on the exhibit, some SQS messages fail validation repeatedly and continue consuming worker time. What change best prevents the bad messages from being retried forever?

A.Increase the visibility timeout so each message has more time to finish processing.

B.Configure a dead-letter queue and a redrive policy for messages that exceed the retry limit.

C.Replace the queue with an Amazon SNS topic so failed messages will not be retried.

D.Increase the number of workers so the queue drains faster during peak load.

AnswerB

A dead-letter queue captures messages that fail repeatedly after a defined receive count. The main queue can keep processing healthy messages, while the poison messages are isolated for later inspection and remediation.

Why this answer

A dead-letter queue (DLQ) with a redrive policy moves any message that has been received more times than the policy's `maxReceiveCount` allows to a separate queue for analysis or manual handling. This prevents the same invalid message from being repeatedly processed by workers, freeing up compute resources and avoiding infinite retry loops.

Exam trap

The trap here is that candidates may think increasing the visibility timeout or adding more workers will solve the retry problem, but neither addresses the root cause of a message that will always fail validation.

How to eliminate wrong answers

Option A is wrong because increasing the visibility timeout only gives workers more time to process a message before it becomes visible again; it does not stop a failing message from being retried indefinitely. Option C is wrong because Amazon SNS is a pub/sub service that pushes each message to its subscribers rather than holding it in a queue for workers to poll; replacing the queue removes the buffering the workers rely on and does nothing to isolate messages that always fail validation. Option D is wrong because adding more workers only increases throughput for valid messages but does not prevent the same invalid message from being retried forever; the bad message will still consume worker time on every retry.
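The redrive policy in option B is a small JSON document stored as a queue attribute. A minimal sketch, assuming a hypothetical DLQ ARN and a retry limit of five receives:

```python
import json


def build_redrive_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Build the SQS queue attribute that sends poison messages to a DLQ."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many receives without deletion, SQS moves the message.
        "maxReceiveCount": str(max_receive_count),
    }
    # The attribute value itself must be a JSON string.
    return {"RedrivePolicy": json.dumps(redrive_policy)}


attributes = build_redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq",  # hypothetical ARN
    5,
)
print(attributes)

# With boto3 installed, these attributes could be applied with:
#   boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=attributes)
```

Choosing the count is a judgment call: too low and transient failures land in the DLQ, too high and poison messages keep consuming worker time.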

175

MCQmedium

A patient portal receives bursts of orders that sometimes overwhelm a downstream fulfilment service. The architecture must absorb spikes and retry processing without losing requests. Which service should be placed between the web tier and fulfilment workers? The architecture review board prefers a managed AWS-native control.

A.AWS WAF

B.Amazon CloudFront

C.Amazon SQS queue

D.Amazon Route 53 weighted routing

AnswerC

SQS decouples producers and consumers, buffers bursts, and supports retries through visibility timeout and dead-letter queues.

Why this answer

Amazon SQS is the correct choice because it acts as a decoupling buffer between the web tier and the fulfilment workers. It can absorb sudden bursts of orders by storing messages durably, and workers can poll the queue at their own pace, retrying failed processing without losing any requests. This aligns with the requirement for a managed AWS-native service that handles spikes and retries.

Exam trap

The trap here is that candidates may confuse buffering and decoupling with services like CloudFront (caching) or Route 53 (traffic routing), failing to recognize that SQS is the only option listed that is designed for asynchronous message queuing and retry logic.

How to eliminate wrong answers

Option A is wrong because AWS WAF is a web application firewall that filters HTTP/S traffic based on rules, not a message queue for buffering and retrying requests. Option B is wrong because Amazon CloudFront is a content delivery network (CDN) that caches and accelerates static/dynamic content delivery, not a service for decoupling and buffering asynchronous workloads. Option D is wrong because Amazon Route 53 weighted routing is a DNS routing policy for distributing traffic across endpoints, not a message queuing or buffering service.
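The buffering behavior can be illustrated with an in-memory stand-in for the queue. This is a toy simulation, not SQS itself: the arrival pattern and worker capacity are made-up numbers, and visibility timeouts and retries are not modeled.

```python
from collections import deque


def simulate(arrivals_per_tick, worker_capacity):
    """Return (processed count, backlog after each tick) for a simple queue."""
    queue = deque()
    processed = 0
    backlog = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # The web tier enqueues every order immediately, regardless of load.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # Workers drain at their own fixed pace.
        for _ in range(min(worker_capacity, len(queue))):
            queue.popleft()
            processed += 1
        backlog.append(len(queue))
    return processed, backlog


processed, backlog = simulate([10, 0, 0, 0], worker_capacity=3)
print(processed, backlog)
```

The burst of ten orders exceeds what the workers can handle in one tick, yet nothing is dropped: the backlog shrinks over the following ticks until every order is processed.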

176

MCQhard

An EC2 instance in a private subnet must access an S3 bucket that contains regulated exports for a customer analytics portal. The security team requires access to be allowed only when traffic comes through a specific VPC endpoint. What should the architect add to the bucket policy?

A.A security group rule that allows HTTPS to S3

B.A condition that matches aws:RequestedRegion to the bucket Region

C.A deny statement for all IAM users except the EC2 role

D.A condition that matches aws:sourceVpce to the endpoint ID

AnswerD

The aws:sourceVpce condition restricts S3 access to requests that arrive through the specified VPC endpoint.

Why this answer

Option D is correct because the bucket policy can use the `aws:sourceVpce` condition key to restrict access exclusively to traffic originating from a specific VPC endpoint (a gateway or interface endpoint for S3). This ensures that only requests sent through that VPC endpoint are allowed, meeting the security team's requirement for regulated exports.

Exam trap

The trap here is that candidates often confuse `aws:sourceVpce` with `aws:SourceIp` or `aws:sourceVpc`, thinking they can restrict by VPC ID or IP range, but only the VPC endpoint ID uniquely identifies the specific endpoint used for the request.

How to eliminate wrong answers

Option A is wrong because security group rules are attached to network interfaces such as those of EC2 instances, not to S3 buckets, and bucket policies cannot reference security groups; a security group rule also cannot express which VPC endpoint a request used. Option B is wrong because `aws:RequestedRegion` restricts the AWS Region in which the request is made, not the network path or VPC endpoint, so it does not enforce that traffic comes through a specific VPC endpoint. Option C is wrong because denying all IAM users except the EC2 role would not restrict traffic to a specific VPC endpoint; it would only control which IAM identities can access the bucket, not the network path.
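A bucket policy using the condition from option D might look like the following sketch. The bucket name, endpoint ID, and statement ID are hypothetical.

```python
import json

BUCKET = "example-regulated-exports"  # hypothetical bucket name
VPCE_ID = "vpce-0123456789abcdef0"    # hypothetical VPC endpoint ID

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Deny every request that did not arrive through the approved endpoint.
            "Sid": "DenyUnlessThroughApprovedEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:sourceVpce": VPCE_ID}},
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
```

A broad deny like this also blocks console and administrative access from outside the endpoint, so it is worth testing carefully before applying it to a production bucket.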

177

MCQmedium

A global mobile game backend serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The design must avoid adding custom operational scripts.

A.RDS read replicas

B.Amazon CloudFront distribution with the S3 bucket as origin

C.A larger S3 bucket

D.An EC2 Auto Scaling group in one Region

AnswerB

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a content delivery network (CDN) that caches static content at edge locations worldwide, reducing latency for users in distant countries by serving files from the nearest edge. Using the S3 bucket as the origin, CloudFront distributes the content globally without requiring any custom operational scripts, directly addressing the slow load times for static assets.

Exam trap

The trap here is that candidates may confuse database read replicas (Option A) with content delivery, not realizing that static asset acceleration requires a CDN like CloudFront, not a database scaling solution.

How to eliminate wrong answers

Option A is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3. Option C is wrong because increasing the S3 bucket size does not improve performance; S3 performance is independent of bucket size and does not reduce latency for distant users. Option D is wrong because an EC2 Auto Scaling group in a single Region does not provide global edge caching; it only scales compute capacity within one geographic area, failing to reduce latency for users in other regions.

178

Multi-Selectmedium

A company runs a customer portal in us-east-1 and a warm standby in us-west-2. The DNS name must send users to us-east-1 while it is healthy and automatically switch to us-west-2 if the primary application endpoint stops responding. Which two actions should the architect take? Select two.

Select 2 answers

A.Create Route 53 failover records for the same DNS name with primary and secondary targets.

B.Use latency-based routing so Route 53 always returns the lowest-latency Region.

C.Associate a Route 53 health check with the primary endpoint that monitors application availability.

D.Use weighted records with a 50/50 split to balance traffic across both Regions.

E.Place the standby application in a private hosted zone so only internal systems can resolve it.

AnswersA, C

Failover routing is the Route 53 feature built for primary-to-secondary traffic shifting. It lets DNS answer with the primary target while health checks pass, then returns the secondary target when the primary becomes unhealthy.

Why this answer

Option A is correct because Route 53 failover routing allows you to configure active-passive failover by associating a primary and secondary record with the same DNS name. When the primary endpoint fails, Route 53 automatically returns the secondary record's IP address, directing traffic to the warm standby in us-west-2. This directly meets the requirement for automatic failover based on endpoint health. Option C is correct because the failover decision depends on a health check: associating one with the primary record is what tells Route 53 that the application endpoint has stopped responding.

Exam trap

The trap here is that candidates often confuse failover routing with latency-based or weighted routing, assuming any routing policy that distributes traffic can handle failover, but only failover routing with an associated health check provides automatic, health-based switching between a primary and secondary endpoint.
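The two correct actions combine into a pair of record sets that share one DNS name. The sketch below builds the change batch only; the domain, IP addresses (taken from documentation ranges), set identifiers, and health check ID are all hypothetical.

```python
def failover_record(name, role, ip, set_id, health_check_id=None, ttl=60):
    """Build one UPSERT change for a Route 53 failover record."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Changes": [
        failover_record("portal.example.com.", "PRIMARY", "192.0.2.10",
                        "us-east-1", health_check_id="hc-primary-example"),
        failover_record("portal.example.com.", "SECONDARY", "198.51.100.10",
                        "us-west-2"),
    ]
}
print(change_batch)

# With boto3 installed, this could be submitted with:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=zone_id, ChangeBatch=change_batch)
```

Both records carry the same name and type; only the `Failover` role, the set identifier, and the health check association differ.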

179

Multi-Selecthard

A financial reporting platform uses CloudFront in front of an S3 origin. Which two settings help keep users from bypassing CloudFront and accessing the bucket directly?

Select 2 answers

A.Use an S3 bucket policy that allows access only from the CloudFront distribution

B.Enable CloudFront standard logging

C.Enable S3 static website hosting

D.Configure Origin Access Control for the S3 origin

AnswersA, D

The bucket policy should trust the CloudFront distribution and deny direct public access.

Why this answer

Option A is correct because an S3 bucket policy can limit access to the CloudFront distribution's origin access identity (OAI) or origin access control (OAC) and reject everything else. This ensures that direct requests to the S3 bucket URL are rejected, forcing all traffic through CloudFront. The policy uses a condition like `aws:SourceArn` or `aws:SourceAccount` to restrict access to the CloudFront distribution's ARN, preventing bypass. Option D is correct because Origin Access Control makes CloudFront sign its requests to S3, which lets the bucket stay private while still serving content through the distribution.

Exam trap

The trap here is that candidates often think enabling S3 static website hosting or logging provides security, but these features actually create additional access points or only provide visibility, not access control.
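With Origin Access Control, the bucket policy typically trusts the CloudFront service principal and pins it to one distribution. A minimal sketch, with a hypothetical bucket name, account ID, and distribution ID:

```python
import json

BUCKET = "example-financial-reports"  # hypothetical bucket name
DISTRIBUTION_ARN = (
    "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"  # hypothetical
)

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            # Only requests made on behalf of this one distribution are allowed.
            "Condition": {"StringEquals": {"AWS:SourceArn": DISTRIBUTION_ARN}},
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
```

Because nothing else is allowed and the bucket is not public, a request sent straight to the S3 URL has no matching allow statement and is rejected.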

180

MCQeasy

A team runs an Amazon RDS for MySQL database in a single Availability Zone. They want automatic failover with minimal downtime if the primary database instance becomes unavailable. Automated backups are already enabled. Which configuration change best meets the requirement?

A.Keep the deployment as single-AZ, but increase automated backup retention to 35 days.

B.Create a read replica in another Availability Zone, but keep Multi-AZ disabled.

C.Enable RDS Multi-AZ so AWS maintains a standby in another Availability Zone for automatic failover.

D.Rely on restoring from the most recent manual snapshot after an outage.

AnswerC

RDS Multi-AZ creates a standby instance in a different AZ and replicates data to it. If the primary becomes unavailable, AWS performs an automatic failover, promoting the standby and maintaining high availability with minimal application disruption.

Why this answer

Option C is correct because enabling Multi-AZ on Amazon RDS for MySQL automatically provisions and maintains a synchronous standby replica in a different Availability Zone. If the primary instance fails, Amazon RDS automatically fails over to the standby, typically within 60–120 seconds, minimizing downtime without manual intervention. This meets the requirement for automatic failover with minimal downtime.

Exam trap

The trap here is that candidates often confuse read replicas (which are for read scaling and manual promotion) with Multi-AZ (which is for high availability and automatic failover), leading them to incorrectly choose Option B.

How to eliminate wrong answers

Option A is wrong because increasing automated backup retention to 35 days only extends the point-in-time recovery window; it does not provide automatic failover or a standby instance. Option B is wrong because a read replica in another AZ is asynchronous and does not support automatic failover; it requires manual promotion to become the primary, which incurs downtime. Option D is wrong because restoring from a manual snapshot is a manual process that can take significant time (minutes to hours depending on size) and does not provide automatic failover.

181

Multi-Selectmedium

A media company stores daily financial exports in Amazon S3. The files must be protected against accidental overwrite or deletion, and the business also wants a second copy in another Region for recovery after a regional outage. Which two actions should the architect take? Select two.

Select 2 answers

A.Enable bucket versioning on the S3 bucket.

B.Turn on S3 Transfer Acceleration for the bucket.

C.Use only lifecycle policies to move objects to Glacier.

D.Configure replication to a bucket in a second AWS Region.

E.Enable S3 Block Public Access on the bucket.

AnswersA, D

Versioning preserves prior object versions so accidental deletes and overwrites can be recovered later.

Why this answer

Option A is correct because enabling S3 Versioning protects objects from accidental overwrite or deletion by preserving previous versions. When a file is overwritten or deleted, the original version is retained, allowing recovery. This directly addresses the requirement to guard against data loss from user or application errors.
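
The recovery behavior can be illustrated with a toy in-memory model. This is not the S3 API; `VersionedBucket` and its methods are hypothetical, and the model only mirrors the idea that overwrites add versions and deletes add a delete marker.

```python
DELETE_MARKER = object()  # stand-in for an S3 delete marker


class VersionedBucket:
    """Toy model: every write is appended, nothing is destroyed."""

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, body):
        self._versions.setdefault(key, []).append(body)

    def delete(self, key):
        # A delete hides the object by adding a marker on top of the stack.
        self._versions.setdefault(key, []).append(DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is DELETE_MARKER:
            return None
        return versions[-1]

    def previous_versions(self, key):
        older = self._versions.get(key, [])[:-1]
        return [v for v in older if v is not DELETE_MARKER]

    def undelete(self, key):
        # Removing the marker makes the latest real version current again.
        versions = self._versions.get(key, [])
        if versions and versions[-1] is DELETE_MARKER:
            versions.pop()
```

In this model, an accidental `delete` followed by `undelete` returns the latest export, and an overwritten export is still listed by `previous_versions`.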

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration or lifecycle policies with data protection features, overlooking that versioning and replication are the specific services designed for accidental deletion prevention and cross-region recovery.

182

Matchingmedium

A team wants a web application to keep serving traffic if one Availability Zone fails. Match each architecture element to the resilience behavior it provides.

Concepts

Matches

Stop sending requests to unhealthy targets and keep only healthy instances in rotation.

Launch replacement instances in healthy AZs when capacity is lost.

Maintain a synchronous standby in another AZ and fail over automatically.

Allow instances to be replaced without losing user sessions that are stored elsewhere.

Why these pairings

These pairs match architecture elements with their resilience behaviors for surviving an Availability Zone failure, focusing on AWS services that provide high availability and fault tolerance.

183

MCQeasy

An application serves static images through Amazon CloudFront. The team observes higher-than-expected origin fetches, which increases origin bandwidth costs. Which change most directly improves CloudFront cache reuse to reduce origin requests for the static content?

A.Set appropriate Cache-Control headers (or origin cache settings) so CloudFront caches responses longer

B.Disable caching for the distribution so every request goes back to the origin

C.Configure CloudFront to forward all request headers and query strings to the origin

D.Move the S3 bucket to a different AWS Region, without changing CloudFront caching behavior

AnswerA

Cache headers and TTL determine how long objects are kept in CloudFront’s edge caches. Longer caching for static assets increases the cache hit ratio, reducing how often requests must go back to the origin.

Why this answer

Option A is correct because setting appropriate Cache-Control headers (e.g., max-age or s-maxage) or configuring origin cache settings tells CloudFront how long to keep objects in its edge cache before revalidating with the origin. By extending the cache duration, CloudFront serves more requests from its cache, reducing the number of origin fetches and lowering bandwidth costs.
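
A rough sketch of why a longer TTL cuts origin traffic. The helper names are assumptions, and the arithmetic models only a single, continuously requested object at one edge cache; real hit ratios also depend on cache keys, eviction, and request patterns.

```python
import math


def cache_control(max_age_s, s_maxage_s=None):
    """Build a Cache-Control value; s-maxage applies to shared caches such as a CDN."""
    parts = ["public", f"max-age={max_age_s}"]
    if s_maxage_s is not None:
        parts.append(f"s-maxage={s_maxage_s}")
    return ", ".join(parts)


def origin_fetches(window_s, ttl_s):
    """Fetches needed to keep one hot object cached for the whole window."""
    return math.ceil(window_s / ttl_s)


DAY = 86400
print(cache_control(DAY))                       # header for a long-lived static asset
print(origin_fetches(DAY, 60), "fetches/day with a 60 s TTL")
print(origin_fetches(DAY, DAY), "fetch/day with a 1 day TTL")
```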

Exam trap

The trap here is that candidates may think forwarding all headers or query strings improves caching, but in reality it fragments the cache and increases origin requests, while disabling caching or moving the bucket does not address the root cause of low cache reuse.

How to eliminate wrong answers

Option B is wrong because disabling caching forces every request to go to the origin, which would drastically increase origin fetches and bandwidth costs, the opposite of the desired outcome. Option C is wrong because forwarding all headers and query strings to the origin reduces cache hit ratios, as CloudFront treats each unique combination as a separate cache key, leading to more origin requests. Option D is wrong because moving the S3 bucket to a different AWS Region does not change CloudFront's caching behavior; origin fetches depend on cache settings and request patterns, not the geographic location of the origin.

184

Multi-Selectmedium

An order service must notify inventory, shipping, and analytics independently when payment succeeds. The shipping service may be slow, but the order service should keep accepting new orders even if one consumer is unavailable. Which two changes best improve resilience? Select two.

Select 2 answers

A.Publish the event to an Amazon SNS topic and subscribe a separate SQS queue for each downstream service.

B.Have the order service call all downstream services synchronously so failures are visible immediately.

C.Use one shared SQS queue for all three consumers so they always process the same message.

D.Store the event in a relational database and poll it from every consumer on a fixed schedule.

E.Configure a dead-letter queue on each consumer queue to isolate poison messages.

AnswersA, E

SNS fan-out with separate SQS queues decouples the producer from each consumer. Every downstream service gets its own buffered queue, so a slow or unavailable consumer does not block the others or the order service.

Why this answer

Option A is correct because publishing the event to an SNS topic allows the order service to emit a single notification that is then fanned out to multiple SQS queues, one per downstream service. This decouples the order service from the consumers, so even if the shipping service is slow or unavailable, the order service can continue accepting new orders without blocking. Each SQS queue provides independent buffering and retry logic, ensuring resilience against individual consumer failures.
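
The fan-out pattern, together with the dead-letter queue from Option E, can be sketched with a toy in-memory model. `Topic` and `Queue` are hypothetical classes rather than the SNS or SQS APIs, and the retry rule is a simplification of SQS redrive behavior.

```python
from collections import deque


class Queue:
    """Toy consumer queue with a simplified dead-letter rule."""

    def __init__(self, max_receive_count=3):
        self.messages = deque()
        self.dead_letters = []
        self.max_receive_count = max_receive_count

    def send(self, body):
        self.messages.append({"body": body, "receives": 0})

    def process_one(self, handler):
        """Deliver the oldest message; retry on failure, dead-letter after too many tries."""
        if not self.messages:
            return
        msg = self.messages.popleft()
        msg["receives"] += 1
        try:
            handler(msg["body"])
        except Exception:
            if msg["receives"] >= self.max_receive_count:
                self.dead_letters.append(msg["body"])  # isolate the poison message
            else:
                self.messages.append(msg)              # make it available again


class Topic:
    """Toy topic that copies each published event into every subscribed queue."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, body):
        for queue in self.subscribers:
            queue.send(body)
```

Because each consumer drains its own queue, a failing shipping handler in this model never delays inventory or the publisher.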

Exam trap

The trap here is that candidates often confuse a single shared queue (Option C) with a fan-out pattern, not realizing that each consumer needs its own queue to process messages independently and avoid head-of-line blocking.

185

MCQmedium

A company hosts an internal HTTP API on an internal Network Load Balancer (NLB) in VPC A. A partner team in a separate AWS account needs access, but their VPC CIDR overlaps with VPC A, so VPC peering is not feasible. Security requirements state the API must remain non-public (no internet-facing ALB/NLB) and access must use AWS private networking. Which architecture best meets these requirements?

A.Use AWS PrivateLink by creating a VPC endpoint service backed by the NLB in VPC A, then create an interface VPC endpoint in the partner VPC with appropriate endpoint access controls.

B.Expose the NLB to the internet with an Elastic IP and restrict access using the NLB’s security group only.

C.Use VPC peering between VPC A and the partner VPC and update route tables to resolve the overlap.

D.Deploy a NAT gateway in VPC A and route the partner’s traffic to the NLB through the NAT gateway.

AnswerA

PrivateLink exposes the service privately via interface endpoints, avoiding peering and keeping the NLB non-public for secure partner access.

Why this answer

Option A is correct because AWS PrivateLink allows you to expose an internal NLB as a VPC endpoint service in VPC A, and the partner team can create an interface VPC endpoint in their own VPC to connect privately. This works even with overlapping CIDR blocks because PrivateLink uses ENIs with private IPs from the endpoint subnet, not routing based on CIDR. The traffic stays within the AWS network and never traverses the internet, meeting the non-public requirement.
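
The CIDR overlap that rules out peering is easy to check with Python's standard `ipaddress` module. The CIDR values below are hypothetical examples, not taken from the scenario.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True when the two IPv4/IPv6 networks share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical VPC ranges: the first pair overlaps, the second does not.
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/20"))
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))
```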

Exam trap

The trap here is that candidates may think VPC peering is always the simplest solution, but they overlook the CIDR overlap restriction, or they assume a NAT gateway can provide inbound private connectivity, which it cannot.

How to eliminate wrong answers

Option B is wrong because giving the NLB an Elastic IP requires making it internet-facing, which violates the requirement that the API remain non-public; restricting access with a security group does not change the fact that the endpoint is reachable from the internet. Option C is wrong because VPC peering requires non-overlapping CIDR blocks; AWS does not allow a peering connection between VPCs whose CIDRs overlap, and route table changes cannot work around that. Option D is wrong because a NAT gateway handles outbound connections initiated from private subnets, not inbound connectivity from another VPC; it cannot publish the NLB to the partner's VPC.

186

MCQeasy

Your team runs a batch processing workload on EC2 that can tolerate interruptions. If an instance is terminated, the job can restart from checkpoints. To reduce compute costs, what is the most cost-optimized approach?

A.Use EC2 Spot Instances for the batch workers

B.Use Dedicated Hosts to ensure capacity for the cheapest instance

C.Use On-Demand instances and schedule extra runs to offset interruptions

D.Use Reserved Instances only, because they eliminate instance termination events

AnswerA

Spot provides significantly lower pricing than On-Demand for interruptible workloads. Because the workload can restart from checkpoints, termination interruptions are acceptable and the application can recover efficiently, meeting both correctness and throughput requirements at a lower cost.

Why this answer

Spot Instances are ideal for fault-tolerant, interruption-tolerant batch workloads because they offer significant cost savings (up to 90% compared to On-Demand) while allowing the job to resume from checkpoints if terminated. This aligns perfectly with the requirement to reduce compute costs without compromising the ability to restart interrupted jobs.
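
A minimal sketch of checkpointed batch work that survives an interruption. The `Interrupted` exception, the `interrupt_at` parameter, and the dictionary used as a checkpoint store are assumptions for illustration; a real job would persist the checkpoint to durable storage outside the instance.

```python
class Interrupted(Exception):
    """Simulates the instance being reclaimed mid-job."""


def run_batch(items, checkpoint, interrupt_at=None):
    """Process items starting from the saved position in `checkpoint`."""
    results = checkpoint.setdefault("results", [])
    start = checkpoint.get("next_index", 0)
    for i in range(start, len(items)):
        if interrupt_at is not None and i == interrupt_at:
            raise Interrupted(f"instance reclaimed before item {i}")
        results.append(items[i] * 2)        # stand-in for the real work
        checkpoint["next_index"] = i + 1    # record progress after each item
    return results


checkpoint = {}
try:
    run_batch([1, 2, 3, 4], checkpoint, interrupt_at=2)
except Interrupted:
    pass  # a replacement worker picks up the same checkpoint
print(run_batch([1, 2, 3, 4], checkpoint))
```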

Exam trap

The trap here is that candidates may confuse cost optimization with reliability, assuming that On-Demand or Reserved Instances are always safer, but the question explicitly states the workload can tolerate interruptions, making Spot the clear cost-optimized choice.

How to eliminate wrong answers

Option B is wrong because Dedicated Hosts are designed for licensing or compliance requirements, not cost optimization, and they cost far more than Spot capacity for this workload. Option C is wrong because On-Demand instances carry no discount, so they cost much more than Spot, and scheduling extra runs only increases total spend. Option D is wrong because Reserved Instances are a billing discount in exchange for a 1- or 3-year commitment; they have no bearing on whether an instance is terminated, and for an interruptible workload they save less than Spot.

187

MCQhard

A DynamoDB table for a travel booking site has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The design must avoid adding custom operational scripts.

A.Create a global secondary index with the same date key

B.Move the table to S3 Glacier Instant Retrieval

C.Reduce the table's write capacity

D.Use a higher-cardinality partition key that distributes writes across partitions

AnswerD

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

Option D is correct because using a low-cardinality partition key like the current date causes all writes to land on a single partition, leading to throttling. By designing a higher-cardinality key (e.g., combining date with a random suffix or user ID), writes are distributed evenly across partitions, fully utilizing the provisioned write capacity without custom scripts.
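
One way to raise key cardinality is write sharding: append a suffix derived from another attribute. The sketch below is an assumption-laden illustration; the `booking_id` attribute, the `date#shard` key format, and the shard count of 10 are hypothetical choices, not part of the question.

```python
import hashlib
from collections import Counter


def sharded_partition_key(date: str, booking_id: str, shards: int = 10) -> str:
    """Derive a stable shard suffix from the booking ID and append it to the date."""
    digest = hashlib.sha256(booking_id.encode()).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{date}#{shard}"


# Count how 1,000 hypothetical bookings on one day spread across key values.
keys = Counter(
    sharded_partition_key("2024-05-01", f"booking-{n}") for n in range(1000)
)
for key, count in sorted(keys.items()):
    print(key, count)
```

The trade-off is on the read side: fetching everything for one date means querying each shard suffix and merging the results.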

Exam trap

The trap here is that candidates may think adding a GSI (Option A) solves the issue, but a GSI keyed on the same date has the same hot-partition flaw, does nothing to spread writes on the base table, and can itself throttle.

How to eliminate wrong answers

Option A is wrong because a global secondary index (GSI) with the same date key would still concentrate writes on a single partition in the index, replicating the throttling issue. Option B is wrong because S3 Glacier Instant Retrieval is a storage class for infrequently accessed objects, not a replacement for DynamoDB's transactional write throughput, and moving the table would break the application's access pattern. Option C is wrong because reducing write capacity would worsen throttling during business hours, not solve the underlying partition hot-spotting problem.

188

MCQmedium

A marketing site stores logs in S3. Logs are queried for 30 days, rarely accessed for one year, and then retained for compliance. What should reduce storage cost? The architecture review board prefers a managed AWS-native control.

A.S3 lifecycle policy that transitions objects to lower-cost storage classes over time

B.Keep all logs in S3 Standard indefinitely

C.Use EBS snapshots for the logs

D.Move all logs immediately to S3 Glacier Deep Archive

AnswerA

Lifecycle rules automate transitions based on age, matching storage cost to access patterns.

Why this answer

Option A is correct because an S3 Lifecycle policy is a managed AWS-native feature that automatically transitions objects to lower-cost storage classes (e.g., S3 Standard-IA after 30 days, S3 Glacier Instant Retrieval or S3 Glacier Flexible Retrieval after one year) based on age, reducing storage costs while maintaining compliance. This aligns with the access pattern: frequent queries for 30 days, rare access for a year, then long-term retention. It avoids manual intervention and optimizes cost without sacrificing availability or retrieval needs.
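
A sketch of a lifecycle configuration matching that pattern, written as a Python dictionary in the shape the S3 lifecycle API accepts. The rule ID and the `logs/` prefix are hypothetical, and `GLACIER` (S3 Glacier Flexible Retrieval) is just one of the archive classes the explanation mentions. The helper is an illustration of the rule's effect, not an AWS call.

```python
# Hypothetical rule: Standard for 30 days, Standard-IA until day 365, then archive.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "tier-logs-by-age",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_for_age(age_days, config=LIFECYCLE_CONFIGURATION):
    """Return the class an object of this age would be in under the first rule."""
    current = "STANDARD"
    transitions = sorted(config["Rules"][0]["Transitions"], key=lambda t: t["Days"])
    for transition in transitions:
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (10, 45, 400):
    print(age, storage_class_for_age(age))
```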

Exam trap

The trap here is that candidates may choose Option D (immediate move to Glacier Deep Archive) thinking it maximizes cost savings, but they overlook the 30-day query requirement, which calls for a storage class suited to frequent access (S3 Standard) for the first 30 days before objects transition to infrequent-access and then archival storage.

How to eliminate wrong answers

Option B is wrong because keeping all logs in S3 Standard indefinitely incurs the highest per-GB storage cost, ignoring the significant savings from transitioning to lower-cost tiers for data that is rarely accessed after 30 days. Option C is wrong because EBS snapshots are block-level backups of EBS volumes, not a place to store log objects that already live in S3; using them would add snapshot storage and data transfer costs without lowering the S3 bill. Option D is wrong because moving all logs immediately to S3 Glacier Deep Archive (where standard retrievals take up to 12 hours and each retrieval is billed) would break the 30-day querying requirement, since every query would first need a slow, paid restore.

189

MCQmedium

A solutions architect is designing an S3 bucket for a mobile banking backend. The objects must never be publicly accessible, even if a developer later adds an overly broad bucket policy. What should the architect configure?

A.Create an IAM policy that denies s3:GetObject to anonymous users

B.Enable S3 Transfer Acceleration

C.Enable S3 Block Public Access at the account or bucket level

D.Enable server access logging on the bucket

AnswerC

S3 Block Public Access prevents public ACLs and public bucket policies from exposing the bucket.

Why this answer

Option C is correct because S3 Block Public Access provides a definitive override that prevents any public access to objects, regardless of bucket policies or ACLs. This setting can be applied at the account or bucket level and ensures that even if a developer later adds an overly broad bucket policy, the objects remain inaccessible to anonymous users. It is the only mechanism that enforces a hard block on public access at the S3 service level.
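
Block Public Access consists of four independent settings. The dictionary below uses the setting names from the S3 API; the helper function is a hypothetical check added for illustration.

```python
# All four settings enabled: the strictest configuration.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore any public ACLs that already exist
    "BlockPublicPolicy": True,      # reject bucket policies that grant public access
    "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
}

REQUIRED_SETTINGS = tuple(PUBLIC_ACCESS_BLOCK)


def is_fully_blocked(config) -> bool:
    """True only when every Block Public Access setting is explicitly enabled."""
    return all(config.get(name) is True for name in REQUIRED_SETTINGS)


print(is_fully_blocked(PUBLIC_ACCESS_BLOCK))
```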

Exam trap

The trap here is that candidates often think an IAM deny policy (Option A) is sufficient, but they miss that bucket policies can be written to grant access to anonymous users independently of IAM, making S3 Block Public Access the only guaranteed safeguard.

How to eliminate wrong answers

Option A is wrong because IAM identity-based policies attach only to IAM users, groups, and roles, so they are never evaluated for anonymous requests; a bucket policy that grants access to 'Principal': '*' would still expose the objects. Option B is wrong because S3 Transfer Acceleration is a performance feature that uses edge locations to speed up uploads over long distances; it has no effect on access control or public accessibility. Option D is wrong because server access logging records requests to the bucket but does not enforce any access restrictions; it is a monitoring tool, not a security control.

190

MCQeasy

A travel booking site uses EC2 instances behind an ALB. CPU is consistently high during peak traffic, and request latency rises. What should be configured? The architecture review board prefers a managed AWS-native control.

A.A VPC endpoint for CloudWatch only

B.Auto Scaling policy based on an appropriate CloudWatch metric

C.S3 Object Lock

D.Disable health checks

AnswerB

Auto Scaling adds capacity when load increases and removes it when load falls.

Why this answer

The correct answer is B because an Auto Scaling policy based on an appropriate CloudWatch metric (such as CPUUtilization or request latency) dynamically adds or removes EC2 instances to match demand. This managed AWS-native control directly addresses high CPU and rising latency during peak traffic by scaling out capacity, which is the preferred approach per the architecture review board's requirement for a managed solution.
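
A sketch of a target tracking configuration on average CPU, plus a simplified proportional calculation of the capacity it would aim for. The 50% target and the `max_size` of 20 are hypothetical values, and the real service also applies cooldowns, warm-up, and alarm evaluation that this arithmetic ignores.

```python
import math

# Shape of a target tracking configuration for an Auto Scaling group.
TARGET_TRACKING = {
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0,  # hypothetical target: keep average CPU near 50%
}


def desired_capacity(current_instances, observed_cpu, target_cpu=50.0, max_size=20):
    """Scale capacity in proportion to how far the metric is from its target."""
    wanted = math.ceil(current_instances * observed_cpu / target_cpu)
    return max(1, min(wanted, max_size))


print(desired_capacity(4, 90.0))  # peak traffic: scale out
print(desired_capacity(4, 25.0))  # quiet period: scale in
```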

Exam trap

The trap here is that candidates may confuse monitoring (VPC endpoints) or data protection (S3 Object Lock) with performance scaling, overlooking that Auto Scaling is the direct AWS-native solution for handling variable load and high CPU.

How to eliminate wrong answers

Option A is wrong because a VPC endpoint for CloudWatch only provides private connectivity to CloudWatch APIs, not scaling or performance improvement; it does not reduce CPU load or latency. Option C is wrong because S3 Object Lock is a data protection feature for preventing object deletion or overwrites in S3, unrelated to EC2 performance or scaling. Option D is wrong because disabling health checks would cause the ALB to route traffic to unhealthy instances, worsening latency and availability, not solving the high CPU issue.

191

MCQhard

A platform team lets application teams create IAM roles in member accounts through Infrastructure as Code. Security says every new role must stay within a centrally approved permission ceiling, even if someone later attaches broader managed policies or inline policies. Which control should be used to enforce that maximum permission set?

A.Use an AWS Organizations service control policy to grant the role all needed permissions directly.

B.Attach a permissions boundary to each role so the role can never exceed the approved ceiling.

C.Use a resource-based policy on Amazon S3 to restrict the permissions that IAM roles can receive.

D.Require temporary STS session policies whenever the role is assumed.

AnswerB

A permissions boundary is specifically designed to cap the maximum permissions a role can ever receive, regardless of what identity-based policies are attached later. If a developer adds a broader managed policy or inline policy, the effective permissions still cannot exceed the boundary. This makes it the best fit for delegated role creation with a centrally approved ceiling.

Why this answer

A permissions boundary is an AWS IAM feature that sets the maximum permissions an IAM role can have. When attached to a role, any policy that grants permissions beyond the boundary is effectively ignored, ensuring the role cannot exceed the approved permission ceiling even if broader managed or inline policies are later attached. This directly enforces the security requirement without restricting the application teams' ability to create roles via Infrastructure as Code.
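
The ceiling behaves like a set intersection. The sketch below is a simplified model with hypothetical action names; it ignores explicit denies, wildcards, SCPs, and resource-based policies, all of which take part in real policy evaluation.

```python
def effective_permissions(identity_policies, boundary):
    """Actions allowed = (union of attached policies) intersected with the boundary."""
    granted = set().union(*identity_policies)
    return granted & set(boundary)


# Hypothetical approved ceiling and attached policies.
boundary = {"s3:GetObject", "s3:PutObject", "dynamodb:GetItem"}
attached = [
    {"s3:GetObject"},
    {"iam:CreateUser", "s3:PutObject"},  # broader policy attached later
]
print(sorted(effective_permissions(attached, boundary)))
```

In this model `iam:CreateUser` drops out because the boundary never allowed it, and `dynamodb:GetItem` is absent because the boundary itself grants nothing.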

Exam trap

The trap here is confusing service control policies (SCPs) with permissions boundaries: SCPs apply to all principals in an account and cannot be used to set a per-role permission ceiling, while permissions boundaries are specifically designed for that granular control.

How to eliminate wrong answers

Option A is wrong because an AWS Organizations service control policy (SCP) never grants permissions; it only limits what principals in an account or OU can do. It also applies account-wide rather than to individual roles, so it is not the mechanism for a per-role ceiling on delegated role creation. Option C is wrong because a resource-based policy on Amazon S3 can only control access to that S3 resource, not restrict the permissions that IAM roles can receive across all services. Option D is wrong because session policies limit permissions only for the sessions in which the caller passes them; the role itself could still have broader permissions attached, so they do not enforce a permanent ceiling.

192

MCQhard

Based on the exhibit, users must access private PDF reports only through CloudFront. Direct requests to the S3 object URL must fail, and the bucket should not be publicly readable. Which solution is the best fit?

A.Enable CloudFront Origin Access Control for the distribution and update the bucket policy to allow only the CloudFront distribution principal with its SourceArn.

B.Keep the bucket public and require signed URLs at CloudFront, because signed URLs automatically block all direct S3 requests.

C.Add an S3 access point and allow the CloudFront distribution to use it without changing the bucket policy.

D.Attach AWS WAF to the distribution and block requests that do not include a signed cookie.

AnswerA

Origin Access Control is the modern pattern for restricting S3 origins to CloudFront. The bucket policy can then permit only the specific distribution, preventing direct S3 access while keeping the content private. Signed URLs or cookies can still be used at the viewer layer for authorization.

Why this answer

Option A is correct because CloudFront Origin Access Control (OAC) lets you restrict an S3 bucket so that only a specific CloudFront distribution can retrieve objects. The bucket policy allows the CloudFront service principal (`cloudfront.amazonaws.com`) and uses an `aws:SourceArn` condition set to the distribution's ARN, so direct requests to the S3 object URL are denied. CloudFront signed URLs or signed cookies can still control which viewers get access. This meets the requirement of blocking direct S3 access while keeping the bucket private.

Exam trap

The trap here is that candidates often assume signed URLs or cookies alone can block direct S3 access, but they only control access at the CloudFront level, not at the S3 bucket level, so the bucket must still be private and explicitly restricted to CloudFront.

How to eliminate wrong answers

Option B is wrong because making the bucket public violates the requirement that the bucket should not be publicly readable; signed URLs at CloudFront do not block direct S3 requests if the bucket itself is public. Option C is wrong because an S3 access point alone does not restrict access to only CloudFront; you would still need a bucket policy or OAC to prevent direct S3 access, and the access point does not inherently block requests that bypass CloudFront. Option D is wrong because AWS WAF attached to CloudFront can block requests based on signed cookies, but it does not prevent direct requests to the S3 object URL, which bypass CloudFront entirely.

193

MCQhard

A risk simulation workload generates analytics files that are accessed unpredictably. Some files become hot again months later. The team wants automatic storage cost optimisation without retrieval delays. What should be used? The design must avoid adding custom operational scripts.

A.Manual monthly review and object copying

B.S3 Glacier Flexible Retrieval for all files

C.S3 Intelligent-Tiering

D.EFS One Zone for analytics files

AnswerC

Intelligent-Tiering automatically moves objects between access tiers based on usage while preserving low-latency access.

Why this answer

S3 Intelligent-Tiering automatically moves objects between access tiers (frequent, infrequent, and archive instant access) based on changing access patterns, with no retrieval delays for hot objects and no operational overhead. This matches the unpredictable access pattern where files become hot again months later, as Intelligent-Tiering monitors access at the object level and adjusts storage class without manual intervention or custom scripts.
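
The tiering rule can be sketched as a small function. It assumes the default automatic tiers only (30 days without access for Infrequent Access, 90 days for Archive Instant Access) and leaves out the optional asynchronous archive tiers; the function name is hypothetical.

```python
def intelligent_tiering_tier(days_since_last_access: int) -> str:
    """Return the automatic access tier for an object, based on its last access."""
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


# A file untouched for months is stored cheaply but stays immediately readable;
# once it is read again, its counter resets and it returns to Frequent Access.
print(intelligent_tiering_tier(200))
print(intelligent_tiering_tier(0))
```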

Exam trap

The trap here is that candidates may choose S3 Glacier Flexible Retrieval because it is cheaper for cold data, but they overlook the 'no retrieval delays' requirement, as Glacier Flexible Retrieval has a retrieval time of minutes to hours, making it unsuitable for files that become hot again unpredictably.

How to eliminate wrong answers

Option A is wrong because manual monthly review and object copying adds operational overhead and is not automatic, so it violates both the automatic cost optimisation requirement and the constraint against custom operational scripts. Option B is wrong because S3 Glacier Flexible Retrieval has retrieval delays (minutes to hours) for files that become hot again, which violates the 'no retrieval delays' requirement. Option D is wrong because EFS One Zone is a file system that stores data in a single Availability Zone, not an object storage class; moving the analytics files to it would change how the application accesses them and generally costs more per GB than S3.

194

Multi-Selecthard

A batch job runs on EC2 instances in isolated private subnets with no NAT Gateway. The job uses STS AssumeRole to access an operations account and then retrieves a secret from AWS Secrets Manager. After a network hardening change, both calls fail. Which two interface VPC endpoints should be created? Select two.

Select 2 answers

A.An interface VPC endpoint for AWS STS so the job can call AssumeRole without internet access.

B.An interface VPC endpoint for AWS Secrets Manager so the job can retrieve the secret privately.

C.A gateway VPC endpoint for Amazon S3 so the job can reach the secret store indirectly.

D.A NAT Gateway in each Availability Zone so the job can use the public service endpoints.

E.An internet gateway attached to the VPC so private subnets can reach AWS APIs.

AnswersA, B

The application explicitly calls STS AssumeRole, so it needs private network access to the STS service. An interface endpoint provides that path inside the VPC without requiring a NAT Gateway or public internet route.

Why this answer

Option A is correct because the batch job uses STS AssumeRole, which requires calling the AWS STS API. Without a NAT Gateway or Internet Gateway, private subnets cannot reach public endpoints. An interface VPC endpoint for STS allows the EC2 instances to call AssumeRole privately using AWS PrivateLink, without needing internet access.
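
Interface endpoint service names follow a regional naming pattern. The helper below is a hypothetical convenience for building the two names this scenario needs; the Region is an example value. Note that the job's SDK generally needs to call the regional STS endpoint for the private path to be used.

```python
def interface_endpoint_service_names(region, services=("sts", "secretsmanager")):
    """Build the PrivateLink service names for the given AWS services in one Region."""
    return [f"com.amazonaws.{region}.{service}" for service in services]


# Example Region only; use the Region where the private subnets live.
for name in interface_endpoint_service_names("us-east-1"):
    print(name)
```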

Exam trap

The trap here is that candidates often confuse gateway VPC endpoints (for S3 and DynamoDB) with interface VPC endpoints (for most other AWS services like STS and Secrets Manager), leading them to select option C instead of understanding that Secrets Manager requires an interface endpoint.

195

Multi-Selectmedium

A web application uses Amazon RDS for MySQL in a Multi-AZ deployment. During a planned maintenance event, the team wants to understand which two statements about failover are accurate so they can design connection handling correctly. Which two statements are accurate? Select two.

Select 2 answers

A.RDS automatically promotes the synchronous standby in the same Region if the primary instance becomes unavailable.

B.The standby instance can serve production read traffic to improve read scaling in standard RDS Multi-AZ.

C.The application must permanently change its connection string to a new host after failover completes.

D.Existing database connections are interrupted and the application should retry by reconnecting to the same database endpoint.

E.Multi-AZ provides protection against a full Region outage without any additional design changes.

AnswersA, D

This is the core Multi-AZ availability behavior. AWS manages the standby and promotes it automatically during failure or maintenance events, keeping the database available within the Region with minimal administrative effort.

Why this answer

Option A is correct because in a Multi-AZ RDS for MySQL deployment, Amazon RDS maintains a synchronous standby replica in a different Availability Zone. If the primary instance becomes unavailable because of a failure or certain planned maintenance events, RDS automatically promotes the standby to primary. The database endpoint name stays the same: RDS updates the DNS record to point to the new primary, so the application reconnects to the same endpoint after its existing connections are dropped.
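
Connection handling for this case usually means retrying against the same endpoint with a backoff. The sketch below is generic: `connect` is any hypothetical callable that opens a database connection and raises `ConnectionError` on failure, and the attempt count and delays are illustrative defaults.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call `connect` until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return connect()  # always the same endpoint; DNS follows the new primary
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay_s * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to exercise in tests without real waiting. Keeping DNS caching short on the client side also matters, since a stale lookup would keep pointing at the old primary.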

Exam trap

The trap here is that candidates often confuse the passive standby in standard Multi-AZ with a readable replica, or assume that Multi-AZ provides cross-Region disaster recovery, when in fact it only protects against AZ-level failures within a single Region.

196

MCQeasy

A company needs an Amazon RDS database that automatically fails over to a standby when the primary DB instance becomes unavailable. Which approach best meets the requirement with minimal operational effort?

A.Keep the DB as a single-AZ instance and implement a manual process to promote a standby when needed.

B.Deploy the DB as a Multi-AZ DB instance so AWS maintains a synchronous standby in another Availability Zone and performs automated failover.

C.Enable versioned backups only, and restore the database each time the primary instance becomes unavailable.

D.Replicate the database to another region and switch clients to the secondary region using manual DNS changes.

AnswerB

RDS Multi-AZ provisions a synchronous standby in a different Availability Zone within the same AWS Region. When the primary DB instance is unavailable, AWS performs automated failover to the standby, reducing downtime without custom scripts.

Why this answer

Option B is correct because Amazon RDS Multi-AZ automatically provisions and maintains a synchronous standby replica in a different Availability Zone. When the primary DB instance fails, AWS handles the automatic failover to the standby with zero manual intervention, meeting the requirement with minimal operational effort.

Exam trap

The trap here is that candidates often confuse Multi-AZ (synchronous replication, automatic failover) with Multi-Region (asynchronous replication, manual or automated cross-region failover) or assume that backups alone can provide high availability, but backups do not offer automatic failover or minimal downtime.

How to eliminate wrong answers

Option A is wrong because a single-AZ instance has no standby, and a manual process to promote a standby would require creating a new instance from a snapshot or read replica, which incurs significant downtime and operational overhead. Option C is wrong because versioned backups alone do not provide a standby; restoring from a backup can take minutes to hours, resulting in unacceptable downtime and data loss. Option D is wrong because cross-region replication requires manual DNS changes to redirect traffic, introduces higher latency, and involves more operational complexity than a Multi-AZ deployment within a single region.

197

MCQmedium

A Lambda function for an order processing API needs to read a database password. The password must rotate automatically every 30 days and should not be stored in environment variables. Which service should be used? The design must avoid adding custom operational scripts.

A.AWS Secrets Manager with rotation enabled

B.An encrypted object in Amazon S3

C.AWS Systems Manager Parameter Store SecureString without automation

D.A KMS-encrypted Lambda environment variable

AnswerA

Secrets Manager stores secrets securely and supports automatic rotation using a rotation Lambda function.

Why this answer

AWS Secrets Manager is the correct choice because it is purpose-built for securely storing and automatically rotating database credentials. It natively supports rotation every 30 days via a built-in Lambda rotation function, without requiring any custom operational scripts. This meets the requirement to avoid storing the password in environment variables and to automate rotation.

Exam trap

The trap here is that candidates often confuse AWS Systems Manager Parameter Store (which can store SecureStrings but lacks native rotation) with Secrets Manager, or they assume that encrypting a value at rest (e.g., in S3 or environment variables) is sufficient, ignoring the operational burden of manual rotation and the requirement for automatic rotation every 30 days.

How to eliminate wrong answers

Option B is wrong because an encrypted object in Amazon S3 requires custom code to retrieve, decrypt, and rotate the password, and it lacks built-in automatic rotation, violating the 'no custom operational scripts' constraint. Option C is wrong because AWS Systems Manager Parameter Store SecureString does not support automatic rotation without additional automation (e.g., a custom Lambda function), so it fails the 30-day rotation requirement. Option D is wrong because a KMS-encrypted Lambda environment variable stores the password statically in the function configuration, cannot be rotated automatically, and exposes the password to anyone with access to the Lambda configuration or logs.
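To illustrate how the function would read the password at call time rather than from an environment variable, here is a minimal Python sketch. It assumes `client` behaves like `boto3.client("secretsmanager")` and that the secret is stored as JSON with a `password` key; the function name and the secret ID used below are hypothetical.

```python
import json


def get_db_password(client, secret_id):
    """Fetch the current password from Secrets Manager on each call.

    client is expected to behave like boto3.client("secretsmanager");
    the JSON key "password" is an assumption about how the secret is stored.
    """
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["password"]
```

Because the value is fetched on demand, a rotation that replaces the secret is picked up without redeploying the function.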

198

MCQmedium

A backend service uses an IAM role to read files from an S3 bucket. It must only read objects under s3://prod-reporting/incoming/ but currently receives AccessDenied (403) on GetObject for that prefix. The role already has this statement:

- Action: s3:ListBucket
- Resource: arn:aws:s3:::prod-reporting

Which policy statement would most directly follow least privilege to allow only the required reads under the incoming prefix?

A.Allow only listing and reading with a single statement: Action = ["s3:*"], Resource = ["arn:aws:s3:::prod-reporting/incoming/*"].

B.Allow reads with a prefix-scoped statement: Action = ["s3:GetObject"], Resource = ["arn:aws:s3:::prod-reporting/incoming/*"].

C.Allow all S3 reads at the account level: Action = ["s3:GetObject"], Resource = ["arn:aws:s3:::*"].

D.Allow bucket listing with a condition that forces the prefix: Action = ["s3:ListBucket"], Resource = ["arn:aws:s3:::prod-reporting"], Condition = {"StringLike": {"s3:prefix": "incoming/*"}}.

AnswerB

This grants only the specific action s3:GetObject and scopes it to the exact prefix that the service needs. It aligns with least privilege by avoiding extra permissions like PutObject or DeleteObject. Since the service already has ListBucket, this completes the required read path for objects in incoming.

Why this answer

Option B is correct because it grants only the s3:GetObject permission on the specific prefix path arn:aws:s3:::prod-reporting/incoming/*, which directly allows reading objects under that prefix while adhering to least privilege. The existing s3:ListBucket permission already enables listing the bucket, so only the missing read action needs to be added.

Exam trap

The trap here is that candidates often confuse granting a ListBucket condition (Option D) with granting GetObject access, not realizing that the AccessDenied error on GetObject requires a separate s3:GetObject permission on the object ARN.

How to eliminate wrong answers

Option A is wrong because it uses s3:* which grants all S3 actions (including write, delete, etc.) on the prefix, violating least privilege. Option C is wrong because it grants s3:GetObject on all S3 buckets (arn:aws:s3:::*), which is overly permissive and not scoped to the required bucket or prefix. Option D is wrong because it only adds a condition to the existing s3:ListBucket action, but the AccessDenied error is on GetObject, not ListBucket; this statement does not grant the read permission needed to resolve the 403 error.
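For reference, the resulting role policy could look like the sketch below, written as a Python dictionary and serialized to JSON. The `Effect` of the existing ListBucket statement is assumed to be `Allow`, since the question shows only its action and resource.

```python
import json

# Existing statement from the question plus the statement from option B.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",  # assumed; the question omits the effect
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::prod-reporting",  # bucket ARN
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::prod-reporting/incoming/*",  # object ARNs
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
```

Note that ListBucket targets the bucket ARN while GetObject targets object ARNs, which is why the two actions need different resources.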

199

Multi-Selecthard

A studio keeps 4 PB of completed video projects in Amazon S3. Editors work on active projects for about 60 days, auditors occasionally review the same objects for several months, and legal policy requires retention for 7 years. Retrieval of very old files can take hours. Which three actions should the architect recommend? Select three.

Select 3 answers

A.Transition objects to S3 Standard-IA after 60 days.

B.Transition objects to S3 Glacier Deep Archive after the review period ends.

C.Expire objects after 7 years.

D.Keep the files in S3 Standard indefinitely so retrieval is always fast.

E.Copy the files to a single EBS volume for lower per-GB cost.

AnswersA, B, C

Standard-IA is a good fit after the active editing window because the objects are accessed less often but still need relatively quick retrieval.

Why this answer

Option A is correct because after 60 days of active editing, objects can be transitioned to S3 Standard-IA, which offers lower storage costs than S3 Standard while still providing low-latency retrieval for occasional access by auditors. This lifecycle policy optimizes cost without sacrificing availability for the review period.

Exam trap

The trap here is that candidates may think S3 Standard must be retained for fast retrieval at all times, but the scenario explicitly states retrieval of very old files can take hours, so using Glacier Deep Archive for long-term retention is acceptable and cost-effective.
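The three selected actions map onto a single S3 lifecycle rule. The sketch below shows one possible shape in Python, using the structure that the boto3 S3 client accepts as `LifecycleConfiguration` in `put_bucket_lifecycle_configuration`. The 240-day review period and the rule ID are assumptions, because the question says only "several months".

```python
REVIEW_PERIOD_END_DAYS = 240  # assumption: "several months" of auditor review
RETENTION_DAYS = 7 * 365      # 7-year legal retention, ignoring leap days

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "completed-video-projects",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": REVIEW_PERIOD_END_DAYS, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": RETENTION_DAYS},
        }
    ]
}
```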

200

MCQmedium

Developers for an image sharing application need temporary elevated access to production resources for troubleshooting. The security team wants approvals, expiry, and audit logging. Which approach is best?

A.Disable CloudTrail during troubleshooting

B.Use IAM Identity Center permission sets with time-bound access processes and CloudTrail auditing

C.Create shared administrator access keys for the team

D.Attach AdministratorAccess permanently to every developer role

AnswerB

Federated access with permission sets and audited temporary assignments reduces standing privilege.

Why this answer

IAM Identity Center (formerly AWS SSO) lets you define permission sets with a limited session duration and assign them to developers only for the troubleshooting window, so elevated access is granted through an approval process and removed when it expires. CloudTrail records the assignment changes and the actions taken with the elevated access. This meets the security team's requirements for approvals, expiry, and audit logging without leaving standing privileges in place.

Exam trap

The trap here is that candidates may think shared keys or permanent admin roles are acceptable for troubleshooting, but the question explicitly requires approvals, expiry, and audit logging, which only IAM Identity Center with time-bound permission sets and CloudTrail can fully satisfy.

How to eliminate wrong answers

Option A is wrong because disabling CloudTrail during troubleshooting removes all audit logging, violating the security team's requirement for audit logging. Option C is wrong because creating shared administrator access keys eliminates individual accountability, prevents expiry, and bypasses approval workflows, making it impossible to audit who performed which action. Option D is wrong because permanently attaching AdministratorAccess to every developer role grants persistent elevated privileges without time-bound access, approvals, or expiry, violating the principle of least privilege and the security team's requirements.

201

MCQeasy

An application repeatedly reads the same DynamoDB items with very low latency requirements. The application can tolerate slightly stale data (for example, within a few seconds). You want to improve read latency without changing the existing DynamoDB table schema. Which service is the best choice?

A.Amazon DAX

B.Amazon S3 Transfer Acceleration

C.Amazon EFS

D.AWS CloudTrail for data plane reads

AnswerA

Amazon DAX is an in-memory cache specifically designed for DynamoDB reads. It can significantly reduce read latency for frequently accessed items. Because the application can tolerate brief staleness, DAX’s caching behavior is appropriate and does not require a DynamoDB schema change.

Why this answer

Amazon DAX (DynamoDB Accelerator) is an in-memory cache that sits between your application and DynamoDB, providing microsecond read latency for frequently accessed items. Since the application can tolerate slightly stale data (within seconds), DAX's default TTL-based caching is ideal because it reduces read pressure on DynamoDB while serving cached results with significantly lower latency than direct DynamoDB reads.

Exam trap

The trap here is confusing a caching layer (DAX) with unrelated acceleration or storage services (S3 Transfer Acceleration, EFS) or with auditing tools (CloudTrail), leading candidates to pick options that don't address DynamoDB read latency at all.

How to eliminate wrong answers

Option B (Amazon S3 Transfer Acceleration) is wrong because it accelerates uploads to S3 over long distances using edge locations, not DynamoDB reads, and does not address DynamoDB latency. Option C (Amazon EFS) is wrong because it is a file storage service for EC2 instances, not a cache for DynamoDB items, and introduces network filesystem overhead incompatible with sub-millisecond read requirements. Option D (AWS CloudTrail for data plane reads) is wrong because CloudTrail records API activity for auditing, not caching or accelerating data reads, and enabling it would add latency and cost without improving read performance.

202

MCQmedium

A web application for a healthcare document service is behind an Application Load Balancer. The application must be protected from common SQL injection and cross-site scripting attacks with minimum operational overhead. What should the architect deploy?

A.Security groups on the application instances

B.AWS WAF associated with the Application Load Balancer

C.Network ACLs on the public subnets

D.AWS Shield Advanced only

AnswerB

AWS WAF can inspect HTTP requests and block common web exploits when associated with an ALB.

Why this answer

AWS WAF is a web application firewall that can be associated with an Application Load Balancer to filter and monitor HTTP/HTTPS requests. It includes managed rule sets specifically designed to block common web exploits like SQL injection and cross-site scripting (XSS) with minimal operational overhead, as AWS manages the rule updates.

Exam trap

The trap here is that candidates may confuse network-layer security controls (security groups or NACLs) with application-layer protection, assuming they can block web attacks, when in fact they only filter based on network attributes like IP addresses and ports.

How to eliminate wrong answers

Option A is wrong because security groups act as a virtual firewall at the instance level, controlling inbound and outbound traffic based on IP addresses and ports, but they cannot inspect application-layer payloads to detect SQL injection or XSS patterns. Option C is wrong because network ACLs operate at the subnet level and provide stateless filtering based on IP addresses, ports, and protocols, but they lack the deep packet inspection capability required to identify malicious web application attacks. Option D is wrong because AWS Shield Advanced provides DDoS protection against volumetric attacks, but it does not include the application-layer filtering needed to block SQL injection or XSS attacks.

203

MCQmedium

A DynamoDB-backed multi-tenant app experiences throttling. Most write traffic for tenant 'ACME' targets a single logical stream of events (you write items for ACME in near-real time). The table currently uses partition key = tenantId and sort key = eventTimestamp. CloudWatch shows partition-level throttling concentrated in the ACME partition. What design change most directly improves write throughput for the hottest tenant while still enabling efficient queries for recent events for that tenant?

A.Add a Global Secondary Index (GSI) with the same partition key (tenantId) and eventTimestamp, and rely on the GSI to spread load.

B.Mitigate the hotspot by changing the partition key to include a shard value (for example, tenantId + '#' + shardId) and write using shardId. Query recent events by fanning out across ACME shards and merging results by eventTimestamp.

C.Increase the table’s write capacity (or on-demand baseline) without changing the partition key, because DynamoDB will automatically balance hotspots.

D.Switch the sort key to a random value to prevent writes from landing on the same physical partition.

AnswerB

In DynamoDB, the partition key controls which physical partitions receive traffic for that key value. By adding shardId into the partition key, ACME writes are distributed across multiple partitions, increasing aggregate write capacity and reducing partition-level throttling. Efficient recent-event queries are still possible by querying each ACME shard for the relevant time range (using eventTimestamp as the sort key) and merging the ordered results.

Why this answer

Option B is correct because it directly addresses the partition-level throttling by introducing a shard key (e.g., tenantId + '#' + shardId) as the partition key, which distributes ACME's write load across multiple physical partitions. To query recent events for ACME, the application must fan out queries across all shards and merge results by eventTimestamp, which is efficient because each shard holds a subset of the data and the sort key remains eventTimestamp for ordering.

Exam trap

The trap here is that candidates often think increasing provisioned capacity or switching to on-demand mode will automatically resolve a hot partition, but DynamoDB's per-partition throughput limit (3,000 RCU and 1,000 WCU per second) is a hard ceiling that cannot be overcome without redistributing the partition key.

How to eliminate wrong answers

Option A is wrong because a GSI keyed on the same tenantId does not spread the write load: the base table keeps the same hot partition key, and the index, which receives every ACME write as well, develops the same hotspot. Option C is wrong because raising table-level write capacity does not raise the per-partition ceiling; adaptive capacity can lend unused throughput to a busy partition, but it cannot lift that partition above the per-partition limit, so the skewed key still throttles. Option D is wrong because changing the sort key to a random value would prevent efficient queries for recent events (since sort key ordering is lost) and does not change the partition key, so writes still target the same physical partition.
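The write-sharding and fan-out pattern from option B can be sketched in Python as follows. The shard count, the hash-based shard choice, and all function names are assumptions for illustration; the question specifies only the `tenantId + '#' + shardId` key format and the merge by `eventTimestamp`.

```python
import hashlib
import heapq

SHARD_COUNT = 4  # assumption: sized from the tenant's peak write rate


def sharded_partition_key(tenant_id, event_id, shard_count=SHARD_COUNT):
    """Derive a stable shard from the event ID so writes spread across shards."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return f"{tenant_id}#{digest[0] % shard_count}"


def all_partition_keys(tenant_id, shard_count=SHARD_COUNT):
    """Partition keys to query when fanning out a read for one tenant."""
    return [f"{tenant_id}#{shard}" for shard in range(shard_count)]


def merge_recent(per_shard_results, limit):
    """Merge per-shard lists (each sorted newest first) into one list."""
    merged = heapq.merge(
        *per_shard_results, key=lambda item: item["eventTimestamp"], reverse=True
    )
    return list(merged)[:limit]
```

Each per-shard query would request items in descending `eventTimestamp` order, so the merge step only has to interleave already sorted lists.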

204

MCQhard

Based on the exhibit, a serverless API on AWS Lambda experiences a predictable cold-start penalty every weekday at 09:00 UTC when a marketing campaign begins. The team wants the first requests to stay fast while minimizing extra cost during quiet periods. What is the best approach?

A.Enable provisioned concurrency on the published version and schedule it to scale up shortly before the spike.

B.Increase the Lambda timeout so cold starts have more time to complete.

C.Move the function behind an Application Load Balancer to improve warm-up behavior.

D.Increase the function memory to the maximum value and leave concurrency unchanged.

AnswerA

Provisioned concurrency keeps warm execution environments ready for the alias or version, which removes the cold-start penalty. Scheduling it only around the known spike keeps performance high while limiting unnecessary cost during idle periods.

Why this answer

Provisioned concurrency pre-warms a specified number of Lambda execution environments so that incoming requests do not incur a cold start. By scheduling the provisioned concurrency to scale up just before the 09:00 UTC spike and scale down afterward, the team eliminates the cold-start penalty during the campaign while minimizing cost during quiet periods. This directly addresses the predictable, time-bound traffic pattern without requiring code changes or over-provisioning.

Exam trap

The trap here is that candidates confuse increasing memory or timeout with solving cold starts, or they mistakenly think an ALB can pre-warm Lambda, when in fact only provisioned concurrency guarantees warm containers for the first requests in a predictable traffic spike.

How to eliminate wrong answers

Option B is wrong because increasing the Lambda timeout does not prevent cold starts; it only extends the maximum execution duration, which has no effect on the initialization latency of a new execution environment. Option C is wrong because placing the function behind an Application Load Balancer does not warm up the Lambda; ALB is a request router and does not maintain warm containers or alter Lambda's scaling behavior. Option D is wrong because increasing memory (which also increases CPU allocation) can reduce cold-start duration but does not eliminate it, and setting memory to the maximum value (10,240 MB) would significantly increase cost without guaranteeing zero cold starts for the first requests.
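Scheduling provisioned concurrency is typically done with Application Auto Scaling scheduled actions. The Python sketch below builds the keyword arguments that would be passed to the `put_scheduled_action` call of the boto3 `application-autoscaling` client, after the alias has been registered as a scalable target. The function name, alias, capacity of 50, and the 10:00 UTC end time are assumptions, not values from the question.

```python
def scheduled_action(name, cron, capacity):
    """Build keyword arguments for one Application Auto Scaling scheduled action."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": "function:campaign-api:live",  # hypothetical function and alias
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "Schedule": cron,  # evaluated in UTC unless a time zone is set
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


# Warm 50 environments at 08:45 UTC on weekdays and release them at 10:00 UTC.
# The capacity and the end time are assumptions, not values from the question.
scale_up = scheduled_action("campaign-warmup", "cron(45 8 ? * MON-FRI *)", 50)
scale_down = scheduled_action("campaign-cooldown", "cron(0 10 ? * MON-FRI *)", 0)
```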

205

MCQhard

Based on the exhibit, a workload in private subnets must reach only Amazon S3 and AWS Secrets Manager. The team wants to eliminate internet exposure for those calls and reduce NAT gateway charges. What change should be made?

A.Move the instances into a public subnet and restrict inbound access with security groups.

B.Add a NAT instance and disable the managed NAT gateway to lower cost.

C.Create an S3 gateway endpoint and a Secrets Manager interface endpoint with private DNS, then remove NAT dependency for those service calls.

D.Use VPC peering to a shared services VPC and route all AWS service traffic through that VPC.

AnswerC

S3 is best reached through a gateway VPC endpoint, while Secrets Manager requires an interface endpoint. With private DNS enabled, the application can resolve and reach those services without leaving AWS private networking. This removes the need for NAT traffic for those calls, cuts cost, and keeps service access off the public internet.

Why this answer

Option C is correct because VPC Gateway Endpoints (for S3) and Interface Endpoints (for Secrets Manager) allow private subnet instances to access these services over the AWS network without traversing the internet or a NAT gateway. Enabling private DNS on the interface endpoint ensures that the default Secrets Manager DNS name resolves to the endpoint's private IP, eliminating the need for a NAT gateway for those calls and reducing costs.

Exam trap

The trap here is that candidates often confuse Gateway Endpoints (for S3 and DynamoDB) with Interface Endpoints (for most other AWS services), and may incorrectly assume a single endpoint type works for all services, or that a NAT gateway is still required for private subnet traffic to AWS services.

How to eliminate wrong answers

Option A is wrong because moving instances to a public subnet would expose them to the internet, violating the requirement to eliminate internet exposure for the calls. Option B is wrong because a NAT instance still routes traffic through the internet (via an internet gateway) to reach AWS services, which does not eliminate internet exposure and introduces management overhead, though it may lower cost compared to a managed NAT gateway. Option D is wrong because VPC peering does not support edge-to-edge routing: traffic from the workload VPC cannot use a NAT gateway, internet gateway, or gateway endpoint that lives in the peered VPC, so this design adds complexity without providing a private path to S3.
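The two endpoints in option C differ in type and in how they attach to the VPC. The Python sketch below shows the parameters that would be passed to the `create_vpc_endpoint` call of the boto3 EC2 client. The Region and every resource ID are placeholders, since the exhibit is not reproduced here.

```python
REGION = "us-east-1"  # assumption: the question does not name a Region

s3_gateway_endpoint = {
    "VpcEndpointType": "Gateway",
    "VpcId": "vpc-EXAMPLE",  # placeholder ID
    "ServiceName": f"com.amazonaws.{REGION}.s3",
    "RouteTableIds": ["rtb-EXAMPLE"],  # route tables of the private subnets
}

secrets_manager_interface_endpoint = {
    "VpcEndpointType": "Interface",
    "VpcId": "vpc-EXAMPLE",
    "ServiceName": f"com.amazonaws.{REGION}.secretsmanager",
    "SubnetIds": ["subnet-EXAMPLE"],
    "SecurityGroupIds": ["sg-EXAMPLE"],  # must allow HTTPS (443) from the workload
    "PrivateDnsEnabled": True,
}
```

The gateway endpoint works by adding routes to the listed route tables, while the interface endpoint places network interfaces in the listed subnets and relies on private DNS to redirect the standard service hostname.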

206

MCQmedium

A team runs an EC2-based API on a single Auto Scaling group (ASG). Over the last month, they observed:

- Average CPU utilization is ~15%.
- p95 latency is stable and within the performance target.
- The attached EBS volumes are gp3, provisioned with high baseline IOPS/throughput “just to be safe,” but CloudWatch shows consistently low utilization of those provisioned IOPS/throughput limits.

They want to reduce monthly cost while maintaining current performance. Which action is the best cost-optimized choice?

A.Stop resizing EBS and only scale out the ASG during peak traffic, because changing EBS performance settings risks latency spikes.

B.Right-size both the compute and the gp3 volumes: reduce the EC2 instance size (via the ASG launch template/desired capacity configuration) and update gp3 IOPS/throughput settings to match observed utilization while keeping p95 latency targets.

C.Switch the instances to EC2 Spot immediately, because Spot always lowers costs without adding operational risk or affecting performance.

D.Move the workload to a larger instance class and keep the gp3 settings unchanged to avoid operational tuning work.

AnswerB

The metrics indicate headroom that is not being used (low CPU, stable latency, and low gp3 utilization). The most direct cost optimization is to reduce overprovisioned spend by right-sizing the instance type and tuning gp3 IOPS/throughput to match actual demand. Because performance and latency are already stable, these changes are the most likely to reduce cost without degrading performance.

Why this answer

Option B is correct because the workload is over-provisioned in both compute and storage. Average CPU is only 15%, so a smaller instance size can handle the load without affecting p95 latency. The gp3 volumes have high baseline IOPS/throughput that are never used, so reducing them to match actual utilization directly lowers costs without performance risk.

Exam trap

The trap here is that candidates assume EBS performance settings are fixed or risky to change, or that scaling out the ASG is always the best cost optimization, when in fact gp3 allows flexible, no-downtime IOPS/throughput adjustments and the real savings come from matching provisioned resources to actual utilization.

How to eliminate wrong answers

Option A is wrong because stopping EBS resizing and only scaling out the ASG during peak traffic ignores the clear over-provisioning of gp3 IOPS/throughput, which is a direct source of unnecessary cost; changing gp3 settings (downward) does not risk latency spikes if you stay above the observed utilization. Option C is wrong because switching to Spot instances introduces the risk of interruption (Spot can be reclaimed with a 2-minute warning), which could cause API latency spikes or failures if the workload is not designed for Spot termination handling; the statement that Spot always lowers costs without operational risk is false. Option D is wrong because moving to a larger instance class would increase compute cost without addressing the over-provisioned gp3 volumes, and the current performance is already meeting targets, so larger instances are unnecessary.
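One way to turn "match observed utilization" into numbers is to take the observed peak, add headroom, and never go below the gp3 baseline of 3,000 IOPS and 125 MiB/s. The Python sketch below does that; the 30% headroom and the function name are assumptions, and the returned keys mirror the `Iops` and `Throughput` parameters of the EC2 `modify_volume` call.

```python
import math

GP3_BASELINE_IOPS = 3000        # included with every gp3 volume
GP3_BASELINE_THROUGHPUT = 125   # MiB/s included with every gp3 volume


def right_size_gp3(peak_iops, peak_throughput_mibps, headroom_pct=30):
    """Suggest gp3 settings: observed peak plus headroom, never below baseline."""
    factor = 100 + headroom_pct
    iops = max(GP3_BASELINE_IOPS, math.ceil(peak_iops * factor / 100))
    throughput = max(
        GP3_BASELINE_THROUGHPUT, math.ceil(peak_throughput_mibps * factor / 100)
    )
    return {"Iops": iops, "Throughput": throughput}
```

A volume whose observed peak sits below the baseline ends up at the baseline, which carries no extra performance charge.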

207

MCQeasy

A company has an Amazon S3 bucket for sensitive reports. They must ensure that any object uploaded with s3:PutObject is encrypted using AWS KMS (SSE-KMS). Which S3 bucket policy approach best enforces this by denying uploads that do not use SSE-KMS?

A.Use a Deny statement for s3:PutObject with a condition that denies requests where s3:x-amz-server-side-encryption is not "aws:kms" (SSE-KMS), for example: Condition { StringNotEquals: { "s3:x-amz-server-side-encryption": "aws:kms" } }

B.Use a Deny statement that denies requests when aws:SecureTransport is false.

C.Use a Deny statement that checks the specific KMS key ID (s3:x-amz-server-side-encryption-aws-kms-key-id) and denies requests that don’t match a single alias value.

D.Use a Deny or Allow statement that limits object keys using s3:prefix (for example, only allow keys under "reports/").

AnswerA

This directly checks the SSE encryption header used in the PutObject request. If a client uploads without SSE-KMS (for example, no encryption header or SSE-S3/AES256), the condition evaluates to true and the Deny prevents the upload.

Why this answer

Option A is correct because it uses a Deny statement with the condition `StringNotEquals` on the `s3:x-amz-server-side-encryption` request header, which explicitly denies any `s3:PutObject` request that does not include the value `aws:kms` for that header. This ensures that only objects encrypted with SSE-KMS are uploaded, as any request lacking the header or using a different encryption type (e.g., AES256) will be denied. The condition is evaluated at the time of the request, making it an effective enforcement mechanism.

Exam trap

The trap here is that candidates often confuse encryption in transit (HTTPS) with encryption at rest (SSE), leading them to pick Option B, which only ensures secure transport but does not enforce server-side encryption with KMS.

How to eliminate wrong answers

Option B is wrong because `aws:SecureTransport` checks for HTTPS (TLS) usage, not encryption at rest; it would allow uploads without SSE-KMS as long as they use HTTPS. Option C is wrong because checking the specific KMS key ID (`s3:x-amz-server-side-encryption-aws-kms-key-id`) only enforces that a particular key is used, but does not require SSE-KMS at all—requests with no encryption header or with SSE-S3 would not be denied unless the key ID condition is also paired with an encryption type check. Option D is wrong because restricting object keys with `s3:prefix` controls which paths objects can be uploaded to, but has no effect on encryption requirements; objects could be uploaded without SSE-KMS under the allowed prefix.
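The Deny statement from option A can be written out in full as below. The small helper then mimics how the `StringNotEquals` condition treats a request, as a sketch of the logic rather than a real policy evaluator. The bucket name, statement ID, and helper name are hypothetical.

```python
deny_unencrypted_uploads = {
    "Sid": "DenyNonKmsUploads",  # hypothetical statement ID
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::sensitive-reports/*",  # hypothetical bucket name
    "Condition": {
        "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
    },
}


def upload_is_denied(request_headers):
    """Mimic the condition: a missing or non-KMS header matches the Deny."""
    required = deny_unencrypted_uploads["Condition"]["StringNotEquals"]
    actual = request_headers.get("x-amz-server-side-encryption")
    return actual != required["s3:x-amz-server-side-encryption"]
```

For example, a request with no encryption header and a request with `AES256` are both denied, while a request with `aws:kms` passes this statement.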

208

MCQeasy

A data processing application runs on a single EC2 instance and needs persistent block storage with sustained low-latency random read/write performance (high IOPS). Which storage choice is most appropriate?

A.EBS io2 provisioned IOPS SSD

B.Amazon S3 Standard

C.Amazon EFS for POSIX file sharing between multiple instances

D.EBS Throughput Optimized HDD (st1) storage

AnswerA

EBS io2 is built for high-performance, low-latency block storage with provisioned IOPS.

Why this answer

Amazon EBS io2 Provisioned IOPS SSD volumes are designed for I/O-intensive workloads that require sustained, low-latency random read/write performance with high IOPS. They provide consistent performance by allowing you to provision a specific IOPS level (up to 256,000 IOPS per volume) and are designed for 99.999% durability, making them ideal for a single EC2 instance needing persistent block storage.

Exam trap

The trap here is that candidates often confuse throughput-optimized HDD (st1) with IOPS-optimized SSD (io2) because both are EBS volume types, but st1 is designed for sequential, not random, I/O workloads.

How to eliminate wrong answers

Option B is wrong because Amazon S3 is an object storage service accessed via HTTP/HTTPS, not a block storage device, and cannot be attached directly to an EC2 instance for low-latency random read/write operations. Option C is wrong because Amazon EFS is a file-level, NFS-based storage service designed for shared access across multiple EC2 instances, not for providing persistent block storage with sustained low-latency random I/O to a single instance. Option D is wrong because EBS Throughput Optimized HDD (st1) volumes are optimized for large, sequential workloads (e.g., big data, log processing) and cannot deliver the sustained low-latency random I/O performance required for high IOPS workloads.

209

MCQmedium

A SaaS platform serves an API using two regional deployments: us-east-1 (primary) and us-west-2 (secondary). Each region has its own ALB. The business requires automated DNS-based failover when the primary region becomes unhealthy, and they do not want manual DNS changes during incidents. Which Route 53 configuration is the best match?

A.Create a single Route 53 record using weighted routing across both ALBs with weights adjusted manually during an incident.

B.Use Route 53 failover routing with a primary record pointing to the us-east-1 ALB and a secondary record pointing to the us-west-2 ALB, each using health checks.

C.Use latency-based routing so Route 53 always selects the fastest region; health checks are unnecessary because client latency reflects availability.

D.Use a single A record with a static IP address that points to a NAT gateway, and update that IP during failure events.

AnswerB

Failover routing with health checks enables automatic switching of DNS responses when the primary endpoint fails health evaluation.

Why this answer

Route 53 failover routing is designed specifically for active-passive failover scenarios where you have a primary and secondary resource. By associating health checks with each record, Route 53 automatically detects when the primary ALB in us-east-1 becomes unhealthy and routes traffic to the secondary ALB in us-west-2 without manual intervention. This meets the requirement for automated DNS-based failover without manual DNS changes.

Exam trap

The trap here is that candidates may confuse latency-based routing with failover routing, assuming that lowest latency implies health. Without associated health checks, latency-based routing does not consider endpoint health and keeps sending traffic to an unhealthy Region if it is still the fastest.

How to eliminate wrong answers

Option A is wrong because weighted routing requires manual adjustment of weights during an incident, which violates the requirement for automated failover without manual DNS changes. Option C is wrong because latency-based routing selects the Region with the lowest latency for each user and only routes around an unhealthy Region when health checks are associated with the latency records; this option drops health checks on the false premise that client latency reflects availability. Option D is wrong because using a static IP pointing to a NAT gateway is not a scalable or resilient approach for an API served by ALBs, and updating the IP during failure events requires manual intervention, which contradicts the automation requirement.
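A failover pair consists of two records with the same name and type, distinguished by `SetIdentifier` and `Failover`. The Python sketch below builds the record sets in the shape used by the Route 53 `change_resource_record_sets` call. The record name, ALB DNS names, hosted zone IDs, and health check IDs are all placeholders.

```python
def failover_record(role, alb_dns_name, alb_zone_id, health_check_id):
    """Build one alias record for a Route 53 failover pair."""
    return {
        "Name": "api.example.com.",  # hypothetical record name
        "Type": "A",
        "SetIdentifier": f"api-{role.lower()}",
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "HealthCheckId": health_check_id,
        "AliasTarget": {
            "DNSName": alb_dns_name,
            "HostedZoneId": alb_zone_id,  # the ALB's own hosted zone ID
            "EvaluateTargetHealth": True,
        },
    }


# All DNS names and IDs below are placeholders, not real resources.
change_batch = {
    "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": failover_record(
            "PRIMARY", "primary-alb.us-east-1.elb.amazonaws.com", "ZEXAMPLE1", "hc-primary")},
        {"Action": "UPSERT", "ResourceRecordSet": failover_record(
            "SECONDARY", "secondary-alb.us-west-2.elb.amazonaws.com", "ZEXAMPLE2", "hc-secondary")},
    ]
}
```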


210

Drag & Dropmedium

Order the steps to set up an Amazon CloudFront distribution with an S3 origin.


Why this order

S3 bucket first, then CloudFront distribution, cache settings, OAI creation, and bucket policy update.


211

MCQhard

Based on the exhibit, DNS still sends traffic to the primary Region even though Route 53 health checks show the primary endpoint is unhealthy. What is the best change to make failover work as intended?

A.Change both records to weighted routing with a 50/50 split so Route 53 can shift traffic gradually.

B.Use a failover routing policy with a primary record and a secondary record, and attach the health check to the primary record.

C.Switch to latency-based routing so users are always directed to the lowest-latency Region.

D.Use geolocation routing so clients in one Region are sent to the healthier endpoint.

AnswerB

Failover routing is designed for active-passive DNS behavior. With a primary and secondary record, Route 53 answers with the primary record when it is healthy and returns the secondary record when the primary health check fails. The exhibit shows simple routing, which does not express the failover intent. Switching to failover routing aligns the DNS policy with the stated requirement.

Why this answer

Option B is correct because, among the options, a failover routing policy with a health check attached to the primary record is the only configuration that lets Route 53 automatically stop sending traffic to an unhealthy primary endpoint and redirect it to the secondary endpoint. Without a health check associated with the primary record, Route 53 does not act on the failure and continues routing traffic to the primary Region, even if the health check status shows unhealthy.
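As a rough illustration of the failover pair described above, the sketch below builds the change batch that could be passed to the boto3 Route 53 client's `change_resource_record_sets` call. The domain name, ALB DNS names, hosted zone IDs, and health check ID are hypothetical placeholders, and the helper `failover_record` is not part of any AWS library.

```python
def failover_record(name, role, alb_dns, alb_zone_id, health_check_id=None):
    """Build one alias A record for a Route 53 failover pair (illustrative helper)."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-record",
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,   # hosted zone ID of the ALB (placeholder)
            "DNSName": alb_dns,            # DNS name of the ALB (placeholder)
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        # The health check only influences answers once it is attached to a record.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# All identifiers below are hypothetical.
change_batch = {
    "Changes": [
        failover_record("api.example.com.", "PRIMARY",
                        "primary-alb.example.", "ZPRIMARYEXAMPLE",
                        health_check_id="hc-primary-example"),
        failover_record("api.example.com.", "SECONDARY",
                        "secondary-alb.example.", "ZSECONDARYEXAMPLE"),
    ]
}
# With boto3 this would be submitted roughly as:
# route53.change_resource_record_sets(HostedZoneId="...", ChangeBatch=change_batch)
```

The point of the sketch is the shape of the records: both share a name and type, differ in `Failover` role, and the primary carries the health check.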

Exam trap

The trap here is that candidates assume Route 53 automatically uses health check status to influence routing regardless of how the records are configured. In reality, health checks affect DNS answers only when they are associated with the records being served (or, for alias records, when Evaluate Target Health is enabled); a health check that merely exists and reports unhealthy changes nothing.

How to eliminate wrong answers

Option A is wrong because weighted routing distributes traffic based on weights and does not express a primary/secondary relationship; a 50/50 split with no health check attached to the records would still send half the traffic to the unhealthy primary. Option C is wrong because latency-based routing directs users to the endpoint with the lowest latency, not based on health; it does not implement the active-passive failover the scenario calls for. Option D is wrong because geolocation routing directs traffic based on the geographic location of the user, not on endpoint health; changing the policy to geolocation does nothing to move traffic away from an unhealthy primary endpoint.


212

Multi-Selecthard

A company is encrypting sensitive S3 data for an IoT ingestion API with AWS KMS. Which two controls help prevent accidental use of the KMS key by unauthorized principals?

Select 2 answers

A.IAM policies that grant kms:Decrypt only to required application roles

B.S3 Transfer Acceleration

C.A key policy that limits key administrators and key users

D.A larger KMS key rotation period

AnswersA, C

IAM permissions should grant least-privilege use of the KMS key to specific roles.

Why this answer

IAM policies that grant kms:Decrypt only to required application roles ensure that only authorized principals can decrypt data encrypted with the KMS key. By explicitly allowing only the Decrypt action and restricting it to specific roles, you prevent unauthorized principals from accidentally using the key for decryption or other operations, even if they have access to the encrypted S3 objects.

Exam trap

The trap here is that candidates often confuse key rotation (a cryptographic hygiene measure) with access control, or mistakenly think network-level features like Transfer Acceleration can restrict key usage.


213

Drag & Dropmedium

Arrange the steps to implement a disaster recovery plan using AWS Elastic Disaster Recovery (DRS).


Why this order

Agent installation, replication config, launch recovery instance, test, and failback.


214

Drag & Dropmedium

Order the steps to restore an Amazon RDS DB instance from a snapshot.


Why this order

Snapshot selection, restore, configure, redirect app, then delete old instance.


215

MCQeasy

A media company runs a batch job that processes image thumbnails. The job can be restarted from checkpoints and does not have user-facing SLAs. The batch capacity can tolerate interruptions. Which EC2 purchasing option is the best cost optimization choice?

A.Use On-Demand Instances because interruptions are not allowed for production workloads.

B.Use EC2 Spot Instances, accepting the possibility of interruptions and using checkpoints to resume.

C.Purchase Reserved Instances because they provide a discount regardless of the workload timing.

D.Buy Savings Plans because they guarantee capacity and remove the risk of interruptions entirely.

AnswerB

Spot Instances are typically the cheapest option for workloads that can tolerate interruptions with recovery.

Why this answer

Spot Instances offer significant cost savings (up to 90% compared to On-Demand) and are ideal for fault-tolerant, stateless, or checkpointable workloads. Since the batch job can restart from checkpoints and tolerates interruptions, Spot Instances provide the best cost optimization without compromising functionality.
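To make the checkpoint idea concrete, here is a minimal sketch of a batch loop that records its progress after each item and resumes from that point on the next run. The `interrupt_after` parameter only simulates a Spot interruption for the example; the file layout, function name, and image names are assumptions, not anything prescribed by AWS.

```python
import json
import os
import tempfile


def process_batch(items, checkpoint_path, interrupt_after=None):
    """Process items in order, saving progress after each one.

    interrupt_after simulates a Spot interruption after N items in this run.
    Returns (items processed in this run, whether the batch finished).
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    done = []
    for i in range(start, len(items)):
        if interrupt_after is not None and len(done) >= interrupt_after:
            return done, False  # instance reclaimed before finishing
        done.append(f"thumb-{items[i]}")  # stand-in for real thumbnail work
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return done, True


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
images = ["a.png", "b.png", "c.png", "d.png"]

# First run is interrupted after two items; the second run resumes at item three.
first_run, finished_first = process_batch(images, checkpoint, interrupt_after=2)
second_run, finished_second = process_batch(images, checkpoint)
```

In a real deployment the checkpoint would need to live somewhere that survives the instance, such as S3 or a database, rather than on local disk.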

Exam trap

The trap here is that candidates often assume Spot Instances are only for non-production or test workloads, but the SAA-C03 exam emphasizes that Spot Instances are suitable for any fault-tolerant or checkpointable production workload, including batch processing, big data, and containerized applications.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances are not cost-optimized for workloads that can tolerate interruptions; the statement that 'interruptions are not allowed for production workloads' is a misconception, as many production workloads (e.g., batch processing, CI/CD) run successfully on Spot. Option C is wrong because Reserved Instances require a 1- or 3-year commitment and are best for steady-state, predictable workloads, not for a batch job that can be interrupted and does not need guaranteed capacity. Option D is wrong because Savings Plans provide a discount based on a commitment to a consistent amount of compute usage (measured in $/hour) but do not guarantee capacity or remove the risk of interruptions; they are a billing discount mechanism, not a capacity reservation.


216

MCQhard

A patient portal must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used?

A.Instance store volumes

B.Amazon EFS with mount targets in multiple Availability Zones

C.An EBS volume attached to all instances

D.S3 mounted as a POSIX file system without a file gateway

AnswerB

EFS is regional file storage and supports mount targets across AZs.

Why this answer

Amazon EFS provides a scalable, fully managed NFS file system that can be mounted concurrently on multiple Linux EC2 instances. By creating mount targets in multiple Availability Zones, the file system remains accessible even if one AZ fails, ensuring high availability and shared file storage across instances.

Exam trap

The trap here is that candidates may confuse EBS Multi-Attach (which is limited to Provisioned IOPS volumes attached to instances in a single AZ) with the true multi-AZ shared file system capability of EFS.

How to eliminate wrong answers

Option A is wrong because instance store volumes are ephemeral and tied to a single EC2 instance; they cannot be shared across instances or survive an AZ failure. Option C is wrong because an EBS volume lives in a single Availability Zone and normally attaches to one EC2 instance at a time; Multi-Attach is limited to Provisioned IOPS (io1/io2) volumes and instances in the same AZ, and is not designed for shared file storage across AZs. Option D is wrong because mounting S3 as a POSIX file system without a file gateway (e.g., using s3fs-fuse) does not provide full POSIX semantics such as file locking and atomic renames, and S3 is object storage rather than a shared file system for EC2 instances.


217

MCQmedium

A public API for an image sharing application is deployed on API Gateway. Clients must authenticate with standards-based tokens issued by an external OpenID Connect provider. Which authorization mechanism should be used?

A.A VPC endpoint policy

B.API keys only

C.JWT authorizer configured for the OpenID Connect issuer

D.IAM authorization for all internet users

AnswerC

A JWT authorizer validates tokens from a trusted OIDC issuer with low operational overhead.

Why this answer

Option C is correct because API Gateway's JWT authorizer natively validates JSON Web Tokens (JWTs) issued by an external OpenID Connect (OIDC) provider. It verifies the token's signature, expiry, and issuer against the OIDC provider's JWKS endpoint, enabling standards-based authentication without custom Lambda code.
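The claim checks mentioned above (issuer, audience, expiry) can be illustrated with a small standard-library sketch. This is not how API Gateway is implemented, and it deliberately skips signature verification against the provider's JWKS, which a real validator must perform. The issuer URL, audience value, and helper names are hypothetical.

```python
import base64
import json
import time


def decode_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_acceptable(claims, issuer, audience, now=None):
    """Check issuer, audience, and expiry claims (illustration only)."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == issuer
        and audience in audiences
        and claims.get("exp", 0) > now
    )


def _b64(obj):
    """Base64url-encode a JSON object the way JWT segments are encoded."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token with a placeholder signature segment.
ISSUER = "https://issuer.example.com"
AUDIENCE = "image-api"
sample_token = ".".join([
    _b64({"alg": "RS256", "typ": "JWT"}),
    _b64({"iss": ISSUER, "aud": AUDIENCE, "exp": 2000}),
    "signature-not-checked",
])
```

A token that passes these checks but fails signature verification must still be rejected; the sketch only shows why the authorizer needs the issuer and audience configured.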

Exam trap

The trap here is that candidates often confuse API keys (simple identification) with authentication, or assume IAM authorization is required for all API Gateway endpoints, overlooking the purpose-built JWT authorizer for federated OIDC tokens.

How to eliminate wrong answers

Option A is wrong because a VPC endpoint policy controls access to API Gateway via VPC endpoints, not authentication for external clients using OIDC tokens. Option B is wrong because API keys alone provide only client identification, not authentication; they do not validate identity or token claims from an OIDC provider. Option D is wrong because IAM authorization requires AWS Signature Version 4 signing, which is not suitable for internet users with external OIDC tokens and does not support standards-based token validation.


218

MCQeasy

Based on the exhibit, which Amazon EFS performance mode is the best fit for this workload?

A.Use General Purpose performance mode for low-latency access.

B.Use Max I/O performance mode to optimize for the highest possible latency tolerance.

C.Use One Zone storage class to increase metadata speed.

D.Use Provisioned Throughput mode because it is the only performance mode available.

AnswerA

General Purpose is the best EFS performance mode when the priority is low latency for small file operations. The exhibit describes a moderate number of clients and latency-sensitive metadata access, which matches the strengths of General Purpose. It is the usual choice for most applications unless the workload specifically needs very large-scale parallel throughput.

Why this answer

General Purpose performance mode is the best fit for this workload because it provides the lowest per-operation latency, which is critical for applications like content management, web serving, or home directories that are sensitive to metadata latency. Max I/O mode, in contrast, trades higher latency for greater aggregate throughput and IOPS, making it unsuitable for latency-sensitive workloads.

Exam trap

The trap here is that candidates confuse performance modes (General Purpose vs. Max I/O) with throughput modes (Bursting vs. Provisioned) or storage classes (Standard vs. One Zone), leading them to select options that address throughput or availability instead of latency requirements.

How to eliminate wrong answers

Option B is wrong because Max I/O performance mode is designed for high aggregate throughput and IOPS at the cost of higher latency, not for optimizing latency tolerance; it is intended for large-scale, parallel workloads like big data analytics. Option C is wrong because One Zone is a storage class that stores data in a single Availability Zone, not a performance mode, and it does not affect metadata speed; metadata performance is determined by the performance mode, not the storage class. Option D is wrong because Provisioned Throughput is a throughput mode, not a performance mode; Amazon EFS offers two performance modes (General Purpose and Max I/O) and separate throughput modes (Bursting, Provisioned, and Elastic).


219

MCQmedium

A web app runs on an EC2 Auto Scaling group behind an Application Load Balancer (ALB). The ALB is configured with health checks and the ASG spans three subnets in three Availability Zones. During an AZ outage, monitoring shows the number of healthy instances drops sharply and never returns to the original capacity until the ASG is manually adjusted. What change most directly improves resilience so capacity returns automatically during an AZ failure?

A.Reduce the ASG desired capacity by 1 and rely on the ALB to route traffic to fewer instances during the outage.

B.Configure the ASG to use the ALB target-group health checks (ELB/target-group health) and ensure the ASG has at least two subnets in different Availability Zones that remain available for instance placement.

C.Move the ALB to only one subnet so health checks and routing remain consistent during the outage.

D.Add an S3 event trigger to terminate unhealthy instances so the ASG can scale back out using its scheduled actions.

AnswerB

If the AZ outage prevents the ALB from reaching targets, instance-level (EC2) health checks may still consider instances “healthy” because the instances are running. When the ASG is configured to use ALB/target-group health (ASG health check type set to ELB and tied to the target group), the ASG can detect application-level unreachability and replace unhealthy instances. With multiple eligible subnets across different AZs, the ASG can launch replacement instances in the remaining AZs and automatically return to the configured desired capacity.

Why this answer

Option B is correct because configuring the ASG to use ALB target-group health checks (ELB health checks) ensures that the ASG replaces instances that fail the ALB's health checks, including those in an impaired AZ. By also ensuring the ASG has at least two subnets in different AZs that remain available, the ASG can launch replacement instances in the healthy AZs when one AZ fails, automatically restoring capacity without manual intervention.
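As a sketch of the change described above, the snippet below assembles the parameters that could be passed to the boto3 Auto Scaling client's `update_auto_scaling_group` call. The group name, subnet IDs, and grace period are hypothetical values, and the helper function is only for illustration.

```python
def elb_health_check_params(group_name, subnet_ids, grace_seconds=300):
    """Build update parameters that switch an ASG to ELB health checks."""
    if len(subnet_ids) < 2:
        raise ValueError("use subnets in at least two Availability Zones")
    return {
        "AutoScalingGroupName": group_name,
        "HealthCheckType": "ELB",                  # use load balancer health, not only EC2 status
        "HealthCheckGracePeriod": grace_seconds,   # time for new instances to warm up
        "VPCZoneIdentifier": ",".join(subnet_ids), # comma-separated subnet IDs
    }


# Hypothetical identifiers; each subnet is assumed to sit in a different AZ.
params = elb_health_check_params(
    "web-asg-example",
    ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
)
# With boto3 this would be applied roughly as:
# autoscaling.update_auto_scaling_group(**params)
```

The grace period is a judgment call: too short and healthy instances are replaced while still booting, too long and failures take longer to detect.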

Exam trap

The trap here is that candidates assume EC2 status checks are sufficient for AZ failure detection, but they fail to recognize that an instance in a failed AZ may still pass EC2 status checks while being unreachable via the network, so only ALB target-group health checks trigger the ASG to replace them.

How to eliminate wrong answers

Option A is wrong because reducing the desired capacity does not address the root cause; the ASG will not automatically replace instances in the failed AZ, and the ALB simply routes traffic to fewer instances, leaving the capacity deficit permanent. Option C is wrong because moving the ALB to only one subnet creates a single point of failure, defeating the purpose of multi-AZ resilience and potentially causing the ALB itself to become unavailable during an AZ outage. Option D is wrong because S3 event triggers are not designed to terminate unhealthy instances or trigger ASG scaling; scheduled actions are time-based and cannot react to dynamic failures like an AZ outage, and the described mechanism is not a valid AWS pattern for health-based replacement.


220

MCQmedium

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable?

A.S3 Cross-Region Replication

B.Multi-AZ deployment for the RDS DB instance

C.EBS snapshots every hour

D.Read replicas only

AnswerB

Multi-AZ provides synchronous standby replication and automatic failover within a Region.

Why this answer

Multi-AZ deployment for RDS MySQL automatically provisions and maintains a synchronous standby replica in a different Availability Zone. In the event of an AZ failure, Amazon RDS automatically fails over to the standby, providing high availability with minimal application changes (the application simply reconnects to the same endpoint). This meets the requirement for availability during an AZ outage without requiring code modifications.

Exam trap

The trap here is that candidates often confuse read replicas (which are for read scaling and manual promotion) with Multi-AZ (which provides automatic failover and high availability), leading them to select 'Read replicas only' as a cheaper but incorrect alternative.

How to eliminate wrong answers

Option A is wrong because S3 Cross-Region Replication is designed for object-level replication across AWS regions, not for database high availability within a region, and it does not provide automatic failover for an RDS MySQL database. Option C is wrong because EBS snapshots every hour provide point-in-time backup and recovery, not automatic failover; restoring from a snapshot requires manual intervention and results in data loss for transactions after the last snapshot. Option D is wrong because read replicas only provide read scaling and asynchronous replication; they do not support automatic failover for write operations, and promoting a read replica to a primary requires manual action and potential data loss.


221

Multi-Selectmedium

A payment worker consumes messages from an Amazon SQS queue. Sometimes the worker finishes the payment creation, but a timeout prevents message deletion and the same payment request is delivered again. Which two design changes best reduce the risk of duplicate charges and keep bad messages from looping forever? Select two.

Select 2 answers

A.Make the payment operation idempotent by storing a unique request identifier before charging.

B.Reduce the visibility timeout so retries happen sooner after each timeout.

C.Move the queue to Amazon SNS so each message is delivered only once.

D.Increase the message retention period so failed payments stay available longer.

E.Configure a dead-letter queue with a redrive policy for messages that exceed the max receive count.

AnswersA, E

Idempotency ensures the same business request cannot create multiple charges if SQS redelivers the message.

Why this answer

Option A is correct because making the payment operation idempotent ensures that even if the same message is processed multiple times due to a timeout, the payment is only charged once. This is typically achieved by storing a unique request identifier (e.g., a UUID or idempotency key) in a database or cache before processing; subsequent duplicate requests with the same identifier are detected and ignored, preventing duplicate charges.
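The idempotency pattern described above can be sketched with an in-memory store standing in for the database or cache. The class and field names are hypothetical, and a production version would need an atomic conditional write to a durable store so that two workers cannot both claim the same request identifier.

```python
class PaymentProcessor:
    """Toy payment worker that ignores redelivered messages."""

    def __init__(self):
        self.results = {}   # request_id -> result; stand-in for a durable store
        self.charges = []   # records each real charge, for demonstration

    def handle(self, message):
        request_id = message["request_id"]
        if request_id in self.results:
            # Duplicate delivery: return the earlier result without charging again.
            return self.results[request_id]

        # Record the identifier before charging, as the explanation suggests.
        self.results[request_id] = {"status": "pending"}
        self.charges.append((request_id, message["amount"]))
        self.results[request_id] = {"status": "charged", "amount": message["amount"]}
        return self.results[request_id]


processor = PaymentProcessor()
msg = {"request_id": "req-123", "amount": 50}

first = processor.handle(msg)
second = processor.handle(msg)  # simulates SQS redelivering the same message
```

Both calls return the same result, but only one charge is recorded, which is the behavior that makes at-least-once delivery safe for payments.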

Exam trap

The trap here is that candidates often think reducing the visibility timeout or switching to SNS will solve duplicates. A shorter visibility timeout makes redelivery more likely, not less; standard SQS queues guarantee at-least-once delivery; and moving to SNS does not by itself give exactly-once processing. The correct approach is to combine idempotency with a dead-letter queue to handle both duplicate charges and infinite retries.


222

MCQmedium

Your EC2 instances run in private subnets with no NAT gateway. The instances use the AWS SDK to call STS AssumeRole to obtain temporary credentials for other services. Application logs show errors like: "EndpointConnectionError: Could not connect to https://sts.<region>.amazonaws.com". Which change most directly resolves this while keeping instances private?

A.Create an interface VPC endpoint for STS (com.amazonaws.<region>.sts) and associate it with the instance subnets and a security group that allows HTTPS.

B.Create a gateway VPC endpoint for S3 and route the STS traffic through the S3 endpoint gateway.

C.Open an inbound rule in the instances’ security group to allow outbound HTTPS to the internet CIDR block directly.

D.Attach an Internet Gateway to the private subnet route table so the STS API can be reached over public internet.

AnswerA

Interface endpoints provide private, in-VPC connectivity to AWS APIs like STS without requiring internet access or NAT.

Why this answer

The error indicates that the EC2 instances in private subnets cannot reach the STS public endpoint over the internet because there is no NAT gateway or internet gateway attached to the private subnets. Creating an interface VPC endpoint for STS (com.amazonaws.<region>.sts) allows the instances to communicate with the STS API privately using AWS PrivateLink, without requiring internet access. Associating the endpoint with the instance subnets and a security group that allows HTTPS (port 443) ensures that traffic stays within the AWS network, resolving the connectivity error while keeping the instances private.
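A minimal sketch of the endpoint described above: the function assembles parameters that could be passed to the boto3 EC2 client's `create_vpc_endpoint` call. The VPC, subnet, and security group IDs are hypothetical, and enabling private DNS is an assumption made here so the SDK's default STS hostname resolves to the endpoint.

```python
def sts_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build parameters for an interface VPC endpoint to STS."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.sts",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),  # must allow HTTPS (443) from the instances
        "PrivateDnsEnabled": True,                     # assumption: resolve the regional STS name privately
    }


# Hypothetical identifiers.
params = sts_endpoint_params(
    "us-east-1",
    "vpc-0example",
    ["subnet-private-a", "subnet-private-b"],
    ["sg-endpoint-https"],
)
# With boto3 this would be created roughly as:
# ec2.create_vpc_endpoint(**params)
```

Note that the SDK has to call the regional STS endpoint for this to work; requests to the global endpoint would not be routed through a regional interface endpoint.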

Exam trap

The trap here is that candidates often confuse gateway endpoints (for S3/DynamoDB) with interface endpoints (for most other AWS services like STS), or they mistakenly think security group rules alone can enable internet access without a proper routing path.

How to eliminate wrong answers

Option B is wrong because gateway VPC endpoints exist only for S3 and DynamoDB; an S3 gateway endpoint cannot carry STS traffic, which requires an interface endpoint (PrivateLink) for API calls. Option C is wrong because a security group rule does not provide a route to the internet; the instances are in private subnets with no NAT gateway or internet gateway, so traffic to the internet has no path regardless of security group rules. Option D is wrong because adding a route to an Internet Gateway in the private subnet route table would make the subnets public, violating the requirement to keep instances private; private subnets must not have a default route to an internet gateway.


223

MCQmedium

A ticket booking system stores uploaded documents in S3. The business requires a copy in another AWS Region for disaster recovery. What should be configured? The architecture review board prefers a managed AWS-native control.

A.S3 lifecycle transition to Glacier Flexible Retrieval

B.An EBS snapshot schedule

C.S3 Cross-Region Replication with versioning enabled

D.A CloudFront distribution

AnswerC

CRR asynchronously replicates objects to a bucket in another Region and requires versioning.

Why this answer

S3 Cross-Region Replication (CRR) is a fully managed AWS-native feature that automatically replicates objects from a source S3 bucket in one AWS Region to a destination bucket in another Region, meeting the disaster recovery requirement for a geographically separate copy. Enabling versioning on both buckets is mandatory for CRR to function, as it tracks object versions and ensures consistency during replication.
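For reference, a replication setup along these lines might look like the sketch below, which builds the configuration that could be passed to the boto3 S3 client's `put_bucket_replication` call. The role ARN, bucket ARN, and rule ID are hypothetical placeholders, and the empty filter (replicate everything) is an assumption for the example.

```python
def replication_config(role_arn, destination_bucket_arn):
    """Build a single-rule replication configuration that copies all objects."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "dr-copy-all-objects",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter: rule applies to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


# Versioning must be enabled on both buckets before replication is configured.
versioning = {"Status": "Enabled"}

# Hypothetical ARNs.
config = replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",
    "arn:aws:s3:::example-dr-bucket",
)
# With boto3 this would be applied roughly as:
# s3.put_bucket_versioning(Bucket="...", VersioningConfiguration=versioning)
# s3.put_bucket_replication(Bucket="...", ReplicationConfiguration=config)
```

Replication applies to objects written after the rule is in place, so existing objects need a separate copy step if they must also exist in the second Region.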

Exam trap

The trap here is that candidates often confuse S3 Cross-Region Replication with S3 lifecycle policies or Glacier transitions, mistakenly thinking that moving data to a cheaper storage class in the same region satisfies a disaster recovery requirement for geographic separation.

How to eliminate wrong answers

Option A is wrong because S3 lifecycle transition to Glacier Flexible Retrieval only moves data within the same bucket and region to a colder storage class for cost optimization, not to another AWS Region for disaster recovery. Option B is wrong because EBS snapshot schedules are used for backing up Amazon EBS volumes attached to EC2 instances, not for S3 objects, and they do not provide cross-region replication for S3 data. Option D is wrong because CloudFront is a content delivery network (CDN) that caches data at edge locations for low-latency access, not a replication mechanism to copy data to another AWS Region for disaster recovery.


224

Multi-Selecteasy

A developer accidentally corrupts part of a production Amazon RDS database, and the issue is discovered 45 minutes later. The team needs to restore the database to the state immediately before the change. Which two actions should be part of the recovery plan? Select two.

Select 2 answers

A.Enable automated backups with a retention period that covers the recovery window.

B.Perform a point-in-time restore to a new database instance.

C.Convert the database to a single-AZ deployment for faster restores.

D.Delete the corrupted rows manually and continue without restoring.

E.Use a read replica as the only recovery source for all deletions.

AnswersA, B

Point-in-time recovery in RDS depends on automated backups and transaction logs. The retention period must include the time before the corruption occurred, otherwise the desired recovery point will not be available.

Why this answer

Option A is correct because automated backups must be enabled to allow point-in-time recovery (PITR) within the retention window. Since the corruption occurred 45 minutes ago, the retention period must cover at least that duration to restore to the state immediately before the change. Option B is correct because PITR restores the database to a specified time (down to the second) within the backup retention period, creating a new DB instance that reflects the state just before the corruption.
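The two steps can be sketched as follows: pick a restore time just before the change, confirm it falls inside the retention window, and build the parameters that could be passed to the boto3 RDS client's `restore_db_instance_to_point_in_time` call. The instance identifiers, timestamps, and one-minute safety margin are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def restore_request(source_id, target_id, change_time, now,
                    retention_days, margin=timedelta(minutes=1)):
    """Build point-in-time restore parameters for a moment just before a bad change."""
    restore_time = change_time - margin
    if restore_time < now - timedelta(days=retention_days):
        raise ValueError("restore time is outside the backup retention window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # PITR always creates a new instance
        "RestoreTime": restore_time,
    }


# Hypothetical scenario: corruption happened 45 minutes before it was noticed.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
change_time = now - timedelta(minutes=45)
params = restore_request("prod-db", "prod-db-restored", change_time, now, retention_days=7)
# With boto3 this would be submitted roughly as:
# rds.restore_db_instance_to_point_in_time(**params)
```

Because the restore produces a new instance with a new endpoint, the application still has to be pointed at it after the data is verified.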

Exam trap

The trap here is that candidates may think a read replica can be used for point-in-time recovery, but it only provides read scaling and asynchronous replication, not a restore point before the corruption occurred.


225

MCQmedium

Account 3000 owns a customer-managed KMS key (key-K). A data processing team in account 4000 needs to decrypt data encrypted with key-K. The role in account 4000 already has an identity policy allowing kms:Decrypt on key-K. Despite this, decrypt requests fail with an AccessDenied error referencing KMS. What is the most likely missing authorization step?

A.Update key-K’s key policy in account 3000 to allow kms:Decrypt for the specific role principal in account 4000.

B.Update the S3 bucket policy to allow kms:Decrypt for account 4000 principals on key-K.

C.Enable AWS managed key rotation on key-K and remove the existing key policy.

D.Switch the access from a role to an IAM user because KMS only supports user principals.

AnswerA

For customer-managed KMS keys, key policy is a required authorization layer. Even with an IAM identity policy granting kms:Decrypt, KMS will deny the request unless the key policy also authorizes the calling principal to use the key for Decrypt.

Why this answer

The correct answer is A because KMS key policies are resource-based policies that must explicitly grant cross-account access. Even though the role in account 4000 has an identity-based policy allowing kms:Decrypt, the key policy in account 3000 (the key owner) must also include a statement that permits the specific role principal from account 4000 to perform kms:Decrypt on key-K. Without this, the KMS service will deny the request due to the lack of a valid authorization path.
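A key policy statement of the kind described above might look like the following sketch. The 12-digit account ID and role name are hypothetical placeholders standing in for "account 4000" and its role, and the helper function is only for illustration.

```python
import json


def cross_account_decrypt_statement(role_arn):
    """Build a key policy statement letting one external role call kms:Decrypt."""
    return {
        "Sid": "AllowCrossAccountDecrypt",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",  # in a key policy, "*" refers to the key the policy is attached to
    }


# Hypothetical role ARN in the consuming account.
statement = cross_account_decrypt_statement(
    "arn:aws:iam::444455556666:role/example-data-processing-role"
)
policy_fragment = json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)
```

In practice this statement would be added alongside the key's existing statements rather than replacing them, and the role's identity policy in the other account must still allow `kms:Decrypt` on the key's ARN.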

Exam trap

The trap here is that candidates assume identity-based policies alone are sufficient for cross-account KMS operations, forgetting that KMS requires an explicit resource-based policy (key policy) grant for the external principal.

How to eliminate wrong answers

Option B is wrong because the S3 bucket policy controls access to S3 objects, not KMS key permissions; the error is specifically from KMS, not S3. Option C is wrong because key rotation only changes the key material behind key-K and has no effect on who is authorized to use it, and removing the key policy would not grant the cross-account access that is missing. Option D is wrong because KMS supports both IAM roles and IAM users as principals; the problem is the missing key policy, not the principal type.
