SAA-C03 Questions 226–300 | Page 4/14

226

Multi-Select | hard

A latency-sensitive video platform uploads large files to S3 from users around the world. Which two features can improve upload performance?

Select 2 answers

A. S3 Object Lock

B. S3 Transfer Acceleration

C. S3 multipart upload

D. S3 Inventory

Answers: B, C

Transfer Acceleration uses optimized edge paths into AWS for long-distance S3 transfers.

Why this answer

S3 Transfer Acceleration (B) uses AWS edge locations to route uploads over optimized network paths, reducing latency and packet loss for global users. S3 multipart upload (C) allows large files to be uploaded in parallel parts, improving throughput and enabling retries of individual parts without restarting the entire upload.
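To make the multipart idea concrete, here is a minimal sketch of how a client might split a large file into byte ranges that can be uploaded in parallel and retried individually. It makes no AWS calls. The function name `plan_parts` and the 8 MiB default part size are illustrative assumptions; the 5 MiB minimum part size and the 10,000-part limit are S3's documented multipart constraints.

```python
# Sketch only: plans byte ranges for a multipart upload, no network calls.
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, first_byte, last_byte) for each part."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need too many parts.
    min_needed = -(-total_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, min_needed)

    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        last_byte = min(offset + part_size, total_size) - 1
        parts.append((number, offset, last_byte))
        offset = last_byte + 1
        number += 1
    return parts
```

Each tuple can be handed to a separate worker, and a failed part can be re-sent on its own, which is where the throughput and retry benefits come from.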

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration with CloudFront or think multipart upload is only for resumability, when in fact both features directly address latency and throughput for global, large-file uploads.

227

MCQ | medium

A development team expects their EC2 utilization to average about 40% of capacity across the next year. They want to lower costs but need flexibility to change instance families and sizes as requirements evolve (for example, moving from compute-optimized to memory-optimized instances). Which AWS purchasing commitment best meets the goal of reducing cost while keeping flexibility?

A. Compute Savings Plans, sized to the expected average usage, because they provide savings across instance families and usage types.

B. All Upfront EC2 Instance Reserved Instances for a single instance family to maximize discount.

C. Spot Instances for the entire workload so they can avoid commitments entirely.

D. On-Demand Instances with increased Auto Scaling to match the peak month only.

Answer: A

Compute Savings Plans provide a discount in exchange for a commitment to a consistent amount of compute spend (measured in $/hour) over a 1- or 3-year term, while allowing the team to change instance families, sizes, and even Regions. Because the team’s requirements may evolve and they mainly need to cover a steady baseline (about 40% of capacity), Compute Savings Plans match both the cost-reduction goal and the flexibility requirement better than instance-specific commitments.

Why this answer

Compute Savings Plans offer the best balance of cost reduction and flexibility for this scenario. They provide up to 66% savings in exchange for a commitment to a consistent amount of compute usage (measured in $/hour). Unlike Standard Reserved Instances or EC2 Instance Savings Plans, which are tied to an instance family in one Region, they apply automatically to EC2 usage regardless of instance family, size, OS, tenancy, or Region. This allows the team to switch from compute-optimized to memory-optimized instances as needs evolve without losing the discount, directly meeting the requirement for flexibility while lowering costs.

Exam trap

The trap here is that candidates often confuse Reserved Instances (which lock to a specific instance family) with Savings Plans (which offer cross-family flexibility), leading them to choose Option B for the higher discount without considering the flexibility requirement.

How to eliminate wrong answers

Option B is wrong because All Upfront EC2 Instance Reserved Instances lock the team into a single instance family (e.g., C5), which eliminates the flexibility to change instance families as requirements evolve. Option C is wrong because Spot Instances can be interrupted by AWS with only a 2-minute warning when capacity is reclaimed, making them unsuitable as the only capacity for a steady-state workload that runs across the year. Option D is wrong because On-Demand Instances with Auto Scaling provide no discount: the baseline usage is still billed at full On-Demand rates, and Auto Scaling only changes how many instances run, not the price of each.

228

Multi-Select | medium

A company runs a web application on Amazon EC2 instances behind an Application Load Balancer. The workload is predictable during business hours but has low usage at night. Which three options can reduce costs without compromising performance? (Choose three.)

Select 3 answers

A. Use a single large EC2 instance to handle all traffic, reducing the number of running instances.

B. Purchase Compute Savings Plans to cover the baseline EC2 usage during business hours.

C. Configure a scheduled auto scaling policy to reduce the number of instances during off-peak hours.

D. Replace the Application Load Balancer with a Network Load Balancer, which is cheaper per hour.

E. Use EC2 Spot Instances for the entire workload to achieve the lowest possible cost.

F. Implement a scaling policy based on CPU utilization to right-size the fleet dynamically.

Answers: B, C, F

Why this answer

Compute Savings Plans offer significant discounts (up to 66%) compared to On-Demand pricing in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) over a 1- or 3-year term. They apply to EC2 instances, AWS Fargate, and Lambda, making them a good fit for covering the predictable baseline workload during business hours. This reduces costs without affecting performance because a Savings Plan changes only how the usage is billed, not how the instances run.

Exam trap

The trap here is that candidates often confuse cost-reduction strategies that sacrifice reliability (like using Spot Instances for all traffic) with those that maintain performance, or they incorrectly assume a cheaper load balancer type can replace application-layer features without consequence.

229

MCQ | medium

A high-volume analytics dashboard writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate? The architecture review board prefers a managed AWS-native control.

A. Amazon Route 53

B. Amazon EBS

C. Amazon Kinesis Data Streams

D. AWS DataSync

Answer: C

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the most appropriate service because it is a managed, AWS-native solution designed for real-time streaming data ingestion and processing. It durably stores records for 24 hours by default, with retention configurable up to 365 days, and allows multiple independent consumers (e.g., Lambda, Kinesis Data Analytics, EC2) to read from the same stream concurrently using either shared throughput or enhanced fan-out, meeting the requirement for high-volume click event processing.
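The difference between shared throughput and enhanced fan-out can be sketched as simple arithmetic. The 2 MB/s per-shard read figure is the documented Kinesis limit; the function name and the assumption that shared consumers split read capacity evenly are illustrative simplifications.

```python
# Sketch: approximate read throughput available to each consumer of a stream.
SHARD_READ_MBPS = 2.0  # read throughput per shard


def per_consumer_read_mbps(shards: int, consumers: int, enhanced_fan_out: bool) -> float:
    """Estimate MB/s available to one consumer.

    Shared throughput: all consumers share 2 MB/s per shard
    (modelled here as an even split, which is an assumption).
    Enhanced fan-out: each registered consumer gets its own 2 MB/s per shard.
    """
    if shards < 1 or consumers < 1:
        raise ValueError("shards and consumers must be at least 1")
    total = shards * SHARD_READ_MBPS
    return total if enhanced_fan_out else total / consumers
```

With 4 shards and 4 consumers, this model gives each consumer roughly 2 MB/s on shared throughput and 8 MB/s with enhanced fan-out, which is why fan-out matters when many consumers read the same high-volume stream.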

Exam trap

The trap here is that candidates may confuse Amazon Kinesis Data Streams with Amazon Kinesis Data Firehose, but Firehose is a near-real-time delivery service that does not support multiple independent consumers reading the same data stream concurrently.

How to eliminate wrong answers

Option A is wrong because Amazon Route 53 is a DNS web service that translates domain names to IP addresses; it does not ingest, store, or process streaming data. Option B is wrong because Amazon EBS provides block-level storage volumes for EC2 instances; it is not a streaming data service and cannot support multiple independent consumers reading a continuous data stream. Option D is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services (e.g., S3, EFS) over the network; it is designed for batch transfers, not real-time streaming or concurrent consumer processing.

230

MCQ | hard

Based on the exhibit, an application role in Account B can reach an S3 bucket in Account A, but reads fail with AccessDenied on KMS. The bucket objects use SSE-KMS with a customer managed key in Account A. What change is required so the application can decrypt the objects while keeping the access restricted?

A. Add the Account B role ARN to the KMS key policy with kms:Decrypt and kms:DescribeKey permissions, scoped to S3 usage in us-east-1.

B. Add s3:GetEncryptionConfiguration to the Account B IAM policy so S3 can use the customer managed key on reads.

C. Change the bucket to SSE-S3 because SSE-S3 always allows cross-account reads without any KMS policy changes.

D. Add the Account B role to the bucket ACL with FULL_CONTROL so S3 can bypass KMS on behalf of the reader.

Answer: A

S3 object retrieval with SSE-KMS requires that KMS authorize decryption, and that authorization must exist in the key policy for a CMK in another account. Scoping the statement to the specific role and S3 usage keeps the access narrow while allowing the object read to succeed.

Why this answer

Option A is correct because with SSE-KMS and a customer managed key, cross-account access requires the KMS key policy to explicitly grant the external IAM role (from Account B) the kms:Decrypt and kms:DescribeKey permissions, in addition to the role's own IAM policy allowing those actions. Without the key policy grant, S3 can locate the encrypted object, but KMS denies the decryption request, so the read fails with AccessDenied. Scoping the statement to S3 usage in us-east-1 (for example, with a kms:ViaService condition) follows the principle of least privilege while enabling the necessary decryption.
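A key policy statement along these lines could look like the following sketch, built as a Python dictionary so it can be serialized to JSON. The role ARN and account ID are hypothetical placeholders, and the use of `kms:ViaService` is one common way to express "scoped to S3 usage in us-east-1", not necessarily the exact policy in the exhibit.

```python
import json

# Hypothetical ARN for illustration only; substitute the real Account B role.
ACCOUNT_B_ROLE_ARN = "arn:aws:iam::222222222222:role/AppReaderRole"


def cross_account_decrypt_statement(role_arn: str, region: str = "us-east-1") -> dict:
    """Build a KMS key policy statement for cross-account reads through S3."""
    return {
        "Sid": "AllowAccountBRoleDecryptViaS3",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt", "kms:DescribeKey"],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
        "Condition": {
            # Only honour requests that S3 in this Region makes for the caller.
            "StringEquals": {"kms:ViaService": f"s3.{region}.amazonaws.com"}
        },
    }


if __name__ == "__main__":
    print(json.dumps(cross_account_decrypt_statement(ACCOUNT_B_ROLE_ARN), indent=2))
```

Because the condition applies to the whole statement, a kms:DescribeKey call made directly by the role (not through S3) would likely not match it. The role in Account B also needs an IAM policy that allows the same KMS actions on the key ARN.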

Exam trap

The trap here is that candidates often focus only on the S3 bucket policy or IAM permissions, forgetting that SSE-KMS with a customer managed key requires explicit cross-account grants in the KMS key policy, not just in S3 or IAM policies.

How to eliminate wrong answers

Option B is wrong because s3:GetEncryptionConfiguration is a read-only permission that retrieves the bucket's encryption configuration, not a permission that allows S3 to use the KMS key for decryption; it does not grant any KMS decrypt rights. Option C is wrong because changing the bucket to SSE-S3 would remove the KMS requirement, but it violates the requirement to keep access restricted and does not address the existing SSE-KMS setup; moreover, SSE-S3 does not inherently allow cross-account reads without proper bucket policies. Option D is wrong because bucket ACLs do not interact with KMS; granting FULL_CONTROL via ACL cannot bypass KMS decryption permissions, as S3 still needs to call KMS on behalf of the reader, which requires explicit KMS key policy grants.

231

Multi-Select | hard

A retail analytics table stores events in Amazon DynamoDB with partition key tenantId and sort key eventTime. During a promotion, one tenant generates most writes and repeatedly polls the same latest-status items, causing throttling on a single partition key and high latency on reads. The business can tolerate read results that are a few seconds stale. Which two changes will most effectively reduce throttling and latency? Select two.

Select 2 answers

A. Introduce write sharding by adding a bounded random suffix to the hot tenant partition key and fan out reads across the shards.

B. Add DynamoDB Accelerator (DAX) in front of the table for the repeated status reads.

C. Keep the same key design and increase only the table’s provisioned RCUs and WCUs.

D. Replace the table reads with a Scan operation to distribute the load across all partitions.

E. Move the table to another Availability Zone so the hot tenant uses a different storage node.

Answers: A, B

Sharding spreads the hot tenant’s traffic across multiple partitions so DynamoDB is no longer forced to serve all writes through one physical partition. Querying across the shard set restores access to the tenant’s data while reducing throttling. This is the standard fix when a single partition key becomes a hot spot.

Why this answer

Option A is correct because write sharding distributes the hot tenant's writes across multiple partitions by appending a bounded random suffix to the partition key, preventing a single partition from throttling. Reads then fan out across all shards and merge the results. This directly addresses the single-partition write bottleneck without changing the overall data model. Option B is correct because DAX serves the repeated latest-status reads from an in-memory cache; cached results can lag the table slightly, which is acceptable since the business tolerates a few seconds of staleness.
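The sharding pattern described above can be sketched in a few lines. The shard count of 8, the `#` separator, and the helper names are illustrative assumptions; the code only builds keys and merges results, and does not call DynamoDB.

```python
import random

SHARD_COUNT = 8  # hypothetical bound on the random suffix


def sharded_write_key(tenant_id: str, shard_count: int = SHARD_COUNT, rng=random) -> str:
    """Pick one of the tenant's shard keys at random for a write."""
    return f"{tenant_id}#{rng.randrange(shard_count)}"


def all_shard_keys(tenant_id: str, shard_count: int = SHARD_COUNT) -> list[str]:
    """List every partition key a reader must query for this tenant."""
    return [f"{tenant_id}#{i}" for i in range(shard_count)]


def merge_shard_results(per_shard_items: list[list[dict]]) -> list[dict]:
    """Combine the per-shard query results and restore eventTime order."""
    merged = [item for items in per_shard_items for item in items]
    return sorted(merged, key=lambda item: item["eventTime"])
```

A reader would issue one Query per key from `all_shard_keys` (ideally in parallel) and pass the results to `merge_shard_results`. The bounded suffix keeps that fan-out small and predictable.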

Exam trap

The trap here is that candidates often assume DAX alone can fix both read and write throttling, but DAX only caches reads and does not address the write-side partition bottleneck that causes throttling in the first place.

232

MCQ | hard

A batch analytics job currently uses two NAT gateways in each of three Availability Zones, but only one private subnet per AZ needs outbound internet access. What should the architect review first? The architecture review board prefers a managed AWS-native control.

A. Replacing every NAT gateway with an internet gateway attached to private subnets

B. Whether one NAT gateway per AZ is sufficient for the required private subnets

C. Disabling route tables

D. Moving all workloads to public subnets

Answer: B

NAT gateways are normally deployed per AZ for resilience; duplicate NAT gateways in the same AZ may be unnecessary.

Why this answer

The architecture currently uses two NAT gateways per AZ, but only one private subnet per AZ requires outbound internet access. Since a single NAT gateway in an AZ can serve all private subnets in that AZ via the route table, the first step is to verify whether one NAT gateway per AZ is sufficient for the required throughput and availability. This aligns with cost optimization by eliminating unnecessary NAT gateway hourly charges and data processing fees.

Exam trap

The trap here is that candidates assume more NAT gateways automatically mean better availability or performance, overlooking that a single NAT gateway per AZ is often sufficient and that cost optimization should be the first review priority when multiple gateways are deployed per AZ.

How to eliminate wrong answers

Option A is wrong because an internet gateway (IGW) attaches to a VPC, not to a subnet, and any subnet whose route table sends traffic directly to the IGW is by definition a public subnet; an IGW does not provide outbound-only access for private subnets. Option C is wrong because disabling route tables would break all network connectivity, not just outbound internet access, and is not a valid optimization technique. Option D is wrong because moving all workloads to public subnets would expose them directly to the internet, violating security best practices.

233

Multi-Select | hard

A media company serves versioned JavaScript and CSS files from Amazon S3 through CloudFront. After each release, the cache hit ratio drops sharply because the same distribution also fronts a personalized API path, and the current cache policy forwards cookies, all query strings, and several headers to every origin request. The static assets already use content-hashed filenames. Which two changes will most directly improve cache hit ratio for the static assets without changing the application behavior? Select two.

Select 2 answers

A. Create a dedicated cache behavior for the static asset path that excludes cookies, query strings, and unneeded headers from the cache key.

B. Keep the content-hashed filenames and send long Cache-Control max-age and immutable headers for the versioned objects.

C. Increase the size of the S3 bucket’s underlying storage to absorb more origin traffic.

D. Add Lambda@Edge logic to append a timestamp to every asset request so updates are always fetched immediately.

E. Disable compression so CloudFront can treat each object as a separate cache entry.

Answers: A, B

Separating the static asset behavior lets CloudFront cache those objects independently from the personalized API. Excluding cookies, query strings, and unnecessary headers prevents cache fragmentation, so many viewers can reuse the same cached object. This is the most direct way to raise hit ratio without altering how the application serves assets.

Why this answer

Option A is correct because creating a dedicated cache behavior for the static asset path (e.g., /static/*) allows you to configure a cache policy that excludes cookies, query strings, and unneeded headers from the cache key. Since the static assets use content-hashed filenames, they are immutable and do not vary by user-specific attributes. By removing these variables from the cache key, CloudFront can serve the same cached object to all users, drastically improving the cache hit ratio.
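The fragmentation effect can be illustrated with a toy model of a cache key. This is not CloudFront's actual key format; the function names, the `|` separator, and the sample requests are assumptions made purely to show how extra key components multiply cache entries.

```python
# Toy model: a cache key built from the path plus optional request attributes.
def cache_key(path: str, query: str = "", cookies: str = "",
              include_query: bool = True, include_cookies: bool = True) -> str:
    parts = [path]
    if include_query:
        parts.append(query)
    if include_cookies:
        parts.append(cookies)
    return "|".join(parts)


# Three viewers requesting the same content-hashed file (hypothetical values).
SAMPLE_REQUESTS = [
    ("/static/app.3f9c2a.js", "", "session=alice"),
    ("/static/app.3f9c2a.js", "", "session=bob"),
    ("/static/app.3f9c2a.js", "utm=mail", "session=carol"),
]


def distinct_keys(requests, **policy) -> int:
    """Count how many separate cache entries the requests would create."""
    return len({cache_key(p, q, c, **policy) for p, q, c in requests})
```

With cookies and query strings in the key, the three requests produce three cache entries (three origin fetches). With both excluded, they collapse to one entry that every viewer can reuse.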

Exam trap

The trap here is that candidates may think that content-hashed filenames alone guarantee high cache hit ratios, but they overlook that the shared cache policy forwarding cookies and query strings creates many unique cache keys for the same static file, negating the benefit of hashed filenames.

234

MCQ | medium

Company A runs an internal app in account A. The app needs to upload objects to an S3 bucket in account B. When the app calls S3, it receives AccessDenied for s3:PutObject. The team already created an IAM role in account B named UploadRole with a policy allowing s3:PutObject. They did not yet set up any trust relationship. Which change most directly fixes the access problem with least privilege?

A. Create IAM user access keys in account A and attach the UploadRole policy directly to those keys.

B. Update the trust policy on UploadRole (account B) to allow sts:AssumeRole from the app’s IAM role or principal in account A.

C. Add s3:PutObject permissions to the bucket policy in account B for all principals in account A.

D. Attach an SCP (service control policy) in AWS Organizations to deny sts:AssumeRole unless the caller uses an MFA device.

Answer: B

A cross-account role requires both an IAM permissions policy and a trust policy. The trust policy must allow the specific principal in account A to call sts:AssumeRole into account B’s role. With that trust in place, the app can obtain temporary credentials and then use the UploadRole permissions for s3:PutObject.

Why this answer

The app in account A needs to assume the UploadRole in account B to gain s3:PutObject permissions. Without a trust policy on UploadRole that allows sts:AssumeRole from the app's IAM principal in account A, the role cannot be assumed, and the S3 PutObject call fails with AccessDenied. Updating the trust policy is the most direct fix and follows least privilege by granting only the necessary cross-account role assumption.
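A trust policy of the kind described could be sketched as follows. The account ID and role name in `APP_ROLE_ARN` are hypothetical placeholders; the structure (a single Allow statement for sts:AssumeRole naming one principal) is the standard shape of an IAM role trust policy.

```python
import json

# Hypothetical ARN of the app's role in account A, for illustration only.
APP_ROLE_ARN = "arn:aws:iam::111111111111:role/AppRole"


def upload_role_trust_policy(principal_arn: str) -> dict:
    """Trust policy for UploadRole that lets exactly one principal assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Name the specific role rather than the whole account.
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(upload_role_trust_policy(APP_ROLE_ARN), indent=2))
```

Naming a single role ARN rather than the account root keeps the grant narrow. The app's role in account A also needs an identity policy that allows sts:AssumeRole on the UploadRole ARN.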

Exam trap

The trap here is that candidates often think bucket policies alone can grant cross-account access without considering the need for role assumption and trust policies, leading them to choose Option C as a simpler but overly permissive solution.

How to eliminate wrong answers

Option A is wrong because attaching the UploadRole policy directly to IAM user access keys in account A would create long-term credentials and violate least privilege, and the policy is defined in account B and cannot be attached to account A users; cross-account access requires role assumption, not direct policy attachment. Option C is wrong because adding s3:PutObject to the bucket policy for all principals in account A is overly permissive and does not leverage the existing UploadRole, violating least privilege by granting blanket access to the entire account A. Option D is wrong because an SCP denying sts:AssumeRole unless MFA is used would block the legitimate cross-account role assumption needed to fix the access problem, making the issue worse.

235

MCQ | medium

Your media processing pipeline writes original uploads to an S3 bucket and later generates derivative files. An operator accidentally deletes a subset of original uploads in production. You need to (1) restore the deleted objects with minimal data loss and (2) protect against both regional disasters and future operator mistakes. The company requires recovery even if objects are deleted and later overwritten. What is the most effective change to meet these requirements?

A. Enable S3 versioning on the bucket and configure cross-Region replication so previous versions are available after regional loss and accidental deletion.

B. Move all objects to S3 Glacier Instant Retrieval and apply a lifecycle policy to keep only the latest object copy.

C. Use S3 server-side encryption with KMS keys and rely on access logs to manually recover the deleted objects.

D. Enable S3 bucket policies that deny DeleteObject, but do not enable versioning or replication.

Answer: A

Versioning retains prior object versions, and cross-Region replication provides redundancy across Regions for recovery after deletion or disaster.

Why this answer

Option A is correct because enabling S3 Versioning preserves every object version: an overwrite creates a new version, and a delete adds a delete marker instead of removing data, so you can restore a deleted object by removing the delete marker. Cross-Region Replication (CRR), which requires versioning, copies new object versions to a bucket in a secondary Region, protecting against regional disasters. Together, they ensure recovery even if objects are deleted and later overwritten, meeting all requirements.
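The delete-marker behaviour can be illustrated with a small in-memory model. This is a toy class, not the S3 API; the class and method names are hypothetical, and it models only the version stack for each key.

```python
class VersionedBucket:
    """Toy model of a versioning-enabled bucket (not the real S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, body):
        # An overwrite appends a new version; older versions are kept.
        self._versions.setdefault(key, []).append(
            {"body": body, "delete_marker": False})

    def delete(self, key):
        # A simple delete adds a delete marker on top of the stack.
        self._versions.setdefault(key, []).append(
            {"body": None, "delete_marker": True})

    def get(self, key):
        stack = self._versions.get(key, [])
        if not stack or stack[-1]["delete_marker"]:
            raise KeyError(key)
        return stack[-1]["body"]

    def undelete(self, key):
        # Removing the delete marker(s) exposes the latest real version again.
        stack = self._versions.get(key, [])
        while stack and stack[-1]["delete_marker"]:
            stack.pop()

    def version_count(self, key):
        return len(self._versions.get(key, []))
```

In this model, putting two versions and then deleting the key leaves three entries in the stack. Calling `undelete` removes only the marker, after which `get` returns the second version and the first is still retained underneath.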

Exam trap

The trap here is that candidates may think a bucket policy denying DeleteObject is sufficient to prevent data loss, but it does not protect against overwrites, authorized user mistakes, or regional disasters, and without versioning, deleted objects are permanently lost.

How to eliminate wrong answers

Option B is wrong because moving objects to S3 Glacier Instant Retrieval does not provide versioning or replication; a lifecycle policy that keeps only the latest copy would permanently lose previous versions and deleted objects, failing the recovery requirement. Option C is wrong because S3 server-side encryption with KMS keys does not protect against deletion or overwrite; access logs only record events and cannot restore deleted objects. Option D is wrong because a bucket policy denying DeleteObject does not protect against overwrites or regional disasters, and anyone permitted to edit the bucket policy can remove the deny; without versioning or replication, deleted or overwritten objects are unrecoverable.

236

Matching | medium

Match the disaster recovery strategy to the recovery posture it best fits for a Regional outage.

Concepts

Backup and restore

Pilot light

Warm standby

Multi-site active/active

Matches

Lowest cost option where the environment is rebuilt from backups and hours of downtime are acceptable.

Keep only the critical core running in the secondary Region, then scale out after failover.

Run a scaled-down but functional environment in another Region for faster cutover.

Serve production traffic from more than one Region at the same time for the fastest recovery.

Why these pairings

The pairs match disaster recovery strategies to their typical recovery posture for a regional outage, based on AWS Well-Architected Framework and common DR patterns.

237

MCQ | hard

A claims portal must ensure that only encrypted EBS volumes can be created in the account. What is the strongest preventive control? The design must avoid adding custom operational scripts.

A. Tag encrypted volumes after creation

B. Enable VPC Flow Logs

C. Use an SCP that denies ec2:CreateVolume when the encrypted condition is false

D. Run a daily Lambda function to encrypt unencrypted volumes

Answer: C

An SCP can prevent noncompliant volume creation across accounts in an organization.

Why this answer

Option C is correct because an SCP (service control policy) in AWS Organizations can deny the ec2:CreateVolume API call when the ec2:Encrypted condition key is false. This is a preventive control that blocks the creation of unencrypted volumes before they exist, and it requires no custom operational scripts, aligning with the design constraint.
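As a sketch, an SCP of this kind can be expressed as the following policy document, built here as a Python dictionary for easy serialization. The `Sid` value and function name are illustrative; the action and the `ec2:Encrypted` condition key follow the pattern described above.

```python
import json


def deny_unencrypted_volumes_scp() -> dict:
    """SCP document that blocks creation of unencrypted EBS volumes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedEbsVolumes",  # illustrative name
                "Effect": "Deny",
                "Action": "ec2:CreateVolume",
                "Resource": "*",
                # Deny applies only when the request asks for no encryption.
                "Condition": {"Bool": {"ec2:Encrypted": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_unencrypted_volumes_scp(), indent=2))
```

Because SCPs only filter what accounts are allowed to do, this statement grants nothing by itself; it simply removes the ability to create unencrypted volumes in every account the policy is attached to.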

Exam trap

The trap here is that candidates often confuse detective or corrective controls (like tagging or Lambda remediation) with preventive controls, or they mistakenly think VPC Flow Logs can enforce encryption policies.

How to eliminate wrong answers

Option A is wrong because tagging encrypted volumes after creation is a detective or corrective control, not preventive; it does not stop unencrypted volumes from being created. Option B is wrong because VPC Flow Logs capture network traffic metadata and have no ability to enforce encryption policies on EBS volumes. Option D is wrong because running a daily Lambda function to encrypt unencrypted volumes is a reactive corrective control that relies on custom operational scripts, violating the 'avoid adding custom operational scripts' requirement.

238

MCQ | hard

A patient portal must use shared file storage across Linux EC2 instances in multiple Availability Zones. The storage must remain available during an AZ failure. Which service should be used? The design must avoid adding custom operational scripts.

A. Instance store volumes

B. Amazon EFS with mount targets in multiple Availability Zones

C. An EBS volume attached to all instances

D. S3 mounted as a POSIX file system without a file gateway

Answer: B

EFS is regional file storage and supports mount targets across AZs.

Why this answer

Amazon EFS provides a fully managed, scalable, and shared file system that can be mounted concurrently on multiple Linux EC2 instances across different Availability Zones. By creating mount targets in each AZ, the file system remains accessible even if one AZ fails, meeting the high availability requirement without custom scripts.

Exam trap

The trap here is that candidates may confuse EBS multi-attach (which is limited to specific instance types and a single AZ) with a true cross-AZ shared file system, or assume S3 with a FUSE mount provides POSIX compliance without operational overhead.

How to eliminate wrong answers

Option A is wrong because instance store volumes are ephemeral, tied to a single EC2 instance, and data is lost on instance stop or termination, making them unsuitable for shared, durable storage across AZs. Option C is wrong because a standard EBS volume can be attached to only one EC2 instance at a time and exists in a single AZ (EBS Multi-Attach is limited to Provisioned IOPS volumes on Nitro-based instances in the same AZ and is not a shared file system). Option D is wrong because S3 mounted as a POSIX file system (e.g., via s3fs) requires custom scripts and does not provide native POSIX consistency or locking, and using it without a file gateway introduces performance and reliability issues for shared file storage.

239

Multi-Select | medium

A development team stores application logs in Amazon CloudWatch Logs and has enabled detailed EC2 monitoring on every instance. Auditors require the logs to be retained for 90 days, but the operations team only needs the last 7 days to remain searchable in CloudWatch. Which two actions should they take to reduce monitoring cost? Select two.

Select 2 answers

A. Set the CloudWatch Logs retention period to 7 days and export or archive older logs in Amazon S3 or S3 Glacier for the remaining retention period.

B. Disable detailed EC2 monitoring and rely on basic monitoring unless a one-minute metric collection interval is specifically required.

C. Set the CloudWatch Logs retention period to 90 days so everything stays searchable in CloudWatch.

D. Send logs to DynamoDB because it is cheaper for long-term retention.

E. Enable detailed monitoring only during peak business hours on every instance.

Answers: A, B

CloudWatch Logs is the expensive searchable tier, so keeping only the last 7 days there reduces stored log volume and ongoing cost. Older logs can be archived to S3 or S3 Glacier to satisfy the 90-day retention requirement without paying CloudWatch prices for data that is rarely queried.

Why this answer

Option A is correct because CloudWatch Logs charges for stored log data. With a 7-day retention period, older log events are deleted from CloudWatch automatically, so storage charges there stop accruing for them. Exporting or archiving the logs to Amazon S3 or S3 Glacier before they expire satisfies the 90-day audit requirement at a much lower per-GB storage price. Option B is correct because detailed monitoring is billed per metric, while basic monitoring (5-minute intervals) is free, so turning it off wherever 1-minute metrics are not needed removes that charge.

Exam trap

The trap here is that candidates may think keeping logs searchable in CloudWatch for the full 90 days is simpler and cost-effective, overlooking the fact that CloudWatch Logs storage and search costs far exceed S3/Glacier for long-term retention, and that detailed monitoring is an independent cost driver unrelated to log retention.

240

MCQ | medium

A high-volume analytics dashboard writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate? The design must avoid adding custom operational scripts.

A. Amazon Route 53

B. Amazon EBS

C. Amazon Kinesis Data Streams

D. AWS DataSync

Answer: C

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the correct choice because it is designed for real-time streaming data ingestion and processing by multiple independent consumers. Each consumer can read from the stream at its own pace using its own shard iterator, enabling parallel processing of click events without custom scripts. This aligns with the requirement for a high-volume analytics dashboard where multiple downstream applications need to consume the same stream independently.
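The "own pace, own iterator" idea can be shown with a toy in-memory model. This is not the Kinesis API; the `Stream` and `Consumer` classes are hypothetical stand-ins in which a list index plays the role of a shard iterator.

```python
class Stream:
    """Toy append-only log standing in for a single shard."""

    def __init__(self):
        self._records = []

    def put_record(self, data):
        self._records.append(data)

    def read(self, position, limit=10):
        """Return up to `limit` records from `position` and the next position."""
        batch = self._records[position:position + limit]
        return batch, position + len(batch)


class Consumer:
    """Each consumer tracks its own position, like its own shard iterator."""

    def __init__(self, stream):
        self.stream = stream
        self.position = 0
        self.seen = []

    def poll(self, limit=10):
        batch, self.position = self.stream.read(self.position, limit)
        self.seen.extend(batch)
        return batch
```

Reading does not remove records, so a slow consumer that polls one record at a time and a fast one that polls in large batches both end up seeing every click event, in order, without affecting each other.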

Exam trap

The trap here is that candidates may confuse Kinesis Data Streams with Kinesis Data Firehose, but Firehose delivers data to a single destination and does not support multiple independent consumers natively.

How to eliminate wrong answers

Option A is wrong because Amazon Route 53 is a DNS and traffic management service, not designed for streaming data ingestion or processing. Option B is wrong because Amazon EBS provides block-level storage volumes for EC2 instances, not a streaming data platform for multiple consumers. Option D is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services, not for real-time streaming or multi-consumer processing.

241

MCQ | hard

A document portal needs low-latency full-text search across product descriptions and filtered attributes. Which managed service is most suitable?

A. Amazon OpenSearch Service

B. AWS Config

C. Amazon EFS

D. Amazon SQS

Answer: A

OpenSearch is designed for search and analytics over indexed text and structured fields.

Why this answer

Amazon OpenSearch Service is purpose-built for full-text search, log analytics, and real-time application monitoring. It provides low-latency indexing and querying of unstructured and semi-structured data, making it ideal for searching product descriptions and filtering attributes. The service uses a RESTful API and supports advanced query DSL for complex search operations.

Exam trap

The trap here is that candidates may confuse a storage service (EFS) or a messaging service (SQS) with a search service, or mistakenly think AWS Config can be used for text search because it stores configuration data in a queryable format.

How to eliminate wrong answers

Option B (AWS Config) is wrong because it is a resource inventory and compliance auditing service, not a search engine; it cannot perform full-text search across product descriptions. Option C (Amazon EFS) is wrong because it is a scalable file storage service for Linux-based workloads, not a search or indexing service; it provides shared file access but no search capabilities. Option D (Amazon SQS) is wrong because it is a fully managed message queuing service for decoupling application components, not a search service; it cannot index or query text data.

242

MCQ | easy

Based on the exhibit, the database must fail over automatically if the primary Availability Zone goes down. Which solution should the architect choose?

A. Create a read replica in the same Availability Zone as the primary database.

B. Convert the database to a Multi-AZ RDS deployment.

C. Increase the backup retention period to 35 days.

D. Move the database to an EC2 instance with an attached EBS volume.

Answer: B

A Multi-AZ RDS deployment keeps a synchronous standby in another Availability Zone and automatically fails over when the primary fails. This matches the requirement for minimal manual intervention and preserves the same database endpoint, so the application does not need connection string changes. It is the standard AWS choice for resilient relational databases.

Why this answer

A Multi-AZ RDS deployment synchronously replicates data to a standby instance in a different Availability Zone. If the primary AZ fails, Amazon RDS automatically fails over to the standby, ensuring high availability without manual intervention. This meets the requirement for automatic failover when the primary AZ goes down.
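The conversion itself is a single setting on the existing instance. A minimal sketch, assuming the boto3 RDS client's `modify_db_instance` operation is used and that the instance is called `example-db` (a hypothetical name), could prepare the request like this:

```python
def multi_az_request(db_instance_id, apply_immediately=False):
    """Build keyword arguments for an RDS ModifyDBInstance call that enables Multi-AZ."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MultiAZ": True,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


request = multi_az_request("example-db")
print(request)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("rds").modify_db_instance(**request)
```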

Exam trap

The trap here is that candidates often confuse read replicas (which are for read scaling and require manual promotion) with Multi-AZ deployments (which provide automatic failover), leading them to incorrectly select Option A.

How to eliminate wrong answers

Option A is wrong because a read replica in the same AZ does not provide automatic failover; it is designed for read scaling and requires manual promotion to become a primary. Option C is wrong because increasing the backup retention period to 35 days only affects point-in-time recovery duration, not failover capability. Option D is wrong because an EC2 instance with an attached EBS volume requires custom scripting or additional services (e.g., Auto Scaling, Elastic IP reassignment) to achieve automatic failover, and does not provide the managed, synchronous replication of Multi-AZ RDS.

243

MCQmedium

A company stores RDS database credentials in AWS Systems Manager Parameter Store as SecureString parameters. The security team requires that database passwords rotate automatically every 30 days. Which change should a solutions architect recommend?

A.Create a scheduled EventBridge rule to invoke a Lambda function that updates the Parameter Store SecureString value every 30 days

B.Migrate the credentials to AWS Secrets Manager and enable automatic rotation with a 30-day schedule

C.Enable Parameter Store SecureString automatic rotation in the AWS console

D.Configure AWS Config to detect password age and trigger an SNS notification after 30 days

AnswerB

Secrets Manager provides native automatic rotation for RDS with a managed Lambda function. This meets the requirement with minimal operational overhead.

Why this answer

AWS Secrets Manager provides native automatic rotation for RDS credentials using a managed Lambda function that rotates the secret on a defined schedule and updates the password in both the database and the secret.

Parameter Store SecureString does not support built-in automatic rotation — rotation must be implemented manually with custom automation. Secrets Manager is specifically designed for secrets requiring lifecycle management including rotation, auditing, and fine-grained access control.
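To make the 30-day schedule concrete, here is a small sketch that prepares the arguments for the Secrets Manager `RotateSecret` operation. The secret name and Lambda ARN are placeholders invented for this example, and the helper only builds the request; it does not call AWS.

```python
def rotation_request(secret_id, rotation_lambda_arn, days=30):
    """Build keyword arguments for a Secrets Manager RotateSecret call."""
    if days < 1:
        raise ValueError("rotation interval must be at least 1 day")
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": rotation_lambda_arn,
        "RotationRules": {"AutomaticallyAfterDays": days},
    }


request = rotation_request(
    "prod/app/rds-credentials",  # hypothetical secret name
    "arn:aws:lambda:us-east-1:111122223333:function:rds-rotation",  # placeholder ARN
)
print(request)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("secretsmanager").rotate_secret(**request)
```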

Exam trap

Both services encrypt values using KMS, which causes candidates to treat them as equivalent. Only Secrets Manager provides automatic rotation with managed Lambda integration and rotation history. Parameter Store is appropriate for configuration values and static secrets.

Whenever automatic rotation is a security policy requirement, Secrets Manager is the answer.

Why the other options are wrong

Option A: Creating a custom EventBridge rule + Lambda for rotation works but requires development and maintenance effort. It lacks native rotation history and is more complex than the purpose-built Secrets Manager solution.

Option C: There is no built-in automatic rotation toggle in Parameter Store. This feature does not exist in the Parameter Store console; automatic rotation is a Secrets Manager capability.

Option D: AWS Config detects and alerts on compliance drift but cannot automatically rotate a database password. SNS notification is a detection mechanism, not a remediation mechanism.

244

MCQhard

Based on the exhibit, an application in the same AWS account must upload and read objects in an S3 bucket encrypted with a customer managed KMS key, but GetObject fails with an AccessDenied error from AWS KMS. The IAM role already has s3:GetObject, s3:PutObject, kms:Decrypt, and kms:GenerateDataKey permissions. What change most directly fixes the issue while preserving least privilege?

A.Add an S3 bucket ACL that grants the application role full control over objects.

B.Update the KMS key policy to allow the application role to use the key, ideally with a kms:ViaService condition for S3.

C.Replace the customer managed key with the AWS managed S3 key so IAM permissions become sufficient.

D.Add an S3 bucket policy that grants s3:GetObject and s3:PutObject to the role for all objects.

AnswerB

KMS key policy must explicitly trust the principal. Adding a role-scoped statement with kms:ViaService keeps access limited to S3 use only.

Why this answer

The error is an AccessDenied from AWS KMS, not from S3, which means the IAM role has the required S3 permissions (s3:GetObject) and KMS API permissions (kms:Decrypt), but the KMS key policy does not explicitly grant the role access to the key. Since customer managed KMS keys require a key policy to grant IAM principals permission to use the key (IAM policies alone are insufficient unless the key policy delegates such authority), updating the key policy to allow the role with a kms:ViaService condition for S3 directly resolves the KMS-side denial while preserving least privilege.
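A key policy statement along these lines is one way to express the fix. This is a sketch only: the role ARN, Region, and `Sid` are placeholders chosen for the example, and a real key policy would also keep its existing administrative statements.

```python
import json


def s3_only_key_statement(role_arn, region):
    """Key policy statement letting one role use the key, but only through S3."""
    return {
        "Sid": "AllowAppRoleViaS3Only",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
        "Condition": {
            "StringEquals": {"kms:ViaService": f"s3.{region}.amazonaws.com"}
        },
    }


statement = s3_only_key_statement(
    "arn:aws:iam::111122223333:role/app-role",  # placeholder role ARN
    "us-east-1",
)
print(json.dumps(statement, indent=2))
```

The `kms:ViaService` condition means direct calls to KMS with the same role credentials are not covered by this statement, which is what keeps the grant narrow.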

Exam trap

The trap here is that candidates see 'AccessDenied' and assume the S3 bucket policy or ACL is missing, when the error message explicitly states it is from AWS KMS, meaning the fix must be at the KMS key policy level, not the S3 resource policy.

How to eliminate wrong answers

Option A is wrong because S3 bucket ACLs control access to S3 objects themselves, not KMS key permissions; the error is from KMS, not S3, so an ACL cannot fix a KMS AccessDenied. Option C is wrong because switching to the AWS managed key for S3 (aws/s3) might make the error disappear, but it abandons the customer managed key used in the scenario and gives up control over the key policy, so it neither fixes the actual misconfiguration nor preserves least privilege. Option D is wrong because the IAM role already has s3:GetObject and s3:PutObject permissions, and the error is from KMS, not S3; adding a bucket policy for the same S3 actions does not address the missing KMS key policy grant.

245

MCQmedium

A global application experiences frequent writes and must survive a full Regional outage with near-zero data loss. The product team also requires that users can continue to write during the incident using the closest Region. Which approach is most aligned with these requirements?

A.Use an active/active design with multi-Region data replication (for example, global tables for the write-heavy datastore) and route traffic to multiple Regions based on health and latency.

B.Use warm standby with periodic backups of the primary write datastore every 24 hours.

C.Use pilot light where the secondary Region runs only infrastructure templates and starts data replication only after detecting failure.

D.Use a single-writer model in one Region and deploy read-only replicas in the other Region for continuity.

AnswerA

Active/active supports writing in multiple Regions and reduces the blast radius of a Regional failure while enabling continued operations.

Why this answer

Option A is correct because an active/active design with multi-Region data replication, such as DynamoDB global tables, allows writes to occur in any Region and replicates data asynchronously across Regions, typically within about a second. This keeps data loss near zero (an RPO of seconds) and preserves write availability during a full Regional outage, while Route 53 latency-based routing combined with health checks directs users to the closest healthy Region.
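The comparison between the four options can be written down as a small check. The RPO figures below are illustrative assumptions for this sketch (about one second for continuous replication, 24 hours for daily backups, unbounded when replication starts only after a failure), not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    name: str
    worst_case_rpo_seconds: float  # assumed worst-case data loss window
    writes_during_outage: bool     # can users keep writing in another Region?


def meets_requirements(option, max_rpo_seconds=5.0):
    """Near-zero data loss AND continued writes during a Regional outage."""
    return option.writes_during_outage and option.worst_case_rpo_seconds <= max_rpo_seconds


options = [
    DrOption("A: active/active, multi-Region replication", 1.0, True),
    DrOption("B: warm standby, 24-hour backups", 24 * 3600, False),
    DrOption("C: pilot light, replication starts after failure", float("inf"), False),
    DrOption("D: single writer, read-only replicas", 1.0, False),
]

passing = [o.name for o in options if meets_requirements(o)]
print(passing)
```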

Exam trap

The trap here is that candidates often confuse 'read-only replicas' (which cannot accept writes) with 'multi-Region write replicas' (which can), leading them to choose Option D despite its inability to support writes during an outage.

How to eliminate wrong answers

Option B is wrong because warm standby with 24-hour periodic backups cannot achieve near-zero data loss; the RPO would be up to 24 hours, and writes would stop during failover. Option C is wrong because pilot light starts data replication only after failure detection, so recent writes in the failed Region are never copied out and users cannot write while the secondary is being brought up. Option D is wrong because a single-writer model with read-only replicas prevents writes during a Regional outage, violating the requirement that users continue to write during the incident.

246

MCQmedium

A log archive serves infrequently accessed user documents that must be available immediately when requested. Which S3 storage class is likely the best cost fit?

A.Instance store volumes

B.S3 Standard-IA or S3 One Zone-IA depending on resilience requirements

C.S3 Standard for all objects

D.S3 Glacier Deep Archive

AnswerB

Infrequent Access classes reduce storage cost while keeping millisecond retrieval.

Why this answer

S3 Standard-IA or S3 One Zone-IA is the best cost fit because the workload involves infrequently accessed data that requires immediate retrieval (millisecond latency). Standard-IA offers lower storage cost than S3 Standard while maintaining high durability and low-latency access, and One Zone-IA provides even lower cost for data that can tolerate a single-AZ failure. Both classes meet the 'available immediately' requirement, unlike Glacier tiers which have retrieval delays.

Exam trap

The trap here is that candidates often confuse 'infrequently accessed' with 'archival' and choose Glacier Deep Archive, forgetting that the requirement for immediate availability eliminates any Glacier tier due to its retrieval delays.

How to eliminate wrong answers

Option A is wrong because instance store volumes are ephemeral block storage attached to EC2 instances, not an S3 storage class, and they lose data on instance stop/termination, making them unsuitable for a durable archive. Option C is wrong because S3 Standard is designed for frequently accessed data with higher storage cost per GB, leading to unnecessary expense for infrequently accessed objects. Option D is wrong because S3 Glacier Deep Archive restores take up to 12 hours (standard) or 48 hours (bulk), which violates the 'available immediately' requirement.

247

MCQeasy

A company stores user uploads in an S3 bucket. Objects are accessed rarely after upload, but when an object is accessed, it must be retrievable quickly (minutes to a few hours). Objects must be retained for at least 18 months. The team wants to reduce storage cost while meeting these requirements. Which lifecycle configuration best fits these requirements?

A.Keep all objects in S3 Standard permanently to avoid lifecycle transition fees.

B.After 30 days, transition objects to S3 Glacier Instant Retrieval, and after 18 months, expire (delete) the objects.

C.After 30 days, transition objects to S3 Intelligent-Tiering, and set expiration to 12 months.

D.After 30 days, transition objects to S3 Glacier Deep Archive, and set expiration to 18 months.

AnswerB

The prompt requires (1) cost reduction for data that becomes infrequently accessed, (2) quick retrieval when the data is accessed again, and (3) retention for at least 18 months. Glacier Instant Retrieval is intended for data that is accessed occasionally and needs fast retrieval. Transitioning after 30 days moves the long-term, rarely accessed portion of the data to a cheaper class, while expiring at 18 months satisfies the explicit retention requirement (the objects remain for at least 18 months).

Why this answer

Option B is correct because it transitions objects to S3 Glacier Instant Retrieval after 30 days, which provides millisecond retrieval for rarely accessed data, meeting the quick retrieval requirement. The 18-month expiration ensures compliance with the retention policy while minimizing storage costs compared to keeping data in S3 Standard.
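For reference, a lifecycle configuration matching Option B might look like the sketch below, using the shape accepted by the S3 `PutBucketLifecycleConfiguration` operation. The rule ID is made up, and 548 days is an assumed approximation of 18 months; a team with a strict retention policy would pick the exact day count it needs.

```python
def lifecycle_configuration(transition_days=30, retention_days=548):
    """Lifecycle rules: move to Glacier Instant Retrieval, then expire."""
    if retention_days <= transition_days:
        raise ValueError("expiration must come after the transition")
    return {
        "Rules": [
            {
                "ID": "uploads-to-glacier-ir",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies to every object
                "Transitions": [
                    {"Days": transition_days, "StorageClass": "GLACIER_IR"}
                ],
                "Expiration": {"Days": retention_days},
            }
        ]
    }


config = lifecycle_configuration()
print(config)
```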

Exam trap

The trap here is that candidates may confuse retrieval time requirements: S3 Glacier Deep Archive is cheaper but has retrieval times of hours, not minutes, and S3 Intelligent-Tiering is for unpredictable access, not for data that is rarely accessed after upload.

How to eliminate wrong answers

Option A is wrong because keeping all objects in S3 Standard permanently ignores the cost-saving opportunity of lifecycle transitions; S3 Standard is more expensive for rarely accessed data, and the per-request lifecycle transition charges are small compared with the storage savings over 18 months. Option C is wrong because S3 Intelligent-Tiering is designed for unpredictable access patterns, not for data that is rarely accessed after upload, and setting expiration to 12 months violates the 18-month retention requirement. Option D is wrong because S3 Glacier Deep Archive restores take up to 12 hours (standard) or 48 hours (bulk), which does not meet the requirement of retrieval within minutes to a few hours.

248

MCQeasy

An organization hosts the same public API in two AWS Regions. Normal traffic should go to the primary Region. If the primary endpoint becomes unhealthy, Route 53 should automatically route users to the secondary Region. What is the best Route 53 configuration approach?

A.Use simple routing with one record that contains both regions as weighted targets.

B.Use weighted routing and set the secondary Region weight to 0 until needed.

C.Use Route 53 failover routing with health checks that mark the primary as unhealthy and fail over to the secondary.

D.Use latency-based routing so requests go to the region with the lowest latency, regardless of health.

AnswerC

Failover routing is designed for active/passive disaster recovery. You configure a primary record and a secondary record, each associated with health checks. When the primary fails its health checks, Route 53 automatically resolves the name to the secondary target.

Why this answer

Route 53 failover routing is designed for active-passive configurations where traffic is directed to a primary resource unless a health check marks it as unhealthy, at which point all traffic automatically shifts to the secondary resource. This directly matches the requirement of routing normal traffic to the primary Region and failing over to the secondary Region only when the primary endpoint becomes unhealthy.
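The primary/secondary pair can be sketched as two record sets in a Route 53 change batch. The domain name, IP addresses (taken from documentation ranges), and health check ID below are placeholders for this example.

```python
def failover_record(name, role, target_ip, health_check_id=None):
    """Build one change for a failover record set; role is PRIMARY or SECONDARY."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-region",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": target_ip}],
    }
    if health_check_id is not None:
        # The primary needs a health check so Route 53 can detect failure.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Changes": [
        failover_record("api.example.com.", "PRIMARY", "203.0.113.10",
                        health_check_id="hc-primary-placeholder"),
        failover_record("api.example.com.", "SECONDARY", "198.51.100.10"),
    ]
}
print(change_batch)
```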

Exam trap

The trap here is that candidates often confuse weighted routing with failover routing, mistakenly thinking that setting a weight of 0 on the secondary is a valid way to keep it inactive until needed. As written, that option attaches no health checks, so nothing shifts traffic automatically; failover routing is the policy built for an active-passive pattern.

How to eliminate wrong answers

Option A is wrong because simple routing does not support health checks or automatic failover; it simply returns all IP addresses in a random order, which cannot enforce a primary-secondary failover pattern. Option B is wrong because, with no health checks attached, a secondary weight of 0 keeps traffic away from the secondary Region even during a failure, and manually changing weights defeats the purpose of automatic failover. Option D is wrong because latency-based routing selects the Region with the lowest latency for each user, which does not guarantee that the primary Region handles normal traffic, and the option explicitly ignores endpoint health.

249

Multi-Selectmedium

A team is splitting a new workload into two fronts. The first front serves HTTPS microservices that need host- and path-based routing plus health checks. The second front must handle TCP and UDP traffic for a real-time service and preserve static IP addresses for firewall allowlisting. Which two AWS load balancer choices best match these requirements? Select two.

Select 2 answers

A.Application Load Balancer

B.Network Load Balancer

C.Amazon API Gateway

D.Amazon CloudFront

E.Gateway Load Balancer

AnswersA, B

Application Load Balancer supports HTTP and HTTPS routing with host- and path-based rules, making it ideal for microservices.

Why this answer

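The Application Load Balancer (ALB) is correct because it supports host-based and path-based routing for HTTP/HTTPS traffic, which is essential for the microservices front. It also provides health checks at the target group level, enabling automatic routing away from unhealthy instances. ALB operates at Layer 7, making it ideal for the HTTPS microservices requirement. The Network Load Balancer (NLB) is correct for the second front because it operates at Layer 4, supports TCP and UDP listeners, and exposes one static IP address per Availability Zone (optionally an Elastic IP), which is what firewall allowlisting needs.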

Exam trap

The trap here is that candidates often confuse the Gateway Load Balancer (GWLB) with the Network Load Balancer (NLB). GWLB exists to insert third-party virtual appliances such as firewalls transparently into the traffic path; it is not a client-facing load balancer for TCP/UDP services and does not give clients static IP addresses to allowlist.

250

MCQmedium

Developers for an e-learning platform need temporary elevated access to production resources for troubleshooting. The security team wants approvals, expiry, and audit logging. Which approach is best?

A.Disable CloudTrail during troubleshooting

B.Use IAM Identity Center permission sets with time-bound access processes and CloudTrail auditing

C.Attach AdministratorAccess permanently to every developer role

D.Create shared administrator access keys for the team

AnswerB

Federated access with permission sets and audited temporary assignments reduces standing privilege.

Why this answer

IAM Identity Center permission sets let you define fine-grained permissions and assign them to users or groups, with a configurable session duration on each permission set. Approvals and expiry come from a temporary elevated access process layered on top, which grants the assignment after approval and removes it when the window ends. Combined with CloudTrail, every API call made during the elevated session is logged for audit, meeting the security team's requirements for approvals, expiry, and audit logging.

Exam trap

The trap here is that candidates may think IAM roles with a trust policy and temporary credentials are sufficient, but they overlook that IAM Identity Center centralizes permission sets, supports time-bound assignment processes with approvals, and integrates with CloudTrail for auditing, which is the best fit for the given requirements.

How to eliminate wrong answers

Option A is wrong because disabling CloudTrail during troubleshooting would eliminate audit logging, directly violating the security team's requirement for audit logging. Option C is wrong because permanently attaching AdministratorAccess to every developer role grants unrestricted, persistent elevated access with no expiry or approval process, violating the principle of least privilege and the need for time-bound access. Option D is wrong because creating shared administrator access keys for the team removes individual accountability, prevents proper audit trails (as actions cannot be attributed to a specific user), and provides no expiry or approval mechanism.

251

MCQmedium

A claims workflow uses an RDS MySQL database and must remain available during an Availability Zone failure with minimal application changes. What should the architect enable? The team wants the control to be enforceable during normal operations.

A.S3 Cross-Region Replication

B.Multi-AZ deployment for the RDS DB instance

C.EBS snapshots every hour

D.Read replicas only

AnswerB

Multi-AZ provides synchronous standby replication and automatic failover within a Region.

Why this answer

Multi-AZ deployment for RDS MySQL provides synchronous standby replication across two Availability Zones. In the event of an AZ failure, Amazon RDS automatically fails over to the standby in the other AZ, ensuring availability with minimal application changes (the same database endpoint is used). This meets the requirement for enforceability during normal operations because Multi-AZ is always active, not a manual or scheduled process.

Exam trap

The trap here is that candidates often confuse read replicas (which only handle read traffic and require manual promotion) with Multi-AZ (which provides automatic failover for both reads and writes), or they assume EBS snapshots provide high availability rather than just backup.

How to eliminate wrong answers

Option A is wrong because S3 Cross-Region Replication is for object storage in S3, not for RDS MySQL databases, and it does not provide automatic failover for a relational database. Option C is wrong because EBS snapshots every hour provide point-in-time recovery but do not enable automatic failover during an AZ failure; they require manual restoration and result in data loss up to one hour. Option D is wrong because read replicas only support read traffic and do not provide automatic failover for write operations; they require manual promotion and application changes to redirect writes.

252

MCQmedium

A dev sandbox runs for several hours each night and can be interrupted and restarted. Which EC2 purchasing option should minimize cost?

A.On-Demand Instances only

B.Spot Instances

C.Dedicated Hosts

D.Provisioned IOPS volumes

AnswerB

Spot Instances offer deep discounts for interruptible workloads.

Why this answer

Spot Instances can be interrupted and restarted, making them ideal for fault-tolerant workloads like a nightly dev sandbox. They offer significant cost savings (up to 90% off On-Demand) because they use spare AWS EC2 capacity, which aligns perfectly with the scenario's tolerance for interruption.

Exam trap

The trap here is that candidates confuse 'interruptible' with 'unreliable' and choose On-Demand for stability, missing that Spot Instances are explicitly designed for fault-tolerant, non-critical workloads like a nightly dev sandbox.

How to eliminate wrong answers

Option A is wrong because On-Demand Instances charge the full rate in exchange for capacity that is never interrupted, a guarantee this workload does not need since it can be stopped and restarted, so the cost is higher than necessary. Option C is wrong because Dedicated Hosts provide physical servers for licensing or compliance needs, which is overkill and expensive for a simple dev sandbox. Option D is wrong because Provisioned IOPS volumes are a storage option (EBS), not an EC2 purchasing option, and do not directly affect compute cost optimization.

253

Multi-Selecthard

A third-party payroll vendor in another AWS account must assume a role in your account to write a daily settlement file to Amazon S3. You want to prevent confused-deputy attacks and make every assumed session traceable in CloudTrail back to an individual vendor user. Which three trust-policy or session controls should be used? Select three.

Select 3 answers

A.Specify the exact vendor role ARN as the trusted principal in the role trust policy.

B.Require an external ID in the trust policy conditions.

C.Require sts:SourceIdentity when the vendor assumes the role.

D.Use a wildcard principal and rely on the S3 bucket policy to narrow access later.

E.Give the vendor long-term IAM user credentials in your account for easier auditing.

AnswersA, B, C

The trust policy should name only the specific vendor role that is allowed to assume the role in your account. Restricting the principal minimizes the trust boundary and prevents unrelated identities from attempting the assumption path.

Why this answer

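Option A is correct because specifying the exact vendor role ARN as the trusted principal in the trust policy ensures that only that specific role in the vendor's account can assume the role, preventing any other entity from impersonating the vendor. This is a key control to limit the trust boundary. Option B is correct because an external ID condition is the standard confused-deputy defense: the vendor must pass the identifier agreed for your account on every AssumeRole call. Option C is correct because requiring a source identity forces the vendor to set a per-user value that is recorded in CloudTrail and persists for the session, which ties each assumed session back to an individual vendor user.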
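Put together, the three controls could be expressed in a trust policy roughly like the sketch below. The vendor role ARN and external ID are placeholders, and using `StringLike` with `*` to require that some source identity is present is one possible way to write that condition, shown here as an assumption rather than the only valid form.

```python
import json


def vendor_trust_policy(vendor_role_arn, external_id):
    """Trust policy: exact principal, external ID, and a required source identity."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Control A: trust one specific vendor role, never a wildcard.
                "Principal": {"AWS": vendor_role_arn},
                # SetSourceIdentity must be allowed for the vendor to pass one.
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
                "Condition": {
                    # Control B: confused-deputy protection.
                    "StringEquals": {"sts:ExternalId": external_id},
                    # Control C: the request must carry some source identity.
                    "StringLike": {"sts:SourceIdentity": "*"},
                },
            }
        ],
    }


policy = vendor_trust_policy(
    "arn:aws:iam::444455556666:role/payroll-writer",  # placeholder vendor role
    "example-external-id",                            # placeholder external ID
)
print(json.dumps(policy, indent=2))
```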

Exam trap

The trap here is that candidates often think a bucket policy alone can control role assumption, but it cannot—the trust policy is the only mechanism to restrict which external principals can assume a role, and confused-deputy protections require explicit conditions like external ID and source identity.

254

MCQmedium

Based on the exhibit, the web application must remain available even if one Availability Zone fails. What is the best change to improve resilience with the least redesign?

A.Increase DesiredCapacity to 4 while keeping all instances in subnet-a1.

B.Add subnet-b1 in a different Availability Zone to the Auto Scaling group.

C.Replace the Application Load Balancer with a Network Load Balancer.

D.Enable EBS encryption on the launch template volumes.

AnswerB

This spreads EC2 instances across two Availability Zones, so the Auto Scaling group can continue serving traffic if one AZ becomes unavailable. Because the ALB is already deployed in both subnets, this is the smallest change that adds true zonal resilience to the compute tier.

Why this answer

Adding subnet-b1 in a different Availability Zone to the Auto Scaling group ensures that EC2 instances are launched across two Availability Zones. If one zone fails, the ALB can route traffic to healthy instances in the other zone, maintaining application availability. This change requires minimal redesign because it only modifies the Auto Scaling group's subnet configuration without altering the load balancer or compute architecture.
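An Auto Scaling group's subnets are held in a comma-separated `VPCZoneIdentifier` string, so the change amounts to appending one subnet ID. The helper below is a sketch of that edit using the subnet names from the question; the group name in the comment is hypothetical.

```python
def add_subnet(vpc_zone_identifier, new_subnet_id):
    """Return the comma-separated subnet list with new_subnet_id included once."""
    subnets = [s.strip() for s in vpc_zone_identifier.split(",") if s.strip()]
    if new_subnet_id not in subnets:
        subnets.append(new_subnet_id)
    return ",".join(subnets)


updated = add_subnet("subnet-a1", "subnet-b1")
print(updated)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("autoscaling").update_auto_scaling_group(
#       AutoScalingGroupName="web-asg",  # hypothetical group name
#       VPCZoneIdentifier=updated,
#   )
```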

Exam trap

The trap here is that candidates may think increasing instance count or changing load balancer type improves resilience, but without multi-AZ distribution, a single AZ failure still causes a total outage.

How to eliminate wrong answers

Option A is wrong because increasing DesiredCapacity to 4 while keeping all instances in subnet-a1 does not provide multi-AZ resilience; a single Availability Zone failure would still take all instances offline. Option C is wrong because replacing the Application Load Balancer with a Network Load Balancer does not inherently improve resilience against Availability Zone failures; both ALB and NLB support multi-AZ deployments, but the NLB operates at Layer 4 and lacks Layer 7 features like path-based routing, which may be required for the web application. Option D is wrong because enabling EBS encryption on the launch template volumes protects data at rest but does not affect availability or resilience against an Availability Zone failure.

255

MCQmedium

An API team runs an AWS Lambda function behind an Application Load Balancer (ALB). During predictable hourly traffic spikes, p95 response latency increases due to occasional cold starts. The team wants stable latency during those spikes without permanently overprovisioning resources for all functions. Which configuration is the most appropriate way to reduce cold starts for this Lambda function?

A.Publish a version of the function and configure provisioned concurrency on an alias, using autoscaling for the alias.

B.Increase the function memory size and rely on faster initialization to reduce cold starts.

C.Set reserved concurrency equal to the expected peak requests per second for the function.

D.Use an event source mapping with a higher batch size so Lambda triggers earlier and keeps the runtime warm.

AnswerA

Provisioned concurrency pre-initializes execution environments for a specific published function version. By attaching provisioned concurrency to an alias, you can control warm capacity and (with the right settings) autoscale the provisioned capacity for predictable spike patterns, reducing cold-start-driven latency increases.

Why this answer

Provisioned concurrency initializes a specified number of execution environments in advance, keeping them warm and ready to handle requests without cold start latency. By configuring provisioned concurrency on an alias with autoscaling, the team can dynamically adjust the number of pre-warmed environments to match predictable traffic spikes, avoiding permanent overprovisioning while ensuring stable p95 latency.
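For predictable hourly spikes, one option is scheduled scaling of the alias's provisioned concurrency through Application Auto Scaling. The sketch below only builds the request arguments; the function name, alias, capacities, and the assumption that spikes begin at the top of each hour are all illustrative.

```python
def scalable_target(function_name, alias, min_capacity, max_capacity):
    """Arguments for Application Auto Scaling's RegisterScalableTarget."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def pre_warm_action(function_name, alias, capacity):
    """Arguments for PutScheduledAction: raise capacity five minutes before each hour."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": "pre-warm-before-hourly-spike",  # hypothetical name
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "Schedule": "cron(55 * * * ? *)",  # minute 55 of every hour (UTC)
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


target = scalable_target("orders-api", "live", 1, 50)
action = pre_warm_action("orders-api", "live", 50)
print(target)
print(action)
```

A second scheduled action that lowers capacity after the spike would complete the pattern and avoid paying for idle warm environments.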

Exam trap

The trap here is confusing reserved concurrency (which limits concurrency but does not prevent cold starts) with provisioned concurrency (which pre-warms environments), leading candidates to select Option C as a cost-saving measure that fails to address latency.

How to eliminate wrong answers

Option B is wrong because increasing memory size can reduce initialization time for some runtimes (e.g., Java, .NET) but does not eliminate cold starts; it only shortens the duration, not the occurrence, and may not provide stable latency during spikes. Option C is wrong because reserved concurrency caps the maximum concurrent executions but does not pre-warm environments; it only prevents resource contention, leaving cold starts intact. Option D is wrong because event source mappings are used with stream-based triggers (e.g., DynamoDB Streams, Kinesis), not with ALB invocations, and higher batch sizes do not keep the runtime warm—they simply process more records per invocation.

256

MCQmedium

A marketing site has EC2 instances that are oversized based on CPU, memory, and network utilisation. Which AWS service should identify rightsizing recommendations? The design must avoid adding custom operational scripts.

A.AWS Shield

B.AWS Compute Optimizer

C.AWS DataSync

D.AWS Artifact

AnswerB

Compute Optimizer analyses utilisation metrics and recommends rightsizing for supported resources.

Why this answer

AWS Compute Optimizer analyzes historical utilization metrics of EC2 instances from CloudWatch (CPU and network by default, plus memory when the CloudWatch agent publishes it) and generates rightsizing recommendations to reduce cost without compromising performance. It uses machine learning to identify over-provisioned resources and suggests instance type changes, all without requiring custom scripts.

Exam trap

The trap here is that candidates may confuse AWS Compute Optimizer with AWS Trusted Advisor, but Trusted Advisor provides general cost optimization checks while Compute Optimizer specifically delivers granular, ML-driven rightsizing recommendations for EC2 and other compute resources.

How to eliminate wrong answers

Option A is wrong because AWS Shield is a managed DDoS protection service, not a resource optimization or rightsizing tool. Option C is wrong because AWS DataSync is used for transferring large amounts of data between on-premises storage and AWS services, not for analyzing EC2 utilization or recommending instance sizes. Option D is wrong because AWS Artifact provides on-demand access to AWS compliance reports and agreements, not cost optimization or rightsizing recommendations.

257

MCQeasy

Several EC2 instances in different Availability Zones need to read and write the same shared file system. The file storage should stay available if one AZ has a problem. Which service should the team choose?

A.Amazon EBS

B.Amazon EFS

C.Amazon S3 only

D.Instance store

AnswerB

Amazon EFS is a managed shared file system that can be mounted by multiple EC2 instances across multiple Availability Zones. It is a strong fit when applications need the same files at the same time and must remain available even if one AZ experiences issues. The service is highly available by design and reduces operational work compared with self-managed file servers.

Why this answer

Amazon EFS provides a fully managed, scalable, and elastic NFS file system that can be mounted concurrently by multiple EC2 instances across different Availability Zones. It is designed for high availability and durability by storing data redundantly across multiple AZs within a region, ensuring continued access even if one AZ fails.

Exam trap

The trap here is that candidates often confuse EBS Multi-Attach (which only supports a limited number of instances in the same AZ and requires a cluster-aware file system) with the true multi-AZ shared file system capability of EFS.

How to eliminate wrong answers

Option A is wrong because Amazon EBS volumes are tied to a single Availability Zone and cannot be attached to EC2 instances in different AZs simultaneously; they also lack native multi-AZ file sharing. Option C is wrong because Amazon S3 is an object storage service, not a shared file system; it does not support POSIX file locking or concurrent read/write operations typical of a file system. Option D is wrong because instance store provides ephemeral block storage that is physically attached to the host EC2 instance, cannot be shared across instances, and data is lost if the instance stops or terminates.

258

MCQmedium

A SaaS vendor will access your AWS resources by assuming an IAM role in your account. You want to prevent confused-deputy attacks and ensure the vendor can only assume the role using an agreed external identifier. Your role trust policy currently allows sts:AssumeRole from the vendor’s principal, but it does not include any external ID protection. Which change is the best next step?

A.Add a condition to the trust policy: Condition = {"StringEquals": {"sts:ExternalId": "vendor-agreed-id"}}.

B.Add a condition to the trust policy: Condition = {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}}.

C.Remove sts:AssumeRole and replace it with sts:AssumeRoleWithWebIdentity to use the vendor’s browser-based tokens.

D.Add a condition to the role permissions policy (not the trust policy) requiring aws:PrincipalTag/ExternalId to equal the external identifier.

AnswerA

Using sts:ExternalId in the trust policy ensures only assume-role requests presenting the correct external identifier are allowed. This directly mitigates confused-deputy attacks by binding authorization to a value the vendor must know. It also keeps the permissions model clean, because the check is enforced during the STS AssumeRole request.

Why this answer

Option A is correct because the `sts:ExternalId` condition key is specifically designed to prevent confused-deputy problems. By adding `{"StringEquals": {"sts:ExternalId": "vendor-agreed-id"}}` to the trust policy, you ensure that the vendor must pass the agreed external ID in the `AssumeRole` API call. Because the vendor uses a different external ID for each of its customers, another customer of the vendor cannot trick it into assuming the role in your account on their behalf.
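As a rough illustration of the trust policy described above, the following Python sketch builds the policy document as a plain dictionary. The helper name `build_trust_policy`, the account ID `111122223333`, and the external ID value are placeholders chosen for this example, not values from the question.

```python
import json


def build_trust_policy(vendor_principal_arn: str, external_id: str) -> dict:
    """Return a role trust policy that only allows AssumeRole with the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The vendor's AWS principal (placeholder ARN in the call below).
                "Principal": {"AWS": vendor_principal_arn},
                "Action": "sts:AssumeRole",
                # The request is allowed only if it carries the agreed external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy("arn:aws:iam::111122223333:root", "vendor-agreed-id")
print(json.dumps(policy, indent=2))
```

The condition sits in the trust policy, so a request without the matching external ID is rejected at the AssumeRole call itself.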

Exam trap

The trap here is that candidates often confuse where to place the condition (trust policy vs. permissions policy) or mistakenly think IP-based restrictions or changing the API action are appropriate solutions for confused-deputy prevention.

How to eliminate wrong answers

Option B is wrong because restricting by source IP (`aws:SourceIp`) does not prevent confused-deputy attacks: the vendor's requests come from the same infrastructure no matter which customer it is acting for, so an IP check cannot tell whether the vendor was tricked by another customer, and the vendor's IP ranges may change over time. Option C is wrong because `sts:AssumeRoleWithWebIdentity` is used for federated users with web identity tokens (e.g., from Cognito, Google, Facebook), not for a vendor's AWS principal assuming a role; the vendor needs `sts:AssumeRole` to use their IAM role or user. Option D is wrong because the condition must be in the trust policy (the resource-based policy that controls who can assume the role), not in the role's permissions policy (which controls what the role can do after assumption); `aws:PrincipalTag/ExternalId` would test a tag attached to the calling principal, not the external ID passed in the `AssumeRole` request.

259

MCQhard

A warehouse integration service must process every event at least once, but duplicate processing is acceptable if the consumer handles idempotency. Which eventing approach is most suitable?

A.Use CloudFront signed URLs

B.Use Amazon SQS standard queue and design consumers to be idempotent

C.Use UDP messages sent directly to workers

D.Use an in-memory queue on one EC2 instance

AnswerB

SQS standard queues provide at-least-once delivery and high throughput; consumers must handle occasional duplicates.

Why this answer

Amazon SQS standard queues provide at-least-once delivery, meaning each message is delivered at least once but can occasionally be delivered more than once. This matches the requirement to process every event at least once, and since duplicate processing is acceptable when consumers are idempotent, the standard queue is the most suitable and cost-effective choice. SQS also decouples the warehouse integration service from its consumers, improving resilience and scalability.
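To make the idempotency requirement concrete, here is a minimal sketch of a consumer that tolerates duplicate deliveries. It assumes each message carries a unique `event_id` field (a hypothetical field name), and it keeps the set of processed IDs in memory purely for illustration; a real consumer would need a durable store for that set.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each event once, even if the queue delivers it several times."""

    processed_ids: set = field(default_factory=set)
    stock: dict = field(default_factory=dict)

    def handle(self, message: dict) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = message["event_id"]
        if event_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        sku, quantity = message["sku"], message["quantity"]
        self.stock[sku] = self.stock.get(sku, 0) + quantity
        self.processed_ids.add(event_id)
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"event_id": "e1", "sku": "widget", "quantity": 5},
    {"event_id": "e2", "sku": "widget", "quantity": 3},
    {"event_id": "e1", "sku": "widget", "quantity": 5},  # redelivered by the queue
]
results = [consumer.handle(m) for m in deliveries]
print(results, consumer.stock)
```

The redelivered `e1` message is acknowledged without changing the stock count, which is the behavior at-least-once delivery requires from the consumer.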

Exam trap

The trap here is that candidates may confuse 'at-least-once' with 'exactly-once' and incorrectly choose FIFO queues or other options, but the question explicitly accepts duplicates if idempotency is handled, making the standard queue the correct and simpler choice.

How to eliminate wrong answers

Option A is wrong because CloudFront signed URLs are used to control access to content delivered via CloudFront, not for event processing or message queuing; they provide no delivery guarantee mechanism. Option C is wrong because UDP is a connectionless, unreliable transport protocol that does not guarantee message delivery, order, or duplicate prevention, making it unsuitable for at-least-once processing. Option D is wrong because an in-memory queue on a single EC2 instance creates a single point of failure and lacks durability; if the instance fails, all queued events are lost, violating the requirement to process every event at least once.

260

MCQeasy

A team needs a relational database solution that can automatically fail over to a standby instance if the primary database becomes unavailable. They want the standby to be located in a different Availability Zone. Which RDS/Aurora configuration best satisfies this requirement?

A.Single-AZ DB deployment and rely on manual snapshot restore during failures.

B.Multi-AZ deployment with an automatically managed standby in a different Availability Zone and automatic failover.

C.Enable read replicas only, and promote a replica manually when the primary fails.

D.Enable point-in-time recovery (PITR) without configuring any Multi-AZ standby.

AnswerB

RDS/Aurora Multi-AZ deployments maintain a standby instance in a separate AZ. When configured for Multi-AZ, RDS/Aurora can perform automatic failover to the standby, meeting both the “different AZ” and “automatic failover” requirements.

Why this answer

Option B is correct because a Multi-AZ RDS deployment automatically provisions and maintains a standby instance in a different Availability Zone, and the failover is handled automatically by AWS without manual intervention. This meets the requirement for automatic failover to a standby in a different AZ, which is the core purpose of Multi-AZ deployments.

Exam trap

The trap here is that candidates often confuse read replicas with Multi-AZ standby, thinking that promoting a read replica provides automatic failover, but read replicas require manual promotion and do not serve as a synchronous standby.

How to eliminate wrong answers

Option A is wrong because a Single-AZ deployment has no standby instance, and manual snapshot restore requires significant downtime and manual steps, failing the automatic failover requirement. Option C is wrong because read replicas are designed for read scaling, not automatic failover; promoting a read replica manually introduces downtime and does not provide automatic failover to a standby. Option D is wrong because point-in-time recovery (PITR) only enables restoring to a specific time from backups, not automatic failover to a standby instance in a different AZ.

261

MCQeasy

Your web application runs on EC2 instances behind an Application Load Balancer (ALB). During traffic spikes, p95 response time increases, but average CPU utilization remains below 40%. The current Auto Scaling policy scales based on average CPU%. What should you change to improve performance during spikes?

A.Keep scaling on CPU% to avoid over-scaling

B.Scale on a request-driven metric such as ALB RequestCount per target (or target-group request rate)

C.Disable scaling and manually increase capacity during business hours

D.Scale only when network packet drops fall below a threshold

AnswerB

A request-driven metric correlates directly with incoming workload pressure. Scaling on request rate helps ensure enough capacity is added before request queues build up, which can reduce p95 response time even when CPU remains low.

Why this answer

The p95 response time is increasing during traffic spikes while CPU utilization remains low, indicating that the bottleneck is not compute capacity but rather request handling or connection overhead. By scaling on ALB RequestCountPerTarget, you directly target the metric causing latency—each target's request load—rather than an indirect metric like CPU. This ensures that new instances are launched precisely when individual targets are overwhelmed by requests, reducing queueing delays and improving response times.
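The sketch below shows what a target-tracking configuration on this metric might look like, together with a rough capacity estimate. The dictionary shape follows the `TargetTrackingConfiguration` argument of the Auto Scaling `put_scaling_policy` API; the resource label, the target of 1,000 requests per target, and the helper names are assumptions made for this example. The sizing formula is a simplification of how target tracking behaves, not the exact algorithm.

```python
import math


def target_tracking_config(resource_label: str, requests_per_target: float) -> dict:
    """Build a target-tracking configuration keyed on ALB requests per target."""
    return {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # Identifies the load balancer and target group (placeholder below).
            "ResourceLabel": resource_label,
        },
        "TargetValue": requests_per_target,
    }


def instances_needed(total_requests: float, requests_per_target: float) -> int:
    """Rough estimate: enough targets to keep each one at or below the target value."""
    return max(1, math.ceil(total_requests / requests_per_target))


config = target_tracking_config(
    "app/my-alb/0123456789abcdef/targetgroup/my-targets/fedcba9876543210",  # hypothetical
    1000.0,
)
needed = instances_needed(12_500, 1000.0)
print(config["PredefinedMetricSpecification"]["PredefinedMetricType"], needed)
```

With 12,500 requests in the metric period and a target of 1,000 per instance, the estimate asks for 13 instances regardless of how low CPU utilization is.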

Exam trap

The trap here is that candidates assume high latency always means high CPU, but AWS tests the understanding that p95 latency can spike due to request queueing even when CPU is idle, making request-based scaling the correct choice over CPU-based scaling.

How to eliminate wrong answers

Option A is wrong because continuing to scale on CPU% ignores the actual symptom (high p95 latency with low CPU), leading to under-provisioning during request bursts. Option C is wrong because manual scaling during business hours is not elastic and cannot react to unpredictable traffic spikes, violating the principle of auto scaling for performance. Option D is wrong because scaling on network packet drops is irrelevant to the described issue (low CPU, high latency) and packet drops typically indicate network congestion or buffer exhaustion, not request overload on the application layer.

262

MCQmedium

A test environment runs on x86 EC2 instances and uses open-source software with no architecture-specific licensing restriction. What should be evaluated to reduce compute cost?

A.Cross-Region data replication for all data

B.AWS Graviton-based instances after performance testing

C.io2 Block Express volumes for all instances

D.Dedicated Hosts by default

AnswerB

Graviton instances often provide better price performance for compatible workloads.

Why this answer

AWS Graviton-based instances (e.g., M6g, C6g) use Arm-based custom AWS silicon, offering up to 40% better price-performance compared to comparable x86 instances for many workloads. Since the test environment runs open-source software with no architecture-specific licensing restrictions, migrating to Graviton after performance testing can significantly reduce compute costs without compatibility issues.

Exam trap

The trap here is that candidates may assume all cost optimization involves reducing instance size or using Spot Instances, but the question specifically tests knowledge of architecture-specific cost savings with Graviton when no licensing restrictions exist.

How to eliminate wrong answers

Option A is wrong because cross-Region data replication increases data transfer and storage costs, not compute costs, and is a data durability/disaster recovery feature, not a cost optimization for compute. Option C is wrong because io2 Block Express volumes are high-performance, high-cost SSD volumes designed for latency-sensitive workloads, not a compute cost reduction strategy; they would increase storage costs unnecessarily for a test environment. Option D is wrong because Dedicated Hosts incur additional per-host charges and are intended for licensing or compliance requirements (e.g., Windows Server with dedicated licensing), not for general compute cost reduction; using them by default would increase costs.

263

MCQmedium

A trading dashboard runs on EC2 instances behind an Application Load Balancer. The design must tolerate the failure of one Availability Zone. What should the Auto Scaling group configuration include? The architecture review board prefers a managed AWS-native control.

A.A single EC2 instance with detailed monitoring

B.Subnets in at least two Availability Zones with health checks enabled

C.All instances in one larger subnet

D.A Network Load Balancer in one subnet

AnswerB

An Auto Scaling group spanning multiple AZs can replace unhealthy instances and maintain capacity during an AZ failure.

Why this answer

Option B is correct because distributing EC2 instances across at least two Availability Zones (AZs) ensures that the application remains available if one AZ fails. The Auto Scaling group must include subnets in multiple AZs and use health checks (e.g., ELB health checks) to automatically replace unhealthy instances. This configuration meets the requirement for fault tolerance and aligns with AWS-managed best practices for high availability.

Exam trap

The trap here is that candidates often confuse 'scaling' with 'resilience' and think that a single large subnet or a different load balancer type (NLB) provides AZ fault tolerance, but only multi-AZ subnet configuration with health checks ensures automatic recovery from an AZ failure.

How to eliminate wrong answers

Option A is wrong because a single EC2 instance, even with detailed monitoring, cannot tolerate the failure of an Availability Zone; it represents a single point of failure. Option C is wrong because a subnet exists in exactly one Availability Zone, so placing all instances in one larger subnet confines them to a single AZ and provides no AZ-level fault tolerance. Option D is wrong because a Network Load Balancer in one subnet is itself limited to a single AZ and does nothing to distribute instances across AZs; changing the load balancer type does not change where the Auto Scaling group launches instances.

264

MCQhard

A media archive needs low-latency full-text search across product descriptions and filtered attributes. Which managed service is most suitable? The design must avoid adding custom operational scripts.

A.AWS Config

B.Amazon OpenSearch Service

C.Amazon EFS

D.Amazon SQS

AnswerB

OpenSearch is designed for search and analytics over indexed text and structured fields.

Why this answer

Amazon OpenSearch Service is the correct choice because it provides managed, low-latency full-text search capabilities with support for filtering on structured attributes (e.g., product categories, price ranges). It indexes JSON documents and exposes a RESTful API for search queries, eliminating the need for custom operational scripts while meeting the media archive's requirements.

Exam trap

The trap here is that candidates might confuse AWS Config's resource tracking or EFS's file storage with search capabilities, overlooking that OpenSearch Service is the only managed option purpose-built for full-text search and filtering.

How to eliminate wrong answers

Option A is wrong because AWS Config is a service for auditing and evaluating resource configurations against compliance rules, not for full-text search or indexing product descriptions. Option C is wrong because Amazon EFS is a scalable NFS file system for shared storage, not a search engine; it cannot perform low-latency full-text queries across text content. Option D is wrong because Amazon SQS is a managed message queue for decoupling application components, not a search or indexing service, and it does not support querying stored data.

265

MCQhard

Based on the exhibit, which storage choice best matches the workload requirements?

A.Use io2 EBS volumes because they provide the highest durable block storage performance.

B.Use instance store NVMe for the temporary processing workspace.

C.Use Amazon EFS for the workspace so the temporary files survive instance replacement.

D.Use S3 as the working directory and read and write the intermediate files directly there.

AnswerB

Instance store fits a high-IOPS scratch workload where data can be lost safely and rebuilt from S3. The benchmark shows extremely low latency and very high random I/O performance, which is ideal for intermediate transcode files. Because the job can be retried from the source object, persistence is not needed on the local workspace.

Why this answer

Instance store NVMe volumes provide temporary, ephemeral block storage directly attached to the EC2 instance, offering extremely low latency and high throughput for temporary processing workspaces. Since the workload requires a temporary workspace where data does not need to persist beyond the instance lifecycle, instance store is the optimal choice because it avoids the cost and overhead of durable storage while delivering the highest performance for scratch data.

Exam trap

The trap here is that candidates often choose durable storage options like EBS or EFS because they are familiar and seem 'safer,' failing to recognize that the workload explicitly requires a temporary workspace where data does not need to persist, making instance store the most performant and cost-effective choice.

How to eliminate wrong answers

Option A is wrong because io2 EBS volumes are designed for durable, persistent block storage with high IOPS and durability, which is unnecessary and cost-inefficient for temporary processing data that does not require persistence. Option C is wrong because Amazon EFS is a durable, shared network file system; the workload does not need its temporary files to survive instance replacement, so EFS adds cost and network latency without benefit. Option D is wrong because using S3 as a working directory for intermediate files introduces significant per-request latency and overhead: S3 is accessed through an object API over HTTP and objects cannot be modified in place, making it unsuitable for high-frequency random read/write operations in a temporary processing workspace.

266

Drag & Dropmedium

Arrange the steps to troubleshoot an EC2 instance that is unreachable via SSH.

Why this order

Start with security group, then NACL, routing, public IP, and finally OS-level logs.

267

MCQeasy

A single EC2 instance hosts a database that needs low-latency block storage and a persistent volume that remains attached to the instance. Which AWS storage service is the best fit?

A.Amazon S3

B.Amazon EBS

C.Amazon EFS

D.AWS Storage Gateway

AnswerB

EBS provides persistent block storage that can be attached to an EC2 instance with low latency.

Why this answer

Amazon EBS provides persistent block-level storage volumes that can be attached to a single EC2 instance, offering low-latency performance suitable for database workloads. Unlike ephemeral instance store volumes, EBS volumes persist independently of the instance's lifecycle, so data survives an instance stop and, when the volume's DeleteOnTermination attribute is disabled, instance termination as well.

Exam trap

The trap here is confusing Amazon EBS with instance store volumes, where candidates might think instance store is persistent, but EBS is the correct choice for persistent block storage that survives instance stops and terminations.

How to eliminate wrong answers

Option A is wrong because Amazon S3 is an object storage service accessed via HTTP/HTTPS APIs, not a block storage device, and cannot be attached as a local volume to an EC2 instance for low-latency database operations. Option C is wrong because Amazon EFS is a file-level storage service using NFSv4.1, designed for shared access across multiple instances, not for low-latency block storage attached to a single instance. Option D is wrong because AWS Storage Gateway is a hybrid storage service that bridges on-premises environments with AWS cloud storage (e.g., via iSCSI or NFS), not a native block storage volume directly attachable to an EC2 instance.

268

MCQmedium

An event-driven order processing service consumes messages from an Amazon SQS Standard queue. After a deployment, about 1% of messages start failing validation because a required field is missing. The consumer catches the exception and returns control, so the messages are retried. However, those poison messages keep reappearing and repeatedly consuming processing time for hours, delaying handling of valid messages. What is the most resilient way to handle the poison messages while keeping the system available?

A.Set the consumer visibility timeout to a very large value so failing messages are hidden for hours.

B.Configure an SQS redrive policy to send messages to a dead-letter queue (DLQ) after a limited number of receives (maxReceiveCount).

C.Switch the SQS queue from Standard to FIFO so poison messages do not retry.

D.Increase the consumer concurrency indefinitely so the system processes all messages even if some fail validation.

AnswerB

A DLQ redrive policy creates a deterministic stop condition for poison messages. After maxReceiveCount, the messages are moved to the DLQ instead of cycling in the main queue, preventing repeated failed deliveries from degrading capacity and availability for valid messages.

Why this answer

Option B is correct because configuring an SQS redrive policy with a maxReceiveCount (e.g., 3–5) automatically moves messages that repeatedly fail processing to a dead-letter queue (DLQ) after the specified number of receives. This isolates the poison messages so they stop cycling through the source queue and tying up consumer capacity, while valid messages are handled without delay. The DLQ can then be analyzed or reprocessed offline, maintaining system availability.
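The effect of maxReceiveCount can be sketched with a small in-memory model. This is a simplified simulation, not the SQS API: the function `drain`, the `validate_order` handler, and the `order_id` field are hypothetical, and a message is moved to the dead-letter list once it has failed `max_receive_count` receives.

```python
from collections import deque


def validate_order(body: dict) -> None:
    """Hypothetical handler: reject messages that lack the required field."""
    if "order_id" not in body:
        raise ValueError("missing required field: order_id")


def drain(messages, handler, max_receive_count: int):
    """Process messages, moving repeated failures to a dead-letter list."""
    queue = deque((body, 0) for body in messages)
    done, dead_letter = [], []
    while queue:
        body, receives = queue.popleft()
        receives += 1  # each receive increments the message's receive count
        try:
            handler(body)
            done.append(body)
        except ValueError:
            if receives >= max_receive_count:
                dead_letter.append(body)  # stop retrying the poison message
            else:
                queue.append((body, receives))  # becomes visible again later
    return done, dead_letter


messages = [{"order_id": 1}, {"sku": "x"}, {"order_id": 2}]
done, dead_letter = drain(messages, validate_order, max_receive_count=3)
print(done, dead_letter)
```

The invalid message is retried a bounded number of times and then set aside, so the two valid orders are never stuck behind it.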

Exam trap

The trap here is that candidates may think increasing visibility timeout or concurrency solves the problem, but they fail to recognize that only a dead-letter queue permanently isolates poison messages from the processing pipeline.

How to eliminate wrong answers

Option A is wrong because setting the consumer visibility timeout to a very large value would hide failing messages for hours, but they would still reappear after the timeout expires, continuing the cycle of retries and delays without resolving the issue. Option C is wrong because switching from Standard to FIFO does not prevent poison messages from retrying; FIFO queues still retry messages on failure and require a DLQ for poison handling, and they also sacrifice throughput and ordering flexibility. Option D is wrong because increasing consumer concurrency indefinitely does not address the root cause—poison messages will still be retried and consume processing slots, potentially overwhelming the system and delaying valid messages further.

269

Multi-Selecthard

A photo studio stores original project archives in Amazon S3. Objects are read heavily for 14 days after upload, occasionally during the next 11 months, and almost never after one year. The team wants the lowest storage cost while keeping retrieval within minutes during the first year. Which three actions are best? Select three.

Select 3 answers

A.Keep new objects in S3 Standard for the first 14 days.

B.Transition objects to S3 Standard-IA after 14 days.

C.Transition objects to S3 Glacier Flexible Retrieval after 14 days.

D.Transition objects to S3 Glacier Deep Archive after one year.

E.Disable versioning to make the lifecycle rules work correctly.

AnswersA, B, D

Correct. Standard is appropriate for the initial hot-access period because the data is read frequently and needs immediate performance. Using a cheaper archive tier too early would increase retrieval latency and likely access costs.

Why this answer

A is correct because S3 Standard is designed for frequently accessed data with low latency and high throughput, making it ideal for the first 14 days when objects are read heavily. After this period, transitioning to S3 Standard-IA reduces storage costs while still providing millisecond retrieval for occasional access during the next 11 months.

Exam trap

The trap here is that candidates might choose Glacier Flexible Retrieval for the 14-day transition, overlooking that its retrieval time (minutes to hours) does not meet the 'within minutes' requirement for the first year, whereas Standard-IA provides both cost savings and instant retrieval.

270

MCQhard

An order processing API must ensure that only encrypted EBS volumes can be created in the account. What is the strongest preventive control?

A.Run a daily Lambda function to encrypt unencrypted volumes

B.Enable VPC Flow Logs

C.Use an SCP that denies ec2:CreateVolume when the encrypted condition is false

D.Tag encrypted volumes after creation

AnswerC

An SCP can prevent noncompliant volume creation across accounts in an organization.

Why this answer

Option C is correct because service control policies (SCPs) are a preventive control that can deny the ec2:CreateVolume API call when the encryption condition key (ec2:Encrypted) is false. This blocks the creation of unencrypted EBS volumes in the affected accounts regardless of what IAM policies allow, because an explicit deny in an SCP cannot be overridden by an IAM allow. SCPs are attached at the AWS Organizations root, OU, or account level and apply to every IAM user and role in the member accounts, including the account root user, making them the strongest preventive mechanism.
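A minimal sketch of such an SCP, expressed as a Python dictionary and serialized to JSON, is shown below. The `Sid` value is an arbitrary label chosen for this example. Note that this sketch only covers the ec2:CreateVolume action named in the question; volumes created as part of an instance launch go through ec2:RunInstances and would need their own statement.

```python
import json

# Deny volume creation whenever the request does not ask for encryption.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedVolumes",  # arbitrary label for this example
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": "*",
            "Condition": {"Bool": {"ec2:Encrypted": "false"}},
        }
    ],
}

scp_json = json.dumps(scp, indent=2)
print(scp_json)
```

Because the statement is an explicit deny, it takes effect even for principals whose IAM policies allow ec2:CreateVolume.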

Exam trap

The trap here is confusing detective/reactive controls (like Lambda remediation) with preventive controls (like SCPs), leading candidates to choose a solution that fixes the problem after it occurs rather than blocking it entirely.

How to eliminate wrong answers

Option A is wrong because running a daily Lambda function to encrypt unencrypted volumes is a detective/reactive control, not a preventive one; it does not block the creation of unencrypted volumes and leaves a window of exposure. Option B is wrong because VPC Flow Logs capture network traffic metadata (IP addresses, ports, protocols) and have no ability to enforce encryption policies on EBS volumes; they are a monitoring tool, not a preventive control. Option D is wrong because tagging encrypted volumes after creation is a labeling action that does not prevent unencrypted volumes from being created; it is a detective or organizational control, not a preventive one.

271

MCQhard

A claims workflow uses Amazon SQS. Poison messages are repeatedly failing and blocking useful retries. What should the architect configure? The architecture review board prefers a managed AWS-native control.

A.A FIFO queue without a redrive policy

B.Short polling instead of long polling

C.A dead-letter queue with an appropriate maxReceiveCount

D.A larger message retention period only

AnswerC

A DLQ isolates messages that fail repeatedly so they can be investigated without disrupting normal processing.

Why this answer

A dead-letter queue (DLQ) with an appropriate maxReceiveCount allows messages that repeatedly fail processing to be moved out of the source queue after a specified number of receive attempts. This prevents poison messages from blocking useful retries and is a fully managed AWS-native pattern. The architecture review board's preference for a managed solution is satisfied because SQS DLQs are a built-in feature requiring no custom code.
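As a sketch of the configuration involved, the snippet below builds the `RedrivePolicy` queue attribute, which SQS expects as a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`. The helper name, the queue ARN, and the value 5 are assumptions for illustration; the resulting dictionary is in the shape that could be passed as the `Attributes` argument of the SQS `set_queue_attributes` call.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Build the queue attribute that attaches a dead-letter queue to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS takes the redrive policy as a JSON-encoded string attribute.
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN for illustration only.
attrs = redrive_attributes("arn:aws:sqs:us-east-1:111122223333:claims-dlq", 5)
print(attrs["RedrivePolicy"])
```

Choosing the count is a trade-off: too low and transient failures land in the DLQ, too high and poison messages keep occupying consumers.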

Exam trap

The trap here is that candidates may confuse a DLQ with simply increasing retention or changing polling behavior, not realizing that poison messages require explicit isolation via a separate queue and a maxReceiveCount threshold to stop infinite retries.

How to eliminate wrong answers

Option A is wrong because a FIFO queue without a redrive policy does not automatically handle poison messages; without a DLQ, failed messages remain in the queue and continue to block retries. Option B is wrong because short polling has no effect on message failure handling; it queries only a subset of SQS servers on each request, which can return empty responses even when messages exist and increases the number of API calls. Option D is wrong because increasing the message retention period only keeps messages longer without removing failing ones; poison messages would still be retried until they expire, continuing to block useful retries.

272

Multi-Selectmedium

An application uses an Amazon RDS Multi-AZ DB instance. During a failover test, connections fail until the application is restarted, even though the database comes back online. Which two changes should the team make to improve resilience during failover? Select two.

Select 2 answers

A.Cache and reconnect to the current writer IP address to avoid DNS lookups during failover.

B.Use the RDS endpoint name instead of hard-coding the current instance IP or hostname in the application.

C.Switch to a read replica and let it promote manually after every outage.

D.Add retry logic with exponential backoff for transient connection and DNS resolution errors.

E.Disable connection pooling so each request opens a fresh socket during normal operation.

AnswersB, D

The RDS endpoint abstracts the underlying writer instance. When failover occurs, AWS updates the endpoint to point at the new writer, so the application should reconnect by using the managed name rather than a fixed IP or hostname.

Why this answer

Option B is correct because the RDS endpoint is a DNS name that automatically resolves to the current writer instance's IP address. During a failover, the DNS record is updated to point to the new primary, so using the endpoint instead of a hard-coded IP or hostname allows the application to reconnect without manual intervention. Option D is correct because adding retry logic with exponential backoff handles transient failures during DNS resolution and connection establishment, which are common during the brief period when the DNS TTL has not yet expired after a failover.
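The retry behavior described in Option D might look like the following sketch. The function `connect_with_backoff`, its parameters, and the fake `flaky_connect` callable are all hypothetical; a real application would pass a function that opens a database connection using the RDS endpoint name, so that each attempt performs a fresh DNS lookup.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, max_delay=8.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:  # covers ConnectionError and DNS resolution errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # jitter avoids synchronized retries


# Simulated failover: the first two attempts fail, the third succeeds.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("writer not reachable yet")
    return "connected"


delays = []
result = connect_with_backoff(flaky_connect, sleep=delays.append)
print(result, attempts["count"], len(delays))
```

The `sleep` parameter is injectable only so the example can run without actually waiting; by default the function uses `time.sleep`.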

Exam trap

The trap here is that candidates often think caching the IP (Option A) improves performance, but it actually breaks failover resilience because the application never learns the new writer's address after a failover.

273

Multi-Selectmedium

An internal rendering job runs on EC2 workers in an Auto Scaling group. Each job writes checkpoints every few minutes to S3 and can resume from the latest checkpoint after an interruption. The queue depth varies sharply, and the team wants the lowest possible compute cost. Which two changes should they make? Select two.

Select 2 answers

A.Run the worker fleet on EC2 Spot Instances.

B.Purchase Dedicated Hosts so the fleet keeps physical servers reserved for the workload.

C.Use a Mixed Instances Policy with several compatible instance types and Spot capacity-optimized allocation.

D.Run the entire fleet on On-Demand Instances to avoid any interruption risk.

E.Move the workers to AWS Outposts to keep compute close to the data.

AnswersA, C

Spot Instances usually provide the lowest EC2 compute price and fit workloads that can tolerate interruption. Because the job checkpoints to S3, the application can resume after Spot interruptions without losing all progress.

Why this answer

Option A is correct because Spot Instances can be interrupted with a two-minute warning, and since the rendering job writes checkpoints to S3 every few minutes and can resume from the latest checkpoint, it is fault-tolerant to interruptions. This allows the team to leverage the significantly lower cost of Spot Instances (up to 90% off On-Demand) while maintaining job completion, achieving the lowest compute cost for variable queue depths.
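Options A and C can be combined in a single Auto Scaling group definition. The sketch below builds a `MixedInstancesPolicy` dictionary in the shape accepted by the Auto Scaling `create_auto_scaling_group` API; the launch template name and the instance types are placeholders picked for this example, and setting both On-Demand values to zero is an assumption that the whole fleet may run on Spot.

```python
def mixed_instances_policy(launch_template_name: str, instance_types: list) -> dict:
    """Build an all-Spot mixed instances policy across several instance types."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several compatible types give Spot more capacity pools to draw from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,  # 0% On-Demand, 100% Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


# Placeholder template name and instance types for illustration only.
policy = mixed_instances_policy("render-worker", ["c5.2xlarge", "c5a.2xlarge", "c6i.2xlarge"])
print(policy["InstancesDistribution"])
```

Listing several instance types matters because an interruption in one Spot capacity pool can then be absorbed by launching from another.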

Exam trap

The trap here is that candidates often choose On-Demand Instances (Option D) to avoid interruption risk, overlooking that the checkpointing mechanism makes Spot Instances viable and far more cost-effective, or they select Dedicated Hosts (Option B) thinking physical isolation improves reliability, but it actually increases cost without benefit for this fault-tolerant workload.

274

MCQhard

Based on the exhibit, which design change is the best way to reduce the observed read latency for this DynamoDB-backed service?

A.Add a DynamoDB Accelerator (DAX) cluster in front of the table and send repeated read traffic through it.

B.Increase the on-demand table limits so DynamoDB can automatically absorb more traffic.

C.Create a global secondary index on tenantId to distribute the load across more partitions.

D.Move the dashboard data into S3 and use Lambda functions to read it on demand.

Answer: A

DAX is designed to accelerate repeated eventually consistent reads from DynamoDB by caching hot items in memory. The exhibit shows one tenant driving most of the reads and the same dashboard items being requested repeatedly within a short window, which is an excellent fit for DAX. It reduces latency and offloads the hot key without requiring a schema redesign.

Why this answer

Adding a DynamoDB Accelerator (DAX) cluster in front of the table reduces read latency by providing an in-memory cache that serves repeated read requests with microsecond response times, without going to the underlying DynamoDB table. This directly addresses the observed latency for frequently accessed data: DAX is optimized for read-heavy workloads and caches eventually consistent reads, while strongly consistent reads pass through to DynamoDB and are not served from the cache.
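The effect of a read-through cache on a hot key can be illustrated with a toy model. This is not the DAX client API; the class, the key name, and the TTL value are all assumptions made for the illustration.

```python
import time


class ReadThroughCache:
    """Toy read-through cache with a TTL (illustrative only, not the DAX client)."""

    def __init__(self, load, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load          # function that reads from the backing table
        self._ttl = ttl_seconds    # arbitrary TTL chosen for the example
        self._clock = clock
        self._items = {}           # key -> (value, time the value was cached)

    def get(self, key):
        entry = self._items.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # cache hit: the table is not touched
        value = self._load(key)    # cache miss: read through to the table
        self._items[key] = (value, now)
        return value


table_reads = []


def read_from_table(key):
    """Stand-in for a DynamoDB GetItem call; records every read that reaches the table."""
    table_reads.append(key)
    return {"pk": key, "widgets": 12}


cache = ReadThroughCache(read_from_table)
for _ in range(1000):
    cache.get("tenant-42#dashboard")  # hypothetical hot key

print(len(table_reads))
```

Only the first request reaches the table; the remaining repeated reads are served from memory, which is the pattern the exhibit describes.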

Exam trap

The trap here is that candidates confuse increasing throughput capacity (Option B) with reducing latency, not realizing that extra capacity does not lower DynamoDB's per-request read latency and that caching (DAX) is the correct solution for repeated read-heavy workloads.

How to eliminate wrong answers

Option B is wrong because increasing on-demand table limits does not inherently reduce read latency; on-demand scaling handles throughput capacity but does not improve the per-request latency of DynamoDB's storage layer. Option C is wrong because creating a global secondary index (GSI) on tenantId distributes read load across partitions but does not cache data; it still requires reading from DynamoDB's storage, which does not reduce latency for repeated reads. Option D is wrong because moving dashboard data to S3 and using Lambda to read it on demand introduces additional latency from S3 GET requests and Lambda cold starts, which is typically slower than DynamoDB's single-digit millisecond reads, especially for repeated access patterns.

275

Multi-Select | medium

A company is building a serverless application using AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. The application must meet strict security and compliance requirements. The company needs to ensure that all data stored in DynamoDB is encrypted at rest using a customer-managed key, that the Lambda function can only access the specific DynamoDB table it needs, and that API requests are authenticated and authorized. Which of the following actions should the company take? (Choose four.)

Select 4 answers

A.Create an AWS KMS customer managed key and configure DynamoDB to use it for encryption at rest.

B.Attach an IAM role to the Lambda function with a policy that grants dynamodb:GetItem and dynamodb:PutItem actions on the specific DynamoDB table ARN.

C.Configure API Gateway to use an IAM authorizer to allow only authenticated AWS users or roles to invoke the API.

D.Store the customer managed key directly in the Lambda function's environment variables for easy access.

E.Enable DynamoDB Streams and configure the Lambda function to process stream records without any additional security configuration.

F.Use AWS KMS grants in the Lambda function's IAM policy to allow it to use the customer managed key for encryption operations.

Why this answer

Creating an AWS KMS customer managed key and configuring DynamoDB to use it for encryption at rest ensures that the data is encrypted using a key that the company controls, meeting strict compliance requirements. Attaching an IAM role to the Lambda function with a policy that grants only the necessary DynamoDB actions on the specific table ARN follows the principle of least privilege. Configuring API Gateway to use an IAM authorizer ensures that only authenticated AWS users or roles can invoke the API, providing authentication and authorization.

Using AWS KMS grants in the Lambda function's IAM policy allows the function to use the customer managed key for encryption operations, which is necessary for encrypting and decrypting data in DynamoDB when using client-side encryption or when the key is used for other cryptographic operations.
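A least-privilege policy for the Lambda execution role might look like the sketch below. The account ID, Region, and table name in the ARN are hypothetical placeholders.

```python
import json

# Hypothetical table ARN used only for illustration.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:111122223333:table/Orders"

lambda_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Only the two actions the question names, nothing broader.
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            # Scoped to one table rather than "*".
            "Resource": TABLE_ARN,
        }
    ],
}

print(json.dumps(lambda_role_policy, indent=2))
```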

Exam trap

The trap here is that candidates may think storing the KMS key in environment variables is acceptable for 'easy access.' The key material of a KMS key never leaves AWS KMS, so there is no key to store; access to the key is controlled through key policies, IAM policies, and KMS grants, which is what the exam tests.

276

MCQ | hard

A DynamoDB table for a retail API has a partition key based only on the current date. Write throttling occurs during business hours. What is the best design change? The team wants the control to be enforceable during normal operations.

A.Use a higher-cardinality partition key that distributes writes across partitions

B.Create a global secondary index with the same date key

C.Reduce the table's write capacity

D.Move the table to S3 Glacier Instant Retrieval

Answer: A

A low-cardinality hot partition causes throttling; a better key spreads writes more evenly.

Why this answer

A is correct because using a partition key based solely on the current date creates a hot partition — all writes for a given day go to a single partition, causing throttling during peak hours. Increasing the partition key's cardinality (e.g., by appending a random suffix or a user ID) distributes writes evenly across multiple partitions, allowing DynamoDB to use its full write capacity without throttling. This design change is enforceable during normal operations because it modifies the data model rather than relying on temporary capacity adjustments.
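The random-suffix idea can be sketched as follows. The shard count and the key format are assumptions for illustration; a real design would choose them from the expected write rate.

```python
import random
from collections import Counter

SHARD_COUNT = 10  # assumption: number of suffixes to spread each day's writes across


def sharded_partition_key(date: str, rng: random.Random) -> str:
    """Append a random shard suffix so one day's writes map to many partition keys."""
    return f"{date}#{rng.randrange(SHARD_COUNT)}"


rng = random.Random(0)  # seeded so the example is repeatable
keys = Counter(sharded_partition_key("2024-05-01", rng) for _ in range(10_000))

print(len(keys))  # number of distinct partition key values used for the day
```

The trade-off is on the read side: fetching everything for one date now means querying each suffix and merging the results.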

Exam trap

The trap here is that candidates often treat throttling as a table-wide capacity problem and look at a capacity change (Option C) or an extra index (Option B), missing the root cause: a hot partition due to a low-cardinality partition key, which is a classic DynamoDB design anti-pattern tested in SAA-C03.

How to eliminate wrong answers

Option B is wrong because creating a global secondary index (GSI) with the same date key does not solve the hot partition issue — the GSI would inherit the same skewed write pattern and could itself become throttled, and GSIs do not redistribute writes to the base table. Option C is wrong because reducing the table's write capacity would worsen throttling during business hours, not resolve it; the problem is uneven distribution, not insufficient total capacity. Option D is wrong because S3 Glacier Instant Retrieval is an object storage class for infrequently accessed data with millisecond retrieval, not a replacement for DynamoDB's low-latency, high-throughput transactional workloads, and moving the table would break the API's real-time access requirements.

277

Multi-Select | hard

A rendering service runs on a single EC2 instance and writes a large working set of metadata to disk using sustained random reads and writes. The data must persist across stops and restarts, and the team sees queue depth spikes when the job reaches peak throughput. Which changes should the team make? Select three.

Select 3 answers

A.Use an Amazon EBS io2 volume with provisioned IOPS for the metadata store.

B.Run the workload on a Nitro-based, EBS-optimized instance that has enough EBS bandwidth.

C.Place the EC2 instance and the EBS volume in the same Availability Zone.

D.Move the working set to Amazon EFS because it automatically stripes across Availability Zones.

E.Store the metadata in Amazon S3 because object storage is cheaper and supports random writes.

Answers: A, B, C

Correct because io2 is designed for high, sustained IOPS with low latency. Provisioned IOPS is the right control when random disk activity, not capacity, is the bottleneck.

Why this answer

Option A is correct because an Amazon EBS io2 volume with provisioned IOPS is designed for I/O-intensive workloads with sustained random reads and writes, such as the rendering service's metadata store. The io2 volume type offers high durability (99.999%) and consistent low-latency performance, which directly addresses the queue depth spikes caused by peak throughput demands. Provisioned IOPS ensures the volume can handle the required random I/O without throttling, meeting the persistence requirement across stops and restarts.

Exam trap

The trap here is that candidates may treat Amazon EFS or S3 as suitable for random I/O workloads, but the exam tests understanding that Provisioned IOPS EBS volumes such as io2 are the block storage option designed for sustained random reads and writes with consistent low latency on a single instance, while EFS is a shared file system and S3 is object storage that does not support in-place random writes.

278

Multi-Select | medium

A data lake stores raw files in a single Amazon S3 bucket that is shared by three internal analytics teams. Each team should access only its own prefix, and the company wants to eliminate ACL management because objects come from multiple producers. Which three changes should the architect make? Select three.

Select 3 answers

A.Create a separate S3 access point for each team and scope it to that team’s prefix.

B.Leave ACLs enabled so each producer can grant permissions directly on uploaded objects.

C.Set Object Ownership to Bucket owner enforced so ACLs are disabled.

D.Use bucket or access point policies to restrict access to the allowed principals and prefixes.

E.Make the bucket public and rely on application-layer authorization for data protection.

Answers: A, C, D

Access points let you expose different policy boundaries on the same bucket. They are a good fit when multiple teams need controlled access to different prefixes without creating separate buckets.

Why this answer

Option A is correct because S3 Access Points allow you to create separate access points scoped to specific prefixes within a shared bucket, enabling each analytics team to access only its own prefix without managing ACLs. This simplifies access control by using access point policies that restrict access to the allowed principals and prefixes, aligning with the requirement to eliminate ACL management.

Exam trap

The trap here is that candidates may think ACLs are necessary for multi-producer environments, but AWS recommends disabling ACLs and using bucket policies or access point policies with Object Ownership set to 'Bucket owner enforced' to simplify access control.

279

Multi-Select | hard

A latency-sensitive video platform uploads large files to S3 from users around the world. Which two features can improve upload performance? The architecture review board prefers a managed AWS-native control.

Select 2 answers

A.S3 Object Lock

B.S3 Transfer Acceleration

C.S3 multipart upload

D.S3 Inventory

Answers: B, C

Transfer Acceleration uses optimized edge paths into AWS for long-distance S3 transfers.

Why this answer

S3 Transfer Acceleration (B) uses AWS edge locations to accelerate uploads over long distances by routing traffic through the AWS global network, reducing latency and packet loss compared to the public internet. Multipart upload (C) improves performance by splitting large files into smaller parts that can be uploaded in parallel, increasing throughput and allowing retries of individual parts without restarting the entire upload.
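How a large file breaks into independently uploadable parts can be sketched with a small planner. The 64 MiB default part size is an arbitrary choice for the example; the 5 MiB minimum part size (for every part except the last) and the 10,000-part limit are S3 multipart upload limits.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 64 * MIB):
    """Return (offset, length) pairs; each pair can be uploaded and retried on its own."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum part size")
    parts = [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


parts = plan_parts(1000 * MIB)
print(len(parts), parts[-1])
```

Because each (offset, length) pair is independent, parts can be sent in parallel, and a failed part is retried without resending the others.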

Exam trap

The trap here is that candidates may confuse S3 Transfer Acceleration with CloudFront or think multipart upload is only for reliability, not performance, while overlooking that both features are managed AWS-native controls that directly address latency and throughput for large file uploads.

280

MCQ | medium

An Auto Scaling group behind an Application Load Balancer frequently replaces new EC2 instances. The application needs ~6 minutes to warm up after instance launch. However, the ALB target group health checks start immediately and mark the targets unhealthy until the application is ready. Because the targets become unhealthy early, the Auto Scaling group then terminates the instances and launches replacements, creating a repeated unhealthy/termination loop. What configuration change will most directly improve recovery by preventing premature ASG termination while the application is warming up?

A.Set a health check grace period on the Auto Scaling group that exceeds the application startup/warm-up time.

B.Increase the Auto Scaling group's desired capacity to a higher number than required.

C.Disable ALB target group health checks so instances are considered healthy as soon as they register.

D.Change the Auto Scaling health check type from ELB to EC2 so the ALB will no longer determine instance health.

Answer: A

A health check grace period delays when the Auto Scaling group starts evaluating instance health. This prevents the ASG from terminating instances due to ALB/target health being unhealthy during the initial warm-up window, breaking the unhealthy/termination loop.

Why this answer

The health check grace period on an Auto Scaling group (ASG) allows a newly launched EC2 instance to bypass health check failures for a specified duration. By setting this grace period to exceed the application's ~6-minute warm-up time, the ASG will not prematurely terminate the instance based on ALB health check results. This directly breaks the unhealthy/termination loop while the application initializes.
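The decision the grace period changes can be modeled in a few lines. This is a simplified model, not the Auto Scaling implementation: it measures instance age from launch, and the function name and the 420-second value are assumptions for illustration.

```python
def should_replace(instance_age_s: int, elb_healthy: bool, grace_period_s: int) -> bool:
    """Simplified model: health results are ignored until the grace period has elapsed."""
    if instance_age_s < grace_period_s:
        return False
    return not elb_healthy


WARM_UP_S = 6 * 60   # application warm-up time from the question
GRACE_S = 420        # assumption: a grace period slightly longer than the warm-up

# One minute after launch, while the ALB still reports the target as unhealthy:
print(should_replace(60, elb_healthy=False, grace_period_s=0))
print(should_replace(60, elb_healthy=False, grace_period_s=GRACE_S))
```

With no grace period the warming instance is replaced; with a grace period longer than the warm-up it is left alone until the application has had time to start.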

Exam trap

The trap here is that candidates may think disabling health checks or changing the health check type is a valid fix, but the correct solution is to use the ASG's built-in grace period to decouple early health check failures from termination decisions.

How to eliminate wrong answers

Option B is wrong because increasing the desired capacity does not address the root cause of premature termination; it only adds more instances that will also be terminated during the warm-up period. Option C is wrong because without ALB health checks the load balancer would route traffic to instances that are not yet ready, causing failed requests during warm-up. Option D is wrong because switching the health check type to EC2 makes the ASG ignore ALB health results entirely: the loop would stop, but the ASG would also stop replacing instances whose application fails later, so it is a blunter fix than a grace period that covers only the warm-up window.

281

MCQ | medium

A video platform uses Amazon Aurora. The workload has many short-lived database connections from Lambda functions, causing connection storms. What should be added? The architecture review board prefers a managed AWS-native control.

A.S3 Select

B.An internet gateway

C.A larger Route 53 hosted zone

D.RDS Proxy

Answer: D

RDS Proxy pools and manages database connections, improving scalability for serverless and bursty workloads.

Why this answer

RDS Proxy is the correct choice because it manages a pool of reusable database connections, allowing Lambda functions to share and reuse connections rather than opening and closing them with each invocation. This eliminates connection storms by buffering short-lived connections from Lambda and reducing the load on the Aurora database, all as a fully managed AWS-native service.

Exam trap

The trap here is that candidates may confuse connection pooling with scaling the database instance or adding network components, but the correct solution is a dedicated proxy layer that manages connections at the application-to-database boundary.

How to eliminate wrong answers

Option A is wrong because S3 Select is a service for retrieving subsets of data from objects in Amazon S3 using SQL expressions; it has no role in managing database connections or mitigating connection storms. Option B is wrong because an internet gateway enables VPC-to-internet communication for public subnets; it does not handle database connection pooling or reduce the number of connections to Aurora. Option C is wrong because a larger Route 53 hosted zone increases the capacity for DNS records but does not affect database connection management or prevent connection storms.

282

MCQ | easy

A compute workload uses temporary scratch space for intermediate results (reproducible), and it can tolerate data loss if the instance is terminated. The workload benefits from very high local I/O throughput. Which storage option is the best fit for the scratch data?

A.Amazon EBS General Purpose (gp3) volumes to persist intermediate results across reboots.

B.Amazon EFS for a shared file system between multiple instances.

C.Instance store for local temporary files that can be lost when the instance stops.

D.Amazon S3 for scratch data so it is always durable and accessible from anywhere.

Answer: C

Instance store is designed for temporary high-performance local storage and is acceptable when loss is tolerable.

Why this answer

Instance store volumes provide very high local I/O throughput because they are physically attached to the host server, making them ideal for temporary scratch data that is reproducible and can tolerate loss. Since the workload explicitly accepts data loss on instance termination and does not need the data to survive a stop or termination, instance store is the best fit for this use case.

Exam trap

The trap here is that candidates often choose EBS gp3 (Option A) because they assume all block storage is persistent and high-performance, overlooking the fact that instance store offers even higher local throughput and is explicitly designed for temporary, loss-tolerant workloads.

How to eliminate wrong answers

Option A is wrong because Amazon EBS gp3 volumes, while offering good performance, have lower maximum IOPS and throughput compared to instance store and are designed for persistent block storage, which is unnecessary for scratch data that can be regenerated. Option B is wrong because Amazon EFS is a network file system that introduces latency and throughput limitations, and the workload does not require shared access between multiple instances. Option D is wrong because Amazon S3 is object storage with higher latency and lower throughput than local storage, and it is designed for durable, accessible data, not for high-performance temporary scratch space.

283

MCQ | medium

A company runs an internet-facing API in two AWS Regions. Route 53 currently uses simple routing to a primary Application Load Balancer (ALB) DNS name. When the primary Region experiences an outage, customers wait a long time because the DNS entry is not changed automatically. The team wants automatic failover: if the primary Region ALB health check fails for a sustained period, Route 53 should route users to the secondary Region ALB. Which Route 53 approach best meets this requirement?

A.Use Route 53 failover routing with a PRIMARY and SECONDARY record set for the same name, and attach health checks to the ALBs.

B.Use latency-based routing so Route 53 automatically spreads traffic to both Regions based on measured latency.

C.Use weighted routing and configure the secondary ALB to receive 100% traffic when the primary returns HTTP 5xx responses.

D.Use geolocation routing and restrict the primary Region record to specific countries only.

Answer: A

Failover routing is designed for active/passive DNS failover. Route 53 evaluates health checks for the PRIMARY record and automatically serves the SECONDARY record when the PRIMARY is considered unhealthy for the configured evaluation period.

Why this answer

Route 53 failover routing is designed specifically for active-passive failover scenarios. By creating PRIMARY and SECONDARY record sets with the same DNS name and attaching health checks to the ALBs, Route 53 will automatically route traffic to the secondary ALB when the primary ALB health check fails for a sustained period. This meets the requirement for automatic failover without manual intervention.
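The PRIMARY/SECONDARY pair can be sketched as two record sets in the shape used by the Route 53 ChangeResourceRecordSets API, plus a simplified model of which one is answered. The domain name, ALB DNS names, and hosted zone IDs are hypothetical, and the resolver ignores the case where both endpoints are unhealthy.

```python
def failover_record(role: str, set_id: str, alb_dns: str, alb_zone_id: str) -> dict:
    """Build one alias record set for failover routing (all values are hypothetical)."""
    return {
        "Name": "api.example.com.",
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "AliasTarget": {
            "HostedZoneId": alb_zone_id,
            "DNSName": alb_dns,
            "EvaluateTargetHealth": True,  # let the ALB's health drive failover
        },
    }


records = [
    failover_record("PRIMARY", "primary-region", "primary-alb.example.com.", "ZPRIMARYEXAMPLE"),
    failover_record("SECONDARY", "secondary-region", "secondary-alb.example.com.", "ZSECONDARYEXAMPLE"),
]


def answer(records: list, primary_healthy: bool) -> str:
    """Simplified model: serve PRIMARY while it is healthy, otherwise SECONDARY."""
    wanted = "PRIMARY" if primary_healthy else "SECONDARY"
    record = next(r for r in records if r["Failover"] == wanted)
    return record["AliasTarget"]["DNSName"]


print(answer(records, primary_healthy=True))
print(answer(records, primary_healthy=False))
```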

Exam trap

The trap here is that candidates often confuse failover routing with latency-based or weighted routing, assuming that latency-based routing inherently provides failover, but it does not—it only optimizes for performance, not availability.

How to eliminate wrong answers

Option B is wrong because latency-based routing distributes traffic based on lowest latency, not health status; it does not provide automatic failover from a primary to a secondary region when the primary is unhealthy. Option C is wrong because weighted routing distributes traffic based on fixed weights and does not automatically shift 100% traffic to the secondary based on HTTP 5xx responses; it requires external automation or custom health checks to adjust weights. Option D is wrong because geolocation routing directs traffic based on the geographic location of the user, not on the health of the endpoint; it cannot automatically failover from a primary to a secondary region when the primary is unhealthy.

284

MCQ | medium

A global mobile game backend serves mostly static images and JavaScript files from an S3 origin. Users in distant countries report slow load times. What should improve performance most? The architecture review board prefers a managed AWS-native control.

A.RDS read replicas

B.Amazon CloudFront distribution with the S3 bucket as origin

C.A larger S3 bucket

D.An EC2 Auto Scaling group in one Region

Answer: B

CloudFront caches content at edge locations close to users, reducing latency.

Why this answer

Amazon CloudFront is a global content delivery network (CDN) that caches static content (images, JavaScript) at edge locations close to users, drastically reducing latency. By using the S3 bucket as the origin, CloudFront offloads requests from S3 and serves cached objects from the nearest edge, which directly addresses slow load times for distant users. This is a managed AWS-native service that aligns with the architecture review board's preference.

Exam trap

The trap here is that candidates may think increasing S3 bucket size or using RDS replicas can improve static content delivery, but the core issue is geographic latency, which only a CDN like CloudFront can solve by caching content at edge locations.

How to eliminate wrong answers

Option A is wrong because RDS read replicas are designed to offload read traffic from a relational database, not to accelerate delivery of static files stored in S3; they have no effect on S3 latency. Option C is wrong because increasing the S3 bucket size does not improve data transfer speed or reduce latency; S3 performance is independent of bucket size and is limited by regional endpoints. Option D is wrong because an EC2 Auto Scaling group in a single Region does not provide geographic distribution; users in distant countries would still experience high latency connecting to that single Region, and it adds unnecessary compute overhead for serving static content.

285

MCQ | medium

You use Amazon CloudFront in front of a private content S3 origin. To mitigate an OWASP Top 10 issue, you created a WAF web ACL and associated it to the CloudFront distribution, but attacks are still reaching the origin. CloudWatch logs show the web ACL rules never match for the CloudFront requests. What is the most likely configuration mistake?

A.The WAF web ACL intended for CloudFront must be created in the us-east-1 (N. Virginia) region (CloudFront scope), even if the rest of the stack is in another region.

B.WAF rules only evaluate requests after they reach the origin, so the absence of matches means the origin is blocking traffic first.

C.For CloudFront, you must use a regional WAF endpoint and cannot use a global web ACL.

D.WAF web ACL rules never apply to signed URLs or signed cookies, so the web ACL is bypassed by design.

Answer: A

CloudFront-scoped WAF web ACLs use a global scope that is provisioned/managed in us-east-1. Creating the web ACL in the wrong region (or with the wrong scope) prevents CloudFront from evaluating the expected web ACL rules, which would lead to no rule matches in logs.

Why this answer

When using AWS WAF with CloudFront, the web ACL must be created in the US East (N. Virginia) region (us-east-1) because CloudFront is a global service that only supports WAF web ACLs with a global scope, which are always defined in us-east-1. If the web ACL is created in any other region, it will be a regional web ACL and cannot be associated with a CloudFront distribution, causing the rules to never be evaluated against incoming requests.

This explains why CloudWatch logs show no rule matches—the web ACL is effectively not attached to the CloudFront distribution.

Exam trap

The trap here is that candidates assume WAF web ACLs can be created in any region for CloudFront, not realizing that CloudFront requires a global-scope web ACL that must be created in us-east-1, regardless of where the origin or other resources reside.

How to eliminate wrong answers

Option B is wrong because WAF rules evaluate requests at the edge before they reach the origin; the absence of matches indicates the web ACL is not being applied, not that the origin is blocking traffic. Option C is wrong because for CloudFront, you must use a global web ACL (created in us-east-1), not a regional WAF endpoint—regional endpoints are for Application Load Balancers, API Gateway, or other regional services. Option D is wrong because WAF web ACL rules do apply to requests using signed URLs or signed cookies; signed URLs/cookies control access to the content but do not bypass WAF inspection.

286

MCQ | medium

An application in account A needs to use an encrypted EBS volume whose snapshots were copied from account B. The EBS volume is encrypted with a customer-managed KMS key in account B. After attaching the volume, the instance fails to mount it and logs show KMS access errors (kms:Decrypt) for the instance role. The instance role in account A already has an IAM policy allowing kms:Decrypt on that key ARN, but the mount still fails. What must be updated in account B to allow the mount to succeed?

A.Enable KMS automatic key rotation for the customer-managed key in account B.

B.Update the KMS key policy in account B to allow the instance role’s principal from account A to call kms:Decrypt and kms:CreateGrant.

C.Attach the key policy as an IAM permissions policy to the instance role in account A only; key policies are not evaluated cross-account.

D.Disable encryption on the EBS volume until authorization is fixed, then re-enable encryption after mount.

Answer: B

Customer-managed KMS keys use resource-based key policies to control cross-account usage. Even if the IAM role in account A has kms:Decrypt permissions, the account B key policy must also allow that principal to use the key. Including kms:Decrypt (and often kms:CreateGrant) resolves cross-account mount authorization.

Why this answer

The instance role in account A has an IAM policy allowing kms:Decrypt on the key ARN, but cross-account KMS access requires the key policy in account B to explicitly grant the external principal (the instance role's ARN) the necessary permissions. Without a key policy statement allowing kms:Decrypt and kms:CreateGrant for the account A role, KMS will deny the decryption request, causing the mount to fail. Option B correctly identifies that the key policy in account B must be updated to authorize the cross-account principal.
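A key policy statement of the kind Option B describes might look like the sketch below. The role ARN and account ID are hypothetical, and only the two actions named in the question are listed; a real setup may need further KMS actions.

```python
import json

# Hypothetical ARN of the instance role in account A.
ROLE_ARN = "arn:aws:iam::111122223333:role/app-instance-role"

key_policy_statement = {
    "Sid": "AllowAccountARoleToUseKey",
    "Effect": "Allow",
    # The external principal must be named in the key policy in account B.
    "Principal": {"AWS": ROLE_ARN},
    "Action": ["kms:Decrypt", "kms:CreateGrant"],
    # In a key policy, "*" refers to the key the policy is attached to.
    "Resource": "*",
}

print(json.dumps(key_policy_statement, indent=2))
```

Both sides are needed: this statement in account B's key policy, and the IAM policy the role already has in account A.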

Exam trap

The trap here is that candidates assume an IAM policy on the instance role is sufficient for cross-account KMS access, but KMS requires the key policy in the owning account to explicitly authorize the external principal, as IAM policies alone cannot grant cross-account permissions.

How to eliminate wrong answers

Option A is wrong because enabling automatic key rotation does not grant cross-account permissions; it only rotates the key material periodically. Option C is wrong because IAM policies alone cannot authorize cross-account access to a KMS key; the key policy in the owning account must explicitly allow the external principal. Option D is wrong because disabling encryption on an encrypted EBS volume is not supported; you cannot toggle encryption on an existing volume, and the underlying authorization issue must be resolved via key policy updates.

287

Multi-Select | hard

A CI system runs on EC2 instances in private subnets and uploads build artifacts to an S3 bucket. The security team wants to eliminate NAT Gateway costs, force all uploads to use TLS, and require SSE-KMS with an approved customer managed key. Which three changes should be made? Select three.

Select 3 answers

A.Create an S3 gateway VPC endpoint and associate it with the private subnets' route tables.

B.Add a bucket policy that denies requests when aws:SecureTransport is false.

C.Add a bucket policy condition that requires SSE-KMS using the approved CMK for uploads.

D.Deploy a NAT Gateway in each Availability Zone and route artifact traffic through it.

E.Use the S3 static website endpoint because it automatically enforces HTTPS.

Answers: A, B, C

A gateway endpoint lets private instances reach S3 without traversing a NAT Gateway. This reduces cost while keeping the traffic on the AWS network path.

Why this answer

Option A is correct because creating an S3 gateway VPC endpoint allows EC2 instances in private subnets to access S3 without traversing the internet, eliminating the need for a NAT Gateway and its associated costs. The endpoint uses AWS’s internal network, and associating it with the private subnets' route tables ensures traffic to S3 is routed through the endpoint.
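Options B and C are both bucket policy statements. A minimal sketch follows, assuming a hypothetical bucket name and key ARN; a production policy would likely need more statements and testing against the team's upload tooling.

```python
import json

# Hypothetical bucket and key identifiers.
BUCKET_ARN = "arn:aws:s3:::ci-artifacts-example"
APPROVED_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Option B: reject any request that does not use TLS.
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [BUCKET_ARN, f"{BUCKET_ARN}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            # Option C: reject uploads that do not name the approved KMS key.
            "Sid": "DenyWrongEncryptionKey",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"{BUCKET_ARN}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption-aws-kms-key-id": APPROVED_KEY_ARN
                }
            },
        },
    ],
}

print(json.dumps(bucket_policy, indent=2))
```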

Exam trap

The trap here is that candidates often assume a NAT Gateway is required for private subnet internet access, but an S3 gateway VPC endpoint provides direct, cost-free connectivity to S3 without internet routing.

288

MCQ | hard

Based on the exhibit, which change will most improve the CloudFront cache hit ratio for the static assets while still serving the same files to all users?

A.Create a custom cache policy that includes only the v query string and excludes cookies.

B.Enable Origin Shield and keep the current cache behavior unchanged.

C.Move the static assets to individual presigned URLs for each viewer.

D.Increase the CloudFront default TTL to 24 hours while continuing to forward all cookies and query strings.

Answer: A

This removes unnecessary cache-key fragmentation. Since all users receive identical static files, forwarding user-specific cookies and irrelevant query strings destroys cache reuse. Keeping only the version parameter preserves correct object variation while allowing many more requests to hit the same cached object at the edge.

Why this answer

The CloudFront cache hit ratio for static assets is reduced when query strings and cookies are forwarded to the origin, because each unique combination creates a separate cache entry. By creating a custom cache policy that includes only the 'v' query string (used for versioning) and excludes cookies, CloudFront can cache a single object for all users regardless of other query parameters or cookie values, maximizing cache hits while still serving the same file.
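Cache-key fragmentation can be illustrated with a toy key function. This is a model of the idea, not CloudFront's implementation; the URL paths, the utm_source parameter, and the cookie names are hypothetical, since the exhibit is not reproduced here.

```python
from urllib.parse import urlsplit, parse_qs

ALLOWED_QUERY_STRINGS = {"v"}  # only the version parameter takes part in the cache key


def cache_key(url: str, cookies: dict) -> tuple:
    """Toy cache key: path plus allow-listed query strings; cookies are ignored."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    kept = tuple(sorted(
        (name, tuple(values))
        for name, values in query.items()
        if name in ALLOWED_QUERY_STRINGS
    ))
    return (parts.path, kept)


# Two viewers with different cookies and tracking parameters, same asset version.
a = cache_key("/static/app.js?v=3&utm_source=mail", {"session": "alice"})
b = cache_key("/static/app.js?utm_source=ad&v=3", {"session": "bob"})
c = cache_key("/static/app.js?v=4", {"session": "alice"})

print(a == b, a == c)
```

Requests that differ only in cookies or unrelated parameters share one cached object, while a new v value still produces a distinct object.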

Exam trap

The trap here is that candidates assume increasing TTL or enabling Origin Shield will fix a low cache hit ratio, when the real issue is an overly broad cache key caused by forwarding all query strings and cookies.

How to eliminate wrong answers

Option B is wrong because enabling Origin Shield reduces load on the origin and improves cache fill efficiency, but it does not address the root cause of low cache hit ratio—forwarding all query strings and cookies still creates many unique cache keys. Option C is wrong because moving static assets to individual presigned URLs for each viewer would force CloudFront to treat each URL as a distinct object, drastically reducing the cache hit ratio and defeating the purpose of caching. Option D is wrong because increasing the default TTL to 24 hours while continuing to forward all cookies and query strings does not reduce the number of unique cache keys; CloudFront will still cache separate copies for each cookie and query string combination, so the cache hit ratio remains low.

289

MCQmedium

A high-volume analytics dashboard writes streaming click events that must be processed by multiple independent consumers. Which service is most appropriate?

A.Amazon Route 53

B.Amazon EBS

C.Amazon Kinesis Data Streams

D.AWS DataSync

AnswerC

Kinesis Data Streams supports high-throughput event ingestion with multiple consumers reading from the stream.

Why this answer

Amazon Kinesis Data Streams is the most appropriate service because it is designed for real-time streaming data ingestion and lets multiple independent consumer applications read the same records in parallel, each tracking its own position in the stream. By default, each shard supports up to 5 read transactions per second and a total read rate of 2 MB per second, shared across all consumers of that shard. With enhanced fan-out, each registered consumer gets its own 2 MB per second per shard, so consumers do not compete for read throughput.
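
The per-shard read limit translates into per-consumer throughput roughly as follows. This is a back-of-the-envelope sketch; the function name and the example shard and consumer counts are assumptions for illustration.

```python
SHARD_READ_MB_PER_S = 2.0  # per-shard read limit for Kinesis Data Streams


def per_consumer_read_mb_per_s(shards, consumers, enhanced_fan_out=False):
    """Approximate read throughput available to each consumer application."""
    total = shards * SHARD_READ_MB_PER_S
    if enhanced_fan_out:
        # Each registered enhanced fan-out consumer gets its own 2 MB/s per shard.
        return total
    # Shared-throughput consumers split each shard's 2 MB/s between them.
    return total / consumers


# Hypothetical stream: 4 shards read by 2 independent consumer applications.
print(per_consumer_read_mb_per_s(4, 2))
print(per_consumer_read_mb_per_s(4, 2, enhanced_fan_out=True))
```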

Exam trap

The trap here is that candidates often confuse Amazon Kinesis Data Streams with Amazon SQS or Amazon SNS, but SQS is a message queue for decoupled point-to-point communication and SNS is a pub/sub notification service, neither of which natively supports multiple independent consumers processing the same stream of data with replay capability.

How to eliminate wrong answers

Option A is wrong because Amazon Route 53 is a DNS web service that translates domain names to IP addresses and does not ingest or process streaming data. Option B is wrong because Amazon EBS provides block-level storage volumes for EC2 instances and cannot natively support multiple independent consumers reading a continuous stream of events. Option D is wrong because AWS DataSync is a data transfer service for moving large datasets between on-premises storage and AWS services, not for real-time streaming event processing.

290

MCQeasy

A team wants to delegate IAM management to developers, but must ensure developers can never grant themselves permissions beyond a specific limit. Which AWS mechanism best matches this requirement?

A.Use an IAM permission boundary on roles/users that developers create, so the developers’ effective permissions are capped by the boundary policy.

B.Rely only on their IAM managed policies and instruct developers to self-check against internal guidelines.

C.Use a service control policy (SCP) that applies only to the developers’ IAM users in the account.

D.Use a KMS key policy to restrict IAM actions, because IAM actions can be controlled with KMS.

AnswerA

Permission boundaries constrain the maximum permissions that an identity can receive. Even if developers attach an identity policy that allows broader actions, the effective permissions are limited to the intersection of the identity policy and the boundary.

Why this answer

IAM permission boundaries are the correct mechanism because they allow a developer to create IAM roles or users, but explicitly cap the maximum permissions those entities can have. The boundary policy acts as a ceiling, so even if a developer attaches a permissive managed policy, the effective permissions are the intersection of the boundary and the attached policy. This directly enforces the requirement that developers cannot grant themselves permissions beyond a specific limit.
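
The intersection rule can be sketched with plain sets. This is a deliberate simplification: real IAM evaluation also involves wildcards, explicit denies, resource-based policies, and SCPs, none of which are modeled here. The action lists are hypothetical.

```python
def effective_actions(identity_allows, boundary_allows):
    """Return the actions allowed by BOTH the identity policy and the boundary."""
    return set(identity_allows) & set(boundary_allows)


# A developer attaches a broad policy to a role they created...
identity_policy = {"s3:GetObject", "s3:PutObject", "iam:CreateUser", "ec2:RunInstances"}

# ...but the role carries a boundary that only permits these actions.
boundary = {"s3:GetObject", "s3:PutObject", "dynamodb:GetItem"}

allowed = effective_actions(identity_policy, boundary)
print(sorted(allowed))
```

Note that `dynamodb:GetItem` is not granted either: a boundary only caps permissions, it never grants them.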

Exam trap

The trap here is confusing service control policies (SCPs) with permission boundaries, as both can limit permissions, but SCPs apply account-wide and cannot be selectively applied to only developers' IAM users, while permission boundaries are attached directly to the IAM entity.

How to eliminate wrong answers

Option B is wrong because relying on self-checking against internal guidelines is a manual process with no technical enforcement, so developers could easily grant themselves excessive permissions. Option C is wrong because service control policies (SCPs) are attached to an AWS Organizations root, OU, or account rather than to individual IAM users, so they act as account-level guardrails and not as a per-identity ceiling on the roles and users that developers create. Option D is wrong because KMS key policies control access to cryptographic operations on KMS keys, not IAM actions; IAM actions are governed by IAM policies, not KMS.

291

MCQeasy

A web application uses an Amazon Aurora DB cluster. The workload is becoming read-heavy, and the application team wants to increase read throughput without changing the database schema. They can adjust the application to route reads differently. What should they do?

A.Add Aurora read replicas and route read queries to the cluster reader endpoint

B.Switch the cluster to Multi-AZ with a longer failover target clock

C.Move all reads to the writer endpoint to reduce connection overhead

D.Disable automated backups to reduce storage overhead and speed reads

AnswerA

Aurora read replicas scale out read capacity. By routing read traffic to the cluster reader endpoint, the application can distribute SELECT queries across replicas, improving overall read throughput without schema changes.

Why this answer

Adding Aurora read replicas and routing read queries to the cluster reader endpoint is the correct approach because Aurora replicas share the same underlying storage volume as the primary instance, so they can serve read traffic with minimal replication lag. The reader endpoint automatically load-balances connections across all available replicas, increasing aggregate read throughput without requiring any schema changes.
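
On the application side, the routing change might look like the sketch below. The endpoint host names are hypothetical placeholders, and the statement classification is intentionally naive (for example, it would misroute `SELECT ... FOR UPDATE`), so treat it as an illustration of the idea only.

```python
# Hypothetical endpoints; real values come from the Aurora cluster.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

READ_ONLY_PREFIXES = ("select", "show")


def endpoint_for(sql, in_transaction=False):
    """Pick the reader endpoint for standalone reads, the writer otherwise."""
    statement = sql.lstrip().lower()
    if not in_transaction and statement.startswith(READ_ONLY_PREFIXES):
        return READER_ENDPOINT
    # Writes, and reads inside a write transaction, go to the writer.
    return WRITER_ENDPOINT


print(endpoint_for("SELECT * FROM orders WHERE id = 7"))
print(endpoint_for("UPDATE orders SET status = 'shipped' WHERE id = 7"))
```

Keeping reads that belong to a write transaction on the writer avoids reading stale data from a replica mid-transaction.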

Exam trap

The trap here is confusing Multi-AZ with read replicas: candidates often think Multi-AZ improves read performance, but in standard RDS Multi-AZ the standby is passive and cannot serve reads, whereas Aurora's architecture allows all replicas to actively handle read traffic.

How to eliminate wrong answers

Option B is wrong because Multi-AZ with a longer failover target clock does not increase read throughput; it only provides high availability by maintaining a standby in another Availability Zone, and the standby cannot serve reads. Option C is wrong because moving all reads to the writer endpoint would increase load on the single writer instance, reducing overall read throughput and potentially impacting write performance. Option D is wrong because disabling automated backups does not increase read throughput; backups are stored separately and do not affect the performance of read operations on the cluster.

292

MCQmedium

Your company currently uses an Application Load Balancer (ALB) in front of a service that receives a large number of TCP and UDP packets (including UDP-based telemetry). During load tests, you need to support both TCP and UDP traffic at high throughput while keeping stable IP endpoints for a downstream firewall allowlist. Which change best meets these requirements?

A.Switch to a Network Load Balancer (NLB) configured for TCP/UDP, and use Elastic IPs to provide stable endpoint IP addresses for allowlisting.

B.Keep the ALB and add an AWS WAF Web ACL to improve throughput and add static IP support.

C.Replace the ALB with an API Gateway REST API to support UDP because API Gateway can forward UDP packets.

D.Use an Auto Scaling group with multiple EC2 instances and no load balancer to avoid any networking bottlenecks.

AnswerA

NLB operates at Layer 4 and supports both TCP and UDP. For stable IP allowlists, you can associate Elastic IP addresses with the NLB so the load balancer exposes consistent IPs (as opposed to relying on dynamic addresses). This combination directly satisfies protocol support and stable endpoint requirements.

Why this answer

A Network Load Balancer (NLB) operates at Layer 4 and can handle both TCP and UDP traffic natively, unlike an ALB which only supports HTTP/HTTPS and cannot forward UDP packets. By assigning Elastic IPs to the NLB, you provide stable, static IP endpoints that can be added to a downstream firewall allowlist, meeting both the protocol and throughput requirements.
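
As a rough sketch, the request parameters for such a setup could be assembled as below. The dictionaries follow the shape of the Elastic Load Balancing v2 `CreateLoadBalancer` and `CreateListener` requests; the helper names, subnet IDs, allocation IDs, and ARNs are hypothetical, and nothing here calls AWS.

```python
def nlb_request(name, subnet_to_eip):
    """Parameters for an internet-facing NLB with one Elastic IP per subnet."""
    return {
        "Name": name,
        "Type": "network",
        "Scheme": "internet-facing",
        "SubnetMappings": [
            {"SubnetId": subnet, "AllocationId": eip}
            for subnet, eip in sorted(subnet_to_eip.items())
        ],
    }


def tcp_udp_listener_request(load_balancer_arn, target_group_arn, port):
    """Parameters for a listener that accepts both TCP and UDP on one port."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "TCP_UDP",
        "Port": port,
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


# Hypothetical IDs for two Availability Zones.
request = nlb_request(
    "telemetry-nlb",
    {"subnet-aaa111": "eipalloc-111", "subnet-bbb222": "eipalloc-222"},
)
print(request["SubnetMappings"])
```

The Elastic IPs referenced by `AllocationId` are the addresses that would go on the downstream firewall allowlist.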

Exam trap

The trap here is that candidates assume an ALB can handle all traffic types because it is the most commonly used load balancer, but they forget that ALB is strictly Layer 7 and cannot process UDP packets, making the NLB the only correct choice for mixed TCP/UDP workloads requiring static IPs.

How to eliminate wrong answers

Option B is wrong because an ALB cannot handle UDP traffic (it is a Layer 7 load balancer for HTTP/HTTPS traffic), and AWS WAF does not add static IP support or improve throughput for Layer 4 traffic. Option C is wrong because API Gateway does not forward UDP packets; its REST APIs accept only HTTPS requests. Option D is wrong because removing the load balancer eliminates the stable IP endpoint required for the firewall allowlist and leaves nothing to distribute traffic across the instances or route around unhealthy ones, so it does not address the need for high-throughput TCP/UDP handling behind a consistent front-end IP.

293

MCQmedium

A mobile app reads the same product catalog items repeatedly throughout the day. The DynamoDB table is already properly keyed, but read latency is still a problem during sales events. The team can tolerate eventually consistent reads and wants the least disruptive change. What should they add?

A.Add a global secondary index for every frequently viewed product attribute.

B.Enable DynamoDB Accelerator to cache frequently accessed items in memory.

C.Switch the table to on-demand capacity mode to reduce latency.

D.Move the catalog to Aurora and use a read replica for every region.

AnswerB

DynamoDB Accelerator, or DAX, is the best fit for repeated reads of the same items when eventual consistency is acceptable. It provides an in-memory cache in front of DynamoDB and can dramatically reduce read latency for hot catalog items during traffic spikes. Because the table schema is already sound, DAX adds performance without forcing a redesign of keys or access patterns.

Why this answer

DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for DynamoDB that can cut read latency for frequently accessed items from single-digit milliseconds to microseconds. Since the team can tolerate eventually consistent reads, DAX is a good fit: cache hits are served from memory without consuming DynamoDB read capacity. It is also the least disruptive change, because the table schema and access patterns stay the same and the application only needs to send its reads through the DAX client.
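
The read-through behavior that makes a cache effective for hot items can be sketched in a few lines. This is a generic in-process simulation, not the DAX client or its API; the class name, the loader function, and the 300-second TTL are assumptions.

```python
import time


class ReadThroughCache:
    """Tiny read-through cache with a per-item time-to-live."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the table and cache the result.
        self.misses += 1
        value = self._loader(key)
        self._items[key] = (value, now + self._ttl)
        return value


table_reads = []  # records every read that actually reaches the "table"


def load_from_table(key):
    table_reads.append(key)
    return {"sku": key, "price": 10}


cache = ReadThroughCache(load_from_table, ttl_seconds=300)
for _ in range(1000):
    cache.get("sku-1")

print(cache.hits, cache.misses, len(table_reads))
```

Only the first of the 1,000 reads reaches the table; the rest are served from memory, at the cost of data that may be up to one TTL stale.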

Exam trap

The trap here is that candidates often confuse throughput scaling (on-demand capacity) with latency reduction, or they over-engineer the solution by migrating to a different database when a simple caching layer (DAX) is the least disruptive and most cost-effective fix.

How to eliminate wrong answers

Option A is wrong because adding a global secondary index for every frequently viewed attribute does not reduce read latency for existing queries; it only provides alternative access patterns and increases write costs and storage. Option C is wrong because switching to on-demand capacity mode handles traffic spikes but does not reduce per-request latency; it only eliminates capacity planning, not the inherent read latency of DynamoDB. Option D is wrong because moving the catalog to Aurora with read replicas is a massive architectural change that introduces relational overhead, increases complexity, and is far more disruptive than adding a caching layer; it also does not leverage the existing DynamoDB investment.

294

MCQmedium

A claims portal stores audit logs in S3. The compliance team requires that logs cannot be overwritten or deleted for seven years. What should be configured?

A.S3 server access logging

B.S3 versioning only

C.S3 Object Lock in compliance mode with an appropriate retention period

D.S3 lifecycle expiration after seven years

AnswerC

Object Lock compliance mode enforces write-once-read-many retention that even privileged users cannot bypass during the retention period.

Why this answer

C is correct because S3 Object Lock in compliance mode enforces a write-once-read-many (WORM) model that prevents any user, including the root user, from overwriting or deleting objects for the specified retention period. This meets the compliance team's requirement that logs cannot be altered or removed for seven years, as compliance mode provides the highest level of protection and cannot be bypassed or shortened.
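
A default retention rule of this kind could be expressed as the request parameters below. The dictionary follows the shape of the S3 `PutObjectLockConfiguration` request; the helper name and bucket name are hypothetical, and the bucket is assumed to have versioning enabled, which Object Lock requires.

```python
def object_lock_request(bucket, years):
    """Parameters for a default compliance-mode retention rule on a bucket."""
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {
                "DefaultRetention": {
                    "Mode": "COMPLIANCE",  # cannot be bypassed during retention
                    "Years": years,
                }
            },
        },
    }


# Hypothetical bucket name; seven years matches the compliance requirement.
params = object_lock_request("claims-audit-logs", 7)
print(params["ObjectLockConfiguration"]["Rule"]["DefaultRetention"])
```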

Exam trap

The trap here is that candidates often confuse versioning (which only preserves history but allows deletion via delete markers) with Object Lock's ability to enforce immutability, or they mistakenly think server access logging or lifecycle policies can prevent data modification.

How to eliminate wrong answers

Option A is wrong because S3 server access logging only records requests made to the bucket (audit trail), but does not prevent overwrites or deletions of existing objects. Option B is wrong because S3 versioning alone preserves previous versions of objects but does not make them immutable: a delete marker can still be placed on the current version, and anyone with permission to delete object versions can permanently remove a specific version. Option D is wrong because S3 lifecycle expiration after seven years would automatically delete objects after that period, but it does not prevent premature deletion or overwriting before the seven-year mark.

295

MCQmedium

A media company runs a nightly batch job that processes video thumbnails. The batch can be interrupted at any time, and workers can resume automatically from checkpoints (a termination does not corrupt progress). The business goal is the lowest possible compute cost, and occasional interruptions are acceptable as long as the job continues automatically. Which approach is most cost-optimized?

A.Run the job on On-Demand EC2 instances to avoid interruptions

B.Use EC2 Spot Instances and implement interruption handling with checkpoint-based restarts

C.Buy Reserved Instances for the entire job window because interruptions are acceptable anyway

D.Use Savings Plans but schedule the job only during business hours to reduce the commit cost

AnswerB

Spot Instances are designed for workloads that can handle interruptions. With checkpoint-based restarts, the application can tolerate Spot termination events and still complete the batch, while capturing Spot’s lower compute pricing.

Why this answer

Option B is correct because Spot Instances offer the lowest compute cost (up to 90% discount vs. On-Demand) and the checkpoint-based design ensures that interruptions are handled gracefully without data loss. The job can resume automatically from the last checkpoint, making Spot Instances ideal for fault-tolerant, interruptible batch workloads.
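
The checkpoint-and-resume pattern can be sketched as a small simulation. Interruptions are faked with a set of item indexes, and the checkpoint is an in-memory dictionary; in a real job that state would have to be persisted outside the instance (an assumption here, not something the scenario specifies).

```python
def process_until_interrupted(items, checkpoint, interruptions):
    """Process items from the checkpoint; stop early on a simulated reclaim.

    `interruptions` holds item indexes at which a simulated Spot reclaim
    fires once. Returns True when every item has been processed.
    """
    while checkpoint["next"] < len(items):
        index = checkpoint["next"]
        if index in interruptions:
            interruptions.discard(index)  # the reclaim fires only once
            return False                  # the worker is terminated here
        checkpoint["processed"].append(items[index].upper())
        checkpoint["next"] = index + 1    # checkpoint after each item
    return True


def run_job(items, interruptions):
    """Keep launching workers until the batch completes."""
    checkpoint = {"next": 0, "processed": []}
    attempts = 0
    while True:
        attempts += 1
        if process_until_interrupted(items, checkpoint, interruptions):
            return checkpoint["processed"], attempts


result, attempts = run_job(["a.mp4", "b.mp4", "c.mp4", "d.mp4"], {1, 3})
print(result, attempts)
```

Each replacement worker picks up at `checkpoint["next"]`, so no item is processed twice and none is skipped.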

Exam trap

The trap here is that candidates assume Reserved Instances or Savings Plans are always cheaper for predictable workloads, but they overlook that Spot Instances can be even cheaper and are perfectly suited for fault-tolerant, interruptible batch jobs without any upfront commitment.

How to eliminate wrong answers

Option A is wrong because On-Demand instances are significantly more expensive than Spot Instances, and the business explicitly accepts occasional interruptions, so paying a premium for uninterrupted compute is not cost-optimized. Option C is wrong because Reserved Instances require a 1- or 3-year commitment and are designed for steady-state workloads, not for a nightly batch job that can be interrupted; the cost savings are less than Spot and the commitment is unnecessary. Option D is wrong because Savings Plans also require a commitment (1 or 3 years) and scheduling the job only during business hours does not reduce the commit cost; the job runs nightly, so this approach would either waste committed spend or require overprovisioning, making it less cost-effective than Spot.

296

MCQmedium

A data engineering team runs a nightly ETL job on EC2. The job can be checkpointed every 5 minutes and can be retried from the last checkpoint if the instance terminates. The job runtime varies from 2 to 4 hours, and the team has no need for a specific instance type, as long as it completes before 7:00 AM local time. They currently run the job on On-Demand EC2, leading to high monthly compute cost. Which change best reduces cost while maintaining the business deadline?

A.Use Spot Instances for the ETL workload, and configure the job to checkpoint frequently and restart on interruption.

B.Use Reserved Instances with a 1-year term to lower costs, since reservations provide discounts for any usage.

C.Switch to On-Demand but enable Auto Scaling so the job finishes faster during peak hours.

D.Use Spot Instances but disable checkpointing to simplify the application.

AnswerA

Spot can significantly reduce costs, and checkpointing plus retries mitigate interruption risk.

Why this answer

Spot Instances offer significant cost savings (up to 90%) compared to On-Demand, and the ETL job's ability to checkpoint every 5 minutes and restart from the last checkpoint makes it resilient to Spot interruptions. This allows the team to meet the 7:00 AM deadline while drastically reducing compute costs, as the job can be retried on new Spot capacity if interrupted.
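
Whether the deadline still holds under interruptions can be estimated with simple arithmetic. In this sketch the start time, the number of interruptions, and the restart overhead are hypothetical; only the 5-minute checkpoint interval, the 4-hour upper bound, and the 7:00 AM deadline come from the scenario.

```python
def worst_case_minutes(runtime_min, interruptions, checkpoint_interval_min,
                       restart_overhead_min):
    """Upper bound on wall-clock minutes when each interruption loses at most
    one checkpoint interval of work plus the time to get new capacity."""
    lost_per_interruption = checkpoint_interval_min + restart_overhead_min
    return runtime_min + interruptions * lost_per_interruption


start = 1 * 60     # 01:00 start, hypothetical (minutes after midnight)
deadline = 7 * 60  # 07:00 deadline from the scenario

worst = worst_case_minutes(
    runtime_min=240,            # longest observed runtime (4 hours)
    interruptions=3,            # hypothetical
    checkpoint_interval_min=5,  # from the scenario
    restart_overhead_min=4,     # hypothetical
)
print(worst, start + worst <= deadline)
```

With these assumed numbers the job still finishes well before the deadline, which is why frequent checkpoints make Spot a reasonable choice here.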

Exam trap

The trap here is that candidates may overlook the checkpointing requirement and choose Reserved Instances (B) thinking they always reduce costs, or disable checkpointing (D) assuming simplicity is better, without realizing that Spot Instances require fault tolerance to be cost-effective.

How to eliminate wrong answers

Option B is wrong because Reserved Instances require a 1-year or 3-year commitment and are cost-effective only for steady-state, predictable workloads; the variable 2–4 hour nightly job does not justify the upfront commitment and would still incur high costs for unused reservation hours. Option C is wrong because enabling Auto Scaling on On-Demand instances would increase costs (more instances running) and does not address the core issue of high On-Demand pricing; the job already completes within the deadline, so faster execution is unnecessary. Option D is wrong because disabling checkpointing removes the fault-tolerance mechanism that makes Spot Instances viable; without checkpointing, an interruption would force a full job restart, risking failure to meet the 7:00 AM deadline.

297

MCQmedium

A company runs a customer portal on an Amazon Aurora PostgreSQL cluster. The application currently connects directly to the writer instance endpoint and keeps long-lived connections open. During a maintenance failover, writes fail until clients are restarted. The team wants the application to reconnect to the correct Aurora endpoint automatically and reduce user-visible write interruptions. Which change is most likely to achieve this?

A.Use the Aurora cluster endpoint for write traffic, use the reader endpoint for read-only traffic, and implement connection retry or reconnect logic on failover.

B.Keep using the original writer instance endpoint so the database host name never changes during failover.

C.Convert the Aurora cluster to Single-AZ so there is only one database node to connect to.

D.Place Route 53 in front of the database and manually update DNS records whenever failover occurs.

AnswerA

The cluster endpoint always targets the current writer, and failover-aware reconnect logic helps the application recover from dropped connections after promotion.

Why this answer

The Aurora cluster endpoint automatically points to the current writer instance and updates DNS after a failover, so the application can reconnect without manual intervention. However, because the application keeps long-lived connections, it must implement connection retry or reconnect logic to detect the broken connection and re-resolve the DNS name to the new writer. This combination ensures writes resume automatically after failover.
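
Reconnect logic of this kind might be sketched as follows. The `ConnectionLost` exception, the `connect` and `operation` callables, and the retry settings are hypothetical stand-ins for whatever the real database driver provides.

```python
import time


class ConnectionLost(Exception):
    """Stand-in for the driver error raised when the writer goes away."""


def execute_with_reconnect(connect, operation, max_attempts=5,
                           base_delay=0.1, sleep=time.sleep):
    """Run `operation`, opening a fresh connection after each failure."""
    connection = None
    for attempt in range(1, max_attempts + 1):
        try:
            if connection is None:
                # A new connection re-resolves the cluster endpoint DNS name.
                connection = connect()
            return operation(connection)
        except ConnectionLost:
            connection = None  # discard the broken connection
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


# Simulated failover: the first two write attempts fail.
state = {"calls": 0}


def fake_connect():
    return "connection"


def fake_write(connection):
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionLost("server closed the connection")
    return "committed"


outcome = execute_with_reconnect(fake_connect, fake_write, sleep=lambda s: None)
print(outcome)
```

Dropping the connection object on failure matters: retrying on the same long-lived connection would keep talking to the old writer.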

Exam trap

The trap here is that candidates assume the cluster endpoint alone solves the problem, forgetting that long-lived connections must be re-established after failover, which requires explicit retry or reconnect logic in the application.

How to eliminate wrong answers

Option B is wrong because the writer instance endpoint is tied to a specific database node; its host name keeps pointing at that node, which after failover is a reader or is unavailable, so the original endpoint no longer accepts writes. Option C is wrong because converting to Single-AZ removes the failover capability entirely, making the system less resilient and still subject to interruptions during maintenance. Option D is wrong because manually updating Route 53 records is slow, error-prone, and defeats the purpose of automated failover; Aurora already provides managed endpoints that update automatically.

298

Multi-Selectmedium

Select 3 answers

A.Create a separate S3 access point for each team and scope it to that team’s prefix.

B.Leave ACLs enabled so each producer can grant permissions directly on uploaded objects.

C.Set Object Ownership to Bucket owner enforced so ACLs are disabled.

D.Use bucket or access point policies to restrict access to the allowed principals and prefixes.

E.Make the bucket public and rely on application-layer authorization for data protection.

AnswersA, C, D

Access points let you expose different policy boundaries on the same bucket. They are a good fit when multiple teams need controlled access to different prefixes without creating separate buckets.

Why this answer

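Option A is correct because S3 Access Points allow you to create named endpoints with dedicated access policies scoped to specific prefixes. By creating one access point per team and restricting each to its own prefix, you enforce team-level isolation without managing ACLs on individual objects. Option C is correct because setting Object Ownership to Bucket owner enforced disables ACLs, so the bucket owner owns every uploaded object and access is governed by policies alone. Option D is correct because bucket or access point policies are where access is restricted to the allowed principals and prefixes.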
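
An access point policy scoped to one team's prefix could look like the sketch below. The helper name, Region, account ID, role ARN, and prefix are hypothetical; the resource ARN follows the access point object format `.../accesspoint/<name>/object/<key>`.

```python
import json


def access_point_policy(region, account_id, access_point, team_role_arn, prefix):
    """Policy allowing one team role to read and write under its own prefix."""
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": team_role_arn},
                "Action": ["s3:GetObject", "s3:PutObject"],
                # Only objects under this team's prefix are reachable.
                "Resource": f"{ap_arn}/object/{prefix}/*",
            }
        ],
    }


policy = access_point_policy(
    "us-east-1",
    "111122223333",
    "team-a-ap",
    "arn:aws:iam::111122223333:role/team-a",
    "team-a",
)
print(json.dumps(policy, indent=2))
```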

Exam trap

The trap here is that candidates may think ACLs are necessary for multi-producer scenarios, but AWS recommends disabling ACLs and using bucket policies or access point policies with 'Bucket owner enforced' to centralize access control.

299

MCQhard

Based on the exhibit, which change best reduces latency during peak traffic without overprovisioning the fleet?

A.Replace the instances with a larger instance family so each server has more headroom.

B.Change the Auto Scaling policy to target tracking on ALB RequestCountPerTarget.

C.Use scheduled scaling to add instances only during the business hours peak window.

D.Replace the ALB with a Network Load Balancer to reduce request latency.

AnswerB

RequestCountPerTarget matches the actual demand reaching each instance and scales capacity before the thread pool saturates. Because CPU is still low, CPU-based scaling would react too late or not at all. Target tracking on request count helps keep queue depth and latency down while avoiding unnecessary overprovisioning during quieter periods.

Why this answer

Option B is correct because using a target tracking scaling policy on ALB RequestCountPerTarget dynamically adjusts the fleet size based on the actual number of requests each instance receives. This ensures that during peak traffic, additional instances are added only when needed, reducing latency by distributing the load without overprovisioning. It directly addresses the goal of minimizing latency during spikes while maintaining cost efficiency.
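
Such a policy could be described with the request parameters below. The dictionary follows the shape of the EC2 Auto Scaling `PutScalingPolicy` request; the helper name, group name, policy name, resource label, and target value are hypothetical.

```python
def request_count_policy(asg_name, resource_label, target_requests_per_instance):
    """Parameters for target tracking on ALB request count per target."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "alb-request-count-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Identifies the ALB and target group the metric is read from.
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(target_requests_per_instance),
        },
    }


# Hypothetical resource label and target value.
scaling_policy = request_count_policy(
    "web-asg",
    "app/my-alb/0123456789abcdef/targetgroup/my-tg/fedcba9876543210",
    500,
)
print(scaling_policy["TargetTrackingConfiguration"]["TargetValue"])
```

The target value would normally be chosen from load tests, at a request rate each instance handles comfortably before its thread pool saturates.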

Exam trap

The trap here is that candidates confuse reducing latency with scaling the fleet, often choosing a load balancer change (Option D) or a static instance upgrade (Option A) instead of recognizing that dynamic scaling based on per-target request count is the correct method to handle peak traffic without overprovisioning.

How to eliminate wrong answers

Option A is wrong because replacing instances with a larger family increases per-instance capacity but does not scale the fleet dynamically; it leads to overprovisioning during low traffic and may not handle sudden spikes without manual intervention. Option C is wrong because scheduled scaling adds instances only during a fixed business hours window, which cannot adapt to variable or unexpected peak traffic patterns outside that window, potentially causing latency or waste. Option D is wrong because replacing the ALB with a Network Load Balancer (NLB) reduces latency at the transport layer but does not address the need to scale the fleet; NLB lacks application-layer metrics like request count per target, which are essential for the described scaling requirement.

300

MCQmedium

An analytics dashboard uses an Application Load Balancer in one Region. Global users need lower network latency to the application without caching dynamic responses. What should be considered? The architecture review board prefers a managed AWS-native control.

A.AWS Global Accelerator

B.S3 Cross-Region Replication

C.AWS Backup cross-Region copy

D.CloudFront only with long TTLs

AnswerA

Global Accelerator routes traffic over the AWS global network to improve performance for TCP/UDP applications without relying on caching.

Why this answer

AWS Global Accelerator uses the AWS global network to route traffic from edge locations to the Application Load Balancer, reducing internet latency and jitter. It does not cache responses, making it ideal for dynamic content where caching is not desired. This managed service provides static IP addresses and improves performance without modifying the application.

Exam trap

The trap here is that candidates often choose CloudFront for any performance improvement, but the requirement not to cache dynamic responses rules out the CloudFront option offered here, which relies on long TTLs. Global Accelerator improves the network path to the ALB without caching anything.

How to eliminate wrong answers

Option B is wrong because S3 Cross-Region Replication replicates objects between S3 buckets, not traffic routing, and does not reduce network latency for ALB-based applications. Option C is wrong because AWS Backup cross-Region copy is for disaster recovery of backup data, not for improving real-time network performance to an ALB. Option D is wrong because CloudFront with long TTLs caches responses at edge locations, which is unsuitable for dynamic content that must not be cached.
